Big data may be the technology getting all the buzz nowadays, but that does not mean it is infallible. Big data has wreaked havoc in many situations, and the exact reasons are not always clear: false positives, technical glitches, inadequate tools, poor-quality data, incorrect data or even unnecessary data.

Needless to say, if any of these errors creep in, the results will be completely different from what you were expecting. To make matters worse, the results sometimes go unanalyzed, which can lead to some unpleasant consequences.

Flaws of Big Data

Thanks to big data and the Cloud, the powers of supercomputers are everybody’s for the taking. What gets lost in the mix, however, is that the tools we use to interpret, analyze and apply this tsunami of information usually have a fatal flaw. Most of the data analysis we conduct is based on erroneous models, which means that mistakes are inevitable. And when our overblown expectations exceed our capacity, the consequences can be dire.

If big data were not so ginormous, this would not be such a big problem. Unfortunately, given the sheer volume of data we have, even flawed models can produce occasionally useful results. The issue is that we often confuse those results with omniscience. We are enamored with our own technology, but when the models go haywire, it can get pretty ugly, especially since the mistakes the data produces are proportionally as large.

Examples of Big Data Failures

Perhaps the largest and best-known big data flop was Google Flu Trends in 2013. Google launched this service in 2008 with the goal of predicting flu outbreaks in 25 countries. The logic was simple: analyze Google search queries about the flu in a given region, compare the search volume with a historical record of flu activity in that geographical area, and based on these results classify the activity level as low, medium, high or extreme.
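The classification step described above can be sketched in a few lines. This is a toy illustration only, not Google's actual model; the function name and the ratio thresholds are invented for the example.

```python
def classify_flu_activity(current_volume: float, historical_baseline: float) -> str:
    """Bucket current flu-related search volume relative to a historical baseline.

    Toy sketch: the thresholds are invented for illustration and do not
    reflect how Google Flu Trends actually scored activity levels.
    """
    if historical_baseline <= 0:
        raise ValueError("historical baseline must be positive")
    ratio = current_volume / historical_baseline
    if ratio < 1.0:
        return "low"
    elif ratio < 1.5:
        return "medium"
    elif ratio < 2.0:
        return "high"
    else:
        return "extreme"


# Example: search volume at 3x the historical baseline is bucketed as "extreme".
print(classify_flu_activity(300, 100))
```

Note that a model like this inherits exactly the flaw described below: it scores raw search volume without asking *why* people are searching, which is how unrelated queries about colds or fevers can inflate the signal.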

Even though, at first glance, this may seem like a cool idea, in reality it was not. At the height of the 2013 flu season, Google Flu Trends failed miserably, overestimating flu activity by an astounding 140%. The reason was that the algorithm was flawed and did not take several factors into account. For example, if people were searching for words such as “cold” or “fever”, it does not necessarily mean they had flu-like symptoms; they could have been searching for other seasonal illnesses. Google Flu Trends never recovered from this disaster, which ultimately led to the service being shut down in 2015.

Reasons Why Big Data Fails

The unmitigated disaster that was Google Flu Trends is by far not the only one. It is not possible to list all of the blunders of big data over the years; however, it is important that we analyze the failures so we can learn our lesson(s) and never repeat them in the future. Some of the reasons for big data failures are:

  1. Lack of Data Governance and Data Management – very often organizations do not fully understand the data they already have, yet they still decide to undertake new projects based on it. There is a lack of documentation, storage policies and other procedures regarding data handling. It is a good idea to turn to a big data consulting company that can provide your business with a clear roadmap and instructions on how to handle the data you already have, and only then take on the challenges of big data.
  2. Undefined Goals and Strategy – there is a lot of IT terminology and marketing sloganeering out there, and it can be difficult to make sense of all this white noise. Furthermore, there are many big data products on the market, and choosing the right one is hard. Before you decide on anything, figure out what services you need and what technologies you will need to accomplish your goals. “Do small data on big data”: evaluate your big data architecture on small amounts of data to ensure that you choose the right products.
  3. It’s All Greek to Me – data science and big data are a complex combination of domain knowledge, mathematical and statistical expertise, and programming skills, yet at the same time the work must make business sense. What usually happens is that the IT department makes changes that management does not understand, and vice versa. Make sure that your big data actions make sense to both IT and business leaders, and build a bridge between IT and the business in the big data project. Business people should be deeply involved in every stage of the project.
  4. Too Big Too Soon – when you first start implementing big data projects, there are a lot of undefined factors such as budget, technologies and courses of action. A big project launched right away is doomed to fail. Instead, opt for a small project and measure its success (or lack thereof) incrementally. This way, if something goes wrong, you will notice it immediately and can make the necessary adjustments before it dooms the project. A good way to benchmark your progress is to create prototypes or proofs of concept to validate the work you have accomplished. There is no point in advancing to the next stages of the project if there are flaws in the early ones.
  5. Lack of IT Talent – finding and hiring the people you need to successfully complete a project is a daunting task, yet the people handling your data are a vital component of the overall project. Moreover, they must be well versed in new technologies, which is a challenge given the fast-paced IT environment.
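The “do small data on big data” advice from point 2, and the incremental approach from point 4, can be sketched as a simple smoke test: run a candidate pipeline on a small random sample of records and validate the outputs before committing to the full dataset. The helper below is a hypothetical illustration, not a standard API; names and parameters are invented for the example.

```python
import random


def smoke_test_pipeline(records, pipeline, sample_size=100, seed=42):
    """Run a candidate pipeline on a small random sample of records.

    Hypothetical helper for illustration: the idea is to validate a big data
    architecture on small data before scaling up. Returns the sample outputs
    so they can be inspected or asserted against expectations.
    """
    rng = random.Random(seed)  # fixed seed so the smoke test is reproducible
    sample = rng.sample(records, min(sample_size, len(records)))
    return [pipeline(record) for record in sample]


# Example: validate a trivial transformation on 10 of 1,000 synthetic records
# before ever pointing it at the real dataset.
records = list(range(1000))
outputs = smoke_test_pipeline(records, pipeline=lambda r: r * 2, sample_size=10)
print(len(outputs))  # 10 sampled outputs to inspect
```

The design choice here mirrors point 4: if the pipeline is wrong, a 10-record sample reveals it in seconds, whereas discovering the same flaw after a full-scale run can doom the project.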

A common theme in the list above is that no matter how much we want to focus on the data, people keep getting in the way. Even though we want data to rule the decision-making process, people ultimately rule the big data process, starting with basic decisions such as which data to collect and keep and which answers to seek from it.

Innovation Via Iteration

Many organizations feel constrained when they decide to undertake a big data project, which is why it is vital to take an iterative approach. Organizations should find ways to set their employees free to experiment with data. The “start small, fail fast” approach is helped by the fact that most significant big data technology is open source, and many platforms are immediately and affordably accessible as cloud services, thus lowering the bar to a trial-and-error method even further.

Big data is all about asking the right questions, so relying on existing employees is critical. However, even with superior domain knowledge, organizations still will not collect the right data or ask the proper questions from the very beginning. Such failures should be accepted and expected.

Since the early stages of your big data project can make or break the entire thing, this is where the advice of big data consultants can really pay off. They can advise you on how to create prototypes and proofs of concept, benchmark your efforts, help create your microservice architecture and assist you in migrating to new technologies. It is important to employ a flexible, open data infrastructure that enables employees to constantly modify and perfect their approach until they reap the fruits of their toil. This way, organizations can eliminate the fear and iterate toward effective use of big data.