Big Data Hubris

Scientific American recently published an interesting article on its Observations blog, titled "Why Big Data Isn't Necessarily Better Data." It nicely highlights one of the pitfalls of implementing Big Data analytics: believing your data, your analysis, or both are better than they actually are. The article's example is Google Flu Trends (GFT), which seeks to use Google's search data to track outbreaks of influenza. The results have been mixed. GFT's estimates broadly tracked the CDC's surveillance data, but one study found that GFT overestimated the prevalence of flu in 100 of the 108 weeks it examined between 2011 and 2013. Google itself has found that GFT data is highly susceptible to media coverage, which may explain some of these issues.

The main takeaway from this article is to be conscious of factors that could affect the quality of both your data and your analysis. This quote sums it up nicely:

Big data hubris is the “often implicit assumption that big data are a substitute for, rather than a supplement to, traditional data collection and analysis.”

Big data is a useful tool, but like all tools it must be used properly.  Some food for thought.
