Beware dodgy data analysis

Data science is having its 15 minutes of fame.

Everyone from John Oliver of HBO’s “Last Week Tonight” to famed election statistician Nate Silver of 538.com is getting on a soapbox about the perils of believing data-based findings that lead to seemingly crazy conclusions.

John Oliver highlighted one particularly dodgy finding: that a glass of wine is as healthy as an hour at the gym. Another "study" supposedly proved the benefits of a chocolate diet for pregnant moms. And other studies have found that the number of suicides by hanging, strangulation and suffocation is highly correlated with U.S. spending on science, space and technology.

As those of us working in business and data analytics know only too well, what each of these strange-but-unfortunately-real studies has in common is a failure to distinguish between data that shows a correlation between variables, which is a statistician's bread and butter, and data that establishes causality: tested evidence that one thing actually causes another.

And while such confusion may not matter much if it leads to a pregnant mom eating an extra Hershey bar or two, it could be deadly to your company’s bottom line.
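To see how easily a correlation like the suicides-versus-science-spending one can arise, here is a minimal sketch using made-up numbers (not the real data). It generates two hypothetical, completely independent series, `spending` and `incidents`, that each simply drift upward over time plus random noise. Neither influences the other, yet their Pearson correlation comes out close to 1 because both share a time trend. Comparing year-over-year changes instead of raw levels strips out that shared trend, and the apparent link largely disappears. This does not prove causality either way; it only shows that a high correlation by itself says nothing about cause.

```python
import random


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mx = sum(xs) / n
    my = sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sum((x - mx) ** 2 for x in xs) ** 0.5
    sy = sum((y - my) ** 2 for y in ys) ** 0.5
    return cov / (sx * sy)


def trending_series(n, slope, noise, rng):
    """Hypothetical series: a linear upward trend plus independent Gaussian noise."""
    return [slope * t + rng.gauss(0, noise) for t in range(n)]


def diffs(xs):
    """Year-over-year changes, which remove a shared linear trend."""
    return [b - a for a, b in zip(xs, xs[1:])]


rng = random.Random(42)  # fixed seed so the sketch is reproducible
years = 20

# Two unrelated, synthetic series: neither is computed from the other.
spending = trending_series(years, slope=2.0, noise=1.0, rng=rng)
incidents = trending_series(years, slope=0.5, noise=0.5, rng=rng)

r = pearson(spending, incidents)
r_diff = pearson(diffs(spending), diffs(incidents))

print(f"correlation of raw series:             {r:.2f}")
print(f"correlation of year-over-year changes: {r_diff:.2f}")
```

The trend-and-noise model and the differencing check are illustrative assumptions, not a method drawn from the studies mentioned above; real analyses would need domain knowledge, controls or experiments before claiming that one variable drives another.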

Source: InfoWorld Big Data