The Tragedy of Big Data

The Tragedy of Big Data. The more variables, the more correlations that can show significance in the hands of a “skilled” researcher. Falsity grows faster than information; it is nonlinear (convex) with respect to data. The inverse should also hold: valid correlations grow more slowly than total observations, which makes them a concave function of the data.


From Antifragile, N. N. Taleb
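To make the convexity concrete, here is a minimal sketch (mine, not Taleb's) that generates pure noise and counts how many pairs of variables look "significant" anyway. The function names, the sample size of 50, and the use of the Fisher z approximation for a roughly 5% two-sided test are all assumptions chosen for illustration. The number of pairs grows as k(k-1)/2, so the chance findings should grow roughly quadratically with the number of variables k.

```python
import math
import random
from itertools import combinations


def pearson(x, y):
    """Plain Pearson correlation coefficient of two equal-length lists."""
    n = len(x)
    mx = sum(x) / n
    my = sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def count_spurious(num_vars, num_obs=50, z_crit=1.96, seed=0):
    """Return (number of pairs, pairs flagged 'significant') for pure noise.

    Every variable is independent Gaussian noise, so any flagged pair is
    spurious by construction. Significance uses the Fisher z approximation:
    atanh(r) * sqrt(n - 3) is roughly standard normal when the true
    correlation is zero.
    """
    rng = random.Random(seed)
    data = [[rng.gauss(0.0, 1.0) for _ in range(num_obs)]
            for _ in range(num_vars)]
    hits = 0
    for x, y in combinations(data, 2):
        z = math.atanh(pearson(x, y)) * math.sqrt(num_obs - 3)
        if abs(z) > z_crit:
            hits += 1
    pairs = num_vars * (num_vars - 1) // 2
    return pairs, hits


if __name__ == "__main__":
    for k in (10, 20, 40, 80):
        pairs, hits = count_spurious(k)
        print(f"{k:3d} variables -> {pairs:5d} pairs, {hits:4d} flagged by chance")
```

Doubling the variables roughly quadruples the pairs tested, and about 5% of them clear the bar even though nothing real is there.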

——————-

If you have a statistical model that seeks to explain eleven outputs but has to choose from among four million inputs to do so, many of the relationships it identifies are going to be spurious. (This is another classic case of overfitting - mistaking noise for a signal - the problem that befell earthquake forecasters in chapter 5.)

From The Signal and the Noise, Nate Silver
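A scaled-down sketch of the situation Silver describes, under my own assumptions: one output, 2,000 candidate inputs instead of four million, and everything is random noise. The function name and all the sizes are hypothetical. The code picks the input that best "explains" the output on 30 observations, then checks that same input on 200 fresh observations.

```python
import math
import random


def pearson(x, y):
    """Plain Pearson correlation coefficient of two equal-length lists."""
    n = len(x)
    mx = sum(x) / n
    my = sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def best_noise_predictor(num_inputs=2000, train_obs=30, test_obs=200, seed=1):
    """Select the noise input most correlated with a noise output.

    Returns (in-sample r, out-of-sample r) for the selected input.
    Nothing here is related to anything else, so the in-sample fit is
    pure overfitting.
    """
    rng = random.Random(seed)
    y_train = [rng.gauss(0.0, 1.0) for _ in range(train_obs)]
    y_test = [rng.gauss(0.0, 1.0) for _ in range(test_obs)]
    best_train_r, best_test_r = 0.0, 0.0
    for _ in range(num_inputs):
        x_train = [rng.gauss(0.0, 1.0) for _ in range(train_obs)]
        x_test = [rng.gauss(0.0, 1.0) for _ in range(test_obs)]
        r = pearson(x_train, y_train)
        if abs(r) > abs(best_train_r):
            # Keep the holdout score of whichever input currently looks best.
            best_train_r = r
            best_test_r = pearson(x_test, y_test)
    return best_train_r, best_test_r


if __name__ == "__main__":
    train_r, test_r = best_noise_predictor()
    print(f"best in-sample r = {train_r:+.2f}, same input out-of-sample r = {test_r:+.2f}")
```

The winning input should look impressive in-sample simply because it was the best of thousands of tries, and that apparent relationship should largely vanish on the holdout data.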

—————-

Likeness to truth is not the same as truth. Without any theoretical structure to explain why patterns seem to repeat themselves across time or across systems, these innovations provide little assurance that today's signals will trigger tomorrow's events. We are left with only the subtle sequences of data that the enormous power of the computer can reveal.

From Against the Gods, Peter Bernstein 

_____________

In a report last month, the organization (WHO) warned of "a massive 'infodemic' - an overabundance of information - some accurate and some not - that makes it hard for people to find trustworthy sources and reliable guidance when they need it." That warning was seconded by the MIT Technology Review, which observed that the virus has the makings of "the first true social-media 'infodemic'" as "social media has zipped information and misinformation around the world at unprecedented speeds, fueling panic, racism and hope."

WSJ - 3/6/20 “‘Infodemic’: When Unreliable Information Spreads Far and Wide”

Jeff - This is not the same as the Tragedy of Big Data, but it is a first cousin. Here individuals, not data scientists, are drawing correlations and inferences based on 1) incomplete information, 2) wrong information, or 3) a misunderstanding of the statistical methods needed for proper inference. The likely spurious correlation is then passed on via social media as “truth,” making it very difficult for an individual to make sound decisions.
