The Numbers Game – Why Data Scientists, Statisticians & Forecasters Get It Wrong?

Philip E. Tetlock, one of the foremost global experts on forecasting behavior, conducted a comprehensive 20-year study of expert judgment in the scientific literature, and concluded that the average expert's performance was about the same as random guessing. Since this study was conducted between 1984 and 2004 – before the advent of Big Data, Deep Learning and High Performance Computing – the results might be different today.

So, are we becoming more efficient as a predictive society?

In many cases, yes. This is particularly evident in the evolution of analytics from predictive to prescriptive; in the complexity of the prediction tasks now addressed by many Computer Vision and NLP applications; and in the pace of innovation in the Deep Learning space, where increasingly advanced algorithms are being developed.

In some cases, no. This is particularly true where there is an over-reliance on classical predictive and forecasting techniques; where changes in the data distribution, or incomplete adherence to a model's assumptions, have an asymmetric or non-linear impact; or where sub-optimal algorithms and techniques are applied. There are other reasons too, such as identifying patterns as highly statistically significant when none actually exist (e.g. p-hacking or data dredging), and non-adherence to data pre-processing and processing standards.
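To make the data dredging point concrete, here is a minimal sketch in Python. It assumes a target and a set of candidate features that are all pure random noise, and it uses the Fisher z-transform as an approximate significance test for each correlation. The function names and parameter values are illustrative assumptions, not taken from any study mentioned here.

```python
import math
import random
from statistics import NormalDist


def pearson_r(x, y):
    """Plain Pearson correlation coefficient of two equal-length lists."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def approx_p_value(r, n):
    """Two-sided p-value for r using the Fisher z-transform (an approximation)."""
    z = math.atanh(r) * math.sqrt(n - 3)
    return 2 * (1 - NormalDist().cdf(abs(z)))


def count_spurious_hits(n_features=1000, n_samples=100, alpha=0.05, seed=0):
    """Count noise features that look 'significantly' correlated with a noise target."""
    rng = random.Random(seed)
    target = [rng.gauss(0, 1) for _ in range(n_samples)]
    hits = 0
    for _ in range(n_features):
        feature = [rng.gauss(0, 1) for _ in range(n_samples)]
        if approx_p_value(pearson_r(feature, target), n_samples) < alpha:
            hits += 1
    return hits


if __name__ == "__main__":
    # Every "hit" here is a false discovery: no real relationship exists.
    print(count_spurious_hits())
```

With a 5% significance threshold, roughly one in twenty unrelated features can be expected to pass the test by chance alone. Anyone who searches enough features and reports only the winners will find 'patterns' in noise.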

This paper talks about three major areas of concern that affect, in varying degrees and combinations, many of the Data Science and Forecasting projects of today.


The Case of Taleb’s Fourth Quadrant

Nassim Nicholas Taleb, the renowned author, statistician and risk management expert, has often written about a phenomenon he calls the Fourth Quadrant in his papers and books. The Fourth Quadrant is the zone of the ‘unknown unknowns’, the domain of the Black Swan. It represents a realm where outcomes depend heavily on small-probability events and, more importantly, where the incidence of these events is incomputable.

This Fourth Quadrant represents the ‘limits of statistics’, i.e. conventional statistical or mathematical paradigms are unreliable in this zone. Consequently, the results obtained through traditional statistical or mathematical models will seldom prove to be accurate here. Taleb’s classic Turkey metaphor presents a solid example. A farmer buys a turkey and takes care of it for 1,000 days. As each day passes, the statistical models keep predicting, with ‘increased statistical significance’, that the farmer cares about the turkey. On the 1,001st day, which happens to be Thanksgiving, the turkey is killed for dinner. All the statistical models fail at that point, and all their predictions are rendered useless.
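The turkey's predicament can be sketched in a few lines of Python. This is only an illustration under assumed numbers: the daily 'well-being' score, its noise level and the size of the final shock are all hypothetical, and the function names are my own.

```python
import math
import random
from statistics import mean, stdev


def turkey_confidence(days=1000, seed=42):
    """Simulate daily well-being scores and the shrinking standard error."""
    rng = random.Random(seed)
    # Hypothetical daily score: steady care (1.0) plus a little noise.
    history = [rng.gauss(1.0, 0.1) for _ in range(days)]
    # Standard error of the mean after 10, 100 and `days` observations.
    std_errors = {n: stdev(history[:n]) / math.sqrt(n) for n in (10, 100, days)}
    return history, std_errors


def surprise_in_sigmas(history, new_value):
    """How many standard deviations the new observation lies from the past mean."""
    return (new_value - mean(history)) / stdev(history)


if __name__ == "__main__":
    history, std_errors = turkey_confidence()
    print(std_errors)                          # the estimate looks ever more certain
    print(surprise_in_sigmas(history, -10.0))  # day 1,001: far outside anything seen
```

The standard error keeps shrinking as the days accumulate, so a naive model grows more confident right up to the moment it is most wrong. Nothing in the first 1,000 observations contains information about the event that matters.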

Not clearly understanding the boundaries of mathematical models is a fundamental error.

Furthermore, many techniques that form the backbone of Advanced Analytics or Data Science were developed in an earlier era with less uncertainty. They cannot, and should not, be applied in their ‘as-is forms’ today. For instance, several models can be applied only after their underlying assumptions (such as normality of the data distribution) are fulfilled. However, it is not uncommon to come across cases, particularly in Big Data streams, where the data distribution changes midway. While there are emerging solutions to deal with these situations, the continued reliance of practitioners on traditional techniques in such cases (often due to ignorance) is a real problem.
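One simple way to check whether a stream's distribution has changed midway is to compare a reference window against a recent window. The sketch below uses a hand-written two-sample Kolmogorov-Smirnov statistic with the standard large-sample critical value at the 5% level. It is one option among many, it assumes independent observations, and the function names and window sizes are illustrative assumptions.

```python
import math


def ks_statistic(a, b):
    """Largest gap between the empirical CDFs of samples a and b."""
    a, b = sorted(a), sorted(b)
    i = j = 0
    d = 0.0
    while i < len(a) and j < len(b):
        x = min(a[i], b[j])
        # Advance both samples past every value <= x, then compare the CDFs.
        while i < len(a) and a[i] <= x:
            i += 1
        while j < len(b) and b[j] <= x:
            j += 1
        d = max(d, abs(i / len(a) - j / len(b)))
    return d


def drift_detected(reference, recent, c_alpha=1.358):
    """Flag drift when the KS statistic exceeds the asymptotic 5% critical value."""
    n, m = len(reference), len(recent)
    critical = c_alpha * math.sqrt((n + m) / (n * m))
    return ks_statistic(reference, recent) > critical
```

In practice, `reference` might hold the data a model was trained on and `recent` the latest batch from the stream. A positive result does not say what changed, only that the model's distributional assumptions deserve a fresh look before its output is trusted.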

So, how do we address problems in this Fourth Quadrant? Research is still in progress. One option, though yet to fully evolve, is to represent these events in terms of fractal power laws or alpha. An elaboration on this is beyond the scope of this paper. Other options, though sub-optimal, include building in redundancy measures, avoiding making predictions in this zone, and the acceptance of risk.


The Complexities of Outlier Detection

Outlier detection is a key requirement today, especially in cases that involve understanding anomalous behavior. Even otherwise, it should be part of most modeling activities to validate the overall suitability of the training or testing data. The outlier detection process is often complex, particularly when we deal with high-dimensional data points in large volumes.

At the most fundamental level, the detection of an outlier stems from an understanding of ‘what is normal?’. Today’s systems are generally complex, and their interactions produce even more complex dynamics. Consequently, it is difficult, if not impossible, to understand the entire spectrum of normal behavior within a particular system. The line between normal and abnormal data is often ambiguous. This problem is further aggravated by low data quality and high noise, both of which are practical issues in most Data Science or Forecasting projects.

In a 2018 International Monetary Fund (IMF) paper, How Well Do Economists Forecast Recessions?, the authors concluded that the ability to ‘predict turning points’ is quite limited. Furthermore, they stated that the reasons for this include the lack of adequate knowledge and effective forecasting models, and the forecasters’ personal inability to deviate from popularly held beliefs.

I will go a step further and state that this inability to accurately predict turning points also stems from the inability to distinguish between outliers and inflection points. While professionals associated with prediction exercises understand this difference in theory, in practice, they often fail to distinguish between the two in complex data distributions. Let us understand this through a simple example.

If Data Scientists (instead of Psephologists) had been tasked with predicting the 2014 India Central Government formation based on the historical data since 1947, their conventional models would likely have predicted that the United Progressive Alliance (UPA), under the leadership of Congress (I), had the highest probability of winning. However, in actuality, the National Democratic Alliance (NDA) came to power with the Bharatiya Janata Party (BJP) crossing the half-way mark on its own, something no single party had done in 30 years. Now, if the 2014 result were added to the historical data, conventional Data Science models would likely have treated 2014 as an outlier, and predicted that the probability of the Congress/UPA forming the government in 2019 was higher than that of the BJP/NDA. The actual outcome in 2019 was a bigger mandate for the BJP/NDA.

The 2014 Indian election result was not an outlier, nor was the 2019 one. 2014 was likely an Inflection Point, where the concavity of the curve changed, and this will continue till a new Inflection Point appears.
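The practical difference between the two can be sketched with a simple persistence check: a deviation that reverts is a candidate outlier, while one that holds in the observations that follow is a candidate turning point. This is a rough level-shift heuristic of my own for illustration, not a formal test for a change in concavity, and the window size and threshold are arbitrary assumptions.

```python
from statistics import mean, stdev


def classify_deviation(series, index, window=5, threshold=3.0):
    """Label the point at `index` as 'normal', 'outlier' or 'persistent shift'."""
    if index < window or index + window >= len(series):
        raise ValueError("need `window` points on each side of `index`")
    before = series[index - window:index]
    after = series[index + 1:index + 1 + window]
    base, spread = mean(before), stdev(before)

    def is_far(value):
        # Far from the earlier baseline, measured in baseline standard deviations.
        return abs(value - base) > threshold * spread

    if not is_far(series[index]):
        return "normal"
    # The later observations decide: did the series revert, or did it stay away?
    return "persistent shift" if is_far(mean(after)) else "outlier"


if __name__ == "__main__":
    spike = [10, 11, 9, 10, 10, 30, 10, 11, 9, 10, 10]
    shift = [10, 11, 9, 10, 10, 30, 30, 31, 29, 30, 30]
    print(classify_deviation(spike, 5))  # outlier
    print(classify_deviation(shift, 5))  # persistent shift
```

The catch is that the label depends on data that arrives only after the event. At the moment the deviation occurs, the two cases look identical, which is precisely why models fitted to long histories tend to dismiss turning points as outliers.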


The Pretender Problem

This is a fundamental problem where ‘pretenders’ masquerade as ‘experts’, and their inputs are seriously considered for complex studies. Their often inaccurate, biased and, at times, even malicious inputs obviously lead to wrong conclusions. You see this problem everywhere today, at varying degrees of scale. Here are some examples.

  • Failed journalists, posing as ‘political analysts’, boldly predict the outcome of elections or how certain events will shape up in the near future. More often than not, the results prove them wrong.
  • Past sports professionals with poor records appear as experts in TV studios to ‘critically analyze’ the games of top players. They often spout unoriginal, repetitive and the most obvious statements. [On a personal note, it always amazes me how Sanjay Manjrekar, a man with a phenomenal ‘1 century’ in One Day Cricket, seems to know all about the mistakes in the batting technique of Virat Kohli, a modern day great with 41 (and counting) One Day centuries. More than anything else, you really need to appreciate the confidence of that person. I call this phenomenon ‘The Manjrekar Syndrome’.]
  • Some business leaders, who may not have written a single line of code in their lives, pretend to be technologists, and confidently explain to software engineers (the ‘actual’ technologists) how to develop or implement software applications. You meet these pretenders on an almost regular basis in the IT industry. (Just like Sanjay Manjrekar, you have to marvel at their confidence.)
  • Finally, there is my favorite group: the self-proclaimed experts in Machine Learning and Artificial Intelligence. They possess the unique ability to understand these complex fields without a strong foundation in algorithms, mathematics or computer science. They are almost omnipresent today (you see them on social media, at conferences, and in most organizations), except in the actual functions that develop or implement Machine Learning and AI. That’s because they cannot really produce any actual output.

Organizations and leaders should conduct adequate due diligence to ensure that pretenders are not tasked with critical problem-solving, particularly problems that need complex analysis or high-end mathematical or computing skills, and where minor mistakes can lead to large errors in the outcome.

Closing Comments

It is not just regular professionals who get models and data wrong. Economists are famous (or infamous) for incorrect forecasts. In 2004, Ben Bernanke (who later served as the Chair of the US Federal Reserve) stated that the global economy was becoming more moderate and stable than in earlier years. However, all of us now know that the opposite was true: risks were building up in the economic system, and they eventually led to the 2008 global recession. So where did Bernanke go wrong? There were several factors, one of them being that he used volatility as the indicator of stability. This was a flawed indicator because the absence of volatility does not necessarily mean the absence of risk.

There are worse examples. In 2008, at the peak of the global recession, Olivier Blanchard (who later served as the Chief Economist of the International Monetary Fund) published a paper stating that ‘the state of macro is good’. In all probability, the paper was authored a year or two before it was published, but the very fact that top economists around the world completely missed the impending recession is an indicator of the limitations of traditional theories, models and techniques.

The Data Science and Forecasting problems of today are much more complex than before. Previous tools and technologies may not be adequate to fight today’s war. In many cases, there is a real need to ‘unlearn older techniques, theories and concepts’ that are less relevant today, and pick up new knowledge and skills. The long-term success or failure of these disciplines will greatly depend on that.

The ‘unlearning of older techniques & practices’, and the adoption of modern ones, may be key to successfully delivering today’s complex Data Science & Forecasting projects.

Note: Some of the ideas in this paper are based on the path-breaking work of esteemed scholars, particularly Nassim Nicholas Taleb who has deeply influenced my thinking. I encourage everyone to deep dive into these topics, and read the original papers & books.
