Statistics and Intuition
In 2023, I read Thinking, Fast and Slow by Daniel Kahneman and was fascinated by pretty much the entire book. While watching the NHL playoffs, I was reminded of a discussion in the book’s introduction: we are bad at interpreting statistics. Kahneman writes:
We prepared a survey that included realistic scenarios of statistical issues that arise in research. Amos collected the responses of a group of expert participants in a meeting of the Society of Mathematical Psychology, including the authors of two statistical textbooks. As expected, we found that our expert colleagues, like us, greatly exaggerated the likelihood that the original result of an experiment would be successfully replicated even with a small sample. They also gave very poor advice to a fictitious graduate student about the number of observations she needed to collect. Even statisticians were not good intuitive statisticians.
The key point, for me, is that even experts have a hard time interpreting statistics. The book goes on to discuss many interesting examples of how communication, our past experiences, and a number of other factors can affect how we interpret statistics. It’s a fascinating book. There are too many details to cover here, so I want to focus on the NHL example to show how hard it is to understand numbers.
I think it was during the Canucks/Oilers series that I saw this statistic (here’s an older link with the same data):
In a series tied 2-2, the team that wins game 5 goes on to win the series 78.8% of the time.
What does that 78.8% really mean? NHL playoff series have at most 7 games, so we are talking about the outcome of the remaining 2 games of a series. In theory there are 4 possible outcomes of 2 games (W = win, L = lose): W-W, W-L, L-W, L-L. In reality we don’t always see all 4 outcomes, because a series ends as soon as one team reaches 4 wins and game 7 may never be played, but that doesn’t change the counting. If we flip a coin twice, we get the same four outcomes (T = tails, H = heads): T-T, T-H, H-T, H-H, and because each one is equally likely there is a 25% chance of any particular outcome. If we treat the hockey games like random outcomes, then:
- If your team is trailing 3-2, your only way to win the series is the W-W outcome: 1 out of 4, or a 25% chance.
- If your team is leading 3-2, you can win the series with an L-W, W-L, or W-W outcome: 3 out of 4, or a 75% chance.
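The counting in the two bullets above can be checked by listing every two-game result and counting the ones that win the series. This is a minimal sketch that assumes, as the post does, that each game is a fair coin flip; the function name `series_win_chance` is just an illustrative choice.

```python
from itertools import product

# All equally likely results of games 6 and 7 from one team's point of view.
# Assumption: every game is a fair 50/50 coin flip.
outcomes = list(product("WL", repeat=2))  # W-W, W-L, L-W, L-L

def series_win_chance(needed_wins: int) -> float:
    """Fraction of the four outcomes that contain at least `needed_wins` wins."""
    winning = [o for o in outcomes if o.count("W") >= needed_wins]
    return len(winning) / len(outcomes)

# Trailing 3-2: the team must win both remaining games (W-W only).
trailing = series_win_chance(needed_wins=2)
# Leading 3-2: the team needs one more win (W-W, W-L, or L-W).
leading = series_win_chance(needed_wins=1)

print(f"trailing: {trailing:.0%}")  # trailing: 25%
print(f"leading: {leading:.0%}")    # leading: 75%
```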
So… if we compare NHL data to a random outcome, like a coin toss, I guess we can say that the team that wins game 5 has a 3.8 percentage point advantage (78.8 - 75 = 3.8) over the random outcome. 3.8 points sounds much smaller than 78.8%, doesn’t it? When we see a phrase like “the team that wins game 5 goes on to win the series 78.8% of the time,” we tend to read it as a big edge earned by winning game 5. While the statement is true, it also feels misleading: 75 of those 78.8 points would show up even if every game were a coin toss.
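The coin-flip baseline can also be checked by simulation. This is a minimal sketch under the same assumption as above (every game is a fair 50/50 flip); the function names, seed, and trial count are arbitrary choices for illustration, not anything from the NHL data. It also shows why a series ending early doesn’t matter: the simulation plays game 7 only when it is needed, yet the game 5 winner still takes the series about 75% of the time.

```python
import random

def game5_winner_takes_series(rng: random.Random) -> bool:
    """Play out one series from 3-2 with every game a fair coin flip.
    Play stops as soon as either team reaches 4 wins, so game 7 is
    only played when needed."""
    leader, trailer = 3, 2  # the game 5 winner leads 3-2
    while leader < 4 and trailer < 4:
        if rng.random() < 0.5:
            leader += 1
        else:
            trailer += 1
    return leader == 4

def coin_flip_baseline(trials: int = 100_000, seed: int = 2023) -> float:
    """Fraction of simulated series won by the game 5 winner."""
    rng = random.Random(seed)
    wins = sum(game5_winner_takes_series(rng) for _ in range(trials))
    return wins / trials

baseline = coin_flip_baseline()
nhl_observed = 78.8  # percent, from the statistic quoted above

print(f"coin-flip baseline: {baseline:.1%}")  # close to 75%
print(f"NHL edge: {nhl_observed - 75.0:.1f} percentage points")  # NHL edge: 3.8 percentage points
```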
Summary
I’m writing here to remind myself that statistics are hard to interpret. Any time I see a statistic, I should pause and think about what that number means. Even better would be to talk the number through with another person, especially someone who thinks a little differently than me. Diverse opinions help us make better decisions because they challenge our biases.
Fortunately, sports statistics have no real impact on my life, and neither do coin tosses, so they are not worth a lot of time, but they are an interesting illustration. In life, there are statistics that really do matter: government policies, health care, family decisions, and sometimes even decisions at work. It is worth the time to pause and consider what those numbers mean. There may be bad actors who intentionally present statistics in a confusing way, but more often it happens innocently and goes unnoticed by many.
One more point: in the NBA, the team that wins game 5 after the series is tied 2-2 goes on to win the series 82.8% of the time (a 7.8 percentage point advantage). Neat! I wonder what this number looks like in other sports. I also wonder how the skill level of players affects these numbers. For example, would junior hockey leagues be closer to 75% or further from it? I doubt I’ll take the time to find these numbers, but it is interesting.
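One way to put the NHL and NBA numbers on the same footing rests on a big simplifying assumption that the statistic itself does not tell us: suppose the game 5 winner wins each remaining game independently with the same probability p. It loses the series only by losing both remaining games, so the series win rate is 1 - (1 - p)^2, which gives p = 1 - sqrt(1 - rate). The sketch below, with a hypothetical helper name, backs out that p. Under this assumption the NHL figure works out to roughly a 54% per-game edge and the NBA figure to roughly 58.5%, compared with 50% for a coin.

```python
import math

def implied_game_win_prob(series_win_rate: float) -> float:
    """Hypothetical helper: the per-game win probability p that would give
    `series_win_rate` for a team leading 3-2, assuming every remaining game
    is independent with the same p. From rate = 1 - (1 - p) ** 2."""
    return 1 - math.sqrt(1 - series_win_rate)

# Observed rates quoted in the post for series tied 2-2, then won in game 5.
observed = {"NHL": 0.788, "NBA": 0.828}
implied = {league: implied_game_win_prob(rate) for league, rate in observed.items()}

for league, p in implied.items():
    print(f"{league}: game 5 winner acts like a {p:.1%} favorite in each later game")

# Sanity check: a fair coin (p = 0.5) gives back the 75% baseline.
assert math.isclose(1 - (1 - 0.5) ** 2, 0.75)
```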