Was the run-up to the recent election an example of failed statistics? Pundits have been saying how bad the polling was. Sure, there might have been some things pollsters could have done better, but consider: FiveThirtyEight, on the morning of the election, gave Trump a 28.6% chance of winning.
And things with a probability of 1 in 4 (or, in this case, 2 in 7:) happen all the time.
This post is not about what the pollsters could have done better, but rather, how should we communicate uncertainty to the public? We humans seem to want certainty that isn’t there, so stats gives us ways of telling the consumer how much certainty there is.
In a traditional stats class, we learn about confidence intervals: a poll does not tell us the true population proportion, but we can calculate a range of plausible values for that unknown parameter. We attach that range to poll results as a margin of error: Hillary is leading 51–49, but there’s a 4% margin of error.
(Pundits say it’s a “statistical dead heat,” but that is somehow unsatisfying. As a member of the public, I still think, “but she is still ahead, right?”)
Bayesians might say that the 28.6% figure (a posterior probability, based on the evidence in the polls) represents what people really want to know, closer to human understanding than a confidence interval or P-value.
My “d’oh!” epiphany of a couple days ago was that the Bayesian percentage and the idea of a margin of error are both ways of expressing uncertainty in the prediction. They mean somewhat different things, but they serve that same purpose.
Yet which is better? Which way of expressing uncertainty is more likely to give a member of the public (or me) the wrong idea, and lead me to be more surprised than I should be? My gut feeling is that the probability formulation is less misleading, but that it is not enough: we still need to learn to interpret results of uncertain events and get a better intuition for what that probability means.
Okay, Ph.D. students. That’s a good nugget for a dissertation.
Meanwhile, consider: we read predictions for rain, which always come in the form of probabilities. Suppose they say there’s a 50% (or whatever) chance of rain this afternoon. Two questions:
- Do you take an umbrella?
- If it doesn’t rain, do you think, “the prediction was wrong?”