Reflection on 538, Trump, and Bayes

Was the run-up to the recent election an example of failed statistics? Pundits have been saying how bad the polling was. Sure, there might have been some things pollsters could have done better, but consider: FiveThirtyEight, on the morning of the election, gave Trump a 28.6% chance of winning.

And things with a probability of 1 in 4 (or, in this case, 2 in 7) happen all the time.

Prediction by FiveThirtyEight on the morning of election day.

This post is not about what the pollsters could have done better, but rather about a different question: how should we communicate uncertainty to the public? We humans seem to want certainty that isn’t there, so stats gives us ways of telling the consumer how much certainty there is.

In a traditional stats class, we learn about confidence intervals: a poll does not tell us the true population proportion, but we can calculate a range of plausible values for that unknown parameter. We attach that range to poll results as a margin of error: Hillary is leading 51–49, but there’s a 4% margin of error.
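To see where a figure like 4% could come from, here is a minimal sketch of the usual Normal-approximation margin of error for a sample proportion. The sample size of 600 is an assumption chosen for illustration; the example above does not say how many people were polled.

```python
import math

def margin_of_error(p_hat, n, z=1.96):
    """Approximate 95% margin of error for a sample proportion.

    Uses the Normal approximation: z * sqrt(p(1 - p) / n).
    """
    return z * math.sqrt(p_hat * (1 - p_hat) / n)

# Hypothetical poll: 51% support among n = 600 respondents (n is assumed).
p_hat = 0.51
n = 600

moe = margin_of_error(p_hat, n)
low, high = p_hat - moe, p_hat + moe

print(f"margin of error: {moe:.3f}")
print(f"plausible range: {low:.3f} to {high:.3f}")
```

With these assumed numbers the range straddles 50%, which is the situation pundits call a "statistical dead heat."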

(Pundits say it’s a “statistical dead heat,” but that is somehow unsatisfying. As a member of the public, I still think, “but she is still ahead, right?”)

Bayesians might say that the 28.6% figure (a posterior probability, based on the evidence in the polls) represents what people really want to know, closer to human understanding than a confidence interval or P-value.

My “d’oh!” epiphany of a couple days ago was that the Bayesian percentage and the idea of a margin of error are both ways of expressing uncertainty in the prediction. They mean somewhat different things, but they serve that same purpose.

Yet which is better? Which way of expressing uncertainty is more likely to give a member of the public (or me) the wrong idea, and lead me to be more surprised than I should be? My gut feeling is that the probability formulation is less misleading, but that it is not enough: we still need to learn to interpret results of uncertain events and get a better intuition for what that probability means.

Okay, Ph.D. students. That’s a good nugget for a dissertation.

Meanwhile, consider: we read predictions for rain, which always come in the form of probabilities. Suppose they say there’s a 50% (or whatever) chance of rain this afternoon. Two questions:

  • Do you take an umbrella?
  • If it doesn’t rain, do you think, “the prediction was wrong?”

Starting the Second Semester: Liar’s Logic

Day one of semester two. In this “regular” stats class, we’ve basically spent the first semester on issues in descriptive statistics; it’s time to turn towards inferential stats. Not that we will leave all things descriptive behind. I can’t separate them. And neither will we arm ourselves with traditional, frequentist, Normal-based tests and interval estimates.

I prepared a bunch of slides as an easy intro to the semester; my idea was to give them an overview of the big issues. One thing I did right: the first draft of these slides began with presentation of the issues and ended with some short activities to illustrate them. When I realized how wrong that was, I moved the activities and interaction into the midst of the presentation so that you never went more than about two slides without breaking to do something else.

What I would do better: some ending wrap-up that did something to cement things, such as having them write about the big ideas or at least call out a few new concepts or vocabulary words. Instead, we started the homework—not as a pad, but to make sure they knew how to use Fathom Surveys (it’s been a couple of months). We could have done both, but it was OK.

The main thing I wanted to accomplish was to give some basis for the principles of inference. A plan in an AP class—on the first day of the year—might be to do a full-fledged inference activity such as Martin v Westvaco from Statistics in Action. You’d do that using randomization (cards or chips). But here that would be too much too soon. So I did Liar’s Logic, which you might want to know about.

Liar’s Logic

This is a whole-class game in three phases. First, I don’t call it “Liar’s Logic” in front of the class. It’s Guess My Number.

Phase 1: This is the guess-my-number game you have played ever since elementary school. I choose a whole number between 1 and 100 inclusive, and you have to find it using only yes-or-no questions.

This doesn’t take too long, and we can then ask how they did it. Today, they claimed they used the process of elimination, which fits just fine.
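One systematic version of the process of elimination is to halve the remaining range with every question. The sketch below is an illustration of that idea, not a record of what the students actually asked; the function name and the "is it greater than m?" question form are assumptions.

```python
def guess_number(secret, low=1, high=100):
    """Find `secret` with yes/no questions of the form 'is it greater than m?'.

    Returns the number found and how many questions it took.
    """
    questions = 0
    while low < high:
        mid = (low + high) // 2
        questions += 1
        if secret > mid:      # the answer is "yes"
            low = mid + 1
        else:                 # the answer is "no"
            high = mid
    return low, questions

# Check every possible secret and record the worst case.
results = {s: guess_number(s) for s in range(1, 101)}
worst = max(q for _, q in results.values())
print("worst-case number of questions:", worst)
```

Halving works here only because every answer is truthful, which is exactly what Phase 2 takes away.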

Phase 2: We’ll play the same game except (I explain) there is a small change; see if you can figure out what it is.

Play begins, except this time, I occasionally lie. I mostly tell the truth, but make sure that after a few turns, they are faced with contradictory information. The game would never end, so if they don’t ask if I’m lying, I stop the game and tell them. Today, they actually asked if I was lying, and I said “yes.”

Wonderful disequilibrium ensued. We made the point that this was a stupid game, because I could make it so that you would never finish.

Phase 3: Same game, but with a different change. After every question, I will secretly roll a die. If I get a six, I’ll lie. Otherwise, I’ll tell the truth.

Students rapidly developed the strategy of asking the same question multiple times. The point being that although there is still lying involved, you can finish the game. At the end, they asked “is it 45?” five times and got four yesses and one no.

Yet, no matter how confident you are, you can never be completely sure. I said that one of our tasks this semester is to put numbers on that confidence.
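As a taste of what putting numbers on that confidence might look like, here is a hedged sketch using Bayes' rule for the "is it 45?" situation: four yesses and one no, with a lie on a roll of six. It assumes each answer comes from an independent die roll, and the two prior probabilities are hypothetical choices, not anything the class computed.

```python
from fractions import Fraction

def posterior_yes(prior, yeses, nos, p_lie=Fraction(1, 6)):
    """Posterior probability that the true answer is 'yes'.

    prior : probability the answer is 'yes' before asking
    yeses : number of 'yes' replies heard
    nos   : number of 'no' replies heard
    p_lie : chance that any single reply is a lie
    """
    p_truth = 1 - p_lie
    # If the true answer is 'yes', each 'yes' is truthful and each 'no' is a lie.
    like_if_yes = p_truth**yeses * p_lie**nos
    # If the true answer is 'no', it is the other way around.
    like_if_no = p_lie**yeses * p_truth**nos
    numerator = prior * like_if_yes
    return numerator / (numerator + (1 - prior) * like_if_no)

# Hypothetical prior 1: it was a coin flip between 45 and something else.
even_prior = posterior_yes(Fraction(1, 2), yeses=4, nos=1)
# Hypothetical prior 2: 45 was just one of 100 equally likely numbers.
cold_prior = posterior_yes(Fraction(1, 100), yeses=4, nos=1)

print("even prior:", even_prior, float(even_prior))
print("cold prior:", cold_prior, float(cold_prior))
```

The same five replies lead to very different levels of confidence depending on what you believed beforehand, and in neither case does the probability reach 1.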

I’m hoping that this will become one of those “touchstone” moments I can refer back to; you know, where I can just say, “remember 45?” and use that situation to talk about probability or confidence levels.

Other topics included:

  • Habits of mind, especially being skeptical
  • Meaning of inference in everyday life (infer/imply) and science (going from specific to general)
  • Importance of thinking about alternative explanations (we used some stats from the news for this)

Whew.

Trust the Data: A good idea?

When we last left our hero, he was wringing his hands about teaching stats and being behind; we saw a combination of atavistic coverage-worship and stick-it-to-the-man, can-do support for authenticity in math education. The gaping hole in the story was what was actually happening in the classroom. The plan in this post is to describe an arc of lessons we’ve been doing, tell what I like about it, and tell what I’m still worried about. Along the way we’ll talk about trusting the data. Ready? Good.

You know how students are exposed to proportional reasoning in Grade 5 or earlier, and they spend most of their middle-school years cementing this essential understanding? And how, despite all this, a lot of high-school students (and college students, and adults) seem not to have exactly mastered proportional reasoning?

I figured this was likely to be the case in my class, so when someone showed me the Kaiser State Health Facts site, I jumped right in, and pulled the class in with me. In it, you find all kinds of stats, for example, this snip from a page about New Mexico:

Shows that the teen death rate in NM is 96
A small portion of the screen.

When you see something like this, you can’t make sense out of it until you know more, for example, what does the 96 mean? You have to look more carefully at the page to discover that it’s “per 100,000 population.” And nowhere do you see that it’s also “per year.”

But once you decode it, you can answer some questions. An obvious one is, “how many teenagers died in New Mexico that year?” Before we jump into proportions, though, let’s point out that this is probably not a very interesting question unless you live in New Mexico, and maybe not even then.
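The calculation itself is a single proportion. Here is a minimal sketch; the teen population figure is hypothetical (the snip does not give one), and it assumes the rate means deaths per 100,000 teens per year.

```python
def expected_count(rate_per_100k, population):
    """Convert a rate per 100,000 people into a count for a given population."""
    return rate_per_100k * population / 100_000

rate = 96                    # from the page: teen deaths per 100,000 per year
teen_population = 150_000    # hypothetical value, for illustration only

deaths = expected_count(rate, teen_population)
print(f"about {deaths:.0f} teen deaths per year")
```

Swap in the real population for the state and year you care about; the point is only that the rate is meaningless until you know both the "per 100,000" and the "per year."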

So I just did one quick example in front of the kids, and then the assignment was to spend at least 15 minutes on the site, finding some rate of any interest at all, decode it, and report one calculation you can make. We started in class. Kids found things that interested or horrified them. Abortion, pregnancy, and STD rates figured prominently.

For example:

More data! Pedestrian accidents in NYC

KSI by time of day. KSI is pedestrians Killed or Seriously Injured per 100,000.

The city’s transportation planners released this report, which looks at 7,000 vehicular crashes involving pedestrians, on Aug. 16, 2010. It finds that jaywalkers fared better than those who waited at intersections, and that privately owned vehicles were more likely to be involved in a crash.

Gotta love the Times and the Internet. Visit http://documents.nytimes.com/pedestrian-study . It’s glossy and well-designed, and features some great graphics as well as statements that are good examples of the kind of incomplete discourse that can be persuasive in the wrong hands. The report mostly redeems itself in a quick skim, but something like these bullet points might be good in-class examples:

  • Manhattan has four times as many pedestrians killed or severely injured per mile of street compared to the other four boroughs.
  • 79% of crashes that kill or seriously injure pedestrians involve private automobiles as opposed to taxis, trucks and buses.
  • Serious pedestrian crashes involving unsafe speeds are twice as deadly as other such crashes.

I mean, doesn’t Manhattan have more pedestrians per mile? How would you measure that? Then, what’s the balance of private autos on the streets as opposed to taxis, trucks, and buses? If they’re 90% of the vehicles, the implication that they’re worse is ill-founded. Finally, “unsafe speeds”—why do you suppose they’re called unsafe? Because people get killed, right? So is this tautology surprising? But I’m getting picky; you gotta check out the report.
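The private-auto point can be made concrete with a little arithmetic. This sketch compares a group's share of crashes with its share of vehicles; the 90% figure is the hypothetical from the paragraph above, not a number from the report, and share of vehicles is only a rough stand-in for exposure such as miles driven.

```python
def relative_involvement(crash_share, vehicle_share):
    """Per-vehicle crash involvement of a group, relative to all other vehicles.

    A value below 1 means the group is involved less often than its
    numbers on the street would suggest.
    """
    group = crash_share / vehicle_share
    rest = (1 - crash_share) / (1 - vehicle_share)
    return group / rest

crash_share = 0.79     # from the report: share of KSI crashes with private autos
vehicle_share = 0.90   # hypothetical share of vehicles that are private autos

ratio = relative_involvement(crash_share, vehicle_share)
print(f"relative involvement of private autos: {ratio:.2f}")
```

Under that assumed 90%, private autos would be involved less than half as often per vehicle as taxis, trucks, and buses, which is the opposite of what the bullet point seems to imply.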

Here is a pair of sobering graphs:

Who gets injured and killed in KSIs?

Outliers in the NYT: Reflections on normality

Image from the NYT post

I need a good system to deal with those moments when you’re reading the news or listening to NPR and they bring up something that could fit into an actual lesson, connecting math to everyday life. This probably happens more when thinking about teaching stats than with other areas of math. Of course I have thought of clipping the article, and I have several folders on my computer, but I can never find them. Here is another attempt: blog about them! And we get a new category, Data in the News.

Onward! Yesterday’s NYT prints what appears online as a blog post by Carl Richards. It makes the point that we often assume erroneously that everything is normally distributed (yay!) and that this affects our expectations about, for example, investing. The outliers, he says, are much more salient than we think they would be. And then we get this delicious passage:

If you take the daily returns of the Dow from 1900 to 2008 and you subtract the 10 best days, you end up with about 60 percent less money than if you had stayed invested the entire time. I know that story has been told by the buy-and-hold crowd for years, but what you don’t hear very often is what happens if you were to miss the worst 10 days. Keep in mind that we are talking about 10 days out of 29,694. If you remove the worst 10 days from history, you would have ended up with three times more money.

This is interesting in itself, but in terms of my desire for kids to get data goggles and to look at claims and cry, “evidence please!” this is perfect. Because we can do just that: go online, get the data he’s talking about, load it into Fathom, and see if this claim is correct.
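For readers without Fathom, the check could be sketched in Python along these lines. The returns below are synthetic, generated from a Normal distribution purely so the code runs; they are not Dow data, and the mean and spread are made-up parameters. To test the actual claim you would replace `returns` with the real daily returns.

```python
import random

def final_wealth(returns, start=1.0):
    """Compound a sequence of daily returns starting from `start` dollars."""
    wealth = start
    for r in returns:
        wealth *= 1 + r
    return wealth

def without_extremes(returns, k, drop="best"):
    """Return the series with the k best (or k worst) days removed."""
    order = sorted(range(len(returns)),
                   key=lambda i: returns[i],
                   reverse=(drop == "best"))
    skip = set(order[:k])
    return [r for i, r in enumerate(returns) if i not in skip]

# Synthetic stand-in for 29,694 trading days (NOT real market data).
random.seed(0)
returns = [random.gauss(0.0002, 0.01) for _ in range(29_694)]

full = final_wealth(returns)
no_best = final_wealth(without_extremes(returns, 10, "best"))
no_worst = final_wealth(without_extremes(returns, 10, "worst"))

print(f"missing the 10 best days leaves {no_best / full:.0%} of the money")
print(f"missing the 10 worst days leaves {no_worst / full:.0%} of the money")
```

Comparing what this Normal stand-in produces with what the real data produce is itself a way to see the article's point about outliers.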
