In which he describes his approach to inference.
The null hypothesis is never true.
I guess I knew this at some level, but I never really got it until this spring. Then it hit me that this was worth telling students. (Can I get them to discover it? Maybe.)
Let me back up a bit and approach it from the direction of Aunt Belinda.
Aunt Belinda: A Touchstone Situation
Aunt Belinda claims to have power over flipping coins. She takes 20 nickels and throws them into the air. When they land, there are 16 heads. How should we interpret this result?
I want kids to learn to ask, “is it plausible that the coins are fair and Belinda has no special powers?” and to realize that they can answer that question by flipping 20 fair coins over and over again and seeing how often they get 16 or more heads.
Setting aside a lot of other discussion (no, she refuses to do it again) and what I hope is obvious pedagogy (the first time you see this, everybody gets 20 actual coins and has to do it a few times, chasing the rollers all over the classroom), we get Fathom to do the simulation because it saves so much time. Early on, I posted a graph showing the results of a whole lot of simulated 20-coin events, and I reproduce it here.
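For readers without Fathom, here is a rough Python stand-in for that simulation. It is a minimal sketch, not a description of what Fathom does internally, and the function names are hypothetical. It repeats the 20-coin event under the null hypothesis (fair coins) and reports the fraction of events with 16 or more heads. Because it is random, the count out of 1000 will bounce around a little from run to run.

```python
import random

def simulate_heads(n_coins=20, p_heads=0.5, rng=random):
    """Flip n_coins coins, each landing heads with probability p_heads; return the head count."""
    return sum(1 for _ in range(n_coins) if rng.random() < p_heads)

def sampling_distribution(trials=1000, n_coins=20, p_heads=0.5, seed=None):
    """Repeat the 20-coin event many times under the null hypothesis (fair coins)."""
    rng = random.Random(seed)
    return [simulate_heads(n_coins, p_heads, rng) for _ in range(trials)]

def empirical_p_value(results, observed):
    """Fraction of simulated results at least as extreme (as many heads or more) as observed."""
    return sum(1 for r in results if r >= observed) / len(results)

# 1000 simulated "fantasy" events, compared with Belinda's 16 heads.
results = sampling_distribution(trials=1000, seed=1)
print(empirical_p_value(results, observed=16))
```

Passing a seed only makes a run repeatable; leaving it out gives a fresh set of fantasies each time, which is closer to the classroom experience.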
At this point, we confront the basics of statistical inference. (These are also the bullet points in one of my learning goals, a.k.a. standards.)
- P-value. Seven out of 1000 simulated events had 16 or more heads. Is that plausible? Students need to distinguish plausible from possible. Ideally, we also set some plausibility limit on this empirical P-value (which is what this 0.007 is, after all) that depends on the circumstance and how willing you are to be wrong (oooh! Type I errors!). This last year, I mentioned that a lot but basically punted and explained that an orthodox, reasonable value was 0.05.
- The null hypothesis is that the coins are fair, and there are no special powers. Articulating a null hypothesis is important. I began my discussion of the null by saying that it’s often the dull hypothesis: the situation when nothing of interest is going on.
- The sampling distribution is the one in the picture: repeated results from trials where the null hypothesis is true. The results are not all the same because random events come out differently even when the coins are fair.
- The test statistic is 16 heads out of 20. It’s what you compare to the sampling distribution to assess whether the result is plausible.
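With only 20 coins, the simulated P-value can also be checked against the exact binomial probability of getting 16 or more heads from fair coins. This sketch is an addition for comparison, not part of the lesson as taught: it sums C(20, k) for k = 16 through 20 and divides by 2^20, which comes to 6196/1048576, a bit under 0.006. That is in line with the empirical 7 out of 1000.

```python
from math import comb

def exact_p_value(observed=16, n_coins=20):
    """P(at least `observed` heads in n_coins fair flips), summed from the binomial formula."""
    # Count the equally likely head/tail sequences with `observed` or more heads.
    favorable = sum(comb(n_coins, k) for k in range(observed, n_coins + 1))
    # Each of the 2**n_coins sequences has the same probability when the coins are fair.
    return favorable / 2 ** n_coins

print(exact_p_value())
```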
We then draw a conclusion, in this case, to reject the null hypothesis. That is, we think that something (we’re not sure what) is interfering with a fair toss of the coins. And we admit that it is possible (but not plausible) that we’re wrong and the coins are in fact fair.
What goes wrong? All sorts of things. I can’t catalog them all here, but, for example, some students set up a simulation where the probability of heads is 16/20, and then reported that “since the test statistic is in the middle of the sampling distribution, the result is completely plausible, but we know she doesn’t have powers, so it’s still false.” (They’ve set up a tautology. Of course it’s in the middle! Under the null hypothesis, the probability of heads has to be 1/2.) Clearly, an important habit of mind was not engaged: when the result you get does not make sense, persist until it does.
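One way to show students the tautology is to run both simulations side by side. In this minimal sketch (the helper name is hypothetical, and the fixed seed is only for repeatability), building the simulation with P(heads) = 16/20 produces a distribution centered near 16, so the observed result lands in the middle by construction. Simulating the null hypothesis, P(heads) = 1/2, centers the distribution near 10 instead.

```python
import random

def mean_heads(p_heads, trials=10000, n_coins=20, seed=0):
    """Average number of heads across many simulated 20-coin events with the given P(heads)."""
    rng = random.Random(seed)
    total = 0
    for _ in range(trials):
        total += sum(1 for _ in range(n_coins) if rng.random() < p_heads)
    return total / trials

# The mistaken setup: the simulation is built from the observed result itself.
print(mean_heads(16 / 20))
# The null hypothesis: fair coins, the dull fantasy we actually want to test.
print(mean_heads(0.5))
```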
To combat this and other misconceptions, I talked a little about the subjunctive, and when that caused eyes to glaze over, I switched to another contrary-to-fact approach:
The null hypothesis is a fantasy. And the sampling distribution—which is based on the null hypothesis—is a collection of these fantasies. When you make a simulation in Fathom, you are always constructing a fantasy, always simulating the null hypothesis. Only the test statistic—the thing that actually happened—is reality.
Of course, these fantasies are dull, normal occurrences such as fair coins. They aren’t the usual fun fantasies of our imaginations. But they are “what-if” situations: we’re looking at Belinda’s sixteen heads and asking, “what if the coins were really fair and she had no powers? What would happen then?”
When we make that comparison—and make it quantitative with a P-value—we’re assessing whether the fantasy might be true.
Did the students internalize this rant? I’m not sure.
One more misconception:
T: What’s the P-value here?
T: And what does that mean?
T: (pause) What does the “P” stand for?
T: Yeah. Probability of what?
S: Probability that the null hypothesis is true?
Ack. I think there is a general tendency not to understand the graph you just made. Despite all the care that went into the lessons, students still tend to look for a clear procedure (make this graph!) without demanding understanding. I need to address this better in general.
But for this specific problem, I realized that (as I said up at the top) the null hypothesis is never true.
I ranted on this one morning. I don’t know if it was helpful to the students, but thinking about it was helpful to me, on several levels.
- First, it fits with the fantasy idea: it’s a fantasy, it’s not true.
- Second, it gives a clear answer to “what’s the probability that the null hypothesis is true?”: Zero.
- Finally, it connects up with other important mathematical and statistical ideas. For example, it suggests that for any real coin, if you flip it enough times, you can find its bias. The bias may be small, but you can in principle detect it. A more abstract idea is related to betweenness. Given a random variable X uniformly distributed on [0, 1], what’s P(X = 0.5)? Zero.
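Here is a back-of-the-envelope version of “flip it enough times and you can find its bias,” using the normal approximation (an assumption on my part; the post doesn’t name a method). It computes the z-statistic you would expect for a hypothetical coin with P(heads) = 0.51 tested against a fair coin. Under the null, the proportion of heads has standard error 0.5/√n, so the expected z grows like √n.

```python
from math import sqrt

def z_score(bias, n_flips):
    """Expected z-statistic for a coin with P(heads) = 0.5 + bias, tested against a fair coin.

    Under the null, the proportion of heads has standard error 0.5 / sqrt(n_flips);
    the expected excess of the observed proportion over 0.5 is exactly `bias`.
    """
    return bias / (0.5 / sqrt(n_flips))

# A hypothetical coin that is off by just 0.01.
for n in (100, 10_000, 1_000_000):
    print(n, round(z_score(0.01, n), 2))
```

With a bias of 0.01, the expected z is about 0.2 at 100 flips, about 2 at 10,000 flips, and about 20 at a million: however small the bias, enough flips will eventually expose it.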
Enough of that. More later.