# Randomization

If you’re not a stats maven, this may sound esoteric, but let’s see if I can express it well.

One of the things that’s hard to learn in orthodox statistics is the whole machinery of statistical tests. You can train yourself (or a monkey) to do it right, but it seems to be a morass of weird rules and formulas. Remember to divide by (n–1) when you compute the standard deviation. You have to have an expected count of at least five in every cell to use chi-squared. You can use z instead of t if df > 30. And then there’s remembering what tests to use in which situation. You wind up with a big flowchart in your head about whether the data are paired, whether the variables are categorical, etc., etc., etc. And as a learner, you lose sight of the big picture: what a test is really saying.

George Cobb wrote a terrific article explaining why this is all unnecessary. The short version goes like this: you can unify a lot of inferential statistics if, instead of the tests we now use (z, t, chi-squared, ANOVA…), we used randomization tests.

Here’s the basic idea, which we will often refer to as the “Aunt Belinda” problem. Your Aunt Belinda claims to have supernatural powers. She says she can make tossed nickels come up heads. You don’t believe her, so you get a dollar’s worth of nickels (20 of them); she speaks an incantation over them; you toss them all at once; and sixteen come up heads.

Does she have supernatural powers?

Traditionally, we compute the proportion, 0.80, and then perform a calculation with the standard error $\sqrt{p(1-p)/n}$ and then either use a z test or, if we’re more advanced, the binomial distribution. (And we’ll find a P-value that’s really small, from which, if we’re tired, we’ll conclude that she does in fact have powers.) But unless you’ve just taken or taught a stats class, you will have to look it all up, and you will turn the crank on the machinery without really understanding where it came from.
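For the record, the exact calculation the traditional route leads to can be sketched in a couple of lines of Python (my illustration, not part of the article): the P-value is the probability that 20 fair coins show 16 or more heads.

```python
from math import comb

# P(X >= 16) for X ~ Binomial(n=20, p=0.5):
# of the 2**20 equally likely toss sequences, count those with 16+ heads.
n = 20
p_value = sum(comb(n, k) for k in range(16, n + 1)) / 2**n
print(f"exact P-value: {p_value:.4f}")  # prints "exact P-value: 0.0059"
```

Notice the answer agrees with the empirical result described below: a little more than half a percent.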

The alternative, the randomization test, is really cool: you get a bunch of coins you think are fair, and you toss them repeatedly, recording your results. You see how often 20 fair coins come up with 16 or more heads. And you use that empirical probability as your P-value.

Of course doing it with real coins is tedious, so we use a computer and simulate the coins. It’s easy to get 1000 simulations of 20 coins, as in the illustration. The red bars show 16 or more heads, representing 7 trials. So P = 0.007.
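A minimal sketch of that simulation in Python (the variable names and the fixed seed are my own choices, and your counts will differ run to run): each trial tosses 20 fair coins, and we record how often 16 or more come up heads.

```python
import random

random.seed(1)  # fix the seed so the sketch is reproducible
n_coins, n_trials, threshold = 20, 1000, 16

def heads_in_trial():
    """One trial: toss 20 fair coins, return the number of heads."""
    return sum(random.random() < 0.5 for _ in range(n_coins))

# Count the trials at least as extreme as Aunt Belinda's result.
extreme = sum(heads_in_trial() >= threshold for _ in range(n_trials))
p_value = extreme / n_trials
print(f"{extreme} of {n_trials} trials had {threshold}+ heads; P = {p_value}")
```

Since the true probability is about 0.006, a typical run turns up a handful of extreme trials per thousand, in line with the 7 shown in the illustration.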

Notice how we didn’t have to decide what statistic to use or what table to look it up in? That’s the beauty of the randomization test. The idea is simple: just do it a lot of times and see what happens.

The “gotcha” is the “it” that you do. In stats-talk, you have to figure out how to simulate the null hypothesis; in this case, we simulate the situation where the coins are fair and Aunt Belinda has no powers.

Pedagogically, I think the big win is twofold:

• When we ask “what does that 0.007 mean?” I’m hoping that students will know (or can be easily trained) to say that it’s the probability that, if the coins were fair, Belinda would get 16 or more heads. (As opposed to, the probability is 0.993 that Belinda has special powers.) This subjunctive statement is at the root of tests. More on this anon.
• Nobody has to learn about the binomial distribution or anything else in order to see what to do.

Anyhow, George Cobb laid out the argument at greater length, calling for us to chuck our traditional shackles and go random. This course is a chance to do just that. It will require lots of access to technology and a courageous department chair, but we have both of those.

(Note: statistical tests are only a small part of the course. But they’re a part that most people think of when you say you’re teaching stats, so it’s as good a place as any to lay out some of my curricular choices.)