Nuts and Bolts: Estimates from a Randomization Perspective

Suppose we believe George Cobb (as I do) that it’s better to approach stats education from a randomization perspective. What are some of the details—you know, the places where the devil resides?

Let’s set aside my churning doubts after this last year and look instead at issues of mechanics.

Long ago, when Fathom began, Bill Finzer and I had a discussion in which he said something like,

If you want to simulate a die, it’s best to have students sample from a collection of {1,2,3,4,5,6} rather than use some random-number generator, because sampling is a fundamental and unifying process.

I thought he was wrong: sampling in that case was unnecessarily cumbersome and would confuse students, whereas something quicker would get them to rewarding results sooner.

But things in class have made me remember this insight of Bill’s, and this is as good a place as any to record my reflections.

We have been learning about interval estimates. In my formulation, you have to choose between two procedures:

  • If you’re estimating a proportion, set it up as a 2-collection simulation. In the “short-cut” version you set the probability to the observed proportion (p-hat), collect measures, and then look at the 5th and 95th percentiles of the result. This gives you a 90% “plausibility interval” you can use as an estimate of the true probability. (If you’re a stats maven and are bursting to tell me that this is not truly a confidence interval, I know; this is a short cut that I find useful and understandable. More about this later.)
  • If you’re estimating some other statistic (such as the mean), set it up as a bootstrap: sample, with replacement, from the existing data. Calculate the measure you’re interested in. Collect measures from that bootstrap sample; they will vary because you get different duplicates. Look at the 5th and 95th percentiles of the resulting distribution. This is the 90% plausibility interval for the true population mean (or whatever statistic you’ve calculated). (Details here.)
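In class all of this happens in Fathom, point and click. But as a rough translation of the short-cut proportion procedure (the function name and the numbers are mine, not from any actual Fathom document), it might look like this:

```python
import random

def proportion_interval(n, p_hat, trials=1000):
    """Short-cut interval for a proportion: simulate n cases that each
    succeed with probability p_hat, take the sample proportion as the
    measure, collect many measures, and read off the 5th and 95th
    percentiles."""
    props = sorted(
        sum(1 for _ in range(n) if random.random() < p_hat) / n
        for _ in range(trials)
    )
    # indices for (approximately) the 5th and 95th percentiles
    return props[trials // 20], props[trials - trials // 20 - 1]

# e.g., 16 heads in 20 tosses gives p_hat = 0.8
low, high = proportion_interval(20, 0.8)
```

The middle 90% of the collected sample proportions is the plausibility interval.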

I noticed that the students who struggle the most have tended, for whatever reason, to sample when they don’t have to. Often it doesn’t affect the result; it just takes more time. But I started noticing that when they had been doing bootstrapping (mean, not proportion) and then had to do a proportion problem, they would tend to set it up as a bootstrap. It works, and it presents a more unified way of looking at estimates.

Why It Works and How It’s Connected

In traditional stats, we assume a Normal distribution with the same mean and an appropriate SD and reason from that. Sometimes the normality requirement is important, other times not so much, but that question is always there.

The genius of the bootstrap is this: when we have a sample, that sample is all we know about the population. So we say, suppose that the population distribution is identical to that of the sample, and see how simple sampling variation spreads out the statistic of interest—which is usually the mean, but could be most anything. Since we sample with replacement (which is just like sampling from an infinite population with that distribution) we get different values for the statistic, and the spread gives us a plausibility interval.

Note that there is no need to simulate data; we just use the data we have. And no need for assumptions of Normality.
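A sketch of that bootstrap idea (the data here are made up for illustration; in Fathom you would do this with sampling and measures collections rather than code):

```python
import random

def bootstrap_interval(data, statistic, trials=1000):
    """Bootstrap: resample the data with replacement (same n), compute
    the statistic for each resample, and take the 5th and 95th
    percentiles of the collected measures."""
    n = len(data)
    measures = sorted(
        statistic([random.choice(data) for _ in range(n)])
        for _ in range(trials)
    )
    return measures[trials // 20], measures[trials - trials // 20 - 1]

# 90% plausibility interval for the mean of some hypothetical data
data = [4.1, 5.3, 4.8, 6.0, 5.5, 4.9, 5.1, 5.7]
low, high = bootstrap_interval(data, lambda xs: sum(xs) / len(xs))
```

Note that `statistic` could be the median, the IQR, or almost anything else; nothing in the procedure depends on which measure you collect.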

On the other hand, in my scheme for estimating proportion, I simulate from the given p-hat. But some students essentially did a bootstrap.

For example, in the Aunt Belinda situation (20 coins, 16 heads, is anything hinky going on?) I would set up 20 cases with a P of 0.8, count the number of successes, and collect measures to generate a distribution to get an interval for the number of heads I expect from that coin.

What they did was to make a collection of 20 cases with 16 actual heads and 4 actual tails—no randomness—and then sample 20 with replacement. This is exactly the same as making each one heads with P = 0.8—just with more machinery. Then they defined measures (i.e., counted the heads) and collected them to get a distribution, from which they got an interval.
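The equivalence is easy to check numerically. Each draw from a collection of 16 heads and 4 tails, sampled with replacement, is heads with probability 16/20 = 0.8, so both setups produce a Binomial(20, 0.8) count of heads. A quick sketch (my code, not the students’ Fathom document):

```python
import random

population = ["H"] * 16 + ["T"] * 4   # the actual data: no randomness here

def heads_by_resampling():
    # sample 20 with replacement from the data; count the heads
    return sum(1 for _ in range(20) if random.choice(population) == "H")

def heads_by_simulation():
    # make each of 20 cases heads with probability 0.8
    return sum(1 for _ in range(20) if random.random() < 0.8)

# Collecting either measure gives the same sampling distribution,
# and therefore the same interval -- just with more machinery.
resampled = [heads_by_resampling() for _ in range(5000)]
simulated = [heads_by_simulation() for _ in range(5000)]
```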

The beauty of this is that in a real-life situation, you will have collected data and, ideally, entered it case by case. So you have actual data, not some simulation. If it’s categorical, and you want to estimate a proportion, you do exactly the same thing as if it’s numerical and you want to estimate the mean:

  1. Define a measure for your statistic(s) of interest
  2. Sample with replacement using the same number of cases
  3. Collect the sampling distribution
  4. Find the 90% interval (the 5th and 95th percentiles)

That is, this unifies the procedure so that it works no matter what—and this is totally in the spirit of why we would want to switch our paradigm to randomization in the first place.
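Those four steps can be sketched as a single generic procedure; the measure is the only thing that changes between a categorical and a numerical problem (the data below are invented for illustration):

```python
import random

def plausibility_interval(cases, measure, trials=1000):
    """The four-step recipe: (1) define a measure of interest,
    (2) resample with replacement using the same number of cases,
    (3) collect the sampling distribution, (4) find the interval
    from the 5th and 95th percentiles."""
    n = len(cases)
    dist = sorted(
        measure([random.choice(cases) for _ in range(n)])
        for _ in range(trials)
    )
    return dist[trials // 20], dist[trials - trials // 20 - 1]

# Unchanged for a proportion (categorical cases) ...
coins = ["H"] * 16 + ["T"] * 4
prop_interval = plausibility_interval(coins, lambda s: s.count("H") / len(s))

# ... or for a mean (numerical cases).
heights = [61, 64, 66, 68, 70, 63, 67, 65]
mean_interval = plausibility_interval(heights, lambda s: sum(s) / len(s))
```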

Author: Tim Erickson

Math-science ed freelancer and sometime math and science teacher. Currently working on various projects.
