There has been a lot of happy chatter recently about doing statistical tests using randomization, both on the APStat listserv and at the recent ICOTS9 conference. But testing is not all of inference; estimation is the other side of that coin. With randomization methods, the “bootstrap” is the first place we turn to make interval estimates. When we estimate a mean, we think of the bootstrap interval as the non-Normal equivalent of the orthodox, t-based confidence interval. (Here is a YouTube video I made about how to do a bootstrap using Fathom.) (And here is a thoughtful blog post by list newcomer Andy Pethan that prompted this.)
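For readers who have not seen one, here is a minimal sketch of a bootstrap interval for a mean, written in Python rather than Fathom. It uses the simple percentile method, which is only one of several ways to turn resampled means into an interval, and I am assuming that is close enough to what most of us do in class. The function name, the default of 5000 resamples, and the data values are all made up for illustration.

```python
import random
import statistics


def bootstrap_mean_interval(sample, n_resamples=5000, confidence=0.95, seed=None):
    """Percentile bootstrap interval for the mean (illustrative sketch)."""
    rng = random.Random(seed)
    n = len(sample)

    # Resample n values WITH replacement from the sample, many times,
    # and record the mean of each resample.
    means = sorted(
        statistics.fmean(rng.choices(sample, k=n)) for _ in range(n_resamples)
    )

    # Cut off the lower and upper tails of the resampled means.
    alpha = 1.0 - confidence
    lo_index = max(0, int((alpha / 2) * n_resamples))
    hi_index = min(n_resamples - 1, int((1 - alpha / 2) * n_resamples) - 1)
    return means[lo_index], means[hi_index]


# Hypothetical data, invented for this example.
data = [3.1, 4.7, 5.0, 5.2, 6.8, 7.4, 9.9, 12.3]
low, high = bootstrap_mean_interval(data, seed=1)
print(f"sample mean = {statistics.fmean(data):.2f}")
print(f"95% bootstrap interval: ({low:.2f}, {high:.2f})")
```

Notice that every resampled mean is built only from the values already in the sample, so the interval can never reach beyond the smallest and largest observations.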
But Bob Hayden has recently pointed out that the bootstrap is not particularly good, especially with small samples. And Real Stats People are generally more suspicious of the bootstrap than they are of randomization (or permutation) tests.
But what do we mean by “good”?