The Index of Clumpiness, Part Two

Last time, we discussed random and not-so-random star fields, and saw how we could use the mean of the minimum distances between stars as a measure of clumpiness. The smaller the mean minimum distance, the more clumpy.

Star fields of different clumpiness, from K = 0.0 (no stars are in the clump; they’re all random) to K = 0.5 to K = 1.0 (all stars are in the big clump)
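The mean-minimum-distance measure from last time is easy to compute directly. Here is a minimal sketch, assuming stars are (x, y) pairs in a unit square; the function names and the clump parameters are mine, not from the original posts:

```python
import math
import random

def mean_nearest_neighbor_distance(points):
    """Mean, over all stars, of the distance from each star to its nearest neighbor."""
    total = 0.0
    for i, p in enumerate(points):
        nearest = min(math.dist(p, q) for j, q in enumerate(points) if j != i)
        total += nearest
    return total / len(points)

random.seed(1)
# A purely random field: 200 stars uniform in the unit square
uniform = [(random.random(), random.random()) for _ in range(200)]
# The same number of stars squeezed into one tight clump
clump = [(0.5 + random.gauss(0, 0.02), 0.5 + random.gauss(0, 0.02)) for _ in range(200)]

# The clumpy field has the smaller mean minimum distance
print(mean_nearest_neighbor_distance(uniform), mean_nearest_neighbor_distance(clump))
```

This is the brute-force O(n²) version; for large fields you would want a spatial index, but for a few hundred stars it runs in a blink.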

What other measures could we use?

It turns out that the Professionals have some. I bet there are a lot of them, but the one I dimly remembered from my undergraduate days was the “index of clumpiness,” made popular—at least among astronomy students—by Neyman (that Neyman), Scott, and Shane in the mid-50s. They were studying the clustering of galaxies in Shane and Wirtanen’s catalog. We are simply asking, is there clustering? They went much further and asked, how much clustering is there, and what are its characteristics?

They are the Big Dogs in this park, so we will take lessons from them. They began with a lovely idea: instead of looking at the galaxies (or stars) as individuals, divide up the sky into smaller regions, and count how many fall in each region.
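The divide-and-count idea can be sketched in a few lines. One common index built on such counts (not necessarily the exact form Neyman and Scott used; function names and grid size here are my own choices) is the variance-to-mean ratio of the cell counts: for a Poisson-random field it is near 1, and clumping pushes it above 1.

```python
import random
from statistics import mean, pvariance

def cell_counts(points, n_cells=10):
    """Divide the unit square into an n_cells x n_cells grid; count points per cell."""
    counts = [0] * (n_cells * n_cells)
    for x, y in points:
        col = min(int(x * n_cells), n_cells - 1)
        row = min(int(y * n_cells), n_cells - 1)
        counts[row * n_cells + col] += 1
    return counts

def clumpiness_index(points, n_cells=10):
    """Variance-to-mean ratio of cell counts: about 1 for random, larger for clumpy."""
    counts = cell_counts(points, n_cells)
    return pvariance(counts) / mean(counts)

random.seed(2)
uniform = [(random.random(), random.random()) for _ in range(400)]
clumped = [(0.5 + random.gauss(0, 0.03), 0.5 + random.gauss(0, 0.03)) for _ in range(400)]

print(clumpiness_index(uniform))   # near 1 for a random field
print(clumpiness_index(clumped))   # well above 1 for the clump
```

One design note: the ratio depends on the grid size, so comparisons only make sense with the same cell size on fields of the same density.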

Continue reading The Index of Clumpiness, Part Two

How Good is the Bootstrap?

There has been a lot of happy chatter recently about doing statistical tests using randomization, both on the APStat listserve and at the recent ICOTS9 conference. But testing is not all there is to inference; estimation is the other side of that coin. In the case of randomization, the “bootstrap” is the first place we turn to make interval estimates. In the case of estimating the mean, we think of the bootstrap interval as the non-Normal equivalent of the orthodox, t-based confidence interval. (Here is a youtube video I made about how to do a bootstrap using Fathom.) (And here is a thoughtful blog post by list newcomer Andy Pethan that prompted this.)
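For readers without Fathom, the percentile-bootstrap interval for a mean amounts to this minimal sketch (my own code and names, not the workflow from the video): resample the data with replacement many times, compute the mean of each resample, and take the middle 95% of those means.

```python
import random
from statistics import mean

def bootstrap_interval(sample, reps=2000, level=0.95, seed=0):
    """Percentile bootstrap interval for the mean.

    Resample with replacement `reps` times, sort the resampled means,
    and return the cutoffs that enclose the middle `level` of them.
    """
    rng = random.Random(seed)
    means = sorted(mean(rng.choices(sample, k=len(sample))) for _ in range(reps))
    lo_i = int(((1 - level) / 2) * reps)
    return means[lo_i], means[reps - 1 - lo_i]

data = [12, 15, 9, 20, 14, 11, 17, 13, 16, 10]
lo, hi = bootstrap_interval(data)
print(lo, hi)  # an interval straddling the sample mean
```

Note that this is the plain percentile method; the criticisms below apply to exactly this kind of interval with small samples.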

But Bob Hayden has recently pointed out that the bootstrap is not particularly good, especially with small samples. And Real Stats People are generally more suspicious of the bootstrap than they are of randomization (or permutation) tests.

But what do we mean by “good”?

Continue reading How Good is the Bootstrap?

An Unexpected Expected-Value Problem, and What Was Wrong With It

It’s such a joy when my daughter asks for help with math. It used to happen all the time; it’s rare now. She just started medical school, and had come home for the weekend to get a quiet space for concentrated study.

“Dad, I have a statistics question.” Be still, my heart!

“It’s asking, if you have a random mRNA sequence with 2000 base pairs, how many times do you expect the stop codon AUG to appear? How do you figure that out?”

I got her to explain enough about messenger RNA so that I could picture this random sequence of 2000 characters, each one A, U, G, or C, and remembered from somewhere that a codon was a chunk of three of these.

“I think it’s more of a probability, or combinatoric question than stats…” I said. (I was wrong about that; interval estimates come up later. Read on.)
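To fix ideas (this is my own sketch of the probability reasoning, not the post’s eventual answer, which it is still building toward): by linearity of expectation, each place the codon could start contributes (1/4)³ = 1/64 to the expected count. The answer then hinges on what counts as a place, a sliding window of all starting positions or a fixed reading frame of non-overlapping codons, and a quick simulation checks the sliding-window version.

```python
import random

def expected_count_sliding(n=2000, p=(1 / 4) ** 3):
    """Linearity of expectation over all n - 2 starting positions."""
    return (n - 2) * p

def expected_count_frame(n=2000, p=(1 / 4) ** 3):
    """If instead the sequence is read in non-overlapping codons of three."""
    return (n // 3) * p

def simulate(n=2000, reps=1000, seed=0):
    """Monte Carlo check: average sliding-window count of 'AUG' over random sequences."""
    rng = random.Random(seed)
    total = 0
    for _ in range(reps):
        seq = "".join(rng.choice("AUGC") for _ in range(n))
        total += sum(seq[i:i + 3] == "AUG" for i in range(n - 2))
    return total / reps

print(expected_count_sliding())  # 1998 * (1/64) = 31.21875
print(expected_count_frame())    # 666 * (1/64), about 10.4
```

The two readings of the question give answers that differ by a factor of three, which is part of what makes the problem interesting.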

Continue reading An Unexpected Expected-Value Problem, and What Was Wrong With It