Modeling Hexnut Mass

Let me encourage you to go to your hardware store and get some hexnuts. You won’t regret it. Now let’s see if I can write a post about it in under, like, four hours.

(Also, get a micrometer on eBay and a sweet 0.1 gram food scale. They’re about $15 now.)

Long ago, I wrote about coins and said I would write about hexnuts. I wrote a book chapter, but never did the post. So here we go. What prompted me was thinking about different kinds of models.

I have been focusing on using functions to model data plotted on a Cartesian plane, so let’s start there. Suppose you go to the hardware store and buy hexnuts in different sizes. Now you weigh them. How will the size of the nut be related to the weight?

A super-advanced, from-the-hip answer we’d like high-schoolers to give is, “probably more or less cubic, but we should check.” The more-or-less cubic part (which less-experienced high-schoolers will not offer) comes from several assumptions we make, which it would be great to force advanced students to acknowledge: namely, that the hexnuts are geometrically similar, and that they’re made from the same material, so they’ll have the same density.

Continue reading Modeling Hexnut Mass
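One way to "check" the cubic guess from the hexnut post is to look at the slope of log(mass) against log(size): if mass is proportional to size cubed, that slope should come out near 3. Here is a minimal Python sketch of the idea. The sizes, the constant `c`, and the function name `loglog_slope` are all made up for illustration; the data are synthetic, generated from an exact cubic, and are not measurements of real nuts.

```python
import math

def loglog_slope(sizes, masses):
    """Least-squares slope of log(mass) against log(size)."""
    xs = [math.log(s) for s in sizes]
    ys = [math.log(m) for m in masses]
    x_bar = sum(xs) / len(xs)
    y_bar = sum(ys) / len(ys)
    num = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    den = sum((x - x_bar) ** 2 for x in xs)
    return num / den

# SYNTHETIC data, not measurements: perfectly similar nuts of one material,
# so mass = c * size**3 with a made-up constant c.
sizes = [6.0, 8.0, 10.0, 12.0, 16.0]
c = 0.01
masses = [c * s ** 3 for s in sizes]

slope = loglog_slope(sizes, masses)
print(round(slope, 3))
```

With real nuts and a real scale, the interesting part is how far the slope lands from 3, and which of the two assumptions (similarity, same density) is to blame.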

Model Shop! One volume done!

Hooray, I have finally finished what used to be called EGADs and is now the first volume of The Model Shop. Calling it the first volume is, of course, a treacherous decision.

So. This is a book of 42 activities that connect geometry to functions through data. There are a lot of different ways to describe it, and in the course of finishing the book, the emotional roller-coaster took me from great pride in what a great idea this was to despair over how incredibly stupid I’ve been.

I’m obviously too close to the project.

For an idea of what drove some of the book, check out the posts on the “Chord Star.”

But you can also see the basic idea in the book cover. See the spiral made of triangles? Imagine measuring the hypotenuses of those triangles, and plotting the lengths as a function of “triangle number.” That’s the graph you see. What’s a good function for modeling that data?

If we’re experienced in these things, we say, oh, it’s exponential, and the base of the exponent is the square root of 2. But if we’re less experienced, there are a lot of connections to be made.

We might think it looks exponential, and use sliders to fit a curve (for example, in Desmos or Fathom; here is a Desmos document with the data you can play with!) and discover that the base is close to 1.4. Why should it be 1.4? Maybe we notice that if we skip a triangle, the size seems to double. And that might lead us to think that 2 is involved, and gradually work it out that root 2 will help.

Or we might start geometrically, and reason about similar triangles. And from there gradually come to realize that the a/b = c/d trope we’ve used for years, in this situation, leads to an exponential function, which doesn’t look at all like setting up a proportion.

In either case, we get to make new connections about parts of math we’ve been learning about, and we get to see that (a) you can find functions that fit data and (b) often, there’s a good, underlying, understandable reason why that function is the one that works.
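To see both routes land in the same place, here is a small Python sketch that builds the hypotenuse lengths and checks the ratios. It assumes the spiral is made of isosceles right triangles in which each triangle's leg is the previous triangle's hypotenuse; that construction, the starting leg of 1, and the function name are my assumptions, not something read off the cover.

```python
import math

def hypotenuses(n, first_leg=1.0):
    """Hypotenuse lengths for n isosceles right triangles, where each
    triangle's leg is the previous triangle's hypotenuse (an assumed
    construction for the spiral)."""
    lengths = []
    leg = first_leg
    for _ in range(n):
        hyp = math.hypot(leg, leg)  # hypotenuse of a leg-by-leg right triangle
        lengths.append(hyp)
        leg = hyp                   # next triangle is built on this hypotenuse
    return lengths

h = hypotenuses(8)

# Ratio from one triangle to the next: the base of the exponential.
ratios = [h[i + 1] / h[i] for i in range(len(h) - 1)]

# Ratio when we skip a triangle: the "size seems to double" observation.
skip_ratios = [h[i + 2] / h[i] for i in range(len(h) - 2)]

print(ratios)
print(skip_ratios)
```

The first list is what the slider fit is estimating; the second is what you see if you compare every other triangle.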

I will gradually enhance the pages on the eeps site to give more examples. And of course you can buy the book on Amazon! Just click the cover image above.


The Index of Clumpiness, Part Two

Last time, we discussed random and not-so-random star fields, and saw how we could use the mean of the minimum distances between stars as a measure of clumpiness. The smaller the mean minimum distance, the more clumpy.
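For anyone who wants to try the mean-minimum-distance measure, here is a rough Python sketch. The function `mean_min_distance` follows the description above directly. The generator `star_field` is only a stand-in: it treats K as the fraction of stars placed in one clump, and the Gaussian clump shape, its center, and its `spread` are assumptions of mine for illustration.

```python
import math
import random

def mean_min_distance(points):
    """Mean, over all points, of the distance to the nearest other point."""
    total = 0.0
    for i, (x1, y1) in enumerate(points):
        total += min(
            math.hypot(x1 - x2, y1 - y2)
            for j, (x2, y2) in enumerate(points)
            if j != i
        )
    return total / len(points)

def star_field(n, k, rng, center=(0.5, 0.5), spread=0.05):
    """n stars; a fraction k sits in one Gaussian clump (assumed shape),
    the rest are uniform on the unit square."""
    n_clump = round(k * n)
    stars = [
        (rng.gauss(center[0], spread), rng.gauss(center[1], spread))
        for _ in range(n_clump)
    ]
    stars += [(rng.random(), rng.random()) for _ in range(n - n_clump)]
    return stars

rng = random.Random(1)
for k in (0.0, 0.5, 1.0):
    print(k, mean_min_distance(star_field(200, k, rng)))
```

This is the brute-force version (every star compared with every other star), which is fine for a few hundred stars.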

Star fields of different clumpiness, from K = 0.0 (no stars are in the clump; they’re all random) to K = 0.5 to K = 1.0 (all stars are in the big clump)

What other measures could we use?

It turns out that the Professionals have some. I bet there are a lot of them, but the one I dimly remembered from my undergraduate days was the “index of clumpiness,” made popular—at least among astronomy students—by Neyman (that Neyman), Scott, and Shane in the mid-50s. They were studying Shane (& Wirtanen)’s catalog of galaxies and studying the galaxies’ clustering. We are simply asking, is there clustering? They went much further, and asked, how much clustering is there, and what are its characteristics?

They are the Big Dogs in this park, so we will take lessons from them. They began with a lovely idea: instead of looking at the galaxies (or stars) as individuals, divide up the sky into smaller regions, and count how many fall in each region.
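The divide-and-count idea is easy to sketch in Python. The grid function below just counts points per square cell on the unit square. The summary I compute from the counts, the variance-to-mean ratio, is a swapped-in, generic measure of dispersion; I am not claiming it is the index that Neyman, Scott, and Shane defined. The function names and the grid size are my own choices.

```python
import random
import statistics

def cell_counts(points, cells_per_side):
    """Count points in each cell of a cells_per_side x cells_per_side grid
    laid over the unit square. Rows follow y, columns follow x."""
    counts = [[0] * cells_per_side for _ in range(cells_per_side)]
    for x, y in points:
        col = min(int(x * cells_per_side), cells_per_side - 1)
        row = min(int(y * cells_per_side), cells_per_side - 1)
        counts[row][col] += 1
    return counts

def variance_to_mean(counts):
    """Variance of the cell counts divided by their mean (a generic
    dispersion measure, used here only as an illustration)."""
    flat = [c for row in counts for c in row]
    return statistics.pvariance(flat) / statistics.mean(flat)

rng = random.Random(3)
stars = [(rng.random(), rng.random()) for _ in range(1000)]
counts = cell_counts(stars, 10)
print(variance_to_mean(counts))
```

For a purely random scatter this ratio should sit somewhere around 1; clumping pushes it up, because a few cells hog the stars.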

Continue reading The Index of Clumpiness, Part Two

The Index of Clumpiness, Part One

1000 points. All random. The colors indicate how close the nearest neighbor is.

There really is such a thing. Some background: The illustration shows a random collection of 1000 dots. Each coordinate (x and y) is a (pseudo-)random number in the range [0, 1) — multiplied by 300 to get a reasonable number of pixels.

The point is that we can all see patterns in it. Me, I see curves and channels and little clumps. If they were stars, I’d think the clumps were star clusters, gravitationally bound to each other.

But they’re not. They’re random. The patterns we see are self-deception. This is related to an activity many stats teachers have used, in which the students are to secretly record a set of 100 coin flips, in order, and also make up a set of 100 random coin flips. The teacher returns to the room and can instantly tell which is the real one and which is the fake. It’s a nice trick, but easy: students usually make the coin flips too uniform. There aren’t enough streaks. Real randomness tends to have things that look non-random.
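The "not enough streaks" tell is simple to compute. Here is a short Python sketch that finds the longest run in a sequence of flips and compares simulated flips with a deliberately too-even sequence. The seed and the strictly alternating "fake" are mine, chosen as an extreme case; real student fakes are subtler.

```python
import random

def longest_run(flips):
    """Length of the longest streak of identical consecutive flips."""
    best = current = 0
    previous = None
    for flip in flips:
        current = current + 1 if flip == previous else 1
        best = max(best, current)
        previous = flip
    return best

rng = random.Random(2024)
real = [rng.choice("HT") for _ in range(100)]  # simulated coin flips
too_even = "HT" * 50                           # an exaggerated made-up fake

print(longest_run(real), longest_run(too_even))
```

Run it with a few different seeds and watch how long the longest streak in 100 honest flips tends to be.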

Here is a snap from a classroom activity:

Continue reading The Index of Clumpiness, Part One

Coming (Back) to Our Census

Reflecting on the continuing, unexpected, and frustrating malaise that is Math 102, Probability and Statistics, one of my ongoing problems has been the deterioration of Fathom. It shouldn’t matter that much that we can’t get Census data any more, but I find that I miss it a great deal; and I think that it was a big part of what made stats so engaging at Lick.

So I’ve tried to make it accessible in kinda the same way I did the NHANES data years ago.

This time we have Census data instead of health. At this page here, you specify what variables you want to download, then you see a 10-case preview of the data to see if it’s what you want, and then you can get up to 1000 cases. I’m drawing them from a 21,000 case extract from the 2013 American Community Survey, all from California. (There are a lot more cases in the file I downloaded; I just took the first 21,000 or so, so that we could get an idea what’s going on.)

Continue reading Coming (Back) to Our Census

Bayes is Baaack

Screen shot from Fathom showing prior (left) and posterior (right) distributions for a situation where you flip a coin 8 times and heads comes up once. Theta is the imagined probability of heads for the coin.

Actually teaching every day again has seriously cut into my already-sporadic posting. So let me be brief, and hope I can get back soon with the many insights that are rattling around and beg to be written down so I don’t lose them.

Here’s what I just posted on the apstat listserv; refer to the illustration above:

I’ve been trying to understand Bayesian inference, and have been blogging about my early attempts both to understand the basics and to assess how teachable it might be. In the course of that (extremely sporadic) work, I just got beyond simple discrete situations, gritted my teeth, and decided to tackle how you take a prior distribution of a parameter (e.g., a probability) and update it with data to get a posterior distribution. I was thinking I’d do it in Python, but decided to try it in Fathom first.

It worked really well. I made a Fathom doc in which you repeatedly flip a coin of unknown fairness, that is, P( heads ) is somewhere between 0 and 1. You can choose between two priors (or make your own) and see how the posterior changes as you increase the number of flips or change the number of heads.

Since it’s Fathom, it updates dynamically…

Not an AP topic. But should it be?

Here’s a link to the post, from which you can get the file. I hope you can get access without being a member. Let me know if you can’t and I’ll just email it to you.
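For the curious, the prior-to-posterior update in that listserv post can be sketched in a few lines of Python using a grid of candidate values for theta. This is not the Fathom document; it is a minimal grid approximation under my own assumptions: a flat prior, a grid of 201 theta values, and the 8 flips with 1 head from the illustration. The function and variable names are mine.

```python
def grid_posterior(prior, heads, flips, grid):
    """Posterior over the grid: prior times the likelihood
    theta**heads * (1 - theta)**(flips - heads), then normalized."""
    unnormalized = [
        p * t ** heads * (1 - t) ** (flips - heads)
        for p, t in zip(prior, grid)
    ]
    total = sum(unnormalized)
    return [u / total for u in unnormalized]

# Candidate values of theta, the imagined probability of heads.
grid = [i / 200 for i in range(201)]

# Flat prior (an assumption): every theta equally plausible before the data.
flat_prior = [1.0] * len(grid)

posterior = grid_posterior(flat_prior, heads=1, flips=8, grid=grid)

mode = grid[posterior.index(max(posterior))]
mean = sum(t * p for t, p in zip(grid, posterior))
print(mode, mean)
```

Swapping in a different list for `flat_prior` is the Python version of choosing another prior; changing `heads` and `flips` is the version of dragging the sliders.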

How Good is the Bootstrap?

There has been a lot of happy chatter recently about doing statistical tests using randomization, both in the APStat listserv and at the recent ICOTS9 conference. But testing is not everything inferential; estimation is the other side of that coin. In the case of randomization, the “bootstrap” is the first place we turn to make interval estimates. In the case of estimating the mean, we think of the bootstrap interval as the non-Normal equivalent of the orthodox, t-based confidence interval. (Here is a YouTube video I made about how to do a bootstrap using Fathom.) (And here is a thoughtful blog post by list newcomer Andy Pethan that prompted this.)
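If you have not seen one, here is roughly what a bootstrap interval for the mean looks like in Python. This is a sketch of the simple percentile version, one of several bootstrap intervals, and not necessarily what Fathom or anyone on the list does. The sample values are invented for illustration, and the function name, the 2000 resamples, and the seed are my choices.

```python
import random
import statistics

def bootstrap_mean_interval(sample, n_resamples=2000, level=0.95, rng=None):
    """Percentile bootstrap interval for the mean: resample with
    replacement, record each resample's mean, and read off the middle
    `level` share of those means."""
    rng = rng or random.Random()
    means = sorted(
        statistics.mean(rng.choices(sample, k=len(sample)))
        for _ in range(n_resamples)
    )
    tail = (1 - level) / 2
    lo = means[round(tail * n_resamples)]
    hi = means[round((1 - tail) * n_resamples) - 1]
    return lo, hi

# Made-up small sample, for illustration only.
sample = [2.1, 3.4, 1.9, 5.6, 4.2, 3.3, 2.8, 4.9]

lo, hi = bootstrap_mean_interval(sample, rng=random.Random(42))
print(lo, statistics.mean(sample), hi)
```

Note how small the sample is; that is exactly the situation where the question of how good this interval is gets interesting.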

But Bob Hayden has recently pointed out that the bootstrap is not particularly good, especially with small samples. And Real Stats People are generally more suspicious of the bootstrap than they are of randomization (or permutation) tests.

But what do we mean by “good”?

Continue reading How Good is the Bootstrap?