Inference for Slope: Fathom How-to

Too long again since the last post.

Here we have something interesting that’s outside the narrative thread. On the AP Stat listserv, Chris Talone asked this question:

Is there a way to set up a Fathom simulation to illustrate how the slope of a line of best fit will vary when choosing ordered pairs from a population of ordered pairs?  My students are having a hard time understanding the purpose of the linear regression t-interval and the linreg t-test.  I would like for them to see how the slope can vary depending on the sample of points chosen.  Ideally, I’d like to set up a population of ordered pairs, graph a scatterplot and find the line of best fit for the population, then have Fathom randomly select 2, 5, or 7 of those ordered pairs, graph a scatterplot of the sample chosen, find the line of best fit for the sample chosen, and also plot the sample slope on a dot plot, and then repeat many many times….

I posted a response there, but we can’t give illustrations. We can here! This is where we’re heading:

We've sampled 100 times with sample sizes of 2, 5, 15, and 30 (the size of our original collection). A box plot is good for comparing.

How do we do this in Fathom? Read on…

Step By Step

1. Set up your source collection, the ordered pairs.

The source collection.
Scatter plot of correlated data.

2. Set up a sample collection, just as you would always sample in Fathom. Set it to sample 2 cases.
Right-click on the source collection and choose Sample Cases.
When you first sample, this inspector appears. Change the count to 2. Click Sample More Cases to take a new sample.

3. In the Sample collection, set up your measures:

  1. Make one called n (for the sample size); its formula is count( )
  2. Make one for the slope you want to calculate, call it slope if you wish; formula: linRegrSlope( predictor, response ), where predictor and response are the names of your attributes.
The "measures" panel in the inspector for the sample collection. We've defined the two measures. Their formulas and values appear.
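
If it helps to see what the slope measure is doing under the hood, here is a minimal Python sketch of the ordinary least-squares slope. This is an illustration, not Fathom code: the function name `ls_slope` and the example points are made up, and I'm assuming the usual formula (sum of cross-deviations divided by sum of squared x-deviations).

```python
def ls_slope(xs, ys):
    """Least-squares slope of ys on xs; None if every x value is the same."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of squared deviations of the predictor
    sxx = sum((x - mean_x) ** 2 for x in xs)
    if sxx == 0:
        return None  # vertical "line": slope is undefined
    # Sum of cross-deviations of predictor and response
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    return sxy / sxx

# Hypothetical points lying exactly on y = 2x + 1
print(ls_slope([1, 2, 3, 4], [3, 5, 7, 9]))  # prints 2.0
```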

4. Collect measures! (You now have three collections: your source; the sample, which changes every time you resample; and the measures collection.)

5. Make a dot plot of slope. This is the sampling distribution of slopes for a sample size of 2.

100 slopes from samples of size 2. Notice how we even get negative slopes!
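
For anyone who wants to try the same experiment outside Fathom, here is a rough Python sketch of steps 1 through 5. Everything in it is an assumption for illustration: the population is made-up data (30 points scattered around a line of slope 2), and the names `make_population` and `sample_slopes` are hypothetical. It samples with replacement, so a sample of size 2 can pick the same point twice; the slope is undefined then, and the sketch simply skips those samples, so you may get slightly fewer than 100 slopes.

```python
import random

def ls_slope(points):
    """Least-squares slope for a list of (x, y) pairs; None if undefined."""
    n = len(points)
    mean_x = sum(x for x, _ in points) / n
    mean_y = sum(y for _, y in points) / n
    sxx = sum((x - mean_x) ** 2 for x, _ in points)
    if sxx == 0:
        return None
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in points)
    return sxy / sxx

def make_population(size=30, seed=1):
    """Made-up source collection: y is roughly 2x plus noise."""
    rng = random.Random(seed)
    points = []
    for _ in range(size):
        x = rng.uniform(0, 10)
        points.append((x, 2.0 * x + rng.gauss(0, 3)))
    return points

def sample_slopes(population, n, reps=100, seed=2):
    """Draw `reps` samples of size n (with replacement); collect each slope."""
    rng = random.Random(seed)
    slopes = []
    for _ in range(reps):
        sample = rng.choices(population, k=n)
        slope = ls_slope(sample)
        if slope is not None:  # skip samples whose slope is undefined
            slopes.append(slope)
    return slopes

population = make_population()
slopes = sample_slopes(population, n=2)
print("population slope:", ls_slope(population))
print("slopes collected:", len(slopes))
print("smallest and largest:", min(slopes), max(slopes))
```

A dot plot of `slopes` plays the role of the Fathom plot above.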

6. Change the sample size (in the inspector for the sample collection) and collect more. But you want to separate them by sample size….

7. Drag n to the “other” axis of the dot plot, holding down shift. This will split the plot categorically by sample size, so you can see how the spread of the sample slope depends on n.

We've sampled 100 times with sample sizes of 2, 5, 15, and 30 (the size of our original collection). A box plot is good for comparing.
  • If you sample without replacement, students can see that when n is the size of the whole collection (30 here), the sample slopes are all the same, i.e., the population slope.
  • If you sample with replacement (default, shown) you have a bootstrap distribution for slope. If you find (for example) the 5th and 95th percentiles of these values, you have a 90% bootstrap interval for the population slope.
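
Here is a hedged Python sketch of those last two ideas: how the spread of the slopes shrinks as n grows, and a 90% percentile bootstrap interval. As before, the data are made up (30 points around a line of slope 2), the helper names are hypothetical, and the `percentile` function uses the simple nearest-rank rule, which is one of several reasonable definitions. I use the standard deviation as the measure of spread here; the box plots show the same thing visually.

```python
import math
import random
import statistics

def ls_slope(points):
    """Least-squares slope for a list of (x, y) pairs; None if undefined."""
    n = len(points)
    mean_x = sum(x for x, _ in points) / n
    mean_y = sum(y for _, y in points) / n
    sxx = sum((x - mean_x) ** 2 for x, _ in points)
    if sxx == 0:
        return None
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in points)
    return sxy / sxx

def resampled_slopes(points, n, reps, rng):
    """Slopes from `reps` samples of size n drawn with replacement."""
    slopes = []
    for _ in range(reps):
        slope = ls_slope(rng.choices(points, k=n))
        if slope is not None:  # skip samples whose slope is undefined
            slopes.append(slope)
    return slopes

def percentile(values, p):
    """Nearest-rank percentile (p from 0 to 100) of a non-empty list."""
    ordered = sorted(values)
    rank = max(1, math.ceil(p / 100 * len(ordered)))
    return ordered[rank - 1]

rng = random.Random(3)

# Made-up source collection of 30 ordered pairs
points = []
for _ in range(30):
    x = rng.uniform(0, 10)
    points.append((x, 2.0 * x + rng.gauss(0, 3)))

# Spread of the slopes for each sample size, 100 samples apiece
spread_by_n = {
    n: statistics.pstdev(resampled_slopes(points, n, 100, rng))
    for n in (2, 5, 15, 30)
}
for n, spread in spread_by_n.items():
    print(f"n = {n:2d}: standard deviation of slopes = {spread:.3f}")

# Percentile bootstrap: resample all 30 points, with replacement, many times
boot = resampled_slopes(points, len(points), 1000, rng)
low, high = percentile(boot, 5), percentile(boot, 95)
print(f"90% bootstrap interval for the slope: {low:.3f} to {high:.3f}")
```

I used 1000 resamples for the interval rather than 100 because the tail percentiles settle down with more draws; in Fathom you would just collect more measures.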

Published by

Tim Erickson

Math-science ed freelancer and sometime math teacher. In 2014–15, at Mills College in Oakland, California.
