Actually teaching every day again has seriously cut into my already-sporadic posting. So let me be brief, and hope I can get back soon with the many insights that are rattling around, begging to be written down so I don’t lose them.

Here’s what I just posted on the apstat listserv; refer to the illustration above:

I’ve been trying to understand Bayesian inference, and have been blogging about my early attempts both to understand the basics and to assess how teachable it might be. In the course of that (extremely sporadic) work, I just got beyond simple discrete situations, gritted my teeth, and decided to tackle how you take a prior distribution of a parameter (e.g., a probability) and update it with data to get a posterior distribution. I was thinking I’d do it in Python, but decided to try it in Fathom first.

It worked really well. I made a Fathom doc in which you repeatedly flip a coin of unknown fairness, that is, P( heads ) is somewhere between 0 and 1. You can choose between two priors (or make your own) and see how the posterior changes as you increase the number of flips or change the number of heads.

Since it’s Fathom, it updates dynamically…
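If you’d rather see the machinery than the Fathom doc, here’s roughly what the updating looks like in Python (my other candidate language). This is a sketch, not the Fathom document: it does the update on a grid of candidate values of P(heads), with made-up data of 14 heads in 20 flips.

```python
import numpy as np

def posterior(n_flips, n_heads, prior, grid):
    """Update a prior over P(heads) with binomial coin-flip data,
    evaluated on a grid of candidate probabilities."""
    likelihood = grid**n_heads * (1 - grid)**(n_flips - n_heads)
    unnorm = likelihood * prior          # Bayes: posterior ∝ likelihood × prior
    return unnorm / unnorm.sum()         # normalize so it sums to 1

grid = np.linspace(0, 1, 101)                 # candidate values of P(heads)
flat_prior = np.ones_like(grid) / len(grid)   # "know nothing" prior

post = posterior(n_flips=20, n_heads=14, prior=flat_prior, grid=grid)
print(grid[np.argmax(post)])  # MAP estimate: the grid point nearest 14/20 = 0.7
```

Change `n_flips` and `n_heads` (or swap in a different `prior`) and re-plot `post` to see the posterior shift the way the Fathom doc shows dynamically.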

Not an AP topic. But should it be?

Here’s a link to the post, from which you can get the file. I hope you can get access without being a member. Let me know if you can’t and I’ll just email it to you.

It was supposed to be a lesson about interval estimates.

The Original Plan

Here’s what I did: I bought ten tangerines from the cafe, so each pair of students would have its own personal tangerine. They were going to weigh the tangerine (I swiped a balance from Chemistry, good to 0.1 grams) and write the value on the white board. Then, when we had all ten, we’d enter the data and have something good for a bootstrap. We would see whose tangerines were outside the 90% interval, muse about how our impression of the mean weight of tangerines had changed since we weighed our own tangerines, and discuss how it was possible that more than 10% of the fruit was outside the 90% interval.
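For the record, the bootstrap itself is only a few lines in Python. The weights below are invented for illustration (I don’t have the real board numbers in front of me); the procedure is the standard percentile bootstrap for the mean.

```python
import numpy as np

rng = np.random.default_rng(1)

# Hypothetical weights in grams for ten tangerines -- made up for illustration
weights = np.array([86.2, 91.5, 78.9, 102.3, 88.0, 95.7, 81.4, 90.1, 99.6, 84.8])

# Resample with replacement many times; record the mean of each resample
boot_means = np.array([
    rng.choice(weights, size=weights.size, replace=True).mean()
    for _ in range(5000)
])

# The middle 90% of the bootstrap means is a 90% percentile interval for the mean
lo, hi = np.percentile(boot_means, [5, 95])
print(f"90% bootstrap interval for the mean weight: ({lo:.1f}, {hi:.1f}) g")
```

Note that this is an interval for the *mean* weight, which is exactly why more than 10% of individual tangerines can fall outside it: single fruits vary much more than the mean does.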

As usual, other activities took longer than I thought and most of the class was not ready to weigh their tangerines.

Mulan and Lisel were ready, though, so I had them weigh the tangerines and put the weights on the board. That way we could at least do the bootstrap about some actual right-there data.

But we didn’t even get to that, so I saved the tangerines for the next day. And here’s where the wonderful thing happened.

The two girls had not only recorded the data, they had numbered the tangerines and labeled them with a Sharpie.

Opportunity Taken

So the next class, when we were ready to weigh them, I could ask the students whether they thought the weights would be the same today (Wednesday) as they had been last class (Monday). After a brief discussion, they agreed that since the tangerines were on their way to becoming dried-out “desert tangerines,” they would have lost a little water (a kid got to use osmosis and was proud of himself) and would weigh a little less.

The classic randomization procedure in Fathom has three collections:

a “source” collection, from which you sample to make

a “sample” collection, in which you define a statistic (a measure in Fathom-ese), which you create repeatedly, creating

a “measures” collection, which now contains the sampling distribution (okay, an approximate sampling distribution) of the statistic you collected.

This is conceptually really difficult; but if you can do this (and understand that the thing you’re making is really the simulation of what it would be like if the effect you’re studying did not exist—the deeply subjunctive philosophy of the null hypothesis, coupled with modus tollendo tollens…much more on this later), then you can do all of basic statistical inference without ever mentioning the Normal distribution or the t statistic. Not that they’re bad, but they sow confusion, and many students cope by trying to remember recipes and acronyms.

My claim is that if you learn inference through simulation and randomization, you will wind up understanding it better because (a) it’s more immediate and (b) it unifies many statistical procedures into one: simulate the null hypothesis; create the sampling distribution; and compare your situation to that.
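In Python, the whole three-collection scheme collapses into one loop, which may make the logic easier to see. Here’s a sketch with invented scores for two groups: shuffle the labels to simulate the null hypothesis (“group doesn’t matter”), collect the statistic each time to build the sampling distribution, then compare the real statistic to that pile.

```python
import numpy as np

rng = np.random.default_rng(0)

# The "source" collection: hypothetical scores for two groups (made up)
group_a = np.array([12, 15, 14, 10, 13, 17])
group_b = np.array([11, 9, 12, 8, 10, 13])
observed = group_a.mean() - group_b.mean()   # the statistic: difference of means

pooled = np.concatenate([group_a, group_b])

# The "sample" and "measures" collections in one loop: under the null the
# labels don't matter, so shuffle them and recompute the statistic many times.
# The resulting pile is the (approximate) sampling distribution under the null.
measures = []
for _ in range(5000):
    shuffled = rng.permutation(pooled)
    measures.append(shuffled[:len(group_a)].mean() - shuffled[len(group_a):].mean())
measures = np.array(measures)

# Compare: how often does chance alone produce a difference at least this big?
p_value = np.mean(measures >= observed)
print(observed, p_value)
```

Swap in a different statistic (median difference, ratio, whatever) and nothing else changes—that’s the unification the claim below is about.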

Ha. We’ll see. In class, we have just begun to look at these “three-collection” simulations. I made a video demonstrating the mechanics, following the one on one- and two-collection sims described in an earlier post. They are all collected on YouTube, but here is the new one.

I’ve posted recently about “flipping the classroom,” the idea of putting the exposition—the lecturing—in little digestible vodcasts to be watched at home, (ideally) leaving more time for discussion, one-on-one work, etc., and (ideally) preventing me from nattering on and boring my students.

In that effort I made a series of vids about probability. Now we’re making simulations in Fathom, exploring empirical probability, and beginning on the road to inference. (We’re avoiding the orthodox terminology for now: don’t tell the students, but they’re simulating the conditions of the null hypothesis in order to compare the test statistics to the sampling distributions they create in the simulations. See the post about randomization.)

It’s going OK, but once you use randomness and make measures, you’re no longer in beginning Fathom. It’s conceptually harder as a whole, and the mechanics of the software inevitably ramp up in difficulty as well. So I’ve made a video that’s all about the mechanics of doing this in Fathom with one and two collections. (The three-collection case is coming…)

You wanna see it? Here it is:

Anyway, in that effort, I thought that the easy-peasy way to make the videos—using Keynote—was not sufficient. So I used Camtasia Studio, which was really fun and worked fine.

I’m looking into ScreenFlow for capture as well, and Vimeo for distribution.

Note: I had trouble for a while with getting the resolution right in YouTube. Coulda sworn that one of the Camtasia presets for YouTube was 480 x 640, but it’s 380 x 640. Text came out looking crummy, like this:

We’re starting to learn about probability. Surely one of the quintessential settings is rolling two dice and adding. I’ll try to walk that back another time and rationalize why I include it, but for now, I want students to be able to explain why seven is more likely than ten. I want them to have that archetypal diagram in their heads.

But starting with the theoretical approach won’t go very well. Furthermore, with my commitment to data and using randomization for inference, an empirical approach seems to make more sense and be more coherent. So that’s what I’m trying.
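For comparison, here’s what the empirical approach looks like when a computer does the rolling (a sketch in Python, not the classroom activity): with enough rolls, the frequencies settle near the theoretical 6/36 for seven and 3/36 for ten without anyone drawing the archetypal diagram first.

```python
import numpy as np

rng = np.random.default_rng(2)

rolls = 10_000
# Two independent dice per roll, each uniform on 1..6; sum them
sums = rng.integers(1, 7, size=rolls) + rng.integers(1, 7, size=rolls)

# Empirical relative frequencies
sevens = np.mean(sums == 7)   # theory says 6/36 ≈ 0.167
tens = np.mean(sums == 10)    # theory says 3/36 ≈ 0.083
print(sevens, tens)
```

Fifty rolls per student, by contrast, is exactly the sample size at which these frequencies bounce around enough to surprise everyone—which is rather the point.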

The key lesson for me for this report—related to “trust the data”—is that actual data, with the right technology, can illuminate the important concepts, such as independence. This makes me ask how much theoretical probability we need, if any.

What Happened in Class

To do the Dice Sonata (previous post), I had given each student two dice: a red one and another one. They rolled them 50 times, recording each result twice: once to do the sonata, so they could make the graph of actual results by hand, and also on the computer in a Fathom Survey so we could easily assemble the results for the entire class.

If you haven’t used Fathom Surveys, you can think of it as a Google form that you can later drag directly into Fathom. The key thing here is that they recorded the red die and the other die separately. When we were done, we had 838 pairs.

This was Thursday, the second class of the semester. After students discussed the homework, and saw that their sets of 50 rolls didn’t produce graphs with their predicted shapes, we went to the computers to see if things looked any different with more data. To make the relevant graph, students had to make a new attribute (= variable = column) to add the two values—which they already knew how to do. Here is the bottom of the table and the graph:

One could stop here. But Fathom lets us look more deeply using its “synchronous selection” feature (stolen lovingly from ActivStats): what you select in any view is selected in all views.

Intrepid readers will remember the form of a Sonata for Data and Brain: it has three parts—prediction, data (or analysis or measurement), and comparison. As a first probability activity, we were to roll two dice and sum, and then repeat the process 50 times. The prediction question was: if you graphed these 50 sums, what would the graph look like? Then, of course, they were to do it with actual physical dice (more on the data from that next post) and then compare their graphs of real data with their predictions.

Note that we’re starting this entirely empirically. We might expect these juniors and seniors to “know the answer” because they probably did it theoretically and with real dice back in seventh grade. We would be wrong.

A key point: in the post referenced above, I bemoaned the problem of getting the kids to make explicit predictions. What’s great about doing this reflective writing (especially in this community) is that it prompts head-smacking realizations about practice and how to improve it, to wit: have the students turn in the prediction before they do the activity. In this case, I had them predict at the end of the first day (Tuesday) and turn it in; I copied them and taped the originals to their lockers before lunch; and today (Thursday), the next class, they turned in the Sonatas as homework. (I have not looked at them yet.)

Sample Predictions

Remember, I asked for a graph, and that’s what I got. We discussed the phenomenon briefly to establish basic understanding, e.g., that the possible numbers are [2–12]. But for your viewing pleasure, a few actual graphs appear at right.

The two outliers appear below. Even so, notice the variety. What does it say about student understandings (or preconceptions) about distributions?

In any case, my hope here was that when they plotted real data, they would be appalled by how not-following-the-pattern a collection of fifty rolls would be.

Yesterday’s APstat listserv had a question about Fathom:

How do I create a simulation to run over and over to pick 10 employees. 2/3 of the employees are male

Since my reply to the listserv had to be in plain old text, I thought I’d reproduce it here with a couple of illustrations…

There are at least two basic strategies. I’ll address just one; this is the quick one and uses random number-ish functions. The others mostly use sampling. If you use TPS, I think it’s Chapter 8 of the Fathom guide that explains them in excruciating detail 🙂

Okay: we’re going to pick 10 employees over and over from a (large) population in which 2/3 of the employees are male.

(Why large? To avoid “without-replacement” issues. If you were simulating layoffs of 10 employees from an 18-employee company, 12 of whom were male, you would need to use sampling and make sure you were sampling without replacement.)

(1) Make a collection with one attribute, sex, and 10 cases
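In Python, by the way, the same random-number strategy takes only a few lines (this is a sketch of the idea, not Fathom’s syntax): each case is male with probability 2/3, and you collect the count of males over and over.

```python
import numpy as np

rng = np.random.default_rng(3)

def one_trial(n=10, p_male=2/3):
    """Pick n employees from a large population that is 2/3 male;
    return how many of the picks are male. (Picking from a large
    population is what lets us ignore without-replacement issues.)"""
    return int(np.sum(rng.random(n) < p_male))

# Run the simulation over and over and look at the distribution of counts
counts = np.array([one_trial() for _ in range(5000)])
print(counts.mean())  # should hover near 10 * 2/3 ≈ 6.67
```

For the small-company layoff scenario in the parenthetical above, you’d instead sample without replacement—e.g., `rng.choice` on the actual 18 employees with `replace=False`.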