An Empirical Approach to Dice Probability

Why seven is more likely than ten: the diagram I want them to have in their heads

We’re starting to learn about probability. Surely one of the quintessential settings is rolling two dice and adding. I’ll try to walk that back another time and rationalize why I include it, but for now, I want students to be able to explain why seven is more likely than ten. I want them to have that archetypal diagram in their heads.
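That archetypal diagram amounts to counting the 36 equally likely (red, other) pairs. As a quick sketch in Python (my own code, not anything from class), the count behind "seven beats ten" is:

```python
from itertools import product

# All 36 equally likely (red, other) ordered pairs from two dice
pairs = list(product(range(1, 7), repeat=2))

ways_to_7 = sum(1 for r, o in pairs if r + o == 7)    # 1+6, 2+5, ..., 6+1
ways_to_10 = sum(1 for r, o in pairs if r + o == 10)  # 4+6, 5+5, 6+4

print(ways_to_7, ways_to_10)  # 6 3
```

Six ways to make seven, only three to make ten: exactly the asymmetry the diagram shows.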

But starting with the theoretical approach won’t go very well. Furthermore, with my commitment to data and using randomization for inference, an empirical approach seems to make more sense and be more coherent. So that’s what I’m trying.

The key lesson for me for this report—related to “trust the data”—is that actual data, with the right technology, can illuminate the important concepts, such as independence. This makes me ask how much theoretical probability we need, if any.

What Happened in Class

To do the Dice Sonata (previous post), I had given each student two dice: a red one and another one. They rolled them 50 times, recording each result twice: once on paper for the sonata, so they could make the graph of actual results by hand, and once on the computer in a Fathom Survey so we could easily assemble the results for the entire class.

If you haven’t used Fathom Surveys, you can think of it as a Google form that you can later drag directly into Fathom. The key thing here is that they recorded the red die and the other die separately. When we were done, we had 838 pairs.

This was Thursday, the second class of the semester. After students discussed the homework, and saw that their sets of 50 rolls didn’t produce graphs with their predicted shapes, we went to the computers to see if things looked any different with more data. To make the relevant graph, students had to make a new attribute (= variable = column) to add the two values—which they already knew how to do. Here is the bottom of the table and the graph:

The data table and the graph of the sum. BTW: notice the "13?" Someone had entered 5 and 8 for the two dice, resulting in hilarity, accusations, and a good lesson about cleaning your data.
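The same workflow—build the sum attribute, then sanity-check it—can be sketched in Python with simulated rather than classroom rolls (the variable names are mine):

```python
import random

random.seed(1)
n = 838  # the class's total number of recorded pairs

# Simulated stand-ins for the red die and the other die
red = [random.randint(1, 6) for _ in range(n)]
other = [random.randint(1, 6) for _ in range(n)]

# The new attribute: the sum of the two dice
total = [r + o for r, o in zip(red, other)]

# A legitimate sum of two dice is always 2..12;
# a 13 (like the 5-and-8 entry) can only be a data-entry error
assert all(2 <= t <= 12 for t in total)
```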

One could stop here. But Fathom lets us look more deeply using its “synchronous selection” feature (stolen lovingly from ActivStats): what you select in any view is selected in all views.



A Road to Writing

As you may recall, the “mission statement” for the class is that each student will:

  • Learn to make effective and valid arguments using data
  • Become a critical consumer of data

Along about the beginning of November, we had been doing some problem sets from The Model Shop, and other activities from Data in Depth, and I had been getting laconic answers like “four” and “0.87%” and really wanted a little more meat. That is, getting the right answer is not the same as making an effective argument, or even telling a decent story. In a flash of d’oh! inspiration I realized that if I wanted it, I should assess it.

But there is a problem with that: I have been constructing my standards (I’m calling them learning goals) as I go along, and had not figured out how to deal with the larger, mushier issues that are part of making valid and effective arguments using data.

This post is all about an at-least-partially-successful resolution.

Constructing Learning Goals for Larger Pieces of Work

I love the kids in this class partly because they let me get away with, “this is a quiz and I want you to do your best but I really don’t know how to express the learning goals (a.k.a. standards) that go with it. So we’ll take the quiz first, OK? And figure out how to grade it later.” I explained to them what I was after in hand-wavey terms and off they went.

So they took the quiz (described later). Using their responses (and the pathologies therein), I was able to construct learning goals for this kind of writing, in particular, for the final semester project I alluded to in the last couple of posts. And here they are, quoted, as if they were official or something (we start with Learning Goal 17):

Timers and Variability II

So what happened in class? First, you want to see the data, right?

Timer data. Stripes indicate groups.

The basic story so far is that maybe a week ago, I let the students take the measurements, uploading the data—so we could all get everyone’s measurements—using Fathom Surveys. That worked great, but there was of course not enough time to do the analysis, so that got postponed.

And we still haven’t quite gotten through it—though they have had a couple of dollops of homework to make progress—at least partly because I’m not sure of the best path to take. Next class—Thursday—I finally have enough time allocated to do more and get to the bottom of something about variation; the next step in this thread is to do The Case of the Steady Hand.

So what actually happened and why am I a little at sea when the data are so interesting?


Simple Sampling Distribution Simulation in Fathom

What we're looking for. Result from 500 runs of the simulation.

Yesterday’s APstat listserve had a question about Fathom:

How do I create a simulation to run over and over to pick 10 employees.  2/3 of the employees are male

Since my reply to the listserve had to be in plain old text, I thought I’d reproduce it here with a couple of illustrations…

There are at least two basic strategies. I’ll address just one; this is the quick one and uses random number-ish functions. The others mostly use sampling. If you use TPS, I think it’s Chapter 8 of the Fathom guide that explains them in excruciating detail 🙂

Okay: we’re going to pick 10 employees over and over from a (large) population in which 2/3 of the employees are male.

(Why large? To avoid “without-replacement” issues. If you were simulating layoffs of 10 employees from an 18-employee company, 12 of whom were male, you would need to use sampling and make sure you were sampling without replacement.)
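For that small-company case, a without-replacement sketch in Python (the 18-employee roster is the hypothetical from above) would look like:

```python
import random

# Hypothetical 18-employee company, 12 of whom are male
employees = ["male"] * 12 + ["female"] * 6

# One simulated layoff of 10: random.sample draws WITHOUT replacement
laid_off = random.sample(employees, 10)
males_laid_off = laid_off.count("male")
print(males_laid_off)
```

With only 6 females available, every layoff of 10 must include at least 4 males—exactly the kind of constraint that a with-replacement model would get wrong.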

(1) Make a collection with one attribute, sex, and 10 cases

(2) Give the sex attribute this formula:

randomPick("male", "male", "female")
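To see the whole simulation end to end, here is a sketch in Python rather than Fathom (the function and variable names are mine; the three-way pick is just a 2/3–1/3 choice):

```python
import random

def pick_sex():
    # Same idea as randomPick("male", "male", "female"): P(male) = 2/3
    return random.choice(["male", "male", "female"])

def one_run():
    # One collection of 10 cases, each case given the sex formula;
    # the "measure" is how many of the 10 are male
    return sum(1 for _ in range(10) if pick_sex() == "male")

# 500 runs of the simulation, as in the illustration at the top
random.seed(0)
counts = [one_run() for _ in range(500)]
print(sum(counts) / 500)  # should land near 10 * 2/3 ≈ 6.67
```

Graphing `counts` gives the sampling distribution of the number of males in a sample of 10.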


A First “Claim” Investigation

A slide showing another version of the instructions

These are something like the entire instructions for a mini-investigation that has taken much of the second and third of our class meetings:

Mess around with U.S. Census data in Fathom until you notice some pattern or relationship. Then make a claim: a statement that must be either true or false. Then create a visualization (in this case, a graph) that speaks to your claim. Then make one or two sentences of commentary. These go onto one or two PowerPoint slides.

The purpose is severalfold:

  • You get a chance to play with the data
  • You learn more Fathom features, largely by induction or osmosis or something; in any case, you learn them when you need them
  • You get to direct your own investigation
  • You get practice communicating in writing—or at least slideSpeak
  • I get to see how you do on all these things
  • We all get to try out the online assignment drop-box

In fact, it has gone pretty well. We started on Wednesday (the second class) with my demonstrating how to get anything other than the default variables. I modeled the make-a-claim and make-a-graph part by showing how to compare incomes between men and women.


Shakespeare, Cervantes, and Bush: Can you tell them apart statistically?

There’s a great activity at the beginning of Workshop Statistics where kids write a couple of sentences about why they’re taking the course, and then construct the distribution of word lengths. The main point is to ask, “are all word lengths the same?” Answer: no, duh. Right: they vary. It’s not that an individual word changes its length, but that the quantity “word length” varies from word to word. So it’s a variable, in a way that’s a little different from the variables they’re used to from algebra.

276 word lengths from the "To be or not to be" soliloquy, Hamlet, Act III Scene I.

But what does the distribution look like? Rather than look it up, I found Hamlet’s “to be or not to be” soliloquy online, pasted it into my favorite text processor (TextMate) and did a bunch of global substitutions so that every word was on its own line. (I also stripped out hyphens and apostrophes and other punctuation, which may not always be appropriate, but never mind. But I think of ’tis as a three-, not a four-letter word.) Then a quick dump into a Fathom collection, and a new attribute (or variable) with a formula like stringLength(WORDS) and you’re all set. This process takes enough fluency that it’s an inappropriate activity for the kids in my class, at least, but the results are interesting enough to share, as in the illustration at right.
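The strip-the-punctuation-and-count process can be sketched in Python instead of TextMate plus Fathom (the short text here is a stand-in for the full soliloquy, and the regex is my choice):

```python
import re

text = "To be, or not to be: that is the question."  # stand-in for the full soliloquy

# Strip everything but letters and whitespace, so 'tis would
# count as a three-letter word, then split into one word per entry
words = re.sub(r"[^A-Za-z\s]", "", text).split()

# The stringLength(WORDS) attribute: one length per word
lengths = [len(w) for w in words]
print(lengths)  # [2, 2, 2, 3, 2, 2, 4, 2, 3, 8]
```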


Clap Speed Follow-Up

The sound of two hands clapping really fast.

In this post, we saw Kent “Toast” French, the world’s fastest clapper, clap at a rate he claimed to be 14 claps per second. I said I thought I could use WireTap Studio to look at the data. Sure enough, it works; here is a screen shot of part of the audio. I get more like 13 cps, or maybe a little less. I have not looked through the whole sequence to see if he ever hit 14.

It would be lovely to use something like Fathom for the whole clip so we could calculate each interval and see how that changes over time.
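A sketch of that calculation in Python, using made-up clap timestamps (not the actual WireTap data) as if they had been read off the waveform:

```python
# Hypothetical clap times in seconds, as if read off the waveform
clap_times = [0.000, 0.077, 0.155, 0.231, 0.310, 0.386]

# Interval between successive claps, and the instantaneous rate in claps/sec
intervals = [b - a for a, b in zip(clap_times, clap_times[1:])]
rates = [1 / dt for dt in intervals]

mean_rate = sum(rates) / len(rates)
print(round(mean_rate, 1))  # 13.0
```

Plotting `rates` against time is exactly the "how does it change over the clip" graph; real timestamps would come from picking peaks out of the audio.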