Intrepid readers will remember the form of a Sonata for Data and Brain: it has three parts, prediction, data (or analysis or measurement), and comparison. As a first probability activity, students were to roll two dice, record the sum, and repeat the process 50 times. The prediction question: if you graphed these 50 sums, what would the graph look like? Then, of course, they were to do it with actual physical dice (more on that data in the next post) and compare their graphs of real data with their predictions.
Note that we’re starting this entirely empirically. We might expect these juniors and seniors to “know the answer” because they probably did it theoretically and with real dice back in seventh grade. We would be wrong.
A key point: in the post referenced above, I bemoaned the problem of getting the kids to make explicit predictions. What’s great about doing this reflective writing (especially in this community) is that it prompts head-smacking realizations about practice and how to improve it, to wit: have the students turn in the prediction before they do the activity. In this case, I had them predict at the end of the first day (Tuesday) and turn it in; I copied them and taped the originals to their lockers before lunch; and today (Thursday), the next class, they turned in the Sonatas as homework. (I have not looked at them yet.)
Remember, I asked for a graph, and that’s what I got. We discussed the phenomenon briefly to establish basic understanding, e.g., that the possible numbers are [2–12]. But for your viewing pleasure, a few actual graphs appear at right.
The two outliers appear below. Even so, notice the variety. What does it say about students' understandings of (or preconceptions about) distributions?
In any case, my hope here was that when they plotted real data, they would be appalled by how not-following-the-pattern a collection of fifty rolls would be.
What We Did
So in class today, after re-seating students so they weren't just sitting with their friends every time (another success, I think; time will tell) and returning the projects from last semester, we went over the homework in pairs. The prompt: talk about it briefly, with special attention to the comparison. How was the data like your prediction, and how was it different?
Walking around, I heard and saw what I had hoped: that often, the “real” graphs had all kinds of bumps and pits (“Yours looks like a roller coaster”) not present in the predictions. We then shared with the whole class a little bit, and finally I asked the payoff question: what did you learn from this comparison?
That really it doesn’t matter; all the numbers might as well be the same.
(Teacher tries not to cringe.) Understandable, given our data. Anyone else?
That 50 rolls really aren’t enough to show the pattern. We may need more like 50,000.
Ta-daa! Really important insight, I said, and the need for bigger numbers is another important stats issue we’ll be looking at this semester.
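For readers who want to see that insight play out without an afternoon of physical dice, here is a quick sketch in Python (not the Fathom setup we actually used in class) that simulates a small batch of rolls next to a huge one. The text histogram for 50 rolls is typically lumpy; at 50,000 the triangular pattern snaps into focus.

```python
import random
from collections import Counter

def roll_sums(n, seed=None):
    """Simulate n rolls of two dice and tally the sums."""
    rng = random.Random(seed)
    return Counter(rng.randint(1, 6) + rng.randint(1, 6) for _ in range(n))

for n in (50, 50_000):
    counts = roll_sums(n, seed=1)
    print(f"\n{n} rolls:")
    for s in range(2, 13):
        # Scale bars so that the theoretical peak at 7 (p = 6/36) fills the row.
        bar = "#" * round(40 * counts[s] / n / (6 / 36))
        print(f"{s:3}: {bar}")
```

Run it a few times with different seeds and the 50-roll plot wanders all over the place while the 50,000-roll plot barely moves: exactly the lesson the students articulated.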
At this point, we went to the computers to look at the data for the whole class, which we had collected using Fathom Surveys. Preview: the 800+ rolls show the theoretical pattern, but the technology gave us additional leverage to understand what was going on. More on that next post.
But first, comments on those graphs. Notice:
- Some are smooth curves, while others use histogram bars or dots. They just spent a semester making dot plots and histograms! Where did the smooth distributions come from?
- Some are curvy, like bell curves (one even labels it a bell curve), while others are more triangular. This potential misconception is understandable (the triangles are correct) since we see bell curves all the time.
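Why is the triangle correct? Each of the 36 ordered pairs of faces is equally likely, and the number of pairs producing each sum rises by one from 2 up to 7, then falls by one down to 12. A few lines of Python (my illustration, not part of the class activity) enumerate it:

```python
from fractions import Fraction
from itertools import product

# Count how many of the 36 equally likely ordered pairs give each sum.
ways = {s: 0 for s in range(2, 13)}
for a, b in product(range(1, 7), repeat=2):
    ways[a + b] += 1

for s in range(2, 13):
    p = Fraction(ways[s], 36)
    print(f"{s:3}: {'#' * ways[s]}  P = {p}")
```

The counts run 1, 2, 3, 4, 5, 6, 5, 4, 3, 2, 1: a straight-sided triangle, with no bell-curve flattening near the peak.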
Before we close, I want to show you two more prediction graphs that show a decided computer influence (at right). The top one shows values for every pair, plotted against roll number! The second is some sort of cumulative plot; that student was the only one to use percentile plots in the final project.
About that top graph: the student made exactly the same kind of graph for the actual data. Curious what got through for this activity, I asked how the two compared.
I noticed that in my prediction, most of them were up around nine or ten, but in the real data it was more like seven or eight.
Isn’t that interesting? Even though the unorthodox graph may be inefficient by our standards, the important point came through.