I’m working on a curriculum project associated with the Core Standards. In the high-school section on “Interpreting Categorical and Quantitative Data,” it occurred to me—as it had before when I was designing physics materials—that we really care about relationships instead of simple answers.
In physics, it was all about functions. You’re rolling a ball down a ramp. How long does it take? It depends. On what? Well, the steepness of the ramp, how far it has to roll, the moment of inertia of the ball, and so forth. We would rather have students construct the function (how the time depends on all these quantities) than simply answer the question (12.2 seconds) for some specific setup.
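As a concrete instance of "constructing the function," here is a minimal Python sketch for one idealized case. It assumes the ball starts from rest on a straight ramp and rolls without slipping, with no air resistance or rolling friction. The name `time_to_roll` and the shape parameter `k = I/(m r²)` are illustrative choices, not part of any curriculum. Under those assumptions the acceleration along the ramp is a = g sin θ / (1 + k), and L = ½ a t² gives t = √(2L(1 + k) / (g sin θ)).

```python
import math


def time_to_roll(length_m, angle_deg, k, g=9.81):
    """Time (s) for a ball released from rest to roll `length_m` down a ramp.

    Assumes rolling without slipping and no energy losses. `k` is I / (m r^2):
    0.4 for a solid sphere, about 0.667 for a thin hollow sphere,
    and 0 for a frictionless sliding block.
    """
    accel = g * math.sin(math.radians(angle_deg)) / (1 + k)
    return math.sqrt(2 * length_m / accel)


# The answer "depends": vary one input at a time and watch the time change.
for angle in (5, 10, 20):
    print(angle, "deg:", round(time_to_roll(2.0, angle, k=0.4), 2), "s")
```

Students can then ask how t scales with each input (for example, it grows like the square root of the length and of 1 + k) instead of stopping at one number.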
When I first was working on materials for Fathom, I studied what stats education looked like. As you can imagine, very early on, we look at one variable (everybody repeat, “shape, center, spread!”) and learn about box plots and mean and histograms and mean absolute deviation and all that before we look at more than one variable. That only makes sense, right?
Well. We started playing with Fathom. One of the first things I did was get access to U.S. Census microdata. (It’s much easier now, since it’s in the File/Import menu, but back then I had to work really hard to get the data.) Of course we made a histogram of age to see if it worked.
And … my … eyes … started … to … close. Shape, center, and spread. Really? Who cares? But then, the Epiphany Happened. We made a second graph—marital status, to test out a categorical variable—and selected the “never married” bar.
And the age graph immediately became interesting:
When we took this to a class, students immediately understood what was going on, and were obviously more interested in what the graph was saying. Part of this was because it was more about them: look! there are people between 15 and 20—that’s us!—who are married!
But more deeply (I conjecture here) it was more interesting because it was showing a relationship between variables. Age is related to marital status, and that’s more interesting than either variable on its own.
So, back to the core standards. Of course they begin with univariate stuff like measures of center. But why do we care about measures of center? So we can compare: one group to another, this year to last year, whatever. So when we’re learning about median and making box plots, it’s more interesting to see two of them than one (500-case sample, 2000 Census):
Now there is a story. And there are all kinds of questions you can ask about that graph. What percentage of the women earn less than the median man? How much money is that? What does that horrifying statistic actually mean?
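A minimal Python sketch of the first question, "What percentage of the women earn less than the median man?" The incomes below are invented placeholders, not the 500-case Census sample in the graph, and `share_below` is a hypothetical helper name.

```python
from statistics import median


def share_below(group, reference):
    """Fraction of values in `group` strictly below the median of `reference`."""
    cutoff = median(reference)
    return sum(1 for x in group if x < cutoff) / len(group)


# Hypothetical incomes (dollars), made up for illustration only;
# these are NOT the Census data shown in the graph.
men = [18000, 25000, 32000, 41000, 47000, 55000, 70000]
women = [12000, 15000, 21000, 28000, 36000, 44000, 52000]

print(f"Median man: ${median(men):,.0f}")
print(f"Share of women below it: {share_below(women, men):.0%}")
```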
And this, again, is about the relationship between two variables, in this case the categorical sex and the numerical income. In an ongoing stats class, we would eventually do inference on this situation, but really, looking at this bivariate relationship is the most interesting way to learn about the univariate statistics.
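For that inference step, one option suited to an intro course is a randomization ("scrambling") test: shuffle the group labels many times and see how often chance alone produces a gap as large as the observed one. This is a minimal sketch using the difference in means and made-up incomes; `scramble_test`, its `trials` default, and the fixed seed are illustrative assumptions.

```python
import random
from statistics import mean


def scramble_test(a, b, trials=10_000, seed=1):
    """Randomization ("scrambling") test for a difference in means.

    Returns the fraction of label shufflings whose absolute difference in
    means is at least as large as the observed one: an estimated two-sided
    p-value under the null model that group label and value are unrelated.
    """
    rng = random.Random(seed)  # fixed seed so the result is reproducible
    observed = abs(mean(a) - mean(b))
    pooled = list(a) + list(b)
    n_a = len(a)
    extreme = 0
    for _ in range(trials):
        rng.shuffle(pooled)  # reassign values to the two groups at random
        diff = abs(mean(pooled[:n_a]) - mean(pooled[n_a:]))
        if diff >= observed:
            extreme += 1
    return extreme / trials


# Hypothetical incomes (dollars), invented for illustration only.
men = [18000, 25000, 32000, 41000, 47000, 55000, 70000]
women = [12000, 15000, 21000, 28000, 36000, 44000, 52000]
print(f"Estimated p-value: {scramble_test(men, women):.3f}")
```

A common refinement adds 1 to both the count of extreme shuffles and the number of trials so the estimate is never exactly zero; it is omitted here to keep the logic visible.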
The Core Standards go on to specify looking at relationships between two categorical variables (in two-way tables) and between two numerical variables (in scatter plots). So all three possible combinations of two variable types appear. Great!
It leaves me with some observations and questions:
- Isn’t it odd that when they talk about parallel box plots, they don’t recognize them as showing a relationship between a categorical and a numerical variable? They always describe them as “comparing two groups” or “comparing two data sets.” Would explicitly calling this one of three possible relationships help us organize our thinking better?
- It’s clear that a scatter plot is the best (or at least the standard) numerical-numerical graph. Parallel box plots are almost, but not quite, the standard for numerical-categorical. And for categorical-categorical, I think what Fathom calls the ribbon chart is the champion, with the stacked percentage bar chart coming close. But we don’t even agree on what these graphs should be called! Why don’t we have a standard? (One conjecture: it’s so easy to represent cat-cat as a two-way table that the table has become the default representation. Alas, people have a hard time understanding it. We need a good graph!)
- Our three settings have associated inference procedures, but they’re by no means equal in importance in intro stats:
  - Num-Cat: t-test for a difference of means; randomization (scrambling)
  - Cat-Cat: chi-square; scrambling
  - Num-Num: inference for slope? Not as common
- Similarly, we have measures of strength of association (as opposed to significance):
  - Num-Cat: effect size (difference of centers ÷ spread)
  - Num-Num: correlation coefficient
  - Cat-Cat: difference of proportions; relative risk; odds ratio; and a bevy of other plausible measures. Why isn’t there a more standard approach to learning about this combination?
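Here is a minimal Python sketch of one measure of association for each setting. It assumes "difference of center ÷ spread" is taken as the difference of means divided by a pooled standard deviation (the form of Cohen's d; other choices of center and spread are possible). The cat-cat function also returns the row proportions, which are the quantities a stacked percentage bar chart displays. All function names and counts are hypothetical.

```python
from math import sqrt
from statistics import mean, stdev


def effect_size(a, b):
    """Num-Cat: (mean of a - mean of b) / pooled standard deviation."""
    na, nb = len(a), len(b)
    pooled_var = ((na - 1) * stdev(a) ** 2 + (nb - 1) * stdev(b) ** 2) / (na + nb - 2)
    return (mean(a) - mean(b)) / sqrt(pooled_var)


def correlation(x, y):
    """Num-Num: Pearson correlation coefficient r."""
    mx, my = mean(x), mean(y)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    sxx = sum((xi - mx) ** 2 for xi in x)
    syy = sum((yi - my) ** 2 for yi in y)
    return sxy / sqrt(sxx * syy)


def two_by_two(a, b, c, d):
    """Cat-Cat measures for a 2x2 table of counts.

                  outcome: yes   outcome: no
        group 1        a              b
        group 2        c              d
    """
    p1 = a / (a + b)  # row proportion "yes" in group 1
    p2 = c / (c + d)  # row proportion "yes" in group 2
    return {
        "row_proportions": (p1, p2),
        "difference_of_proportions": p1 - p2,
        "relative_risk": p1 / p2,
        "odds_ratio": (a * d) / (b * c),
    }


# Hypothetical counts: 30 of 100 in group 1 vs 10 of 100 in group 2 say "yes".
print(two_by_two(30, 70, 10, 90))
```

Even in this tiny example the three cat-cat measures disagree in scale (a 0.2 difference, a relative risk of 3, an odds ratio near 3.9), which may be part of why no single one has become standard.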
And finally, the real reason I wrote this:
- Is it really true that we could usefully organize our curricular thinking around relationships? Or are we missing something important by skipping a purely univariate treatment?
Note that in my vision, you still need to learn univariate techniques; you just learn them in a context where you’re comparing one group’s center and spread to another’s.
In a different project, another writer offered me a counterexample in the form of this problem:
Natalie’s mom discovers that she texts 200 times per month. She’s astonished and horrified, and demands that her daughter slow down. Natalie tells her mom that 200 texts a month is not really a big deal and that she’s not unusual. They collect data from other kids and look at the purely univariate distribution. (Here the problem supplies the data.) They learn that Natalie is in the second quartile after all.
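A minimal Python sketch of the quartile question. The peer counts are invented for illustration (the original problem's data aren't reproduced here), and `quartile_of` is a hypothetical helper. It uses `statistics.quantiles`, whose default "exclusive" method is only one of several quartile conventions, so a value right at a cut point may land differently in other software.

```python
from statistics import quantiles


def quartile_of(value, data):
    """Return 1, 2, 3, or 4: the quartile of `data` that `value` falls in.

    A value exactly equal to a cut point is assigned to the lower quartile.
    """
    q1, q2, q3 = quantiles(data, n=4)  # default method="exclusive"
    if value <= q1:
        return 1
    if value <= q2:
        return 2
    if value <= q3:
        return 3
    return 4


# Hypothetical monthly text counts for Natalie's peers (made up).
peers = [40, 90, 150, 180, 210, 260, 320, 400, 550, 800, 1200]
print(quantiles(peers, n=4))
print("Natalie (200 texts) is in quartile", quartile_of(200, peers))
```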
Is this an unusual exception to the “rule” that everything interesting is about a relationship? Can we ignore it? Or is it the tip of an iceberg I just haven’t been looking at?
Agree, agree, agree! In fact, the framework you lay out here has often been a feature of the work I do with my students when we prepare for multi-unit assessments or projects.
One area where a single univariate display is meaningful and helpful is when we compare a single case to the entire distribution (using percentiles, quartiles, z-scores, etc.). Knowing one’s position relative to one’s peers is sometimes important.
Also, we often compare a distribution relative to some “threshold” or “standard” not determined by the data:
* Do we have evidence that bar patrons tend to be drunk (BAC > 0.08) after leaving Jake’s bar? Compare the percentage of cases above 0.08.
* Do we have evidence that most watches are within one minute of the correct time?
* How do the sleep habits of students at our school compare to the AMA’s recommendation of at least 8 hours of sleep per night?
Except for questions like these, I spend little or no time having kids work through single-variable scenarios. Even then, they are comparing individuals or features of a distribution to something, so a comparison is still essential. But wouldn’t most one-sample tests for means or proportions fall under this umbrella?
Yeah, I had been wondering about univariate comparison too, in the context of comparing a test statistic to a sampling distribution, but I thought of that as more advanced: inferential rather than exploratory data analysis. Yours are good examples of situations you could encounter in EDA and answer with the IIT (interocular impact test) 🙂
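For the threshold questions, a minimal Python sketch: compute the share of cases beyond the external standard, then, as a one-sample test, ask how surprising that count would be if the true proportion were p0. It assumes "tend to be drunk" means "more than half" (p0 = 0.5) and that the patrons are an independent random sample; the BAC readings and helper names are hypothetical. The p-value is an exact binomial tail computed with `math.comb`.

```python
from math import comb


def share_above(values, threshold):
    """Fraction of cases strictly above a fixed, externally set threshold."""
    return sum(1 for v in values if v > threshold) / len(values)


def binomial_upper_p(k, n, p0=0.5):
    """Exact one-sided p-value: P(X >= k) for X ~ Binomial(n, p0)."""
    return sum(comb(n, i) * p0**i * (1 - p0) ** (n - i) for i in range(k, n + 1))


# Hypothetical BAC readings for 10 patrons leaving the bar (made up).
bac = [0.02, 0.05, 0.09, 0.11, 0.07, 0.12, 0.10, 0.03, 0.09, 0.15]
k = sum(1 for v in bac if v > 0.08)
print(f"Share above 0.08: {share_above(bac, 0.08):.0%}")
print(f"P(at least {k} of {len(bac)} | p0 = 0.5): {binomial_upper_p(k, len(bac)):.3f}")
```

With only ten readings, 6 of 10 above the line is weak evidence against p0 = 0.5 (a tail probability of 386/1024, about 0.38), which is itself a useful point to discuss with students.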
It seems to me that pedagogically it is more interesting to compare things or events. In the texting example, would it not be interesting to know if boys text as much as girls?
Absolutely. But it was at least an example where I could see some use in just looking at the distribution. Most questions in this area that I have seen lately do not pass the lameness test, e.g., “Here is a list of countries and their per capita incomes. What’s the standard deviation?” Ack!
Classical hypothesis testing is univariate: how unlikely is the observed data given the null model? That is essentially the question being asked in the texting example: the mother contends that the daughter is not a typical texter, but the daughter shows that the null model can’t be rejected.