Statistics is (among many things) the art of using data to make decisions in the face of uncertainty. But what factors besides the data influence the decision? Recently, we did problems like these:
20% of Californians aged 65–74 have been diagnosed with diabetes (from that Kaiser site). A group of 6 friends (the protoGeezers, all age 70) is sitting around the table at Denny’s waiting for their biscuits and gravy, and the talk gets round to health complaints. They notice that four of them have diabetes.
Is this group especially sickly, or could it be just by chance? (Better: is it plausible that it’s chance?)
Same restaurant, and 6 of their 70-year-old neighbors are waiting for their green salads. They’re talking about health, too, and they notice that none of them have diabetes. They congratulate each other on their good health.
Based on the data, can you conclude whether this group is especially healthy?
Students make the relevant simulation in Fathom (a typical result appears in the graph) and see whether it’s plausible that a randomly-chosen group of 6 from that age range would have four—or zero—diabetics. The orthodox answers are no and yes, respectively. We’re supposed to say that the biscuits group is sickly (after all, P < 0.05) but the salad group is overreaching.
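For readers without Fathom, here is a rough Python stand-in for that simulation, with the exact binomial calculation alongside it. It is only a sketch, not what the students built: it assumes each of the 6 people independently has a 20% chance of having diabetes, and the names (`exact_prob`, `simulate`) and the number of trials are my own choices. Under those assumptions the exact values come out to about 1.7% for four or more diabetics and about 26.2% for none.

```python
import random
from math import comb

RATE = 0.20      # diabetes rate for the 65-74 age group, as quoted in the problem
GROUP_SIZE = 6   # people at each table


def exact_prob(k_min, k_max, n=GROUP_SIZE, p=RATE):
    """Exact binomial probability of k_min..k_max diabetics in a group of n."""
    return sum(comb(n, k) * p**k * (1 - p) ** (n - k)
               for k in range(k_min, k_max + 1))


def simulate(trials=10_000, n=GROUP_SIZE, p=RATE, seed=0):
    """Draw `trials` random groups; return the diabetic count for each group."""
    rng = random.Random(seed)  # seeded so the run is repeatable
    return [sum(rng.random() < p for _ in range(n)) for _ in range(trials)]


counts = simulate()
sim_four_plus = sum(c >= 4 for c in counts) / len(counts)
sim_zero = sum(c == 0 for c in counts) / len(counts)

print(f"P(4 or more): simulated {sim_four_plus:.3f}, exact {exact_prob(4, GROUP_SIZE):.3f}")
print(f"P(none):      simulated {sim_zero:.3f}, exact {exact_prob(0, 0):.3f}")
```

The simulated proportions wobble from run to run, which is part of the point: different students will report slightly different numbers for the same problem.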
In both cases, however, students brought up the surrounding context, as in (spelling intact…)
- P(4+diabetes)=4%. There is always ths possibility that 4 of 6 people being at the table is just chance, but I would have to say that this group of people is a family of a diabetes support group.But it could be because that they always eat pancakes that they have diabetes. (My comment: it’s biscuits and gravy, not pancakes!)
- we only got 13 out of 1000 cases where the group had 4 or more diabetics, so the biscuits are making them sick
- There is a 30% chance that this group is random, however, we know there food choice is salad, so since they are eating healthy, we can conclude that this group is really healthy.
- There is more of a chance that this could have been just by chance, but it could be explained because they only eat salads and they are especially healthy. P(no diabetics) = 26.4%.
This is related to the previous post on whether data can change your mind, but puts a different twist on it. Here, students seem to use contextual hints in the problem statement—and their own preconceptions about healthy food—to help them decide which way to go.
On the one hand, I’m pleased that they do this; we should bring outside knowledge to bear. We should allow for possibilities that are outside the narrow problem statement. The last response above seems particularly astute. After all, a group that eats salads could be fundamentally different from the 65–74 group as a whole. They may indeed be healthier. To make a better analysis, we need more data, for example, the diabetes rate broken down by eating habits.
But I’m worried too. You can’t tell from the response whether the student was actually reasoning that way or just assumed that the salads had to bear on the problem somehow.
Put another way, it could be an example of our human tendency to look for patterns and reasons when, in fact, things may be due to chance alone. We often let context and preconception sway our decisions and opinions. As a result, it’s harder (it seems) to say that we don’t have evidence for some effect. That is, it’s harder (in orthodox statSpeak) to fail to reject the null.
This could be the last math class some of these kids take. What’s the habit of mind they should take to college? To bring all of their knowledge to bear? To find numbers to justify the decision they would make anyway? To look only at the data? Or do we have time to help them balance it all?
And on the assessment front, how will I write the standard—the learning goal—that goes with this, and how will I tell if they get it?