Can Data Change Your Mind?

In our study of probability, I’ve tried to keep theoretical and empirical probability as close together as possible, using manual and computer simulation to illustrate how reality can match theory, and how it varies about it. This is all in line with my Big Plan of approaching inference through randomization rather than from the traditional Normal-approximation direction. Heck, maybe I will dump the theoretical, but probably not.

Anyhow, this brings me to why we need to include the basics of inference as early as possible. Two reasons:

  • It’s why we have probability in a stats course, after all, so let’s make our tools useful as soon as possible.
  • The underlying logic of inference is hard, so it may take lots of time and repetition to get it right. And I’m willing to sacrifice lots and lots of other content for that understanding.

(I wonder, right now, if that’s such a good idea. I mean, doing less content more deeply sounds right, but what inference do we really need? It may be better to pick something else—but not this year.)

During our first encounters with inference, I realized something I’d never noticed before. As usual, it’s obvious, and I’m sure many people have thought of it already, but here goes. Consider this problem:

“Spots” McGinty claims that he can control dice. Skeptical, you challenge him to roll five dice and get as high a sum as possible. He rolls three sixes, a five, and a four—for a total of 27. What do you think? Can he control dice?

[Figure: The tail of the (simulated) distribution of the sum of five dice, showing how rare 27 is.]

The students are supposed to make a simulation in Fathom in which they repeatedly roll five fair dice, sum them, and then look at the distribution of sums. They compare the 27 to that distribution and should conclude that getting 27 or more with truly random dice is rare, so there’s evidence to support the notion that he does have some control.
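For anyone without Fathom at hand, here is a rough Python analogue of that simulation (my own sketch, not the students’ Fathom document). It rolls five fair dice many times, sums them, and estimates the chance of 27 or more; it also counts the exact answer by listing all 6^5 = 7776 equally likely rolls. The function name `simulate_sums` and the 10,000-run count are illustrative choices.

```python
import random
from itertools import product

def simulate_sums(runs, n_dice=5, seed=None):
    """Roll n_dice fair dice `runs` times and return the list of sums."""
    rng = random.Random(seed)
    return [sum(rng.randint(1, 6) for _ in range(n_dice)) for _ in range(runs)]

# Empirical: fraction of simulated sums that are 27 or more.
sums = simulate_sums(10_000, seed=1)
empirical = sum(1 for s in sums if s >= 27) / len(sums)

# Exact: enumerate all 6**5 = 7776 equally likely outcomes of five dice.
exact_count = sum(1 for roll in product(range(1, 7), repeat=5) if sum(roll) >= 27)
exact = exact_count / 6**5

print(f"simulated P(sum >= 27): {empirical:.4f}")
print(f"exact     P(sum >= 27): {exact:.4f} ({exact_count} of {6**5})")
```

The enumeration finds 56 of the 7776 outcomes at 27 or more, about 0.7%, and the simulated fraction should land near that.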

Well. Student work rolled in, and some of it was like, “the probability of rolling 27 or more is really low, but no one can control dice, so he’s really lucky.”

How much data do you need to change your mind?

This is a really interesting response to the simulation. It brings your prior opinions into play (which may argue for doing a Bayesian number in the introductory course, but not this year!). Namely, if you basically don’t believe in the effect we’re trying to observe, it will take more data to convince you than if you’re “in favor” of the effect, for example, if it’s the point of your research.
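To put a number on how a prior opinion matters, here is a small Bayes’ rule sketch in Python. It rests on a made-up assumption, labeled `P_CONTROL` below: that someone who genuinely controls dice rolls 27 or more half the time. The probability for fair dice is computed exactly by enumeration. Under these assumptions, the same single roll of 27 leaves a believer (prior 0.5) nearly convinced, but leaves a hard-core skeptic (prior one in a million) still very unconvinced.

```python
from itertools import product

# Exact P(sum of five fair dice >= 27), by enumerating all 6**5 outcomes.
outcomes = list(product(range(1, 7), repeat=5))
p_fair = sum(1 for roll in outcomes if sum(roll) >= 27) / len(outcomes)

# HYPOTHETICAL assumption: a real dice-controller gets 27+ half the time.
P_CONTROL = 0.5

def posterior(prior):
    """Bayes' rule: P(controls dice | one roll of 27 or more), given a prior."""
    numerator = prior * P_CONTROL
    return numerator / (numerator + (1 - prior) * p_fair)

for prior in (0.5, 0.01, 0.000001):
    print(f"prior {prior:g}: posterior {posterior(prior):.4f}")
```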

I’m intrigued by this notion that we need a different P value, and/or a larger effect size, to convince us of some things than of others, depending on whether we’re personally inclined to find the effect real or not.
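Under the same hypothetical assumption (a dice-controller gets 27 or more half the time) plus independent rolls, we can ask the title question directly: how many rolls of 27 or more in a row would it take to push someone’s belief past 50%? This sketch uses the odds form of Bayes’ rule, multiplying the prior odds by the likelihood ratio once per roll; `rolls_needed` is a hypothetical helper name.

```python
from itertools import product

# Exact P(sum of five fair dice >= 27), by enumeration.
outcomes = list(product(range(1, 7), repeat=5))
p_fair = sum(1 for roll in outcomes if sum(roll) >= 27) / len(outcomes)

P_CONTROL = 0.5  # HYPOTHETICAL: P(27 or more | he really controls dice)
likelihood_ratio = P_CONTROL / p_fair

def rolls_needed(prior, threshold=0.5):
    """Count consecutive 27+ rolls until posterior belief exceeds threshold."""
    odds = prior / (1 - prior)
    target_odds = threshold / (1 - threshold)
    n = 0
    while odds <= target_odds:
        odds *= likelihood_ratio  # each independent 27+ roll multiplies the odds
        n += 1
    return n

for prior in (0.5, 0.01, 0.000001):
    print(f"prior {prior:g}: {rolls_needed(prior)} roll(s) of 27+ needed")
```

With these numbers, someone starting at 50/50 needs one roll, a prior of 0.01 needs two, and a one-in-a-million skeptic needs four. The answers depend heavily on the made-up `P_CONTROL`, so treat them as an illustration of the logic only.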

This issue is at the center of one of the more intriguing talks I’ve heard recently, the one by Jessica Utts at ICOTS last summer. Here is the link to the conference keynotes; there you will also see a lesser-known talk by the astounding Hans Rosling, as well as three more good ones by other luminaries. The Utts talk asks us, rather brilliantly, to confront our own preconceptions and address that question: how much data do you need to change your mind?

Simple Sampling Distribution Simulation in Fathom

[Figure: What we’re looking for. Result from 500 runs of the simulation.]

Yesterday’s APstat listserv had a question about Fathom:

How do I create a simulation to run over and over to pick 10 employees.  2/3 of the employees are male

Since my reply to the listserv had to be in plain old text, I thought I’d reproduce it here with a couple of illustrations…

There are at least two basic strategies. I’ll address just one; this is the quick one and uses random number-ish functions. The others mostly use sampling. If you use TPS, I think it’s Chapter 8 of the Fathom guide that explains them in excruciating detail 🙂

Okay: we’re going to pick 10 employees over and over from a (large) population in which 2/3 of the employees are male.

(Why large? To avoid “without-replacement” issues. If you were simulating layoffs of 10 employees from an 18-employee company, 12 of whom were male, you would need to use sampling and make sure you were sampling without replacement.)
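To see why population size matters, here is a Python sketch of the small-company case (12 men, 6 women, pick 10), computed two ways: exactly without replacement (a hypergeometric count) and as if with replacement (binomial, p = 2/3). It also includes a simulation using `random.sample`, which draws without replacement. All names are illustrative, not from Fathom.

```python
import random
from math import comb

# The small-company example: 12 men and 6 women; 10 are laid off.
MALES, FEMALES, PICKS = 12, 6, 10
N = MALES + FEMALES

def p_all_male_without_replacement():
    """Hypergeometric: all 10 picks come from the 12 men."""
    return comb(MALES, PICKS) / comb(N, PICKS)

def p_all_male_with_replacement():
    """Binomial: each pick is independently male with probability 12/18 = 2/3."""
    return (MALES / N) ** PICKS

def simulate_without_replacement(runs, seed=None):
    """Count men in each of `runs` samples of 10 drawn without replacement."""
    rng = random.Random(seed)
    staff = ["male"] * MALES + ["female"] * FEMALES
    return [rng.sample(staff, PICKS).count("male") for _ in range(runs)]

counts = simulate_without_replacement(1000, seed=1)
print("exact, without replacement:", p_all_male_without_replacement())
print("exact, with replacement:   ", p_all_male_with_replacement())
print("fewest men in any simulated sample:", min(counts))
```

Without replacement, the chance that all 10 picks are men is 66/43758, about 0.0015, versus (2/3)^10, about 0.017, with replacement. The simulated counts also never drop below 4 men, since only 6 women are available to pick.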

(1) Make a collection with one attribute, sex, and 10 cases

(2) Give the sex attribute this formula:

randomPick("male", "male", "female")
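The rest of the Fathom recipe is cut off here, so the following is only a rough Python analogue of the idea, not a transcription of the Fathom steps. Each of 10 cases picks from ("male", "male", "female"), just as the `randomPick` formula does, so each case is male with probability 2/3. We count the males and repeat 500 times, matching the number of runs in the figure caption. The names `one_sample` and `collect_measures` are hypothetical.

```python
import random
from collections import Counter

def one_sample(n=10, rng=random):
    """One 'collection' of n cases; each case is male with probability 2/3."""
    return [rng.choice(("male", "male", "female")) for _ in range(n)]

def collect_measures(runs=500, n=10, seed=None):
    """Repeat the sample `runs` times, recording the number of males each time."""
    rng = random.Random(seed)
    return [one_sample(n, rng).count("male") for _ in range(runs)]

counts = collect_measures(500, seed=42)
tally = Counter(counts)
for k in range(11):
    print(f"{k:2d} males: {tally[k]}")
print("mean number of males:", sum(counts) / len(counts))
```

Because each pick is independent here, this is sampling with replacement, which is fine for the large population described above.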
