Yesterday’s AP Stat listserv had a question about Fathom:
How do I create a simulation to run over and over to pick 10 employees. 2/3 of the employees are male
Since my reply to the listserv had to be in plain old text, I thought I’d reproduce it here with a couple of illustrations…
There are at least two basic strategies. I’ll address just one; this is the quick one and uses random number-ish functions. The others mostly use sampling. If you use TPS, I think it’s Chapter 8 of the Fathom guide that explains them in excruciating detail 🙂
Okay: we’re going to pick 10 employees over and over from a (large) population in which 2/3 of the employees are male.
(Why large? To avoid “without-replacement” issues. If you were simulating layoffs of 10 employees from an 18-employee company, 12 of whom were male, you would need to use sampling and make sure you were sampling without replacement.)
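To make the without-replacement point concrete, here is a rough sketch in Python rather than Fathom (Python is my substitution here; the names `company` and `layoff_sample` are made up for illustration). It draws 10 distinct people from the 18-employee company, so nobody can be picked twice:

```python
import random

# Hypothetical small company from the layoff example: 12 males, 6 females.
company = ["male"] * 12 + ["female"] * 6

def layoff_sample(rng, k=10):
    """Pick k distinct employees; random.sample never repeats a person."""
    return rng.sample(company, k)

rng = random.Random(1)            # seeded only so reruns are repeatable
picked = layoff_sample(rng)
n_males = picked.count("male")    # number of males among the 10 picked
print(picked, n_males)
```

With only 6 females in the company, a sample of 10 must contain at least 4 males. That constraint is exactly what the quick random-function approach below ignores, which is why it only suits a large population.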
(1) Make a collection with one attribute, sex, and 10 cases.
(2) Give the sex attribute this formula:
randomPick( “male”, “male”, “female”)
(I like this formula best for kids early in the course; for an alternative, start typing:
(3) The sex values fill in with random genders in the right proportions. Test by choosing Rerandomize from the Collection menu.
NOW presumably you want to create a sampling distribution of the number of males in the collection.
(4) Double-click the collection to open its Inspector. Go to the Measures tab. Measures are like attributes of the whole collection: statistics that summarize the set of data, rather than values about individuals.
(5) Make a <new> measure called something like Nmales. Double-click its “Formula” box and give it a formula like this one:
count( sex = “male” )
Notice how it fills in and how, when you rerandomize, the number changes.
(6) With the collection selected, choose Collect Measures from the Collection menu. A new collection appears, called “Measures from <whatever>”.
(7) Make a table to see that now you have 5 values of Nmales. Cool. You want more!
(8) In the Collect Measures panel of the new collection’s inspector (it’s open by default) you can change the number of measures (that is, the number of re-randomizations) you collect. You can also turn off animation if you don’t want to wait. On the other hand, if you graph the distribution of Nmales, you can watch it grow if you have animation on.
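If it helps to see the logic of steps (1) through (8) outside Fathom, here is a minimal Python sketch of the same idea. This is not Fathom code, and the function names (`pick_employees`, `n_males`, `collect_measures`) are my own stand-ins for the collection, the measure, and Collect Measures:

```python
import random
from collections import Counter

def pick_employees(rng, n=10):
    # Same idea as randomPick("male", "male", "female"): each case is
    # male with probability 2/3, independently of the other cases.
    return [rng.choice(["male", "male", "female"]) for _ in range(n)]

def n_males(sexes):
    # The measure: how many cases have sex equal to "male".
    return sum(1 for s in sexes if s == "male")

def collect_measures(rng, how_many=5):
    # Each measure comes from a freshly rerandomized collection of 10.
    return [n_males(pick_employees(rng)) for _ in range(how_many)]

rng = random.Random(2024)         # seeded only so reruns are repeatable
measures = collect_measures(rng, how_many=1000)
distribution = Counter(measures)

# Crude text histogram: one star per 5 simulated collections.
for k in range(11):
    print(f"{k:2d} {'*' * (distribution[k] // 5)}")
```

The default of 5 measures mirrors what you see in step (7); bumping `how_many` plays the role of changing the number of measures in the inspector.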
In any case, when you’re done you have a distribution like the one in the graph at the top of the post, and you can ask the important questions such as, how likely is it that all 10 chosen employees would be male if the whole thing were done by chance alone? Answer: not bloody.
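For a check on the simulation, the exact answer follows from the binomial model, assuming 10 independent picks that are each male with probability 2/3 (the large-population assumption from the start of the post). A small Python sketch, with `prob_males` as a made-up helper name:

```python
from fractions import Fraction
from math import comb

def prob_males(k, n=10, p=Fraction(2, 3)):
    # Binomial probability of exactly k males among n independent picks.
    return comb(n, k) * p**k * (1 - p)**(n - k)

all_male = prob_males(10)         # equals (2/3)**10 as an exact fraction
print(all_male, float(all_male))
```

The all-male probability is (2/3)^10 = 1024/59049, or about 1.7%, so in a few hundred simulated collections you should expect to see only a handful of 10s.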