A couple of posts ago, I said that I pretty much liked my learning goals (i.e., standards), but that they were really different from one another.

But shouldn’t standards be kind of similar in size and type of material? Maybe not.

Let’s assume that we should assess what we care about, and that standards represent pretty directly what we assess. So if I care about different kinds of things, I should try to write standards that reflect that. I think I was pretty successful, at least for a noob, at making some standards that fit content-y topics and others that demand big, broad synthesis. But I’m puzzled about skills.

I do care that students know how to use Fathom to do particular things.

This is because I believe that doing those things with reasonable fluency will help them understand the content.

Still, Fathom proficiency is *not* content. So should I assess only the desired result?

On the other hand, some “Fathom” standards give some kids a chance to master something.

As an example, let’s look at two “mature” learning goals. That is, they’re from late in the course when I had gotten better, or at least faster:

27 Fathom: Three-Collection Simulations (scrambling and measures)

- 27.1 Given an appropriate situation (comparing some variable across two groups) define a sensible measure to describe the observed difference; compute that test statistic.
  - 27.1.1 For quantitative variables, use measures of center (and probably difference or ratio)
  - 27.1.2 For categorical variables, the measures probably use proportion or count
- 27.2 Create a sampling distribution of that statistic using scrambling
- 27.3 Use that distribution and the test statistic to find the empirical probability that the test stat could arise if there were no association between the group membership and the variable under study.

28 Basics of Inference

- 28.1 Understand these terms as applied to any of the Fathom simulations we have been doing:
  - 28.1.1 Sampling distribution
  - 28.1.2 Test statistic
  - 28.1.3 P-value
  - 28.1.4 Null hypothesis
- 28.2 Given an analysis with a sampling distribution and a test statistic,
  - 28.2.1 Calculate the P-value
  - 28.2.2 Understand that the P-value is the probability that, if the null hypothesis were true, you would get a value as extreme as the test statistic
  - 28.2.3 Correctly interpret a low P-value (e.g., it’s implausible that this value is due to chance)
  - 28.2.4 Correctly interpret a high P-value (e.g., we can’t rule out chance as the reason for this value)
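For readers who don't have Fathom handy, here is a rough Python sketch of the procedure those two learning goals describe: pick a measure, scramble the group labels to build a sampling distribution, and read off an empirical P-value. This is not how Fathom works internally, just a stand-in for the idea. The data, the function names (`difference_of_means`, `scramble_test`), the choice of difference of means as the measure, and the two-sided definition of "as extreme" are all my assumptions for illustration.

```python
import random
from statistics import mean


def difference_of_means(values, labels, group_a, group_b):
    """The measure (27.1): mean of group_a minus mean of group_b."""
    a = [v for v, g in zip(values, labels) if g == group_a]
    b = [v for v, g in zip(values, labels) if g == group_b]
    return mean(a) - mean(b)


def scramble_test(values, labels, group_a, group_b, trials=2000, seed=0):
    """Scramble the group labels to simulate 'no association' (the null hypothesis)."""
    rng = random.Random(seed)  # fixed seed so the sketch is repeatable
    test_statistic = difference_of_means(values, labels, group_a, group_b)

    shuffled = list(labels)
    sampling_distribution = []
    for _ in range(trials):
        rng.shuffle(shuffled)  # break any link between group and value (27.2)
        sampling_distribution.append(
            difference_of_means(values, shuffled, group_a, group_b)
        )

    # Empirical P-value (27.3, 28.2.1): the fraction of scrambled measures
    # at least as far from zero as the observed one (assumed two-sided).
    extreme = sum(
        1 for d in sampling_distribution if abs(d) >= abs(test_statistic)
    )
    return test_statistic, sampling_distribution, extreme / trials


# Made-up data: ten measurements, five in each of two hypothetical groups.
values = [12, 15, 14, 10, 13, 20, 22, 19, 24, 21]
labels = ["A"] * 5 + ["B"] * 5

observed, distribution, p_value = scramble_test(values, labels, "A", "B")
print("test statistic:", observed)
print("empirical P-value:", p_value)
```

With these invented numbers the two groups don't overlap at all, so very few scrambles produce a difference as large as the observed one, and the P-value comes out low. A ratio of medians or a difference of proportions would slot in the same way by swapping out the measure function.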

So LG27 is just skills. Complicated skills, but skills nevertheless. LG28 is the meat of inference. My gut tells me to keep them both, and keep them separate. But I’d be interested in what others think (including a Tim with more experience…)

**In the cold light of morning**: Interestingly, LG27 (above) is not just mechanical skills; part of the point is making the connection between a real-life situation and an appropriate statistical technique, *especially* including developing the measure, that is, the number that tells how big the effect you have noticed is. And LG28 is not just the meat of inference: 28.1 specifically talks about applying terms in the context of a Fathom simulation. Does that make it mechanical or too software-specific? I don’t think so. Looking back, I think I wrote it that way because I believe that if students can do this, they actually understand how inference works. Fathom itself uses none of these terms, so identifying them in the Fathom context means you have to understand them pretty thoroughly.

Having said that, I see that the first three terms in 28.1 are really different from the fourth, “null hypothesis.” The first three don’t actually exist outside an analysis, so you can’t really talk about a sampling distribution (say) without imagining an analysis, probably on a computer, or actually doing one. We can talk about *null hypothesis* without any of that, though; it arises directly from a situation and from noticing something of interest in it.

Which may explain why it made sense to me to do exercises where I had students write the null hypothesis for a number of situations.