## Should we have standards for mechanical skills?

A couple posts ago, I said that I liked my learning goals (i.e., standards) pretty well, but that they were really different from one another.

But shouldn’t standards be kind of similar in size and type of material? Maybe not.

Let’s assume that we should assess what we care about, and that standards represent pretty directly what we assess. So if I care about different kinds of things, I should try to write standards that reflect that. I think I was pretty successful at making some standards that fit content-y topics and others that demand big, broad synthesis—at least for a noob—but I’m puzzled about skills.

I do care that students know how to use Fathom to do particular things.

This is because I believe that doing those things with reasonable fluency will help them understand the content.

Still, Fathom proficiency is not content. So should I assess only the desired result?

On the other hand, some “Fathom” standards give some kids a chance to master something.

As an example, let’s look at two “mature” learning goals. That is, they’re from late in the course when I had gotten better, or at least faster:

27 Fathom: Three-Collection Simulations (scrambling and measures)

• 27.1 Given an appropriate situation (comparing some variable across two groups), define a sensible measure to describe the observed difference; compute that test statistic.
• 27.1.1 For quantitative variables, use measures of center (and probably difference or ratio)
• 27.1.2 For categorical variables, the measures probably use proportion or count
• 27.2 Create a sampling distribution of that statistic using scrambling
• 27.3 Use that distribution and the test statistic to find the empirical probability that the test stat could arise if there were no association between the group membership and the variable under study.
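For readers who don't use Fathom, here is a minimal Python sketch of what LG27 asks students to build. It is not Fathom and not the class's actual materials. It assumes a quantitative variable, uses a difference of means as the measure (27.1.1), scrambles group membership to build the sampling distribution (27.2), and counts how often a scrambled statistic is at least as extreme, in either direction, as the observed one (27.3). The function names and the data are hypothetical.

```python
import random
from statistics import mean


def difference_of_means(group_a, group_b):
    """27.1.1: measure the observed difference as a difference of means."""
    return mean(group_a) - mean(group_b)


def scramble_distribution(group_a, group_b, trials=1000, seed=None):
    """27.2: build a sampling distribution by scrambling group membership."""
    rng = random.Random(seed)
    pooled = list(group_a) + list(group_b)
    n_a = len(group_a)
    stats = []
    for _ in range(trials):
        # Shuffling the pooled values breaks any association between
        # group membership and the variable, as the null hypothesis assumes.
        rng.shuffle(pooled)
        stats.append(difference_of_means(pooled[:n_a], pooled[n_a:]))
    return stats


def empirical_probability(stats, observed):
    """27.3: fraction of scrambled statistics at least as extreme
    (in either direction) as the observed test statistic."""
    extreme = sum(1 for s in stats if abs(s) >= abs(observed))
    return extreme / len(stats)


# Hypothetical measurements for two groups (invented for illustration).
group_a = [12, 15, 14, 16, 18, 17]
group_b = [10, 11, 13, 12, 9, 11]

observed = difference_of_means(group_a, group_b)
stats = scramble_distribution(group_a, group_b, trials=1000, seed=1)
p = empirical_probability(stats, observed)
print(f"observed difference of means: {observed:.2f}")
print(f"empirical probability: {p:.3f}")
```

For a categorical variable (27.1.2), you would swap in a difference of proportions as the measure; the scrambling and counting steps stay the same.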

28 Basics of Inference

• 28.1 Understand these terms as applied to any of the Fathom simulations we have been doing:
• 28.1.1 Sampling distribution
• 28.1.2 Test statistic
• 28.1.3 P-value
• 28.1.4 Null hypothesis
• 28.2 Given an analysis with a sampling distribution and a test statistic,
• 28.2.1 Calculate the P-value
• 28.2.2 Understand that the P-value is the probability that—if the null hypothesis were true—you would get a value as extreme as the test statistic
• 28.2.3 Correctly interpret a low P-value (e.g., it’s implausible that this value is due to chance)
• 28.2.4 Correctly interpret a high P-value (e.g., we can’t rule out chance as the reason for this value)
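To make 28.2 concrete, here is a small, hypothetical Python sketch. Given a sampling distribution (however it was produced) and a test statistic, it computes the P-value as the fraction of simulated values at least as extreme as the test statistic. It then turns that number into one of the two readings in 28.2.3 and 28.2.4. The 0.05 cutoff is an assumed convention; the learning goals themselves don't name a threshold.

```python
def p_value(sampling_distribution, test_stat):
    """28.2.1: fraction of simulated values at least as extreme as test_stat.

    'Extreme' is measured as distance from zero (two-sided), which assumes
    the null hypothesis centers the sampling distribution at 0.
    """
    extreme = [v for v in sampling_distribution if abs(v) >= abs(test_stat)]
    return len(extreme) / len(sampling_distribution)


def interpret(p, cutoff=0.05):
    """28.2.3 and 28.2.4: a plain-language reading of the P-value.

    The 0.05 cutoff is an assumed convention, not part of the learning goal.
    """
    if p < cutoff:
        return "low P-value: it's implausible that this value is due to chance"
    return "high P-value: we can't rule out chance as the reason for this value"


# A tiny, made-up sampling distribution and test statistic.
distribution = [-3, -2, -1, -1, 0, 0, 1, 1, 2, 3]
p = p_value(distribution, 3)
print(p, "->", interpret(p))
```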

So LG27 is just skills. Complicated skills, but skills nevertheless. LG28 is the meat of inference. My gut tells me to keep them both, and keep them separate. But I’d be interested in what others think (including a Tim with more experience…)

In the cold light of morning: Interestingly, LG27 (above) is not just mechanical skills; part of the point is making the connection between a real-life situation and an appropriate statistical technique, especially including developing the measure—the number that tells how big the effect is that you have noticed. And LG28 is not just the meat of inference: 28.1 specifically is talking about applying terms in the context of a Fathom simulation. Does that make it mechanical or too software-specific? I don’t think so. Looking back, I think I wrote it that way because I believe that if students can do this, they actually understand how inference works. Fathom itself uses none of these terms, so identifying them in the Fathom context means you have to understand them pretty thoroughly.

Having said that, I see that the first three are really different from the fourth, “null hypothesis.” The first three don’t actually exist outside an analysis, so you can’t really talk about a sampling distribution (say) without imagining an analysis, probably on a computer, or actually doing one. We can talk about a null hypothesis without any of that, though; it arises directly out of a situation and noticing something of interest.

Which may explain why it made sense to me to do exercises where I had students write the null hypothesis for a number of situations.

## What Went Right

Yikes. Another couple months. And a lot has happened: I experienced senioritis firsthand, our house has been rendered uninhabitable by a kitchen remodel (so we’re living out of suitcases in friends’ spare rooms), and my first year of actually teaching stats has drawn to a close.

It is time to reflect.

My tendency is to flagellate myself about how badly I suck, so (as suggested by Karen E, my brilliant and now former assistant head of school; we’ll miss you, Karen!) let me take a deep breath and report first on what seemed to work. Plenty of time for self-flagellation later.

### Resampling, Randomization, Simulation, and Fathom

The big overarching idea I started with—to approach inferential statistics through resampling à la George Cobb—worked for me and for at least some students. It is not obvious that you can make an entire course for these students with randomization as the background. I mean, doing this is mapping an entirely new path through the material. Is it a good path? I’m not certain, but I still basically believe it is.

To be sure, few of my students got to the point where they automatically chose the right technique at every turn. But they did the right thing a lot, and, most important for me, I never had to leave them in some kind of mathematical dust, making a calculation they didn’t understand. For example (and I may be wrongly proud of this), we got through an entire year of statistics without introducing the Normal distribution. This may seem so heretical to other teachers that it deserves a post of its own. Later. The point here is that no student was ever in a position of calculating NormalCDF-of-something and not understanding what it really meant.

Did they perform randomization tasks and not really understand? Sure. But when they did, they did so “closer to their data,” so they had a better chance to fix that non-understanding. They didn’t rely (for example) on the Central Limit Theorem—which, let’s face it, is a black box—to give them their results.

### Fathom and Technology

Fathom was a huge success throughout. It was great to be able to get them all the software and assign homework in Fathom. They enjoyed it, and really became quite adept at using the tool.

One big question was whether they would be able to use the “measures” mechanisms for creating their own simulations. Basically, they can. It’s a big set of skills, so not all of them can do everything we covered, but in general, they understand how to use the software to implement randomization and simulation techniques. This goes hand in glove with actually understanding what these procedures accomplish.

We also became more and more paper-free as the year went on, setting and turning in more and more assignments as pdfs. The “assignment drop box” wasn’t perfect, but it worked well enough.

### Starting SBG

I decided to try standards-based grading, at least some local version of it, in this first year. On reflection, that was pretty gutsy, but why wait? And it worked pretty well. Most importantly, students overwhelmingly approved; the overall comment was basically, “I like knowing what’s expected.” Furthermore—and this may be a function of who the kids were more than anything else, but I’ll take it—there was hardly any point-grubbing.

It is also satisfying to look over my list of 30-ish standards and see that

• They largely (but not completely) span what I care about.
• They set standards for different types of mastery, ranging from understanding concepts to using the technology to putting together coherent projects.

They need editing, and I need to reflect more about how they interact, but they are a really good start.

### Flipping and Video

At the semester break, I decided to take a stab at “Flipping the Classroom.” This was a big win, at least where I used it most—in giving students exposition about probability.

There is a lot that can go wrong with videos as instruction (the Khan brouhaha is a good example; see this Frank Noschese post for a good summary of one view) and I want to explore this more. But the basic idea really works, and the students recognized it: if it’s something you would lecture about, putting it on the video has two big plusses:

• They can stop and rewind if they don’t get it
• You can do it over until you get it the way you want. No more going back and saying, “when I said x it wasn’t quite right…”

My big worry is that if I assign videos as homework, hoping to clarify and move on in class, the lazy student may watch, but will blow off thinking, assuming that they can get me to cover it again. I need to figure out a non-punitive way around that problem; or maybe it’s not so bad simply to be able to use class time for the first repetition…

### Some Cool Ideas

Besides these essentially structural things, I had some frankly terrific ideas during the year. Some I have mentioned before, but let me list just four, as snippets to remind me what they were; later, if I get to it, I’ll elaborate:

• Using sand timers and stopwatches to explore variability.
• Going to the nearby freeway overpass to sample cars.
• Using the school’s library catalog to do random sampling.
• Going to the shop to make dice that were not cubes.

There were other curricular successes such as using old material from Data in Depth—particularly the Sonatas—for work during the first semester.

### Wonderful Kids

I can’t say enough about how much I appreciate the students. Again, I could do better at helping create a really positive class culture, but they did pretty damned well on their own. They got along well, took care of each other, exchanged good-natured barbs, were good group members and contributors.

Even the most checked-out seniors, already accepted into college and having reached escape velocity: they may not have worked very hard outside of class, and assignments may have slipped, but in class they were engaged and still learning. And some juniors did strong, strong work that will make writing college recs easy next year.

And I got a couple of those letters—teachers, you know the ones I mean—that make it worth the effort.

So all in all, a good year. Much to improve, yes. But it’s worth savoring what went right.

As you may recall, the “mission statement” for the class is that each student will:

• Learn to make effective and valid arguments using data
• Become a critical consumer of data

Along about the beginning of November, we had been doing some problem sets from The Model Shop, and other activities from Data in Depth, and I had been getting laconic answers like “four” and “0.87%” and really wanted a little more meat. That is, getting the right answer is not the same as making an effective argument, or even telling a decent story. In a flash of d’oh! inspiration I realized that if I wanted it, I should assess it.

But there is a problem with that: I have been constructing my standards (I’m calling them learning goals) as I go along, and had not figured out how to deal with the larger, mushier issues that are part of making valid and effective arguments using data.

This post is all about an at-least-partially-successful resolution.

### Constructing Learning Goals for Larger Pieces of Work

I love the kids in this class partly because they let me get away with, “this is a quiz and I want you to do your best but I really don’t know how to express the learning goals (a.k.a. standards) that go with it. So we’ll take the quiz first, OK? And figure out how to grade it later.” I explained to them what I was after in hand-wavey terms and off they went.

So they took the quiz (described later). Using their responses (and the pathologies therein), I was able to construct learning goals for this kind of writing, in particular, for the final semester project I alluded to in the last couple of posts. And here they are, quoted, as if they were official or something (we start with Learning Goal 17):

## In it up to here

I know (self), I know, I’m behind in recording what’s going on! Let me just say here that I just handed back the first quiz, the first artifact that fits with SBG, and you know what? It felt pretty good. Everybody nodded in the right places when I explained that “what’s it out of?” was maybe not the right question, even though a good answer is “4.”

The next quiz is Tuesday—I wanted to follow up immediately with a way for them to see how demonstrating mastery works—and I have just shipped off some practice problems. We’re redoing the material from the first quiz—the first three learning goals—and adding one learning goal (rates), this one with items written by them in class today as part of the prep.

Meanwhile, it’s Back to School Night, which is traditionally one of my favorite events; I only hope I’ll be awake, as the last few have been Bad Sleep Nights, dominated by my having Dvorak Symphony #7 stuck in my head (only the first 8 or so bars, over and over, so not an interesting intracranial concert) and my revisiting a memo I wrote to the head of school about our evolving mission statement. The mission-statement process got me all het up the way you can get upset by comments you disagree with on political blog posts, when you feel compelled to write some pithy, intelligent counter-comment. Unlike in the poliblog situation, however, our comments may actually have an effect, so I took the time to write carefully about it, which makes me a good citizen and community member—but the damned thing wouldn’t let me get to sleep.

Some things I need to write about soon:

• The first “Claim” assignment and its revisions. I’m really pleased with it, although I’m not sure what people learn from it.
• The need to say less and less and less. I know this, why do I have such trouble doing it?

Enough for now! Time to go find the food and set up my room!

## Core Standards: Mathematical Practices

The core standards, increasingly adopted around the country (though sometimes with modification), are not bad, although not nearly as gutsy as the Project 2061 Benchmarks and Standards for science. Besides the lists of skills and examples in the content standards, they include a separate list of “mathematical practices”:

1. Make sense of problems and persevere in solving them.

2. Reason abstractly and quantitatively.

3. Construct viable arguments and critique the reasoning of others.

4. Model with mathematics.

5. Use appropriate tools strategically.

6. Attend to precision.

7. Look for and make use of structure.

8. Look for and express regularity in repeated reasoning.

I like these. It’s a good list. And the core-standards document gives them prominence by listing them first—before the content—on pages 6–8, with a paragraph for each one. Of course, the document is almost 100 pages long, and most of it contains lists of expectations for each grade level and, at high school, for each major topic. So it would be lamentably easy, given the sheer weight of pages, to ignore these and teach to the longer lists.

## Allies in the Search for Standards

I’m optimistic. Since this post last week, I’ve come across more resources thanks to the blogosphere and the amazing AP Stat listserv. Even if you teach “regular” stats like me, you should subscribe.

One of the blogs helping me is Undefined, which I can now find easily thanks to my figuring out how to make a blogroll. In it, you can find an actual draft of a list of stats standards for the first unit. I like the list, but it also helps clarify questions for me. Here are some items on her list:

• Create a histogram
• Create a box plot
• Create a scatter plot
• Sketch a line of best fit
• Use a calculator to find a line of best fit
• Understand correlation coefficients
• Distinguish between a normal and a skewed distribution
• Work well in a cooperative group

## SBG: The Search for Standards Continues

Yesterday I came across a great resource from missCalcul8: an SBG wiki for noobs. (Thanks to yet another blog, The Space Between the Numbers, by Breedeen Murray, for the pointer.) It includes how-tos from some of the luminaries in this field, plus, joy of joys, actual lists of standards so that we can imagine what they’re really talking about. (She has also just posted a number of frightening skills lists on her own blog.)

For me, well, none of them are in statistics yet, but maybe that’s a place where I can contribute when I make that list.

So I tried to get started. One place to look for statistics standards is in the GAISE materials. That’s Guidelines for Assessment and Instruction in Statistics Education, put out by the American Statistical Association (ASA) and designed to elaborate on the NCTM Standards. These guidelines come in two downloadable pdf books, one for pre-college (that’s us!) and another for postsecondary. In our book, they define three levels, named A, B, and C. These do not correspond to elementary, middle, and secondary; many high-school students (not to mention adults, not to mention me) have not fully mastered the ideas in levels A and B.

## Tyranny of the Center

Tyranny of the Center: a favorite phrase of mine that I keep threatening to write about. Here is a first and brief stab, inspired by my having recently used it in a comment on ThinkThankThunk.

In elementary statistics, you learn about measures of center, especially mean, median, and mode. These are important values; they stand in for the whole set of data and make it easier to deal with, especially when we make comparisons. Are we heavier now than we were 30 years ago? You bet: the average (i.e., mean) weight has gone up. Would you rather live in Shady Glen than Vulture Gulch? Sure, but the median home price is a lot higher.

We often forget, however, that the mean or median, although useful in many ways, does not necessarily reflect individual cases. You could very well find a cheap home in Shady Glen or a skinny person in 2010. Nevertheless, it is true that on average we’re fatter now—so when we picture the situation, we tend to think that everyone is fatter.

One of my goals is to immunize my students against this tendency to assume that all the individuals in a data set are just like some center value; I think it is a good habit of mind to try to look at the whole distribution whenever possible. Let’s look at a couple situations so you can see why I care so much.
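As a toy illustration, here is a short Python sketch with invented prices for the two fictional towns (not a real data set). Shady Glen’s median is higher, yet it still has a home cheaper than Vulture Gulch’s median, and Vulture Gulch has one pricier than Shady Glen’s median. The variable names and numbers are hypothetical.

```python
from statistics import median

# Invented home prices, in thousands of dollars, for the two fictional towns.
shady_glen = [180, 420, 450, 480, 510, 550, 900]
vulture_gulch = [150, 190, 210, 240, 260, 300, 620]

sg_median = median(shady_glen)
vg_median = median(vulture_gulch)

# Individual homes that cut against the story the medians tell.
cheap_in_shady_glen = [p for p in shady_glen if p < vg_median]
pricey_in_vulture_gulch = [p for p in vulture_gulch if p > sg_median]

print("Shady Glen median:", sg_median)
print("Vulture Gulch median:", vg_median)
print("Shady Glen homes below Vulture Gulch's median:", cheap_in_shady_glen)
print("Vulture Gulch homes above Shady Glen's median:", pricey_in_vulture_gulch)
```

Comparing the two medians alone would hide both of those individual homes; you only see them by looking at the whole distribution.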

## Reality interferes with my planning

I’ve been away for three weeks, sans computer—more on that anon—spending much vacation-and-conference time mulling over what I want to do in this class and fantasizing about The First Day.

Yesterday we returned. I logged in. The list was available; I could see what kids will be in my perfect little class. There are 18 cherubs, juniors and seniors, a few of whom I had in class a couple years ago, most of whom I don’t really know at all. And I know that they will be great, that they are smart, that I can communicate my love of the subject and infect them.

But now, instead of an idealized, pristine version of a progressive student-centered, SBG stats class, my vision had actual students in it. And somehow my ideals started sliding out from under me. I could imagine giving fun, engaging lectures instead of designing explorations; awarding points for showing up and doing homework instead of for mastery of standards; dealing with deadlines and extensions; and generally succumbing to the quick and easy path, sliding off the razor’s edge in the direction of being a stand-and-deliver math teacher.

It’s not the kids. They’re great. But they’re real, and reality somehow sucks me towards what’s comfortable.

Fortunately, I have help. One good source of backbone is in the repeated rants at ThinkThankThunk. I once thought that hammering on the same anvil over and over was bad form, an indication of being enslaved to one good idea. But I was wrong. I appreciate Shawn’s willingness to remind us newbies why we decided to think about doing things differently. And I confess that one of the main reasons I put that link in this post is so that I can find it again when I find myself going over to the dark side.

Another place to find clarity, or at least reality, is at f(t), where, in the linked post, Kate Nowak reminds us how messy it all is. There is no razor’s edge, no clean, perfect educational slam-dunk; we deal with human beings every single time, and that is both a burden and a privilege.

Still. I like being in Fantasyland, where standards-based grading works beautifully from day one, where the students who have been badly treated by math in their past realize that they really can look at the world quantitatively; where they connect the math they thought was meaningless to the real world; and where these students design their own projects and critique one another’s work fairly but kindly, building classwide self-esteem while insisting on an appropriate, deepening level of rigor. Ha.

I guess I know that despite this being, after all, a best-case scenario, it won’t be perfect. I won’t be. The kids won’t be. But we’ll get parts of all that, and a lot more that I can’t predict, because of the particular alchemy of these 18 kids.

I am so scared.

## SBG: One Dollop of Fear

I should also record what I’m afraid of. Here’s one that keeps coming back:

Suppose I really do try standards-based grading (SBG). I’ve been reading and lurking. It sounds really attractive. But in order to have SBG, you need S. You need a list of standards (or objectives or outcomes, whatever).

There are lists of these things for statistics, but they don’t list everything that I value. So I’m trying to figure out how to write that list. The problem is like—you know when you’ve been dreaming, and you wake up, and for just an instant it’s all there? But the moment you start telling about it, two things happen: the thing you’re not talking about dissolves, and then what you remember is the words, not the underlying ideas?

That’s what I’m afraid will happen about these other, less effable things that I care about.

At somebody’s suggestion (probably Meg’s) I have been keeping a document that’s a brain-dump list of whatever ideas I come up with, and it is purposely unordered. It’s allowed to be redundant. I don’t have to organize it. And so far, it has gone pretty well. I think the guts of a good list are in it.

In addition, I know that I’ll have ideas in the class and in collaboration with the class, and they’ll let me add or modify these standards on the fly.

What kinds of things am I thinking of? Besides, you know, content, I want to include stuff we might call “habits of mind” and the like, such as:

Data Goggles. When appropriate, the student spontaneously looks for the data in a situation and does something useful with it (e.g., make a display).

House of Mirrors. The student consciously and explicitly uses multiple perspectives (graphical, tabular, model, formula, etc.) to get insight into a situation through its data.

Could these be standards? By writing these down, do I lose the thousand other wisps of aspirations for my students? Will kids point-grub to get high marks on these? Experienced SBG-ers, any advice welcome.