# Are We Trying to Turn Stats into CS?

A post in the apstat community led to a post in my Fathom-masters blog, and that led here to a policy/philosophy nugget: as we modernize stats education (for example, by simulating phenomena rather than relying on the Normal distribution), to what extent does it turn into computer science?

It’s a problem I faced constantly over the last couple of years. We want students to understand the underlying concepts of statistics, and develop relevant skills. The concepts are more important, but you can’t just do concepts; you need to apply them in situations, sometimes subtle ones, and to do that takes some actual chops.

• Traditionally, those have included things like knowing whether the Normal approximation is appropriate, calculating relevant cumulative densities, deciding whether the situation meets various conditions for applying this or that procedure, and actually carrying that procedure out.
• In my class, which is deeply based on simulation and randomization, it has involved a different set of skills, including simulating the situation, designing a relevant measure of an effect, collecting a sampling distribution, and comparing a test statistic to that distribution.
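
To make the second list concrete, here is a minimal sketch of those four steps in Python rather than Fathom. Everything in it is an assumption for illustration: the data are made up, the function names (`diff_of_means`, `randomization_test`) are hypothetical, and the difference of means is just one possible measure. It is not how Fathom does it internally, only the same idea written out.

```python
import random
from statistics import mean

def diff_of_means(values, labels):
    """The 'measure': mean of group A minus mean of group B."""
    a = [v for v, g in zip(values, labels) if g == "A"]
    b = [v for v, g in zip(values, labels) if g == "B"]
    return mean(a) - mean(b)

def randomization_test(values, labels, n_shuffles=5000, seed=0):
    """Scramble the group labels to simulate 'no real effect',
    collect the measure each time, and compare the observed
    value to the resulting sampling distribution."""
    rng = random.Random(seed)          # seeded so results are repeatable
    observed = diff_of_means(values, labels)

    shuffled = list(labels)
    sampling_dist = []
    for _ in range(n_shuffles):
        rng.shuffle(shuffled)          # simulate the situation
        sampling_dist.append(diff_of_means(values, shuffled))

    # Two-sided comparison: how often is a scrambled difference
    # at least as far from zero as the one we actually saw?
    extreme = sum(abs(d) >= abs(observed) for d in sampling_dist)
    return observed, extreme / n_shuffles

# Made-up data: five measurements in each of two groups.
values = [12, 15, 14, 18, 17, 9, 11, 10, 13, 8]
labels = ["A"] * 5 + ["B"] * 5

observed, p_value = randomization_test(values, labels)
print(f"observed difference: {observed:.2f}, estimated P-value: {p_value:.4f}")
```

Even in this tiny version, the student has to decide what to scramble, what to measure, and what "as extreme" means. None of those choices is made for them by picking a named test.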

Despite some feelings of failure and inadequacy about my own performance, I have recovered from last year enough to believe once again that the latter approach is better. But there’s a barrier to implementation I’d like to discuss, and it comes from that phrase, “simulating the situation.”

This is because, in order to simulate a variety of situations, students need to do what amounts to computer programming, or at least begin to think in a CS-y way. In Fathom, you write measures and collect them. Sometimes it’s not obvious, as in the problem alluded to above, what your measure should be, or what you should collect it from. You need to be aware of the various functions you have available, and sometimes use them in novel ways. You need to be an ingenious architect, and design solutions, not simply choose an appropriate strategy or test.

As a stats teacher, even one who loves the technology, I worry about the prospect of students having to do that. Stats is hard enough, I say to myself, without forcing kids to be programmers. Fathom, after all, works hard to make the programming as easy as possible; it usually just amounts to learning how to tell the computer to find something like a difference of means. And I have taken the position, over the years, that really learning how to do the fancy stuff is for a few geeks like me; I can then write Fifty Fathoms, encapsulating all the sneaky bits so that everyone can experience, for example, how sample size affects the spread of the sampling distribution, without having to build it all themselves.

It’s OK, the reasoning goes, to drive a car without knowing how to build it.

Now I’m not so sure.

Rather than try to make that entire argument here, I’ll ask you to watch this space for more bits of it. But here are a few things that are making me rethink:

• At a conference I attended last month (July 2012) an international group of stats educators essentially agreed without discussion that the future of stats education was in randomization and simulation. We had all been around the track a few times on the issue, and it “felt” decided.
• Developments in technology and Big (or even medium-sized) data are making a lot of traditional stats procedures less relevant. For example, in many situations where we used to infer from a sample, we now have access to the entire population. So we need new tools and techniques for making decisions based on population data. Right out of the gate, that promotes effect size and its brethren, and makes us want to create new and better visualizations. But what else? We don’t know because we’ve been talking for years about pulling 10 widgets an hour off the assembly line—while now they just measure all the widgets using an optical scanner.
• Talk about Data Science is heating up, and in the formulations I’ve seen, this includes a lot of stuff that has the heady aroma of computer science. It includes understanding data structures, planning for archival, and, of course, programming, in order to get the data into the shape you need. One can make the argument that the truly sexy skill out there is not statistics as currently taught, but rather data science.

Of course, there’s a balance to be struck. We won’t be having high-school kids all learn R anytime soon. But I do think we should look critically at what we teach in light of the huge availability of data and figure out what skills students would really benefit from, and when.

## Author: Tim Erickson

Math-science ed freelancer and sometime math and science teacher. Currently working on various projects.