How can you be awash in data? Let me count the ways.

Three.

I oversimplify, of course, but this is what I’m thinking about, and it came out of attending an advisory meeting for a cool project called Data Clubs. As usual for this blog, we are using CODAP.

Elsewhere (especially in Awash in Data) I have talked about being awash because there are so many cases. When the table has a lot of rows and you make a graph, there may be so many dots that you can’t see the pattern. One remedy is to use data moves such as filtering or grouping-and-summarizing (making aggregates), both of which reduce how many dots appear in your plots.
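For readers who think in code rather than in CODAP, here is a minimal sketch of those two data moves in plain Python. The rows and the attribute names (sport, age, days_missed) are invented for illustration; they are not from any real dataset.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical cases: attribute names and values are made up for illustration.
rows = [
    {"sport": "soccer", "age": 12, "days_missed": 3},
    {"sport": "soccer", "age": 15, "days_missed": 5},
    {"sport": "cycling", "age": 12, "days_missed": 1},
    {"sport": "cycling", "age": 40, "days_missed": 9},
    {"sport": "skating", "age": 14, "days_missed": 2},
]

def filter_rows(rows, keep):
    """Filtering: keep only the cases for which keep(row) is true."""
    return [row for row in rows if keep(row)]

def group_and_summarize(rows, key, value, summary=mean):
    """Grouping-and-summarizing: one aggregate value per group."""
    groups = defaultdict(list)
    for row in rows:
        groups[row[key]].append(row[value])
    return {group: summary(values) for group, values in groups.items()}

# Fewer dots by filtering: only the cases under 18.
under_18 = filter_rows(rows, lambda row: row["age"] < 18)

# Fewer dots by aggregating: one mean per sport instead of one dot per case.
by_sport = group_and_summarize(rows, "sport", "days_missed")
```

Either way, the plot ends up with fewer points: filtering drops cases, while grouping replaces many cases with one summary per group.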

Another kind of awash happens when you have a lot of attributes. You have trouble figuring out what you even want to plot because there are so many choices. Sometimes you want to essentially filter columns in the table or, alternatively, find a way to combine multiple columns into one. We developed a remedy for the former in work on the Writing Data Stories project: a CODAP plug-in called Choosy that lets you hide columns easily. For the latter, write a formula to combine many columns into one, and then use Choosy to hide the many. (I call this a calculation data move; it is mutate for those of you in the tidyverse.)

But today, I saw something I had seen before, but never in quite this way—a third way of being awash.

We were looking at this dataset from the National Health Interview Survey. It’s about injuries sustained at leisure or during sports. And here is a typical kind of CODAP graph some students might make:

Type of injury (10 categories) by Location at time of injury (16 categories). Sheesh.

I kind of hate this kind of graph. It’s really hard to see anything in it, and it’s hard to use it to help make an argument. There are just too many cells and too many labels to read. Why? Because it’s two categoricals, and each one has a bunch of categories. What’s worse, some novice students seem to like these graphs a lot, maybe because they look very official and complicated. Michelle set me straight about part of this, though: although making sense of the numbers of dots in the cells is a fool’s errand, it’s interesting to see patterns in which cells have anything and which have nothing. So I don’t hate them anymore. They’re not useless.

Still, looking at it, I’m certainly awash, but in this new way: not from too many cases (rows) or attributes (columns), but from too many categories.

We don’t have great tools for this. One possibility is to do a (different) calculation data move, recoding the data into fewer categories (here is an example from Awash, so I guess I have thought about this, just not this way). But recoding is not always a good idea, and it may be a bridge too far for the middle-school students who are the target population for the Data Clubs project.
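Here is a minimal sketch of what recoding looks like outside CODAP, in plain Python. The location names and the groupings are invented for illustration; they are not the survey's actual categories, and choosing a sensible grouping is exactly the hard part.

```python
from collections import Counter

# Hypothetical recoding table: these names and groupings are made up,
# not the real categories from the survey.
RECODE = {
    "home (inside)": "home",
    "home (outside)": "home",
    "school": "school",
    "sports facility": "recreation",
    "park": "recreation",
    "street": "other",
}

def recode(value, table, default="other"):
    """Map a detailed category to a coarser one; unknown values go to default."""
    return table.get(value, default)

locations = [
    "home (inside)", "park", "sports facility",
    "home (outside)", "river", "school",
]

recoded = [recode(location, RECODE) for location in locations]
counts = Counter(recoded)  # counts per coarse category
```

Six detailed values collapse into four coarse ones here. Note the default: anything not in the table quietly lands in "other," which is one of the ways recoding can mislead.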

So I write this to remind myself of this insight, and maybe to inspire someone to figure out other ways to cope with a surfeit of categories.

Flowers! Phi! Codap!

Okay, something very short, with thanks to Avery Pickford: How do sunflowers organize their seeds? Why is phi the most irrational number? How are these two questions connected, and how can we model that in CODAP?
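If you want to poke at the idea outside CODAP, here is a minimal Python sketch. It assumes a common simplified model that may differ from the one in the CODAP document: seed number n sits at an angle of n times a fixed fraction of a turn, at a radius proportional to the square root of n. The function names are my own.

```python
import math

PHI = (1 + math.sqrt(5)) / 2  # the golden ratio

def seed_positions(n_seeds, turn_fraction=PHI):
    """Place seed n at angle n * turn_fraction turns and radius sqrt(n)."""
    positions = []
    for n in range(n_seeds):
        angle = 2 * math.pi * turn_fraction * n
        radius = math.sqrt(n)
        positions.append((radius * math.cos(angle), radius * math.sin(angle)))
    return positions

def min_distance(positions):
    """Smallest distance between any two seeds (brute force over all pairs)."""
    best = math.inf
    for i, (x1, y1) in enumerate(positions):
        for x2, y2 in positions[i + 1:]:
            best = min(best, math.hypot(x1 - x2, y1 - y2))
    return best

# Compare phi with a very rational choice: a quarter turn per seed.
phi_gap = min_distance(seed_positions(200, PHI))
quarter_gap = min_distance(seed_positions(200, 0.25))
```

With a quarter turn per seed, the seeds line up along four spokes and crowd each other as the radius grows, so quarter_gap comes out much smaller than phi_gap. Try other values of turn_fraction and see which ones leave spokes.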

Here is the YouTube video from Numberphile that inspired it. Worth a watch.

And here is a CODAP document:
https://codap.concord.org/releases/latest/static/dg/en/cert/index.html#shared=149141

Weather Models Reflection

Last time I described an idea about how to use matrices to study simple weather models. Really simple weather models; in fact, the model we used was a two-state Markov system. And like all good simple models, it was interesting enough and at the same time inaccurate enough to give us some meat to chew on.

I used it as one session in a teacher institute I just helped present (October 2019), where “matrices” was the topic we were given for the five-day, 40-contact-hour event. Neither my (excellent!) co-presenter Paola Castillo nor I would normally have subjected teachers to that amount of time, and we would never have spent that much time on that topic. But we were at the mercy of people at a higher pay grade, and the teachers, whom we adore, were great and gamely stuck with us.

One purpose I had in doing this session was to show a cool use for matrices that had nothing to do with solving systems of linear equations (which is the main use they have in their textbook).

Some takeaways:

  • Just running the model and recording data was fun and very important. Teachers were unfamiliar with the underlying idea, and although a few immediately “got it,” others needed time just to experience it.
  • Making the connection between the randomness in the Markov model and thinking about natural frequencies did not appear to cause any problem. I suspect that this was not an indication of understanding, but rather a symptom of their not having had enough time with it to realize that they had a right to be confused.
  • The diagram of the model was confusing.

Let’s take the last bullet first. The model looked like this:

Our two-state Markov weather model. Use one die to update today’s weather to tomorrow’s.
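For anyone who wants to run a model like this without dice, here is a minimal Python sketch. The die rules below are hypothetical: I made them up for this example, and they are not necessarily the ones in the diagram. The same rules appear twice, once as a simulation and once as a transition matrix.

```python
import random
from fractions import Fraction

# HYPOTHETICAL die rules, invented for this sketch:
#   sunny today -> rainy tomorrow on a roll of 1 or 2, otherwise sunny
#   rainy today -> sunny tomorrow on a roll of 1, 2, or 3, otherwise rainy
RAIN_ROLLS_AFTER_SUN = {1, 2}
SUN_ROLLS_AFTER_RAIN = {1, 2, 3}

def next_day(today, roll):
    """Update today's weather to tomorrow's using one die roll."""
    if today == "sunny":
        return "rainy" if roll in RAIN_ROLLS_AFTER_SUN else "sunny"
    return "sunny" if roll in SUN_ROLLS_AFTER_RAIN else "rainy"

def simulate(days, start="sunny", rng=None):
    """Return a list of `days` weather states, starting from `start`."""
    rng = rng or random.Random()
    weather = [start]
    for _ in range(days - 1):
        weather.append(next_day(weather[-1], rng.randint(1, 6)))
    return weather

# The same rules as a transition matrix.
# Rows are today, columns are tomorrow, in the order (sunny, rainy).
P = [
    [Fraction(4, 6), Fraction(2, 6)],
    [Fraction(3, 6), Fraction(3, 6)],
]

def step(dist, matrix):
    """Multiply a row vector of probabilities by the transition matrix."""
    size = len(dist)
    return [sum(dist[i] * matrix[i][j] for i in range(size)) for j in range(size)]

# Start certain that it is sunny, then look 20 days ahead.
dist = [Fraction(1), Fraction(0)]
for _ in range(20):
    dist = step(dist, P)
```

Under these made-up rules, the long-run probabilities settle near 3/5 sunny and 2/5 rainy no matter where you start. Comparing that with the fraction of sunny days in a long simulate() run is a nice way to connect the matrix to the dice.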

Weather Models and Matrices

Ack! I don’t have time to do justice to this right now, but readers need to know, if you don’t already, that the geniuses at Desmos seem to be making a matrix calculator: https://www.desmos.com/matrix.

Having read that, you might rightly say, I can’t get to everything in my curriculum as it is, why are you bringing up matrices? (You might also say, Tim, I thought you were a data guy, what does this have to do with data?)

Let me address that first question (and forget the second): I’m about to go do a week of inservice in a district that, for reasons known only to itself, has put matrices in its learning goals for high-school math. Its goal seems to be to learn procedures for using matrices to solve systems of linear equations.

I look at that and think, surely there are more interesting things to do with matrices. And there are!


Sometimes, articles get done

Back in 2017, I gave a talk in which I spoke of “data moves.” These are things we do to data in order to analyze data. They’re all pretty obvious, though some are more cognitively demanding than others. They range from things like filtering (i.e., looking at a subset of the data) to joining (making a relationship between two datasets). The bee in my bonnet was that it seemed to me that in many cases, instructors might think that these should not be taught because they are not part of the curriculum—either because they are too simple and obvious or too complex and beyond-the-scope. I claimed (and still claim) that they’re important and that we should pay attention to them, acknowledge them when they come up, and occasionally even name them to students and reflect explicitly on how useful they are.

Of course there’s a great deal more to say. And because of that I wrote, with my co-PIs, an actual, academic, peer-reviewed article (a “position paper”; this is not research) describing data moves. Any of you familiar with the vagaries of academic publishing know what a winding road that can be. But at last, here it is:

Erickson, T., Wilkerson, M., Finzer, W., & Reichsman, F. (2019). Data Moves. Technology Innovations in Statistics Education, 12(1). Retrieved from https://escholarship.org/uc/item/0mg8m7g6.

Then, in the same week, a guest blog post by Bill Finzer and me got published. Or dropped, or whatever. It’s about using CODAP to introduce some data science concepts. It even includes figures that are dynamic and interactive. Check out the post, but stay for the whole blog; it’s pretty interesting:

https://teachdatascience.com/codap/

Whew.

When research questions don’t make sense: use claims!

I need to write up this Very Small Thought in order to get it off my to-do list. The basic thesis is: when we ask students to do rich, open-ended projects, we often insist that they write “research questions.” Sometimes this is a terrible idea.

Don’t get me wrong: asking students to come up with research questions can be important. Many frameworks for how science works have “formulate a research question” as an early step. Furthermore, when you grow up, some grant proposal RFPs insist that you specify your research questions.


Ping Pong Ball Bounce Redux

Long ago (2007) Bryan Cooley and I wrote a set of physics labs; in one of them we had students bounce a ping-pong ball. You know the sound; it’s like this:

Ping-pong ball bouncing on my kitchen counter

For the lab, we had students record the sound at 1000 points per second using a Vernier microphone. Using the resulting data, students could identify the times of the “pocks” and then see how the times between the pocks — the “interpock intervals” — decreased exponentially. This is a cool take on the old Algebra 2/Precalculus activity about bouncing balls where you measure drop heights; using sound and the technology, you can get more bounces and more accuracy.
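To make "decreased exponentially" concrete, here is a minimal Python sketch. It assumes an idealized ball whose every interpock interval is a fixed fraction of the one before; the 0.5-second first interval and the ratio of 0.9 are invented numbers, not measurements from the lab.

```python
def interpock_intervals(pock_times):
    """Times between successive pocks."""
    return [later - earlier for earlier, later in zip(pock_times, pock_times[1:])]

def successive_ratios(intervals):
    """Ratio of each interval to the one before it."""
    return [b / a for a, b in zip(intervals, intervals[1:])]

def ideal_pock_times(first_interval, ratio, n_pocks):
    """Pock times for an idealized ball: each interval is `ratio` times the last."""
    times = [0.0]
    interval = first_interval
    for _ in range(n_pocks - 1):
        times.append(times[-1] + interval)
        interval *= ratio
    return times

# HYPOTHETICAL values, chosen only for illustration.
times = ideal_pock_times(first_interval=0.5, ratio=0.9, n_pocks=8)
intervals = interpock_intervals(times)
ratios = successive_ratios(intervals)
```

For the idealized ball, every entry in ratios is (up to rounding) the same number, which is what exponential decrease means. With real pock times from a recording, how constant those ratios stay is the interesting question.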

A typical graph of the sound looks like this:

Graph of the sound from the audio above. In CODAP. Time in milliseconds.

And a graph of the interpock intervals looks like this:


Data Moves with CO2

The concentration of CO2 in the atmosphere is rising, and we have good data on that from, among other sources, atmospheric measurements that have been taken near the summit of Mauna Loa, in Hawaii, for decades.

Here is a link to monthly data through September 2018, as a CODAP document. There’s a clear upward trend.

CO2 concentration (mole fraction, parts per million) as a function of time, here represented as a “decimal year.”

Each of the 726 dots in the graph represents the average value for one month of data.

What do we have to do—what data moves can we make—to make better sense of the data? One thing that any beginning stats person might do is to fit a line to the data. I won’t do that here, but you can imagine what happens: the data curve upward, so the line is a poor model, but the positive slope of the line (about 1.5, which is in ppm per year) is a useful average rate of increase over the interval we’re looking at. You could consider fitting a curve, or a sequence of line segments, but we won’t do that either.

Instead, let’s point out that the swath of points is wide. There are lots of overlapping points. We should zoom in and see if there is a pattern.


A ruler: what … were they thinking??

A few years ago, when I was visiting my (very) old friend Charlie up in Washington, he gave me what has become a treasured possession.

It looks like a normal, old, wooden ruler. A foot long with inches on one side and centimeters (going the other direction, of course) on the other.

But feast your eyes on it. If you don’t see the problem immediately, that’s normal. Just relax. Take your time. And wonder: how did this happen?

Bogus Ruler (from CLB)

Please don’t give it away in the comments. Be sly 🙂

Don’t Expect the Expected Value

One day, over 50 years ago, we were visiting Lake Tahoe as a family, and dad went across the border to play keno. He came back elated: he had hit seven out of eight on one of his tickets, and won eleven hundred dollars. He proudly laid out fifty twenties and two fifties on the kitchen table. It was a magnificent sight.

The details of keno are unimportant here, except to note that keno is not a game of skill. Of course the house has an edge. In the long run, you will lose money playing keno no matter how you do it. Even my dad, who over the years has played a lot of keno, and won even bigger payouts, would probably admit that he might have a net lifetime loss.

So why do people play? There are lots of reasons, I’m sure, but one of them must be connected to that heartwarming anecdote: fifty years later, I remember the event clearly, as one of joy and wonder.

Let’s explore that using roulette, which is much simpler than keno. A roulette wheel has 18 red and 18 black numbered slots, plus a smaller number of green slots (often two). You can make many different bets, but we will stick with red and black. If you place a $1 bet on red, and it comes up red, you get $2 back (winning $1); if it comes up black or green, you lose your dollar.
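As a quick check on the house edge, here is a minimal Python sketch of the expected value of that bet. It assumes every slot is equally likely and uses two green slots by default, as on the wheel described above; the function name is my own.

```python
from fractions import Fraction

def expected_value_red(bet=1, red=18, black=18, green=2):
    """Expected net winnings from betting `bet` dollars on red.

    Assumes every slot is equally likely. You win `bet` on red and
    lose `bet` on black or green.
    """
    slots = red + black + green
    p_win = Fraction(red, slots)
    p_lose = 1 - p_win
    return p_win * bet - p_lose * bet

two_green = expected_value_red()         # wheel with two green slots
one_green = expected_value_red(green=1)  # wheel with a single green slot
```

With two green slots, the expected value of a $1 bet is -1/19 of a dollar, a bit more than a nickel lost per bet on average. With one green slot it is -1/37. Of course, no single bet ever loses a nickel: you either win a dollar or lose one.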
