## Flowers! Phi! Codap!

Okay, something very short, with thanks to Avery Pickford: How do sunflowers organize their seeds? Why is phi the most irrational number? How are these two questions connected, and how can we model that in CODAP?

Here is the YouTube video from Numberphile that inspired it. Worth a watch.

And here is a CODAP document:
https://codap.concord.org/releases/latest/static/dg/en/cert/index.html#shared=149141

## Weather Models and Matrices

Ack! I don’t have time to do justice to this right now, but any readers need to know if you don’t already that the geniuses at Desmos seem to be making a matrix calculator: https://www.desmos.com/matrix.

Having read that, you might rightly say, I can’t get to everything in my curriculum as it is, why are you bringing up matrices? (You might also say, Tim, I thought you were a data guy, what does this have to do with data?)

Let me address that first question (and forget the second): I’m about to go do a week of inservice in a district that, for reasons known only to them, have put matrices in their learning goals for high-school math. Their goal seems to be to learn procedures for using matrices to solve systems of linear equations.

I look at that and think, surely there are more interesting things to do with matrices. And there are!

Continue reading Weather Models and Matrices

## Sometimes, articles get done

Back in 2017, I gave a talk in which I spoke of “data moves.” These are things we do to data in order to analyze data. They’re all pretty obvious, though some are more cognitively demanding than others. They range from things like filtering (i.e., looking at a subset of the data) to joining (making a relationship between two datasets). The bee in my bonnet was that it seemed to me that in many cases, instructors might think that these should not be taught because they are not part of the curriculum—either because they are too simple and obvious or too complex and beyond-the-scope. I claimed (and still claim) that they’re important and that we should pay attention to them, acknowledge them when they come up, and occasionally even name them to students and reflect explicitly on how useful they are.

Of course there’s a great deal more to say. And because of that I wrote, with my co-PI’s, an actual, academic, peer-reviewed article—a “position paper”; this is not research—describing data moves. Any of you familiar with the vagaries of academic publishing know what a winding road that can be. But at last, here it is:

Erickson, T., Wilkerson, M., Finzer, W., & Reichsman, F. (2019). Data Moves. Technology Innovations in Statistics Education, 12(1). Retrieved from https://escholarship.org/uc/item/0mg8m7g6.

Then, in the same week, a guest blog post by Bill Finzer and me got published. Or dropped, or whatever. It’s about using CODAP to introduce some data science concepts. It even includes figures that are dynamic and interactive. Check out the post, but stay for the whole blog, it’s pretty interesting:

https://teachdatascience.com/codap/

Whew.

## Ping Pong Ball Bounce Redux

Long ago (2007) Bryan Cooley and I wrote a set of physics labs; in one of them we had students bounce a ping-pong ball. You know the sound; it’s like this:

For the lab, we had students record the sound at 1000 points per second using a Vernier microphone. Using the resulting data, students could identify the times of the “pocks” and then see how the times between the pocks — the “interpock intervals” — decreased exponentially. This is a cool take on the old Algebra 2/Precalculus activity about bouncing balls where you measure drop heights; using sound and the technology, you can get more bounces and more accuracy.

A typical graph of the sound looks like this:

And a graph of the interpock intervals looks like this:

Continue reading Ping Pong Ball Bounce Redux

## Don’t Expect the Expected Value

One day, over 50 years ago, we were visiting Lake Tahoe as a family, and dad went across the border to play keno. He came back elated: he had hit seven out of eight on one of his tickets, and won eleven hundred dollars. He proudly laid out fifty twenties and two fifties on the kitchen table. It was a magnificent sight.

The details of keno are unimportant here, except to note that keno is not a game of skill. Of course the house has an edge. In the long run, you will lose money playing keno no matter how you do it. Even my dad, who over the years has played a lot of keno, and won even bigger payouts, would probably admit that he might have a net lifetime loss.

So why do people play? There are lots of reasons, I’m sure, but one of them must be connected to that heartwarming anecdote: fifty years later, I remember the event clearly, as one of joy and wonder.

Let’s explore that using roulette, which is much simpler than keno. A roulette wheel has 18 red and 18 black numbered slots, plus a smaller number of green slots (often two). You can make many different bets, but we will stick with red and black. If you place a \$1 bet on red, and it comes up red, you get \$2 back (winning \$1); if it comes up black or green, you lose your dollar.

## Data Moves and Simplification

or, What I should have emphasized more at NCTM

I’m just back from NCTM 2018 in Washington DC where I gave a brief workshop that introduced ideas in data science education and the use of CODAP to a very nice group in a room that—well, NCTM and the Marriott Marquis were doing their best, but we really need a different way of doing technology at these big conferences.

Anyway: at the end of a fairly wide-ranging presentation in which my main goal was for participants to get their hands dirty—get into the data, get a feel for the tools, have data science on their radar—it was inevitable that I would feel:

• that I talked too much; and
• that there were important things I should have said.

Sigh. Let’s address the latter. Here is a take-away I wish I had set up better:

In data science, things are often too complicated. So one step is to simplify things; and some data moves, by their nature, simplify.

Complication is related to being awash in data (see this post…); it can come from the sheer quantity of data as well as things like being multivariate or otherwise just containing a lot of stuff we’re not interested in right now.

To cut through that complication, we often filter or summarize, and to do those, we often group. To give some examples, I will look again at the data that appeared in the cards metaphor post, but with a different slant.

Here we go: NHANES data on height, age, and sex. At the end of the process, we will see this graph:

And the graph tells a compelling story: boys and girls are roughly the same height—OK, girls are a little taller at ages 10–12—but starting at about age 13, girls’ heights level off, while the boys continue growing for about two more years.

We arrived at this after a bunch of analysis. But how did we start?

## Data Moves: the cards metaphor

In the Data Science Games project, we started talking, early, about what we called data moves. We weren’t quite sure what they were exactly, but we recognized some when we did them.

In CODAP, for example (like in Fathom), there is this thing we learn to do where you select points in one graph and, since when you select data in CODAP, the data are selected everywhere, the same points are selected in all other graphs—and you can see patterns that were otherwise hidden.

You can use that same selection idea to hide the selected or unselected points, thereby filtering the data that you’re seeing. Anyway, that felt like a data move, a tool in our data toolbox. We could imagine pointing them out to students as a frequently-useful action to take.

I’ve mentioned the idea in a couple of posts because it seemed to me that data moves were characteristic of data science, or at least the proto-data-science that we have been trying to do: we use data moves to make sense of rich data where things can get confusing; we use data moves to help when we are awash in data. In traditional intro stats, you don’t need data moves because you generally are given exactly the data you need.

## Trees. And. Diagnosis. (Live!)

I’ve been invited to give a webinar about our work on trees; it will include material from the previous two posts.

Here’s the blurb:

Data, Decisions, and Trees

We often say that we want to make decisions “based on data.” What does that really mean? We’ll look at a simple approach to data-based decisionmaking using a representation we might not use every day: the tree. In this webinar, you’ll use data to make trees, and then use the trees to diagnose diseases.

On the surface, trees are very simple. But for some reason — perhaps because we’re less familiar with using trees — people (and by that we mean us) have more trouble than we expect. Anticipate having a couple of “wait a second, let me think about this!” moments.

## Trees. And. Diagnosis. (Part two)

Last time we introduced decision trees and a tool we’ve made to explore them. With that tool, embedded in a simple game (Arbor), you can generate data from alien creatures with a simulated malady, figure out its predictors, and make a decision tree that will let you automate its diagnosis. (Here is the link to that not-quite-game.)

Your job was to get through the diseases ague and botulosis. Today I want to reflect on those two scenarios.

## Ague

Ague is ridiculously simple, and with that ridiculous simplicity, the user is supposed to be able to learn the basics of the game, that is, how to “drive” the tools. One way to figure out the disease is to sort the table by health and see what matches health. Here is what the sorted table looks like:

Just scanning the various columns, you can see that health is associated with hair color.  Pink means sick, blue means well. With that insight, you can go on to diagnose individual creatures and then make a simple tree, which looks like this:

Although there is a lot of information in the tree, users can generally figure it out. If they (or you) have trouble, they can get additional information by hovering over the boxes or the links.

## Trees. And. Diagnosis. (Part one)

(This is part one. Link to part two.)

In the Data Science Games project, we have recently been exploring decision trees. It’s been great fun, and it’s time to post about it so you dear readers (all three or so of you) can play as well. There is even a working online not-quite-game you can play, and its URL will probably endure even as the software gets upgraded, so in a year it might even still work.

Here’s the genesis of all this: my German colleague Laura Martignon has been doing research on trees and learning, related to work by Gerd Gigerenzer at the Harding Center for Risk Literacy. A typical context is that of a doctor making a diagnosis. The doctor asks a series of questions; each question gets a binary, yes-no answer, which leads either to a diagnosis or a further question. The diagnosis could be either positive (the doc thinks you have the disease) or negative (the doc thinks you don’t).

The risk comes in because the doctor might be wrong. The diagnosis could be a false positive or a false negative. Furthermore, these two forms of failure are generally not equivalent.

Anyway, you can represent the sequence of questions as a decision tree, a kind of flowchart to follow as you diagnose a patient. And it’s a special kind of tree: all branchings are binary—there are always two choices—and all of the ends—the leaves, the “terminal nodes”—are one of two types: positive or negative.

The task is to design the tree. There are fancy ways (such as CART and Random Forest) to do this automatically using machine learning techniques. These techniques use a “training set”—a collection of cases where you know the correct diagnosis—to produce the tree according to some optimization criteria (such as how bad false positives and false negatives are relative to one another).  So it’s a data science thing.

But in data science education, a question arises: what if you don’t really understand what a tree is? How can you learn?

That’s where our game comes in. It lets you build trees by hand, starting with simple situations. Your trees will not in general be optimal, but that’s not the point. You get to mess around with the tree and see how well it works on the training set, using whatever criteria you like to judge the tree. Then, in the game, you can let the tree diagnose a fresh set of cases and see how it does.

That’s enough for now. Your job is to play around with the tool. It will look like this to start:

The first few scenarios are designed so that it’s possible to make perfect diagnoses. No false positives, no false negatives. So it’s all about logic, and not about risk or statistics. But even that much is really interesting. As you mess around, think about the representation, and how amazingly hard it can be to think about what’s going on.

There are instructions on the left in the tan-colored “tile” labeled ArborWorkshop. Start with those. There is also a help panel in the tree tile on the right. It may not be up to date. All of the software is under development.

The first disease scenario, ague, is very simple. The next one, botulosis, is almost as simple, and worth reflecting on. That will happen soon, I hope after you have tried it.

Note: if you are unfamiliar with this platform, CODAP, go to the link, then to the “hamburger” menu. Upper left. Choose New. Then Open Document or Browse Examples. Then Getting Started with CODAP. That should be enough for now.