What’s the purpose of mathematical modeling? The easy answer is something like, *to understand the real world*. When I look more deeply, however, I see distinct reasons to model—and to model in the classroom. I hope that trying to define these will help me clarify my thinking and shed light on some of the worries I have about how modeling might be portrayed.

(So this is the third in a series on modeling. We began with some definitions, then proceeded to look at “genres” of modeling.)

Let’s look at a few purposes and try to distinguish them. To save the casual reader time, I’ll talk about *prediction*, *finding parameter values*, and *finding insight*. I think the last is the most subtle and the one most likely to be missed or misused by future developers.

Maybe I’ll post more about each of these in detail later, but for now I’ll move quickly and not give extended examples.

## Prediction

You can use models to predict values for data that don’t yet exist. For example:

- if you have mileage and arrival times for Amtrak's *Coast Starlight* for the stations between Seattle and Portland, you could extrapolate to predict the arrival time at Los Angeles.
- if you have flipped a paper cup 20 times and it landed on its side 16 times, you could predict how many times it will land on its side in 100 flips.

The first uses a function and data; the second uses a probability model. In these examples, we can push for an answer that's not a single, specific value but rather a *range* of answers. I mean, the obvious answer to the cup question is 80, but should we accept it as final? It does use a model—the idea of simple proportionality. But if the cup-tossing is a random process, we should be able to ask students at the high-school level to recognize the consequences of that and use a probability model: one that recognizes variability. That model doesn't have to be very sophisticated (the fancy version would be to decide the process is likely binomial and use binomial probabilities to compute a confidence interval); it can be completely informal, or students can get a plausible range from simulation.

Likewise, since the train data do not lie on a straight line, we won’t know exactly when it will get to Los Angeles. But we could find a range of arrival times that makes sense.

Finally, although the word *prediction* implies looking into the future, it doesn't have to be that way: we can just as well "predict" a value we simply haven't measured.

Anyhow, this sort of modeling is easy to understand. Furthermore, the very question—*when will the train arrive in LA?*—is a good assessment prompt.

We can imagine, for different genres of modeling, what doing well would entail.

For the trains, you need to make a scatter plot, fit a line, and use the equation for the line to find a specific value for LA; then vary the line in a plausible range to get a range of answers. In fitting the line, you have to decide intelligently about constraints: for example, should you force the line to go through the Seattle point?
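As a rough sketch of that whole procedure—fit a line, extrapolate, then vary the fit to get a range—here is one way it might look in Python. The mileages, times, and the Seattle-to-LA distance are illustrative stand-ins, not Amtrak's actual schedule:

```python
# Made-up data: mileage from Seattle and scheduled arrival times
# (hours after departure) for stations between Seattle and Portland.
miles = [0, 40, 75, 110, 145, 187]
hours = [0.0, 0.9, 1.7, 2.4, 3.2, 4.0]

def fit_line(xs, ys):
    """Ordinary least-squares slope and intercept."""
    n = len(xs)
    mx = sum(xs) / n
    my = sum(ys) / n
    slope = (sum((x - mx) * (y - my) for x, y in zip(xs, ys))
             / sum((x - mx) ** 2 for x in xs))
    return slope, my - slope * mx

slope, intercept = fit_line(miles, hours)
LA_MILES = 1377                      # roughly the full route length
point_estimate = slope * LA_MILES + intercept

# Vary the line plausibly: refit with each station left out in turn
# and watch how far the Los Angeles prediction moves.
predictions = []
for i in range(len(miles)):
    m = miles[:i] + miles[i + 1:]
    h = hours[:i] + hours[i + 1:]
    s, b = fit_line(m, h)
    predictions.append(s * LA_MILES + b)

print(f"point estimate: {point_estimate:.1f} h after departure")
print(f"plausible range: {min(predictions):.1f} to {max(predictions):.1f} h")
```

The leave-one-station-out trick is just one way to "vary the line in a plausible range"; students might instead eyeball two extreme lines that still look reasonable through the data.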

In the case of cup tossing, if you use simulation, how do you do it? How do you collect the results for repeated trials? How many trials do you collect? What probability do you use? How do you turn the distribution you get into a range? There are multiple sensible answers to all of these questions. Exploring them gives students rich tasks and gives teachers more assessment opportunities.
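One sensible set of answers to those questions can be sketched in a few lines of Python: use the observed 16/20 as the probability, collect a few thousand repeated trials of 100 flips each, and turn the resulting distribution into a range by reading off the middle 95%:

```python
import random

random.seed(1)            # fixed seed so the run is repeatable

p_side = 16 / 20          # probability taken from the 20 observed flips
n_flips = 100
n_trials = 5000           # how many repeated trials to collect

results = []
for _ in range(n_trials):
    landed_on_side = sum(random.random() < p_side for _ in range(n_flips))
    results.append(landed_on_side)

# Turn the distribution into a range: the middle 95% of outcomes.
results.sort()
low = results[int(0.025 * n_trials)]
high = results[int(0.975 * n_trials)]
print(f"plausible range: {low} to {high} side-landings out of {n_flips}")
```

Every choice here—the seed, 5000 trials, the middle 95%—is one of those "multiple sensible answers," and each is worth arguing about in class.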

## Parameter Estimation

In the train question above, another interesting question is, *how fast does the train go*? (I wish there were a better name for this than *parameter estimation*. It makes my eyes glaze over and worry about confidence intervals.)

This is different (I claim) from the prediction of a particular value: it’s asking for the value of a meaningful parameter in the model—in this case, the slope. And again, I hope we can teach students to report uncertainty in the answer. Here variability arises partly from uncertainty in the model—different plausible lines have different slopes, so we don’t know the average speed precisely—and partly from the fact that the train’s average speed from station to station varies throughout the trip.

(Using published Amtrak schedules, i.e., ignoring the fact that the Starlight is always really late, you can discover that the train goes slower at night.)

Parameter estimation is also closely allied to science-y things. We’re trying to infer something important about the system based on a relationship. This might be clearer given a different example.

Suppose we tie a weight to a string and swing it. We take data about the period of this pendulum. We measure 10 swings for various lengths of string, and plot the period as a function of length.

A “prediction” task might ask for the period of a 1-meter pendulum, or to find the length that would give a one-second period (famously used in year one of IMP).

A “parameter” task might ask us (if we knew the official physics equation for the period) to find the acceleration of gravity. I go through this process (and more) in this video from the Den of Inquiry physics lab project:

(There is another version of this video that uses Logger Pro instead of Fathom. It’s here.)
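For readers who want the gist without the video, here is a hedged sketch with invented period data (the lengths and periods below are synthetic, not measurements). Assuming the official equation T = 2π√(L/g), the period should be proportional to √L; fit that proportionality and solve for g:

```python
import math

lengths_cm = [20, 40, 60, 80, 100, 120]
periods_s = [0.91, 1.28, 1.57, 1.80, 2.02, 2.21]   # fake measurements

# Fit period = a * sqrt(length), a line through the origin (least squares).
xs = [math.sqrt(length) for length in lengths_cm]
a = sum(x * t for x, t in zip(xs, periods_s)) / sum(x * x for x in xs)

# From T = 2*pi*sqrt(L/g): a = 2*pi/sqrt(g), so g = (2*pi/a)**2.
g_estimate = (2 * math.pi / a) ** 2
print(f"estimated g ≈ {g_estimate:.0f} cm/s^2")    # textbook value ~980
```

Note the subjunctive thinking baked into the last comment: *if* the square-root model held and g were about 980 cm/s², the fitted slope would come out near 0.2 s/√cm, and it does.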

This type of modeling seems to me to be, in general, more sophisticated than simple prediction-modeling. You need to have more of an inferential bent, and start using deeper logic. You have to think subjunctively: *if it were true that the period follows this square-root relationship, and the acceleration of gravity were 978 cm/sec^2, would I get data that looked like this?*

And of course it requires that students be comfortable enough with functions that they can imagine functions having different parameters. Students also need to see how a parameter can carry meaning beyond the function itself.

Assessing this kind of modeling is harder. We can give students raw data (or string, a weight, and a stopwatch) and ask, *what is the acceleration of gravity*? but what kind of scaffolding will they need? And how much help can we give them before it ceases to be a modeling task?

(In probability-model land, this kind of modeling might be exactly what it’s called: parameter estimation. Does that mean that every confidence interval is the result of modeling? Maybe, but pedagogically, I don’t think it should count. Let’s leave that quandary open so we can move on.)

## Aside: Finding the Relationship

Along with simple prediction and parameter inference, students also model in order to find the relationship. I’m not sure where this fits: is it an essential part of the other two, or something on its own? But using the paragraphs activity as an example, students could find the formula (*h* = *k*/*w*) without ever caring that *k* is the area.

Would that be modeling? Sure. (In fact, asking students to “find the area” as the central question in the paragraphs activity ruins it.)

Finding the relationship—as opposed to estimating a parameter value—may even be the main point. In a data-and-function situation, it’s fair to say that we want students to find the relationship, estimate the parameters, find meanings for as many parameters as possible, and be able to predict ranges for specific values for any conditions where the model applies.

But that’s a lot to ask. So on the road to mastering that set of skills, students may well predict without finding parameter values, or estimate parameters without knowing their meaning.
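To make the h = k/w relationship concrete, here is a minimal sketch with invented measurements (real paragraph data will be messier): if height times width is the "area" of the text, then each product h·w is an estimate of the parameter k, and the spread among those products is itself informative.

```python
# Made-up measurements: the same paragraph set at different column widths.
widths_cm = [4, 6, 8, 10, 12, 14]
heights_cm = [21.0, 13.6, 10.4, 8.1, 6.9, 5.8]   # fake heights

# If h = k/w, then each product w*h estimates the parameter k.
products = [w * h for w, h in zip(widths_cm, heights_cm)]
k = sum(products) / len(products)                # one simple estimate

print(f"estimated k (area) ≈ {k:.0f} cm^2")
print("individual w*h values:", products)
# The w*h values are close but not identical: the constant-area
# model is a simplification of the real typesetting situation.
```

A student can find and use k this way without ever realizing that k *is* the area—which is exactly the "estimate parameters without knowing their meaning" stage described above.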

## Modeling for Insight

Making a geometrical model (as in the peanut-butter cup example from the last post) seems like a good way to use a model to get insight into a problem. But I think you can get insight from data and functions as well.

The very first time I did the paragraphs activity with students (in Gretchen Davis’s AP Stats class, back in the Pleistocene), the students saw the graph, and I asked, what sort of function do you think has the same shape as this data?

They immediately chorused, “exponential decreasing!” And set to work with their TI-80-somethings to find the function. They were good data analysts, so they were dissatisfied with how poorly their exponentials fit.

What does that mean? I asked.

Maybe it’s not a decreasing exponential, they said.

So: what other functions might have the shape you see?

They set to work again trying different things, and somebody hit on *height* = *k*/*width* and discovered that it worked really well.

I got to ask the payoff question: Is there any particular reason why that form of the function—that inverse relationship—works so well in this situation, while the exponential and the shifted parabola did not?

Wait time wait time wait time. Brief discussion in groups. Finally, a light-bulb went off in one girl’s head, and she reported, almost gleefully, that since the paragraphs have the same text and the same font, they have the same area.

What does that have to do with the form of the function? I asked.

And she explained.

I think this is a good example of a situation where doing modeling—in this case, modeling data with a function—gives you *insight* into the situation.

Frequently, it’s insight you may think you should have had without ever doing the modeling. Math teachers I do this activity with will sometimes dope-slap themselves that they didn’t see it from the beginning. But I think that’s a mistake.

First of all, there’s no shame in not noticing something. Secondly, the great part of an investigation is that you don’t know what’s going on at first, and then at some point the light-bulb turns on—and that moment is so rewarding we want to make it happen to ourselves and others as often as possible. Noticing the inverse relationship from the start spoils it, in a way. Dan Meyer fans might say it eliminates perplexity.

More importantly for why modeling should be in the *math* curriculum, modeling is an alternative approach to a problem that can help us see features of a solution—features that may help us understand the underlying mathematics.

And this is not just a pedagogical trick: practicing scientists and the big dogs of pure mathematics use it too. (In-family example: my brother Dan is working on the “happy end[ing]” problem, and was just telling me how he realized he could eliminate thousands of cases simply by plotting points at random and seeing how they were related.)

Here’s another example so that the Noble Reader can have the insight experience: You know that breaking-the-spaghetti problem? The question is, if you take a piece of (dried) spaghetti and break it randomly in two places, making three pieces, what’s the probability that the resulting pieces can form the edges of a triangle?

Use a modeling approach: simulate it. The empirical fraction you get is close to a ratio of small integers. Assume for a moment that it’s the answer. Make graphs of your (many) results. Use them to get the insight you need to solve the problem exactly.
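A minimal version of that simulation, in Python. The triangle test used here—every piece shorter than 1/2—follows from the triangle inequality on a unit-length stick (each side must be shorter than the sum of the other two):

```python
import random

random.seed(2)
n_trials = 100_000
triangles = 0
for _ in range(n_trials):
    # Break a unit stick at two uniform random points.
    a, b = sorted((random.random(), random.random()))
    pieces = (a, b - a, 1 - b)
    # The three pieces form a triangle iff every piece is < 1/2.
    if max(pieces) < 0.5:
        triangles += 1

fraction = triangles / n_trials
print(f"empirical fraction: {fraction:.3f}")
```

Run it, notice the ratio of small integers it approaches, and then make graphs of which break points succeeded and failed; the picture that emerges is the insight you need for the exact argument. I won't spoil it here.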

I could never figure that one out until I modeled.

Three observations to close:

First, a low-*n* anecdotal study: when I have done some of these activities with practicing teachers, the teachers with more pure-mathy tendencies tend not to see the value of measurement and data collection. They immediately see that the paragraphs relationship should be inversely proportional; they might decide to measure one paragraph to get the area and just plot the curve.

This misses two benefits: the help that a modeling approach might give to a student who doesn’t have the insight to see the consequences of the constant area; and the opportunity to deal with the fact that the area is *not* in fact constant: the constant-area model is a simplification, a shortcut you can use *because you have insight*. But if you measure all the paragraphs, you see how much the “effective” area of the paragraphs varies, and can study the next layer, namely, how this area depends on the width.

Second closing observation: assessing whether students get insight is harder than assessing their ability to use modeling to predict, find a relationship, or estimate a parameter. So I worry that this function of modeling will get left out of implementation.

Finally, notice that this “insight” aspect of modeling applies to pure math as well as to the real world. So: is that application really modeling? It feels like it to me. If so, it suggests that the real-world part of the modeling definition may not be necessary.

