## Fidelity versus Clarity

Thinking about yesterday’s post, I was struck by an idea that may be obvious to many readers and has doubtless been well explored. But it was new to me (or I had forgotten it), so here I go, writing to help me think and remember:

The post touched on the notion that communication is an important part of data science, and that simplicity aids in communication. Furthermore, simplification is part of modelmaking.

That is, we look at unruly data with a purpose: to understand some phenomenon or to answer a question. And often, the next step is to communicate that understanding or answer to a client, be it the person who is paying us or just ourselves. “Communicating the understanding” means, essentially, encapsulating what we have found out so that we don’t have to go through the entire process again.

So we might boil the data down and make a really cool, elegant visualization. We hold onto that graphic and carry it with us mentally in order to understand the underlying phenomenon. For example, we might keep that graph of mean height by sex and age in mind as an internal idea (a model) of sex differences in human growth.

But every model leaves something out. In this case, we don’t see the spread in heights at each age, and we don’t see the overlap between females and males. So we could go further, and include more data in the graph, but eventually we would get a graph that was so unwieldy that we couldn’t use it to maintain that same ease of understanding. It would require more study every time we needed it. Of course, the appropriate level of detail depends on the context, the stakes, and the audience.
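To make that concrete for myself, here is a minimal sketch in Python. Everything in it is an assumption for illustration: the heights are synthetic, made-up numbers for one hypothetical age group (not real growth data), and `summarize` and its `detail` levels are names I invented. It only shows how each added layer of detail gives a fuller picture but also more to hold in your head.

```python
import random
import statistics


def summarize(heights, detail):
    """Summarize one group's heights at a chosen level of detail.

    detail=1: mean only (the simple graph)
    detail=2: mean plus standard deviation (shows spread)
    detail=3: mean, standard deviation, and quartiles (more faithful, more to read)
    """
    summary = {"mean": statistics.mean(heights)}
    if detail >= 2:
        summary["stdev"] = statistics.stdev(heights)
    if detail >= 3:
        summary["quartiles"] = statistics.quantiles(heights, n=4)
    return summary


# Synthetic data: the centers (162, 164) and spread (7) are invented,
# chosen only so that the two groups overlap heavily.
rng = random.Random(0)
females = [rng.gauss(162, 7) for _ in range(500)]
males = [rng.gauss(164, 7) for _ in range(500)]

for detail in (1, 2, 3):
    print(detail, summarize(females, detail), summarize(males, detail))

# Something a means-only graph cannot show: the overlap between groups,
# measured here as the share of females taller than the male mean.
male_mean = statistics.mean(males)
share_above = sum(h > male_mean for h in females) / len(females)
print(f"share of females above the male mean: {share_above:.2f}")
```

The means-only summary is one number per group and easy to remember. The third level is closer to the data, but you have to study it every time you come back to it.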

So there’s a tradeoff. As we make our analysis more complex, it becomes more faithful to the original data and to the world, but it also becomes harder to understand.

Which suggests this graphic:

## What’s Modeling Good For?

What’s the purpose of mathematical modeling? The easy answer is something like, to understand the real world. When I look more deeply, however, I see distinct reasons to model—and to model in the classroom. I hope that trying to define these will help me clarify my thinking and shed light on some of the worries I have about how modeling might be portrayed.

(This is the third post in a series on modeling. We began with some definitions, then proceeded to look at “genres” of modeling.)

Let’s look at a few purposes and try to distinguish them. To save the casual reader time, I’ll talk about prediction, finding parameter values, and finding insight. I think the last is the most subtle and the one most likely to be missed or misused by future developers.

Maybe I’ll post more about each of these in detail later, but for now I’ll move quickly and not give extended examples.