Ack! I don’t have time to do justice to this right now, but readers need to know, if you don’t already, that the geniuses at Desmos seem to be making a matrix calculator: https://www.desmos.com/matrix.
Having read that, you might rightly say, I can’t get to everything in my curriculum as it is, why are you bringing up matrices? (You might also say, Tim, I thought you were a data guy, what does this have to do with data?)
Let me address that first question (and forget the second): I’m about to go do a week of inservice in a district that, for reasons known only to them, has put matrices in its learning goals for high-school math. The goal seems to be to learn procedures for using matrices to solve systems of linear equations.
I look at that and think, surely there are more interesting things to do with matrices. And there are!
One is to use matrices as transformation operators, like doing reflections and rotations using 2×2 matrices in the plane. Just using 0 and ±1, you can do a lot of great stuff. You can even introduce symmetry groups. But not in this post.
Because you can also use them to model the weather, and that’s what we’ll do, using the Desmos tool to help with calculation.
A very simple model
Suppose we say, this is California, mostly it doesn’t rain. So we’ll make a model of the weather that goes like this: every (simulated) day, you roll a die. If you roll 6, it rains. Otherwise it’s sunny.
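If you would rather not roll a physical die 30 times, here is a minimal Python sketch of that rule. The function name `simulate_simple` and the seed are my own inventions for illustration; nothing here comes from Desmos.

```python
import random

def simulate_simple(days, seed=None):
    """Independent-days model: roll one die per day; a 6 means rain."""
    rng = random.Random(seed)  # seeded so a run can be repeated
    weather = []
    for _ in range(days):
        roll = rng.randint(1, 6)  # a fair six-sided die
        weather.append("R" if roll == 6 else "S")
    return "".join(weather)

month = simulate_simple(30, seed=1)
print(month)
print("rainy fraction:", month.count("R") / len(month))
```

Each day ignores the day before, which is exactly the feature the rest of the post goes on to question.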
This will have some long-term behavior that kinda-sorta resembles real weather but also really doesn’t. We can ask, what aspects of the weather this algorithm generates are realistic, and what are not?
For one thing, 1/6 is probably not the true fraction of days with precipitation. (If you want to find out the real proportion, you could play with my recent portal to weather data. See? I am a data guy!) We could adjust the model so the probability was right. We won’t do that, though, because I want to stick with dice.
We could point out that “sunny or rainy” is too coarse. For example, we might want temperature, wind, amount of precipitation. Agreed. But that’s not where I’m going either.
Instead, we might note that the weather tends to be streaky. That’s another way of saying that the weather one day is not independent of the weather on another day: if it’s sunny today, there is a big chance that it will be sunny tomorrow—bigger than if it’s raining today.
So we will make a model that takes today’s weather into account when predicting tomorrow’s weather.
A Markov model
Here’s the model we will study:
If today is sunny, roll a die: if it’s a 6, tomorrow will be rainy, otherwise it’s still sunny.
If today is rainy, roll a die: if it’s a 5 or 6, tomorrow will be rainy again, otherwise, it will be sunny.
We can now give the class a task: everybody model a month. Do 30 days, starting with a sunny day. We can ask all sorts of what-do-you-notice questions. We can assess the streakiness of sunny or rainy days, and so forth. We can muse about how to change the model to make it more like Seattle, or whether we should have different models in different seasons, or what our model should really depend on.
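For anyone who wants to try the task without dice, here is a rough Python sketch of one student's month. It assumes the sunny starting day counts as day 1 of the 30. The names `simulate_markov` and `streaks` are hypothetical helpers of mine.

```python
import random
from itertools import groupby

def simulate_markov(days, start="S", seed=None):
    """Tomorrow depends on today: sunny -> rain on a 6; rainy -> rain on 5 or 6."""
    rng = random.Random(seed)
    today = start
    weather = [today]  # assumption: the starting day is day 1
    for _ in range(days - 1):
        roll = rng.randint(1, 6)
        rain_threshold = 6 if today == "S" else 5
        today = "R" if roll >= rain_threshold else "S"
        weather.append(today)
    return "".join(weather)

def streaks(weather):
    """Split a string like 'SSRS' into runs: [('S', 2), ('R', 1), ('S', 1)]."""
    return [(kind, len(list(run))) for kind, run in groupby(weather)]

month = simulate_markov(30, seed=2)
print(month)
print(streaks(month))
```

The list of runs gives one simple handle on streakiness: compare the lengths of the rainy runs here with what the one-die-per-day model produces.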
But let’s focus on this question: in this model, on average, what fraction of the days are rainy?
It should be clear that it ought to be more than 1/6, because of the rule for rainy days.
Do the data bear this out? Sure: especially when we have a class where everybody has done 30 days. We aggregate the data and see (for a fictitious class of 20, 600 days run in CODAP) that we have 479 sunny days and 121 rainy days, for a ratio of almost exactly 4:1, or 20% rainy days. About one-fifth. Which is more than one-sixth, as expected. Could it be that the “correct” probability is exactly one-fifth?
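The counts above came from CODAP. As a stand-in, here is a sketch in Python of the same kind of aggregation: 20 simulated students, 30 days each, every month starting sunny. The function names and the seed are assumptions of mine, and a different seed will give different totals.

```python
import random

def next_day(today, rng):
    """Apply the dice rule for one day."""
    roll = rng.randint(1, 6)
    if today == "S":
        return "R" if roll == 6 else "S"
    return "R" if roll >= 5 else "S"

def class_totals(students=20, days=30, seed=None):
    """Count sunny and rainy days over a whole class of simulated months."""
    rng = random.Random(seed)
    sunny = rainy = 0
    for _ in range(students):
        today = "S"  # every student starts with a sunny day
        for _ in range(days):
            if today == "S":
                sunny += 1
            else:
                rainy += 1
            today = next_day(today, rng)
    return sunny, rainy

sunny, rainy = class_totals(seed=3)
print(sunny, rainy, rainy / (sunny + rainy))
```

Running it with several seeds gives a feel for how much the rainy fraction wobbles from class to class.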
You might already be objecting: “but you have to start with either rain or sun—that will bias your results.” Not as much as you might think! More on this later.
How could we figure this out theoretically?
There is an amazingly elegant way that I will let you find. My theory about this is that we often approach problems first in an inefficient way, and only later will we see the cool, elegant solution—often inspired by the result we get inelegantly.
One such inelegant way is to start making a tree diagram. I will spare you this. Instead, I’ll modify that tree diagram and follow Gerd Gigerenzer and his colleagues in using natural frequencies.
Imagine that we have six situations (six, because dice), each starting with a sunny day. Of those six, on average, five will be sunny the next day and one will be rainy. That is, the six equally-likely outcomes—assuming we start sunny—are SS, SS, SS, SS, SS, and SR.
To do the next level, we have to start with 36 sunny days and let them progress; we get six each of our results from the previous paragraph. That’s 30 SS and 6 SR.
Of the 30 days that began SS, 5/6 (or 25) will still be sunny: SSS.
The other 5 will be SSR.
Of the six that began SR, four will be sunny: SRS.
And the other two are still rainy: SRR.
If we look just at the most recent day, the sun:rain ratio is 29:7. That makes the rainy fraction 7/36, or just under 1/5.
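The natural-frequency count can be checked by brute force: list all 36 equally likely pairs of die rolls, start each one sunny, and tally the sequences. This is only a sketch, and `next_day` is a hypothetical helper name.

```python
from collections import Counter
from itertools import product

def next_day(today, roll):
    """Apply the dice rule to one day, given the roll."""
    if today == "S":
        return "R" if roll == 6 else "S"
    return "R" if roll >= 5 else "S"

counts = Counter()
for rolls in product(range(1, 7), repeat=2):  # 36 equally likely roll pairs
    seq = "S"  # every situation starts sunny
    for roll in rolls:
        seq += next_day(seq[-1], roll)
    counts[seq] += 1

last_day = Counter()
for seq, n in counts.items():
    last_day[seq[-1]] += n

print(dict(counts))                  # tallies for SSS, SSR, SRS, SRR
print(last_day["S"], last_day["R"])  # sunny and rainy counts on the final day
```

The tallies come out as 25 SSS, 5 SSR, 4 SRS and 2 SRR, so the last day is 29 sunny to 7 rainy. Changing `repeat=2` to `repeat=3` gives the 216-situation version of the next level.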
Finally, using matrices
At this point, we could move on to the next level, the third new day, perhaps starting with 216 days; or we could do this again starting with a rainy day, or we could count up all the days instead of just the last—but I want to get to the matrices while I have the time.
We can use matrices to do the calculation. Suppose we represent the initial sunny state with a vector like this:
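Written as a column (the symbol name v_0 is my own label, not something from the Desmos tool):

```latex
\mathbf{v}_0 = \begin{bmatrix} 1 \\ 0 \end{bmatrix}
```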
That one in the top is for sunny and the zero is for rainy.
I can make a matrix that evolves that state to the next day. I think figuring out that matrix would be a great challenge for students, but I haven’t done that part with actual students yet, so I will just write it here as a matrix A:
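Here is one way to write A that is consistent with the two dice rules. It assumes the state is a column with sunny on top and rainy below, so each column of A holds tomorrow's probabilities given one kind of today:

```latex
A = \begin{bmatrix} 5/6 & 4/6 \\ 1/6 & 2/6 \end{bmatrix}
```

The left column is "today is sunny" (5/6 stay sunny, 1/6 turn rainy). The right column is "today is rainy" (4/6 turn sunny, 2/6 stay rainy). Each column adds up to 1.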
If we multiply that matrix A by a vector representing sunny and rainy days, the result is the expected values after one more day. That is, the matrix “evolves” the state to the next day. If we start with 6 simulations, each beginning with a sunny day, that initial “state” is [6 0], and we get:
That is, after one day, if you have 6 sunny starts, the weather will (on average) be 5 sunny and one rainy.
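Written out, and assuming A has the column layout implied by the dice rules (5/6 and 1/6 for a sunny today, 4/6 and 2/6 for a rainy today), that product is:

```latex
A \begin{bmatrix} 6 \\ 0 \end{bmatrix}
= \begin{bmatrix} 5/6 & 4/6 \\ 1/6 & 2/6 \end{bmatrix}
  \begin{bmatrix} 6 \\ 0 \end{bmatrix}
= \begin{bmatrix} \tfrac{5}{6}\cdot 6 + \tfrac{4}{6}\cdot 0 \\[2pt] \tfrac{1}{6}\cdot 6 + \tfrac{2}{6}\cdot 0 \end{bmatrix}
= \begin{bmatrix} 5 \\ 1 \end{bmatrix}
```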
We can project it two days into the future by (aha!) squaring the matrix A. As before, we’ll start with 36 sunny days:
And we get the 29:7 ratio we got by hand.
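For the record, here is that squaring worked out by hand, again assuming A is the matrix with columns (5/6, 1/6) and (4/6, 2/6):

```latex
A^2 = \frac{1}{36}\begin{bmatrix} 5 & 4 \\ 1 & 2 \end{bmatrix}
                  \begin{bmatrix} 5 & 4 \\ 1 & 2 \end{bmatrix}
    = \frac{1}{36}\begin{bmatrix} 29 & 28 \\ 7 & 8 \end{bmatrix},
\qquad
A^2 \begin{bmatrix} 36 \\ 0 \end{bmatrix} = \begin{bmatrix} 29 \\ 7 \end{bmatrix}
```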
Then we realize that we don’t have to use natural frequencies any more. We can just start with [1 0] and use any exponent we like. Our result will be decimal proportions:
Wow. After 10 iterations (actually, sooner) we converge to that 4:1 ratio. One-fifth of the days will be rainy in this model.
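If Desmos isn't handy, the same iteration can be sketched in a few lines of Python. This assumes A is the matrix with columns (5/6, 1/6) and (4/6, 2/6), and uses exact fractions so there is no rounding until the printout. The helper name `step` is hypothetical.

```python
from fractions import Fraction

# Assumed transition matrix: column 0 is "today sunny", column 1 is "today rainy".
A = [[Fraction(5, 6), Fraction(4, 6)],
     [Fraction(1, 6), Fraction(2, 6)]]

def step(matrix, state):
    """Multiply a 2x2 matrix by a 2-component state: one day of evolution."""
    return [sum(matrix[i][j] * state[j] for j in range(2)) for i in range(2)]

state = [Fraction(1), Fraction(0)]  # start sunny
history = [state]
for day in range(1, 11):
    state = step(A, state)  # after this line, state equals A^day applied to the start
    history.append(state)
    print(day, [round(float(x), 6) for x in state])
```

Applying `step` ten times is the same as multiplying by A to the tenth. Printing every day shows how early the numbers settle near [0.8 0.2].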
Notice two things. First, that last statement, as a conclusion to be drawn from the calculation, is pretty sophisticated! Students will need a while to grok it. (What does that [0.8 0.2] vector really represent? Why A to the tenth?)
Second, after only three iterations, we’re pretty close. This means that if we started rainy, with [0 1] (or any 2-vector whose components add to one), we’d be pretty close pretty quickly too. That is, the initial state doesn’t matter much if you let it run for a couple of days.
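To see that for yourself, here is a small sketch that runs three days from several starting states. It assumes the same A as the dice rules imply, with columns (5/6, 1/6) and (4/6, 2/6). The 50/50 start and the name `evolve` are my own choices for illustration.

```python
from fractions import Fraction

# Assumed transition matrix: column 0 is "today sunny", column 1 is "today rainy".
A = [[Fraction(5, 6), Fraction(4, 6)],
     [Fraction(1, 6), Fraction(2, 6)]]

def evolve(state, days):
    """Apply A to a [sunny, rainy] state the given number of times."""
    for _ in range(days):
        state = [sum(A[i][j] * state[j] for j in range(2)) for i in range(2)]
    return state

starts = {
    "sunny": [1, 0],
    "rainy": [0, 1],
    "50/50": [Fraction(1, 2), Fraction(1, 2)],
}
for label, start in starts.items():
    after = evolve(start, 3)
    print(label, [round(float(x), 4) for x in after])
```

After three days the rainy share is 43/216 from a sunny start and 11/54 from a rainy start, both within half a percentage point of 0.2.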
That’s enough for now. We’ll see how the teachers like it next week. I’m encouraged not only because Desmos is helping do all the calculation, but also because with only 3 matrix multiplications, you get close anyway. Besides, with 2×2s, I don’t feel bad making them do it with pencil and paper (maybe with calculator help)!
Finally, I think it is a very interesting, and different, way of thinking about how matrices can be useful—beyond solving systems of equations. This also leads to the kinds of procedures they use to do search-engine ranking, and I do a lot more searches on a typical day than I solve systems of linear equations…
Oh: the theoretical answer for this model really is that the expected fraction of rainy days is exactly 1/5. How can you get the fraction 1/5 out of six-sided dice?? See if you can figure out an elegant way to explain it. Don’t give it away in the comments.