Last time, I introduced a quest—it’s time I learned more about Bayesian inference—and admitted how hard some of it is. I wrote,

The minute I take it out of context, or even very far from the ability to look at the picture, I get amazingly flummoxed by the abstraction. I mean,

P(A|B) = P(B|A)P(A) / P(B)

just doesn’t roll off the tongue. I have to look it up in a way that I never have to with Pythagoras, or the quadratic formula, or rules of logs (except for changing bases, which feels *exactly* like this), or equations in kinematics.

Which prompted this comment from gasstationwithoutpumps:

I find it easiest just to keep coming back to the definition of conditional probability P(A|B) = P(A & B) / P(B). There is no complexity here…(and more)

Which is true, of course. But for this post I’d like to focus on the intuition, not the math. That is, I’m a mathy-sciencey person learning something new, trying to record myself in the act of learning it. And here’s this bump in the road: What’s up with my having so much trouble with a pretty simple formula? (And what can I learn about what my own students are going through?)

So: let’s take that definition and write it here:

P(A|B) = P(A & B) / P(B)     (Equation 1)

How do I make sense of Equation 1? I know from experience that I need to make (on paper or just in my head) a diagram just like a Fathom ribbon plot. This is the same as the “area chart” I use when I teach (and do) probability. It’s a one-by-one box:

With the picture, P(A|B) is, for me, *obviously* the size of the purple box divided by the whole blue column. That is,

P(A|B) = P(A & B) / P(B)     (Equation 1 again)

*But*—when I look at it just on its own, I have no intuition about whether it’s right or not. To check it, I feel as if I need to do a character-by-character interpretation, like a first-grader learning to read using phonics: “the probability of *A* given *B* is…” and then I’m still not sure.
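Since the picture is doing the real work for me, here is a quick way to check that the picture and Equation 1 agree: a short simulation sketch (the specific probabilities are made up for illustration, not from the post).

```python
import random

random.seed(1)

# Known probabilities for the picture (illustrative numbers):
p_B = 0.4            # width of the blue column
p_A_given_B = 0.7    # so the purple box has area P(A & B) = 0.28
p_A_and_B = p_B * p_A_given_B

# The definition says P(A|B) = P(A & B) / P(B):
formula = p_A_and_B / p_B

# Check it against a simulation: among trials where B happens,
# how often does A happen too?
trials = 200_000
hits_B = hits_AB = 0
for _ in range(trials):
    b = random.random() < p_B
    a = random.random() < (p_A_given_B if b else 0.2)
    hits_B += b
    hits_AB += a and b
empirical = hits_AB / hits_B

print(round(formula, 3), round(empirical, 3))  # both close to 0.7
```

The empirical conditional frequency (restricting to the “B column” of trials) matches what the formula computes from the joint and marginal probabilities, which is exactly what the area picture claims.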

This is in stark contrast to a whole slew of other symbolic situations. For example, if we’re talking about a right triangle and I see a² + b² = m², I know very quickly that *m* is the hypotenuse. I don’t have to stop and parse it. It’s familiar. It sings Pythagoras by its very look. Logarithms don’t paralyze me. Calculus and I get along: if I need to apply the chain rule, I don’t get flummoxed; I just keep track of things and take the appropriate derivatives. I can cope with linear algebra, the interstellar medium, recursion, and object-oriented design.

Why are those situations so different (for me; right now) from the one in Equation 1? Why do I still need to draw the picture?

- Perhaps it’s just a matter of practice, though I’ve worked with conditional probability a lot already. If that’s the case, and I persist, it will become as natural for me as it is for gasstation.
- I wonder if it’s something deeper, something fundamentally hard about conditional probability that Gigerenzer, Martignon, and others are getting at with their research about natural frequencies (and in the natural-frequency contexts, I do not have this problem).
- Then there’s the dark fear that, although I’m fine with a lot of abstract symbolic representations, there is a limit and this is it for me; I’m Bayes-disabled.
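On the natural-frequency point: the trick Gigerenzer describes can be made concrete with a classic diagnostic-test setup. The numbers below are generic textbook illustrations I chose, not anything from his papers; the point is that tracking counts of people makes the conditional probability a simple ratio of counts.

```python
# Natural-frequency restatement of a conditional-probability problem,
# Gigerenzer style.  Probability version (illustrative numbers):
#   P(disease) = 0.01, P(positive | disease) = 0.9,
#   P(positive | no disease) = 0.05.  What is P(disease | positive)?

population = 10_000
sick = population // 100                 # 100 people have the disease
well = population - sick                 # 9,900 do not
sick_positive = sick * 9 // 10           # 90 of the sick test positive
well_positive = well * 5 // 100          # 495 false positives

# In natural frequencies, the answer is just one count over another:
p_sick_given_positive = sick_positive / (sick_positive + well_positive)
print(p_sick_given_positive)  # 90 / 585, about 0.154
```

No formula to recall: “of the 585 people who test positive, 90 are actually sick” is the whole computation, which may be why the natural-frequency contexts don’t trip me up.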

Then there is a broader implication. I really want to learn about Bayesian inference, and I’m also interested in a question that increasingly comes up: given the problems with the Frequentist approach, should we consider teaching Bayesian methods to beginners? But this formula is only step one; in order to really do it, we’re going to have to do exactly as gasstation says in the rest of his comment—

[combine] prior beliefs (which are distributions over parameters of models) with likelihood models and data to get posterior distributions over parameters of models, then [integrate] everything to get mean estimates of interesting functions.

If we have to do all that, and I have no intuition about the formula, is there any hope?
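To see what gasstation’s recipe looks like in the smallest possible case, here is a sketch I put together: a grid approximation of the posterior for a coin’s heads probability. The grid, the uniform prior, and the data are all invented for the example.

```python
# Prior belief: a distribution over the parameter theta, here uniform
# on a grid of candidate values.
grid = [i / 100 for i in range(101)]
prior = [1 / len(grid)] * len(grid)

# Data: 7 heads in 10 flips.  Likelihood of the data under each theta.
heads, flips = 7, 10
likelihood = [t**heads * (1 - t)**(flips - heads) for t in grid]

# Posterior ∝ prior × likelihood, then normalize.
unnorm = [p * l for p, l in zip(prior, likelihood)]
total = sum(unnorm)
posterior = [u / total for u in unnorm]

# "Integrate everything to get mean estimates": the posterior mean.
post_mean = sum(t * p for t, p in zip(grid, posterior))
print(round(post_mean, 3))  # close to (7 + 1) / (10 + 2) = 2/3
```

Prior, likelihood, normalize, integrate: the whole pipeline in a dozen lines, for a model with one parameter. The real thing has distributions over many parameters, but the shape of the computation is the same.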

## Hope from the Topology Incident

I’m heartened by a memory from maybe 25 years ago.

I was working for EQUALS, a professional- and curriculum-development outfit at the Lawrence Hall of Science in Berkeley. We did workshops for teachers to try to help them open up their mathematics offerings, especially in ways that would attract girls to math.

One of the workshop features was “stations,” places around the room where students could do math things—solve problems, work on puzzles, and so forth, rooted in different areas of math. The idea was that some students might find some aspect of math they were good at; or at least get practice in a math-related skill (e.g., spatial visualization) that would turn out to be important later but was missing from their traditional curriculum.

I was debriefing a “stations” session in Portland. This involved eliciting from participants what the math was in this or that station, and why it was important. We came to some ring-and-loop puzzles. You know the kind: metal shapes are interlocked in a way that looks inextricable, and you’re supposed to extricate them.

They asked me what the math was. I should have turned it back to the group, of course, but I screwed up and answered the question. I said, these are topology puzzles; topology is a branch of math that’s about, among other things, how things are connected. But (and here I stepped off the pier) in my experience, I had a problem with using these puzzles because solving them seemed to be more about luck than about actual problem-solving. You mess around with the equipment, and maybe see a way to twist a piece in a way you hadn’t before, and you suddenly see—or feel—how it fits though this hole. Which is great, and a good experience, but I’d rather be able to apply some strategy.

At which point a participant took me—kindly—to task. “My dad is a topologist,” she said, “and we had these puzzles around all the time when we were kids. There *are* strategies.” And went on to describe some.

So I hope it will come with practice.

gasstationwithoutpumps responded in the comments:

I have to get a lot of bioinformatics students to the point where they understand the rudiments of Bayesian inference (luckily there is a real Bayesian statistician to do the heavy lifting in a subsequent course, so I only need to do the rudiments). Some of them may be getting stuck where you are, so I’d like to learn how you get unstuck, in the hopes that I can help students with similar problems.

Are you reading P(A|B) as the “probability of A given B”? Having the verbal notion “given that B is true” helps some people assign meaning to the formula, though then they find Bayes’ rule rather magical, since “A given B” can be related to “B given A”, which doesn’t make sense if you think of “given that” as meaning causality of some sort. (This is the same problem that a lot of students have with contrapositives and other logical manipulations: confusing inference with causality.)

The picture you drew helps: the “given B” part is restricting the cases you are interested in to just the column where B is true. The contingency table (which is what you have represented pictorially) is an excellent model here—why do you feel that it is a crutch and that you have to understand the formula without the model?
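The contingency table gasstation mentions can be written down directly. Here is a sketch with illustrative cell probabilities (mine, chosen to match a 0.4-wide “B column”): conditioning on B just means dividing by that column’s total.

```python
# A 2x2 contingency table as a dict of cell probabilities
# (illustrative numbers; the four cells sum to 1).
table = {
    ("A", "B"): 0.28, ("A", "not B"): 0.12,
    ("not A", "B"): 0.12, ("not A", "not B"): 0.48,
}

# "Given B" restricts attention to the B column:
p_B = table[("A", "B")] + table[("not A", "B")]
p_A_given_B = table[("A", "B")] / p_B
print(round(p_A_given_B, 3))  # 0.28 / 0.40 = 0.7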

Another commenter added:

Thank you for that drawing. I’ve had the opposite struggle from you: I read the formula P(A|B) = P(A)P(B|A)/P(B) as “the probability of A in the world of B is equal to the probability of A chopped down (i.e., intersected, so multiplied) to the world of B, normalized (i.e., divided) so that B is the whole universe.” This reading of the formula helps me intuitively understand and remember it, but I haven’t had much luck drawing my intuition to explain it to others. I’m teaching statistics for the first time this year, so I’m very glad I saw your area model!