A Closet Bayesian

At least that’s how I’ve described myself, but it’s a weak sort of Bayesianism because I’ve never really learned how to do Bayesian inference.

It’s time that chapter came to a close. So, with luck, this is the first in a series of posts (all tagged with “Bayes”) in which I finally try to learn how to do Bayesian inference and report on what happens, especially on what is confusing.

Bayesian rumors I’ve heard

Let’s begin with what I know—or think I know—already.

  • Bayesian inference is an alternative to “frequentist” inference, which is the usual kind, the kind with P-values and confidence intervals.
  • It has its roots in Bayes’s Theorem, which I more or less understand, and which lets us turn conditional probability situations sideways.
  • Bayesians, though typically smug, do make intriguing claims. This includes a claim that Bayesian analysis lets you answer the questions you really want to ask, such as, “what’s the probability that this drug is better than eating more kale?” (as opposed to the frequentist, “If this drug were the same as the kale—the null hypothesis—what’s the probability that we would see a difference this extreme?”).
  • Bayesians start with a prior—a distribution of what you think is possible. The prior is a collection of beliefs about parameters of the situation, made explicit. Then things happen, and you get data. With the data, they convert the prior into a posterior—a new distribution that’s based on both the prior and the data.
  • It’s becoming more important that I understand this stuff. Bayesian analysis is becoming hotter, more mainstream. Drew has lent me books. Even my own work with Laura Martignon turns out to have Bayesian implications. At the recent ICOTS, it made much more of an appearance than before.
  • Even so, official practicing Bayesians could not give me clear answers to what I thought were obvious questions. (I’m hoping I can work out the answers as part of this effort.)

I’m aided in this quest by some books:

  • Allen B Downey. 2012. Think Bayes: Bayesian Statistics Made Simple. Green Tea Press. Free pdf!
  • John K Kruschke. 2011. Doing Bayesian Data Analysis: A tutorial with R and BUGS. Academic Press. Has very cute puppies on the cover.
  • Colin Howson and Peter Urbach. 2006. Scientific Reasoning: a Bayesian approach. Open Court.

The first two of these start out very well, and are extremely friendly and sensible. Downey, for me, is particularly inspiring. The third, at least at first glance, is much scarier.

Speaking of Scary

I want to make the big admission up front. Yes, I have known Bayes’s Theorem for a long time. At least I know where to look it up. And I’ve kind of always known what it was about. But I think I have an obligation to really understand and internalize this theorem in order to proceed. It should be as clear as Pythagoras to me.

And I’m having trouble.

I’m not a total loss, though. I can draw the picture, with icons or with a Fathom ribbon chart. For example, this illustration shows a situation in Fairyland: There are princesses (in the pink dresses) and mermaids (blue tails). If you’re a princess, the chance that you have a crown is about 65%. Only 35% if you’re a mermaid. But there are about 3 times as many mermaids as princesses; so many, in fact, that if you have a crown, the chance that you’re a mermaid is greater than 50%.

[Image: MeerPrinzExample] (Clicking the image will take you to the Flash-based simulation. This is part of my work with some very smart Germans.)

This fairy-tale situation is analogous to disease-testing contexts you might have used in class—how, for example, if you get a positive result on an HIV test, the chance that you have the disease might still be relatively small.

So we have taken three known, easily-understood probabilities—the sliders in that screen shot—and made a representation in which it’s easy to see and understand the Bayesian result, that P( Mermaid | Crown ) > 0.5.
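The picture can also be read as plain counting. Here is a minimal sketch of that reading, assuming a hypothetical population of 400 creatures (a number picked only so that every count is a whole number) and treating "about 3 times as many mermaids" as exactly 3 to 1:

```python
# Counting view of the Fairyland picture.
# Assumption: a hypothetical population of 400 creatures, chosen only so that
# every count below comes out as a whole number.
TOTAL = 400
mermaids = TOTAL * 3 // 4        # 3 times as many mermaids as princesses: 300
princesses = TOTAL - mermaids    # 100

crowned_mermaids = mermaids * 35 // 100       # 35% of mermaids have a crown
crowned_princesses = princesses * 65 // 100   # 65% of princesses have a crown
crowned = crowned_mermaids + crowned_princesses

# Among creatures with a crown, what share are mermaids?
share_mermaid = crowned_mermaids / crowned
print(f"{crowned_mermaids} of {crowned} crowned creatures are mermaids "
      f"({share_mermaid:.2f})")
```

Of the 170 crowned creatures, 105 are mermaids, which is where "greater than 50%" comes from.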

So let’s express all this more quantitatively and symbolically.

P(Mermaid\land Crown) = P(Mermaid) P(Crown \mid Mermaid)

In this case, we have numbers. With about 3 times as many mermaids as princesses, P(Mermaid) = 0.75, so P(Mermaid \land Crown) = 0.75 * 0.35 = 0.2625.

The probability of getting a crown at all is the sum of the two cases, Mermaid and Princess:

P(Crown) = P(Mermaid\land Crown) + P(Princess\land Crown)

P(Crown)=P(Mermaid)P(Crown \mid Mermaid)+P(Princess)P(Crown \mid Princess)

(With our numbers, this is 0.75 * 0.35 + 0.25 * 0.65, or 0.425. This is the marginal overall probability that a fairy creature—mermaid or princess—has a crown.)
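To check that arithmetic, here is a small sketch of the sum over the two cases. The function name `total_probability` and the dictionary layout are my own illustrative choices, and I use exact fractions so that rounding does not muddy the result:

```python
from fractions import Fraction


def total_probability(priors, likelihoods):
    """Sum P(H) * P(E | H) over every hypothesis H."""
    return sum(priors[h] * likelihoods[h] for h in priors)


# P(Mermaid) and P(Princess), taking "about 3 times as many" as exactly 3:1.
priors = {"Mermaid": Fraction(3, 4), "Princess": Fraction(1, 4)}
# P(Crown | Mermaid) and P(Crown | Princess).
crown_given = {"Mermaid": Fraction(35, 100), "Princess": Fraction(65, 100)}

p_crown = total_probability(priors, crown_given)
print(p_crown, float(p_crown))  # 17/40 0.425
```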

We can express the first equation the other way:

P(Crown\land Mermaid) = P(Crown) P(Mermaid \mid Crown)

But that’s the same as the first quantity, so:

P(Mermaid) P(Crown \mid Mermaid) = P(Crown) P(Mermaid \mid Crown)

Now we can isolate the thing we want to know:

\displaystyle P(Mermaid \mid Crown) = \frac{P(Mermaid) P(Crown \mid Mermaid)}{P(Crown)} = \frac{0.2625}{0.425} \approx 0.62

Notice that this is the probability that it’s “Mermaid and Crown” divided by the probability that it’s a Crown, which makes total sense.
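The same "joint divided by marginal" recipe can be applied to every kind of creature at once. The sketch below is only an illustration under the numbers used above; the helper name `posterior` is hypothetical, not something from a library:

```python
from fractions import Fraction


def posterior(priors, likelihoods):
    """Return P(H | E) for each hypothesis H via Bayes's Theorem."""
    # Joint probabilities P(H and E) = P(H) * P(E | H).
    joint = {h: priors[h] * likelihoods[h] for h in priors}
    # Marginal P(E): the sum of the joints over all hypotheses.
    evidence = sum(joint.values())
    # Each posterior is its joint divided by the marginal.
    return {h: joint[h] / evidence for h in joint}


priors = {"Mermaid": Fraction(3, 4), "Princess": Fraction(1, 4)}
crown_given = {"Mermaid": Fraction(35, 100), "Princess": Fraction(65, 100)}

post = posterior(priors, crown_given)
for creature, p in post.items():
    print(f"P({creature} | Crown) = {p} = {float(p):.4f}")
```

With exact fractions, P(Mermaid | Crown) comes out as 21/34, about 0.62, and the two posteriors sum to 1, as they should.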

So what’s the problem?

Basically, this is way too complicated. The minute I take it out of context, or even very far from the ability to look at the picture, I get amazingly flummoxed by the abstraction. I mean,

\displaystyle P(A \mid B) = \frac{P(A)P(B \mid A)}{P(B)}

just doesn’t roll off the tongue. I have to look it up in a way that I never have to with Pythagoras, or the quadratic formula, or rules of logs (except for changing bases, which feels exactly like this), or equations in kinematics.

So: Is there actually a better way to look at it so that the abstractions make more intuitive sense? Or do I just have to get used to it? This may, after all, be one of those places that abstraction is for: giving us the tools to keep track of things when they don’t make intuitive sense, so that we can grind through to the end and then see how the result applies to reality.

Stay tuned.

Author: Tim Erickson

Math-science ed freelancer and sometime math and science teacher. Currently working on various projects.

5 thoughts on “A Closet Bayesian”

  1. I find it easiest just to keep coming back to the definition of conditional probability P(A|B) = P(A & B) / P(B). There is no complexity here—where Bayesian stuff gets complicated is in combining prior beliefs (which are distributions over parameters of models) with likelihood models and data to get posterior distributions over parameters of models, then integrating everything to get mean estimates of interesting functions.

  2. I’m coming from a similar place as you are except with less experience. I’m really looking forward to the rest of the posts as you convince yourself both how and why to do the Bayesian thing, as I now like to call it.
