<h1>Bayes' Theorem</h1>
<p><em>Posted December 23, 2012</em></p>
<p>Today I'd like to talk about Bayes' Theorem, especially since it's come up in the comments section several times. It's named after <a href="http://en.wikipedia.org/wiki/Thomas_Bayes">St. Thomas Bayes</a> (rhymes with "phase"). It can be used as a general framework for evaluating the probability of some hypothesis about the world, given some evidence and your background assumptions about the world.</p>
<p>Let me illustrate it with a specific and very non-original example. The police find the body of someone who was murdered! They find DNA evidence on the murder weapon, so they analyze the DNA and compare it to their list of suspects. They have a huge computer database containing 100,000 people who have previously had run-ins with the law. They find a match! Let's say that the DNA test gives a false positive only one out of every million (1,000,000) times.</p>
<p>So the prosecutor hauls the suspect into court. He stands up in front of the jury. "There's only a one in a million chance that the test is wrong!" he thunders, "so he's guilty beyond a reasonable doubt; you must convict."</p>
<p>The problem here, colloquially known as the <a href="http://en.wikipedia.org/wiki/Prosecutor%27s_fallacy">prosecutor's fallacy</a>, is a misuse of the concept of <em>conditional probability</em>, that is, the probability that something is true given something else. We write the conditional probability as
$$P(A\,|\,B)$$: the probability that $$A$$ is true <em>if</em> it turns out that $$B$$ is true. It turns out that $$P(A\,|\,B)$$ is not, in general, the same thing as $$P(B\,|\,A)$$.</p>
<p>When we say that the rate of false positives is 1 in a million, we mean that $$!P(\mathrm{DNA\,match}\,|\,\mathrm{innocent}) = .000001$$ (note that I'm writing probabilities as numbers between 0 and 1, rather than as percentages between 0 and 100). However, the probability of innocence given a match is not the same concept: $$!P(\mathrm{innocent}\,|\,\mathrm{DNA\,match}) \neq .000001.$$ The reason for this error is easy to see. The police database contains 100,000 names, which is 10% of a million. That means that even if all 100,000 people are innocent, there is still a probability of nearly .1 that some poor sucker on the list will get a false positive. (It's slightly less than .1, actually, because sometimes there are multiple false positives, but I'm going to ignore this since it's a small correction.)</p>
<p>Suppose that there's a .5 chance that the guilty person is on the list, and a .5 chance that he isn't. Then <em>prior</em> to doing the DNA test, the odds that any given person on the list is guilty are only 1 : 200,000. The positive DNA test makes that person's guilt a million times more likely, but this only raises the odds to 1,000,000 : 200,000, or 5 : 1. So the suspect is guilty with only 5/6 probability. That's not beyond a reasonable doubt. (And that's before considering the possibility of identical twins and other close relatives…)</p>
<p>Things would have been quite different if the police had <em>any other specific evidence</em> that the suspect is guilty. For example, suppose that the suspect was seen near the scene of the crime 45 minutes before it was committed. Or suppose that the suspect was the murder
victim's boyfriend. Suppose that evidence like this raises the prior odds of guilt to 1 : 100. That's weak circumstantial evidence. But in <em>conjunction</em> with the DNA test, the ratio becomes 1,000,000 : 100, which corresponds to a .9999 probability of guilt. Intuitively, we think that the circumstantial evidence is weak because it could easily be compatible with innocence. But if it has the effect of putting the person into a much smaller pool of potential suspects, then in fact it raises the probability of guilt by many orders of magnitude. Then the DNA evidence clinches the case.</p>
<p>So you have to be careful when using conditional probabilities. Fortunately, there's a general rule for how to do it. It's called Bayes' Theorem, and I've already used it implicitly in the example above. It's a basic result of probability theory which goes like this: $$!P(H\,|\,E) = \frac{P(H)\,P(E\,|\,H)}{P(E)}.$$ The way we read this is as follows: if we want to know the probability of some <em>hypothesis</em> $$H$$ given some evidence $$E$$ which we just observed, we start by asking what the <em>prior probability</em> $$P(H)$$ of the hypothesis was before taking data. Then we ask for the <em>likelihood</em> $$P(E\,|\,H)$$: the probability that, if the hypothesis $$H$$ were true, we'd see the evidence $$E$$ that we did. We multiply these two numbers together.</p>
<p>Finally, we divide by the probability $$P(E)$$ of observing that evidence $$E$$. This just ensures that the probabilities all add up to 1. The rule may seem a little simpler if you think in terms of probability ratios for a complete set of mutually exclusive rival hypotheses $$(H_1,\,H_2,\,H_3,\ldots)$$ for explaining the same evidence $$E$$. The prior probabilities $$P(H_1) + P(H_2) + P(H_3) + \ldots$$ all add up to 1. $$P(E\,|\,H)$$ is a number between 0 and 1 which
<em>lowers</em> the probability of each hypothesis, by more or less depending on how unlikely that hypothesis said $$E$$ was. If $$H_n$$ says that $$E$$ is certain, its probability remains the same; if $$H_n$$ says that $$E$$ is impossible, its probability drops to 0; otherwise it ends up somewhere in between. The resulting probabilities add up to less than 1, and $$P(E)$$ is just the number you have to divide by to make everything add up to 1 again.</p>
<p>If you're comparing two rival hypotheses, $$P(E)$$ doesn't matter for calculating their relative odds, since it's the same for both of them. It's easiest to just compare the probability ratios of the rival hypotheses, because then you don't have to figure out what $$P(E)$$ is. You can always figure it out at the end by requiring everything to add up to 1.</p>
<p>For example, let's say that you have a coin, and you know it's either fair ($$H_1$$) or a double-header ($$H_2$$). Double-headed coins are a lot rarer than regular coins, so maybe you'll start out thinking that the odds are 1000 : 1 that it's fair (i.e. $$P(H_2) = 1/1,001$$). You flip it and get heads. This is twice as likely if it's a double-header, so the odds ratio drops down to 500 : 1 (i.e. $$P(H_2) = 1/501$$). A second heads will make it 250 : 1, and a third will make it 125 : 1 (i.e.
$$P(H_2) = 1/126$$). But then you flip a tails, and the odds become 1 : 0: tails is impossible on a double-header, so that hypothesis is eliminated entirely.</p>
<p>If that's still too complicated, here's an even easier way to think about Bayes' Theorem. Suppose we imagine making a list of <em>every way that the universe could possibly be</em>. (Obviously we could never really do this, but at least in some cases we can list every possibility we actually care about, for some particular purpose.) Each of us has a <em>prior</em>, which tells us how unlikely each possibility is (essentially, this is a measure of how surprised you'd be if that possibility turned out to be true). Now we learn the result of some measurement $$E$$. Since a complete description of the universe should include what $$E$$ is, the likelihood of measuring $$E$$ on each possibility has to be either 0 or 1. Now we simply eliminate all of the possibilities that we've ruled out, and rescale the probabilities of all the other possibilities so that they add up to 1. That's equivalent to Bayes' Theorem.</p>
<p>I would have liked to discuss the philosophical aspects of the Bayesian attitude towards probability theory, but this post is already too long without it! Some other time, maybe. In the meantime, try this old discussion <a href="http://math.ucr.edu/home/baez/bayes.html">here</a>.</p>
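For readers who want to check the odds arithmetic in the DNA example, here is a minimal sketch in Python. The `update_odds` helper and the variable names are my own illustrative choices, not part of the original discussion; the numbers are the ones used above.

```python
from fractions import Fraction

def update_odds(prior_odds, likelihood_ratio):
    """Multiply prior odds (guilty : innocent) by the likelihood ratio of the evidence."""
    return prior_odds * likelihood_ratio

# Prior: a .5 chance the guilty party is among the 100,000 people in the database,
# so any given listed person is guilty with odds of about 1 : 200,000.
prior = Fraction(1, 200_000)

# A DNA match is a million times more likely if the suspect is guilty
# (false positive rate of 1 in 1,000,000).
posterior = update_odds(prior, 1_000_000)
print(posterior)                      # 5  (i.e. odds of 5 : 1)
print(posterior / (1 + posterior))    # 5/6  (odds converted to a probability)

# With the extra circumstantial evidence instead (prior odds 1 : 100):
posterior2 = update_odds(Fraction(1, 100), 1_000_000)
print(float(posterior2 / (1 + posterior2)))   # ~ 0.9999
```

Using exact `Fraction` arithmetic avoids any rounding in the odds, which makes it easy to see that the bare DNA match yields exactly the 5/6 probability of guilt discussed above.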
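The coin-flipping example can likewise be traced step by step by carrying a pair of unnormalized odds weights, one per hypothesis. Again this is only a sketch; the `flip` function and the representation of odds as a (fair, double-header) pair are my own.

```python
from fractions import Fraction

def flip(odds, result):
    """Multiply each hypothesis's odds weight by its likelihood for one flip."""
    fair, double = odds
    p_fair = Fraction(1, 2)                  # a fair coin gives either face with probability 1/2
    p_double = 1 if result == "H" else 0     # a double-header always gives heads
    return (fair * p_fair, double * p_double)

odds = (Fraction(1000), Fraction(1))         # start at 1000 : 1 in favor of fair
for r in "HHH":
    odds = flip(odds, r)

print(odds[0] / odds[1])             # 125  (odds of 125 : 1 after three heads)
print(odds[1] / (odds[0] + odds[1])) # 1/126  (probability of a double-header)

odds = flip(odds, "T")
# The double-header weight is now 0: tails rules that hypothesis out completely,
# which is the "1 : 0" endpoint described above.
print(odds[1])   # 0
```

Note that the common factor of 1/2 applied to the fair hypothesis on every flip is exactly the part that $$P(E)$$ would absorb; only the ratio between the two weights matters.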