# Describing Bayes' Theorem succinctly in words

I'm trying to ensure I understand Bayes' Theorem. Can you tell me if this explanation of mine is correct or if it has mistakes? Also, I'm not sure how to write out in words a generalized input question for Prior 2 and for H on E. Here goes:

Imagine reasoners hope to ascertain the probability that a hypothesis, H, is true in light of new belief-altering evidence E. Bayes theorem asks them, before they acquire or consider the new credence-altering evidence E, to assign the probability of H conditional on E. To do so, the reasoners need to first assign unconditional probability to H (one "prior probability" or "prior"), unconditional probability to E (the other "prior probability" or "prior"), and finally a probability to E given H (the "likelihood").

Inputs for Bayes' theorem written in words:

Prior 1 or P(H): What do you believe the probability is that the hypothesis, H, is true?

Prior 2 or P(E): What do you believe ... something

Likelihood or P(E | H): Given H is true, what's the likelihood/probability of E obtaining?

Probability of H conditional on E or P(H | E):

Finally, P(H) is called a prior, P(E) is also called a prior, and P(E | H) is called a likelihood. But what is P(H | E) known as?

Okay, it seems like there is a little confusion here, so we'll walk through the full concept. Suppose we have a hypothesis $H$ and evidence $E$. Bayes' theorem tells us that the probability that $H$ is true given $E$ is:

$$P(H | E) = \frac{P(E|H)P(H)}{P(E)}$$

So let's talk about these terms. First, $P(H|E)$. This is called the posterior. This is the probability that $H$ is true after accounting for the evidence $E$. Next, $P(E|H)$. As you said correctly, this is the likelihood, which answers the question: "what is the probability of observing $E$ given that $H$ is true?" So we assume that $H$ is true and determine how likely it is that $E$ is observed. Next we have $P(H)$, which is the term that we refer to as the prior for $H$. This is the probability that you assign to $H$ being true before you see $E$.

These three terms are the most important for Bayes' theorem (we'll get to the denominator in a second); in fact, we often state Bayes' theorem in its proportional form as:

$$\mathrm{Posterior} \propto \mathrm{Likelihood}\times\mathrm{Prior}$$

Here $\propto$ means "proportional to". Now you are probably asking "What about $P(E)$? Don't we need that?" To calculate the posterior probability exactly, yes. But there are some interesting ways that we can deal with it. As to what it is, $P(E)$ is just the unconditional probability of observing $E$, with no reference to any particular hypothesis. It is not typically referred to as a prior; it is usually called the evidence or the marginal likelihood. The best way to reason about it is to use something called the Law of Total Probability, which lets us write:

$$P(E) = \sum_i P(E | H_i) P(H_i)$$

In words, this says that the probability of $E$ is the sum, over all possible hypotheses $H_i$, of the likelihood of $E$ under each hypothesis, weighted by the prior probability of that hypothesis. (This requires the $H_i$ to be mutually exclusive and to cover every possibility.) You'll note that each term in the sum looks just like the numerator, except that we have more than one hypothesis. This is important. When we talk about the probability of a hypothesis being true, it doesn't exist independently. It is only considered in comparison to a whole set of other possible hypotheses. Those other hypotheses have priors associated with them, just as they have likelihoods associated with them, but once you have specified all the priors for the different hypotheses and evaluated the likelihoods, you have $P(E)$. There is no need to specify it separately. In light of this, we can rewrite Bayes' theorem as follows (we will say $H_0$ is the hypothesis we care about):

$$P(H_0 | E) = \frac{P(E|H_0)P(H_0)}{\sum_i P(E | H_i) P(H_i)}$$
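The formula above translates almost directly into code. The following is only a minimal sketch: the function name `posteriors`, the dictionary layout, and the three-hypothesis numbers are assumptions made up for illustration, not part of the question.

```python
def posteriors(priors, likelihoods):
    """Return P(H_i | E) for every hypothesis H_i.

    priors:      dict mapping hypothesis name -> P(H_i); should sum to 1
    likelihoods: dict mapping hypothesis name -> P(E | H_i)
    """
    # Numerator of Bayes' theorem for each hypothesis: P(E | H_i) * P(H_i)
    joint = {h: likelihoods[h] * priors[h] for h in priors}
    # Law of Total Probability: P(E) is the sum of those numerators
    evidence = sum(joint.values())
    if evidence == 0:
        raise ValueError("P(E) is zero, so the posterior is undefined")
    return {h: j / evidence for h, j in joint.items()}


# Hypothetical numbers, chosen only to illustrate the call
result = posteriors(
    priors={"H0": 0.5, "H1": 0.3, "H2": 0.2},
    likelihoods={"H0": 0.9, "H1": 0.5, "H2": 0.1},
)
print(result)
```

Note that $P(E)$ never has to be supplied by hand: it falls out of the priors and likelihoods, and dividing by it simply makes the posteriors sum to 1.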

Now this can get pretty complicated very quickly, thinking about the universe of different possible hypotheses and evaluating priors for them, etc. But we can ground it in a real example, so we'll do that.

Suppose you are presented with a bag and you know that it contains 3 colored marbles. There are two possibilities: 2 red and 1 blue, or 2 blue and 1 red. You draw a single marble out of the bag and it is red. What is the probability that the bag contains 2 red and 1 blue marbles?

First, let's get some notation:

$$H_0 : \textrm{2 Red, 1 Blue} \qquad H_1 : \textrm{1 Red, 2 Blue}$$

And $E = \textrm{Drew red}$. Using Bayes' theorem, we write:

$$P(H_0 | E ) = \frac{P(E | H_0)P(H_0)}{P(E)} = \frac{P(E | H_0)P(H_0)}{P(E | H_0)P(H_0) + P(E | H_1)P(H_1)}$$

Let's start defining these terms. The likelihoods are fairly straightforward:

$$P(E|H_0) = \frac{2}{3} \qquad P(E|H_1) = \frac{1}{3}$$

Now we need priors. Since we don't have any reason to believe one hypothesis is more likely than the other, we can assign equal prior weight to each:

$$P(H_0) = P(H_1) = \frac{1}{2}$$

We plug in to get the posterior probability:

$$P(H_0 | E) = \frac{\frac{2}{3}\frac{1}{2}}{\frac{2}{3}\frac{1}{2} + \frac{1}{3}\frac{1}{2}} = \frac{2}{3}$$
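The same arithmetic can be checked with exact fractions. This is just a sketch of the calculation above; the variable names are my own.

```python
from fractions import Fraction

# Likelihoods P(E | H) of drawing red under each hypothesis
likelihood = {"H0": Fraction(2, 3), "H1": Fraction(1, 3)}
# Equal priors P(H), since we have no reason to favor either bag
prior = {"H0": Fraction(1, 2), "H1": Fraction(1, 2)}

# Numerator: P(E | H0) * P(H0) = 1/3
numerator = likelihood["H0"] * prior["H0"]
# Denominator: P(E) by the Law of Total Probability = 1/3 + 1/6 = 1/2
evidence = sum(likelihood[h] * prior[h] for h in prior)

posterior_h0 = numerator / evidence
print(posterior_h0)  # 2/3
```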

This simple example demonstrates a few key principles. We only needed to define priors over our hypotheses; once we did that, we could calculate $P(E)$ without issue. Other than that, we just needed forms for the likelihoods. From there, we just plug in and go.
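If you want an independent sanity check on the marble result, you can also simulate the experiment: repeatedly pick a bag according to the prior, draw a marble, and look only at the trials where the marble was red. The sketch below assumes a 50/50 prior as in the example; the function name `simulate`, the trial count, and the seed are arbitrary choices of mine.

```python
import random


def simulate(trials=100_000, seed=0):
    """Estimate P(H0 | drew red) by brute-force simulation."""
    rng = random.Random(seed)
    red_draws = 0
    red_draws_from_h0 = 0
    for _ in range(trials):
        # Pick a bag according to the 50/50 prior
        bag_is_h0 = rng.random() < 0.5
        bag = ["red", "red", "blue"] if bag_is_h0 else ["red", "blue", "blue"]
        # Draw one marble uniformly at random and keep only the red draws
        if rng.choice(bag) == "red":
            red_draws += 1
            if bag_is_h0:
                red_draws_from_h0 += 1
    # Fraction of red draws that came from the 2 red / 1 blue bag
    return red_draws_from_h0 / red_draws


estimate = simulate()
print(estimate)  # should land close to 2/3
```

Conditioning on $E$ here just means discarding every trial where the evidence was not observed, which is exactly what dividing by $P(E)$ does in the formula.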

In summary, Bayes' theorem allows you to compute a posterior probability $P(H | E)$ based on a likelihood $P(E|H)$ and a prior $P(H)$. The remaining term $P(E)$, the unconditional probability of observing the evidence, is usually calculated from the likelihoods and priors that you've already defined.