Imagine that I have some prior beliefs about some hypothesis H, encoded as a probability P(H). If I acquire some evidence E, I will update my beliefs to P(H | E). Think of E as finding a picture of Santa in my security camera footage, and H as the hypothesis that “Santa exists”.
Of course, finding no picture is also evidence. Let’s call it notE. The same reasoning that holds for E holds for notE. So if I observe notE, I will update to P(H | notE). The update should be in the opposite direction: if finding a picture of Santa would strengthen my belief in Santa, then not finding one should weaken it. Crucially, while the signs of the two updates are opposite, their magnitudes will in general differ.
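To make the asymmetry concrete before deriving it, here is a minimal Python sketch. The prior and the likelihoods P(E | H) and P(E | notH) are made-up numbers, chosen only for illustration:

```python
# Two Bayesian updates from the same prior: one on seeing E, one on seeing notE.
# All probabilities below are hypothetical, for illustration only.
p_h = 0.01               # prior P(H): belief that Santa exists
p_e_given_h = 0.5        # hypothetical P(E | H): chance of a picture if Santa exists
p_e_given_not_h = 0.001  # hypothetical P(E | notH): chance of a spurious picture

# Total probability of finding a picture.
p_e = p_e_given_h * p_h + p_e_given_not_h * (1 - p_h)

# Bayes' rule for both outcomes.
p_h_given_e = p_e_given_h * p_h / p_e
p_h_given_not_e = (1 - p_e_given_h) * p_h / (1 - p_e)

print(p_h_given_e - p_h)      # ~ +0.82: a large positive update
print(p_h_given_not_e - p_h)  # ~ -0.005: a tiny negative update
```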
We can see why this holds in general by noticing that the union of E and notE coincides with the whole space and their intersection is empty. Therefore

P(H) = P(H | E) P(E) + P(H | notE) P(notE)
Next, I can rewrite P(H) as P(H)P(E) + P(H)P(notE) because P(E) + P(notE) = 1. Substituting, I get

P(H) P(E) + P(H) P(notE) = P(H | E) P(E) + P(H | notE) P(notE)
Leading to

(P(H | E) - P(H)) P(E) + (P(H | notE) - P(H)) P(notE) = 0
This last equation says, among other things, that if the update in my belief in Santa upon finding a picture of Santa in my surveillance camera footage is positive, then the update upon not finding a picture must be negative. Just rearrange the terms as follows to see it:

P(H | notE) - P(H) = -(P(H | E) - P(H)) P(E) / P(notE)
This holds, of course, under the assumption that it is possible that no picture of Santa shows up, so that P(notE) stays away from zero and the ratio does not diverge.
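As a sanity check, a few lines of Python (reusing the made-up numbers from the sketch above) confirm that the two sides of the rearranged equation agree:

```python
# Numerical check of the identity
# P(H | notE) - P(H) = -(P(H | E) - P(H)) P(E) / P(notE),
# with the same hypothetical numbers as in the sketch above.
p_h, p_e_given_h, p_e_given_not_h = 0.01, 0.5, 0.001

p_e = p_e_given_h * p_h + p_e_given_not_h * (1 - p_h)
p_not_e = 1 - p_e

p_h_given_e = p_e_given_h * p_h / p_e
p_h_given_not_e = (1 - p_e_given_h) * p_h / p_not_e

lhs = p_h_given_not_e - p_h
rhs = -(p_h_given_e - p_h) * p_e / p_not_e
assert abs(lhs - rhs) < 1e-12  # the identity holds up to rounding error
print(lhs, rhs)                # both ~ -0.005
```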
Some people say that “absence of evidence is not evidence of absence”. Our last equation seems to flat out contradict them. Yet intuitively this saying seems reasonable. What gives?
Well, if you are not a kid, and unless your neighbours like to walk across your lawn dressed up as Santa, P(E) is very small, which by the last equation makes the magnitude of the update P(H|notE) - P(H) very small in turn. If you actually have to keep a record of your belief in Santa, you need to write down P(H) on some kind of medium. This gives you limited precision: there is a smallest non-zero number you can store in a limited number of bits.
Now, given that you hold billions of beliefs in addition to believing in Santa, and that your storage space is limited, at some point you will have to decide that an update to a belief is too small to bother with. So P(H|notE) - P(H) may be effectively zero for the practical purposes of your storage system, even if the opposite update P(H|E) - P(H) is substantial.
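Here is a minimal sketch of that idea, assuming beliefs are stored as 8-bit fixed-point numbers; the bit width and all the probabilities are made up for illustration:

```python
# A sketch of limited-precision belief storage: beliefs are kept on an
# 8-bit grid, so updates smaller than the grid spacing (1/255) can vanish.
def quantize(p, bits=8):
    """Round p to the nearest representable value on a (2**bits - 1) grid."""
    levels = 2 ** bits - 1
    return round(p * levels) / levels

p_h = 0.012               # hypothetical prior P(H)
p_e_given_h = 0.04        # weak evidence channel: Santa is rarely photographed
p_e_given_not_h = 0.0001  # spurious pictures are rarer still

p_e = p_e_given_h * p_h + p_e_given_not_h * (1 - p_h)
p_h_given_e = p_e_given_h * p_h / p_e
p_h_given_not_e = (1 - p_e_given_h) * p_h / (1 - p_e)

print(quantize(p_h_given_not_e) == quantize(p_h))  # True: the tiny update is lost in rounding
print(quantize(p_h_given_e) == quantize(p_h))      # False: the big update survives
```

With these numbers the downward update is about -0.0005, well below the grid spacing of 1/255 ≈ 0.004, so the stored belief does not move at all; the upward update jumps by roughly 200 grid steps.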
For the practical purposes of survival and reproduction, certain beliefs are more important than others, so it is reasonable that we keep track of the former in far greater detail than the latter. For the less important ones, “absence of evidence is not evidence of absence” is a good enough heuristic.
If this sounds like an argument that it is rational to hold false[1] beliefs under certain circumstances, it’s because it is. Can you find any bug in it?
[1] I should say approximate beliefs rather than false; but if I have one bit only, then 1/2 is about 0. The approximation can become very bad if I decide to devote a very limited amount of space to modelling a certain corner of the world.
Possibly relevant: https://johncarlosbaez.wordpress.com/2014/10/30/sensing-and-acting-under-information-constraints/#comments