What are Bayesian networks, and why is HTM a kind of Bayesian network?

Before we can understand what Bayesian networks are, we must first understand Bayes' theorem.

While searching the internet for a didactic explanation of this theorem, I found the following article, which is easy to follow even for those without a mathematical background (I changed its example to cards in a deck, which is more intuitive):

Visualizing Bayes’ theorem

One of the easiest ways to understand probabilities is to think of them in terms of Venn Diagrams. You basically have a Universe with all the possible outcomes (of an experiment for instance), and you are interested in some subset of them, namely some event. Say we are studying cards from a deck, so we observe cards and see whether they are kings or not. If we take as our Universe all cards from our deck, then there are two possible outcomes for any particular card: either it is a king or not. We can then split our universe in two events: the event “cards that are kings” (designated as A), and “cards that are not kings” (or ~A). We could build a diagram like this:

So what is the probability that a randomly chosen card is a king? It is just the number of elements in A divided by the number of elements of U (the Universe). We denote the number of elements of A as |A|, and read it as the cardinality of A. We then define the probability of A, P(A), as:

P(A) = |A| / |U|

Since A has 4 elements (one king each for hearts, spades, etc.) and U has 52 elements (all cards), the probability P(A) is 4/52, i.e. about 7.7%.

Good so far? Okay, let’s add another event. Let's say we want to know the probability of drawing a spade. We take the event B to mean “cards that are spades”. We can create another diagram:

So what is the probability that the suit will be spades for a randomly selected card? It is the number of elements of B (the cardinality of B, or |B|) divided by the number of elements of U. We call this P(B), the probability of event B occurring:

P(B) = |B| / |U|

Since B has 13 elements (all the spades in the deck) and U has 52 elements (all cards), the probability P(B) is 13/52, i.e. 25%.
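The two counts so far can be checked by enumerating the deck. This is only a minimal Python sketch; the names RANKS, SUITS and prob are chosen here for illustration and do not come from the article:

```python
from fractions import Fraction
from itertools import product

# Illustrative names: a standard 52-card deck as (rank, suit) pairs.
RANKS = ["A", "2", "3", "4", "5", "6", "7", "8", "9", "10", "J", "Q", "K"]
SUITS = ["hearts", "diamonds", "clubs", "spades"]
U = set(product(RANKS, SUITS))  # the Universe: all cards

A = {card for card in U if card[0] == "K"}       # cards that are kings
B = {card for card in U if card[1] == "spades"}  # cards that are spades

def prob(event, universe=U):
    """P(event) = |event| / |universe|, kept as an exact fraction."""
    return Fraction(len(event), len(universe))

p_a = prob(A)
p_b = prob(B)
print(len(U), len(A), len(B))  # 52 4 13
print(p_a, p_b)                # 1/13 1/4
```

Using Fraction keeps the results exact, so 4/52 shows up in lowest terms as 1/13 (about 7.7%) and 13/52 as 1/4 (25%).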

Note that so far, we have treated the two events in isolation. What happens if we put them together?

We can compute the probability of both events occurring (AB is a shorthand for A∩B) in the same way:

P(AB) = |AB| / |U|

But this is where it starts to get interesting. What can we read from the diagram above?

We are dealing with an entire Universe (all cards), the event A (cards that are kings), and the event B (cards that are spades). There is also an overlap now, namely the event AB which we can read as “cards that are both kings and spades”. There is also the event B – AB or “cards that are not kings but are spades”, and the event A – AB or “cards that are kings but are not spades”.

Now, the question we’d like answered is “given that the suit is spades for a randomly selected card, what is the probability that said card also is a king?”. In terms of our Venn diagram, that translates to “given that we are in region B, what is the probability that we are in region AB?” or stated another way “if we make region B our new Universe, what is the probability of A?”. The notation for this is P(A|B) and it is read “the probability of A given B”.

So what is it? Well, it should be:

P(A|B) = |AB| / |B|

And if we divide both the numerator and the denominator by |U|:

P(A|B) = (|AB| / |U|) / (|B| / |U|)

we can rewrite it using the previously derived equations as:

P(A|B) = P(AB) / P(B)

What we’ve effectively done is change the Universe from U (all cards), to B (cards for which the suit is spades), but we are still dealing with probabilities defined in U.

Now let’s ask the converse question: “given that a randomly selected card is a king (event A), what is the probability that the suit is spades for that card (event B)?”. It’s easy to see that it is:

P(B|A) = P(AB) / P(A)

Since only 1 of the 52 cards is the king of spades, we have P(AB) = 1/52. Since there are four kings, one per suit, we have P(A) = 4/52. Thus, P(B|A) = (1/52) / (4/52) = 1/4, i.e. 25%.
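Both conditional probabilities can also be obtained by simply shrinking the Universe and counting. The following Python sketch assumes a standard 52-card deck; the variable names are illustrative:

```python
from fractions import Fraction
from itertools import product

RANKS = ["A", "2", "3", "4", "5", "6", "7", "8", "9", "10", "J", "Q", "K"]
SUITS = ["hearts", "diamonds", "clubs", "spades"]
U = set(product(RANKS, SUITS))

A = {card for card in U if card[0] == "K"}       # kings
B = {card for card in U if card[1] == "spades"}  # spades
AB = A & B                                       # the king of spades

# Counting inside the reduced Universe.
p_a_given_b = Fraction(len(AB), len(B))  # "B is the new Universe"
p_b_given_a = Fraction(len(AB), len(A))  # "A is the new Universe"

# The same values as ratios of probabilities defined in U.
p_ab = Fraction(len(AB), len(U))
p_a = Fraction(len(A), len(U))
p_b = Fraction(len(B), len(U))

print(p_a_given_b, p_ab / p_b)  # 1/13 1/13
print(p_b_given_a, p_ab / p_a)  # 1/4 1/4
```

The two ways of computing each value agree, which is exactly the step of dividing numerator and denominator by |U|.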

Now we have everything we need to derive Bayes’ theorem. Putting those two equations together we get:

P(A|B) P(B) = P(AB) = P(B|A) P(A)

which is to say P(AB) is the same whether you’re looking at it from the point of view of A or B, and finally

P(A|B) = P(B|A) P(A) / P(B)

Which is Bayes’ theorem.

With this we can ask: if we know that the probability of a card being a spade, given that it is a king, is 1/4, how can we obtain the opposite, i.e. the probability that a randomly selected card is a king, given that we know it is a spade?

We have P(B|A) = 1/4, P(A) = 4/52 and P(B) = 13/52. Then P(A|B) = (1/4 * 4/52) / (13/52) = 1/13. That is, 1 card among the 13 spades is a king, a probability of about 7.7%.
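The same arithmetic can be written as a tiny function. This is a sketch only; the function name bayes is made up here for illustration:

```python
from fractions import Fraction

def bayes(p_b_given_a, p_a, p_b):
    """Bayes' theorem: P(A|B) = P(B|A) * P(A) / P(B)."""
    return p_b_given_a * p_a / p_b

p_a = Fraction(4, 52)         # P(A): the card is a king
p_b = Fraction(13, 52)        # P(B): the card is a spade
p_b_given_a = Fraction(1, 4)  # P(B|A): a spade, given a king

p_a_given_b = bayes(p_b_given_a, p_a, p_b)
print(p_a_given_b)         # 1/13
print(float(p_a_given_b))  # about 0.077
```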

I have found that this Venn diagram method lets me re-derive Bayes’ theorem at any time without needing to memorize it. It also makes it easier to apply it.

Reference: http://oscarbonilla.com/2009/05/visualizing-bayes-theorem/

The above theorem relates to the brain because the more inputs the brain receives, the better its prediction or its solution to a problem becomes.

The great thing is that the cortex does not use Bayesian formulas to perform predictions, but it BEHAVES in a Bayesian manner. Thus, the term: Bayesian neural networks.

This is the case when we're playing guessing games with words. As we receive more hints (inputs) about the target word, we come closer to the answer.

Suppose I ask you to guess a word and give you only one hint (input): "mammal". Your range of possible answers will be quite wide. However, if I give you the next hint, "feline", your range will shrink dramatically. And finally, if I say "mane", you will certainly say: "Lion!!".
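The narrowing in this game can be mimicked with a repeated Bayesian update. The sketch below is a toy, not a model of the cortex: the ANIMALS table is a made-up knowledge base, the prior is assumed uniform, and each hint is assumed to have likelihood 1 for animals that have the trait and 0 for the rest.

```python
from fractions import Fraction

# Hypothetical toy knowledge base: animal -> set of traits.
ANIMALS = {
    "lion": {"mammal", "feline", "mane"},
    "tiger": {"mammal", "feline", "stripes"},
    "cat": {"mammal", "feline"},
    "horse": {"mammal", "mane"},
    "dog": {"mammal"},
    "eagle": {"bird"},
    "shark": {"fish"},
    "snake": {"reptile"},
}

def update(belief, hint):
    """One Bayesian update: posterior is proportional to likelihood * prior.

    Assumed likelihood: 1 if the animal has the hinted trait, else 0.
    """
    unnormalized = {
        animal: p * (1 if hint in ANIMALS[animal] else 0)
        for animal, p in belief.items()
    }
    total = sum(unnormalized.values())
    return {a: p / total for a, p in unnormalized.items() if p > 0}

# Uniform prior: every animal is equally likely before any hint.
belief = {animal: Fraction(1, len(ANIMALS)) for animal in ANIMALS}
history = []
for hint in ["mammal", "feline", "mane"]:
    belief = update(belief, hint)
    history.append((hint, dict(belief)))
    print(hint, {a: str(p) for a, p in belief.items()})
```

With this toy table, five candidates remain after "mammal", three after "feline", and only the lion after "mane".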

Hawkins does not even mention the word Bayesian in his book, but in his example on music (predict next musical note), he gives a classic example of Bayesian reasoning that the brain does automatically.

However, the inputs do not need to follow a sequential order, as in a song. Even if the information arrives scrambled (as in a sequence of eye saccades), the cortex can still narrow the range of predictions in a masterful way.

This is the case in the hangman game, where, as the letters are filled in, the possibilities are reduced until we get the correct word:

E _ E _ _ _ _ _
E _ E _ _ A _ _
E _ E P _ A _ _
E _ E P _ A _ T
E L E P H A N T
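The shrinking set of possibilities can be sketched in Python by filtering a word list against each board. The WORDS list is a small made-up vocabulary, and the matching rule assumes the usual hangman convention that a guessed letter is revealed in every position where it occurs:

```python
# Hypothetical vocabulary of 8-letter words.
WORDS = ["ELEPHANT", "EYEGLASS", "ELEVATOR", "EYESIGHT",
         "EVERYDAY", "EMERALDS", "ELEGANCE", "UMBRELLA"]

def matches(pattern, word):
    """True if word is still consistent with the hangman board."""
    if len(pattern) != len(word):
        return False
    revealed = {c for c in pattern if c != "_"}
    for p, w in zip(pattern, word):
        if p == "_":
            # A revealed letter cannot hide behind a blank.
            if w in revealed:
                return False
        elif p != w:
            return False
    return True

boards = [
    "E _ E _ _ _ _ _",
    "E _ E _ _ A _ _",
    "E _ E P _ A _ _",
    "E _ E P _ A _ T",
    "E L E P H A N T",
]

counts = []
for board in boards:
    pattern = board.replace(" ", "")
    candidates = [w for w in WORDS if matches(pattern, w)]
    counts.append(len(candidates))
    print(board, "->", candidates)
```

With this list, six words fit the first board, two fit the second, and only ELEPHANT remains from the third board onwards.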

You do not need to force the reasoning; the brain simply does it automatically, without you noticing.

There are scientists, such as Karl Friston, who claim that the entire brain can be summarized in a single Bayesian formula. When he says that, I believe he means that all regions, from the lowest to the most abstract, work recursively to reach an answer, reducing the possibilities as new inputs arrive (http://reverendbayes.wordpress.com/2008/05/29/bayesian-theory-in-new-scientist/).

Think about it in your day-to-day life and you will see how Bayesian your brain is.

David Ragazzi

Last edit: David Ragazzi 2013-04-14
