From: Laird B. <lb...@us...> - 2004-03-29 06:08:00
On Mar 29 2004, Raul Miller wrote:
> > crm114 says:
> > Assume Plocal-spam(w) to mean that
> >
> >   Plocal-spam(w) = P({document d contains w}| document d is spam)
>
> Ok, well, I'm pretty sure that's backwards.

I don't think so. If the local probabilities had another interpretation,
then the Bayesian chain rule would have to be applied differently.

But you can also check the plateau paper again. On p. 2 the expression is

  P(in class|feature) = P(feat|in class)P(in class) /
      ( P(feat|in class)P(in class) + P(feat|not in class)P(not in class) )

and if you compare with the code in crm114.c, lines 9141 and 9073, you
see that P(feat|in class) = Plocal-spam(feat), which is what I'm saying
at the top of this email.

> Either that, or I don't understand the P(|) notation.

Possible. Here's the normal definition:

  P(A|B) = P(A and B)/P(B)

Note that this isn't symmetrical in A and B.

> > left. In words, Plocal-spam(w) means "if a document is spam, then
> > what's its probability of containing w?".
>
> That may be what the documentation says, but examination of the associated
> formula shows that Plocal-spam(w) really means "if a document contains w,
> what is the probability that it is spam?"
>

It seems that you've simply misunderstood the notation, which would
explain our difficulty communicating. It looks like you've interpreted
P(A|B) to mean something like P(B|A).

--
Laird Breyer.
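As an aside, both points (that P(A|B) is not symmetrical, and that the p. 2 formula turns P(feat|in class) into P(in class|feat)) can be checked by counting on a toy example. This is only a minimal Python sketch with made-up document counts; the names docs, prob, is_spam, not_spam and has_w are hypothetical and have nothing to do with the code in crm114.c.

```python
from fractions import Fraction

# Hypothetical toy corpus: one (is_spam, contains_w) pair per document.
docs = (
    [(True, True)] * 3      # spam containing w
    + [(True, False)] * 1   # spam without w
    + [(False, True)] * 3   # non-spam containing w
    + [(False, False)] * 3  # non-spam without w
)

def prob(event, given=lambda d: True):
    """P(event | given) = P(event and given) / P(given), by counting."""
    pool = [d for d in docs if given(d)]
    return Fraction(sum(1 for d in pool if event(d)), len(pool))

def is_spam(d):
    return d[0]

def not_spam(d):
    return not d[0]

def has_w(d):
    return d[1]

# "If a document is spam, what is its probability of containing w?"
p_w_given_spam = prob(has_w, is_spam)

# "If a document contains w, what is the probability that it is spam?"
p_spam_given_w = prob(is_spam, has_w)

# Bayes' rule in the form quoted from p. 2 of the plateau paper.
numerator = p_w_given_spam * prob(is_spam)
bayes = numerator / (numerator + prob(has_w, not_spam) * prob(not_spam))

print("P(w|spam)      =", p_w_given_spam)
print("P(spam|w)      =", p_spam_given_w)
print("via Bayes rule =", bayes)
```

With these counts the two conditional probabilities come out different, and the Bayes expression reproduces P(spam|w) when it is fed P(w|spam) as the local probability.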