Re: [Crm114-discuss] Re: continuation of current model in the crm114

On Mar 29 2004, Raul Miller wrote:

> > crm114 says: 
> > Assume Plocal-spam(w) to mean that
> > > > Plocal-spam(w) = P({document d contains w}| document d is spam)
> 
> Ok, well, I'm pretty sure that's backwards.

I don't think so. If the local probabilities have another
interpretation, then the Bayesian chain rule would have to be applied
differently. 

But you can also check the plateau paper again: on p. 2 the
expression is 

P(in class|feat) = P(feat|in class)P(in class) /
     ( P(feat|in class)P(in class) + P(feat|not in class)P(not in class) )

and if you compare with the code in crm114.c, lines 9141 and 9073, 
you see that

P(feat|in class) = Plocal-spam(feat),

which is what I'm saying at the top of this email.
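
To make the direction of the conditioning concrete, here is a small
sketch of the p. 2 expression in Python. It is only an illustration:
the function name and all the numbers are made up for this email and
are not taken from crm114.c.

```python
def posterior_in_class(p_feat_given_in, p_feat_given_out, p_in):
    """Bayes' rule: P(in class|feat) from the two likelihoods and the prior."""
    p_out = 1.0 - p_in
    numerator = p_feat_given_in * p_in
    denominator = numerator + p_feat_given_out * p_out
    return numerator / denominator


# Hypothetical values, chosen only for illustration.
plocal_spam = 0.30  # P(feat|in class): how often spam contains the feature
plocal_good = 0.05  # P(feat|not in class): how often non-spam contains it
prior_spam = 0.50   # P(in class)

posterior = posterior_in_class(plocal_spam, plocal_good, prior_spam)
print(posterior)  # P(in class|feat)
```

The local probabilities go in on the right-hand side as P(feat|class);
what comes out on the left is P(class|feat), and the two are different
numbers.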

> Either that, or I don't understand the P(|) notation.

Possible. Here's the normal definition: P(A|B) = P(A and B)/P(B), and
note that this isn't symmetrical in A and B.
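
A quick way to see the asymmetry is to plug counts into the
definition. The sketch below uses an invented corpus (all four counts
are assumptions made up for the example) and exact fractions:

```python
from fractions import Fraction

# Hypothetical corpus counts, invented for this example.
n_docs = 1000        # all documents
n_spam = 400         # documents that are spam
n_with_w = 100       # documents that contain the word w
n_spam_with_w = 80   # documents that are spam AND contain w


def cond(p_a_and_b, p_b):
    """Conditional probability by definition: P(A|B) = P(A and B)/P(B)."""
    return p_a_and_b / p_b


p_spam = Fraction(n_spam, n_docs)
p_w = Fraction(n_with_w, n_docs)
p_both = Fraction(n_spam_with_w, n_docs)

p_w_given_spam = cond(p_both, p_spam)  # P(contains w | spam)
p_spam_given_w = cond(p_both, p_w)     # P(spam | contains w)

print(p_w_given_spam)  # 1/5
print(p_spam_given_w)  # 4/5
```

Same joint probability in the numerator, different denominators, so
P(A|B) and P(B|A) differ whenever P(A) and P(B) differ.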

> 
> > left. In words, Plocal-spam(w) means "if a document is spam, then
> > what's its probability of containing w?". 
> 
> That may be what the documentation says, but examination of the associated
> formula shows that Plocal-spam(w) really means "if a document contains w,
> what is the probability that it is spam?"
> 

It seems that you've simply misunderstood the notation, 
which would explain our difficulty communicating. It looks like
you've interpreted P(A|B) to mean something like P(B|A).

-- 
Laird Breyer.