From: Raul M. <mo...@ma...> - 2004-03-30 05:59:03
> > The set of P(B|A) represents independent probability variables for
> > arbitrary B and A. The set of P(A|B) represents probability variables
> > which are not independent of each other. P(A|B) is derivable knowing
> > P(B|A) and P(B).

On Tue, Mar 30, 2004 at 10:50:01AM +1000, Laird Breyer wrote:
> You are still supposing that the numbers P(B|A) of interest can be
> chosen independently first. But if you do, then the family P(A|B) derived
> from them is broken. The only way to ensure the P(A|B) are not broken is
> to *not* take the P(B|A) independently.
>
> It's a circular consistency requirement. You have got to know
> something about the P(A|B) before you can choose the P(B|A), and
> you've got to know the P(B|A) to verify that the P(A|B) are consistent.
> It's a fact of life (or rather, probability theory).

We don't have to know P(A|B) to find P(B|A); we just have to know the
number of documents in each set which contain the relevant feature.

It's true that with this information, and some additional information
(the cardinality of the B set), we could find P(A|B), but I don't see
that this imposes any kind of circular consistency requirement.

More generally, if "knowing something about" a dependent variable meant
that some other variable couldn't be independent, then we'd never be
able to have any independent variables. It's always the case that when
you are able to find the value of an independent variable, you know
something about any associated dependent variables.

-- 
Raul
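
For concreteness, here is a minimal sketch of the estimation Raul
describes, assuming a toy two-class document corpus (the counts and
variable names below are illustrative, not from this thread): P(B|A)
is read straight off the per-class document counts, and P(A|B) then
follows from Bayes' rule once the class sizes are also known.

    # Toy counts (hypothetical): A = "document is spam", B = "document
    # contains the feature". Estimating P(B|A) needs only counts taken
    # within each class; P(A|B) is never consulted.
    n_a, n_not_a = 80, 120            # class sizes |A| and |not A|
    n_b_in_a, n_b_in_not_a = 60, 12   # feature counts within each class

    p_b_given_a = n_b_in_a / n_a               # 60/80  = 0.75
    p_b_given_not_a = n_b_in_not_a / n_not_a   # 12/120 = 0.10

    # With the additional information (class sizes, hence P(A) and P(B)),
    # P(A|B) is derivable, exactly as the quoted text says:
    n_total = n_a + n_not_a
    p_a = n_a / n_total                          # 0.40
    p_b = (n_b_in_a + n_b_in_not_a) / n_total    # 0.36
    p_a_given_b = p_b_given_a * p_a / p_b        # 0.8333...

    # Sanity check: the derived P(A|B) matches the direct count-based
    # estimate n_b_in_a / |B|.
    assert abs(p_a_given_b - n_b_in_a / (n_b_in_a + n_b_in_not_a)) < 1e-12

    print(p_b_given_a, p_a_given_b)

The consistency Laird raises is visible here too: the derived P(A|B)
agrees with the direct estimate only because every number came from
the same underlying counts.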