From: Raul M. <mo...@ma...> - 2004-03-30 05:59:03
> > The set of P(B|A) represents independent probability variables for
> > arbitrary B and A. The set of P(A|B) represents probability variables
> > which are not independent of each other. P(A|B) is derivable knowing
> > P(B|A) and P(B).

On Tue, Mar 30, 2004 at 10:50:01AM +1000, Laird Breyer wrote:
> You are still supposing that the numbers P(B|A) of interest can be
> chosen independently first. But if you do, then the family P(A|B) derived
> from them is broken. The only way to ensure the P(A|B) are not broken is
> to *not* take the P(B|A) independently.
>
> It's a circular consistency requirement. You have got to know
> something about the P(A|B) before you can choose the P(B|A), and
> you've got to know the P(B|A) to verify that the P(A|B) are consistent.
> It's a fact of life (or rather, probability theory).

We don't have to know P(A|B) to find P(B|A); we just have to know the
number of documents in each set which contain the relevant feature.

It's true that with this information, and some additional information
(the cardinality of the B set), we could find P(A|B), but I don't see
that this imposes any kind of circular consistency requirement.

More generally, if "knowing something about" a dependent variable meant
that some other variable couldn't be independent, then we'd never be
able to have any independent variables. It's always the case that when
you are able to find the value of an independent variable, you know
something about any associated dependent variables.

-- 
Raul
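
For concreteness, here is a minimal sketch of the estimation Raul
describes, assuming a toy two-class document corpus (the counts and
variable names below are illustrative, not from this thread): P(B|A)
is read straight off the per-class document counts, and P(A|B) then
follows from Bayes' rule once the class sizes are also known.

    # Toy counts (hypothetical): A = "document is spam", B = "document
    # contains the feature". Estimating P(B|A) needs only counts taken
    # within each class; P(A|B) is never consulted.
    n_a, n_not_a = 80, 120            # class sizes |A| and |not A|
    n_b_in_a, n_b_in_not_a = 60, 12   # feature counts within each class

    p_b_given_a = n_b_in_a / n_a               # 60/80  = 0.75
    p_b_given_not_a = n_b_in_not_a / n_not_a   # 12/120 = 0.10

    # With the additional information (class sizes, hence P(A) and P(B)),
    # P(A|B) is derivable, exactly as the quoted text says:
    n_total = n_a + n_not_a
    p_a = n_a / n_total                          # 0.40
    p_b = (n_b_in_a + n_b_in_not_a) / n_total    # 0.36
    p_a_given_b = p_b_given_a * p_a / p_b        # 0.8333...

    # Sanity check: the derived P(A|B) matches the direct count-based
    # estimate n_b_in_a / |B|.
    assert abs(p_a_given_b - n_b_in_a / (n_b_in_a + n_b_in_not_a)) < 1e-12

    print(p_b_given_a, p_a_given_b)

The consistency Laird raises is visible here too: the derived P(A|B)
agrees with the direct estimate only because every number came from
the same underlying counts.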