Hello, I'm trying to use a trained model that someone gave me with the ARPA python package. I can't seem to verify any Bayes identities, i.e. P(A | B) = P(B | A) P(A) / P(B).
I would assume that, say, lm.log_p("how are"),
which means
log P(are | how),
gives a result equal to
log P(how | are) + log P(are) - log P(how)
entered as lm.log_p("are how") + lm.log_p("are") - lm.log_p("how").
However, lm.log_p("how are")
returns a value of -1.15 whereas lm.log_p("are how") + lm.log_p("are") - lm.log_p("how")
returns a value of -2.84.
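For reference, this is roughly how I'm loading the model and making these calls (the filename is a placeholder; I'm loading it the way I understood the package's usage):

```python
# Rough sketch of what I'm running; "model.arpa" stands in for the file I was given.
import arpa

models = arpa.loadf("model.arpa")   # parse the ARPA file
lm = models[0]                      # the loader returns a list of models

lhs = lm.log_p("how are")                                      # my reading: log P(are | how)
rhs = lm.log_p("are how") + lm.log_p("are") - lm.log_p("how")  # my reading: log P(how | are) + log P(are) - log P(how)
print(lhs, rhs)                     # -1.15 vs -2.84 with this model
```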
Are my assumptions correct? Am I doing something wrong here?
Thanks.
It doesn't work like that. lm.log_p("how are") is not really log P(are | how) but more the estimate of the probability of seeing both words together in a text corpus, i.e. log P(are | how) + log P(how).
And even that last thing is wrong, since the order of words is important: you need to adjust this with the probability of "are" going after "how" vs. "how" going after "are".
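Roughly, in code (lm being your loaded model; treat this as a sketch of the claim, not gospel):

```python
# Word order matters: these are different events and generally score differently.
print(lm.log_p("how are"))   # "are" coming after "how"
print(lm.log_p("are how"))   # "how" coming after "are" -- not the same quantity
# And the claim above: the bigram score behaves more like the joint estimate,
# log P(are | how) + log P(how), than like the bare conditional log P(are | how).
```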
Thanks Nickolay. So if I want log P(are | how) I should type lm.log_p("how are") - lm.log_p("how"), correct?
No. P("how are") is about the order, like I wrote above. So it is not simply P("how" & "are") but more like P("how" & "are" & "are follows how"), so lm.log_p("how are") - lm.log_p("how") is P(are | how & "are follows how"), not simply P(are | how). There is an extra term that must balance P("how are") and P("are how").
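Plugging in the numbers you quoted (ARPA scores are normally base-10 logs), the gap between your two expressions is that extra term:

```python
# The two values from the top of the thread.
lhs = -1.15   # lm.log_p("how are")
rhs = -2.84   # lm.log_p("are how") + lm.log_p("are") - lm.log_p("how")
print(lhs - rhs)   # about 1.69 in log10, i.e. the naive identity is off by a factor of ~50
```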
I couldn't find the documentation that explained this. Can you point me to it? Thanks.
Not every piece of this world has documentation.
Then I don't quite follow what you are saying. First of all, let me simplify by thinking in probability [0,1] space instead of log-probability space. Suppose again that lm is the arpa model object. You said earlier in your clarification that lm.p("how are") is the estimate of the probability of seeing both words together in a text corpus, accounting for word order. What is the difference between that and the conditional probability Pr(are | how), i.e. the probability of observing "are" in text, given that we have just observed "how" in text?
Also, what is the mathematical definition of Pr("how" & "are" & "are follows how") if not Pr(are | how)?
In probability calculations it is important to properly describe the probability spaces. Say you have position 1, which would be space A1, and the next word at position 2, which would be space A2. You can write P("how are") = P(are | how) * P(how), and you can reduce it to P(how | are) * P(are) by Bayes' rule, but here you need to be careful: in P(how | are) the first word "how" is still from space A1 and the second word "are" is still from space A2, so you cannot really replace it with P("are how"); it is again the same P("how are"), just reformulated.
I.e. when you do the Bayes operation and swap "are" and "how", you cannot really do the same with P("how are").
Bayes' theorem is almost a religious thing these days, but it is actually more complex than it seems to be.
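If it helps, here is a toy counting illustration of the probability-space point (my own example, nothing to do with the arpa package or its internals): within the same ordered-pair distribution, Bayes' rule holds, but the reversed conditional P(first word = "how" | second word = "are") is not the same number as the conditional behind the string "are how".

```python
# Toy illustration: count ordered (position-1, position-2) word pairs from a made-up
# corpus and compare the conditionals that keep getting conflated in this thread.
from collections import Counter

corpus = "how are you how are they are how strange how are you".split()
pairs = list(zip(corpus, corpus[1:]))        # ordered (w1, w2) pairs
pair_counts = Counter(pairs)
w1_counts = Counter(w1 for w1, _ in pairs)   # counts in the first position
w2_counts = Counter(w2 for _, w2 in pairs)   # counts in the second position
total = len(pairs)

p_joint = pair_counts[("how", "are")] / total                      # P(w1="how", w2="are")
p_are_given_how = pair_counts[("how", "are")] / w1_counts["how"]   # P(w2="are" | w1="how")
p_how_given_are = pair_counts[("how", "are")] / w2_counts["are"]   # P(w1="how" | w2="are"), Bayes-reversed
p_how_after_are = pair_counts[("are", "how")] / w1_counts["are"]   # P(w2="how" | w1="are"), the "are how" event

# Bayes' rule holds inside the ordered-pair space:
assert abs(p_joint - p_are_given_how * w1_counts["how"] / total) < 1e-12
assert abs(p_joint - p_how_given_are * w2_counts["are"] / total) < 1e-12

# ...but the Bayes-reversed conditional and the "are how" conditional differ:
print(p_how_given_are, p_how_after_are)      # 0.75 vs 0.25 on this toy corpus
```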