
Bayes Rule identities in ARPA

  • Doug Bergman - 2020-04-28

    Hello, I'm trying to use a trained model that someone gave me with the ARPA python package. I can't seem to verify any Bayes identities, i.e. P(A | B) = P(B | A) P(A) / P(B).
    I would assume that, say,
    lm.log_p("how are"),
    which means
    log P(are | how),
    gives a result equal to
    log P(how | are) + log P(are) - log P(how)
    entered as
    lm.log_p("are how") + lm.log_p("are") - lm.log_p("how").
    However,
    lm.log_p("how are")
    returns a value of -1.15 whereas
    lm.log_p("are how") + lm.log_p("are") - lm.log_p("how")
    returns a value of -2.84.

    Are my assumptions correct? Am I doing something wrong here?
    Thanks.
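
    For concreteness, here is roughly what I am running (the model filename is a placeholder for the trained file I was given):

        import arpa  # the "arpa" package from PyPI

        # "model.arpa" stands in for the trained model file
        lm = arpa.loadf("model.arpa")[0]  # loadf returns a list of models

        lhs = lm.log_p("how are")  # what I assume is log P(are | how)
        # Bayes: log P(how | are) + log P(are) - log P(how)
        rhs = lm.log_p("are how") + lm.log_p("are") - lm.log_p("how")
        print(lhs, rhs)  # for my model: -1.15 vs. -2.84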

    • Nickolay V. Shmyrev

      It doesn't work like that. lm.log_p("how are") is not really log P(are | how); it is more like an estimate of the probability of seeing both words together in a text corpus, i.e. log P(are | how) + log P(how).

      • Nickolay V. Shmyrev

        And even that last expression is wrong, since the order of the words is important: you need to adjust it with the probability of "are" coming after "how" vs. "how" coming after "are".
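
        A quick way to see the asymmetry (a sketch, assuming the same lm object as above):

            # the two orders are different n-gram entries in the model,
            # so they generally return different values
            print(lm.log_p("how are"))
            print(lm.log_p("are how"))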

  • Doug Bergman - 2020-04-30

    Thanks Nickolay. So if I want log P(are | how) I should type lm.log_p("how are") - lm.log_p("how"), correct?
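
    In code, with the same placeholder lm as before, that would be:

        # proposed: log P(are | how) = log P("how are") - log P("how")
        log_p_are_given_how = lm.log_p("how are") - lm.log_p("how")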

  • Doug Bergman - 2020-04-30

    I couldn't find the documentation that explained this. Can you point me to it? Thanks.

    • Nickolay V. Shmyrev

      > Thanks Nickolay. So if I want log P(are | how) I should type lm.log_p("how are") - lm.log_p("how"), correct?

      No. P("how are") is about the order like I wrote above. So it is not simply P("how" & "are") but more like P("how" & "are" & "are follows how") so log_P("how are") - lm.log_P("how") is P(are | how & "are follows how") not simply P(are | how). There is an extra term that must balance P("how are") and P("are how").

      > I couldn't find the documentation that explained this. Can you point me to it? Thanks.

      Not every piece of this world has documentation.

  • Doug Bergman - 2020-05-01

    Then I don't quite follow what you are saying.

    First of all, let me simplify by thinking in probability [0,1] space instead of log-probability space. Suppose again that lm is the arpa model object. You said earlier in your clarification that lm.p("how are") is an estimate of the probability of seeing both words together in a text corpus, accounting for word order. What is the difference between that and the conditional probability Pr(are | how), i.e. the probability of observing "are" in text, given that we have just observed "how" in text?

    Also, what is the mathematical definition of Pr("how" & "are" & "are follows how") if not Pr(are | how)?

    • Nickolay V. Shmyrev

      In probability calculations it is important to properly describe the probability spaces. Say word position 1 corresponds to space A1 and the next word position 2 corresponds to space A2. You can write P("how are") = P(are | how) * P(how) and reduce it to P(how | are) * P(are) by Bayes' rule, but here you need to be careful: in P(how | are) the first word "how" is still from space A1 and the second word "are" is still from space A2, so you cannot simply replace it with P("are how"). It is again the same P("how are"), just reformulated.

      I.e., when you apply Bayes' rule and swap "are" and "how", you cannot do the same swap inside P("how are").
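
      Spelled out with explicit position variables W1 and W2 (a sketch of the same point):

          P("how are") = P(W1 = how, W2 = are)
                       = P(W2 = are | W1 = how) * P(W1 = how)
                       = P(W1 = how | W2 = are) * P(W2 = are)   (Bayes, same event)

          but P(W1 = how | W2 = are) is not P("are how") / P("are") in general,
          because P("are how") = P(W1 = are, W2 = how), which is a different event.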

      Bayes' theorem is almost a religious thing these days, but it is actually more complex than it seems to be.

