
Need for </s> (sentence end marker)

dovark
2013-09-28
  • dovark

    dovark - 2013-09-28

    Hi, I have a question about Section 11.2.2, "N-gram Language Models", in SPOKEN LANGUAGE PROCESSING by Huang et al.
    On page 553, just above the "P(Mary loves that person)" example, the authors say that "to make the sum of the probabilities of all strings equal 1, it is necessary to place a distinguished token at the end of the sentence".

    Why is that so?

    Consider a language that has only two sentences: "yes no" and "no yes".

    S1 = <s> yes no
    S2 = <s> no yes

    P(S1) = P(yes|<s>) * P(no|<s> yes)
    = Count(<s> yes)/Count(<s>) * Count(<s> yes no)/Count(<s> yes)
    = 1/2 * 1/1 = 1/2

    P(S2) = P(no|<s>) * P(yes|<s> no)
    = 1/2 * 1/1 = 1/2

    There seems to be no need for </s> to make P(S1)+P(S2) = 1. What am I missing?

     

    Last edit: dovark 2013-09-28
  • Nickolay V. Shmyrev

    What am I missing?

    Consider a language that has sentences of different lengths:

    <s> no
    <s> no yes
    

    You need to estimate the probability of the short sentence, that is, the probability that the sentence ends after "no".

     
  • dovark

    dovark - 2013-09-28

    Thanks, I got it.

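    If
    S1 = <s> no
    S2 = <s> no yes

    then without an end marker
    P(S1) = P(no|<s>) = 1
    P(S2) = P(no|<s>) * P(yes|<s> no) = 1 * 0.5
    P(S1)+P(S2) = 1 + 0.5 = 1.5

    So we need </s> at the end of each sentence, so that
    P(S1) = P(no|<s>) * P(</s>|<s> no) = 1 * 0.5
    P(S2) = P(no|<s>) * P(yes|<s> no) * P(</s>|<s> no yes) = 1 * 0.5 * 1
    P(S1)+P(S2) = 0.5 + 0.5 = 1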
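
The arithmetic in this thread can be checked with a short Python sketch. It assumes the full-history maximum-likelihood estimate used in the posts above, Count(prefix + word)/Count(prefix), and it assumes each listed sentence occurs exactly once in the training data. The function names prefix_counts, sentence_prob, and total_mass are illustrative only; they do not come from the book or from any toolkit.

```python
from collections import Counter
from fractions import Fraction


def prefix_counts(corpus):
    """Count every non-empty prefix of every training sentence."""
    counts = Counter()
    for sentence in corpus:
        for i in range(1, len(sentence) + 1):
            counts[tuple(sentence[:i])] += 1
    return counts


def sentence_prob(sentence, counts):
    """Chain rule: multiply Count(prefix + word) / Count(prefix) over the words."""
    prob = Fraction(1)
    for i in range(1, len(sentence)):
        longer = counts[tuple(sentence[:i + 1])]
        shorter = counts[tuple(sentence[:i])]
        prob *= Fraction(longer, shorter)
    return prob


def total_mass(sentences, end_marker):
    """Sum the model probabilities of the distinct training sentences."""
    corpus = [
        ["<s>"] + text.split() + (["</s>"] if end_marker else [])
        for text in sentences
    ]
    counts = prefix_counts(corpus)
    distinct = {tuple(sentence) for sentence in corpus}
    return sum(sentence_prob(sentence, counts) for sentence in distinct)


without_end = total_mass(["no", "no yes"], end_marker=False)
with_end = total_mass(["no", "no yes"], end_marker=True)
print(without_end, with_end)  # prints: 3/2 1
```

Running the same function on the equal-length language from the first post, total_mass(["yes no", "no yes"], end_marker=False), gives 1. That is why that example did not expose the problem: no sentence there is a proper prefix of another.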

     
