Hi, I have a question in Section 11.2.2 N-gram language models in SPOKEN LANGUAGE PROCESSING by Huang et al.
On page 553, just above "P(Mary loves that person)" example, authors say that "to make the sum of the probabilities of all strings equal 1, it is necessary to place a distinguished token at the end of the sentence"
Why is that so?
Consider a language where there are only two sentences. "yes no" and "no yes"
S1 = <s> yes no
S2 = <s> no yes
P(S1) = P(yes|<s>)*P(no|yes,<s>)
= Count(<s> yes)/Count(<s>) * Count(<s> yes no)/Count(<s> yes)
= 1/2 * 1/1
P(S2) = P(no|<s>)*P(yes|no,<s>)
= 1/2 * 1/1
There seems to be no need for </s> to make P(S1)+P(S2) = 1. What am I missing?
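For concreteness, here is a small Python sketch of the same counting (my own illustration, not code from the book; the sentence_prob helper is hypothetical):

```python
from collections import Counter

# Toy corpus: both training sentences have the same length.
corpus = [["<s>", "yes", "no"], ["<s>", "no", "yes"]]

# Count every prefix of every sentence so we can form
# Count(history + word) / Count(history) ratios.
counts = Counter()
for sent in corpus:
    for i in range(1, len(sent) + 1):
        counts[tuple(sent[:i])] += 1

def sentence_prob(sent):
    """Chain-rule probability using the prefix counts above."""
    p = 1.0
    for i in range(1, len(sent)):
        p *= counts[tuple(sent[:i + 1])] / counts[tuple(sent[:i])]
    return p

s1 = ["<s>", "yes", "no"]
s2 = ["<s>", "no", "yes"]
print(sentence_prob(s1), sentence_prob(s2))  # 0.5 and 0.5 -> sum to 1
```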
Consider a language where there are sentences of different lengths: you need to be able to estimate the probability of the short sentence as well. Without </s>, the short sentence is also a prefix of a longer one, so their probabilities overlap and the total over all sentences exceeds 1.
Thanks, I got it.
If
S1 = <s> no
S2 = <s> no yes
P(S1)+P(S2) = 1 + 1*0.5 = 1.5
So we need </s> at the end so that
P(S1) = 1 x 0.5
P(S2) = 1 x 0.5 x 1
P(S1)+P(S2) = 1
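A quick Python check of that arithmetic, reusing the same prefix-counting idea as in the sketch above (just an illustration for this toy corpus, not from the book):

```python
from collections import Counter

def sentence_prob(corpus, sent):
    # Count(prefix) for every prefix in the training corpus,
    # then apply the chain rule with those counts.
    counts = Counter(tuple(s[:i]) for s in corpus
                     for i in range(1, len(s) + 1))
    p = 1.0
    for i in range(1, len(sent)):
        p *= counts[tuple(sent[:i + 1])] / counts[tuple(sent[:i])]
    return p

# Without </s>: probabilities of the two sentences sum to 1.5.
no_end = [["<s>", "no"], ["<s>", "no", "yes"]]
print(sum(sentence_prob(no_end, s) for s in no_end))      # 1.0 + 0.5 = 1.5

# With </s>: the mass splits 0.5 + 0.5 and sums to 1.
with_end = [["<s>", "no", "</s>"], ["<s>", "no", "yes", "</s>"]]
print(sum(sentence_prob(with_end, s) for s in with_end))  # 0.5 + 0.5 = 1.0
```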