The wiki page http://cmusphinx.sourceforge.net/wiki/sphinx4:standardgrammarformats talks about assigning probabilities to N-grams that are 'Not Listed'. However, the formula for computing the probability of a not-listed N-gram uses P( word_N | word_{N-1}, word_{N-2}, …, word_1 ), which is exactly the value we are trying to compute, since 'word_1, …, word_{N-2}, word_{N-1}, word_N' is the N-gram that is not listed. Could someone please clarify whether my interpretation is right? If not, could you please let me know the approach for assigning a probability to unseen/not-listed N-grams?
Could you please clarify your interpretation first? It's hard to understand you.
For proper documentation on the LM format, it's better to check the SRILM toolkit documentation, for example:
http://www.speech.sri.com/projects/srilm/manpages/ngram-format.5.html
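For illustration, an ARPA-format file as described there looks roughly like the sketch below (all numbers are made up): the first column is the log10 probability of the N-gram, and the optional last column is the log10 backoff weight used when a longer N-gram with this entry as its context is not listed.

```
\data\
ngram 1=3
ngram 2=2
ngram 3=1

\1-grams:
-2.5000 blue	-0.3000
-2.2000 house	-0.4000
-3.0000 door	-0.2000

\2-grams:
-0.5000 blue house	-0.6000
-0.9000 house gate	-0.1000

\3-grams:
-0.2000 blue house gate

\end\
```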
Hello Nickolay, thank you for the response.
However, that's not exactly what I was looking for.
I'll try to be more explicit this time.
Let's say an LM file has a trigram entry for 'blue house gate'. Is there a way I can compute the probability of detection of an unseen term, say 'blue house door'? (Assuming that the bigram for 'blue house' and the unigram for 'door' exist, along with backoffs.)
If yes, how is it done? Also, if we do manage to compute the values, would they be representative of the actual speech-to-text output when the term is spoken?
There is no such thing as "probability of detection". The probability of the unseen trigram (not detection of the trigram, just the probability of the trigram) in the language model is computed by backing off: multiply the backoff weight of the context by the probability of the shortened history. If the bigram 'house door' is listed, P(door | blue house) = Backoff(blue house) * P(door | house); if it is not listed either, the model backs off once more: P(door | blue house) = Backoff(blue house) * Backoff(house) * P(door).
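That lookup can be sketched in a few lines of Python. This is a toy illustration, not Sphinx's actual code: the log10 values are made up, and real models also need out-of-vocabulary (`<unk>`) handling, which is omitted here.

```python
# Toy Katz-backoff lookup over ARPA-style log10 scores.
# All numbers are made up for illustration; a real LM also needs
# <unk>/OOV handling, which is omitted here.
logprob = {
    ("blue",): -2.5, ("house",): -2.2, ("door",): -3.0,
    ("blue", "house"): -0.5, ("house", "gate"): -0.9,
    ("blue", "house", "gate"): -0.2,
}
backoff = {
    ("blue",): -0.3, ("house",): -0.4, ("door",): -0.2,
    ("blue", "house"): -0.6,
}

def cond_logprob(ngram):
    """log10 P(last word | earlier words), backing off when unseen."""
    ngram = tuple(ngram)
    if ngram in logprob:
        return logprob[ngram]
    if len(ngram) == 1:
        raise KeyError("OOV word: %s" % (ngram,))
    # Charge the context's backoff weight and retry with a shorter history.
    bow = backoff.get(ngram[:-1], 0.0)
    return bow + cond_logprob(ngram[1:])

# Unseen trigram 'blue house door' (here the bigram 'house door' is also
# unseen): Backoff(blue house) + Backoff(house) + P(door)
#        = -0.6 + -0.4 + -3.0 = -4.0
print(cond_logprob(("blue", "house", "door")))
```

Since the stored values are log10 probabilities, the "product" in the explanation above becomes a sum of log scores.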
I have no idea what you mean by "representative" here.
Hello Nickolay,
Thank you for clarifying. It's now clear to me how the backoff weight and term probability values are used to compute the probability of unseen terms.
By 'representative' I meant to ask about the correlation between the probability values and the actual recognition output.
Example: if the probability value for Phrase1 (unseen) is X and for Phrase2 (also unseen) is Y, and X > Y, does that mean Phrase1 is more likely to be recognized when spoken than Phrase2? (Given that the pronunciations are as per the dictionary and acoustic model.)
If not, are there other parameters that come into play in determining the recognition output?
Those are components that go into the overall score estimation, not parameters. Parameters are something the user changes, so that word is not relevant here.
The most important component is the acoustic model score; it's way more important than the language model score.
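In the log domain the decoder combines these scores roughly as acoustic score + language weight × LM score + insertion penalty per word. The sketch below is only an illustration of that combination; the parameter values are made up, not Sphinx defaults.

```python
import math

def hypothesis_score(acoustic_logprob, lm_logprob, n_words,
                     lm_weight=9.5, word_insertion_penalty=0.7):
    """Toy log-domain combination of per-hypothesis scores.
    Parameter values here are illustrative, not Sphinx defaults."""
    return (acoustic_logprob
            + lm_weight * lm_logprob
            + n_words * math.log10(word_insertion_penalty))

# Even with the LM score scaled up by lm_weight, a modest acoustic-score
# advantage can outweigh a language-model disadvantage:
print(hypothesis_score(-90.0, -5.0, 3) > hypothesis_score(-100.0, -4.0, 3))
```

This is why two unseen phrases cannot be ranked by their LM probabilities alone: how well each one matches acoustically dominates the comparison.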
Hello Nickolay,
I was not sure about acoustic scores until I saw your response. I then read a bit (https://docs.google.com/file/d/0B73LgocyHQnfS0g5ZEw1aFNKT2s/edit?pli=1) about them and found that they have more to do with how well the pronunciation of the test phrase/word aligns with the training data.
I therefore can't understand how this component would be critical when detecting unseen terms. Can you refer me to a resource or link that elaborates more on acoustic model scores?
Thanks!
That paper is a really harmful and unprofessional one; you should not trust it.
You can learn a lot from a textbook:
http://www.amazon.com/Spoken-Language-Processing-Algorithm-Development/dp/0130226165