Hi guys,
It's me again.
I have a question; I hope you'll be able to enlighten me.
I'm using my own language model. Let me try to put it simply: imagine my language model is tidigits. If I try to recognize any audio, it will always give me a result in tidigits, even if the speech I'm trying to recognize doesn't contain any digits. I guess the recognizer assigns a probability to the reliability of the recognized utterance. How could I get this value, so as to know whether the recognized utterance is correct, or whether it is just the best the recognizer could do even though the speech has nothing to do with tidigits?
It's kind of hard to explain, I hope you understood what I meant.
Best regards.
Looks like I wasn't clear enough. Let me rephrase it.
I want to perform phone number recognition.
I have noticed that, even if the recorded file doesn't contain a phone number, the recognizer will still force the recognition to a phone number. This is normal, because my vocabulary contains only tidigits.
I guess the recognized sentence has a score representing how good the recognition was.
Is it possible to get that number, in order to discriminate the badly-scored recognitions?
This way I will avoid getting a wrong phone number when the audio file is just some random noise.
I hope this time I have made myself clear; at least I have done my best.
Thanks for reading me.
Best regards.
Anonymous - 2007-06-26
I believe that you want to implement "rejection" -- the recognizer should "reject" the utterance if it is not consistent with the language model, and you assume that the recognition score will allow you to distinguish well-recognized utterances from badly-fitting ones. It is my understanding that the value of the HMM recognition score itself cannot be used to tell a "good" recognition from a "bad" one, because it depends on many factors besides how well the utterance fits the language model (for example, the length of the utterance, how well it fits the acoustic model, how much noise is in the audio signal, etc.). The score can be used to compare different recognition theories (i.e., is theory A better than theory B?), but not to decide whether theory A is "good enough".
One way of implementing rejection is to provide an alternative language model in parallel with your application's language model. This second, rejection model should be general enough to approximate any and all possible utterances. The HMM recognizer will estimate the most probable theory given the combined language model. If that best theory lies in the general rejection model rather than in the application's language model, then we reject the utterance, since no path through the application model was better than the approximate general-utterance rejection model. If you put a score multiplier in front of the rejection model, you'll have a means of adjusting the balance between the two models.
One thing you can try for a rejection model is a loop of HMMs over all context-independent phones.
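The parallel-model comparison above can be sketched in a few lines. This is only an illustration of the decision logic, not a real decoder: the scores are made-up log-probabilities, and the `should_reject` function and `rejection_penalty` parameter are hypothetical names for the score multiplier jerry describes.

```python
# Sketch of score-based rejection with a parallel "garbage" model.
# All scores below are invented log-probabilities for illustration;
# in a real system they would come from the recognizer's best
# hypothesis under each model (e.g. a phone loop for rejection).

def should_reject(app_score, rejection_score, rejection_penalty=0.0):
    """Reject the utterance when the penalized rejection-model path
    beats the best path through the application's language model.

    Scores are log-probabilities (higher = better). rejection_penalty
    is subtracted from the rejection score: raising it makes the
    system accept more utterances, lowering it rejects more.
    """
    return (rejection_score - rejection_penalty) > app_score

# A digit string that fits the tidigits model well:
print(should_reject(app_score=-1200.0, rejection_score=-1500.0))  # False

# Random speech: the phone loop explains the audio better:
print(should_reject(app_score=-2100.0, rejection_score=-1400.0))  # True
```

The penalty plays the role of the score multiplier in front of the rejection model: it is the single knob you tune on held-out "good" and "bad" recordings to trade false rejections against false acceptances.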
cheers,
jerry
Well, the score is accessible, for example, in the hyp_t structure in sphinx3. But it's probably better to adjust the language weight, silence probability, and other parameters to make recognition more reliable.
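If you do pull the raw score out of the decoder, note jerry's caveat that it grows with utterance length, so a fixed cutoff on the raw value is meaningless. A common workaround is to normalize by the number of frames before thresholding. The sketch below assumes nothing about sphinx3 itself; the function name and the threshold value are illustrative, and the threshold would have to be tuned on real "good" and "bad" recordings.

```python
# Sketch: length-normalize a hypothesis score before thresholding.
# Raw HMM path scores become more negative as the utterance gets
# longer, so only a per-frame average is comparable across files.

def accept(total_log_score, n_frames, per_frame_threshold=-8.0):
    """Accept the hypothesis if its average per-frame log score
    clears the (tuned, here illustrative) threshold."""
    if n_frames <= 0:
        return False
    return (total_log_score / n_frames) > per_frame_threshold

print(accept(-1500.0, 300))  # -5.0 per frame  -> True
print(accept(-4500.0, 300))  # -15.0 per frame -> False
```

Even normalized, this remains a weaker signal than the parallel rejection model, for the reasons given above: the score also reflects acoustic fit and noise, not just language-model fit.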
"it's better to adjust language weight/silence probability and different parameters to make recognition more reliable."
Of course, but what if the language model is the tidigits model, and we ask that model to recognize
a Shakespeare text... won't it always give a tidigits result?
This is the kind of behaviour I want to avoid.
Best regards.