Hi,
In Sphinx4 we have monophones and triphones. A triphone may be defined up to 4 times, since the position of the triphone within the word is taken into consideration when building the search graph. For example, if the triphone a-b+c occurs at the beginning of the word to recognize, we will definitely use the entry [b a c b], where the last b stands for "begin". Although the differences between the states of these similar triphones are quite minor, differences are differences.
I'm thinking about merging similar triphone definitions into one. Does anybody know how this would affect the recognition results?
Last edit: Karim BEN ALAYA 2017-07-05
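To make "defined up to 4 times" concrete, here is a minimal sketch that groups triphone entries by their (base, left, right) context and lists which position variants exist for each. The rows are made-up data in a simplified "base left right position state-ids" layout; the real model definition file has more columns, and the state ids here are hypothetical.

```python
from collections import defaultdict

# Hypothetical, simplified rows: base, left, right, position, then state ids.
# Positions: b = begin, i = internal, e = end, s = single.
ROWS = [
    "b a c b 10 11 12",
    "b a c i 10 11 13",
    "b a c e 10 14 12",
    "d a c i 20 21 22",
]

def group_by_context(rows):
    """Map (base, left, right) -> {position: tuple of state ids}."""
    groups = defaultdict(dict)
    for row in rows:
        base, left, right, pos, *states = row.split()
        groups[(base, left, right)][pos] = tuple(int(s) for s in states)
    return dict(groups)

groups = group_by_context(ROWS)
for context, variants in groups.items():
    # Show which position variants would be candidates for merging.
    print(context, sorted(variants))
```

Contexts with more than one position variant are the ones a merge would affect; comparing their state-id tuples shows how much the variants actually differ.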
For the Pocketsphinx case, if I merge the triphones, I won't have to recompute any Gaussians, since I'm going to use the Gaussians of the base phone, which are the same for every triphone. But what about the mixture weights? Can I just sum them and divide? Or maybe I should keep one triphone and forget about the others. But then how should I prioritize the selection? i (internal) -> e (end) -> b (begin) -> s (single)?
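As a rough illustration of the "sum and divide" idea, the sketch below averages the mixture-weight vectors of the states being merged and renormalizes the result so it sums to 1. It assumes one weight vector per state, all over the same shared Gaussians; the function name and the numbers are made up. An unweighted mean also ignores how much training data each variant saw, so a count-weighted mean may be a better choice if those counts are available.

```python
def average_mixture_weights(weight_rows):
    """Average several mixture-weight vectors and renormalize to sum to 1.

    weight_rows: list of equal-length lists, one per state being merged.
    """
    n_rows = len(weight_rows)
    n_mix = len(weight_rows[0])
    if any(len(row) != n_mix for row in weight_rows):
        raise ValueError("all weight vectors must have the same length")
    # "Sum and divide": element-wise mean across the variants.
    mean = [sum(row[k] for row in weight_rows) / n_rows for k in range(n_mix)]
    # Renormalize to guard against rounding drift in the inputs.
    total = sum(mean)
    return [w / total for w in mean]

# Hypothetical weights for the begin and internal variants of one state.
begin = [0.7, 0.2, 0.1]
internal = [0.5, 0.3, 0.2]
merged = average_mixture_weights([begin, internal])
print(merged)
```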
You can use word-internal triphones exclusively; the difference is less than a percent.
I do not understand your second question.
What I mean is that the word-internal variant doesn't exist for every triphone. In that case I guess it would be better to use the begin triphone (b) or the end triphone (e), since the single triphone (s) is usually quite rare. Looking at the states may also give an idea of which triphone to keep, as it would be better to keep the one whose states are seen most often in similar context-dependent triphones.
A 1 percent difference looks interesting! I will definitely try it out! Thanks Nickolay :)
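The fallback described above could be sketched as follows: for each context, keep the first position variant found in a fixed priority order. The order i -> e -> b -> s is the one proposed earlier in the thread, not an established rule, and the function name and state ids are hypothetical.

```python
# Priority order proposed in the thread: internal, end, begin, single.
PRIORITY = ("i", "e", "b", "s")

def pick_variant(variants, priority=PRIORITY):
    """Return (position, states) for the highest-priority variant present.

    variants: dict mapping position letter -> tuple of state ids.
    """
    for pos in priority:
        if pos in variants:
            return pos, variants[pos]
    raise KeyError("no known position variant in %r" % (sorted(variants),))

# Hypothetical contexts: one has an internal variant, one does not.
with_internal = {"b": (10, 11, 12), "i": (10, 11, 13), "e": (10, 14, 12)}
without_internal = {"b": (30, 31, 32), "s": (30, 33, 32)}

print(pick_variant(with_internal))
print(pick_variant(without_internal))
```

Swapping the tuple passed as `priority` is enough to try a different order, for example preferring begin over end.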