Hello,
In my current recognition task, I mainly need to recognize digits and spelled letters (with high accuracy), but also a small, limited set of other words. I was thinking of training special HMMs for each digit and letter, plus generic phoneme models for the word recognition.
So I wonder if it's possible to train models with different numbers of states: e.g. normal 3-state models for phonemes, but longer models for letters and digits? I understand that this isn't supported by the normal SphinxTrain script workflow, but could I create the model architecture file by hand (or with a script)? I was just wondering whether bw and the other training programs would choke on it, and whether Sphinx 3.5 and/or Sphinx4 can theoretically handle it. Oh, and I was planning to use only CI models.
Anybody?
Thanks in advance
Replying to myself:
I solved the problem by creating special intra-word 3-state HMMs for each digit/letter, similar to the TIDIGITS acoustic models.
Hello,
I also want to train two HMMs with different topologies. Can you explain your solution a bit more? Is it possible to train models with different numbers of states? Is it possible to decode with such an HMM topology using the Sphinx3 decoder?
Thanks for your help
Does anybody have any idea about training and decoding with models that have different numbers of states? Can Sphinx handle it?
Thanks in advance
> Does anybody have any idea about training and decoding with models that have different numbers of states? Can Sphinx handle it?
No, SphinxTrain can only build models with a uniform topology. However, you can easily model a nonuniform topology with word-dependent phones. The TIDIGITS example mentioned above is a good one; please take a look at it:
eight EY_eight T_eight
five F_five AY_five V_five
four F_four OW_four R_four
nine N_nine AY_nine N_nine_2
oh OW_oh
one W_one AX_one N_one
seven S_seven EH_seven V_seven E_seven N_seven
six S_six I_six K_six S_six_2
three TH_three R_three II_three
two T_two OO_two
zero Z_zero II_zero R_zero OW_zero
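A dictionary like the one above can also be generated with a short script. Below is a minimal sketch that derives word-dependent phone names from an ordinary pronunciation dictionary; the function name and input format are my own assumptions, not part of SphinxTrain:

```python
# Sketch: build word-dependent phone entries (TIDIGITS style) from a
# base pronunciation dictionary. Input format is assumed, not a real
# SphinxTrain tool: a dict mapping word -> list of phones.

def word_dependent_entries(base_dict):
    """Return {word: [phone_word, ...]} with suffixed, per-word phones."""
    entries = {}
    for word, phones in base_dict.items():
        seen = {}
        wd_phones = []
        for ph in phones:
            name = f"{ph}_{word}"
            # Disambiguate a phone repeated within the same word,
            # e.g. "nine" -> N_nine ... N_nine_2.
            seen[name] = seen.get(name, 0) + 1
            if seen[name] > 1:
                name = f"{name}_{seen[name]}"
            wd_phones.append(name)
        entries[word] = wd_phones
    return entries

base = {
    "nine": ["N", "AY", "N"],
    "two": ["T", "OO"],
}
for word, phones in sorted(word_dependent_entries(base).items()):
    print(word, " ".join(phones))
# nine N_nine AY_nine N_nine_2
# two T_two OO_two
```

The resulting phone names would then also need to be added to the phone list so the trainer builds a separate model for each of them.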
Please also avoid replying to a 5-year-old post. Start a new topic if needed.