To build a WORD-based digit model with SphinxTrain, I collected a small test corpus (about 20 wave files with digits) as a test setup to familiarize myself with the trainer. I want to build a context-independent continuous model and use it with Sphinx4. For that I am writing my own scripts, because this special case (CI/continuous) is not covered by the common scripts. Starting from the 02.ci_schmm script, which trains only a single GMM density, I extended it so that it splits the trained model and accumulates 8 GMM densities. This works fine so far. An interesting point is the following: for my word models (typically 100-400 ms long) I tried to use 10-18 states per HMM, and the BW algorithm says
Anonymous - 2006-03-24
Hi Arthur,
I looked at the word-based model for the TIDIGITS corpus and saw that it also uses 3 states per model. For our Aurora setup, however, we found that 18 states per HMM works best for the word model, which seems quite plausible for complete words lasting about 100-400 ms. Is the TIDIGITS model word-based?
Yes, the digit models in AURORA are word-HMM-based. And yes, for word models in TIDIGITS it actually is better to use an 18-state model than a 3-state model.
I want to point out, though, that the same effect could be obtained with phoneme models as well. I would also speculate that phoneme models give a small advantage in across-word acoustic modeling.
Arthur
Anonymous - 2006-03-23
SORRY, HERE IS THE CONTINUATION OF THE MESSAGE ABOVE:
the BW algorithm says "final state not reached". If I use a smaller number of states, it works fine. On an HTK-based setup with exactly the same window size/shift/NumFFT we have a well-working model. Another interesting point is that this WORD-based model works best with only 3 states, which sounds like complete nonsense to me. Is there some detail about the trainer that I am not aware of?
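For reference, the density-split extension in my script does essentially the following (a rough Python sketch of the idea only, not the actual SphinxTrain code; the function name, array layout, and feature dimension are made up for illustration):

    import numpy as np

    # Start from the single-density model produced by 02.ci_schmm
    # (39-dimensional features assumed here purely for illustration).
    means     = np.zeros((1, 39))   # (n_mix, n_dim) component means
    variances = np.ones((1, 39))    # (n_mix, n_dim) diagonal covariances
    weights   = np.ones(1)          # (n_mix,) mixture weights

    def split_gaussians(means, variances, weights, eps=0.2):
        # Split every component in two by perturbing its mean by
        # +/- eps standard deviations; doubling plus re-estimation is
        # the usual way to grow a mixture 1 -> 2 -> 4 -> 8.
        std = np.sqrt(variances)
        new_means = np.concatenate([means + eps * std, means - eps * std])
        new_vars = np.concatenate([variances, variances])        # copy covariances
        new_weights = np.concatenate([weights, weights]) / 2.0   # halve the weights
        return new_means, new_vars, new_weights

    for _ in range(3):   # 1 -> 2 -> 4 -> 8 densities
        means, variances, weights = split_gaussians(means, variances, weights)
        # ... re-run Baum-Welch iterations here before the next split ...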
Hi Chris,
You could change $CFG_STATESPERHMM from 3 to another value to get a longer HMM. However, we usually only test with 3-state or 5-state HMMs, so I don't know whether an arbitrary number of states N works with all the tools.
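In etc/sphinx_train.cfg that is just the one line:

    $CFG_STATESPERHMM = 5;   # we normally test only 3 or 5

One thing worth checking for your "final state not reached" error: a plain left-to-right topology without skip transitions needs at least one frame per state, so an 18-state word HMM can never align a token shorter than 18 frames. Assuming a 10 ms frame shift (a common default; adjust to your setup), a quick feasibility check:

    # Assumes a strict left-to-right topology with no skip transitions
    # and a 10 ms frame shift; both are assumptions about your setup.
    def reaches_final_state(duration_ms, n_states, frame_shift_ms=10):
        n_frames = duration_ms // frame_shift_ms
        return n_frames >= n_states

    print(reaches_final_state(100, 18))   # False -> "final state not reached"
    print(reaches_final_state(400, 18))   # True

Your shortest 100 ms tokens would give only about 10 frames, which could explain why the long models fail in Baum-Welch while the short ones train fine.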
As a general rule of thumb, HTK provides a very flexible set of commands for training acoustic models; in particular, it is very flexible in specifying HMM topologies. In Sphinx, fixed topologies are usually used, because 3-state or 5-state models are what is traditionally used in LVCSR. There were also experimental results showing that in LVCSR, changing the number of states would not give more than a 5% relative gain. That is why we did not adopt it.
It is also important to note that using a phone-based system to train a word model might not be a bad idea. One of our ti46 systems actually uses phone models.
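With a phone-based setup, the digit lexicon is just a small dictionary, for example with CMUdict-style phones:

    ONE    W AH N
    TWO    T UW
    THREE  TH R IY
    FOUR   F AO R
    FIVE   F AY V
    SIX    S IH K S
    SEVEN  S EH V AH N
    EIGHT  EY T
    NINE   N AY N
    ZERO   Z IH R OW

With 3 states per phone, a word like SEVEN effectively gets 5 x 3 = 15 states, which is in the same range as the long word models discussed above, while still sharing phone statistics across words.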
Arthur