
How Many States for a Word Model?

Anonymous, created 2006-03-23, last updated 2012-09-22
  • Anonymous

    Anonymous - 2006-03-23

    To build a WORD-based digit model with SphinxTrain, I collected a small test corpus (about 20 wave files with digits) as a test setup to get familiar with the trainer. I want to build a context-independent continuous model and use it with Sphinx4. For that I wrote my own scripts, because this special case (CI/continuous) is not covered by the common scripts. Based on the 02.ci_schmm scripts, which generate only one GMM density, I extended the script so that it splits the trained model and accumulates 8 GMM densities. This works fine so far... An interesting point is the following: for my word models (typically 100-400 ms long) I try to use 10-18 states per HMM, and the BW algorithm says
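
    As a rough illustration of the split-and-reestimate step described above (plain Python, not SphinxTrain's actual code; the perturbation factor and the 1-D Gaussians are assumptions):

        import numpy as np

        def split_gaussians(means, variances, eps=0.2):
            """Double a mixture by perturbing each Gaussian's mean by +/- eps * stddev."""
            std = np.sqrt(variances)
            new_means = np.concatenate([means - eps * std, means + eps * std])
            new_vars = np.concatenate([variances, variances])
            return new_means, new_vars

        # Start from the single-density CI model and grow to 8 densities per state.
        means, variances = np.array([0.0]), np.array([1.0])
        while len(means) < 8:
            means, variances = split_gaussians(means, variances)
            # ... in the real training run, a few Baum-Welch iterations would be
            # performed here before the next split ...
        print(len(means))  # 8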

     
    • Anonymous

      Anonymous - 2006-03-24

      Hi Arthur,
      I looked at the word-based model of the TIDIGITS corpus and I see that they also used 3 states per model. For our Aurora setup we find that 18 states per HMM works best for the word model, which sounds really plausible for complete words that take about 100-400 ms. Is the TIDIGITS model word-based?

       
      • The Grand Janitor

        Yes, the digit models in AURORA are word-HMM based. And yes, it is actually better to use an 18-state model than a 3-state model for word models in TIDIGITS.

        Though I want to point out that the same effect could be obtained by using phoneme models as well. I would also speculate that phoneme models give a small advantage in cross-word acoustic modeling.

        Arthur

         
    • Anonymous

      Anonymous - 2006-03-23

      SORRY, HERE IS THE CONTINUATION OF THE MESSAGE ABOVE:
      The BW algorithm says "final state not reached". If I use a smaller number of states it works fine. On an HTK-based setup with exactly the same window size/shift/NumFFT we have a well-working model. Another interesting point is that this WORD-based model works best with only 3 states, which sounds like complete nonsense to me. Is there any detail about the trainer that I am not aware of?

       
      • The Grand Janitor

        Hi Chris,
        You could change $CFG_STATESPERHMM from 3 to another value to get a longer HMM. Though, we usually only test with a 3-state or 5-state HMM; I don't know whether an arbitrary number of states N works for all the tools.

        As a general rule of thumb, HTK provides a very flexible set of commands for training acoustic models; it is very flexible in terms of specifying the topologies of the HMMs. In the case of Sphinx, fixed topologies are usually used because 3-state or 5-state models are traditional in LVCSR. There were also experimental results showing that in LVCSR, changing the number of states wouldn't give more than a 5% relative gain. That was why we didn't adopt it.
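
        A quick back-of-the-envelope check of why a long word HMM can fail on short tokens (the 10 ms frame shift and the no-skip left-to-right topology below are assumptions, not taken from the thread):

            def min_frames_needed(num_states, allow_skip=False):
                """Minimum frames a left-to-right HMM needs to reach its final state."""
                # Skip transitions let roughly every other state be bypassed.
                return (num_states + 1) // 2 if allow_skip else num_states

            frame_shift_ms = 10  # typical analysis shift (assumption)
            for dur_ms in (100, 200, 400):
                frames = dur_ms // frame_shift_ms
                ok = frames >= min_frames_needed(18)
                print(f"{dur_ms} ms -> {frames} frames, 18-state model alignable: {ok}")
            # A 100-150 ms token yields fewer than 18 frames, so Baum-Welch reports
            # "final state not reached"; a 3-state model never hits this limit.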

        It is also important to note that using a phone-based system to train a word model might not be a bad idea. One of our TI46 systems actually uses phone models.
        Arthur
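
        As a rough sketch of how concatenated phone models already give a comparable number of states per word (the pronunciations and the 3-states-per-phone figure are illustrative assumptions):

            # Effective states per word when a word HMM is a chain of 3-state phone HMMs.
            STATES_PER_PHONE = 3
            pronunciations = {"ONE": ["W", "AH", "N"], "SEVEN": ["S", "EH", "V", "AH", "N"]}
            for word, phones in pronunciations.items():
                print(word, len(phones) * STATES_PER_PHONE, "states")
            # ONE -> 9 states, SEVEN -> 15 states: comparable to a long whole-word HMM,
            # while still sharing phone parameters across words.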

         
