Training word model HOWTO?

Help
2004-09-01
2012-09-22
  • danial ibrahim

    danial ibrahim - 2004-09-01

    Hi,
    Can I know how to perform training for a word model?
    What is the difference between training for word and phoneme models in SphinxTrain?

    thank you.

     
    • The Grand Janitor

      Before we go on, I want you to know that both Sphinx 3 and Sphinx 4 are used mainly for sub-word based speech recognition.  We never tried to use them for word-based speech recognition, so we don't actually know the consequences.

      In general, "word models" and "phoneme models" are just names for HMM composition schemes.  When people say they use phoneme models, that actually means "first compose a word HMM from phoneme HMMs, then use it in the Viterbi search".  When people say they use word models, that means "represent each word directly as an HMM, without composition".

      Enough theory. The following is the trick people use to make a speech trainer do whole-word models.
      1, First, define each word as an HMM; say "one", "two" and "three" are now HMMs.
      2, In the dictionary file, put the following entries.
      one  one
      two  two
      three three
      .
      .
      .
      Notice that when you do this, the first column is the final HMM and the second column is the component used to compose the final HMM.

      3, Now for the "phone list", what you need to put in is a list of words like
      one
      two
      three

      Why? Because this time, the word itself is also the sub-word unit.
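
The dictionary and "phone list" from steps 2 and 3 can be generated from a plain word list. The following is only a minimal sketch: the function name `build_whole_word_files` is hypothetical, it is not part of SphinxTrain, and it does not cover any silence or filler units your setup may also need.

```python
def build_whole_word_files(words):
    """Return (dictionary_lines, phone_list_lines) for a whole-word setup.

    Each word is treated as its own unit, so the dictionary maps every
    word to itself and the "phone list" is just the list of words.
    This is an illustrative helper, not a SphinxTrain API.
    """
    seen = set()
    units = []
    for word in words:
        word = word.strip()
        # Skip blanks and duplicates, keeping first-seen order.
        if word and word not in seen:
            seen.add(word)
            units.append(word)

    # First column: the final HMM; second column: the unit composing it.
    dictionary_lines = [f"{word}  {word}" for word in units]
    # The word itself is also the sub-word unit.
    phone_list_lines = list(units)
    return dictionary_lines, phone_list_lines


dictionary_lines, phone_list_lines = build_whole_word_files(["one", "two", "three"])
print("\n".join(dictionary_lines))
print("\n".join(phone_list_lines))
```

Writing the two lists out to the dictionary and phone list files that your training setup expects is left to you, since the file names depend on the configuration.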

      I think these are the major things if you want to do a whole-word model hack.  Again, prepare to get hurt, because no one has actually done it before using Sphinx or SphinxTrain.

      Arthur

       
      • Eric H. Thayer

        Eric H. Thayer - 2004-09-02

        Something else to think about is the acoustic complexity of the word.  As a first approximation, you might want to look at an acoustic lexicon, like cmudict, and do a one-for-one substitution of all the phone models with models particular to the word.  For example,

        BAT    M1 M2 M3
        CANTANKEROUS    M4 M5 M6 M7 M8 M9 M10 M11 M12

        If you do this, the context-dependent parts of training become irrelevant, so you needn't use or define any CD phones.
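
The one-for-one substitution can be sketched as follows. This is a rough illustration only: the function name `word_specific_lexicon`, the `M<n>` naming, and the two sample pronunciations are assumptions made for the example, not output from cmudict or from any Sphinx tool.

```python
def word_specific_lexicon(lexicon):
    """Replace every phone in every pronunciation with a unique model name.

    `lexicon` maps a word to its list of phones. Each phone position gets
    its own model (M1, M2, ...), so no model is shared between words even
    when the original phone is the same.
    """
    counter = 0
    new_lexicon = {}
    units = []
    for word, phones in lexicon.items():
        models = []
        for _ in phones:
            counter += 1
            models.append(f"M{counter}")
        new_lexicon[word] = models
        units.extend(models)
    return new_lexicon, units


# Illustrative pronunciations; a real lexicon would be read from a file.
sample = {"BAT": ["B", "AE", "T"], "CAT": ["K", "AE", "T"]}
new_lexicon, units = word_specific_lexicon(sample)
for word, models in new_lexicon.items():
    print(word, " ".join(models))
```

Note that the shared phones AE and T end up as separate models for each word, which is why the number of models grows with the length of each word rather than with the size of the phone set.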

        Also be aware that this approach really only works well if you are doing isolated word recognition (pausing between words), because in continuous speech the articulation of a word is significantly influenced by the word preceding it.

        ...eric

         
    • danial ibrahim

      danial ibrahim - 2004-09-02

      Thanks a lot to both of you for the information.

      I think it will be more convenient if I just follow the process from the instructions: http://fife.speech.cs.cmu.edu/sphinxman/fr4.html

      Besides, I have already spent a lot of time on training, and now is the time to get the output.
      Maybe some other time I will focus on this (word model).

      thanks again :)

       
