Senones

Help
2010-09-02
2012-09-22
  • Madhav Kishore

    Madhav Kishore - 2010-09-02

    I have some doubts about training.

    1. The Sphinx-3 FAQ
    http://www.speech.cs.cmu.edu/sphinxman/FAQ.html

    gives some rule-of-thumb figures for setting the number of senones.

    My training set contains 200 sentences (~1 hour of data) for each of 15
    speakers. So should the amount of training data used for choosing the
    number of senones be 1 hour or 15 hours?

    2. If I create a model for a command-and-control application, is there any
    need for composite triphones? How are these composite triphones trained if
    they are not in the training transcript?

     
  • Nickolay V. Shmyrev

    Amount of training data for setting number of senones should be 1 hour or
    15 hours..

    The total amount of your training data is 15 hours, so you should choose the
    number of senones for 15 hours. 4000 would be a good guess.

    If I create a model for Command and Control Application ,is there need for
    composite triphones..How these composite triphones are trained if it is not in
    the transcript of training ...

    It's not clear which composite triphones you are asking about. My suggestion:
    if you don't know how to train such "composite triphones", don't train them.

     
  • Madhav Kishore

    Madhav Kishore - 2010-09-03
    1. I am confused about senones. I think a senone is a sub-phonetic unit.
      If that is the case, why is the number of senones different for 1 hour
      and 15 hours (same training data, so the same phonetic sentences)?

    2. From the dictionary to triphones.c file:

    • \brief Building triphones for a dictionary.
    • This is one of the more complicated parts of a cross-word
    • triphone model decoder. The first and last phones of each word
    • get their left and right contexts, respectively, from other
    • words. For single-phone words, both its contexts are from other
    • words, simultaneously. As these words are not known beforehand,
    • life gets complicated. In this implementation, when we do not
    • wish to distinguish between distinct contexts, we use a COMPOSITE
    • triphone (a bit like BBN's fast-match implementation), by
    • clubbing together all possible contexts
     
  • Nickolay V. Shmyrev

    1. I am confused with senones...I think senone is a sub phonetic unit .
      if that is the case,why senones for 1 hour and 15 hour(same training data so
      same phonetic sentences )is different ?

    No, they aren't. A senone, like a triphone, is just a collection of
    probabilistic models that match a specific phone in a specific context.
    They differ from phones or diphones, which correspond to an actual chunk of
    audio. The amount of context in your 15 hours of recordings is enough to
    train 4000 senones. Even if the phonetic content is the same for different
    speakers, the amount of context in 1 hour is enough. It would of course be
    a different situation if you had 1000 one-minute recordings of the same
    short sentence being read. Then the number of contexts to train would be
    much smaller.
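
To illustrate the point about repeated material, here is a minimal sketch that counts the distinct triphone contexts in a set of phone-level transcripts. The phone sequences are made up for illustration, SIL padding at utterance edges is an assumption, and word position (which real training tools also track) is ignored.

```python
def distinct_triphones(utterances):
    """Return the set of (left, base, right) contexts seen in the utterances.

    Each utterance is a list of phone symbols; SIL is assumed at both ends.
    """
    seen = set()
    for phones in utterances:
        padded = ["SIL"] + list(phones) + ["SIL"]
        for i in range(1, len(padded) - 1):
            seen.add((padded[i - 1], padded[i], padded[i + 1]))
    return seen


# Hypothetical phone sequences, not taken from any real corpus.
sentence_a = ["k", "a", "t"]
sentence_b = ["t", "a", "k", "a"]

repeated = [sentence_a] * 1000     # the same short sentence, many times
varied = [sentence_a, sentence_b]  # only two utterances, but different ones

print(len(distinct_triphones(repeated)))  # 3
print(len(distinct_triphones(varied)))    # 7
```

A thousand repetitions of one sentence still yield only that sentence's contexts, while two different sentences already yield more.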

    2.from dictionary to triphones.c file

    • \brief Building triphones for a dictionary.
    • This is one of the more complicated parts of a cross-word
    • triphone model decoder. The first and last phones of each word
    • get their left and right contexts, respectively, from other
    • words. For single-phone words, both its contexts are from other
    • words, simultaneously. As these words are not known beforehand,
    • life gets complicated. In this implementation, when we do not
    • wish to distinguish between distinct contexts, we use a COMPOSITE
    • triphone (a bit like BBN's fast-match implementation), by
    • clubbing together all possible contexts

    Those composite senones are internals of sphinx3 large-vocabulary decoding,
    used to optimize speed at word boundaries, where most lextree expansion
    happens. You can read about lextrees in an ASR textbook if you are
    interested, but such composite senones aren't visible to the user and you
    shouldn't care about them.

     
  • Madhav Kishore

    Madhav Kishore - 2010-09-06

    The amount of context in your 15 hours recording is enough to train 4000
    senones. Even if there are same phonetic content for different speakers, the
    amount of contexts in 1 hour is enough

    So you are suggesting that I use 4000 senones...

    but such composite senones arent' visible to the user and you shouldn't care
    about them.

    I see such composite triphones in my MDEF file (default training
    settings). Will they affect the performance of my system (a command and
    control app)?

     
  • Nickolay V. Shmyrev

    I see such a composite triphones in my MDEF file(default training
    settings),whether it will affect my system (Command and control
    app)performance

    No, you don't see them. The model definition file lists known triphones and
    the senone sequences for them. You seem to have some issues with terminology.

    whether it will affect my system (Command and control app)performance

    Sorry, I don't understand your question here.

     
  • Madhav Kishore

    Madhav Kishore - 2010-09-07

    No, you don't see them. Model definition file lists known triphones and
    senone sequencies for them. You have some issues with terminology it seems.

    Since my mdef file is huge, I am pasting a few lines:

    a SIL v b n/a 2 390 626 682 N
    a SIL y b n/a 2 393 629 767 N
    a SIL yy b n/a 2 393 629 767 N
    a SIL z b n/a 2 393 629 729 N
    a a dd b n/a 2 360 485 797 N
    a a h b n/a 2 350 485 838 N
    a a j b n/a 2 360 485 797 N
    a a k b n/a 2 360 503 777 N
    a a l b n/a 2 353 502 685 N
    a a m b n/a 2 360 485 667 N
    a a n b n/a 2 360 554 667 N
    a a n' b n/a 2 361 514 753 N

    a a n1 b n/a 2 353 502 685 N
    a a ng' b n/a 2 353 502 685 N
    a a ng'ng' b n/a 2 353 502 685 N
    a a nj' b n/a 2 353 502 685 N

    These triphones (in bold) are not in my training dictionary...

     
  • Nickolay V. Shmyrev

    these triphones (Bold ) are not in my training dictionary...

    Triphones are taken from the transcription of the training prompts, not from
    the dictionary. All triphones above are present in your prompts.
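
For readers following the mdef excerpt above, here is a minimal sketch of how one such text line can be split into fields. The field names, and the reading of the position letters (b, i, e, s as word-begin, internal, word-end, single-phone word), are my understanding of the text mdef layout rather than something stated in this thread, so treat them as assumptions.

```python
from typing import NamedTuple, Tuple


class MdefEntry(NamedTuple):
    base: str
    left: str
    right: str
    position: str            # assumed: b, i, e or s
    attrib: str
    tmat: int                # transition matrix id
    state_ids: Tuple[int, ...]


def parse_mdef_line(line):
    """Parse one triphone line of a text mdef file.

    Assumed layout: base left right position attrib tmat state-ids... N
    where the trailing "N" marks the non-emitting final state.
    """
    fields = line.split()
    if len(fields) < 8 or fields[-1] != "N":
        raise ValueError(f"unexpected mdef line: {line!r}")
    base, left, right, position, attrib, tmat = fields[:6]
    # The state ids sit between the tmat column and the final "N".
    state_ids = tuple(int(f) for f in fields[6:-1])
    return MdefEntry(base, left, right, position, attrib, int(tmat), state_ids)


entry = parse_mdef_line("a SIL v b n/a 2 390 626 682 N")
print(entry.base, entry.left, entry.right, entry.position, entry.state_ids)
```

Read this way, the first pasted line is phone `a` at the start of a word, with silence on its left and `v` on its right, mapped to senones 390, 626 and 682.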

     
  • Madhav Kishore

    Madhav Kishore - 2010-09-07

    Triphones are taken from the transcription of the training prompts, not from
    the dictionary. All triphones above are present in your prompts.
    I checked with the dict2tri executable, which generates triphones from the
    dictionary. It generates between-word triphones (with the default option
    -btwtri yes, "Compute between-word triphone set") and lists all 10742
    triphones that are seen in the MDEF file.

    0.3
    80 n_base
    10742 n_tri
    43288 n_state_map
    740 n_tied_state
    240 n_tied_ci_state
    80 n_tied_tmat

    Note: my transcript contains only words, no sentences.
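
The header counts above can be cross-checked with a little arithmetic. This is a sketch under one assumption that is not stated in the thread: each HMM has 3 emitting states plus 1 non-emitting final state.

```python
# Values copied from the mdef header quoted above.
n_base = 80
n_tri = 10742
n_state_map = 43288
n_tied_state = 740
n_tied_ci_state = 240

# Assumption: 3 emitting states per HMM plus 1 non-emitting final state.
emitting_states = 3
states_per_model = emitting_states + 1

# Every base phone and triphone contributes one state-map entry per state.
expected_state_map = (n_base + n_tri) * states_per_model
# Context-independent senones: one per emitting state of each base phone.
expected_ci_states = n_base * emitting_states
# Whatever remains of the tied states is shared among the triphones.
cd_senones = n_tied_state - n_tied_ci_state
# Number of triphone states if nothing were shared at all.
untied_triphone_states = n_tri * emitting_states

print(expected_state_map)      # 43288
print(expected_ci_states)      # 240
print(cd_senones)              # 500
print(untied_triphone_states)  # 32226
```

Under that assumption the header is self-consistent, and the 10742 listed triphones share only 500 context-dependent senones between them.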

     
  • Nickolay V. Shmyrev

    Great, now we have found the truth as well as the proper name for triphones :)
    Any other question?

     
  • Madhav Kishore

    Madhav Kishore - 2010-09-07

    Any other question?
    Then how will such triphones be trained if they are not in the training
    transcript? (Please correct me if I am wrong.)

     
  • Nickolay V. Shmyrev

    They aren't trained. First of all, at the untied stage they are simply
    ignored. Later, at the CD stage, when states are tied, they get the same
    senone sequence as word-internal triphones. And this tied senone sequence,
    like

    A   I  CH e    n/a    0    137    227    255 N
    A   I  CH i    n/a    0    137    227    255 N
    

    will be trained from word-internal material. If there is no word-internal
    material either, you'll get a warning in the norm log at stage 50:

        if (wt_var_ < 0) {
            E_ERROR("Variance (mgau= %u, feat= %u, "
                    "density=%u, component=%u) is less then 0. "
                    "Most probably the number of senones is "
                    "too high for such a small training "
                    "database. Use smaller $CFG_N_TIED_STATES.\n",
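
The sharing described above can be seen directly in an mdef listing. Below is a minimal sketch that groups triphones by their senone-id sequence, using a few lines copied from the excerpt posted earlier in this thread. The column layout (four context columns, attrib, tmat, state ids, final "N") is an assumption about the text mdef format.

```python
from collections import defaultdict

# A few lines copied from the mdef excerpt earlier in the thread.
MDEF_LINES = """\
a a dd b n/a 2 360 485 797 N
a a h b n/a 2 350 485 838 N
a a j b n/a 2 360 485 797 N
a a l b n/a 2 353 502 685 N
a a n1 b n/a 2 353 502 685 N
"""


def group_by_senone_sequence(text):
    """Map each senone-id sequence to the triphones that share it."""
    groups = defaultdict(list)
    for line in text.splitlines():
        fields = line.split()
        if len(fields) < 8:
            continue  # skip blank or malformed lines
        triphone = tuple(fields[:4])                    # base, left, right, position
        senones = tuple(int(f) for f in fields[6:-1])   # ids between tmat and "N"
        groups[senones].append(triphone)
    return dict(groups)


for senones, triphones in group_by_senone_sequence(MDEF_LINES).items():
    print(senones, [" ".join(t) for t in triphones])
```

Triphones that land in the same group are trained from whatever data any member of the group has, which is why an unseen triphone can still end up with usable parameters.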

     
  • Madhav Kishore

    Madhav Kishore - 2010-09-08

    I think:
    1. Even if there is no word-internal triphone, it is tied with other similar triphones.
    2. Then the maximum number of senones will be 3 * the number of triphones listed by
    dict2tri.exe.
    3. Is it possible to train only the word-internal triphones and reduce
    the number of senones (for my command and control application, to increase
    speed and accuracy)?

     
  • Nickolay V. Shmyrev

    whether it is possible to train only the internal word triphones and reduce
    the number of senones (for my command and control application to increase the
    speed and accuracy)

    Did you try changing the script to run dict2tri with -btwtri no?

    Anyway, I think you have a much more effective way to reduce the number of
    senones: the N_TIED_STATES setting in sphinx_train.cfg. Why not set it
    properly and get the number of senones you want that way? I think that if
    cross-word triphones are not in the training transcription, the model will
    not have separate senones for them. What's more, they will not be
    considered by the decoder if your grammar doesn't have self-loops.

    If you have only a limited number of word-internal triphones, set the
    number of states so that only they end up in the final model. Yes, the
    documentation doesn't cover this in detail; we'll update it to explain this.
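
As a rough sketch of the suggestion above, the snippet below rewrites the tied-state setting in the text of a sphinx_train.cfg. It assumes the file contains exactly one Perl-style line of the form `$CFG_N_TIED_STATES = <number>;`. The helper name and the value 200 are placeholders for illustration, not recommendations.

```python
import re


def set_tied_states(cfg_text, n_states):
    """Return cfg_text with the $CFG_N_TIED_STATES assignment set to n_states.

    Assumes a single line of the form `$CFG_N_TIED_STATES = <number>;`.
    """
    pattern = re.compile(r"^(\s*\$CFG_N_TIED_STATES\s*=\s*)\d+(\s*;)", re.MULTILINE)
    new_text, count = pattern.subn(
        lambda m: f"{m.group(1)}{n_states}{m.group(2)}", cfg_text
    )
    if count != 1:
        raise ValueError(f"expected one $CFG_N_TIED_STATES line, found {count}")
    return new_text


# Hypothetical excerpt of a config file; 200 is an arbitrary placeholder value.
sample = "# hypothetical excerpt\n$CFG_N_TIED_STATES = 1000;\n"
print(set_tied_states(sample, 200))
```

Editing the line by hand works just as well; the point is only that this one setting controls how many senones the final model has.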

     
  • Madhav Kishore

    Madhav Kishore - 2010-09-09

    Did you try to change the script to run dict2tri with -btwtri no?
    I tried my best, but I couldn't find where dict2tri is called...
    (When I removed dict2tri from the bin folder of the training setup, it
    still worked.)

     
  • Nickolay V. Shmyrev

    Hello

    Sorry for the confusion. I've checked the source, so let me try to state
    everything as it is:

    1. dict2tri is not used at all.
    2. The untied mdef is created by mk_mdef_gen with the -ountiedmdef flag.
    3. The untied mdef counts only triphones that are present in the transcription and contains only those.
    4. If your transcription doesn't have cross-word triphones, the untied mdef will not have them either.

    Correct me if I'm wrong
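
To make points 3 and 4 concrete, here is a minimal sketch that collects triphones from transcripts rather than from the dictionary. It is not the actual mk_mdef_gen logic. The dictionary, the SIL padding at utterance edges, and the position letters (b, i, e, s) are illustrative assumptions.

```python
def triphones_in_transcripts(dictionary, utterances):
    """Collect (base, left, right, position) tuples seen in the transcripts.

    position is b/i/e for word-begin/internal/end phones and s for
    single-phone words; SIL is assumed at both ends of every utterance.
    """
    seen = set()
    for words in utterances:
        tagged = []  # (phone, position) pairs for the whole utterance
        for word in words:
            phones = dictionary[word]
            for i, phone in enumerate(phones):
                if len(phones) == 1:
                    pos = "s"
                elif i == 0:
                    pos = "b"
                elif i == len(phones) - 1:
                    pos = "e"
                else:
                    pos = "i"
                tagged.append((phone, pos))
        context = ["SIL"] + [p for p, _ in tagged] + ["SIL"]
        for i, (phone, pos) in enumerate(tagged):
            # context[i] is the left neighbour, context[i + 2] the right one.
            seen.add((phone, context[i], context[i + 2], pos))
    return seen


# Hypothetical two-word dictionary.
dictionary = {"go": ["g", "o"], "stop": ["s", "t", "o", "p"]}

isolated = triphones_in_transcripts(dictionary, [["go"], ["stop"]])
sentence = triphones_in_transcripts(dictionary, [["go", "stop"]])

# Triphones that only appear when the words are spoken in sequence.
print(sorted(sentence - isolated))
```

With isolated-word transcripts the only word-boundary context is SIL, so triphones such as `o` between `g` and `s` never show up in the collected set.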

     
  • Madhav Kishore

    Madhav Kishore - 2010-09-13

    Sorry for the late response...
    Everything said above is correct.

    I think that in the final mdef, all the triphones in the dictionary are
    listed and clustered with the trained triphones.

     
