#316 Augmenting the triphone set of an acoustic model

Milestone: next release
Status: closed
Owner: nobody
Priority: 5
Updated: 2015-05-07
Created: 2005-04-01
Private: No

Any acoustic model contains a particular set of
triphones, which have been derived from the training
utterances and the training dictionary. A recognition
application using a particular acoustic model may
require (from the language model and dictionary)
triphones that are missing from the acoustic model --
that weren't in the dictionary when the model was
trained. Recognizers must handle this situation in
some way; a common strategy is to substitute the HMM of
the context-independent phone for the missing triphone
-- it's suboptimal, but better than nothing. A
somewhat better option, in the context of a tied-state
acoustic model, is to augment the model with an
"artificial" HMM formed from the senones that best fit
the triphone context (see Mei-Yuh Hwang et al., "Predicting
Unseen Triphones with Senones", Proc. IEEE International
Conference on Acoustics, Speech, and Signal Processing
(ICASSP), pp. 311-314, April 1993).
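
The back-off strategy described above can be sketched as follows. This is only an illustration, not how any Sphinx decoder actually stores its models: the dictionary-based `model`, the `lookup_hmm` helper, and the HMM labels are all hypothetical names made up for the example.

```python
from typing import Dict, Optional, Tuple

# Hypothetical stand-in for a model definition: maps (base, left, right)
# to an HMM label. A context-independent (CI) phone is stored with both
# contexts set to None. Real models are not stored this way.
Key = Tuple[str, Optional[str], Optional[str]]


def lookup_hmm(model: Dict[Key, str], base: str, left: str, right: str) -> Tuple[str, bool]:
    """Return (hmm, exact); exact is False when we backed off to the CI phone."""
    key = (base, left, right)
    if key in model:
        return model[key], True
    # Triphone missing from the model: substitute the CI phone's HMM.
    return model[(base, None, None)], False


model: Dict[Key, str] = {
    ("AE", None, None): "hmm_AE_ci",   # CI phone, always present
    ("AE", "K", "T"): "hmm_AE_K_T",    # triphone seen in training
}

print(lookup_hmm(model, "AE", "K", "T"))  # triphone is in the model
print(lookup_hmm(model, "AE", "B", "T"))  # unseen context, backs off to CI
```

Augmenting the model amounts to adding entries for the unseen contexts so that the back-off branch is taken less often.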

I have included a Perl script which, given an
"augmented" dictionary (the model's training dictionary
plus additional words containing the desired
triphones), creates a new acoustic model that contains
the additional triphones. (It's actually rather simple
and quick; all it does is generate an augmented
"alltriphones" .mdef file and run "tiestate" to walk
the decision trees to produce a new tied-state .mdef
file. This file, plus the previous binary data files,
constitute the augmented model. In the case of a
semi-continuous (Sphinx2) model, there's an additional
step to generate new map and phone files.)
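
To see which triphones an augmented dictionary would add, one can compare the triphones implied by the two dictionaries. The sketch below is not part of the attached script; `parse_dict`, `word_internal_triphones`, and `missing_triphones` are hypothetical helpers, the dictionary format is assumed to be "WORD PH1 PH2 ..." per line, and only word-internal triphones are considered (word-boundary and cross-word contexts are ignored for brevity).

```python
from typing import Dict, Iterable, Iterator, List, Tuple

Triphone = Tuple[str, str, str]  # (base, left, right)


def parse_dict(lines: Iterable[str]) -> Dict[str, List[str]]:
    """Parse 'WORD PH1 PH2 ...' lines into {word: [phones]}."""
    entries: Dict[str, List[str]] = {}
    for line in lines:
        parts = line.split()
        if len(parts) >= 2:
            entries[parts[0]] = parts[1:]
    return entries


def word_internal_triphones(phones: List[str]) -> Iterator[Triphone]:
    """Yield a triphone for each phone with both neighbours inside the word."""
    for i in range(1, len(phones) - 1):
        yield (phones[i], phones[i - 1], phones[i + 1])


def missing_triphones(train: Dict[str, List[str]], aug: Dict[str, List[str]]) -> List[Triphone]:
    """Triphones required by the augmented dictionary but absent from training."""
    seen = {t for ph in train.values() for t in word_internal_triphones(ph)}
    wanted = {t for ph in aug.values() for t in word_internal_triphones(ph)}
    return sorted(wanted - seen)


train = parse_dict(["CAT K AE T", "BAT B AE T"])
aug = parse_dict(["CAT K AE T", "BAT B AE T", "MAT M AE T", "CAB K AE B"])
print(missing_triphones(train, aug))
# [('AE', 'K', 'B'), ('AE', 'M', 'T')]
```

An empty result would suggest that, under these simplifying assumptions, the existing model already covers the new words.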

The script requires two additional configuration
variables, which are best added to etc/sphinx_train.cfg
(and set to values appropriate to your model):

# Expanded dictionary (training + recognition words)
# should be named thus:
$CFG_AUG_DICT = "$CFG_BASE_DIR/etc/${CFG_DB_NAME}_aug.dic";

# Suffix to distinguish the augmented from original model
$CFG_AUG_SUFFIX = "AUG1";

Discussion

  • Anonymous - 2005-04-01

    augment_model.pl

     
  • Anonymous - 2005-04-01

    Logged In: YES
    user_id=1159586

    I should have added that it's not always necessary to
    "augment" an already-trained model to add triphones from
    outside the training data set. If, at the time you train
    the model, you already have a wider dictionary than the
    words from the training set (e.g., from one or more
    application vocabularies), simply use the union of it and
    your training words as your training dictionary, and the
    resulting model will contain all the triphones from that.
    (In fact the practice at CMU may simply be to use the entire
    cmudict for training.)

     
  • Nickolay V. Shmyrev

    Ticket moved from /p/cmusphinx/patches/38/

     
  • Anonymous - 2015-05-06

    Jerry,
    I am trying to do something similar: build an acoustic model that recognizes individual words, in order to check their pronunciation.
    But sometimes the recognizer returns a compound result, more than one word.
    What is the best way to handle this?
    The documentation on building acoustic models says to train on sentences rather than on isolated words.

    Sometimes, when a single word is pronounced, pocketsphinx responds with several words, for example:

    "what is", instead of just "what".

     
  • Anonymous - 2015-05-06

    I think the project should have better-organized documentation, such as a PDF or ODT file. The lack of documentation causes many difficulties. It would be very helpful to have a document that describes the functions of the tools and of pocketsphinx itself; that would make everyone more productive.

     
    • Nickolay V. Shmyrev

      Dear Bruno

      If you have a specific question, it is better to ask it on the forum instead of reviving a patch submitted 10 years ago. Also, please spend some more time making your question clear; I'm not quite sure what you are asking.

       
  • Nickolay V. Shmyrev

    Probably not a big issue these days since we are using big dictionaries for training.

     
  • Nickolay V. Shmyrev

    • status: open --> closed
    • Group: --> next release
     
