
Adding words to Dictionary POST-training

Help
roof
2006-01-11
2012-09-22
  • roof

    roof - 2006-01-11

    Hi guys,

    I have trained 30 hours of speech data (1500 words, 200 sentences - repeated hundreds of times)

    What I want to do now is extend the dictionary to cover another few thousand words, say, without having to retrain the model.

    Presumably all the phonemes were mapped to particular acoustic signals during training, so adding words to the dictionary (with their phoneme mappings) should allow them to be recognised without retraining, correct?

    I am assuming this because each newly added word is just another combination of already-trained acoustic signals, determined by the word's phonemic composition.

    Would this work, and would recognition be accurate (assuming the new words are also incorporated into the language model)?

    Cheers

     
    • Anonymous

      Anonymous - 2006-01-14

      The short answer is yes, but it depends on the generality of the data you've used in training.

      The acoustic model contains not only HMMs for phones but, more importantly, HMMs for triphones. (A triphone is a particular phone in the context of particular phones on its left and right.) The Sphinx recognizers build models for the words in your LM by concatenating triphone HMMs. You can therefore recognize words that were not in the training data, as long as the triphones in those words are present in your acoustic model. If an LM word contains a triphone that is missing from the acoustic model, the recognizer must substitute some other model. (Sphinx-2 substitutes a context-independent phone model; Sphinx-4 does something fancier; I don't know about Sphinx-3.)
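
To make the coverage condition above concrete, here is a minimal Python sketch. It is not Sphinx code: the function names, the "#" word-boundary marker, and the toy dictionaries are all hypothetical, and real recognizers also handle cross-word contexts and word-position types, which this ignores. It only shows the idea of checking whether every triphone in a new word already exists in the trained inventory.

```python
def word_triphones(phones):
    """Return (left, base, right) triphones for one word's phone sequence.

    "#" is a placeholder for the word boundary (an assumption of this sketch).
    """
    padded = ["#"] + list(phones) + ["#"]
    return [(padded[i - 1], padded[i], padded[i + 1])
            for i in range(1, len(padded) - 1)]


def missing_triphones(dictionary, inventory):
    """Map each word to the triphones it needs that are not in the inventory."""
    missing = {}
    for word, phones in dictionary.items():
        gaps = [t for t in word_triphones(phones) if t not in inventory]
        if gaps:
            missing[word] = gaps
    return missing


# Toy training dictionary (hypothetical pronunciations).
trained_words = {"cat": ["K", "AE", "T"], "bat": ["B", "AE", "T"]}
inventory = {t for phones in trained_words.values()
             for t in word_triphones(phones)}

# A word added after training: some of its triphones were never seen.
new_words = {"cab": ["K", "AE", "B"]}
gaps = missing_triphones(new_words, inventory)
print(gaps)
```

In this toy case "cab" shares its first triphone with "cat" but needs two that were never trained, so a recognizer would have to fall back to a substitute model for those.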

      There are only around 40 phones in English, but there are on the order of 40^3 triphones. And in SphinxTrain, there are 3 types of triphones that are modeled independently (word-initial, word-internal, and word-final); there's also a 4th type that isn't very important for this discussion.
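
As a rough back-of-the-envelope check of the numbers above, assuming exactly 40 phones and counting only the three position types named in the post (the fourth is left out, as it is there):

```python
NUM_PHONES = 40  # approximate size of the English phone set, per the post
POSITION_TYPES = ("word-initial", "word-internal", "word-final")

# Every (left, base, right) combination of phones.
triphones_per_type = NUM_PHONES ** 3

# Each position type is modeled independently, so the bound multiplies.
total_upper_bound = triphones_per_type * len(POSITION_TYPES)

print(triphones_per_type)   # 64000
print(total_upper_bound)    # 192000
```

This is only an upper bound: many phone combinations never occur in English, so the number of triphones a model actually needs is smaller.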

      Therefore, if you wish to train an acoustic model that's generally useful, your training data should contain as many different triphones as possible. The fact that your training data contains "hundreds" of examples of each sentence suggests that your model won't contain a rich assortment of triphones and therefore won't generalize as well as a training set that is phonetically more diverse.
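
The point about repetition can be illustrated with a small sketch. The phone sequences and the "SIL" utterance-boundary marker are made up for illustration, and word boundaries are ignored; the idea is simply that repeating a sentence adds more examples of the same triphones without adding any new ones.

```python
from collections import Counter


def triphone_counts(utterances):
    """Count (left, base, right) triphones over a list of phone sequences.

    "SIL" pads each utterance; this padding is an assumption of the sketch.
    """
    counts = Counter()
    for phones in utterances:
        padded = ["SIL"] + list(phones) + ["SIL"]
        for i in range(1, len(padded) - 1):
            counts[(padded[i - 1], padded[i], padded[i + 1])] += 1
    return counts


sentence = ["HH", "AH", "L", "OW"]

# One sentence repeated 100 times versus three different short sentences.
repeated = triphone_counts([sentence] * 100)
varied = triphone_counts([sentence, ["Y", "EH", "S"], ["N", "OW"]])

print(len(repeated), sum(repeated.values()))  # distinct, total
print(len(varied), sum(varied.values()))
```

The repeated set has 400 triphone tokens but only 4 distinct triphones, while the three varied utterances yield 9 distinct triphones from just 9 tokens.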

      If you are training a new model, it's useful to include in your training dictionary not only the words in the training data but also a wider set. If you do this, the resulting tied-state acoustic model will contain the triphones in those extra words as well as those seen in the training data.

      If you have already trained a model, there is a procedure for augmenting its triphone inventory to include the triphones in a wider dictionary. See my 2005-04-01 posting in the Open Discussion forum entitled "Augmenting the triphone set of a model" and also Patch 1174913 under "Patches" here on SourceForge.

      cheers,
      jerry

       

