CMU Sphinx / Forums / Help: Adding words to Dictionary POST-training

Speech Recognition Toolkit

Adding words to Dictionary POST-training

Forum: Help

Creator: roof

Created: 2006-01-11

Updated: 2012-09-22

roof - 2006-01-11

Hi guys,

I have trained an acoustic model on 30 hours of speech data (1500 words, 200 sentences, each repeated hundreds of times).

What I want to do now is extend the dictionary to cover another few thousand words, say, without having to retrain the model.

Presumably all the phonemes have been mapped to particular acoustic signals during training, so adding words to the dictionary (making sure they have phoneme mappings) should allow them to be recognised without retraining, correct?

I am assuming this because the newly added words are just other combinations of already-trained acoustic signals, determined by each word's phonemic composition.

Would this work, and would recognition be accurate (assuming the new words are also incorporated into the language model)?

Cheers

- Anonymous - 2006-01-14
  
  The short answer is yes, but it depends on the generality of the data you've used in training.
  
  The acoustic model contains not only HMMs for phones but, much more importantly, HMMs for triphones. (A triphone is a particular phone in the context of particular phones on its left and right.) The Sphinx recognizers then build and use models for the words in your LM by creating sequences of triphone HMMs. Therefore you can recognize words not present in the training data, as long as the triphones in those words are present in your acoustic model. If an LM word contains a triphone that's missing from the acoustic model, the recognizer must substitute some other model for it. (Sphinx-2 substitutes a context-independent phone model; Sphinx-4 does something fancier; I don't know about Sphinx-3.)
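
To make the coverage condition concrete, here is a minimal Python sketch that lists which new words contain triphones the model lacks. It is not part of any Sphinx tool: the function names, the `(base, left, right)` tuple layout, and the sample data are all made up for illustration, and it only looks at word-internal triphones, since the contexts at word edges depend on the neighbouring words.

```python
def internal_triphones(phones):
    """Yield (base, left, right) for each phone with both neighbours inside the word."""
    for i in range(1, len(phones) - 1):
        yield (phones[i], phones[i - 1], phones[i + 1])


def uncovered_words(new_dictionary, model_triphones):
    """Map each new word to the word-internal triphones missing from the model."""
    report = {}
    for word, phones in new_dictionary.items():
        missing = [t for t in internal_triphones(phones) if t not in model_triphones]
        if missing:
            report[word] = missing
    return report


# Hypothetical data: a tiny triphone inventory and three candidate words.
model_triphones = {("AE", "K", "T"), ("IH", "S", "T")}
new_dictionary = {
    "CAT": ["K", "AE", "T"],
    "SIT": ["S", "IH", "T"],
    "BAT": ["B", "AE", "T"],
}

report = uncovered_words(new_dictionary, model_triphones)
print(report)  # only BAT is reported: its AE in the context B_T is not in the model
```

Words that appear in the report would still be recognized, but through whatever fallback model the recognizer substitutes, so accuracy on them is likely to be lower.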
  
  There are only around 40 phones in English, but there are on the order of 40^3 (64,000) possible triphones. And in SphinxTrain, there are 3 types of triphones that are modeled independently (word-initial, word-internal, and word-final); there's also a 4th type that isn't very important for this discussion.
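
The arithmetic behind those figures, as a quick Python check. The count of 40 phones is the round number used above, and the totals are loose upper bounds: many phone combinations never occur in real English, and the fourth triphone type is left out.

```python
n_phones = 40  # approximate size of the English phone set

# One triphone = a base phone plus a left and a right context phone.
per_position = n_phones ** 3

# Three independently modeled position types (the fourth is ignored here).
position_types = ["word-initial", "word-internal", "word-final"]
upper_bound = per_position * len(position_types)

print(per_position)  # 64000
print(upper_bound)   # 192000
```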
  
  Therefore, if you wish to train an acoustic model that's generally useful, your training data should contain as many different triphones as possible. The fact that your training data contains "hundreds" of examples of each sentence suggests that your model won't contain a rich assortment of triphones and therefore won't generalize as well as a model trained on a phonetically more diverse set.
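
A small Python sketch of why repetition doesn't help here: repeating a sentence adds more examples of the same triphones but no new ones. The helper names and the sample phone string are invented for illustration, and the utterance is treated as one flat phone sequence, ignoring the word-position types mentioned above.

```python
from collections import Counter


def utterance_triphones(phones):
    """Return (base, left, right) for every phone that has both neighbours."""
    return [(phones[i], phones[i - 1], phones[i + 1])
            for i in range(1, len(phones) - 1)]


def triphone_counts(utterances):
    """Count how often each triphone occurs across a list of utterances."""
    counts = Counter()
    for phones in utterances:
        counts.update(utterance_triphones(phones))
    return counts


# Hypothetical training sentence, written as a phone sequence.
sentence = ["DH", "AH", "K", "AE", "T", "S", "AE", "T"]

once = triphone_counts([sentence])
repeated = triphone_counts([sentence] * 300)

print(len(once), sum(once.values()))          # 6 6
print(len(repeated), sum(repeated.values()))  # 6 1800
```

The number of distinct triphones (the first figure) is what determines how well the model can cover new words; the total count only affects how well each of those triphones is trained.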
  
  If you are training a new model, it's useful to include in your training dictionary not only the words in the training data but also a wider set. If you do this, the resulting tied-state acoustic model will contain the triphones found in those extra words as well as those seen in the training data.
  
  If you have already trained a model, there is a procedure for augmenting its triphone inventory to include the triphones in a wider dictionary. See my 2005-04-01 posting in the Open Discussion forum entitled "Augmenting the triphone set of a model" and also Patch 1174913 under "Patches" here on SourceForge.
  
  cheers,
  jerry
  