I have used SphinxTrain to build acoustic models for Swedish. When using these models with a CFG (with Sphinx 4), I get very good accuracy. However, when building a simple trigram model (LexTreeLinguist + SimpleNGramModel), I get very bad accuracy. Even if I say a phrase that is very frequent in the training material, it is hard to get good recognition. I also tried the HelloNGram example that comes with Sphinx, and I get very bad accuracy there too. For example, if I say "the green one on the lower right side", it is almost impossible for it to get it right. I get results like "the green lot all middle are right side", which should not get a high language model score (some of these trigrams do not even exist in the data). This is a very simple example model that really should work most of the time when I read a sentence from the training material. Since I get very good results with a CFG, this should not be a problem with my microphone or the acoustic models. Have you noticed the same problem?
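For anyone debugging the same symptom: SimpleNGramModel loads a standard ARPA-format LM, and it is the back-off arithmetic that should make unseen trigrams like the ones above expensive. A stripped-down sketch of that lookup (an illustration of the file format and the back-off rule, not the actual Sphinx code; the tiny model below is made up):

```python
def load_arpa(lines):
    """Parse ARPA-format n-gram lines into {order: {ngram: (log10prob, backoff)}}."""
    models, order = {}, None
    for line in lines:
        line = line.strip()
        if line.endswith("-grams:"):          # section header, e.g. "\2-grams:"
            order = int(line[1])
            models[order] = {}
        elif line == "\\end\\":
            break
        elif order is not None and line:
            parts = line.split()
            prob = float(parts[0])
            ngram = tuple(parts[1:1 + order])
            backoff = float(parts[1 + order]) if len(parts) > 1 + order else 0.0
            models[order][ngram] = (prob, backoff)
    return models

def log10_prob(models, ngram):
    """Back-off lookup: the longest match wins; unseen n-grams pay back-off weights."""
    ngram = tuple(ngram)
    n = len(ngram)
    if n == 1:
        return models[1].get(ngram, (-99.0,))[0]   # very low score for true OOVs
    if ngram in models.get(n, {}):
        return models[n][ngram][0]
    backoff = models.get(n - 1, {}).get(ngram[:-1], (0.0, 0.0))[1]
    return backoff + log10_prob(models, ngram[1:])
```

If a hypothesis scores well despite containing trigrams absent from the training data, the LM is behaving as above (back-off, not zero probability), and the real lever is how the decoder weights that LM score against the acoustics.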
Thanks a lot! Those parameter settings really helped. The parameters I was using were taken from the HelloNGram example that comes with Sphinx. They should really be updated.
Yeah, those values were confusing. I've just changed them to more suitable defaults.
Well, everything depends on the config and the recording. Can you please share them?
Ok, I have put together a test set:
http://dl.getdropbox.com/u/110350/testngram.zip
I have recorded some test sentences (which are well represented in the trigrams) with two different speakers (GS & JE). Below are the results. As you can see, they do not represent very likely word sequences.
REF: the closest purple one on the far left side
JE: closest purple one on four left side
GS: that us us that one on the far next side
REF: the green one right in the middle
JE: the green one right little
GS: between one right of middle
REF: the only one left on the left
JE: you near one left colors
GS: the only was the a only left
REF: the purple one on the lower right side
JE: the purple one little right side
GS: the talking one little are right side
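To put numbers on transcripts like the ones above, a quick word-error-rate check is handy; a minimal sketch (plain token-level Levenshtein distance, in the spirit of what Sphinx's accuracy tracker reports):

```python
def wer(ref, hyp):
    """Word error rate: edit distance over tokens, divided by reference length."""
    r, h = ref.split(), hyp.split()
    # dp[i][j] = edit distance between the first i ref words and first j hyp words
    dp = [[0] * (len(h) + 1) for _ in range(len(r) + 1)]
    for i in range(len(r) + 1):
        dp[i][0] = i
    for j in range(len(h) + 1):
        dp[0][j] = j
    for i in range(1, len(r) + 1):
        for j in range(1, len(h) + 1):
            dp[i][j] = min(dp[i - 1][j - 1] + (r[i - 1] != h[j - 1]),  # substitution
                           dp[i - 1][j] + 1,                           # deletion
                           dp[i][j - 1] + 1)                           # insertion
    return dp[len(r)][len(h)] / len(r)

print(wer("the green one right in the middle", "the green one right little"))
```

Running it over all REF/hypothesis pairs gives a single comparable figure per speaker instead of eyeballing the transcripts.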
Well, I checked this. First of all, you need a much bigger wordInsertionProbability (around 0.7); the value in the demo is not correct here.
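Some context on why this parameter matters so much: each word transition adds log(wordInsertionProbability) to the path score, so a tiny value makes every extra word extremely expensive and biases the search toward hypotheses with fewer or garbled words. A rough illustration of that arithmetic (the 1e-36 figure is just an example of a far-too-small setting, not a quote from any particular config):

```python
import math

def insertion_penalty(word_insertion_prob, n_words):
    """Total log-domain penalty a path pays for containing n_words words."""
    return n_words * math.log(word_insertion_prob)

# Extra penalty an 8-word hypothesis pays over a 6-word one:
gap_tiny = insertion_penalty(1e-36, 8) - insertion_penalty(1e-36, 6)  # about -165.8
gap_ok = insertion_penalty(0.7, 8) - insertion_penalty(0.7, 6)        # about -0.71
```

With the tiny value, two extra words cost roughly 166 log units, which can easily dwarf the acoustic and LM score differences between hypotheses; at 0.7 the penalty is nearly free, so the acoustic and language models decide.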
Next, where are you from? Are you from the UK? It seems you pronounce some words a bit differently. The HUB4 acoustic model handles your speech correctly, but for WSJ there are differences. I had to fix the dictionary entry for PURPLE, for example, to make it work properly:
PURPLE P ER P AH L
PURPLE(2) P AO P EH L
I'd say you pronounce "lower" like L OW EH R as well, but that's not important. And here is the result:
RESULT: the purple one on the lower right side
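As a side note, alternate pronunciations like the PURPLE(2) entry above follow the cmudict WORD(n) convention, and adding one to a plain-text dictionary can be scripted; a small sketch (the helper below is hypothetical, not part of Sphinx):

```python
def add_variant(dict_lines, word, phones):
    """Append an alternate pronunciation using the WORD(n) dictionary convention."""
    count = sum(1 for line in dict_lines
                if line.split()[0].split("(")[0] == word)
    suffix = "(%d)" % (count + 1) if count else ""
    return dict_lines + ["%s%s %s" % (word, suffix, " ".join(phones))]

lines = ["PURPLE P ER P AH L"]
lines = add_variant(lines, "PURPLE", ["P", "AO", "P", "EH", "L"])
print(lines[-1])  # PURPLE(2) P AO P EH L
```

The decoder then treats both variants as the same word, so whichever matches the speaker's accent better wins during the search.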