Problem with accuracy using n-gram models

  • Gabriel Skantze

    Gabriel Skantze - 2008-11-24

    I have used SphinxTrain to build acoustic models for Swedish. When using these models and a CFG (with Sphinx 4), I get very good accuracy. However, when building a simple trigram model (LexTreeLinguist + SimpleNGramModel), I get very bad accuracy. Even if I say a phrase that is very frequent in the training material, it is very hard to get good recognition. I also tried the HelloNGram example that comes with Sphinx and I get very bad accuracy. For example, if I say "the green one on the lower right side", it is almost impossible for it to get it right. I get results like "the green lot all middle are right side", which should not get a high language model score (some of these trigrams do not even exist in the data). This is a very simple example model that really should work most of the time when I read a sentence from the training material. Since I get very good results with a CFG, this should not be a problem with my microphone or the acoustic models. Have you noticed the same problem?
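    For reference, the trigram setup described above is wired together in the Sphinx-4 XML configuration roughly like this. This is a hypothetical fragment in the style of the HelloNGram demo config: the component and property names follow the standard Sphinx-4 demos, but the file paths (e.g. the ARPA model location) are placeholders, so check them against your own config.

    ```xml
    <!-- Sketch only: a SimpleNGramModel reading an ARPA-format trigram model,
         fed to a LexTreeLinguist. Paths are placeholders. -->
    <component name="trigramModel"
               type="edu.cmu.sphinx.linguist.language.ngram.SimpleNGramModel">
        <property name="location" value="models/language/trigram.arpa"/>
        <property name="dictionary" value="dictionary"/>
        <property name="logMath" value="logMath"/>
        <property name="maxDepth" value="3"/>
        <property name="unigramWeight" value=".7"/>
    </component>

    <component name="lexTreeLinguist"
               type="edu.cmu.sphinx.linguist.lextree.LexTreeLinguist">
        <property name="acousticModel" value="acousticModel"/>
        <property name="languageModel" value="trigramModel"/>
        <property name="dictionary" value="dictionary"/>
        <property name="languageWeight" value="${languageWeight}"/>
        <property name="wordInsertionProbability" value="${wordInsertionProbability}"/>
    </component>
    ```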

     
    • Gabriel Skantze

      Gabriel Skantze - 2008-11-27

      Thanks a lot! Those parameter settings really helped. The parameters I was using were taken from the HelloNGram example that comes with Sphinx. They should really be updated.

       
      • Nickolay V. Shmyrev

        Yeah, those values were confusing. I've just changed them to more suitable defaults.

         
    • Nickolay V. Shmyrev

      Well, everything depends on the config and recording. Can you please share them?

       
    • Gabriel Skantze

      Gabriel Skantze - 2008-11-25

      Ok, I have put together a test set:
      http://dl.getdropbox.com/u/110350/testngram.zip

      I have recorded some test sentences (which are well represented in the trigrams) with two different speakers (GS & JE). Below are the results. As you can see, they do not represent very likely word sequences.

      REF: the closest purple one on the far left side
      JE: closest purple one on four left side
      GS: that us us that one on the far next side

      REF: the green one right in the middle
      JE: the green one right little
      GS: between one right of middle

      REF: the only one left on the left
      JE: you near one left colors
      GS: the only was the a only left

      REF: the purple one on the lower right side
      JE: the purple one little right side
      GS: the talking one little are right side

       
      • Nickolay V. Shmyrev

        Well, I checked this. First of all, you need a much bigger wordInsertionProbability (around 0.7); the demo is not correct here.
        Next, where are you from? Are you from the UK? It seems you say some words a bit differently. The Hub4 acoustic model handles your speech correctly, but for WSJ there are differences. I had to fix the dictionary entry for PURPLE, for example, to make it work properly:

        PURPLE P ER P AH L
        PURPLE(2) P AO P EH L

        I'd say you pronounce "lower" like L OW EH R as well, but that's not important. And here is the result:

        RESULT: the purple one on the lower right side

        <property name="absoluteBeamWidth"  value="5000"/>
        <property name="relativeBeamWidth"  value="1E-120"/>
        <property name="absoluteWordBeamWidth" value="200"/>
        <property name="relativeWordBeamWidth" value="1E-80"/>
        <property name="wordInsertionProbability" value="0.2"/>
        <property name="languageWeight" value="15.5"/>
        <property name="silenceInsertionProbability" value=".1"/>
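
        The way these parameters interact can be sketched roughly as follows (in the log domain the decoder actually searches in; the exact bookkeeping inside Sphinx-4 differs, so treat this as a mental model, not the implementation):

        ```
        totalScore = acousticScore
                   + languageWeight * languageScore          // log P(word | history)
                   + numWords * log(wordInsertionProbability)
                   + numSilences * log(silenceInsertionProbability)
        ```

        So raising languageWeight makes the recognizer trust the trigram model more relative to the acoustics, and raising wordInsertionProbability toward 1 reduces the per-word penalty that would otherwise favor hypotheses with fewer (often wrong) words.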
        
         
