Improving Sphinx4 word recognition results

Van Nguyen
2008-06-25
2012-09-22
1 2 > >> (Page 1 of 2)
  • Van Nguyen

    Van Nguyen - 2008-06-25

    Hi everyone,

    I'm still getting used to Sphinx; it's a very well-organized piece of software. I'm evaluating its potential use for transcribing speech to text (for something like automatic closed captioning). I've copied the transcriber demo and modified it a bit. I'm trying to transcribe the audio file found here:

    http://www.thegoleffect.com/sphinx4/peterRabbit1_humanread.wav

    I've also been trying to use some TTS generated audio files with better success. If anyone would like those files, please let me know.

    The audio in the file should read:
    "Once upon a time there were four little rabbits. [and] Their names were Flopsy, Mopsy, Cotton-tail, and Peter."

    Currently, the program (VansDecoder.jar) gives the following results:
    "one second and are west lower little rather just
    very at shade lot see cone table and to third
    middle not and a rear to further it between"

    The rest of the files (including the code) is available here:
    http://www.thegoleffect.com/sphinx4/

    I've been changing the configuration file a lot and trying many different things, beam widths and whatnot, without much luck. I'm not sure which settings I should change to maximize Sphinx's recognition accuracy. Should I customize a JSGF grammar for the story? Customize a language model? Retrain the acoustic model 0_0? Any help is greatly appreciated. Thanks for your time.

    Best Regards,
    Van Nguyen

     
    • Maks

      Maks - 2009-02-19

      Hello Van,
      I could not find your files at the link you provided! All the links lead to this error:
      The requested URL /sphinx4/pass6/Pg4Sent2.txt was not found on this server.
      Additionally, a 404 Not Found error was encountered while trying to use an ErrorDocument to handle the request.

      I would be grateful if you could upload those files again.

      Regards
      Maks

       
    • Nickolay V. Shmyrev

      You need another language model and another dictionary. You can create basic ones with the online lmtool:

      http://www.speech.cs.cmu.edu/tools/lmtool.html

      For more advanced ones you'll need cmuclmtk. The other tuning step is to set wip (the word insertion probability) to 0.7:

      <property name="wordInsertionProbability" value="0.7"/>

      Try these changes first please.

       
      • Nickolay V. Shmyrev

        Your file is encoded at 44100 Hz. Once you convert it to 16 kHz, as it should be, you'll get the following with this config:

        RESULT: once upon a time there were four little rabbits
        RESULT: and then names were
        RESULT: got thing mopsy
        RESULT: cottontail and peter
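
A quick way to catch this kind of mismatch before decoding is to read the WAV header and compare its sample rate with the expected one. The sketch below uses only the standard javax.sound.sampled API. The class name SampleRateCheck and the 16 kHz constant are assumptions for illustration (the constant comes from the "16 kHz" remark above); none of this is part of Sphinx4.

```java
import java.io.File;
import javax.sound.sampled.AudioFormat;
import javax.sound.sampled.AudioSystem;

// Hypothetical helper, not part of Sphinx4.
final class SampleRateCheck {
    // Assumed target rate, taken from the "16 kHz as it should be" remark.
    static final float EXPECTED_RATE_HZ = 16000f;

    // True when the format's sample rate matches the expected rate.
    static boolean hasExpectedRate(AudioFormat format) {
        return Math.abs(format.getSampleRate() - EXPECTED_RATE_HZ) < 0.5f;
    }

    public static void main(String[] args) throws Exception {
        if (args.length != 1) {
            System.out.println("usage: java SampleRateCheck <file.wav>");
            return;
        }
        // Reads the file header only; the audio data itself is not decoded.
        AudioFormat format =
                AudioSystem.getAudioFileFormat(new File(args[0])).getFormat();
        System.out.println("sample rate: " + format.getSampleRate() + " Hz");
        if (!hasExpectedRate(format)) {
            System.out.println("resample to " + EXPECTED_RATE_HZ
                    + " Hz before decoding");
        }
    }
}
```

This only reports the mismatch; the resampling itself is left to an external audio tool.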

         
    • Van Nguyen

      Van Nguyen - 2008-06-25

      Hello Mr. Shmyrev,

      Thank you for your super fast response :O.

      I was able to use the lmtool to get a language model (0980.lm) and a new dictionary (0980.dic). I modified the config to use those new files and applied the wip modification. Updated files are in the following folder:

      http://www.thegoleffect.com/sphinx4/pass2/

      Updated Results:
      whom who a thing a long
      hung nothing looked
      11:50.891 INFO wordPruningSearchMa Average Tokens/State: 695
      whom whom on out
      whom hung
      whom who
      could hoeing
      whom hide

      Did I use the new files incorrectly? Thanks for your help!!!!!!

      Best Regards,
      Van Nguyen

       
    • Nickolay V. Shmyrev

      And once you adjust the beams a bit:

      <property name="absoluteBeamWidth"  value="1000"/>
      <property name="relativeBeamWidth"  value="1E-90"/>
      <property name="absoluteWordBeamWidth" value="20"/>
      <property name="relativeWordBeamWidth" value="1E-60"/>
      <property name="wordInsertionProbability" value=".7"/>
      <property name="languageWeight" value="9.0"/>
      

      The result will be precise

       
    • Van Nguyen

      Van Nguyen - 2008-06-25

      Wow, the result is impeccable with those settings. That was thrilling :D. But it only works for that file. I have a larger file containing the one above:

      http://www.thegoleffect.com/sphinx4/pass3/peterRabbit_gutenberg_part1.wav
      But it is a very long file, so here is a cut portion that follows the file given earlier:
      http://www.thegoleffect.com/sphinx4/pass3/peterRabbit1_humanread2.wav

      How do you determine the values to configure the beams for each audio file? What if I have to work with a different set with a different speaker?

      Regardless, I'm thoroughly impressed with Sphinx4 now :-D. Thank you very much for your time, again and again.

      Best Regards,
      Van Nguyen

       
    • Van Nguyen

      Van Nguyen - 2008-06-25

      Hello again,

      It would be interesting to know if Sphinx is capable of printing out time stamps for the words in the file, for the sake of building a karaoke-like system. How does Sphinx determine word boundaries? Very fascinating stuff! :D Thank you for your help!

      Best Regards,
      Van Nguyen

       
    • Van Nguyen

      Van Nguyen - 2008-06-26

      Hello,

      If I try to run VansDecoder, even if it gives correct results, the program just stalls at the end no matter how long I let it run. Is there some kind of infinite looping going on? Error in my code? I'm not sure :(.

      Best Regards,
      Van Nguyen

       
    • Nickolay V. Shmyrev

      > How do you determine the values to configure the beams for each audio file? What if I have to work with a different set with a different speaker?

      Beams are actually not the most important thing. They only restrict the search space, so they mainly affect the speed of recognition. You can choose them so that the system is reasonably precise and still runs in a reasonable time.
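
To make "restrict the search space" concrete, here is a rough sketch of beam pruning over a list of log scores. It is an illustration only and not Sphinx4's actual active-list code: the class and method names are made up, natural logarithms are assumed, and only the parameter names mirror the absoluteBeamWidth and relativeBeamWidth properties from the config.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

// Illustrative only; names are hypothetical, not Sphinx4 classes.
final class BeamPruneSketch {
    /**
     * Keeps at most absoluteBeamWidth hypotheses, and only those whose log
     * score is within log(relativeBeamWidth) of the best one.
     */
    static List<Double> prune(List<Double> logScores,
                              int absoluteBeamWidth,
                              double relativeBeamWidth) {
        List<Double> sorted = new ArrayList<>(logScores);
        sorted.sort(Comparator.reverseOrder()); // best (highest) score first
        List<Double> kept = new ArrayList<>();
        if (sorted.isEmpty()) {
            return kept;
        }
        // A ratio such as 1E-90 becomes an additive offset in the log domain.
        double threshold = sorted.get(0) + Math.log(relativeBeamWidth);
        for (double score : sorted) {
            if (kept.size() >= absoluteBeamWidth || score < threshold) {
                break; // everything after this point is pruned
            }
            kept.add(score);
        }
        return kept;
    }
}
```

Wider beams keep more hypotheses alive, which costs time but makes it less likely that the correct path is thrown away early. That is why beams trade speed against accuracy rather than fixing a poor model.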

      > It would be interesting to know if Sphinx is capable of printing out time stamps for the words in the file, for the sake of building a karaoke-like system. How does Sphinx determine word boundaries?

      Sure, you can use Result.getTimedBestResult() in your Java code. If you are specifically looking at aligning speech with an existing text, say for subtitles, that task is called "forced alignment" and can be done a little differently.

      > If I try to run VansDecoder, even if it gives correct results, the program just stalls at the end no matter how long I let it run

      If you are using the latest trunk, it's a known bug that hasn't been solved yet. The released version should behave better.

       
    • Van Nguyen

      Van Nguyen - 2008-06-26

      > sure, you can use Result.getTimedBestResult() in your java code

      With the following code, Result.getTimedBestResult() prints only blank lines. I tried different parameter combinations, but the output is blank no matter what. Did I do something wrong? I based the code on the javadoc, but it's not working right :(.

      Result result = recognizer.recognize();
      if (result != null) {
          String resultText = result.getBestResultNoFiller();
          String timedResult = result.getTimedBestResult(false, true);
          System.out.println(resultText);
          System.out.println(timedResult);
          unitTestBuffer.add(result);
      } else {
          done = true;
      }

      > this task is called "forced alignment" and can be done a little bit differently.

      If I want to do forced alignment, should I still be using Sphinx4? I'll dig into FA some more, thanks for the name! I wouldn't have found it on my own! Thanks again :D

      > If you are using the latest trunk, it's a known bug that hasn't been solved yet. The released version should behave better.

      Awesome, phew. I thought I put in an infinite loop or something. Thanks for some peace of mind.

      Best Regards,
      Van Nguyen

       
    • Van Nguyen

      Van Nguyen - 2008-06-30

      Is there more I have to add or do to get result.getTimedBestResult(boolean, boolean) to work? I've read the javadocs, tried different combinations, and used the Eclipse debugger. I'm not good enough with Java to tell 0_0. Any ideas? :-\

      Thanks for your help, as always! :D

       
      • Nickolay V. Shmyrev

        It works with the WavFile demo, so it seems to depend on the decoder, the grammar, or other bits in the config. A closer investigation will take more time.

         
    • Van Nguyen

      Van Nguyen - 2008-07-07

      Hi,

      By changing the activeList setup from the activeListManager (with factories) to one based on the activeList from WavFile, the results I got were:

      once(0.54,0.86) upon(0.86,0.91) a(0.91,1.5) time(1.5,1.71) there(1.89,2.03) were(2.03,2.37) four(2.37,2.61) little(2.61,-1.0) rabbits(-1.0,-1.0)

      and(3.91,4.06) then(4.06,4.41) names(4.41,-1.0) were(-1.0,-1.0)

      flopsy(6.18,6.45) mopsy(-1.0,-1.0)

      cottontail(8.23,8.79) and(8.79,-1.0) peter(-1.0,-1.0)

      The numeric results are somewhat inaccurate (especially the -1.0's), so more experimentation is needed before it's perfected. Nickolay, do you have any intuition about why this is happening? Please let me know.

      I have uploaded a copy of my files here:
      http://www.thegoleffect.com/sphinx4/pass4

      Thanks for your time.

      Best Regards,
      Van Nguyen

       
    • Van Nguyen

      Van Nguyen - 2008-07-07

      Also, as a happy side-effect, the program no longer hangs upon completion.

       
    • Van Nguyen

      Van Nguyen - 2008-07-07

      If you take the first number in each pair of parentheses as the start time in seconds of that word, it's pretty close to the right timestamps. I suppose the "-1.0"s could just be ignored that way: skip straight to the next non-negative number to get the right timestamp. Is that how it's supposed to work?
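
As a rough sketch of that idea, the snippet below parses a timed result string in the word(start,end) format shown earlier and keeps only the words whose start time is known. The class TimedResultParser and its methods are hypothetical helpers written for this example, not part of Sphinx4, and the format is assumed from the output pasted above.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper for post-processing the timed output; not Sphinx4 code.
final class TimedResultParser {
    static final class TimedWord {
        final String word;
        final double start; // seconds, or -1.0 when unknown
        final double end;   // seconds, or -1.0 when unknown

        TimedWord(String word, double start, double end) {
            this.word = word;
            this.start = start;
            this.end = end;
        }
    }

    // Matches entries such as "once(0.54,0.86)" or "rabbits(-1.0,-1.0)".
    private static final Pattern ENTRY = Pattern.compile(
            "([^\\s(]+)\\((-?\\d+(?:\\.\\d+)?),(-?\\d+(?:\\.\\d+)?)\\)");

    static List<TimedWord> parse(String timedResult) {
        List<TimedWord> words = new ArrayList<>();
        Matcher m = ENTRY.matcher(timedResult);
        while (m.find()) {
            words.add(new TimedWord(m.group(1),
                    Double.parseDouble(m.group(2)),
                    Double.parseDouble(m.group(3))));
        }
        return words;
    }

    // Drops words whose start time is negative, i.e. reported as unknown.
    static List<TimedWord> withKnownStart(List<TimedWord> words) {
        List<TimedWord> known = new ArrayList<>();
        for (TimedWord w : words) {
            if (w.start >= 0) {
                known.add(w);
            }
        }
        return known;
    }

    public static void main(String[] args) {
        String line = "four(2.37,2.61) little(2.61,-1.0) rabbits(-1.0,-1.0)";
        for (TimedWord w : withKnownStart(parse(line))) {
            System.out.println(w.word + " starts at " + w.start + " s");
        }
    }
}
```

Words with an unknown start (here "rabbits") are simply skipped, so a karaoke-style display would need its own fallback for them.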

       
    • Van Nguyen

      Van Nguyen - 2008-07-07

      I forgot to mention:

      I modified Result.java like so:

      private String getTimedWordPath(Token token, boolean wantFiller) {
          StringBuffer sb = new StringBuffer();

          /* get to the first emitting token
          while (token != null && !token.isEmitting()) {
              System.out.println("token = " + token);
              token = token.getPredecessor();
          }*/
      

      I commented out the emitting-token chunk so that things work out properly. If I call getTimedBestResult with wordTokenFirst set to false, nothing is printed out. I haven't figured that out yet.

       
    • Van Nguyen

      Van Nguyen - 2008-07-07

      To fix the wordTokenFirst issue, I added the line with the comment "VL's Addition".

      private String getTimedWordTokenLastPath(Token token, boolean wantFiller) {
          StringBuffer sb = new StringBuffer();
          Word word = null;
          Data lastFeature = null;
          Data lastWordFirstFeature = null;

          while (token != null) {
              if (token.isWord()) {
                  if (word != null) {
                      if (wantFiller || !word.isFiller()) {
                          addWord(sb, word,
                                  (FloatData) lastFeature,
                                  (FloatData) lastWordFirstFeature);
                      }
                      word = token.getWord();
                      lastWordFirstFeature = lastFeature;
                  }
                  word = token.getWord();  // VL's Addition
              }
              Data feature = token.getData();
              if (feature != null) {
                  lastFeature = feature;
                  if (lastWordFirstFeature == null) {
                      lastWordFirstFeature = lastFeature;
                  }
              }
              token = token.getPredecessor();
          }
      
          return sb.toString();
      }
      

      Using this method results in highly accurate time stamps.

      I haven't tested any corner cases but this stuff might be useful to add to the sphinx4 code base, I suppose.

      Best Regards,
      Van Nguyen

       
    • Van Nguyen

      Van Nguyen - 2008-07-08

      Hello,

      I used lmtool to make a new dictionary and language model but when I plug them in, I get the following error:

      Exception in thread "main" java.lang.ArrayIndexOutOfBoundsException: 0
      at edu.cmu.sphinx.linguist.lextree.HMMTree.collectEntryAndExitUnits(HMMTree.java:198)
      at edu.cmu.sphinx.linguist.lextree.HMMTree.compile(HMMTree.java:152)
      at edu.cmu.sphinx.linguist.lextree.HMMTree.<init>(HMMTree.java:73)
      at edu.cmu.sphinx.linguist.lextree.LexTreeLinguist.generateHmmTree(LexTreeLinguist.java:366)
      at edu.cmu.sphinx.linguist.lextree.LexTreeLinguist.compileGrammar(LexTreeLinguist.java:353)
      at edu.cmu.sphinx.linguist.lextree.LexTreeLinguist.allocate(LexTreeLinguist.java:279)
      at edu.cmu.sphinx.decoder.search.SimpleBreadthFirstSearchManager.allocate(SimpleBreadthFirstSearchManager.java:567)
      at edu.cmu.sphinx.decoder.AbstractDecoder.allocate(AbstractDecoder.java:66)
      at edu.cmu.sphinx.recognizer.Recognizer.allocate(Recognizer.java:158)
      at demo.sphinx.vansdecoder.VansDecoder.main(VansDecoder.java:64)

      Does anyone have a clue what's causing it? My current code can be found here:

      http://www.thegoleffect.com/sphinx4/pass5

      I appreciate any assistance you can offer. Thanks for your time.

      Best Regards,
      Van Nguyen

       
    • Van Nguyen

      Van Nguyen - 2008-07-08

      ArrayIndex problem fixed. For now.

       
    • Van Nguyen

      Van Nguyen - 2008-07-08

      Okay, so current status:

      I have a new recorded audio sample. I've put all the necessary files here:
      http://www.thegoleffect.com/sphinx4/pass6

      It's set up using a custom dictionary/lm from lmtool specific to this file only. The peterrabbit.* files are general ones for the entire corpus, but they don't work as well.

      The audio file says:
      "She went through the woods to the bakery where she bought some bread."

      Sphinx4 thinks it says:
      "she went through the woods to to bakery where she bought to bread"
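
To put a number on the gap between those two sentences, one common measure is word error rate (WER): the word-level edit distance divided by the number of reference words. The sketch below is a plain-Java illustration; the class WordErrorRate and its helpers are made up for this example and are not part of Sphinx4.

```java
// Hypothetical helper for scoring a transcript; not part of Sphinx4.
final class WordErrorRate {
    // Lowercases, strips punctuation, and splits on whitespace.
    static String[] words(String sentence) {
        return sentence.toLowerCase()
                .replaceAll("[^a-z' ]", " ")
                .trim()
                .split("\\s+");
    }

    // Word-level Levenshtein distance (substitutions, insertions, deletions).
    static int editDistance(String[] ref, String[] hyp) {
        int[][] d = new int[ref.length + 1][hyp.length + 1];
        for (int i = 0; i <= ref.length; i++) d[i][0] = i;
        for (int j = 0; j <= hyp.length; j++) d[0][j] = j;
        for (int i = 1; i <= ref.length; i++) {
            for (int j = 1; j <= hyp.length; j++) {
                int sub = d[i - 1][j - 1]
                        + (ref[i - 1].equals(hyp[j - 1]) ? 0 : 1);
                int del = d[i - 1][j] + 1;
                int ins = d[i][j - 1] + 1;
                d[i][j] = Math.min(sub, Math.min(del, ins));
            }
        }
        return d[ref.length][hyp.length];
    }

    // Assumes a non-empty reference sentence.
    static double wer(String reference, String hypothesis) {
        String[] ref = words(reference);
        return (double) editDistance(ref, words(hypothesis)) / ref.length;
    }

    public static void main(String[] args) {
        String ref = "She went through the woods to the bakery "
                + "where she bought some bread.";
        String hyp = "she went through the woods to to bakery "
                + "where she bought to bread";
        System.out.println("WER = " + wer(ref, hyp));
    }
}
```

For the two sentences above this counts 2 substituted words ("the" and "some" both recognized as "to") out of 13 reference words, a WER of roughly 15%.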

      What can I do to get that last bit of accuracy? I've tried a variety of beam settings and other settings. Do I need to change my search module? The linguist? I have no clue. It feels like I'm making progress though! hehehe

      Thanks for reading!

      Best Regards,
      Van Nguyen

       
      • Nickolay V. Shmyrev

        3) Can you please try with the old ActiveListManager? It probably will not dump times, but it will be more precise.

         
      • Nickolay V. Shmyrev

        1) Can you please post the changes you've made so far as patches, ideally with some description of the solution? It really makes sense to commit them to trunk.

        2) What sentences did you generate the language model from?

         
    • Van Nguyen

      Van Nguyen - 2008-07-08

      > 2) What sentences did you generate the language model from?

      for the general peterrabbit.*
      http://www.thegoleffect.com/sphinx4/pass6/PeterRabbitSimplified.txt

      for the Pg4Sent2.wav file:
      http://www.thegoleffect.com/sphinx4/pass6/Pg4Sent2.txt

      > 1) Can you please post the changes you've made so far as patches, ideally with some description of the solution? It really makes sense to commit them to trunk.

      What do I do to create a folder I can commit with? I have a lot of extraneous stuff but would love to add the bugfixes and help others out. But I have zero experience helping with patches.

      > 3) can you please try with the old ActiveListManager? Probably it will not dump times but it will be more precise.

      Ah, that's a good point. Thanks! I had abandoned the ALM but I could probably use both separately to get accuracy and times with some subtle but (mostly) acceptable mismatches.

      I know it's possible to get the pronunciations (phonemes) printed out via Sphinx, but is there a way to get those AND time stamps? That would be nice. Is that functionality included in Sphinx4 at the moment, or is it something I would have to put together?

      As always, thanks for your help, Nickolay! I hope to learn from your example :D.

      Best Regards,
      Van Nguyen

       
    • Van Nguyen

      Van Nguyen - 2008-07-09

      Is there a way to have lmtool create batches of dictionaries and language models? Or to have it run locally for scripting purposes?

       