Menu

"ngram_search.c", line 1139: Couldn't find <s> in first frame

Help
2016-06-13
2016-09-20
  • Zachary Deering

    Zachary Deering - 2016-06-13

    Occasionally, when stopping and then immediately thereafter starting a listening task, Pocketsphinx (for Android) seems to hang for ~40 seconds before throwing the titular error and then recovering successfully. This is probably a bug that should be fixed. The log around the event is below, with my comments inserted:

    /* Stopped Listening */
    ...
    06-13 09:05:42.150 14558-18313/mil.army.us.cerdec.cpi.tce.smash.service I/cmusphinx: INFO: ngram_search_fwdflat.c(302): Utterance vocabulary contains 0 words
    06-13 09:05:42.150 14558-14558/mil.army.us.cerdec.cpi.tce.smash.service I/SpeechRecognizer: Cancel recognition
    
    /* Started Listening */
    
    06-13 09:05:42.175 14558-14558/mil.army.us.cerdec.cpi.tce.smash.service I/SpeechRecognizer: Start recognition "search"
    06-13 09:05:42.210 14558-14578/mil.army.us.cerdec.cpi.tce.smash.service I/cmusphinx: INFO: ngram_search_fwdtree.c(429): TOTAL fwdtree 7.39 CPU 0.392 xRT
    06-13 09:05:42.210 14558-14578/mil.army.us.cerdec.cpi.tce.smash.service I/cmusphinx: INFO: ngram_search_fwdtree.c(432): TOTAL fwdtree 55.00 wall 2.915 xRT
    06-13 09:05:42.210 14558-14578/mil.army.us.cerdec.cpi.tce.smash.service I/cmusphinx: INFO: ngram_search_fwdflat.c(176): TOTAL fwdflat 1.62 CPU 0.086 xRT
    06-13 09:05:42.210 14558-14578/mil.army.us.cerdec.cpi.tce.smash.service I/cmusphinx: INFO: ngram_search_fwdflat.c(179): TOTAL fwdflat 1.86 wall 0.098 xRT
    06-13 09:05:42.210 14558-14578/mil.army.us.cerdec.cpi.tce.smash.service I/cmusphinx: INFO: ngram_search.c(303): TOTAL bestpath 0.02 CPU 0.001 xRT
    06-13 09:05:42.210 14558-14578/mil.army.us.cerdec.cpi.tce.smash.service I/cmusphinx: INFO: ngram_search.c(306): TOTAL bestpath 0.04 wall 0.002 xRT
    06-13 09:05:42.215 14558-18334/mil.army.us.cerdec.cpi.tce.smash.service D/SpeechRecognizer: Starting decoding
    
    /* Long pause (~44 sec) with no reaction or logs from Sphinx */
    
    06-13 09:06:26.795 14558-18334/mil.army.us.cerdec.cpi.tce.smash.service I/cmusphinx: INFO: cmn_prior.c(131): cmn_prior_update: from < 
    06-13 09:06:26.795 14558-18334/mil.army.us.cerdec.cpi.tce.smash.service I/cmusphinx: 28.01 
    06-13 09:06:26.795 14558-18334/mil.army.us.cerdec.cpi.tce.smash.service I/cmusphinx: -11.78 
    06-13 09:06:26.795 14558-18334/mil.army.us.cerdec.cpi.tce.smash.service I/cmusphinx: -3.77 
    06-13 09:06:26.795 14558-18334/mil.army.us.cerdec.cpi.tce.smash.service I/cmusphinx: 16.50 
    06-13 09:06:26.795 14558-18334/mil.army.us.cerdec.cpi.tce.smash.service I/cmusphinx: -0.29 
    06-13 09:06:26.795 14558-18334/mil.army.us.cerdec.cpi.tce.smash.service I/cmusphinx: -7.41 
    06-13 09:06:26.795 14558-18334/mil.army.us.cerdec.cpi.tce.smash.service I/cmusphinx: -5.15 
    06-13 09:06:26.795 14558-18334/mil.army.us.cerdec.cpi.tce.smash.service I/cmusphinx: -5.62 
    06-13 09:06:26.795 14558-18334/mil.army.us.cerdec.cpi.tce.smash.service I/cmusphinx:  0.44 
    06-13 09:06:26.795 14558-18334/mil.army.us.cerdec.cpi.tce.smash.service I/cmusphinx: -5.67 
    06-13 09:06:26.795 14558-18334/mil.army.us.cerdec.cpi.tce.smash.service I/cmusphinx: -4.79 
    06-13 09:06:26.795 14558-18334/mil.army.us.cerdec.cpi.tce.smash.service I/cmusphinx:  1.20 
    06-13 09:06:26.795 14558-18334/mil.army.us.cerdec.cpi.tce.smash.service I/cmusphinx: -4.05 
    06-13 09:06:26.795 14558-18334/mil.army.us.cerdec.cpi.tce.smash.service I/cmusphinx: >
    06-13 09:06:26.795 14558-18334/mil.army.us.cerdec.cpi.tce.smash.service I/cmusphinx: INFO: cmn_prior.c(149): cmn_prior_update: to   < 
    06-13 09:06:26.795 14558-18334/mil.army.us.cerdec.cpi.tce.smash.service I/cmusphinx: 28.01 
    06-13 09:06:26.795 14558-18334/mil.army.us.cerdec.cpi.tce.smash.service I/cmusphinx: -11.78 
    06-13 09:06:26.795 14558-18334/mil.army.us.cerdec.cpi.tce.smash.service I/cmusphinx: -3.77 
    06-13 09:06:26.795 14558-18334/mil.army.us.cerdec.cpi.tce.smash.service I/cmusphinx: 16.50 
    06-13 09:06:26.795 14558-18334/mil.army.us.cerdec.cpi.tce.smash.service I/cmusphinx: -0.29 
    06-13 09:06:26.795 14558-18334/mil.army.us.cerdec.cpi.tce.smash.service I/cmusphinx: -7.41 
    06-13 09:06:26.795 14558-18334/mil.army.us.cerdec.cpi.tce.smash.service I/cmusphinx: -5.15 
    06-13 09:06:26.795 14558-18334/mil.army.us.cerdec.cpi.tce.smash.service I/cmusphinx: -5.62 
    06-13 09:06:26.795 14558-18334/mil.army.us.cerdec.cpi.tce.smash.service I/cmusphinx:  0.44 
    06-13 09:06:26.795 14558-18334/mil.army.us.cerdec.cpi.tce.smash.service I/cmusphinx: -5.67 
    06-13 09:06:26.795 14558-18334/mil.army.us.cerdec.cpi.tce.smash.service I/cmusphinx: -4.79 
    06-13 09:06:26.795 14558-18334/mil.army.us.cerdec.cpi.tce.smash.service I/cmusphinx:  1.20 
    06-13 09:06:26.795 14558-18334/mil.army.us.cerdec.cpi.tce.smash.service I/cmusphinx: -4.05 
    06-13 09:06:26.795 14558-18334/mil.army.us.cerdec.cpi.tce.smash.service I/cmusphinx: >
    06-13 09:06:26.795 14558-18334/mil.army.us.cerdec.cpi.tce.smash.service I/cmusphinx: INFO: ngram_search_fwdflat.c(302): Utterance vocabulary contains 0 words
    06-13 09:06:26.800 14558-14558/mil.army.us.cerdec.cpi.tce.smash.service I/SpeechRecognizer: Stop recognition
    06-13 09:06:26.800 14558-14558/mil.army.us.cerdec.cpi.tce.smash.service E/cmusphinx: ERROR: "ngram_search.c", line 1139: Couldn't find <s> in first frame
    06-13 09:06:26.800 14558-14558/mil.army.us.cerdec.cpi.tce.smash.service I/SpeechRecognizer: Start recognition search
     
  • Zachary Deering

    Zachary Deering - 2016-06-15

    Any suggestions? Is there a different place I should submit this (e.g. Github)?

     
  • Dan Yaeger

    Dan Yaeger - 2016-06-15

    I am having an eerily similar problem. Will post logs as well.

     
  • Dan Yaeger

    Dan Yaeger - 2016-06-23

    Bumping this thread. Were you able to figure this out?

     
    • Zachary Deering

      Zachary Deering - 2016-06-23

      No I was not. I hopefully will hear back from someone on the project soon.

       
  • Dan Yaeger

    Dan Yaeger - 2016-06-24

    Bump.

     
  • Daniel Wolf

    Daniel Wolf - 2016-06-27

    I'm having a similar problem.

    I'm working with pre-recorded files. First, I perform VAD using WebRTC. Then I feed the "active" audio segments to PocketSphinx for word recognition -- every segment as a separate utterance. Sometimes, WebRTC will give a false positive and detect a short sequence of background noise (about 0.1s) as active. When I pass this audio sequence to PocketSphinx, it reproducably logs the error you mentioned.

    What's worse, it seems that at this point, some internal decoder state is getting corrupted: The next utterance I feed to PocketSphinx is always detected as garbage (something like "[UM] <sil> you <sil> know " instead of "marley was dead". After that, the decoder seems to have recovered: The following utterances are usually recognized just fine.</sil></sil>

    Here's a shortened version of what I'm doing:

    for (/* each VAD segment... */) {
        ps_start_stream(decoder);
        ps_start_utt(decoder);
        while (/* more data... */) {
            ps_process_raw(decoder, /* data */);
        }
        ps_end_utt(decoder);
    
        for (ps_seg_t* it = ps_seg_iter(decoder); it; it = ps_seg_next(it)) {
            /* processing */
        }
    }
    

    My actual code checks every single function call for failure. There is no error until the call to ps_seg_iter to get the recognition result. This is where the actual error occurs. Here is the relevant call stack:

    find_start_node(ngram_search_s * ngs, ps_lattice_s * dag) Line 1139
    ngram_search_lattice(ps_search_s * search) Line 1244
    ngram_search_seg_iter(ps_search_s * search) Line 1016
    ps_seg_iter(ps_decoder_s * ps) Line 1216
    

    Please let me know if there is any more information I can supply!

     

    Last edit: Daniel Wolf 2016-06-28
    • Daniel Wolf

      Daniel Wolf - 2016-06-28

      I did some more digging.

      • find_start_node (in ngram_search.c) doesn't find any nodes and logs that as an error (accompanied by the comment "This is probably impossible.")
      • The reason why it can't find a node is that dag->nodes is NULL.
      • That probably shouldn't happen because ngram_search_lattice calls create_dag_nodes before.
      • However, create_dag_nodes only adds nodes if ngs->bpidx > 0. This isn't the case, it's 0.
      • From my (very shaky) understanding of things, that's simply because no words have been recognized.

      So I'm wondering two things:

      1. Is there actually something wrong with the code, or should it simply expect the case that no words were recognized?
      2. Why is the following utterance always detected as garbage? Is there some clean-up missing?
       
      • Dan Yaeger

        Dan Yaeger - 2016-06-28

        This is exactly where we are in our thinking. Shouldn't the case of no words recognized be valid? I suppose it could be a philosophical dilemma for a speech recognizer to recognize "not speech"...

        We've been trying to find a work around (calling start/stop less often, modifiying LM data), but have had no luck thus far.

         
        • Daniel Wolf

          Daniel Wolf - 2016-06-29

          So far, I couldn't find a workaround, either. What I tried is checking ngs->bpidx right after ps_end_utt. If it's 0, I skip the ps_seg_iter call because I know there won't be any result.

          This gets rid of the error message, of course. But the bigger problem still remains -- the following utterance is detectected as garbage.

          One theory I have is that after processing an "utterance" containing only background noise, the decoder's noise statistics are way off, so it takes some time for them to re-calibrate. If that's the case, maybe there is a way to reset these values after a 0-word recognition.

          I'm running out of ideas, though. Maybe Nickolay can help?

           
          • Daniel Wolf

            Daniel Wolf - 2016-07-05

            I've implemented a rather ugly hack: Right after ps_end_utt, I check ngs->bpidx. If it's 0, I skip the ps_seg_iter call as described above. In addition, I then discard the decoder altogether and re-load it.

            Re-loading the decoder takes some time, but at least it means that the next utterance will be recognized correctly.

            Nickolay, please let me know what you think on the matter. There must be a simple way of fixing this and I'm willing to help in any way I can!

             
  • Dan Yaeger

    Dan Yaeger - 2016-07-05

    Daniel,

    How long does the reloading take? Are we talking milliseconds, seconds, or tens of seconds?

    Thanks!

     

    Last edit: Dan Yaeger 2016-07-05
    • Daniel Wolf

      Daniel Wolf - 2016-07-05

      I'd say one or two seconds, but I didn't measure it.

       
      • Dan Yaeger

        Dan Yaeger - 2016-07-05

        Thanks. We'll try that and see how it affects the behavior we've been experiencing.

         
      • Dan Yaeger

        Dan Yaeger - 2016-07-07

        Daniel:

        After looking at your fixes and investigating a bit further from there, we were able to determine that our issue, a lengthy pause of 30+ seconds followed by the n-gram_serch.c error, was being caused by Android AudioRecorder during AudioRecord.read freezing. The behavior only seems to appear on Samsung devices with stock Android L, which is our main development device.

        Thanks fo ryour help on this! Sorry that we couldn't be more helpful with your issue.

         
        • Daniel Wolf

          Daniel Wolf - 2016-07-08

          Dan,

          It's good to hear that you found a solution for your cause of the problem! Not I'm hoping for a word from Nickolay regarding the decoder problem. It's uncommon for him not to reply within days.

           
          • Nickolay V. Shmyrev

            Dear Daniel, I'll check this issue, it will just take a little more time to setup the test to reproduce it.

             
            • Daniel Wolf

              Daniel Wolf - 2016-07-08

              Hi Nickolay,

              That's great to hear! Please let me know if there's anything I can do to help.

               
  • Daniel Wolf

    Daniel Wolf - 2016-08-04

    Hi Nickolay,

    I just saw your commit "Properly finalizes utt even if there was no speech". Is that the fix for this issue?

     
    • Nickolay V. Shmyrev

      No, sorry, that was a different thing in gstreamer.

       
      • Daniel Wolf

        Daniel Wolf - 2016-08-04

        Pity. And I thought I could toss out my workaround.

         
        • Daniel Wolf

          Daniel Wolf - 2016-09-20

          Now I can indeed remove my workaround. It seems that at least in my case, the problem was this:

          • VAC erroneously detected an utterance. Really, it was just noise.
          • The decoder tried to recognize this utterance. Running in 'continuous' mode, it kept adjusting its CMN values, thinking the noise was a signal.
          • At the start of the next utterance, the CMN values were totally off, resulting in bad recognition quality.
          • After that, the CMN values had recovered and recognition worked again.

          For me, the solution was to switch to 'batch' mode instead of 'continuous' mode. See this discussion for details.

           

Log in to post a comment.

Want the latest updates on software, tech news, and AI?
Get latest updates about software, tech news, and AI from SourceForge directly in your inbox once a month.