Occasionally, when stopping and then immediately restarting a listening task, PocketSphinx (for Android) seems to hang for ~40 seconds before throwing the titular error and then recovering successfully. This is probably a bug that should be fixed. The log around the event is below, with my comments inserted:
Any suggestions? Is there a different place I should submit this (e.g. Github)?
I am having an eerily similar problem. Will post logs as well.
Bumping this thread. Were you able to figure this out?
No I was not. I hopefully will hear back from someone on the project soon.
Bump.
I'm having a similar problem.
I'm working with pre-recorded files. First, I perform VAD using WebRTC. Then I feed the "active" audio segments to PocketSphinx for word recognition -- every segment as a separate utterance. Sometimes, WebRTC will give a false positive and detect a short sequence of background noise (about 0.1s) as active. When I pass this audio sequence to PocketSphinx, it reproducibly logs the error you mentioned.
What's worse, it seems that at this point, some internal decoder state is getting corrupted: the next utterance I feed to PocketSphinx is always detected as garbage (something like "[UM] <sil> you <sil> know" instead of "marley was dead"). After that, the decoder seems to have recovered: the following utterances are usually recognized just fine.
Here's a shortened version of what I'm doing:

    for (/* each VAD segment... */) {
        ps_start_stream(decoder);
        ps_start_utt(decoder);
        while (/* more data... */) {
            ps_process_raw(decoder, /* data */);
        }
        ps_end_utt(decoder);
        for (ps_seg_t *it = ps_seg_iter(decoder); it; it = ps_seg_next(it)) {
            /* processing */
        }
    }
My actual code checks every single function call for failure. There is no error until the call to ps_seg_iter to get the recognition result. This is where the actual error occurs. Here is the relevant call stack:

    find_start_node(ngram_search_s * ngs, ps_lattice_s * dag) Line 1139
    ngram_search_lattice(ps_search_s * search) Line 1244
    ngram_search_seg_iter(ps_search_s * search) Line 1016
    ps_seg_iter(ps_decoder_s * ps) Line 1216

Please let me know if there is any more information I can supply!
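As an aside, one possible pre-filter -- not something the code above does -- would be to treat VAD segments below a minimum duration as false positives and never hand them to the decoder at all. A rough sketch; the 16 kHz sample rate and the 200 ms threshold are assumptions, not values from this thread:

    #include <stddef.h>

    /* Hypothetical pre-filter: drop VAD segments that are too short to
     * plausibly contain a word. Sample rate and threshold are assumptions. */
    #define SAMPLE_RATE          16000
    #define MIN_SEGMENT_MS       200
    #define MIN_SEGMENT_SAMPLES  ((size_t)SAMPLE_RATE * MIN_SEGMENT_MS / 1000)

    static int segment_is_plausible_speech(size_t n_samples)
    {
        return n_samples >= MIN_SEGMENT_SAMPLES;
    }

Such a check would wrap the ps_start_utt/ps_end_utt block in the loop above, so that suspiciously short segments are simply skipped.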
Last edit: Daniel Wolf 2016-06-28
I did some more digging. find_start_node (in ngram_search.c) doesn't find any nodes and logs that as an error (accompanied by the comment "This is probably impossible."). dag->nodes is NULL. ngram_search_lattice calls create_dag_nodes beforehand, and create_dag_nodes only adds nodes if ngs->bpidx > 0. This isn't the case; it's 0.

So I'm wondering two things:
This is exactly where we are in our thinking. Shouldn't the case of no words recognized be valid? I suppose it could be a philosophical dilemma for a speech recognizer to recognize "not speech"...
We've been trying to find a workaround (calling start/stop less often, modifying LM data), but have had no luck thus far.
So far, I couldn't find a workaround, either. What I tried is checking ngs->bpidx right after ps_end_utt. If it's 0, I skip the ps_seg_iter call because I know there won't be any result.

This gets rid of the error message, of course. But the bigger problem still remains -- the following utterance is detected as garbage.
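For comparison, here is a rough sketch of an equivalent guard that stays within the public API, checking ps_get_hyp() instead of reading the internal ngs->bpidx field. The assumption (not verified against the sources) is that ps_get_hyp() returns NULL whenever nothing was recognized, so ps_seg_iter() is never reached in the empty-lattice case:

    #include <stdio.h>
    #include <pocketsphinx.h>

    /* Only walk the segment iterator when a hypothesis exists. */
    static void print_segments_if_any(ps_decoder_t *decoder)
    {
        int32 score;

        if (ps_get_hyp(decoder, &score) == NULL)
            return; /* nothing recognized, e.g. the segment was only noise */

        for (ps_seg_t *it = ps_seg_iter(decoder); it; it = ps_seg_next(it)) {
            int sf, ef;
            ps_seg_frames(it, &sf, &ef);
            printf("%s [%d..%d]\n", ps_seg_word(it), sf, ef);
        }
    }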
One theory I have is that after processing an "utterance" containing only background noise, the decoder's noise statistics are way off, so it takes some time for them to re-calibrate. If that's the case, maybe there is a way to reset these values after a 0-word recognition.
I'm running out of ideas, though. Maybe Nickolay can help?
I've implemented a rather ugly hack: right after ps_end_utt, I check ngs->bpidx. If it's 0, I skip the ps_seg_iter call as described above. In addition, I then discard the decoder altogether and re-load it.

Re-loading the decoder takes some time, but at least it means that the next utterance will be recognized correctly.
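For reference, a minimal sketch of that re-load step, assuming config is the cmd_ln_t the decoder was originally created with, and with error handling reduced to a placeholder:

    #include <pocketsphinx.h>

    /* Discard the decoder and create a fresh one from the same configuration.
     * ps_reinit(decoder, NULL), which re-initializes the existing decoder with
     * its current configuration, might be a cheaper alternative, but whether
     * it also clears the problematic state has not been verified. */
    static ps_decoder_t *reload_decoder(ps_decoder_t *decoder, cmd_ln_t *config)
    {
        ps_free(decoder);
        decoder = ps_init(config);
        if (decoder == NULL) {
            /* handle initialization failure */
        }
        return decoder;
    }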
Nickolay, please let me know what you think on the matter. There must be a simple way of fixing this and I'm willing to help in any way I can!
Daniel,
How long does the reloading take? Are we talking milliseconds, seconds, or tens of seconds?
Thanks!
Last edit: Dan Yaeger 2016-07-05
I'd say one or two seconds, but I didn't measure it.
Thanks. We'll try that and see how it affects the behavior we've been experiencing.
Daniel:
After looking at your fixes and investigating a bit further from there, we were able to determine that our issue, a lengthy pause of 30+ seconds followed by the ngram_search.c error, was being caused by Android's AudioRecord freezing during AudioRecord.read. The behavior only seems to appear on Samsung devices running stock Android L, which is what our main development device uses.
Thanks for your help on this! Sorry that we couldn't be more helpful with your issue.
Dan,
It's good to hear that you found the cause of your problem and a solution! Now I'm hoping for a word from Nickolay regarding the decoder problem. It's uncommon for him not to reply within days.
Dear Daniel, I'll check this issue; it will just take a little more time to set up the test to reproduce it.
Hi Nickolay,
That's great to hear! Please let me know if there's anything I can do to help.
Hi Nickolay,
I just saw your commit "Properly finalizes utt even if there was no speech". Is that the fix for this issue?
No, sorry, that was a different thing in gstreamer.
Pity. And I thought I could toss out my workaround.
Now I can indeed remove my workaround. It seems that at least in my case, the problem was this:
For me, the solution was to switch to 'batch' mode instead of 'continuous' mode. See this discussion for details.
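In case it helps anyone who lands here: if 'batch' mode means decoding each pre-recorded segment in a single pass instead of streaming it in small chunks, a minimal sketch would look like the one below. That interpretation is an assumption on my part; the linked discussion has the actual details.

    #include <stddef.h>
    #include <stdint.h>
    #include <pocketsphinx.h>

    /* Sketch, assuming "batch" means one ps_process_raw call per segment with
     * full_utt = TRUE rather than a streaming loop over small buffers. */
    static const char *decode_whole_segment(ps_decoder_t *decoder,
                                            const int16_t *samples,
                                            size_t n_samples)
    {
        int32 score;

        ps_start_utt(decoder);
        ps_process_raw(decoder, samples, n_samples, FALSE, TRUE);
        ps_end_utt(decoder);
        return ps_get_hyp(decoder, &score); /* NULL if nothing was recognized */
    }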