Pocketsphinx continuous problem

Help
Bula
2010-09-10
2012-09-22
  • Bula

    Bula - 2010-09-10

    Hi, I have been using pocketsphinx_continuous with a custom language model and
    acoustic model, and it has been working very well, but after a while it just
    stops. Below is the usual output of the last two utterances before the halt. I
    would be very grateful if someone could advise where I should look for a
    solution before I lose myself in the code completely :)

    READY....
    Listening...
    Stopped listening, please wait...
    INFO: cmn_prior.c(121): cmn_prior_update: from < 11.63  0.18  0.07 -0.16 -0.30 -0.30 -0.11 -0.33 -0.05 -0.20 -0.05 -0.13 -0.09 >
    INFO: cmn_prior.c(139): cmn_prior_update: to   < 11.61  0.18  0.06 -0.17 -0.31 -0.30 -0.12 -0.34 -0.05 -0.20 -0.05 -0.13 -0.09 >
    INFO: ngram_search_fwdtree.c(1502):      428 words recognized (5/fr)
    INFO: ngram_search_fwdtree.c(1504):    40277 senones evaluated (491/fr)
    INFO: ngram_search_fwdtree.c(1506):    13346 channels searched (162/fr), 2574 1st, 3921 last
    INFO: ngram_search_fwdtree.c(1510):      659 words for which last channels evaluated (8/fr)
    INFO: ngram_search_fwdtree.c(1513):      427 candidate words for entering last phone (5/fr)
    INFO: ngram_search_fwdflat.c(295): Utterance vocabulary contains 11 words
    INFO: ngram_search_fwdflat.c(912):      270 words recognized (3/fr)
    INFO: ngram_search_fwdflat.c(914):    22248 senones evaluated (271/fr)
    INFO: ngram_search_fwdflat.c(916):     7823 channels searched (95/fr)
    INFO: ngram_search_fwdflat.c(918):      912 words searched (11/fr)
    INFO: ngram_search_fwdflat.c(920):      371 word transitions (4/fr)
    WARNING: "ngram_search.c", line 1082: </s> not found in last frame, using vrata instead
    INFO: ngram_search.c(1132): lattice start node <s>.0 end node vrata.44
    INFO: ps_lattice.c(1228): Normalizer P(O) = alpha(vrata:44:80) = -831767
    INFO: ps_lattice.c(1266): Joint P(O,S) = -831943 P(S|O) = -176
    000000113: otkljuczaj vrata (-16040810)
    READY....
    Listening...
    Stopped listening, please wait...
    INFO: cmn_prior.c(121): cmn_prior_update: from < 11.61  0.18  0.06 -0.17 -0.31 -0.30 -0.12 -0.34 -0.05 -0.20 -0.05 -0.13 -0.09 >
    INFO: cmn_prior.c(139): cmn_prior_update: to   < 11.65  0.14  0.04 -0.15 -0.29 -0.27 -0.12 -0.35 -0.05 -0.21 -0.05 -0.13 -0.10 >
    INFO: ngram_search_fwdtree.c(1502):      372 words recognized (3/fr)
    INFO: ngram_search_fwdtree.c(1504):    41649 senones evaluated (389/fr)
    INFO: ngram_search_fwdtree.c(1506):    12908 channels searched (120/fr), 2978 1st, 3233 last
    INFO: ngram_search_fwdtree.c(1510):      580 words for which last channels evaluated (5/fr)
    INFO: ngram_search_fwdtree.c(1513):      373 candidate words for entering last phone (3/fr)
    INFO: ngram_search_fwdflat.c(295): Utterance vocabulary contains 10 words
    INFO: ngram_search_fwdflat.c(912):      190 words recognized (2/fr)
    INFO: ngram_search_fwdflat.c(914):    13517 senones evaluated (126/fr)
    INFO: ngram_search_fwdflat.c(916):     4863 channels searched (45/fr)
    INFO: ngram_search_fwdflat.c(918):      689 words searched (6/fr)
    INFO: ngram_search_fwdflat.c(920):      460 word transitions (4/fr)
    WARNING: "ngram_search.c", line 1082: </s> not found in last frame, using <sil> instead
    INFO: ngram_search.c(1132): lattice start node <s>.0 end node <sil>.98
    INFO: ps_lattice.c(1228): Normalizer P(O) = alpha(<sil>:98:105) = -952607
    INFO: ps_lattice.c(1266): Joint P(O,S) = -952607 P(S|O) = 0
    000000114: otkljuczaj vrata (-18248974)
    READY....
    Listening...
    INFO: ngram_search.c(407): Resized backpointer table to 10000 entries
    INFO: ngram_search.c(407): Resized backpointer table to 20000 entries
    INFO: ngram_search_fwdtree.c(1433): Renormalizing Scores at frame 4136, best score -534675725
    INFO: ngram_search.c(407): Resized backpointer table to 40000 entries
    INFO: ngram_search.c(407): Resized backpointer table to 80000 entries
    INFO: ngram_search.c(415): Resized score stack to 200000 entries
    INFO: ngram_search.c(407): Resized backpointer table to 160000 entries
    pocketsphinx_continuous: ngram_search_fwdtree.c:803: prune_nonroot_chan: Assertion `(&hmm->hmm)->frame >= frame_idx' failed.
    
     
  • Nickolay V. Shmyrev

    Hello

    That sounds like a bug. What version are you using? Can you reproduce it with
    recent trunk? Can you provide the models so I can reproduce it locally? Can you
    also enable audio dump with -rawlogdir . and share the last audio chunk that
    causes the issue?

    Are you running on a system with restricted memory? It looks like you need
    tighter beams then.

     
  • Bula

    Bula - 2010-09-11

    Hi,

    I'm using sphinxbase and pocketsphinx, both version 0.6, on an embedded system
    with 256MB. I can try to reproduce it with recent trunk and with -rawlogdir
    later today or tomorrow and send you the results.

     
  • Bula

    Bula - 2010-09-13

    Today I finally managed to run some tests, and here is what I found out.
    Pocketsphinx_continuous gets stuck in continuous listening mode, and these
    messages (memory related?) are a consequence of a wanna-be-infinite utterance.

    When I introduce a break in the listening loop (if the utterance is more than
    a few seconds long), it continuously breaks out of it with (null) or some
    other, mainly wrong, hypothesis.

    I tried to find out what triggers that behavior but I can't put my finger on
    it. It seems like the sensitivity to noise suddenly gets really high. When the
    raw files are played I can hear slightly elevated noise (interference) in the
    background, but nothing too loud or out of the ordinary. If you would like to
    help me, I can send you the raw files from before and after the constant
    listening mode kicks in.

    Also, there is one more thing that is probably not related: at the beginning
    of each raw file a little stuttering can be heard. When I make a normal rec
    (in parallel with pocketsphinx) in another terminal it sounds ok, so I wonder
    whether that is normal behavior and what the cause is.
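
The break described in this post can be sketched as a small guard that counts the audio fed into the current utterance and signals when a cap is reached. This is only an illustration: `utt_guard_t` and its functions are hypothetical names, not part of pocketsphinx or sphinxbase, and the sample rate and cap are assumptions chosen by the caller. As the post notes, such a cap hides the symptom (the endpointer never detecting silence) rather than fixing it.

```c
#include <stddef.h>

/* Hypothetical helper, not part of pocketsphinx: tracks how much audio
 * the current utterance has consumed and signals when to force its end. */
typedef struct {
    size_t samples_seen;  /* samples fed to the decoder in this utterance */
    size_t max_samples;   /* cap, computed as sample_rate * max_seconds */
} utt_guard_t;

/* Set the cap for an utterance; both arguments are caller assumptions. */
void utt_guard_init(utt_guard_t *g, unsigned sample_rate, unsigned max_seconds)
{
    g->samples_seen = 0;
    g->max_samples = (size_t)sample_rate * max_seconds;
}

/* Call after every audio read; returns 1 once the cap has been reached. */
int utt_guard_feed(utt_guard_t *g, size_t nsamples)
{
    g->samples_seen += nsamples;
    return g->samples_seen >= g->max_samples;
}

/* Call when a new utterance starts. */
void utt_guard_reset(utt_guard_t *g)
{
    g->samples_seen = 0;
}
```

In a listening loop, the caller would end the utterance and call `utt_guard_reset` whenever `utt_guard_feed` returns 1.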

     
  • Nickolay V. Shmyrev

    If I were able to reproduce it, I could solve this easily. That parallel
    recording file will also help.

    As for the embedded device, there are a number of tricks to keep memory usage
    low. For example, -maxhmmpf 3000 should be better.

     
  • Bula

    Bula - 2010-09-13

    Hey, thank you for the tip, but it seems that memory is not the problem; the
    problem is the infinite utterance loop. I looked into it further and the real
    problem seems to be the silence/speech filtering in the sphinxbase cont_ad
    module. The source might be the fact that my program often mutes the capture
    device, so by recalibrating during muted periods this module becomes too
    sensitive to noise/interference and thinks it's speech. If that is the case,
    the solution is to avoid muting the device and to keep interference as low as
    possible by solving some hardware issues.

    There is one other thing that bothers me and that I can't figure out: the
    stuttering in the raw files. I'm sending you the recordings via mail.
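
The suspected mechanism can be illustrated with a toy energy endpointer. This is a simplified model under stated assumptions, not the actual cont_ad algorithm: the speech threshold is taken to be the calibrated noise floor plus a fixed margin, and `frame_level`, `calibrate_threshold`, `is_speech` and the `margin` parameter are all hypothetical names introduced for the sketch.

```c
#include <stddef.h>
#include <stdint.h>

/* Mean absolute amplitude of one frame; a crude stand-in for frame energy. */
double frame_level(const int16_t *frame, size_t n)
{
    double sum = 0.0;
    for (size_t i = 0; i < n; i++)
        sum += frame[i] < 0 ? -(double)frame[i] : (double)frame[i];
    return n ? sum / (double)n : 0.0;
}

/* Toy calibration: speech threshold = measured noise floor + fixed margin.
 * 'margin' is an illustrative parameter, not a cont_ad setting. */
double calibrate_threshold(double noise_floor, double margin)
{
    return noise_floor + margin;
}

/* A frame counts as speech when its level exceeds the threshold. */
int is_speech(double level, double threshold)
{
    return level > threshold;
}
```

In this model, calibrating on a live microphone with a noise floor of 50 and a margin of 20 gives a threshold of 70, so interference at level 60 is treated as silence. Calibrating on a muted device (noise floor 0) gives a threshold of 20, and the same interference is classified as speech, so the utterance never ends.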

     
  • Nickolay V. Shmyrev

    Hi

    I've got your file; it seems to have a DC offset. Please try adding the
    "-remove_dc yes" option to remove it. Also, please send me a long continuous
    recording from your device that bypasses pocketsphinx. I'll try to test
    cont_ad locally.
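
For readers unfamiliar with DC offset: it means the samples are centered on a non-zero value instead of zero, which inflates the measured energy of silence. A minimal sketch of removing it from one block of 16-bit samples follows. It only illustrates the idea; how sphinxbase implements -remove_dc internally is not shown in this thread, and `remove_dc_block` is a hypothetical name.

```c
#include <stddef.h>
#include <stdint.h>

/* Subtract the block mean from every sample so the block is centered on
 * zero. Illustrative only; not the sphinxbase implementation. */
void remove_dc_block(int16_t *buf, size_t n)
{
    if (n == 0)
        return;

    long long sum = 0;
    for (size_t i = 0; i < n; i++)
        sum += buf[i];

    /* Integer mean of the block, truncated toward zero. */
    int32_t mean = (int32_t)(sum / (long long)n);

    for (size_t i = 0; i < n; i++) {
        int32_t v = (int32_t)buf[i] - mean;
        /* Clamp to the 16-bit range to avoid wraparound. */
        if (v > INT16_MAX) v = INT16_MAX;
        if (v < INT16_MIN) v = INT16_MIN;
        buf[i] = (int16_t)v;
    }
}
```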

     
  • Nickolay V. Shmyrev

    Hi

    Sorry, it took me a while. I checked your file and it seems ok. It might
    indeed be as you said: when you mute, the energy endpointer breaks. Can you
    try to insert into your code

    cont_ad_set_logfp(cont, stdout);

    and share the log of the endpointer?

    Also, I wonder, if you mute the device, why don't you stop recording as well?
    You can just pause cont_ad_read for a while.

     
  • Bula

    Bula - 2010-09-20

    Hi, thank you for looking into it. The conditions that had to be satisfied for
    the problem to occur are very subtle and difficult to grasp, but the solution
    was not so hard: removing the mute part. Indeed, it should have been avoided
    in the first place, as it was only a bad temporary shortcut in place of the
    real solution (which is pausing cont_ad_read like you said).

    The second problem from my last post is a little bit harder to trace: as you
    can hear from the files in my last mail, the usual rec (parallel to
    pocketsphinx) records a perfect sound file, but the raw audio files that are
    logged from pocketsphinx at the same time have stuttering all over. I wonder
    what the cause is and whether solving this problem can improve recognition
    accuracy.

     
  • Nickolay V. Shmyrev

    Hello

    In order to investigate this, can you please dump the raw input to a file?
    Not a parallel recording, but the actual data that arrives from the device.
    You need to insert this chunk of code into the application.

        /* Dump the raw audio read from the device to copyfile */
        if ((rawfp = fopen(copyfile, "wb")) == NULL)
            E_ERROR("Failed to open raw output file '%s' for writing: %s\n",
                    copyfile, strerror(errno));
        else
            cont_ad_set_rawfp(cont, rawfp);
    
     
  • Bula

    Bula - 2010-09-25

    Hi, I have inserted the code and files are sent by mail.

     
  • Erik Andresen

    Erik Andresen - 2012-03-05

    Hi,

    Yes, I know this bug is a bit older, but I have the same problem running
    pocketsphinx over gstreamer:

    ngram_search_fwdtree.c:825: prune_nonroot_chan: Assertion `(&hmm->hmm)->frame >= frame_idx' failed.

    The bug is in both version 0.7 and current svn trunk.

     
  • Nickolay V. Shmyrev

    Hello nxdefiant

    If you have the same problem, maybe the same solution can help you. You just
    need to configure the frontend to split the audio into utterances properly.
    You can also change the frame_idx_t type to int32 to allow longer utterances.
    See the hmm.h header for details.
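
A rough sketch of why widening frame_idx_t helps, under two assumptions that are not stated in this thread: that frame_idx_t was a 16-bit signed integer in the affected versions, and that the decoder processes 100 frames per second. The function names are hypothetical. With those assumptions, a 16-bit index runs out after about 327 seconds of uninterrupted audio, and a wrapped index would then compare as smaller than the current frame, which is consistent with the failed `frame >= frame_idx` assertion.

```c
#include <stdint.h>

/* Longest utterance, in whole seconds, before a frame counter with the
 * given maximum value runs out at 'frames_per_sec' frames per second. */
long max_utterance_seconds(long max_frame_index, long frames_per_sec)
{
    return max_frame_index / frames_per_sec;
}

/* Store a running frame count in a 16-bit field. Values above INT16_MAX
 * do not fit; on common two's-complement platforms they wrap to negative
 * numbers (the conversion is implementation-defined in C). */
int16_t store_frame16(int32_t frame)
{
    return (int16_t)(uint16_t)frame;
}
```

Here `max_utterance_seconds(INT16_MAX, 100)` evaluates to 327, while `max_utterance_seconds(INT32_MAX, 100)` is over 21 million seconds, so a 32-bit index is effectively unbounded for live audio. Proper utterance splitting in the frontend remains the primary fix.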

     
  • Erik Andresen

    Erik Andresen - 2012-03-06

    Just to be on the safe side: this means configuring vader in gstreamer, right?

     
  • Nickolay V. Shmyrev

    And that too

     
  • Erik Andresen

    Erik Andresen - 2012-03-24

    OK, with frame_idx_t as int32 it is way more stable (not a single crash yet).
    Thanks.

     
