
[sphinx_live3.7] hypsegs & FSG-warning

2007-11-26
2012-09-22
  • Masrur Doostdar

    Masrur Doostdar - 2007-11-26

    Hi,

    I have two issues:

    1. A problem (probably a bug) in version 3.7:
    To get an array of partially hypothesised word segments during live recognition, I'm using the s3_decode API function [1]. This worked in sphinx3.6, but in sphinx3.7 the last argument, _hyp_segs, comes back empty (**_hyp_segs==NULL).

    So has anyone else noticed this problem, or is it just me?

    2. When working with FSGs and calling the function [1], warnings are printed:

    rarely this one:
    ERROR: "fsg_search.c", line 1062: Final state not reached; backtracing from best scoring entry

    often this one (with varying scores of course):
    INFO: fsg_search.c(1054): Best score (-4290480) > best final state score (-4952473); but using latter

    So has anyone seen this too? Is this something I should care about? Does it influence my recognition accuracy, and can I fix it? Could there even be a dependency between problem 1 and problem 2?

    Thanks for any help.
    regards

    [1] in s3_decode.h:
    int s3_decode_hypothesis(s3_decode_t *_decode, char **_uttid,
                             char **_hyp_str, hyp_t ***_hyp_segs)
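For readers following along, here is a minimal sketch of how a caller might walk the segment array that comes back through the last argument. It assumes the array is NULL-terminated, which is what the `**_hyp_segs==NULL` check in the post suggests. `seg_t`, `count_segments` and `print_segments` are hypothetical stand-ins written for this illustration; they are not part of the sphinx3 API, and the real `hyp_t` has more fields.

```c
#include <stddef.h>
#include <stdio.h>

/* Hypothetical stand-in for the decoder's segment type.
 * Only the fields used in this sketch are included. */
typedef struct {
    const char *word;  /* word string */
    int sf;            /* start frame */
    int ef;            /* end frame */
} seg_t;

/* Count the entries of a NULL-terminated segment array.
 * Returns 0 for a NULL array and for an array whose first
 * entry is NULL (the "empty" case described in the post). */
size_t count_segments(seg_t **segs)
{
    size_t n = 0;

    if (segs == NULL)
        return 0;
    while (segs[n] != NULL)
        n++;
    return n;
}

/* Print one line per segment and return the number printed. */
size_t print_segments(seg_t **segs)
{
    size_t i;
    size_t n = count_segments(segs);

    for (i = 0; i < n; i++)
        printf("%-12s %4d %4d\n", segs[i]->word, segs[i]->sf, segs[i]->ef);
    return n;
}
```

Checking both the array pointer and its first entry before dereferencing avoids a crash in exactly the situation reported here, where the decoder hands back no segments.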

     
    • Nickolay V. Shmyrev

      Dear Masrur, I'm afraid that without the data and scripts to reproduce the problem it's hard to say much. Is it reproducible with WSJ and some English recording?

       
    • Masrur Doostdar

      Masrur Doostdar - 2007-12-01

      Yeah, I'm working with the WSJ model [1] and trying to recognize English speech (it's live_decode, so I just try to speak English ;-) )

      Regarding the first problem I mentioned: I also have it when decoding with language models, and for that case I have a direct comparison of my configuration with sphinx3.6. There it works with the same configuration! So I don't think it's a problem with my configuration.

      It would be valuable to me if someone could just run a test in sphinx3.7 on the last argument of the s3_decode function and confirm or refute my problem.
      If it is refuted, I will try to post a reproducible configuration.

      Thanks a lot

      [1] built by Keith Vertanen, 8000 senones - 16 distributions

       
    • Masrur Doostdar

      Masrur Doostdar - 2007-12-17

      I figured out why I was not getting a proper word-segments array in the hypothesis. I got an error the first time I used sphinx3.7 with an FSG, and I made a quick hack [1] in s3_decode.c to get past it. Until now I had not considered this as a possible cause of my problem: I had forgotten that my edit was in the very function responsible for providing the hypotheses.

      OK, reverting that edit solves my problem with the word-segments array, but the error is back, and I also get it in a fresh parallel installation of sphinx3.7.

      It comes up only when working with sphinx3.7 and an FSG (i.e. in mode 2).
      I get this assertion error:

      ERROR: "fsg_search.c", line 1063: Final state not reached; backtracing from best scoring entry
      sphinx3_livepretend: dict.c:575: dict_filler_word: Assertion `(w >= 0) && (w < d->n_word)' failed.

      The backtrace shows that dict_filler_word is called by s3_decode_record_hyp:
      line 545: if (!dict_filler_word(dict, hyp->id) && hyp->id != finish_wid) {

      The problem is that hyp->id is -1.
      In my edit of the code (see [1]) I just tried to skip the nodes with hyp->id == -1, but it seems to be a deeper problem.

      I have also attached one of my config files [2], in case you think the problem may be caused by my configuration.

      regards
      Masrur Doostdar


      [1] my earlier edit to s3_decode.c:
      @@ -549,13 +540,17 @@
           finish_wid = dict_finishwid(dict);
           for (node = hyp_list; node != NULL; node = gnode_next(node)) {
               hyp = (srch_hyp_t *) gnode_ptr(node);
      +        if(hyp->id!=-1)
      +        {
               hyp_seglen++;
      +
               if (!dict_filler_word(dict, hyp->id) && hyp->id != finish_wid) {
                   hyp_strlen +=
                       strlen(dict_wordstr(dict, dict_basewid(dict, hyp->id))) +
                       1;
               }
           }
      +    }

           if (hyp_strlen == 0) {
               hyp_strlen = 1;

      @@ -574,8 +569,11 @@
           /* iterate thru to fill in the array of segments and/or decoded string */
           i = 0;
           hyp_strptr = hyp_str;
      -    for (node = hyp_list; node != NULL; node = gnode_next(node), i++) {
      +    for (node = hyp_list; node != NULL; node = gnode_next(node)) {
               hyp = (srch_hyp_t *) gnode_ptr(node);
      +        if (hyp->id!=-1)
      +        {
      +        i++;
               hyp_segs[i] = hyp;

               hyp->word = dict_wordstr(dict, dict_basewid(dict, hyp->id));

      @@ -587,6 +585,7 @@
                   hyp_strptr += 1;
               }
           }
      +    }
           glist_free(hyp_list);

           hyp_str[hyp_strlen - 1] = '\0';

      [2] my config-file:
      -mdef model_architecture/wsj_all_cont_3no_8000.mdef
      -mean model_parameters/wsj_all_cont_3no_8000_16.cd/means
      -var model_parameters/wsj_all_cont_3no_8000_16.cd/variances
      -mixw model_parameters/wsj_all_cont_3no_8000_16.cd/mixture_weights
      -tmat model_parameters/wsj_all_cont_3no_8000_16.cd/transition_matrices
      -lw 15
      -feat s3_1x39
      -beam 1e-120
      -wbeam 1e-100
      -pbeam 1e-120
      -dict navigate-go7.dic
      -fdict navigate-go7.filler
      -fsg nav-withoutstop.fsg
      -wip 0.2
      -agc max
      -varnorm no
      -cmn current
      -hyp _live_navfsg.match
      -hypseg result_live_navfsg.match
      -op_mode 2

       
    • Nickolay V. Shmyrev

      What is listed in your filler dictionary? Does it have a proper newline at the end? Please paste your FSG as well.

       
    • Masrur Doostdar

      Masrur Doostdar - 2007-12-18

      OK, I have uploaded the filler dictionary [1], the dictionary [2] and the FSG [3]. Note that these are not the only files I got an error with.

      regards

      [1] http://www-users.rwth-aachen.de/Masrur.Doostdar/navigate-go7.filler
      [2] http://www-users.rwth-aachen.de/Masrur.Doostdar/navigate-go7.dic
      [3] http://www-users.rwth-aachen.de/Masrur.Doostdar/nav-withoutstop.fsg

       
    • Nickolay V. Shmyrev

      It is a bug in sphinx3: it can't handle empty transitions. I'm looking for a proper fix.

       
    • Masrur Doostdar

      Masrur Doostdar - 2007-12-19

      OK, for now I have a workaround [1] that avoids the error while still giving me the word-segments array. It's just a minor modification of my first edit to s3_decode.c (the i++ increment should be at the end of the added if-body).

      But I still get the warnings I mentioned in my initial post:
      >Working with FSG's and calling the function[1], warnings are printed:

      >rarely this one:
      >ERROR: "fsg_search.c", line 1062: Final state not reached; backtracing from best scoring entry

      >often this one (with varying scores of course):
      >INFO: fsg_search.c(1054): Best score (-4290480) > best final state score (-4952473); but using latter

      Again the question: do I have to worry about them?

      regards


      [1] workaround diff against the latest s3_decode.c from the svn repository:

      --- s3_decode.c 2007-12-19 14:47:57.000000000 +0100
      +++ s3_decode-edited.c 2007-12-19 14:46:39.000000000 +0100
      @@ -549,6 +549,7 @@
           finish_wid = dict_finishwid(dict);
           for (node = hyp_list; node != NULL; node = gnode_next(node)) {
               hyp = (srch_hyp_t *) gnode_ptr(node);
      +        if(hyp->id!=-1){
               hyp_seglen++;
               if (!dict_filler_word(dict, hyp->id) && hyp->id != finish_wid) {
                   hyp_strlen +=
      @@ -556,6 +557,8 @@
                       1;
               }
           }
      +    }
      +

           if (hyp_strlen == 0) {
               hyp_strlen = 1;

      @@ -574,8 +577,9 @@
           /* iterate thru to fill in the array of segments and/or decoded string */
           i = 0;
           hyp_strptr = hyp_str;
      -    for (node = hyp_list; node != NULL; node = gnode_next(node), i++) {
      +    for (node = hyp_list; node != NULL; node = gnode_next(node)) {
               hyp = (srch_hyp_t *) gnode_ptr(node);
      +        if(hyp->id!=-1){
               hyp_segs[i] = hyp;

               hyp->word = dict_wordstr(dict, dict_basewid(dict, hyp->id));

      @@ -586,6 +590,8 @@
                   *hyp_strptr = ' ';
                   hyp_strptr += 1;
               }
      +        i++;
      +        }
           }
           glist_free(hyp_list);
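The idea behind the workaround can be shown in isolation: count only the entries with a valid word ID, then fill the output array and advance the index only after an entry has been stored. The sketch below uses a hypothetical `node_t` list and a hypothetical `collect_word_nodes` helper in place of the sphinx3 `gnode_t`/`srch_hyp_t` types, so it illustrates the pattern and is not the actual s3_decode.c code.

```c
#include <stdlib.h>

/* Hypothetical stand-in for one hypothesis list node. */
typedef struct node_s {
    int id;               /* word ID; -1 marks a null transition */
    struct node_s *next;  /* next node in the list, or NULL */
} node_t;

/* Build a NULL-terminated array of pointers to the nodes that carry
 * a valid word ID. The caller frees the array (not the nodes).
 * Returns NULL on allocation failure; *out_n, if given, receives
 * the number of entries kept. */
node_t **collect_word_nodes(node_t *head, size_t *out_n)
{
    node_t *n;
    node_t **out;
    size_t count = 0;
    size_t i = 0;

    /* Pass 1: count only the entries that will be kept, so the
     * array size matches what pass 2 writes. */
    for (n = head; n != NULL; n = n->next)
        if (n->id >= 0)
            count++;

    out = calloc(count + 1, sizeof(*out));
    if (out == NULL)
        return NULL;

    /* Pass 2: store first, then advance the index. Incrementing
     * before the store would leave slot 0 empty. */
    for (n = head; n != NULL; n = n->next) {
        if (n->id >= 0) {
            out[i] = n;
            i++;
        }
    }
    out[i] = NULL;  /* terminator */

    if (out_n != NULL)
        *out_n = count;
    return out;
}
```

Keeping the two passes under the same condition is what makes the count and the fill agree; the earlier version of the edit in this thread incremented the index before storing, which is the detail the post above corrects.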

       
      • Nickolay V. Shmyrev

        The proper fix, in my opinion, is the following patch.
        David: OK to commit?


        diff -upr sphinx3.orig/include/fsg_search.h sphinx3/include/fsg_search.h
        --- sphinx3.orig/include/fsg_search.h 2007-11-28 10:19:46.000000000 +0300
        +++ sphinx3/include/fsg_search.h 2007-12-19 16:39:50.000000000 +0300
        @@ -164,7 +164,6 @@ typedef struct fsg_search_s {
             int32 bpidx_start;     /* First history entry index this frame */

             srch_hyp_t *hyp;       /* Search hypothesis */
        -    srch_hyp_t *filt_hyp;  /* Filtered hypothesis */
             int32 ascr, lscr;      /* Total acoustic and lm score for utt */

             int32 n_hmm_eval;      /* Total HMMs evaluated this utt */
        diff -upr sphinx3.orig/src/libs3decoder/libsearch/fsg_search.c sphinx3/src/libs3decoder/libsearch/fsg_search.c
        --- sphinx3.orig/src/libs3decoder/libsearch/fsg_search.c 2007-11-28 10:19:46.000000000 +0300
        +++ sphinx3/src/libs3decoder/libsearch/fsg_search.c 2007-12-19 16:39:47.000000000 +0300
        @@ -904,69 +904,6 @@ fsg_search_hyp_dump(fsg_search_t * searc
                                 search->senscale);
         }

        -
        -#if 0
        -/* Fill in hyp_str in search.c; filtering out fillers and null trans */
        -static void
        -fsg_search_hyp_filter(fsg_search_t * search)
        -{
        -    srch_hyp_t *hyp, *filt_hyp, *head;
        -    int32 i;
        -    int32 startwid, finishwid;
        -    int32 altpron;
        -    dict_t *dict;
        -
        -
        -    dict = search->dict;
        -    filt_hyp = search->filt_hyp;
        -    startwid = dict_basewid(dict, dict_startwid(dict));
        -    finishwid = dict_basewid(dict, dict_finishwid(dict));
        -    dict = search->dict;
        -    altpron = search->isUsealtpron;
        -
        -    i = 0;
        -    head = 0;
        -    for (hyp = search->hyp; hyp; hyp = hyp->next) {
        -        if ((hyp->id < 0) ||
        -            (hyp->id == startwid) || (hyp->id >= finishwid))
        -            continue;
        -
        -        /* Copy this hyp entry to filtered result */
        -        filt_hyp = (srch_hyp_t *) ckd_calloc(1, sizeof(srch_hyp_t));
        -
        -        filt_hyp->word = hyp->word;
        -        filt_hyp->id = hyp->id;
        -        filt_hyp->type = hyp->type;
        -        filt_hyp->sf = hyp->sf;
        -        filt_hyp->ascr = hyp->ascr;
        -        filt_hyp->lscr = hyp->lscr;
        -        filt_hyp->pscr = hyp->pscr;
        -        filt_hyp->cscr = hyp->cscr;
        -        filt_hyp->fsg_state = hyp->fsg_state;
        -        filt_hyp->next = head;
        -        head = filt_hyp;
        -        /*
        -           filt_hyp[i] = *hyp;
        -         */
        -
        -        /* Replace specific word pronunciation ID with base ID */
        -        if (!altpron) {
        -            filt_hyp->id = dict_basewid(dict, filt_hyp->id);
        -        }
        -
        -        i++;
        -        if ((i + 1) >= HYP_SZ)
        -            E_FATAL
        -                ("Hyp array overflow; increase HYP_SZ in fsg_search.h\n");
        -    }
        -
        -    filt_hyp->id = -1;          /* Sentinel */
        -    search->filt_hyp = filt_hyp;
        -}
        -
        -#endif
        -
        -
         void
         fsg_search_history_backtrace(fsg_search_t * search,
                                      boolean check_fsg_final_state)
        diff -upr sphinx3.orig/src/libs3decoder/libsearch/srch_fsg.c sphinx3/src/libs3decoder/libsearch/srch_fsg.c
        --- sphinx3.orig/src/libs3decoder/libsearch/srch_fsg.c 2007-11-28 10:19:46.000000000 +0300
        +++ sphinx3/src/libs3decoder/libsearch/srch_fsg.c 2007-12-19 16:39:47.000000000 +0300
        @@ -292,11 +292,16 @@ srch_FSG_gen_hyp(void *srch /*
             s = (srch_t *) srch;
             fsgsrch = (fsg_search_t *) s->grh->graph_struct;

        -    fsg_search_history_backtrace(fsgsrch, TRUE);
        +    fsg_search_history_backtrace(fsgsrch, FALSE);

             ghyp = NULL;
             for (h = fsgsrch->hyp; h; h = h->next) {
                 srch_hyp_t *tmph;
        +
        +        /* Skip NULL states */
        +        if (h->id < 0)
        +            continue;
        +
                 /* We have to copy the nodes here since fsgsrch retains
                  * ownership of the hyp... */
                 tmph = ckd_calloc(1, sizeof(*tmph));

         
        • David Huggins-Daines

          Yes, I think so. The NULL transitions might be useful for something but the decode API shouldn't be seeing them since they are irrelevant to the hypothesis.

           
          • Masrur Doostdar

            Masrur Doostdar - 2008-02-20

            I'm reviving this thread because I ran into two problems, the second not directly related to this thread:

            1. I observed that with the patch Nickolay provided (1), the decoder (in FSG mode) tends in some situations to hypothesize SILENCE more often. I observed this mainly for words with lower language-model probabilities (due to wide branching in the FSG), so I assume the decoder hypothesizes SILENCE when the acoustic and language-model scores are low. I evaluated this against the last patch I provided (2), which is more of a workaround:
              livepretend, 724 utterances, rev 7433 with patch (1) or (2), configuration the same as posted before:
              (1) SER: 25.4% WER: 9.5%
              (2) SER: 16.9% WER: 5.4%

            So there is still a flaw in this patch...

            2. Since revision 7438, livepretend produces no hypothesis for perhaps 80% of the given utterances. I checked, and this problem does not occur with revision 7433. It also does not occur when I use the API directly in my own programs.

            regards
            M.D.

             
    • Nickolay V. Shmyrev

      applied.

       
    • Nickolay V. Shmyrev

      Hm, about the first problem. What if we change FALSE back to TRUE in my patch?
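To make the TRUE/FALSE question concrete, here is a minimal sketch of the selection logic that the two log messages quoted earlier in the thread describe: pick the best-scoring entry overall, or prefer the best entry that ended in the final state. The `entry_t` type and `pick_backtrace_entry` function are hypothetical and written only for this illustration; the real logic lives in `fsg_search_history_backtrace` and may differ in detail.

```c
/* Hypothetical stand-in for one history entry alive in the last frame. */
typedef struct {
    int score;     /* path score in the log domain, higher is better */
    int is_final;  /* nonzero if the entry ends in the FSG final state */
} entry_t;

/* Return the index of the entry to backtrace from, or -1 if there
 * are no entries.
 *
 * With check_final nonzero, the best entry in the final state wins
 * even if some non-final entry scores higher (the "but using latter"
 * message). If no entry reached the final state, fall back to the
 * best entry overall (the "Final state not reached" message).
 * With check_final zero, the best entry overall always wins. */
int pick_backtrace_entry(const entry_t *e, int n, int check_final)
{
    int i;
    int best = -1;
    int best_final = -1;

    for (i = 0; i < n; i++) {
        if (best < 0 || e[i].score > e[best].score)
            best = i;
        if (e[i].is_final &&
            (best_final < 0 || e[i].score > e[best_final].score))
            best_final = i;
    }

    if (check_final && best_final >= 0)
        return best_final;
    return best;
}
```

Under this reading, passing FALSE lets a higher-scoring path that never completes the grammar win the backtrace, which would be consistent with the extra SILENCE hypotheses reported above, though the thread itself does not confirm the mechanism.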

       
      • Masrur Doostdar

        Masrur Doostdar - 2008-02-21

        Yes, this solved the first problem!

        regards
        M.D.

         
        • Nickolay V. Shmyrev

          Hm, I tried the FSG regression test from sphinx3, and everything seems to work fine with the latest version. Can you please submit a test case: a single recording that is recognized incorrectly, plus the FSG, dictionary and arguments? You can upload it somewhere and give a link, for example.

           
          • Masrur Doostdar

            Masrur Doostdar - 2008-02-24

            An utterance that is not hypothesized:
            http://www-users.rwth-aachen.de/Masrur.Doostdar/8.raw

            The filler dictionary, dictionary, FSG and config [1] are the same as posted before:
            http://www-users.rwth-aachen.de/Masrur.Doostdar/navigate-go7.filler
            http://www-users.rwth-aachen.de/Masrur.Doostdar/navigate-go7.dic
            http://www-users.rwth-aachen.de/Masrur.Doostdar/nav-withoutstop.fsg

            Note that the problem is not restricted to FSG mode; it also occurs when using a language model! But so far I have observed it only when testing with sphinx_livedecode. What is the FSG regression test you mentioned? I don't know it.

            thanks and regards
            Masrur D.

            [1] cfg:
            -mdef model_architecture/wsj_all_cont_3no_8000.mdef
            -mean model_parameters/wsj_all_cont_3no_8000_16.cd/means
            -var model_parameters/wsj_all_cont_3no_8000_16.cd/variances
            -mixw model_parameters/wsj_all_cont_3no_8000_16.cd/mixture_weights
            -tmat model_parameters/wsj_all_cont_3no_8000_16.cd/transition_matrices
            -lw 15
            -feat s3_1x39
            -beam 1e-120
            -wbeam 1e-100
            -pbeam 1e-120
            -dict navigate-go7.dic
            -fdict navigate-go7.filler
            -fsg nav-withoutstop.fsg
            -wip 0.2
            -agc max
            -varnorm no
            -cmn current
            -hyp _live_navfsg.match
            -hypseg result_live_navfsg.match
            -op_mode 2

             
            • Nickolay V. Shmyrev

              Hm, everything works fine for me with latest trunk

              FWDVIT: ROBOT DRIVE TO THE COUCH TABLE (8)

              Can it be so that you updated sphinx3 but not sphinxbase?

               
              • Masrur Doostdar

                Masrur Doostdar - 2008-02-25

                Hmm, strange.
                > Can it be so that you updated sphinx3 but not sphinxbase?

                I had not updated sphinxbase at the time of my first post in this thread, but I realized that yesterday. I now have the latest trunk of both sphinxbase and sphinx3, but it got worse. Before, I observed this problem only in the sphinx_livedecode program; now it's the same in my own program using the API.

                regards

                M.D.

                 
                • Nickolay V. Shmyrev

                  Can you please compare your output with mine:

                  http://pastebin.ca/917497

                   
                  • Masrur Doostdar

                    Masrur Doostdar - 2008-02-26

                    First of all, thanks for your effort in testing.

                    It's still very strange; it seems the hypotheses even depend on the ctl file. To produce comparable output I used a ctl file with just one utterance (for 8.raw), and it works! Before, I of course had many more utterances in it.
                    I'll provide you with a ctl file of 6 utterances and the corresponding raw files. I checked, and the problem appears there (3 utterances not hypothesized at all, 2 hypothesized incorrectly). There is another strange thing I observed: after I changed the folder of the raw files, the hypotheses changed. (And remember that these problems do not appear with older revisions.) Nevertheless, I doubt but hope that my problem is reproducible for you.

                    ctl-file:
                    http://www-users.rwth-aachen.de/Masrur.Doostdar/test.ctl
                    raws:
                    http://www-users.rwth-aachen.de/Masrur.Doostdar/test_raws.tar
                    output of live-decode:
                    http://www-users.rwth-aachen.de/Masrur.Doostdar/test_output

                    regards
                    M.D.

                     
                    • Nickolay V. Shmyrev

                      Well, I can easily reproduce this. The problem is actually that you are using agc; decoding is much better without it.

                      In theory, you shouldn't use agc at all; whether you need agc max depends on the model. So the question is why it didn't affect decoding before, or something like that. I'll try to look into this.

                      Also, don't use such small beams, the defaults are OK:

                      -hmm hmm
                      -lw 15
                      -feat s3_1x39
                      -dict navigate-go7.dic
                      -fdict navigate-go7.filler
                      -fsg nav-withoutstop.fsg
                      -op_mode 2

                       
                      • Masrur Doostdar

                        Masrur Doostdar - 2008-02-27

                        Thanks, Nickolay!
                        Without agc the problem is gone, and the results are even a little better than with the last revision that did not have this problem (with agc). I thought agc was like cmn and never harmful to use. Speaking of cmn, I have one question; perhaps you know the answer offhand (if not, I will just have a look in the code). If cmn is used for decoding separate utterances (as in livepretend), is the cmn mean value calculated for one utterance discarded for the next utterance, or is it kept and updated? And if it is discarded, wouldn't it be better to keep it, on the assumption that all utterances have roughly the same background noise?

                        About your proposal to use the default beams instead of my small ones: I evaluated it again on our task, and it makes a big difference:
                        default beams: WER 13.9% SER 33%
                        wider beams (1e-120, 1e-120, 1e-100): WER 4.2% SER 14%

                        regards

                        M.D

                         
                        • Nickolay V. Shmyrev

                          > thought agc is like cmn, and never harms to use.

                          Actually, in my experience it was never reliable; it has probably been broken for a long time.

                          > If cmn is used for seperate utterance decoding (like in livepretend), is the cmn-mean-value caluculated for one utterance kept discarded for the next utterance, or is it kept and updated?

                          If you are doing batch decoding or don't care about quick response time, it's actually better to use sphinx3_decode, which works with current CMN and calculates the mean over the current utterance. Both livedecode and livepretend use prior CMN, which in my opinion is sometimes much less stable.

                          As for the update, statistics are collected over all utterances with exponential decay, so recent cepstra matter more than older ones. CMN parameters are updated at the end of the utterance or once you have enough frames. Statistics from previous utterances are taken into account, of course. At the end of an utterance, lines like these should appear in the log:

                          INFO: cmn_prior.c(121): cmn_prior_update: from < 12.00 0.00 0.00 0.00 0.00
                          INFO: cmn_prior.c(139): cmn_prior_update: to < -1.07 -1.06 -0.06 -0.04 0.00

                          But cmn is also updated without such a message. For more details, look at cmn_prior.c in sphinxbase; it's rather simple code.
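As a rough illustration of "statistics collected over all utterances with exponential decay", here is a minimal sketch of a running cepstral mean that is carried across utterances and down-weighted at each utterance end. All names (`cmn_sketch_t`, `cmn_sketch_init`, `cmn_sketch_frame`, `cmn_sketch_end_utt`), the `decay` parameter and the three-element vector size are assumptions made for this example; this is not the cmn_prior.c implementation, which should be consulted for the actual update rule.

```c
#include <stddef.h>

#define CEP_DIM 3  /* hypothetical size; real cepstral vectors are longer */

typedef struct {
    double mean[CEP_DIM];  /* mean currently subtracted from each frame */
    double sum[CEP_DIM];   /* decayed sum of raw cepstral values */
    double weight;         /* decayed frame count behind sum */
} cmn_sketch_t;

/* Start from a prior mean with no accumulated statistics. */
void cmn_sketch_init(cmn_sketch_t *s, const double *initial_mean)
{
    size_t i;

    for (i = 0; i < CEP_DIM; i++) {
        s->mean[i] = initial_mean[i];
        s->sum[i] = 0.0;
    }
    s->weight = 0.0;
}

/* Accumulate one raw frame, then normalize it in place using the
 * mean estimated from earlier data (the "prior" mean). */
void cmn_sketch_frame(cmn_sketch_t *s, double *cep)
{
    size_t i;

    for (i = 0; i < CEP_DIM; i++) {
        s->sum[i] += cep[i];
        cep[i] -= s->mean[i];
    }
    s->weight += 1.0;
}

/* At utterance end: refresh the mean from the accumulated statistics,
 * then scale the history by decay (0 < decay < 1) so that frames from
 * later utterances count for more than older ones. */
void cmn_sketch_end_utt(cmn_sketch_t *s, double decay)
{
    size_t i;

    if (s->weight <= 0.0)
        return;
    for (i = 0; i < CEP_DIM; i++) {
        s->mean[i] = s->sum[i] / s->weight;
        s->sum[i] *= decay;
    }
    s->weight *= decay;
}
```

For example, with `decay` set to 0.5, first-coefficient values of 2 and 4 in one utterance followed by 9 in the next give a mean of 6 after the second utterance, compared with 5 for a plain average, which shows how the newer data is weighted more heavily.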

                          > About your proposal to use the default small beams, I have evaluated it again on our task, and it makes a big difference:
                          default-beam: WER 13.9% SER 33%
                          higher-beams(120,120,100): WER 4.2% SER 14%

                          Ok, I was wrong here.
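The effect of the beam setting can be sketched with generic beam pruning: a hypothesis survives a frame only if its log score is within the beam of the best score, so a smaller beam probability such as 1e-120 corresponds to a wider window in the log domain and keeps more candidates alive. The function below is a hypothetical illustration of that rule, not sphinx3's pruning code, and the integer scores stand in for whatever log-domain values the decoder uses.

```c
/* Count how many hypotheses survive beam pruning.
 *
 * scores:   log-domain path scores (higher is better)
 * n:        number of scores
 * log_beam: beam width in the log domain; it is the log of a
 *           probability below 1, so it is negative, and a more
 *           negative value means a wider beam
 *
 * A hypothesis survives if score >= best + log_beam. */
int count_survivors(const int *scores, int n, int log_beam)
{
    int i;
    int best;
    int kept = 0;

    if (n <= 0)
        return 0;

    /* Find the best score in this frame. */
    best = scores[0];
    for (i = 1; i < n; i++)
        if (scores[i] > best)
            best = scores[i];

    /* Keep everything within the beam of the best score. */
    for (i = 0; i < n; i++)
        if (scores[i] >= best + log_beam)
            kept++;
    return kept;
}
```

A wider beam costs more computation per frame but makes it less likely that the eventually correct path is pruned early, which fits the WER difference reported above for this task.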

                           
        • Nickolay V. Shmyrev

          Ok, I fixed the first one, the second needs investigation.

           
