pocketsphinx - agc problem report

Help
2016-04-22
2016-04-23
  • Won Kyu Park

    Won Kyu Park - 2016-04-22

    I have a problem with pocketsphinx_continuous when decoding a wave file.

    My settings are as follows:
    -hmm model/en-us/en-us
    -lm model/en-us/lm.fixed.bin (based on cmusphinx-5.0-en-us.lm with small fixes)
    -dict model/en-us/cmudict-en-us.dict
    -infile my-test.wav

    but the result is so poor.

    and I have found the following discussion:
    https://sourceforge.net/p/cmusphinx/discussion/help/thread/79bab866/

    It shows that the pocketsphinx_batch result is better. I also tested this
    and found the following:
    - the current agc implementation has a problem
    - agc emax works better than agc none (model/en-us/en-us/feat.params: -agc emax)
    - the hardwired emax=5.0 is not good enough
    - a simple agc_emax() fix works nicely.

    From libsphinxbase/feat/feat.c:

            /* HACK: hardwired initial estimates based on use of CMN (from Sphinx2) */
            agc_emax_set(fcb->agc_struct, (cmn != CMN_NONE) ? 5.0 : 10.0);
    

    My simple agc_emax() fix is as follows
    (see also Boris's fix at https://sourceforge.net/p/cmusphinx/discussion/help/thread/79bab866/#f92c):

    @@ -145,6 +146,31 @@ agc_emax(agc_t *agc, mfcc_t **mfc, int32 n_frame)
    
         if (n_frame <= 0)
             return;
    +
    +    if (agc->obs_utt == 0 && !agc->obs_frame) {
    +        mfcc_t max = FLOAT2MFCC(-1000.0);
    +        mfcc_t mfcc, new_max;
    +        mfcc_t sum = FLOAT2MFCC(0.0);
    +        agc->obs_max = max;
    +
    +        for (i = 1; i < n_frame; ++i) {
    +            mfcc = mfc[i][0] >= 0 ? mfc[i][0] : -mfc[i][0];
    +            if (mfcc > max)
    +                max = mfcc;
    +            sum += mfcc;
    +
    +            fprintf(stdout, "XXX mfcc = %.2f\n", mfc[i][0]);
    +        }
    +        new_max = sum / n_frame;
    +        fprintf(stdout, "XXX new_max = %.2f\n", new_max);
    +        fprintf(stdout, "XXX max = %.2f\n", max);
    +        fprintf(stdout, "XXX n_frame = %d\n", n_frame);
    +        if (new_max > agc->max) {
    +            agc->max = new_max;
    +            fprintf(stdout, "initial AGCEMax max = %.2f\n", agc->max);
    +        }
    +    }
    +
         for (i = 0; i < n_frame; ++i) {
             if (mfc[i][0] > agc->obs_max) {
                 agc->obs_max = mfc[i][0];
    

    expectation: a new study finds that men who drink a lot of coffee are less likely to develop potentially fatal prostate cancer.

    agc=none result : one is sure the thorns edna new drug all lot of coffee or less likely to develop potentially fatal prostate cancer

    agc=emax result with this patch: one is to pour into the men who drink a lot of coffee or less likely to develop potentially fatal prostate cancer

    As you can see, a small improvement is obtained.

    Moreover, I can get n-best results as follows:

    agc=none case:
    NBEST 1: but mr lee thorns edna new drug all lot of coffee cup or less likely to develop potentially fatal prostate cancer (-504075)
    NBEST 2: but mr lee thorns edna new drug all lot of coffee cup or less likely to develop potentially fatal prostate cancer (-504095)
    NBEST 3: but mr lee thorns edna new drug all lot of coffee cup or less likely to develop a potentially fatal prostate cancer (-504131)
    NBEST 4: but mr lee thorns edna new drug all lot of coffee cup or less likely to develop a potentially fatal prostate cancer (-504151)
    NBEST 5: but mr lee thorns edna new drug all lot of coffee that are less likely to develop potentially fatal prostate cancer (-503961)

    agc=emax case with this patch:
    NBEST 1: a new study thorns that men who drink a lot of coffee a or a girl was likely to develop potentially fatal prostate cancer (-60630)
    NBEST 2: a new study thorns that men who drink a lot of coffee a or a girl was likely to develop potentially fatal prostate cancer (-60662)
    NBEST 3: a new study thorns that men who drink a lot of coffee a or a girl was likely to develop a potentially fatal prostate cancer (-60692)
    NBEST 4: a new study thorns that men who drink a lot of coffee a or a girl was likely to develop a potentially fatal prostate cancer (-60724)
    NBEST 5: a new study thorns that men who drink a lot of coffee a or a girl was likely to develop potentially fatal prostate cancer (-60789)


    pocketsphinx_batch result with agc=none case:
    a new study florence that men who drink a lot of coffee or less likely to develop potentially fatal prostate cancer

    pocketsphinx_batch result with agc=emax: (almost same result)
    a new study thorns that men who drink a lot of coffee or less likely to develop potentially fatal prostate cancer

    I also modified continuous.c to call ps_decode_raw() instead of ps_process_raw(), for comparison:

    agc=none + ps_decode_raw()
    ps_get_hyp(): a new study thorns that men who drink a lot of coffee or less likely to develop potentially fatal prostate cancer
    NBEST 1: when you saw the thorns that men who drink a lot of coffee at or less likely to develop potentially fatal prostate cancer (-820485)
    NBEST 2: when you saw the thorns that men who drink a lot of coffee at or less likely to develop potentially fatal prostate cancer (-820559)
    NBEST 3: when you saw the thorns that men who drink a lot of coffee at or less likely to develop a potentially fatal prostate cancer (-820564)
    NBEST 4: when you saw the thorns that men who drink a lot of coffee at or less likely to develop a potentially fatal prostate cancer (-820638)
    NBEST 5: when you saw the thorns that men who drink a lot of coffee at or less likely to develop potentially fatal prostate cancer (-820746)

    agc=emax + ps_decode_raw()
    ps_get_hyp(): a new study thorns that men who drink a lot of coffee or less likely to develop potentially fatal prostate cancer
    NBEST 1: a new study thorns that men who drink a lot of coffee or less likely to develop potentially fatal prostate cancer (-500074)
    NBEST 2: a new study thorns that men who drink a lot of coffee or less likely to develop potentially fatal prostate cancer (-500121)
    NBEST 3: a new study thorns that men who drink a lot of coffee or less likely to develop a potentially fatal prostate cancer (-500211)
    NBEST 4: a new study thorns that men who drink a lot of coffee or less likely to develop potentially fatal prostate cancer (-500184)
    NBEST 5: a new study thorns that men who drink a lot of coffee or less likely to develop potentially fatal prostate cancer (-500231)


    I found that agc=none is not good in all cases and that agc=emax works fine with a small modification. However, the result of ps_process_raw() in online mode with agc=emax is very sensitive to the initial agc->max (because of mfcc[i][0] -= agc->max;).

    I think that in online mode agc_emax() receives too few frames per call and therefore estimates an invalid agc->max.

    I'm a total newbie with pocketsphinx, but someone who is interested in it could make agc_emax() better.
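
To make the sensitivity concrete, here is a minimal standalone sketch of the idea behind the patch: seed the maximum from the mean absolute C0 of the first chunk, then apply the subtraction quoted above. It works on plain floats rather than mfcc_t, the names initial_emax_estimate() and apply_emax() are hypothetical and not part of sphinxbase, and unlike the posted patch it averages over every frame in the chunk.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper (not a sphinxbase function): mean absolute value
 * of C0 over the frames seen so far, used as an initial AGC maximum. */
float initial_emax_estimate(const float *c0, size_t n_frames)
{
    float sum = 0.0f;
    size_t i;

    assert(c0 != NULL || n_frames == 0);
    if (n_frames == 0)
        return 0.0f;            /* nothing observed yet */
    for (i = 0; i < n_frames; ++i)
        sum += (c0[i] >= 0.0f) ? c0[i] : -c0[i];
    return sum / (float)n_frames;
}

/* The normalization step described in the post: c0[i] -= max. */
void apply_emax(float *c0, size_t n_frames, float max)
{
    size_t i;

    assert(c0 != NULL || n_frames == 0);
    for (i = 0; i < n_frames; ++i)
        c0[i] -= max;
}
```

Calling initial_emax_estimate() on only the first few frames of an utterance and again on the whole utterance will generally give different values, and every normalized C0 shifts by that difference, which is one way to see why a short first chunk matters in online mode.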

    Test sample and fix used:
    http://wikisend.com/download/324646/a-new-study-finds-that.wav
    http://wikisend.com/download/804386/minimun.diff

     
    • Nickolay V. Shmyrev

      and I have found the following discussion:
      https://sourceforge.net/p/cmusphinx/discussion/help/thread/79bab866/

      This thread is not related at all. The French model used AGC in those days; the en-us model does not use AGC at all, so it should have zero effect.

      but the result is so poor.

      It is a cmninit issue; you can search the forum for the links. If you decode a longer file it will be accurate.

       
      • Won Kyu Park

        Won Kyu Park - 2016-04-22

        Here is a longer file: http://wikisend.com/download/882240/sample.wav

        In this case agc=emax gives totally wrong results :(

        So I found the following thread:
        https://sourceforge.net/p/cmusphinx/bugs/243/

        I can get the expected result with -cmninit 70~80,3,-1 but it does not seem good enough.

        Is there any support for estimating the initial cmninit value?

        I think the cmn_mean[0] value is the most critical one, and it has almost the same effect as agc->obs_max,
        as used in cmn(), cmn_prior(), etc.
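
One way to think about an initial cmninit value is as the per-coefficient mean of the cepstra from a representative recording. The sketch below computes that mean over a row-major array of frames; cepstral_mean() is a hypothetical helper written for illustration, not a sphinxbase call, and the numbers in the usage note are made up rather than taken from the sample file.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: per-coefficient mean over n_frames cepstral
 * vectors stored row-major (frame i, coefficient j at cep[i*veclen+j]).
 * Returns 0 on success, -1 if there is nothing to average. */
int cepstral_mean(const float *cep, size_t n_frames, size_t veclen,
                  float *mean_out)
{
    size_t i, j;

    if (n_frames == 0 || veclen == 0)
        return -1;
    assert(cep != NULL && mean_out != NULL);

    for (j = 0; j < veclen; ++j)
        mean_out[j] = 0.0f;
    for (i = 0; i < n_frames; ++i)
        for (j = 0; j < veclen; ++j)
            mean_out[j] += cep[i * veclen + j];
    for (j = 0; j < veclen; ++j)
        mean_out[j] /= (float)n_frames;
    return 0;
}
```

For two illustrative frames {70, 2, -2} and {80, 4, 0} this yields {75, 3, -1}, which has the same shape as the comma-separated list passed to -cmninit.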

         
  • Won Kyu Park

    Won Kyu Park - 2016-04-22

    This is a simple quick hack to make the initial CMN mean values reasonable
    for short online audio.

    --- a/src/libsphinxbase/feat/cmn_prior.c
    +++ b/src/libsphinxbase/feat/cmn_prior.c
    @@ -156,6 +156,7 @@ void
     cmn_prior(cmn_t *cmn, mfcc_t **incep, int32 varnorm, int32 nfr)
     {
         int32 i, j;
    +    mfcc_t mean = FLOAT2MFCC(0.0);
    
         if (nfr <= 0)
             return;
    @@ -172,12 +173,27 @@ cmn_prior(cmn_t *cmn, mfcc_t **incep, int32 varnorm, int32 nfr)
    
             for (j = 0; j < cmn->veclen; j++) {
                 cmn->sum[j] += incep[i][j];
    -            incep[i][j] -= cmn->cmn_mean[j];
             }
    
             ++cmn->nframe;
         }
    
    +    if (cmn->nframe > 0 && cmn->nframe < CMN_WIN) {
    +        mean = cmn->sum[0] / cmn->nframe;
    +        if (mean > cmn->cmn_mean[0]) {
    +            E_INFO("n_frame = %d\n", nfr);
    +            E_INFO("mean = %.2f, cmn_mean[0] = %.2f\n", MFCC2FLOAT(mean), MFCC2FLOAT(cmn->cmn_mean[0]));
    +            cmn_prior_update(cmn);
    +            E_INFO("cmn_mean[0] = %.2f\n", MFCC2FLOAT(cmn->cmn_mean[0]));
    +        }
    +    }
    +
    +    for (i = 0; i < nfr; i++) {
    +        for (j = 0; j < cmn->veclen; j++) {
    +            incep[i][j] -= cmn->cmn_mean[j];
    +        }
    +    }
    +
         /* Shift buffer down if we have more than CMN_WIN_HWM frames */
         if (cmn->nframe > CMN_WIN_HWM)
             cmn_prior_shiftwin(cmn);
    

    This fix simply re-estimates the mean while 0 < nframe < CMN_WIN using cmn_prior_update(), and it works fine without changing the cmninit parameter in model/en-us/feat.params.

    (A longer audio file also works fine.)
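
The control flow of the patch can be sketched on C0 alone, without the sphinxbase types. This is only an illustration under assumptions: live_cmn_c0_t and its functions are hypothetical names, SKETCH_WIN is an illustrative stand-in for CMN_WIN, and the real cmn_prior_update() does more than the single assignment shown here.

```c
#include <assert.h>
#include <stddef.h>

#define SKETCH_WIN 500  /* illustrative stand-in for CMN_WIN (assumed value) */

typedef struct {
    float mean;     /* estimate currently subtracted from C0 */
    float sum;      /* running sum of raw C0 values */
    size_t nframe;  /* number of frames accumulated so far */
} live_cmn_c0_t;

void live_cmn_c0_init(live_cmn_c0_t *s, float prior_mean)
{
    assert(s != NULL);
    s->mean = prior_mean;   /* plays the role of the cmninit value */
    s->sum = 0.0f;
    s->nframe = 0;
}

/* Accumulate the chunk first, refresh the mean while the history is
 * still short and the observed mean exceeds the prior, then subtract. */
void live_cmn_c0_process(live_cmn_c0_t *s, float *c0, size_t n)
{
    size_t i;

    assert(s != NULL && (c0 != NULL || n == 0));
    for (i = 0; i < n; ++i)
        s->sum += c0[i];
    s->nframe += n;

    if (s->nframe > 0 && s->nframe < SKETCH_WIN) {
        float observed = s->sum / (float)s->nframe;
        if (observed > s->mean)
            s->mean = observed;
    }

    for (i = 0; i < n; ++i)
        c0[i] -= s->mean;
}
```

With a prior of 40 and a first chunk {70, 80, 90}, the mean is raised to 80 before subtraction; with a prior of 100 the same chunk leaves the prior untouched, mirroring the mean > cmn->cmn_mean[0] test in the diff.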

    See also the comment in srec by Nuance:
    https://github.com/android/platform_external_srec/blob/master-soong/srec/clib/swicms.c#L32

       In-utterance CMN calculation:
       A new short-term average mechanism was introduced, with faster update,
       to improve recognition on the very first recognition after init or reset.
       We wait for a minimum number of new data frames to apply this. We also
       disable the fast updater after some frames, because we assume the
       cross-utterance estimator to be more reliable, particularly in its
       ability to exclude silence frames from the calculation.
    
     

    Last edit: Won Kyu Park 2016-04-22
    • Nickolay V. Shmyrev

      Yes, this is one possible way. A better way would be to buffer the data until a reliable CMN estimate is possible. However, that would require quite a bit of rework in the frontend.
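
The buffering idea can be sketched as follows, again on C0 only: hold frames back until a minimum number has arrived, compute the mean once, then release the whole backlog normalized. Everything here is hypothetical (the buffered_cmn_t type, the function names, and the tiny MIN_FRAMES threshold chosen so the example is easy to follow), and the mean is frozen after the first estimate, which a real frontend would presumably keep updating.

```c
#include <assert.h>
#include <stddef.h>

#define MIN_FRAMES 4  /* illustrative threshold; a real one would be much larger */

typedef struct {
    float buf[MIN_FRAMES];  /* raw C0 values held back so far */
    size_t n_buf;           /* number of values currently buffered */
    float mean;             /* estimate, valid once ready != 0 */
    int ready;              /* set after the first estimate */
} buffered_cmn_t;

void buffered_cmn_init(buffered_cmn_t *s)
{
    assert(s != NULL);
    s->n_buf = 0;
    s->mean = 0.0f;
    s->ready = 0;
}

/* Push one raw C0 value. Writes normalized values to out and returns
 * how many were written (0 while still buffering). out must have room
 * for MIN_FRAMES values. */
size_t buffered_cmn_push(buffered_cmn_t *s, float c0, float *out)
{
    size_t i, n;
    float sum = 0.0f;

    assert(s != NULL && out != NULL);
    if (s->ready) {
        out[0] = c0 - s->mean;      /* estimate exists: pass straight through */
        return 1;
    }

    s->buf[s->n_buf++] = c0;
    if (s->n_buf < MIN_FRAMES)
        return 0;                   /* not enough data for an estimate yet */

    for (i = 0; i < s->n_buf; ++i)
        sum += s->buf[i];
    s->mean = sum / (float)s->n_buf;

    n = s->n_buf;
    for (i = 0; i < n; ++i)         /* flush the backlog, normalized */
        out[i] = s->buf[i] - s->mean;
    s->n_buf = 0;
    s->ready = 1;
    return n;
}
```

The cost of this approach is latency: nothing reaches the decoder until MIN_FRAMES frames have been seen, which is presumably part of the frontend rework mentioned above.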

       
