I have a problem with pocketsphinx_continuous when using a wave file.
My settings are as follows:
-hmm model/en-us/en-us
-lm model/en-us/lm.fixed.bin (based on cmusphinx-5.0-en-us.lm with small fixes)
-dict model/en-us/cmudict-en-us.dict
-infile my-test.wav
A related forum discussion shows that pocketsphinx_batch gives better results, so I ran some tests of my own
and found the following:
- the current AGC implementation has a problem
- agc emax works better than agc none (model/en-us/en-us/feat.params: -agc emax)
- the hardwired emax=5.0 is not good enough
- a simple agc_emax() fix works nicely.
@@ -145,6 +146,31 @@ agc_emax(agc_t *agc, mfcc_t **mfc, int32 n_frame)
     if (n_frame <= 0)
         return;
+
+    if (agc->obs_utt == 0 && !agc->obs_frame) {
+        mfcc_t max = FLOAT2MFCC(-1000.0);
+        mfcc_t mfcc, new_max;
+        mfcc_t sum = FLOAT2MFCC(0.0);
+        agc->obs_max = max;
+
+        for (i = 1; i < n_frame; ++i) {
+            mfcc = mfc[i][0] >= 0 ? mfc[i][0] : -mfc[i][0];
+            if (mfcc > max)
+                max = mfcc;
+            sum += mfcc;
+
+            fprintf(stdout, "XXX mfcc = %.2f\n", mfc[i][0]);
+        }
+        new_max = sum / n_frame;
+        fprintf(stdout, "XXX new_max = %.2f\n", new_max);
+        fprintf(stdout, "XXX max = %.2f\n", max);
+        fprintf(stdout, "XXX n_frame = %d\n", n_frame);
+        if (new_max > agc->max) {
+            agc->max = new_max;
+            fprintf(stdout, "initial AGCEMax max = %.2f\n", agc->max);
+        }
+    }
+
     for (i = 0; i < n_frame; ++i) {
         if (mfc[i][0] > agc->obs_max) {
             agc->obs_max = mfc[i][0];
expectation: a new study finds that men who drink a lot of coffee are less likely to develop potentially fatal prostate cancer.
agc=none result : one is sure the thorns edna new drug all lot of coffee or less likely to develop potentially fatal prostate cancer
agc=emax result with this patch: one is to pour into the men who drink a lot of coffee or less likely to develop potentially fatal prostate cancer
As you can see, there is a small improvement.
Moreover, the n-best results are as follows:
agc=none case:
NBEST 1: but mr lee thorns edna new drug all lot of coffee cup or less likely to develop potentially fatal prostate cancer (-504075)
NBEST 2: but mr lee thorns edna new drug all lot of coffee cup or less likely to develop potentially fatal prostate cancer (-504095)
NBEST 3: but mr lee thorns edna new drug all lot of coffee cup or less likely to develop a potentially fatal prostate cancer (-504131)
NBEST 4: but mr lee thorns edna new drug all lot of coffee cup or less likely to develop a potentially fatal prostate cancer (-504151)
NBEST 5: but mr lee thorns edna new drug all lot of coffee that are less likely to develop potentially fatal prostate cancer (-503961)
agc=emax case with this patch:
NBEST 1: a new study thorns that men who drink a lot of coffee a or a girl was likely to develop potentially fatal prostate cancer (-60630)
NBEST 2: a new study thorns that men who drink a lot of coffee a or a girl was likely to develop potentially fatal prostate cancer (-60662)
NBEST 3: a new study thorns that men who drink a lot of coffee a or a girl was likely to develop a potentially fatal prostate cancer (-60692)
NBEST 4: a new study thorns that men who drink a lot of coffee a or a girl was likely to develop a potentially fatal prostate cancer (-60724)
NBEST 5: a new study thorns that men who drink a lot of coffee a or a girl was likely to develop potentially fatal prostate cancer (-60789)
pocketsphinx_batch result with agc=none case:
a new study florence that men who drink a lot of coffee or less likely to develop potentially fatal prostate cancer
pocketsphinx_batch result with agc=emax: (almost same result)
a new study thorns that men who drink a lot of coffee or less likely to develop potentially fatal prostate cancer
I also modified continuous.c to call ps_decode_raw() instead of ps_process_raw(), for comparison:
agc=none + ps_decode_raw()
ps_get_hyp(): a new study thorns that men who drink a lot of coffee or less likely to develop potentially fatal prostate cancer
NBEST 1: when you saw the thorns that men who drink a lot of coffee at or less likely to develop potentially fatal prostate cancer (-820485)
NBEST 2: when you saw the thorns that men who drink a lot of coffee at or less likely to develop potentially fatal prostate cancer (-820559)
NBEST 3: when you saw the thorns that men who drink a lot of coffee at or less likely to develop a potentially fatal prostate cancer (-820564)
NBEST 4: when you saw the thorns that men who drink a lot of coffee at or less likely to develop a potentially fatal prostate cancer (-820638)
NBEST 5: when you saw the thorns that men who drink a lot of coffee at or less likely to develop potentially fatal prostate cancer (-820746)
agc=emax + ps_decode_raw()
ps_get_hyp(): a new study thorns that men who drink a lot of coffee or less likely to develop potentially fatal prostate cancer
NBEST 1: a new study thorns that men who drink a lot of coffee or less likely to develop potentially fatal prostate cancer (-500074)
NBEST 2: a new study thorns that men who drink a lot of coffee or less likely to develop potentially fatal prostate cancer (-500121)
NBEST 3: a new study thorns that men who drink a lot of coffee or less likely to develop a potentially fatal prostate cancer (-500211)
NBEST 4: a new study thorns that men who drink a lot of coffee or less likely to develop potentially fatal prostate cancer (-500184)
NBEST 5: a new study thorns that men who drink a lot of coffee or less likely to develop potentially fatal prostate cancer (-500231)
I found that agc=none does not work well in every case, and that agc=emax works fine with a small modification. However, the result of ps_process_raw() in online mode with agc=emax is very sensitive to the initial agc->max (because of mfcc[i][0] -= agc->max;).
I think that in online mode agc_emax() receives too few frames (n_frame is too small) and estimates an invalid agc->max.
I'm a total newbie with pocketsphinx, but someone who is interested in it could make agc_emax() better.
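To illustrate the sensitivity described above, here is a minimal sketch in plain C. It is not sphinxbase code: toy_agc_t, toy_agc_seed() and toy_agc_emax() are hypothetical names, floats stand in for mfcc_t, and the seeding uses the chunk maximum of C0, which is a simpler rule than the mean of absolute values used in the patch above. It only shows how much the normalized C0 values shift depending on the initial max.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, simplified AGC state (not sphinxbase's agc_t). */
typedef struct {
    float max;     /* estimate subtracted from every C0 value */
    float obs_max; /* largest C0 observed so far */
} toy_agc_t;

/* Assumption: raise the estimate to the largest C0 of the first chunk. */
static void toy_agc_seed(toy_agc_t *agc, const float *c0, size_t n_frame)
{
    assert(n_frame > 0);
    for (size_t i = 0; i < n_frame; ++i)
        if (c0[i] > agc->max)
            agc->max = c0[i];
}

/* emax-style pass: track the observed maximum, subtract the estimate. */
static void toy_agc_emax(toy_agc_t *agc, float *c0, size_t n_frame)
{
    for (size_t i = 0; i < n_frame; ++i) {
        if (c0[i] > agc->obs_max)
            agc->obs_max = c0[i];
        c0[i] -= agc->max;
    }
}
```

With an initial max of 5.0 and C0 values around 60 to 70, the unseeded pass leaves the values far from zero, while the seeded pass puts the loudest frame at 0. Whether the chunk maximum is a reliable seed for very short chunks is exactly the open question in this thread.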
This is a simple quick hack to make the initial CMN mean values reasonable
for short online audio.
--- a/src/libsphinxbase/feat/cmn_prior.c
+++ b/src/libsphinxbase/feat/cmn_prior.c
@@ -156,6 +156,7 @@
 void
 cmn_prior(cmn_t *cmn, mfcc_t **incep, int32 varnorm, int32 nfr)
 {
     int32 i, j;
+    mfcc_t mean = FLOAT2MFCC(0.0);
     if (nfr <= 0)
         return;
@@ -172,12 +173,27 @@ cmn_prior(cmn_t *cmn, mfcc_t **incep, int32 varnorm, int32 nfr)
         for (j = 0; j < cmn->veclen; j++) {
             cmn->sum[j] += incep[i][j];
-            incep[i][j] -= cmn->cmn_mean[j];
         }
         ++cmn->nframe;
     }
+    if (cmn->nframe > 0 && cmn->nframe < CMN_WIN) {
+        mean = cmn->sum[0] / cmn->nframe;
+        if (mean > cmn->cmn_mean[0]) {
+            E_INFO("n_frame = %d\n", nfr);
+            E_INFO("mean = %.2f, cmn_mean[0] = %.2f\n", MFCC2FLOAT(mean), MFCC2FLOAT(cmn->cmn_mean[0]));
+            cmn_prior_update(cmn);
+            E_INFO("cmn_mean[0] = %.2f\n", MFCC2FLOAT(cmn->cmn_mean[0]));
+        }
+    }
+
+    for (i = 0; i < nfr; i++) {
+        for (j = 0; j < cmn->veclen; j++) {
+            incep[i][j] -= cmn->cmn_mean[j];
+        }
+    }
+
     /* Shift buffer down if we have more than CMN_WIN_HWM frames */
     if (cmn->nframe > CMN_WIN_HWM)
         cmn_prior_shiftwin(cmn);
This fix simply re-estimates the mean with cmn_prior_update() while 0 < nframe < CMN_WIN, and it works fine without changing the cmninit parameters in model/en-us/feat.params.
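The idea of the patch can be tried outside sphinxbase with a small stand-alone sketch. Everything here is an assumption for illustration: toy_cmn_t, toy_cmn_update() and toy_cmn_prior() are hypothetical names, floats replace mfcc_t, the vector length is cut to 3, and TOY_CMN_WIN is just a placeholder window size. The real cmn_prior_update() may do more than divide the sum by the frame count.

```c
#include <assert.h>
#include <stddef.h>

#define TOY_VECLEN  3    /* hypothetical small cepstral dimension */
#define TOY_CMN_WIN 500  /* placeholder window size (assumed) */

/* Hypothetical, simplified CMN state (not sphinxbase's cmn_t). */
typedef struct {
    float cmn_mean[TOY_VECLEN]; /* mean currently subtracted */
    float sum[TOY_VECLEN];      /* running sum of cepstra */
    int nframe;                 /* frames accumulated in sum */
} toy_cmn_t;

/* Recompute the mean from the running sum. */
static void toy_cmn_update(toy_cmn_t *cmn)
{
    assert(cmn->nframe > 0);
    for (int j = 0; j < TOY_VECLEN; j++)
        cmn->cmn_mean[j] = cmn->sum[j] / (float)cmn->nframe;
}

/* Accumulate first, refresh the mean early if C0 looks too low,
 * and only then subtract the mean from the incoming frames. */
static void toy_cmn_prior(toy_cmn_t *cmn, float (*cep)[TOY_VECLEN], int nfr)
{
    if (nfr <= 0)
        return;
    for (int i = 0; i < nfr; i++) {
        for (int j = 0; j < TOY_VECLEN; j++)
            cmn->sum[j] += cep[i][j];
        ++cmn->nframe;
    }
    if (cmn->nframe > 0 && cmn->nframe < TOY_CMN_WIN) {
        float mean0 = cmn->sum[0] / (float)cmn->nframe;
        if (mean0 > cmn->cmn_mean[0])
            toy_cmn_update(cmn);
    }
    for (int i = 0; i < nfr; i++)
        for (int j = 0; j < TOY_VECLEN; j++)
            cep[i][j] -= cmn->cmn_mean[j];
}
```

The design point is the ordering: the frames are normalized after the early update, so even the first chunk is normalized with a mean taken from the data rather than from the initial value.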
Yes, this is one possible way. A better way would be to buffer the data until a reliable CMN estimate is possible. However, that would require quite a bit of rework in the frontend.
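The buffering approach mentioned in this reply could look roughly like the following sketch. It is only an illustration under stated assumptions, not a proposal for the actual frontend: toy_cmn_buf_t and toy_cmn_buf_push() are hypothetical names, only C0 is handled, and TOY_MIN_FRAMES (the number of frames considered enough for a reliable mean) is an arbitrary small value chosen so the example is easy to follow.

```c
#include <assert.h>
#include <stddef.h>

#define TOY_MIN_FRAMES 4 /* hypothetical: frames needed before trusting the mean */

/* Hypothetical buffering state for one cepstral coefficient (C0). */
typedef struct {
    float buf[TOY_MIN_FRAMES]; /* frames held back while the mean is unreliable */
    size_t n_buf;              /* frames currently held back */
    float sum;                 /* running sum of all frames seen */
    size_t n_seen;             /* number of frames seen */
    int ready;                 /* nonzero once the buffer has been flushed */
} toy_cmn_buf_t;

/* Feed one C0 value. Writes normalized values to out (which must hold
 * at least TOY_MIN_FRAMES floats) and returns how many were written:
 * 0 while buffering, TOY_MIN_FRAMES on the flush, 1 afterwards. */
static size_t toy_cmn_buf_push(toy_cmn_buf_t *s, float c0, float *out)
{
    s->sum += c0;
    s->n_seen++;
    if (!s->ready) {
        assert(s->n_buf < TOY_MIN_FRAMES);
        s->buf[s->n_buf++] = c0;
        if (s->n_buf < TOY_MIN_FRAMES)
            return 0;
        /* Enough data: normalize everything held back with one mean. */
        float mean = s->sum / (float)s->n_seen;
        size_t n = s->n_buf;
        for (size_t i = 0; i < n; i++)
            out[i] = s->buf[i] - mean;
        s->n_buf = 0;
        s->ready = 1;
        return n;
    }
    out[0] = c0 - s->sum / (float)s->n_seen;
    return 1;
}
```

The cost is latency: nothing reaches the decoder until TOY_MIN_FRAMES frames have arrived, which is presumably part of why this needs rework in the frontend.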
With pocketsphinx_continuous the recognition result is very poor,
and I found the following related discussion:
https://sourceforge.net/p/cmusphinx/discussion/help/thread/79bab866/
The code in question is from libsphinxbase/feat/feat.c.
My simple agc_emax() fix is the diff shown earlier in this post
(see also Boris's fix: https://sourceforge.net/p/cmusphinx/discussion/help/thread/79bab866/#f92c).
Test sample and patch used:
http://wikisend.com/download/324646/a-new-study-finds-that.wav
http://wikisend.com/download/804386/minimun.diff
That thread is not related at all. The French model used AGC in those days; the en-us model does not use AGC at all, so it should have zero effect.
It is a cmninit issue; you can search the forum for the links. If you decode a longer file, it will be accurate.
Here is a longer file: http://wikisend.com/download/882240/sample.wav
In this case agc=emax gives totally wrong results :(
So I found the following thread:
https://sourceforge.net/p/cmusphinx/bugs/243/
and I can get the expected result with -cmninit 70~80,3,-1, but it does not seem good enough.
Is there any support for estimating the initial cmninit value?
I think the cmn_mean[0] value is the most critical one and has almost the same effect as agc->obs_max,
as in cmn(), cmn_prior(), etc.
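One way to get a starting value for -cmninit, sketched here under assumptions, is to average the cepstra of a representative recording offline and print the per-dimension means as a comma-separated list. toy_format_cmninit() is a hypothetical helper, not part of sphinxbase; it assumes the cepstral frames are already available as floats and uses a vector length of 3 only to keep the example short.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

#define TOY_VECLEN 3 /* hypothetical small cepstral dimension */

/* Write the per-dimension mean of nfr frames into out as "m0,m1,m2".
 * Returns 0 on success, -1 if out is too small. */
static int toy_format_cmninit(float (*cep)[TOY_VECLEN], int nfr,
                              char *out, size_t out_size)
{
    assert(nfr > 0 && out_size > 0);
    out[0] = '\0';
    for (int j = 0; j < TOY_VECLEN; j++) {
        float sum = 0.0f;
        for (int i = 0; i < nfr; i++)
            sum += cep[i][j];
        size_t used = strlen(out);
        int n = snprintf(out + used, out_size - used, "%s%.2f",
                         j ? "," : "", sum / (float)nfr);
        if (n < 0 || (size_t)n >= out_size - used)
            return -1;
    }
    return 0;
}
```

The resulting string could then be passed as the value of -cmninit. How well a mean taken from one recording transfers to other recording conditions is not something this sketch can answer.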
With this CMN fix, the longer audio file also works fine.
See also the comment in Nuance's srec:
https://github.com/android/platform_external_srec/blob/master-soong/srec/clib/swicms.c#L32
Last edit: Won Kyu Park 2016-04-22