Hi,
I have two issues:
1. A problem (probably a bug) in version 3.7:
To get an array of partially hypothesized word segments during live recognition, I use the s3_decode API function [1]. This worked in sphinx 3.6, but in sphinx 3.7 the last argument _hyp_segs comes back empty (*_hyp_segs == NULL).
Has anyone else noticed this problem, or is it just me?
2. When working with FSGs and calling the function [1], warnings are printed:
rarely this one:
ERROR: "fsg_search.c", line 1062: Final state not reached; backtracing from best scoring entry
often this one (with varying scores, of course):
INFO: fsg_search.c(1054): Best score (-4290480) > best final state score (-4952473); but using latter
Has anyone seen this too? Is it something I should care about, does it influence my recognition accuracy, and can I fix it? Could there even be a dependence between problem 1 and problem 2?
thanks for any help...
regards
[1] in s3_decode.h:
int s3_decode_hypothesis(s3_decode_t *_decode, char **_uttid,
                         char **_hyp_str, hyp_t ***_hyp_segs)
Dear Masrur, I'm afraid without the data and scripts to reproduce the problem it's hard to say much, really. Is it reproducible with wsj and some English recording?
Yeah, I'm working with the wsj model [1] and trying to recognize English speech (it's live_decode, so I just try to speak English ;-) )
Regarding the first problem I mentioned: I also have it when decoding with language models, and for that case I have a direct comparison with sphinx 3.6 using the same configuration. And there it works! So I don't think it's a problem with my configuration.
It would be valuable information to me if someone could simply test the last argument of the s3_decode function in sphinx 3.7 and confirm or deny my problem.
If it's denied, I will try to post a reproducible configuration.
thanks a lot
[1] built by Keith Vertanen, 8000 senones, 16 densities
I figured out where my problem of not getting a proper word-segments array in the hypothesis came from. I got an error the first time I used sphinx 3.7 with FSG, and I made a quick hack [1] in s3_decode.c to get past it. Until now I hadn't considered this a possible cause of my problem; I didn't notice or remember that the edit I made to get past that error was in the function responsible for providing the hypotheses.
OK, discarding this edit solves my problem with the word-segments array, but the error is back, and I also get it in a parallel fresh installation of sphinx 3.7:
It comes up only when working with sphinx 3.7 and FSG (thus in mode 2).
I get this assertion error:
ERROR: "fsg_search.c", line 1063: Final state not reached; backtracing from best scoring entry
sphinx3_livepretend: dict.c:575: dict_filler_word: Assertion `(w >= 0) && (w < d->n_word)' failed.
The backtrace shows that dict_filler_word is called by s3_decode_record_hyp:
line 545: if (!dict_filler_word(dict, hyp->id) && hyp->id != finish_wid) {
The problem is that hyp->id is -1.
In my edit of the code (see [1]) I just tried to catch the nodes with hyp->id == -1, but it seems to be a deeper problem.
I have also attached one of my config files [2], in case you think the problem may be caused by my configuration.
regards
Masrur Doostdar
[1] my prior edit in s3_decode.c:
@@ -549,13 +540,17 @@
     finish_wid = dict_finishwid(dict);
     for (node = hyp_list; node != NULL; node = gnode_next(node)) {
         hyp = (srch_hyp_t *) gnode_ptr(node);
+        if (hyp->id != -1)
+        {
         hyp_seglen++;
+
         if (!dict_filler_word(dict, hyp->id) && hyp->id != finish_wid) {
             hyp_strlen +=
                 strlen(dict_wordstr(dict, dict_basewid(dict, hyp->id))) +
                 1;
         }
     }
+        }
@@ -574,8 +569,11 @@
     /* iterate thru to fill in the array of segments and/or decoded string */
     i = 0;
     hyp_strptr = hyp_str;
-    for (node = hyp_list; node != NULL; node = gnode_next(node), i++) {
+    for (node = hyp_list; node != NULL; node = gnode_next(node)) {
         hyp = (srch_hyp_t *) gnode_ptr(node);
+        if (hyp->id != -1)
+        {
+            i++;
         hyp_segs[i] = hyp;
@@ -587,6 +585,7 @@
             hyp_strptr += 1;
         }
     }
+    }
     glist_free(hyp_list);
[2] my config-file:
-mdef model_architecture/wsj_all_cont_3no_8000.mdef
-mean model_parameters/wsj_all_cont_3no_8000_16.cd/means
-var model_parameters/wsj_all_cont_3no_8000_16.cd/variances
-mixw model_parameters/wsj_all_cont_3no_8000_16.cd/mixture_weights
-tmat model_parameters/wsj_all_cont_3no_8000_16.cd/transition_matrices
-lw 15
-feat s3_1x39
-beam 1e-120
-wbeam 1e-100
-pbeam 1e-120
-dict navigate-go7.dic
-fdict navigate-go7.filler
-fsg nav-withoutstop.fsg
-wip 0.2
-agc max
-varnorm no
-cmn current
-hyp _live_navfsg.match
-hypseg result_live_navfsg.match
-op_mode 2
What is listed in your filler dictionary? Does it have a proper newline at the end? Please paste your fsg as well.
OK, I have uploaded the filler dictionary [1], the dictionary [2] and the fsg [3]. Note that these are not the only files I got an error with.
regards
[1] http://www-users.rwth-aachen.de/Masrur.Doostdar/navigate-go7.filler
[2] http://www-users.rwth-aachen.de/Masrur.Doostdar/navigate-go7.dic
[3] http://www-users.rwth-aachen.de/Masrur.Doostdar/nav-withoutstop.fsg
It is a bug in sphinx3. It can't handle empty transitions. I'm looking for a proper fix.
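For context: in the sphinx FSG text format, a TRANSITION line carries from-state, to-state, probability and (optionally) a word, and a line without a word is such an empty (null) transition. The grammar below is made up purely for illustration, it is not the nav-withoutstop.fsg linked in this thread:

```
FSG_BEGIN demo
NUM_STATES 4
START_STATE 0
FINAL_STATE 3
TRANSITION 0 1 1.0 robot
TRANSITION 1 2 1.0 go
TRANSITION 2 3 0.5 left
TRANSITION 2 3 0.5
FSG_END
```

The last transition from state 2 to state 3 has no word; entries like this are what surfaced as hyp->id == -1 in the hypothesis list.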
OK, for now I have a workaround [1] that avoids the error while still letting me get the word-segments array. It is just a minor modification of my first edit of the s3_decode.c file (the incrementing i++ should be at the end of the added if-body).
But I still get the warnings I mentioned in my initial post:
>Working with FSG's and calling the function[1], warnings are printed:
>rarely this one:
>ERROR: "fsg_search.c", line 1062: Final state not reached; backtracing from best scoring entry
>often this one (with varying scores of course):
>INFO: fsg_search.c(1054): Best score (-4290480) > best final state score (-4952473); but using latter
Again the question: do I have to worry about them?
regards
[1] workaround diff against the latest s3_decode.c from the svn repository:
--- s3_decode.c	2007-12-19 14:47:57.000000000 +0100
+++ s3_decode-edited.c	2007-12-19 14:46:39.000000000 +0100
@@ -549,6 +549,7 @@
     finish_wid = dict_finishwid(dict);
     for (node = hyp_list; node != NULL; node = gnode_next(node)) {
         hyp = (srch_hyp_t *) gnode_ptr(node);
+        if (hyp->id != -1) {
         hyp_seglen++;
         if (!dict_filler_word(dict, hyp->id) && hyp->id != finish_wid) {
             hyp_strlen +=
@@ -556,6 +557,8 @@
                 1;
         }
     }
+    }
+
@@ -574,8 +577,9 @@
     /* iterate thru to fill in the array of segments and/or decoded string */
     i = 0;
     hyp_strptr = hyp_str;
-    for (node = hyp_list; node != NULL; node = gnode_next(node), i++) {
+    for (node = hyp_list; node != NULL; node = gnode_next(node)) {
         hyp = (srch_hyp_t *) gnode_ptr(node);
+        if (hyp->id != -1) {
         hyp_segs[i] = hyp;
@@ -586,6 +590,8 @@
             *hyp_strptr = ' ';
             hyp_strptr += 1;
         }
+        i++;
+        }
     }
     glist_free(hyp_list);
The proper fix, in my opinion, is the following patch.
David: OK to commit?
diff -upr sphinx3.orig/include/fsg_search.h sphinx3/include/fsg_search.h
--- sphinx3.orig/include/fsg_search.h	2007-11-28 10:19:46.000000000 +0300
+++ sphinx3/include/fsg_search.h	2007-12-19 16:39:50.000000000 +0300
@@ -164,7 +164,6 @@ typedef struct fsg_search_s {
     int32 bpidx_start;          /* First history entry index this frame */
-    srch_hyp_t *filt_hyp;       /* Filtered hypothesis */
     int32 ascr, lscr;           /* Total acoustic and lm score for utt */
     int32 n_hmm_eval;           /* Total HMMs evaluated this utt */
diff -upr sphinx3.orig/src/libs3decoder/libsearch/fsg_search.c sphinx3/src/libs3decoder/libsearch/fsg_search.c
--- sphinx3.orig/src/libs3decoder/libsearch/fsg_search.c	2007-11-28 10:19:46.000000000 +0300
+++ sphinx3/src/libs3decoder/libsearch/fsg_search.c	2007-12-19 16:39:47.000000000 +0300
@@ -904,69 +904,6 @@ fsg_search_hyp_dump(fsg_search_t * searc
                     search->senscale);
 }
-
-#if 0
-/* Fill in hyp_str in search.c; filtering out fillers and null trans */
-static void
-fsg_search_hyp_filter(fsg_search_t * search)
-{
-    srch_hyp_t *hyp, *filt_hyp, *head;
-    int32 i;
-    int32 startwid, finishwid;
-    int32 altpron;
-    dict_t *dict;
-
-
-    dict = search->dict;
-    filt_hyp = search->filt_hyp;
-    startwid = dict_basewid(dict, dict_startwid(dict));
-    finishwid = dict_basewid(dict, dict_finishwid(dict));
-    dict = search->dict;
-    altpron = search->isUsealtpron;
-
-    i = 0;
-    head = 0;
-    for (hyp = search->hyp; hyp; hyp = hyp->next) {
-        if ((hyp->id < 0) ||
-            (hyp->id == startwid) || (hyp->id >= finishwid))
-            continue;
-
-        /* Copy this hyp entry to filtered result */
-        filt_hyp = (srch_hyp_t *) ckd_calloc(1, sizeof(srch_hyp_t));
-
-        filt_hyp->word = hyp->word;
-        filt_hyp->id = hyp->id;
-        filt_hyp->type = hyp->type;
-        filt_hyp->sf = hyp->sf;
-        filt_hyp->ascr = hyp->ascr;
-        filt_hyp->lscr = hyp->lscr;
-        filt_hyp->pscr = hyp->pscr;
-        filt_hyp->cscr = hyp->cscr;
-        filt_hyp->fsg_state = hyp->fsg_state;
-        filt_hyp->next = head;
-        head = filt_hyp;
-        /*
-        filt_hyp[i] = hyp;
-        */
-
-        /* Replace specific word pronunciation ID with base ID */
-        if (!altpron) {
-            filt_hyp->id = dict_basewid(dict, filt_hyp->id);
-        }
-
-        i++;
-        if ((i + 1) >= HYP_SZ)
-            E_FATAL
-                ("Hyp array overflow; increase HYP_SZ in fsg_search.h\n");
-    }
-
-    filt_hyp->id = -1;          /* Sentinel */
-    search->filt_hyp = filt_hyp;
-}
-
-#endif
-
-
 void
 fsg_search_history_backtrace(fsg_search_t * search,
                              boolean check_fsg_final_state)
diff -upr sphinx3.orig/src/libs3decoder/libsearch/srch_fsg.c sphinx3/src/libs3decoder/libsearch/srch_fsg.c
--- sphinx3.orig/src/libs3decoder/libsearch/srch_fsg.c	2007-11-28 10:19:46.000000000 +0300
+++ sphinx3/src/libs3decoder/libsearch/srch_fsg.c	2007-12-19 16:39:47.000000000 +0300
@@ -292,11 +292,16 @@ srch_FSG_gen_hyp(void *srch)
     s = (srch_t *) srch;
     fsgsrch = (fsg_search_t *) s->grh->graph_struct;
     fsg_search_history_backtrace(fsgsrch, FALSE);
     ghyp = NULL;
     for (h = fsgsrch->hyp; h; h = h->next) {
         srch_hyp_t *tmph;
+
+        /* Skip NULL states */
+        if (h->id < 0)
+            continue;
+
         /* We have to copy the nodes here since fsgsrch retains
          * ownership of the hyp... */
         tmph = ckd_calloc(1, sizeof(*tmph));
Yes, I think so. The NULL transitions might be useful for something but the decode API shouldn't be seeing them since they are irrelevant to the hypothesis.
I revive this thread because I met two problems, the second not directly related to this thread:
1. I observed that with the patch Nickolay provided (1), the decoder (in FSG mode) in some situations tends more toward hypothesizing SILENCE. I observed this mainly for words with lower language-model probabilities (due to wide branching in the fsg), so I assume the decoder hypothesizes SILENCE when acoustic and language-model scores are low. I evaluated this by comparing against the last patch (2) (more a workaround) I provided:
livepretend, 724 utterances, rev 7433 with patch (1) or (2), configuration the same as posted before:
(1) SER: 25.4%, WER: 9.5%
(2) SER: 16.9%, WER: 5.4%
So there is still a flaw in this patch...
2. Since revision 7438, livepretend produces no hypothesis for perhaps 80% of the given utterances. I checked, and this problem does not occur with revision 7433. It also does not occur when I use the API directly in my own programs.
regards
M.D.
applied.
Hm, about the first problem. What if we change FALSE back to TRUE in my patch?
Yes, this solved the first problem!
regards
M.D.
Hm, I tried the fsg regression test from sphinx3, everything seems to work fine with the latest version. Can you please submit the test - a single recording that recognized incorrectly, fsg, dictionary, arguments? You can upload it somewhere and give a link for example.
An utterance that gets no hypothesis:
http://www-users.rwth-aachen.de/Masrur.Doostdar/8.raw
For the filler dictionary, dic, fsg and config [1] you can take the same files as posted before:
http://www-users.rwth-aachen.de/Masrur.Doostdar/navigate-go7.filler
http://www-users.rwth-aachen.de/Masrur.Doostdar/navigate-go7.dic
http://www-users.rwth-aachen.de/Masrur.Doostdar/nav-withoutstop.fsg
Note that the problem is not restricted to fsg mode; it also occurs when using a language model! But so far I have observed it only when testing with sphinx_livedecode. What is that fsg regression test you mentioned? I don't know it.
thanks and regards
Masrur D.
[1] cfg:
-mdef model_architecture/wsj_all_cont_3no_8000.mdef
-mean model_parameters/wsj_all_cont_3no_8000_16.cd/means
-var model_parameters/wsj_all_cont_3no_8000_16.cd/variances
-mixw model_parameters/wsj_all_cont_3no_8000_16.cd/mixture_weights
-tmat model_parameters/wsj_all_cont_3no_8000_16.cd/transition_matrices
-lw 15
-feat s3_1x39
-beam 1e-120
-wbeam 1e-100
-pbeam 1e-120
-dict navigate-go7.dic
-fdict navigate-go7.filler
-fsg nav-withoutstop.fsg
-wip 0.2
-agc max
-varnorm no
-cmn current
-hyp _live_navfsg.match
-hypseg result_live_navfsg.match
-op_mode 2
Hm, everything works fine for me with latest trunk
FWDVIT: ROBOT DRIVE TO THE COUCH TABLE (8)
Can it be so that you updated sphinx3 but not sphinxbase?
hmm, strange
> Can it be so that you updated sphinx3 but not sphinxbase?
I had not updated sphinxbase as of my first post in this thread, but I noticed that yesterday. I now have the latest trunk for both sphinxbase and sphinx3, but it got worse. Before, I observed this problem only in the sphinx_livedecode program; now it is the same in my own program using the API.
regards
M.D.
Can you please compare your output with mine:
http://pastebin.ca/917497
First of all, thanks for your effort in testing.
It is still very strange; the hypotheses even seem to depend on the ctl file. To produce comparable output I used a ctl file with just one utterance (for 8.raw), and it works! Before, I of course had many more utterances in it.
I'll provide you with a ctl file of 6 utterances and the corresponding raw files. I checked, and the problem appears there (3 utterances get no hypothesis at all, 2 a false one). There is another strange thing I observed: after I changed the folder of the raws, the hypotheses changed. (And remember that these problems do not appear with older revisions.) Nevertheless, I doubt but hope that my problem is reproducible for you.
ctl-file:
http://www-users.rwth-aachen.de/Masrur.Doostdar/test.ctl
raws:
http://www-users.rwth-aachen.de/Masrur.Doostdar/test_raws.tar
output of live-decode:
http://www-users.rwth-aachen.de/Masrur.Doostdar/test_output
regards
M.D.
Well, I can easily reproduce this. The problem actually is that you are using agc; decoding is much better without it.
In theory you shouldn't use agc at all; whether you need agcmax depends on the model. So the question is why it didn't affect decoding before, or something like that. I'll try to look into this.
Also, don't use such small beams; the defaults are ok:
-hmm hmm
-lw 15
-feat s3_1x39
-dict navigate-go7.dic
-fdict navigate-go7.filler
-fsg nav-withoutstop.fsg
-op_mode 2
thanks Nickolay!
Without agc the problem is gone, and the results are even a little better compared to the last revision that did not have this problem (with agc). I thought agc was like cmn and never harmful to use.
Apropos cmn, I have one question; perhaps you know this off the top of your head (if not, I will just have a look in the code). If cmn is used for separate-utterance decoding (as in livepretend), is the cmn mean value calculated for one utterance discarded for the next utterance, or is it kept and updated? And if it is discarded, couldn't it be better to keep it, on the assumption that all utterances have roughly the same background noise?
About your proposal to use the small default beams: I have evaluated it again on our task, and it makes a big difference:
default beams: WER 13.9%, SER 33%
wider beams (1e-120, 1e-120, 1e-100): WER 4.2%, SER 14%
regards
M.D
> I thought agc was like cmn and never harmful to use.
Actually it was never reliable in my experience; it has probably been broken for a long time.
> If cmn is used for separate-utterance decoding (as in livepretend), is the cmn mean value calculated for one utterance discarded for the next utterance, or is it kept and updated?
If you are doing batch decoding or don't care about quick response time, it's actually better to use sphinx3_decode, which works with current CMN and calculates the mean over the current utterance. Both livedecode and livepretend use prior cmn, which in my opinion is sometimes much less stable.
About the update: statistics are collected over all utterances with exponential decay, so recent cepstra matter more than older ones. The CMN parameters are updated at the end of the utterance, or once enough frames have accumulated. Statistics over previous utterances are taken into account, of course. At the utterance end these lines should appear in the log:
INFO: cmn_prior.c(121): cmn_prior_update: from < 12.00 0.00 0.00 0.00 0.00
INFO: cmn_prior.c(139): cmn_prior_update: to < -1.07 -1.06 -0.06 -0.04 0.00
but cmn is updated without such a signal too. For more details look into sphinxbase, in the file cmn_prior.c; it's rather simple code.
> About your proposal to use the small default beams: I have evaluated it again on our task, and it makes a big difference:
> default beams: WER 13.9%, SER 33%
> wider beams (1e-120, 1e-120, 1e-100): WER 4.2%, SER 14%
Ok, I was wrong here.
Ok, I fixed the first one, the second needs investigation.