In sphinxbase, if features are configured with an AGC mode other than NONE and recognition is done in block mode, the AGC mode is automatically (and silently) changed to EMAX. This means that c0 is normalized with the maximum value from the previous utterance rather than the current one (the maximum of the current utterance is not yet known at recognition time).
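To make the difference concrete, here is a minimal sketch of the two strategies in C. The function names (agc_max, agc_emax) and the update rule are illustrative assumptions, not sphinxbase's exact API or coefficients; the point is only that MAX needs the whole utterance while EMAX normalizes with an estimate carried over from previous utterances.

```c
#include <stdio.h>

/* -agc max: subtract the current utterance's own C0 maximum.
 * Requires the complete utterance, so it only works in batch mode. */
static void agc_max(float *c0, int n)
{
    float max = c0[0];
    for (int i = 1; i < n; i++)
        if (c0[i] > max) max = c0[i];
    for (int i = 0; i < n; i++)
        c0[i] -= max;
}

/* -agc emax: subtract an estimate carried over from earlier utterances,
 * then update the estimate from this utterance's observed maximum.
 * Block (live) mode falls back to this, since the current utterance's
 * maximum is unknown while it is still being decoded.
 * The 0.5 smoothing factor here is a placeholder, not sphinxbase's value. */
static float emax_estimate = 0.0f;   /* running estimate of the C0 maximum */

static void agc_emax(float *c0, int n)
{
    float obs_max = c0[0];
    for (int i = 0; i < n; i++) {
        if (c0[i] > obs_max)
            obs_max = c0[i];
        c0[i] -= emax_estimate;      /* normalize with the OLD estimate */
    }
    /* exponential update toward the newly observed maximum */
    emax_estimate = 0.5f * emax_estimate + 0.5f * obs_max;
}
```

Note that on the very first utterance the EMAX estimate starts at its initial value, so normalization differs from MAX there; over successive utterances the estimate converges toward the observed maxima, which is why the two modes should behave similarly after a few utterances.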
I was wondering if this is the correct strategy. For example, I am using the French acoustic model trained and published by the Université du Maine, which was trained using -agc max, in an app performing recognition in block mode, and only the first utterance is recognized correctly. If I turn AGC off, recognition is fine.
Thanks!
Sylvain
I'm not sure I understand why you're having this problem. It should be the case that with AGC off, recognition using these models won't work at all. The difference between -agc max and -agc emax should vanish over the course of multiple utterances.
Unless, of course, there is a bug in the -agc emax calculation. Can you run a batch mode test and compare the AGC norm values you obtain with the ones you get in block mode?
In batch mode, -agc max and -agc emax give better results than -agc none. Then I realized that in our app (built on Sphinx 3.7) and in sphinx_livepretend release 3.7 with sphinxbase 0.3, the c0max value printed in the EMAX update is negative (around -4). In every other case (batch mode, or sphinx_livepretend from the trunk), c0max is positive (around 4). Is this a bug in sphinxbase 0.3?