Hi all,
I've just installed Sphinx 4 and tried the sample apps (e.g. HelloWorld). I'm running Sphinx on WinXP/Cygwin with a USB headset.
I've found that performance degrades over time on my machine. In a given session, the first two utterances are recognized with high accuracy, but after a while accuracy drops drastically. Digging a bit deeper, I modified the HelloWorld example (as well as the LiveCMN class) to display the mean value of the first cepstral coefficient (C0, i.e. currentMean[0] in LiveCMN.java) before each utterance. With the default CMN parameters, the first two utterances use the default initial value of currentMean[0], which is 12. Once the value actually starts updating, it falls to between 1 and 5, and that is exactly when accuracy drops. To check whether this was the problem, I set the shiftWindow parameter (in helloworld.config.xml) to 100000 so that the mean values are not updated for a long time, and accuracy then remains good.
Given that, I'm pretty sure the CMN update is going wrong somewhere... Setting shiftWindow=100000 does make recognition work, but it doesn't seem like the right long-term solution... It is true that the recording level is fairly low (even with all settings at their maximum), but not so low that it should hurt recognition, particularly since keeping the default value of 12 seems to work.
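To make the mechanism described above concrete, here is a simplified, hypothetical Java model of live CMN; it is not the Sphinx-4 LiveCMN source. It assumes, as the post describes, that C0 starts from a prior of 12 (initialMean) and is only re-estimated once the frame count reaches shiftWindow. The prior weight cmnWindow (and the value 100 used below) is an illustrative assumption. With a very large shiftWindow the mean never moves off the prior, which matches the shiftWindow=100000 experiment.

```java
/**
 * Simplified, hypothetical model of live cepstral mean normalization (CMN).
 * NOT the Sphinx-4 LiveCMN source: initialMean, cmnWindow and shiftWindow
 * only mirror the parameters discussed in the thread.
 */
class LiveCmnSketch {
    private final double[] currentMean; // running mean per cepstral coefficient
    private final double[] sum;         // accumulated coefficients (prior included)
    private final int cmnWindow;        // assumed weight of the prior, in frames
    private final int shiftWindow;      // frame count that triggers a re-estimate
    private int numberFrame;

    LiveCmnSketch(int numCoeffs, double initialMean, int cmnWindow, int shiftWindow) {
        this.cmnWindow = cmnWindow;
        this.shiftWindow = shiftWindow;
        currentMean = new double[numCoeffs];
        currentMean[0] = initialMean;           // only C0 gets a non-zero prior
        sum = new double[numCoeffs];
        for (int i = 0; i < numCoeffs; i++) {
            sum[i] = currentMean[i] * cmnWindow; // prior counts as cmnWindow frames
        }
        numberFrame = cmnWindow;
    }

    /** Subtracts the current mean from one frame and accumulates the frame. */
    double[] normalize(double[] cepstrum) {
        double[] out = new double[cepstrum.length];
        for (int i = 0; i < cepstrum.length; i++) {
            sum[i] += cepstrum[i];
            out[i] = cepstrum[i] - currentMean[i];
        }
        numberFrame++;
        if (numberFrame >= shiftWindow) {
            updateMean();
        }
        return out;
    }

    /** Recomputes the mean, then rescales the sum to represent cmnWindow frames. */
    private void updateMean() {
        double sf = 1.0 / numberFrame;
        for (int i = 0; i < sum.length; i++) {
            currentMean[i] = sum[i] * sf;
            sum[i] = currentMean[i] * cmnWindow;
        }
        numberFrame = cmnWindow;
    }

    double meanC0() {
        return currentMean[0];
    }

    public static void main(String[] args) {
        LiveCmnSketch cmn = new LiveCmnSketch(13, 12.0, 100, 500);
        double[] frame = new double[13];
        frame[0] = 3.0; // quiet input: C0 well below the prior of 12
        for (int i = 0; i < 1000; i++) {
            cmn.normalize(frame);
        }
        System.out.println("C0 mean after 1000 frames: " + cmn.meanC0());
    }
}
```

In this model, with shiftWindow=500 and a constant C0 of 3, the mean stays at 12 for the first 399 frames and then drops to about 4.8, i.e. (12*100 + 3*400)/500; with shiftWindow=100000 it stays at 12.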
Any idea what's going on and what I can do to fix it?
Thanks,
antoine
Please specify which version you are talking about. Try the latest SVN trunk and report the results.
Thanks for your quick response Nickolay. I just checked out the trunk yesterday and that gave the results I reported.
Thanks. Hm, I also see this.
Hi Antoine
I think it's the issue we discussed here
https://sourceforge.net/forum/message.php?msg_id=6376192
Please try the suggested change.
Hey,
I know this thread is pretty old, but I just updated to the current trunk of Sphinx 4 and noticed that the lines

    if (input instanceof DataStartSignal)
        sum = null;

in LiveCMN.getData() are still there.
Does that mean the issue has been solved somewhere else, or should I still keep them commented out in my local copy?
Thanks...
antoine
The problem was solved in the endpointer: DataStartSignal is now sent only rarely, at the beginning of the stream, so LiveCMN is rarely reset.
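The fix described above changes how often the `sum = null` reset fires. The hypothetical Java sketch below (not Sphinx-4 code; the class, the prior of 12, the 100-frame prior weight and the 200-frame utterances are all illustrative assumptions) contrasts resetting the accumulated C0 statistics before every utterance with resetting them only once at the start of the stream.

```java
/**
 * Hypothetical sketch, not Sphinx-4 code: a C0-only running mean whose
 * reset() plays the role of `sum = null` on a DataStartSignal.
 */
class CmnResetDemo {

    static class RunningC0Mean {
        private final double initialMean; // prior for C0 (12 in the thread)
        private final int priorFrames;    // assumed weight of the prior, in frames
        private double sum;
        private int frames;
        private double mean;

        RunningC0Mean(double initialMean, int priorFrames) {
            this.initialMean = initialMean;
            this.priorFrames = priorFrames;
            reset();
        }

        /** Forgets all accumulated statistics and falls back to the prior. */
        void reset() {
            sum = initialMean * priorFrames;
            frames = priorFrames;
            mean = initialMean;
        }

        void addFrame(double c0) {
            sum += c0;
            frames++;
        }

        /** Re-estimates the mean at the end of an utterance. */
        void endUtterance() {
            mean = sum / frames;
        }

        double mean() {
            return mean;
        }
    }

    /** Feeds utterances of 200 frames, each with a constant C0 of 3. */
    static double simulate(boolean resetEveryUtterance, int utterances) {
        RunningC0Mean cmn = new RunningC0Mean(12.0, 100);
        for (int u = 0; u < utterances; u++) {
            // A start signal precedes either every utterance or only the first.
            if (resetEveryUtterance || u == 0) {
                cmn.reset();
            }
            for (int f = 0; f < 200; f++) {
                cmn.addFrame(3.0);
            }
            cmn.endUtterance();
        }
        return cmn.mean();
    }

    public static void main(String[] args) {
        System.out.println("reset every utterance: " + simulate(true, 3));
        System.out.println("reset once per stream: " + simulate(false, 3));
    }
}
```

In this model, resetting before every utterance rebuilds the estimate from the prior each time, so after three utterances the mean is 6.0; resetting only at stream start lets the statistics accumulate, giving about 4.29 after three utterances and moving further toward 3 as more speech arrives.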