I posted this question earlier in another thread, but it is better to create a separate thread for it.
So, what interests me is why I get different recognition results on the same audio. I used the Python bindings for pocketsphinx and sphinxbase to investigate this and came across the -cmn parameter (cepstral mean normalization), which adapts the features to the recording channel, i.e. the device on which recognition is performed. I then read up on CMN values and the modes in which they are calculated. As far as I understood, in batch mode the CMN values are renewed after every recognized utterance. With that in mind, I organized my code so that the decoder is reconfigured before each utterance is recognized, with a custom -cmninit value. This way each utterance should be recognized with the preset CMN values, and before any more data can be recognized with changed CMN values, the script creates a new decoder with the same -cmninit. But it didn't work: the results still differ.
What could be the cause of this randomness? The -dither parameter was disabled.
Thanks in advance,
Olya Yakovenko
I posted this question earlier in another thread, but it is better to create a separate thread for it.
It would be better to provide more technical details instead.
As far as I understood, in batch mode the CMN values are renewed after every recognized utterance.
Not really. In batch mode, CMN is calculated for every utterance separately, before the utterance is decoded, not after it.
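To make the batch-mode behaviour concrete, here is a toy sketch in plain Python, assuming an utterance is simply a list of cepstral frames (lists of floats). It is an illustration of the idea, not sphinxbase's actual code, and `batch_cmn` is a hypothetical name: the mean is computed over the whole utterance first, then subtracted from every frame.

```python
from typing import List

Frame = List[float]


def batch_cmn(frames: List[Frame]) -> List[Frame]:
    """Toy batch CMN: subtract this utterance's own per-dimension mean.

    The mean is computed over the whole utterance before any frame is
    normalized, so the output depends only on the utterance itself.
    """
    if not frames:
        return []
    n = len(frames)
    dim = len(frames[0])
    mean = [sum(f[d] for f in frames) / n for d in range(dim)]
    return [[f[d] - mean[d] for d in range(dim)] for f in frames]


utt = [[10.0, 1.0], [12.0, 3.0], [14.0, 5.0]]
print(batch_cmn(utt))  # [[-2.0, -2.0], [0.0, 0.0], [2.0, 2.0]]
```

Because no state survives between calls, running it twice on the same utterance gives identical output, which is why an initial value such as -cmninit has nothing to contribute in this mode.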
I organized my code so that the decoder is reconfigured before each utterance is recognized, with a custom -cmninit value.
Reconfiguring the decoder every time should be very slow. -cmninit is only useful in live mode, not in batch mode.
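By contrast, a live-mode estimate carries over from one utterance to the next. The following is a deliberately simplified model, assuming a running mean that starts at a `cmninit` vector and is blended with each utterance's mean after that utterance has been normalized; the class `LiveCMN` and its `weight` parameter are hypothetical, and sphinxbase's real update rule is more involved.

```python
from typing import List

Frame = List[float]


class LiveCMN:
    """Toy live CMN: a running mean estimate that persists across utterances."""

    def __init__(self, cmninit: Frame, weight: float = 0.5):
        self.mean = list(cmninit)  # starts at the -cmninit-like initial value
        self.weight = weight       # hypothetical: how far one utterance moves the estimate

    def process(self, frames: List[Frame]) -> List[Frame]:
        # Normalize with the estimate built from audio seen *before* this utterance.
        out = [[x - m for x, m in zip(f, self.mean)] for f in frames]
        if frames:
            # Then fold this utterance's mean into the running estimate.
            n = len(frames)
            utt_mean = [sum(col) / n for col in zip(*frames)]
            self.mean = [(1 - self.weight) * m + self.weight * u
                         for m, u in zip(self.mean, utt_mean)]
        return out


cmn = LiveCMN(cmninit=[0.0])
utt = [[4.0], [6.0]]
print(cmn.process(utt))  # [[4.0], [6.0]]  normalized with the initial value
print(cmn.process(utt))  # [[1.5], [3.5]]  same audio, different features
```

In this model the initial value only affects the first utterance, and the same audio yields different features on a second pass through the same object.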
But it didn't work: the results still differ.
Differ from what? From other runs, from other utterances? From the same run with the same utterance with different parameters? You are not quite clear here.
There is also a noise estimate which is updated continuously. If you decode a second time, the noise estimate may have changed, and so may the result. You can disable noise estimation with -remove_noise no.
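The effect of a continuously updated estimate can be shown with a toy tracker. This is a hypothetical sketch, not pocketsphinx's noise-removal algorithm: `ToyNoiseTracker` keeps an exponentially smoothed background level (assumed smoothing factor `alpha`) and subtracts it from per-frame energies, so its state leaks from one run into the next unless it is reset.

```python
from typing import List


class ToyNoiseTracker:
    """Hypothetical background-level tracker whose state survives between runs."""

    def __init__(self, alpha: float = 0.5):
        self.alpha = alpha  # smoothing factor for the running estimate
        self.noise = 0.0    # current estimate, carried across calls

    def reset(self) -> None:
        # Forget everything learned from earlier audio.
        self.noise = 0.0

    def process(self, energies: List[float]) -> List[float]:
        out = []
        for e in energies:
            # Subtract the current estimate, clamping at zero...
            out.append(max(e - self.noise, 0.0))
            # ...then update the estimate with exponential smoothing.
            self.noise = (1 - self.alpha) * self.noise + self.alpha * e
        return out


tracker = ToyNoiseTracker()
audio = [2.0, 2.0]
print(tracker.process(audio))  # [2.0, 1.0]
print(tracker.process(audio))  # [0.5, 0.25]  second pass differs
tracker.reset()
print(tracker.process(audio))  # [2.0, 1.0]   deterministic again after a reset
```

Resetting the state before each pass restores identical output, which is the same idea as resetting the decoder's noise statistics between runs.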
Reconfiguring the decoder every time should be very slow.
It is slow :) But that is not a concern, since this is done for educational purposes.
Differ from what? From other runs, from other utterances? From the same run with the same utterance with different parameters? You are not quite clear here.
Other runs on the same utterance.
Not really. In batch mode, CMN is calculated for every utterance separately, before the utterance is decoded, not after it. <...> -cmninit is only useful in live mode, not in batch mode. <...> There is also a noise estimate which is updated continuously. If you decode a second time, the noise estimate may have changed, and so may the result. You can disable noise estimation with -remove_noise no.
I see! This is very informative, thank you :)
Strangely, when I tried to configure my decoder with the -remove_noise parameter, the console reported a segmentation fault.
I'm using the latest versions of pocketsphinx, sphinxbase and the Python wrapper.
Last edit: Nickolay V. Shmyrev 2017-07-27
-remove_noise is a boolean option, not a string one, so it has to be set with a boolean value rather than the string "no".
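A minimal sketch of setting the option as a boolean, assuming the SWIG-based pocketsphinx-python bindings of that period (`Decoder.default_config()`, `Config.set_string`, `Config.set_boolean`). The function `make_decoder` and its path arguments are hypothetical, and newer pocketsphinx releases changed the Python API, so treat this as era-specific.

```python
def make_decoder(hmm_dir: str, lm_path: str, dict_path: str):
    """Build a decoder with noise removal and dithering switched off.

    Assumes the 2017-era pocketsphinx-python SWIG bindings. The import
    sits inside the function so this sketch can be loaded without them.
    """
    from pocketsphinx import Decoder

    config = Decoder.default_config()
    config.set_string('-hmm', hmm_dir)
    config.set_string('-lm', lm_path)
    config.set_string('-dict', dict_path)
    # Boolean options are set with set_boolean and a Python bool,
    # not with set_string('-remove_noise', 'no').
    config.set_boolean('-remove_noise', False)
    config.set_boolean('-dither', False)
    return Decoder(config)
```

Passing the string form to a boolean option is presumably what triggered the segmentation fault reported above.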
You can also reset the noise statistics with decoder.start_stream(), so actually there is no need to disable noise cancellation.
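A sketch of decoding the same audio several times with one decoder while resetting the noise statistics before each run, as suggested above. It assumes `decoder` is a pocketsphinx-python `Decoder` exposing `start_stream`, `start_utt`, `process_raw(data, no_search, full_utt)`, `end_utt` and `hyp`; the helper name `decode_repeatedly` is hypothetical.

```python
from typing import List, Optional


def decode_repeatedly(decoder, audio: bytes, runs: int = 2,
                      chunk_size: int = 1024) -> List[Optional[str]]:
    """Decode the same raw audio `runs` times, resetting the stream each time."""
    results: List[Optional[str]] = []
    for _ in range(runs):
        decoder.start_stream()  # reset noise-estimation state for this run
        decoder.start_utt()
        for i in range(0, len(audio), chunk_size):
            # no_search=False, full_utt=False: ordinary streaming input
            decoder.process_raw(audio[i:i + chunk_size], False, False)
        decoder.end_utt()
        hyp = decoder.hyp()
        results.append(hyp.hypstr if hyp is not None else None)
    return results
```

Usage, with a hypothetical file name: read the raw audio once with `open('utterance.raw', 'rb').read()` and pass the bytes to `decode_repeatedly(decoder, audio)`; with the noise statistics reset before each run, the hypotheses should be directly comparable.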
Oh! Yes, I see.
Thank you!