I am currently evaluating the performance of Pocketsphinx using Matlab and the Python wrapper for Pocketsphinx (https://github.com/bambocher/pocketsphinx-python). I'm currently stuck in a weird situation, because I first normalize to the maximum absolute value and then apply different scale values. In Matlab code:
"result" is the audio signal.
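The Matlab snippet itself did not come through in this post; a minimal NumPy sketch of the preprocessing described (normalize to the maximum absolute value, then apply a scale factor; the function name is my own) might look like:

```python
import numpy as np

def normalize_and_scale(result, scale):
    # Normalize so the peak absolute sample value is 1.0,
    # then apply the scale factor under test.
    peak = np.max(np.abs(result))
    return result / peak * scale

# Toy example: after scaling by 0.25 the peak absolute value is 0.25
signal = np.array([0.2, -0.5, 0.4])
scaled = normalize_and_scale(signal, 0.25)
```

Each run of the evaluation would then use a different `scale` value before writing the audio out for Pocketsphinx.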
Then I send it to Pocketsphinx, and afterwards it gets evaluated by the NIST scoring toolkit.
Here are the results:
Correct (x-axis: normalizing factor, y-axis: percentage):
Total Error (x-axis: normalizing factor, y-axis: percentage):
Do you have any suggestions on how to overcome this problem and get a flat curve (disregarding quantization issues)?
Or at least on finding the best value?
Thanks in advance!
~Frank
The recognizer adapts to any audio level, but it does that only after the first utterance, so the test you created is not very meaningful, I suppose; you simply do not give the recognizer time to adapt.
True decoding accuracy does not depend on volume level.
You can read this forum about cepstral mean normalization.
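To illustrate why cepstral mean normalization makes accuracy insensitive to volume: a constant gain k multiplies every frame energy by k², which only adds log(k²) to each log-energy (and similarly shifts the cepstral coefficients by a constant), and subtracting the per-utterance mean removes exactly that constant. A toy sketch of the idea, not Pocketsphinx's actual feature pipeline:

```python
import math

def log_energies(frames):
    # Log frame energies, a stand-in for the c0 cepstral coefficient.
    return [math.log(sum(s * s for s in f)) for f in frames]

def cmn(coeffs):
    # Cepstral mean normalization: subtract the per-utterance mean.
    m = sum(coeffs) / len(coeffs)
    return [c - m for c in coeffs]

frames = [[0.1, 0.3], [0.2, -0.4], [0.5, 0.1]]
scaled = [[2.0 * s for s in f] for f in frames]  # gain of 2: every log-energy shifts by log(4)

a = cmn(log_energies(frames))
b = cmn(log_energies(scaled))
# a and b agree up to floating-point error: the gain cancels after CMN
```

This also shows why the first utterance behaves differently: the mean estimate has to be accumulated before the normalization becomes accurate.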
Thank you very much! By putting all test sound files together into one large sound file, I get the same results, except (as you already said) for the first utterance :-)
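The concatenation Frank describes can be scripted with only the standard library, assuming all test files share the same sample rate, sample width, and channel count (the function name is my own):

```python
import wave

def concatenate_wavs(paths, out_path):
    """Join several WAV files with identical parameters into one file."""
    with wave.open(out_path, "wb") as out:
        params_set = False
        for path in paths:
            with wave.open(path, "rb") as w:
                if not params_set:
                    out.setparams(w.getparams())
                    params_set = True
                out.writeframes(w.readframes(w.getnframes()))
```

With this layout only the very first utterance of the combined file is decoded before CMN has adapted, which matches the result above.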