Hello all,
I have used Pocketsphinx several times now for speech recognition tasks on Android devices. Most recently, I tried to detect a single word with a grammar and then obtain time stamps for each of the segments. However, the time stamps do not match the actual times in the file: the word is always reported right at the beginning of the file, after 7 frames of silence (SIL).
This is my JSGF grammar:
This is the output for any file where I say the German word "Kompliment":
The last line shows the duration of the recorded file in seconds. One can clearly see that there is a mismatch between frame numbers and total time.
The audio is RIFF WAVE, 16 kHz, 16-bit, mono.
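To make the comparison concrete, here is a minimal sketch of how frame numbers can be checked against the file duration. It assumes the decoder runs at Pocketsphinx's default frame rate of 100 frames per second (the `-frate` setting); if the configuration overrides that, the constant has to change. The helper names `frames_to_seconds`, `wav_duration_seconds` and `make_silent_wav` are made up for this illustration and are not part of Pocketsphinx.

```python
import io
import wave

# Assumption: default Pocketsphinx frame rate (-frate), i.e. 10 ms per frame.
FRAME_RATE = 100


def frames_to_seconds(frame, frame_rate=FRAME_RATE):
    """Convert a decoder frame index into seconds."""
    return frame / frame_rate


def wav_duration_seconds(wav_file):
    """Return the duration of a WAV file (path or file-like object) in seconds."""
    with wave.open(wav_file, "rb") as wav:
        # Audio frames (samples per channel) divided by the sample rate.
        return wav.getnframes() / wav.getframerate()


def make_silent_wav(seconds, sample_rate=16000):
    """Build an in-memory 16 kHz, 16-bit, mono WAV of silence for testing."""
    buf = io.BytesIO()
    with wave.open(buf, "wb") as wav:
        wav.setnchannels(1)
        wav.setsampwidth(2)  # 2 bytes = 16 bit
        wav.setframerate(sample_rate)
        wav.writeframes(b"\x00\x00" * int(seconds * sample_rate))
    buf.seek(0)
    return buf


if __name__ == "__main__":
    duration = wav_duration_seconds(make_silent_wav(2.5))
    # Number of decoder frames one would expect for the whole file.
    expected_frames = round(duration * FRAME_RATE)
    print(duration, expected_frames, frames_to_seconds(7))
```

Under that assumption, 7 frames of silence are only 0.07 s, and the end frame of the last segment should be close to the file duration times 100. If the reported end frame is far below that, the decoder has probably not processed the whole file.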
Can you please explain why this happens and how I can get correct frame numbers?