Hi,
I would like to have a few tips about speech segmentation.
-At what intervals should the recorder thread (using Audio Port) "send" data to the decoder thread?
-When should an utterance be ended? (keyword by user? Pause with a duration?)
-What are end pointers? (reference)
Thank you for any help,
PA
I can answer the first two questions; the last one seems too philosophical for me. :-)
1. I think the safest choice is to feed the Sphinx 3.X recognizer segments of at least 0.2 s, because that is the interval it is always tested with. Shorter segments do work with 3.X, but we haven't tested them much.
2. Consider how your language model comes into play. If you define utterances to be too short, the LM score is restarted at each utterance boundary, so the previous word history no longer affects subsequent decoding. That can make the decoding results worse, because the LM score counts for a lot.
Arthur
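
For illustration only, here is a minimal sketch of the two points above: cutting the audio into 0.2 s chunks and ending an utterance after a pause. It does not use any Sphinx API. The 16 kHz sample rate, the RMS threshold of 500, the five-chunk (about 1 s) pause, and the names `rms` and `split_utterances` are all assumptions chosen for the example, not values from this thread.

```python
import math
from typing import Iterable, Iterator, List, Sequence

SAMPLE_RATE = 16000                                # assumed: 16 kHz mono PCM
CHUNK_SECONDS = 0.2                                # minimum segment length suggested above
CHUNK_SAMPLES = int(SAMPLE_RATE * CHUNK_SECONDS)   # samples per chunk


def rms(chunk: Sequence[int]) -> float:
    """Root-mean-square energy of one chunk of PCM samples."""
    if not chunk:
        return 0.0
    return math.sqrt(sum(s * s for s in chunk) / len(chunk))


def split_utterances(
    chunks: Iterable[Sequence[int]],
    silence_threshold: float = 500.0,   # hypothetical value; tune for your microphone
    max_silent_chunks: int = 5,         # 5 x 0.2 s = about 1 s of pause ends the utterance
) -> Iterator[List[Sequence[int]]]:
    """Group chunks into utterances, ending one after a long enough pause."""
    current: List[Sequence[int]] = []
    silent_run = 0
    started = False
    for chunk in chunks:
        if rms(chunk) >= silence_threshold:
            started = True       # speech detected: begin or continue the utterance
            silent_run = 0
        elif started:
            silent_run += 1      # count consecutive quiet chunks inside an utterance
        if started:
            current.append(chunk)
            if silent_run >= max_silent_chunks:
                yield current    # pause was long enough: close the utterance
                current, silent_run, started = [], 0, False
    if current:
        yield current            # flush whatever is left when the audio ends
```

A larger `max_silent_chunks` keeps more words inside one utterance, so more language-model history is preserved, at the cost of a longer wait before the utterance is closed.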