Hi,
I'm testing LiveSpeechRecognizer and StreamSpeechRecognizer in Sphinx4, and I've realized I have to call the recognizer's getResult() method in a loop, joining the partial results, to obtain the full result.
For example, with the stream speech recognizer, why doesn't Sphinx4 build the full result if it receives the whole speech? At which frame does it cut the stream? I think we could lose efficiency in both kinds of recognition...
Thank you very much.
Ana
Sphinx4 automatically splits the audio into utterances and strips the silence between them. It cuts the stream once it sees about 0.5 seconds of silence. I don't think there is anything wrong with this. It actually helps to decode audio faster and with higher accuracy, because you can normalize the volume properly and model the language properly with the language model.
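If you do want a single string for the whole recording, you can join the per-utterance hypotheses yourself. Here is a minimal sketch of that loop. It assumes, as in the Sphinx4 high-level API, that getResult() returns null once the stream is exhausted and that each result's text comes from getHypothesis(). UtteranceJoiner and joinAll are hypothetical names used only for illustration, and the recognizer is abstracted behind a Supplier so the snippet compiles without Sphinx4 on the classpath.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Supplier;

// Hypothetical helper, not part of Sphinx4.
final class UtteranceJoiner {
    private UtteranceJoiner() {}

    /**
     * Pulls hypotheses until the supplier returns null (assumed to mean
     * "end of stream") and joins the non-empty ones with single spaces.
     */
    static String joinAll(Supplier<String> nextHypothesis) {
        List<String> parts = new ArrayList<>();
        String hypothesis;
        while ((hypothesis = nextHypothesis.get()) != null) {
            String trimmed = hypothesis.trim();
            // Skip utterances for which nothing was recognized.
            if (!trimmed.isEmpty()) {
                parts.add(trimmed);
            }
        }
        return String.join(" ", parts);
    }
}
```

With Sphinx4 the supplier could be something like `() -> { SpeechResult r = recognizer.getResult(); return r == null ? null : r.getHypothesis(); }`, called after recognizer.startRecognition(stream) and before recognizer.stopRecognition().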