Hi! I'm working with pocketsphinx right now and it's amazing so far.
I'm currently investigating whether this tool can recognize phonemes in real time (take the input as soon as the user speaks, then get the phoneme), but the delay between input and output is too high for what we are trying to do: 3D models that move their lips while the user talks into a microphone.
So my question is: is there a way to reduce this delay and get the phoneme instantly while the user is speaking?
I've tried modifying pocketsphinx_continuous to get the hyp as soon as possible, reading it whenever a counter reaches a certain number of iterations of the for(;;) loop, but the output still isn't instant.
Thanks in advance,
Kind regards!
Another question crossed my mind: which is faster, word recognition or phoneme recognition? In all of my tests, I get the result much faster when recognizing words than when recognizing phonemes.
Phonemes have duration, so "instantly" is certainly not possible. You should expect a delay of at least 50 ms.
Thanks for your quick response! I'm going to investigate further.
Regards!
For the fastest recognition, try the Kaldi models from kaldi-android-demo. You will have to compile a phonetic graph, though.