I am trying to get live output of phonemes as they are spoken - not listening for a sentence or keywords, just output the moment anything is recognized. I was able to download and compile sphinxbase and pocketsphinx, and if I run pocketsphinx_continuous via the shell with -allphone_ci yes and -allphone en-us-phone.lm.bin it seems to work pretty well. The issue is that it doesn't update the output live; instead, it appears to wait for a small period of silence and then displays all of what was said before the silence. What I want is essentially a "stream of consciousness" of the recognized phonemes, constantly outputting anything it recognizes, or constantly outputting SIL if it recognizes nothing. Is there any way to configure it to do this?
Modify pocketsphinx_continuous to call ps_hyp and print the result every time a chunk of audio is processed. You can modify the Python example script too.
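A minimal sketch of that change, assuming the standard microphone read loop in continuous.c (ad, ps, and the sleep_msec helper are already defined by the surrounding code there, and ps_start_utt has been called before the loop; the call is spelled ps_get_hyp in the current API, as the next reply notes):

```c
/* Sketch of a modified read loop for continuous.c: after every audio
 * chunk is fed to the decoder, print the partial hypothesis instead
 * of waiting for the utterance to end. */
int16 adbuf[2048];
int32 k;
char const *hyp;

for (;;) {
    if ((k = ad_read(ad, adbuf, 2048)) < 0)
        E_FATAL("Failed to read audio\n");
    ps_process_raw(ps, adbuf, k, FALSE, FALSE);

    /* Partial result for the utterance so far; ps_get_hyp returns
     * NULL when nothing has been recognized yet, so print SIL as a
     * placeholder in that case. */
    hyp = ps_get_hyp(ps, NULL);
    printf("%s\n", hyp != NULL ? hyp : "SIL");
    fflush(stdout);

    sleep_msec(100);
}
```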
I couldn't find any method called ps_hyp, but I assumed you meant ps_get_hyp? I added some test lines to continuous.c and it is now giving me live output, although it is additive as I speak (it prints every phoneme from the current speaking session with the latest one appended at the end, rather than just the current phoneme, but it is live). I could split the string and just grab the last one if needed, so I think this will work. Thank you for pointing me in that direction! Here is an .
Last edit: Ben 2019-04-14
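If only the newest phoneme is wanted, splitting on the last space is enough, since the hypothesis string is a space-separated list of phonemes. A small helper sketch (print_latest_phoneme is a hypothetical name; hyp is the string returned by ps_get_hyp):

```c
#include <stdio.h>
#include <string.h>

/* Print only the most recent phoneme from the growing hypothesis.
 * ps_get_hyp() returns the whole utterance so far as space-separated
 * phonemes, e.g. "SIL HH AH L OW"; the last token is the newest. */
static void print_latest_phoneme(const char *hyp)
{
    const char *last;

    if (hyp == NULL || *hyp == '\0')
        return;
    last = strrchr(hyp, ' ');              /* last space, if any */
    printf("latest: %s\n", last ? last + 1 : hyp);
}
```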
Thanks for the quick response, I will give that a shot. I'm not a C++ or Python guy, but if I made it this far I might be able to figure it out :P
Hello,
You can try adding the -vad_postspeech parameter with a value of 5 or 10; the decoder will then wait for only 5 or 10 frames of silence after speech, instead of the default 50, before finalizing the result.
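If you end up driving the decoder from your own C program instead of pocketsphinx_continuous, the same option can be set at configuration time. A sketch, assuming a sphinxbase/pocketsphinx build recent enough to have the -vad_* options (model paths below are placeholders):

```c
#include <pocketsphinx.h>

int main(void)
{
    /* Configure CI-phoneme decoding with a shorter post-speech
     * silence window. Paths are placeholders; adjust to your models. */
    cmd_ln_t *config = cmd_ln_init(NULL, ps_args(), TRUE,
        "-hmm", "/path/to/en-us",                    /* acoustic model */
        "-allphone", "/path/to/en-us-phone.lm.bin",  /* phoneme LM */
        "-allphone_ci", "yes",
        "-vad_postspeech", "10",   /* 10 frames instead of the default 50 */
        NULL);
    ps_decoder_t *ps = ps_init(config);
    if (ps == NULL)
        return 1;
    /* ... decode audio as in continuous.c ... */
    ps_free(ps);
    cmd_ln_free_r(config);
    return 0;
}
```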