I'm trying to understand how to tune pocketsphinx to very quickly recognize commands from an extremely small vocabulary. I've built a custom LM with the handful of phrases I want to recognize: SIGNAL LEFT, SIGNAL RIGHT, SIGNAL OFF.
I'm running this on a Raspberry Pi, and the goal is to get recognition happening in about 1 second. Sometimes it hits that, but other times it takes as long as 4 seconds. Eventually I'll want to trigger off a keyphrase, since it needs to be "always on", but first I want to tune recognition performance as much as possible.
Hello Remi
You need to provide the exact command line or code you are using and the output log of pocketsphinx to get help on this issue.
According to the experience of our users, recognition should be pretty fast:
https://www.element14.com/community/roadTestReviews/2166/l/roadtest-review-a-raspberry-pi-3-model-b-review
You can start experimenting with keyphrase spotting mode directly, there is no need to spend time on language models if you are not going to use them.
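As a sketch, a keyphrase spotting run against the test recording could look like the following (the threshold value is only a starting point to tune, and the keyphrase and file name are assumptions):

```sh
# Spot the keyphrase "signal" in the test recording.
# Lower -kws_threshold to catch more (but possibly false) detections;
# raise it to reduce false alarms.
pocketsphinx_continuous -infile systemtests.wav \
    -keyphrase "signal" -kws_threshold 1e-20
```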
Right now, I'm mostly using default settings. I've tried turning different search modes on and off, like disabling fwdflat or bestpath, but didn't get any real improvements. This run uses a 19-second audio file that contains several phrases, including a few that don't contain the keyphrase; it can be downloaded here: http://jetpackshark.com/systemtests.wav
Ok, and what is your CPU exactly? What is the output of
cat /proc/cpuinfo
?
You can try with a JSGF grammar instead:
It should be faster than an LM.
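A minimal JSGF grammar covering just the three phrases from the question might look like this (the grammar name and file name signals.gram are assumptions):

```
#JSGF V1.0;
grammar signals;
public <command> = SIGNAL ( LEFT | RIGHT | OFF );
```

It can then be passed to the decoder with `-jsgf signals.gram` in place of the `-lm` option.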
You can also try with the semi-continuous model:
https://sourceforge.net/projects/cmusphinx/files/Acoustic%20and%20Language%20Models/US%20English/cmusphinx-en-us-semi-5.1.tar.gz/download
It is going to be faster as well.
processor : 0
model name : ARMv6-compatible processor rev 7 (v6l)
BogoMIPS : 697.95
Features : half thumb fastmult vfp edsp java tls
CPU implementer : 0x41
CPU architecture: 7
CPU variant : 0x0
CPU part : 0xb76
CPU revision : 7
Hardware : BCM2708
Revision : 0015
Serial : 00000000d620ebdb
JSGF makes a lot more sense, and with some tweaking it should be exactly what I need. Thanks!
And this is the current output:
JSGF and LM conflict with each other; you need to use either JSGF or LM, not both.
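In other words, the command line should specify only one search. For example, assuming a grammar file named signals.gram (a hypothetical name):

```sh
# JSGF only -- do not pass -lm at the same time
pocketsphinx_continuous -infile systemtests.wav -jsgf signals.gram
```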
Hey, that's much better!