I am having trouble understanding the many arguments you can use with pocketsphinx. They do have small descriptions built into them (https://github.com/cmusphinx/pocketsphinx/blob/master/doc/pocketsphinx_continuous.1), but some of them are not helpful in understanding what they do or how they affect the recognition results.
Basically, I want to configure it to update as often as possible, even if that uses far more resources than it does now (recognition currently takes about 1-3% of my CPU and does a pretty good job). How can I crank it up to warp speed so that it analyzes the audio many more times each second? Like a lot of people, I am using the phoneme bin files to recognize individual phonemes rather than full words.
The arguments that look like they would help the most are:
-frate
-samprate
-beam
-pbeam
The beam ones have a description like this:
Beam width applied to every frame in Viterbi search (smaller values mean wider beam)
I am not sure what that means. I read the Wikipedia page on the Viterbi algorithm, and though it was interesting, it doesn't mention beams or go into the technical side. Does a wider beam mean quicker but less accurate results? The beam arguments also take values like 1e-30, which makes it even harder to understand what is actually going on.
Are there more resources regarding these arguments or am I doing it right by simply slowly experimenting with trial and error to find what works best?
Ok, I think I am understanding them a little more now. So for the beams, a value like 1e-10 (wider) would be aiming for more accuracy, while something like 1e-70 would be more performant (quick), right?
Last edit: John 2019-04-20
-frate: the frame rate is the number of analysis frames per second.
-samprate: this is the sampling rate of the source audio.
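To make the relationship between the two rates concrete, here is a minimal arithmetic sketch. The function name frame_timing is made up for illustration, and the values 16000 and 100 are assumed to be the usual pocketsphinx defaults for -samprate and -frate; check your own configuration. Note that -samprate describes the audio you feed in rather than acting as a speed knob, and the acoustic model is normally trained for a particular frame rate, so changing -frate alone may well hurt recognition instead of helping.

```python
def frame_timing(samprate: int = 16000, frate: int = 100) -> dict:
    """Relate a sampling rate and a frame rate to per-frame quantities.

    samprate: audio samples per second (what -samprate describes)
    frate:    analysis frames per second (what -frate describes)
    """
    if samprate <= 0 or frate <= 0:
        raise ValueError("samprate and frate must be positive")
    return {
        "frames_per_second": frate,
        "frame_shift_ms": 1000.0 / frate,               # time between frames
        "samples_per_frame_shift": samprate / frate,    # new samples per frame
    }


# Assumed defaults: 16 kHz audio analysed 100 times per second.
default = frame_timing()
# Hypothetical setting: twice as many analysis frames per second.
faster = frame_timing(frate=200)

print(default)
print(faster)
```

With the assumed defaults, each frame advances by 10 ms of audio, which is 160 samples; doubling the frame rate halves both numbers.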
Unfortunately, Wikipedia is a very bad source for almost everything. Beams are a core part of Viterbi search. You can read about them here:
https://web.stanford.edu/class/cs224s/lectures/224s.17.lec4.pdf
There is also a quite detailed description of the Sphinx algorithms here:
http://www.cs.cmu.edu/~rkm/th/th.pdf
On the beams, it is the other way around: a smaller value such as 1e-70 is a wider beam (more hypotheses are kept, so it is more accurate but slower), while a larger value such as 1e-10 is a narrower beam (quicker but less accurate).
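To see why a smaller beam value means a wider beam, here is a toy sketch of beam pruning at a single frame. It is not pocketsphinx's actual code: the function prune, the phone labels, and the scores are all invented for illustration, and it only assumes that the beam value acts as a probability ratio relative to the best hypothesis.

```python
import math
from typing import Dict


def prune(log_scores: Dict[str, float], beam: float) -> Dict[str, float]:
    """Keep hypotheses whose probability is at least `beam` times the best one.

    Scores are natural-log probabilities, so the ratio test
    p >= beam * p_best becomes log_p >= log_p_best + log(beam).
    """
    best = max(log_scores.values())
    threshold = best + math.log(beam)
    return {hyp: s for hyp, s in log_scores.items() if s >= threshold}


# Hypothetical log-probabilities of four active hypotheses at one frame.
frame = {"AH": -10.0, "AE": -40.0, "EH": -90.0, "IY": -200.0}

narrow = prune(frame, 1e-10)  # larger value: threshold is close to the best score
wide = prune(frame, 1e-70)    # smaller value: threshold is far below the best score

print(sorted(narrow))  # ['AH']
print(sorted(wide))    # ['AE', 'AH', 'EH']
```

With 1e-10 only the best hypothesis survives, so there is less work per frame but a higher chance of discarding the path that would have won later. With 1e-70 three hypotheses survive, which costs more CPU but is less likely to prune the correct path.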