Help understanding pocketsphinx args (CMU Sphinx forums, Help)

John - 2019-04-19

Hi,

I am having trouble understanding the many arguments you can pass to pocketsphinx. They have short built-in descriptions (https://github.com/cmusphinx/pocketsphinx/blob/master/doc/pocketsphinx_continuous.1), but some of those descriptions don't help me understand what the arguments do or how they affect the recognition results.

Basically, I was hoping to configure it to update as often as possible, even if that means it uses a lot more resources than it currently does during recognition (right now it takes about 1-3% of my CPU and does a pretty good job). How can I crank it up to warp speed so it analyzes the audio many more times each second? Like a lot of people seem to, I am using the phoneme bin files to recognize individual phonemes instead of full words.

The arguments that look like they would help the most are:
-frate
-samprate
-beam
-pbeam

The beam ones have a description like this:

Beam width applied to every frame in Viterbi search (smaller values mean wider beam)

I am not sure what that means. I read the Wikipedia page for Viterbi, and though it is interesting, it doesn't mention beams or go into the technical side. Does a wider beam mean more results, faster, but less accurate ones? Also, they accept values like 1e-30, which makes it even more difficult to understand what is actually going on.

Are there more resources on these arguments, or am I doing it right by slowly experimenting with trial and error to find what works best?
- Nickolay V. Shmyrev - 2019-04-19
  
  -frate
  
  Frame rate is the number of analysis frames computed per second of audio.
  
  -samprate
  
  This is the sampling rate of the source audio, in samples per second (Hz).
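The two options are related. -samprate is audio samples per second and -frate is analysis frames per second, so their ratio is how many samples the analysis advances between consecutive frames. Below is a minimal sketch of that arithmetic. `frame_shift` is a hypothetical helper, not a pocketsphinx function, and 16000 Hz with 100 frames/s are assumed example values, so check what your model and audio actually use.

```python
def frame_shift(samprate: float, frate: float) -> tuple[float, float]:
    """Return (samples advanced per frame, milliseconds per frame).

    Hypothetical helper for illustration only; not part of pocketsphinx.
    """
    if samprate <= 0 or frate <= 0:
        raise ValueError("samprate and frate must be positive")
    samples_per_frame = samprate / frate  # hop size in samples
    ms_per_frame = 1000.0 / frate         # hop size in milliseconds
    return samples_per_frame, ms_per_frame


# Assumed example values: 16 kHz audio analysed at 100 frames per second.
samples, ms = frame_shift(16000, 100)
print(f"hop of {samples:.0f} samples ({ms:.0f} ms) per frame")
```

A higher -frate means a shorter hop and more frames per second of audio, and therefore more decoding work per second.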
  
  -beam -pbeam
  
  Wikipedia is, unfortunately, a very bad source for almost everything. Beams are a core concept in Viterbi search. You can read about them here:
  
  https://web.stanford.edu/class/cs224s/lectures/224s.17.lec4.pdf
  
  There is also a quite detailed description of the Sphinx algorithms here:
  
  http://www.cs.cmu.edu/~rkm/th/th.pdf
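To make the "smaller values mean wider beam" description concrete, here is a minimal sketch of the per-frame pruning step of a Viterbi beam search. It assumes the usual formulation, in which the beam is a probability ratio relative to the best hypothesis in the current frame. Scores are natural-log probabilities for readability, and the decoder's internal log representation may differ. The hypothesis names and scores are made-up illustration data. `prune` is a hypothetical function, not pocketsphinx's API and not necessarily how it implements -beam or -pbeam internally.

```python
import math


def prune(scores: dict[str, float], beam: float) -> dict[str, float]:
    """Per-frame beam pruning (illustrative sketch, not pocketsphinx code).

    scores: hypothesis -> natural-log probability of its best path so far.
    beam:   ratio in (0, 1]; a hypothesis survives if its probability is at
            least beam * (probability of the best hypothesis in this frame).
    """
    if not 0.0 < beam <= 1.0:
        raise ValueError("beam must be in (0, 1]")
    best = max(scores.values())
    # In the log domain the ratio test becomes an additive offset.
    # log(beam) is <= 0, so a smaller beam gives a lower threshold
    # and therefore keeps more hypotheses (a "wider" beam).
    threshold = best + math.log(beam)
    return {h: s for h, s in scores.items() if s >= threshold}


# Made-up log scores for three competing hypotheses in one frame.
frame_scores = {"AA": -10.0, "AE": -40.0, "B": -90.0}
survivors = prune(frame_scores, 1e-30)
print(sorted(survivors))  # ['AA', 'AE']
```

In a real decoder this step runs once per frame, and every surviving hypothesis has to be extended in the next frame. A wider beam keeps more hypotheses alive, and that is where the extra CPU time goes.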
  

John - 2019-04-20

OK, I think I understand them a little better now. So for the beams, a value like 1e-10 (wider) would aim for more accuracy, while something like 1e-70 would be more performant (faster), right?

Last edit: John 2019-04-20

- Nickolay V. Shmyrev - 2019-04-20
  
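  The other way around: smaller values mean a wider beam, so 1e-70 is the wider (slower, more accurate) setting and 1e-10 the narrower (faster) one.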
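The two values from the question can be worked through the log-domain form of the beam test. This assumes the common formulation in which a hypothesis h is kept only if its probability is at least beam times that of the best hypothesis. Natural logs are used here for readability and are not necessarily the decoder's internal units. The arithmetic shows why 1e-70 is the wider setting: it lets a hypothesis fall much further behind the best one before it is pruned.

```latex
\begin{aligned}
\text{keep } h \iff \ln P(h) &\ge \ln P_{\text{best}} + \ln(\text{beam}) \\
\ln\!\left(10^{-10}\right) &= -10 \ln 10 \approx -23.0 \\
\ln\!\left(10^{-70}\right) &= -70 \ln 10 \approx -161.2
\end{aligned}
```

With a beam of 1e-70, a hypothesis may trail the best one by about 161 nats and still be extended. With 1e-10 the limit is about 23 nats. The wider beam keeps more hypotheses and costs more computation, but it is less likely to prune the correct path.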
  
