
Help understanding pocketsphinx args

Help
John
2019-04-19
2019-05-26
  • John

    John - 2019-04-19

    Hi,

    I am having trouble understanding the many arguments you can use with pocketsphinx. They do have small descriptions built into them (https://github.com/cmusphinx/pocketsphinx/blob/master/doc/pocketsphinx_continuous.1), but some of them are not helpful in understanding what they do or how they affect the recognition results.

    Basically, I was hoping to configure it to update as often as possible, even if it uses a lot more resources than it currently does during recognition (right now it uses about 1-3% of my CPU and does a pretty good job). How can I crank it up to warp speed so that it analyzes many more times each second? As many people seem to do, I am using the phoneme bin files to get the recognized phonemes instead of full words.

    The arguments that look like they would help the most are:
    -frate
    -samprate
    -beam
    -pbeam

    The beam ones have a description like this:

    Beam width applied to every frame in Viterbi search (smaller values mean wider beam)
    

    And I am not sure what that means. I read the Wikipedia page for Viterbi, and though it is interesting, it doesn't mention beams or go into the technical side. Does a wider beam mean more results quicker but less accurate? Also, they accept a value like 1e-30 or something like that, so it's even more difficult to understand what is actually going on.

    Are there more resources regarding these arguments or am I doing it right by simply slowly experimenting with trial and error to find what works best?

     
    • Nickolay V. Shmyrev

      -frate

      Frame rate is the rate of analysis frames

      -samprate

      This is the rate of the samples in source audio
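
To make the relationship between the two rates concrete, here is a small sketch in Python. It is only arithmetic, not pocketsphinx code, and the function names are made up for illustration. The values 16000 Hz and 100 frames per second are assumed here as the usual defaults; check what your acoustic model expects before changing either.

```python
def samples_per_frame_shift(samprate_hz: int, frate: int) -> int:
    """Number of audio samples the analysis window advances per frame."""
    return samprate_hz // frate


def frame_period_ms(frate: int) -> float:
    """Time between two consecutive analysis frames, in milliseconds."""
    return 1000.0 / frate


# Assumed defaults: 16 kHz audio, 100 analysis frames per second.
default_shift = samples_per_frame_shift(16000, 100)   # 160 samples
default_period = frame_period_ms(100)                 # 10.0 ms

# Doubling -frate halves the shift: twice as many frames per second.
fast_shift = samples_per_frame_shift(16000, 200)      # 80 samples
fast_period = frame_period_ms(200)                    # 5.0 ms

print(default_shift, default_period, fast_shift, fast_period)
```

A higher -frate does produce more frames per second, but acoustic models are normally trained at one specific frame rate, so moving away from it is likely to reduce accuracy rather than improve responsiveness.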

      -beam -pbeam

      Wikipedia is a very bad source for almost everything unfortunately. Beams are a core thing in Viterbi search. You can read about beams here:

      https://web.stanford.edu/class/cs224s/lectures/224s.17.lec4.pdf

      There is also a quite detailed description of sphinx algorithms here:

      http://www.cs.cmu.edu/~rkm/th/th.pdf

       
  • John

    John - 2019-04-20

    Ok, I think I am understanding them a little more now. So for the beams, a value like 1e-10 (wider) would be aiming for more accuracy, while something like 1e-70 would be more performant (quick), right?

     

    Last edit: John 2019-04-20
    • Nickolay V. Shmyrev

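      The other way around: smaller values mean a wider beam, so 1e-70 keeps more hypotheses (more accurate, slower) and 1e-10 keeps fewer (faster, less accurate).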
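
A toy illustration of why a smaller beam value is the wider beam. This is a minimal sketch of beam pruning in the log domain, not pocketsphinx's actual implementation; the `prune` function and the scores are hypothetical. The beam is treated as a probability ratio relative to the best hypothesis in the current frame, and anything scoring below `best * beam` is dropped.

```python
import math


def prune(log_scores: dict, beam: float) -> dict:
    """Keep hypotheses whose probability is at least `beam` times the best one."""
    best = max(log_scores.values())
    # beam < 1, so log(beam) is negative: a smaller beam lowers the threshold.
    threshold = best + math.log(beam)
    return {hyp: s for hyp, s in log_scores.items() if s >= threshold}


# Hypothetical natural-log scores for four phone hypotheses in one frame.
scores = {"AH": -10.0, "AA": -25.0, "EH": -60.0, "IY": -200.0}

narrow = prune(scores, 1e-10)  # threshold is about -33: keeps AH and AA
wide = prune(scores, 1e-70)    # threshold is about -171: keeps AH, AA and EH

print(sorted(narrow), sorted(wide))
```

With 1e-10 only two hypotheses survive, so the search is cheap but may discard the path that would have turned out to be correct. With 1e-70 three survive, which costs more work per frame but is less likely to prune the right answer.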

       
