CMU Sphinx / Forums / Help: Help with Pocketsphinx - KWS_THRESHOLD

Troy McComas - 2015-05-08

Good day!

I am writing a PHP wrapper around pocketsphinx. The goal is to process WAV files (podcasts and public speeches) and "auto-tag" each processed audio file against a pre-compiled keyword list. The results would be weighted, and tags would then be created from the detected words with the highest scores.

Moving on, I am having a little difficulty wrapping my mind around the KWS_THRESHOLD argument for pocketsphinx. Can I use decimal values (0.95), or is e notation (1e-10) required? If I use a threshold value of 1e-10, it seems fairly accurate (a handful of false positives), but if I raise it to 1e-30, the number of false positives increases significantly.

I am using the following cli call:

pocketsphinx_continuous -infile some.wav -dict /usr/local/share/pocketsphinx/model/en-us/cmudict-en-us.dict -kws keywords.txt -kws_threshold 0.95
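
A wrapper would typically assemble this call as an argument list rather than a single string. Below is a minimal sketch in Python (the wrapper in question is PHP, but the idea carries over). `build_kws_command` and `run_kws` are hypothetical helpers, and only the flags from the call above are used; the threshold is shown in e notation as an assumption about what the tool expects.

```python
import subprocess

def build_kws_command(wav_path, dict_path, kws_path, threshold):
    """Return the argument list for a keyword-spotting run (hypothetical helper)."""
    return [
        "pocketsphinx_continuous",
        "-infile", wav_path,
        "-dict", dict_path,
        "-kws", kws_path,
        # "g" formatting keeps e notation for very small values, e.g. 1e-10
        "-kws_threshold", format(threshold, "g"),
    ]

def run_kws(cmd):
    """Run the command and return its standard output as text."""
    return subprocess.run(cmd, capture_output=True, text=True, check=True).stdout

cmd = build_kws_command(
    "some.wav",
    "/usr/local/share/pocketsphinx/model/en-us/cmudict-en-us.dict",
    "keywords.txt",
    1e-10,
)
print(" ".join(cmd))
```

Passing a list to `subprocess.run` avoids shell quoting problems with file names; `run_kws(cmd)` would only work on a machine where pocketsphinx is installed.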

What are the best practices regarding keyword spotting and assigning thresholds, and can I depend upon the accuracy (obviously not fully) of pocketsphinx to process the audio and recognize keywords accordingly? Is it speaker independent?

Thanks in advance for your help, and please forgive my ignorance on this subject.

bic-user - 2015-05-08

I am writing a wrapper around pocketsphinx with PHP

cool

Can I use decimal values (0.95), or is e notation (1e-10) required?

You should use e notation, because very large or very small values are expected as thresholds.

If I use a threshold value of 1e-10, it seems fairly accurate (a handful of false positives), but if I raise it to 1e-30, the number of false positives increases significantly

That is expected: 1e-10 > 1e-30, so going to 1e-30 lowers the threshold rather than raising it. A bigger threshold drops uncertain occurrences of the keyword, which decreases the number of false alarms.
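
To make the ordering concrete, here is a toy model in Python. It does not reproduce pocketsphinx internals; the detections and their scores are invented, and "keep a detection if its score is at least the threshold" is a simplifying assumption used only to show why the smaller number lets more through.

```python
# Toy model: each detection carries a hypothetical confidence-like score.
# Pocketsphinx scores detections differently; this only illustrates ordering.
detections = [
    ("podcast", 1e-5),    # strong match
    ("podcast", 1e-18),   # uncertain match
    ("speech", 1e-25),    # very uncertain match
]

def keep(detections, threshold):
    """Keep detections whose score is at least the threshold."""
    return [d for d in detections if d[1] >= threshold]

strict = float("1e-10")   # e notation parses to an ordinary float
loose = float("1e-30")

print(strict > loose)                 # True
print(len(keep(detections, strict)))  # 1
print(len(keep(detections, loose)))   # 3
```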

What are the best practices regarding keyword spotting and assigning threshold, and can I depend upon the accuracy (obviously not fully) of pocketsphinx to process the audio and recognize keywords accordingly?

Collect some audio data containing your keyword and select the best threshold for it (the optimal position on the ROC curve). The best value depends on the length of the keyword and on the mismatch between the acoustic model and the provided audio. Accurate keyword spotting certainly requires accurate acoustic modeling.
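
One way to pick a point on the ROC curve is to run keyword spotting on labeled test audio at several thresholds and choose the one that maximizes the true positive rate minus the false positive rate (Youden's J statistic). That criterion is one common choice, not something the reply above prescribes, and every number in this Python sketch is made up for illustration.

```python
# Hypothetical results of running keyword spotting on labeled test audio
# at several thresholds: threshold -> (true detections, false alarms).
results = {
    1e-10: (6, 1),
    1e-20: (8, 2),
    1e-30: (9, 7),
    1e-40: (10, 15),
}
TOTAL_KEYWORDS = 10       # times the keyword is actually spoken (assumed)
TOTAL_NON_KEYWORDS = 20   # segments that could trigger a false alarm (assumed)

def youden_j(hits, false_alarms):
    """True positive rate minus false positive rate for one threshold."""
    tpr = hits / TOTAL_KEYWORDS
    fpr = false_alarms / TOTAL_NON_KEYWORDS
    return tpr - fpr

# Pick the threshold whose ROC point maximizes Youden's J.
best = max(results, key=lambda t: youden_j(*results[t]))
print(format(best, "g"))  # 1e-20
```

If false alarms cost more than misses in the tagging use case, a weighted criterion would shift the choice toward a larger threshold.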

Is it speaker independent?

It depends on the acoustic model you're using. The default acoustic model (AM) provided with pocketsphinx is speaker independent.

- Troy McComas - 2015-05-08
  
  Thanks for your assistance!
  
  Any resources to help a layman better understand acoustic models? What would be the ideal way to customize an acoustic model?
  
  One more thing...
  
  I am assuming, by your reply, that I am able to assign a threshold to individual keywords? How is this achieved within the keyword text file?
  
  Last edit: Troy McComas 2015-05-08
  
  - bic-user - 2015-05-08
    
    Any resources to help a layman better understand acoustic models? What would be the ideal way to customize an acoustic model?
    
    http://cmusphinx.sourceforge.net/wiki/tutorialadapt
    
    I am assuming, by your reply, that I am able to assign a threshold to individual keywords? How is this achieved within the keyword text file?
    
    http://stackoverflow.com/questions/25748113/recognizing-multiple-keywords-using-pocketsphinx
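
For per-keyword thresholds, the keyword list format commonly documented for pocketsphinx puts one phrase per line, followed by that phrase's threshold between slashes. The Python sketch below generates such lines; the phrases and threshold values are placeholders that would need tuning, and `format_kws_lines` and `write_kws_file` are hypothetical helper names.

```python
# Placeholder phrases and thresholds; tune each threshold on your own audio.
keywords = {
    "open source": 1e-20,
    "speech recognition": 1e-30,
    "podcast": 1e-10,
}

def format_kws_lines(keywords):
    """One line per phrase: the phrase, then its threshold between slashes."""
    return [f"{phrase} /{threshold:g}/" for phrase, threshold in keywords.items()]

def write_kws_file(path, keywords):
    """Write the keyword list to a file that can be passed to -kws."""
    with open(path, "w") as fh:
        fh.write("\n".join(format_kws_lines(keywords)) + "\n")

lines = format_kws_lines(keywords)
print("\n".join(lines))
```

Calling `write_kws_file("keywords.txt", keywords)` would produce the file that is passed with `-kws keywords.txt`. Longer phrases generally tolerate much smaller thresholds than short ones, which is why each phrase gets its own value.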
    