
Help with Pocketsphinx - KWS_THRESHOLD

Help
2015-05-08
  • Troy McComas

    Troy McComas - 2015-05-08

    Good day!

    I am writing a wrapper around pocketsphinx with PHP. The intended goal is to process WAV files (podcasts and public speech) and "auto-tag" each processed audio file according to a pre-compiled keyword list. The results would be weighted, and tags would then be created from the resulting words with the highest scores.
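The tagging step described above could start as simply as counting how often each keyword is detected in a file. A minimal sketch, assuming the detections have already been parsed out of pocketsphinx's output into a list of strings (that parsing, and the names `top_tags` and `hits`, are hypothetical); raw counts are only one possible weighting.

```python
from collections import Counter


def top_tags(detections, n=3):
    """Return the n most frequently detected keywords as tags.

    `detections` is assumed to be a list of keyword strings already
    extracted from the recognizer's output, one entry per detection.
    """
    # Normalize case and whitespace, skipping empty entries
    counts = Counter(d.strip().lower() for d in detections if d.strip())
    # most_common sorts by count, descending; ties keep first-seen order
    return [word for word, _ in counts.most_common(n)]


# Hypothetical detections for one podcast episode
hits = ["podcast", "linux", "podcast", "php", "linux", "podcast"]
print(top_tags(hits, n=2))  # ['podcast', 'linux']
```

A fancier weighting (for example, normalizing by episode length) would slot into the same place as the `Counter`.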

    Moving on, I am having a little difficulty wrapping my mind around the KWS_THRESHOLD argument for pocketsphinx. Can I use decimal values (0.95), or is e-notation (1e-10) required? If I use a threshold value of 1e-10, it seems fairly accurate (a handful of false positives), but if I raise it to 1e-30, the number of false positives increases significantly.

    I am using the following CLI call:

    pocketsphinx_continuous -infile some.wav -dict /usr/local/share/pocketsphinx/model/en-us/cmudict-en-us.dict -kws keywords.txt -kws_threshold 0.95
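A wrapper will typically build this same call programmatically. A minimal sketch in Python (standing in for the PHP wrapper), using only the flags from the command above and writing the threshold in e-notation; `kws_command` is a hypothetical helper, and the value 1e-20 is just an example.

```python
import shlex


def kws_command(wav, dict_path, kws_file, threshold):
    """Build the argument list for pocketsphinx_continuous in keyword-spotting mode."""
    return [
        "pocketsphinx_continuous",
        "-infile", wav,
        "-dict", dict_path,
        "-kws", kws_file,
        # 'g' formatting switches to e-notation below 1e-4, e.g. 1e-20 -> "1e-20"
        "-kws_threshold", format(threshold, "g"),
    ]


cmd = kws_command(
    "some.wav",
    "/usr/local/share/pocketsphinx/model/en-us/cmudict-en-us.dict",
    "keywords.txt",
    1e-20,
)
# shlex.join (Python 3.8+) renders the list as a shell-safe command line
print(shlex.join(cmd))
```

The list can be passed directly to `subprocess.run(cmd, capture_output=True, text=True)`; passing a list rather than one string avoids shell-quoting problems with file names.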

    What are the best practices regarding keyword spotting and assigning threshold, and can I depend upon the accuracy (obviously not fully) of pocketsphinx to process the audio and recognize keywords accordingly? Is it speaker independent?

    Thanks in advance for your help, and please forgive my ignorance on this subject.

     
  • bic-user

    bic-user - 2015-05-08

    I am writing a wrapper around pocketsphinx with PHP

    cool

    Can I use decimal values (0.95), or is e-notation (1e-10) required?

    You should use e-notation, because very large or very small values are expected for the threshold.

    If I use a threshold value of 1e-10, it seems fairly accurate (a handful of false positives), but if I raise it to 1e-30, the number of false positives increases significantly

    That is expected: 1e-10 > 1e-30, so going from 1e-10 to 1e-30 actually lowers the threshold. A bigger threshold drops uncertain occurrences of the keyword, decreasing the number of false alarms.
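To see the direction of the effect, here is a toy model in which each candidate detection carries a score and the threshold simply filters them. The scores are invented and `keep_detections` is a hypothetical helper; pocketsphinx computes and compares its scores internally, so this only illustrates why moving from 1e-10 to 1e-30 lets more (and less certain) detections through.

```python
def keep_detections(scores, threshold):
    """Keep only candidate detections whose score is at least `threshold`."""
    return [s for s in scores if s >= threshold]


# Hypothetical detection scores, not real pocketsphinx output
scores = [1e-5, 1e-12, 1e-25, 1e-35]

print(1e-10 > 1e-30)                        # True
print(len(keep_detections(scores, 1e-10)))  # 1
print(len(keep_detections(scores, 1e-30)))  # 3
```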

    What are the best practices regarding keyword spotting and assigning threshold, and can I depend upon the accuracy (obviously not fully) of pocketsphinx to process the audio and recognize keywords accordingly?

    Collect some audio data containing your keyword and select the best threshold for it (the optimal operating point on the ROC curve). The best value depends on the length of the keyword and on the mismatch between the acoustic model and the provided audio. Accurate keyword spotting certainly requires accurate acoustic modeling.
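One simple way to choose an operating point from such test data is to sweep a range of thresholds and, among those whose false-alarm count stays within a budget you pick, take the one with the most correct detections. This is a sketch with hypothetical, hand-labeled data: in practice you would likely re-run the decoder at each candidate threshold and count hits and false alarms against a transcript, and the made-up per-detection scores here stand in for that. `sweep` is a hypothetical helper.

```python
def sweep(trials, candidates, max_false_alarms):
    """Pick the threshold with the most hits while false alarms stay within budget.

    trials: (score, is_true_keyword) pairs from hand-labeled test audio.
    candidates: thresholds to try, from strict (large) to lenient (small).
    Returns (threshold, hits, false_alarms), or None if no candidate fits.
    """
    best = None
    for t in candidates:
        hits = sum(1 for s, ok in trials if s >= t and ok)
        false_alarms = sum(1 for s, ok in trials if s >= t and not ok)
        if false_alarms <= max_false_alarms and (best is None or hits > best[1]):
            best = (t, hits, false_alarms)
    return best


# Hypothetical labeled detections: True = the keyword was really spoken
trials = [(1e-8, True), (1e-13, True), (1e-23, True),
          (1e-17, False), (1e-28, False), (1e-36, False)]
candidates = [1e-5, 1e-10, 1e-15, 1e-20, 1e-25, 1e-30, 1e-40]

print(sweep(trials, candidates, max_false_alarms=1))  # (1e-25, 3, 1)
```

Plotting hits against false alarms for every candidate traces the ROC-style curve; the budget rule is just one way to pick a point on it.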

    Is it speaker independent?

    It depends on the acoustic model you're using. The default acoustic model (AM) provided with pocketsphinx is speaker independent.

     
    • Troy McComas

      Troy McComas - 2015-05-08

      Thanks for your assistance!

      Any resources to help a layman better understand acoustic models? What would be the ideal way to customize an acoustic model?

      One more thing...

      Am I right in assuming, from your reply, that I can assign a threshold to individual keywords? If so, how is this done in the keyword text file?
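For reference, the CMUSphinx keyword-spotting tutorial describes a keyword-list file with one keyphrase per line, each followed by its own threshold between slashes (for example `open source /1e-20/`). A minimal sketch that generates such a file, assuming that format; `write_kws_file` is a hypothetical helper and the threshold values are placeholders to be tuned on your own audio.

```python
def write_kws_file(path, thresholds):
    """Write one keyphrase per line, each followed by its own threshold in slashes."""
    with open(path, "w", encoding="utf-8") as f:
        for phrase, threshold in thresholds.items():
            # e.g. "open source /1e-20/"
            f.write(f"{phrase} /{threshold:g}/\n")


# Placeholder thresholds, one per keyphrase
write_kws_file("keywords.txt", {"linux": 1e-10, "open source": 1e-20, "php": 1e-30})
with open("keywords.txt", encoding="utf-8") as f:
    print(f.read())
```

The resulting file is what `-kws keywords.txt` in the command above would read.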

       

      Last edit: Troy McComas 2015-05-08
    • Troy McComas

      Troy McComas - 2015-05-08
       

      Last edit: Troy McComas 2015-05-08
