Single keyword detection with reasonable accuracy (pocketsphinx)

Help
2016-01-12
2016-01-23
  • Andy Barker

    Andy Barker - 2016-01-12

    Hi, I am new to pocketsphinx, I have been playing with pocketsphinx_continuous and various params with limited success.

    I simply want to listen continuously for a single activation keyword (which will then trigger something else that is not voice related). When I configure this in a kws file I just get loads of false activations. If I am honest, I am a bit lost in all the options, but willing to learn!

    I don't mind what the activation keyword is.

    Could someone please point me in the right direction to get no false activations and a higher detection rate on my one keyword?

     
    • Nickolay V. Shmyrev

      Hello Andy

      This issue is covered in our FAQ:

      http://cmusphinx.sourceforge.net/wiki/faq#qhow_to_implement_hot_word_listening

      Please let us know what you tried and what the results were. For debugging accuracy issues, you'd better provide the audio recording you are working with.

       
  • Andy Barker

    Andy Barker - 2016-01-12

    Thanks Nikolay.

    Some success.... I got a detection every now and then with this and no false positives.

    pocketsphinx_continuous -keyphrase "okay computer" -kws_threshold 1e-30 -inmic yes

    Can I improve the detections by providing an .lm and .dic file for the same keyword or by training my voice on this single keyword?

     

    Last edit: Andy Barker 2016-01-12
    • Nickolay V. Shmyrev

      Can I improve the detections by providing an .lm and .dic file for the same keyword or by training my voice on this single keyword?

      No, language models are for a different purpose. Voice adaptation as described in our tutorial might be a reasonable idea, but it is probably better to choose a more distinctive keyphrase.

       
  • Andy Barker

    Andy Barker - 2016-01-12

    When I look in the dic file it has broken down the word samantha as...

    SAMANTHA S AH M AE N TH AH

    Does pocketsphinx not need the same to help identify my keyword?

    (I was thinking my keyword would be just samantha!)

     

    Last edit: Andy Barker 2016-01-12
    • Nickolay V. Shmyrev

      The dictionary file is loaded by default, so you do not need another dictionary. If a special keyword is missing from the dictionary, you need to add it there.
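
As a rough illustration of the point above, the following sketch checks whether every word of a keyphrase has an entry in a CMU-style pronunciation dictionary. The helper names (`parse_dict`, `missing_words`) and the sample dictionary text are made up for illustration; only the SAMANTHA line comes from this thread. The assumed line format is a word followed by its phones, and the real cmudict-en-us.dict may differ in details such as letter case.

```python
def parse_dict(text):
    """Parse CMU-style dictionary text into {word: [phones]}.

    Assumes one entry per line: the word, then its phones, separated by
    whitespace. Alternate pronunciations such as "word(2)" are folded
    into the base word, keeping the first pronunciation seen.
    """
    entries = {}
    for line in text.splitlines():
        parts = line.split()
        if len(parts) < 2:
            continue  # skip blank or malformed lines
        word = parts[0].lower()
        if word.endswith(")") and "(" in word:
            word = word[: word.index("(")]
        entries.setdefault(word, parts[1:])
    return entries


def missing_words(keyphrase, entries):
    """Return the keyphrase words that have no dictionary entry."""
    return [w for w in keyphrase.lower().split() if w not in entries]


# Hypothetical sample; only the SAMANTHA line comes from the thread.
SAMPLE_DICT = """\
SAMANTHA S AH M AE N TH AH
WAKE W EY K
UP AH P
"""

entries = parse_dict(SAMPLE_DICT)
print(missing_words("wake up samantha", entries))  # nothing missing
print(missing_words("okay computer", entries))     # both words missing from the sample
```

Any word reported as missing would have to be added to the dictionary before it can be used in a keyphrase.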

       
      • Andy Barker

        Andy Barker - 2016-01-12

        Ah, that makes sense now.

        With some output like the following, is it the case pocketsphinx_continuous is only listening when it says "Listening", or is it always listening?

        READY....
        Listening...
        INFO: cmn_prior.c(131): cmn_prior_update: from < 40.00  3.00 -1.00  0.00  0.0                 0.00  0.00  0.00  0.00  0.00  0.00  0.00  0.00 >
        INFO: cmn_prior.c(149): cmn_prior_update: to   < 40.67  2.80  7.69  4.53 -2.5                 9.82 -7.80 -6.28 -6.86 -3.77  3.06  1.49  4.54 >
        READY....
        Listening...
        INFO: cmn_prior.c(99): cmn_prior_update: from < 40.67  2.80  7.69  4.53 -2.55                 .82 -7.80 -6.28 -6.86 -3.77  3.06  1.49  4.54 >
        INFO: cmn_prior.c(116): cmn_prior_update: to   < 40.62  3.67  5.81  3.63 -3.6                 9.44 -6.93 -7.65 -7.46 -4.48  3.70  1.72  4.02 >
        INFO: cmn_prior.c(99): cmn_prior_update: from < 40.62  3.67  5.81  3.63 -3.62                 .44 -6.93 -7.65 -7.46 -4.48  3.70  1.72  4.02 >
        INFO: cmn_prior.c(116): cmn_prior_update: to   < 41.72  3.79  6.42  6.79 -2.6                 8.27 -6.43 -7.84 -6.79 -4.62  2.78  2.67  4.46 >
        INFO: cmn_prior.c(131): cmn_prior_update: from < 41.72  3.79  6.42  6.79 -2.6                 8.27 -6.43 -7.84 -6.79 -4.62  2.78  2.67  4.46 >
        INFO: cmn_prior.c(149): cmn_prior_update: to   < 40.26  3.04  6.17  6.64 -2.9                 7.84 -6.08 -8.09 -6.64 -4.24  2.70  2.75  4.27 >
        wake up computer
        READY....
        
         
  • Andy Barker

    Andy Barker - 2016-01-12

    I changed my recording levels and can now see that pocketsphinx_continuous is only ready when it says READY.... As soon as I say something it changes to Listening..., which is slightly misleading I guess, as this is really processing!?

    Anyway, for whatever reason, I can get it to detect "oh mighty computer" much better than "wake up samantha", which it never detects even though it is in the default model/en-us/cmudict-en-us.dict dictionary. So maybe it would be useful to do voice adaptation on "wake up samantha". Time to read some more....

     

    Last edit: Andy Barker 2016-01-12
    • Nickolay V. Shmyrev

      You'd better provide a recording; maybe you pronounce samantha somewhat differently.

       
  • Andy Barker

    Andy Barker - 2016-01-13

    Funny, it was the way I was saying wake up, as if it was one word "wakeup", and it obviously didn't like it! I have now tried using just the single word samantha and am getting much better results with 1e-60.

    pocketsphinx_continuous -keyphrase "samantha" -kws_threshold 1e-60 -adcdev hw:1,0 -inmic yes -samprate 16000/8000/48000
    

    Nikolay, with your help I have now found several other people asking similar things to me, and each time I see your replies - can't thank you enough for your dedication!

    If you have time, could you explain the meaning of kws_threshold, and what the difference would be between 1e-10 and 1e-60. I am just randomly trying values without knowing why and I can't find any good doc on this option.

    The other thing I am playing with is mic input volume. Is there any recommended recording level that pocketsphinx works better at? Again, I am randomly moving the level up and down, which is not very scientific. I think it prefers 16 kHz and 48 kHz samples.

     

    Last edit: Andy Barker 2016-01-13
  • Nickolay V. Shmyrev

    If you have time, could you explain the meaning of kws_threshold, and what the difference would be between 1e-10 and 1e-60. I am just randomly trying values without knowing why and I can't find any good doc on this option.

    It is just a threshold. The smaller the threshold, the more false alarms you get. The higher the threshold, the fewer false alarms you get, but you also start to miss real matches.

    You need to record a sample and try with different thresholds. Then compare matches and false alarms and find the best threshold which gives all matches but not any false alarms.
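
The tuning procedure described above could be scored along these lines. This is only a sketch: the function name `score_run`, the matching tolerance, and all of the times and detection lists are hypothetical, standing in for what you would collect by replaying one recording at several thresholds.

```python
def score_run(detections, truths, tolerance=0.5):
    """Count hits, misses and false alarms for one decoder run.

    detections and truths are lists of times in seconds. A detection
    within `tolerance` seconds of an unmatched true keyword is a hit;
    any other detection is a false alarm.
    """
    unmatched = list(truths)
    hits = false_alarms = 0
    for t in detections:
        match = next((u for u in unmatched if abs(u - t) <= tolerance), None)
        if match is None:
            false_alarms += 1
        else:
            unmatched.remove(match)
            hits += 1
    return hits, len(unmatched), false_alarms


# Hypothetical data: where the keyword was really spoken, and what the
# decoder reported when the same recording was replayed at each
# threshold. None of these numbers come from a real run.
truths = [2.0, 9.5, 17.0]
runs = {
    1e-10: [2.1],
    1e-30: [2.1, 9.4],
    1e-40: [2.1, 9.4, 17.2],
    1e-60: [2.1, 5.0, 9.4, 12.3, 17.2],
}

for threshold, detections in runs.items():
    hits, misses, false_alarms = score_run(detections, truths)
    print(f"{threshold:g}: hits={hits} misses={misses} false_alarms={false_alarms}")

# Thresholds that catch every keyword with no false alarms.
clean = [th for th, det in runs.items() if score_run(det, truths)[1:] == (0, 0)]
print(clean)
```

With made-up data like this, the pattern matches the description: high thresholds miss keywords, very low ones add false alarms, and the useful values sit in between.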

    The other thing I am playing with is mic input volume. Is there any recommended recording level that pocketsphinx works better at? Again, I am randomly moving the level up and down, which is not very scientific. I think it prefers 16 kHz and 48 kHz samples.

    It is hard to say what is going on there, but I suggest you provide recordings or raw dumps, which you can collect with the -rawlogdir option. Input volume should not matter at all. 48 kHz samples require special decoder options; by default it expects 16 kHz samples only. Samples must be mono. An incorrect format can be very harmful.
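
A quick way to catch format problems before debugging accuracy is to inspect the recording's header. The sketch below uses Python's standard `wave` module; the helper names are made up, and the 16-bit sample width is an assumption added here (the post only mentions 16 kHz and mono). It applies to WAV files only, since raw dumps carry no header to inspect.

```python
import os
import tempfile
import wave


def check_format(path, expected_rate=16000):
    """Return a list of format problems for a WAV file (empty if none)."""
    problems = []
    with wave.open(path, "rb") as wav:
        if wav.getnchannels() != 1:
            problems.append(f"expected mono, got {wav.getnchannels()} channels")
        if wav.getframerate() != expected_rate:
            problems.append(f"expected {expected_rate} Hz, got {wav.getframerate()} Hz")
        if wav.getsampwidth() != 2:  # assumption: 16-bit PCM samples
            problems.append(f"expected 16-bit samples, got {8 * wav.getsampwidth()}-bit")
    return problems


def write_silence(path, rate, channels, seconds=0.1):
    """Write a short silent 16-bit WAV file, used here only as test input."""
    with wave.open(path, "wb") as wav:
        wav.setnchannels(channels)
        wav.setsampwidth(2)
        wav.setframerate(rate)
        wav.writeframes(b"\x00\x00" * int(rate * seconds) * channels)


with tempfile.TemporaryDirectory() as tmp:
    good = os.path.join(tmp, "good.wav")
    bad = os.path.join(tmp, "bad.wav")
    write_silence(good, 16000, 1)
    write_silence(bad, 48000, 2)
    print(check_format(good))  # no problems expected
    print(check_format(bad))   # reports the channel count and the sample rate
```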

     

    Last edit: Nickolay V. Shmyrev 2016-01-13
  • Andy Barker

    Andy Barker - 2016-01-13

    Am I correct in thinking then my use of -samprate 16000/8000/48000 is correct?

    So re threshold
    1e-10 = 0.0000000001
    1e-20 = 0.00000000000000000001

    So 1e-20 would get more false alarms than 1e-10?

     

    Last edit: Andy Barker 2016-01-13
    • Nickolay V. Shmyrev

      Yes. We accept everything with a probability above the threshold, and more variants have a probability above 1e-20 than above 1e-10.

       
  • Andy Barker

    Andy Barker - 2016-01-13

    Wow that is amazing to think you are working with variables that go as low as 1e-60!

    Last evening I had a play with Python, but noticed that the Python examples simply take the stream in chunks of 1024 and check for keywords.

    This does not seem very scientific as it would make more sense dependent on volume levels to find natural start and stop to sentences/command sentences/keywords given. I assume this is what pocketsphinx_continuous is doing as well and maybe that is why sometimes there is a missing detection.

    Or am I missing something?

     
    • Nickolay V. Shmyrev

      Wow that is amazing to think you are working with variables that go as low as 1e-60!

      Yes, probabilities of specific events can be very small; that's why they are usually handled in the log domain.
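
A small numeric sketch of why the log domain helps, and of how the thresholds discussed in this thread compare. The per-frame probabilities are hypothetical values chosen only to trigger underflow; nothing here is taken from pocketsphinx itself.

```python
import math

# Thresholds from the thread, converted to natural logs.
for threshold in (1e-10, 1e-20, 1e-60):
    print(f"{threshold:g} -> log = {math.log(threshold):.1f}")

# A lower threshold accepts more: a hypothetical match with
# probability 1e-15 passes 1e-20 but not 1e-10.
p = 1e-15
print(p >= 1e-20, p >= 1e-10)

# Multiplying many small per-frame probabilities underflows to 0.0 in
# double precision, while summing their logs stays representable.
frame_probs = [1e-5] * 100  # hypothetical per-frame values
product = math.prod(frame_probs)
log_sum = sum(math.log(q) for q in frame_probs)
print(product, log_sum)
```

The product is mathematically 1e-500, which a double cannot represent, whereas the sum of logs is an ordinary negative number.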

      This does not seem very scientific as it would make more sense dependent on volume levels to find natural start and stop to sentences/command sentences/keywords given.

      pocketsphinx_continuous collects the data and adapts to the current volume; it takes about 10 seconds into account. On the downside, it needs a few seconds at startup to make an initial estimate.
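
The idea of adapting to the current level over a window of recent data can be illustrated with a simple running-mean normalizer. This is not pocketsphinx's actual implementation, just a sketch of the general technique. The class name, the input values, and the assumed rate of 100 frames per second (so that 1000 frames cover about 10 seconds) are all illustrative assumptions.

```python
from collections import deque


class RunningMeanNormalizer:
    """Subtract the mean of the most recent `window` values from each new value."""

    def __init__(self, window):
        self.recent = deque(maxlen=window)

    def normalize(self, value):
        self.recent.append(value)
        mean = sum(self.recent) / len(self.recent)
        return value - mean


# Assumption: 100 frames per second, so 1000 frames cover about 10 seconds.
norm = RunningMeanNormalizer(window=1000)

quiet = [norm.normalize(40.0) for _ in range(1000)]
# The input level jumps; the output is offset at first, then settles
# back to zero once the window holds only the new level.
loud = [norm.normalize(50.0) for _ in range(1000)]
print(quiet[-1], loud[0], loud[-1])
```

Right after the level changes the normalized value is far from zero, and it only returns to zero once the window has filled with the new level, which mirrors the need for a few seconds of data before the estimate is reliable.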

       
