
Threshold not doing anything?

Help
2017-08-27
  • Jelmer Feenstra

    Jelmer Feenstra - 2017-08-27

    I've been testing PocketSphinx (pocketsphinx_continuous on Linux) to see if it could be used (in keyword spotting mode) to detect a small subset of simple words. Right now I'm trying to detect some Dutch words, which works quite well. However, one of them is 'aap' (monkey) and it often doesn't get recognized. The .dict I use has 'aap a p' and there's also an entry for it in the .list file I use.

    However, no matter whether the entry in the .list file is 'aap /1/', 'aap /1e-100/' or 'aap /1000/', the detection doesn't get any better or worse. I understand that detecting a keyword should be easier with more complex words (more to go on), but still: how come the threshold doesn't seem to have any effect? Thanks for any insight into this matter!

     
    • Nickolay V. Shmyrev

      To get help with this issue, you could provide the audio file you are using for tests and the command line you are running.

       
  • Jelmer Feenstra

    Jelmer Feenstra - 2017-08-27

    Thanks for getting back to me (and quickly, wow). The command I'm using is this:

    pocketsphinx_continuous -hmm cmusphinx-nl-5.2/model_parameters/voxforge_nl_sphinx.cd_cont_2000 -lm cmusphinx-nl-5.2/etc/voxforge_nl_sphinx.lm.bin -inmic yes -kws keyphrase.list -dict keyphrase.dict

    I'm simply trying (through my laptop's mic) to get PocketSphinx to reliably detect a few words, pronounced by various native Dutch speakers (my girlfriend, my daughter). I don't actually need the pronunciation to be perfect, which is why I'm trying to lower the threshold a lot so that it detects the words in almost all cases. But configuring the threshold doesn't seem to have any effect at all...

     
    • Nickolay V. Shmyrev

      You need to run experiments with an audio file first, not with a microphone. That enables us to reproduce your problem and help you.

       
  • Jelmer Feenstra

    Jelmer Feenstra - 2017-08-27

    Ok, so I tried with a different word: 'slang' (Dutch word for snake). I pronounced it three times in the attached sound file ('slang', 'sang' and 'lang'). Right now running the command:

    pocketsphinx_continuous -hmm cmusphinx-nl-5.2/model_parameters/voxforge_nl_sphinx.cd_cont_2000 -lm cmusphinx-nl-5.2/etc/voxforge_nl_sphinx.lm.bin -infile slang.wav -kws keyphrase.list -dict keyphrase.dict

    ...seems to detect the first two (?) pronunciations, and I'm not able to get different results when I significantly modify the threshold for 'slang' in keyphrase.list (e.g. "slang /1e-100/" vs "slang /1/"). Even removing the line completely from the .list file doesn't change my results, which leads me to believe that the .list file might not even be interpreted...?! (It does read the correct file though, I checked.)

    Any help much appreciated!

    PS. My .dict file contains 'slang s l aa nn'.

     

    Last edit: Jelmer Feenstra 2017-08-27
    • Nickolay V. Shmyrev

      With a dictionary test.dict

      slang s l aa nn g
      sang s aa nn g
      lang l aa nn g
      

      and kws file test.kws

      slang /1e-10/
      sang /1e-20/
      lang /1e-20/
      

      and command line (note that -lm conflicts with kws)

       pocketsphinx_continuous -infile slang.wav -hmm cmusphinx-nl-5.2/model_parameters/voxforge_nl_sphinx.cd_cont_2000 -dict test.dict -kws test.kws
      

      result is

      INFO: cmn_live.c(120): Update from < 40.00  3.00 -1.00  0.00  0.00  0.00  0.00  0.00  0.00  0.00  0.00  0.00  0.00 >
      INFO: cmn_live.c(138): Update to   < 40.65  0.07  1.05 -0.60  4.93  3.61 -3.41 -0.73 -2.40  2.78 -6.97 -6.15  3.05 >
      INFO: kws_search.c(656): kws 0.10 CPU 0.062 xRT
      INFO: kws_search.c(658): kws 0.10 wall 0.065 xRT
      slang 
      INFO: cmn_live.c(120): Update from < 40.65  0.07  1.05 -0.60  4.93  3.61 -3.41 -0.73 -2.40  2.78 -6.97 -6.15  3.05 >
      INFO: cmn_live.c(138): Update to   < 42.87 -0.33 -0.22  1.26  3.03  4.21 -1.32 -1.56 -0.77  2.43 -9.25 -7.16  3.04 >
      INFO: kws_search.c(656): kws 0.06 CPU 0.061 xRT
      INFO: kws_search.c(658): kws 0.06 wall 0.061 xRT
      sang 
      INFO: cmn_live.c(120): Update from < 42.87 -0.33 -0.22  1.26  3.03  4.21 -1.32 -1.56 -0.77  2.43 -9.25 -7.16  3.04 >
      INFO: cmn_live.c(138): Update to   < 42.65  4.88  0.22  1.08  2.19  7.37  1.28 -2.88 -3.12  1.72 -7.18 -5.57  2.69 >
      INFO: kws_search.c(656): kws 0.06 CPU 0.070 xRT
      INFO: kws_search.c(658): kws 0.06 wall 0.077 xRT
      lang 
      INFO: cmn_live.c(120): Update from < 42.65  4.88  0.22  1.08  2.19  7.37  1.28 -2.88 -3.12  1.72 -7.18 -5.57  2.69 >
      INFO: cmn_live.c(138): Update to   < 42.65  4.88  0.22  1.08  2.19  7.37  1.28 -2.88 -3.12  1.72 -7.18 -5.57  2.69 >
      INFO: kws_search.c(448): TOTAL kws 0.22 CPU 0.065 xRT
      INFO: kws_search.c(451): TOTAL kws 0.23 wall 0.069 xRT
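
The setup above can be sanity-checked before running the decoder. The following is a minimal sketch in Python, not part of PocketSphinx: `parse_kws_list` and `conflicting_search_flags` are hypothetical helper names, and the parser assumes every line carries an explicit `/threshold/` as in test.kws. It reads the keyword list and flags a command line that combines -kws with -lm, the conflict noted above.

```python
import re

# Hypothetical helpers for illustration; PocketSphinx does not provide them.
# Assumed line format, taken from test.kws above: "<phrase> /<threshold>/".
KWS_LINE = re.compile(r"^(?P<phrase>.+?)\s*/(?P<threshold>[^/]+)/\s*$")


def parse_kws_list(text):
    """Map each keyphrase to its threshold as a float."""
    entries = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue  # skip blank lines
        match = KWS_LINE.match(line)
        if match is None:
            # This sketch insists on an explicit threshold on every line.
            raise ValueError(f"no /threshold/ found in line: {line!r}")
        entries[match.group("phrase")] = float(match.group("threshold"))
    return entries


def conflicting_search_flags(argv):
    """Return ["-kws", "-lm"] if both appear in argv, else an empty list."""
    if "-kws" in argv and "-lm" in argv:
        return ["-kws", "-lm"]
    return []
```

For example, calling `conflicting_search_flags` on the argument list of the original command (with both -lm and -kws) returns both flags, while the corrected command returns an empty list.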
      
       
      • Jelmer Feenstra

        Jelmer Feenstra - 2017-08-27

        Thanks for your response, not using the -lm option fixed things for me with regard to thresholds!

        My intended use case is perhaps somewhat strange: I want to be very forgiving in recognizing just a handful of words, so that a kid could mispronounce the (Dutch) word 'slang' and still have the word correctly recognized. I can allow for this to some extent by adding various extra pronunciations for the word to my .dict file (combined with low thresholds), but when I'm trying to recognize rather short words it often results in several of them being recognized. For example, the Dutch word for monkey is 'aap', which is often recognized as part of other words that contain the 'a' vowel (in that case I could just pick the longest word... I guess).

        Are there better ways of getting to know which word was presumably intended? I don't think there's such a thing as word boundaries in PocketSphinx (or speech recognition for that matter), right? Maybe recognized length combined with some form of confidence?

        Using more complex words is a 'solution' to this problem, but I'm wondering whether there are ways to make this work a little better with short / simple words as well.

        Thanks again.

         
        • Nickolay V. Shmyrev

          Well, children's speech recognition requires a specialized acoustic model anyway; it will not work accurately out of the box. And it will never work reliably with short words like "aap" unless you modify the algorithm to detect silence around the word. The detected word length is available on the command line with -time yes and through the ps_seg API.
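
The "pick the longest word" idea from earlier in the thread could be sketched as follows, once start and end times are available (for instance from -time yes or the ps_seg API). This is a rough Python illustration under stated assumptions: the `Detection` tuple and `keep_longest` function are hypothetical, and the input is assumed to have already been converted to (word, start, end) in seconds; it does not parse actual PocketSphinx output.

```python
from typing import List, NamedTuple


class Detection(NamedTuple):
    # Hypothetical record; times are assumed to be in seconds.
    word: str
    start: float
    end: float


def keep_longest(detections: List[Detection]) -> List[Detection]:
    """Among detections that overlap in time, keep only the longest one."""
    kept: List[Detection] = []
    # Visit longer detections first so they win over shorter overlapping ones.
    for det in sorted(detections, key=lambda d: d.end - d.start, reverse=True):
        overlaps = any(det.start < k.end and k.start < det.end for k in kept)
        if not overlaps:
            kept.append(det)
    # Return the survivors in chronological order.
    return sorted(kept, key=lambda d: d.start)
```

With this heuristic, an 'aap' detected inside the time span of a longer 'slang' is dropped, while an 'aap' spoken on its own is kept. It does not address the silence detection mentioned above, so short isolated words can still produce false alarms.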

           
