CMU Sphinx / Forums / Help: Threshold not doing anything?

Jelmer Feenstra - 2017-08-27

I've been testing PocketSphinx (pocketsphinx_continuous on Linux) to see if it could be used (in keyword spotting mode) to detect a small subset of simple words. Right now I'm trying to detect some Dutch words, which works quite well. However, one of them is 'aap' (monkey) and it often doesn't get recognized. The .dict I use has 'aap a p' and there's also an entry for it in the .list file I use.

However, no matter whether the entry in the .list file is 'aap /1/', 'aap /1e-100/' or 'aap /1000/', the detection doesn't get any better or worse. I understand that detecting a keyword should be easier with more complex words (more to go on), but still: how come the threshold doesn't seem to have any effect? Thanks for any insight into this matter!
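When a threshold seems to be ignored, one quick sanity check is to read the keyphrase list back and print what each entry actually specifies. The following is a minimal sketch, not part of PocketSphinx; `parse_kws_list` is a hypothetical helper, and the line shape `<phrase> /<threshold>/` is assumed from the entries quoted above.

```python
import re

# Assumed line shape, taken from the entries quoted above: "<phrase> /<threshold>/".
KWS_LINE = re.compile(r"^(?P<phrase>.+?)\s*/(?P<threshold>[^/]+)/\s*$")


def parse_kws_list(text):
    """Return (phrase, threshold) pairs; threshold is None if a line has no /.../ part."""
    entries = []
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue  # skip blank lines
        match = KWS_LINE.match(line)
        if match:
            entries.append((match.group("phrase"), float(match.group("threshold"))))
        else:
            entries.append((line, None))
    return entries


# Hypothetical file content mirroring the entries tried above.
sample = "aap /1e-100/\nslang /1/\n"
for phrase, threshold in parse_kws_list(sample):
    print(phrase, threshold)
```

This only confirms that the file says what you think it says; it cannot tell whether the decoder is actually using the list.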

- Nickolay V. Shmyrev - 2017-08-27
  
  To get help on this issue, please provide the audio file you are using for tests and the command line you are running.
  

Jelmer Feenstra - 2017-08-27

Thanks for getting back to me (and quickly, wow). The command I'm using is this:

pocketsphinx_continuous -hmm cmusphinx-nl-5.2/model_parameters/voxforge_nl_sphinx.cd_cont_2000 -lm cmusphinx-nl-5.2/etc/voxforge_nl_sphinx.lm.bin -inmic yes -kws keyphrase.list -dict keyphrase.dict

I'm simply trying (through my laptop's mic) to get PocketSphinx to reliably detect a few words, pronounced by various native Dutch speakers (my girlfriend, my daughter). I don't actually need the pronunciation to be perfect, which is why I'm trying to lower the threshold a lot so that it detects the words in almost all cases. But configuring the threshold doesn't seem to have any effect at all...

- Nickolay V. Shmyrev - 2017-08-27
  
  You need to experiment with an audio file first, not with a microphone. That enables us to reproduce your problem and help you.
  

Jelmer Feenstra - 2017-08-27

Ok, so I tried with a different word: 'slang' (Dutch word for snake). I pronounced it three times in the attached sound file ('slang', 'sang' and 'lang'). Right now running the command:

pocketsphinx_continuous -hmm cmusphinx-nl-5.2/model_parameters/voxforge_nl_sphinx.cd_cont_2000 -lm cmusphinx-nl-5.2/etc/voxforge_nl_sphinx.lm.bin -infile slang.wav -kws keyphrase.list -dict keyphrase.dict

...seems to detect the first two (?) pronunciations, and I'm not able to get different results when I significantly modify the threshold for 'slang' in keyphrase.list (e.g. "slang /1e-100/" vs "slang /1/"). Even removing the line completely from the .list file doesn't change my results, which leads me to believe that the .list file might not even be interpreted...?! (It does read the correct file though, I checked.)

Any help much appreciated!

PS. My .dict file contains 'slang s l aa nn'.

Last edit: Jelmer Feenstra 2017-08-27

slang.wav

- Nickolay V. Shmyrev - 2017-08-27
  
  With a dictionary test.dict
  
  slang s l aa nn g
  sang s aa nn g
  lang l aa nn g
  
  and kws file test.kws
  
  slang /1e-10/
  sang /1e-20/
  lang /1e-20/
  
  and the command line (note that -lm conflicts with -kws)
  
  pocketsphinx_continuous -infile slang.wav -hmm cmusphinx-nl-5.2/model_parameters/voxforge_nl_sphinx.cd_cont_2000 -dict test.dict -kws test.kws
  
  result is
  
  INFO: cmn_live.c(120): Update from < 40.00 3.00 -1.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 >
  INFO: cmn_live.c(138): Update to < 40.65 0.07 1.05 -0.60 4.93 3.61 -3.41 -0.73 -2.40 2.78 -6.97 -6.15 3.05 >
  INFO: kws_search.c(656): kws 0.10 CPU 0.062 xRT
  INFO: kws_search.c(658): kws 0.10 wall 0.065 xRT
  slang
  INFO: cmn_live.c(120): Update from < 40.65 0.07 1.05 -0.60 4.93 3.61 -3.41 -0.73 -2.40 2.78 -6.97 -6.15 3.05 >
  INFO: cmn_live.c(138): Update to < 42.87 -0.33 -0.22 1.26 3.03 4.21 -1.32 -1.56 -0.77 2.43 -9.25 -7.16 3.04 >
  INFO: kws_search.c(656): kws 0.06 CPU 0.061 xRT
  INFO: kws_search.c(658): kws 0.06 wall 0.061 xRT
  sang
  INFO: cmn_live.c(120): Update from < 42.87 -0.33 -0.22 1.26 3.03 4.21 -1.32 -1.56 -0.77 2.43 -9.25 -7.16 3.04 >
  INFO: cmn_live.c(138): Update to < 42.65 4.88 0.22 1.08 2.19 7.37 1.28 -2.88 -3.12 1.72 -7.18 -5.57 2.69 >
  INFO: kws_search.c(656): kws 0.06 CPU 0.070 xRT
  INFO: kws_search.c(658): kws 0.06 wall 0.077 xRT
  lang
  INFO: cmn_live.c(120): Update from < 42.65 4.88 0.22 1.08 2.19 7.37 1.28 -2.88 -3.12 1.72 -7.18 -5.57 2.69 >
  INFO: cmn_live.c(138): Update to < 42.65 4.88 0.22 1.08 2.19 7.37 1.28 -2.88 -3.12 1.72 -7.18 -5.57 2.69 >
  INFO: kws_search.c(448): TOTAL kws 0.22 CPU 0.065 xRT
  INFO: kws_search.c(451): TOTAL kws 0.23 wall 0.069 xRT
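The point that -lm conflicts with -kws can be checked before launching the decoder. Below is a minimal sketch, assuming the listed options each select a search mode; only the -lm / -kws conflict is stated in this thread, the other option names are listed from memory, and `check_command` is a hypothetical helper, not part of PocketSphinx.

```python
# Assumption: each of these options selects a search mode. Only the
# -lm / -kws conflict is confirmed in this thread.
SEARCH_MODE_FLAGS = {"-lm", "-jsgf", "-fsg", "-kws", "-keyphrase", "-allphone"}


def search_mode_flags(argv):
    """Return the search-mode options present in argv, in order of appearance."""
    return [arg for arg in argv if arg in SEARCH_MODE_FLAGS]


def check_command(argv):
    """Report 'ok' or list the conflicting search-mode options."""
    modes = search_mode_flags(argv)
    if len(modes) > 1:
        return "conflict: " + " ".join(modes)
    return "ok"


# Shortened stand-ins for the two command lines in this thread
# ("model" and "model.lm.bin" are placeholder paths).
bad = ("pocketsphinx_continuous -hmm model -lm model.lm.bin "
       "-infile slang.wav -kws keyphrase.list -dict keyphrase.dict").split()
good = ("pocketsphinx_continuous -infile slang.wav -hmm model "
        "-dict test.dict -kws test.kws").split()

print(check_command(bad))
print(check_command(good))
```

The check is purely textual: it would also flag an option value that happens to be spelled like one of the flags.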
  
  - Jelmer Feenstra - 2017-08-27
    
    Thanks for your response, not using the -lm option fixed things for me with regard to thresholds!
    
    My intended use case is perhaps somewhat strange: I want to be very forgiving in recognizing just a handful of words, so that a kid could mispronounce the (Dutch) word 'slang' and still have the word correctly recognized. I can allow for this to some extent by adding various extra pronunciations for the word to my .dict file (combined with low thresholds), but when I'm trying to recognize rather short words, this often results in several of them being recognized at once. For example, the Dutch word for monkey is 'aap', which is often recognized as part of other words that contain the 'a' vowel (in that case I could just pick the longest word... I guess).
    
    Are there better ways of getting to know which word was presumably intended? I don't think there's such a thing as word boundaries in PocketSphinx (or speech recognition for that matter), right? Maybe recognized length combined with some form of confidence?
    
    Using more complex words is a 'solution' to this problem, but I'm wondering whether there are ways to make this work a little better with short / simple words as well.
    
    Thanks again.
    
    - Nickolay V. Shmyrev - 2017-08-28
      
      Well, children's speech recognition requires a specialized acoustic model anyway; it will not work accurately out of the box. And it will never work reliably with short words like "aap" unless you modify the algorithm to detect silence around the word. The detected word length is available on the command line with -time yes and through the ps_seg API.
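The "pick the longest word" idea raised earlier in the thread can be combined with the word timings mentioned here. The sketch below assumes each timing line looks like `<word> <start> <end> [<score>]` with times in seconds and single-word keyphrases; that layout, the `Detection` type, and the sample numbers are assumptions for illustration, not taken from actual decoder output.

```python
from typing import NamedTuple


class Detection(NamedTuple):
    word: str
    start: float  # seconds
    end: float    # seconds

    @property
    def duration(self):
        return self.end - self.start


def parse_timed_lines(lines):
    """Parse lines assumed to look like '<word> <start> <end> [<score>]'."""
    detections = []
    for line in lines:
        parts = line.split()
        if len(parts) < 3:
            continue
        try:
            detections.append(Detection(parts[0], float(parts[1]), float(parts[2])))
        except ValueError:
            continue  # not a timing line, e.g. an INFO log line
    return detections


def keep_longest(detections):
    """Among detections that overlap in time, keep only the longest one."""
    kept = []
    for det in sorted(detections, key=lambda d: d.duration, reverse=True):
        # Keep det only if it does not overlap anything already kept.
        if all(det.end <= k.start or det.start >= k.end for k in kept):
            kept.append(det)
    return sorted(kept, key=lambda d: d.start)


# Made-up timings: a short 'aap' detected inside a longer 'slang'.
lines = [
    "INFO: kws_search.c(656): kws 0.10 CPU 0.062 xRT",
    "aap 0.350 0.610 0.9",
    "slang 0.300 0.900 0.8",
    "aap 1.500 1.800 0.7",
]
result = keep_longest(parse_timed_lines(lines))
for det in result:
    print(det.word, det.start, det.end)
```

Preferring the longest overlapping detection is only a heuristic for short words such as 'aap'; it does not replace the silence detection or the specialized acoustic model described above.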
      