CMU Sphinx / Forums / Help: Keyword threshold dynamic values

David - 2016-02-03

Hi everybody, this is my first post.
I designed a new IVR using pocketsphinx, and I need to adjust the kws_threshold values automatically, because it is not easy for a new end user to know which values to use for kws_threshold.
My app lets the user set a new keyword list and tries to spot it. If we get a match, we retrieve the best result and return. If there is no match, we reduce kws_threshold and restart the search.
My questions:
1) Is there a default value or range for kws_threshold that matches almost every word in a keyword list (the user is free to set the list)?
2) Is there a way to update kws_threshold without reinitializing the decoder?
3) What is the best practice if I get a bestpath with many words: use the best score, or generate a new JSGF and switch to grammar mode?
I did all this to increase noise robustness, because my app should accept any call environment.

Thanks in advance.

- Nickolay V. Shmyrev - 2016-02-04
  
  Hello David, welcome to CMUSphinx forums.
  
  1) Is there a default value or range for kws_threshold that matches almost every word in a keyword list (the user is free to set the list)?
  
  The range depends on the number of syllables, and you can calculate it from that: for example, 1e-10 for one syllable, 1e-20 for two, and 1e-30 for three. That should give you a good estimate.
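A minimal sketch of the syllable rule of thumb above, assuming the caller already knows the syllable count (counting syllables is outside this sketch) and that the keyword list uses one `phrase /threshold/` entry per line. The function names are hypothetical and not part of pocketsphinx.

```python
def estimate_kws_threshold(syllables: int) -> float:
    """Rule of thumb from the reply: a factor of 1e-10 per syllable."""
    if syllables < 1:
        raise ValueError("a keyphrase needs at least one syllable")
    return float(f"1e-{10 * syllables}")


def keyword_list_line(phrase: str, syllables: int) -> str:
    """Format one keyword-list line carrying the estimated threshold."""
    return f"{phrase} /1e-{10 * syllables}/"


# "call support" has three syllables, so it gets the three-syllable estimate.
print(keyword_list_line("call support", 3))
```

This only gives a starting point; the per-phrase value would still need tuning on real audio.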
  
  Another way to estimate a proper threshold is to take some arbitrary audio, say one hour, and count the alarms on it. You need to tune the threshold so that the alarms fall in a certain range, say 5 alarms per hour of speech. It does not matter whether the target word is in the audio or not; you'll get a good approximation.
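The alarm-counting approach could be sketched as a simple sweep. This assumes that a smaller threshold makes the spotter fire more often (consistent with reducing the threshold when nothing matches), and that `count_alarms` is a hypothetical callback, supplied by the caller, that runs keyword spotting over the calibration audio with the given threshold and returns the number of detections.

```python
def calibrate_threshold(count_alarms, audio_hours, target_per_hour=5.0,
                        exponents=range(-50, 0, 5)):
    """Return the most permissive threshold whose alarm rate stays within
    target_per_hour, or None if even the strictest one exceeds it."""
    best = None
    # Walk from strict (1e-5) to permissive (1e-50).
    for exp in sorted(exponents, reverse=True):
        threshold = float(f"1e{exp}")
        rate = count_alarms(threshold) / audio_hours
        if rate <= target_per_hour:
            best = threshold
        else:
            # Assumed monotonic: looser thresholds only add alarms.
            break
    return best
```

The early exit relies on the monotonicity assumption; without it, the loop would have to evaluate every candidate.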
  
  2) Is there a way to update kws_threshold without reinitializing the decoder?
  
  ps_set_kws should create a search without reinitializing the decoder; you can unset the search first and then set it again. Unfortunately, there is no way to update the threshold in place.
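A rough sketch of the unset-then-set approach, assuming a decoder object whose methods `unset_search`, `set_kws`, and `set_search` mirror the C functions ps_unset_search, ps_set_kws, and ps_set_search. Those method names are an assumption and may differ between binding versions; `RecordingDecoder` is a hypothetical stand-in so the sketch runs without pocketsphinx installed.

```python
import os
import tempfile


def write_keyword_list(phrases, threshold_exp, path):
    """Write one 'phrase /1e-NN/' line per keyphrase."""
    with open(path, "w", encoding="utf-8") as handle:
        for phrase in phrases:
            handle.write(f"{phrase} /1e{threshold_exp}/\n")


def reload_kws(decoder, phrases, threshold_exp, search_name="keywords"):
    """Replace the keyword search with one that uses a new threshold."""
    fd, path = tempfile.mkstemp(suffix=".list")
    os.close(fd)
    write_keyword_list(phrases, threshold_exp, path)
    # Assumed method names, mirroring the C API calls.
    decoder.unset_search(search_name)
    decoder.set_kws(search_name, path)
    decoder.set_search(search_name)
    return path


class RecordingDecoder:
    """Stand-in that only records the calls it receives."""

    def __init__(self):
        self.calls = []

    def unset_search(self, name):
        self.calls.append(("unset_search", name))

    def set_kws(self, name, keyfile):
        self.calls.append(("set_kws", name, keyfile))

    def set_search(self, name):
        self.calls.append(("set_search", name))
```

With a real decoder, the first call may need to skip the unset step if no search with that name exists yet.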
  
  Overall, spotting random phrases is not a good idea; we recommend having a fixed word and switching to decoding after it. If you want some sort of analytics, it is better to recognize speech first and then just look for the words in the recognition result or in a lattice/nbest. Maybe the technology will improve and we'll be able to spot more efficiently.
  
  3) What is the best practice if I get a bestpath with many words: use the best score, or generate a new JSGF and switch to grammar mode?
  
  I'm not sure about what you mean by this question. Please elaborate.
  
  - David - 2016-02-05
    
    Hi Nickolay,
    Thank you for your response,
    In 1): How can I see and optimize alarms? I use ps_process_raw on partial results, and if there is no match I decode the same audio again with another kws_threshold value using ps_decode_raw.
    
    In 2) :
    Overall, spotting random phrases is not a good idea; we recommend having a fixed word and switching to decoding after it. If you want some sort of analytics, it is better to recognize speech first and then just look for the words in the recognition result or in a lattice/nbest. Maybe the technology will improve and we'll be able to spot more efficiently.
    
    I do not generate random phrases, but each client has their own solution with a different setup.
    
    Is this a correct practice?
    
    In 3) I mean :
    If I set a list of keywords and apply a kws_threshold (for example 1e-30), and the result contains more than one hypothesis, I need to know which of these is the better way:
    a- Use lattice/nbest to retrieve the best hypothesis.
    b- Use the generated result to build a JSGF string, switch to grammar mode, and restart recognition with this JSGF.
    

David - 2016-02-11

Hi Nickolay,
You mentioned using lattice/nbest for analytics. I did it and I see that it is useful, but I have three questions about it:

Do I have to use both, or is just one of them enough?

For the lattice, is it necessary to write it to a file, or is it enough to process it after ps_get_lattice?

What is the best API to use with lattice/nbest: ps_decode_raw or ps_process_raw?

Thank you
- Nickolay V. Shmyrev - 2016-02-11
  
  Do I have to use both, or is just one of them enough?
  
  One is enough.
  
  For the lattice, is it necessary to write it to a file, or is it enough to process it after ps_get_lattice?
  
  You can process the lattice in memory without writing it to a file.
  
  What is the best API to use with lattice/nbest: ps_decode_raw or ps_process_raw?
  
  It's unrelated. We recommend ps_process_raw; ps_decode_raw is only for testing small files.
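A minimal sketch of the chunked approach recommended above, assuming a decoder object with `start_utt`, `process_raw(data, no_search, full_utt)`, and `end_utt` methods that mirror ps_start_utt, ps_process_raw, and ps_end_utt. The exact binding signatures are an assumption; `CountingDecoder` is a hypothetical stand-in so the sketch runs without pocketsphinx.

```python
import io


def feed_stream(decoder, stream, chunk_bytes=2048):
    """Feed raw PCM from a file-like object to the decoder chunk by chunk
    and return the number of bytes consumed."""
    total = 0
    decoder.start_utt()
    while True:
        chunk = stream.read(chunk_bytes)
        if not chunk:
            break
        # Assumed signature: process_raw(data, no_search, full_utt).
        decoder.process_raw(chunk, False, False)
        total += len(chunk)
    decoder.end_utt()
    return total


class CountingDecoder:
    """Stand-in that records the size of each chunk it is given."""

    def __init__(self):
        self.started = False
        self.ended = False
        self.chunks = []

    def start_utt(self):
        self.started = True

    def process_raw(self, data, no_search, full_utt):
        self.chunks.append(len(data))

    def end_utt(self):
        self.ended = True


decoder = CountingDecoder()
feed_stream(decoder, io.BytesIO(bytes(5000)))
```

Feeding in chunks lets partial results be checked between calls, which suits live call audio better than decoding a whole file at once.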
  

David - 2016-02-19

Thanks a lot, I got it.

I have a new question.

I am building an IVR application, and I am able to get the background noise level of my audio before passing it to the decoder. My question is:

Is there a parameter to adjust in the decoder according to my background noise?
For example, if my background noise is 10 dB or 50 dB, what are the corresponding values to initialize the decoder with?

Is there a maximum background noise level for pocketsphinx (for example, recognition fails at 200 dB of background noise)?

- Nickolay V. Shmyrev - 2016-02-19
  
  Usually recognition accuracy drops significantly at 10 dB noise; you can warn about that. There is no need to adjust the decoder to noise; it adapts automatically.
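A small sketch of the suggested warning, computed in the application before audio reaches the decoder. It assumes the "10 dB" figure refers to a signal-to-noise ratio of about 10 dB, which the reply does not state explicitly, and that the caller can already separate speech samples from background-noise samples. All names here are hypothetical.

```python
import math


def rms(samples):
    """Root-mean-square amplitude of a non-empty sequence of PCM samples."""
    return math.sqrt(sum(s * s for s in samples) / len(samples))


def snr_db(speech_samples, noise_samples):
    """Signal-to-noise ratio in dB from speech and noise-only segments."""
    noise = rms(noise_samples)
    speech = rms(speech_samples)
    if noise == 0:
        return math.inf
    if speech == 0:
        return -math.inf
    return 20.0 * math.log10(speech / noise)


def should_warn(speech_samples, noise_samples, min_snr_db=10.0):
    """True when the estimated SNR is at or below the assumed 10 dB limit."""
    return snr_db(speech_samples, noise_samples) <= min_snr_db
```

The check only informs the caller; it does not change any decoder setting, in line with the advice that the decoder adapts on its own.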
  

David - 2016-02-24

Thank you Nickolay,
Is there a way to know the noise level inside pocketsphinx (in dB)?

Which value should be readjusted depending on the background noise?

Can vad_threshold help to filter out noise?

- Nickolay V. Shmyrev - 2016-02-24
  
  Is there a way to know the noise level inside pocketsphinx (in dB)?
  
  The value is available in the code, but not in the API.
  
  Which value should be readjusted depending on the background noise?
  
  You should not readjust anything.
  
  Can vad_threshold help to filter out noise?
  
  No, it must be fixed.
  