Keyword threshold dynamic values

David
2016-02-03
2016-02-24
  • David

    David - 2016-02-03

    Hi everybody, this is my first post.
    I designed a new IVR using pocketsphinx, and I need to readjust the kws_threshold values automatically, because it is not easy for a new end user to know which kws_threshold values to use.
    My app lets the user set a new keyword list and tries to spot it. If we get a match, we retrieve the best result and return. If there is no match, we reduce kws_threshold and restart the search.
    My questions:
    1) Is there a default value or range for kws_threshold that matches almost every word in a keyword list (the user is free to set the list)?
    2) Is there a way to update kws_threshold without reinitializing the decoder?
    3) What is the best practice if I get a bestpath with many words: use the best score, or generate a new JSGF and switch to grammar mode?
    I did all this to increase noise robustness, because my app should accept any call environment.

    Thanks in advance.

     
    • Nickolay V. Shmyrev

      Hello David, welcome to CMUSphinx forums.

      1) Is there a default value or range for kws_threshold that matches almost every word in a keyword list (the user is free to set the list)?

      The range depends on the number of syllables, so you can calculate it from that: for example, 1e-10 for one syllable, 1e-20 for two, and 1e-30 for three. That should give you a good estimate.
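The syllable rule above can be sketched as a small helper. This is only an illustration: the vowel-group syllable counter is a crude heuristic introduced here, the function names are hypothetical, and the output assumes the usual keyword list line format of a phrase followed by its threshold between slashes.

```python
import re

def count_syllables(word):
    """Rough heuristic (an assumption): one syllable per group of vowels."""
    groups = re.findall(r"[aeiouy]+", word.lower())
    return max(1, len(groups))

def estimate_threshold_exponent(phrase):
    """Apply the rule of thumb from the reply: about 1e-10 per syllable."""
    total = sum(count_syllables(word) for word in phrase.split())
    return -10 * total

def kws_list_line(phrase):
    """Format one keyword list entry, e.g. 'hello /1e-20/'."""
    exponent = estimate_threshold_exponent(phrase)
    return "%s /1e%d/" % (phrase.lower(), exponent)

if __name__ == "__main__":
    for phrase in ["hello", "oh mighty computer"]:
        print(kws_list_line(phrase))
```

The result is only a starting point; a dictionary-based syllable count would be more reliable than the vowel heuristic, and the value still needs tuning on real audio.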

      Another way to estimate a proper threshold is to take some arbitrary audio, say 1 hour, and count the alarms on it. You need to tune the threshold so that the alarms fall within a certain range, say 5 alarms per hour of speech. It does not matter whether the target word is in the audio or not; you'll get a good approximation.
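A minimal sketch of that calibration loop follows. The `count_alarms` callback is hypothetical: it stands for whatever code runs keyword spotting over the reference audio at a given threshold exponent and returns the number of detections. The sketch also assumes that a smaller threshold is more permissive, so alarms do not decrease as the exponent goes down.

```python
def calibrate_exponent(count_alarms, hours, max_alarms_per_hour=5.0,
                       exponents=range(0, -51, -5)):
    """Return the most permissive exponent whose alarm rate stays in budget.

    count_alarms(exponent) is a hypothetical callback that spots keywords
    in the reference audio with threshold 1e<exponent> and returns the
    number of detections. Exponents are tried from strict to permissive.
    """
    best = None
    for exponent in exponents:
        rate = count_alarms(exponent) / hours
        if rate <= max_alarms_per_hour:
            best = exponent
        else:
            break  # assumed monotonic: lower exponents only add alarms
    return best

if __name__ == "__main__":
    # Stand-in for a real spotting run: one extra alarm per 5 decades.
    fake_counts = lambda exponent: -exponent // 5
    print(calibrate_exponent(fake_counts, hours=1.0))
```

If even the strictest exponent exceeds the budget, the function returns None, which is a hint that the keyphrase is too short to spot reliably.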

      2) Is there a way to update kws_threshold without reinitializing the decoder?

      ps_set_kws should create a search without a decoder reinit; you can unset the search first and then set it again. Unfortunately, there is no way to update the threshold in place.
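Since ps_set_kws reads the keyphrases from a keyword list file, the retry scheme from the question could prepare one file per attempt and then unset and set the search with the next file. The sketch below only writes the files; the helper names, file names, and step size are assumptions, and it presumes that a smaller threshold is more permissive.

```python
import os
import tempfile

def write_kws_file(thresholds, path, relax_by=0):
    """Write one 'phrase /1e-N/' line per keyphrase.

    thresholds maps each phrase to a negative base-10 exponent;
    relax_by lowers every exponent by that many decades for a retry.
    """
    with open(path, "w") as handle:
        for phrase, exponent in sorted(thresholds.items()):
            handle.write("%s /1e%d/\n" % (phrase, exponent - relax_by))
    return path

def retry_files(thresholds, directory, step=5, attempts=3):
    """Create one keyword list file per attempt, each more permissive."""
    paths = []
    for attempt in range(attempts):
        path = os.path.join(directory, "keywords_%d.list" % attempt)
        paths.append(write_kws_file(thresholds, path, relax_by=step * attempt))
    return paths

if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as directory:
        for path in retry_files({"call david": -20, "office": -20}, directory):
            with open(path) as handle:
                print(os.path.basename(path), handle.read().split("\n")[:-1])
```

Each path would then be passed to ps_set_kws after unsetting the previous search, so the decoder itself is never reinitialized.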

      Overall, spotting random phrases is not a good idea; we recommend using a fixed word and switching to decoding after it. If you want some sort of analytics, it is better to recognize the speech first and then look for the words in the recognition result or in a lattice/nbest. Maybe the technology will improve and we'll be able to spot more efficiently.
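Looking for the words in an n-best list could be sketched as below. The input is assumed to be a plain list of (hypothesis text, score) pairs already extracted from the decoder, with a higher score meaning a better hypothesis; the function name is hypothetical.

```python
def find_keywords(nbest, keywords):
    """Return (keyword, hypothesis, score) hits, best hypothesis first.

    nbest is a list of (text, score) pairs; a higher score is assumed
    to be better. Keywords may be multi-word phrases and are matched
    on whole words only.
    """
    wanted = [tuple(keyword.lower().split()) for keyword in keywords]
    hits = []
    for text, score in sorted(nbest, key=lambda hyp: hyp[1], reverse=True):
        words = text.lower().split()
        for keyword in wanted:
            size = len(keyword)
            if any(tuple(words[i:i + size]) == keyword
                   for i in range(len(words) - size + 1)):
                hits.append((" ".join(keyword), text, score))
    return hits

if __name__ == "__main__":
    nbest = [("please call the office", -3000), ("please call david", -2500)]
    for hit in find_keywords(nbest, ["call david", "office"]):
        print(hit)
```

Matching on whole words avoids false hits such as "office" inside "officer", which a simple substring test would accept.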

      3) What is the best practice if I get a bestpath with many words: use the best score, or generate a new JSGF and switch to grammar mode?

      I'm not sure what you mean by this question. Please elaborate.

       
      • David

        David - 2016-02-05

        Hi Nickolay,
        Thank you for your response,
        In 1): How can I see and optimise the alarms? I use ps_process_raw on the partial result, and if there is no match I decode the same audio with another kws_threshold value using ps_decode_raw.

        In 2) :
        Overall, spotting random phrases is not a good idea; we recommend using a fixed word and switching to decoding after it. If you want some sort of analytics, it is better to recognize the speech first and then look for the words in the recognition result or in a lattice/nbest. Maybe the technology will improve and we'll be able to spot more efficiently.

        I do not generate random phrases, but each client has their own solution with a different setup.

        Is this a correct practice?

        In 3) I mean:
        If I set a list of keywords and apply a kws_threshold (for example 1e-30), and the result contains more than one hypothesis, I need to know which of these is the better way:
        a- Use lattice/nbest to retrieve the best hypothesis.
        b- Use the generated result to build a JSGF string, switch to grammar mode, and restart recognition with this JSGF.
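To make option b concrete, here is a minimal sketch of turning the spotted candidates into a JSGF grammar string. The grammar and rule names are placeholders, and the sketch says nothing about whether option b is preferable to option a.

```python
def build_jsgf(phrases, grammar_name="candidates", rule_name="candidate"):
    """Build a one-rule JSGF grammar that accepts any of the phrases."""
    cleaned = [phrase.strip().lower() for phrase in phrases if phrase.strip()]
    if not cleaned:
        raise ValueError("at least one phrase is required")
    alternatives = " | ".join(cleaned)
    return "#JSGF V1.0;\ngrammar %s;\npublic <%s> = %s ;\n" % (
        grammar_name, rule_name, alternatives)

if __name__ == "__main__":
    print(build_jsgf(["Call David", "call office"]))
```

The resulting string would be handed to the decoder as a grammar search, and the same audio decoded again with it.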

         
  • David

    David - 2016-02-11

    Hi Nickolay,
    You mentioned using lattice/nbest for analytics. I tried it and found it useful, but I have 3 questions about it:

    • Do I have to use both, or is one of them enough?
    • For a lattice, is it necessary to write it to a file, or is it enough to process it after ps_get_lattice?
    • Which API is best to use with lattice/nbest: ps_decode_raw or ps_process_raw?

    Thank you

     
    • Nickolay V. Shmyrev

      Do I have to use both, or is one of them enough?

      One is enough

      For a lattice, is it necessary to write it to a file, or is it enough to process it after ps_get_lattice?

      You can process the lattice in memory without writing it to a file.

      Which API is best to use with lattice/nbest: ps_decode_raw or ps_process_raw?

      It's unrelated. We recommend process_raw; decode_raw is only for testing small files.
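The difference is that process_raw is fed audio piece by piece as it arrives, while decode_raw consumes a whole file at once. The chunking side can be sketched as below; the chunk size is an arbitrary assumption, and the decoder calls are only mentioned in comments.

```python
import io

def iter_chunks(stream, chunk_bytes=2048):
    """Yield successive chunks of raw audio bytes until the stream ends."""
    while True:
        chunk = stream.read(chunk_bytes)
        if not chunk:
            break
        yield chunk

if __name__ == "__main__":
    # Stand-in for a live call: 5000 bytes of silence.
    audio = io.BytesIO(b"\x00" * 5000)
    # In a real application, ps_start_utt would come before this loop,
    # each chunk would be passed to ps_process_raw, and ps_end_utt
    # would follow once the stream is exhausted.
    for chunk in iter_chunks(audio):
        print(len(chunk))
```

Feeding small chunks lets an IVR check partial results during the call instead of waiting for the caller to finish.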

       
  • David

    David - 2016-02-19

    Thank you very much, I got it.

    I have a new question.

    I am building an IVR application, and I am able to get the background noise level of my audio before passing it to the decoder. My question is:

    Is there a parameter to adjust in the decoder according to my background noise?
    For example: if my background noise is 10 dB or 50 dB, what are the corresponding values to initialize the decoder with?

    Is there a maximum background noise level for pocketsphinx (for example, does recognition fail at 200 dB of background noise)?

     
    • Nickolay V. Shmyrev

      Usually recognition accuracy drops significantly at 10 dB noise; you can warn about that. There is no need to adjust the decoder to noise; it adapts automatically.
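A warning of that kind could be computed on the application side, before the audio reaches the decoder. The sketch below assumes the 10 dB figure refers to a signal-to-noise ratio, which the reply does not state explicitly, and that the application can separate speech samples from noise-only samples; the function names are hypothetical.

```python
import math

def rms(samples):
    """Root mean square of a sequence of PCM sample values."""
    if not samples:
        raise ValueError("no samples")
    return math.sqrt(sum(s * s for s in samples) / len(samples))

def snr_db(speech_samples, noise_samples):
    """Signal-to-noise ratio in dB from speech and noise-only samples."""
    noise = rms(noise_samples)
    if noise == 0:
        return float("inf")
    speech = rms(speech_samples)
    if speech == 0:
        return float("-inf")
    return 20.0 * math.log10(speech / noise)

def too_noisy(speech_samples, noise_samples, min_snr_db=10.0):
    """True when the SNR is at or below the assumed 10 dB warning level."""
    return snr_db(speech_samples, noise_samples) <= min_snr_db

if __name__ == "__main__":
    print(too_noisy([300, -300], [100, -100]))
    print(too_noisy([1000, -1000], [100, -100]))
```

The check only decides whether to warn the caller; it does not change any decoder setting, in line with the advice above.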

       
  • David

    David - 2016-02-24

    Thank you Nickolay,
    Is there a way to get the noise level inside pocketsphinx (in dB)?

    Which value should be readjusted depending on the background noise?

    Can vad_threshold help to filter out noise?

     
    • Nickolay V. Shmyrev

      Is there a way to get the noise level inside pocketsphinx (in dB)?

      The value is available in the code, but not in the API

      Which value should be readjusted depending on the background noise?

      You should not readjust anything

      Can vad_threshold help to filter out noise?

      No, it must be fixed

       
