Hi everybody, this is my first post.
I designed a new IVR using pocketsphinx, and I need to readjust the kws_threshold values automatically, because it is not easy for a new end user to know which values to use for kws_threshold.
My app lets the user set a new keyword list and tries to spot it. If we get a match, we retrieve the best result and return it. If there is no match, we reduce kws_threshold and restart the search.
My questions:
1) Is there a default value or range for kws_threshold that matches almost any word in a keyword list (the user is free to set the list)?
2) Is there a way to update kws_threshold without reinitializing the decoder?
3) What is the best practice if I get a bestpath with many words: use the best score, or generate a new JSGF and switch to grammar mode?
I did all this to increase noise robustness, because my app should accept any call environment.
Thanks in advance.
1) Is there a default value or range for kws_threshold that matches almost any word in a keyword list (the user is free to set the list)?
The range depends on the number of syllables, and you can calculate it from that: for example, 1e-10 for one syllable, 1e-20 for two, and 1e-30 for three. That should give you a good estimate.
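A minimal sketch of the syllable rule above, assuming a crude syllable estimate of one syllable per run of vowels. The counter and the function names are illustrative assumptions, not part of pocketsphinx; a pronunciation dictionary would give better counts. The output uses the `phrase /1e-NN/` form of a keyword list line.

```python
import re


def count_syllables(phrase):
    """Crude estimate (assumption): one syllable per run of vowels in each word."""
    total = 0
    for word in phrase.lower().split():
        total += max(1, len(re.findall(r"[aeiouy]+", word)))
    return total


def kws_line(phrase, exponent_per_syllable=10):
    """Format one keyword-list line, e.g. 'hello /1e-20/'."""
    exponent = exponent_per_syllable * count_syllables(phrase)
    return f"{phrase} /1e-{exponent}/"


if __name__ == "__main__":
    for phrase in ["yes", "hello", "operator"]:
        print(kws_line(phrase))
```

Treat the result only as a starting point; the estimate still needs checking against real audio.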
Another way to estimate a proper threshold is to take some arbitrary audio, say one hour, and count the alarms (detections) on it. Tune the threshold so that the alarms fall in a certain range, say 5 alarms per hour of speech. It does not matter whether the target word is in the audio or not; you'll get a good approximation.
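The alarm-counting approach can be sketched as a simple sweep over candidate thresholds. Here `count_alarms` is a hypothetical callback that would run the keyword spotter over the test audio at a given threshold and return the number of detections; the exponent range 1 to 50 is also an assumption.

```python
import math


def tune_exponent(count_alarms, hours, target_per_hour=5.0, exponents=range(1, 51)):
    """Return N such that threshold 1e-N gives the alarm rate closest to the target.

    count_alarms(threshold) is a hypothetical callback: it should run the
    keyword spotter over the test audio and return the number of detections.
    """
    best_exponent, best_gap = None, float("inf")
    for n in exponents:
        rate = count_alarms(float(f"1e-{n}")) / hours
        gap = abs(rate - target_per_hour)
        if gap < best_gap:  # on a tie, the earlier (stricter) threshold is kept
            best_exponent, best_gap = n, gap
    return best_exponent


if __name__ == "__main__":
    # Stand-in spotter for demonstration only: alarms grow as the threshold loosens.
    def fake_spotter(threshold):
        return 2 * int(round(-math.log10(threshold)))

    print(tune_exponent(fake_spotter, hours=2.0))
```

In a real setup the callback would decode the same recording once per candidate threshold, so the sweep can be slow on an hour of audio.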
2) Is there a way to update kws_threshold without reinitializing the decoder?
ps_set_kws should create a search without reinitializing the decoder; you can unset the search first and then set it again. Unfortunately, there is no way to update the threshold in place.
Overall, spotting random phrases is not a good idea; we recommend having a fixed word and switching to decoding after it. If you want some sort of analytics, it is better to recognize the speech first and then just look for the words in the recognition result or in a lattice/n-best list. Maybe the technology will improve and we'll be able to spot more efficiently.
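The "recognize first, then look for the words" idea can be sketched as follows. This assumes the n-best list is already available as `(text, score)` pairs with higher scores being better; that input format and the function name are assumptions, so adapt them to whatever the decoder returns.

```python
def find_keywords(nbest, keywords):
    """Return {keyword: best_score} for keywords present in any hypothesis.

    nbest is a list of (text, score) pairs where a higher score is better
    (assumption; adapt to whatever the decoder returns).
    """
    found = {}
    for text, score in nbest:
        # Pad with spaces so only whole words or whole phrases match.
        padded = f" {' '.join(text.lower().split())} "
        for kw in keywords:
            needle = f" {' '.join(kw.lower().split())} "
            if needle in padded and score > found.get(kw, float("-inf")):
                found[kw] = score
    return found


if __name__ == "__main__":
    hypotheses = [
        ("i want to check my balance", -3000),
        ("i want a check my ballot", -3400),
    ]
    print(find_keywords(hypotheses, ["balance", "check my", "operator"]))
```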
3) What is the best practice if I get a bestpath with many words: use the best score, or generate a new JSGF and switch to grammar mode?
I'm not sure what you mean by this question. Please elaborate.
Hi Nickolay,
Thank you for your response.
In 1): How can I see and optimize the alarms? I use ps_process_raw and check the partial result; if there is no match, I decode the same audio again with ps_decode_raw and a different kws_threshold value.
In 2): You wrote:
"Overall, spotting of random phrases is not a good idea, we recommend to have a fixed word and switch to decoding after it. If you want some sort of analytics it is better to recognize speech first and then just look for the words in recognition result or in a lattice/nbest. Maybe technology will improve and we'll be able to spot more efficiently."
I do not generate random phrases, but each client has their own solution with a different setup.
Is that a correct practice?
In 3) I mean:
If I set a list of keywords and apply a kws_threshold (for example 1e-30) and the result contains more than one hypothesis, I need to know which is the better way:
a- Use the lattice/n-best list to retrieve the best hypothesis.
b- Use the generated result to build a JSGF string, switch to grammar mode, and restart recognition with this JSGF.
Hi Nickolay,
You mentioned using the lattice/n-best list for analytics. I tried it and found it useful, but I have three questions about it:
Thank you
One is enough.
You can process the lattice in memory without writing it to a file.
It's unrelated. We recommend ps_process_raw; ps_decode_raw is only for testing small files.
Thank you very much, I got it.
I have a new question.
I am building an IVR application, and I am able to measure the background noise level of my audio before passing it to the decoder. My questions are:
Is there a parameter to adjust in the decoder according to my background noise?
For example, if my background noise is 10 dB or 50 dB, what are the corresponding values to initialize the decoder with?
Is there a maximum background noise level for pocketsphinx (for example, does recognition fail at 200 dB of background noise)?
Recognition accuracy usually drops significantly at 10 dB of noise; you can warn about that. There is no need to adjust the decoder to the noise; it adapts automatically.
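A rough sketch of such a warning on the application side, computed from 16-bit PCM samples before they reach the decoder. The thread does not say whether the 10 dB figure is a signal-to-noise ratio; this sketch assumes it is. The function names and the split into speech and noise-only samples are also assumptions.

```python
import math


def rms_dbfs(samples, full_scale=32768.0):
    """RMS level of 16-bit PCM samples in dB relative to full scale."""
    if not samples:
        raise ValueError("no samples")
    mean_square = sum(s * s for s in samples) / len(samples)
    if mean_square == 0:
        return float("-inf")  # digital silence
    return 10.0 * math.log10(mean_square / (full_scale * full_scale))


def snr_db(speech_samples, noise_samples):
    """Difference between the speech level and the noise level, in dB."""
    return rms_dbfs(speech_samples) - rms_dbfs(noise_samples)


def should_warn(speech_samples, noise_samples, min_snr_db=10.0):
    """True when the estimated SNR is below the assumed 10 dB limit."""
    return snr_db(speech_samples, noise_samples) < min_snr_db


if __name__ == "__main__":
    print(should_warn([1000] * 160, [100] * 160))
```

The noise-only samples could come from the start of the call, before the caller speaks.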
Thank you Nickolay,
Is there a way to get the noise level inside pocketsphinx (in dB)?
Which value should I readjust depending on the background noise?
Can vad_threshold help to filter out noise?
The value is available in the code, but not in the API.
You should not readjust anything.
No, it must be fixed.