New Thread - Confidence implementation

Help
2017-02-03
2017-02-06
  • Carlo Benussi

    Carlo Benussi - 2017-02-03

    Hello,

    I would like to ask why the confidence score is not implemented for JSGF
    grammars (I guess because it is hard); ps_getProb always returns zero.

    Since it is implemented for language models, what makes it hard (or
    impossible?) to implement it for grammars too?

    Thanks in advance

     
    • Nickolay V. Shmyrev

      Confidence estimation requires a good score for an alternative path. In a grammar with only a few words it is hard to make a reliable estimate. In a language model with few words it does not work well either; currently you need a large vocabulary to estimate confidence reliably. Or you need to introduce a phone loop, like we did in keyword spotting.

      Next, confidence with a small vocabulary is still a subject of research; for example, short words of 1-2 syllables are very hard to detect reliably with the current approach.

      Third, grammars are most likely not a priority for pocketsphinx; it is hard to get a voice interface right with grammars. In the future pocketsphinx will probably support only a large language model and a small language model built from example phrases.

       
  • Carlo Benussi

    Carlo Benussi - 2017-02-03

    Thanks for the quick response, I understand the constraints (within the limits of my knowledge of SREs).

    I am currently developing an application which uses pocketsphinx with JSGF grammars for In-Vehicle Infotainment systems, and it is working quite well. However, when background noise (especially voices) is present and the user is not saying anything, the recognizer returns a non-null hypothesis anyway most of the time.
    I tried to improve the robustness of the recognizer by tuning some parameters (mainly increasing vad_startspeech, vad_threshold and maxhmmpf) and I saw some improvements, but a confidence threshold would have been a great solution. I am now thinking of adapting the acoustic model (Italian) by feeding it some audio recorded in the vehicle (background noise and voices coming from the street). Do you think this could bring significant improvements?

     
    • Nickolay V. Shmyrev

      We recommend keyword spotting mode for continuous listening. It can be tuned to avoid false alarms.

      The current Italian model is pretty basic; you need much more data to make it reliable. By training a new model you can introduce common noises.

       
  • Carlo Benussi

    Carlo Benussi - 2017-02-03

    Actually the recognition is triggered with a button, so keyword spotting is not necessary. I guess I'll restrict the grammar possibilities to significantly long phrases, in order to reduce the false alarms, or maybe use two microphones with audio subtraction to achieve directionality.

    Anyway, the Italian acoustic model works very well with grammars, despite being basic.

    I just hope you will not move away from grammars for good in the near future; I think they are fundamental for driving embedded systems (or any kind of software/application).

     

    Last edit: Carlo Benussi 2017-02-03
    • Nickolay V. Shmyrev

      or maybe use two microphones with audio subtraction to achieve directionality.

      This would be a step in the right direction.

       
  • Carlo Benussi

    Carlo Benussi - 2017-02-06

    What about creating two recognizers, one with the grammar actually needed, and one with a filler grammar comprising the most common short words in Italian (together with the recursive option on the grammar rule)?
    Then I could send the audio input to both recognizers, get the scores (from ps_get_hyp), and compare the two. If the score of my grammar is higher than the score of the filler grammar, the result would be accepted, otherwise rejected. Does this seem feasible?
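
The comparison described in this post can be sketched as a small decision function. This is only an illustration: it assumes the two scores are the log-domain path scores reported with each hypothesis (higher, i.e. less negative, is better), and the function name `accept_hypothesis` and the `margin` parameter are hypothetical, not part of pocketsphinx.

```python
def accept_hypothesis(command_score: int, filler_score: int, margin: int = 0) -> bool:
    """Accept the command-grammar result only if it beats the filler grammar.

    Both scores are assumed to be log-domain path scores, so a larger
    (less negative) value means a better match. `margin` is a hypothetical
    tuning knob: the command score must exceed the filler score by more
    than this amount to be accepted.
    """
    return command_score - filler_score > margin


if __name__ == "__main__":
    # Illustrative numbers only, not real decoder output.
    print(accept_hypothesis(-2500, -3100))              # command wins
    print(accept_hypothesis(-3100, -2500))              # filler wins
    print(accept_hypothesis(-2500, -2600, margin=200))  # too close to accept
```

A non-zero margin trades missed commands for fewer false accepts; a suitable value would have to be found by experiment on in-vehicle recordings.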

     
    • Nickolay V. Shmyrev

      That is fine. You do not even need two recognizers; you can combine both grammars into a single one with two branches.

      The only question is how fast and how reliable that would be. It requires experiments.
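
A single grammar with two branches might look like the JSGF text generated below. This is a minimal sketch, not a tested pocketsphinx setup: the rule names (`<utterance>`, `<command>`, `<filler>`), the helper name `build_two_branch_jsgf`, and the Italian phrases are all made up for illustration, and every word used would also have to be present in the pronunciation dictionary.

```python
def build_two_branch_jsgf(commands, fillers, name="commands_with_filler"):
    """Return JSGF text with a command branch and a repeating filler branch.

    `commands` are the phrases the application really needs; `fillers` are
    short, common words that absorb background speech. The `+` operator
    lets the filler branch match one or more filler words in a row.
    """
    if not commands or not fillers:
        raise ValueError("both branches need at least one entry")
    # Parenthesise each command so multi-word phrases stay grouped.
    command_alt = " | ".join(f"( {phrase} )" for phrase in commands)
    filler_alt = " | ".join(fillers)
    lines = [
        "#JSGF V1.0;",
        f"grammar {name};",
        "public <utterance> = <command> | <filler>;",
        f"<command> = {command_alt};",
        f"<filler> = ( {filler_alt} )+;",
    ]
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    # Hypothetical phrases, for illustration only.
    print(build_two_branch_jsgf(["accendi radio", "spegni radio"],
                                ["si", "no", "ma"]))
```

With a grammar like this, the decoder itself picks the better-scoring branch, so a result could be rejected whenever the returned hypothesis is not one of the command phrases.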

       
  • Carlo Benussi

    Carlo Benussi - 2017-02-07

    I tried to use two recognizers as I described above (the filler grammar has only one recursive rule, which is an OR over all the phonemes) and the scoring results are quite good. But as you pointed out, the recognition is much slower now.

    Which parameter of the filler recognizer can I tune to make the recognition over the filler grammar faster (even if less reliable)? I was thinking about lowering maxhmmpf and/or maxwpf for pruning, but I am far from sure since it is not my field.

    Thanks for your time, and sorry for bothering you so much.

     

    Last edit: Carlo Benussi 2017-02-07
    • Nickolay V. Shmyrev

      Which parameter of the filler recognizer can I tune to make the recognition over the filler grammar faster (even if less reliable)? I was thinking about lowering maxhmmpf and/or maxwpf for pruning, but I am far from sure since it is not my field.

      All the parameters are described here:

      http://cmusphinx.sourceforge.net/wiki/pocketsphinxhandhelds
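
As a rough illustration of the kind of speed-oriented tuning discussed here, the sketch below assembles a list of pocketsphinx command-line options for the filler recognizer. The option names (`-maxhmmpf`, `-maxwpf`, `-ds`, `-topn`) are existing pocketsphinx options, but the helper `fast_filler_args` is hypothetical and the default values are placeholders, not recommendations. Whether each option has any effect on grammar search, and how much accuracy it costs, would need to be checked by experiment.

```python
def fast_filler_args(maxhmmpf=3000, maxwpf=5, ds=2, topn=2):
    """Build a flat list of speed-oriented decoder options.

    -maxhmmpf : maximum number of HMMs evaluated per frame
    -maxwpf   : maximum number of distinct words exiting per frame
    -ds       : frame downsampling factor for acoustic scoring
    -topn     : number of top Gaussians used in scoring
    All values here are illustrative placeholders to be tuned.
    """
    options = {
        "-maxhmmpf": maxhmmpf,
        "-maxwpf": maxwpf,
        "-ds": ds,
        "-topn": topn,
    }
    args = []
    for flag, value in options.items():
        args += [flag, str(value)]
    return args


if __name__ == "__main__":
    print(" ".join(fast_filler_args()))
```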

       
