CMU Sphinx / Forums / Help: en-us low out-of-the-box keyword recognition accuracy

Broes De Cat - 2017-04-05

Hi,

I have been experimenting with pocketsphinx for a command-and-control application, but recognition accuracy has generally been quite low. I have been going through the documentation on the website to find out which part of my workflow is causing this.
I now have a specific, isolated example, and I hope your input can help me pinpoint how to improve the accuracy.

I have the Ubuntu pocketsphinx package installed (0.8.0+real5prealpha-1ubuntu2, on 16.04), and am using the en-us dictionary and models that come along with it.
I am trying to do keyword recognition, with the keywords 'stop', 'resume', 'left', 'right', 'forward', 'reverse'.
I have attached a .wav file in which I say the word 'resume'.
However, this is what is recognized for each threshold I tried:
1e-50: left right resume reverse left right
1e-40: left right reverse resume
1e-30: (nothing)
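
For reference, a keyword list of the kind passed to -kws puts one phrase per line, followed by its detection threshold between slashes. The sketch below builds such a list for the six command words. The helper name `format_keyword_list` is made up for illustration, and the single 1e-40 threshold is a placeholder rather than a tuned value; each phrase would normally get its own threshold.

```python
# Sketch: build a pocketsphinx keyword list, one "phrase /threshold/" entry per line.
# format_keyword_list is a hypothetical helper; the thresholds are placeholders, not tuned.

def format_keyword_list(thresholds):
    """Return keyword-list text from a {phrase: threshold_string} mapping."""
    return "".join(
        f"{phrase} /{threshold}/\n" for phrase, threshold in thresholds.items()
    )

keywords = ["stop", "resume", "left", "right", "forward", "reverse"]
keyword_text = format_keyword_list({word: "1e-40" for word in keywords})

# Print the list; redirect it to a file to use it with -kws.
print(keyword_text, end="")
```

Saved as ./keywords, the output can be tested against a recording with something like pocketsphinx_continuous -infile resume.wav -kws ./keywords.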

Originally I had a slightly larger keywords file that also included all the digits. For the same .wav file, the recognition result (at threshold 1e-30) then even looked like this:
two three eight two ten nine one eight two eight two two three eight two

What should I try to improve this accuracy?
Is the problem a lack of adaptation to my voice/pronunciation, the recording quality, or something else?

Thanks for any help!
Broes

A related question: when running in continuous, microphone, keyword mode (pocketsphinx_continuous -inmic yes -kws ./keywords), sphinx also recognized words that were not part of the keyword file. Is that to be expected?

Last edit: Broes De Cat 2017-04-05

resume.wav

- Nickolay V. Shmyrev - 2017-04-05
  
  Originally I had a slightly larger keywords file, also including all digits. Then for the same file, the recognition (for threshold 1e-30), even looked like:
  two three eight two ten nine one eight two eight two two three eight two
  
  You can provide test data in order to get help with the accuracy. The tutorial also recommends using keyphrases of 3-4 syllables for reliable detection and tuning the thresholds properly:
  
  http://cmusphinx.sourceforge.net/wiki/tutoriallm
  
  A related question: when running in continuous, microphone, keyword mode (pocketsphinx_continuous -inmic yes -kws ./keywords), sphinx also recognized words that were not part of the keyword file. Is that to be expected?
  
  No
  
  - Broes De Cat - 2017-04-05
    
    Hi Nickolay,
    
    What exactly do you mean by test data? I provided the .wav file in the original question. Or do you mean the keywords file?
    
    Thx!
    Broes
    
    - Nickolay V. Shmyrev - 2017-04-05
      
      Your file is too short; the tutorial lists the requirements. Also, our models are for US English, so they will not work for accented speech. We have a French model, which might work better for you.
      