en-us low out-of-the-box keyword recognition accuracy

2017-04-05
  • Broes De Cat

    Broes De Cat - 2017-04-05

    Hi,

    I have been experimenting with pocketsphinx for a command-and-control application, but recognition accuracy has generally been quite low. I have been going through the documentation on the website to find out which part of my workflow is causing this.
    I now have a specific, isolated example, and I hope your input can help me pinpoint how to improve the accuracy.

    I have the Ubuntu pocketsphinx package installed (0.8.0+real5prealpha-1ubuntu2, on 16.04), and am using the en-us dictionary and models that come along with it.
    I am trying to do keyword recognition, with the keywords 'stop', 'resume', 'left', 'right', 'forward', 'reverse'.
    I have attached a .wav file in which I say the word 'resume'.
    However, this is what is recognized (per threshold I tried):
    1e-50: left right resume reverse left right
    1e-40: left right reverse resume
    1e-30: (nothing)

    Originally I had a slightly larger keywords file that also included all digits. For the same .wav file, the recognition (at threshold 1e-30) then even looked like this:
    two three eight two ten nine one eight two eight two two three eight two

    What should I try to improve this accuracy?
    Is the problem adaptation to my voice/pronunciation, the recording quality, or something else?

    Thanks for any help!
    Broes

    A related question: when running in continuous, microphone, keyword mode (pocketsphinx_continuous -inmic yes -kws ./keywords), sphinx also recognized words that were not part of the keyword file. Is that to be expected?

     

    Last edit: Broes De Cat 2017-04-05
    • Nickolay V. Shmyrev

      Originally I had a slightly larger keywords file, also including all digits. Then for the same file, the recognition (for threshold 1e-30), even looked like:
      two three eight two ten nine one eight two eight two two three eight two

      You can provide test data in order to get help with the accuracy. The tutorial also recommends using keyphrases of 3-4 syllables for reliable detection and tuning the thresholds properly:

      http://cmusphinx.sourceforge.net/wiki/tutoriallm
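
As an illustration of the advice above, here is a minimal sketch that builds a keyword list with one threshold per keyphrase, in the "phrase /threshold/" form that the -kws option reads. The keyphrases and threshold values are hypothetical and chosen only for illustration; every word would still have to be present in the en-us dictionary, and each threshold would need tuning on real recordings.

```python
# Sketch: build a keyword list for `pocketsphinx_continuous -kws <file>`.
# Each line has the form:  keyphrase /threshold/
# The phrases and thresholds below are hypothetical examples, not tuned values.

HYPOTHETICAL_KEYPHRASES = {
    "stop the robot": 1e-30,
    "resume driving": 1e-30,
    "turn to the left": 1e-25,
}


def format_kws_lines(thresholds):
    """Return one 'phrase /threshold/' line per keyphrase."""
    lines = []
    for phrase, threshold in thresholds.items():
        # Detection thresholds are probabilities-like values in (0, 1].
        if not 0.0 < threshold <= 1.0:
            raise ValueError(f"threshold for {phrase!r} must be in (0, 1]")
        # Lowercase the phrase to match the lowercase en-us dictionary entries.
        lines.append(f"{phrase.strip().lower()} /{threshold:g}/")
    return lines


if __name__ == "__main__":
    # Print the list; redirect to a file to use it with -kws.
    print("\n".join(format_kws_lines(HYPOTHETICAL_KEYPHRASES)))
```

Redirecting the output to a file (for example `python make_kws.py > keywords`, where make_kws.py is a hypothetical script name) gives a list that can be passed as `-kws ./keywords`. Keeping a separate threshold per phrase makes it possible to loosen or tighten one keyphrase without affecting the others.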

      A related question: when running in continuous, microphone, keyword mode (pocketsphinx_continuous -inmic yes -kws ./keywords), sphinx also recognized words that were not part of the keyword file. Is that to be expected?

      No

       
      • Broes De Cat

        Broes De Cat - 2017-04-05

        Hi Nickolay,

        What exactly do you mean by test data? I provided the wav file in the original question. Or do you mean the keywords file?

        Thx!
        Broes

         
        • Nickolay V. Shmyrev

          Your file is too short; the tutorial lists the requirements. Also, our models are for US English and will not work for accented speech. We have a French model, which might work better for you.
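
For the questions about file length and recording quality, a quick check of the recording itself can help. The sketch below uses only Python's standard wave module to report duration and format. It assumes the stock en-us model expects 16 kHz, 16-bit, mono audio; that expectation is an assumption to verify against the tutorial, and the helper names are made up for this example.

```python
import sys
import wave


def describe_wav(source):
    """Read basic properties of a PCM .wav file (path or file object)."""
    with wave.open(source, "rb") as wav:
        rate = wav.getframerate()
        return {
            "channels": wav.getnchannels(),
            "sample_width_bytes": wav.getsampwidth(),
            "sample_rate_hz": rate,
            "duration_s": wav.getnframes() / rate,
        }


def format_problems(info, expected_rate=16000):
    """List mismatches against the assumed 16 kHz, 16-bit, mono format."""
    problems = []
    if info["sample_rate_hz"] != expected_rate:
        problems.append(f"sample rate is {info['sample_rate_hz']} Hz, expected {expected_rate}")
    if info["channels"] != 1:
        problems.append(f"{info['channels']} channels, expected mono")
    if info["sample_width_bytes"] != 2:
        problems.append(f"{8 * info['sample_width_bytes']}-bit samples, expected 16-bit")
    return problems


if __name__ == "__main__" and len(sys.argv) > 1:
    info = describe_wav(sys.argv[1])
    print(f"duration: {info['duration_s']:.2f} s")
    for problem in format_problems(info):
        print("warning:", problem)
```

Running it on the test recording shows at a glance how short the file is and whether it needs resampling before decoding.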

           
