
100% Error with PocketSphinx

creative64
2010-05-05
2012-09-22
  • Nickolay V. Shmyrev

    pocketsphinx_continuous -hmm hub4wsj_sc_8k -lm wsj0vp.5000.dmp -dict
    cmu07a.dic -samprate 8000 -rawlogdir .

    Why do you put -samprate 8000? Try without it. Actually, you can just run
    pocketsphinx_continuous without any arguments.

    1. It could be an accent issue, but even with my accent (full.wma), is a
      100% error rate OK, or am I missing something?

    A 100% error rate can't be due to the accent. It's a bug/incorrect setup.
    It's most likely a bug in the Windows code that doesn't properly input
    audio at 8 kHz.

    1. Is there anything I can do to improve recognition accuracy (other than
      adaptation and reducing the vocabulary)?

    Yes, first of all you could try to fix the bug.

    1. Why do the dumps generated by pocketsphinx contain repetitions of words
      or parts of words?

    Due to the bug, I think.

     
  • creative64

    creative64 - 2010-05-06

    Why do you put -samprate 8000? Try without it.

    1. I was setting it to match the input sampling rate to the acoustic model. It looks like it is not required!

    I tried without it too. Nothing changed on the accuracy front; however, the
    dump generated now sounds a little better. It still has
    echoes, clicks and repetitions, though.

    Original (as recorded by Windows in parallel):
    http://www.mediafire.com/file/zd4zm0dtkjm/turn_right_original.wma

    Dumped (as dumped by pocketsphinx_continuous):
    http://www.mediafire.com/file/v1m2hdntauy/turn_right_dumped.raw
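
A headerless .raw dump cannot be opened directly by most audio players, which makes it awkward to compare against the .wma recording. The following is a minimal sketch for wrapping such a dump in a WAV container using Python's standard `wave` module. It assumes the dump is 16-bit signed mono PCM in little-endian byte order; the function name `raw_to_wav` and the file names in the comment are made up for illustration, and the sample rate has to match whatever rate the decoder was actually run with.

```python
import wave


def raw_to_wav(raw_path, wav_path, sample_rate=16000):
    """Wrap headerless PCM samples in a WAV container.

    Assumption: the input is 16-bit signed, mono, little-endian PCM.
    Returns the duration of the audio in seconds.
    """
    with open(raw_path, "rb") as f:
        pcm = f.read()

    with wave.open(wav_path, "wb") as w:
        w.setnchannels(1)            # mono
        w.setsampwidth(2)            # 2 bytes per sample = 16 bit
        w.setframerate(sample_rate)  # must match the capture rate
        w.writeframes(pcm)

    return (len(pcm) // 2) / sample_rate


# Hypothetical usage; pass sample_rate=8000 for a dump made with -samprate 8000:
# seconds = raw_to_wav("turn_right_dumped.raw", "turn_right_dumped.wav")
```

If the converted file plays back at the wrong speed or pitch, the sample rate passed here probably does not match the rate the audio was captured at.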

    It's a bug/incorrect setup. It's most likely a bug in the Windows code that
    doesn't properly input audio at 8 kHz.

    1. The bug seems to affect 16 kHz as well.

    2. Has the pocketsphinx0.6 build been tested in live input mode?

    Thanks and Regards,

     
  • creative64

    creative64 - 2010-05-07

    Hi,

    Just to bypass any possible issues associated with the live input mode, I
    tried pocketsphinx_batch with the following parameters:
    Acoustic model: hub4wsj_sc_8k
    Language model: wsj0vp.5000.dmp
    Dictionary: cmu07a.dic

    I used the audio clips present in the \test\data directory of the
    pocketsphinx0.6 package (numbers.raw, goforward.raw, something.raw). These
    clips are in a native American accent.

    Here are the results:

    numbers.raw ("Thirty three four six ninety two")
    Recognized output: "Thirty three four six nine to two"

    goforward.raw ("Go forward ten meters")
    Recognized output: "Go forward and users"

    something.raw ("Go somewhere and do something")
    Recognized output: "Though some wear and you something"

    1. Is this the expected accuracy level, or is there still something amiss?

    2. How can I improve on this?

    Thanks and regards,

     
  • Nickolay V. Shmyrev

    1. Is this the expected accuracy level, or is there still something amiss?

    This is not an accuracy. Accuracy is a number measured on a test database
    that represents the acoustic properties. An acoustic test database doesn't
    need to be large, but it should be bigger than 3 sentences.

    The result itself is expected; I get exactly the same here on Linux.

    2. How can I improve on this?

    The obvious thing here is that you are trying to decode commands and number
    sequences with a language model that's not very suitable for that. The WSJ
    model is trained for a dictation task with newspaper texts. Careful system
    design, better language and acoustic models, new features, adaptation and
    postprocessing: those are the actions that make a system usable.
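
One common way to turn reference/output comparisons like the ones in this thread into a single number is word error rate (WER): the word-level edit distance between the reference transcript and the recognized text, divided by the number of reference words. The thread does not say which scoring tool is used, so the following is only a minimal sketch with made-up helper names (`word_errors`, `word_error_rate`), fed with the three sentences quoted above. As noted in the reply, three sentences are far too few for a meaningful accuracy figure; the data is here only to show the mechanics.

```python
def word_errors(reference, hypothesis):
    """Return (edit_distance, reference_length), both counted in words."""
    ref = reference.lower().split()
    hyp = hypothesis.lower().split()

    # prev[j] = edit distance between the reference words processed so far
    # and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        cur = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            cur.append(min(prev[j] + 1,          # deletion
                           cur[j - 1] + 1,       # insertion
                           prev[j - 1] + cost))  # match or substitution
        prev = cur
    return prev[-1], len(ref)


def word_error_rate(pairs):
    """WER over (reference, hypothesis) pairs, pooled across all sentences."""
    errors = 0
    words = 0
    for reference, hypothesis in pairs:
        e, n = word_errors(reference, hypothesis)
        errors += e
        words += n
    return errors / words


# Reference / recognized pairs as quoted in this thread.
THREAD_RESULTS = [
    ("Thirty three four six ninety two", "Thirty three four six nine to two"),
    ("Go forward ten meters", "Go forward and users"),
    ("Go somewhere and do something", "Though some wear and you something"),
]

if __name__ == "__main__":
    print(f"WER: {word_error_rate(THREAD_RESULTS):.2%}")
```

On these three sentences the sketch counts 8 word errors against 15 reference words. Running the same function over a larger test set recorded under the target conditions would give a number closer to what the reply calls accuracy.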

     
  • creative64

    creative64 - 2010-05-07

    Hi Nshmyrev,

    Thanks for validating my output. At least I now know that pocketsphinx is
    properly up and running on my system. Yes, improving
    accuracy is a big task with many things to tweak. I will be trying some of
    those.

    Thanks and regards,

     
