
Accuracy improvement with pocketsphinx

Rethish, 2010-11-29 (last updated 2012-09-22)
  • Nickolay V. Shmyrev

    The issue with your database is that it has overly long silences in the
    middle of each recording. Essentially you are recognizing isolated words,
    not words in context as you specified in the prompts.

    Since you have an accent, you need to adapt the dictionary; for example, add

    the(3)  DH AE
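    For illustration, a few hypothetical variant entries might look like this
    (the phones here are assumptions; derive the real ones by listening to your
    own recordings):

```
the      DH AH
the(2)   DH IY
the(3)   DH AE
water    W AO T ER
water(2) V AA T AH R
```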
    

    If you want more accuracy you can use more adaptation data; with 10 times
    more adaptation prompts you'll get better accuracy as well.

    Since you have long silences and the number of insertions is high, you can
    try to lower the word insertion probability with

    -wip 1e-3
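    For example, when decoding from the command line the option can be passed
    directly (the model paths here are placeholders for your own files):

```
pocketsphinx_continuous -hmm your_hmm_dir -lm your.lm -dict your.dic -wip 1e-3
```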
    
     
  • Rethish - 2010-11-30

    Hi Nickolay,

    Thanks for your attention to the issue.

    I tried the '-wip' option, but it did not give a jump in accuracy; the
    dictionary adaptation gave a 6% improvement in accuracy.
    Since these adaptations are not part of the standard CMU dictionary, where
    can I get them? Or do I have to find them by observing my own
    pronunciation?

    Since my application needs to recognize just a few words, should I still
    train with whole sentences as before, or just with the words I need to
    recognize?

    As mentioned before, my controller application uses pocketsphinx in the
    same manner as in FreeSWITCH.
    I find 2 issues during each test.
    1. The first utterance gives a wrong hypothesis every time.
    2. The hypothesis accuracy degrades after a few recognitions.

    Any recommendations on how to solve these? I found the same issue mentioned
    in other forums, and it seems an improved language model can solve the
    latter issue. How can I create an improved language model?

    Thanks in advance
    Rethish

     
  • Nickolay V. Shmyrev

    Since these adaptations are not part of the standard CMU dictionary, where
    can I get them? Or do I have to find them by observing my own
    pronunciation?

    Unfortunately we don't have any data to provide dictionaries for various
    regional dialects of English so you have to create such dictionary yourself.
    The standard dictionary only covers US English.

    1. The first utterance gives a wrong hypothesis every time.
    2. The hypothesis accuracy degrades after a few recognitions.
    Any recommendations on how to solve these?

    It looks like an issue with normalization estimation, which is currently
    being discussed in this bug:

    https://sourceforge.net/tracker/?func=detail&atid=101904&aid=3117707&group_id=1904

    I can look into this issue, but you need to provide an example I can
    reproduce locally; I don't observe such behaviour here. The example needs
    to be a self-contained application reading audio from a wav file.
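    A minimal sketch of such a test harness, assuming Python with the stdlib
    `wave` module; the decoder call is shown only as a comment, because the
    exact pocketsphinx binding API depends on your installation. The script
    writes a short silent wav itself so it runs without external files:

```python
import wave

def read_chunks(path, chunk_samples=512):
    """Read 16-bit mono PCM from a wav file in fixed-size chunks, the way
    a recognition loop would feed audio to the decoder."""
    chunks = []
    with wave.open(path, "rb") as wav:
        assert wav.getsampwidth() == 2, "expect 16-bit PCM"
        assert wav.getnchannels() == 1, "expect mono audio"
        while True:
            frames = wav.readframes(chunk_samples)
            if not frames:
                break
            # decoder.process_raw(frames, False, False)  # hypothetical binding call
            chunks.append(frames)
    return chunks

# Write one second of 16 kHz silence so the example is self-contained.
with wave.open("test.wav", "wb") as out:
    out.setnchannels(1)
    out.setsampwidth(2)
    out.setframerate(16000)
    out.writeframes(b"\x00\x00" * 16000)

chunks = read_chunks("test.wav")
print(len(chunks), "chunks of at most 512 samples")
```

    Reading the file in small chunks mirrors how a streaming recognizer is fed,
    which makes such a demo a faithful stand-in for a live-audio application.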

    I found the same issue mentioned in other forums and it seems an improved
    language model can solve the latter issue. How can I create an improved
    language model?

    I'm not sure what you are talking about, sorry.

     
  • Nickolay V. Shmyrev

    Since my application needs to recognize just a few words, should I still
    train with whole sentences as before, or just with the words I need to
    recognize?

    You should adapt with the sentences containing the words you will use. But you
    can use more sentences to get better adaptation. It's preferable to have 20-30
    samples of each word.
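    One way to check that each target word really has enough samples in the
    adaptation prompts is to count word occurrences in the transcription file.
    This is only a sketch; it assumes the common `<s> words </s> (file_id)`
    transcription convention, which may differ from yours:

```python
from collections import Counter

def word_counts(transcript_lines):
    """Count how often each word occurs across a set of adaptation prompts."""
    counts = Counter()
    for line in transcript_lines:
        # Drop the trailing (file_id) and the sentence markers, keep the words.
        text = line.rsplit("(", 1)[0]
        for word in text.split():
            if word not in ("<s>", "</s>"):
                counts[word.upper()] += 1
    return counts

prompts = [
    "<s> turn the light on </s> (rec_0001)",
    "<s> turn the light off </s> (rec_0002)",
    "<s> switch the fan on </s> (rec_0003)",
]
counts = word_counts(prompts)
# Flag hypothetical target words that fall short of 20 samples.
low = [w for w in ("TURN", "LIGHT", "FAN") if counts[w] < 20]
print(counts["THE"], "occurrences of THE;", len(low), "words under 20 samples")
```

    Running this over your real transcription file shows at a glance which
    words still need more recorded prompts before adaptation.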

     
  • Rethish - 2010-11-30

    Hi Nickolay,
    Thanks a lot for the clarification.

    This example needs to be a self-contained application reading audio from a wav file.
    

    Currently my application is not standalone; it receives PCM data from a
    TCP stream. I shall try to come up with a standalone demo.

    Rethish

     
