
PocketSphinx Accuracy


    Bavan Palan - 2016-06-13

    I want to create an application in Objective-C

    where
    1. I give it a few audio clips.
    2. It outputs the transcripts.

    The obvious choice is PocketSphinx, but the problem is accuracy. I ran a few tests.

    First I found some clips online (16-bit, 16 kHz, mono, little-endian) and tried them. Then I tried my own. The online clips were recognized well, but mine were recognized very badly.


    Results:
    Clip 1: Found Online, Native Speaker
    Original Script : once there was a young rat named arthur who never could make up his mind
    PocketSphinx : once there was a young rat named arthur who never could make up his mind
    Accuracy: Fantastic

    Clip 2: Found Online, Native Speaker
    Original Script : whenever his friends ask him if he would like to go with them
    PocketSphinx : whenever his friends ask him if you would like to call with them
    Accuracy: Very Good

    Clip 3: Found Online, Native Speaker
    Original Script : he would only answer i don't know. He wouldn't say yes or no either
    PocketSphinx : you would only answer i don't know what you wouldn't say yes or no either
    Accuracy: Very Good

    Clip 4: Youtube, Native Speaker
    Original Script : let's talk about merge sort. So far you've seen bubble sort, insertion sort and selection sort. Although all, I kind of waive my hand at what i mean by better, merge sort generally performs better than any of these three sorts.
    PocketSphinx : let's talk about words were so far are you see all story user shoes or in selections or although all kind of waive my hand what i mean i'd better words are generally performs better in any of these resorts
    Accuracy: Lol

    Clip 5: Mine, Non native Speaker
    Original Script : Over the years, there have been many frameworks using JavaScript to create iOS applications. So what makes React Native special?
    PocketSphinx : or three years the army navy frame looks using jobless rate to create a pilot had editions so what makes recreate in spaceship
    Accuracy: Lol
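
To put numbers on labels like "Very Good" and "Lol", the transcripts above can be scored with word error rate (WER): the word-level edit distance between the original script and the recognizer output, divided by the number of words in the original script. The sketch below is a minimal illustration, not part of PocketSphinx. The function name `word_error_rate` is hypothetical, and the normalization (lowercasing, keeping only letters and apostrophes) is an assumption made for this example.

```python
import re


def tokenize(text):
    """Lowercase the text and keep only runs of letters and apostrophes."""
    return re.findall(r"[a-z']+", text.lower())


def word_error_rate(reference, hypothesis):
    """Word-level Levenshtein distance divided by the reference length."""
    ref = tokenize(reference)
    hyp = tokenize(hypothesis)
    if not ref:
        raise ValueError("reference transcript is empty")

    # prev[j] holds the edit distance between the reference words seen so far
    # and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (0 if ref_word == hyp_word else 1)
            deletion = prev[j] + 1       # reference word missing from hypothesis
            insertion = curr[j - 1] + 1  # extra word in hypothesis
            curr.append(min(substitution, deletion, insertion))
        prev = curr
    return prev[-1] / len(ref)


if __name__ == "__main__":
    clip2_ref = "whenever his friends ask him if he would like to go with them"
    clip2_hyp = "whenever his friends ask him if you would like to call with them"
    print(f"Clip 2 WER: {word_error_rate(clip2_ref, clip2_hyp):.3f}")
```

For Clip 2, two of the 13 reference words are substituted ("he" to "you", "go" to "call"), which gives a WER of 2/13, roughly 15%. Clip 1 scores 0.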


    Based on the results above, I thought something was wrong with my audio format. But every file returns the following when I run $ file filename.wav:

    RIFF (little-endian) data, WAVE audio, Microsoft PCM, 16 bit, mono 16000 Hz
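
The same header check can be done programmatically, which is handy when a batch of clips has to be validated before recognition. This is a minimal sketch using Python's standard `wave` module. The helper names `describe_wav` and `matches_expected_format` are hypothetical, and the expected values (mono, 16-bit, 16000 Hz) are taken from the `file` output above.

```python
import wave


def describe_wav(path):
    """Read the header fields of a PCM WAV file."""
    with wave.open(path, "rb") as wav:
        return {
            "channels": wav.getnchannels(),
            "sample_width_bits": wav.getsampwidth() * 8,
            "sample_rate_hz": wav.getframerate(),
            "duration_s": wav.getnframes() / wav.getframerate(),
        }


def matches_expected_format(path, channels=1, bits=16, rate=16000):
    """Return True if the file is mono, 16-bit, 16 kHz (the defaults here)."""
    info = describe_wav(path)
    return (
        info["channels"] == channels
        and info["sample_width_bits"] == bits
        and info["sample_rate_hz"] == rate
    )
```

A matching header only confirms the container format. It says nothing about recording quality, background noise, or whether the audio was lossily compressed before being converted to WAV.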

    So, with no other clue, I thought I should either train PocketSphinx to understand my accent better (which I have no idea how to do) or limit PocketSphinx's vocabulary by giving it the transcript of my audio clips. The second option has to be done at runtime, but I am not aware of any API that creates a model dynamically.

    1. What is the best way to achieve what I am after?
    2. What is wrong with Clip 5 compared to Clip 1? Clip 1 has noise, yet it is recognized better.
    3. If I have a transcript of my speech, is there a way to feed both the audio and the transcript into the system to get better accuracy?

    I appreciate your time, guys. Thank you very much.

     

    Last edit: Bavan Palan 2016-06-13
    • Nickolay V. Shmyrev

      Clip 4: Youtube, Native Speaker

      This one was heavily compressed with a codec, so the audio was corrupted. We are not that great on compressed audio yet.

      Clip 5: Mine, Non native Speaker

      Yes, our models are not great for non-native speakers.

      The specialized topic also affects the accuracy in the two samples above. The generic model is biased toward simple language.

      The second option has to be done at runtime, but I am not aware of any API that creates a model dynamically.

      It was discussed on the list; the API is simple but not implemented yet. You have to reimplement the quick_lm.pl Perl script yourself. You also need to integrate a g2p (grapheme-to-phoneme) component to assign pronunciations to unknown words.

      https://sourceforge.net/p/cmusphinx/discussion/help/thread/dd998add/
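
As a rough illustration of the first steps such a reimplementation would need, the sketch below counts 1- to 3-grams over the known transcript and lists the words that have no entry in a pronunciation dictionary, which are the ones a g2p component would have to handle. It is not a port of quick_lm.pl: it does no discounting and writes no ARPA file. The function names, the `<s>`/`</s>` sentence markers, and the `known_words` set are assumptions made for this example.

```python
import re
from collections import Counter


def tokenize(sentence):
    """Lowercase the sentence and keep only runs of letters and apostrophes."""
    return re.findall(r"[a-z']+", sentence.lower())


def ngram_counts(sentences, max_order=3):
    """Count n-grams of order 1..max_order, with sentence boundary markers."""
    counts = {n: Counter() for n in range(1, max_order + 1)}
    for sentence in sentences:
        words = ["<s>"] + tokenize(sentence) + ["</s>"]
        for n in range(1, max_order + 1):
            for i in range(len(words) - n + 1):
                counts[n][tuple(words[i:i + n])] += 1
    return counts


def out_of_vocabulary(sentences, known_words):
    """Return transcript words that are missing from the dictionary."""
    seen = {word for sentence in sentences for word in tokenize(sentence)}
    return sorted(seen - set(known_words))


if __name__ == "__main__":
    transcript = ["So what makes React Native special?"]
    # Hypothetical dictionary contents; a real one would be loaded from the
    # pronunciation dictionary used with the acoustic model.
    known_words = {"so", "what", "makes", "native", "special"}
    counts = ngram_counts(transcript)
    print(counts[2].most_common(3))
    print(out_of_vocabulary(transcript, known_words))
```

Turning these counts into a usable language model still requires probability estimation with discounting and back-off weights, plus output in a format the decoder accepts.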

       
      • Bavan Palan

        Bavan Palan - 2016-06-14

        Brilliant explanation. Thank you so much, Nickolay.

         
