
Sphinx Recent Code and models

2015-10-15
2015-10-27
  • salientvijay

    salientvijay - 2015-10-15

    I have used the following Maven dependencies in my speech recognition project:

    Sphinx Core

    <dependency>
    <groupId>edu.cmu.sphinx</groupId>
    <artifactId>sphinx4-core</artifactId>
    <version>1.0-SNAPSHOT</version>
    </dependency>
    Sphinx Data

    <dependency>
    <groupId>edu.cmu.sphinx</groupId>
    <artifactId>sphinx4-data</artifactId>
    <version>1.0-SNAPSHOT</version>
    </dependency>
    I opted for the default acoustic and language models, but the transcription accuracy was not great.

    I found a more recent acoustic model at this link: http://sourceforge.net/projects/cmusphinx/files/Acoustic%20and%20Language%20Models/US%20English%20Generic%20Acoustic%20Model/cmusphinx-en-us-5.2.tar.gz/download

    i.e., cmusphinx-en-us-5.2.tar.gz

    Now I would like to test this acoustic model with the most recent code. I can see two repositories, one on GitHub (https://github.com/cmusphinx/sphinx4) and another on SourceForge (http://sourceforge.net/projects/cmusphinx/files/sphinx4/5prealpha/).

    Which code will work well with this acoustic model? Is there a Maven repository link for the recent GitHub code updates?

     
    • Nickolay V. Shmyrev

      Which code will work well with this acoustic model? Is there a Maven repository link for the recent GitHub code updates?

      It makes no difference which code you try; they should all work equally well.

      I opted for the default acoustic and language models, but the transcription accuracy was not great.

      You cannot fix the accuracy by changing the model or the code. Most likely there are other problems, such as an incorrect format of the input data, noise, or accent. To get help with accuracy, you need to provide the test data you are using.
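
One way to rule out the input-format problem mentioned above is to inspect the file before decoding. The following is a minimal sketch using only the JDK's `javax.sound.sampled` package. The class `WavFormatCheck` and its methods are hypothetical helpers and not part of Sphinx4, and the expected format (16-bit signed little-endian mono PCM at 16 kHz) is an assumption based on the ffmpeg settings and model sample rates discussed in this thread.

```java
import javax.sound.sampled.AudioFormat;
import javax.sound.sampled.AudioInputStream;
import javax.sound.sampled.AudioSystem;
import javax.sound.sampled.UnsupportedAudioFileException;
import java.io.File;
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;

/** Hypothetical helper (not part of Sphinx4): lists how audio differs from the expected PCM format. */
final class WavFormatCheck {

    /** Compares a format against 16-bit signed little-endian mono PCM at the given sample rate. */
    static List<String> formatProblems(AudioFormat format, float expectedRateHz) {
        List<String> problems = new ArrayList<>();
        if (!AudioFormat.Encoding.PCM_SIGNED.equals(format.getEncoding())) {
            problems.add("encoding is " + format.getEncoding() + ", expected PCM_SIGNED");
        }
        if (format.getSampleSizeInBits() != 16) {
            problems.add("sample size is " + format.getSampleSizeInBits() + " bits, expected 16");
        }
        if (format.getChannels() != 1) {
            problems.add("channel count is " + format.getChannels() + ", expected 1 (mono)");
        }
        if (format.getSampleRate() != expectedRateHz) {
            problems.add("sample rate is " + format.getSampleRate() + " Hz, expected " + expectedRateHz);
        }
        if (format.isBigEndian()) {
            problems.add("byte order is big-endian, expected little-endian");
        }
        return problems;
    }

    /** Reads the header of an audio file and checks it; throws if the JDK cannot parse the file. */
    static List<String> fileProblems(File audioFile, float expectedRateHz)
            throws IOException, UnsupportedAudioFileException {
        try (AudioInputStream in = AudioSystem.getAudioInputStream(audioFile)) {
            return formatProblems(in.getFormat(), expectedRateHz);
        }
    }

    public static void main(String[] args) throws Exception {
        if (args.length != 1) {
            System.err.println("usage: java WavFormatCheck <file.wav>");
            return;
        }
        // 16000 Hz is assumed here; pass 8000f when targeting an 8 kHz model.
        List<String> problems = fileProblems(new File(args[0]), 16000f);
        if (problems.isEmpty()) {
            System.out.println("format looks as expected");
        } else {
            problems.forEach(System.out::println);
        }
    }
}
```

An empty list only means the container format matches; it says nothing about noise or accent, which have to be judged from the audio itself.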

       
  • salientvijay

    salientvijay - 2015-10-16

    Nickolay

    Attached is the test data, converted to WAV format.

     
  • salientvijay

    salientvijay - 2015-10-16

    Attaching another file, this one with a British accent.
    I used the following ffmpeg command to convert it:
    ffmpeg -i british.mp4 -acodec pcm_s16le -ac 1 -ar 16000 british16000.wav

     

    Last edit: salientvijay 2015-10-16
  • salientvijay

    salientvijay - 2015-10-16

    Hi Nickolay

    One more thing: the precision is very bad with female voices.

     
    • Nickolay V. Shmyrev

      Our model is En-US. The US in En-US means we support US English accents. We do not support other accents, including British English. To decode other accents you need to train a corresponding model.

      Also, transcription is not expected to be 100% accurate; there will always be mistakes.

       
  • salientvijay

    salientvijay - 2015-10-17

    Even transcription of a female voice with a US accent is very bad. How can I improve the accuracy?

     

    Last edit: salientvijay 2015-10-17
  • Nickolay V. Shmyrev

    First of all, provide the audio sample for that female recording.

     
  • salientvijay

    salientvijay - 2015-10-17

    Here it is

     
    • Nickolay V. Shmyrev

      This audio has very loud music in the background. It breaks voice activity detection, so almost no speech is detected.

      Try to find audio without noise.

      This particular audio is decoded pretty well if you adjust the voice activity detector threshold to 3 instead of 13 in default.config.xml and recompile. The result would be:

      ~~~~~~~~~~~~~
      i'm hilary clinton i have been proud and privilege to serve as first lady as
      a senator from new york and as secretary of state
      and the granddaughter of a factory worker and a grandmother lives wonderful light you wrote child and everyday i think about what we need to do to make fewer than opportunity is available not just for clever all of our children and knives that's a very long time i had fired the why
      lazy even the odds to help people get in that
      find ways for each child with the visitor god given potential i traveled across our country over the last wants
      and learning
      and i with more words visited lions
      about how we're going to read a word that a jobs
      i guess the interest on hearing the energy by making it possible once again to invest in science and research and taking the opportunity of lifeline of james while our time at the center of my campaign is how we're going to raise wages yes wars
      raise the minimum wage
      but we have to do so much more include a finding ways that confidence their rockets with the workers who have today that
      then we have to figure out how we're going to attack system and there are one right now the wealthy pay too little and that'll last phase in i'm leaving evil say the women buy a house of live outside the same family leave for a half have
      for the visit about bringing on one thing together again
      and i will do everything i can
      he omitted lives
      the device economically because there's jim i didn't fall against racial divides
      it's a new use them in a city gets the obviate xavier is that we will work together yes i really bothers them able to say their daughters into an ally city that i have
      ~~~~~~~~~~~~~

      Overall, a better detector could certainly be implemented. For example, pocketsphinx_continuous decodes this audio pretty well out of the box.

       
  • salientvijay

    salientvijay - 2015-10-18

    Hi Nickolay

    If I set the threshold to 3:

    <component name="speechClassifier"
        type="edu.cmu.sphinx.frontend.endpoint.SpeechClassifier">
        <property name="threshold" value="3" />
    </component>
    

    the Gradle build fails because the following test cases fail:

      Gradle test > edu.cmu.sphinx.result.LatticeCompTest.testLatticeComp FAILED
        java.lang.AssertionError at LatticeCompTest.java:83
    
    Gradle test > edu.cmu.sphinx.api.LiveRecognizerTest.testGram FAILED
        java.lang.AssertionError at LiveRecognizerTest.java:56
    
    Gradle test > edu.cmu.sphinx.api.LiveRecognizerTest.testLm FAILED
        java.lang.AssertionError at LiveRecognizerTest.java:30
    
     

    Last edit: salientvijay 2015-10-18
    • Nickolay V. Shmyrev

      Tests are not critical, you can skip them.

       
  • salientvijay

    salientvijay - 2015-10-18

    I skipped the tests, but the accuracy for the attached file is not great:

    i didn't bonnie calling on behalf of the bank","kiki them to call back to your earlier convenient","the number to call back in one eighty eight feet","eight","you too thick","or you make out a number back of your ticket cart","each lecture","time to creeping up on

     
    • Nickolay V. Shmyrev

      This file is in GSM 6.10 format at 8 kHz. To decode it accurately, you need to convert it to 16-bit PCM audio at 8 kHz. Sphinx4 does not do the conversion. To decode telephony-quality speech you also need to use an 8 kHz acoustic model, not the standard 16 kHz model.

       
  • salientvijay

    salientvijay - 2015-10-27

    The accuracy is not great

    i this is donny calling on behalf of the peak he gets a call back at your earliest convenience the number to call back in one eighty eight eighty eighteen two to six or you make out of embargo back of a cart he's much hurt times have a great being the boss

     
