CMU Sphinx / Forums / Help: Sphinx Recent Code and models

salientvijay - 2015-10-15

I have used the following maven dependency in my Speech Recognition project

Sphinx Core

<dependency>
    <groupId>edu.cmu.sphinx</groupId>
    <artifactId>sphinx4-core</artifactId>
    <version>1.0-SNAPSHOT</version>
</dependency>

Sphinx Data

<dependency>
    <groupId>edu.cmu.sphinx</groupId>
    <artifactId>sphinx4-data</artifactId>
    <version>1.0-SNAPSHOT</version>
</dependency>

I used the default acoustic and language models, but the transcription accuracy was not great.

I found a more recent acoustic model at this link: http://sourceforge.net/projects/cmusphinx/files/Acoustic%20and%20Language%20Models/US%20English%20Generic%20Acoustic%20Model/cmusphinx-en-us-5.2.tar.gz/download

i.e., cmusphinx-en-us-5.2.tar.gz

Now I would like to test this acoustic model with the most recent code. I can see two repositories, one on GitHub (https://github.com/cmusphinx/sphinx4) and another on SourceForge (http://sourceforge.net/projects/cmusphinx/files/sphinx4/5prealpha/).

Which code will work well with this acoustic model? Is there a Maven repository link for the recent GitHub code?

- Nickolay V. Shmyrev - 2015-10-15
  
  Which code will work well with this acoustic model? Is there a Maven repository link for the recent GitHub code?
  
  It makes no difference which code you try; they should all work equally well.
  
  I used the default acoustic and language models, but the transcription accuracy was not great.
  
  You cannot fix the accuracy by changing the model or the code. Most likely there are other problems, such as an incorrect format of the input data, noise, or accent. To get help with accuracy, you need to provide the test data you are using.
  

salientvijay - 2015-10-16

Nickolay

Attached is the test data, converted to WAV format.

test.wav


salientvijay - 2015-10-16

Attaching another file, this one with a British accent.
I used the following ffmpeg command to convert it:
ffmpeg -i british.mp4 -acodec pcm_s16le -ac 1 -ar 16000 british16000.wav

Last edit: salientvijay 2015-10-16

british16000.wav


salientvijay - 2015-10-16

Hi Nickolay

One more thing: the accuracy is very bad with female voices.

- Nickolay V. Shmyrev - 2015-10-17
  
  Our model is En-US. The US in En-US means we support US English accents. We do not support other accents, including British English. To decode other accents you need to train a corresponding model.
  
  Also, transcription is not expected to be 100% accurate; there will always be mistakes.
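
Since "accuracy" is judged by eye throughout this thread, it may help to put a number on it. A common measure is word error rate (WER): the word-level edit distance between a reference transcript and the recognizer output, divided by the number of reference words. The sketch below is a minimal illustration, not part of Sphinx4; the class name `WordErrorRate` and the reference sentence in `main` are hypothetical (the true transcript of the attached audio is not given in the thread).

```java
import java.util.Locale;

final class WordErrorRate {

    /** WER = (substitutions + deletions + insertions) / number of reference words. */
    static double wer(String reference, String hypothesis) {
        String[] ref = tokens(reference);
        String[] hyp = tokens(hypothesis);
        if (ref.length == 0) {
            throw new IllegalArgumentException("reference must contain at least one word");
        }
        // Two-row Levenshtein distance computed over words instead of characters.
        int[] prev = new int[hyp.length + 1];
        int[] curr = new int[hyp.length + 1];
        for (int j = 0; j <= hyp.length; j++) {
            prev[j] = j; // empty reference prefix: j insertions
        }
        for (int i = 1; i <= ref.length; i++) {
            curr[0] = i; // empty hypothesis prefix: i deletions
            for (int j = 1; j <= hyp.length; j++) {
                int substitution = prev[j - 1] + (ref[i - 1].equals(hyp[j - 1]) ? 0 : 1);
                int deletion = prev[j] + 1;
                int insertion = curr[j - 1] + 1;
                curr[j] = Math.min(substitution, Math.min(deletion, insertion));
            }
            int[] swap = prev;
            prev = curr;
            curr = swap;
        }
        return (double) prev[hyp.length] / ref.length;
    }

    /** Lower-cases and splits on whitespace; punctuation handling is left out of this sketch. */
    private static String[] tokens(String text) {
        String trimmed = text.trim().toLowerCase(Locale.ROOT);
        return trimmed.isEmpty() ? new String[0] : trimmed.split("\\s+");
    }

    public static void main(String[] args) {
        // Hypothetical reference; the hypothesis is a fragment of recognizer output.
        String reference = "call back at your earliest convenience";
        String hypothesis = "call back to your earlier convenient";
        System.out.println(wer(reference, hypothesis)); // 3 substitutions / 6 words = 0.5
    }
}
```

Comparing WER before and after a change (a different model, a converted file, a new threshold) gives a more reliable signal than reading the transcripts side by side.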
  

salientvijay - 2015-10-17

Even with a US female speaker, the transcription is very bad. How can I improve the accuracy?

Last edit: salientvijay 2015-10-17


Nickolay V. Shmyrev - 2015-10-17

First of all, provide an audio sample of that female recording.


salientvijay - 2015-10-17

Here it is

female.wav

- Nickolay V. Shmyrev - 2015-10-17
  
  This audio has very loud music in the background. It breaks voice activity detection, so almost no speech is detected.
  
  Try to find audio without noise.
  
  This particular audio decodes reasonably well if you lower the voice activity detector threshold from 13 to 3 in default.config.xml and recompile. The result would be:
  
  ~~~~~~~~~~~~~
  i'm hilary clinton i have been proud and privilege to serve as first lady as
  a senator from new york and as secretary of state
  and the granddaughter of a factory worker and a grandmother lives wonderful light you wrote child and everyday i think about what we need to do to make fewer than opportunity is available not just for clever all of our children and knives that's a very long time i had fired the why
  lazy even the odds to help people get in that
  find ways for each child with the visitor god given potential i traveled across our country over the last wants
  and learning
  and i with more words visited lions
  about how we're going to read a word that a jobs
  i guess the interest on hearing the energy by making it possible once again to invest in science and research and taking the opportunity of lifeline of james while our time at the center of my campaign is how we're going to raise wages yes wars
  raise the minimum wage
  but we have to do so much more include a finding ways that confidence their rockets with the workers who have today that
  then we have to figure out how we're going to attack system and there are one right now the wealthy pay too little and that'll last phase in i'm leaving evil say the women buy a house of live outside the same family leave for a half have
  for the visit about bringing on one thing together again
  and i will do everything i can
  he omitted lives
  the device economically because there's jim i didn't fall against racial divides
  it's a new use them in a city gets the obviate xavier is that we will work together yes i really bothers them able to say their daughters into an ally city that i have
  ~~~~~~~~~~~~~
  
  Overall, a better detector could certainly be implemented; for example, pocketsphinx_continuous decodes this audio pretty well out of the box.
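
To see why lowering the threshold helps when music is playing, here is a deliberately simplified sketch of an energy-based speech decision: a frame counts as speech when its level exceeds the background level by more than the threshold. This is an illustration only and is not the Sphinx4 `SpeechClassifier` code, which is more involved; the class name `EnergyVadSketch`, the dB scale, and the amplitudes in `main` are assumptions chosen for the example.

```java
import java.util.Arrays;

final class EnergyVadSketch {

    /** Level of a non-empty frame in dB: 20 * log10(RMS), with RMS floored at 1 to avoid log(0). */
    static double levelDb(double[] samples) {
        double sumSquares = 0.0;
        for (double s : samples) {
            sumSquares += s * s;
        }
        double rms = Math.sqrt(sumSquares / samples.length);
        return 20.0 * Math.log10(Math.max(rms, 1.0));
    }

    /** A frame is classified as speech when it is louder than the background by more than the threshold. */
    static boolean isSpeech(double frameDb, double backgroundDb, double thresholdDb) {
        return frameDb - backgroundDb > thresholdDb;
    }

    public static void main(String[] args) {
        // Made-up constant-amplitude frames: loud music as background, speech only somewhat louder.
        double[] music = new double[160];
        double[] speechOverMusic = new double[160];
        Arrays.fill(music, 1000.0);           // 60 dB
        Arrays.fill(speechOverMusic, 2500.0); // about 68 dB, roughly 8 dB above the music

        double background = levelDb(music);
        double frame = levelDb(speechOverMusic);
        System.out.println(isSpeech(frame, background, 13.0)); // false: margin is below 13 dB
        System.out.println(isSpeech(frame, background, 3.0));  // true: margin is above 3 dB
    }
}
```

With a quiet background the same speech frame would clear a threshold of 13 easily, which is why clean recordings are less sensitive to this setting.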
  

salientvijay - 2015-10-18

Hi Nickolay

If I set the threshold to 3:

<component name="speechClassifier" type="edu.cmu.sphinx.frontend.endpoint.SpeechClassifier">
    <property name="threshold" value="3" />
</component>

the Gradle build fails because the following test cases fail:

Gradle test > edu.cmu.sphinx.result.LatticeCompTest.testLatticeComp FAILED
    java.lang.AssertionError at LatticeCompTest.java:83
Gradle test > edu.cmu.sphinx.api.LiveRecognizerTest.testGram FAILED
    java.lang.AssertionError at LiveRecognizerTest.java:56
Gradle test > edu.cmu.sphinx.api.LiveRecognizerTest.testLm FAILED
    java.lang.AssertionError at LiveRecognizerTest.java:30

Last edit: salientvijay 2015-10-18
- Nickolay V. Shmyrev - 2015-10-18
  
  The tests are not critical; you can skip them.
  

salientvijay - 2015-10-18

I skipped the tests, but the accuracy for the attached file is not great:

i didn't bonnie calling on behalf of the bank","kiki them to call back to your earlier convenient","the number to call back in one eighty eight feet","eight","you too thick","or you make out a number back of your ticket cart","each lecture","time to creeping up on

citi.wav

- Nickolay V. Shmyrev - 2015-10-18
  
  This file is in GSM 6.10 format at 8 kHz. To decode it accurately, you need to convert it to 16-bit PCM audio at 8 kHz; Sphinx4 does not do the conversion for you. To decode telephone-quality speech you also need to use the 8 kHz acoustic model, not the standard 16 kHz model.
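
A format mismatch like this can be caught before decoding. The sketch below uses the JDK's `javax.sound.sampled` API to check that a file is mono, 16-bit, little-endian signed PCM at the expected sample rate. The mono and little-endian requirements are assumptions taken from the ffmpeg command earlier in the thread (`pcm_s16le -ac 1`), and the class name `WavFormatCheck` is hypothetical. A compressed WAV such as GSM 6.10 may be rejected by the JDK parser outright, which the example treats as "needs conversion".

```java
import javax.sound.sampled.AudioFormat;
import javax.sound.sampled.AudioSystem;
import javax.sound.sampled.UnsupportedAudioFileException;
import java.io.File;
import java.io.IOException;

final class WavFormatCheck {

    /** True if the format is mono, 16-bit, little-endian signed PCM at the expected sample rate. */
    static boolean isDecodable(AudioFormat format, int expectedSampleRate) {
        return AudioFormat.Encoding.PCM_SIGNED.equals(format.getEncoding())
                && format.getSampleSizeInBits() == 16
                && format.getChannels() == 1
                && !format.isBigEndian()
                && Math.round(format.getSampleRate()) == expectedSampleRate;
    }

    public static void main(String[] args) throws IOException {
        if (args.length < 2) {
            System.out.println("usage: WavFormatCheck <file.wav> <expectedSampleRate>");
            return;
        }
        int expectedRate = Integer.parseInt(args[1]);
        try {
            // Reads only the header; the audio data itself is not decoded here.
            AudioFormat format = AudioSystem.getAudioFileFormat(new File(args[0])).getFormat();
            System.out.println(format + " -> "
                    + (isDecodable(format, expectedRate) ? "OK" : "needs conversion"));
        } catch (UnsupportedAudioFileException e) {
            // The JDK could not parse the file, for example because of a compressed codec.
            System.out.println("unsupported by the JDK parser -> needs conversion");
        }
    }
}
```

For the file discussed here, the expected rate would be 8000 once it has been converted, to match the 8 kHz acoustic model.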
  

salientvijay - 2015-10-27

After converting the file to 16-bit PCM audio at 8 kHz and switching to the 8 kHz acoustic model http://sourceforge.net/projects/cmusphinx/files/Acoustic%20and%20Language%20Models/US%20English%20Generic%20Acoustic%20Model/cmusphinx-en-us-8khz-5.1.tar.gz/download

the output is very poor:

oh whoa i i i i got up look

Audio: pcm_s16le ([1][0][0][0] / 0x0001), 8000 Hz, 1 channels, s16, 128 kb/s

Last edit: salientvijay 2015-10-27

citi.wav

- Nickolay V. Shmyrev - 2015-10-27
  
  You need to set the sample rate in the recognizer configuration:
  
  configuration.setSampleRate(8000);
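
Rather than hard-coding the rate, it could be read from the WAV header so that the configured value always matches the file. The sketch below uses only the JDK (`javax.sound.sampled`); the class name `SampleRateProbe` is hypothetical, and passing the returned value to `configuration.setSampleRate(...)` is left to the caller, since the recognizer setup is not shown in this thread.

```java
import javax.sound.sampled.AudioFileFormat;
import javax.sound.sampled.AudioSystem;
import javax.sound.sampled.UnsupportedAudioFileException;
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Paths;

final class SampleRateProbe {

    /** Returns the sample rate (in Hz) declared in the header of an in-memory audio file. */
    static int sampleRateOf(byte[] audioFileBytes) {
        try {
            AudioFileFormat fileFormat =
                    AudioSystem.getAudioFileFormat(new ByteArrayInputStream(audioFileBytes));
            return Math.round(fileFormat.getFormat().getSampleRate());
        } catch (UnsupportedAudioFileException | IOException e) {
            // Wrapped so callers get one unchecked failure for "not a file the JDK can parse".
            throw new IllegalArgumentException("not an audio file the JDK can parse", e);
        }
    }

    public static void main(String[] args) throws IOException {
        if (args.length < 1) {
            System.out.println("usage: SampleRateProbe <file.wav>");
            return;
        }
        int rate = sampleRateOf(Files.readAllBytes(Paths.get(args[0])));
        System.out.println("sample rate: " + rate + " Hz");
    }
}
```

The acoustic model still has to match that rate, as noted above: an 8 kHz file needs the 8 kHz model.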
  

salientvijay - 2015-10-27

The accuracy is still not great:

i this is donny calling on behalf of the peak he gets a call back at your earliest convenience the number to call back in one eighty eight eighty eighteen two to six or you make out of embargo back of a cart he's much hurt times have a great being the boss
