I am using the following Maven dependencies in my speech recognition project:
Sphinx Core
<dependency>
<groupId>edu.cmu.sphinx</groupId>
<artifactId>sphinx4-core</artifactId>
<version>1.0-SNAPSHOT</version>
</dependency>
Sphinx Data
<dependency>
<groupId>edu.cmu.sphinx</groupId>
<artifactId>sphinx4-data</artifactId>
<version>1.0-SNAPSHOT</version>
</dependency>
I opted for the default acoustic and language models, but the transcription accuracy was not great.
I found a more recent acoustic model at this link: http://sourceforge.net/projects/cmusphinx/files/Acoustic%20and%20Language%20Models/US%20English%20Generic%20Acoustic%20Model/cmusphinx-en-us-5.2.tar.gz/download
i.e., cmusphinx-en-us-5.2.tar.gz
Now I would like to test this acoustic model with the most recent code. I can see two repositories, one on GitHub (https://github.com/cmusphinx/sphinx4) and another on SourceForge (http://sourceforge.net/projects/cmusphinx/files/sphinx4/5prealpha/).
Which code will work well with this acoustic model? Is there a Maven repository link for the recent GitHub code?
It makes no difference which code you try; they should all work equally well.
You cannot fix the accuracy by changing the model or the code. Most likely there are other problems, such as an incorrect format of the input data, noise, or accent. To get help with accuracy you need to provide the test data you are using.
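A quick way to rule out the input-format problem is to inspect the file before decoding. The sketch below uses only the JDK's javax.sound.sampled classes. The names WavFormatCheck and findProblems are made up for illustration, and the expected values (signed 16-bit PCM, mono, little-endian, 16 kHz for the default model) are assumptions about what the default en-US model is normally fed, so adjust them for your setup.

```java
import javax.sound.sampled.AudioFormat;
import javax.sound.sampled.AudioSystem;
import java.io.File;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, not part of Sphinx4.
class WavFormatCheck {

    /** Returns one message per mismatch; an empty list means the format looks fine. */
    static List<String> findProblems(AudioFormat format, float expectedRate) {
        List<String> problems = new ArrayList<>();
        if (!AudioFormat.Encoding.PCM_SIGNED.equals(format.getEncoding())) {
            problems.add("encoding is " + format.getEncoding() + ", expected PCM_SIGNED");
        }
        if (format.getSampleSizeInBits() != 16) {
            problems.add("sample size is " + format.getSampleSizeInBits() + " bits, expected 16");
        }
        if (format.getChannels() != 1) {
            problems.add("channel count is " + format.getChannels() + ", expected 1 (mono)");
        }
        if (format.getSampleRate() != expectedRate) {
            problems.add("sample rate is " + format.getSampleRate()
                    + " Hz, expected " + expectedRate + " Hz");
        }
        if (format.isBigEndian()) {
            problems.add("byte order is big-endian, expected little-endian");
        }
        return problems;
    }

    public static void main(String[] args) throws Exception {
        AudioFormat format;
        if (args.length > 0) {
            // Reads only the file header; throws UnsupportedAudioFileException
            // for formats the JDK cannot parse.
            format = AudioSystem.getAudioFileFormat(new File(args[0])).getFormat();
        } else {
            // Demo value: an 8 kHz file checked against a 16 kHz expectation.
            format = new AudioFormat(8000f, 16, 1, true, false);
        }
        List<String> problems = findProblems(format, 16000f);
        if (problems.isEmpty()) {
            System.out.println("Format looks fine: " + format);
        } else {
            for (String problem : problems) {
                System.out.println("Problem: " + problem);
            }
        }
    }
}
```

Run it with the path to the WAV file as the only argument. If the JDK cannot read the header at all, that by itself suggests the file needs to be converted first.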
Nickolay
Attached is the test data, converted to WAV format.
I am attaching another file with a British accent.
I used the following ffmpeg command to convert it:
ffmpeg -i british.mp4 -acodec pcm_s16le -ac 1 -ar 16000 british16000.wav
Last edit: salientvijay 2015-10-16
Hi Nickolay
One more thing: the accuracy is very bad with female voices.
Our model is en-US. The US in en-US means we support US English accents. We do not support other accents, including British English. To decode other accents you need to train a corresponding model.
Also, transcription is not expected to be 100% accurate; there will always be mistakes.
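Since some mistakes are always expected, it helps to put a number on "not great". A common measure is word error rate (WER): the word-level edit distance between a reference transcript and the recognizer output, divided by the number of reference words. The sketch below is a plain-Java illustration, not a Sphinx4 utility. The class name WordErrorRate is made up, and the reference string in main is assumed for the example rather than taken from a verified transcript.

```java
import java.util.Locale;

// Hypothetical helper, not part of Sphinx4.
class WordErrorRate {

    /** Lowercases and splits on whitespace; returns an empty array for blank input. */
    static String[] tokenize(String text) {
        String trimmed = text.trim().toLowerCase(Locale.ROOT);
        return trimmed.isEmpty() ? new String[0] : trimmed.split("\\s+");
    }

    /** Word-level Levenshtein distance (substitutions, insertions, deletions all cost 1). */
    static int editDistance(String[] ref, String[] hyp) {
        int[][] d = new int[ref.length + 1][hyp.length + 1];
        for (int i = 0; i <= ref.length; i++) {
            d[i][0] = i; // delete all remaining reference words
        }
        for (int j = 0; j <= hyp.length; j++) {
            d[0][j] = j; // insert all remaining hypothesis words
        }
        for (int i = 1; i <= ref.length; i++) {
            for (int j = 1; j <= hyp.length; j++) {
                int cost = ref[i - 1].equals(hyp[j - 1]) ? 0 : 1;
                int best = Math.min(d[i - 1][j] + 1, d[i][j - 1] + 1);
                d[i][j] = Math.min(best, d[i - 1][j - 1] + cost);
            }
        }
        return d[ref.length][hyp.length];
    }

    /** WER = edit distance / number of reference words. */
    static double wer(String reference, String hypothesis) {
        String[] ref = tokenize(reference);
        if (ref.length == 0) {
            throw new IllegalArgumentException("reference transcript must not be empty");
        }
        return (double) editDistance(ref, tokenize(hypothesis)) / ref.length;
    }

    public static void main(String[] args) {
        // Assumed reference text, for illustration only.
        String reference = "call back at your earliest convenience";
        String hypothesis = "call back to your earlier convenient";
        System.out.println("WER = " + wer(reference, hypothesis)); // prints "WER = 0.5"
    }
}
```

Comparing WER before and after a change (a different model, a different sample rate, cleaner audio) shows whether the change actually helped.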
Even for a US female speaker the transcription is very bad. How can I improve the accuracy?
Last edit: salientvijay 2015-10-17
First of all, provide the audio sample for that female recording.
Here it is
This audio has very loud music in the background. It breaks voice activity detection, so almost no speech is detected.
Try to find audio without noise.
This particular audio is decoded pretty well if you change the voice activity detector threshold from 13 to 3 in default.config.xml and recompile. The result would be:
~~~~~~~~~~~~~
i'm hilary clinton i have been proud and privilege to serve as first lady as
a senator from new york and as secretary of state
and the granddaughter of a factory worker and a grandmother lives wonderful light you wrote child and everyday i think about what we need to do to make fewer than opportunity is available not just for clever all of our children and knives that's a very long time i had fired the why
lazy even the odds to help people get in that
find ways for each child with the visitor god given potential i traveled across our country over the last wants
and learning
and i with more words visited lions
about how we're going to read a word that a jobs
i guess the interest on hearing the energy by making it possible once again to invest in science and research and taking the opportunity of lifeline of james while our time at the center of my campaign is how we're going to raise wages yes wars
raise the minimum wage
but we have to do so much more include a finding ways that confidence their rockets with the workers who have today that
then we have to figure out how we're going to attack system and there are one right now the wealthy pay too little and that'll last phase in i'm leaving evil say the women buy a house of live outside the same family leave for a half have
for the visit about bringing on one thing together again
and i will do everything i can
he omitted lives
the device economically because there's jim i didn't fall against racial divides
it's a new use them in a city gets the obviate xavier is that we will work together yes i really bothers them able to say their daughters into an ally city that i have
~~~~~~~~~~~~~
Overall, a better detector could certainly be implemented; for example, pocketsphinx_continuous decodes this audio pretty well out of the box.
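To see why lowering the threshold helps with loud background music, here is a toy energy-based detector. It is not Sphinx4's actual voice activity detector; it is only a simplified sketch that assumes a frame counts as speech when its level is more than a threshold (in dB) above the quietest frame. The class name EnergyVadSketch and the constant-amplitude frames are invented for the demonstration.

```java
import java.util.Arrays;

// Toy illustration only; this is NOT how Sphinx4 implements its detector.
class EnergyVadSketch {

    /** Frame level in dB: 10 * log10 of the mean squared amplitude (floored to avoid log of 0). */
    static double levelDb(double[] frame) {
        double sum = 0.0;
        for (double sample : frame) {
            sum += sample * sample;
        }
        double meanSquare = Math.max(sum / frame.length, 1e-10);
        return 10.0 * Math.log10(meanSquare);
    }

    /** Marks a frame as speech when it is more than thresholdDb above the quietest frame. */
    static boolean[] classify(double[][] frames, double thresholdDb) {
        double[] levels = new double[frames.length];
        double background = Double.POSITIVE_INFINITY;
        for (int i = 0; i < frames.length; i++) {
            levels[i] = levelDb(frames[i]);
            background = Math.min(background, levels[i]);
        }
        boolean[] speech = new boolean[frames.length];
        for (int i = 0; i < frames.length; i++) {
            speech[i] = levels[i] - background > thresholdDb;
        }
        return speech;
    }

    /** Builds a frame in which every sample has the same amplitude. */
    static double[] constantFrame(double amplitude, int length) {
        double[] frame = new double[length];
        Arrays.fill(frame, amplitude);
        return frame;
    }

    public static void main(String[] args) {
        // Loud music alone (0.2) versus music plus voice (0.4): only about 6 dB apart.
        double[][] frames = {
            constantFrame(0.2, 160),
            constantFrame(0.4, 160),
            constantFrame(0.2, 160)
        };
        System.out.println(Arrays.toString(classify(frames, 13.0))); // [false, false, false]
        System.out.println(Arrays.toString(classify(frames, 3.0)));  // [false, true, false]
    }
}
```

With loud music the "background" level is already high, so the voice adds only a few dB on top of it. A 13 dB margin is never reached and nothing is marked as speech, while a 3 dB margin is.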
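Hi Nickolay
If I set it to 3, the Gradle build fails because of failing test cases:
Gradle test > edu.cmu.sphinx.result.LatticeCompTest.testLatticeComp FAILED
java.lang.AssertionError at LatticeCompTest.java:83
Gradle test > edu.cmu.sphinx.api.LiveRecognizerTest.testGram FAILED
java.lang.AssertionError at LiveRecognizerTest.java:56
Gradle test > edu.cmu.sphinx.api.LiveRecognizerTest.testLm FAILED
java.lang.AssertionError at LiveRecognizerTest.java:30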
Last edit: salientvijay 2015-10-18
Tests are not critical, you can skip them.
I skipped the tests, but the accuracy for the attached file is not great:
i didn't bonnie calling on behalf of the bank","kiki them to call back to your earlier convenient","the number to call back in one eighty eight feet","eight","you too thick","or you make out a number back of your ticket cart","each lecture","time to creeping up on
This file is in GSM 6.10 format at 8 kHz. To decode it accurately you need to convert it to 16-bit PCM audio at 8 kHz. Sphinx4 does not do the conversion for you. To decode telephony-quality speech you also need to use the 8 kHz acoustic model, not the standard 16 kHz model.
After converting to 16-bit PCM 8 kHz audio and using the 8 kHz acoustic model from http://sourceforge.net/projects/cmusphinx/files/Acoustic%20and%20Language%20Models/US%20English%20Generic%20Acoustic%20Model/cmusphinx-en-us-8khz-5.1.tar.gz/download
the output is very poor:
oh whoa i i i i got up look
Audio: pcm_s16le ([1][0][0][0] / 0x0001), 8000 Hz, 1 channels, s16, 128 kb/s
Last edit: salientvijay 2015-10-27
You need to set the sample rate in the recognizer.
The accuracy is not great:
i this is donny calling on behalf of the peak he gets a call back at your earliest convenience the number to call back in one eighty eight eighty eighteen two to six or you make out of embargo back of a cart he's much hurt times have a great being the boss