Hi,
I have trained a new acoustic model using SphinxTrain. The database used for training is very small: it has only 17 words in its vocabulary (including fillers), and the total duration of speech is only 16 seconds.
For this I picked senone counts more or less at random (100, 200, 300 and 500; I tried all of these), as I have no clear idea of the senone concept.
When I use the acoustic model trained with the above configuration, I get runtime exceptions. Please find the error details below:
~~~~~~~
java.lang.ArrayIndexOutOfBoundsException: 39
    at edu.cmu.sphinx.linguist.acoustic.tiedstate.MixtureComponent.getScore(MixtureComponent.java:195)
    at edu.cmu.sphinx.linguist.acoustic.tiedstate.GaussianMixture.calculateScore(GaussianMixture.java:130)
    at edu.cmu.sphinx.linguist.acoustic.tiedstate.ScoreCachingSenone.getScore(ScoreCachingSenone.java:40)
    at edu.cmu.sphinx.linguist.acoustic.tiedstate.SenoneHMMState.getScore(SenoneHMMState.java:85)
    at edu.cmu.sphinx.linguist.flat.HMMStateState.getScore(HMMStateState.java:85)
    at edu.cmu.sphinx.decoder.search.Token.calculateScore(Token.java:177)
    at edu.cmu.sphinx.decoder.scorer.SimpleAcousticScorer.doScoring(SimpleAcousticScorer.java:164)
    at edu.cmu.sphinx.decoder.scorer.ThreadedAcousticScorer.doScoring(ThreadedAcousticScorer.java:198)
    at edu.cmu.sphinx.decoder.scorer.SimpleAcousticScorer.calculateScores(SimpleAcousticScorer.java:87)
    at edu.cmu.sphinx.decoder.search.SimpleBreadthFirstSearchManager.scoreTokens(SimpleBreadthFirstSearchManager.java:363)
    at edu.cmu.sphinx.decoder.search.SimpleBreadthFirstSearchManager.recognize(SimpleBreadthFirstSearchManager.java:293)
    at edu.cmu.sphinx.decoder.search.SimpleBreadthFirstSearchManager.recognize(SimpleBreadthFirstSearchManager.java:225)
    at edu.cmu.sphinx.decoder.Decoder.decode(Decoder.java:65)
    at edu.cmu.sphinx.recognizer.Recognizer.recognize(Recognizer.java:110)
    at edu.cmu.sphinx.recognizer.Recognizer.recognize(Recognizer.java:126)
    at voicecommand.ListenerThread.run(ListenerThread.java:64)
    at java.lang.Thread.run(Thread.java:722)
~~~~~~~
The same set of errors appears 17 times in a row.
Please help me.
You need to provide the acoustic model training folder in order to get help on this issue.
You also need to provide information about the sphinx4 version you are using and how exactly you used it.
You can pack the files into a single archive and share them through Dropbox.
I am using Sphinx4-1.0beta6.
I have shared the files; please find them here.
https://drive.google.com/file/d/0B25RAqomLW2nOVlkUlhJUk95QTA/edit?usp=sharing
Your files cannot be accessed; they require permission.
I am sorry. You can access them now.
https://drive.google.com/file/d/0B25RAqomLW2nYUNkVXBBUzBWRG8/edit?usp=sharing
You need to provide the whole folder, including logs, trained models and decoding results, not just the input files.
You also need to provide information on how exactly you used sphinx4.
Yes, I did that. Here is the link:
https://drive.google.com/file/d/0B25RAqomLW2nTGJDN0pEWGRnb2s/edit?usp=sharing
I am using the Sphinx4 jar file along with an acoustic model jar file in my application. Sphinx4 works for me when I use it with the default acoustic model provided by CMU Sphinx, i.e. WSJ_8gau_13dCep_16k_40mel_130Hz_6800Hz.
You need to provide information on how exactly you used sphinx4: what changes you made in the code, which demo you used, and how you modified it.
In my application I'm using a grammar file and a wav file as the audio source.
It can be described as a mixture of 'HelloWorld' and 'LatticeDemo': I have combined the code from both apps in order to use a grammar file and a wav file in a single app.
You need to share your modified code.
Here I have shared my modified code used for recognition.
https://drive.google.com/file/d/0B25RAqomLW2nZnNKYUttUjJwd1k/edit?usp=sharing
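In outline, the combined app follows the usual Sphinx4-1.0beta6 pattern from the demos. This is just a simplified sketch, not my exact code; the component names ("recognizer", "audioFileDataSource") and the file names are assumptions that must match the ones in my config.xml:

```java
// Sketch of the combined HelloWorld/LatticeDemo pattern for Sphinx4-1.0beta6.
// Component names and file names below are placeholders; they must match
// the names defined in the XML configuration file.
import java.net.URL;
import edu.cmu.sphinx.frontend.util.AudioFileDataSource;
import edu.cmu.sphinx.recognizer.Recognizer;
import edu.cmu.sphinx.result.Result;
import edu.cmu.sphinx.util.props.ConfigurationManager;

public class GrammarFileDemo {
    public static void main(String[] args) throws Exception {
        // Load the XML configuration (grammar, frontend, acoustic model).
        ConfigurationManager cm = new ConfigurationManager(
                GrammarFileDemo.class.getResource("config.xml"));

        Recognizer recognizer = (Recognizer) cm.lookup("recognizer");
        recognizer.allocate();

        // Feed the wav file to the frontend instead of the microphone.
        AudioFileDataSource dataSource =
                (AudioFileDataSource) cm.lookup("audioFileDataSource");
        dataSource.setAudioFile(new URL("file:test.wav"), null);

        // Decode utterances until the file is exhausted.
        Result result;
        while ((result = recognizer.recognize()) != null) {
            System.out.println(result.getBestFinalResultNoFiller());
        }
        recognizer.deallocate();
    }
}
```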
Please help me in sorting out this problem.
When I tested with LatticeDemo and HelloWorld separately, they worked and I did not find any errors. So I think something was wrong with my code.
Thank you Nickolay.
Now the problem is: when I run with LatticeDemo (i.e. using an audio file), it gives me the result text, but when I speak through the microphone (i.e. using the HelloWorld app), it does not give me the text back. Why is that?
Also, what is the best senone count for training an acoustic model from my database?
The problem with your current code is that you incorrectly modified the config file, in particular the frontend component:
~~~~~~~
<component name="epFrontEnd" type="edu.cmu.sphinx.frontend.FrontEnd">
    <propertylist name="pipeline">
        <item>microphone</item>
        <item>dataBlocker</item>
        <item>speechClassifier</item>
        <item>speechMarker</item>
        <item>nonSpeechDataFilter</item>
        <item>preemphasizer</item>
        <item>windower</item>
        <item>fft</item>
        <item>melFilterBank</item>
        <item>dct</item>
        <item>liveCMN</item>
        <item>featureExtraction</item>
        <item>audioFileDataSource</item>
    </propertylist>
</component>
~~~~~~~
The order of the elements is important: you shouldn't put audioFileDataSource in the last position. Instead, you should replace the microphone component with audioFileDataSource.
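For file-based decoding, the corrected frontend would look something like this (the same components as above, with audioFileDataSource replacing microphone at the head of the pipeline):

~~~~~~~
<component name="epFrontEnd" type="edu.cmu.sphinx.frontend.FrontEnd">
    <propertylist name="pipeline">
        <item>audioFileDataSource</item>
        <item>dataBlocker</item>
        <item>speechClassifier</item>
        <item>speechMarker</item>
        <item>nonSpeechDataFilter</item>
        <item>preemphasizer</item>
        <item>windower</item>
        <item>fft</item>
        <item>melFilterBank</item>
        <item>dct</item>
        <item>liveCMN</item>
        <item>featureExtraction</item>
    </propertylist>
</component>
~~~~~~~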
Anyway, if configuration files are too complicated for you, you can use the latest sphinx4-5prealpha API, which doesn't require any config files and should work for you out of the box. Please see the following for details:
http://cmusphinx.sourceforge.net/wiki/tutorialsphinx4
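With that API, decoding a wav file against your own model and grammar looks roughly like the sketch below. The model, dictionary and grammar paths are placeholders that you must adapt to your files:

```java
// Sketch of the sphinx4-5prealpha high-level API (no XML config files).
// All paths below are placeholders for your own model, dictionary and grammar.
import java.io.FileInputStream;
import edu.cmu.sphinx.api.Configuration;
import edu.cmu.sphinx.api.SpeechResult;
import edu.cmu.sphinx.api.StreamSpeechRecognizer;

public class PrealphaDemo {
    public static void main(String[] args) throws Exception {
        Configuration configuration = new Configuration();
        configuration.setAcousticModelPath("file:my_model");     // your trained model folder
        configuration.setDictionaryPath("file:my_model/my.dic"); // your dictionary
        configuration.setGrammarPath("file:grammar");            // folder containing .gram files
        configuration.setGrammarName("mygrammar");               // grammar name without .gram
        configuration.setUseGrammar(true);

        // Decode a wav file; LiveSpeechRecognizer works the same way
        // for microphone input.
        StreamSpeechRecognizer recognizer =
                new StreamSpeechRecognizer(configuration);
        recognizer.startRecognition(new FileInputStream("test.wav"));
        SpeechResult result;
        while ((result = recognizer.getResult()) != null) {
            System.out.println(result.getHypothesis());
        }
        recognizer.stopRecognition();
    }
}
```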
Thank you so much Nickolay.
Now the problem is that I am not getting the result text (i.e. the recognized text) with the acoustic model I built. Is it because of the small amount of data used for training the acoustic model, or because of some wrong configuration (like the number of tied states or densities) I made during the training process?
Yes
What would be the minimum amount of data (in hours) required to build a good acoustic model?
Is there any problem with choosing random numbers for the tied_states and density counts? If so, how can I choose proper values for them?
You can find the answers to both questions in the acoustic model training tutorial:
http://cmusphinx.sourceforge.net/wiki/tutorialam