Hi,
I hope that somebody is able to help me.
I'm trying to create my own language and acoustic models. My idea was to build an acoustic model starting from a small corpus (12 sentences) and then verify that everything was OK by trying to transcribe the files of the corpus itself (I think that even if my model is very poor, it should at least be able to transcribe its own training material).
I've also created a language model with the CMU Toolkit, and the binary dump with lm3g2dmp.
Then I tried to transcribe files from the corpus and I get this error:
Exception in thread "main" java.lang.NullPointerException
at edu.cmu.sphinx.linguist.lextree.LexTreeLinguist.getInitialSearchState(LexTreeLinguist.java:341)
What I can understand from looking into the code is that the language model doesn't contain the "<s>" word, which is used for getting the initial search state.
But how can I put this word into the language model?
I don't know if the problem is clear...
The problem is in the method connectSingleUnitWords of the class HMMTree, where the variable initialNode is never initialized, so I get a NullPointerException.
Any idea?
Thanks
Mic
Many thanks again.
Training the language model on the transcription of the wav files gives me an accuracy of roughly 90% when transcribing the training material.
The next step will be collecting a richer corpus for training.
Now that I'm sure I'm able to create acoustic and language models, I'll study the theory behind speech recognition.
I have only one more question: when training the CD acoustic model (module 50.cd_hmm_tied of SphinxTrain) I get many errors of this kind:
time.4.1.norm.log:ERROR: "gauden.c", line 1700: var (mgau= 650, feat= 0, density=1, component=33) < 0
For my tests I'm using the CI model, so it's not a problem, but I'd like to understand why it's happening.
Mic
This has been discussed too many times; I think we should just fix this in the sources.
time.4.1.norm.log:ERROR: "gauden.c", line 1700: var (mgau= 650, feat= 0, density=1, component=33) < 0
You have too many senones for such a small amount of data. Once you have more data this error will disappear.
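If you want the CD pass to behave better on a tiny corpus in the meantime, the usual workaround is to shrink the model so there is enough data per parameter. A minimal sketch of the relevant lines in sphinx_train.cfg, with values that are purely illustrative for a 12-sentence corpus (check your own configuration for the actual variable names and defaults):
# sphinx_train.cfg -- illustrative values for a very small corpus
$CFG_N_TIED_STATES = 200;        # number of senones; typical setups use 1000 or more
$CFG_INITIAL_NUM_DENSITIES = 1;  # start from a single Gaussian per senone
$CFG_FINAL_NUM_DENSITIES = 2;    # split to only a few Gaussians when data is scarce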
To build a proper language model, use an online tool like lmtool:
http://www.speech.cs.cmu.edu/tools/lmtool.html
or share all your files so we can check where you made a mistake.
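For reference, an offline build with the CMU-Cambridge SLM toolkit plus lm3g2dmp (roughly what steps.txt should end up containing) looks more or less like the sketch below. The file names are placeholders, the input is a transcription with each line wrapped in <s> ... </s>, and the flags are written from memory, so double-check them against the toolkit documentation:
# sketch of an offline LM build (CMU-Cambridge SLM toolkit, flags from memory)
text2wfreq < text.lc.txt | wfreq2vocab > text.vocab
text2idngram -vocab text.vocab < text.lc.txt > text.idngram
idngram2lm -vocab_type 0 -idngram text.idngram -vocab text.vocab -arpa text.arpa
lm3g2dmp text.arpa .      # writes the binary DMP file that Sphinx can load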
By the way, does anyone know of a text corpus containing a small set of the most frequently used English words (perhaps about 500 words) and representing the most probable sentence combinations? I need it to build a language model which is supposed to help reject out-of-grammar utterances of an FSG grammar I use.
Sorry einsteinmic for hijacking your thread - the topic is close, and I didn't want to open yet another thread.
regards M.D.
What about just using a phone loop?
Here I am (again...)
I've uploaded all my files, in case somebody wants to check where the mistake is (probably it will be something very stupid :()
I've uploaded:
- language model http://research.eurixgroup.com/vigilante/timeLM.tgz (the steps.txt file contains the commands I've used for generating the language model)
- acoustic model http://research.eurixgroup.com/vigilante/TimeAM.tgz (contains the directory generated by SphinxTrain with the results of training)
- wav and mfc files http://research.eurixgroup.com/vigilante/Corpus.tgz
- jar containing the acoustic model for sphinx4 (http://research.eurixgroup.com/vigilante/ITA.jar)
- configuration file used http://research.eurixgroup.com/vigilante/config.xml
- jar used for the testing application http://research.eurixgroup.com/vigilante/TrascrittoreIta.jar (I've built it starting from the Transcriber demo, so it works in the same way: java -Xmx1024M -Xms512M bin/TrascrittoreIta.jar file.wav)
- train.sh file containing the steps done for training my acoustic model
I hope that someone can help me.
Thanks
Michele
PS It's Italian material
As a first step, in your steps.txt you should just add <s> and </s> before and after each sentence. You can use a simple script like this one to do that:
awk '{ print "<s>", $0, "</s>"}' < text.txt > text.lc.txt
or look into the simplelm.pl script, which does exactly that among other things. The result should look like:
<s> BON JOVI CANTA EVERYDAY </s>
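If the markers made it into the model, the \1-grams: section of the resulting ARPA file should contain entries for <s> and </s>, roughly like this (counts and log probabilities here are only illustrative):
\data\
ngram 1=14
ngram 2=30
ngram 3=28

\1-grams:
-99.0000 <s> -0.3010
-1.1461 </s> 0.0000
-1.4771 BON -0.3010
...
Sphinx4's LexTreeLinguist looks up that <s> entry when it builds the initial search state, which is why the NullPointerException from the first post disappears once the marker is present.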
Many thanks Nickolay, now it's working :) but I get a correctness of < 30% using the same audio files used for training the acoustic model :(
I don't know the theory behind speech recognition (are there simple technical articles on this subject that I can read?)
I expected better accuracy using the same material (more or less 100%), but probably I was wrong.
I think the problem could be caused by:
- errors in the parameters of the configuration file
- a very poor acoustic/language model
- not-so-good audio files (I've amplified them to have a peak amplitude at -3 dB and attached 0.1 - 0.2 seconds of silence at the beginning and at the end of each file)
Do you see any other problem?
My next step will be the creation of richer acoustic and language models.
Many thanks for your help.
Michele
First of all, you simply don't have enough data to decode such rich-vocabulary text. Your language model is completely meaningless for the few sentences you are trying to decode. Start with recognition of numbers, for example, and build the language model from your time_train.transcription text instead.
Once you have more data, you can proceed with a large vocabulary. Models are called "models" because they model the speech you are trying to decode, while you are trying to decode text with a model trained on completely different text.
Indeed, google for some tutorials on speech recognition. Read the HTK book, for example. Remember that speech recognition is not an easy area you can jump into after a few minutes of reading a popular article.