I've already finished my project.
The exploding likelihood explained 2 months ago came from bad sound files and could not be repaired. So I had to switch to another corpus and lo and behold ;-) it works fine :-)
But another problem occurs now when building my own language model and integrating it into my system. I couldn't find a solution on the net, in the docs, or in the forum here. I first tried to build the binary LM as explained in the SLMT (statistical language modelling toolkit) docs, and it didn't work, so I tried the ARPA version. The one explained in the sphinx4 docs doesn't work either.
I get error messages like "Bad format" or something similar... lm3g2dmp doesn't help either: first, it can't handle case sensitivity, and second, it doesn't produce the right format. So why do we have it??
Could someone who has successfully built and integrated their own LM give some helpful tips?
Thanks a lot,
Sebastian
So, I decided to build an LM with only lowercase words and tried to build the DMP file with the lm3g2dmp tool. The result is that I get a NullPointerException when starting the speech recognition, with the following message:
java.lang.NullPointerException
at edu.cmu.sphinx.linguist.lextree.LexTreeLinguist.getInitialSearchState(LexTreeLinguist.java:461)
Does anyone have an idea what this could be?
I'm getting the same problem - the problem is definitely with the toolkit:
I put my text corpus through the online QuickLM tool (this small corpus has 660 words) and that one works for me.
When I run that exact same corpus through the CMU SLMT, I get the null pointer exception, specifically:
Exception in thread "main" java.lang.NullPointerException
at edu.cmu.sphinx.linguist.lextree.LexTreeLinguist.getInitialSearchState(LexTreeLinguist.java:461)
at edu.cmu.sphinx.linguist.lextree.LexTreeLinguist.compileGrammar(LexTreeLinguist.java:487)
at edu.cmu.sphinx.linguist.lextree.LexTreeLinguist.allocate(LexTreeLinguist.java:406)
at edu.cmu.sphinx.decoder.search.WordPruningBreadthFirstSearchManager.allocate(WordPruningBreadthFirstSearchManager.java:323)
at edu.cmu.sphinx.decoder.Decoder.allocate(Decoder.java:109)
at edu.cmu.sphinx.recognizer.Recognizer.allocate(Recognizer.java:182)
====================================================================
What I noticed is three things:
1. the working LM has all units in UPPERCASE and the non-working LM has all units in lowercase
2. the working LM has entries for silence tags - <S> and </S> and the non-working LM does not.
3. the working LM has no entry for the unknown tag - <UNK> and the non-working LM does.
I need to get the CMU SLMT working - I have a corpus of 5104 words ready to be tested and I can't build the LM using the QuickLM for that many words.
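The three differences above can be checked mechanically on any ARPA-format .lm file. The following is a minimal sketch, assuming the standard ARPA layout in which each line of the `\1-grams:` section is a log probability, a word, and an optional backoff weight. The function names `read_unigrams` and `summarize_vocabulary` are made up for this illustration and are not part of the CMU SLMT or Sphinx-4.

```python
def read_unigrams(arpa_lines):
    """Return the words listed in the 1-grams section of an ARPA LM."""
    words = []
    in_unigrams = False
    for raw in arpa_lines:
        line = raw.strip()
        if line == "\\1-grams:":
            in_unigrams = True
            continue
        if in_unigrams:
            # Any other backslash line (\2-grams:, \end\) closes the section.
            if line.startswith("\\"):
                break
            fields = line.split()
            # Expected fields: log10 probability, word, optional backoff.
            if len(fields) >= 2:
                words.append(fields[1])
    return words


def summarize_vocabulary(words):
    """Report the three properties compared in the post above."""
    vocab = set(words)
    lowered = {w.lower() for w in vocab}
    special = {"<s>", "</s>", "<unk>"}
    plain = [w for w in vocab if w.lower() not in special]
    return {
        "has_sentence_start": "<s>" in lowered,
        "has_sentence_end": "</s>" in lowered,
        "has_unknown": "<unk>" in lowered,
        "all_uppercase": bool(plain) and all(w == w.upper() for w in plain),
        "all_lowercase": bool(plain) and all(w == w.lower() for w in plain),
    }
```

To use it on a real file, pass the open file object, for example `summarize_vocabulary(read_unigrams(open("my.lm")))`, where `my.lm` is a placeholder path. Running it on both the working and the non-working LM gives a side-by-side view of the differences.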
I'm having similar problems. I've tried to build my own language model using the CMU SLMT, but Sphinx-4 complained about an unexpected EOF in the .lm file.
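Whether this is the cause of the EOF error reported here is not known, but one thing worth ruling out is a .lm file whose header promises more n-grams than the file contains, or one that lacks the closing `\end\` line. Below is a rough sketch of such a check, assuming the standard ARPA layout (`ngram N=count` header lines, `\N-grams:` section markers, `\end\` at the end). `check_arpa_counts` is a hypothetical helper name, not a tool from the CMU SLMT or Sphinx-4.

```python
import re


def check_arpa_counts(arpa_lines):
    """Compare the n-gram counts declared in the header with the entries found."""
    declared = {}   # order -> count from "ngram N=count" header lines
    actual = {}     # order -> number of entries seen in "\N-grams:" sections
    current = None  # order of the section currently being read
    saw_end = False
    for raw in arpa_lines:
        line = raw.strip()
        header = re.fullmatch(r"ngram\s+(\d+)\s*=\s*(\d+)", line)
        if header and current is None:
            declared[int(header.group(1))] = int(header.group(2))
            continue
        section = re.fullmatch(r"\\(\d+)-grams:", line)
        if section:
            current = int(section.group(1))
            actual[current] = 0
            continue
        if line == "\\end\\":
            saw_end = True
            break
        # Every non-blank line inside a section counts as one n-gram entry.
        if current is not None and line:
            actual[current] += 1
    problems = []
    if not saw_end:
        problems.append("missing \\end\\ marker")
    for order, count in sorted(declared.items()):
        found = actual.get(order, 0)
        if found != count:
            problems.append(f"{order}-grams: header says {count}, file has {found}")
    return problems
```

An empty list means the header and the body agree. Anything else points at a truncated or inconsistent file.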
FOUND IT - You need to have entries for the silence tags (the sentence-start and sentence-end markers) ... <s> and </s> ...
I added the silence tags to my text corpus (at the beginning and end of each sentence) and the LM I created using the CMU SLMT worked just fine.
Good Luck on yours.
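The fix described above, putting the tags at the beginning and end of each sentence, can be scripted for a larger corpus. Here is a minimal sketch, assuming the corpus has one sentence per line. The function name `tag_sentences` is made up for this example and is not part of the toolkit.

```python
def tag_sentences(lines):
    """Wrap each non-empty corpus line in <s> ... </s> markers."""
    tagged = []
    for line in lines:
        # Collapse runs of whitespace and drop the trailing newline.
        text = " ".join(line.split())
        if not text:
            continue  # skip blank lines
        # Leave lines alone if they already carry both markers.
        if text.startswith("<s>") and text.endswith("</s>"):
            tagged.append(text)
        else:
            tagged.append(f"<s> {text} </s>")
    return tagged
```

For example, `tag_sentences(["open the door", "close the window"])` gives `["<s> open the door </s>", "<s> close the window </s>"]`. The result can be written back out, one sentence per line, before running the toolkit on it.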
No one has any idea? Does no one have experience with the CMU SLMT? I can't believe that!