Hi, I have a question about SRILM. I trained a language model with SRILM on a
10 MB text corpus, and I can use it successfully with Sphinx4. When I train a
model on a 400 MB text corpus with SRILM and try to use it in Sphinx4, I get the
following exception:
Exception in thread "AWT-EventQueue-0" java.lang.ArrayIndexOutOfBoundsException: 0
    at edu.cmu.sphinx.linguist.lextree.HMMTree.collectEntryAndExitUnits(HMMTree.java:835)
    at edu.cmu.sphinx.linguist.lextree.HMMTree.compile(HMMTree.java:792)
    at edu.cmu.sphinx.linguist.lextree.HMMTree.<init>(HMMTree.java:716)
    at edu.cmu.sphinx.linguist.lextree.LexTreeLinguist.generateHmmTree(LexTreeLinguist.java:442)
    at edu.cmu.sphinx.linguist.lextree.LexTreeLinguist.compileGrammar(LexTreeLinguist.java:429)
    at edu.cmu.sphinx.linguist.lextree.LexTreeLinguist.allocate(LexTreeLinguist.java:343)
    at edu.cmu.sphinx.decoder.search.WordPruningBreadthFirstSearchManager.allocate(WordPruningBreadthFirstSearchManager.java:238)
    at edu.cmu.sphinx.decoder.AbstractDecoder.allocate(AbstractDecoder.java:87)
    at edu.cmu.sphinx.recognizer.Recognizer.allocate(Recognizer.java:168)
    at BatchMode.BatchMode.decode(BatchMode.java:37)
    at AraYuz.AraYuz.decodeButtonActionPerformed(AraYuz.java:340)
    at AraYuz.AraYuz.access$200(AraYuz.java:30)
    at AraYuz.AraYuz$3.actionPerformed(AraYuz.java:118)
    at javax.swing.AbstractButton.fireActionPerformed(AbstractButton.java:1995)
    at javax.swing.AbstractButton$Handler.actionPerformed(AbstractButton.java:2318)
    at javax.swing.DefaultButtonModel.fireActionPerformed(DefaultButtonModel.java:387)
    at javax.swing.DefaultButtonModel.setPressed(DefaultButtonModel.java:242)
    at javax.swing.plaf.basic.BasicButtonListener.mouseReleased(BasicButtonListener.java:236)
    at java.awt.Component.processMouseEvent(Component.java:6288)
    at javax.swing.JComponent.processMouseEvent(JComponent.java:3267)
    at java.awt.Component.processEvent(Component.java:6053)
    at java.awt.Container.processEvent(Container.java:2041)
    at java.awt.Component.dispatchEventImpl(Component.java:4651)
    at java.awt.Container.dispatchEventImpl(Container.java:2099)
    at java.awt.Component.dispatchEvent(Component.java:4481)
    at java.awt.LightweightDispatcher.retargetMouseEvent(Container.java:4577)
    at java.awt.LightweightDispatcher.processMouseEvent(Container.java:4238)
    at java.awt.LightweightDispatcher.dispatchEvent(Container.java:4168)
    at java.awt.Container.dispatchEventImpl(Container.java:2085)
    at java.awt.Window.dispatchEventImpl(Window.java:2478)
    at java.awt.Component.dispatchEvent(Component.java:4481)
    at java.awt.EventQueue.dispatchEventImpl(EventQueue.java:643)
    at java.awt.EventQueue.access$000(EventQueue.java:84)
    at java.awt.EventQueue$1.run(EventQueue.java:602)
    at java.awt.EventQueue$1.run(EventQueue.java:600)
    at java.security.AccessController.doPrivileged(Native Method)
    at java.security.AccessControlContext$1.doIntersectionPrivilege(AccessControlContext.java:87)
    at java.security.AccessControlContext$1.doIntersectionPrivilege(AccessControlContext.java:98)
    at java.awt.EventQueue$2.run(EventQueue.java:616)
    at java.awt.EventQueue$2.run(EventQueue.java:614)
    at java.security.AccessController.doPrivileged(Native Method)
    at java.security.AccessControlContext$1.doIntersectionPrivilege(AccessControlContext.java:87)
    at java.awt.EventQueue.dispatchEvent(EventQueue.java:613)
    at java.awt.EventDispatchThread.pumpOneEventForFilters(EventDispatchThread.java:269)
    at java.awt.EventDispatchThread.pumpEventsForFilter(EventDispatchThread.java:184)
    at java.awt.EventDispatchThread.pumpEventsForHierarchy(EventDispatchThread.java:174)
    at java.awt.EventDispatchThread.pumpEvents(EventDispatchThread.java:169)
    at java.awt.EventDispatchThread.pumpEvents(EventDispatchThread.java:161)
    at java.awt.EventDispatchThread.run(EventDispatchThread.java:122)
Please help me solve this problem. Thanks.
This problem is caused by a mismatch in acoustic units between the
pronunciation dictionary and the acoustic model. You need to fix your
pronunciation dictionary.
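A minimal Java sketch of how such a mismatch could be located: it checks every phone used in the dictionary against the set of phones the acoustic model defines. It assumes the usual Sphinx dictionary layout of one WORD PH1 PH2 ... entry per line and assumes you have already extracted the model's phone list yourself; DictPhoneCheck is a hypothetical name, not a Sphinx4 class. Because ArrayIndexOutOfBoundsException: 0 can only occur when index 0 of an empty array is read, the sketch also flags entries that have no phones at all.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Set;
import java.util.TreeSet;

/**
 * Hypothetical helper: reports dictionary entries whose pronunciation is empty
 * or uses a phone that is not in the acoustic model's phone set.
 * Assumes the Sphinx-style layout "WORD PH1 PH2 ..." per line.
 */
class DictPhoneCheck {

    /** Returns one message per problematic dictionary line, in input order. */
    static List<String> findMismatches(List<String> dictLines, Set<String> modelPhones) {
        List<String> problems = new ArrayList<>();
        for (String line : dictLines) {
            String trimmed = line.trim();
            if (trimmed.isEmpty()) {
                continue; // skip blank lines
            }
            String[] fields = trimmed.split("\\s+");
            String word = fields[0];
            if (fields.length == 1) {
                // A headword with no phones at all.
                problems.add(word + ": empty pronunciation");
                continue;
            }
            // Collect (sorted) every phone the model does not know about.
            Set<String> unknown = new TreeSet<>();
            for (int i = 1; i < fields.length; i++) {
                if (!modelPhones.contains(fields[i])) {
                    unknown.add(fields[i]);
                }
            }
            if (!unknown.isEmpty()) {
                problems.add(word + ": unknown phones " + unknown);
            }
        }
        return problems;
    }

    public static void main(String[] args) {
        // Toy phone set; in practice it would come from the acoustic model.
        Set<String> phones = new TreeSet<>(Arrays.asList("SIL", "AA", "B", "K", "S"));
        List<String> dict = Arrays.asList(
                "BAK B AA K",
                "SAS S AA S",
                "XYZ KS AA Z",
                "EMPTY");
        for (String problem : findMismatches(dict, phones)) {
            System.out.println(problem);
        }
        // prints:
        // XYZ: unknown phones [KS, Z]
        // EMPTY: empty pronunciation
    }
}
```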
Nickolay, thanks for your reply. Does Sphinx have any limitation on the text
corpus size for language model training? I don't have any problem when I use a
small training corpus.
There is no such limitation.
You don't see this problem with the small corpus because the word that causes
the error does not occur in it. The large corpus contains the words that cause
the error.
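One way to narrow down those words is to list the unigrams that appear in the large model but not in the small one. The sketch below is a minimal Java illustration, assuming both models are ARPA files whose vocabulary sits in the \1-grams: section with the word in the second field; NewVocabulary is a hypothetical helper, not part of SRILM or Sphinx4.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

/**
 * Hypothetical helper: lists words that occur in the unigram section of a
 * large ARPA model but not in a small one.
 */
class NewVocabulary {

    /** Collects the words of the \1-grams: section, in file order. */
    static Set<String> readUnigrams(Iterable<String> arpaLines) {
        Set<String> words = new LinkedHashSet<>();
        boolean inUnigrams = false;
        for (String raw : arpaLines) {
            String line = raw.trim();
            if (line.equals("\\1-grams:")) {
                inUnigrams = true;
            } else if (inUnigrams) {
                if (line.isEmpty() || line.startsWith("\\")) {
                    break; // end of the unigram section
                }
                String[] f = line.split("\\s+");
                if (f.length >= 2) {
                    words.add(f[1]); // field 0 is the log10 probability
                }
            }
        }
        return words;
    }

    /** Words of the large model that the small model does not contain. */
    static List<String> newWords(Set<String> large, Set<String> small) {
        List<String> result = new ArrayList<>();
        for (String w : large) {
            if (!small.contains(w)) {
                result.add(w);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        List<String> small = Arrays.asList(
                "\\1-grams:", "-1.0 <s> -0.5", "-1.2 BAK -0.3", "-0.9 </s>", "", "\\end\\");
        List<String> large = Arrays.asList(
                "\\1-grams:", "-1.0 <s> -0.5", "-1.2 BAK -0.3", "-1.5 XYZ -0.2",
                "-0.9 </s>", "", "\\end\\");
        System.out.println(newWords(readUnigrams(large), readUnigrams(small)));
        // prints: [XYZ]
    }
}
```

For a large model the lines do not have to be loaded into memory: a java.io.BufferedReader can be passed as reader.lines()::iterator. The dictionary entries of the words this reports are a natural place to look for the acoustic unit mismatch.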
Nickolay, thanks for your reply. I am trying to train an LM on a 1 GB text
file. I use SRILM to produce an ARPA file, then sort it with sphinx_lm_sort.
After sorting, I use sphinx_lm_convert to convert it to binary format, but I
receive the following error. Please help me. Thanks.
INFO: cmd_ln.c(559): Parsing command line:
sphinx_lm_convert \
-i sortdilmodeli.arpa \
-o dml.lm.DMP
Current configuration:
-case
-debug 0
-help no no
-i sortdilmodeli.arpa
-ienc
-ifmt
-logbase 1.0001 1.000100e+00
-mmap no no
-o dml.lm.DMP
-oenc utf8 utf8
-ofmt
INFO: ngram_model_arpa.c(477): ngrams 1=120702, 2=5261809, 3=5303598
INFO: ngram_model_arpa.c(135): Reading unigrams
INFO: ngram_model_arpa.c(516): 120702 = #unigrams created
INFO: ngram_model_arpa.c(195): Reading bigrams
.................................................
ERROR: "ngram_model_arpa.c", line 253: Bigrams not in unigram order
ERROR: "ngram_model_dmp.c", line 121: Wrong magic header size number a5c6461: sortdilmodeli.arpa is not a dump file
Segmentation fault
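The first error suggests that the loader expects the bigrams of an ARPA file to be grouped by their first word, in the same order as the unigram list. Assuming that is the rule (the actual check in ngram_model_arpa.c may be stricter, for example about the order of second words), a minimal Java sketch like the hypothetical BigramOrderCheck below could locate the first offending bigram in the sorted file.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/**
 * Hypothetical checker: reports the first bigram whose first word appears
 * earlier in the unigram list than the first word of the preceding bigram.
 * Assumes bigrams must be grouped by first word in unigram order.
 */
class BigramOrderCheck {

    /** Returns a description of the first problem found, or null if none. */
    static String firstOutOfOrder(Iterable<String> arpaLines) {
        Map<String, Integer> unigramIndex = new HashMap<>();
        String section = "";
        int prevIndex = -1;
        for (String raw : arpaLines) {
            String line = raw.trim();
            if (line.startsWith("\\")) {
                section = line; // e.g. "\data\", "\1-grams:", "\2-grams:"
                continue;
            }
            if (line.isEmpty()) {
                continue;
            }
            String[] f = line.split("\\s+");
            if (section.equals("\\1-grams:") && f.length >= 2) {
                // Remember the position of each word in the unigram list.
                unigramIndex.putIfAbsent(f[1], unigramIndex.size());
            } else if (section.equals("\\2-grams:") && f.length >= 3) {
                Integer idx = unigramIndex.get(f[1]);
                if (idx == null) {
                    return "bigram uses unknown word: " + f[1] + " " + f[2];
                }
                if (idx < prevIndex) {
                    return "bigram out of unigram order: " + f[1] + " " + f[2];
                }
                prevIndex = idx;
            }
        }
        return null;
    }

    public static void main(String[] args) {
        List<String> arpa = Arrays.asList(
                "\\data\\", "ngram 1=3", "ngram 2=2", "",
                "\\1-grams:", "-1.0 A -0.1", "-1.0 B -0.1", "-1.0 C -0.1", "",
                "\\2-grams:", "-0.5 B C", "-0.5 A C", "",
                "\\end\\");
        System.out.println(firstOutOfOrder(arpa));
        // prints: bigram out of unigram order: A C
    }
}
```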
Hello,
I do not think that sphinx_lm_convert supports more than 64k unigrams; your
model is too large. There is a LIUM branch in our sources that allows you to
use bigger language models, but you need to compile it separately.
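As a rough check of the size issue, the unigram count can be read from the \data\ header of the ARPA file before converting. The sketch assumes a limit of 65,535 words, i.e. that the binary format stores word IDs as unsigned 16-bit values; that constant is an assumption for illustration, not a documented sphinx_lm_convert setting, and UnigramLimitCheck is a hypothetical name. With the count from the log above (ngram 1=120702) the check fails.

```java
import java.util.Arrays;
import java.util.List;

/**
 * Hypothetical check: reads the unigram count from the \data\ header of an
 * ARPA file and compares it with an assumed 16-bit word ID limit.
 */
class UnigramLimitCheck {

    // Assumption: the binary format stores word IDs in 16 bits.
    static final int ASSUMED_MAX_WORDS = 65535;

    /** Returns the N of the "ngram 1=N" header line, or -1 if it is absent. */
    static long unigramCount(Iterable<String> arpaLines) {
        for (String raw : arpaLines) {
            String line = raw.trim();
            if (line.startsWith("ngram 1=")) {
                return Long.parseLong(line.substring("ngram 1=".length()).trim());
            }
            if (line.equals("\\1-grams:")) {
                break; // the header is over
            }
        }
        return -1;
    }

    public static void main(String[] args) {
        List<String> header = Arrays.asList(
                "\\data\\", "ngram 1=120702", "ngram 2=5261809", "ngram 3=5303598", "");
        long n = unigramCount(header);
        System.out.println(n + " unigrams, fits: " + (n <= ASSUMED_MAX_WORDS));
        // prints: 120702 unigrams, fits: false
    }
}
```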
To discuss a new issue, please start a new thread. Please use the Help forum
to ask for help with language model utilities; this forum is about Sphinx4.