
Using SRILM with Sphinx4

Help
2012-02-02
2012-09-22
  • Behnam Asefi

    Behnam Asefi - 2012-02-02

    Hi, I have a question about SRILM. I trained a language model on a 10 MB
    text corpus with SRILM and can use it successfully with Sphinx4. When I
    train on a 400 MB text corpus with SRILM and try to use the model in
    Sphinx4, I receive an exception like this:

    Exception in thread "AWT-EventQueue-0" java.lang.ArrayIndexOutOfBoundsException: 0
        at edu.cmu.sphinx.linguist.lextree.HMMTree.collectEntryAndExitUnits(HMMTree.java:835)
        at edu.cmu.sphinx.linguist.lextree.HMMTree.compile(HMMTree.java:792)
        at edu.cmu.sphinx.linguist.lextree.HMMTree.<init>(HMMTree.java:716)
        at edu.cmu.sphinx.linguist.lextree.LexTreeLinguist.generateHmmTree(LexTreeLinguist.java:442)
        at edu.cmu.sphinx.linguist.lextree.LexTreeLinguist.compileGrammar(LexTreeLinguist.java:429)
        at edu.cmu.sphinx.linguist.lextree.LexTreeLinguist.allocate(LexTreeLinguist.java:343)
        at edu.cmu.sphinx.decoder.search.WordPruningBreadthFirstSearchManager.allocate(WordPruningBreadthFirstSearchManager.java:238)
        at edu.cmu.sphinx.decoder.AbstractDecoder.allocate(AbstractDecoder.java:87)
        at edu.cmu.sphinx.recognizer.Recognizer.allocate(Recognizer.java:168)
        at BatchMode.BatchMode.decode(BatchMode.java:37)
        at AraYuz.AraYuz.decodeButtonActionPerformed(AraYuz.java:340)
        at AraYuz.AraYuz.access$200(AraYuz.java:30)
        at AraYuz.AraYuz$3.actionPerformed(AraYuz.java:118)
        at javax.swing.AbstractButton.fireActionPerformed(AbstractButton.java:1995)
        at javax.swing.AbstractButton$Handler.actionPerformed(AbstractButton.java:2318)
        at javax.swing.DefaultButtonModel.fireActionPerformed(DefaultButtonModel.java:387)
        at javax.swing.DefaultButtonModel.setPressed(DefaultButtonModel.java:242)
        at javax.swing.plaf.basic.BasicButtonListener.mouseReleased(BasicButtonListener.java:236)
        at java.awt.Component.processMouseEvent(Component.java:6288)
        at javax.swing.JComponent.processMouseEvent(JComponent.java:3267)
        at java.awt.Component.processEvent(Component.java:6053)
        at java.awt.Container.processEvent(Container.java:2041)
        at java.awt.Component.dispatchEventImpl(Component.java:4651)
        at java.awt.Container.dispatchEventImpl(Container.java:2099)
        at java.awt.Component.dispatchEvent(Component.java:4481)
        at java.awt.LightweightDispatcher.retargetMouseEvent(Container.java:4577)
        at java.awt.LightweightDispatcher.processMouseEvent(Container.java:4238)
        at java.awt.LightweightDispatcher.dispatchEvent(Container.java:4168)
        at java.awt.Container.dispatchEventImpl(Container.java:2085)
        at java.awt.Window.dispatchEventImpl(Window.java:2478)
        at java.awt.Component.dispatchEvent(Component.java:4481)
        at java.awt.EventQueue.dispatchEventImpl(EventQueue.java:643)
        at java.awt.EventQueue.access$000(EventQueue.java:84)
        at java.awt.EventQueue$1.run(EventQueue.java:602)
        at java.awt.EventQueue$1.run(EventQueue.java:600)
        at java.security.AccessController.doPrivileged(Native Method)
        at java.security.AccessControlContext$1.doIntersectionPrivilege(AccessControlContext.java:87)
        at java.security.AccessControlContext$1.doIntersectionPrivilege(AccessControlContext.java:98)
        at java.awt.EventQueue$2.run(EventQueue.java:616)
        at java.awt.EventQueue$2.run(EventQueue.java:614)
        at java.security.AccessController.doPrivileged(Native Method)
        at java.security.AccessControlContext$1.doIntersectionPrivilege(AccessControlContext.java:87)
        at java.awt.EventQueue.dispatchEvent(EventQueue.java:613)
        at java.awt.EventDispatchThread.pumpOneEventForFilters(EventDispatchThread.java:269)
        at java.awt.EventDispatchThread.pumpEventsForFilter(EventDispatchThread.java:184)
        at java.awt.EventDispatchThread.pumpEventsForHierarchy(EventDispatchThread.java:174)
        at java.awt.EventDispatchThread.pumpEvents(EventDispatchThread.java:169)
        at java.awt.EventDispatchThread.pumpEvents(EventDispatchThread.java:161)
        at java.awt.EventDispatchThread.run(EventDispatchThread.java:122)

    Please help me solve this problem. Thanks.

     
  • Nickolay V. Shmyrev

    This problem is caused by a mismatch in acoustic units between the
    pronunciation dictionary and the acoustic model. You need to fix your
    pronunciation dictionary.
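
A minimal sketch of how such a mismatch could be located, assuming a plain-text dictionary with one `WORD PH1 PH2 ...` entry per line and a known set of phones supported by the acoustic model. The function name `find_unknown_phones` and the sample data are hypothetical; this is not part of Sphinx4.

```python
def find_unknown_phones(dict_lines, model_phones):
    """Map each suspicious word to the phones the acoustic model lacks.

    An entry with no phones at all is reported with an empty list.
    """
    problems = {}
    for line in dict_lines:
        fields = line.split()
        if not fields:
            continue  # skip blank lines
        word, phones = fields[0], fields[1:]
        unknown = sorted(set(phones) - model_phones)
        if unknown or not phones:
            problems[word] = unknown
    return problems


# Hypothetical sample data, for illustration only.
sample_dict = [
    "merhaba M E R H A B A",
    "dunya D UE N Y A",
    "test T E S T",
    "bos",
]
sample_phones = {"M", "E", "R", "H", "A", "B", "D", "N", "Y", "T", "S"}

print(find_unknown_phones(sample_dict, sample_phones))
```

Any word this reports is a candidate for the entry that needs fixing in the dictionary.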

     
  • Behnam Asefi

    Behnam Asefi - 2012-02-02

    Nickolay, thanks for your reply. Does Sphinx have any limitation on text
    corpus size for language model training? I ask because I don't have any
    problem when I use a small training corpus.

     
  • Nickolay V. Shmyrev

    There is no limitation.

    You don't see this problem with the small corpus because the word that
    causes the error doesn't occur in it. The large corpus contains the words
    that cause the errors.
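
One way to narrow down which words only the larger model brings in is to list the unigrams of the ARPA file and compare them with the dictionary. The sketch below assumes the standard ARPA layout (a `\1-grams:` section whose lines are `logprob word [backoff]`); the helper names and sample data are hypothetical.

```python
def arpa_unigrams(lines):
    """Return the words listed in the \\1-grams: section of an ARPA file."""
    words = []
    in_unigrams = False
    for raw in lines:
        line = raw.strip()
        if line.startswith("\\"):
            # Section markers such as \data\, \1-grams:, \2-grams:, \end\
            in_unigrams = (line == "\\1-grams:")
            continue
        fields = line.split()
        if in_unigrams and len(fields) >= 2:
            words.append(fields[1])  # fields[0] is the log probability
    return words


def words_missing_from_dictionary(arpa_lines, dict_words):
    """List LM words (except sentence markers) with no dictionary entry."""
    markers = {"<s>", "</s>", "<unk>"}
    return [w for w in arpa_unigrams(arpa_lines)
            if w not in markers and w not in dict_words]


# Hypothetical sample data, for illustration only.
sample_arpa = [
    "\\data\\",
    "ngram 1=4",
    "",
    "\\1-grams:",
    "-1.0 <s> -0.3",
    "-0.8 merhaba -0.3",
    "-1.2 zzyzx -0.3",
    "-1.0 </s>",
    "",
    "\\end\\",
]

print(words_missing_from_dictionary(sample_arpa, {"merhaba"}))
```

Running the same comparison on the small and the large model shows which extra words need a valid pronunciation.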

     
  • Behnam Asefi

    Behnam Asefi - 2012-02-08

    Nickolay, thanks for your reply. I am trying to train a language model on
    a 1 GB text file. I use SRILM to produce the ARPA file, then I sort it
    with sphinx_lm_sort. After sorting, I use sphinx_lm_convert to convert it
    to binary format, but I receive the error below. Please help me. Thanks.

    INFO: cmd_ln.c(559): Parsing command line:
    sphinx_lm_convert \
    -i sortdilmodeli.arpa \
    -o dml.lm.DMP

    Current configuration:

    -case
    -debug 0
    -help no no
    -i sortdilmodeli.arpa
    -ienc
    -ifmt
    -logbase 1.0001 1.000100e+00
    -mmap no no
    -o dml.lm.DMP
    -oenc utf8 utf8
    -ofmt

    INFO: ngram_model_arpa.c(477): ngrams 1=120702, 2=5261809, 3=5303598
    INFO: ngram_model_arpa.c(135): Reading unigrams
    INFO: ngram_model_arpa.c(516): 120702 = #unigrams created
    INFO: ngram_model_arpa.c(195): Reading bigrams
    .................................................
    ERROR: "ngram_model_arpa.c", line 253: Bigrams not in unigram order
    ERROR: "ngram_model_dmp.c", line 121: Wrong magic header size number a5c6461: sortdilmodeli.arpa is not a dump file
    Segmentation fault
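
The "Bigrams not in unigram order" message can be illustrated with a small check. This is a sketch of one plausible reading of the message (the first word of each bigram must follow the order in which the unigrams were listed), not a copy of the sphinxbase logic; the function name and sample data are hypothetical.

```python
def first_out_of_order_bigram(unigrams, bigrams):
    """Return the first bigram whose first word breaks unigram order.

    A bigram whose first word is not a listed unigram is also returned.
    Returns None when every bigram is in order.
    """
    position = {word: i for i, word in enumerate(unigrams)}
    previous = -1
    for w1, w2 in bigrams:
        current = position.get(w1)
        if current is None or current < previous:
            return (w1, w2)
        previous = current
    return None


# Hypothetical sample data, for illustration only.
unigrams = ["a", "b", "c"]
ordered = [("a", "b"), ("a", "c"), ("b", "a"), ("c", "a")]
unordered = [("a", "b"), ("b", "a"), ("a", "c")]

print(first_out_of_order_bigram(unigrams, ordered))
print(first_out_of_order_bigram(unigrams, unordered))
```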

     
  • Nickolay V. Shmyrev

    Hello

    I do not think that sphinx_lm_convert supports more than 64k unigrams;
    your model, with 120702 unigrams, is too large. There is a LIUM branch in
    our sources which allows you to use a bigger language model, but you need
    to compile it separately.
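
The size of a model can be checked before conversion by reading the `ngram N=count` lines in the ARPA header. The sketch below assumes the limit in question is the 16-bit word-id ceiling of 65535; that constant is an assumption, and the names are hypothetical.

```python
import re

# Assumption: the limit is the 16-bit word-id ceiling.
MAX_UNIGRAMS_16BIT = 65535


def arpa_ngram_counts(lines):
    """Read 'ngram N=count' header lines into a {N: count} dict."""
    counts = {}
    for raw in lines:
        line = raw.strip()
        match = re.match(r"ngram\s+(\d+)\s*=\s*(\d+)$", line)
        if match:
            counts[int(match.group(1))] = int(match.group(2))
        elif line.endswith("-grams:"):
            break  # header is over once the first n-gram section starts
    return counts


# Header values taken from the converter log earlier in this thread.
header = [
    "\\data\\",
    "ngram 1=120702",
    "ngram 2=5261809",
    "ngram 3=5303598",
    "",
    "\\1-grams:",
]

counts = arpa_ngram_counts(header)
print(counts[1], counts[1] > MAX_UNIGRAMS_16BIT)
```

If the unigram count exceeds the ceiling, reducing the vocabulary when training the model is one option besides using a build that supports larger models.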

    To discuss a new issue, please start a new thread. Please use the Help
    forum to ask for help on language model utilities. This forum is about
    sphinx4.

     
