
CMU SLMT does not work

Help
NeoGermi
2005-11-04
2012-09-22
  • NeoGermi

    NeoGermi - 2005-11-04

    Hello,

    I've already finished my project.
    The exploding likelihood I described 2 months ago came from bad sound files and could not be repaired. So I had to switch to another corpus, and lo and behold ;-) it works fine :-)

    But now another problem occurs when building my own language model and integrating it into my system. I didn't find any solution on the net, in the documentation, or here in the forum. I first tried to build the binary LM as explained in the SLMT (Statistical Language Modeling Toolkit) documentation, and it didn't work; neither did the ARPA version. The one explained in the sphinx4 documentation doesn't work either.
    I get error messages like "Bad format" or something similar... lm3g2dmp doesn't help either: first of all, it is not able to handle case sensitivity, and then it doesn't produce the right format. So why do we have it??

    Could someone who has successfully built and integrated their own LM give some helpful tips?

    Thanks a lot,

    Sebastian

     
    • NeoGermi

      NeoGermi - 2005-11-05

      So, I decided to build an LM with only lowercase words and tried to build the DMP file with the lm3g2dmp tool. The result is that I get
      a NullPointerException when starting the speech recognition, with the following message:

      java.lang.NullPointerException
      at edu.cmu.sphinx.linguist.lextree.LexTreeLinguist.getInitialSearchState(LexTreeLinguist.java:461)

      Does anyone have an idea what this could be?

       
      • Darren Remington

        I'm getting the same problem - the problem is definitely with the toolkit:

        I put my text corpus through the online QuickLM tool (this small corpus has 660 words), and the resulting LM works for me.

        When I run that exact same corpus through the CMU SLMT, I get the null pointer exception, specifically:

        Exception in thread "main" java.lang.NullPointerException
        at edu.cmu.sphinx.linguist.lextree.LexTreeLinguist.getInitialSearchState(LexTreeLinguist.java:461)
        at edu.cmu.sphinx.linguist.lextree.LexTreeLinguist.compileGrammar(LexTreeLinguist.java:487)
        at edu.cmu.sphinx.linguist.lextree.LexTreeLinguist.allocate(LexTreeLinguist.java:406)
        at edu.cmu.sphinx.decoder.search.WordPruningBreadthFirstSearchManager.allocate(WordPruningBreadthFirstSearchManager.java:323)
        at edu.cmu.sphinx.decoder.Decoder.allocate(Decoder.java:109)
        at edu.cmu.sphinx.recognizer.Recognizer.allocate(Recognizer.java:182)
        ====================================================================

        I noticed three things:
        1. The working LM has all units in UPPERCASE, and the non-working LM has all units in lowercase.
        2. The working LM has entries for the silence tags, <S> and </S>, and the non-working LM does not.
        3. The working LM has no entry for the unknown tag, <UNK>, and the non-working LM does.
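The three differences listed above can be checked mechanically. Here is a minimal sketch, assuming the LM is in the plain-text ARPA format, where each unigram line is a log probability, the word, and an optional backoff weight. The function names and the sample LM are made up for illustration and are not part of the CMU SLMT or Sphinx-4.

```python
def unigram_words(arpa_text):
    # Collect the words listed in the \1-grams: section of an ARPA-format LM.
    words = []
    in_unigrams = False
    for line in arpa_text.splitlines():
        line = line.strip()
        if line == "\\1-grams:":
            in_unigrams = True
            continue
        if in_unigrams:
            # Any other backslash line (\2-grams:, \end\) closes the section.
            if line.startswith("\\"):
                break
            fields = line.split()
            # Entry layout: log10 probability, word, optional backoff weight.
            if len(fields) >= 2:
                words.append(fields[1])
    return words


def summarize(words):
    # Report the three properties compared in the post above.
    lowered = {w.lower() for w in words}
    plain = [w for w in words if not w.startswith("<")]
    return {
        "has_sentence_start": "<s>" in lowered,
        "has_sentence_end": "</s>" in lowered,
        "has_unk": "<unk>" in lowered,
        "all_uppercase": all(w == w.upper() for w in plain),
        "all_lowercase": all(w == w.lower() for w in plain),
    }


# Hypothetical LM resembling the non-working case: lowercase, <UNK>, no <s> or </s>.
SAMPLE = r"""\data\
ngram 1=3
ngram 2=1

\1-grams:
-0.6021 <UNK> -0.3010
-0.4771 hello -0.3010
-0.4771 world -0.3010

\2-grams:
-0.3010 hello world

\end\
"""

if __name__ == "__main__":
    print(summarize(unigram_words(SAMPLE)))
```

To inspect a real file, read it with `open(path).read()` and pass the text to `unigram_words`.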

        I need to get the CMU SLMT working - I have a corpus of 5104 words ready to be tested and I can't build the LM using the QuickLM for that many words.

         
      • Darren Remington

        FOUND IT - You need to have entries for silence tags ... <s> and </s> ...

        I added the silence tags to my text corpus (at the beginning and end of each sentence) and the LM I created using the CMU SLMT worked just fine.
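For a larger corpus, adding the tags by hand is tedious. A minimal sketch of the step described above, assuming the corpus has one sentence per line (the function name and sample sentences are hypothetical):

```python
def add_sentence_markers(lines):
    # Wrap each non-empty corpus line in "<s> ... </s>".
    marked = []
    for line in lines:
        sentence = " ".join(line.split())  # trim and collapse whitespace
        if not sentence:
            continue  # skip blank lines
        if not sentence.startswith("<s>"):
            sentence = "<s> " + sentence
        if not sentence.endswith("</s>"):
            sentence = sentence + " </s>"
        marked.append(sentence)
    return marked


if __name__ == "__main__":
    corpus = ["hello world", "", "  open the door  ", "<s> already tagged </s>"]
    for sentence in add_sentence_markers(corpus):
        print(sentence)
```

Lines that already carry the tags are left as they are, so running it twice on the same corpus does not double the markers.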

        Good Luck on yours.

         
    • NeoGermi

      NeoGermi - 2005-11-19

      No one has any idea? Does no one have experience with the CMU SLMT? I can't believe that!

       
    • yazanj

      yazanj - 2005-11-26

      I'm having similar problems. I've tried to build my own language model using the CMU SLMT, but sphinx 4 complained about an unexpected EOF in the .lm file.
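One possible cause of an unexpected EOF, which is an assumption and not confirmed anywhere in this thread, is an ARPA file that is truncated or whose `\data\` header announces more n-grams than the file actually contains. A rough sketch of a structural check (the function name and sample text are made up for illustration):

```python
import re


def check_arpa(arpa_text):
    # Return a list of structural problems found in ARPA-format LM text.
    problems = []
    declared = {}  # n-gram order -> count announced in the \data\ header
    found = {}     # n-gram order -> number of entries actually present
    current = None
    saw_data = False
    saw_end = False
    for raw in arpa_text.splitlines():
        line = raw.strip()
        if not line:
            continue
        header = re.fullmatch(r"ngram\s+(\d+)\s*=\s*(\d+)", line)
        section = re.fullmatch(r"\\(\d+)-grams:", line)
        if line == "\\data\\":
            saw_data = True
        elif line == "\\end\\":
            saw_end = True
            current = None
        elif header and current is None:
            declared[int(header.group(1))] = int(header.group(2))
        elif section:
            current = int(section.group(1))
            found[current] = 0
        elif current is not None:
            found[current] += 1  # one n-gram entry per line
    if not saw_data:
        problems.append("missing \\data\\ header")
    if not saw_end:
        problems.append("missing \\end\\ marker")
    for order, count in sorted(declared.items()):
        actual = found.get(order, 0)
        if actual != count:
            problems.append(f"{order}-grams: header says {count}, found {actual}")
    return problems


# Hypothetical well-formed LM, and a copy cut off before the 2-gram section.
GOOD = r"""\data\
ngram 1=2
ngram 2=1

\1-grams:
-0.3010 <s> -0.3010
-0.3010 </s>

\2-grams:
-0.3010 <s> </s>

\end\
"""
TRUNCATED = GOOD.split("\\2-grams:")[0]

if __name__ == "__main__":
    print(check_arpa(GOOD))
    print(check_arpa(TRUNCATED))
```

An empty list only means the counts and markers are consistent; it does not guarantee that sphinx 4 will accept the file.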

       
