
QuickLM v. CMU SLMT issues

2006-01-05
2012-09-22
  • Darren Remington

    I posted much of this in another thread, but thought it might be worth putting in its own thread.

    Quick summary: I'm trying to use the CMU SLMT to create my own language model. I have tried two different text corpora and get the same result with both: a NullPointerException.

    • The problem is definitely with the toolkit:

    I put my smaller text corpus (660 words) through the online QuickLM tool, and the resulting LM works for me.

    When I run that exact same corpus through the CMU SLMT and load the resulting LM, I get the NullPointerException, specifically:

    Exception in thread "main" java.lang.NullPointerException
    at edu.cmu.sphinx.linguist.lextree.LexTreeLinguist.getInitialSearchState(LexTreeLinguist.java:461)
    at edu.cmu.sphinx.linguist.lextree.LexTreeLinguist.compileGrammar(LexTreeLinguist.java:487)
    at edu.cmu.sphinx.linguist.lextree.LexTreeLinguist.allocate(LexTreeLinguist.java:406)
    at edu.cmu.sphinx.decoder.search.WordPruningBreadthFirstSearchManager.allocate(WordPruningBreadthFirstSearchManager.java:323)
    at edu.cmu.sphinx.decoder.Decoder.allocate(Decoder.java:109)
    at edu.cmu.sphinx.recognizer.Recognizer.allocate(Recognizer.java:182)
    ====================================================================

    I noticed three differences between the two LMs:
    1. The working LM has all units in UPPERCASE, while the non-working LM has all units in lowercase.
    2. The working LM has entries for the silence tags (<S> and </S>), while the non-working LM does not.
    3. The working LM has no entry for the unknown tag (<UNK>), while the non-working LM does.
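The three checks above can be automated with a short script. This is only a sketch, assuming both LMs are in the usual ARPA text format with a `\1-grams:` section whose lines read "log-probability word [backoff]". The function names are made up for illustration and are not part of either tool.

```python
def unigram_words(arpa_lines):
    """Return the words listed in the \\1-grams: section of an ARPA LM."""
    words = []
    in_unigrams = False
    for raw in arpa_lines:
        line = raw.strip()
        if line.startswith("\\"):
            # Section markers such as \data\, \1-grams:, \2-grams:, \end\
            in_unigrams = (line == "\\1-grams:")
            continue
        if in_unigrams and line:
            fields = line.split()
            if len(fields) >= 2:
                # fields[0] is the log probability, fields[1] is the word
                words.append(fields[1])
    return words


def summarize(words):
    """Report the three properties compared in the post."""
    vocab = set(words)
    lowered = {w.lower() for w in vocab}
    return {
        "has_sentence_start": "<s>" in lowered,
        "has_sentence_end": "</s>" in lowered,
        "has_unk": "<unk>" in lowered,
        # Tags such as <UNK> are skipped when checking the case of the units
        "all_uppercase": all(w == w.upper() for w in vocab
                             if not w.startswith("<")),
    }
```

To use it, read each LM file into a list of lines (for example `open(path).read().splitlines()`), pass the list to `unigram_words`, and compare the two `summarize` results side by side.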

    I also noted that the weights were significantly different for the identical words.

    Any ideas?

     
    • Darren Remington

      Well, I figured it out.

      The silence tags (<S> and </S>) must be included in the original text corpus before it goes through the toolkit.

      I am also going to the trouble of converting everything to UPPERCASE before adding the silence tags (using tr and sed).
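For anyone who would rather not chain tr and sed, the same preprocessing can be sketched in Python. This is an illustration, not the exact commands used above. It assumes one sentence per line, and the default tag spelling (`<s>` and `</s>`) is an assumption; the post writes the tags as <S> and </S>, so pass whichever form your toolkit configuration expects. The function names are hypothetical.

```python
def prepare_corpus_line(line, start_tag="<s>", end_tag="</s>"):
    """Uppercase one sentence and wrap it in the sentence/silence tags."""
    # Collapse runs of whitespace, then uppercase the words only,
    # so the tags keep whatever case the caller passed in.
    text = " ".join(line.split()).upper()
    if not text:
        return ""  # blank lines produce nothing
    return f"{start_tag} {text} {end_tag}"


def prepare_corpus(lines, start_tag="<s>", end_tag="</s>"):
    """Apply prepare_corpus_line to every line, dropping blank ones."""
    prepared = (prepare_corpus_line(l, start_tag, end_tag) for l in lines)
    return [p for p in prepared if p]
```

Uppercasing happens before the tags are added, matching the order described in the post, so the tags are never altered by the case conversion.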

      should be fun ..

      Ren

       
