
calculating perplexity for LM

sumitraj
2016-08-03
2016-08-16
  • sumitraj

    sumitraj - 2016-08-03

    How do we calculate perplexity for an LM with a closed vocab_type?
    I'm using evallm from the CMU toolkit, but it says "this is a closed vocabulary model".
    Is there another tool, or can this be done in the CMU toolkit? I created the LM using the CMU toolkit.

     
    • Nickolay V. Shmyrev

      With SRILM

       ngram -lm your.lm -ppl test.txt
      
       
  • sumitraj

    sumitraj - 2016-08-09

    Hi
    Should test.txt have start and end tags for the sentences, or should it be without tags?

     

    Last edit: sumitraj 2016-08-09
    • Nickolay V. Shmyrev

      It does not matter. For simplicity, I recommend using text without tags. There is no need to add them.

       
  • sumitraj

    sumitraj - 2016-08-16

    Hi,
    I have training data of 4000 sentences. As per the documentation, the test data should be in a 1 to 10 proportion, so I now have 400 test sentences. I calculated the perplexity for the test data using the SRILM tool.
    Here is the result:
    400 sentences, 3184 words, 3148 oovs
    0 zeroprobs, logprob= -452.076 ppl= 10.8861 ppl1= 3.61124e+12

    Does this make any sense? A ppl of 10.8861?

     
    • Nickolay V. Shmyrev

      That is a good perplexity. For a large vocabulary, perplexity is about 100; for a very large one, up to 200. For medium-vocabulary domains, perplexity is usually 20-50.

       
    • Arseniy Gorin

      Arseniy Gorin - 2016-08-16

      Just an observation: your OOV (out-of-vocabulary word) count seems too large. You should check that the test data does not contain many new words compared to the training data. Otherwise your perplexity estimate is meaningless.
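
To see why the OOV count matters, the reported figures can be recomputed by hand. The sketch below assumes the usual SRILM convention: logprob is base 10, OOVs and zeroprobs are excluded from the word count, ppl adds one end-of-sentence token per sentence, and ppl1 does not. The function name `srilm_perplexity` is made up for illustration and is not part of SRILM.

```python
def srilm_perplexity(logprob, sentences, words, oovs, zeroprobs=0):
    """Recompute ppl and ppl1 from an SRILM-style summary line.

    Assumes logprob is a base-10 log probability, as SRILM reports it.
    """
    # Only in-vocabulary words with non-zero probability are scored.
    scored_words = words - oovs - zeroprobs
    # ppl also counts one end-of-sentence token per sentence.
    ppl = 10 ** (-logprob / (scored_words + sentences))
    # ppl1 leaves the end-of-sentence tokens out.
    ppl1 = 10 ** (-logprob / scored_words)
    return scored_words, ppl, ppl1


# Figures from the post above: 400 sentences, 3184 words, 3148 OOVs.
scored, ppl, ppl1 = srilm_perplexity(-452.076, 400, 3184, 3148)
print(f"scored words: {scored}, ppl: {ppl:.4f}, ppl1: {ppl1:.4e}")
```

With 3148 of 3184 words out of vocabulary, only 36 words are actually scored, so ppl is dominated by the 400 end-of-sentence tokens and says little about the model.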

       
  • sumitraj

    sumitraj - 2016-08-16

    Hi all,

    I had made a small mistake: my LM was in uppercase and the test data was in lowercase. I have now changed the test data to uppercase, and this is the new output.

    400 sentences, 3184 words, 37 oovs
    0 zeroprobs, logprob= -5256.67 ppl= 30.3392 ppl1= 46.8139

     

    Last edit: sumitraj 2016-08-16
    • Arseniy Gorin

      Arseniy Gorin - 2016-08-16

      Now it's much more realistic.
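
A case mismatch like the one in this thread can be caught before running the evaluation by counting OOVs against the LM vocabulary. This is a minimal sketch, assuming whitespace-tokenized text and an uppercase vocabulary; the helper `count_oovs`, the toy vocabulary, and the sample sentences are all hypothetical.

```python
def count_oovs(lines, vocab, normalize=str.upper):
    """Return (oov_count, total_tokens) after normalizing each token."""
    total = 0
    oov = 0
    for line in lines:
        for token in line.split():
            total += 1
            if normalize(token) not in vocab:
                oov += 1
    return oov, total


# Hypothetical uppercase vocabulary and lowercase test sentences.
vocab = {"HELLO", "WORLD", "OPEN", "THE", "DOOR"}
test_lines = ["hello world", "open the window"]

# Without normalization every lowercase token is out of vocabulary.
print(count_oovs(test_lines, vocab, normalize=lambda t: t))
# With uppercasing only the genuinely unseen word remains.
print(count_oovs(test_lines, vocab))
```

An OOV count close to the total token count, as in the first call, usually points to a normalization problem rather than a real vocabulary gap.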

       
