Anonymous - 2002-06-05

I'm trying to get the Language Modeling Toolkit to emulate the web-based Language Modeling Toolkit, but I'm having a problem with the discounting information and/or the back-off weights (assuming those are the numbers at the front and back of each line under the n-gram headers).  Most of the lines end up with -99.9990 as the first number, when it should be somewhere between -0.2 and -3.  The web-based tool generates these correctly.  Because of this problem, testing with the LM built by the Toolkit results in no text being recognized.  Does anybody know which tools and/or commands the web-based tool uses, so I can get these generated correctly?
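To put a number on how many lines are affected, here is a minimal sketch that counts the entries per n-gram order whose first number sits at the -99.999 floor. It assumes the model is in the standard ARPA text format, where the first number on each line is the log10 probability and the optional trailing number is the back-off weight. The function name `count_floor_entries`, the `floor` threshold, and the sample values are made up for illustration and are not output from either tool.

```python
import re

# Matches ARPA section headers such as "\1-grams:" or "\3-grams:".
SECTION = re.compile(r"^\\(\d+)-grams:$")


def count_floor_entries(lines, floor=-99.0):
    """Return {order: (floored, total)} for each n-gram section.

    An entry counts as "floored" when its leading log10 probability
    is at or below `floor` (e.g. the -99.9990 placeholder value).
    """
    counts = {}
    order = None
    for raw in lines:
        line = raw.strip()
        match = SECTION.match(line)
        if match:
            order = int(match.group(1))
            counts[order] = (0, 0)
            continue
        if line == "\\end\\":
            break
        if order is None or not line:
            # Skip the \data\ header block and blank lines.
            continue
        fields = line.split()
        try:
            logprob = float(fields[0])
        except ValueError:
            continue  # not an n-gram entry
        floored, total = counts[order]
        counts[order] = (floored + (logprob <= floor), total + 1)
    return counts


# Hypothetical sample, only to show the expected layout.
sample = """\\data\\
ngram 1=3
ngram 2=2

\\1-grams:
-99.9990 <s> -0.3010
-0.4771 hello -0.2553
-0.4771 </s>

\\2-grams:
-99.9990 <s> hello
-0.3010 hello </s>

\\end\\
"""

print(count_floor_entries(sample.splitlines()))
# {1: (1, 3), 2: (1, 2)}
```

For a real model file, pass the open file object instead of `sample.splitlines()`.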

Also, the Toolkit doesn't have an obvious way to include context cues.  I tried using a .ccs file, but the Toolkit basically ignored the <s> and </s> in that file and mapped all references to <s> to <UNK>, as it does for out-of-vocabulary words.  The web-based tool gives me a .sent file, but this concept doesn't seem to exist in the Toolkit.

Thanks for any help you can provide.