Language Modeling Toolkit

Speech Recognition Toolkit

Brought to you by: air, arthchan2003, awb, bhiksha, and 5 others

Forum: Speech Recognition Theory

Creator: Anonymous

Created: 2002-06-05

Updated: 2012-09-22

Anonymous - 2002-06-05

I'm trying to get the Language Modeling Toolkit to emulate the web-based Language Modeling Toolkit, but I'm having a problem with the discounting information and/or the back-off weights (assuming those are the numbers at the front and back of each line under the n-gram headers). Most lines end up with -99.9990 as the first number, when it should be somewhere between -0.2 and -3. The web-based tool generates these values correctly. Because of this problem, testing with the LM produced by the Toolkit results in no text being recognized. Does anybody know which tools and/or commands the web-based tool uses, so that I can generate these values correctly?
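
To gauge how widespread the -99.9990 entries are, a short script can walk the model file and list them. The sketch below assumes the model is in the usual ARPA text layout, where each line under an `\N-grams:` header holds a log10 probability, then N words, then an optional back-off weight. The function names, the `floor` threshold, and the sample model are all made up for illustration; none of them come from the Toolkit.

```python
import re

def parse_arpa(lines):
    """Yield (order, logprob, words, backoff) for each n-gram line.

    Assumes ARPA layout: logprob, N words, optional back-off weight.
    """
    order = None
    for raw in lines:
        line = raw.strip()
        header = re.match(r"\\(\d+)-grams:$", line)
        if header:
            order = int(header.group(1))
            continue
        if line.startswith("\\"):
            order = None  # \data\ or \end\ marker, not an n-gram section
            continue
        if order is None or not line:
            continue  # count lines ("ngram 1=3") and blank lines
        fields = line.split()
        logprob = float(fields[0])
        words = tuple(fields[1:1 + order])
        # The back-off weight is present only if a field follows the words.
        backoff = float(fields[1 + order]) if len(fields) > 1 + order else None
        yield order, logprob, words, backoff

def suspicious(entries, floor=-99.0):
    """Return entries whose log probability is at or below `floor`."""
    return [e for e in entries if e[1] <= floor]

# Hypothetical three-word model, used only to exercise the parser.
SAMPLE = """\
\\data\\
ngram 1=3
ngram 2=1

\\1-grams:
-99.9990 <s> -0.3010
-0.6021 hello -0.2218
-0.6021 </s>

\\2-grams:
-0.3010 <s> hello

\\end\\
""".splitlines()

if __name__ == "__main__":
    for order, logprob, words, backoff in suspicious(parse_arpa(SAMPLE)):
        print(order, logprob, " ".join(words), backoff)
```

Running the same check on the model from the web-based tool and on the one from the Toolkit would show whether the floor value appears on nearly every line or only on a few entries.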

Also, the Toolkit doesn't have an obvious way to include context cues. I tried using a .ccs file, but the Toolkit basically ignored the <s> and </s> entries in that file and mapped all references to <s> to <UNK>, as if they were out-of-vocabulary words. The web-based tool gives me a .sent file, but this concept doesn't seem to exist in the Toolkit.
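
One possible explanation, offered as a guess rather than a diagnosis, is that the markers are missing from the vocabulary, so they get treated like any other out-of-vocabulary word. The sketch below only illustrates that open-vocabulary mapping in plain Python. It does not call the Toolkit, and `map_oov`, `missing_cues`, and the sample data are hypothetical names chosen for this example.

```python
def map_oov(tokens, vocab, unk="<UNK>"):
    """Replace every token that is not in `vocab` with the unknown symbol."""
    return [t if t in vocab else unk for t in tokens]

def missing_cues(cues, vocab):
    """Return the context cues that the vocabulary does not contain."""
    return [c for c in cues if c not in vocab]

# Hypothetical data: a vocabulary built from words only, without the markers.
vocab = {"hello", "world"}
cues = ["<s>", "</s>"]
sentence = "<s> hello world </s>".split()

if __name__ == "__main__":
    print(map_oov(sentence, vocab))              # markers fall back to <UNK>
    print(missing_cues(cues, vocab))             # cues absent from the vocabulary
    print(map_oov(sentence, vocab | set(cues)))  # markers survive once added
```

If the same pattern holds for the Toolkit, checking whether <s> and </s> appear in the vocabulary file would be a quick first test.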

Thanks for any help you can provide.
