Text to Language Model + Dictionary pipeline

Speech Recognition Toolkit

Forum: Help

Creator: Hespen

Created: 2017-01-11

Updated: 2017-01-11

Hespen - 2017-01-11

I had trouble getting a correct, working language model for Dutch, so after a while I tried the standard Sphinx toolkit. That one actually worked.

I've been testing a lot of different content for the language model, and I got annoyed with writing and converting text into a normal corpus by hand every time.

That is why I created a pipeline. Its input is a text file with punctuation, and its output is an LM file plus a dictionary that contains only the words that appear in the LM.
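To illustrate the kind of processing described above, here is a minimal Java sketch. It is not taken from the linked repository. The class `CorpusPrep`, its method names, and the pronunciation strings are hypothetical, and the sentence-splitting rule (break on `.`, `!`, `?`) is a simplifying assumption. It turns punctuated text into `<s> ... </s>` corpus lines, collects the vocabulary, and keeps only the dictionary entries whose words occur in the corpus. The LM estimation step itself is left out.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

// Hypothetical helper for illustration; not the code from the linked repository.
public class CorpusPrep {

    // Split raw text into sentences on '.', '!' and '?', lowercase each one,
    // replace everything except letters, apostrophes and whitespace with a space
    // (so digits are dropped too), and wrap the result in <s> ... </s> markers.
    public static List<String> toCorpusLines(String text) {
        List<String> lines = new ArrayList<>();
        for (String sentence : text.split("[.!?]+")) {
            String cleaned = sentence.toLowerCase(Locale.ROOT)
                    .replaceAll("[^\\p{L}'\\s]+", " ")
                    .replaceAll("\\s+", " ")
                    .trim();
            if (!cleaned.isEmpty()) {
                lines.add("<s> " + cleaned + " </s>");
            }
        }
        return lines;
    }

    // Collect the distinct words in the corpus, ignoring the sentence markers.
    public static Set<String> vocabulary(List<String> corpusLines) {
        Set<String> vocab = new TreeSet<>();
        for (String line : corpusLines) {
            for (String token : line.split(" ")) {
                if (!token.equals("<s>") && !token.equals("</s>")) {
                    vocab.add(token);
                }
            }
        }
        return vocab;
    }

    // Keep only the dictionary entries (word -> pronunciation) whose word is in the vocabulary.
    public static Map<String, String> filterDictionary(Map<String, String> fullDictionary,
                                                       Set<String> vocab) {
        Map<String, String> filtered = new LinkedHashMap<>();
        for (Map.Entry<String, String> entry : fullDictionary.entrySet()) {
            if (vocab.contains(entry.getKey())) {
                filtered.put(entry.getKey(), entry.getValue());
            }
        }
        return filtered;
    }

    public static void main(String[] args) {
        List<String> corpus = toCorpusLines("Hallo, wereld! Dit is een test.");
        Set<String> vocab = vocabulary(corpus);

        // Made-up pronunciations, only to show the filtering step.
        Map<String, String> fullDictionary = new LinkedHashMap<>();
        fullDictionary.put("hallo", "HH AA L OW");
        fullDictionary.put("fiets", "F IY T S");

        corpus.forEach(System.out::println);
        filterDictionary(fullDictionary, vocab)
                .forEach((word, phones) -> System.out.println(word + " " + phones));
    }
}
```

The corpus lines would then go to whatever LM estimation tool is in use, and the filtered map would be written out as the dictionary file.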

If anyone is interested in using it or giving me feedback on it, please do!

https://github.com/Hespen/Java---CMU-Sphinx---Text-to-Language-Model

- Nickolay V. Shmyrev - 2017-01-11
  
  If you want to run everything in Java, I would use BerkeleyLM (https://code.google.com/archive/p/berkeleylm/downloads) for LM estimation instead. It lacks good interpolation and pruning support, though.
  
