
Text to Language Model + Dictionary pipeline

Hespen
2017-01-11
  • Hespen

    Hespen - 2017-01-11

    I had trouble building a correct, working language model for Dutch, so after a while I tried the standard Sphinx toolkit. That one actually worked.

    I have been testing a lot of different content for the language model, and writing text and converting it into a normalized corpus by hand each time got annoying.

    That is why I created a pipeline. Its input is a text file with punctuation, and its output is an LM file plus a dictionary that contains only the words in the LM.

    If anyone is interested in using it or giving me feedback on it, please do!

    https://github.com/Hespen/Java---CMU-Sphinx---Text-to-Language-Model
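
To give an idea of what the first stage of such a pipeline involves, here is a minimal sketch of turning punctuated text into corpus lines and collecting the vocabulary that the dictionary would be restricted to. It is not taken from the repository above: the class name `CorpusNormalizer` and both method names are hypothetical, and the rules (split on `.`, `!`, `?`; lowercase; drop other punctuation; wrap each sentence in `<s> ... </s>`) are assumptions about a typical setup. Abbreviations and decimal numbers would be split incorrectly by this simple approach.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;
import java.util.Set;
import java.util.TreeSet;

// Hypothetical helper, for illustration only; not the code from the linked repository.
final class CorpusNormalizer {

    /** Splits raw text on ., ! and ? and returns one normalized "<s> ... </s>" line per sentence. */
    static List<String> toCorpusLines(String rawText) {
        List<String> lines = new ArrayList<>();
        for (String sentence : rawText.split("[.!?]+")) {
            String cleaned = sentence
                    .toLowerCase(Locale.ROOT)
                    // Replace everything except letters, digits, apostrophes and spaces.
                    .replaceAll("[^\\p{L}\\p{N}' ]+", " ")
                    // Collapse runs of spaces left behind by the previous step.
                    .replaceAll("\\s+", " ")
                    .trim();
            if (!cleaned.isEmpty()) {
                lines.add("<s> " + cleaned + " </s>");
            }
        }
        return lines;
    }

    /** Collects the sorted set of distinct words in the corpus, skipping the sentence markers. */
    static Set<String> vocabulary(List<String> corpusLines) {
        Set<String> words = new TreeSet<>();
        for (String line : corpusLines) {
            for (String token : line.split(" ")) {
                if (!token.equals("<s>") && !token.equals("</s>")) {
                    words.add(token);
                }
            }
        }
        return words;
    }
}
```

Calling `CorpusNormalizer.toCorpusLines(text)` gives lines that can be written to a corpus file for the LM toolkit, and `CorpusNormalizer.vocabulary(lines)` gives the word list used to filter a full pronunciation dictionary down to the words that actually occur.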

     
    • Nickolay V. Shmyrev

      If you want to run everything in Java, I would use https://code.google.com/archive/p/berkeleylm/downloads for LM estimation instead. It lacks good interpolation and pruning support, though.

       
