I'm running into issues creating a language model with the CMU-Cam toolkit.
At its core, the problem is converting the text-based ARPA model to the
binary DMP format via sphinx_lm_convert.
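For reference, the conversion step is essentially the following (the filenames
are placeholders; -i and -o are, as far as I know, sphinx_lm_convert's
standard input/output flags):

    sphinx_lm_convert -i wiki.arpa -o wiki.lm.DMP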
With vocabularies of both 20,000 and 30,000 words, culled from a text corpus
made of every 30th line of the English-language Wikipedia dump (build steps
sketched below), I get a segfault with an error saying that the size of the
trigram segment is > 65535.
Sorry, but it seems that any decent-sized vocabulary will run up against this
limit. Are there any workarounds?
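For reproduction, this is roughly how the corpus and the ARPA model were
built. Filenames and the -top cutoff are illustrative, and the invocations
follow the usual CMU-Cam text2wfreq / wfreq2vocab / text2idngram / idngram2lm
pipeline as I understand it:

    # keep every 30th line of the plain-text Wikipedia dump
    awk 'NR % 30 == 0' enwiki.txt > corpus.txt

    # build a 30,000-word vocabulary from the corpus
    text2wfreq < corpus.txt | wfreq2vocab -top 30000 > wiki.vocab

    # count id n-grams and estimate the ARPA trigram model
    text2idngram -vocab wiki.vocab -idngram wiki.idngram < corpus.txt
    idngram2lm -vocab wiki.vocab -idngram wiki.idngram -arpa wiki.arpa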
Provide the language model you are trying to convert.