
Maximum size of language model?

Help
jason
2011-12-14
2012-09-22
  • jason

    jason - 2011-12-14

    I'm running into issues creating a language model with the CMU-Cam toolkit.
    The failing step is converting the text-based ARPA model to the binary DMP
    format with sphinx_lm_convert.

    With vocabularies of both 20 and 30000 words, culled from a text corpus made
    of every 30th line of the English-language Wikipedia dump, I get a segfault
    reporting that the size of the trigram segment is > 65535.

    Sorry, but it seems that any decent-sized vocabulary will run up against this
    limit. Are there any workarounds?
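
    For anyone hitting the same error, it can help to report how many 1-, 2-
    and 3-grams the ARPA file declares before asking for help (65535 is the
    largest value an unsigned 16-bit integer can hold). The following is only a
    minimal sketch, not part of the toolkit: it assumes the file starts with a
    standard ARPA \data\ header containing "ngram N=COUNT" lines, and the
    function name arpa_ngram_counts is hypothetical.

```python
import re


def arpa_ngram_counts(lines):
    """Return {order: count} read from the \\data\\ header of an ARPA model.

    `lines` is any iterable of text lines, e.g. an open file object.
    Assumes the usual header layout: a "\\data\\" line followed by
    "ngram N=COUNT" lines, ending at the first "\\N-grams:" section.
    """
    counts = {}
    in_data = False
    for raw in lines:
        line = raw.strip()
        if line == "\\data\\":
            in_data = True
            continue
        if in_data:
            match = re.match(r"ngram\s+(\d+)\s*=\s*(\d+)$", line)
            if match:
                counts[int(match.group(1))] = int(match.group(2))
            elif line.startswith("\\"):
                break  # first n-gram section reached; header is over
    return counts


# Tiny made-up header, for illustration only.
header = ["\\data\\", "ngram 1=4", "ngram 2=7", "ngram 3=9", "", "\\1-grams:"]
print(arpa_ngram_counts(header))
```

    To run it on a real model, pass an open file instead of the list, e.g.
    arpa_ngram_counts(open("model.arpa", encoding="utf-8")), where model.arpa
    is a placeholder path.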

     
  • Nickolay V. Shmyrev

    Provide the language model you are trying to convert.

     
