sphinx2 dictionary/lm problem

Help
Anonymous
2003-06-06
2012-09-22
  • Anonymous

    Anonymous - 2003-06-06

    Hi, I'm trying to run Sphinx2 using cmudict.0.6d from http://www.speech.cs.cmu.edu/sphinx/models/ and language_model.arpaformat from the same place. (bn.bigram.arpa didn't work either...) I am getting this error. Has anyone seen it, or does anyone know how to correct the problem?

    ...
    INFO: lm_3g.c(874): lm_3g.c(874): ngrams 1=64001, 2=9382014, 3=13459879
    INFO: lm_3g.c(882): lm_3g.c(882): 130608 words in dictionary
    lm_3g.c(899): #dict-words(130608) > 65534
    ...

     
    • Jessica P. Hekman

      You have too many words in your dictionary: 130,608, while Sphinx2 accepts at most 65,534. The xvoice-sphinx project has two dictionaries with fewer words:

      http://xvoice.sourceforge.net/xvoice-sphinx/

       
    • Anonymous

      Anonymous - 2003-06-06

      As Jessica has already said, loading the entire cmudict.0.6d exceeds Sphinx2's limit of 65,534 dictionary words. The LM you're using has only 64,001 unigrams, so you should build a smaller dictionary: the intersection of a large dictionary (such as cmudict.0.6d) and the words in the LM (its unigrams).
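
      The intersection can be sketched in C roughly as follows. This is a minimal sketch, not a tool shipped with Sphinx2: the names `read_unigrams`, `dict_headword` and `filter_dict` are made up for illustration. It assumes an ARPA-format LM whose unigram lines read `<log prob> <word> [<backoff>]`, a dictionary with one `WORD  PHONES...` entry per line, alternate pronunciations written as `WORD(2)`, and the same letter case in both files.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

#define MAX_LINE 1024

static int cmp_str(const void *a, const void *b) {
    return strcmp(*(char *const *)a, *(char *const *)b);
}

static void *xrealloc(void *p, size_t size) {
    p = realloc(p, size);
    if (p == NULL) { perror("realloc"); exit(EXIT_FAILURE); }
    return p;
}

/* Copy the headword of a dictionary line into out, dropping a trailing
   alternate-pronunciation marker such as "(2)". Returns 0 for an empty line. */
int dict_headword(const char *line, char *out, size_t n) {
    size_t len = strcspn(line, " \t\r\n");
    if (len == 0 || len >= n) return 0;
    memcpy(out, line, len);
    out[len] = '\0';
    char *paren = strchr(out, '(');
    if (paren != NULL && paren != out && out[len - 1] == ')') *paren = '\0';
    return 1;
}

/* Read the \1-grams: section of an ARPA LM and return its words, sorted. */
char **read_unigrams(FILE *lm, size_t *count) {
    char line[MAX_LINE], word[MAX_LINE];
    char **words = NULL;
    size_t n = 0, cap = 0;
    int in_unigrams = 0;
    while (fgets(line, sizeof line, lm) != NULL) {
        if (line[0] == '\\') {               /* section marker, e.g. \1-grams: */
            if (in_unigrams) break;          /* reached \2-grams: or \end\ */
            in_unigrams = (strncmp(line, "\\1-grams:", 9) == 0);
            continue;
        }
        /* Skip the log probability, keep the word; blank lines do not match. */
        if (!in_unigrams || sscanf(line, "%*s %1023s", word) != 1) continue;
        if (n == cap) {
            cap = cap ? cap * 2 : 1024;
            words = xrealloc(words, cap * sizeof *words);
        }
        words[n] = xrealloc(NULL, strlen(word) + 1);
        strcpy(words[n++], word);
    }
    if (n > 0) qsort(words, n, sizeof *words, cmp_str);
    *count = n;
    return words;
}

/* Copy to out every dictionary entry whose headword is an LM unigram.
   Returns the number of entries (lines) written. */
size_t filter_dict(FILE *lm, FILE *dict, FILE *out) {
    assert(lm != NULL && dict != NULL && out != NULL);
    size_t nwords = 0, kept = 0;
    char **words = read_unigrams(lm, &nwords);
    char line[MAX_LINE], head[MAX_LINE];
    while (fgets(line, sizeof line, dict) != NULL) {
        char *key = head;
        if (!dict_headword(line, head, sizeof head)) continue;
        if (nwords == 0 ||
            bsearch(&key, words, nwords, sizeof *words, cmp_str) == NULL) continue;
        fputs(line, out);
        kept++;
    }
    for (size_t i = 0; i < nwords; i++) free(words[i]);
    free(words);
    return kept;
}
```

      A small `main` would open the LM and the big dictionary with `fopen`, call `filter_dict`, and compare the returned count against the 65,534 limit before handing the new dictionary to Sphinx2. If the two files differ in letter case, the words need to be normalized before the lookup.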

       
    • Anonymous

      Anonymous - 2003-06-10

      Thanks for the URL, I hadn't seen that project before!
      At least it starts doing something now, but then it gets stuck at
      ...
      INFO: lm_3g.c(924):    60001 = #unigrams created
      INFO: lm_3g.c(580): lm_3g.c(580): Reading bigrams
      INFO: lm_3g.c(637): .INFO: lm_3g.c(637): .INFO: lm_3g.c(637): .
      ...

       
      • Anonymous

        Anonymous - 2003-06-10

        The code responsible for those printouts is:

                if ((bgcount & 0x0000ffff) == 0) {
                    E_INFO (".");

        where bgcount is the number of bigrams read so far. The test masks the low 16 bits, so it prints a "." each time bgcount reaches a multiple of 0x10000 = 65,536. This doesn't look like an error; it just shows progress in loading the bigrams. Since you appear to have 60K unigrams in your LM, the number of bigrams is probably pretty large. What is it? It should be given near the top of your LM file.
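
        To get a feel for how many dots to expect, here is a small sketch of the same test in isolation. The function name `count_progress_dots` is made up, and it assumes bgcount goes up by one per bigram and is tested after each increment, which the excerpt above does not show.

```c
#include <stdio.h>

/* Count how often the progress test fires while n_bigrams are read,
   assuming bgcount is incremented once per bigram and then tested. */
unsigned long count_progress_dots(unsigned long n_bigrams) {
    unsigned long dots = 0;
    for (unsigned long bgcount = 1; bgcount <= n_bigrams; bgcount++) {
        /* Low 16 bits all zero: true only when bgcount % 65536 == 0. */
        if ((bgcount & 0x0000ffff) == 0) {
            dots++;
        }
    }
    return dots;
}
```

        Under that assumption, the 9,382,014 bigrams reported in the first log of this thread would produce 143 dots, so a long run of dots on a large LM is expected.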

         
        • Anonymous

          Anonymous - 2003-06-11

          It says the number of bigrams is 1044719. I guess I was being too impatient... I might also have had some other problem earlier. It does get past reading the bigrams now, but the program terminates after
          ...
          WARNING: "lm_3g.c", line 1042: lm_3g.c(1043): 11651 LM words not in dict; ignored

          The error message in the console reads:

                7 [main] sphinx2-continuous 376 handle_exceptions: Exception: STATUS_ACCESS_VIOLATION
             2207 [main] sphinx2-continuous 376 open_stackdumpfile: Dumping stack trace to sphinx2-continuous.exe.stackdump

          I was using '1test60k.5.5.arpa' and 'full.dic' from the xvoice project, and the acoustic model 'sphinx_2_format' from CMU.

          If anyone has a clue, please let me know!

           
