Hi, I'm trying to run Sphinx2 using cmudict.0.6d from http://www.speech.cs.cmu.edu/sphinx/models/ and language_model.arpaformat from the same place. (The bn.bigram.arpa didn't work either...) I am getting the error below. Has anyone seen this, or does anyone know how to correct the problem?
...
INFO: lm_3g.c(874): lm_3g.c(874): ngrams 1=64001, 2=9382014, 3=13459879
INFO: lm_3g.c(882): lm_3g.c(882): 130608 words in dictionary
lm_3g.c(899): #dict-words(130608) > 65534
...
You have too many words in your dictionary -- over 65000. The xvoice-sphinx project has two dictionaries with fewer words:
http://xvoice.sourceforge.net/xvoice-sphinx/
Anonymous - 2003-06-06
As Jessica has already said, loading the entire cmudict.0.6d exceeds Sphinx2's limit of 65,534 dictionary words. The LM you're using has only 64K+1 words (64,001 unigrams), so you should build a smaller dictionary: the intersection of a large dictionary (such as cmudict.0.6d) and the words in the LM (the unigrams).
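A minimal sketch of that intersection, in Python. The function names `read_unigrams` and `filter_dictionary` are made up for illustration and are not part of Sphinx2. It assumes the LM is in ARPA format with a `\1-grams:` section, that each dictionary line starts with the word followed by its phones, that alternate pronunciations are marked as `WORD(2)`, and that the LM and the dictionary use the same letter case.

```python
import re

def read_unigrams(lm_lines):
    """Collect the words listed in the \\1-grams: section of an ARPA-format LM."""
    vocab = set()
    in_unigrams = False
    for line in lm_lines:
        line = line.strip()
        if line.startswith("\\"):
            # Section markers look like \data\, \1-grams:, \2-grams:, \end\
            in_unigrams = (line == "\\1-grams:")
            continue
        if in_unigrams and line:
            fields = line.split()
            if len(fields) >= 2:
                # Fields are: log-probability, word, optional backoff weight
                vocab.add(fields[1])
    return vocab

# Assumed marker for an alternate pronunciation, e.g. HELLO(2)
_ALT = re.compile(r"\(\d+\)$")

def filter_dictionary(dict_lines, vocab):
    """Keep only dictionary entries whose base word occurs in the LM vocabulary."""
    kept = []
    for line in dict_lines:
        fields = line.split()
        if not fields:
            continue
        base = _ALT.sub("", fields[0])
        if base in vocab:
            kept.append(line.rstrip("\n"))
    return kept

# Tiny made-up LM and dictionary, just to show the behaviour
lm = [
    "\\data\\", "ngram 1=3", "",
    "\\1-grams:", "-1.0 <s> -0.5", "-2.0 HELLO -0.3", "-2.1 WORLD", "",
    "\\2-grams:", "-0.4 HELLO WORLD", "\\end\\",
]
dic = [
    "HELLO  HH AH L OW",
    "HELLO(2)  HH EH L OW",
    "WORLD  W ER L D",
    "ZEBRA  Z IY B R AH",
]
small = filter_dictionary(dic, read_unigrams(lm))
```

For real files, both functions accept open file objects, for example `read_unigrams(open("language_model.arpaformat"))`, and the kept lines can be written out one per line as the smaller dictionary. Entries such as `<s>` appear in the LM but normally have no dictionary entry, so the result can have fewer words than the LM has unigrams.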
Anonymous - 2003-06-10
Thanks for the url, hadn't seen that project before!
At least now it starts doing something, but now it gets stuck at
...
INFO: lm_3g.c(924): 60001 = #unigrams created
INFO: lm_3g.c(580): lm_3g.c(580): Reading bigrams
INFO: lm_3g.c(637): .INFO: lm_3g.c(637): .INFO: lm_3g.c(637): .
...
Anonymous - 2003-06-10
The code responsible for those printouts is:
    if ((bgcount & 0x0000ffff) == 0) {
        E_INFO (".");
    }
where bgcount is the number of bigrams read so far. So it prints a "." each time bgcount reaches a multiple of 0x10000 = 65,536 (that is, whenever its low 16 bits are all zero), or roughly once per 65K bigrams. This doesn't look like an error; it is just showing progress in loading the bigrams. Since you appear to have 60K unigrams in your LM, the number of bigrams is probably pretty large. What is it? It should be given near the top of your LM file.
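To make the mask test concrete, here is a small Python sketch that mirrors the condition. The helper names `prints_dot` and `dots_up_to` are hypothetical, and the count assumes the check runs once for each bigram from 1 up to the total, which the quoted fragment alone does not confirm.

```python
MASK = 0x0000FFFF  # same mask as in the quoted C condition

def prints_dot(bgcount):
    """True when the low 16 bits of bgcount are all zero."""
    return (bgcount & MASK) == 0

def dots_up_to(total_bigrams):
    """Number of counts in 1..total_bigrams that are multiples of 65,536."""
    return total_bigrams // (MASK + 1)

print(prints_dot(65535))      # False: 0xffff itself does not trigger a dot
print(prints_dot(65536))      # True: 0x10000 is the first count that does
print(dots_up_to(9_382_014))  # 143 for the bigram count in the first log above
```

So a run of dots with long pauses in between is what one would expect while a large bigram section loads.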
Anonymous - 2003-06-11
It says the number of bigrams is 1044719. I guess I was being too impatient... I might also have had some additional problem earlier. It does get past reading the bigrams now, but the program terminates after
...
WARNING: "lm_3g.c", line 1042: lm_3g.c(1043): 11651 LM words not in dict; ignored
The error message in the console reads:
7 [main] sphinx2-continuous 376 handle_exceptions: Exception: STATUS_ACCESS_VIOLATION
2207 [main] sphinx2-continuous 376 open_stackdumpfile: Dumping stack trace to sphinx2-continuous.exe.stackdump
I was using '1test60k.5.5.arpa' and 'full.dic' from the xvoice project, with the acoustic model 'sphinx_2_format' from CMU.
If anyone has a clue, please let me know!