In the ARPA lm files generated by the CMU lmtool, <s>, </s>, and <UNK> are not surrounded by apostrophes (') but in the examples of the ARPA format on the pocketsphinx help page regarding language models they are. Is there any difference in how they are treated?
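For context on why the quoting could matter: ARPA n-gram entries are generally whitespace-delimited, so a parser that takes tokens verbatim would treat '<s>' and <s> as two different vocabulary words. A minimal sketch of that tokenization (my own illustration, not pocketsphinx's actual parser):

```python
# Sketch: how an ARPA \1-grams: entry is typically tokenized (whitespace
# split, tokens taken verbatim). Illustration only, not pocketsphinx code.

def parse_unigram_line(line):
    """Return (logprob, word, backoff) for a unigram entry line."""
    fields = line.split()
    logprob = float(fields[0])
    word = fields[1]
    backoff = float(fields[2]) if len(fields) > 2 else 0.0
    return logprob, word, backoff

# Quoting changes the token's identity under this scheme:
print(parse_unigram_line("-1.0 <s>")[1])    # <s>
print(parse_unigram_line("-1.0 '<s>'")[1])  # '<s>'
```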
I'll note in passing that lm_trie.c seems to enforce rules about the ARPA format over and above the format specification. For example, if I have zero 1-, 2-, 3-, and 4-grams and one 5-gram, then recursive_insert in lm_trie.c fails the assertion priority_queue_size(ngrams) == 0. If I put only the 5-gram in the lm file, then line 458 of ngram_model_trie.c fails with "Wrong magic header size number".
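The invariant apparently being enforced is that the counts declared in the \data\ section must match the entries actually present in each section. A rough standalone checker for that invariant (a hypothetical helper written for illustration, not pocketsphinx code; check_arpa_counts is my own name):

```python
# Sketch: verify that "ngram N=count" declarations in an ARPA \data\
# section match the number of entries in each \N-grams: section.
# Hypothetical helper for illustration, not pocketsphinx code.
import re

def check_arpa_counts(text):
    """Return a list of (order, declared, actual) mismatches."""
    declared = {}
    for m in re.finditer(r"ngram (\d+)=(\d+)", text):
        declared[int(m.group(1))] = int(m.group(2))
    mismatches = []
    for order, count in declared.items():
        # Grab the section body up to the next "\..."-header or end of file.
        section = re.search(r"\\%d-grams:\n(.*?)(?=\n\\|\Z)" % order,
                            text, re.S)
        actual = 0
        if section:
            actual = sum(1 for line in section.group(1).splitlines()
                         if line.strip())
        if actual != count:
            mismatches.append((order, count, actual))
    return mismatches

lm = ("\\data\\\nngram 1=1\nngram 2=1\n\n"
      "\\1-grams:\n-10.0 foo\n\n"
      "\\2-grams:\n\n\\end\\\n")
print(check_arpa_counts(lm))  # [(2, 1, 0)] -- declared one 2-gram, found none
```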
Re the apostrophes: I fixed the wiki page, thanks for the notification.
Further to ARPA grammars:
ERROR: "ngrams_raw.c", line 84: Format error; 6-gram ignored at line 25
INFO: lm_trie.c(474): Training quantizer
Error in `pocketsphinx_continuous': double free or corruption (out): 0x0000000032cae150
\data\
ngram 1=1
ngram 2=1
ngram 3=1
ngram 4=1
ngram 5=1
ngram 6=1

\1-grams:
-10.000 foo

\2-grams:
-10.0 foo foo

\3-grams:
-10.0 foo foo foo

\4-grams:
-10.0 foo foo foo foo

\5-grams:
0.0000 <s> when you hear <unk>

\6-grams:
0.0000 <s> when you hear <unk> </s>

\end\

I.e. pocketsphinx_continuous doesn't do 6-grams, right?
It does seem to handle 5-grams, however. At least it eats them and doesn't crash.
Are you also signalling that pocketsphinx_continuous doesn't do <unk>?
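In case it helps anyone hitting the same crash, the order can be checked up front by reading the \data\ header before handing the file to pocketsphinx. A small sketch (hypothetical helper, not part of pocketsphinx):

```python
# Sketch: report the maximum n-gram order declared in an ARPA \data\
# header, so files can be screened before loading. The order-5 limit
# below reflects the 6-gram rejection reported in this thread.
import re

def max_ngram_order(text):
    """Largest N in the "ngram N=count" declarations, or 0 if none."""
    orders = [int(m.group(1)) for m in re.finditer(r"ngram (\d+)=\d+", text)]
    return max(orders) if orders else 0

header = ("\\data\\\nngram 1=1\nngram 2=1\nngram 3=1\n"
          "ngram 4=1\nngram 5=1\nngram 6=1\n")
order = max_ngram_order(header)
print(order)  # 6
if order > 5:
    print("warning: orders above 5 are rejected")
```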
Thanks for the great tests, Scott. It will be nice to get them fixed one day. Anything besides 3-grams is not useful with pocketsphinx. <unk> is not supported either.
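Given that, one option is to cut a higher-order ARPA file down to trigrams before loading it. A rough sketch (my own illustration; it only drops the higher-order sections and their header lines and does not renormalize backoff weights, so treat it as a starting point, not a proper pruning tool):

```python
# Sketch: strip an ARPA LM down to order 3 by removing "ngram N=..."
# header lines and \N-grams: sections for N > 3. Illustration only;
# real pruning should renormalize, which this does not do.
import re

def trim_to_trigrams(text):
    # Drop "ngram N=count" header lines for N > 3.
    text = re.sub(r"ngram ([4-9]|\d{2,})=\d+\n", "", text)
    # Drop \N-grams: sections for N > 3, up to the next section or \end\.
    text = re.sub(r"\\([4-9]|\d{2,})-grams:\n.*?(?=\\\d+-grams:|\\end\\)",
                  "", text, flags=re.S)
    return text

lm = ("\\data\\\nngram 1=1\nngram 4=1\n\n"
      "\\1-grams:\n-10.0 foo\n\n"
      "\\4-grams:\n-10.0 foo foo foo foo\n\n"
      "\\end\\\n")
print("4-grams" in trim_to_trigrams(lm))  # False
```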