In the ARPA lm files generated by the CMU lmtool, <s>, </s>, and <UNK> are not surrounded by apostrophes (') but in the examples of the ARPA format on the pocketsphinx help page regarding language models they are. Is there any difference in how they are treated?
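For context on why the quoting could matter: ARPA n-gram entries are generally whitespace-delimited, so a parser that takes tokens verbatim would treat '<s>' and <s> as two different vocabulary words. A minimal sketch of that tokenization (my own illustration, not pocketsphinx's actual parser):

```python
# Sketch: how an ARPA \1-grams: entry is typically tokenized (whitespace
# split, tokens taken verbatim). Illustration only, not pocketsphinx code.

def parse_unigram_line(line):
    """Return (logprob, word, backoff) for a unigram entry line."""
    fields = line.split()
    logprob = float(fields[0])
    word = fields[1]
    backoff = float(fields[2]) if len(fields) > 2 else 0.0
    return logprob, word, backoff

# Quoting changes the token's identity under this scheme:
print(parse_unigram_line("-1.0 <s>")[1])    # <s>
print(parse_unigram_line("-1.0 '<s>'")[1])  # '<s>'
```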
I'll note in passing that lm_trie.c seems to enforce rules about the ARPA format over and above the format specification. For example, if I have zero 1-, 2-, 3-, and 4-grams and one 5-gram, then recursive_insert in lm_trie.c fails the assertion priority_queue_size(ngrams) == 0. If I put only the 5-gram in the lm file, then line 458 of ngram_model_trie.c fails with "Wrong magic header size number".
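The invariant apparently being enforced is that the counts declared in the \data\ section must match the entries actually present in each section. A rough standalone checker for that invariant (a hypothetical helper written for illustration, not pocketsphinx code; check_arpa_counts is my own name):

```python
# Sketch: verify that "ngram N=count" declarations in an ARPA \data\
# section match the number of entries in each \N-grams: section.
# Hypothetical helper for illustration, not pocketsphinx code.
import re

def check_arpa_counts(text):
    """Return a list of (order, declared, actual) mismatches."""
    declared = {}
    for m in re.finditer(r"ngram (\d+)=(\d+)", text):
        declared[int(m.group(1))] = int(m.group(2))
    mismatches = []
    for order, count in declared.items():
        # Grab the section body up to the next "\..."-header or end of file.
        section = re.search(r"\\%d-grams:\n(.*?)(?=\n\\|\Z)" % order,
                            text, re.S)
        actual = 0
        if section:
            actual = sum(1 for line in section.group(1).splitlines()
                         if line.strip())
        if actual != count:
            mismatches.append((order, count, actual))
    return mismatches

lm = ("\\data\\\nngram 1=1\nngram 2=1\n\n"
      "\\1-grams:\n-10.0 foo\n\n"
      "\\2-grams:\n\n\\end\\\n")
print(check_arpa_counts(lm))  # [(2, 1, 0)] -- declared one 2-gram, found none
```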
Re the apostrophes: I fixed the wiki page, thanks for the notification.
Further to ARPA grammars:
ERROR: "ngrams_raw.c", line 84: Format error; 6-gram ignored at line 25
INFO: lm_trie.c(474): Training quantizer
Error in `pocketsphinx_continuous': double free or corruption (out): 0x0000000032cae150
\data\
ngram 1=1
ngram 2=1
ngram 3=1
ngram 4=1
ngram 5=1
ngram 6=1

\1-grams:
-10.000 foo

\2-grams:
-10.0 foo foo

\3-grams:
-10.0 foo foo foo

\4-grams:
-10.0 foo foo foo foo

\5-grams:
0.0000 <s> when you hear <unk>

\6-grams:
0.0000 <s> when you hear <unk> </s>

\end\

I.e. pocketsphinx_continuous doesn't do 6-grams, right?
It does seem to handle 5-grams, however. At least it eats them and doesn't crash.
Are you also signalling that pocketsphinx_continuous doesn't do <unk>?
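In case it helps anyone hitting the same crash, the order can be checked up front by reading the \data\ header before handing the file to pocketsphinx. A small sketch (hypothetical helper, not part of pocketsphinx):

```python
# Sketch: report the maximum n-gram order declared in an ARPA \data\
# header, so files can be screened before loading. The order-5 limit
# below reflects the 6-gram rejection reported in this thread.
import re

def max_ngram_order(text):
    """Largest N in the "ngram N=count" declarations, or 0 if none."""
    orders = [int(m.group(1)) for m in re.finditer(r"ngram (\d+)=\d+", text)]
    return max(orders) if orders else 0

header = ("\\data\\\nngram 1=1\nngram 2=1\nngram 3=1\n"
          "ngram 4=1\nngram 5=1\nngram 6=1\n")
order = max_ngram_order(header)
print(order)  # 6
if order > 5:
    print("warning: orders above 5 are rejected")
```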
Thanks for the great tests, Scott. It will be nice to get them fixed one day. Anything besides 3-grams is not useful with pocketsphinx. <unk> is not supported either.
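Given that, one option is to cut a higher-order ARPA file down to trigrams before loading it. A rough sketch (my own illustration; it only drops the higher-order sections and their header lines and does not renormalize backoff weights, so treat it as a starting point, not a proper pruning tool):

```python
# Sketch: strip an ARPA LM down to order 3 by removing "ngram N=..."
# header lines and \N-grams: sections for N > 3. Illustration only;
# real pruning should renormalize, which this does not do.
import re

def trim_to_trigrams(text):
    # Drop "ngram N=count" header lines for N > 3.
    text = re.sub(r"ngram ([4-9]|\d{2,})=\d+\n", "", text)
    # Drop \N-grams: sections for N > 3, up to the next section or \end\.
    text = re.sub(r"\\([4-9]|\d{2,})-grams:\n.*?(?=\\\d+-grams:|\\end\\)",
                  "", text, flags=re.S)
    return text

lm = ("\\data\\\nngram 1=1\nngram 4=1\n\n"
      "\\1-grams:\n-10.0 foo\n\n"
      "\\4-grams:\n-10.0 foo foo foo foo\n\n"
      "\\end\\\n")
print("4-grams" in trim_to_trigrams(lm))  # False
```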