Hi,
I'm creating a new language model for Spanish because some words I need are missing from the one at https://sourceforge.net/projects/cmusphinx/files/Acoustic%20and%20Language%20Models/Spanish%20Voxforge/
I'm now on step 5 of http://cmusphinx.sourceforge.net/wiki/tutoriallm, but when I run the command idngram2lm -vocab_type 0 -idngram weather.idngram -vocab weather.vocab -arpa weather.lm, I get the error below, and I don't know how to change the table size it refers to.
This is the console output; the error is on the last line.
n : 3
Input file : a.idngram (binary format)
Output files :
ARPA format : a.lm
Vocabulary file : a.vocab
Cutoffs :
2-gram : 0 3-gram : 0
Vocabulary type : Closed
Minimum unigram count : 0
Zeroton fraction : 1
Counts will be stored in two bytes.
Count table size : 65535
Discounting method : Good-Turing
Discounting ranges :
1-gram : 1 2-gram : 7 3-gram : 7
Memory allocation for tree structure :
Allocate 100 MB of memory, shared equally between all n-gram tables.
Back-off weight storage :
Back-off weights will be stored in four bytes.
Reading vocabulary.
...................
read_wlist_into_siht: a list of 19996 words was read from "a.vocab".
read_wlist_into_array: a list of 19996 words was read from "a.vocab".
Allocated space for 3571428 2-grams.
Allocated space for 8333333 3-grams.
table_size 19997
Allocated 57142848 bytes to table for 2-grams.
Allocated (2+33333332) bytes to table for 3-grams.
Processing id n-gram file.
20,000 n-grams processed for each ".", 1,000,000 for each line.
Warning : id n-gram stream contains OOV's (n-grams will be ignored).
..................................................
..................................................
..................................................
..................................................
..................................................
..................................................
..................................................
..................................................
................
More than 8333333 3-grams needed to be stored. Rerun with a higher table size.
Thanks in advance.
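For reference, the numbers in the log above hint at how the table sizes relate to the memory buffer. What follows is only a minimal sketch. It assumes, from this log alone, that the 100 MB buffer is counted as 100,000,000 bytes, split equally between the 2-gram and 3-gram tables, and budgeted at 14 bytes per 2-gram and 6 bytes per 3-gram. Those per-entry figures are inferred from the "Allocated space for ..." lines, not taken from the toolkit's source or documentation, and the function names are hypothetical.

```python
import math

# Assumption: decimal megabytes, which reproduces the figures in the log.
BYTES_PER_MB = 1_000_000

# Assumed per-entry budgets, inferred from the log:
#   50,000,000 // 14 = 3571428 (2-grams), 50,000,000 // 6 = 8333333 (3-grams)
BYTES_PER_ENTRY = {2: 14, 3: 6}


def table_capacity(buffer_mb, order, n_tables=2):
    """Estimate how many n-grams of the given order fit in the buffer."""
    share = buffer_mb * BYTES_PER_MB // n_tables  # equal split between tables
    return share // BYTES_PER_ENTRY[order]


def min_buffer_mb(needed_entries, order, n_tables=2):
    """Estimate the smallest buffer (in MB) that holds needed_entries n-grams."""
    needed_bytes = needed_entries * BYTES_PER_ENTRY[order] * n_tables
    return math.ceil(needed_bytes / BYTES_PER_MB)


if __name__ == "__main__":
    print(table_capacity(100, 2))         # capacity reported in the log for 2-grams
    print(table_capacity(100, 3))         # capacity reported in the log for 3-grams
    print(min_buffer_mb(12_000_000, 3))   # 12,000,000 is an arbitrary illustration
```

The 12,000,000 figure is only an illustration, since the log does not say how many 3-grams the data actually needs. If your build of idngram2lm accepts the -buffer option (the toolkit documentation lists it alongside -calc_mem and -spec_num; check the tool's help output to confirm), rerunning with a larger value, for example -buffer 200, should raise the 3-gram capacity accordingly.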
Use SRILM.
Thanks, Nickolay.
One question, though: is there a tutorial for SRILM?