N-gram count
ngram-count -text corpus.txt -order 3 -write corpus.count
This step works fine.
Smooth LM
ngram-count -text corpus.txt -order 3 -addsmooth 0 -lm corpus.lm
Warning: BOW denominator for context "same here" is zero; scaling probabilities to sum to 1
We get this warning for many different bigram contexts.
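BOW in this warning is the backoff weight. In a backoff LM, probability for words never seen after a context has to come from mass held back from the seen words. With -addsmooth 0 nothing is added to the counts, which amounts to plain maximum-likelihood estimates, so no mass is held back at any order and the backoff-weight computation can end up dividing by zero. The awk sketch below is only an illustration with hypothetical counts for the context "same here" (file name followers.txt is made up). It compares maximum likelihood with Witten-Bell discounting, one of the discounting methods ngram-count offers (-wbdiscount).

```shell
#!/bin/sh
# Hypothetical counts of the words seen after the context "same here"
# (one "word count" pair per line).
printf 'too 3\nagain 1\n' > followers.txt

awk '{ count[$1] = $2; n += $2; t++ }   # n = c(h), t = T(h), distinct followers
     END {
       # -addsmooth 0 adds nothing to the counts: P(w|h) = c(h,w) / c(h)
       for (w in count) mle += count[w] / n
       printf "MLE:         seen mass = %.2f, left for backoff = %.2f\n", mle, 1 - mle
       # Witten-Bell: P(w|h) = c(h,w) / (c(h) + T(h))
       for (w in count) wb += count[w] / (n + t)
       printf "Witten-Bell: seen mass = %.2f, left for backoff = %.2f\n", wb, 1 - wb
     }' followers.txt > backoff.txt
cat backoff.txt
```

With these counts the maximum-likelihood estimate leaves nothing for backoff, while Witten-Bell leaves about a third of the mass. In practice this points to building the LM with a discounting option instead of -addsmooth 0; the CMUSphinx LM tutorial uses a command along the lines of `ngram-count -kndiscount -interpolate -text corpus.txt -lm corpus.lm`, and -wbdiscount is another choice.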
We are also unable to convert the LM to binary (BIN) format; the sphinx_lm_convert output stops after "Building LM trie":
test@ubuntu:~/LM Testing/15FebSrilmTransLM$ sphinx_lm_convert -i corpus.lm -o corpus.lm.bin
Current configuration:
[NAME] [DEFLT] [VALUE]
-case
-debug 0
-help no no
-i corpus.lm
-ifmt
-logbase 1.0001 1.000100e+00
-mmap no no
-o corpus.lm.bin
-ofmt
INFO: ngram_model_trie.c(354): Trying to read LM in trie binary format
INFO: ngram_model_trie.c(365): Header doesn't match
INFO: ngram_model_trie.c(177): Trying to read LM in arpa format
INFO: ngram_model_trie.c(193): LM of order 3
INFO: ngram_model_trie.c(195): #1-grams: 6978
INFO: ngram_model_trie.c(195): #2-grams: 57080
INFO: ngram_model_trie.c(195): #3-grams: 26523
INFO: lm_trie.c(474): Training quantizer
INFO: lm_trie.c(482): Building LM trie
test@ubuntu:~/LM Testing/15FebSrilmTransLM$
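The #1-grams to #3-grams figures in the log appear to be the counts from the \data\ header of corpus.lm, which suggests the ARPA header itself was parsed. A hedged way to double-check such a file by hand is to print that header and confirm the last line is the \end\ marker that closes an ARPA file. The sketch runs on a tiny made-up file (lm_demo.arpa; the probabilities are invented, only the layout matters); the same awk and tail commands can be pointed at corpus.lm.

```shell
#!/bin/sh
# Tiny hypothetical ARPA file, only to have something to inspect.
cat > lm_demo.arpa <<'EOF'
\data\
ngram 1=3
ngram 2=1

\1-grams:
-0.3010 </s>
-99 <s> -0.3010
-0.3010 here

\2-grams:
-0.1761 <s> here

\end\
EOF

# Print the "ngram N=count" lines of the \data\ header, stop at the 1-grams.
awk '$0 == "\\data\\" { in_header = 1; next }
     in_header && /^ngram / { print }
     $0 == "\\1-grams:" { exit }' lm_demo.arpa > header.txt
cat header.txt

# A complete ARPA file ends with the \end\ marker.
tail -n 1 lm_demo.arpa
```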
We also tried this command:
ngram-count -vocab corpus.txt -text corpus.Train -order 3 -write corpus.count -unk
Could you please tell us what kind of data "corpus.txt" and "corpus.Train" should contain?
What is -unk used for? Does it have to do with out-of-vocabulary (OOV) words?
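For reference, the ngram-count manual page describes -vocab as reading a vocabulary (a word list) from a file, after which out-of-vocabulary words in the text are replaced with the unknown-word token <unk>; -unk builds an "open vocabulary" LM that keeps <unk> as a regular word instead of removing it. In the command above, corpus.txt would therefore be the word list and corpus.Train the training sentences. The sketch below does not call SRILM; it only emulates that mapping with awk on hypothetical files (corpus.vocab and corpus.mapped are made-up names).

```shell
#!/bin/sh
# Hypothetical training text: one sentence per line, words separated by spaces.
printf 'same here\nsee you here\n' > corpus.Train

# Hypothetical vocabulary, one word per line, built from a word source
# that does not contain "see".
printf 'same here\nyou here\n' | tr -s ' ' '\n' | sort -u > corpus.vocab

# Emulate the -vocab mapping: every word not in corpus.vocab becomes <unk>.
awk 'NR == FNR { vocab[$1] = 1; next }
     { for (i = 1; i <= NF; i++) if (!($i in vocab)) $i = "<unk>"; print }' \
    corpus.vocab corpus.Train > corpus.mapped
cat corpus.mapped
```

Here "see" is OOV, so the second sentence becomes "<unk> you here"; with -unk that token would stay in the model and get its own probability.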
Please check the tutorial: http://cmusphinx.sourceforge.net/wiki/tutoriallm