CMU Sphinx / Forums / Help: Language Model using SRILM

Speech Recognition Toolkit

Language Model using SRILM

Forum: Help

Creator: Sami

Created: 2017-02-21

Updated: 2017-02-23

Sami - 2017-02-21

Steps followed:

Word Count
ngram-count -text corpus.txt -order 3 -write corpus.count
This works fine.
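For reference, here is a rough Python sketch of what this counting step computes. It assumes, as SRILM does by default, that each input line is one sentence padded with <s> and </s>. The function names are hypothetical, and the output only approximates the layout of a SRILM counts file (the n-gram words, a tab, then the count). It is not SRILM's implementation.

```python
from collections import Counter


def count_ngrams(lines, order=3):
    """Count every 1..order-gram, treating each line as one sentence."""
    counts = Counter()
    for line in lines:
        # Pad the sentence with begin/end markers (assumed to match SRILM's default).
        words = ["<s>"] + line.split() + ["</s>"]
        for n in range(1, order + 1):
            for i in range(len(words) - n + 1):
                counts[tuple(words[i:i + n])] += 1
    return counts


def format_counts(counts):
    """Render counts as 'w1 w2 ...<TAB>count', one n-gram per line."""
    return [" ".join(ngram) + "\t" + str(c) for ngram, c in sorted(counts.items())]


if __name__ == "__main__":
    corpus = ["same here", "same here too"]
    for row in format_counts(count_ngrams(corpus)):
        print(row)
```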

Smooth LM
ngram-count -text corpus.txt -order 3 -addsmooth 0 -lm corpus.lm
Warning: BOW denominator for context "same here" is zero; scaling probabilities to sum to 1
This warning appears for many different bigram contexts, not just this one.
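A minimal sketch of why this warning can appear, assuming -addsmooth 0 leaves the counts unsmoothed so every probability is a plain maximum-likelihood estimate. In a back-off model, the back-off weight for a context such as "same here" is (1 minus the trigram probabilities of the words seen after "same here") divided by (1 minus the bigram probabilities of those same words after "here"). With no discounting, the numerator is always zero, and the denominator is zero whenever every word seen after "here" was also seen after "same here". The helper names and the toy corpus are hypothetical.

```python
from collections import Counter, defaultdict


def followers(lines, n):
    """Map each (n-1)-word context to a Counter of the words seen right after it."""
    table = defaultdict(Counter)
    for line in lines:
        words = ["<s>"] + line.split() + ["</s>"]
        for i in range(len(words) - n + 1):
            context = tuple(words[i:i + n - 1])
            table[context][words[i + n - 1]] += 1
    return table


def ml_prob(table, context, word):
    """Unsmoothed maximum-likelihood estimate P(word | context)."""
    seen = table.get(context, Counter())
    total = sum(seen.values())
    return seen[word] / total if total else 0.0


def bow_terms(lines, context):
    """Return (numerator, denominator) of the back-off weight for a 2-word context."""
    trigrams = followers(lines, 3)
    bigrams = followers(lines, 2)
    seen_words = list(trigrams.get(context, Counter()))
    # Probability mass left after the seen trigrams (zero when nothing is discounted).
    numerator = 1.0 - sum(ml_prob(trigrams, context, w) for w in seen_words)
    # Lower-order mass NOT already covered by those same words.
    denominator = 1.0 - sum(ml_prob(bigrams, context[1:], w) for w in seen_words)
    return numerator, denominator


if __name__ == "__main__":
    corpus = ["same here", "same here too"]
    print(bow_terms(corpus, ("same", "here")))  # (0.0, 0.0)
```

With this toy corpus, both words seen after "here" (</s> and "too") also follow "same here", so the denominator is exactly zero.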

We are also unable to convert the LM to binary format with sphinx_lm_convert:

test@ubuntu:~/LM Testing/15FebSrilmTransLM$ sphinx_lm_convert -i corpus.lm -o corpus.lm.bin
Current configuration:
[NAME] [DEFLT] [VALUE]
-case
-debug 0
-help no no
-i corpus.lm
-ifmt
-logbase 1.0001 1.000100e+00
-mmap no no
-o corpus.lm.bin
-ofmt

INFO: ngram_model_trie.c(354): Trying to read LM in trie binary format
INFO: ngram_model_trie.c(365): Header doesn't match
INFO: ngram_model_trie.c(177): Trying to read LM in arpa format
INFO: ngram_model_trie.c(193): LM of order 3
INFO: ngram_model_trie.c(195): #1-grams: 6978
INFO: ngram_model_trie.c(195): #2-grams: 57080
INFO: ngram_model_trie.c(195): #3-grams: 26523
INFO: lm_trie.c(474): Training quantizer
INFO: lm_trie.c(482): Building LM trie
test@ubuntu:~/LM Testing/15FebSrilmTransLM$
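The log shows sphinx_lm_convert reading corpus.lm as ARPA and reporting the n-gram counts from its header. One cheap sanity check before retrying the conversion is to compare the counts declared in the \data\ section with the number of entries actually present in each \N-grams: section. The helpers below are hypothetical and written only for illustration; they do not reproduce whatever sphinx_lm_convert itself checks.

```python
import re


def declared_counts(lines):
    """Read 'ngram N=count' lines from the \\data\\ header of an ARPA file."""
    counts = {}
    in_header = False
    for raw in lines:
        line = raw.strip()
        if line == "\\data\\":
            in_header = True
        elif in_header:
            m = re.match(r"ngram\s+(\d+)\s*=\s*(\d+)$", line)
            if m:
                counts[int(m.group(1))] = int(m.group(2))
            elif line.startswith("\\"):
                break  # the first "\1-grams:" marker ends the header
    return counts


def actual_counts(lines):
    """Count the non-empty entry lines inside each '\\N-grams:' section."""
    counts = {}
    order = None
    for raw in lines:
        line = raw.strip()
        m = re.match(r"\\(\d+)-grams:$", line)
        if m:
            order = int(m.group(1))
            counts[order] = 0
        elif line == "\\end\\":
            order = None
        elif order is not None and line:
            counts[order] += 1
    return counts


def mismatches(lines):
    """Return (order, declared, actual) for every order where the two disagree."""
    declared, actual = declared_counts(lines), actual_counts(lines)
    return [(n, declared.get(n), actual.get(n))
            for n in sorted(set(declared) | set(actual))
            if declared.get(n) != actual.get(n)]
```

For example, `mismatches(open("corpus.lm", encoding="utf-8").read().splitlines())` returning an empty list means the header and the sections agree.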

We also tried this command:

ngram-count -vocab corpus.txt -text corpus.Train -order 3 -write corpus.count -unk

Could you please tell us what kind of data "corpus.txt" and "corpus.Train" should contain?
What is -unk used for (is it for OOV words)?
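As we understand the ngram-count documentation, -text expects the training corpus with one sentence per line, while -vocab expects a word list with one word per line. Words in the text that are not in that list are mapped to the unknown-word token <unk>, and -unk keeps <unk> as a regular word in the model (an open-vocabulary LM) instead of removing it. The Python sketch below only illustrates those two data formats and the OOV mapping. build_vocab, map_oov and the min_count cutoff are hypothetical, not SRILM options.

```python
from collections import Counter


def build_vocab(text_lines, min_count=1):
    """Collect words from a one-sentence-per-line corpus, keeping those seen
    at least min_count times (min_count is an illustrative cutoff)."""
    counts = Counter(w for line in text_lines for w in line.split())
    return sorted(w for w, c in counts.items() if c >= min_count)


def vocab_file_text(words):
    """Format a word list as a plain vocabulary file: one word per line."""
    return "\n".join(words) + "\n"


def map_oov(sentence, vocab, unk="<unk>"):
    """Replace every token that is not in the vocabulary with <unk>."""
    return " ".join(w if w in vocab else unk for w in sentence.split())


if __name__ == "__main__":
    train = ["same here", "same here too"]  # training text: one sentence per line
    vocab = set(build_vocab(train, min_count=2))  # keeps "here" and "same"
    print(vocab_file_text(sorted(vocab)), end="")
    print(map_oov("same here too", vocab))  # same here <unk>
```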
- Arseniy Gorin - 2017-02-21
  
  Please check the tutorial: http://cmusphinx.sourceforge.net/wiki/tutoriallm
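For contrast with -addsmooth 0: discounting methods subtract a little from every seen count and give that mass to a lower-order model, so unseen continuations keep a nonzero probability and back-off weights stay well defined. SRILM exposes such methods through options like -kndiscount and -interpolate. The sketch below uses interpolated absolute discounting on bigrams, a simpler relative of Kneser-Ney that interpolates with a plain unigram distribution rather than Kneser-Ney's continuation counts. train_bigram and the discount of 0.75 are illustrative assumptions, and the normalization relies on 0 < discount <= 1.

```python
from collections import Counter, defaultdict


def train_bigram(lines, discount=0.75):
    """Interpolated absolute discounting for a bigram model (illustrative)."""
    bigrams = defaultdict(Counter)
    unigrams = Counter()
    for line in lines:
        words = ["<s>"] + line.split() + ["</s>"]
        for prev, word in zip(words, words[1:]):
            bigrams[prev][word] += 1
            unigrams[word] += 1  # vocabulary = every word that is ever predicted
    total = sum(unigrams.values())
    p_uni = {w: c / total for w, c in unigrams.items()}

    def prob(word, prev):
        seen = bigrams.get(prev)
        if not seen:
            return p_uni.get(word, 0.0)  # unknown context: fall back to unigrams
        c_prev = sum(seen.values())
        # Mass freed by subtracting `discount` from each distinct seen bigram.
        reserved = discount * len(seen) / c_prev
        return max(seen[word] - discount, 0) / c_prev + reserved * p_uni.get(word, 0.0)

    return prob, p_uni


if __name__ == "__main__":
    prob, p_uni = train_bigram(["same here", "same here too"])
    print(sum(prob(w, "here") for w in p_uni))  # sums to 1 (up to rounding)
    print(prob("same", "here") > 0)  # an unseen bigram still gets probability
```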
  