Language Model using SRILM

Sami
2017-02-21
2017-02-23

    Steps followed:

    1. Word Count
      ngram-count -text corpus.txt -order 3 -write corpus.count
      Works fine.
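To check our understanding of what step 1 writes, here is a minimal Python sketch of trigram counting. It assumes, as we believe ngram-count does by default, that each input line is one sentence wrapped in a single <s> ... </s> pair. The function names and the tab-separated output layout are our own assumptions, not SRILM's code.

```python
from collections import Counter


def count_ngrams(lines, order=3):
    """Count every 1..order-gram, treating each input line as one sentence.

    Assumption: each sentence is wrapped in a single <s> ... </s> pair,
    with no extra padding for higher orders.
    """
    counts = Counter()
    for line in lines:
        words = line.split()
        if not words:
            continue  # skip blank lines
        tokens = ["<s>"] + words + ["</s>"]
        for n in range(1, order + 1):
            for i in range(len(tokens) - n + 1):
                counts[tuple(tokens[i:i + n])] += 1
    return counts


def format_counts(counts):
    """Render counts as 'w1 w2 ... wn<TAB>count' lines (layout is an assumption)."""
    return [" ".join(ngram) + "\t" + str(c) for ngram, c in sorted(counts.items())]


if __name__ == "__main__":
    corpus = ["same here too", "same here now"]
    for row in format_counts(count_ngrams(corpus)):
        print(row)
```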

    2. Smooth LM
      ngram-count -text corpus.txt -order 3 -addsmooth 0 -lm corpus.lm
      Warning: BOW denominator for context "same here" is zero; scaling probabilities to sum to 1
      This warning appears for many different bigram contexts.
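Our guess at the cause (not confirmed): with -addsmooth 0 nothing is discounted, so the estimates for the words seen after a context already sum to 1. The backoff weight for a context h is (1 - sum of P(w|h)) / (1 - sum of P(w|h')), summing over the words w seen after h, where h' is h without its first word. If every word seen after "here" was also seen after "same here", both sums are 1 and the denominator is zero. The sketch below (hypothetical helper names, exact fractions, unsmoothed estimates) reproduces that arithmetic on a toy corpus; it is not SRILM's implementation.

```python
from collections import Counter
from fractions import Fraction


def ml_prob(counts, context, word):
    """Unsmoothed P(word | context), i.e. what add-0 smoothing leaves.

    `counts` is assumed to hold n-grams of a single order only.
    """
    total = sum(c for ngram, c in counts.items() if ngram[:-1] == context)
    return Fraction(counts[context + (word,)], total)


def bow_terms(higher, lower, context):
    """Return (numerator, denominator) of the backoff weight for `context`.

    bow(h) = (1 - sum_w P(w|h)) / (1 - sum_w P(w|h')), over words w seen after h,
    where h' = h[1:] is the lower-order context.
    """
    seen = [ngram[-1] for ngram in higher if ngram[:-1] == context]
    numerator = 1 - sum(ml_prob(higher, context, w) for w in seen)
    denominator = 1 - sum(ml_prob(lower, context[1:], w) for w in seen)
    return numerator, denominator


if __name__ == "__main__":
    trigrams = Counter({("same", "here", "too"): 1, ("same", "here", "now"): 1})
    bigrams = Counter({("same", "here"): 2, ("here", "too"): 1, ("here", "now"): 1})
    num, den = bow_terms(trigrams, bigrams, ("same", "here"))
    print("numerator:", num, "denominator:", den)
```

On this toy corpus both terms come out as 0. Adding a bigram where "here" is followed by some other word (one never seen after "same here") makes the denominator non-zero.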

    3. Convert LM to BIN: we are unable to get the conversion command to work:

    test@ubuntu:~/LM Testing/15FebSrilmTransLM$ sphinx_lm_convert -i corpus.lm -o corpus.lm.bin
    Current configuration:
    [NAME] [DEFLT] [VALUE]
    -case
    -debug 0
    -help no no
    -i corpus.lm
    -ifmt
    -logbase 1.0001 1.000100e+00
    -mmap no no
    -o corpus.lm.bin
    -ofmt

    INFO: ngram_model_trie.c(354): Trying to read LM in trie binary format
    INFO: ngram_model_trie.c(365): Header doesn't match
    INFO: ngram_model_trie.c(177): Trying to read LM in arpa format
    INFO: ngram_model_trie.c(193): LM of order 3
    INFO: ngram_model_trie.c(195): #1-grams: 6978
    INFO: ngram_model_trie.c(195): #2-grams: 57080
    INFO: ngram_model_trie.c(195): #3-grams: 26523
    INFO: lm_trie.c(474): Training quantizer
    INFO: lm_trie.c(482): Building LM trie
    test@ubuntu:~/LM Testing/15FebSrilmTransLM$
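The log shows sphinx_lm_convert reading corpus.lm as ARPA and reporting 6978, 57080 and 26523 n-grams, then returning to the prompt after "Building LM trie" with no further message. To cross-check those numbers against the file itself, here is a small Python sketch that reads the \data\ header of an ARPA file. It assumes the standard ARPA layout ("ngram N=count" lines after \data\) and only checks the header; it does not diagnose the conversion.

```python
import re


def arpa_ngram_counts(lines):
    """Return {order: count} from the \\data\\ header of an ARPA LM."""
    counts = {}
    in_header = False
    for raw in lines:
        line = raw.strip()
        if line == "\\data\\":
            in_header = True
            continue
        if in_header:
            m = re.match(r"ngram\s+(\d+)\s*=\s*(\d+)$", line)
            if m:
                counts[int(m.group(1))] = int(m.group(2))
            elif line.startswith("\\"):
                break  # a section such as \1-grams: ends the header
    return counts
```

Called on open("corpus.lm"), it should report the same counts as the #1-grams, #2-grams and #3-grams lines above.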

    4. We also tried this command:
      ngram-count -vocab corpus.txt -text corpus.Train -order 3 -write corpus.count -unk

    Could you please tell us what kind of data "corpus.txt" and "corpus.Train" should contain?
    Also, what does -unk do? Is it for out-of-vocabulary (OOV) words?
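Our reading of the ngram-count man page (not yet confirmed) is that the file given to -vocab is a plain word list, one word per line, while -text takes the training text, one sentence per line. Words in the text that are not in the list are replaced by <unk>; with -unk that token is kept in the model (an "open vocabulary" LM) instead of being removed. The Python sketch below only illustrates that mapping; all function names, including the build_vocab helper for making a word list from a corpus, are hypothetical and not part of SRILM.

```python
from collections import Counter


def load_vocab(lines):
    """Read a word list: one word per line, blank lines ignored."""
    return {w.strip() for w in lines if w.strip()}


def map_oov(sentence, vocab, unk="<unk>"):
    """Replace every out-of-vocabulary word with the unknown-word token."""
    return [w if w in vocab else unk for w in sentence.split()]


def build_vocab(lines, min_count=1):
    """Hypothetical helper: sorted words occurring at least min_count times."""
    freq = Counter(w for line in lines for w in line.split())
    return sorted(w for w, c in freq.items() if c >= min_count)


if __name__ == "__main__":
    vocab = load_vocab(["same", "here"])
    print(map_oov("same here too", vocab))
```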

     
