Combine default CMUSphinx Language Model and Custom Training Language Model using CMUCLMTK

2017-07-05
  • Manoj Gaonkar

    Manoj Gaonkar - 2017-07-05

    Command I used:
    lm_combine.exe -lm1 custom.lm -lm2 en-70k-0.1.lm -weight w.wt -lm mix.lm

    w.wt:
    custom.lm 0.5
    en-70k-0.1.lm 0.5

    When I tried merging the default CMU Sphinx LM with my custom language model, I got the following output:

    Reading in a 3-gram language model.
    Number of 1-grams = 226.
    Number of 2-grams = 913.
    Number of 3-grams = 1595.
    Reading unigrams...

    Reading 2-grams...

    Reading 3-grams...
    Reading in a 3-gram language model.
    Number of 1-grams = 72354.
    Number of 2-grams = 6581523.
    Number of 3-grams = 7704188.
    Reading unigrams...

    Reading 2-grams...
    Error - Repeated 2-gram in ARPA format language model.

    When I tried combining two custom-trained models, the execution was successful.

    Any information would be helpful!

     
    • Nickolay V. Shmyrev

      The message says that the big LM has duplicated n-grams, which might be the case. You can fix duplicated n-grams with a text editor or with a script.
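
A minimal sketch of such a script, in Python. It assumes the model is a plain-text ARPA file with `\N-grams:` section headers and a closing `\end\` marker; the function names `iter_ngrams` and `find_repeated` are hypothetical and not part of CMUCLMTK. It only reports n-grams whose words appear more than once within the same section; it does not modify the file.

```python
import re

# Matches ARPA section headers such as "\2-grams:"
SECTION = re.compile(r"^\\(\d+)-grams:\s*$")


def iter_ngrams(lines):
    """Yield (order, words, line_number) for each n-gram entry in ARPA text."""
    order = None
    for lineno, line in enumerate(lines, start=1):
        line = line.strip()
        match = SECTION.match(line)
        if match:
            order = int(match.group(1))
            continue
        if line == "\\end\\":
            break
        if order is None or not line:
            continue  # header lines (\data\, ngram counts) or blank lines
        fields = line.split()
        # Expected fields: log10 probability, n words, optional backoff weight
        if len(fields) < order + 1:
            continue
        yield order, tuple(fields[1:order + 1]), lineno


def find_repeated(lines):
    """Return (order, words, first_line, repeat_line) for every repeated n-gram."""
    seen = {}
    repeats = []
    for order, words, lineno in iter_ngrams(lines):
        key = (order, words)
        if key in seen:
            repeats.append((order, words, seen[key], lineno))
        else:
            seen[key] = lineno
    return repeats
```

To run it on a real model, pass an open file object (for example the handle for `en-70k-0.1.lm`) as `lines`. Every n-gram is kept in memory, so a model with several million entries may need a few gigabytes of RAM.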

       
  • Manoj Gaonkar

    Manoj Gaonkar - 2017-07-06

    Thanks for the information.

    Is there any such script available?
    Can you tell me which dataset was used to train the default language model?

     
  • Manoj Gaonkar

    Manoj Gaonkar - 2017-07-06

    Hi Nickolay,

    I searched for duplicates in the big LM, and there are none.

    There are n-grams in the LM like:
    -3.8544 service joined 0.0000
    -4.3938 service joining 0.0000
    -3.9777 service joint -0.1926
    -3.8638 service jointly 0.0000

    Are these considered duplicate n-grams?

    if (pos_of_novelty == i && j != 1)
    quit(-1,"Error - Repeated %d-gram in ARPA format language model.\n", i);

    This is the code in lm_combine.c where I get the error.

    Can you tell me what else might cause this error when combining two LMs?

    Anything would be helpful!

    Regards,
    Manoj

     
    • Nickolay V. Shmyrev

      Maybe it expects the n-grams to be sorted; try sorting them with sphinx_lm_sort.
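
One way to test that guess before re-sorting is to look for the first entry that sorts before its predecessor. The Python sketch below assumes a plain-text ARPA file and compares word tuples in Python's default string order. That ordering is an assumption: the order lm_combine actually expects may differ (for example, it may follow vocabulary IDs), so a hit here is only a hint. The function name `first_out_of_order` is hypothetical.

```python
import re

# Matches ARPA section headers such as "\2-grams:"
SECTION = re.compile(r"^\\(\d+)-grams:\s*$")


def first_out_of_order(lines):
    """Return (order, previous, current, line_number) for the first n-gram
    that sorts before its predecessor in the same section, or None."""
    order = None
    previous = None
    for lineno, line in enumerate(lines, start=1):
        line = line.strip()
        match = SECTION.match(line)
        if match:
            order = int(match.group(1))
            previous = None  # ordering is checked per section
            continue
        if line == "\\end\\":
            break
        if order is None or not line:
            continue  # header lines or blank lines
        fields = line.split()
        # Expected fields: log10 probability, n words, optional backoff weight
        if len(fields) < order + 1:
            continue
        current = tuple(fields[1:order + 1])
        if previous is not None and current < previous:
            return order, previous, current, lineno
        previous = current
    return None
```

Python compares strings by code point, so uppercase words sort before lowercase ones. If the model mixes cases, the result may disagree with a case-insensitive sort.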

       
