When I tried merging the default CMU Sphinx LM with a custom language model, I got the following output:
Reading in a 3-gram language model.
Number of 1-grams = 226.
Number of 2-grams = 913.
Number of 3-grams = 1595.
Reading unigrams...
Reading 2-grams...
Reading 3-grams...
Reading in a 3-gram language model.
Number of 1-grams = 72354.
Number of 2-grams = 6581523.
Number of 3-grams = 7704188.
Reading unigrams...
Reading 2-grams...
Error - Repeated 2-gram in ARPA format language model.
When I tried combining two custom-trained models, the execution was successful.
Any information will be helpful!!
I searched for duplicates in the big LM, and there are none. There are n-grams in the LM like:
-3.8544 service joined 0.0000
-4.3938 service joining 0.0000
-3.9777 service joint -0.1926
-3.8638 service jointly 0.0000
Are these considered duplicate n-grams?
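For what it's worth, those four entries are distinct bigrams (same first word, different second word), so they should not count as duplicates. One way to check for genuinely repeated 2-grams is a sketch like the following; `arpa_path` is a placeholder filename, and the parsing assumes the standard plain-text ARPA layout:

```python
# Sketch: scan the \2-grams: section of an ARPA LM for repeated
# word sequences. Assumes each entry is "logprob w1 w2 [backoff]".
def find_duplicate_bigrams(arpa_path):
    seen = set()
    duplicates = []
    in_bigrams = False
    with open(arpa_path, encoding="utf-8") as f:
        for line in f:
            line = line.strip()
            if line == "\\2-grams:":
                in_bigrams = True
                continue
            if in_bigrams:
                if line.startswith("\\"):   # next section begins
                    break
                if not line:
                    continue
                fields = line.split()
                # the bigram is the two words after the log-prob
                bigram = tuple(fields[1:3])
                if bigram in seen:
                    duplicates.append(bigram)
                seen.add(bigram)
    return duplicates
```

By this check, "service joined" and "service joining" are different keys and would never be reported.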
if (pos_of_novelty == i && j != 1)
    quit(-1, "Error - Repeated %d-gram in ARPA format language model.\n", i);
This is the code in lm_combine.c where I get the error.
Can you tell me what else might cause that error when combining two LMs?
Anything will be helpful!!
Regards,
Manoj
The command I used:
lm_combine.exe -lm1 custom.lm -lm2 en-70k-0.1.lm -weight w.wt -lm mix.lm
w.wt:
custom.lm 0.5
en-70k-0.1.lm 0.5
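As far as I understand, lm_combine linearly interpolates the two models' probabilities using these weights, so each shared n-gram in the mixture gets w1·P1 + w2·P2 in probability space. A small sketch of that arithmetic (the log-probability values below are made up for illustration, not real entries):

```python
import math

# Sketch of the linear interpolation lm_combine is meant to apply,
# assuming it mixes probabilities (not log-probs) with the weights
# from w.wt. Inputs are ARPA-style log10 probabilities.
def interpolate(logp1, logp2, w1=0.5, w2=0.5):
    """Mix two log10 probabilities with linear weights."""
    p_mix = w1 * 10 ** logp1 + w2 * 10 ** logp2
    return math.log10(p_mix)

# With equal weights, mixing two identical probabilities leaves
# them unchanged: interpolate(-1.0, -1.0) is -1.0.
```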
The message says that the big LM has duplicated n-grams, and that might be the case. You can fix duplicated n-grams with a text editor or with a script.
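I'm not aware of a ready-made dedup script shipping with the toolkit, but a minimal sketch that keeps only the first occurrence of each n-gram in every section could look like this (assuming a plain-text ARPA file that can be streamed line by line; the file names are examples):

```python
# Sketch: drop repeated n-gram entries from an ARPA file, keeping
# the first occurrence within each \N-grams: section. Assumes the
# usual entry layout "logprob w1 ... wN [backoff]".
def dedupe_arpa(src_path, dst_path):
    seen = set()
    order = 0                       # current n-gram order, 0 = outside a section
    with open(src_path, encoding="utf-8") as src, \
         open(dst_path, "w", encoding="utf-8") as dst:
        for line in src:
            stripped = line.strip()
            if stripped.startswith("\\"):
                # section header like "\2-grams:", or "\data\" / "\end\"
                order = int(stripped[1]) if stripped.endswith("-grams:") else 0
                seen.clear()
                dst.write(line)
            elif order and stripped:
                # key on the n-gram's word tuple to spot exact repeats
                words = tuple(stripped.split()[1:order + 1])
                if words in seen:
                    continue        # skip the repeated entry
                seen.add(words)
                dst.write(line)
            else:
                dst.write(line)
```

Note that after removing entries you would also need to update the `ngram N=...` counts in the `\data\` header to match.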
Thanks for the information.
Is there any such script available?
Can you tell me what dataset was used to train the default language model?
Maybe it expects the n-grams to be sorted; try sorting with sphinx_lm_sort.
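The pos_of_novelty check quoted above compares each entry only with the one before it, so the reader appears to assume every \N-grams: section is in sorted order. sphinx_lm_sort produces that order; as a rough in-memory sketch of the same idea (not the actual tool, and assuming the model fits in memory):

```python
# Sketch: sort the entries of every \N-grams: section of an ARPA LM
# by their word tuple, leaving headers and other lines untouched.
def sort_arpa(lines):
    out, section = [], []
    order = 0                       # current n-gram order, 0 = outside a section

    def flush():
        # emit the buffered section in sorted n-gram order
        section.sort(key=lambda l: tuple(l.split()[1:order + 1]))
        out.extend(section)
        section.clear()

    for line in lines:
        stripped = line.strip()
        if stripped.startswith("\\"):
            flush()
            order = int(stripped[1]) if stripped.endswith("-grams:") else 0
            out.append(line)
        elif order and stripped:
            section.append(line)
        else:
            flush()
            out.append(line)
    flush()
    return out
```

A real LM of this size (7.7M trigrams) would be better handled by sphinx_lm_sort itself or an external sort, but the principle is the same.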