Hi everyone,
I have followed the http://cmusphinx.sourceforge.net/wiki/tutoriallmadvanced tutorial for mixing language models.
ngram -lm my.lm -ppl adaptest.txt -debug 2 > my.ppl
reading 24 1-grams
reading 39 2-grams
reading 1 3-grams
ngram -lm my2.lm -ppl adaptest2.txt -debug 2 > my2.ppl
reading 161 1-grams
reading 273 2-grams
reading 1 3-grams
However, when I try to run this command:
compute-best-mix my.ppl my2.ppl
The output:
Am I missing something?
It is a small awk script, usually located in the same directory as ngram. Run
which ngram
and check whether the script is there. If it is, try running it with the absolute path;
if it is not, you can download it here.
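For example, something along these lines (the path below is purely illustrative; use whatever "which ngram" actually prints):
which ngram
# suppose it prints /usr/local/srilm/bin/i686-m64/ngram (hypothetical path)
/usr/local/srilm/bin/i686-m64/compute-best-mix my.ppl my2.ppl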
But according to the log file, your texts are very small (just 1 trigram). LM mixing is used to adapt a huge model to a specific domain. Simply do not use it with this data. Concatenate your texts and train a single language model.
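A minimal sketch of that suggestion with SRILM, assuming your training texts are in text1.txt and text2.txt (hypothetical names):
# concatenate the texts and train a single trigram LM
cat text1.txt text2.txt > combined.txt
ngram-count -order 3 -text combined.txt -lm combined.lm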
There is no file named "compute-best-mix" in the path "cygwin64\srilm\bin\Debug".
I have just found it in "cygwin64\srilm\utils\src" as a "compute-best-mix.gawk" file.
By the way, thanks again, I have also downloaded it from your link.
But how can I run this script? According to the documentation:
awk compute-best-mix my.ppl my2.ppl
Unfortunately, it doesn't work.
Usually you run it as is.
Maybe before that you should make sure it is executable:
chmod +x compute-best-mix
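After that, a direct invocation would typically look like this (a sketch, assuming the script sits in the current directory; otherwise use its full path):
./compute-best-mix my.ppl my2.ppl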
I again emphasize that you do not need this script for your task.
The file is not executable (I guess). When I run this:
chmod +x compute-best-mix
there is no output and no log text. However, when I run this:
awk -f compute-best-mix.awk my.lm my2.lm
the output is
awk: compute-best-mix.awk:140: (FILENAME=my2.lm FNR=448) fatal: division by zero attempted
Why not? Doesn't it give me the best lambda value that I can use in the command below?
ngram -lm your.lm -mix-lm generic.lm -lambda <factor from above> -write-lm mixed.lm
In this case, yes, my language models are very small, but this is just for testing. Before I settle on the sentences for the model, I preferred to test the procedure, because collecting the model texts will take me some time.
Thanks again.
Normally, after giving it execute permissions, it should work without the awk -f prefix.
The error you get seems to appear because my2.lm has no words in common with my.lm. When you run this,
you should use the same adaptation text for both LMs.
Moreover, in your last post you pass LMs to compute-best-mix, while the script expects ppl files. Is that what you are actually doing?
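For reference, a sketch of the intended sequence, assuming a single adaptation text adapt.txt (the file name is hypothetical):
# score both LMs on the same adaptation text
ngram -lm my.lm -ppl adapt.txt -debug 2 > my.ppl
ngram -lm my2.lm -ppl adapt.txt -debug 2 > my2.ppl
# estimate the interpolation weight
compute-best-mix my.ppl my2.ppl
# then mix the models with the reported lambda
ngram -lm my.lm -mix-lm my2.lm -lambda <best lambda from above> -write-lm mixed.lm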
Check out this video. It explains the method a little. It is intended for really large text data, e.g. when you have the large Google n-grams and adapt them to local newspaper data (it is still better to have at least a couple of MB there).
Aww, my bad. Thanks a lot, it solved my problem.
I have just watched the video. Thanks.