Dear CMUSphinx team,
I have already used the CMU LM toolkit online (http://www.speech.cs.cmu.edu/tools/lmtool-new.html) and offline (https://sourceforge.net/p/cmusphinx/code/HEAD/tree/trunk/cmuclmtk/) to build language model files from the file test.txt attached to this post.
However, there are a lot of differences between the two generated language model files, online_tool.lm and offline_tool respectively.
Moreover, when the model generated by the online tool is used in the Android PocketSphinx app, it gives much more accurate results, while the one from the offline tool always recognises incorrectly.
Could you tell me how to use the offline tool to build a language model the way the online one does?
Below are the command lines I used with the offline tool:
text2wfreq < test.txt | wfreq2vocab > test.vocab
text2idngram -vocab test.vocab -idngram test.idngram < test.txt
idngram2lm -vocab_type 0 -idngram test.idngram -vocab test.vocab -arpa test.lm
Please review them and tell me which parameters I should add to the three command lines above.
Yours sincerely, Toan
idngram2lm uses Kneser-Ney discounting (kndiscount) by default, while the online tool uses absolute discounting. You can switch idngram2lm to absolute discounting with the -absolute option.
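For example, keeping the rest of your third command the same, it would become something like this (a sketch, assuming the other parameters stay unchanged):
idngram2lm -vocab_type 0 -idngram test.idngram -vocab test.vocab -arpa test.lm -absolute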
You can also use this Perl script to train models identical to the online ones:
http://www.speech.cs.cmu.edu/tools/download/quick_lm.pl
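If I remember correctly, the script takes the sentence corpus with a -s flag, roughly like this (the flag name is an assumption, so check the usage notes at the top of the script):
perl quick_lm.pl -s test.txt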
ok, thanks!