Hi,
I am trying to adapt the generic language model (en-us.lm) with my own model (my.lm). my.lm is built from my own set of new transcriptions, which I generated for training and adapting (test.txt).
ngram -lm my.lm -ppl test.txt -debug 2 > my.ppl
ngram -lm en-us.lm -ppl test.txt -debug 2 > en-us.ppl
compute-best-mix my.ppl en-us.ppl
I ran the commands above from the tutorial and got these statistics:
iteration 1, lambda = (0.5 0.5), ppl = 17.7361
iteration 2, lambda = (0.829771 0.170229), ppl = 12.9919
iteration 3, lambda = (0.915271 0.084729), ppl = 12.4683
iteration 4, lambda = (0.939784 0.0602155), ppl = 12.3964
iteration 5, lambda = (0.947627 0.0523734), ppl = 12.387
iteration 6, lambda = (0.950252 0.0497476), ppl = 12.3858
1566 non-oov words, best lambda (0.951146 0.0488536)
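For context on what those iterations are doing: compute-best-mix is essentially an EM loop over the per-word probabilities each model assigned to the same test text (the "-debug 2" output). A minimal Python sketch of that idea — the function name and inputs are my own for illustration, not SRILM's actual code:

```python
import math

def best_mix(p1, p2, lam=0.5, iters=6):
    """EM estimate of the interpolation weight for model 1.

    p1, p2: per-word probabilities the two models assigned
    to the same test text (what "-debug 2" reports).
    """
    for _ in range(iters):
        # E-step: posterior probability that model 1 produced each word
        post = [lam * a / (lam * a + (1 - lam) * b)
                for a, b in zip(p1, p2)]
        # M-step: the new lambda is the average responsibility
        lam = sum(post) / len(post)
    # Perplexity of the interpolated model at the final lambda
    mix = [lam * a + (1 - lam) * b for a, b in zip(p1, p2)]
    ppl = math.exp(-sum(math.log(p) for p in mix) / len(mix))
    return lam, ppl
```

When the first model fits the text better, lambda climbs toward 1 across iterations, which is exactly the (0.5 → 0.95) progression in the output above.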
ngram -lm my.lm -mix-lm en-us.lm -lambda <factor from above> -write-lm mixed.lm
I entered it as:
ngram -lm add.lm -mix-lm en-us.lm -lambda 0.951146 0.0488536 -write-lm mixed.lm
So, my questions are:
1. Is the format correct? 0.951146 0.0488536, or only one of these, or both as <0.951146 0.0488536>?
3) I also wanted to test this with a non-statistical grammar (JSGF), but I didn't find any appropriate link explaining how to do it.
Please help me out.
Is the format correct? 0.951146 0.0488536, or only one of these, or both as <0.951146 0.0488536>?
No, just 0.0488536 (the second value). See the SRILM documentation for details.
For adaptation I am using test.txt, the same transcription file from which I generated my.lm. Is that the right way to go about it?
No, you need to split the file into two parts, train and test. The LM is trained on the train part and evaluated on the test part. The tutorial talks about that at the beginning.
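For example, the split can be done with a few lines of Python — a hypothetical helper (the names are my own, not from the tutorial):

```python
import random

def split_transcriptions(lines, test_fraction=0.1, seed=0):
    """Shuffle transcription lines and hold out a test set.

    Returns (train, test): train the LM on the first part,
    measure perplexity only on the second.
    """
    lines = list(lines)
    random.Random(seed).shuffle(lines)
    n_test = max(1, int(len(lines) * test_fraction))
    return lines[n_test:], lines[:n_test]
```

Train my.lm only on the train part and run the "ngram -ppl" commands on the held-out test part, so the perplexity numbers are not measured on data the model has already memorized.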
3) I wanted to also test it out with the non-statistical grammar(JSGF) but I didn't find any appropriate link as to how to do it.
Unfortunately there is no easy tool to verify perplexity with JSGF. It is possible, but not straightforward.
Last edit: Nickolay V. Shmyrev 2015-05-28