
Language model adaptation

Help
2015-05-27
2015-05-29
  • Sabyasachi Upadhyay

    Hi,

    I am trying to adapt the generic language model (en-us.lm) with my own model (my.lm).
    my.lm is built from a set of new transcriptions (test.txt) that I generated for training and adaptation.

    ngram -lm my.lm -ppl test.txt -debug 2 > my.ppl
    ngram -lm en-us.lm -ppl test.txt -debug 2 > en-us.ppl
    compute-best-mix my.ppl en-us.ppl

    I ran these commands from the tutorial and got the following output:

    iteration 1, lambda = (0.5 0.5), ppl = 17.7361
    iteration 2, lambda = (0.829771 0.170229), ppl = 12.9919
    iteration 3, lambda = (0.915271 0.084729), ppl = 12.4683
    iteration 4, lambda = (0.939784 0.0602155), ppl = 12.3964
    iteration 5, lambda = (0.947627 0.0523734), ppl = 12.387
    iteration 6, lambda = (0.950252 0.0497476), ppl = 12.3858
    1566 non-oov words, best lambda (0.951146 0.0488536)
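    The iteration log above comes from compute-best-mix, which looks for the interpolation weights that minimize the perplexity of the mixed model on test.txt. The sketch below shows one standard way to do this for two models, an expectation-maximization (EM) update. It is an illustration under stated assumptions, not SRILM's code: it assumes the per-word probabilities from each model have already been extracted from the ngram -ppl -debug 2 output (parsing is omitted), and the names best_mix and mix_ppl as well as the toy probabilities are hypothetical.

```python
import math


def mix_ppl(lam, probs_a, probs_b):
    """Perplexity of the mixture lam * P_a + (1 - lam) * P_b on the test words."""
    log_sum = sum(math.log(lam * pa + (1.0 - lam) * pb)
                  for pa, pb in zip(probs_a, probs_b))
    return math.exp(-log_sum / len(probs_a))


def best_mix(probs_a, probs_b, iterations=6):
    """EM estimate of the two interpolation weights (lambda_a, lambda_b).

    probs_a[i] and probs_b[i] are the probabilities models A and B assign
    to the i-th test word (assumed already extracted from the ppl output).
    """
    lam = 0.5  # start from equal weights
    n = len(probs_a)
    for it in range(1, iterations + 1):
        print(f"iteration {it}, lambda = ({lam:g} {1.0 - lam:g}), "
              f"ppl = {mix_ppl(lam, probs_a, probs_b):g}")
        # E-step: posterior probability that model A produced each word.
        # M-step: the new weight of A is the average of those posteriors.
        lam = sum(lam * pa / (lam * pa + (1.0 - lam) * pb)
                  for pa, pb in zip(probs_a, probs_b)) / n
    return lam, 1.0 - lam


# Hypothetical per-word probabilities for four test words.
probs_my = [0.2, 0.3, 0.01, 0.25]
probs_generic = [0.01, 0.02, 0.2, 0.01]
print("best lambda", best_mix(probs_my, probs_generic))
```

    Starting from equal weights, as in iteration 1 of the log, each step re-estimates the weight of the first model as the average posterior probability that it produced each test word; an EM step never increases the mixture perplexity. This sketch simply runs a fixed number of iterations rather than testing for convergence.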

    ngram -lm my.lm -mix-lm en-us.lm -lambda <factor from above> -write-lm mixed.lm

    I entered it as:

    ngram -lm add.lm -mix-lm en-us.lm -lambda 0.951146 0.0488536 -write-lm mixed.lm

    So, my questions are:
    1. Is the format correct? Should I pass both values (0.951146 0.0488536), only one of them, or both enclosed in <0.951146 0.0488536>?

    2. For adaptation, I am using test.txt, the same transcription file from which I generated my.lm. Is that the right way to go about it?

    3. I also wanted to test it with a non-statistical grammar (JSGF), but I didn't find any appropriate link explaining how to do it.

    Please help me out.

     
  • Nickolay V. Shmyrev

    1. Is the format correct? Should I pass both values (0.951146 0.0488536), only one of them, or both enclosed in <0.951146 0.0488536>?

    No, just 0.0488536 (the second value). See the SRILM documentation for details.
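    A single number is enough because linear interpolation of two models has only one free weight: the SRILM ngram documentation describes -lambda as the weight of the main (-lm) model, and the -mix-lm model gets 1 - lambda. compute-best-mix prints its weights in the same order as the .ppl files it was given, so which printed value belongs to the -lm model depends on that order. The sketch below shows only the per-word arithmetic on toy next-word distributions; the function name interpolate and the example probabilities are hypothetical, and merging two backoff models into one file with -write-lm involves more than this.

```python
def interpolate(lam, p_main, p_mix):
    """Linearly interpolate two next-word distributions for one history.

    p_main, p_mix: dicts mapping word -> probability.
    lam is the weight of the main model; the mixed-in model gets 1 - lam.
    Words missing from a dict are treated as probability 0 there
    (a simplification; real backoff models give them backoff mass).
    """
    if not 0.0 <= lam <= 1.0:
        raise ValueError("lambda must be in [0, 1]")
    vocab = set(p_main) | set(p_mix)
    return {w: lam * p_main.get(w, 0.0) + (1.0 - lam) * p_mix.get(w, 0.0)
            for w in vocab}


# Hypothetical toy distributions for a single history.
p_my = {"yes": 0.7, "no": 0.3}
p_generic = {"yes": 0.2, "no": 0.3, "maybe": 0.5}
mixed = interpolate(0.9, p_my, p_generic)
print(sorted(mixed.items()))
```

    Because both inputs sum to 1 and the weights sum to 1, the interpolated distribution also sums to 1.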

    2. For adaptation, I am using test.txt, the same transcription file from which I generated my.lm. Is that the right way to go about it?

    No, you need to split the file into two parts, one for training and one for testing. The LM is trained on the training part and evaluated on the test part. The tutorial covers this at the beginning.
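    A train/test split along these lines takes only a few lines of Python. This is a minimal sketch: the 10% test fraction, the fixed random seed, the helper names, and the file names in the usage comment are assumptions, not values from the tutorial. my.lm would then be built from the training part only, and the ngram -ppl and compute-best-mix steps run on the held-out part.

```python
import random


def split_lines(lines, test_fraction=0.1, seed=0):
    """Shuffle sentences reproducibly and split them into (train, test)."""
    if not 0.0 < test_fraction < 1.0:
        raise ValueError("test_fraction must be strictly between 0 and 1")
    shuffled = list(lines)
    random.Random(seed).shuffle(shuffled)  # fixed seed -> same split every run
    n_test = max(1, int(len(shuffled) * test_fraction))
    return shuffled[n_test:], shuffled[:n_test]


def split_file(src, train_path, test_path, test_fraction=0.1, seed=0):
    """Read one transcription per line from src and write the two parts."""
    with open(src, encoding="utf-8") as f:
        lines = [line.rstrip("\n") for line in f if line.strip()]
    train, test = split_lines(lines, test_fraction, seed)
    for path, part in ((train_path, train), (test_path, test)):
        with open(path, "w", encoding="utf-8") as f:
            f.write("\n".join(part) + "\n")


# Hypothetical usage: split_file("corpus.txt", "train.txt", "test.txt")
```

    Shuffling before splitting keeps the test sentences representative when the transcription file is ordered (for example, by recording session).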

    3. I also wanted to test it with a non-statistical grammar (JSGF), but I didn't find any appropriate link explaining how to do it.

    Unfortunately, there is no easy tool to compute perplexity for a JSGF grammar. It is possible, but not straightforward.

     

    Last edit: Nickolay V. Shmyrev 2015-05-28
