Hi,
I am trying to adapt the generic language model (en-us.lm) with my own model (my.lm). my.lm is built from my own set of new transcriptions, which I generated for training and adapting (test.txt).
ngram -lm my.lm -ppl test.txt -debug 2 > my.ppl
ngram -lm en-us.lm -ppl test.txt -debug 2 > en-us.ppl
compute-best-mix my.ppl en-us.ppl
I ran the commands above from the tutorial and got these statistics:
iteration 1, lambda = (0.5 0.5), ppl = 17.7361
iteration 2, lambda = (0.829771 0.170229), ppl = 12.9919
iteration 3, lambda = (0.915271 0.084729), ppl = 12.4683
iteration 4, lambda = (0.939784 0.0602155), ppl = 12.3964
iteration 5, lambda = (0.947627 0.0523734), ppl = 12.387
iteration 6, lambda = (0.950252 0.0497476), ppl = 12.3858
1566 non-oov words, best lambda (0.951146 0.0488536)
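For context on what those iterations are doing: compute-best-mix is essentially an EM loop over the per-word probabilities each model assigned to the same test text (the "-debug 2" output). A minimal Python sketch of that idea — the function name and inputs are my own for illustration, not SRILM's actual code:

```python
import math

def best_mix(p1, p2, lam=0.5, iters=6):
    """EM estimate of the interpolation weight for model 1.

    p1, p2: per-word probabilities the two models assigned
    to the same test text (what "-debug 2" reports).
    """
    for _ in range(iters):
        # E-step: posterior probability that model 1 produced each word
        post = [lam * a / (lam * a + (1 - lam) * b)
                for a, b in zip(p1, p2)]
        # M-step: the new lambda is the average responsibility
        lam = sum(post) / len(post)
    # Perplexity of the interpolated model at the final lambda
    mix = [lam * a + (1 - lam) * b for a, b in zip(p1, p2)]
    ppl = math.exp(-sum(math.log(p) for p in mix) / len(mix))
    return lam, ppl
```

When the first model fits the text better, lambda climbs toward 1 across iterations, which is exactly the (0.5 → 0.95) progression in the output above.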
ngram -lm my.lm -mix-lm en-us.lm -lambda <factor from above> -write-lm mixed.lm
I entered it as:
ngram -lm add.lm -mix-lm en-us.lm -lambda 0.951146 0.0488536 -write-lm mixed.lm
So, my questions are:
1. Is the format correct? 0.951146 0.0488536, or only one of these, or both as <0.951146 0.0488536>?
3) I also wanted to test this with a non-statistical grammar (JSGF), but I didn't find any appropriate link explaining how to do it.
Please help me out.
Is the format correct? 0.951146 0.0488536, or only one of these, or both as <0.951146 0.0488536>?
No, just 0.0488536 (the second value). See the SRILM documentation for details.
For adaptation I am using test.txt, the same transcription file from which I generated my.lm. Is that the right way to go about it?
No, you need to split the file into two parts, train and test. The LM is trained on the train part and evaluated on the test part. The tutorial talks about that at the beginning.
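For example, the split can be done with a few lines of Python — a hypothetical helper (the names are my own, not from the tutorial):

```python
import random

def split_transcriptions(lines, test_fraction=0.1, seed=0):
    """Shuffle transcription lines and hold out a test set.

    Returns (train, test): train the LM on the first part,
    measure perplexity only on the second.
    """
    lines = list(lines)
    random.Random(seed).shuffle(lines)
    n_test = max(1, int(len(lines) * test_fraction))
    return lines[n_test:], lines[:n_test]
```

Train my.lm only on the train part and run the "ngram -ppl" commands on the held-out test part, so the perplexity numbers are not measured on data the model has already memorized.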
3) I wanted to also test it out with the non-statistical grammar(JSGF) but I didn't find any appropriate link as to how to do it.
Unfortunately there is no easy tool to verify perplexity with JSGF. It is possible, but not straightforward.
Last edit: Nickolay V. Shmyrev 2015-05-28