bigram language model - sphinx 3

abhishek
2009-07-22
2012-09-22
  • abhishek

    abhishek - 2009-07-22

    Hi,

    I have a question regarding the use of a bigram language model in Sphinx 3. The CMU lmtool utility can only train trigram models. How can I train bigram models?

    And how do I make Sphinx 3 work with bigram models? I'd really appreciate your help.

    Thanks a lot,
    abhishek.

     
    • eliasmajic

      eliasmajic - 2009-07-23

      >I used add-dummy-bows on the bigram LM obtained from SRILM. It puts a zero as the backoff weight on the first unigram (</s>). Is that OK?
      Yes.

      >I didn't use sort-lm before, and lm3g2dmp didn't give me any error. Do I still need to use it for some other reason?
      Yes, you do. lm3g2dmp assumes its input is sorted; it will not work if you don't sort.

       
    • Nickolay V. Shmyrev

      > What is the way to train bigram models?

      http://www.speech.sri.com/projects/srilm/

      > What is the way to make sphinx 3 work with bigram models?

      There is no difference; just pass -lm bigram.lm

      Check sphinx3/src/tests/performance/rm1/ARGS.rm1_bigram
      and sphinx3/src/tests/performance/rm1/RM.2845.bigram.arpa.DMP for an example.
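      A minimal sketch of the whole route, assuming SRILM is installed and on PATH; corpus.txt and bigram.lm are placeholder names, and the trailing acoustic-model arguments are elided:

      ```shell
      # Train a bigram LM (order 2) in ARPA format with SRILM
      ngram-count -order 2 -text corpus.txt -lm bigram.lm

      # Decode with sphinx3 exactly as with a trigram model; only -lm changes
      sphinx3_decode -lm bigram.lm ...   # plus your usual acoustic model arguments
      ```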

       
    • abhishek

      abhishek - 2009-07-22

      Thanks Nickolay, it's working. I did have to make some changes to the model generated by SRILM, though: it doesn't put a backoff weight at the end of the first token in the unigram section, and I had to add a number there to make the file readable by lm3g2dmp. Also, the second token (in my case </s>) has a very low probability score of -99 regardless of the text I give it, while the first token <s> has a reasonable score. I couldn't understand the reason for that.

       
      • Nickolay V. Shmyrev

        > Although I have to do some changes in the model generated by SRILM. It doesn't put the backoff number at the end of first token in the unigram

        Yes, it's documented in SRILM: you need to use the add-dummy-bows script to add the missing backoff weights, and you also need to use sort-lm to sort the LM.
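        Putting those two steps in front of the converter, the pipeline might look like this (a sketch; add-dummy-bows and sort-lm ship with SRILM, and the file names are assumptions):

        ```shell
        # Add missing backoff weights, then sort the n-gram sections
        add-dummy-bows < bigram.lm | sort-lm > bigram.sorted.lm

        # Convert the fixed-up ARPA file to DMP format for sphinx3
        lm3g2dmp bigram.sorted.lm .
        ```

        lm3g2dmp should then write the .DMP file into the directory given as its second argument (here the current one).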

        > . And, the second token (in my case </s>) has a very low probability score of -99 regardless of the text I give.

        It should be so; every LM has -99 as the unigram probability for the sentence start marker, since it is never predicted as an ordinary word.
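        Concretely, the unigram section of an ARPA file that lm3g2dmp accepts looks like the fragment below (numbers are illustrative only): each row is log10 probability, word, log10 backoff weight; the start marker carries the -99 placeholder, and add-dummy-bows supplies the missing trailing backoff (0.0000) on entries such as </s>:

        ```
        \1-grams:
        -99.0000  <s>    -0.3010
        -0.4771   </s>    0.0000
        -0.3010   hello  -0.2218
        ```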

         
        • abhishek

          abhishek - 2009-07-23

          I used add-dummy-bows on the bigram LM obtained from SRILM. It puts a zero as the backoff weight on the first unigram (</s>). Is that OK?

          I didn't use sort-lm before, and lm3g2dmp didn't give me any error. Do I still need to use it for some other reason?

          Thanks a lot,
          abhishek.

           