Hi,
I have a question regarding the use of a bigram language model in Sphinx 3. We can only train a trigram model using the CMU lmtool utility. What is the way to train bigram models?
And what is the way to make Sphinx 3 work with bigram models? I really appreciate your help.
Thanks a lot,
abhishek.
> What is the way to train bigram models?
http://www.speech.sri.com/projects/srilm/
> What is the way to make Sphinx 3 work with bigram models?
There is no difference, just pass -lm bigram.lm.
Check sphinx3/src/tests/performance/rm1/ARGS.rm1_bigram
and sphinx3/src/tests/performance/rm1/RM.2845.bigram.arpa.DMP for an example.
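For example, a minimal sketch of the two steps (corpus.txt and the output names here are hypothetical, and all decoder arguments besides -lm are omitted):

    # Train a bigram LM with SRILM's ngram-count (-order 2 makes it a bigram model)
    ngram-count -text corpus.txt -order 2 -lm bigram.lm

    # Decode with Sphinx 3 as usual, just pointing -lm at the bigram model
    # (the remaining acoustic-model arguments are the same as in ARGS.rm1_bigram)
    sphinx3_decode -lm bigram.lm ...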
Thanks Nickolay, it's working, although I had to make some changes to the model generated by SRILM. It doesn't put the backoff number at the end of the first token in the unigram section; I had to put a number there to make it readable by lm3g2dmp. Also, the second token (in my case <s>) has a very low probability score of -99 regardless of the text I give, while the first token </s> has a reasonable probability score. I couldn't understand the reason for it.
> Although I had to make some changes to the model generated by SRILM. It doesn't put the backoff number at the end of the first token in the unigram section
Yes, it's documented in SRILM: you need to use the add-dummy-bows script to add the missing backoff weights. Also, you need to use sort-lm to sort the LM.
> Also, the second token (in my case <s>) has a very low probability score of -99 regardless of the text I give.
It should be so: every LM produced by SRILM has -99 as the log probability of <s>, because the sentence start marker only ever appears as context and is never predicted.
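To illustrate, the unigram section of a typical SRILM ARPA file looks like this (an excerpt with made-up numbers; each line is log10 probability, word, then the optional log10 backoff weight):

    \1-grams:
    -1.2041 </s>
    -99     <s>    -0.3010
    -0.8451 hello  -0.2218

Note that </s> has no backoff weight here, which is exactly what add-dummy-bows fills in, and that <s> is the token carrying the -99 probability.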
I used add-dummy-bows on the bigram LM obtained from SRILM. It puts a zero as the backoff weight on the first unigram (</s>). Is that OK?
I didn't use sort-lm before, and lm3g2dmp didn't give me any error. Do I still need to use it for some other reason?
Thanks a lot,
abhishek.
> I used add-dummy-bows on the bigram LM obtained from SRILM. It puts a zero as the backoff weight on the first unigram (</s>). Is that OK?
Yes.
> I didn't use sort-lm before, and lm3g2dmp didn't give me any error. Do I still need to use it for some other reason?
Yes, you do. lm3g2dmp assumes its input is sorted; it will not work if you don't sort.
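For reference, the whole conversion then looks roughly like this (a sketch; the intermediate file names are hypothetical, and I'm assuming the SRILM scripts read stdin and write stdout, and that lm3g2dmp takes the ARPA file plus an output directory):

    # Add the dummy backoff weights SRILM leaves out (e.g. on </s>)
    add-dummy-bows < bigram.lm > bigram.bows.lm

    # Sort the n-grams; lm3g2dmp assumes sorted input
    sort-lm < bigram.bows.lm > bigram.sorted.lm

    # Convert the ARPA file to the binary DMP format Sphinx 3 reads
    lm3g2dmp bigram.sorted.lm .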