Mixing two language models

Help
kk_huk
2016-07-21
2016-07-21
  • kk_huk

    kk_huk - 2016-07-21

    Hi everyone,

    I have followed the http://cmusphinx.sourceforge.net/wiki/tutoriallmadvanced tutorial for this.

    ngram -lm my.lm -ppl adaptest.txt -debug 2 > my.ppl
    reading 24 1-grams
    reading 39 2-grams
    reading 1 3-grams

    ngram -lm my2.lm -ppl adaptest2.txt -debug 2 > my2.ppl
    reading 161 1-grams
    reading 273 2-grams
    reading 1 3-grams

    However, when I try to run this command:

    compute-best-mix my.ppl my2.ppl

    The output is:

    compute-best-mix is not recognized as an internal or external command, operable program or batch file.
    

    Am I missing something?

     
  • Arseniy Gorin

    Arseniy Gorin - 2016-07-21

    It is a small awk script, usually located in the same directory as ngram. Run

    which ngram

    and check whether the script is there. If it is, try running it using the absolute path.

    If it is not, you can download it here.

    But according to the log, your texts are very small (just 1 trigram). LM mixing is used to adapt a huge model to a specific domain. Simply do not use it with this data. Concatenate your texts and train a single language model instead.
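
    For illustration, a minimal sketch of that route with SRILM (the file names text1.txt / text2.txt and the n-gram order are assumptions for the example, not something from this thread):

    cat text1.txt text2.txt > all.txt
    ngram-count -order 3 -text all.txt -lm combined.lm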

     

    Last edit: Arseniy Gorin 2016-07-21
    • kk_huk

      kk_huk - 2016-07-21

      There is no file named "compute-best-mix" in the path "cygwin64\srilm\bin\Debug".

      I have just found it in "cygwin64\srilm\utils\src" as a file named "compute-best-mix.gawk".

      By the way, thanks again; I have also downloaded it from your link.

      But how can I run this script? According to the documentation,

      awk compute-best-mix my.ppl my2.ppl

      Unfortunately, it doesn't work.

       
      • Arseniy Gorin

        Arseniy Gorin - 2016-07-21

        Usually you run it as is. Maybe before that you should make sure it is executable:

        chmod +x compute-best-mix
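
        If it still refuses to run directly (this part is my assumption, e.g. if Cygwin does not pick up the interpreter line), it is an awk script, so you should also be able to call it through gawk explicitly; the .gawk file name below is just the one you mentioned finding:

        gawk -f compute-best-mix.gawk my.ppl my2.ppl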

        I again emphasize that you do not need this script for your task.

         
        • kk_huk

          kk_huk - 2016-07-21

          The file is not executable (I guess).

          When I run chmod +x compute-best-mix, there is no output or log text.

          However, when I run this

          awk -f compute-best-mix.awk my.lm my2.lm

          the output is

          awk: compute-best-mix.awk:140: (FILENAME=my2.lm FNR=448) fatal: division by zero attempted

          "I again emphasize that you do not need this script for your task."

          Why not? Doesn't it give me the best lambda value to use in the command below?

          ngram -lm your.lm -mix-lm generic.lm -lambda <factor from above> -write-lm mixed.lm
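
          For example (the 0.7 below is only an illustrative value, not a number from this thread), if compute-best-mix reported a best weight of about 0.7 for my.lm, the mixed model could be written with:

          ngram -lm my.lm -mix-lm my2.lm -lambda 0.7 -write-lm mixed.lm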

          "But according to the log, your texts are very small (just 1 trigram). LM mixing is used to adapt a huge model to a specific domain. Simply do not use it with this data. Concatenate your texts and train a single language model instead."

          In this case, yes, my language models are very small. But this is just for testing. Before I settle on the sentences for the model, I prefer to test the procedure, because collecting the model texts will take me some time.

          Thanks again.

           
          • Arseniy Gorin

            Arseniy Gorin - 2016-07-21

            Normally, after granting permissions, it should work without the awk -f prefix:

            chmod +x compute-best-mix
            compute-best-mix my.ppl my2.ppl
            

            The error you get seems to occur because my2.lm has no words in common with my.lm. When you run this

            ngram -lm my.lm -ppl adaptest.txt -debug 2 > my.ppl
            ngram -lm my2.lm -ppl adaptest2.txt -debug 2 > my2.ppl
            

            you should use the same adaptation text for both LMs.
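
            A minimal sketch of the corrected run, assuming adaptest.txt is your single held-out adaptation text:

            ngram -lm my.lm -ppl adaptest.txt -debug 2 > my.ppl
            ngram -lm my2.lm -ppl adaptest.txt -debug 2 > my2.ppl
            compute-best-mix my.ppl my2.ppl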

            Moreover, in your last post you pass LMs to compute-best-mix, while the script expects ppl files. Is that what you are actually doing?

            Check out this video. It explains the method a little. It is intended for really large text data, like when you have large Google n-grams and adapt them on local newspaper data (it is still better to have at least a couple of MB there).

             

            Last edit: Arseniy Gorin 2016-07-21
            • kk_huk

              kk_huk - 2016-07-21

              "you should use the same adaptation text for both LMs."

              Aww, my bad. Thanks a lot, that solved my problem.

              "Check out this video. It explains the method a little. It is intended for really large text data, like when you have large Google n-grams and adapt them on local newspaper data (it is still better to have at least a couple of MB there)."

              I have just watched the video. Thanks.

               

