Mixing two language models

Help
kk_huk
2016-07-21
2016-07-21
  • kk_huk

    kk_huk - 2016-07-21

    Hi everyone,

    I have followed the http://cmusphinx.sourceforge.net/wiki/tutoriallmadvanced tutorial for this.

    ngram -lm my.lm -ppl adaptest.txt -debug 2 > my.ppl
    reading 24 1-grams
    reading 39 2-grams
    reading 1 3-grams

    ngram -lm my2.lm -ppl adaptest2.txt -debug 2 > my2.ppl
    reading 161 1-grams
    reading 273 2-grams
    reading 1 3-grams

    However, when I try to run this command:

    compute-best-mix my.ppl my2.ppl

    The output is:

    compute-best-mix is not recognized as an internal or external command, operable program or batch file.
    

    Am I missing something?

     
  • Arseniy Gorin

    Arseniy Gorin - 2016-07-21

    It is a small awk script, usually located in the same directory as ngram. Run

    which ngram

    and check whether the script is there. If it is, try running it using the absolute path.

    If it is not, you can download it here.

    But according to the log, your texts are very small (just 1 trigram). LM mixing is used to adapt a huge model to a specific domain. Simply do not use it with this data. Concatenate your texts and train a single language model instead.
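
    For illustration, a minimal sketch of that route with SRILM (the file names text1.txt / text2.txt and the n-gram order are assumptions for the example, not something from this thread):

    cat text1.txt text2.txt > all.txt
    ngram-count -order 3 -text all.txt -lm combined.lm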

     

    Last edit: Arseniy Gorin 2016-07-21
    • kk_huk

      kk_huk - 2016-07-21

      There is no file named "compute-best-mix" in the path "cygwin64\srilm\bin\Debug".

      I have just found it in "cygwin64\srilm\utils\src" as a file named "compute-best-mix.gawk".

      By the way, thanks again; I have also downloaded it from your link.

      But how can I run this script? According to the documentation,

      awk compute-best-mix my.ppl my2.ppl

      Unfortunately, it doesn't work.

       
      • Arseniy Gorin

        Arseniy Gorin - 2016-07-21

        Usually you run it as is. Maybe before that you should make sure it is executable:

        chmod +x compute-best-mix
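
        If it still refuses to run directly (this part is my assumption, e.g. if Cygwin does not pick up the interpreter line), it is an awk script, so you should also be able to call it through gawk explicitly; the .gawk file name below is just the one you mentioned finding:

        gawk -f compute-best-mix.gawk my.ppl my2.ppl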

        I again emphasize that you do not need this script for your task.

         
        • kk_huk

          kk_huk - 2016-07-21

          The file is not executable (I guess).

          When I run chmod +x compute-best-mix, there is no output or log text.

          However, when I run this

          awk -f compute-best-mix.awk my.lm my2.lm

          the output is

          awk: compute-best-mix.awk:140: (FILENAME=my2.lm FNR=448) fatal: division by zero attempted

          "I again emphasize that you do not need this script for your task."

          Why not? Doesn't it give me the best lambda value to use in the command below?

          ngram -lm your.lm -mix-lm generic.lm -lambda <factor from above> -write-lm mixed.lm
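
          For example (the 0.7 below is only an illustrative value, not a number from this thread), if compute-best-mix reported a best weight of about 0.7 for my.lm, the mixed model could be written with:

          ngram -lm my.lm -mix-lm my2.lm -lambda 0.7 -write-lm mixed.lm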

          "But according to the log, your texts are very small (just 1 trigram). LM mixing is used to adapt a huge model to a specific domain. Simply do not use it with this data. Concatenate your texts and train a single language model instead."

          In this case, yes, my language models are very small. But this is just for testing. Before I settle on the sentences for the model, I prefer to test the procedure, because collecting the model texts will take me some time.

          Thanks again.

           
          • Arseniy Gorin

            Arseniy Gorin - 2016-07-21

            Normally, after granting permissions, it should work without the awk -f prefix:

            chmod +x compute-best-mix
            compute-best-mix my.ppl my2.ppl
            

            The error you get seems to occur because my2.lm has no words in common with my.lm. When you run this

            ngram -lm my.lm -ppl adaptest.txt -debug 2 > my.ppl
            ngram -lm my2.lm -ppl adaptest2.txt -debug 2 > my2.ppl
            

            you should use the same adaptation text for both LMs.
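
            A minimal sketch of the corrected run, assuming adaptest.txt is your single held-out adaptation text:

            ngram -lm my.lm -ppl adaptest.txt -debug 2 > my.ppl
            ngram -lm my2.lm -ppl adaptest.txt -debug 2 > my2.ppl
            compute-best-mix my.ppl my2.ppl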

            Moreover, in your last post you pass LMs to compute-best-mix, while the script expects ppl files. Is that what you are actually doing?

            Check out this video. It explains the method a little. It is intended for really large text data, like when you have large Google n-grams and adapt them on local newspaper data (it is still better to have at least a couple of MB there).

             

            Last edit: Arseniy Gorin 2016-07-21
            • kk_huk

              kk_huk - 2016-07-21

              "you should use the same adaptation text for both LMs."

              Aww, my bad. Thanks a lot, that solved my problem.

              "Check out this video. It explains the method a little. It is intended for really large text data, like when you have large Google n-grams and adapt them on local newspaper data (it is still better to have at least a couple of MB there)."

              I have just watched the video. Thanks.

               

