
Sphinx3 and class based Language models

2007-03-03
2012-09-22
  • Nagendra Kumar Goel

    I want to use a class-based language model with Sphinx3. Is it supported?
    Can each word in a class have its own probability (instead of a
    uniform prior)?
    I see source code supporting it, but I cannot find an example anywhere.
    Can someone please point me to an example where each word in the class
    has a probability associated with it?

     
    • David Huggins-Daines

      Yes, it's supported. It works the same as in Sphinx2 - you make a "control file" listing each language model with its associated classes, and a class definition file which lists the words in each class with their probabilities.

      Look at the example in sphinx3/model/lm/an4, specifically these files:

      args.an4.test.cls
      an4.ug.cls.lmctl
      an4.cls.probdef

      The last file doesn't have any probabilities in it since there is only one member in the class (I don't know why the test was made this way, it isn't a very good test!). You can enter probabilities like this:

      LMCLASS [v_class]
      A 0.25
      E 0.3
      I 0.1
      O 0.25
      U 0.1
      END [v_class]

      It's probably a good idea for them to add up to one within each class. The code should just normalize them for you, but for some reason it doesn't.
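
Since the probabilities are apparently not normalized for you, one option is to check and rescale a class definition before handing it to the decoder. The following is a minimal sketch, not part of Sphinx3: `parse_probdef` and `normalize` are hypothetical helper names, the parser only understands the `LMCLASS ... END` layout shown above, and the handling of words listed without a probability (they split whatever mass is left over) is an assumption rather than documented Sphinx3 behaviour.

```python
EXAMPLE = """\
LMCLASS [v_class]
A 0.25
E 0.3
I 0.1
O 0.25
U 0.1
END [v_class]
"""


def parse_probdef(text):
    """Parse LMCLASS ... END blocks into {class_name: {word: prob or None}}."""
    classes = {}
    current = None
    for raw in text.splitlines():
        fields = raw.split()
        if not fields:
            continue  # skip blank lines
        if fields[0] == "LMCLASS":
            current = fields[1]
            classes[current] = {}
        elif fields[0] == "END":
            current = None
        elif current is not None:
            # A member line is "WORD" or "WORD PROB".
            prob = float(fields[1]) if len(fields) > 1 else None
            classes[current][fields[0]] = prob
    return classes


def normalize(members):
    """Return a copy of one class whose probabilities sum to one.

    Assumption: words with no probability share the leftover mass equally.
    """
    given = {w: p for w, p in members.items() if p is not None}
    missing = [w for w, p in members.items() if p is None]
    leftover = max(0.0, 1.0 - sum(given.values()))
    filled = dict(given)
    for word in missing:
        filled[word] = leftover / len(missing)
    total = sum(filled.values())
    if total <= 0:
        raise ValueError("class has no probability mass")
    return {w: p / total for w, p in filled.items()}


if __name__ == "__main__":
    for name, members in parse_probdef(EXAMPLE).items():
        print("LMCLASS", name)
        for word, prob in normalize(members).items():
            print(word, prob)
        print("END", name)
```

Running the script prints the class back out in the same layout with rescaled values, so the output can be pasted into a probdef file after a manual check.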

       
      • Nagendra Kumar Goel

        Thanks a lot. I assume you meant the following files:
        args.an4.test.cls.in
        an4.ug.cls.lmctl.in
        and an4.cls.probdef

         
    • Nagendra Kumar Goel

      Sphinx3 seems to be ignoring the class-based probabilities. I say this because I built both a class-based LM and
      a regular trigram LM.

      The perplexity of the sentence to be recognized is about 235 under the class-based LM (measured with the SRILM toolkit
      and the equivalent class definition file). Even after I increased the LM weight to 14, the recognized string still contains
      words whose class-conditional probability is only 1e-7.
      The perplexity of the recognized text (measured with SRILM) is 2.8e6. This is possible only if Sphinx3 (I used livepretend) is ignoring the class conditionals and assigning a uniform prior instead.
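
The reasoning above can be illustrated with a small sketch. In a class-based LM the probability of a word is the class n-gram probability times the class-conditional probability of the word, and perplexity is the exponential of the average negative log probability. All numbers below are made up for illustration, the function names are hypothetical, and SRILM's reported figure may differ in details such as how sentence boundaries are counted.

```python
import math


def class_lm_probs(class_probs, conditionals):
    """P(word | history) = P(class | history) * P(word | class), per word."""
    return [pc * pw for pc, pw in zip(class_probs, conditionals)]


def perplexity(word_probs):
    """exp of the mean negative natural-log probability over the words."""
    log_sum = sum(math.log(p) for p in word_probs)
    return math.exp(-log_sum / len(word_probs))


# Hypothetical three-word hypothesis; every word comes from a 10-word class.
class_ngram = [0.1, 0.2, 0.05]      # P(class | history), made-up values
true_cond = [0.5, 1e-7, 0.3]        # class conditionals from the probdef file
uniform_cond = [0.1, 0.1, 0.1]      # what a uniform prior would assign

with_conditionals = class_lm_probs(class_ngram, true_cond)
with_uniform = class_lm_probs(class_ngram, uniform_cond)

if __name__ == "__main__":
    print("perplexity with class conditionals:", perplexity(with_conditionals))
    print("perplexity with uniform prior:     ", perplexity(with_uniform))
```

A decoder that scored the hypothesis with the uniform values would see nothing unusual about the second word, while the same hypothesis evaluated with the real class conditionals gets a much higher perplexity. That matches the symptom described: a hypothesis the decoder accepts but that SRILM scores very poorly.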

       
