CMU Sphinx / Forums / Help: Sphinx3 and class based Language models

Nagendra Kumar Goel - 2007-03-03

I want to use a class-based language model with Sphinx3. Is it supported?
Can each word in a class have its own probability (instead of a
uniform prior)?
I see source code supporting it, but I cannot find an example anywhere.
Can someone please point me to an example where each word in the class
has a probability associated with it?

- David Huggins-Daines - 2007-03-03
  
  Yes, it's supported. It works the same as in Sphinx2 - you make a "control file" listing each language model with its associated classes, and a class definition file which lists the words in each class with their probabilities.
  
  Look at the example in sphinx3/model/lm/an4, specifically these files:
  
  args.an4.test.cls
  an4.ug.cls.lmctl
  an4.cls.probdef
  
  The last file doesn't have any probabilities in it since there is only one member in the class (I don't know why the test was made this way, it isn't a very good test!). You can enter probabilities like this:
  
  LMCLASS [v_class]
  A 0.25
  E 0.3
  I 0.1
  O 0.25
  U 0.1
  END [v_class]
  
  It's probably a good idea for them to add up to one within each class. The code should just normalize them for you, but for some reason it doesn't.
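
A minimal sketch of doing that normalization yourself before handing the file to the decoder. It assumes the class definition file uses exactly the LMCLASS ... END layout shown above, with one word and one probability per line. The function names (parse_classes, normalize_classes, format_classes) are made up for this example and are not part of Sphinx3.

```python
def parse_classes(text):
    """Parse LMCLASS ... END blocks into {class_name: {word: weight}}."""
    classes = {}
    current = None
    for raw in text.splitlines():
        fields = raw.split()
        if not fields:
            continue  # skip blank lines
        if fields[0] == "LMCLASS":
            current = fields[1]
            classes[current] = {}
        elif fields[0] == "END":
            current = None
        elif current is not None:
            # Assumption: every member line is "WORD PROBABILITY".
            if len(fields) != 2:
                raise ValueError("expected 'WORD PROB', got: %r" % raw)
            classes[current][fields[0]] = float(fields[1])
    return classes


def normalize_classes(classes):
    """Rescale the weights in each class so that they sum to one."""
    normalized = {}
    for name, words in classes.items():
        total = sum(words.values())
        if total <= 0:
            raise ValueError("class %s has no probability mass" % name)
        normalized[name] = {w: p / total for w, p in words.items()}
    return normalized


def format_classes(classes):
    """Write the classes back out in the same LMCLASS ... END layout."""
    lines = []
    for name, words in classes.items():
        lines.append("LMCLASS %s" % name)
        for word, prob in words.items():
            lines.append("%s %.6g" % (word, prob))
        lines.append("END %s" % name)
    return "\n".join(lines) + "\n"


# Illustrative input: raw counts (5, 6, 2, 5, 2 out of 20) instead of
# probabilities, so the weights do not sum to one.
raw_text = """LMCLASS [v_class]
A 5
E 6
I 2
O 5
U 2
END [v_class]
"""

print(format_classes(normalize_classes(parse_classes(raw_text))))
```

Running this on your own probdef file before decoding means the weights in each class sum to one regardless of what the decoder does with them.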
  
  - Nagendra Kumar Goel - 2007-03-04
    
    Thanks a lot. I assume you meant the following files:
    args.an4.test.cls.in
    an4.ug.cls.lmctl.in
    and an4.cls.probdef
    
- Nagendra Kumar Goel - 2007-03-07
  
  Sphinx3 seems to be ignoring the class-based probabilities. I say that because I built both a class-based LM and
  a regular trigram LM.
  
  The perplexity of the sentence to be recognized is about 235 under the class-based LM (measured with the SRILM toolkit and the
  equivalent class definition file). Even after I increase the LM weight to 14, the recognized string still has
  words whose class-conditional probability is only 1e-7.
  The perplexity of the recognized text (measured with SRILM) is 2.8e6. This is possible only if Sphinx3 (I used livepretend) is ignoring the class conditionals and assigning a uniform prior instead.
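
To illustrate the reasoning: in a class-based LM the word probability is the class n-gram probability times the class-conditional probability, P(w | h) = P(c | h) * P(w | c), and perplexity is the exponential of the negative mean log probability. The sketch below uses made-up numbers (not the measurements reported above) to compare how the same rare class member scores with its class conditional and with a uniform prior. The function names and the class size are assumptions for the example only.

```python
import math


def word_logprob(p_class_given_history, p_word_given_class):
    """Natural-log probability of a word under a class-based LM."""
    return math.log(p_class_given_history) + math.log(p_word_given_class)


def perplexity(logprobs):
    """Perplexity from a list of natural-log word probabilities."""
    return math.exp(-sum(logprobs) / len(logprobs))


# Hypothetical numbers, for illustration only.
p_class = 0.05                 # P(c | h) from the class n-gram
p_rare = 1e-7                  # class-conditional probability of a rare member
class_size = 1000              # assumed number of words in the class
p_uniform = 1.0 / class_size   # what a uniform prior would assign instead

true_lp = word_logprob(p_class, p_rare)
uniform_lp = word_logprob(p_class, p_uniform)

print("log P with class conditional:", true_lp)
print("log P with uniform prior:    ", uniform_lp)
print("perplexity of a 3-word hypothesis made of such words")
print("  scored with class conditionals:", perplexity([true_lp] * 3))
print("  scored with a uniform prior:   ", perplexity([uniform_lp] * 3))
```

A decoder that applies the uniform value sees the rare word as far more likely than the LM intends, which would be consistent with rare class members showing up in the recognized string.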
  
