CMU Sphinx / Forums / Speech Recognition Theory: Gaussians and Gaussian mixtures

Andreas Ravndal - 2016-04-07

Hi, I read (http://cmusphinx.sourceforge.net/wiki/acousticmodeltypes) that the PTM model uses around 5000 Gaussians and the semi-continuous model uses 700 Gaussians. When training either of these models, it is recommended to set the number of Gaussian densities to 256. If I understand correctly, the 256 densities set in the cfg file are used for vector quantization (VQ) of the feature vectors, and the other 5000 (or 700) are used to build the senones (the Gaussian mixtures for the state output probabilities of the HMMs).
Is this correct?
Also, are there any lectures, papers, or books that give more insight into the theory behind the CMU Sphinx speech recognition toolkit?
This manual (http://www.speech.cs.cmu.edu/sphinxman/scriptman1.html) has given me some great information so far, but I was hoping to find something similar that is up to date.

- Nickolay V. Shmyrev - 2016-04-07
  
  the 256 densities set in the cfg file are used for vector quantization (VQ) of the feature vectors, and the other 5000 (or 700) are used to build the senones (the Gaussian mixtures for the state output probabilities of the HMMs).
  
  There are no "other" Gaussians. In the configuration file you set 256 Gaussians per stream. Semi-continuous models usually have 3 streams, so 3 * 256 = 768, which is the "around 700" Gaussians quoted on the wiki. In PTM models the Gaussians are phone-dependent, and the default setting is 64 Gaussians per stream, so 3 streams * 64 Gaussians per stream * about 30 phones = 5760, which is the "around 5000" Gaussians in the model.
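As a rough check of the arithmetic above, here is a minimal Python sketch. The function names are hypothetical, and the constants (3 streams, 256 or 64 densities per stream, about 30 phones) are the approximate figures from this thread, not values read from any Sphinx configuration or model file.

```python
# Rough Gaussian counts implied by the figures quoted in this thread.
# All constants are approximate values from the discussion; the
# function names are made up for illustration.

def semi_continuous_total(streams: int, densities_per_stream: int) -> int:
    """One set of Gaussians per feature stream, shared by the whole model."""
    return streams * densities_per_stream


def ptm_total(streams: int, densities_per_stream: int, phones: int) -> int:
    """One set of Gaussians per feature stream for each phone."""
    return streams * densities_per_stream * phones


semi = semi_continuous_total(streams=3, densities_per_stream=256)
ptm = ptm_total(streams=3, densities_per_stream=64, phones=30)

print(semi)  # 768, i.e. "around 700"
print(ptm)   # 5760, i.e. "around 5000"
```

The point of the two functions is only that the per-stream density setting is multiplied up to the model total; there is no second, separate pool of Gaussians.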
  
  Also, are there any lectures, papers, or books that give more insight into the theory behind the CMU Sphinx speech recognition toolkit?
  
  Spoken Language Processing
  http://www.amazon.com/Spoken-Language-Processing-Algorithm-Development/dp/0130226165
  
  - Andreas Ravndal - 2016-04-07
    
    One more question: when you say 3 streams, do you mean feature streams?
    
    - Nickolay V. Shmyrev - 2016-04-07
      
      do you mean feature streams?
      
      Yes. For semi-continuous models we analyze the features, the feature deltas, and the feature delta-deltas in separate streams, each with its own Gaussians.
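To make the idea of three feature streams concrete, here is a small Python sketch that derives delta and delta-delta streams from a sequence of static feature frames. This is an illustration only: the helper names, the one-frame difference window, and the edge clamping are assumptions and do not reproduce the exact feature computation Sphinx uses.

```python
from typing import List, Tuple

Frame = List[float]


def deltas(frames: List[Frame]) -> List[Frame]:
    """Difference x[t+1] - x[t-1] per coefficient, clamping at the edges.

    The one-frame window is an assumption made for this sketch.
    """
    n = len(frames)
    out: List[Frame] = []
    for t in range(n):
        prev_f = frames[max(t - 1, 0)]
        next_f = frames[min(t + 1, n - 1)]
        out.append([b - a for a, b in zip(prev_f, next_f)])
    return out


def split_into_streams(
    frames: List[Frame],
) -> Tuple[List[Frame], List[Frame], List[Frame]]:
    """Return (static, delta, delta-delta) streams of equal length."""
    d = deltas(frames)
    dd = deltas(d)  # delta-deltas are deltas of the delta stream
    return frames, d, dd


# Toy input: 4 frames with 2 coefficients each (made-up numbers).
frames = [[0.0, 1.0], [1.0, 3.0], [4.0, 6.0], [9.0, 10.0]]
static, delta, delta_delta = split_into_streams(frames)

for name, stream in (("static", static), ("delta", delta), ("delta-delta", delta_delta)):
    print(name, stream)
```

In a multi-stream model each of the three returned streams would be scored against its own set of Gaussians, which is why the per-stream density count is multiplied by the number of streams.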
      

Andreas Ravndal - 2016-04-07

Thank you
