Posterior in sphinx 3

Speech Recognition Toolkit




Forum: Help

Creator: Sylvain

Created: 2008-04-03

Updated: 2012-09-22

Sylvain - 2008-04-03

Hi,

Sphinx 3 outputs an acoustic score and a 'language model' score. As far as I understand, the acoustic score is the log posterior, or at least it depends only on the acoustic model and the list of recognized phonemes. However, when the same recording is decoded with different FSG grammars that give the same recognized sentence (and the same segmentation), the acoustic scores differ. Is this a bug? How is the acoustic score computed?

In addition, I don't understand the meaning of the scaling factor.

Thanks a lot for any help!

Sylvain

- Nickolay V. Shmyrev - 2008-04-03
  
  > it only depends on the acoustic model and the list of recognized phonemes.
  
  No, that's not correct. The key idea here is rescoring, which is exactly what the scaling factor is for. Scaling is used to make the search faster and more stable: better units are given more score, so they carry a higher weight and are preferred during the rest of the search.
  
  For a more precise description of the algorithm, read the articles on Sphinx 3, probably starting with Ravi's thesis:
  
  http://citeseer.ist.psu.edu/ravishankar96efficient.html
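
To make the effect of scaling concrete, here is a minimal toy sketch in Python. It is not Sphinx 3 code. It assumes that the decoder shifts the scores in each frame so that the best unit active in that frame scores 0, and that the set of active units depends on the grammar. The data, the unit names and the function `scaled_path_score` are all hypothetical.

```python
# Toy per-frame log-likelihoods for three hypothetical units.
# These numbers are made up for illustration; they are not Sphinx output.
FRAME_SCORES = [
    {"a": -10.0, "b": -4.0, "c": -7.0},
    {"a": -3.0, "b": -9.0, "c": -6.0},
    {"a": -8.0, "b": -5.0, "c": -2.0},
]

# The unit lying on the recognized path at each frame (same for both grammars).
PATH = ["c", "a", "a"]


def scaled_path_score(frame_scores, path, active_units):
    """Return (scaled path score, list of per-frame scale factors).

    Assumption: in every frame the scores are shifted so that the best
    unit among `active_units` gets score 0.
    """
    total = 0.0
    scales = []
    for scores, unit in zip(frame_scores, path):
        best = max(scores[u] for u in active_units)  # per-frame scale factor
        scales.append(best)
        total += scores[unit] - best  # scaled score of the path's unit
    return total, scales


# Grammar 1 keeps all three units active; grammar 2 never activates "b".
score_1, scales_1 = scaled_path_score(FRAME_SCORES, PATH, {"a", "b", "c"})
score_2, scales_2 = scaled_path_score(FRAME_SCORES, PATH, {"a", "c"})

print(score_1, score_2)  # scaled scores differ between the grammars
print(score_1 + sum(scales_1), score_2 + sum(scales_2))  # unscaled scores agree
```

With this toy data the scaled path scores are -9.0 and -6.0, although the path and the underlying likelihoods are identical. Adding the per-frame scale factors back gives -18.0 in both cases. Under the stated assumption, this is one way the same sentence and segmentation can get different acoustic scores with different grammars.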
  
