CMU Sphinx / Forums / Help: Overview of Sphinx algorithms.

Jamal - 2012-11-30

Is there somewhere that describes in detail the sphinx algorithms? I can step through the code, but that's a little inefficient.

For instance it uses HMM routines, but what exactly is it modeling? This is what I'm assuming it does:

1. Chops amplitude data up into frames.

2. Applies FFT to the frames to get a frequency spectrum for each frame.

3. Uses HMM to match frequency spectrums of combinations of frames to frequency spectrums of known words.
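
As a rough illustration of the framing and FFT steps, here is a minimal sketch in plain Python. The frame length (400 samples), hop (160 samples), Hamming window, and function names are illustrative assumptions, not values taken from the Sphinx source, and the naive DFT stands in for a real FFT, which computes the same values faster.

```python
import cmath
import math

def split_into_frames(samples, frame_len, hop):
    """Cut a list of amplitude samples into overlapping frames."""
    return [samples[i:i + frame_len]
            for i in range(0, len(samples) - frame_len + 1, hop)]

def power_spectrum(frame):
    """Power spectrum of one frame via a naive DFT over bins 0..n/2."""
    n = len(frame)
    # Taper the frame edges with a Hamming window to reduce spectral leakage.
    windowed = [x * (0.54 - 0.46 * math.cos(2 * math.pi * i / (n - 1)))
                for i, x in enumerate(frame)]
    spectrum = []
    for k in range(n // 2 + 1):
        s = sum(x * cmath.exp(-2j * math.pi * k * i / n)
                for i, x in enumerate(windowed))
        spectrum.append(abs(s) ** 2)
    return spectrum

# Toy input (assumed): a 1 kHz sine sampled at 16 kHz for 0.1 s.
rate = 16000
signal = [math.sin(2 * math.pi * 1000 * t / rate) for t in range(1600)]

frames = split_into_frames(signal, frame_len=400, hop=160)
spectra = [power_spectrum(f) for f in frames]

# Bin spacing is rate / frame_len = 40 Hz, so 1 kHz should peak in bin 25.
peak_bin = max(range(len(spectra[0])), key=lambda k: spectra[0][k])
```

Each entry of `spectra` is one frame's spectrum; the later stages work on one such vector per frame.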

Am I completely off base?

Joseph S. Wisniewski - 2012-11-30

Well, 1 and 2 are OK...

3 - Render the FFT down to something almost like a "cepstrum", a way of looking at a waveform that eliminates a lot of what makes us individuals and just concentrates on the commonality that lets us communicate.
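
A minimal sketch of the idea, assuming a plain cepstrum: take the log of a frame's power spectrum, then apply a DCT-II and keep only the first few coefficients. MFCC-style front ends also pass the spectrum through a mel filterbank before the log; that step is skipped here, and the function name and the choice of 13 coefficients are assumptions for illustration.

```python
import math

def cepstrum(power_spec, n_coeffs=13):
    """Log-compress a power spectrum, then apply a DCT-II to it."""
    # The small offset avoids log(0) on empty bins.
    log_spec = [math.log(p + 1e-10) for p in power_spec]
    n = len(log_spec)
    return [sum(v * math.cos(math.pi * k * (i + 0.5) / n)
                for i, v in enumerate(log_spec))
            for k in range(n_coeffs)]

# Toy input (assumed): a perfectly flat spectrum whose log is 1 in every bin.
flat = [math.e] * 64
c = cepstrum(flat)
# All of the energy lands in coefficient 0; the others are ~0
# because a flat spectrum has no shape to describe.
```

Keeping only the low-order coefficients retains the smooth spectral envelope and discards fine detail such as pitch harmonics, which is roughly the "commonality" described above.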

(Thing is, 1, 2, and 3 are well documented and implemented, if not entirely understood, and they make up maybe 7% of the system no matter how you look at it: 7% of the code, 7% of the processor time and memory used, etc.) The big, big thing is...

4 - Search a bunch of HMMs for the sequences that best match the cepstral frames. (HMMs don't "match" things; you have to make things match the HMMs, and a good search algorithm is the secret sauce that makes an efficient recognizer.)
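
To make the search idea concrete, here is a toy Viterbi search over a single two-state HMM with discrete observations. This is only a sketch under stated assumptions: the states, probabilities, and observation symbols are made up, and a real recognizer scores continuous cepstral vectors against many HMMs and prunes the search heavily, none of which is shown here.

```python
import math

def viterbi(obs, states, log_start, log_trans, log_emit):
    """Return (best_log_prob, best_state_path) for an observation sequence."""
    # best[s] = (log prob of the best path ending in state s, that path)
    best = {s: (log_start[s] + log_emit[s][obs[0]], [s]) for s in states}
    for o in obs[1:]:
        new_best = {}
        for s in states:
            # Pick the predecessor state that gives the best score into s.
            score, prev = max((best[p][0] + log_trans[p][s], p) for p in states)
            new_best[s] = (score + log_emit[s][o], best[prev][1] + [s])
        best = new_best
    return max(best.values(), key=lambda entry: entry[0])

def logs(table):
    """Convert a dict of probabilities to log probabilities."""
    return {k: math.log(v) for k, v in table.items()}

# Hypothetical model: state A mostly emits "lo", state B mostly emits "hi".
states = ["A", "B"]
log_start = logs({"A": 0.9, "B": 0.1})
log_trans = {"A": logs({"A": 0.7, "B": 0.3}),
             "B": logs({"A": 0.1, "B": 0.9})}
log_emit = {"A": logs({"lo": 0.9, "hi": 0.1}),
            "B": logs({"lo": 0.2, "hi": 0.8})}

score, path = viterbi(["lo", "lo", "hi", "hi"],
                      states, log_start, log_trans, log_emit)
# path is the single most likely state sequence for the four observations.
```

Working in log probabilities turns products into sums, which avoids underflow on long frame sequences.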

One of the best overviews of how Sphinx works is the first 48 pages of David Huggins-Daines's Ph.D. thesis. Don't let that put you off; it's very approachable.

http://www.lti.cs.cmu.edu/research/thesis/2011/david_hugginsdaines.pdf

If you want to dive deep, create language models, etc., the aptly named "The Hieroglyphs" is worth a look.

http://www-2.cs.cmu.edu/~archan/documentation/sphinxDocDraft3.pdf

There's also the CMUSphinx Wiki:

http://cmusphinx.sourceforge.net/wiki/start

Good luck.

- Roslyn Debacker - 2017-07-21
  
  Here is an updated link to David Huggins-Daines's Ph.D. thesis.
  
  http://www.lti.cs.cmu.edu/sites/default/files/research/thesis/2011/david_huggins-daines_an_architecture_for_scalable_universal_speech_recognition.pdf
  
  Last edit: Roslyn Debacker 2017-07-21
  

creative64 - 2012-12-01

Another very readable document for understanding more about Sphinx is the Ph.D. thesis
of Mosur Ravishankar, "Efficient Algorithms for Speech Recognition":
http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.72.3560

And a very good overview of HMMs in speech recognition is provided in the classic
paper by Rabiner, "A Tutorial on Hidden Markov Models and Selected Applications in
Speech Recognition":
http://www.cs.ubc.ca/~murphyk/Bayes/rabiner.pdf

