
Overview of Sphinx algorithms.

Jamal
2012-11-30
2012-12-01
  • Jamal

    Jamal - 2012-11-30

    Is there a document somewhere that describes the Sphinx algorithms in detail? I can step through the code, but that's a little inefficient.

    For instance, it uses HMM routines, but what exactly is it modeling? This is what I'm assuming it does:

    1. Chops the amplitude data up into frames.
    2. Applies an FFT to each frame to get its frequency spectrum.
    3. Uses HMMs to match the frequency spectra of combinations of frames to the frequency spectra of known words.

    Am I completely off base?
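    Steps 1 and 2 can be sketched as below. This is a minimal illustration, not Sphinx's front end: the 16 kHz sample rate, the 25 ms frames (400 samples) with a 10 ms hop (160 samples), the Hamming window, and the function names are assumptions chosen for the example, and a naive O(N^2) DFT stands in for a real FFT to keep it dependency-free.

```python
import cmath
import math


def frame_signal(samples, frame_len, hop):
    """Step 1: chop the sample stream into overlapping, fixed-length frames."""
    return [samples[start:start + frame_len]
            for start in range(0, len(samples) - frame_len + 1, hop)]


def hamming(n):
    """Hamming window, applied to each frame to reduce spectral leakage."""
    return [0.54 - 0.46 * math.cos(2 * math.pi * i / (n - 1)) for i in range(n)]


def power_spectrum(frame):
    """Step 2: window the frame, then a naive DFT; returns |X[k]|^2 for k = 0..N/2."""
    n = len(frame)
    x = [s * w for s, w in zip(frame, hamming(n))]
    return [abs(sum(x[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))) ** 2
            for k in range(n // 2 + 1)]


if __name__ == "__main__":
    rate = 16000  # assumed sample rate (Hz)
    tone = [math.sin(2 * math.pi * 1000 * t / rate) for t in range(1600)]  # 0.1 s of 1 kHz
    frames = frame_signal(tone, frame_len=400, hop=160)  # 25 ms frames, 10 ms hop
    spec = power_spectrum(frames[0])
    peak = max(range(len(spec)), key=spec.__getitem__)
    print(len(frames), "frames; peak at", peak * rate / 400, "Hz")
```

    With a 1 kHz test tone, the largest value in the first frame's spectrum lands in bin 25, since each bin spans 16000 / 400 = 40 Hz.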

     
  • Joseph S. Wisniewski

    Well, 1 and 2 are OK...

    3 - Render the FFT down to something almost like a "cepstrum", a way of looking at a waveform that eliminates a lot of what makes us individuals and just concentrates on the commonality that lets us communicate.
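    The usual form of this step is the mel-frequency cepstrum (MFCC): pass the power spectrum through triangular filters spaced on the mel scale, take logs, then apply a DCT. The sketch below assumes a 201-bin power spectrum (a 400-point frame at 16 kHz), 40 filters and 13 coefficients; those values, the helper names, and the HTK-style mel formula are illustrative choices, not Sphinx's exact settings.

```python
import math


def hz_to_mel(f):
    return 2595.0 * math.log10(1.0 + f / 700.0)


def mel_to_hz(m):
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)


def mel_filterbank(n_filters, n_bins, rate):
    """Triangular filters, evenly spaced on the mel scale from 0 Hz to rate/2."""
    top = hz_to_mel(rate / 2)
    mels = [top * i / (n_filters + 1) for i in range(n_filters + 2)]
    # Map each filter edge to the nearest power-spectrum bin.
    edges = [min(n_bins - 1, round(mel_to_hz(m) / (rate / 2) * (n_bins - 1)))
             for m in mels]
    bank = []
    for j in range(1, n_filters + 1):
        lo, mid, hi = edges[j - 1], edges[j], edges[j + 1]
        weights = [0.0] * n_bins
        for b in range(lo, mid):  # rising slope
            weights[b] = (b - lo) / (mid - lo)
        for b in range(mid, hi):  # peak (weight 1) and falling slope
            weights[b] = (hi - b) / (hi - mid)
        bank.append(weights)
    return bank


def cepstrum(power_spec, bank, n_ceps):
    """Log mel-filter energies followed by a DCT-II; keep the first n_ceps terms."""
    energies = [sum(w * p for w, p in zip(f, power_spec)) for f in bank]
    logs = [math.log(max(e, 1e-10)) for e in energies]  # floor avoids log(0)
    m = len(logs)
    return [sum(logs[j] * math.cos(math.pi * k * (j + 0.5) / m) for j in range(m))
            for k in range(n_ceps)]
```

    Because of the log, a change in overall gain (a louder talker, a different recording level) shifts only c[0]; the remaining coefficients stay the same, which is one small part of discarding what is incidental to the speaker.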

    Thing is, 1, 2, and 3 are well documented and implemented, if not entirely understood, and they make up maybe 7% of the system no matter how you look at it: 7% of the code, 7% of the processor time and memory used, etc. The big, big thing is...

    4 - Search a bunch of HMMs for the sequences that best match the cepstral frames. (HMMs don't "match" things; you have to make things match the HMMs, and a good search algorithm is the secret sauce that makes an efficient recognizer.)
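    A common core for this kind of search is Viterbi decoding: for each frame, keep the best-scoring way to be in each HMM state, and prune hypotheses that fall too far behind the leader (a "beam"). The toy sketch below assumes one scalar feature per frame and a single 1-D Gaussian per state, whereas a real recognizer scores full cepstral vectors against Gaussian mixtures and searches many word and phone HMMs at once; all names and parameter values are hypothetical.

```python
import math

NEG_INF = float("-inf")


def safe_log(p):
    """log(p), with log(0) mapped to -inf for forbidden transitions."""
    return math.log(p) if p > 0 else NEG_INF


def gauss_logpdf(x, mean, var):
    """Log density of a 1-D Gaussian (toy stand-in for mixture scoring)."""
    return -0.5 * (math.log(2 * math.pi * var) + (x - mean) ** 2 / var)


def viterbi(obs, log_init, log_trans, means, variances, beam=None):
    """Best (log score, state path) for a sequence of scalar observations.

    If beam is set, states scoring more than `beam` below the best state
    at a frame are dropped before extending to the next frame.
    """
    n = len(log_init)
    score = [log_init[i] + gauss_logpdf(obs[0], means[i], variances[i])
             for i in range(n)]
    backptrs = []
    for x in obs[1:]:
        if beam is not None:
            cutoff = max(score) - beam
            score = [s if s >= cutoff else NEG_INF for s in score]
        new_score, ptr = [], []
        for j in range(n):
            # Best predecessor state for state j at this frame.
            i_best = max(range(n), key=lambda i: score[i] + log_trans[i][j])
            new_score.append(score[i_best] + log_trans[i_best][j]
                             + gauss_logpdf(x, means[j], variances[j]))
            ptr.append(i_best)
        score = new_score
        backptrs.append(ptr)
    # Trace back from the best final state.
    state = max(range(n), key=lambda i: score[i])
    best = score[state]
    path = [state]
    for ptr in reversed(backptrs):
        state = ptr[state]
        path.append(state)
    return best, path[::-1]


if __name__ == "__main__":
    # Hypothetical 3-state left-to-right model (self-loop or advance).
    log_init = [0.0, NEG_INF, NEG_INF]
    log_trans = [[safe_log(0.6), safe_log(0.4), NEG_INF],
                 [NEG_INF, safe_log(0.6), safe_log(0.4)],
                 [NEG_INF, NEG_INF, 0.0]]
    obs = [0.1, -0.2, 4.8, 5.1, 9.7, 10.2]
    print(viterbi(obs, log_init, log_trans, [0.0, 5.0, 10.0], [1.0] * 3, beam=10.0))
```

    With state means 0, 5 and 10, the demo observations decode to the state path [0, 0, 1, 1, 2, 2] with or without the beam. Here the beam only discards hypotheses that were far behind; in general, too narrow a beam can also prune the eventual winner, which is the speed/accuracy trade-off a recognizer has to tune.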

    One of the best overviews of how Sphinx works is the first 48 pages of David Huggins-Daines's Ph.D. thesis. Don't let that put you off; it's very approachable.

    http://www.lti.cs.cmu.edu/research/thesis/2011/david_hugginsdaines.pdf

    If you want to dive deeper, create language models, etc., the aptly named "The Hieroglyphs" is worth a look.

    http://www-2.cs.cmu.edu/~archan/documentation/sphinxDocDraft3.pdf

    There's also the CMUSphinx wiki:

    http://cmusphinx.sourceforge.net/wiki/start

    Good luck.

     
  • creative64

    creative64 - 2012-12-01

    Another very readable document for understanding more about Sphinx is the Ph.D. thesis
    of Mosur Ravishankar, "Efficient Algorithms for Speech Recognition":
    http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.72.3560

    And a very good overview of HMMs in speech recognition is provided in the classic
    paper by Rabiner, "A Tutorial on Hidden Markov Models and Selected Applications in
    Speech Recognition":
    http://www.cs.ubc.ca/~murphyk/Bayes/rabiner.pdf
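    Rabiner's tutorial organizes HMMs around three problems: scoring an observation sequence against a model, finding the best state sequence, and re-estimating the model parameters. As an illustration of the first, here is a sketch of the forward algorithm with per-frame scaling (the tutorial's remedy for numerical underflow) on a hypothetical two-state, two-symbol model with discrete outputs; the parameter values and function name are made up for the example.

```python
import math


def forward_log_prob(obs, init, trans, emit):
    """log P(obs | model) for a discrete-output HMM, using the scaled forward pass.

    init[i]     : probability of starting in state i
    trans[i][j] : probability of moving from state i to state j
    emit[i][k]  : probability that state i emits symbol k
    """
    n = len(init)
    log_prob = 0.0
    alpha = None
    for t, o in enumerate(obs):
        if t == 0:
            alpha = [init[i] * emit[i][o] for i in range(n)]
        else:
            alpha = [sum(alpha[i] * trans[i][j] for i in range(n)) * emit[j][o]
                     for j in range(n)]
        total = sum(alpha)                  # scale factor for this frame
        log_prob += math.log(total)
        alpha = [a / total for a in alpha]  # rescale so alpha sums to 1
    return log_prob


if __name__ == "__main__":
    # Hypothetical two-state, two-symbol model.
    init = [0.6, 0.4]
    trans = [[0.7, 0.3], [0.4, 0.6]]
    emit = [[0.9, 0.1], [0.2, 0.8]]
    print(forward_log_prob([0, 1, 0], init, trans, emit))
```

    For a short sequence the result matches a brute-force sum over every state path; the scaling keeps it finite for thousands of frames, where the unscaled product of probabilities would underflow to zero.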

     
