I'm trying to figure out how PocketSphinx processes phonetic HMMs in the search pass.
My question is: how do we know the final state of a phoneme HMM has been reached in cross-HMM transitions, since pronunciation speed varies?
Do we need to split the sequence of frames according to phonemes before processing it?
The decoding algorithm is called Viterbi search. It does not split frames into phonemes; instead, it considers every possible frame to be the end of the phoneme and selects the best variants.
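To make this concrete, here is a toy sketch (not PocketSphinx's actual implementation) of the Viterbi trellis for a single 3-state left-to-right phoneme HMM. The transition probabilities and acoustic scores are invented for illustration; the point is that the final state carries a score at every frame where it is reachable, so the search never fixes the phoneme's end in advance.

```python
import math

# Toy left-to-right HMM for one phoneme with 3 emitting states, as in
# Sphinx-style acoustic models. Transition probabilities are made-up
# illustration values, kept in the log domain.
LOG_SELF = math.log(0.6)   # stay in the same state
LOG_NEXT = math.log(0.4)   # advance to the next state
N_STATES = 3
NEG_INF = float("-inf")

def viterbi_trellis(log_likelihoods):
    """Fill the Viterbi trellis for one phoneme HMM.

    log_likelihoods[t][s] is log P(frame t | state s) (hypothetical
    acoustic scores). Returns score[t][s]: the log score of the best
    path that is in state s after consuming frames 0..t.
    """
    T = len(log_likelihoods)
    score = [[NEG_INF] * N_STATES for _ in range(T)]
    score[0][0] = log_likelihoods[0][0]   # paths must enter at state 0
    for t in range(1, T):
        for s in range(N_STATES):
            stay = score[t - 1][s] + LOG_SELF
            move = score[t - 1][s - 1] + LOG_NEXT if s > 0 else NEG_INF
            best = max(stay, move)
            if best > NEG_INF:
                score[t][s] = best + log_likelihoods[t][s]
    return score

def exit_scores(score):
    """Score of the phoneme's final state at every frame.

    The final state is reachable at many different frames, so a
    word-level search can let the next phoneme's HMM start after any of
    them and keep whichever overall alignment scores best. No prior
    segmentation of frames into phonemes is needed.
    """
    return [frame[N_STATES - 1] for frame in score]
```

With, say, five frames of fake acoustic scores, the exit score is `-inf` for the first two frames (a 3-state left-to-right HMM needs at least three frames to reach its last state) and finite for every frame after that, i.e. every later frame is a candidate phoneme boundary.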
Thanks for the quick answer. Actually, I'm doing some research work on speech recognition, using PocketSphinx as the test tool. I ran PocketSphinx with models downloaded from https://sourceforge.net/projects/cmusphinx/files/Acoustic%20and%20Language%20Models/ only to find that it performed poorly, which is really frustrating. So I decided to go deeper into the source code and try to find ways of improving the performance with regard to my project requirements.
I would really appreciate it if you could help me out by providing some documents relating to the source code, so that I can stick with my choice. My email address is tiaoshuiyu@outlook.com.
To learn more about speech recognition with HMMs, you can read Rabiner's HMM tutorial:
http://www.cs.ubc.ca/~murphyk/Bayes/rabiner.pdf
PocketSphinx API Documentation
Starting Point
Last edit: G10DRAS 2017-03-13