Hi
I have used sphinx2 and sphinxtrain for more than a year. Now I am studying the training algorithm for SCHMMs (semi-continuous HMMs).
Can anyone tell me the exact formulas that sphinxtrain uses for SCHMM training?
According to what I found in the code:
1. aggseg: accumulates all the MFCC files into a binary dump file.
2. kmeans_init: reads the dump file and initializes the means and variances. A randomly initialized k-means method is used: 256 samples are chosen at random as the initial guesses for the centroid means. After k-means clustering, 4 streams of 256 means and variances are saved. The means are the means of each cluster, and the variances are the variances of the samples in each cluster.
Is my understanding correct? Thanks.
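To make step 2 concrete, here is a minimal pure-Python sketch of randomly initialized k-means that returns one mean and one diagonal variance per cluster. It is only an illustration of the procedure as described above, not the sphinxtrain code: the function name and the `n_iters` and `seed` parameters are assumptions, it handles a single stream, and the real tool would run with 256 clusters per stream.

```python
import random

def kmeans_init(frames, n_clusters, n_iters=10, seed=0):
    """Randomly initialized k-means over feature vectors (lists of floats).

    Returns (means, variances): one mean vector and one per-dimension
    (diagonal) variance vector for each cluster. Hypothetical helper,
    written for illustration only.
    """
    rng = random.Random(seed)
    dim = len(frames[0])
    # Initial guess: n_clusters frames drawn at random, without replacement.
    means = [list(f) for f in rng.sample(frames, n_clusters)]

    def nearest(f):
        # Index of the centroid with the smallest squared Euclidean distance.
        return min(range(n_clusters),
                   key=lambda k: sum((x - m) ** 2 for x, m in zip(f, means[k])))

    members = [[] for _ in range(n_clusters)]
    for _ in range(max(1, n_iters)):
        # Assignment step: attach every frame to its nearest centroid.
        members = [[] for _ in range(n_clusters)]
        for f in frames:
            members[nearest(f)].append(f)
        # Update step: each centroid becomes the mean of its members.
        for k, group in enumerate(members):
            if group:  # an empty cluster keeps its previous centroid
                means[k] = [sum(col) / len(group) for col in zip(*group)]

    # Variance of the samples in each cluster, around that cluster's mean.
    variances = []
    for k, group in enumerate(members):
        if group:
            variances.append([sum((x - m) ** 2 for x in col) / len(group)
                              for col, m in zip(zip(*group), means[k])])
        else:
            variances.append([0.0] * dim)
    return means, variances
```

Calling `kmeans_init(frames, 256)` once per feature stream would give the 4 x 256 means and variances mentioned in step 2, assuming `frames` holds that stream's vectors from the dump file.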
For the training algorithm, you need a good book that teaches EM for speech recognition. Unfortunately, there are not many. I would try Xue-Dong Huang's "Hidden Markov Models for Speech Recognition" (Edinburgh Information Technology Series, 7).
Also, try to look up Dr. M. Y. Hwang's paper on senones,
"Subphonetic Modeling with Markov States - Senone", in ICASSP 1992.
This should give you some idea of what Sphinx actually does.
Your description of aggseg and kmeans_init is correct.
I hope this helps, and thanks for continuing to support Sphinx.
Arthur
Your findings so far are correct. Well, almost. ag_seg doesn't necessarily accumulate all cepstra: the "stride" switch lets you take only every n-th frame.
kmeans initializes the Gaussians, which are shared among all HMM states. But this is only the initialization. That's the easy part :-). The real action is in the Baum-Welch part.
--Evandro
Hi All,
Thank you for your replies. However, it is hard for me to get the resources you mentioned. Can anyone send me the necessary Baum-Welch formulas? Thanks.
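For reference, the textbook Baum-Welch re-estimation formulas for a semi-continuous HMM are usually written as below. This is the standard form under simplifying assumptions (a single feature stream, one observation sequence o_1..o_T, diagonal covariances, no non-emitting states); it is not taken from the sphinxtrain source, whose bookkeeping for 4 streams and multiple utterances may differ. The symbols (a_ij for transitions, c_jk for the mixture weight of shared Gaussian k in state j, pi_j for initial probabilities) are notation chosen here.

```latex
\begin{align*}
b_j(o_t) &= \sum_{k=1}^{K} c_{jk}\, \mathcal{N}(o_t;\, \mu_k, \sigma_k^2)
  && \text{(output density, shared codebook)} \\
\alpha_1(j) &= \pi_j\, b_j(o_1), \qquad
\alpha_{t+1}(j) = \Big[ \sum_i \alpha_t(i)\, a_{ij} \Big]\, b_j(o_{t+1})
  && \text{(forward pass)} \\
\beta_T(i) &= 1, \qquad
\beta_t(i) = \sum_j a_{ij}\, b_j(o_{t+1})\, \beta_{t+1}(j)
  && \text{(backward pass)} \\
P(O) &= \sum_i \alpha_T(i) \\
\gamma_t(j) &= \frac{\alpha_t(j)\, \beta_t(j)}{P(O)}, \qquad
\xi_t(i,j) = \frac{\alpha_t(i)\, a_{ij}\, b_j(o_{t+1})\, \beta_{t+1}(j)}{P(O)} \\
\gamma_t(j,k) &= \gamma_t(j)\,
  \frac{c_{jk}\, \mathcal{N}(o_t;\, \mu_k, \sigma_k^2)}{b_j(o_t)} \\[1ex]
\hat{a}_{ij} &= \frac{\sum_{t=1}^{T-1} \xi_t(i,j)}{\sum_{t=1}^{T-1} \gamma_t(i)}, \qquad
\hat{c}_{jk} = \frac{\sum_{t=1}^{T} \gamma_t(j,k)}{\sum_{t=1}^{T} \gamma_t(j)} \\
\hat{\mu}_k &= \frac{\sum_{t=1}^{T} \sum_j \gamma_t(j,k)\, o_t}
                   {\sum_{t=1}^{T} \sum_j \gamma_t(j,k)}, \qquad
\hat{\sigma}_k^2 = \frac{\sum_{t=1}^{T} \sum_j \gamma_t(j,k)\, (o_t - \hat{\mu}_k)^2}
                        {\sum_{t=1}^{T} \sum_j \gamma_t(j,k)}
\end{align*}
```

The point that distinguishes the semi-continuous case is in the last line: because the Gaussians are shared, the mean and variance updates sum the occupancies over all states j, while only the mixture weights c_jk stay state-specific. With several training utterances, the numerators and denominators are accumulated over all utterances before dividing.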