I was trying to duplicate the phone recognition results of
http://ttic.uchicago.edu/~jkeshet/papers/KeshetGrBe08.pdf on the TIMIT test
set (training done on the TIMIT train set). Please see pp. 15-16. They
created 39 CI HMM models using a software package, "Torch", which is
compatible with HTK. They also use time-alignment info. They get 64%
accuracy and don't mention any phone language model.
With a null grammar in Sphinx-3 I get 39% phone accuracy, and 58%
accuracy when a trigram LM is used. I used 5-state HMMs (skipstate = yes) and
32 Gaussians, as in the above paper (I used SphinxTrain for training).
I also varied the WIP over a broad range. The beam width was kept at the default.
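The figures quoted in this thread are phone accuracies, and scoring conventions differ between tools. A minimal sketch, assuming the HTK-style definitions Correct = (N - S - D)/N and Accuracy = (N - S - D - I)/N computed from a minimum edit-distance alignment, shows why the WIP matters: it shifts the balance between insertions and deletions, and insertions lower accuracy but not correctness. This is not the Sphinx-3 scoring code; the function names are illustrative.

```python
from typing import List, Tuple

Counts = Tuple[int, int, int, int]  # (errors, substitutions, deletions, insertions)


def align_counts(ref: List[str], hyp: List[str]) -> Tuple[int, int, int]:
    """Return (substitutions, deletions, insertions) of a minimum edit-distance alignment."""
    n, m = len(ref), len(hyp)
    table: List[List[Counts]] = [[(0, 0, 0, 0)] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        table[i][0] = (i, 0, i, 0)  # all reference phones deleted
    for j in range(1, m + 1):
        table[0][j] = (j, 0, 0, j)  # all hypothesis phones inserted
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            e, s, d, ins = table[i - 1][j - 1]
            miss = int(ref[i - 1] != hyp[j - 1])
            diag = (e + miss, s + miss, d, ins)  # match or substitution
            e, s, d, ins = table[i - 1][j]
            up = (e + 1, s, d + 1, ins)  # deletion
            e, s, d, ins = table[i][j - 1]
            left = (e + 1, s, d, ins + 1)  # insertion
            # Fewest total errors wins; ties are broken by the tuple order.
            table[i][j] = min(diag, up, left)
    _, s, d, ins = table[n][m]
    return s, d, ins


def phone_accuracy(ref: List[str], hyp: List[str]) -> Tuple[float, float]:
    """Return (correct, accuracy) as fractions; ref must be non-empty."""
    s, d, i = align_counts(ref, hyp)
    n = len(ref)
    return (n - s - d) / n, (n - s - d - i) / n
```

For example, scoring the hypothesis `sil dh ah k ae ae t sil` against the reference `sil dh ax k ae t sil` finds one substitution and one insertion, giving correct = 6/7 but accuracy = 5/7.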
If anybody has TIMIT CI phone recognition results using Sphinx-3/4, please
share.
Thanks.
CMUSphinx shouldn't be different from other toolkits. Your results are more or
less fine; they may need some small tweaks, but they are in range. Many researchers
report 64% CI accuracy with a bigram phone model. For example:
H. Glass et al. “A probabilistic framework for feature-based speech
recognition”.
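The bigram phone model mentioned above can be sketched as maximum-likelihood counts over the training transcriptions, assuming each utterance is a list of phone labels padded with hypothetical `<s>` and `</s>` boundary markers. A model for the decoder would need smoothing and would normally be built with an LM toolkit and written in ARPA format; this sketch only shows the estimate itself.

```python
from collections import Counter, defaultdict
from typing import Dict, List


def bigram_probs(utterances: List[List[str]]) -> Dict[str, Dict[str, float]]:
    """Unsmoothed maximum-likelihood phone bigram P(next | prev)."""
    counts: Dict[str, Counter] = defaultdict(Counter)
    for phones in utterances:
        padded = ["<s>"] + phones + ["</s>"]  # utterance boundary markers
        for prev, nxt in zip(padded, padded[1:]):
            counts[prev][nxt] += 1
    probs: Dict[str, Dict[str, float]] = {}
    for prev, followers in counts.items():
        total = sum(followers.values())
        probs[prev] = {nxt: c / total for nxt, c in followers.items()}
    return probs
```

With the two transcriptions `sil k ae t sil` and `sil k aa t sil`, P(ae | k) and P(aa | k) are both 0.5, while P(sil | t) is 1.0.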
For a reference script that trains a TIMIT model with HTK, you can check HTKtimit
from Tony Robinson:
http://www.cantabResearch.com/HTKtimit.sh
There are some surveys on the subject which cite basically the same results:
http://laps.ufpa.br/aldebaro/papers/Timitresults.pdf
http://www.intechopen.com/download/pdf/pdfs_id/15948
As for the paper you cited, I think it also uses a bigram language model and just
doesn't mention it. Otherwise, the reported figure is not quite correct.
You might know this already, but you can download all of Joseph Keshet's source
code from his home page:
http://ttic.uchicago.edu/~jkeshet/Source_Code.html
Thanks a lot for the various pointers.
I will try to tweak my config further.
There is not much to tweak. The most critical thing is to initialize the models
from the segmentation rather than use the flat start that SphinxTrain does by default.
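A minimal sketch of what initializing from segmentation means, assuming TIMIT .PHN files (lines of `start_sample end_sample label` at 16 kHz) and features at 100 frames per second. It is not SphinxTrain code: it only computes per-state means and diagonal variances by splitting each labelled segment uniformly across the emitting states, i.e. the statistics a segmentation-based bootstrap would start from instead of one global mean and variance. `SAMPLES_PER_FRAME`, `VAR_FLOOR` and the function names are hypothetical, and writing the result into SphinxTrain model files is not shown.

```python
from collections import defaultdict
from typing import Dict, List, Sequence, Tuple

# Assumptions (not SphinxTrain settings): 16 kHz TIMIT audio and a 10 ms
# frame shift, so one feature frame spans 160 samples of the .PHN offsets.
SAMPLES_PER_FRAME = 160
VAR_FLOOR = 1e-3  # hypothetical floor so single-frame states keep a usable variance

Segment = Tuple[int, int, str]
Stats = Dict[Tuple[str, int], Tuple[List[float], List[float]]]


def parse_phn(text: str) -> List[Segment]:
    """Parse TIMIT .PHN lines of the form 'start_sample end_sample label'."""
    segments = []
    for line in text.splitlines():
        parts = line.split()
        if len(parts) == 3:
            segments.append((int(parts[0]), int(parts[1]), parts[2]))
    return segments


def init_from_segmentation(
    utterances: List[Tuple[Sequence[Sequence[float]], List[Segment]]],
    n_states: int = 1,
) -> Stats:
    """Per-(phone, state) mean and diagonal variance from labelled frames.

    Each phone segment is split uniformly into n_states chunks, one per
    emitting state, instead of sharing global statistics (flat start).
    """
    pooled: Dict[Tuple[str, int], List[Sequence[float]]] = defaultdict(list)
    for feats, segments in utterances:
        for start, end, label in segments:
            first = start // SAMPLES_PER_FRAME
            last = min(end // SAMPLES_PER_FRAME, len(feats))
            seg = feats[first:last]
            for k in range(n_states):
                lo = k * len(seg) // n_states
                hi = (k + 1) * len(seg) // n_states
                pooled[(label, k)].extend(seg[lo:hi])

    stats: Stats = {}
    for key, frames in pooled.items():
        if not frames:
            continue  # segment shorter than n_states frames: no data for this state
        n, dim = len(frames), len(frames[0])
        mean = [sum(f[d] for f in frames) / n for d in range(dim)]
        var = [max(sum((f[d] - mean[d]) ** 2 for f in frames) / n, VAR_FLOOR)
               for d in range(dim)]
        stats[key] = (mean, var)
    return stats
```

The uniform split is only a starting point; a bootstrap of this kind would normally follow it with Viterbi or Baum-Welch re-estimation.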
In setting up a phoneme recogniser, can phoneme segmentation files be used in
CMUSphinx to bootstrap the models? FlatInit in the SphinxTrain scripts is
hardcoded in Module 20. Is there a way to use the slave Perl scripts, maybe by
setting iter to 2? Otherwise I am considering training with HTK and converting to
Sphinx format, which seems to work with most of the CMU decoders.