Anonymous - 2006-03-10

Does anyone know the conditions under which the Resource Management model "RM1_8gau_13dCep_16k_40mel_130Hz_6800Hz.jar" was created? In particular:

1.) amount of training data
- Did they use only the files from the rm1 tutorial (1600 utterances from the speaker-independent corpus, only 1.5 hours of audio)?
- Or was the whole training corpus of the Resource Management database used (roughly the suggested 10 hours for the RM corpus)?

2.) creation of cepstrum files
- Judging from the 'model.props' file in 'RM1_8gau_13dCep_16k_40mel_130Hz_6800Hz.jar', it seems that the default parameters were used for the wave2feat tool. Is that right?

3.) steps in the training process
- Is it sufficient to run the 'RunAll.pl' script, or do I have to split the training process into more complex steps?
- Which parameters have to be adapted in the sphinx_train.cfg file after it has been created by the 'setup_SphinxTrain.pl' script?
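
To make clear which settings I am trying to reproduce, here is a small sketch of how I read the model name. The meaning of each field (Gaussians per state, cepstral dimension, sample rate in kHz, number of mel filters, lower and upper filter frequency) is my own interpretation of the naming scheme, not something I have confirmed, and the function name and dictionary keys are made up for illustration.

```python
import re

# Assumed layout of the file name; the field meanings are my interpretation.
_MODEL_NAME = re.compile(
    r"^(?P<corpus>[^_]+)"
    r"_(?P<gaussians>\d+)gau"
    r"_(?P<cep_dim>\d+)dCep"
    r"_(?P<rate_khz>\d+)k"
    r"_(?P<mel_filters>\d+)mel"
    r"_(?P<low_hz>\d+)Hz"
    r"_(?P<high_hz>\d+)Hz$"
)


def parse_model_name(name):
    """Split a model file name into its (assumed) training parameters."""
    # Drop the archive extension if it is present.
    stem = name[:-4] if name.endswith(".jar") else name
    match = _MODEL_NAME.match(stem)
    if match is None:
        raise ValueError("unexpected model name: " + name)
    # Everything except the corpus tag is numeric.
    return {
        key: (value if key == "corpus" else int(value))
        for key, value in match.groupdict().items()
    }


params = parse_model_name("RM1_8gau_13dCep_16k_40mel_130Hz_6800Hz.jar")
```

If this reading is right, my own training run would need 8 Gaussians, 13 cepstral coefficients, 16 kHz audio, and 40 mel filters between 130 Hz and 6800 Hz to be comparable.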

Background:
I would like to create an acoustic model (for SPHINX4) based on (a noisy version of) the Resource Management corpus. To obtain a reference model, I first have to train one on the clean-speech RM training corpus.
For comparison, I downloaded the available model RM1_8gau_13dCep_16k_40mel_130Hz_6800Hz.jar from the SPHINX webpage.
To evaluate the models, I used the regression test setup from the tests/performance/rm1 folder in the SPHINX4 directory and plugged in my own models. Unfortunately, I can't match the performance of the available model. Using the bigram language model in both setups, my WER is 6.3%, whereas people from the SPHINX group achieve about 4% WER (http://cmusphinx.sourceforge.net/MediumVocabResults.html, rm1_bigram)!
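
For clarity about what I am comparing, this is a minimal sketch of WER in its textbook form (word-level edit distance divided by the number of reference words), plus the relative gap between two WER figures. It is only an illustration under that assumption; I have not checked that the SPHINX4 regression test computes its number in exactly this way, and both function names are my own.

```python
def word_error_rate(reference, hypothesis):
    """Word-level edit distance divided by the number of reference words."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")
    # prev[j] holds the edit distance between the reference words seen
    # so far and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (0 if ref_word == hyp_word else 1)
            deletion = prev[j] + 1       # reference word missing in hypothesis
            insertion = curr[j - 1] + 1  # extra word in hypothesis
            curr.append(min(substitution, deletion, insertion))
        prev = curr
    return prev[-1] / len(ref)


def relative_gap(own_wer, reference_wer):
    """How much worse own_wer is, as a fraction of reference_wer."""
    return (own_wer - reference_wer) / reference_wer
```

With my 6.3% against the published figure of about 4%, relative_gap(6.3, 4.0) comes to roughly 0.58, so my model makes over half again as many errors as the reference model.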

Thanks a lot
Andreas