Acoustic model training - parameters for the new PTM model?

Florian
2015-01-20
2015-01-30
  • Florian

    Florian - 2015-01-20

    Hello,

    I'm currently adapting the ILA voice assistant to work with the new PTM model. Since I'm also working on an interface to train the acoustic models, I was wondering if someone has the exact parameters for the new cmusphinx-en-us-5.2-2.0 and cmusphinx-en-us-5.2-2.0-ptm models. Currently I'm using the Windows executables with the following settings:

    sphinx_fe -argfile en-us/feat.params
    -samprate 16000
    -c ILA_voice_train.listoffiles
    -di .
    -do .
    -ei wav
    -eo mfc
    -mswav yes

    bw -hmmdir en-us
    -moddeffn en-us/mdef
    -ts2cbfn .cont.
    -feat 1s_c_d_dd
    -lda en-us/feature_transform
    -dictfn en-us/dict/dictionary.dic
    -ctlfn ILA_voice_train.listoffiles
    -lsnfn ILA_voice_train.transcription
    -agc none -accumdir .

    mllr_solve -meanfn en-us/means
    -varfn en-us/variances
    -outmllrfn mllr_matrix
    -accumdir .

    map_adapt -meanfn en-us/means
    -varfn en-us/variances
    -mixwfn en-us/mixture_weights
    -tmatfn en-us/transition_matrices
    -accumdir .
    -mapmeanfn en-us-adapt/means
    -mapvarfn en-us-adapt/variances
    -mapmixwfn en-us-adapt/mixture_weights
    -maptmatfn en-us-adapt/transition_matrices

    Does that look reasonable? :-)

     
  • Nickolay V. Shmyrev

    Looks correct

     
  • Florian

    Florian - 2015-01-21

    Hi Nickolay,

    I successfully trained the "normal" model with the above script. For the PTM model I had to make a few small changes (assuming the folder is again called "en-us"):

    sphinx_fe -argfile en-us/feat.params
    -samprate 16000
    -c ILA_voice_train.listoffiles
    -di . -do . -ei wav -eo mfc -mswav yes

    bw -hmmdir en-us
    -moddeffn en-us/mdef
    -ts2cbfn .ptm.
    -feat 1s_c_d_dd
    -svspec 0-12/13-25/26-38
    -agc none
    -cmn current
    -dictfn en-us/dict/dictionary.dic
    -ctlfn ILA_voice_train.listoffiles
    -lsnfn ILA_voice_train.transcription
    -accumdir .

    mllr_solve -meanfn en-us/means
    -varfn en-us/variances
    -outmllrfn mllr_matrix
    -accumdir .

    map_adapt -meanfn en-us/means
    -varfn en-us/variances
    -mixwfn en-us/mixture_weights
    -tmatfn en-us/transition_matrices
    -accumdir .
    -mapmeanfn en-us-adapt/means
    -mapvarfn en-us-adapt/variances
    -mapmixwfn en-us-adapt/mixture_weights
    -maptmatfn en-us-adapt/transition_matrices

    But unfortunately, at step 4 map_adapt.exe crashes (I'm using the Windows version) and the model is not updated. Windows doesn't give me any more details; the process stops at
    "INFO: main.c<77>: Estimating tau hyperparameter from variances and observations"
    Any ideas? :-)

     
    • Nickolay V. Shmyrev

      Proper adaptation with different types of models is a complex thing; the right parameters depend on the amount of data you have, and you also need to choose the adaptation method accordingly. MLLR adaptation works with as little as 30 seconds of data, MAP adaptation of a continuous model requires about 1 hour of adaptation data, and MAP adaptation of a PTM model requires about 10 minutes. There is also the smoothing parameter tau of map_adapt, which you can adjust for semi-continuous and PTM models.

      If you have less than 5 minutes of adaptation data, I recommend sticking with MLLR adaptation for both PTM and continuous models.
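
      Since the crash above happens while map_adapt is estimating tau automatically, one thing worth trying is fixing tau by hand. This is only a sketch: the -fixedtau and -tau flags are assumed to be available in your SphinxTrain build (check map_adapt -help to confirm), and the value 10 is just a starting point, not a recommendation.

      ```shell
      # Sketch: skip the automatic tau estimation by fixing tau manually.
      # A larger tau keeps the adapted model closer to the original
      # parameters; a smaller tau trusts the adaptation data more.
      map_adapt -meanfn en-us/means \
          -varfn en-us/variances \
          -mixwfn en-us/mixture_weights \
          -tmatfn en-us/transition_matrices \
          -fixedtau yes \
          -tau 10 \
          -accumdir . \
          -mapmeanfn en-us-adapt/means \
          -mapvarfn en-us-adapt/variances \
          -mapmixwfn en-us-adapt/mixture_weights \
          -maptmatfn en-us-adapt/transition_matrices
      ```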

       
      • Nickolay V. Shmyrev

        Also, instead of the sphinxtrain binaries you can just use sphinx4's MLLR speaker adaptation, as demonstrated in the Transcriber demo. It might be a bit broken, but a pure Java implementation is certainly easier for users.
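
        A rough sketch of that approach, based on the sphinx4-5prealpha adaptation API as used in the Transcriber demo; the Stats/Transform classes, createStats() and setTransform() are assumed from that version and may differ in yours (it is not runnable without the sphinx4 jars and model files):

        ```java
        import java.io.FileInputStream;
        import edu.cmu.sphinx.api.Configuration;
        import edu.cmu.sphinx.api.SpeechResult;
        import edu.cmu.sphinx.api.StreamSpeechRecognizer;
        import edu.cmu.sphinx.result.WordResult;

        public class MllrAdaptSketch {
            public static void main(String[] args) throws Exception {
                Configuration configuration = new Configuration();
                configuration.setAcousticModelPath("resource:/edu/cmu/sphinx/models/en-us/en-us");
                configuration.setDictionaryPath("resource:/edu/cmu/sphinx/models/en-us/cmudict-en-us.dict");
                configuration.setLanguageModelPath("resource:/edu/cmu/sphinx/models/en-us/en-us.lm.bin");

                StreamSpeechRecognizer recognizer = new StreamSpeechRecognizer(configuration);

                // One MLLR regression class; collect statistics from a first pass.
                var stats = recognizer.createStats(1);
                recognizer.startRecognition(new FileInputStream("adaptation.wav"));
                SpeechResult result;
                while ((result = recognizer.getResult()) != null) {
                    stats.collect(result);
                }
                recognizer.stopRecognition();

                // Estimate the MLLR transform and apply it for later decoding.
                var transform = stats.createTransform();
                recognizer.setTransform(transform);
            }
        }
        ```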

         
  • Florian

    Florian - 2015-01-23

    Thanks for the info! I got the MLLR adaptation working now, so you can either do an unsupervised adaptation inside ILA or use a supervised adaptation done with the tools described in the first post (it is supervised, isn't it?). You just have to copy the mllr_matrix into your acoustic model and it will automatically be loaded on startup. I had to fix the Transformation class a bit to get it working: for some reason 'input.nextFloat()' didn't work, and I had to change it to 'Float.parseFloat(input.next())'.
    Together with my new microphone (^^) and the MLLR adaptation, the results are really great now for the grammar-based version, and even without a grammar I finally get some promising results :-D
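
    (The nextFloat() failure is most likely the usual Scanner locale pitfall: Scanner parses numbers according to the default locale's decimal separator, while the matrix file always uses '.'. A minimal, self-contained illustration, independent of sphinx4:)

    ```java
    import java.util.Locale;
    import java.util.Scanner;

    public class ScannerLocaleDemo {
        public static void main(String[] args) {
            // Scanner.nextFloat() honors the locale's decimal separator:
            // under a German locale it reads "1,5" as 1.5 and would choke
            // on a dot-separated token like "1.5".
            Scanner german = new Scanner("1,5").useLocale(Locale.GERMANY);
            System.out.println(german.nextFloat());                  // 1.5

            // Float.parseFloat() always expects '.', so reading the token
            // as a string and parsing it explicitly works in any locale.
            Scanner anyLocale = new Scanner("1.5");
            System.out.println(Float.parseFloat(anyLocale.next()));  // 1.5
        }
    }
    ```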

    I have around 10 minutes of training data for the MAP adaptation. According to your suggestions that means I could try it with the PTM model, but as I'm not sure about the parameters yet (and map_adapt.exe keeps crashing), maybe I'll postpone this. But do you have some suggestions for the parameters? ^^

    And another problem that came up: it seems the LiveSpeechRecognizer does not work with MLLR speaker adaptation, is that true? When I first record my data and then use the StreamSpeechRecognizer it works like a charm, but the LiveSpeechRecognizer throws errors (I think a NullPointerException and something else). Maybe the problem is my LiveSpeechRecognizer itself, because it works fine with grammar restrictions but completely fails without a grammar, not being able to transcribe more than 3-4 words in a row. Have you ever encountered this problem? At first I thought it was the microphone buffer, but it looks fine, and right now I have no idea.

    thank you for your help,
    Florian

     
  • Florian

    Florian - 2015-01-30

    a quick note:

    I think the MLLR unsupervised speaker adaptation has some issues when saving the matrix for the PTM model. I implemented the adaptation in ILA v3.2 and it works fine for the non-PTM models, but when I try to load a previously saved mllr_matrix for the PTM model it crashes, I think because it tries to read a matrix with the wrong dimensions.

     
