Acoustic model training - parameters for the new PTM model?

Florian
2015-01-20
2015-01-30
  • Florian

    Florian - 2015-01-20

    Hello,

    I'm currently adapting the ILA voice assistant to work with the new PTM model. Since I'm also working on an interface to train the acoustic models, I was wondering if someone has the exact parameters for the new cmusphinx-en-us-5.2-2.0 and cmusphinx-en-us-5.2-2.0-ptm models. Currently I'm using the Windows executables with the following settings:

    sphinx_fe -argfile en-us/feat.params
    -samprate 16000
    -c ILA_voice_train.listoffiles
    -di .
    -do .
    -ei wav
    -eo mfc
    -mswav yes

    bw -hmmdir en-us
    -moddeffn en-us/mdef
    -ts2cbfn .cont.
    -feat 1s_c_d_dd
    -lda en-us/feature_transform
    -dictfn en-us/dict/dictionary.dic
    -ctlfn ILA_voice_train.listoffiles
    -lsnfn ILA_voice_train.transcription
    -agc none -accumdir .

    mllr_solve -meanfn en-us/means
    -varfn en-us/variances
    -outmllrfn mllr_matrix
    -accumdir .

    map_adapt -meanfn en-us/means
    -varfn en-us/variances
    -mixwfn en-us/mixture_weights
    -tmatfn en-us/transition_matrices
    -accumdir .
    -mapmeanfn en-us-adapt/means
    -mapvarfn en-us-adapt/variances
    -mapmixwfn en-us-adapt/mixture_weights
    -maptmatfn en-us-adapt/transition_matrices

    Does that look reasonable? :-)

     
  • Nickolay V. Shmyrev

    Looks correct

     
  • Florian

    Florian - 2015-01-21

    Hi Nickolay,

    I successfully trained the "normal" model with the above script. For the PTM model I had to make a few small changes (assuming the folder is again called "en-us"):

    sphinx_fe -argfile en-us/feat.params
    -samprate 16000
    -c ILA_voice_train.listoffiles
    -di . -do . -ei wav -eo mfc -mswav yes

    bw -hmmdir en-us
    -moddeffn en-us/mdef
    -ts2cbfn .ptm.
    -feat 1s_c_d_dd
    -svspec 0-12/13-25/26-38
    -agc none
    -cmn current
    -dictfn en-us/dict/dictionary.dic
    -ctlfn ILA_voice_train.listoffiles
    -lsnfn ILA_voice_train.transcription
    -accumdir .

    mllr_solve -meanfn en-us/means
    -varfn en-us/variances
    -outmllrfn mllr_matrix
    -accumdir .

    map_adapt -meanfn en-us/means
    -varfn en-us/variances
    -mixwfn en-us/mixture_weights
    -tmatfn en-us/transition_matrices
    -accumdir .
    -mapmeanfn en-us-adapt/means
    -mapvarfn en-us-adapt/variances
    -mapmixwfn en-us-adapt/mixture_weights
    -maptmatfn en-us-adapt/transition_matrices

    But unfortunately, at step 4 map_adapt.exe crashes (I'm using the Windows version) and the model is not updated. Windows doesn't give me any more details; the process stops at
    "INFO: main.c<77>: Estimating tau hyperparameter from variances and observations"
    Any ideas? :-)

     
    • Nickolay V. Shmyrev

      Proper adaptation with different types of models is a complex thing; the right parameters depend on the amount of data you have, and you also need to choose the adaptation method accordingly. MLLR adaptation works with as little as 30 seconds of data, MAP adaptation of a continuous model requires about 1 hour of adaptation data, and MAP adaptation of a PTM model requires about 10 minutes. There is also the smoothing parameter tau of map_adapt, which you can adjust for semi-continuous and PTM models.

      If you have less than 5 minutes of adaptation data, I recommend sticking with MLLR adaptation for both PTM and continuous models.
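
      Since the crash above happens while map_adapt is estimating tau automatically, one thing worth trying is fixing tau by hand. This is only a sketch: the -fixedtau and -tau flags are assumed to be available in your SphinxTrain build (check map_adapt -help to confirm), and the value 10 is just a starting point, not a recommendation.

      ```shell
      # Sketch: skip the automatic tau estimation by fixing tau manually.
      # A larger tau keeps the adapted model closer to the original
      # parameters; a smaller tau trusts the adaptation data more.
      map_adapt -meanfn en-us/means \
          -varfn en-us/variances \
          -mixwfn en-us/mixture_weights \
          -tmatfn en-us/transition_matrices \
          -fixedtau yes \
          -tau 10 \
          -accumdir . \
          -mapmeanfn en-us-adapt/means \
          -mapvarfn en-us-adapt/variances \
          -mapmixwfn en-us-adapt/mixture_weights \
          -maptmatfn en-us-adapt/transition_matrices
      ```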

       
      • Nickolay V. Shmyrev

        Also, instead of the sphinxtrain binaries you can just use sphinx4's MLLR speaker adaptation, as demonstrated in the Transcriber demo. It might be a bit broken, but a pure Java implementation is certainly easier for users.
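
        A rough sketch of that approach, based on the sphinx4-5prealpha adaptation API as used in the Transcriber demo; the Stats/Transform classes, createStats() and setTransform() are assumed from that version and may differ in yours (it is not runnable without the sphinx4 jars and model files):

        ```java
        import java.io.FileInputStream;
        import edu.cmu.sphinx.api.Configuration;
        import edu.cmu.sphinx.api.SpeechResult;
        import edu.cmu.sphinx.api.StreamSpeechRecognizer;
        import edu.cmu.sphinx.result.WordResult;

        public class MllrAdaptSketch {
            public static void main(String[] args) throws Exception {
                Configuration configuration = new Configuration();
                configuration.setAcousticModelPath("resource:/edu/cmu/sphinx/models/en-us/en-us");
                configuration.setDictionaryPath("resource:/edu/cmu/sphinx/models/en-us/cmudict-en-us.dict");
                configuration.setLanguageModelPath("resource:/edu/cmu/sphinx/models/en-us/en-us.lm.bin");

                StreamSpeechRecognizer recognizer = new StreamSpeechRecognizer(configuration);

                // One MLLR regression class; collect statistics from a first pass.
                var stats = recognizer.createStats(1);
                recognizer.startRecognition(new FileInputStream("adaptation.wav"));
                SpeechResult result;
                while ((result = recognizer.getResult()) != null) {
                    stats.collect(result);
                }
                recognizer.stopRecognition();

                // Estimate the MLLR transform and apply it for later decoding.
                var transform = stats.createTransform();
                recognizer.setTransform(transform);
            }
        }
        ```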

         
  • Florian

    Florian - 2015-01-23

    Thanks for the info! I got the MLLR adaptation working now, so you can either do an unsupervised adaptation inside ILA or use a supervised adaptation done with the tools described in the first post (it is supervised, isn't it?). You just have to copy the mllr_matrix into your acoustic model and it will automatically be loaded on startup. I had to fix the Transformation class a bit to get it working: for some reason 'input.nextFloat()' didn't work, and I had to change it to 'Float.parseFloat(input.next())'.
    Together with my new microphone (^^) and the MLLR adaptation, the results are really great now for the grammar-based version, and even without a grammar I finally get some promising results :-D
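
    (The nextFloat() failure is most likely the usual Scanner locale pitfall: Scanner parses numbers according to the default locale's decimal separator, while the matrix file always uses '.'. A minimal, self-contained illustration, independent of sphinx4:)

    ```java
    import java.util.Locale;
    import java.util.Scanner;

    public class ScannerLocaleDemo {
        public static void main(String[] args) {
            // Scanner.nextFloat() honors the locale's decimal separator:
            // under a German locale it reads "1,5" as 1.5 and would choke
            // on a dot-separated token like "1.5".
            Scanner german = new Scanner("1,5").useLocale(Locale.GERMANY);
            System.out.println(german.nextFloat());                  // 1.5

            // Float.parseFloat() always expects '.', so reading the token
            // as a string and parsing it explicitly works in any locale.
            Scanner anyLocale = new Scanner("1.5");
            System.out.println(Float.parseFloat(anyLocale.next()));  // 1.5
        }
    }
    ```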

    I have around 10 minutes of training data for the MAP adaptation. According to your suggestions that means I could try it with the PTM model, but as I'm not sure about the parameters yet (and map_adapt.exe keeps crashing), maybe I'll postpone this. But do you have some suggestions for the parameters? ^^

    And another problem that came up: it seems the LiveSpeechRecognizer does not work with MLLR speaker adaptation, is that true? When I first record my data and then use the StreamSpeechRecognizer it works like a charm, but the LiveSpeechRecognizer throws errors (I think a NullPointerException and something else). Maybe the problem is my LiveSpeechRecognizer itself, because it works fine with grammar restrictions but completely fails without a grammar, not being able to transcribe more than 3-4 words in a row. Have you ever encountered this problem? At first I thought it was the microphone buffer, but it looks fine, and right now I have no idea.

    thank you for your help,
    Florian

     
  • Florian

    Florian - 2015-01-30

    a quick note:

    I think the MLLR unsupervised speaker adaptation has some issues when saving the matrix for the PTM model. I implemented the adaptation in ILA v3.2 and it works fine for the non-PTM models, but when I try to load a previously saved mllr_matrix for the PTM model it crashes, I think because it tries to read a matrix with the wrong dimensions.

     
