From: Vesely K. <ve...@gm...> - 2015-06-10 12:38:57
Hi,

The outputs of an LSTM block are more like Gaussian random variables, so it makes more sense to use Gaussian visible units in the RBM. However, RBM training assumes that each individual input feature is normalized to zero mean and unit variance, which is not guaranteed for the LSTM output.
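For illustration, one way to satisfy that assumption would be to estimate a global CMVN on the forwarded LSTM outputs and append it to the feature transform. This is only an untested sketch; the paths, file names and the feats pipeline below are placeholders you would adapt to your setup, not anything taken from a recipe:

  # forward the training features through the LSTM layers used as the feature
  # transform, and accumulate global mean/variance stats of the LSTM outputs
  feats="ark:copy-feats scp:data/train/feats.scp ark:- |"
  nnet-forward exp/lstm/final.feature_transform "$feats" ark:- | \
    compute-cmvn-stats ark:- exp/lstm/post_lstm_cmvn.stats

  # convert the global CMVN stats into a small nnet (<AddShift> + <Rescale>)
  cmvn-to-nnet exp/lstm/post_lstm_cmvn.stats exp/lstm/renorm.nnet

  # append the renormalization after the LSTM feature transform
  nnet-concat exp/lstm/final.feature_transform exp/lstm/renorm.nnet \
    exp/lstm/final.feature_transform_renorm

That way the RBM input stays approximately zero mean and unit variance, which is what the Gaussian visible units expect.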
You can try running 'steps/nnet/pretrain_dbn.sh' with '--input-vis-type gauss' and see what happens; even though that assumption is violated, it may still train reasonably.
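The call would look roughly like in the cnn recipe; again only a sketch, with placeholder directory names, depth and hidden-layer size, and '--feature-transform' pointing to the nnet that holds your LSTM layers:

  steps/nnet/pretrain_dbn.sh --input-vis-type gauss \
    --feature-transform exp/lstm/final.feature_transform_renorm \
    --nn-depth 2 --hid-dim 1024 \
    data/train exp/dbn_on_lstm

If I remember the defaults correctly, the script also uses a lower learning rate ('--rbm-lrate-low') for the Gaussian-Bernoulli layer than for the binary-binary layers, which is worth checking when the first RBM does not converge.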
Best regards,
Karel.

On 06/06/2015 07:51 AM, Xingyu Na wrote:
> Hi,
>
> I trained 2 layers of LSTM, with 2 hidden layers on top of that.
> The decoding performance on eval92 is reasonable.
> Now I want to do RBM pre-training.
> The straightforward way is to remove the hidden layers and use the LSTM
> layers as the feature transform, just as in Karel's cnn pre-train recipe.
> However, no matter how small the learning rate is, the first RBM does not
> seem to converge; the log is pasted below:
> ================================================
> LOG (rbm-train-cd1-frmshuff:Init():nnet-randomizer.cc:31) Seeding by srand with : 777
> LOG (rbm-train-cd1-frmshuff:main():rbm-train-cd1-frmshuff.cc:138) RBM TRAINING STARTED
> LOG (rbm-train-cd1-frmshuff:main():rbm-train-cd1-frmshuff.cc:141) Iteration 1/2
> LOG (rbm-train-cd1-frmshuff:PropagateFnc():nnet/nnet-lstm-projected-streams.h:303) Running nnet-forward with per-utterance LSTM-state reset
> LOG (rbm-train-cd1-frmshuff:PropagateFnc():nnet/nnet-lstm-projected-streams.h:303) Running nnet-forward with per-utterance LSTM-state reset
> VLOG[1] (rbm-train-cd1-frmshuff:main():rbm-train-cd1-frmshuff.cc:235) Setting momentum 0.9 and learning rate 2.5e-06 after processing 0.000277778h
> VLOG[1] (rbm-train-cd1-frmshuff:Eval():nnet-loss.cc:213) ProgressLoss[last 1h of 1h]: 218.955 (Mse)
> VLOG[1] (rbm-train-cd1-frmshuff:main():rbm-train-cd1-frmshuff.cc:235) Setting momentum 0.9 and learning rate 2.45e-06 after processing 1.38889h
> VLOG[1] (rbm-train-cd1-frmshuff:Eval():nnet-loss.cc:213) ProgressLoss[last 1h of 2h]: 222.583 (Mse)
> VLOG[1] (rbm-train-cd1-frmshuff:main():rbm-train-cd1-frmshuff.cc:235) Setting momentum 0.9 and learning rate 2.4e-06 after processing 2.77778h
> VLOG[1] (rbm-train-cd1-frmshuff:Eval():nnet-loss.cc:213) ProgressLoss[last 1h of 3h]: 220.827 (Mse)
> VLOG[1] (rbm-train-cd1-frmshuff:Eval():nnet-loss.cc:213) ProgressLoss[last 1h of 4h]: 221.531 (Mse)
> VLOG[1] (rbm-train-cd1-frmshuff:main():rbm-train-cd1-frmshuff.cc:235) Setting momentum 0.9 and learning rate 2.35e-06 after processing 4.16667h
> .......
> ================================================
>
> Mse does not decrease.
> However, after 1.rbm is trained and concatenated with the LSTM (now the
> transform becomes LSTM+RBM), the training of 2.rbm seems to converge....
> ================================================
> LOG (rbm-train-cd1-frmshuff:Init():nnet-randomizer.cc:31) Seeding by srand with : 777
> LOG (rbm-train-cd1-frmshuff:main():rbm-train-cd1-frmshuff.cc:138) RBM TRAINING STARTED
> LOG (rbm-train-cd1-frmshuff:main():rbm-train-cd1-frmshuff.cc:141) Iteration 1/2
> LOG (rbm-train-cd1-frmshuff:PropagateFnc():nnet/nnet-lstm-projected-streams.h:303) Running nnet-forward with per-utterance LSTM-state reset
> LOG (rbm-train-cd1-frmshuff:PropagateFnc():nnet/nnet-lstm-projected-streams.h:303) Running nnet-forward with per-utterance LSTM-state reset
> VLOG[1] (rbm-train-cd1-frmshuff:main():rbm-train-cd1-frmshuff.cc:235) Setting momentum 0.9 and learning rate 2.5e-06 after processing 0.000277778h
> VLOG[1] (rbm-train-cd1-frmshuff:Eval():nnet-loss.cc:213) ProgressLoss[last 1h of 1h]: 56.9416 (Mse)
> VLOG[1] (rbm-train-cd1-frmshuff:main():rbm-train-cd1-frmshuff.cc:235) Setting momentum 0.9 and learning rate 2.45e-06 after processing 1.38889h
> VLOG[1] (rbm-train-cd1-frmshuff:Eval():nnet-loss.cc:213) ProgressLoss[last 1h of 2h]: 39.1901 (Mse)
> VLOG[1] (rbm-train-cd1-frmshuff:main():rbm-train-cd1-frmshuff.cc:235) Setting momentum 0.9 and learning rate 2.4e-06 after processing 2.77778h
> VLOG[1] (rbm-train-cd1-frmshuff:Eval():nnet-loss.cc:213) ProgressLoss[last 1h of 3h]: 34.2891 (Mse)
> VLOG[1] (rbm-train-cd1-frmshuff:Eval():nnet-loss.cc:213) ProgressLoss[last 1h of 4h]: 30.5311 (Mse)
> VLOG[1] (rbm-train-cd1-frmshuff:main():rbm-train-cd1-frmshuff.cc:235) Setting momentum 0.9 and learning rate 2.35e-06 after processing 4.16667h
> VLOG[1] (rbm-train-cd1-frmshuff:Eval():nnet-loss.cc:213) ProgressLoss[last 1h of 5h]: 29.2614 (Mse)
> VLOG[1] (rbm-train-cd1-frmshuff:main():rbm-train-cd1-frmshuff.cc:235) Setting momentum 0.9 and learning rate 2.3e-06 after processing 5.55556h
> .......
> ===============================================
>
> I am quite confused about this. I believe further fine-tuning of the
> weights based on these RBMs does not make sense.
> What am I missing?
>
> Best,
> Xingyu
>
> ------------------------------------------------------------------------------
> _______________________________________________
> Kaldi-users mailing list
> Kal...@li...
> https://lists.sourceforge.net/lists/listinfo/kaldi-users