From: Vesely K. <ve...@gm...> - 2015-06-10 13:40:30
Or another option would be to add a Sigmoid component after the LSTM and
pre-train the RBM with Bernoulli visible units.

K.

On 06/10/2015 02:38 PM, Vesely Karel wrote:
> Hi,
> the outputs of the LSTM block are more like Gaussian random variables,
> so it makes more sense to use Gaussian visible units in the RBM.
> However, RBM training assumes that the individual input features are
> normalized to zero mean and unit variance, which is not guaranteed for
> the LSTM output.
>
> You can try running 'steps/nnet/pretrain_dbn.sh' with
> '--input-vis-type gauss' and see what happens; despite the violated
> assumption, it may still train reasonably.
>
> Best regards,
> Karel.
>
>
> On 06/06/2015 07:51 AM, Xingyu Na wrote:
>> Hi,
>>
>> I trained 2 layers of LSTM, with 2 hidden layers on top of that.
>> The decoding performance on eval92 is reasonable.
>> Now I want to do RBM pre-training.
>> The straightforward way is to remove the hidden layers and use the
>> LSTM layers as a feature transform, just as in Karel's CNN
>> pre-training recipe.
>> However, no matter how small the learning rate is, the first RBM does
>> not seem to converge; the log is pasted below:
>> ================================================
>> LOG (rbm-train-cd1-frmshuff:Init():nnet-randomizer.cc:31) Seeding by srand with : 777
>> LOG (rbm-train-cd1-frmshuff:main():rbm-train-cd1-frmshuff.cc:138) RBM TRAINING STARTED
>> LOG (rbm-train-cd1-frmshuff:main():rbm-train-cd1-frmshuff.cc:141) Iteration 1/2
>> LOG (rbm-train-cd1-frmshuff:PropagateFnc():nnet/nnet-lstm-projected-streams.h:303) Running nnet-forward with per-utterance LSTM-state reset
>> LOG (rbm-train-cd1-frmshuff:PropagateFnc():nnet/nnet-lstm-projected-streams.h:303) Running nnet-forward with per-utterance LSTM-state reset
>> VLOG[1] (rbm-train-cd1-frmshuff:main():rbm-train-cd1-frmshuff.cc:235) Setting momentum 0.9 and learning rate 2.5e-06 after processing 0.000277778h
>> VLOG[1] (rbm-train-cd1-frmshuff:Eval():nnet-loss.cc:213) ProgressLoss[last 1h of 1h]: 218.955 (Mse)
>> VLOG[1] (rbm-train-cd1-frmshuff:main():rbm-train-cd1-frmshuff.cc:235) Setting momentum 0.9 and learning rate 2.45e-06 after processing 1.38889h
>> VLOG[1] (rbm-train-cd1-frmshuff:Eval():nnet-loss.cc:213) ProgressLoss[last 1h of 2h]: 222.583 (Mse)
>> VLOG[1] (rbm-train-cd1-frmshuff:main():rbm-train-cd1-frmshuff.cc:235) Setting momentum 0.9 and learning rate 2.4e-06 after processing 2.77778h
>> VLOG[1] (rbm-train-cd1-frmshuff:Eval():nnet-loss.cc:213) ProgressLoss[last 1h of 3h]: 220.827 (Mse)
>> VLOG[1] (rbm-train-cd1-frmshuff:Eval():nnet-loss.cc:213) ProgressLoss[last 1h of 4h]: 221.531 (Mse)
>> VLOG[1] (rbm-train-cd1-frmshuff:main():rbm-train-cd1-frmshuff.cc:235) Setting momentum 0.9 and learning rate 2.35e-06 after processing 4.16667h
>> .......
>> ================================================
>>
>> Mse does not decrease.
>> However, after 1.rbm is trained and concatenated with the LSTM (so the
>> transform becomes LSTM+RBM), the training of 2.rbm seems to converge...
>> ================================================
>> LOG (rbm-train-cd1-frmshuff:Init():nnet-randomizer.cc:31) Seeding by srand with : 777
>> LOG (rbm-train-cd1-frmshuff:main():rbm-train-cd1-frmshuff.cc:138) RBM TRAINING STARTED
>> LOG (rbm-train-cd1-frmshuff:main():rbm-train-cd1-frmshuff.cc:141) Iteration 1/2
>> LOG (rbm-train-cd1-frmshuff:PropagateFnc():nnet/nnet-lstm-projected-streams.h:303) Running nnet-forward with per-utterance LSTM-state reset
>> LOG (rbm-train-cd1-frmshuff:PropagateFnc():nnet/nnet-lstm-projected-streams.h:303) Running nnet-forward with per-utterance LSTM-state reset
>> VLOG[1] (rbm-train-cd1-frmshuff:main():rbm-train-cd1-frmshuff.cc:235) Setting momentum 0.9 and learning rate 2.5e-06 after processing 0.000277778h
>> VLOG[1] (rbm-train-cd1-frmshuff:Eval():nnet-loss.cc:213) ProgressLoss[last 1h of 1h]: 56.9416 (Mse)
>> VLOG[1] (rbm-train-cd1-frmshuff:main():rbm-train-cd1-frmshuff.cc:235) Setting momentum 0.9 and learning rate 2.45e-06 after processing 1.38889h
>> VLOG[1] (rbm-train-cd1-frmshuff:Eval():nnet-loss.cc:213) ProgressLoss[last 1h of 2h]: 39.1901 (Mse)
>> VLOG[1] (rbm-train-cd1-frmshuff:main():rbm-train-cd1-frmshuff.cc:235) Setting momentum 0.9 and learning rate 2.4e-06 after processing 2.77778h
>> VLOG[1] (rbm-train-cd1-frmshuff:Eval():nnet-loss.cc:213) ProgressLoss[last 1h of 3h]: 34.2891 (Mse)
>> VLOG[1] (rbm-train-cd1-frmshuff:Eval():nnet-loss.cc:213) ProgressLoss[last 1h of 4h]: 30.5311 (Mse)
>> VLOG[1] (rbm-train-cd1-frmshuff:main():rbm-train-cd1-frmshuff.cc:235) Setting momentum 0.9 and learning rate 2.35e-06 after processing 4.16667h
>> VLOG[1] (rbm-train-cd1-frmshuff:Eval():nnet-loss.cc:213) ProgressLoss[last 1h of 5h]: 29.2614 (Mse)
>> VLOG[1] (rbm-train-cd1-frmshuff:main():rbm-train-cd1-frmshuff.cc:235) Setting momentum 0.9 and learning rate 2.3e-06 after processing 5.55556h
>> .......
>> ===============================================
>>
>> I am quite confused about this. I believe further fine-tuning of the
>> weights based on these RBMs does not make sense.
>> What am I missing?
>>
>> Best,
>> Xingyu
>>
>> ------------------------------------------------------------------------------
>> _______________________________________________
>> Kaldi-users mailing list
>> Kal...@li...
>> https://lists.sourceforge.net/lists/listinfo/kaldi-users
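A minimal sketch of the two options discussed above, assuming the standard nnet1 tools
(nnet-initialize, nnet-concat, steps/nnet/pretrain_dbn.sh); $dir, the file names, and
dim=512 are hypothetical placeholders, and the exact options of 'steps/nnet/pretrain_dbn.sh'
should be checked against the version of the script in your Kaldi tree:

================================================
# Option 1: append a Sigmoid to the trained LSTM stack, so its outputs lie
# in (0,1) and the default Bernoulli visible units of the first RBM apply.
dim=512   # output dim of the LSTM stack (placeholder, set to your real value)
echo "<Sigmoid> <InputDim> $dim <OutputDim> $dim" > $dir/sigmoid.proto
nnet-initialize $dir/sigmoid.proto $dir/sigmoid.nnet
nnet-concat $dir/lstm.feature_transform $dir/sigmoid.nnet \
  $dir/lstm_sigmoid.feature_transform

# Pre-train the DBN on top of the LSTM+Sigmoid feature transform:
steps/nnet/pretrain_dbn.sh --feature-transform $dir/lstm_sigmoid.feature_transform \
  --nn-depth 2 --hid-dim 1024 data/train $dir/dbn

# Option 2: keep the raw LSTM outputs and use Gaussian visible units.
# Note the caveat above: Gaussian units assume roughly zero-mean,
# unit-variance inputs, which the LSTM output does not guarantee.
steps/nnet/pretrain_dbn.sh --feature-transform $dir/lstm.feature_transform \
  --input-vis-type gauss data/train $dir/dbn
================================================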