From: Xingyu Na <asr...@gm...> - 2015-06-06 05:51:30
Hi,

I trained a network with 2 LSTM layers and 2 hidden layers on top of them. The decoding performance on eval92 is reasonable. Now I want to do RBM pre-training. The straightforward way is to remove the hidden layers and use the LSTM layers as a feature transform, just as in Karel's CNN pre-training recipe. However, no matter how small the learning rate is, the first RBM does not seem to converge. The log is pasted below:

================================================
LOG (rbm-train-cd1-frmshuff:Init():nnet-randomizer.cc:31) Seeding by srand with : 777
LOG (rbm-train-cd1-frmshuff:main():rbm-train-cd1-frmshuff.cc:138) RBM TRAINING STARTED
LOG (rbm-train-cd1-frmshuff:main():rbm-train-cd1-frmshuff.cc:141) Iteration 1/2
LOG (rbm-train-cd1-frmshuff:PropagateFnc():nnet/nnet-lstm-projected-streams.h:303) Running nnet-forward with per-utterance LSTM-state reset
LOG (rbm-train-cd1-frmshuff:PropagateFnc():nnet/nnet-lstm-projected-streams.h:303) Running nnet-forward with per-utterance LSTM-state reset
VLOG[1] (rbm-train-cd1-frmshuff:main():rbm-train-cd1-frmshuff.cc:235) Setting momentum 0.9 and learning rate 2.5e-06 after processing 0.000277778h
VLOG[1] (rbm-train-cd1-frmshuff:Eval():nnet-loss.cc:213) ProgressLoss[last 1h of 1h]: 218.955 (Mse)
VLOG[1] (rbm-train-cd1-frmshuff:main():rbm-train-cd1-frmshuff.cc:235) Setting momentum 0.9 and learning rate 2.45e-06 after processing 1.38889h
VLOG[1] (rbm-train-cd1-frmshuff:Eval():nnet-loss.cc:213) ProgressLoss[last 1h of 2h]: 222.583 (Mse)
VLOG[1] (rbm-train-cd1-frmshuff:main():rbm-train-cd1-frmshuff.cc:235) Setting momentum 0.9 and learning rate 2.4e-06 after processing 2.77778h
VLOG[1] (rbm-train-cd1-frmshuff:Eval():nnet-loss.cc:213) ProgressLoss[last 1h of 3h]: 220.827 (Mse)
VLOG[1] (rbm-train-cd1-frmshuff:Eval():nnet-loss.cc:213) ProgressLoss[last 1h of 4h]: 221.531 (Mse)
VLOG[1] (rbm-train-cd1-frmshuff:main():rbm-train-cd1-frmshuff.cc:235) Setting momentum 0.9 and learning rate 2.35e-06 after processing 4.16667h
.......
================================================

The Mse does not decrease. However, after 1.rbm is trained and concatenated with the LSTM (so the transform becomes LSTM+RBM), the training of 2.rbm does seem to converge:
================================================
LOG (rbm-train-cd1-frmshuff:Init():nnet-randomizer.cc:31) Seeding by srand with : 777
LOG (rbm-train-cd1-frmshuff:main():rbm-train-cd1-frmshuff.cc:138) RBM TRAINING STARTED
LOG (rbm-train-cd1-frmshuff:main():rbm-train-cd1-frmshuff.cc:141) Iteration 1/2
LOG (rbm-train-cd1-frmshuff:PropagateFnc():nnet/nnet-lstm-projected-streams.h:303) Running nnet-forward with per-utterance LSTM-state reset
LOG (rbm-train-cd1-frmshuff:PropagateFnc():nnet/nnet-lstm-projected-streams.h:303) Running nnet-forward with per-utterance LSTM-state reset
VLOG[1] (rbm-train-cd1-frmshuff:main():rbm-train-cd1-frmshuff.cc:235) Setting momentum 0.9 and learning rate 2.5e-06 after processing 0.000277778h
VLOG[1] (rbm-train-cd1-frmshuff:Eval():nnet-loss.cc:213) ProgressLoss[last 1h of 1h]: 56.9416 (Mse)
VLOG[1] (rbm-train-cd1-frmshuff:main():rbm-train-cd1-frmshuff.cc:235) Setting momentum 0.9 and learning rate 2.45e-06 after processing 1.38889h
VLOG[1] (rbm-train-cd1-frmshuff:Eval():nnet-loss.cc:213) ProgressLoss[last 1h of 2h]: 39.1901 (Mse)
VLOG[1] (rbm-train-cd1-frmshuff:main():rbm-train-cd1-frmshuff.cc:235) Setting momentum 0.9 and learning rate 2.4e-06 after processing 2.77778h
VLOG[1] (rbm-train-cd1-frmshuff:Eval():nnet-loss.cc:213) ProgressLoss[last 1h of 3h]: 34.2891 (Mse)
VLOG[1] (rbm-train-cd1-frmshuff:Eval():nnet-loss.cc:213) ProgressLoss[last 1h of 4h]: 30.5311 (Mse)
VLOG[1] (rbm-train-cd1-frmshuff:main():rbm-train-cd1-frmshuff.cc:235) Setting momentum 0.9 and learning rate 2.35e-06 after processing 4.16667h
VLOG[1] (rbm-train-cd1-frmshuff:Eval():nnet-loss.cc:213) ProgressLoss[last 1h of 5h]: 29.2614 (Mse)
VLOG[1] (rbm-train-cd1-frmshuff:main():rbm-train-cd1-frmshuff.cc:235) Setting momentum 0.9 and learning rate 2.3e-06 after processing 5.55556h
.......
===============================================

I am quite confused by this. I believe that further fine-tuning of the weights based on these RBMs does not make sense. What am I missing?

Best,
Xingyu
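P.S. In case it matters, what I am running is roughly the sketch below, modelled on steps/nnet/pretrain_dbn.sh (written from memory, so treat the option names, paths, and values as placeholders rather than my exact script). lstm_transform.nnet is my LSTM nnet with the two hidden layers stripped off, prepared beforehand; "$feats" is my usual feature pipeline rspecifier.

  # train the first RBM on top of the LSTM output; the learning rate matches
  # the 2.5e-06 shown in the log above
  rbm-train-cd1-frmshuff --learn-rate=0.0000025 \
    --feature-transform=exp/pretrain/lstm_transform.nnet \
    exp/pretrain/1.rbm.init "$feats" exp/pretrain/1.rbm

  # convert the trained RBM to <AffineTransform>+<Sigmoid> and append it to
  # the transform, so the second RBM is trained on LSTM+RBM features
  rbm-convert-to-nnet exp/pretrain/1.rbm - | \
    nnet-concat exp/pretrain/lstm_transform.nnet - exp/pretrain/lstm_rbm1_transform.nnet

The second RBM is then trained the same way, with lstm_rbm1_transform.nnet passed as the --feature-transform.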