From: Xingyu Na <asr...@gm...> - 2015-06-06 05:51:30
Hi,

I trained 2 layers of LSTM, with 2 hidden layers on top of that. The decoding performance on eval92 is reasonable. Now I want to do RBM pre-training. The straightforward way is to remove the hidden layers and use the LSTM layers as a feature transform, just as in Karel's CNN pre-training recipe. However, no matter how small the learning rate is, the first RBM does not seem to converge; the log is pasted below:

================================================
LOG (rbm-train-cd1-frmshuff:Init():nnet-randomizer.cc:31) Seeding by srand with : 777
LOG (rbm-train-cd1-frmshuff:main():rbm-train-cd1-frmshuff.cc:138) RBM TRAINING STARTED
LOG (rbm-train-cd1-frmshuff:main():rbm-train-cd1-frmshuff.cc:141) Iteration 1/2
LOG (rbm-train-cd1-frmshuff:PropagateFnc():nnet/nnet-lstm-projected-streams.h:303) Running nnet-forward with per-utterance LSTM-state reset
LOG (rbm-train-cd1-frmshuff:PropagateFnc():nnet/nnet-lstm-projected-streams.h:303) Running nnet-forward with per-utterance LSTM-state reset
VLOG[1] (rbm-train-cd1-frmshuff:main():rbm-train-cd1-frmshuff.cc:235) Setting momentum 0.9 and learning rate 2.5e-06 after processing 0.000277778h
VLOG[1] (rbm-train-cd1-frmshuff:Eval():nnet-loss.cc:213) ProgressLoss[last 1h of 1h]: 218.955 (Mse)
VLOG[1] (rbm-train-cd1-frmshuff:main():rbm-train-cd1-frmshuff.cc:235) Setting momentum 0.9 and learning rate 2.45e-06 after processing 1.38889h
VLOG[1] (rbm-train-cd1-frmshuff:Eval():nnet-loss.cc:213) ProgressLoss[last 1h of 2h]: 222.583 (Mse)
VLOG[1] (rbm-train-cd1-frmshuff:main():rbm-train-cd1-frmshuff.cc:235) Setting momentum 0.9 and learning rate 2.4e-06 after processing 2.77778h
VLOG[1] (rbm-train-cd1-frmshuff:Eval():nnet-loss.cc:213) ProgressLoss[last 1h of 3h]: 220.827 (Mse)
VLOG[1] (rbm-train-cd1-frmshuff:Eval():nnet-loss.cc:213) ProgressLoss[last 1h of 4h]: 221.531 (Mse)
VLOG[1] (rbm-train-cd1-frmshuff:main():rbm-train-cd1-frmshuff.cc:235) Setting momentum 0.9 and learning rate 2.35e-06 after processing 4.16667h
.......
================================================

The Mse does not decrease. However, after 1.rbm is trained and concatenated with the LSTM (so the transform becomes LSTM+RBM), the training of 2.rbm does seem to converge:

================================================
LOG (rbm-train-cd1-frmshuff:Init():nnet-randomizer.cc:31) Seeding by srand with : 777
LOG (rbm-train-cd1-frmshuff:main():rbm-train-cd1-frmshuff.cc:138) RBM TRAINING STARTED
LOG (rbm-train-cd1-frmshuff:main():rbm-train-cd1-frmshuff.cc:141) Iteration 1/2
LOG (rbm-train-cd1-frmshuff:PropagateFnc():nnet/nnet-lstm-projected-streams.h:303) Running nnet-forward with per-utterance LSTM-state reset
LOG (rbm-train-cd1-frmshuff:PropagateFnc():nnet/nnet-lstm-projected-streams.h:303) Running nnet-forward with per-utterance LSTM-state reset
VLOG[1] (rbm-train-cd1-frmshuff:main():rbm-train-cd1-frmshuff.cc:235) Setting momentum 0.9 and learning rate 2.5e-06 after processing 0.000277778h
VLOG[1] (rbm-train-cd1-frmshuff:Eval():nnet-loss.cc:213) ProgressLoss[last 1h of 1h]: 56.9416 (Mse)
VLOG[1] (rbm-train-cd1-frmshuff:main():rbm-train-cd1-frmshuff.cc:235) Setting momentum 0.9 and learning rate 2.45e-06 after processing 1.38889h
VLOG[1] (rbm-train-cd1-frmshuff:Eval():nnet-loss.cc:213) ProgressLoss[last 1h of 2h]: 39.1901 (Mse)
VLOG[1] (rbm-train-cd1-frmshuff:main():rbm-train-cd1-frmshuff.cc:235) Setting momentum 0.9 and learning rate 2.4e-06 after processing 2.77778h
VLOG[1] (rbm-train-cd1-frmshuff:Eval():nnet-loss.cc:213) ProgressLoss[last 1h of 3h]: 34.2891 (Mse)
VLOG[1] (rbm-train-cd1-frmshuff:Eval():nnet-loss.cc:213) ProgressLoss[last 1h of 4h]: 30.5311 (Mse)
VLOG[1] (rbm-train-cd1-frmshuff:main():rbm-train-cd1-frmshuff.cc:235) Setting momentum 0.9 and learning rate 2.35e-06 after processing 4.16667h
VLOG[1] (rbm-train-cd1-frmshuff:Eval():nnet-loss.cc:213) ProgressLoss[last 1h of 5h]: 29.2614 (Mse)
VLOG[1] (rbm-train-cd1-frmshuff:main():rbm-train-cd1-frmshuff.cc:235) Setting momentum 0.9 and learning rate 2.3e-06 after processing 5.55556h
.......
===============================================

I am quite confused about this. I believe further fine-tuning of the weights based on these RBMs does not make sense. What am I missing?

Best,
Xingyu
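(Sketch of the setup described above, for concreteness: strip the fully-connected layers off the trained net so that only the LSTM stack remains, then hand it to the DBN pre-training script as a fixed feature transform. File names, dimensions and the component count are illustrative assumptions, and the option names should be verified against nnet-copy --help and steps/nnet/pretrain_dbn.sh before use.)

  # Remove the top components (2x AffineTransform+Sigmoid plus AffineTransform+Softmax = 6
  # components in a typical nnet1 net; adjust to your topology) so only the LSTM layers remain.
  nnet-copy --remove-last-components=6 exp/lstm/final.nnet exp/lstm_pretrain/lstm_transform.nnet

  # Pre-train a stack of RBMs on top of the frozen LSTM features, as in the CNN recipe.
  steps/nnet/pretrain_dbn.sh --feature-transform exp/lstm_pretrain/lstm_transform.nnet \
    --nn-depth 2 data/train exp/lstm_pretrain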
From: Vesely K. <ve...@gm...> - 2015-06-10 12:38:57
Hi,
the outputs of the LSTM block are more like Gaussian random variables, so it makes more sense to use Gaussian visible units in the RBM. However, in RBM training we assume that the individual input features are normalized to zero mean and unit variance, which is not guaranteed for the LSTM output.

You can try running 'steps/nnet/pretrain_dbn.sh' with '--input-vis-type gauss' and see what happens; despite the wrong assumption, it may still train reasonably.

Best regards,
Karel.

On 06/06/2015 07:51 AM, Xingyu Na wrote:
> However, no matter how small the learning rate is, the first RBM does
> not seem to converge [...]
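(In concrete terms, Karel's suggestion would look roughly like this; only '--input-vis-type gauss' comes from his reply, the remaining options and paths are assumptions carried over from the sketch above.)

  steps/nnet/pretrain_dbn.sh --input-vis-type gauss \
    --feature-transform exp/lstm_pretrain/lstm_transform.nnet \
    --nn-depth 2 data/train exp/lstm_pretrain_gauss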
From: Vesely K. <ve...@gm...> - 2015-06-10 13:40:30
Or, another option would be to add a Sigmoid component after the LSTM and pre-train the RBM with Bernoulli visible units.
K.

On 06/10/2015 02:38 PM, Vesely Karel wrote:
> You can try running 'steps/nnet/pretrain_dbn.sh' with
> '--input-vis-type gauss' and see what happens; despite the wrong
> assumption, it may still train reasonably. [...]
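(A rough sketch of this second option: append a Sigmoid of matching dimension to the LSTM feature transform and keep the default Bernoulli visible units. The 512-dimensional output and the file names are assumptions; the proto line follows the usual one-component nnet1 prototype style, so check it against an existing proto in your setup.)

  # One-component prototype: a Sigmoid matching the LSTM output dimension (assumed 512 here).
  echo "<Sigmoid> <InputDim> 512 <OutputDim> 512" > sigmoid.proto
  nnet-initialize sigmoid.proto sigmoid.nnet

  # Append it to the LSTM transform; the RBM then sees inputs squashed into (0,1).
  nnet-concat exp/lstm_pretrain/lstm_transform.nnet sigmoid.nnet \
    exp/lstm_pretrain/lstm_sigmoid_transform.nnet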
From: Xingyu Na <asr...@gm...> - 2015-06-11 03:36:11
Hi Karel,

Thank you so much. The pre-training now proceeds reasonably.
BTW, I might be asking silly questions, but why are "the outputs of the LSTM block more like Gaussian random variables"? Where could I find such an analysis?
Does it affect convergence when I train the stack of 2 LSTM layers and 2 DNN layers (not RBMs)? Should I add a sigmoid for that training as well?

Thank you again and best regards,
Xingyu

On 06/10/2015 08:38 PM, Vesely Karel wrote:
> the outputs of the LSTM block are more like Gaussian random variables,
> so it makes more sense to use Gaussian visible units in the RBM. [...]
From: Vesely K. <ve...@gm...> - 2015-06-11 13:33:30
If you look into the LSTM code, you will see that the last operation on the output is a multiplication by a linear transform; no activation function is applied.
K.

On 06/11/2015 05:35 AM, Xingyu Na wrote:
> BTW, I might be asking silly questions, but why are "the outputs of the
> LSTM block more like Gaussian random variables"? Where could I find
> such an analysis? [...]
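(For reference, in the projected-LSTM formulation that this component follows (Sak et al., 2014), the block output is just the recurrent projection,

  m_t = o_t \odot \tanh(c_t),      r_t = W_{rm} m_t,

so the values the next layer sees are an unbounded linear combination rather than being squashed into (0,1) as a sigmoid output would be; hence the roughly Gaussian behaviour and the need for either Gaussian visible units or an extra Sigmoid before a Bernoulli RBM.)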