From: Vesely K. <ve...@gm...> - 2015-06-10 13:40:30
Or another option would be to add a Sigmoid component after the LSTM and
pre-train the RBM with Bernoulli visible units.

K.

On 06/10/2015 02:38 PM, Vesely Karel wrote:
> Hi,
> the outputs of the LSTM block are more like Gaussian random variables,
> so it makes more sense to use Gaussian visible units in the RBM.
> However, RBM training assumes that the individual input features are
> normalized to zero mean and unit variance, which is not guaranteed for
> the LSTM output.
>
> You can try running 'steps/nnet/pretrain_dbn.sh' with
> '--input-vis-type gauss' and see what happens; despite the violated
> assumption, it may still train reasonably.
>
> Best regards,
> Karel.
>
>
> On 06/06/2015 07:51 AM, Xingyu Na wrote:
>> Hi,
>>
>> I trained 2 layers of LSTM, with 2 hidden layers on top of that.
>> The decoding performance on eval92 is reasonable.
>> Now I want to do RBM pre-training.
>> The straightforward way is to remove the hidden layers and use the
>> LSTM layers as a feature transform, just as in Karel's CNN
>> pre-training recipe.
>> However, no matter how small the learning rate is, the first RBM does
>> not seem to converge; the log is pasted below:
>> ================================================
>> LOG (rbm-train-cd1-frmshuff:Init():nnet-randomizer.cc:31) Seeding by srand with : 777
>> LOG (rbm-train-cd1-frmshuff:main():rbm-train-cd1-frmshuff.cc:138) RBM TRAINING STARTED
>> LOG (rbm-train-cd1-frmshuff:main():rbm-train-cd1-frmshuff.cc:141) Iteration 1/2
>> LOG (rbm-train-cd1-frmshuff:PropagateFnc():nnet/nnet-lstm-projected-streams.h:303) Running nnet-forward with per-utterance LSTM-state reset
>> LOG (rbm-train-cd1-frmshuff:PropagateFnc():nnet/nnet-lstm-projected-streams.h:303) Running nnet-forward with per-utterance LSTM-state reset
>> VLOG[1] (rbm-train-cd1-frmshuff:main():rbm-train-cd1-frmshuff.cc:235) Setting momentum 0.9 and learning rate 2.5e-06 after processing 0.000277778h
>> VLOG[1] (rbm-train-cd1-frmshuff:Eval():nnet-loss.cc:213) ProgressLoss[last 1h of 1h]: 218.955 (Mse)
>> VLOG[1] (rbm-train-cd1-frmshuff:main():rbm-train-cd1-frmshuff.cc:235) Setting momentum 0.9 and learning rate 2.45e-06 after processing 1.38889h
>> VLOG[1] (rbm-train-cd1-frmshuff:Eval():nnet-loss.cc:213) ProgressLoss[last 1h of 2h]: 222.583 (Mse)
>> VLOG[1] (rbm-train-cd1-frmshuff:main():rbm-train-cd1-frmshuff.cc:235) Setting momentum 0.9 and learning rate 2.4e-06 after processing 2.77778h
>> VLOG[1] (rbm-train-cd1-frmshuff:Eval():nnet-loss.cc:213) ProgressLoss[last 1h of 3h]: 220.827 (Mse)
>> VLOG[1] (rbm-train-cd1-frmshuff:Eval():nnet-loss.cc:213) ProgressLoss[last 1h of 4h]: 221.531 (Mse)
>> VLOG[1] (rbm-train-cd1-frmshuff:main():rbm-train-cd1-frmshuff.cc:235) Setting momentum 0.9 and learning rate 2.35e-06 after processing 4.16667h
>> .......
>> ================================================
>>
>> Mse does not decrease.
>> However, after 1.rbm is trained and concatenated with the LSTM (so the
>> transform becomes LSTM+RBM), the training of 2.rbm seems to converge...
>> ================================================
>> LOG (rbm-train-cd1-frmshuff:Init():nnet-randomizer.cc:31) Seeding by srand with : 777
>> LOG (rbm-train-cd1-frmshuff:main():rbm-train-cd1-frmshuff.cc:138) RBM TRAINING STARTED
>> LOG (rbm-train-cd1-frmshuff:main():rbm-train-cd1-frmshuff.cc:141) Iteration 1/2
>> LOG (rbm-train-cd1-frmshuff:PropagateFnc():nnet/nnet-lstm-projected-streams.h:303) Running nnet-forward with per-utterance LSTM-state reset
>> LOG (rbm-train-cd1-frmshuff:PropagateFnc():nnet/nnet-lstm-projected-streams.h:303) Running nnet-forward with per-utterance LSTM-state reset
>> VLOG[1] (rbm-train-cd1-frmshuff:main():rbm-train-cd1-frmshuff.cc:235) Setting momentum 0.9 and learning rate 2.5e-06 after processing 0.000277778h
>> VLOG[1] (rbm-train-cd1-frmshuff:Eval():nnet-loss.cc:213) ProgressLoss[last 1h of 1h]: 56.9416 (Mse)
>> VLOG[1] (rbm-train-cd1-frmshuff:main():rbm-train-cd1-frmshuff.cc:235) Setting momentum 0.9 and learning rate 2.45e-06 after processing 1.38889h
>> VLOG[1] (rbm-train-cd1-frmshuff:Eval():nnet-loss.cc:213) ProgressLoss[last 1h of 2h]: 39.1901 (Mse)
>> VLOG[1] (rbm-train-cd1-frmshuff:main():rbm-train-cd1-frmshuff.cc:235) Setting momentum 0.9 and learning rate 2.4e-06 after processing 2.77778h
>> VLOG[1] (rbm-train-cd1-frmshuff:Eval():nnet-loss.cc:213) ProgressLoss[last 1h of 3h]: 34.2891 (Mse)
>> VLOG[1] (rbm-train-cd1-frmshuff:Eval():nnet-loss.cc:213) ProgressLoss[last 1h of 4h]: 30.5311 (Mse)
>> VLOG[1] (rbm-train-cd1-frmshuff:main():rbm-train-cd1-frmshuff.cc:235) Setting momentum 0.9 and learning rate 2.35e-06 after processing 4.16667h
>> VLOG[1] (rbm-train-cd1-frmshuff:Eval():nnet-loss.cc:213) ProgressLoss[last 1h of 5h]: 29.2614 (Mse)
>> VLOG[1] (rbm-train-cd1-frmshuff:main():rbm-train-cd1-frmshuff.cc:235) Setting momentum 0.9 and learning rate 2.3e-06 after processing 5.55556h
>> .......
>> ===============================================
>>
>> I am quite confused about this. I believe further fine-tuning of the
>> weights based on these RBMs does not make sense.
>> What am I missing?
>>
>> Best,
>> Xingyu
>>
>> ------------------------------------------------------------------------------
>> _______________________________________________
>> Kaldi-users mailing list
>> Kal...@li...
>> https://lists.sourceforge.net/lists/listinfo/kaldi-users
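A minimal sketch of the two options discussed above, assuming the standard nnet1 tools
(nnet-initialize, nnet-concat, steps/nnet/pretrain_dbn.sh); $dir, the file names, and
dim=512 are hypothetical placeholders, and the exact options of 'steps/nnet/pretrain_dbn.sh'
should be checked against the version of the script in your Kaldi tree:

================================================
# Option 1: append a Sigmoid to the trained LSTM stack, so its outputs lie
# in (0,1) and the default Bernoulli visible units of the first RBM apply.
dim=512   # output dim of the LSTM stack (placeholder, set to your real value)
echo "<Sigmoid> <InputDim> $dim <OutputDim> $dim" > $dir/sigmoid.proto
nnet-initialize $dir/sigmoid.proto $dir/sigmoid.nnet
nnet-concat $dir/lstm.feature_transform $dir/sigmoid.nnet \
  $dir/lstm_sigmoid.feature_transform

# Pre-train the DBN on top of the LSTM+Sigmoid feature transform:
steps/nnet/pretrain_dbn.sh --feature-transform $dir/lstm_sigmoid.feature_transform \
  --nn-depth 2 --hid-dim 1024 data/train $dir/dbn

# Option 2: keep the raw LSTM outputs and use Gaussian visible units.
# Note the caveat above: Gaussian units assume roughly zero-mean,
# unit-variance inputs, which the LSTM output does not guarantee.
steps/nnet/pretrain_dbn.sh --feature-transform $dir/lstm.feature_transform \
  --input-vis-type gauss data/train $dir/dbn
================================================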