From: Gupta V. <vis...@cr...> - 2015-07-03 18:10:44
Hi,

I was finally able to discriminatively train the LSTM, but only after reducing the learning rate from 0.0001 to 0.000000001. The training is rather slow: it took 8 days to train one iteration. I have not tried adjusting the gradient clipping threshold, but I will try that as well.

Thanks,
Vishwa

_____
From: Jerry.Jiayu.DU [mailto:jer...@qq...]
To: Vishwa.Gupta [mailto:Vis...@cr...]
Cc: kal...@li... [mailto:kal...@li...], Daniel Povey [mailto:dp...@gm...]
Sent: Wed, 10 Jun 2015 23:57:51 -0500
Subject: Re: [Kaldi-users] discriminative LSTM training

Hi Vishwa,

"NaN" normally means your LSTM model exploded during training; Dan's suggestion to tune down the learning rate should help in your case.

Since I encountered exactly the same problem when doing sequential training over an LSTM, here is an additional suggestion: apply a smaller gradient clipping threshold; it worked for me. I suggest you give it a try as well, setting the gradient clipping threshold to around 5 to 20.

Also remember to check whether the denominator lattice size is reasonable. Sometimes the default beam results in a very "sparse" (nearly linear) denominator lattice, and in that case the sequential training won't work.

Best,
Jiayu (Jerry)

------------------ Original ------------------
From: "Daniel Povey" <dp...@gm...>
Date: Jun 11, 2015
To: "Vishwa.Gupta" <Vis...@cr...>
Cc: "kal...@li..." <kal...@li...>
Subject: Re: [Kaldi-users] discriminative LSTM training

Usually cases like this, where you see NaNs after a while, are due to some kind of instability in the training that causes the parameters to diverge. It could be due to too-high learning rates. It could also be that when you apply LSTMs to long pieces of audio, as happens in the discriminative training code, there is some kind of gradient explosion. However, IIRC LSTMs were specifically designed to avoid the possibility of gradient explosion, so this would be surprising. You could try smaller learning rates.

Dan

> When I try to do discriminative LSTM training I get the following error:
>
> If I use train_mpe.sh, it runs for a few thousand utterances and then I get
> the following error:
>
> ERROR (nnet-train-mpe-sequential:LatticeForwardBackwardMpeVariants():lattice-functions.cc:833)
> Total forward score over lattice = -nan, while total backward score = 0
>
> and then the program crashes.
>
> If I use train_mmi.sh, then after a few thousand utterances I get logs with
> "nan":
>
> VLOG[1] (nnet-train-mmi-sequential:main():nnet-train-mmi-sequential.cc:346)
> Utterance 20080401_170000_bbcone_bbc_news_spk-0025_seg-0150897:0151494:
> Average MMI obj. value = nan over 595 frames. (Avg. den-posterior on ali -nan)
>
> However, the program keeps on running.
> Is there a workaround for that?
>
> Thanks,
>
> Vishwa
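
For context on the clipping suggestion above: a gradient clipping threshold of 5 to 20 simply bounds the back-propagated gradients so that a few long, badly-aligned utterances cannot produce the huge updates that end in NaNs. The sketch below is a minimal NumPy illustration of the idea, not Kaldi's nnet1 implementation; the names clip_elementwise, clip_by_norm, clip_threshold and max_norm are made up for this example, and where exactly the threshold is configured in Kaldi depends on the LSTM component options in your setup.

import numpy as np

def clip_elementwise(grad, clip_threshold=10.0):
    # Clamp every gradient element into [-clip_threshold, clip_threshold];
    # this is the kind of hard bound a "threshold of 5 to 20" refers to.
    return np.clip(grad, -clip_threshold, clip_threshold)

def clip_by_norm(grad, max_norm=10.0):
    # Alternative scheme: rescale the whole gradient when its L2 norm
    # exceeds max_norm, which preserves the update direction.
    norm = np.linalg.norm(grad)
    return grad * (max_norm / norm) if norm > max_norm else grad

Either form keeps a single pathological utterance from blowing up the parameters, which is what shows up later as "Total forward score over lattice = -nan" in the logs quoted above.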