From: Gupta V. <vis...@cr...> - 2015-07-03 18:10:44
Hi,

I was finally able to discriminatively train the LSTM, but only after reducing the learning rate from 0.0001 to 0.000000001. The training is rather slow: it took 8 days to train one iteration. I have not tried adjusting the gradient clipping threshold, but I will try that as well.

Thanks,
Vishwa

_____
From: Jerry.Jiayu.DU [mailto:jer...@qq...]
To: Vishwa.Gupta [mailto:Vis...@cr...]
Cc: kal...@li... [mailto:kal...@li...], Daniel Povey [mailto:dp...@gm...]
Sent: Wed, 10 Jun 2015 23:57:51 -0500
Subject: Re: [Kaldi-users] discriminative LSTM training

Hi Vishwa,

"NaN" normally means your LSTM model exploded during training; Dan's suggestion to tune down the learning rate should help in your case.

Since I encountered exactly the same problem when doing sequential training over an LSTM, here is an additional suggestion: apply a smaller gradient clipping threshold; it worked for me. I suggest you give it a try as well, setting the gradient clipping threshold to around 5 to 20.

Also remember to check whether the denominator lattice size is reasonable. Sometimes the default beam results in a very "sparse" (nearly linear) denominator lattice, and in that case the sequential training won't work.

Best,
Jiayu (Jerry)

------------------ Original ------------------
From: "Daniel Povey" <dp...@gm...>
Date: Jun 11, 2015
To: "Vishwa.Gupta" <Vis...@cr...>
Cc: "kal...@li..." <kal...@li...>
Subject: Re: [Kaldi-users] discriminative LSTM training

Usually cases like this, where you see NaNs after a while, are due to some kind of instability in the training that causes the parameters to diverge. It could be due to too-high learning rates. It could also be that when you apply LSTMs to long pieces of audio, as happens in the discriminative training code, there is some kind of gradient explosion. However, IIRC LSTMs were specifically designed to avoid the possibility of gradient explosion, so this would be surprising. You could try smaller learning rates.

Dan

> When I try to do discriminative LSTM training I get the following error:
>
> If I use train_mpe.sh, it runs for a few thousand utterances and then I get
> the following error:
>
> ERROR (nnet-train-mpe-sequential:LatticeForwardBackwardMpeVariants():lattice-functions.cc:833)
> Total forward score over lattice = -nan, while total backward score = 0
>
> and then the program crashes.
>
> If I use train_mmi.sh, then after a few thousand utterances I get logs with
> "nan":
>
> VLOG[1] (nnet-train-mmi-sequential:main():nnet-train-mmi-sequential.cc:346)
> Utterance 20080401_170000_bbcone_bbc_news_spk-0025_seg-0150897:0151494:
> Average MMI obj. value = nan over 595 frames. (Avg. den-posterior on ali -nan)
>
> However, the program keeps on running.
> Is there a workaround for that?
>
> Thanks,
>
> Vishwa
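
For context on the clipping suggestion above: a gradient clipping threshold of 5 to 20 simply bounds the back-propagated gradients so that a few long, badly-aligned utterances cannot produce the huge updates that end in NaNs. The sketch below is a minimal NumPy illustration of the idea, not Kaldi's nnet1 implementation; the names clip_elementwise, clip_by_norm, clip_threshold and max_norm are made up for this example, and where exactly the threshold is configured in Kaldi depends on the LSTM component options in your setup.

import numpy as np

def clip_elementwise(grad, clip_threshold=10.0):
    # Clamp every gradient element into [-clip_threshold, clip_threshold];
    # this is the kind of hard bound a "threshold of 5 to 20" refers to.
    return np.clip(grad, -clip_threshold, clip_threshold)

def clip_by_norm(grad, max_norm=10.0):
    # Alternative scheme: rescale the whole gradient when its L2 norm
    # exceeds max_norm, which preserves the update direction.
    norm = np.linalg.norm(grad)
    return grad * (max_norm / norm) if norm > max_norm else grad

Either form keeps a single pathological utterance from blowing up the parameters, which is what shows up later as "Total forward score over lattice = -nan" in the logs quoted above.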