From: Daniel P. <dp...@gm...> - 2015-02-28 19:42:08
Good news! I'm bcc'ing kaldi-developers on just this one message. My aim here is for a trickle of messages to reach kaldi-developers, enough to give people on the list a sense of the kinds of things that are happening in Kaldi, without overwhelming them.

Note to people on kaldi-developers: most of the list traffic right now is on the help forum, https://sourceforge.net/p/kaldi/discussion/1355348/, and you can click the envelope icon to subscribe if you are logged into SourceForge. But note, much of the traffic is user questions of varying degrees of cluelessness, with maybe 5-15 messages per day, so make your choice.

Dan

On Sat, Feb 28, 2015 at 1:43 PM, Jerry.Jiayu.DU <jer...@qq...> wrote:
> Hi Karel,
>
> For the past two nights I have been tuning the LSTM on the RM recipe, and
> now I'm able to get a WER of 2.04%, with an LSTM that is 4 times smaller
> than the baseline DNN (LSTM: 1.8M parameters vs. DNN: 7.2M parameters).
> ---
> %WER 2.04 [ 256 / 12533, 18 ins, 60 del, 178 sub ]
> exp/lstm4f_c512_r200_c512_r200_lr0.0001_mmt0.9_clip50/decode/wer_4_0.5
> ---
> If I remember right, you mentioned the fbank DNN baseline is around 2%;
> if that's right, the LSTM result I got is now reasonable and competitive.
>
> I will submit a diff patch to the current LSTM recipe
> (local/nnet/run_lstm.sh) in a couple of days, because I currently have
> urgent business at hand, so please wait for my patch.
>
> Modifications that I have made so far:
>
> 1). momentum: 0.7 -> 0.9
>
> 2). use a deeper nnet.proto to initialize the network, with <ClipGradient>
> set to 50:
> --------
> <NnetProto>
> <LstmProjectedStreams> <InputDim> 43 <OutputDim> 200 <CellDim> 512
> <ParamScale> 0.010000 <ClipGradient> 50.000000
> <LstmProjectedStreams> <InputDim> 200 <OutputDim> 200 <CellDim> 512
> <ParamScale> 0.010000 <ClipGradient> 50.000000
> <AffineTransform> <InputDim> 200 <OutputDim> 1479 <BiasMean> 0.0
> <BiasRange> 0.0 <ParamStddev> 0.040000
> <Softmax> <InputDim> 1479 <OutputDim> 1479
> </NnetProto>
> --------
> Although it's now a 2-layer LSTM, it is still far smaller than the
> baseline DNN.
>
> 3). it's better to re-shuffle the training data at the beginning of *each
> epoch*. In steps/nnet/train.sh, the original training-feature pipeline:
> feats_tr="ark:copy-feats scp:$dir/train.scp ark:- |"
> ---->
> feats_tr="ark:shuf $dir/train.scp | copy-feats scp:- ark:- |"
>
> This is due to the way the multi-stream feature buffer is filled in
> nnet-train-lstm.cc: without re-shuffling the utterances at the beginning
> of each epoch, some frames will always land at the beginning of the batch
> (bptt20) across epochs, and their errors are bound to be truncated
> throughout the whole training process.
>
> 4). halving factor: 0.8 -> 0.5. This point is irrelevant to the WER
> improvement, because the best WER always occurs before halving starts,
> but it reduces the total number of epochs, so we don't have to wait as
> long.
>
> Please wait for my patch; I might have further modifications.
>
> best,
> Jerry
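
A minimal sketch of how the modifications in points 2) and 3) above could be wired into the nnet1 training script. This is not part of Jerry's patch: the experiment directory, the $iter variable, and the use of utils/shuffle_list.pl with a per-epoch seed (as a reproducible alternative to plain shuf) are assumptions for illustration only.

--------
#!/bin/bash
# Sketch only -- directory name and the $iter variable are hypothetical.

dir=exp/lstm_sketch        # hypothetical experiment directory

# Turn the <NnetProto> definition above into an initial network
# (nnet-initialize is the standard nnet1 initializer for proto files).
nnet-initialize $dir/nnet.proto $dir/nnet.init

# Re-shuffle the utterance list at the start of every epoch, so different
# utterances land at the beginning of the multi-stream batch each time.
# Seeding the shuffle with the epoch number keeps the run reproducible;
# plain 'shuf' (as in the message above) gives an unseeded shuffle.
iter=1                     # current epoch, provided by the outer training loop
feats_tr="ark:cat $dir/train.scp | utils/shuffle_list.pl --srand $iter | copy-feats scp:- ark:- |"

# $feats_tr is then passed to the multi-stream LSTM training binary as before.
--------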