From: Daniel P. <dp...@gm...> - 2015-02-28 19:42:08
Good news! I'm bcc'ing kaldi-developers on just this one message. My aim here is for a trickle of messages to reach kaldi-developers, enough to give people on the list a sense of the kinds of things that are happening in Kaldi, without overwhelming them.

Note to people on kaldi-developers: most of the list traffic right now is on the help forum, https://sourceforge.net/p/kaldi/discussion/1355348/, and you can click the envelope icon to subscribe if you are logged into SourceForge. But note, much of the traffic is user questions of varying degrees of cluelessness, with maybe 5-15 messages per day, so make your choice.

Dan

On Sat, Feb 28, 2015 at 1:43 PM, Jerry.Jiayu.DU <jer...@qq...> wrote:
> Hi Karel,
>
> For the past two nights I have been tuning the LSTM on the RM recipe, and
> now I'm able to get a WER of 2.04%, with an LSTM that is 4 times smaller
> than the baseline DNN (LSTM: 1.8M parameters vs. DNN: 7.2M parameters).
> ---
> %WER 2.04 [ 256 / 12533, 18 ins, 60 del, 178 sub ]
> exp/lstm4f_c512_r200_c512_r200_lr0.0001_mmt0.9_clip50/decode/wer_4_0.5
> ---
> If I remember right, you mentioned the fbank DNN baseline is around 2%;
> if that's right, the LSTM result I got is now reasonable and competitive.
>
> I will submit a diff patch to the current LSTM recipe
> (local/nnet/run_lstm.sh) in a couple of days, because I currently have
> urgent business at hand, so please wait for my patch.
>
> Modifications that I have made so far:
>
> 1). momentum: 0.7 -> 0.9
>
> 2). use a deeper nnet.proto to initialize the network, with <ClipGradient>
> set to 50:
> --------
> <NnetProto>
> <LstmProjectedStreams> <InputDim> 43 <OutputDim> 200 <CellDim> 512
> <ParamScale> 0.010000 <ClipGradient> 50.000000
> <LstmProjectedStreams> <InputDim> 200 <OutputDim> 200 <CellDim> 512
> <ParamScale> 0.010000 <ClipGradient> 50.000000
> <AffineTransform> <InputDim> 200 <OutputDim> 1479 <BiasMean> 0.0
> <BiasRange> 0.0 <ParamStddev> 0.040000
> <Softmax> <InputDim> 1479 <OutputDim> 1479
> </NnetProto>
> --------
> Although it's now a 2-layer LSTM, it is still far smaller than the
> baseline DNN.
>
> 3). it's better to re-shuffle the training data at the beginning of *each
> epoch*. In steps/nnet/train.sh, the original training-feature pipeline:
> feats_tr="ark:copy-feats scp:$dir/train.scp ark:- |"
> ---->
> feats_tr="ark:shuf $dir/train.scp | copy-feats scp:- ark:- |"
>
> This is due to the way the multi-stream feature buffer is filled in
> nnet-train-lstm.cc: without re-shuffling the utterances at the beginning
> of each epoch, some frames will always land at the beginning of the batch
> (bptt20) across epochs, and their errors are bound to be truncated
> throughout the whole training process.
>
> 4). halving factor: 0.8 -> 0.5. This point is irrelevant to the WER
> improvement, because the best WER always occurs before halving starts,
> but it reduces the total number of epochs, so we don't have to wait as
> long.
>
> Please wait for my patch; I might have further modifications.
>
> best,
> Jerry
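
A minimal sketch of how the modifications in points 2) and 3) above could be wired into the nnet1 training script. This is not part of Jerry's patch: the experiment directory, the $iter variable, and the use of utils/shuffle_list.pl with a per-epoch seed (as a reproducible alternative to plain shuf) are assumptions for illustration only.

--------
#!/bin/bash
# Sketch only -- directory name and the $iter variable are hypothetical.

dir=exp/lstm_sketch        # hypothetical experiment directory

# Turn the <NnetProto> definition above into an initial network
# (nnet-initialize is the standard nnet1 initializer for proto files).
nnet-initialize $dir/nnet.proto $dir/nnet.init

# Re-shuffle the utterance list at the start of every epoch, so different
# utterances land at the beginning of the multi-stream batch each time.
# Seeding the shuffle with the epoch number keeps the run reproducible;
# plain 'shuf' (as in the message above) gives an unseeded shuffle.
iter=1                     # current epoch, provided by the outer training loop
feats_tr="ark:cat $dir/train.scp | utils/shuffle_list.pl --srand $iter | copy-feats scp:- ark:- |"

# $feats_tr is then passed to the multi-stream LSTM training binary as before.
--------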