Re: [Kaldi-users] cuda dnn

Hi,

If anyone runs into the same errors: I had identical utterance IDs in
the train and cv sets, although the sound files were in different
directories. This caused confusion in some feature preparation steps
where the two sets had been processed together.
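A quick way to catch this before training is to list the utterance IDs that appear in both sets. The following is a minimal sketch, assuming Kaldi-style table files in which the first whitespace-separated field of each line is the utterance ID. The function name find_shared_utts and the data/train and data/cv paths are hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: print the utterance IDs that occur in both of two
# Kaldi-style table files (feats.scp, wav.scp, text, ...). Assumes the first
# whitespace-separated field of every line is the utterance ID.
find_shared_utts() {
  # While reading the first file, remember its IDs; while reading the
  # second, print each ID that was already seen. Output is sorted in
  # C-locale order and de-duplicated.
  awk 'FILENAME == ARGV[1] { seen[$1] = 1; next }
       ($1 in seen) { print $1 }' "$1" "$2" | LC_ALL=C sort -u
}

# Example (hypothetical paths):
#   shared=$(find_shared_utts data/train/feats.scp data/cv/feats.scp)
#   [ -z "$shared" ] || echo "IDs shared by train and cv: $shared" >&2
```

An empty result means the two sets can safely be processed together; any printed ID would need to be renamed in one of the sets.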

Regards,
Valentin

On Sat, Sep 21, 2013 at 12:37 AM, Valentin Mendelev <vm...@gm...> wrote:
> Hi.
>
> That really was a partial message. I pressed Shift+Enter accidentally
> and then realized how to solve my problem.
> But another one has emerged.
>
> I’m trying to train a DNN on my own small database (1 speaker, about
> 10 hours split into utterances of about 7 words, each under 10 s long)
> using egs/swbd/s5b/local/run_dnn.sh with appropriate alterations
> (no feature transform, adjusted paths).
>
> I run this:
>
> $cuda_cmd $dir/_pretrain_dbn.log \
>  steps/pretrain_dbn.sh --hid_dim 2048 --train_utts 15000 --cmvn_utts 1000 $t $dir || exit 1
> <set proper paths>
>
> and this:
>
> $cuda_cmd $dir/_train_nnet.log \
>  steps/train_nnet.sh --dbn $dbn --hid-layers 0 --learn-rate 0.008 \
>  $t $cv $lang $ali $ali_cv $dir || exit 1;
>
> Pre-training works now, but MLP training fails.
> In prerun.log there are many messages like this:
>
> WARNING (nnet-train-xent-hardlab-frmshuff:main():nnet-train-xent-hardlab-frmshuff.cc:148) Alignment has wrong length, ali 258 vs. feats 334, utt 101-11
> and finally:
> KALDI_ASSERT: at
> nnet-train-xent-hardlab-frmshuff:CloseInternal:util/kaldi-table-inl.h:1546,
> failed: holder_ == NULL
> Stack trace is:
> kaldi::KaldiGetStackTrace()
> kaldi::KaldiAssertFailure_(char const*, char const*, int, char const*)
> kaldi::RandomAccessTableReaderArchiveImplBase<kaldi::BasicVectorHolder<int> >::CloseInternal()
>
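The "Alignment has wrong length" warning means that, for some utterance, the number of frames in the alignment differs from the number of feature frames. One way to find all such utterances up front is to compare per-utterance frame counts. The following is a minimal sketch, assuming two text files with lines of the form "<utt-id> <num-frames>". The function name compare_lengths is hypothetical, and the commented commands that produce the inputs assume the Kaldi binaries feat-to-len and ali-to-pdf are on PATH, with paths as in the _train_nnet.log excerpt in this message.

```shell
#!/bin/sh
# Hypothetical helper: compare per-utterance frame counts from two text
# files with lines "<utt-id> <num-frames>" (features first, alignments
# second) and print every utterance whose counts disagree. Utterances
# present in only one of the files are skipped.
compare_lengths() {
  awk 'FILENAME == ARGV[1] { feat[$1] = $2; next }
       ($1 in feat) && feat[$1] != $2 { print $1, "feats=" feat[$1], "ali=" $2 }' "$1" "$2"
}

# Producing the inputs (assumes Kaldi binaries on PATH):
#   feat-to-len scp:exp/tri3b2_pretrain-dbn73_dnn/train.scp ark,t:- > feat_lens.txt
#   ali-to-pdf exp/tri3b2_ali/final.mdl "ark:gunzip -c exp/tri3b2_ali/ali.*.gz |" ark,t:- \
#     | awk '{ print $1, NF - 1 }' > ali_lens.txt   # one pdf-id per frame
#   compare_lengths feat_lens.txt ali_lens.txt
```

With duplicated utterance IDs, a mismatch such as "101-11 feats=334 ali=258" is what you would expect to see when the features of one set are paired with the alignment of the other.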
> In _train_nnet.log (last stage):
>
> # RUNNING THE NN-TRAINING SCHEDULER
> steps/train_nnet_scheduler.sh --feature-transform
> exp/tri3b2_pretrain-dbn73_dnn/tr_splice5-1_cmvn-g.nnet --learn-rate
> 0.008 --seed 777 exp/tri3b2_pretrain-dbn73_dnn/nnet_6.dbn_dnn.init
> ark:copy-feats scp:exp/tri3b2_pretrain-dbn73_dnn/train.scp ark:- |
> ark:copy-feats scp:exp/tri3b2_pretrain-dbn73_dnn/cv.scp ark:- |
> ark:ali-to-pdf exp/tri3b2_ali/final.mdl "ark:gunzip -c
> exp/tri3b2_ali/ali.*.gz exp/tri3b2_ali_cvseg/ali.*.gz |" ark:- |
> exp/tri3b2_pretrain-dbn73_dnn
> steps/train_nnet_scheduler.sh: line 78:  5525 Aborted
> (core dumped) $train_tool --cross-validate=true
> --bunchsize=$bunch_size --cachesize=$cache_size --verbose=$verbose
> ${feature_transform:+ --feature-transform=$feature_transform}
> ${use_gpu_id:+ --use-gpu-id=$use_gpu_id} $mlp_best "$feats_cv"
> "$labels" 2> $dir/log/prerun.log
>
> It’s not a problem with list sorting, because I can train simple triphone
> models on the same alignments and decode the cv set.
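Sorting can also be checked directly rather than inferred from a working triphone setup. Kaldi generally expects table files to be sorted on the utterance ID in C-locale byte order. The following is a minimal sketch of such a check; the function name check_sorted and the data/cv paths are hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: succeed only if a Kaldi-style table file is sorted
# on its first field (the utterance ID) in C-locale byte order.
check_sorted() {
  # sort -c only checks the order; it exits non-zero (typically naming
  # the first out-of-order line on stderr) if the file is not sorted.
  LC_ALL=C sort -c -k1,1 "$1"
}

# Example (hypothetical paths):
#   for f in data/cv/feats.scp data/cv/text; do
#     check_sorted "$f" || echo "$f is not sorted" >&2
#   done
```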
>
> I’m using the default feature settings, so I suppose it should be plain
> MFCCs with a 5-frame context.
> Could you tell me where to look to make this work?
>
> I’m running Ubuntu 12.10 (64-bit), and my video card is a GTX 580 with 1.5 GB of RAM.
>
> Regards,
> Valentin