From: Xingyu Na <asr...@gm...> - 2015-06-12 10:30:53
No, the user didn't kill the script, and the terminal is still alive. It happens rather randomly, but only when the job is submitted to a certain node, called "g05". The log hangs at:

=======================================
# Running on g05
# Started at Fri Jun 12 17:02:47 CST 2015
# nnet-shuffle-egs --buffer-size=5000 --srand=2094 ark:exp/nnet4d_gpu/egs/egs.12.113.ark ark:- | nnet-train-simple --minibatch-size=512 --srand=2094 exp/nnet4d_gpu/2094.mdl ark:- exp/nnet4d/2095.12.mdl
nnet-train-simple --minibatch-size=512 --srand=2094 exp/nnet4d_gpu/2094.mdl ark:- exp/nnet4d_gpu/2095.12.mdl
nnet-shuffle-egs --buffer-size=5000 --srand=2094 ark:exp/nnet4d_gpu/egs/egs.12.113.ark ark:-
=======================================

It seems that nnet-shuffle-egs and nnet-train-simple do not cooperate on this specific job. Weird...

Best,
X.

On 06/12/2015 01:05 PM, Daniel Povey wrote:
> Possibly it is in zombie status because something interrupted or
> killed the run.pl process that had launched that process. E.g. a user
> did ctrl-z to the top-level script, maybe.
>
> Dan
>
>
> On Thu, Jun 11, 2015 at 11:11 PM, Xingyu Na <asr...@gm...> wrote:
>> Hi,
>>
>> A user reported this when he was using the train_pnorm_fast script. Top
>> gave this:
>> 60442 zhangpe+ 20 0 1193408 13112 10704 S 0.0 0.0 0:02.30 nnet-shuffle-eg
>> 60443 zhangpe+ 20 0 0 0 0 Z 0.0 0.0 0:02.19 nnet-train-simp
>>
>> It remains in zombie status forever...
>> Any idea how this goes wrong?
>>
>> Best,
>> Xingyu
>>
>> ------------------------------------------------------------------------------
>> _______________________________________________
>> Kaldi-users mailing list
>> Kal...@li...
>> https://lists.sourceforge.net/lists/listinfo/kaldi-users
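
For anyone debugging a similar hang: a process shows state "Z" in top when it has already exited but its parent has not yet called wait() on it, which fits the suggestion above that the launching process was stopped or interrupted. The following is only a minimal Linux-only sketch of that mechanism, not anything from Kaldi or run.pl. The helper names `proc_state` and `demo` are hypothetical, and it assumes /proc is mounted.

```python
import os
import time


def proc_state(pid):
    """Return (state, ppid) for a PID by parsing /proc/<pid>/stat (Linux only)."""
    with open(f"/proc/{pid}/stat") as f:
        data = f.read()
    # The command name is wrapped in parentheses and may itself contain
    # spaces or ')', so split after the last ')' before reading the fields.
    fields = data[data.rindex(")") + 2:].split()
    return fields[0], int(fields[1])


def demo():
    """Fork a child that exits at once, observe it as a zombie, then reap it."""
    pid = os.fork()
    if pid == 0:
        os._exit(0)  # child: exit immediately without cleanup

    # Parent: poll (without calling wait) until the child shows state "Z".
    deadline = time.time() + 5
    state, ppid = proc_state(pid)
    while state != "Z" and time.time() < deadline:
        time.sleep(0.01)
        state, ppid = proc_state(pid)
    before = (state, ppid)

    # Reaping the child removes the zombie entry from the process table.
    os.waitpid(pid, 0)
    reaped = not os.path.exists(f"/proc/{pid}")
    return before, reaped


if __name__ == "__main__":
    (state, ppid), reaped = demo()
    print("child state before wait:", state)
    print("child parent pid:", ppid)
    print("gone after wait:", reaped)
```

On the affected node, calling `proc_state(60443)` (the zombie PID from the top output) would give the parent PID. If that parent is itself in state "T" (stopped), the ctrl-z explanation becomes more plausible. This is an assumption about how to narrow it down, not a confirmed diagnosis.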