From: Daniel P. <dp...@gm...> - 2015-06-12 17:59:47
This is not a Kaldi problem; it is almost certainly a problem either with your GridEngine software (or equivalent) or with your machine (e.g. the Linux out-of-memory killer may be getting invoked). Check the system logs and the GridEngine logs.

Dan

On Fri, Jun 12, 2015 at 6:30 AM, Xingyu Na <asr...@gm...> wrote:
> No, the user didn't kill the script. And the terminal is alive.
> It happens rather randomly, but only when the job is submitted to a certain
> node, called "g05".
> The log hangs at
> =======================================
> # Running on g05
> # Started at Fri Jun 12 17:02:47 CST 2015
> # nnet-shuffle-egs --buffer-size=5000 --srand=2094
> ark:exp/nnet4d_gpu/egs/egs.12.113.ark ark:- | nnet-train-simple
> --minibatch-size=512 --srand=2094 exp/nnet4d_gpu/2094.mdl ark:-
> exp/nnet4d/2095.12.mdl
> nnet-train-simple --minibatch-size=512 --srand=2094 exp/nnet4d_gpu/2094.mdl
> ark:- exp/nnet4d_gpu/2095.12.mdl
> nnet-shuffle-egs --buffer-size=5000 --srand=2094
> ark:exp/nnet4d_gpu/egs/egs.12.113.ark ark:-
> =======================================
>
> It seems that nnet-shuffle-egs and nnet-train-simple do not cooperate on
> this specific job. Weird.....
>
> Best,
> X.
>
>
> On 06/12/2015 01:05 PM, Daniel Povey wrote:
>>
>> Possibly it is in zombie status because something interrupted or
>> killed the run.pl process that had launched that process. E.g. a user
>> did ctrl-z to the top-level script, maybe.
>>
>> Dan
>>
>>
>> On Thu, Jun 11, 2015 at 11:11 PM, Xingyu Na <asr...@gm...>
>> wrote:
>>>
>>> Hi,
>>>
>>> A user reported this when he was using the train_pnorm_fast script. Top
>>> gave this:
>>> 60442 zhangpe+ 20 0 1193408 13112 10704 S 0.0 0.0 0:02.30 nnet-shuffle-eg
>>> 60443 zhangpe+ 20 0 0 0 0 Z 0.0 0.0 0:02.19 nnet-train-simp
>>>
>>> It remains in zombie status forever....
>>> Any idea how this goes wrong?
>>>
>>> Best,
>>> Xingyu
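
As a rough aid for the log check suggested at the top of this message, here is a minimal Python sketch that filters kernel log text for lines the Linux OOM killer typically writes. The function name `find_oom_lines` and the pattern list are illustrative assumptions, not part of Kaldi or GridEngine; the exact message wording and the log location vary with kernel version and distribution, so the patterns may need adjusting.

```python
import re

# Phrases the Linux kernel commonly logs when the OOM killer runs.
# Assumption: wording differs between kernel versions, so this list is
# a starting point rather than an exhaustive match.
OOM_PATTERN = re.compile(
    r"invoked oom-killer|out of memory|killed process",
    re.IGNORECASE,
)


def find_oom_lines(lines):
    """Return the log lines that look like OOM-killer activity."""
    return [line.rstrip("\n") for line in lines if OOM_PATTERN.search(line)]
```

One way to use it is to pass in the lines of `dmesg` output, or of the system log file, collected on the node where the job hangs. An empty result does not rule out a memory problem; it only means none of these phrases appeared in the text that was scanned.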