|
From: Mailing list used for User Communication and Updates <kal...@li...> - 2013-08-22 14:45:56
|
Hi,
did you specify your SGE queue and queue parameters (especially the expected time to run) correctly? Some SGE grids allow a task only a really short time to run (and kill it once this limit is exhausted)... In that case you have to explicitly request more "walltime" when submitting the tasks to SGE.

y.

On Thu, Aug 22, 2013 at 10:22 AM, Mailing list used for User Communication and Updates <kal...@li...> wrote:

> The training fails while "Generating training examples on disk" giving
> the following message:
>
> *queue.pl: Error, unfinished job no longer exists, log is in
> exp/nnet5c1/log/get_egs.*.log
> Possible reasons: a) Exceeded time limit? -> Use more jobs! b)
> Shutdown/Frozen machine? -> Run again!*
>
> I checked all the get_egs.*.log files and only the jobs submitted to one
> server are unfinished.
> And I can access that server using unix commands, so it is not frozen.
>
> Is there a time limit in this stage of training?
>
> Any ideas?
>
> Anyway I am going to run again.
>
> Thanks,
> Lahiru
>
> On Wed, Aug 21, 2013 at 5:08 PM, Lahiru Samarakoon <lah...@gm...> wrote:
>
>> Thanks a lot.
>>
>> On Wed, Aug 21, 2013 at 5:07 PM, Mailing list used for User Communication
>> and Updates <kal...@li...> wrote:
>>
>>> No, you don't have to submit the script to sge, the script doesn't do
>>> much apart from submit jobs to SGE itself, and it's more convenient to
>>> just run it from a command line. (but use nohup or run it inside
>>> screen to stop it getting interrupted if you log out).
>>>
>>> Dan
>>>
>>> On Wed, Aug 21, 2013 at 11:04 AM, Mailing list used for User
>>> Communication and Updates <kal...@li...> wrote:
>>> > When running the dnn training on cpu, is it necessary to submit the
>>> > script to SGE explicitly?
>>> > Or just running the script will take care of the job because the
>>> > script uses the queue.pl?
>>> >
>>> > Thanks,
>>> > Lahiru
>>> >
>>> > On Tue, Aug 20, 2013 at 3:28 PM, Mailing list used for User
>>> > Communication and Updates <kal...@li...> wrote:
>>> >>
>>> >> num_jobs_nnet should be the same as the #machines (i.e. 4), but you may
>>> >> want to decrease the learning rate a bit (e.g. by a factor of 2) if
>>> >> you reduce the #machines from 16 to 4.
>>> >> Dan
>>> >>
>>> >> On Tue, Aug 20, 2013 at 9:21 AM, Mailing list used for User
>>> >> Communication and Updates <kal...@li...> wrote:
>>> >> > Hi All,
>>> >> >
>>> >> > I am planning to run the wsj/s5 set up for DNN training on 4 machines
>>> >> > and each has 16 cores. Could anyone give me some pointers about how to
>>> >> > change important parameters like num_jobs_nnet in the script for my
>>> >> > setup?
>>> >> >
>>> >> > Thank you,
>>> >> >
>>> >> > Best Regards,
>>> >> > Lahiru
>>> >> >
>>> >> > ------------------------------------------------------------------------------
>>> >> > Introducing Performance Central, a new site from SourceForge and
>>> >> > AppDynamics. Performance Central is your source for news, insights,
>>> >> > analysis and resources for efficient Application Performance Management.
>>> >> > Visit us today!
>>> >> > http://pubads.g.doubleclick.net/gampad/clk?id=48897511&iu=/4140/ostg.clktrk
>>> >> > _______________________________________________
>>> >> > Kaldi-users mailing list
>>> >> > Kal...@li...
>>> >> > https://lists.sourceforge.net/lists/listinfo/kaldi-users
|
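For anyone finding this thread later, here is a rough sketch of the two suggestions above: requesting more walltime, and running the top-level script so that it survives a logout. It assumes that your grid enforces the limit through the standard SGE `h_rt` resource and that your copy of queue.pl forwards `-l` options to qsub; check both locally. The 24-hour value, the `train_cmd` variable and the script name `run_training.sh` are placeholders for illustration, not taken from the recipe.

```shell
#!/bin/sh
# Sketch only. Assumption: the grid's time limit is the SGE hard
# wall-clock resource h_rt, and queue.pl passes "-l ..." through to qsub.
walltime="24:00:00"   # hypothetical value; use one your grid allows

# Hypothetical variable name; point your recipe's queue command at this.
export train_cmd="queue.pl -l h_rt=${walltime}"
echo "train_cmd=${train_cmd}"

# Start the top-level script from the command line (not via qsub) and
# detach it from the terminal so a logout does not interrupt it.
# run_training.sh is a hypothetical name for your recipe script.
# nohup ./run_training.sh > train.log 2>&1 &
```

The nohup line is left commented out so the snippet can be pasted and inspected without starting a job; running the script inside screen instead works equally well.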