|
From: Mailing l. u. f. U. C. a. U. <kal...@li...> - 2013-08-22 14:32:24
|
Likely that machine is or was having NFS problems, e.g. at that time it
could not see the directory concerned, or maybe there were other problems
on that machine.

Dan

On Thu, Aug 22, 2013 at 4:22 PM, Mailing list used for User Communication
and Updates <kal...@li...> wrote:
> The training fails while "Generating training examples on disk", giving the
> following message:
>
> queue.pl: Error, unfinished job no longer exists, log is in
> exp/nnet5c1/log/get_egs.*.log
> Possible reasons: a) Exceeded time limit? -> Use more jobs! b)
> Shutdown/Frozen machine? -> Run again!
>
> I checked all the get_egs.*.log files and only the jobs submitted to one
> server are unfinished.
> And I can access that server using Unix commands, so it is not frozen.
>
> Is there a time limit in this stage of training?
>
> Any ideas?
>
> Anyway, I am going to run again.
>
> Thanks,
> Lahiru
>
>
> On Wed, Aug 21, 2013 at 5:08 PM, Lahiru Samarakoon <lah...@gm...>
> wrote:
>>
>> Thanks a lot.
>>
>>
>> On Wed, Aug 21, 2013 at 5:07 PM, Mailing list used for User Communication
>> and Updates <kal...@li...> wrote:
>>>
>>> No, you don't have to submit the script to SGE. The script doesn't do
>>> much apart from submitting jobs to SGE itself, and it's more convenient
>>> to just run it from a command line (but use nohup or run it inside
>>> screen to stop it getting interrupted if you log out).
>>>
>>> Dan
>>>
>>>
>>> On Wed, Aug 21, 2013 at 11:04 AM, Mailing list used for User
>>> Communication and Updates <kal...@li...> wrote:
>>> > When running the DNN training on CPU, is it necessary to submit the
>>> > script to SGE explicitly?
>>> > Or will just running the script take care of the job, because the
>>> > script uses queue.pl?
>>> >
>>> > Thanks,
>>> > Lahiru
>>> >
>>> >
>>> > On Tue, Aug 20, 2013 at 3:28 PM, Mailing list used for User
>>> > Communication and Updates <kal...@li...> wrote:
>>> >>
>>> >> num_jobs_nnet should be the same as the number of machines (i.e. 4),
>>> >> but you may want to decrease the learning rate a bit (e.g. by a
>>> >> factor of 2) if you reduce the number of machines from 16 to 4.
>>> >> Dan
>>> >>
>>> >>
>>> >> On Tue, Aug 20, 2013 at 9:21 AM, Mailing list used for User
>>> >> Communication and Updates <kal...@li...> wrote:
>>> >> > Hi All,
>>> >> >
>>> >> > I am planning to run the wsj/s5 setup for DNN training on 4
>>> >> > machines, each with 16 cores. Could anyone give me some pointers
>>> >> > about how to change important parameters like num_jobs_nnet in the
>>> >> > script for my setup?
>>> >> >
>>> >> > Thank you,
>>> >> >
>>> >> > Best Regards,
>>> >> > Lahiru
>>> >> >
>>> >> > ------------------------------------------------------------------------------
>>> >> > Introducing Performance Central, a new site from SourceForge and
>>> >> > AppDynamics. Performance Central is your source for news, insights,
>>> >> > analysis and resources for efficient Application Performance
>>> >> > Management.
>>> >> > Visit us today!
>>> >> >
>>> >> > http://pubads.g.doubleclick.net/gampad/clk?id=48897511&iu=/4140/ostg.clktrk
>>> >> > _______________________________________________
>>> >> > Kaldi-users mailing list
>>> >> > Kal...@li...
>>> >> > https://lists.sourceforge.net/lists/listinfo/kaldi-users
|
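Dan's advice on num_jobs_nnet and the learning rate can be sketched as a small shell calculation. This is only an illustration: the variable names and the starting learning rate of 0.02 are assumptions made up for the example, not values taken from the wsj/s5 recipe, so substitute whatever your training script actually uses.

```shell
#!/bin/sh
# Sketch only: variable names and the starting learning rate are
# illustrative assumptions, not values taken from the wsj/s5 recipe.

num_machines=4            # machines available in the new setup
old_learning_rate=0.02    # hypothetical rate used with the 16-machine setup

# One parallel nnet job per machine, as suggested in the thread.
num_jobs_nnet=$num_machines

# Reduce the learning rate by a factor of 2 when going from 16 to 4 machines.
new_learning_rate=$(awk -v r="$old_learning_rate" 'BEGIN { printf "%g", r / 2 }')

echo "num_jobs_nnet=$num_jobs_nnet learning_rate=$new_learning_rate"
```

The two resulting values would then be set in the training script before launching it from the command line, ideally under nohup or inside screen so that logging out does not interrupt it.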