From: Mailing list used for User Communication and Updates <kal...@li...> - 2013-08-22 14:22:38

The training fails at the "Generating training examples on disk" stage with
the following message:
*queue.pl: Error, unfinished job no longer exists, log is in
exp/nnet5c1/log/get_egs.*.log
Possible reasons: a) Exceeded time limit? -> Use more jobs! b)
Shutdown/Frozen machine? -> Run again!*
I checked all the get_egs.*.log files, and only the jobs submitted to one
server are unfinished.
I can still access that server and run Unix commands on it, so it is not
frozen.
Is there a time limit at this stage of training?
Any ideas?
In the meantime, I am going to run it again.
Thanks,
Lahiru
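
The manual check described above (going through every get_egs.*.log to see which jobs did not finish) could be scripted roughly as follows. This is only a sketch: the function name `unfinished_logs` is hypothetical, and `done_marker` is an assumption that must be supplied by the reader, namely whatever line a log from a successfully finished job ends with on the local setup. The thread does not say what that line looks like.

```python
from pathlib import Path


def unfinished_logs(log_dir, done_marker, pattern="get_egs.*.log"):
    """Return the logs in log_dir matching pattern that lack done_marker.

    done_marker is an ASSUMPTION supplied by the caller: copy it from the
    end of a log whose job is known to have completed.
    """
    missing = []
    for path in sorted(Path(log_dir).glob(pattern)):
        # errors="replace" keeps the scan going if a log has odd bytes.
        text = path.read_text(errors="replace")
        if done_marker not in text:
            missing.append(path)
    return missing


# Example call (the marker string is a placeholder to fill in):
# for path in unfinished_logs("exp/nnet5c1/log", "<text from a finished log>"):
#     print(path)
```

If every file it reports belongs to jobs that ran on the same server, that points at the machine or its queue configuration rather than at the training script.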
On Wed, Aug 21, 2013 at 5:08 PM, Lahiru Samarakoon <lah...@gm...> wrote:
> Thanks a lot.
>
>
> On Wed, Aug 21, 2013 at 5:07 PM, Mailing list used for User Communication
> and Updates <kal...@li...> wrote:
>
>> No, you don't have to submit the script to SGE. The script doesn't do
>> much apart from submitting jobs to SGE itself, and it's more convenient
>> to just run it from the command line (but use nohup, or run it inside
>> screen, to stop it getting interrupted if you log out).
>>
>> Dan
>>
>>
>> On Wed, Aug 21, 2013 at 11:04 AM, Mailing list used for User
>> Communication and Updates <kal...@li...> wrote:
>> > When running the DNN training on CPU, is it necessary to submit the
>> > script to SGE explicitly?
>> > Or will just running the script take care of the jobs, because the
>> > script uses queue.pl?
>> >
>> > Thanks,
>> > Lahiru
>> >
>> >
>> >
>> > On Tue, Aug 20, 2013 at 3:28 PM, Mailing list used for User
>> > Communication and Updates <kal...@li...> wrote:
>> >>
>> >> num_jobs_nnet should be the same as the number of machines (i.e. 4),
>> >> but you may want to decrease the learning rate a bit (e.g. by a
>> >> factor of 2) if you reduce the number of machines from 16 to 4.
>> >> Dan
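
The adjustment suggested above can be written out as a small calculation. This is a minimal sketch, not part of the Kaldi scripts: the function name `settings_for_machines` and the starting learning rate in the example are hypothetical, and the divisor of 2 is only the example figure given in the reply, not a general formula.

```python
def settings_for_machines(num_machines, learning_rate, lr_divisor=2.0):
    """Sketch of the advice above: one nnet job per machine, smaller rate.

    lr_divisor defaults to 2 because that is the example factor mentioned
    in the thread for going from 16 machines to 4; it is not a rule.
    """
    if num_machines < 1:
        raise ValueError("num_machines must be at least 1")
    if lr_divisor <= 0:
        raise ValueError("lr_divisor must be positive")
    return {
        "num_jobs_nnet": num_machines,
        "learning_rate": learning_rate / lr_divisor,
    }


# Hypothetical starting value; use whatever the recipe currently sets.
print(settings_for_machines(num_machines=4, learning_rate=0.02))
```

The resulting values would then be set by hand in the training script for the 4-machine setup.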
>> >>
>> >>
>> >> On Tue, Aug 20, 2013 at 9:21 AM, Mailing list used for User
>> >> Communication and Updates <kal...@li...> wrote:
>> >> > Hi All,
>> >> >
>> >> > I am planning to run the wsj/s5 setup for DNN training on 4
>> >> > machines, each with 16 cores. Could anyone give me some pointers
>> >> > on how to change important parameters like num_jobs_nnet in the
>> >> > script for my setup?
>> >> >
>> >> > Thank you,
>> >> >
>> >> > Best Regards,
>> >> > Lahiru
>> >> >
>> >> >
>> >> >
>> >> > ------------------------------------------------------------------------------
>> >> > Introducing Performance Central, a new site from SourceForge and
>> >> > AppDynamics. Performance Central is your source for news, insights,
>> >> > analysis and resources for efficient Application Performance
>> >> > Management.
>> >> > Visit us today!
>> >> >
>> >> > http://pubads.g.doubleclick.net/gampad/clk?id=48897511&iu=/4140/ostg.clktrk
>> >> > _______________________________________________
>> >> > Kaldi-users mailing list
>> >> > Kal...@li...
>> >> > https://lists.sourceforge.net/lists/listinfo/kaldi-users
>> >> >
>> >>
>> >>
>> >>
>> >
>> >
>> >
>> >
>> >
>>
>>
>>
>
>