From: Daniel P. <dp...@gm...> - 2015-07-03 21:08:44
No, it doesn't implement early stopping. You would have to decode different iterations and see which gives the best results.
I haven't really gone with early stopping because it tends to stop training before you reach the best WER. On the other hand, if you don't know what you are doing it can be dangerous not to do early stopping, because there is a danger of seriously overtraining.

Dan

On Fri, Jul 3, 2015 at 1:52 PM, Mate Andre <ele...@gm...> wrote:
> Thanks Tony, that makes sense.
>
> Does the train_more2.sh script implement early stopping using the validation
> set created in the egs/ directory?
>
> On Fri, Jul 3, 2015 at 3:39 PM, Tony Robinson <to...@ca...>
> wrote:
>>
>> I may be missing something but I read the question as "When I convert
>> the MP3 files into 16kHz sample-rate WAV files, what bitrate should I
>> convert them to?"
>>
>> The answer is that they should be converted to 16 bits per sample, 16kHz
>> mono files, so that's 256,000 bits per second. There's not a lot of
>> point in using more than 16 bits per sample as the mp3 quantisation is
>> worse than this, and there's not a lot of point in using less than 16 bits
>> per sample, as why throw information away.
>>
>>
>> Tony
>>
>> On 03/07/15 19:53, Daniel Povey wrote:
>> > The sampling rate is critical, but the bitrate is not really critical;
>> > just make sure it sounds OK without super-obvious artifacts. Vassil
>> > (cc'd) will know what bitrate he encoded the Librispeech data with,
>> > but matching this exactly is probably not important.
>> > Dan
>> >
>> >
>> > On Fri, Jul 3, 2015 at 10:45 AM, Jonathan L <jon...@gm...>
>> > wrote:
>> >> The data I want to train on is in MP3 format at a 128kbps bitrate and a
>> >> 44.1kHz sample rate. The LibriSpeech data has a 16kHz sample rate, but
>> >> doesn't seem to have a specified bitrate. When I convert the MP3 files
>> >> into 16kHz sample-rate WAV files, what bitrate should I convert them to?
>> >>
>> >> Is there anything else I should consider when converting the speech
>> >> files?
>> >>
>> >> On Mon, Jun 29, 2015 at 12:24 PM, Vijayaditya Peddinti
>> >> <p.v...@gm...> wrote:
>> >>> You need to provide the egs directory, not exp directory. You can
>> >>> check stage -3 of steps/nnet2/train_multisplice_accel2.sh to see how
>> >>> egs directory can be created from the alignment and data directories.
>> >>> The context variables necessary for creating these examples can be
>> >>> found in nnet_ms_a_online/conf/splice.conf file.
>> >>>
>> >>> Vijay
>> >>>
>> >>> On Mon, Jun 29, 2015 at 9:14 AM, Jonathan L
>> >>> <jon...@gm...>
>> >>> wrote:
>> >>>> The train_more*.sh scripts accept an 'exp' directory instead of a
>> >>>> 'data/train' directory. Is there another script that would accept the
>> >>>> 'data/train' directory as input instead?
>> >>>>
>> >>>> On Mon, Jun 29, 2015 at 12:08 PM, Vijayaditya Peddinti
>> >>>> <p.v...@gm...> wrote:
>> >>>>> See the scripts steps/nnet2/train_more*.sh
>> >>>>>
>> >>>>> Vijay
>> >>>>>
>> >>>>> On Mon, Jun 29, 2015 at 9:02 AM, Jonathan L
>> >>>>> <jon...@gm...>
>> >>>>> wrote:
>> >>>>>> I'm looking to further train an existing LibriSpeech nnet2_a_online
>> >>>>>> model on a new dataset.
>> >>>>>>
>> >>>>>> I have prepared the files for this new dataset inside a data/train
>> >>>>>> directory, as described in the Data Preparation tutorial. I want to
>> >>>>>> keep the nnet2_a_online model initialized to the parameters it
>> >>>>>> learned from training on LibriSpeech, but continue its training on
>> >>>>>> this new dataset. Is there a script that would allow me to specify
>> >>>>>> the nnet2_a_online model and the dataset's data/train directory as
>> >>>>>> input in order to output a model that has been trained more on this
>> >>>>>> new dataset?
>> >>>>>>
>> >>>>>> _______________________________________________
>> >>>>>> Kaldi-users mailing list
>> >>>>>> Kal...@li...
>> >>>>>> https://lists.sourceforge.net/lists/listinfo/kaldi-users
>>
>> --
>> Dr A J Robinson, Founder
>> We are hiring: www.speechmatics.com/careers
>> Speechmatics is a trading name of Cantab Research Limited
>> Phone direct: 01223 778240 office: 01223 794497
>> Company reg no GB 05697423, VAT reg no 925606030
>> 51 Canterbury Street, Cambridge, CB4 3QG, UK
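
Tony's figure of 256,000 bits per second follows directly from 16 bits per sample x 16,000 samples per second x 1 channel. The sketch below reproduces that arithmetic and shows one possible way to build a conversion command and check the result. Note the assumptions: the thread does not name a conversion tool, so the use of ffmpeg here is my own choice, and the helper names (pcm_bitrate, ffmpeg_to_wav_cmd, wav_matches_target) are hypothetical, not part of Kaldi.

```python
import wave


def pcm_bitrate(sample_rate_hz, bits_per_sample, channels):
    """Bitrate of uncompressed PCM audio, in bits per second."""
    return sample_rate_hz * bits_per_sample * channels


def ffmpeg_to_wav_cmd(src, dst, sample_rate_hz=16000):
    """Build (but do not run) an ffmpeg command for 16-bit mono PCM WAV.

    Assumption: ffmpeg is the conversion tool; the thread names none.
    -ac 1 downmixes to mono, -ar sets the output sample rate, and
    -c:a pcm_s16le selects signed 16-bit little-endian PCM.
    """
    return [
        "ffmpeg", "-i", src,
        "-ac", "1",
        "-ar", str(sample_rate_hz),
        "-c:a", "pcm_s16le",
        dst,
    ]


def wav_matches_target(path, sample_rate_hz=16000, bits_per_sample=16, channels=1):
    """Check a WAV header against the target format using the stdlib wave module."""
    with wave.open(path, "rb") as w:
        return (
            w.getframerate() == sample_rate_hz
            and w.getsampwidth() * 8 == bits_per_sample
            and w.getnchannels() == channels
        )


# 16 kHz, 16 bits per sample, mono
print(pcm_bitrate(16000, 16, 1))  # prints 256000
```

The command list can be passed to subprocess.run once ffmpeg is installed; wav_matches_target is then a cheap sanity check on the header before the files go into data preparation. It says nothing about how the audio sounds, so Dan's advice to listen for obvious artifacts still applies.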
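
Dan's suggestion (decode several iterations and keep whichever gives the best WER) can be contrasted with a simple early-stopping rule in a few lines. This is only an illustrative sketch: the WER and validation-loss numbers are made up, the function names are hypothetical, and the patience-based rule is a generic stand-in, not something the train_more*.sh scripts implement.

```python
def best_iteration(wer_by_iter):
    """Return (iteration, wer) with the lowest WER; ties go to the earlier iteration."""
    if not wer_by_iter:
        raise ValueError("no decoded iterations")
    return min(wer_by_iter.items(), key=lambda kv: (kv[1], kv[0]))


def early_stop_iteration(valid_loss_by_iter, patience=1):
    """Iteration at which a generic patience-based early-stopping rule would halt.

    Training stops once the validation loss has failed to improve on its
    best value for `patience` consecutive checks.
    """
    best = float("inf")
    bad_checks = 0
    last = None
    for it in sorted(valid_loss_by_iter):
        last = it
        loss = valid_loss_by_iter[it]
        if loss < best:
            best = loss
            bad_checks = 0
        else:
            bad_checks += 1
            if bad_checks >= patience:
                return it
    return last


# Made-up numbers, for illustration only.
wer = {10: 14.2, 20: 13.1, 30: 12.8, 40: 12.5, 50: 12.9}
valid_loss = {10: 2.10, 20: 1.95, 30: 1.97, 40: 1.96, 50: 1.99}

print(best_iteration(wer))               # prints (40, 12.5)
print(early_stop_iteration(valid_loss))  # prints 30
```

With these invented values the early-stopping rule halts at iteration 30 while the lowest WER is at iteration 40, which is the behaviour Dan describes. Real runs may look quite different, and decoding every candidate iteration costs extra compute.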