From: Jan T. <jt...@gm...> - 2015-07-06 13:35:46

I'm afraid that WSJ is copyrighted and not publicly available for free, so I don't think anyone will be able to help you. You could try contacting LDC directly; they have an initiative providing the data to eligible students (see https://www.ldc.upenn.edu/language-resources/data/data-scholarships).

From: Sunit S. <sun...@in...> - 2015-07-06 11:55:19

Hi all,

I am getting a buffer overflow error while running the RNNLM scripts of WSJ. Any idea as to what could have gone wrong? I trained the model using a subset of WSJ utterances and, from the logs, the training seemed alright. Below are the rnnlm rescore logs followed by the RNN training logs.

steps/rnnlmrescore.sh --rnnlm_ver rnnlm-hs-0.1b --N 100 0.5 data/lang_test_tgpr_5k data/lang_rnnlm_h30_me5-1000 data/dt05_multi_r_mc exp/tri4a/decode_tgpr_5k exp/tri4a/decode_tgpr_5k_rnnlm_h30_me5-1000_L0.5
steps/rnnlmrescore.sh: converting lattices to N-best.
steps/rnnlmrescore.sh: removing old LM scores.
steps/rnnlmrescore.sh: creating separate-archive form of N-best lists.
steps/rnnlmrescore.sh: doing the same with old LM scores.
steps/rnnlmrescore.sh: Creating archives with text-form of words, and LM scores without graph scores.
steps/rnnlmrescore.sh: invoking rnnlm_compute_scores.sh which calls rnnlm, to get RNN LM scores.
*** buffer overflow detected ***: ../../../tools/rnnlm-hs-0.1b/rnnlm terminated
======= Backtrace: =========
/lib/x86_64-linux-gnu/libc.so.6(+0x7338f)[0x7fda06f6b38f]
/lib/x86_64-linux-gnu/libc.so.6(__fortify_fail+0x5c)[0x7fda07002c9c]
/lib/x86_64-linux-gnu/libc.so.6(+0x109b60)[0x7fda07001b60]
../../../tools/rnnlm-hs-0.1b/rnnlm[0x4011ea]
/lib/x86_64-linux-gnu/libc.so.6(__libc_start_main+0xf5)[0x7fda06f19ec5]
../../../tools/rnnlm-hs-0.1b/rnnlm[0x4018ac]

Training logs:

../../../tools/rnnlm-hs-0.1b/rnnlm -threads 1 -independent -train /tmp/tmp.6aF3RDTFnf -valid /tmp/tmp.aTiUNgZnWT -rnnlm data/lang_rnnlm_h30_me5-1000/rnnlm -hidden 30 -rand-seed 1 -debug 2 -class 200 -bptt 2 -bptt-block 20 -direct-order 4 -direct 1000 -binary
# Vocab size: 9066
Words in train file: 164907
Starting training using file /tmp/tmp.6aF3RDTFnf
Iteration 0 Valid Entropy 9.595403
Alpha: 0.100000 ME-alpha: 0.100000 Progress: 97.12% Words/thread/sec: 117.21k
Iteration 1 Valid Entropy 8.564008
Alpha: 0.100000 ME-alpha: 0.100000 Progress: 97.12% Words/thread/sec: 123.54k
Iteration 2 Valid Entropy 8.297136
Alpha: 0.100000 ME-alpha: 0.100000 Progress: 97.12% Words/thread/sec: 122.70k
Iteration 3 Valid Entropy 8.175531
Alpha: 0.100000 ME-alpha: 0.100000 Progress: 97.12% Words/thread/sec: 108.22k
Iteration 4 Valid Entropy 8.107678
Alpha: 0.100000 ME-alpha: 0.100000 Progress: 97.12% Words/thread/sec: 121.89k
Iteration 5 Valid Entropy 8.069274
Alpha: 0.100000 ME-alpha: 0.100000 Progress: 97.12% Words/thread/sec: 124.64k
Iteration 6 Valid Entropy 8.049375 Decay started
Alpha: 0.050000 ME-alpha: 0.050000 Progress: 97.12% Words/thread/sec: 111.30k
Iteration 7 Valid Entropy 8.009795
Alpha: 0.025000 ME-alpha: 0.025000 Progress: 97.12% Words/thread/sec: 124.70k
Iteration 8 Valid Entropy 7.989441 Retry 1/2
Alpha: 0.012500 ME-alpha: 0.012500 Progress: 97.12% Words/thread/sec: 113.82k
Iteration 9 Valid Entropy 7.982499 Retry 2/2
# Accounting: time=439 threads=1
# Ended (code 0) at Fri Jun 26 16:22:52 CEST 2015, elapsed time 439 seconds

Regards,
Sunit

From: Vassil P. <vas...@gm...> - 2015-07-06 10:14:49

Hi Neil,

Yes, I guess there might be tools, or combinations thereof, that could produce even better results. This is one of the raisons d'être for the "original-mp3" archive: it should contain enough metadata to allow re-extraction of the aligned utterances, possibly using different tools (it also contains 15-20% additional audio that was discarded in order to make LibriSpeech more balanced). It seems to me, however, that the audio quality of the current corpus is OK.

Thanks for mentioning these audio analysis tools; I wasn't aware of some of them. I've found WaveSurfer to be pretty useful too.

Vassil

From: Xu <as...@16...> - 2015-07-06 03:24:59

Dear kaldi-users,

This is Xu, a new kaldi-user. I recently ran a DNN speaker adaptation experiment on the WSJ corpus in Kaldi. Training of the adaptation model is finished. To validate the correctness of my experiment, I need the language model used for testing, but I don't have the default WSJ language model. Would you send me the default language model used for the WSJ test? In the default Kaldi script it is "../13-32.1/wsj1/doc/lng_modl/base_lm/bcb20onp.z".

Thank you very much.

Best Wishes,
Xu

From: Neil N. <nn...@in...> - 2015-07-05 17:53:21

Vassil,

I have always used Lame (Ubuntu Software Center is your friend) to convert between wav and mp3. It is well regarded. I suggest SoX for down-sampling. Spek will give a spectral-analysis picture for the entire file. Audacity will give an ongoing spectral analysis, but it is not frequency-labeled. Sonic Visualiser may have something. Upgrading to Ubuntu 14.04 can be tricky in spots but is something to consider, since the versions of GCC and all the software tend to be tied to the OS rev.

Neil

--
RSA public key for this email address at http://pgp.mit.edu/

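
For concreteness, the decode-then-downsample pipeline Neil describes might look like the following; this is a sketch assuming lame and sox are installed, and the filenames are placeholders:

# decode the MP3 to a PCM WAV with lame
lame --decode input.mp3 decoded.wav
# downsample to 16 kHz, 16-bit, mono with sox (sox inserts the rate
# conversion automatically when the requested output rate differs)
sox decoded.wav -r 16000 -b 16 -c 1 output.wav
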
From: Vassil P. <vas...@gm...> - 2015-07-05 08:05:25

BTW, when preparing LibriSpeech, I've noticed that the quality of MP3 conversion can vary substantially, depending on the particular tool used. For example, the output of mpg123 (or maybe it was mpg321) was very noisy, and the ASR WER was 10-15% absolute higher than when alternative MP3 decoders were used. When converting to 16kHz .wav, ffmpeg cuts off the frequencies higher than 7kHz. So eventually I settled on mplayer. It preserves the frequency content in the 7-8kHz range, and as far as I could tell the audio sounded a bit "closer" to the original recording, although I'm not sure if there is any measurable difference in ASR performance between ffmpeg- and mplayer-produced .wav-s. The versions of the tools I tried were those shipped with Ubuntu 10.04 and 12.04, so the issues may be fixed in more recent releases.

Vassil

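
The two conversion paths Vassil compares would look roughly like this; these invocations are illustrative sketches with placeholder filenames, not the exact commands used to build LibriSpeech:

# ffmpeg: resample to a 16 kHz mono WAV (this is the path where Vassil
# observed a cutoff above 7 kHz with the Ubuntu 10.04/12.04 builds)
ffmpeg -i input.mp3 -ar 16000 -ac 1 output_ffmpeg.wav
# mplayer: decode to PCM while resampling to 16 kHz; the channel layout
# follows the input unless remixed separately
mplayer -novideo -af resample=16000 -ao pcm:file=output_mplayer.wav input.mp3
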
From: Daniel P. <dp...@gm...> - 2015-07-03 21:08:44

No, it doesn't implement early stopping. You would have to just decode different iterations and see which seems to give the best results. I haven't really gone with early stopping because it tends to stop training before you reach the best WER. On the other hand, if you don't know what you are doing, it can be dangerous not to do early stopping, because there is a danger of seriously overtraining.

Dan

From: Mate A. <ele...@gm...> - 2015-07-03 20:53:00

Thanks Tony, that makes sense.

Does the train_more2.sh script implement early stopping using the validation set created in the egs/ directory?

From: Tony R. <to...@ca...> - 2015-07-03 19:52:09

I may be missing something, but I read the question as "When I convert the MP3 files into 16kHz sample-rate WAV files, what bitrate should I convert them to?"

The answer is that they should be converted to 16 bits per sample, 16kHz mono files, so that's 16 bits × 16,000 samples/s = 256,000 bits per second. There's not a lot of point in using more than 16 bits per sample, as the mp3 quantisation is worse than this, and there's not a lot of point in using less than 16 bits per sample, as why throw information away?

Tony

On 03/07/15 19:53, Daniel Povey wrote:
> The sampling rate is critical, but the bitrate is not really critical;
> just make sure it sounds OK without super-obvious artifacts. Vassil
> (cc'd) will know what bitrate he encoded the LibriSpeech data with,
> but matching this exactly is probably not important.
> Dan

--
Dr A J Robinson, Founder
We are hiring: www.speechmatics.com/careers
Speechmatics is a trading name of Cantab Research Limited
Phone direct: 01223 778240 office: 01223 794497
Company reg no GB 05697423, VAT reg no 925606030
51 Canterbury Street, Cambridge, CB4 3QG, UK

From: Daniel P. <dp...@gm...> - 2015-07-03 18:54:02

The sampling rate is critical, but the bitrate is not really critical; just make sure it sounds OK without super-obvious artifacts. Vassil (cc'd) will know what bitrate he encoded the LibriSpeech data with, but matching this exactly is probably not important.

Dan

On Fri, Jul 3, 2015 at 10:45 AM, Jonathan L <jon...@gm...> wrote:
> The data I want to train on is in MP3 format at a 128kbps bitrate and a
> 44.1kHz sample rate. The LibriSpeech data has a 16kHz sample rate but
> doesn't seem to have a specified bitrate. When I convert the MP3 files
> into 16kHz sample-rate WAV files, what bitrate should I convert them to?
>
> Is there anything else I should consider when converting the speech files?

From: Gupta V. <vis...@cr...> - 2015-07-03 18:10:44

Hi,

I was finally able to discriminatively train the LSTM, but only after reducing the learning rate from 0.0001 to 0.000000001. The training is rather slow: it took 8 days to train one iteration. I have not tried adjusting the gradient clipping threshold, but I will try that as well.

Thanks,
Vishwa

_____
From: Jerry.Jiayu.DU [mailto:jer...@qq...]
To: Vishwa.Gupta [mailto:Vis...@cr...]
Cc: kal...@li... [mailto:kal...@li...], Daniel Povey [mailto:dp...@gm...]
Sent: Wed, 10 Jun 2015 23:57:51 -0500
Subject: Re: [Kaldi-users] discriminative LSTM training

Hi Vishwa,

"NaN" normally means your LSTM model has exploded during training, and Dan's suggestion to tune down the learning rate should be helpful in your case. Since I encountered exactly the same problem when I was doing sequence training over an LSTM, here is an additional suggestion: apply a smaller gradient clipping threshold; it worked for me. I suggest you have a try as well, setting the gradient clipping threshold to 5 to 20 or so.

Also remember to check that the denominator lattice size is reasonable. Sometimes the default beam results in a very "sparse" (e.g. nearly linear) denominator lattice, and in that case the sequence training won't work.

best,
Jiayu (Jerry)

------------------ Original ------------------
From: "Daniel Povey" <dp...@gm...>
Date: Jun 11, 2015
To: "Vishwa.Gupta" <Vis...@cr...>
Cc: "kal...@li..." <kal...@li...>
Subject: Re: [Kaldi-users] discriminative LSTM training

Usually cases like this, where after a while you see NaNs, are due to some kind of instability in the training which causes the parameters to diverge. It could be due to too-high learning rates. It could also be that if you apply LSTMs to long pieces of audio, as happens in the discriminative training code, there is some kind of gradient explosion. However, IIRC LSTMs were specifically designed to avoid the possibility of gradient explosion, so this would be surprising. You could try smaller learning rates.
Dan

> When I try to do discriminative LSTM training I get the following errors.
>
> If I use train_mpe.sh, it runs for a few thousand utterances and then I
> get the following error, after which the program crashes:
>
> ERROR (nnet-train-mpe-sequential:LatticeForwardBackwardMpeVariants():lattice-functions.cc:833)
> Total forward score over lattice = -nan, while total backward score = 0
>
> If I use train_mmi.sh, then after a few thousand utterances I get logs
> with "nan":
>
> VLOG[1] (nnet-train-mmi-sequential:main():nnet-train-mmi-sequential.cc:346)
> Utterance 20080401_170000_bbcone_bbc_news_spk-0025_seg-0150897:0151494:
> Average MMI obj. value = nan over 595 frames. (Avg. den-posterior on ali -nan)
>
> However, the program keeps on running.
> Is there a workaround for that?
>
> Thanks,
> Vishwa

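
For reference, a learning rate that small would be passed to the nnet1 sequence-training script roughly as follows. This is a hedged sketch, not Vishwa's actual command: the directory names are placeholders, and it assumes the standard steps/nnet/train_mpe.sh argument order (data, lang, source model dir, alignments, denominator lattices, output dir).

# MPE training with the drastically reduced learning rate described above
steps/nnet/train_mpe.sh --learn-rate 0.000000001 \
  data/train data/lang exp/lstm4 exp/lstm4_ali exp/lstm4_denlats exp/lstm4_mpe
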
From: Jonathan L <jon...@gm...> - 2015-07-03 17:45:23

The data I want to train on is in MP3 format at a 128kbps bitrate and a 44.1kHz sample rate. The LibriSpeech data has a 16kHz sample rate but doesn't seem to have a specified bitrate. When I convert the MP3 files into 16kHz sample-rate WAV files, what bitrate should I convert them to?

Is there anything else I should consider when converting the speech files?

On Mon, Jun 29, 2015 at 12:24 PM, Vijayaditya Peddinti <p.v...@gm...> wrote:
> You need to provide the egs directory, not the exp directory. You can
> check stage -3 of steps/nnet2/train_multisplice_accel2.sh to see how an
> egs directory can be created from the alignment and data directories.
> The context variables necessary for creating these examples can be found
> in the nnet_ms_a_online/conf/splice.conf file.
>
> Vijay
>
> On Mon, Jun 29, 2015 at 9:14 AM, Jonathan L <jon...@gm...> wrote:
>> The train_more*.sh scripts accept an 'exp' directory instead of a
>> 'data/train' directory. Is there another script that would accept the
>> 'data/train' directory as input instead?
>>
>> On Mon, Jun 29, 2015 at 12:08 PM, Vijayaditya Peddinti <p.v...@gm...> wrote:
>>> See the scripts steps/nnet2/train_more*.sh
>>>
>>> Vijay
>>>
>>> On Mon, Jun 29, 2015 at 9:02 AM, Jonathan L <jon...@gm...> wrote:
>>>> I'm looking to further train an existing LibriSpeech nnet2_a_online
>>>> model on a new dataset.
>>>>
>>>> I have prepared the files for this new dataset inside a data/train
>>>> directory, as described in the Data Preparation tutorial. I want to
>>>> keep the nnet2_a_online model initialized to the parameters it
>>>> learned from training on LibriSpeech, but continue its training on
>>>> this new dataset. Is there a script that would allow me to specify
>>>> the nnet2_a_online model and the dataset's data/train directory as
>>>> input, in order to output a model that has been trained more on this
>>>> new dataset?

From: Daniel P. <dp...@gm...> - 2015-07-02 18:25:53

> Back to training on the Blizzard dataset, I was able to dump the iVectors
> for Blizzard's 19-hour subset. Where are they needed, though? Neither
> train_more2.sh nor get_egs2.sh seem to accept dumped iVectors as input.

It's the --online-ivector-dir option.

> Regardless, I ran the train_more2.sh script on Blizzard's data/ and egs/
> folders (generated with get_egs2.sh), and I get the following errors in
> train.*.*.log:
>
> KALDI_ASSERT: at nnet-train-parallel:FormatNnetInput:nnet-update.cc:212,
> failed: data[0].input_frames.NumRows() >= num_splice
> [...]
> LOG (nnet-train-parallel:DoBackprop():nnet-update.cc:275) Error doing
> backprop, nnet info is: num-components 17
> num-updatable-components 5
> left-context 7
> right-context 7
> input-dim 140
> output-dim 5816
> parameter-dim 10351000
> [...]
>
> The logs tell me that the left and right contexts were set to 7. However,
> I specified them both as 3 when running get_egs2.sh. The
> egs/info/{left,right}_context files even confirm that they are set to 3.
> Is it possible that train_more2.sh is using the contexts from another
> directory?

The problem is that 3 < 7. The neural net requires a certain amount of temporal context (7 frames on the left and right, here), and if you dump less than that in the egs it will crash. So you need to set them to 7 when dumping egs.

Dan

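
In concrete terms, the fix Dan describes is to re-dump the examples with the model's own context. A hedged sketch, with placeholder directory names and the argument order as given in get_egs2.sh's usage message:

# re-dump egs with left/right context matching the nnet (7 and 7 here),
# pointing at the dumped iVectors; all paths below are placeholders
steps/nnet2/get_egs2.sh --left-context 7 --right-context 7 \
  --online-ivector-dir exp/nnet2_online/ivectors_blizzard \
  data/blizzard exp/blizzard_ali exp/nnet2_online/egs_blizzard
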
From: Mate A. <ele...@gm...> - 2015-07-02 18:22:57

The graph was indeed in tri6b.

Back to training on the Blizzard dataset, I was able to dump the iVectors for Blizzard's 19-hour subset. Where are they needed, though? Neither train_more2.sh nor get_egs2.sh seem to accept dumped iVectors as input.

Regardless, I ran the train_more2.sh script on Blizzard's data/ and egs/ folders (generated with get_egs2.sh), and I get the following errors in train.*.*.log:

KALDI_ASSERT: at nnet-train-parallel:FormatNnetInput:nnet-update.cc:212, failed: data[0].input_frames.NumRows() >= num_splice
[...]
LOG (nnet-train-parallel:DoBackprop():nnet-update.cc:275) Error doing backprop, nnet info is: num-components 17
num-updatable-components 5
left-context 7
right-context 7
input-dim 140
output-dim 5816
parameter-dim 10351000
[...]

The logs tell me that the left and right contexts were set to 7. However, I specified them both as 3 when running get_egs2.sh. The egs/info/{left,right}_context files even confirm that they are set to 3. Is it possible that train_more2.sh is using the contexts from another directory?

From: Daniel P. <dp...@gm...> - 2015-06-30 18:07:24

Check the script that generated it; probably the graph directory was in a different location, e.g. in tri6 or something like that. Hopefully we would have uploaded that too. We only need to regenerate the graph when the tree changes.

Dan

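
For completeness, if the graph ever did need rebuilding (i.e. after a tree change), the standard recipe step is utils/mkgraph.sh; the directory names below are placeholders:

# compose the HCLG decoding graph from a test lang directory and the
# directory holding the model and tree
utils/mkgraph.sh data/lang_test exp/tri6b exp/tri6b/graph
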
From: Daniel P. <dp...@gm...> - 2015-06-30 18:05:42

It is still useful; i-vector adaptation does not need very much data to be effective.

Dan

From: Mate A. <ele...@gm...> - 2015-06-30 18:05:28

To ensure that the nnet_a_online model is performing well on the 19-hour Blizzard dataset and that it is producing correct alignments, I want to run the decoding script on the Blizzard data. However, the nnet_a_online model on kaldi-asr.org doesn't seem to have a graph directory needed for decoding. Is there any way I can get hold of this directory without training the entire model?

From: Kirill K. <kir...@sm...> - 2015-06-30 05:20:08

In my scenario, I am processing multiple short (1-5 words typical) utterances by new, unfamiliar speakers. This is a phone conversation between the system and a random caller. The caller connects, chats for a while (10 of these short utterances is already a lot) and disappears forever. Do you find i-vector adaptation useful in such a scenario for nnet2-online models? Will the fact that training utterances are typically much longer be detrimental?

-kkm

From: Daniel P. <dp...@gm...> - 2015-06-29 20:28:19

> I am using the nnet_a_online model. Does this model require iVectors?

Yes, it does. You would have to extract them for your data; see the commands used in the script that trained the nnet_a_online model. You'd have to download the iVector extractor. You may also have to dump 40-dim features. That will be in the script too; also see the _common.sh script, which it sources near the beginning. Some parts are in there.

> Also, regarding the left and right contexts for get_egs2.sh, would I have
> to use the values found in nnet_a_online/conf/splice.conf
> (--left-context=3, --right-context=3)?

No, I don't think so; you have to use the left and right contexts of the nnet model itself. These are printed out by nnet-am-info.

Dan

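
For reference, reading the model's required context straight off the model file looks like this; the path is a placeholder for wherever the downloaded final.mdl lives:

# print the nnet2 model's structure, including its left-context and
# right-context (the same values seen in the training log above)
nnet-am-info exp/nnet2_online/nnet_a_online/final.mdl
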
From: Mate A. <ele...@gm...> - 2015-06-29 20:25:52
|
I am using the nnet_a_online model. Does this model require iVectors? Also,
regarding the left and right contexts for get_egs2.sh, would I have to use
the values found in nnet_a_online/conf/splice.conf (--left-context=3,
--right-context=3)? Thank you for the advice. I am tackling a 19-hour subset
of Blizzard before moving on to the full 300-hour dataset.

On Mon, Jun 29, 2015 at 2:42 PM, Daniel Povey <dp...@gm...> wrote:
> Actually, you should probably be using train_more2.sh. It looks like
> the update_nnet.sh script is deprecated.
> train_more2.sh requires egs dumped by get_egs2.sh. [The "2" format of
> the egs is more compact.]
> In your scenario you would be dumping egs for the Blizzard data. You
> would need alignments for the Blizzard data. Be careful with the
> get_egs2.sh script, because like the other get_egs scripts it will dump
> egs with the left-context and right-context you specify, and the
> features you give it, but it can't check that these are correct. If you
> are using one of the "online" models that uses iVectors, you would have
> to provide dumped iVectors, and these need to be computed with the
> same iVector extractor as the model that you are starting from.
>
> You might want to run on a small subset first; make sure that the
> training objective (e.g. in compute_train_prob.*.sh) is in the normal
> range, otherwise it may mean that you did something wrong.
>
> To get the alignments you would need to align using the same model as
> was used to align the data for training the original nnet; you can
> download that from kaldi-asr.org.
> Dan
>
> On Mon, Jun 29, 2015 at 7:03 AM, Mate Andre <ele...@gm...> wrote:
> > The train_more.sh script requires an egs directory, which seems to be
> > created by update_nnet.sh. However, update_nnet.sh requires an
> > alignments directory.
> >
> > If I'm planning to run update_nnet.sh with data/train_960, does that
> > mean I have to find alignments for train_960 before running
> > update_nnet.sh? Is there a faster way to generate the egs directory
> > without having to update the neural net?
> >
> > On Thu, Jun 25, 2015 at 2:26 PM, Daniel Povey <dp...@gm...> wrote:
> >> I think the script train_more.sh might be useful here.
> >> If you only have 1 GPU it might take as long as a week, but
> >> downloading the trained models might be a better idea.
> >> Dan
> >>
> >> > I am going to train a deep neural net model with "multi-splice"
> >> > using the LibriSpeech dataset with the local/online/run_nnet2_ms.sh
> >> > script included in Kaldi's repository, which I think will give the
> >> > best resulting WER. The end goal is to use the trained model in this
> >> > phase for initializing a next model to train and do forced alignment
> >> > on the Blizzard2013 dataset, specifically the 2013-EH2 subset, which
> >> > includes 1 female speaker, 19 hours of speech, and sentence-level
> >> > alignments.
> >> > I don't have much experience with Kaldi, and my questions are:
> >> >
> >> > 1. How long does it take to train on all (960 hrs) of LibriSpeech on
> >> > a GPU (say a GTX TITAN X or K6000)? Even a rough estimate could be
> >> > useful.
> >> > 2. Is there anything to take into account before training on
> >> > LibriSpeech?
> >> > 3. And more importantly, how should I initialize/train the next
> >> > model for the Blizzard2013 dataset? I managed to go through data
> >> > preparation for that and created the necessary files.
|
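A minimal sketch of the iVector and egs steps discussed above, assuming the
downloaded online model lives in exp/nnet2_online/nnet_a_online, its iVector
extractor in exp/nnet2_online/extractor, the Blizzard data in data/blizzard,
and its alignments in exp/blizzard_ali (all four names are assumptions, not
from the thread):

  # Extract online iVectors for the new data with the SAME extractor
  # that the pre-trained model was built with:
  steps/online/nnet2/extract_ivectors_online.sh --cmd run.pl --nj 10 \
    data/blizzard exp/nnet2_online/extractor exp/nnet2_online/ivectors_blizzard

  # Dump egs using the contexts from the model's splice.conf
  # (--left-context=3 --right-context=3, per the question above), and
  # point get_egs2.sh at the freshly extracted iVectors:
  steps/nnet2/get_egs2.sh --cmd run.pl --left-context 3 --right-context 3 \
    --online-ivector-dir exp/nnet2_online/ivectors_blizzard \
    data/blizzard exp/blizzard_ali exp/nnet2_online/egs_blizzard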
From: Daniel P. <dp...@gm...> - 2015-06-29 19:04:14
|
That's OK. Those warnings happen when words in the "text" file are not
covered by words.txt (they get replaced with the designated OOV word), but
all those words are either super-rare words, misspellings, or normalization
failures, so it's not a problem that they are not in the vocabulary.
Dan

On Mon, Jun 29, 2015 at 6:45 AM, Mate Andre <ele...@gm...> wrote:
> The alignments script has been running for about a day and I've found
> these warnings in align.*.log:
>
> sym2int.pl: replacing HIGGINSES with 2
> sym2int.pl: replacing MEASTERS with 2
> sym2int.pl: replacing YO'RS with 2
> sym2int.pl: replacing HIGGINSES with 2
> sym2int.pl: replacing THEVENOT with 2
> sym2int.pl: replacing PASQUA with 2
> sym2int.pl: replacing COCHINEALS with 2
> sym2int.pl: replacing HAMPER'S with 2
> sym2int.pl: replacing HUNDRED'LL with 2
> sym2int.pl: replacing CLEMMING with 2
> sym2int.pl: replacing CLEMMING with 2
> sym2int.pl: replacing HOU'D with 2
> sym2int.pl: replacing OURSEL with 2
> sym2int.pl: replacing SWOUNDING with 2
> sym2int.pl: replacing DID' with 2
> sym2int.pl: replacing INSTINCTLY with 2
> sym2int.pl: replacing DEFYINGLY with 2
> sym2int.pl: replacing BELIEVE' with 2
> sym2int.pl: replacing BROSSEN with 2
> sym2int.pl: replacing CLEAVINGS with 2
> sym2int.pl: not warning for OOVs any more times
>
> Can these warnings be safely ignored, or am I possibly using the wrong
> lang directory? I'm currently using data/lang_nosp.
>
> On Fri, Jun 26, 2015 at 6:05 PM, Daniel Povey <dp...@gm...> wrote:
>> Use the tree from the regular nnet_a directory; the system has the same
>> tree.
>> Dan
>>
>> On Fri, Jun 26, 2015 at 5:55 PM, Mate Andre <ele...@gm...> wrote:
>> > The "tree" file is missing from the nnet_a_online directory in the
>> > Kaldi-ASR build. Is it possible to create it without retraining the
>> > entire model?
>> >
>> > On Fri, Jun 26, 2015 at 5:02 PM, Daniel Povey <dp...@gm...> wrote:
>> >> You need to point it to the nnet_a_online directory instead.
>> >> Dan
>> >>
>> >> On Fri, Jun 26, 2015 at 4:59 PM, Mate Andre <ele...@gm...> wrote:
>> >> > Thanks for the prompt reply.
>> >> >
>> >> > When using steps/online/nnet2/align.sh, I get the following error:
>> >> > "no such file
>> >> > exp/nnet2_online/nnet_a/conf/online_nnet2_decoding.conf". Do I need
>> >> > to generate "online_nnet2_decoding.conf" and the "conf" directory
>> >> > with another script, since they aren't included in the Kaldi-ASR
>> >> > build?
>> >> >
>> >> > On Fri, Jun 26, 2015 at 4:44 PM, Daniel Povey <dp...@gm...> wrote:
>> >> >> It expects 140 because 140 = 40 + 100: the 40 is the "hires" MFCC
>> >> >> features (the LibriSpeech scripts create these from the wav data),
>> >> >> and the 100 is the iVector features. You would have to get these
>> >> >> from the iVector extractor.
>> >> >> However, you may find your life easier if you use
>> >> >> steps/online/nnet2/align.sh; that will start from the wav data and
>> >> >> do the feature extraction itself.
>> >> >> Dan
>> >> >>
>> >> >> On Fri, Jun 26, 2015 at 4:41 PM, Mate Andre <ele...@gm...> wrote:
>> >> >> > My goal is to find alignments for the 960-hour LibriSpeech
>> >> >> > dataset. I am using the nnet2_online/nnet_a LibriSpeech model
>> >> >> > from the Kaldi-ASR site, and I am running the
>> >> >> > steps/nnet2/align.sh script in Kaldi's LibriSpeech folder using
>> >> >> > the following command:
>> >> >> >
>> >> >> > steps/nnet2/align.sh --nj 10 --cmd 'run.pl' data/train_960 \
>> >> >> >   data/lang_nosp exp/nnet2_online/nnet_a exp/nnet2_online/nnet_a_ali
>> >> >> >
>> >> >> > where exp/nnet2_online/nnet_a contains the files in
>> >> >> > nnet2_online/nnet_a and exp/nnet2_online/nnet_a_ali is an empty
>> >> >> > directory.
>> >> >> >
>> >> >> > I'm getting the following error in the log files:
>> >> >> >
>> >> >> > ERROR (nnet-align-compiled:NnetComputer():nnet-compute.cc:70)
>> >> >> > Feature dimension is 13 but network expects 140
>> >> >> >
>> >> >> > Am I using the correct script to generate the alignments, or is
>> >> >> > there another reason I am getting this error?
|
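Two quick sanity checks for the answers above; the paths are assumptions
based on the LibriSpeech recipe, not taken from the thread. sym2int.pl maps
any word missing from words.txt to the integer given by --map-oov (here 2,
the id of the OOV word), which is exactly what the warnings report; and
feat-to-dim prints the dimension of a feature archive, so the last two
commands should print 40 and 100, matching Dan's 140 = 40 + 100 breakdown:

  # OOV mapping (fields 2- are the words; field 1 is the utterance id):
  echo "utt1 THE HIGGINSES" | utils/sym2int.pl --map-oov 2 -f 2- data/lang_nosp/words.txt

  # Feature dimensions of the "hires" MFCCs and the online iVectors:
  feat-to-dim scp:data/train_960_hires/feats.scp -
  feat-to-dim scp:exp/nnet2_online/ivectors_train_960/ivector_online.scp -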
From: Daniel P. <dp...@gm...> - 2015-06-29 18:42:52
|
Actually, you should probably be using train_more2.sh. It looks like the
update_nnet.sh script is deprecated.
train_more2.sh requires egs dumped by get_egs2.sh. [The "2" format of the
egs is more compact.]
In your scenario you would be dumping egs for the Blizzard data. You would
need alignments for the Blizzard data. Be careful with the get_egs2.sh
script, because like the other get_egs scripts it will dump egs with the
left-context and right-context you specify, and the features you give it,
but it can't check that these are correct. If you are using one of the
"online" models that uses iVectors, you would have to provide dumped
iVectors, and these need to be computed with the same iVector extractor as
the model that you are starting from.

You might want to run on a small subset first; make sure that the training
objective (e.g. in compute_train_prob.*.sh) is in the normal range,
otherwise it may mean that you did something wrong.

To get the alignments you would need to align using the same model as was
used to align the data for training the original nnet; you can download
that from kaldi-asr.org.
Dan

On Mon, Jun 29, 2015 at 7:03 AM, Mate Andre <ele...@gm...> wrote:
> The train_more.sh script requires an egs directory, which seems to be
> created by update_nnet.sh. However, update_nnet.sh requires an alignments
> directory.
>
> If I'm planning to run update_nnet.sh with data/train_960, does that mean
> I have to find alignments for train_960 before running update_nnet.sh? Is
> there a faster way to generate the egs directory without having to update
> the neural net?
>
> On Thu, Jun 25, 2015 at 2:26 PM, Daniel Povey <dp...@gm...> wrote:
>> I think the script train_more.sh might be useful here.
>> If you only have 1 GPU it might take as long as a week, but downloading
>> the trained models might be a better idea.
>> Dan
>>
>> > I am going to train a deep neural net model with "multi-splice" using
>> > the LibriSpeech dataset with the local/online/run_nnet2_ms.sh script
>> > included in Kaldi's repository, which I think will give the best
>> > resulting WER. The end goal is to use the trained model in this phase
>> > for initializing a next model to train and do forced alignment on the
>> > Blizzard2013 dataset, specifically the 2013-EH2 subset, which includes
>> > 1 female speaker, 19 hours of speech, and sentence-level alignments.
>> > I don't have much experience with Kaldi, and my questions are:
>> >
>> > 1. How long does it take to train on all (960 hrs) of LibriSpeech on a
>> > GPU (say a GTX TITAN X or K6000)? Even a rough estimate could be
>> > useful.
>> > 2. Is there anything to take into account before training on
>> > LibriSpeech?
>> > 3. And more importantly, how should I initialize/train the next model
>> > for the Blizzard2013 dataset? I managed to go through data preparation
>> > for that and created the necessary files.
|
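A rough sketch of the small-subset sanity run Dan suggests; the subset
size, directory names, train_more2.sh argument order, and log-file names
are all assumptions, so check the usage message at the top of
steps/nnet2/train_more2.sh before running:

  # Carve out a small subset of the new data:
  utils/subset_data_dir.sh data/blizzard 1000 data/blizzard_1k

  # Align it and dump egs for it as described above, then continue
  # training from the downloaded model:
  steps/nnet2/train_more2.sh exp/nnet2_online/nnet_a_online/final.mdl \
    exp/nnet2_online/egs_blizzard_1k exp/nnet2_online/nnet_a_blizzard_1k

  # Check that the training objective is in the normal range, i.e. a
  # log-probability comparable to the original model's training logs:
  grep -h LOG exp/nnet2_online/nnet_a_blizzard_1k/log/compute_prob_train.*.log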
From: Vijayaditya P. <p.v...@gm...> - 2015-06-29 16:25:03
|
You need to provide the egs directory, not the exp directory. You can check
stage -3 of steps/nnet2/train_multisplice_accel2.sh to see how the egs
directory can be created from the alignment and data directories. The
context variables necessary for creating these examples can be found in the
nnet_ms_a_online/conf/splice.conf file.

Vijay

On Mon, Jun 29, 2015 at 9:14 AM, Jonathan L <jon...@gm...> wrote:
> The train_more*.sh scripts accept an 'exp' directory instead of a
> 'data/train' directory. Is there another script that would accept the
> 'data/train' directory as input instead?
>
> On Mon, Jun 29, 2015 at 12:08 PM, Vijayaditya Peddinti
> <p.v...@gm...> wrote:
>> See the scripts steps/nnet2/train_more*.sh
>>
>> Vijay
>>
>> On Mon, Jun 29, 2015 at 9:02 AM, Jonathan L <jon...@gm...> wrote:
>>> I'm looking to further train an existing LibriSpeech nnet2_a_online
>>> model on a new dataset.
>>>
>>> I have prepared the files for this new dataset inside a data/train
>>> directory, as described in the *Data Preparation* tutorial. I want to
>>> keep the nnet2_a_online model initialized to the parameters it learned
>>> from training on LibriSpeech, but continue its training on this new
>>> dataset. Is there a script that would allow me to specify the
>>> nnet2_a_online model and the dataset's data/train directory as input in
>>> order to output a model that has been trained more on this new dataset?
|
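A minimal sketch of the stage -3 step Vijay points to, assuming alignments
for the new data already exist in exp/new_ali (the directory names here are
placeholders, not from the thread):

  # The model's splice.conf records the context it was trained with,
  # e.g. --left-context=3 --right-context=3:
  cat exp/nnet2_online/nnet_ms_a_online/conf/splice.conf

  # get_egs2.sh takes <data> <ali-dir> <egs-dir>; pass the same context
  # values so the dumped egs match what the network expects (add
  # --online-ivector-dir if the model consumes iVectors):
  steps/nnet2/get_egs2.sh --cmd run.pl --left-context 3 --right-context 3 \
    data/train exp/new_ali exp/new_egs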
From: Jonathan L <jon...@gm...> - 2015-06-29 16:14:37
|
The train_more*.sh scripts accept an 'exp' directory instead of a
'data/train' directory. Is there another script that would accept the
'data/train' directory as input instead?

On Mon, Jun 29, 2015 at 12:08 PM, Vijayaditya Peddinti
<p.v...@gm...> wrote:
> See the scripts steps/nnet2/train_more*.sh
>
> Vijay
>
> On Mon, Jun 29, 2015 at 9:02 AM, Jonathan L <jon...@gm...> wrote:
>> I'm looking to further train an existing LibriSpeech nnet2_a_online
>> model on a new dataset.
>>
>> I have prepared the files for this new dataset inside a data/train
>> directory, as described in the *Data Preparation* tutorial. I want to
>> keep the nnet2_a_online model initialized to the parameters it learned
>> from training on LibriSpeech, but continue its training on this new
>> dataset. Is there a script that would allow me to specify the
>> nnet2_a_online model and the dataset's data/train directory as input in
>> order to output a model that has been trained more on this new dataset?
|
From: Vijayaditya P. <p.v...@gm...> - 2015-06-29 16:09:00
|
See the scripts steps/nnet2/train_more*.sh

Vijay

On Mon, Jun 29, 2015 at 9:02 AM, Jonathan L <jon...@gm...> wrote:
> I'm looking to further train an existing LibriSpeech nnet2_a_online model
> on a new dataset.
>
> I have prepared the files for this new dataset inside a data/train
> directory, as described in the *Data Preparation* tutorial. I want to
> keep the nnet2_a_online model initialized to the parameters it learned
> from training on LibriSpeech, but continue its training on this new
> dataset. Is there a script that would allow me to specify the
> nnet2_a_online model and the dataset's data/train directory as input in
> order to output a model that has been trained more on this new dataset?
|