From: Jonathan L <jon...@gm...> - 2015-06-29 16:02:37
I'm looking to further train an existing LibriSpeech nnet2_a_online model on a new dataset. I have prepared the files for this new dataset inside a data/train directory, as described in the Data Preparation tutorial. I want to keep the nnet2_a_online model initialized to the parameters it learned from training on LibriSpeech, but continue its training on this new dataset.

Is there a script that would allow me to specify the nnet2_a_online model and the dataset's data/train directory as input, in order to output a model that has been trained further on this new dataset?
From: Mate A. <ele...@gm...> - 2015-06-29 11:03:35
The train_more.sh script requires an egs directory, which seems to be created by update_nnet.sh. However, update_nnet.sh requires an alignments directory. If I'm planning to run update_nnet.sh with data/train_960, does that mean I have to find alignments for train_960 before running update_nnet.sh? Is there a faster way to generate the egs directory without having to update the neural net?

On Thu, Jun 25, 2015 at 2:26 PM, Daniel Povey <dp...@gm...> wrote:
> I think the script train_more.sh might be useful here.
> If you only have 1 GPU it might take as long as a week, but
> downloading the trained models might be a better idea.
> Dan
>
>> I am going to train a deep neural net model with "multi-splice" using the
>> LibriSpeech dataset with the local/online/run_nnet2_ms.sh script included
>> in Kaldi's repository, which I think will give the best resulting WER. The
>> end goal is to use the trained model in this phase for initializing a next
>> model to train and do forced alignment on the Blizzard2013 dataset,
>> specifically the 2013-EH2 subset including 1 female speaker, 19 hours of
>> speech and sentence-level alignments.
>> I don't have much experience with Kaldi and my questions are:
>>
>> 1. How long does it take to train on all (960 hrs) of LibriSpeech on a GPU
>> (say GTX TITAN X or K6000)? Even a rough estimate could be useful.
>> 2. Is there anything to take into account before training on LibriSpeech?
>> 3. And more importantly, how should I initialize/train the next model for
>> the Blizzard2013 dataset? I managed to go through data preparation for
>> that and created the necessary files.
>>
>> _______________________________________________
>> Kaldi-users mailing list
>> Kal...@li...
>> https://lists.sourceforge.net/lists/listinfo/kaldi-users
From: Mate A. <ele...@gm...> - 2015-06-29 10:45:57
The alignment script has been running for about a day and I've found these warnings in align.*.log:

sym2int.pl: replacing HIGGINSES with 2
sym2int.pl: replacing MEASTERS with 2
sym2int.pl: replacing YO'RS with 2
sym2int.pl: replacing HIGGINSES with 2
sym2int.pl: replacing THEVENOT with 2
sym2int.pl: replacing PASQUA with 2
sym2int.pl: replacing COCHINEALS with 2
sym2int.pl: replacing HAMPER'S with 2
sym2int.pl: replacing HUNDRED'LL with 2
sym2int.pl: replacing CLEMMING with 2
sym2int.pl: replacing CLEMMING with 2
sym2int.pl: replacing HOU'D with 2
sym2int.pl: replacing OURSEL with 2
sym2int.pl: replacing SWOUNDING with 2
sym2int.pl: replacing DID' with 2
sym2int.pl: replacing INSTINCTLY with 2
sym2int.pl: replacing DEFYINGLY with 2
sym2int.pl: replacing BELIEVE' with 2
sym2int.pl: replacing BROSSEN with 2
sym2int.pl: replacing CLEAVINGS with 2
sym2int.pl: not warning for OOVs any more times

Can these warnings be safely ignored, or am I possibly using the wrong lang directory? I'm currently using data/lang_nosp.

On Fri, Jun 26, 2015 at 6:05 PM, Daniel Povey <dp...@gm...> wrote:
> Use the tree from the regular nnet_a directory; the system has the same tree.
> Dan
>
> On Fri, Jun 26, 2015 at 5:55 PM, Mate Andre <ele...@gm...> wrote:
>> The "tree" file is missing from the nnet_a_online directory in the
>> Kaldi-ASR build. Is it possible to create it without retraining the
>> entire model?
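A quick way to see exactly which transcript words are OOV (and would trigger sym2int.pl warnings like the ones above) is to diff the transcript vocabulary against the first column of the lang directory's words.txt. The sketch below uses invented toy stand-ins for data/train/text and data/lang_nosp/words.txt:

```shell
export LC_ALL=C  # fixed sort order, so comm's inputs are consistently sorted

# Invented toy stand-ins for data/lang_nosp/words.txt and data/train/text.
printf 'HELLO 1\nWORLD 2\n<UNK> 3\n' > words.txt
printf 'utt1 HELLO FOO\nutt2 WORLD BAR\n' > text

# Distinct words in the transcripts, minus the known vocabulary,
# leaves the OOVs that sym2int.pl maps to the unknown-word id.
cut -d' ' -f2- text | tr ' ' '\n' | sort -u > seen.txt
cut -d' ' -f1 words.txt | sort -u > vocab.txt
comm -23 seen.txt vocab.txt   # prints the OOV words (here: BAR, FOO)
```

With real data, a handful of archaic or dialect words like the ones in the warnings is normal; a flood of common words would suggest the wrong lang directory.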
From: Kirill K. <kir...@sm...> - 2015-06-27 00:57:55
> From: Mate Andre [mailto:ele...@gm...]
> Sent: 2015-06-25 0809
> To: kal...@li...
>
> 1. How long does it take to train on all (960 hrs) of LibriSpeech on a
> GPU (say GTX TITAN X or K6000)? Even a rough estimate could be useful.

Changing the estimate. I am training a model on the full 960-hour set now, and just passed 980 iterations of 7040 total in 24 hours on a single GTX 980 GPU. Since the Titan X may be 10 to 30% faster, depending on the task, you are looking at about 6 days of computation.

If you do not need to squeeze every last percent of performance out of it from the first run, train on 460 hours, which is twice as fast, and leave the rest of the data churning while you work on the rest of your problem. Training takes very little in resources outside of the GPU and 1 CPU core.

I initially played with different transformations and augmentations of the raw data on a 100-hour set, to understand where it is all going. That can be trained in a day. YMMV.

-kkm
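Back-of-the-envelope, those numbers extrapolate like this (the 20% Titan X speedup used here is an assumed midpoint of the 10-30% range quoted above):

```shell
# Extrapolate total training time from 980 of 7040 iterations in 24 h,
# then apply an assumed 20% Titan X speedup over the GTX 980.
awk 'BEGIN {
  days_gtx980 = 24.0 * 7040 / 980 / 24   # total days at the observed rate
  printf "%.1f days on the GTX 980\n", days_gtx980
  printf "%.1f days on a Titan X (+20%% assumed)\n", days_gtx980 / 1.2
}'
```

That comes out to roughly 7.2 days on the GTX 980 and about 6 days on a Titan X, consistent with the estimate above.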
From: Daniel P. <dp...@gm...> - 2015-06-26 21:02:27
You need to point it to the nnet_a_online directory instead.
Dan

On Fri, Jun 26, 2015 at 4:59 PM, Mate Andre <ele...@gm...> wrote:
> Thanks for the prompt reply.
>
> When using steps/online/nnet2/align.sh, I get the following error: "no such
> file exp/nnet2_online/nnet_a/conf/online_nnet2_decoding.conf". Do I need to
> generate "online_nnet2_decoding.conf" and the "conf" directory with another
> script, since they aren't included in the Kaldi-ASR build?
From: Mate A. <ele...@gm...> - 2015-06-26 21:00:00
Thanks for the prompt reply.

When using steps/online/nnet2/align.sh, I get the following error: "no such file exp/nnet2_online/nnet_a/conf/online_nnet2_decoding.conf". Do I need to generate "online_nnet2_decoding.conf" and the "conf" directory with another script, since they aren't included in the Kaldi-ASR build?

On Fri, Jun 26, 2015 at 4:44 PM, Daniel Povey <dp...@gm...> wrote:
> It expects 140 because 140 = 40 + 100: the 40 is the "hires" MFCC
> features (the LibriSpeech scripts create these from the wav data), and
> the 100 is the iVector features. You would have to get these from the
> iVector extractor.
> However, you may find your life is easier if you use
> steps/online/nnet2/align.sh, which will start from the wav data and do
> the feature extraction itself.
> Dan
From: Daniel P. <dp...@gm...> - 2015-06-26 20:44:43
It expects 140 because 140 = 40 + 100: the 40 is the "hires" MFCC features (the LibriSpeech scripts create these from the wav data), and the 100 is the iVector features. You would have to get these from the iVector extractor.

However, you may find your life is easier if you use steps/online/nnet2/align.sh, which will start from the wav data and do the feature extraction itself.
Dan

On Fri, Jun 26, 2015 at 4:41 PM, Mate Andre <ele...@gm...> wrote:
> My goal is to find alignments for the 960-hour LibriSpeech dataset. I am
> using the nnet2_online/nnet_a LibriSpeech model from the Kaldi-ASR site,
> and I am running the steps/nnet2/align.sh script in Kaldi's LibriSpeech
> folder using the following command:
>
> steps/nnet2/align.sh --nj 10 --cmd 'run.pl' data/train_960 data/lang_nosp \
>   exp/nnet2_online/nnet_a exp/nnet2_online/nnet_a_ali
>
> where exp/nnet2_online/nnet_a contains the files in nnet2_online/nnet_a
> and exp/nnet2_online/nnet_a_ali is an empty directory.
>
> I'm getting the following error in the log files:
>
> ERROR (nnet-align-compiled:NnetComputer():nnet-compute.cc:70) Feature
> dimension is 13 but network expects 140
>
> Am I using the correct script to generate the alignments, or is there
> another reason I am getting this error?
From: Mate A. <ele...@gm...> - 2015-06-26 20:42:01
My goal is to find alignments for the 960-hour LibriSpeech dataset. I am using the nnet2_online/nnet_a <http://kaldi-asr.org/downloads/build/6/trunk/egs/librispeech/s5/exp/nnet2_online/nnet_a/> LibriSpeech model from the Kaldi-ASR <http://kaldi-asr.org/> site, and I am running the steps/nnet2/align.sh script in Kaldi's LibriSpeech folder using the following command:

steps/nnet2/align.sh --nj 10 --cmd 'run.pl' data/train_960 data/lang_nosp \
  exp/nnet2_online/nnet_a exp/nnet2_online/nnet_a_ali

where exp/nnet2_online/nnet_a contains the files in nnet2_online/nnet_a and exp/nnet2_online/nnet_a_ali is an empty directory.

I'm getting the following error in the log files:

ERROR (nnet-align-compiled:NnetComputer():nnet-compute.cc:70) Feature dimension is 13 but network expects 140

Am I using the correct script to generate the alignments, or is there another reason I am getting this error?
From: Kirill K. <kir...@sm...> - 2015-06-25 23:30:44
> From: Joan Puigcerver [mailto:joa...@gm...]
> Sent: 2015-06-25 0308
> To: kal...@li...
> Subject: [Kaldi-users] Generate script-file (scp) from archive file (ark)
>
> I was thinking of using the generate-scp-from-ark tool to generate a
> location-specific scp file from the ark.
>
> How do you avoid this problem? Do you think this kind of tool would be useful?

It would indeed be useful, but it is not possible in general. Read up on the Kaldi table format carefully: you can find the beginning of the first archive member, but there is nothing that marks the end. This is why there are so many copy-X tools.

-kkm
From: Kirill K. <kir...@sm...> - 2015-06-25 23:27:31
> From: Mate Andre [mailto:ele...@gm...]
> Sent: 2015-06-25 0809
> Subject: [Kaldi-users] Adapting LibriSpeech model to Blizzard2013 corpus
>
> 1. How long does it take to train on all (960 hrs) of LibriSpeech on a
> GPU (say GTX TITAN X or K6000)? Even a rough estimate could be useful.

I estimate 8-9 days on the Titan X. Note it is about 25% more performant in SP than the K6000. See this message <https://sourceforge.net/p/kaldi/mailman/message/34202312/> (and the whole thread) for the little mods you need to make to train on a single GPU.

-kkm
From: Daniel P. <dp...@gm...> - 2015-06-25 20:52:48
> I know that the traditional recipe is to generate the script-file at
> the same time as the archive file with a write specifier like:
> "ark,scp:my.ark,my.scp"
>
> I was wondering if there is any tool to generate the my.scp file a
> posteriori, something like: "generate-scp-from-ark my.ark > my.scp"

No, there is not.

> My problem is: when I generate the ark file and the scp file together,
> the content of my.scp is something like this:
>
> key1 my.ark:pos1
> key2 my.ark:pos2
> ...
>
> However, if I try to process the script file from a different
> directory, it won't be able to find the "my.ark" file, so I need to
> generate the ark and scp files from the same directory I will use
> them in, or use absolute paths.
>
> I was thinking of using the generate-scp-from-ark tool to generate a
> location-specific scp file from the ark.
>
> How do you avoid this problem? Do you think this kind of tool would be useful?

That is an inherent problem of filenames. Either you use relative filenames, in which case they won't be valid if you access them from a different directory, or you use absolute pathnames, in which case they won't be valid if you move the files. It's your choice; Kaldi lets you use whichever one you want.

However, it should be simple to write an awk script to modify the .scp files to change the directory.
Dan
From: Daniel P. <dp...@gm...> - 2015-06-25 18:26:19
I think the script train_more.sh might be useful here. If you only have 1 GPU it might take as long as a week, but downloading the trained models might be a better idea.
Dan

> I am going to train a deep neural net model with "multi-splice" using the
> LibriSpeech dataset with the local/online/run_nnet2_ms.sh script included
> in Kaldi's repository, which I think will give the best resulting WER. The
> end goal is to use the trained model in this phase for initializing a next
> model to train and do forced alignment on the Blizzard2013 dataset,
> specifically the 2013-EH2 subset including 1 female speaker, 19 hours of
> speech and sentence-level alignments.
> I don't have much experience with Kaldi and my questions are:
>
> 1. How long does it take to train on all (960 hrs) of LibriSpeech on a GPU
> (say GTX TITAN X or K6000)? Even a rough estimate could be useful.
> 2. Is there anything to take into account before training on LibriSpeech?
> 3. And more importantly, how should I initialize/train the next model for
> the Blizzard2013 dataset? I managed to go through data preparation for
> that and created the necessary files.
From: Jan T. <jt...@gm...> - 2015-06-25 15:13:37
I cannot comment on your question, but you could possibly download the LibriSpeech model from kaldi-asr.org (so that you could directly proceed with adaptation).
y.

On Thu, Jun 25, 2015 at 11:09 AM, Mate Andre <ele...@gm...> wrote:
> I am going to train a deep neural net model with "multi-splice" using the
> LibriSpeech dataset with the local/online/run_nnet2_ms.sh
> <https://goo.gl/72A2Zx> script included in Kaldi's repository, which I
> think will give the best resulting WER. The end goal is to use the trained
> model in this phase for initializing a next model to train and do forced
> alignment on the Blizzard2013
> <http://www.synsig.org/images/b/b1/Blizzard2013.pdf> dataset,
> specifically the 2013-EH2 subset including 1 female speaker, 19 hours of
> speech and sentence-level alignments.
> I don't have much experience with Kaldi and my questions are:
>
> 1. How long does it take to train on all (960 hrs) of LibriSpeech on a GPU
> (say GTX TITAN X or K6000)? Even a rough estimate could be useful.
> 2. Is there anything to take into account before training on LibriSpeech?
> 3. And more importantly, how should I initialize/train the next model for
> the Blizzard2013 dataset? I managed to go through data preparation for
> that and created the necessary files.
From: Mate A. <ele...@gm...> - 2015-06-25 15:09:13
I am going to train a deep neural net model with "multi-splice" using the LibriSpeech dataset with the local/online/run_nnet2_ms.sh <https://goo.gl/72A2Zx> script included in Kaldi's repository, which I think will give the best resulting WER. The end goal is to use the trained model in this phase for initializing a next model to train and do forced alignment on the Blizzard2013 <http://www.synsig.org/images/b/b1/Blizzard2013.pdf> dataset, specifically the 2013-EH2 subset including 1 female speaker, 19 hours of speech and sentence-level alignments.

I don't have much experience with Kaldi and my questions are:

1. How long does it take to train on all (960 hrs) of LibriSpeech on a GPU (say GTX TITAN X or K6000)? Even a rough estimate could be useful.
2. Is there anything to take into account before training on LibriSpeech?
3. And more importantly, how should I initialize/train the next model for the Blizzard2013 dataset? I managed to go through data preparation for that and created the necessary files.
From: Joan P. <joa...@gm...> - 2015-06-25 10:08:16
Hi,

I know that the traditional recipe is to generate the script file at the same time as the archive file, with a write specifier like "ark,scp:my.ark,my.scp". I was wondering if there is any tool to generate the my.scp file a posteriori, something like: "generate-scp-from-ark my.ark > my.scp"

My problem is: when I generate the ark file and the scp file together, the content of my.scp is something like this:

key1 my.ark:pos1
key2 my.ark:pos2
...

However, if I try to process the script file from a different directory, it won't be able to find the "my.ark" file, so I need to generate the ark and scp files from the same directory I will use them in, or use absolute paths. I was thinking of using the generate-scp-from-ark tool to generate a location-specific scp file from the ark.

How do you avoid this problem? Do you think this kind of tool would be useful?

Thanks.
From: Daniel P. <dp...@gm...> - 2015-06-24 02:38:35
If you need most but not all of what's at the input, the easiest way is to make sure that at least one of the inputs to a program is filtered, and this is easiest to do at the .scp level (e.g. filter_scp.pl). Almost always one of the program's arguments is derived originally from an .scp file that may be filtered.

On the other hand, if you want to significantly reduce the size of the table, it's generally best to write it as ark,scp (so you write both the archive and the index) and then use the .scp as input instead of the archive, because the .scp is easier to filter. Of course, if you write an archive and only later realize that you need a subset, it will be necessary to copy the archive using the appropriate copy program in order to get an archive with an index.

There is a program called subset-feats, but it is intended for specialized usage; almost always what you want is filtering the .scp. Programs will just ignore inputs that they don't need, so there is less need for filtering tables than you might expect.
Dan

On Tue, Jun 23, 2015 at 10:23 PM, Kirill Katsnelson <kir...@sm...> wrote:
> Is there a standard command-line way of subsetting tables, given a list of
> known IDs to keep on output? I seem to be missing something very basic.
>
> -kkm
From: Kirill K. <kir...@sm...> - 2015-06-24 02:24:01
Is there a standard command-line way of subsetting tables, given a list of known IDs to keep on output? I seem to be missing something very basic.

-kkm
From: Daniel P. <dp...@gm...> - 2015-06-23 17:22:05
|
There is no theory, it's just experience. But sometimes if there are too many insertions it can be because of OOV words in the vocabulary, or problems with normalization or training-data alignment, or other problems with specific words. So look carefully at the output. Dan On Tue, Jun 23, 2015 at 12:09 PM, Kirill Katsnelson <kir...@sm...> wrote: > Aha, thanks, I see a pattern! I've got roughly same number of insertions and deletions at the original weight 0.5. I'll split off a dev set and try to tune the penalty. > > Is there any theory behind this optimal ins/del ratio, or is it just a trick of the art? > > -kkm > >> -----Original Message----- >> From: Daniel Povey [mailto:dp...@gm...] >> Sent: 2015-06-22 2357 >> To: Kirill Katsnelson >> Cc: Nagendra Goel; kal...@li... >> Subject: Re: [Kaldi-users] LM weight >> >> It could still be about insertion errors. Typically you want insertion >> rates about 1/3 to 1/2 as big as deletion rates. If your setup is >> getting too many insertions, it could be using the LM scale to >> compensate. Playing with an insertion penalty may help (see the more >> recent scoring scripts). >> Dan >> >> >> On Tue, Jun 23, 2015 at 1:04 AM, Kirill Katsnelson >> <kir...@sm...> wrote: >> > Yes, I am using the pretty standard nnet2_online model with the >> librispeech data, with a 8 kHz conversion and a squished frequency >> range of the high-res features, as I am finding there is a lot of >> rather useless variance in the very low range, given the data are >> coming mostly from cell phones. But nothing fancy there overall. >> > >> > -kkm >> > >> >> -----Original Message----- >> >> From: Daniel Povey [mailto:dp...@gm...] >> >> Sent: 2015-06-22 2131 >> >> To: Kirill Katsnelson >> >> Cc: Nagendra Goel; kal...@li... >> >> Subject: Re: [Kaldi-users] LM weight >> >> >> >> By a lot of context I mean left-context and right-context, in the >> >> splicing. But I guess you are using one of the standard types of >> >> model. 
>> >> Dan >> >> >> >> >> >> On Tue, Jun 23, 2015 at 12:24 AM, Kirill Katsnelson >> >> <kir...@sm...> wrote: >> >> > The majority of the WER comes from subs, so this part looks pretty >> >> normal. >> >> > >> >> > A lot of acoustic context--probably, depending on the definition >> of >> >> "a lot." :-) Not sure I understand this part. How can I tell? It >> >> makes sense, looking at the base dev set figures that I got training >> >> the model from the first 500 hr of the librispeech corpus (best >> range >> >> of 16-17). Which are still higher than the reference in the RESULTS >> >> for the full 1Khr corpus, which is rather in the 12-15 range. >> >> > >> >> > -kkm >> >> > >> >> >> -----Original Message----- >> >> >> From: Daniel Povey [mailto:dp...@gm...] >> >> >> Sent: 2015-06-22 2059 >> >> >> To: Kirill Katsnelson >> >> >> Cc: Nagendra Goel; kal...@li... >> >> >> Subject: Re: [Kaldi-users] LM weight >> >> >> >> >> >> Usually if there is a lot of acoustic context in your model you >> >> >> will require a larger LM weight. >> >> >> Also, if for some reason there tend to be a lot of insertions in >> >> >> decoding (e.g. something weird went wrong in training, or there >> is >> >> >> some kind of normalization problem), a large LM weight can help >> >> >> reduce insertions and so improve the WER. >> >> >> >> >> >> Dan >> >> >> >> >> >> >> >> >> On Mon, Jun 22, 2015 at 11:36 PM, Kirill Katsnelson >> >> >> <kir...@sm...> wrote: >> >> >> > I am getting the same ratio on both small and more targeted, >> and >> >> >> > a >> >> >> quite large general LM. I do not understand what to make out if >> it! >> >> >> > >> >> >> > -kkm >> >> >> > >> >> >> >> -----Original Message----- >> >> >> >> From: Nagendra Goel [mailto:nag...@go...] >> >> >> >> Sent: 2015-06-22 2032 >> >> >> >> To: Kirill Katsnelson; kal...@li... >> >> >> >> Subject: RE: [Kaldi-users] LM weight >> >> >> >> >> >> >> >> Or maybe your domain is limited and LM very nicely matched to >> >> >> >> the task at hand? 
>> >> >> >> >> >> >> >> -----Original Message----- >> >> >> >> From: Kirill Katsnelson >> >> [mailto:kir...@sm...] >> >> >> >> Sent: Monday, June 22, 2015 11:29 PM >> >> >> >> To: kal...@li... >> >> >> >> Subject: [Kaldi-users] LM weight >> >> >> >> >> >> >> >> I my test sets I am getting the best WER at LM/acoustic weight >> >> >> >> in >> >> >> the >> >> >> >> range of 18-19, with multiple LMs of different size and >> origin. >> >> >> >> I >> >> >> was >> >> >> >> usually thinking the usual ballpark figure about 10, give or >> >> take. >> >> >> >> From your experience, does this larger LM weight mean >> anything, >> >> >> >> and what if it does? I am guessing an inadequate acoustic >> >> >> >> model, requiring more LM "pull"--am I making sense? >> >> >> >> >> >> >> >> -kkm >> >> >> >> >> >> >> >> -------------------------------------------------------------- >> - >> >> >> >> -- >> >> - >> >> >> >> -- >> >> >> - >> >> >> >> -- >> >> >> >> ----- >> >> >> >> -- >> >> >> >> Monitor 25 network devices or servers for free with OpManager! >> >> >> >> OpManager is web-based network management software that >> >> >> >> monitors network devices and physical & virtual servers, >> alerts >> >> >> >> via email >> >> & >> >> >> >> sms for fault. >> >> >> >> Monitor 25 devices for free with no restriction. Download now >> >> >> >> http://ad.doubleclick.net/ddm/clk/292181274;119417398;o >> >> >> >> _______________________________________________ >> >> >> >> Kaldi-users mailing list >> >> >> >> Kal...@li... >> >> >> >> https://lists.sourceforge.net/lists/listinfo/kaldi-users >> >> >> > >> >> >> > --------------------------------------------------------------- >> - >> >> >> > -- >> >> - >> >> >> > -- >> >> >> - >> >> >> > -------- Monitor 25 network devices or servers for free with >> >> >> > OpManager! 
>> >> >> > OpManager is web-based network management software that >> monitors >> >> >> > network devices and physical & virtual servers, alerts via >> email >> >> >> > & >> >> >> sms >> >> >> > for fault. Monitor 25 devices for free with no restriction. >> >> >> > Download now >> >> >> > http://ad.doubleclick.net/ddm/clk/292181274;119417398;o >> >> >> > _______________________________________________ >> >> >> > Kaldi-users mailing list >> >> >> > Kal...@li... >> >> >> > https://lists.sourceforge.net/lists/listinfo/kaldi-users |
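The insertion/deletion balance Dan describes (insertions roughly 1/3 to 1/2 of deletions) can be read straight off the summary line that Kaldi's compute-wer prints. A minimal sketch of checking that ratio; the summary line and its numbers below are made up for illustration, not taken from this thread:

```shell
#!/bin/sh
# Sketch: check the insertion/deletion balance of a scoring run against
# the rule of thumb from this thread (insertions about 1/3 to 1/2 of
# deletions). The input imitates the summary line that compute-wer
# prints; the numbers here are invented.

ins_del_ratio() {
  # Pull the "N ins," and "M del," fields out of a compute-wer summary
  # line and print the insertion/deletion ratio.
  echo "$1" | awk '{
    for (i = 1; i < NF; i++) {
      if ($(i+1) == "ins,") ins = $i
      if ($(i+1) == "del,") del = $i
    }
    printf "ins=%d del=%d ins/del=%.2f\n", ins, del, ins / del
  }'
}

ins_del_ratio '%WER 15.20 [ 1520 / 10000, 180 ins, 420 del, 920 sub ]'
# prints: ins=180 del=420 ins/del=0.43
```

If the ratio comes out well above that range, raising the word insertion penalty in the scoring script, as suggested above, is the natural knob to try on a held-out dev set.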
From: Daniel P. <dp...@gm...> - 2015-06-23 17:09:22
|
lattice-to-phone-lattice may be what you want. On Tue, Jun 23, 2015 at 12:12 PM, Kirill Katsnelson <kir...@sm...> wrote: > Can I build a phone lattice FST from the state lattice (normal decoder output) with out-of-the-box tools? I have a phone-position-dependent model. > > -kkm |
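For reference, a sketch of how lattice-to-phone-lattice might be applied to the lattices in a decode directory. The model path, directory layout, and job count are illustrative assumptions, not from this thread; the tool derives phone labels from the transition-ids using the transition model, which is presumably why a phone-position-dependent model poses no problem:

```shell
#!/bin/sh
# Sketch: turn state-level decoding lattices into phone lattices.
# Paths below are hypothetical examples of a standard decode directory
# layout (lat.N.gz per decoding job), not taken from the thread.
mdl=exp/nnet2_online/final.mdl
dir=exp/nnet2_online/decode

# Build the conversion command for decoding job $1. "ark,t:" writes the
# resulting phone lattice in human-readable text form.
phone_lat_cmd() {
  echo "lattice-to-phone-lattice $mdl \"ark:gunzip -c $dir/lat.$1.gz|\" \"ark,t:$dir/phone_lat.$1.txt\""
}

# Print the command for each of (here) four decoding jobs.
for n in 1 2 3 4; do
  phone_lat_cmd "$n"
done
```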
From: Kirill K. <kir...@sm...> - 2015-06-23 16:13:05
|
Can I build a phone lattice FST from the state lattice (normal decoder output) with out-of-the-box tools? I have a phone-position-dependent model. -kkm |
From: Kirill K. <kir...@sm...> - 2015-06-23 16:09:50
|
Aha, thanks, I see a pattern! I've got roughly same number of insertions and deletions at the original weight 0.5. I'll split off a dev set and try to tune the penalty. Is there any theory behind this optimal ins/del ratio, or is it just a trick of the art? -kkm > -----Original Message----- > From: Daniel Povey [mailto:dp...@gm...] > Sent: 2015-06-22 2357 > To: Kirill Katsnelson > Cc: Nagendra Goel; kal...@li... > Subject: Re: [Kaldi-users] LM weight > > It could still be about insertion errors. Typically you want insertion > rates about 1/3 to 1/2 as big as deletion rates. If your setup is > getting too many insertions, it could be using the LM scale to > compensate. Playing with an insertion penalty may help (see the more > recent scoring scripts). > Dan > > > On Tue, Jun 23, 2015 at 1:04 AM, Kirill Katsnelson > <kir...@sm...> wrote: > > Yes, I am using the pretty standard nnet2_online model with the > librispeech data, with a 8 kHz conversion and a squished frequency > range of the high-res features, as I am finding there is a lot of > rather useless variance in the very low range, given the data are > coming mostly from cell phones. But nothing fancy there overall. > > > > -kkm > > > >> -----Original Message----- > >> From: Daniel Povey [mailto:dp...@gm...] > >> Sent: 2015-06-22 2131 > >> To: Kirill Katsnelson > >> Cc: Nagendra Goel; kal...@li... > >> Subject: Re: [Kaldi-users] LM weight > >> > >> By a lot of context I mean left-context and right-context, in the > >> splicing. But I guess you are using one of the standard types of > >> model. > >> Dan > >> > >> > >> On Tue, Jun 23, 2015 at 12:24 AM, Kirill Katsnelson > >> <kir...@sm...> wrote: > >> > The majority of the WER comes from subs, so this part looks pretty > >> normal. > >> > > >> > A lot of acoustic context--probably, depending on the definition > of > >> "a lot." :-) Not sure I understand this part. How can I tell? 
It > >> makes sense, looking at the base dev set figures that I got training > >> the model from the first 500 hr of the librispeech corpus (best > range > >> of 16-17). Which are still higher than the reference in the RESULTS > >> for the full 1Khr corpus, which is rather in the 12-15 range. > >> > > >> > -kkm > >> > > >> >> -----Original Message----- > >> >> From: Daniel Povey [mailto:dp...@gm...] > >> >> Sent: 2015-06-22 2059 > >> >> To: Kirill Katsnelson > >> >> Cc: Nagendra Goel; kal...@li... > >> >> Subject: Re: [Kaldi-users] LM weight > >> >> > >> >> Usually if there is a lot of acoustic context in your model you > >> >> will require a larger LM weight. > >> >> Also, if for some reason there tend to be a lot of insertions in > >> >> decoding (e.g. something weird went wrong in training, or there > is > >> >> some kind of normalization problem), a large LM weight can help > >> >> reduce insertions and so improve the WER. > >> >> > >> >> Dan > >> >> > >> >> > >> >> On Mon, Jun 22, 2015 at 11:36 PM, Kirill Katsnelson > >> >> <kir...@sm...> wrote: > >> >> > I am getting the same ratio on both small and more targeted, > and > >> >> > a > >> >> quite large general LM. I do not understand what to make out if > it! > >> >> > > >> >> > -kkm > >> >> > > >> >> >> -----Original Message----- > >> >> >> From: Nagendra Goel [mailto:nag...@go...] > >> >> >> Sent: 2015-06-22 2032 > >> >> >> To: Kirill Katsnelson; kal...@li... > >> >> >> Subject: RE: [Kaldi-users] LM weight > >> >> >> > >> >> >> Or maybe your domain is limited and LM very nicely matched to > >> >> >> the task at hand? > >> >> >> > >> >> >> -----Original Message----- > >> >> >> From: Kirill Katsnelson > >> [mailto:kir...@sm...] > >> >> >> Sent: Monday, June 22, 2015 11:29 PM > >> >> >> To: kal...@li... 
> >> >> >> Subject: [Kaldi-users] LM weight > >> >> >> > >> >> >> In my test sets I am getting the best WER at LM/acoustic weight in the range of 18-19, with multiple LMs of different size and origin. I was usually thinking the usual ballpark figure about 10, give or take. From your experience, does this larger LM weight mean anything, and what if it does? I am guessing an inadequate acoustic model, requiring more LM "pull"--am I making sense? > >> >> >> > >> >> >> -kkm |
From: Daniel P. <dp...@gm...> - 2015-06-23 06:57:36
|
It could still be about insertion errors. Typically you want insertion rates about 1/3 to 1/2 as big as deletion rates. If your setup is getting too many insertions, it could be using the LM scale to compensate. Playing with an insertion penalty may help (see the more recent scoring scripts). Dan On Tue, Jun 23, 2015 at 1:04 AM, Kirill Katsnelson <kir...@sm...> wrote: > Yes, I am using the pretty standard nnet2_online model with the librispeech data, with a 8 kHz conversion and a squished frequency range of the high-res features, as I am finding there is a lot of rather useless variance in the very low range, given the data are coming mostly from cell phones. But nothing fancy there overall. > > -kkm > >> -----Original Message----- >> From: Daniel Povey [mailto:dp...@gm...] >> Sent: 2015-06-22 2131 >> To: Kirill Katsnelson >> Cc: Nagendra Goel; kal...@li... >> Subject: Re: [Kaldi-users] LM weight >> >> By a lot of context I mean left-context and right-context, in the >> splicing. But I guess you are using one of the standard types of >> model. >> Dan >> >> >> On Tue, Jun 23, 2015 at 12:24 AM, Kirill Katsnelson >> <kir...@sm...> wrote: >> > The majority of the WER comes from subs, so this part looks pretty >> normal. >> > >> > A lot of acoustic context--probably, depending on the definition of >> "a lot." :-) Not sure I understand this part. How can I tell? It makes >> sense, looking at the base dev set figures that I got training the >> model from the first 500 hr of the librispeech corpus (best range of >> 16-17). Which are still higher than the reference in the RESULTS for >> the full 1Khr corpus, which is rather in the 12-15 range. >> > >> > -kkm >> > >> >> -----Original Message----- >> >> From: Daniel Povey [mailto:dp...@gm...] >> >> Sent: 2015-06-22 2059 >> >> To: Kirill Katsnelson >> >> Cc: Nagendra Goel; kal...@li... 
>> >> Subject: Re: [Kaldi-users] LM weight >> >> >> >> Usually if there is a lot of acoustic context in your model you will require a larger LM weight. >> >> Also, if for some reason there tend to be a lot of insertions in decoding (e.g. something weird went wrong in training, or there is some kind of normalization problem), a large LM weight can help reduce insertions and so improve the WER. >> >> >> >> Dan >> >> >> >> >> >> On Mon, Jun 22, 2015 at 11:36 PM, Kirill Katsnelson >> >> <kir...@sm...> wrote: >> >> > I am getting the same ratio on both small and more targeted, and a quite large general LM. I do not understand what to make of it! >> >> > >> >> > -kkm >> >> > >> >> >> -----Original Message----- >> >> >> From: Nagendra Goel [mailto:nag...@go...] >> >> >> Sent: 2015-06-22 2032 >> >> >> To: Kirill Katsnelson; kal...@li... >> >> >> Subject: RE: [Kaldi-users] LM weight >> >> >> >> >> >> Or maybe your domain is limited and LM very nicely matched to the task at hand? >> >> >> >> >> >> -----Original Message----- >> >> >> From: Kirill Katsnelson [mailto:kir...@sm...] >> >> >> Sent: Monday, June 22, 2015 11:29 PM >> >> >> To: kal...@li... >> >> >> Subject: [Kaldi-users] LM weight >> >> >> >> >> >> In my test sets I am getting the best WER at LM/acoustic weight in the range of 18-19, with multiple LMs of different size and origin. I was usually thinking the usual ballpark figure about 10, give or take. From your experience, does this larger LM weight mean anything, and what if it does? I am guessing an inadequate acoustic model, requiring more LM "pull"--am I making sense? >> >> >> >> >> >> -kkm
|
From: Kirill K. <kir...@sm...> - 2015-06-23 05:05:09
|
Yes, I am using the pretty standard nnet2_online model with the librispeech data, with a 8 kHz conversion and a squished frequency range of the high-res features, as I am finding there is a lot of rather useless variance in the very low range, given the data are coming mostly from cell phones. But nothing fancy there overall. -kkm > -----Original Message----- > From: Daniel Povey [mailto:dp...@gm...] > Sent: 2015-06-22 2131 > To: Kirill Katsnelson > Cc: Nagendra Goel; kal...@li... > Subject: Re: [Kaldi-users] LM weight > > By a lot of context I mean left-context and right-context, in the > splicing. But I guess you are using one of the standard types of > model. > Dan > > > On Tue, Jun 23, 2015 at 12:24 AM, Kirill Katsnelson > <kir...@sm...> wrote: > > The majority of the WER comes from subs, so this part looks pretty > normal. > > > > A lot of acoustic context--probably, depending on the definition of > "a lot." :-) Not sure I understand this part. How can I tell? It makes > sense, looking at the base dev set figures that I got training the > model from the first 500 hr of the librispeech corpus (best range of > 16-17). Which are still higher than the reference in the RESULTS for > the full 1Khr corpus, which is rather in the 12-15 range. > > > > -kkm > > > >> -----Original Message----- > >> From: Daniel Povey [mailto:dp...@gm...] > >> Sent: 2015-06-22 2059 > >> To: Kirill Katsnelson > >> Cc: Nagendra Goel; kal...@li... > >> Subject: Re: [Kaldi-users] LM weight > >> > >> Usually if there is a lot of acoustic context in your model you will > >> require a larger LM weight. > >> Also, if for some reason there tend to be a lot of insertions in > >> decoding (e.g. something weird went wrong in training, or there is > >> some kind of normalization problem), a large LM weight can help > >> reduce insertions and so improve the WER. 
> >> > >> Dan > >> > >> > >> On Mon, Jun 22, 2015 at 11:36 PM, Kirill Katsnelson > >> <kir...@sm...> wrote: > >> > I am getting the same ratio on both small and more targeted, and a quite large general LM. I do not understand what to make of it! > >> > > >> > -kkm > >> > > >> >> -----Original Message----- > >> >> From: Nagendra Goel [mailto:nag...@go...] > >> >> Sent: 2015-06-22 2032 > >> >> To: Kirill Katsnelson; kal...@li... > >> >> Subject: RE: [Kaldi-users] LM weight > >> >> > >> >> Or maybe your domain is limited and LM very nicely matched to the task at hand? > >> >> > >> >> -----Original Message----- > >> >> From: Kirill Katsnelson [mailto:kir...@sm...] > >> >> Sent: Monday, June 22, 2015 11:29 PM > >> >> To: kal...@li... > >> >> Subject: [Kaldi-users] LM weight > >> >> > >> >> In my test sets I am getting the best WER at LM/acoustic weight in the range of 18-19, with multiple LMs of different size and origin. I was usually thinking the usual ballpark figure about 10, give or take. From your experience, does this larger LM weight mean anything, and what if it does? I am guessing an inadequate acoustic model, requiring more LM "pull"--am I making sense? > >> >> > >> >> -kkm
|
From: Daniel P. <dp...@gm...> - 2015-06-23 04:30:57
|
By a lot of context I mean left-context and right-context, in the splicing. But I guess you are using one of the standard types of model. Dan On Tue, Jun 23, 2015 at 12:24 AM, Kirill Katsnelson <kir...@sm...> wrote: > The majority of the WER comes from subs, so this part looks pretty normal. > > A lot of acoustic context--probably, depending on the definition of "a lot." :-) Not sure I understand this part. How can I tell? It makes sense, looking at the base dev set figures that I got training the model from the first 500 hr of the librispeech corpus (best range of 16-17). Which are still higher than the reference in the RESULTS for the full 1Khr corpus, which is rather in the 12-15 range. > > -kkm > >> -----Original Message----- >> From: Daniel Povey [mailto:dp...@gm...] >> Sent: 2015-06-22 2059 >> To: Kirill Katsnelson >> Cc: Nagendra Goel; kal...@li... >> Subject: Re: [Kaldi-users] LM weight >> >> Usually if there is a lot of acoustic context in your model you will >> require a larger LM weight. >> Also, if for some reason there tend to be a lot of insertions in >> decoding (e.g. something weird went wrong in training, or there is some >> kind of normalization problem), a large LM weight can help reduce >> insertions and so improve the WER. >> >> Dan >> >> >> On Mon, Jun 22, 2015 at 11:36 PM, Kirill Katsnelson >> <kir...@sm...> wrote: >> > I am getting the same ratio on both small and more targeted, and a >> quite large general LM. I do not understand what to make out if it! >> > >> > -kkm >> > >> >> -----Original Message----- >> >> From: Nagendra Goel [mailto:nag...@go...] >> >> Sent: 2015-06-22 2032 >> >> To: Kirill Katsnelson; kal...@li... >> >> Subject: RE: [Kaldi-users] LM weight >> >> >> >> Or maybe your domain is limited and LM very nicely matched to the >> >> task at hand? >> >> >> >> -----Original Message----- >> >> From: Kirill Katsnelson [mailto:kir...@sm...] >> >> Sent: Monday, June 22, 2015 11:29 PM >> >> To: kal...@li... 
>> >> Subject: [Kaldi-users] LM weight >> >> >> >> In my test sets I am getting the best WER at LM/acoustic weight in the range of 18-19, with multiple LMs of different size and origin. I was usually thinking the usual ballpark figure about 10, give or take. >> >> From your experience, does this larger LM weight mean anything, and what if it does? I am guessing an inadequate acoustic model, requiring more LM "pull"--am I making sense? >> >> >> >> -kkm |
From: Kirill K. <kir...@sm...> - 2015-06-23 04:24:24
|
The majority of the WER comes from subs, so this part looks pretty normal. A lot of acoustic context--probably, depending on the definition of "a lot." :-) Not sure I understand this part. How can I tell? It makes sense, looking at the base dev set figures that I got training the model from the first 500 hr of the librispeech corpus (best range of 16-17). Which are still higher than the reference in the RESULTS for the full 1Khr corpus, which is rather in the 12-15 range. -kkm > -----Original Message----- > From: Daniel Povey [mailto:dp...@gm...] > Sent: 2015-06-22 2059 > To: Kirill Katsnelson > Cc: Nagendra Goel; kal...@li... > Subject: Re: [Kaldi-users] LM weight > > Usually if there is a lot of acoustic context in your model you will > require a larger LM weight. > Also, if for some reason there tend to be a lot of insertions in > decoding (e.g. something weird went wrong in training, or there is some > kind of normalization problem), a large LM weight can help reduce > insertions and so improve the WER. > > Dan > > > On Mon, Jun 22, 2015 at 11:36 PM, Kirill Katsnelson > <kir...@sm...> wrote: > > I am getting the same ratio on both small and more targeted, and a > quite large general LM. I do not understand what to make out if it! > > > > -kkm > > > >> -----Original Message----- > >> From: Nagendra Goel [mailto:nag...@go...] > >> Sent: 2015-06-22 2032 > >> To: Kirill Katsnelson; kal...@li... > >> Subject: RE: [Kaldi-users] LM weight > >> > >> Or maybe your domain is limited and LM very nicely matched to the > >> task at hand? > >> > >> -----Original Message----- > >> From: Kirill Katsnelson [mailto:kir...@sm...] > >> Sent: Monday, June 22, 2015 11:29 PM > >> To: kal...@li... > >> Subject: [Kaldi-users] LM weight > >> > >> I my test sets I am getting the best WER at LM/acoustic weight in > the > >> range of 18-19, with multiple LMs of different size and origin. I > was > >> usually thinking the usual ballpark figure about 10, give or take. 
> >> From your experience, does this larger LM weight mean anything, and what if it does? I am guessing an inadequate acoustic model, requiring more LM "pull"--am I making sense? > >> > >> -kkm |
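The LM-weight and insertion-penalty tuning discussed in this thread is what Kaldi's newer scoring scripts sweep as a grid. A sketch of that sweep follows; the decode path and the exact ranges are illustrative assumptions, and the commands are printed rather than executed:

```shell
#!/bin/sh
# Sketch of the grid the newer scoring scripts sweep: LM weight
# (applied as inverse acoustic scale) by word insertion penalty.
# The decode path and the ranges here are illustrative assumptions.
dir=exp/nnet2_online/decode

# One scoring pipeline for a given LM weight ($1) and penalty ($2):
# scale the LM cost, add a per-word insertion penalty, then extract
# the best path as a transcript for WER computation.
score_cmd() {
  lmwt=$1; wip=$2
  echo "lattice-scale --inv-acoustic-scale=$lmwt 'ark:gunzip -c $dir/lat.*.gz|' ark:- | lattice-add-penalty --word-ins-penalty=$wip ark:- ark:- | lattice-best-path ark:- ark,t:$dir/scoring/${lmwt}_${wip}.tra"
}

# Print the full grid; the best (lmwt, wip) pair is then chosen by
# dev-set WER, as suggested in the thread.
for lmwt in $(seq 9 20); do
  for wip in 0.0 0.5 1.0; do
    score_cmd "$lmwt" "$wip"
  done
done
```

Picking the operating point on a held-out dev set rather than the test set, as Dan advises above, avoids tuning to the evaluation data.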