From: Daniel P. <dp...@gm...> - 2015-06-16 02:43:52

I think the confusion is probably between two loops with "real" on them in G.fst: one loop where you always take the bigram probability, and one where you always take the unigram probability. Or maybe a similar confusion between a loop where you use the trigram "real real real" and the bigram "real real". Those loops are expected to exist.

Probably the issue is that something happened at the start of the sequence which caused the FST to be confused about which of those two states it was in. If you have any empty words (words with empty pronunciation) in your lexicon this could possibly happen, as it would be confused between taking a normal word and then the backoff symbol, vs. taking a normal word, then the empty word, then the backoff symbol. I think the current Kaldi graph-creation scripts check for empty words in the lexicon, for this reason.

Dan
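A quick way to check for the empty-pronunciation case Dan mentions is to scan the lexicon for entries that have a word but no phones. This is only a sketch: the lexicon below is a made-up example, and in a real setup you would point the awk command at your own lexicon file (e.g. under data/local/dict/).

```shell
# Build a toy lexicon with one deliberately empty pronunciation on line 3.
cat > lexicon.txt <<'EOF'
real R_B IY1_I L_E
up AH1_B P_E
oops
EOF

# Report any line with fewer than two fields (word present, phones missing).
awk 'NF < 2 { printf "empty pronunciation on line %d: %s\n", NR, $0 }' lexicon.txt
```

On the toy file this prints `empty pronunciation on line 3: oops`; an empty result on your real lexicon rules this cause out.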
From: Kirill K. <kir...@sm...> - 2015-06-16 02:34:40

The sequence R_B ( ) IY1_I ( ) L_E (real) #1 ( ) #16 ( ) #0 ( ) generally almost makes sense, given that #16 is the last one in the table, the silence disambiguation symbol. (Not sure why "real" is emitted at L_E; I would rather expect it to be emitted at #1.) What I do not understand is what exactly the debug trace represents, and what I should make out of it. It is a path through the FST graph, but I do not understand what this path is exactly, and what this endless walk of the loop means.

-kkm
From: Daniel P. <dp...@gm...> - 2015-06-16 01:57:51

Look into the "backoff disambiguation symbol", normally called #0. The reason why it is needed should be explained in the hbka.pdf paper.

Dan
From: Kirill K. <kir...@sm...> - 2015-06-16 01:54:30

Thank you! The output consists of some sequences as you described, quickly falling into a short, endlessly repeated loop.

The non-repeated section ends with the osymbols (excluding epsilons) "whatsoever on vacation up", and then the repeated part looks like "#1 ( ) #16 ( ) #0 ( ) R_B ( ) IY1_I ( ) L_E (real)". The word "real" is spelled "R_B IY1_I L_E #1" in L_disambig.

Both LMs contain a bigram for "vacation up" and a trigram "vacation up there". "up real" is a bigram in both, with 3-grams "up real quick" and "up real quickly". "up real" is also the tail of a few other 3-grams, but these are also the same in both models (up to their weights).

It looks like I do not understand what I should make of this debug data in the end :(

-kkm
From: Daniel P. <dp...@gm...> - 2015-06-16 01:38:18

What you want to do is possible in principle, and Kaldi has no objections in principle to a G.fst that is not an acceptor, but you have to be careful to ensure that the resulting FST is determinizable. You should probably look for hbka.pdf online and read it to get some idea of the issues involved.

Basically, it's not allowed to have two different states with two different loops where the same input-symbol sequence is on both of the loops, with different cost [this is called the "twins property"]. Also it needs to be functional, meaning that any given input-label sequence generates only one output-label sequence. There are certain additional restrictions required to make sure that LG is determinizable; that is why we insert "disambiguation symbols" in the lexicon.

Having a different symbol table for the olabels is probably not a good solution, as the scripts do assume that words.txt is good for both sides of the FST; better to have a single symbol table that covers both sides. There is no assumption that L.fst covers all the words in words.txt.

Dan
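The twins-property condition Dan describes can be illustrated with a tiny weighted acceptor in OpenFst's AT&T text format (a made-up sketch, not anything from this thread). Two distinct states each carry a self-loop over the same input symbol but with different weights, and both are reachable from the start state by the same input symbol, so determinization would have to track a weight difference that grows without bound along the loops:

```
0 1 x x 0.0
0 2 x x 0.0
1 1 a a 1.0
2 2 a a 2.0
1 0.0
2 0.0
```

Compiled with fstcompile and fed to fstdeterminize, a machine like this would be expected to keep expanding states rather than terminate, which is the same symptom seen here with fstdeterminizestar.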
From: Jan T. <jt...@gm...> - 2015-06-16 01:22:01

Does your LM or text contain characters that one could call "special"? The arpa2fst is probably one of the more horrible parts of Kaldi. I remember I had similar issues a year ago; it turned out there was an issue with some character being "special" for arpa2fst. Dan fixed it, but I don't recall how, or whether the same problem might appear again (with a different character).

y.
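One quick way to look for such "special" characters is to scan the ARPA file for non-ASCII bytes. This is a sketch: "lm.arpa" is a placeholder filename (the fragment below is fabricated for the demo), and the -P flag assumes GNU grep.

```shell
# Stand-in ARPA fragment with one non-ASCII byte on line 2 (the é in "café").
printf '%s\n' '\1-grams:' '-1.0 café' '-1.0 real' > lm.arpa

# List offending lines with line numbers (GNU grep; -P enables \xNN ranges).
grep -nP '[^\x00-\x7F]' lm.arpa
```

On the stand-in file this flags line 2 only; run the same grep on your real ARPA file to see whether srilm emitted anything outside plain ASCII.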
From: Daniel P. <dp...@gm...> - 2015-06-16 01:21:27

You can send a signal to that program, like kill -SIGUSR1 process-id, and it will print out some info about the symbol sequences involved. I think it is like isymbol1 (osymbol1) isymbol2 (osymbol2) and so on. Usually there is a particular word sequence that is problematic.

Dan
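The mechanism Dan describes can be demonstrated with a stand-in long-running process, since fstdeterminizestar itself may not be on PATH here; the real binary installs a SIGUSR1 handler that dumps the symbol sequences, which the toy trap below imitates.

```shell
# A stand-in long-running process with a USR1 handler, imitating
# fstdeterminizestar's debug dump.
bash -c 'trap "echo state dump requested" USR1; sleep 3' &
pid=$!
sleep 1            # give it time to install the trap
kill -USR1 "$pid"  # analogous to: kill -SIGUSR1 <pid of fstdeterminizestar>
wait "$pid"        # the handler runs and prints before the process exits
```

With the real binary you would find the PID via ps or top and send SIGUSR1 to it while it is spinning at 100% CPU.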
From: Kirill K. <kir...@sm...> - 2015-06-16 01:21:19

Sources often call the G FST an acceptor, assuming i- and o-labels are the same. I want to treat it as a transducer with o-labels encoding more information than just a word, only during the decode. (Think, for example, of a grammar tagging words in context.) I understand I am looking at two different symbol tables (instead of the single words.txt normally).

Does Kaldi support that out of the box? Do, for one, the *-latgen-* decoders actually put olabels into the lattices?

-kkm
From: Kirill K. <kir...@sm...> - 2015-06-16 01:14:31

I have a small set of sentences with repeat counts, and am generating an LM out of it. One LM is generated by a horrible local tool whose workings I have trouble tracing exactly; for this one, L*G composition takes about 20 seconds on my CPU. Another LM I just generated out of the same files with srilm 1.7.1 ngram-count. This one has been sitting in mkgraphs.sh on the L_disambig*G composition step for about 30 minutes, and is still churning; fstdeterminizestar --use-log=true is running at 100%. L_disambig.fst is the same file in both cases. It looks like the G is making it not determinizable, although I have no idea how that came to be.

Could anyone share advice on tracking down the problem? Thanks.

-kkm
From: Kirill K. <kir...@sm...> - 2015-06-16 00:47:14

> -----Original Message-----
> From: Roozbeh [mailto:roo...@ya...]
> Sent: 2015-06-15 1616
>
> steps/make_mfcc.sh: [info]: no segments file exists: assuming wav.scp
> indexed by utterance.

This is not an error, just a statement of fact.

> run.pl: job failed, log is in exp/make_mfcc/train/make_mfcc_train.1.log
> when I checked the log I saw
> bash: line 1: compute-mfcc-feats: command not found
> bash: line 1: copy-feats: command not found

Make sure you are importing path.sh in your script. All standard scripts have a statement like

  . path.sh

You checked that you have path.sh; make sure you actually use it.

-kkm
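The fix Kirill describes, sketched below: path.sh prepends the Kaldi binary directories to PATH, so a script that never sources it cannot find compute-mfcc-feats or copy-feats. The KALDI_ROOT value here is a placeholder, and the check at the end stands in for actually running a Kaldi binary.

```shell
# A minimal path.sh (placeholder KALDI_ROOT) and the prologue line that the
# failing script is missing.
cat > path.sh <<'EOF'
export KALDI_ROOT=/path/to/kaldi-trunk
export PATH=$KALDI_ROOT/src/featbin:$KALDI_ROOT/src/bin:$PATH
EOF

. ./path.sh   # <-- the statement every standard Kaldi script starts with

case ":$PATH:" in
  *featbin*) echo "featbin is on PATH" ;;
  *)         echo "featbin missing" ;;
esac
```

After sourcing, `compute-mfcc-feats` resolves through the featbin entry on PATH; without the `. ./path.sh` line you get exactly the "command not found" errors from the log.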
From: Roozbeh <roo...@ya...> - 2015-06-15 23:15:59

Hi, I am new to KALDI. I started a test with 8 utterances from one speaker, but when I run this command:

for x in train; do
  steps/make_mfcc.sh data/$x exp/make_mfcc/$x $mfccdir
  steps/compute_cmvn_stats.sh data/$x exp/make_mfcc/$x $mfccdir
done

the validation is successful, but then this error comes up:

steps/make_mfcc.sh: [info]: no segments file exists: assuming wav.scp indexed by utterance.
run.pl: job failed, log is in exp/make_mfcc/train/make_mfcc_train.1.log

When I checked the log I saw:

bash: line 1: compute-mfcc-feats: command not found
bash: line 1: copy-feats: command not found

I read in discussions that you suggested to somebody to copy the path.sh to the working directory. In my case it is there, and I added the path of the compute-mfcc-feats code separately (kaldi-trunk/src), but it didn't help. Would you please help me in resolving this issue?
From: Daniel P. <dp...@gm...> - 2015-06-15 17:53:57

Yes, it produces "forced" alignments. If it ignored the word labels, we wouldn't call it an alignment script; we'd call it a decoding script.

Dan
From: Mate A. <ele...@gm...> - 2015-06-15 16:58:57

Hi,

I've trained a classifier for the LibriSpeech corpus using the 'egs/librispeech' recipe included in Kaldi's repository, and I am looking to generate forced alignments for the utterances in the dataset.

So far, I have used 's5/steps/align_si.sh' to generate the desired alignments. Does this script generate "forced" alignments, or does it ignore the word labels on the training examples when producing alignments?
From: Xingyu Na <asr...@gm...> - 2015-06-13 02:36:19

Thank you Dan. I'll check the things you suggest. :-)
From: Kirill K. <kir...@sm...> - 2015-06-12 23:01:09

> From: David Warde-Farley [mailto:d.w...@gm...]
> Subject: [Kaldi-users] non-cluster usage of Librispeech s5 recipe?
>
> I'm trying to use the s5 recipe for LibriSpeech on a single machine
> with a single GPU. I've modified cmd.sh to use run.pl.

I ran it on a single machine; it requires a few modifications. Note that it took almost a week on a 6-core 4.1GHz overclocked i7-5930K CPU and a GeForce 980 to train on the 500-hour set.

> After about a day, I see a lot of background processes like
> gmm-latgen-faster, lattice-add-penalty, lattice-scale, etc. that have
> been launched in the background (the terminal is actually free, which
> suggests the run.sh script has terminated...). I'm not totally sure
> what's going on, or how to find out.

In librispeech/s5/run.sh, look for decode commands in subshells, like

  (
  utils/mkgraph.sh data/lang_nosp_test_tgsmall \
    exp/tri4b exp/tri4b/graph_nosp_tgsmall || exit 1;
  for test in test_clean test_other dev_clean dev_other; do
    steps/decode_fmllr.sh --nj 20 --cmd "$decode_cmd" \
    . . .
  )&

These decodes are quite slow if you run them on your machine; they are slower than the other parts of the script. In the end they accumulate, eating CPU and running out of memory. They are not essential for NN training, except possibly for the mkgraph script. The results are useful to check whether you are getting the expected WER, but really not essential. You may either disable these decode blocks completely (except the mkgraph invocations) or remove the '&' at the end to run them synchronously. NB: they will take up most of the preparation time prior to the NN training step. Dunno about your machine, but give it an extra couple of days to complete with these.

> One thing I noticed earlier is that the script was trying to spawn
> multiple GPU jobs, but this GPU is configured (by administrators) to
> permit at most one CUDA process, and so I saw "3 of 4 jobs failed"
> messages. Would these jobs have been retried?

They will not, but you can restart NN training from the last step. Modify local/online/run_nnet2_ms.sh so that steps/nnet2/train_multisplice_accel2.sh is invoked with the switches "--num-jobs-initial 1 --num-jobs-final 1" (the defaults are larger). When running local/online/run_nnet2_ms.sh, pass it "--stage 7" (this is the default) and "--train-stage N", where N is the number of the iteration you are restarting from. Even without the one-job limit, you probably won't benefit from running more than one at a time.

-kkm
From: Daniel P. <dp...@gm...> - 2015-06-12 19:01:44

BTW, a good way to debug a zombie process is to look at its parent PID (ppid) and check what that process is doing. E.g. is it stopped? If so, why? Or maybe it's busy doing something else.

Dan
From: Daniel P. <dp...@gm...> - 2015-06-12 17:59:47

This is not a Kaldi problem; it's almost certainly a problem either with your GridEngine software or equivalent, or with your machine (e.g. the Linux OOM killer might be being invoked). Check the system logs and the GridEngine logs.

Dan

On Fri, Jun 12, 2015 at 6:30 AM, Xingyu Na <asr...@gm...> wrote:
> No, the user didn't kill the script. And the terminal is alive.
> It happens rather randomly, but only when the job is submitted to a
> certain node, called "g05". The log hangs at
> =======================================
> # Running on g05
> # Started at Fri Jun 12 17:02:47 CST 2015
> # nnet-shuffle-egs --buffer-size=5000 --srand=2094 \
>     ark:exp/nnet4d_gpu/egs/egs.12.113.ark ark:- | \
>   nnet-train-simple --minibatch-size=512 --srand=2094 \
>     exp/nnet4d_gpu/2094.mdl ark:- exp/nnet4d/2095.12.mdl
> nnet-train-simple --minibatch-size=512 --srand=2094 \
>   exp/nnet4d_gpu/2094.mdl ark:- exp/nnet4d_gpu/2095.12.mdl
> nnet-shuffle-egs --buffer-size=5000 --srand=2094 \
>   ark:exp/nnet4d_gpu/egs/egs.12.113.ark ark:-
> =======================================
>
> It seems that nnet-shuffle-egs and nnet-train-simple do not cooperate
> on this specific job. Weird.....
>
> Best,
> X.
>
> On 06/12/2015 01:05 PM, Daniel Povey wrote:
>> Possibly it is in zombie status because something interrupted or
>> killed the run.pl process that had launched that process. E.g. a user
>> did ctrl-z to the top-level script, maybe.
>>
>> Dan
>>
>> On Thu, Jun 11, 2015 at 11:11 PM, Xingyu Na <asr...@gm...> wrote:
>>> Hi,
>>>
>>> A user reported this when he was using the train_pnorm_fast script.
>>> Top gave this:
>>> 60442 zhangpe+ 20 0 1193408 13112 10704 S 0.0 0.0 0:02.30 nnet-shuffle-eg
>>> 60443 zhangpe+ 20 0       0     0     0 Z 0.0 0.0 0:02.19 nnet-train-simp
>>>
>>> It remains in zombie status forever....
>>> Any idea how this goes wrong?
>>>
>>> Best,
>>> Xingyu
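Dan's suggestion, as commands. The PID 60443 is just the example from the top output in the thread; the demo below inspects the current shell instead, so it can run anywhere.

```shell
# For the zombie in the thread you would run something like:
#   ps -o pid,ppid,stat,cmd -p 60443
# and then inspect the parent. Demonstrated here on the current shell:
pid=$$
ppid=$(ps -o ppid= -p "$pid" | tr -d ' ')
ps -o pid,ppid,stat,comm -p "$pid"   # this process and its parent PID
ps -o pid,stat,comm -p "$ppid"       # the parent: stopped (T)? busy? gone?
```

A zombie (Z in the STAT column) persists only because its parent has not reaped it, so the parent's state is the real clue.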
From: Jonathan Lucuix-A. <rum...@gm...> - 2015-06-12 14:43:25
|
Hi, I have trained a classifier for the LibriSpeech corpus using the 'egs/librispeech' recipe included in Kaldi's repository, and I am looking to generate forced alignments for the utterances in the training sets. So far, I have used the 's5/steps/align_si.sh' script to generate the desired alignments. Does this script generate "forced" alignments, or does it ignore the word labels on the training examples when producing alignments? |
From: Jan T. <jt...@gm...> - 2015-06-12 13:30:34
|
No, these jobs will not be retried. It's the user's responsibility to set the number of training jobs according to the number of GPUs he/she has available for training (--num-jobs-initial and --num-jobs-final). About your observation of jobs running in the background: I'm not familiar with the librispeech recipe per se, so I can only share general experience with the recipes in kaldi. I guess that _could_ happen -- in a script, when you spawn something to run in the background (using &) and the parent script exits (no matter if with success or with failure), the background tasks will still run -- you could actually list them using "ps" issued in the terminal where the original script was executed. My feeling is that some part of the script failed, because if that happens, exit 1 is usually called. When the script runs successfully, there is usually a "wait" at the end of the script, so the script will wait until all child tasks finish. hth y. On Fri, Jun 12, 2015 at 2:06 AM, David Warde-Farley < d.w...@gm...> wrote: > Hi, > > Apologies if this has been answered in the archives, but I'm trying to > use the s5 recipe for LibriSpeech on a single machine with a single > GPU. I've modified cmd.sh to use run.pl. > > After about a day, I see a lot of background processes like > gmm-latgen-faster, lattice-add-penalty, lattice-scale, etc. that have > been launched in the background (the terminal is actually free, which > suggests the run.sh script has terminated...). I'm not totally sure > what's going on, or how to find out. > > Specifically, I'm trying to export the features used to train the > final stage neural network as well as the aligned targets. > > One thing I noticed earlier is that the script was trying to spawn > multiple GPU jobs, but this GPU is configured (by administrators) to > permit at most one CUDA process, and so I saw "3 of 4 jobs failed" > messages. Would these jobs have been retried? 
> > Thanks in advance, > > David > > > ------------------------------------------------------------------------------ > _______________________________________________ > Kaldi-users mailing list > Kal...@li... > https://lists.sourceforge.net/lists/listinfo/kaldi-users > |
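Jan's point about `&` and `wait` can be seen in a few lines of bash; this is a generic sketch, not taken from any Kaldi script:

```shell
#!/bin/bash
# Backgrounded children keep running after the function body finishes;
# `wait` is what makes the parent block until they are done (and reaped).
run_children() {
    (sleep 0.2; echo "child 1 finished") &
    (sleep 0.1; echo "child 2 finished") &
    echo "parent script body done"   # printed before either child exits
    wait                             # remove this and the function returns
                                     # while both children are still running
    echo "all children reaped"
}
```

This is why `ps` on the original terminal can still show gmm-latgen-faster, lattice-scale, etc.: some parent script exited (likely via exit 1 on an error) before reaching its `wait`.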
From: Xingyu Na <asr...@gm...> - 2015-06-12 10:30:53
|
No, the user didn't kill the script. And the terminal is alive. It happens rather randomly, but only when the job is submitted to a certain node, called "g05". The log hangs at ======================================= # Running on g05 # Started at Fri Jun 12 17:02:47 CST 2015 # nnet-shuffle-egs --buffer-size=5000 --srand=2094 ark:exp/nnet4d_gpu/egs/egs.12.113.ark ark:- | nnet-train-simple --minibatch-size=512 --srand=2094 exp/nnet4d_gpu/2094.mdl ark:- exp/nnet4d/2095.12.mdl nnet-train-simple --minibatch-size=512 --srand=2094 exp/nnet4d_gpu/2094.mdl ark:- exp/nnet4d_gpu/2095.12.mdl nnet-shuffle-egs --buffer-size=5000 --srand=2094 ark:exp/nnet4d_gpu/egs/egs.12.113.ark ark:- ======================================= It seems that nnet-shuffle-egs and nnet-train-simple do not cooperate on this specific job. Weird..... Best, X. On 06/12/2015 01:05 PM, Daniel Povey wrote: > Possibly it is in zombie status because something interrupted or > killed the run.pl process that had launched that process. E.g. a user > did ctrl-z to the to-level script, maybe. > > Dan > > > On Thu, Jun 11, 2015 at 11:11 PM, Xingyu Na <asr...@gm...> wrote: >> Hi, >> >> A user report this when he was using the train_pnorm_fast script. Top >> gave this: >> 60442 zhangpe+ 20 0 1193408 13112 10704 S 0.0 0.0 0:02.30 >> nnet-shuffle-eg >> 60443 zhangpe+ 20 0 0 0 0 Z 0.0 0.0 0:02.19 >> nnet-train-simp >> >> It remains in zombie status forever.... >> Any idea how this goes wrong? >> >> Best, >> Xingyu >> >> ------------------------------------------------------------------------------ >> _______________________________________________ >> Kaldi-users mailing list >> Kal...@li... >> https://lists.sourceforge.net/lists/listinfo/kaldi-users |
From: David Warde-F. <d.w...@gm...> - 2015-06-12 06:06:26
|
Hi, Apologies if this has been answered in the archives, but I'm trying to use the s5 recipe for LibriSpeech on a single machine with a single GPU. I've modified cmd.sh to use run.pl. After about a day, I see a lot of background processes like gmm-latgen-faster, lattice-add-penalty, lattice-scale, etc. that have been launched in the background (the terminal is actually free, which suggests the run.sh script has terminated...). I'm not totally sure what's going on, or how to find out. Specifically, I'm trying to export the features used to train the final stage neural network as well as the aligned targets. One thing I noticed earlier is that the script was trying to spawn multiple GPU jobs, but this GPU is configured (by administrators) to permit at most one CUDA process, and so I saw "3 of 4 jobs failed" messages. Would these jobs have been retried? Thanks in advance, David |
From: Daniel P. <dp...@gm...> - 2015-06-12 05:05:50
|
Possibly it is in zombie status because something interrupted or killed the run.pl process that had launched that process. E.g. a user did ctrl-z to the top-level script, maybe. Dan On Thu, Jun 11, 2015 at 11:11 PM, Xingyu Na <asr...@gm...> wrote: > Hi, > > A user report this when he was using the train_pnorm_fast script. Top > gave this: > 60442 zhangpe+ 20 0 1193408 13112 10704 S 0.0 0.0 0:02.30 > nnet-shuffle-eg > 60443 zhangpe+ 20 0 0 0 0 Z 0.0 0.0 0:02.19 > nnet-train-simp > > It remains in zombie status forever.... > Any idea how this goes wrong? > > Best, > Xingyu > > ------------------------------------------------------------------------------ > _______________________________________________ > Kaldi-users mailing list > Kal...@li... > https://lists.sourceforge.net/lists/listinfo/kaldi-users |
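A zombie like the one in the `top` listing can be reproduced deterministically. The sketch below assumes Linux with the procps `ps`; the trick of `exec`-ing into a non-reaping `sleep` is only for demonstration:

```shell
#!/bin/bash
# A process becomes a zombie ("Z" state) when it exits but its parent
# never calls wait() on it. Here bash forks a short-lived child, then
# execs into `sleep`, which never reaps; the dead child lingers as a
# zombie until its parent exits.
bash -c 'sleep 0.1 & exec sleep 2' &
parent_pid=$!

sleep 0.5   # let the child exit; its parent (now `sleep 2`) cannot reap it

# The PPID column is how you would trace a stuck nnet-train-simple
# zombie back to the run.pl that failed to wait on it.
zombies=$(ps -o pid=,ppid=,stat=,comm= --ppid "$parent_pid")
echo "$zombies"

wait "$parent_pid"   # once the parent exits, init reaps the zombie
```

In the mailing-list case the parent (run.pl) was itself killed, so nothing is left to reap the child and it stays in Z state until a reboot or until init adopts and reaps it.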
From: Xingyu Na <asr...@gm...> - 2015-06-12 03:12:00
|
Hi, A user reported this while using the train_pnorm_fast script. Top gave this: 60442 zhangpe+ 20 0 1193408 13112 10704 S 0.0 0.0 0:02.30 nnet-shuffle-eg 60443 zhangpe+ 20 0 0 0 0 Z 0.0 0.0 0:02.19 nnet-train-simp It remains in zombie status forever.... Any idea how this goes wrong? Best, Xingyu |
From: Jan T. <jt...@gm...> - 2015-06-11 16:48:40
|
Congratulations, Sarah! y. On Thu, Jun 11, 2015 at 11:05 AM, Sarah Flora S. Juan < sar...@gm...> wrote: > Dear Kaldi users, > > We would like to inform you about our recently published data for ASR > available on github: https://github.com/sarahjuan/iban . The data > contains speech in Iban language, a language that is spoken in Borneo. We > have used the data in our study on under-resourced language for ASR and we > have built our systems using Kaldi. Thanks to the available recipes and > active forum, we have learnt several techniques that were very useful for > our research. > > Feel free to download our data and Kaldi scripts that were used to build > ASR. > > > Best regards, > > > > Sarah (sjs...@fi...) & Laurent (lau...@im...) > > > ------------------------------------------------------------------------------ > > _______________________________________________ > Kaldi-users mailing list > Kal...@li... > https://lists.sourceforge.net/lists/listinfo/kaldi-users > > |