From: Daniel P. <dp...@gm...> - 2015-06-16 02:43:52
I think the confusion is probably between two loops with "real" on them in G.fst: one loop where you always take the bigram probability, and one where you always take the unigram probability. Or maybe a similar confusion between a loop where you use the trigram "real real real" and the bigram "real real". Those loops are expected to exist.

Probably the issue is that something happened at the start of the sequence which caused the FST to be confused about which of those two states it was in. If you have any empty words (words with empty pronunciation) in your lexicon this could possibly happen, as it would be confused between taking a normal word, then the backoff symbol, vs. taking a normal word, then the empty word, then the backoff symbol. I think the current Kaldi graph-creation scripts check for empty words in the lexicon, for this reason.

Dan

> The sequence R_B ( ) IY1_I ( ) L_E (real) #1 ( ) #16 ( ) #0 ( )
> generally almost makes sense, given that #16 is the last one in the
> table, the silence disambiguation symbol. (Not sure why "real" is
> emitted at L_E--I would rather expect it to be emitted at #1.) What I
> do not understand is what exactly the debug trace represents, and what
> I should make out of it. It is a path through the FST graph, but I do
> not understand what this path is exactly, and what this endless walk
> of this loop means.
>
> -kkm
>
>> -----Original Message-----
>> From: Daniel Povey [mailto:dp...@gm...]
>> Sent: 2015-06-15 1858
>> To: Kirill Katsnelson
>> Cc: kal...@li...
>> Subject: Re: [Kaldi-users] fstdeterminizestar (L*G) never completes
>>
>> Look into the "backoff disambiguation symbol", normally called #0.
>> The reason why it is needed should be explained in the hbka.pdf paper.
>> Dan
>>
>>
>> On Mon, Jun 15, 2015 at 9:54 PM, Kirill Katsnelson
>> <kir...@sm...> wrote:
>> > Thank you! The output consists of some sequences as you described,
>> > quickly falling into a short ever repeated loop.
>> >
>> > The non-repeated section ends up with osymbols (excluding epsilons)
>> > "whatsoever on vacation up", and then the repeated part looks like
>> > " #1 ( ) #16 ( ) #0 ( ) R_B ( ) IY1_I ( ) L_E (real)". The word
>> > "real" is spelled "R_B IY1_I L_E #1" in L_disambig.
>> >
>> > Both LMs contain a bigram for "vacation up" and a trigram "vacation
>> > up there". "up real" is a bigram in both, with 3-grams "up real
>> > quick" and "up real quickly". "up real" is also a tail of a few
>> > other 3-grams, but these are also the same in both models (up to
>> > their weights).
>> >
>> > It looks like I do not understand what I should make in the end out
>> > of this debug data :(
>> >
>> > -kkm
>> >
>> >> -----Original Message-----
>> >> From: Daniel Povey [mailto:dp...@gm...]
>> >> Sent: 2015-06-15 1821
>> >> To: Kirill Katsnelson
>> >> Cc: kal...@li...
>> >> Subject: Re: [Kaldi-users] fstdeterminizestar (L*G) never completes
>> >>
>> >> > I have a small set of sentences with repeat counts, and I am
>> >> > generating an LM out of it. One is generated by a horrible local
>> >> > tool that I have trouble tracing exactly. For this one L*G
>> >> > composition takes about 20 seconds on my CPU. Another LM I just
>> >> > generated out of the same files with srilm 1.7.1 ngram-count.
>> >> > This one has been sitting in mkgraphs.sh on the L_disambig*G
>> >> > composition step for about 30 minutes, and is still churning.
>> >> > fstdeterminizestar --use-log=true is running at 100%.
>> >> > L_disambig.fst is the same file in both cases. Looks like the G
>> >> > is making it not determinizable, although I have no idea how it
>> >> > came to be.
>> >> >
>> >> > Could anyone share advice on tracking down the problem? Thanks.
>> >>
>> >> You can send a signal to that program like kill -SIGUSR1 process-id
>> >> and it will print out some info about the symbol sequences involved,
>> >> I think it is like
>> >> isymbol1 (osymbol1) isymbol2 (osymbol2) and so on.
>> >> Usually there is a particular word sequence that is problematic.
>> >> Dan
>> >>
>> >> >
>> >> > -kkm
>> >> >
>> >> > ------------------------------------------------------------------------------
>> >> > _______________________________________________
>> >> > Kaldi-users mailing list
>> >> > Kal...@li...
>> >> > https://lists.sourceforge.net/lists/listinfo/kaldi-users
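
The top reply suggests that words with an empty pronunciation in the lexicon could make L*G non-determinizable. A minimal sketch of how one might look for such entries, assuming a plain lexicon where each line is "word phone1 phone2 ..." separated by whitespace and with no probability column. The function name and the sample entries are hypothetical and are not part of Kaldi.

```python
def find_empty_pronunciations(lexicon_lines):
    """Return the words whose lexicon entry lists no phones after the word.

    Assumed format (not taken from the thread): one entry per line,
    'word phone1 phone2 ...', whitespace separated, no probability column.
    """
    empty = []
    for line in lexicon_lines:
        fields = line.split()
        if not fields:
            continue  # blank line, nothing to check
        if len(fields) == 1:
            # Only the word itself is present, so its pronunciation is empty.
            empty.append(fields[0])
    return empty


if __name__ == "__main__":
    # Hypothetical sample entries, for illustration only.
    sample = [
        "real R IY1 L",
        "up AH1 P",
        "uh",
        "",
        "quick K W IH1 K",
    ]
    print(find_empty_pronunciations(sample))
```

To run it on a real file, pass the open file object, for example find_empty_pronunciations(open(path)), where path points at your own lexicon.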
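
The debug output discussed in the thread is described as a sequence of "isymbol (osymbol)" pairs that quickly falls into a repeated loop. A rough sketch of how one might split such a trace into pairs and pull out the shortest repeating tail, so that the words on the loop stand out. The exact output format of fstdeterminizestar is an assumption here, based only on the description in this thread, and both function names are hypothetical.

```python
import re

# Assumed trace format: "isymbol (osymbol)", with "( )" for an empty output.
PAIR_RE = re.compile(r"(\S+)\s*\(\s*([^()\s]*)\s*\)")


def parse_trace(text):
    """Split a trace string into (isymbol, osymbol) tuples."""
    return PAIR_RE.findall(text)


def shortest_tail_cycle(pairs, min_repeats=2):
    """Return the shortest block that ends the sequence min_repeats times in a row.

    Returns an empty list if no such repeating block is found.
    """
    n = len(pairs)
    for period in range(1, n // min_repeats + 1):
        tail = pairs[n - period:]
        # Compare each of the last min_repeats blocks of this length to the tail.
        if all(pairs[n - (k + 1) * period: n - k * period] == tail
               for k in range(min_repeats)):
            return tail
    return []


if __name__ == "__main__":
    # Loop copied from the thread; the two-pair prefix is a made-up placeholder.
    loop = "#1 ( ) #16 ( ) #0 ( ) R_B ( ) IY1_I ( ) L_E (real) "
    trace = "AH1_B ( ) P_E (up) " + loop * 3
    cycle = shortest_tail_cycle(parse_trace(trace))
    print("input symbols on the loop:", [i for i, _ in cycle])
    print("words emitted on the loop:", [o for _, o in cycle if o])
```

The words emitted on the loop are the ones to look up in the language model, together with the word sequence that precedes the loop.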