From: Daniel P. <dp...@gm...> - 2015-06-16 01:57:51
Look into the "backoff disambiguation symbol", normally called #0. The reason why it is needed should be explained in the hbka.pdf paper.

Dan

On Mon, Jun 15, 2015 at 9:54 PM, Kirill Katsnelson <kir...@sm...> wrote:
> Thank you! The output consists of some sequences as you described, quickly falling into a short, endlessly repeated loop.
>
> The non-repeated section ends with the osymbols (excluding epsilons) "whatsoever on vacation up", and then the repeated part looks like " #1 ( ) #16 ( ) #0 ( ) R_B ( ) IY1_I ( ) L_E (real)". The word "real" is spelled "R_B IY1_I L_E #1" in L_disambig.
>
> Both LMs contain a bigram for "vacation up" and a trigram "vacation up there". "up real" is a bigram in both, with 3-grams "up real quick" and "up real quickly". "up real" is also the tail of a few other 3-grams, but these are also the same in both models (up to their weights).
>
> It looks like I do not understand what I should make of this debug data in the end :(
>
> -kkm
>
>> -----Original Message-----
>> From: Daniel Povey [mailto:dp...@gm...]
>> Sent: 2015-06-15 1821
>> To: Kirill Katsnelson
>> Cc: kal...@li...
>> Subject: Re: [Kaldi-users] fstdeterminizestar (L*G) never completes
>>
>> > I have a small set of sentences with repeat counts, and am generating an LM out of it. One LM is generated by a horrible local tool; I have trouble tracing exactly how. For this one, L*G composition takes about 20 seconds on my CPU. Another LM I just generated out of the same files with srilm 1.7.1 ngram-count. This one has been sitting in mkgraph.sh on the L_disambig*G composition step for about 30 minutes, and is still churning. fstdeterminizestar --use-log=true is running at 100%. L_disambig.fst is the same file in both cases. It looks like G is making it non-determinizable, although I have no idea how it came to be.
>> >
>> > Could anyone share advice on tracking down the problem? Thanks.
>>
>> You can send a signal to that program like kill -SIGUSR1 process-id
>> and it will print out some info about the symbol sequences involved, I
>> think it is like
>> isymbol1 (osymbol1) isymbol2 (osymbol2) and so on.
>> Usually there is a particular word sequence that is problematic.
>> Dan
>>
>> > -kkm
>> >
>> > ----------------------------------------------------------------------
>> > _______________________________________________
>> > Kaldi-users mailing list
>> > Kal...@li...
>> > https://lists.sourceforge.net/lists/listinfo/kaldi-users
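A toy illustration of the ambiguity that the backoff disambiguation symbol #0 is meant to remove. In a backoff LM, if G's backoff arcs carried epsilon on the input side, the same word after a given history could be accepted by two different paths (an explicit n-gram arc, or a backoff arc followed by a lower-order arc) with different weights. The sketch below assumes a hypothetical fragment of G for the history "up" with made-up costs; it only enumerates paths in plain Python and does not use OpenFst or Kaldi, so it is an illustration of the idea, not of what fstdeterminizestar actually does.

```python
# Toy backoff fragment of G (hypothetical; costs are made-up negated log-probs).
# Each arc is (source_state, input_label, cost, destination_state).
EPS = "<eps>"


def make_g(backoff_label):
    """Build the toy fragment, using backoff_label on the backoff arc."""
    return [
        ("h=up", "real", 1.2, "h=real"),              # explicit bigram "up real"
        ("h=up", backoff_label, 0.7, "h=<backoff>"),  # backoff arc out of history "up"
        ("h=<backoff>", "real", 3.5, "h=real"),       # unigram "real"
    ]


def paths(arcs, start, final):
    """Enumerate (label sequence, total cost) for every path start -> final.

    Epsilon labels are dropped from the sequence. The toy fragment has no
    cycles, so the recursion terminates.
    """
    out = []

    def walk(state, labels, cost):
        if state == final:
            out.append((tuple(labels), round(cost, 6)))
            return
        for src, label, c, dst in arcs:
            if src == state:
                new_labels = labels if label == EPS else labels + [label]
                walk(dst, new_labels, cost + c)

    walk(start, [], 0.0)
    return out


def ambiguous_inputs(arcs, start, final):
    """Return label sequences accepted by more than one path, with their costs."""
    seen = {}
    for labels, cost in paths(arcs, start, final):
        seen.setdefault(labels, []).append(cost)
    return {k: v for k, v in seen.items() if len(v) > 1}


if __name__ == "__main__":
    print(ambiguous_inputs(make_g(EPS), "h=up", "h=real"))
    print(ambiguous_inputs(make_g("#0"), "h=up", "h=real"))
```

With epsilon on the backoff arc, the single input sequence `("real",)` is reported with two different costs (1.2 and 4.2); with `#0` on that arc, the two paths have distinct label sequences (`("real",)` and `("#0", "real")`) and nothing is reported.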