From: Daniel P. <dp...@gm...> - 2015-06-16 01:21:27
> I have a small set of sentences with repeat counts, and I am generating an LM out of it. One LM is generated by a horrible local tool whose workings I have trouble tracing exactly. For this one, L*G composition takes about 20 seconds on my CPU. Another LM I just generated out of the same files with srilm 1.7.1 ngram-count. This one has been sitting in mkgraphs.sh on the L_disambig*G composition step for about 30 minutes and is still churning; fstdeterminizestar --use-log=true is running at 100%. L_disambig.fst is the same file in both cases. It looks like the G is making it non-determinizable, although I have no idea how it came to be.
>
> Could anyone share advice on tracking down the problem? Thanks.

You can send a signal to that program, like
  kill -SIGUSR1 process-id
and it will print out some info about the symbol sequences involved. I think it is like
  isymbol1 (osymbol1) isymbol2 (osymbol2)
and so on. Usually there is a particular word sequence that is problematic.

Dan

> -kkm
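As a rough sketch of the signal suggestion above: the snippet below does not run fstdeterminizestar. It uses a stand-in background process (hypothetical, for illustration only) that traps SIGUSR1 and writes a line to a temporary log, just to show how the signal reaches a running process by PID. The message text and the log file are assumptions of this sketch, and the real program's debug output is as described in the reply, not reproduced here.

```shell
#!/bin/sh
# Stand-in for a long-running process: it traps SIGUSR1 and reports it.
# Illustration only; the real target would be the running fstdeterminizestar.
log=$(mktemp)
(
  trap 'echo "got SIGUSR1" >> "$log"' USR1
  # Loop in short sleeps so the trap handler runs promptly after a signal.
  i=0
  while [ "$i" -lt 50 ]; do sleep 0.1; i=$((i + 1)); done
) &
pid=$!

sleep 0.5            # give the subshell time to install its trap
kill -USR1 "$pid"    # bash also accepts the spelling: kill -SIGUSR1 "$pid"
sleep 0.5            # give the handler time to run

result=$(cat "$log")
echo "$result"

# Clean up the stand-in process and the temporary log.
kill "$pid" 2>/dev/null || true
wait "$pid" 2>/dev/null || true
rm -f "$log"
```

For the real case, one would look up the PID of the stuck process first, for example with `pgrep -f fstdeterminizestar` (assuming pgrep is available and only one such process is running), and pass that PID to `kill -SIGUSR1`. The process keeps running after the signal; its debug output goes wherever its stderr is directed.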