From: Xavier A. <xan...@gm...> - 2015-01-05 23:42:05
Thanks Dan, it worked perfectly!

X.

On Mon, Jan 5, 2015 at 9:21 PM, Daniel Povey <dp...@gm...> wrote:
> Your whole pipeline is based on using the words in the lattices, not
> the phones. In your case the words *are* the phones, because you're
> using a phone bigram LM. So you need to do lattice-align-words, not
> lattice-align-phones. The confidence algorithm only works on words so
> you need to use words.
> Alternatively, if you don't need the confidences, a more efficient way
> to do it without lattice-align-words is to simply do lattice-1best |
> nbest-to-linear [only keeping the alignment output] | ali-to-phones
> (--write-lengths=true). You'll have to write a script to convert the
> output of ali-to-phones to ctm format.
>
> Wei, if you have time, could you please work on adding a boolean
> option --ctm-output to the program ali-to-phones (and an option
> --frame-shift, default 0.01, to control the times of the ctm output)?
> The confidences can just be 1. This issue seems to come up
> repeatedly.
>
> Dan
>
> On Mon, Jan 5, 2015 at 12:10 PM, Xavier Anguera <xan...@gm...> wrote:
> > Hi,
> > I am trying to perform phonetic decoding in Kaldi where I would like to
> > obtain a final ctm file with a time-aligned 1-best phone sequence given my
> > input audio. I must be missing something, as the decoded phones look good
> > but their timings are not accurate at all.
> > Here is what I am doing:
> >
> > 1) I create a phone bigram LM with utils/make_phone_bigram_lang.sh
> > 2) I combine LM and acoustic models into a recognition graph with
> > utils/mkgraph.sh
> > 3) I perform the decoding of the input audio with steps/decode_si.sh
> > 4) Obtain the 1-best CTM using the following command:
> > lattice-align-phones --output-error-lats=true $hmm/final.mdl "ark:gunzip -c $decodedir/lat.*.gz |" ark:- | \
> > lattice-to-ctm-conf --decode-mbr=true --acoustic-scale=$acwt ark:- - | \
> > utils/int2sym.pl -f 5 $graph_or_lang/words.txt > $odir/$name.ctm || exit 1;
> >
> > Note that when using the same acoustic models for word decoding I get very
> > good word-starting times. In this case I am using, in step 4,
> > lattice-align-words instead, could this be the problem?
> >
> > Thanks,
> >
> > X. Anguera
> >
> > _______________________________________________
> > Kaldi-developers mailing list
> > Kal...@li...
> > https://lists.sourceforge.net/lists/listinfo/kaldi-developers
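
For anyone who takes the second route Dan describes (ali-to-phones with --write-lengths=true), here is a minimal sketch of the conversion script he mentions. It is not part of Kaldi. It assumes the text output has one utterance per line in the form "utt-id phone len ; phone len ; ...", with lengths in frames, and that the CTM columns are utterance, channel, start, duration, symbol, confidence. The function name pairs_to_ctm, the id2phone mapping, and the fixed channel "1" are made up for illustration; the 0.01 frame shift and the confidence of 1 follow Dan's suggestion above.

```python
import sys


def pairs_to_ctm(line, frame_shift=0.01, id2phone=None, channel="1"):
    """Turn one 'utt p1 n1 ; p2 n2 ; ...' line into a list of CTM rows.

    The input layout is an assumption about the text output of
    ali-to-phones --write-lengths=true; check it against your own files.
    """
    utt, _, rest = line.strip().partition(" ")
    rows = []
    start_frame = 0  # accumulate in frames to avoid floating-point drift
    for chunk in rest.split(";"):
        fields = chunk.split()
        if not fields:
            continue
        phone_id, n_frames = int(fields[0]), int(fields[1])
        # Map the integer id to a symbol if a table was given, else keep the id.
        if id2phone:
            phone = id2phone.get(phone_id, str(phone_id))
        else:
            phone = str(phone_id)
        rows.append("%s %s %.2f %.2f %s 1.00" % (
            utt, channel,
            start_frame * frame_shift,  # start time in seconds
            n_frames * frame_shift,     # duration in seconds
            phone))
        start_frame += n_frames
    return rows


if __name__ == "__main__":
    # Read phone/length lines on stdin, write CTM rows on stdout.
    for input_line in sys.stdin:
        if input_line.strip():
            print("\n".join(pairs_to_ctm(input_line)))
```

If the symbols are left as integer ids, the result can still be piped through utils/int2sym.pl -f 5 as in the original command, since the symbol is the fifth field.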