From: Xavier A. <xan...@gm...> - 2015-01-05 20:10:21
Hi,

I am trying to perform phonetic decoding in Kaldi, and I would like to obtain a final CTM file with a time-aligned 1-best phone sequence for my input audio. I must be missing something: the decoded phones look good, but their timings are not accurate at all.

Here is what I am doing:

1) I create a phone bigram LM with utils/make_phone_bigram_lang.sh
2) I combine the LM and the acoustic models into a recognition graph with utils/mkgraph.sh
3) I decode the input audio with steps/decode_si.sh
4) I obtain the 1-best CTM using the following command:

    lattice-align-phones --output-error-lats=true $hmm/final.mdl "ark:gunzip -c $decodedir/lat.*.gz |" ark:- | \
      lattice-to-ctm-conf --decode-mbr=true --acoustic-scale=$acwt ark:- - | \
      utils/int2sym.pl -f 5 $graph_or_lang/words.txt > $odir/$name.ctm || exit 1;

Note that when I use the same acoustic models for word decoding, I get very good word start times. In that case I use lattice-align-words in step 4 instead. Could this be the problem?

Thanks,
X. Anguera
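
P.S. In case it helps to quantify "not accurate", here is a rough sketch of how the resulting CTM could be sanity-checked. It is not a Kaldi tool: check_ctm is a hypothetical helper name, the 5 ms tolerance is arbitrary, and it assumes that consecutive phones within an utterance should abut (end of one equals start of the next), which may not hold exactly for every decoding setup. It relies only on the standard CTM column order: utterance id, channel, start, duration, label, optional confidence.

```shell
# Hypothetical helper (not part of Kaldi): report gaps or overlaps between
# consecutive entries of the same utterance in a CTM file.
# CTM columns: utt-id channel start duration label [confidence]
check_ctm() {
  # Order entries by utterance, then by numeric start time.
  sort -k1,1 -k3,3n "$1" | awk -v tol=0.005 '
    $1 == prev_utt {
      diff = $3 - prev_end            # > 0 is a gap, < 0 is an overlap
      if (diff > tol || diff < -tol)
        printf "%s: %s ends at %.2f but %s starts at %.2f\n",
               $1, prev_label, prev_end, $5, $3
    }
    # Remember where the current entry ends for the next comparison.
    { prev_utt = $1; prev_end = $3 + $4; prev_label = $5 }
  '
}
```

It would be called as `check_ctm $odir/$name.ctm`; an empty output means every phone starts where the previous one ends, within the tolerance.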