[Kaldi-developers] Phonetic decoding

Hi,
I am trying to perform phonetic decoding in Kaldi and would like to obtain
a final CTM file with a time-aligned 1-best phone sequence for my input
audio. I must be missing something: the decoded phones look good, but
their timings are not accurate at all. Here is what I am doing:

1) I create a phone bigram LM with utils/make_phone_bigram_lang.sh
2) I combine the LM and the acoustic models into a recognition graph
with utils/mkgraph.sh
3) I decode the input audio with steps/decode_si.sh
4) I obtain the 1-best CTM using the following command:
    lattice-align-phones --output-error-lats=true $hmm/final.mdl \
      "ark:gunzip -c $decodedir/lat.*.gz |" ark:- | \
      lattice-to-ctm-conf --decode-mbr=true --acoustic-scale=$acwt ark:- - | \
      utils/int2sym.pl -f 5 $graph_or_lang/words.txt > $odir/$name.ctm || exit 1;
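
To make "not accurate" a bit more concrete, here is a minimal sketch of a sanity check on the resulting CTM. It assumes each line has the layout "utt channel start duration symbol [confidence]" and flags gaps or overlaps between consecutive entries of the same utterance. The function name check_ctm and the 5 ms tolerance are my own choices for illustration; this is not a Kaldi tool.

```shell
#!/bin/sh
# check_ctm: hypothetical helper, not part of Kaldi.
# Assumed CTM layout: utt channel start duration symbol [confidence]
check_ctm() {
  # Sort by utterance id, then numerically by start time.
  LC_ALL=C sort -k1,1 -k3,3n "$1" | awk -v tol=0.005 '
    $1 == utt {
      diff = $3 - end   # positive: gap, negative: overlap
      if (diff > tol)
        printf "%s gap %.2f before %s at %.2f\n", utt, diff, $5, $3
      if (diff < -tol)
        printf "%s overlap %.2f before %s at %.2f\n", utt, -diff, $5, $3
    }
    # Remember the utterance and where the current entry ends.
    { utt = $1; end = $3 + $4 }'
}
```

It would be called as check_ctm $odir/$name.ctm. A gap is not necessarily an error, but many overlaps or large gaps would suggest that the times do not correspond to the decoded phone boundaries.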

Note that when I use the same acoustic models for word decoding I get very
good word start times. In that case I use lattice-align-words in step 4
instead of lattice-align-phones. Could this be the problem?

Thanks,

X. Anguera