From: Xavier A. <xan...@gm...> - 2015-01-05 20:10:21
Hi,
I am trying to perform phonetic decoding in Kaldi, and I would like to
obtain a final CTM file with a time-aligned 1-best phone sequence for my
input audio. I must be missing something: the decoded phones look good,
but their timings are not accurate at all. Here is what I am doing:
1) I create a phone bigram LM with utils/make_phone_bigram_lang.sh
2) I combine LM and acoustic models into a recognition graph with utils/mkgraph.sh
3) I perform the decoding of the input audio with steps/decode_si.sh
4) I obtain the 1-best CTM with the following command:
lattice-align-phones --output-error-lats=true $hmm/final.mdl \
    "ark:gunzip -c $decodedir/lat.*.gz |" ark:- | \
  lattice-to-ctm-conf --decode-mbr=true --acoustic-scale=$acwt ark:- - | \
  utils/int2sym.pl -f 5 $graph_or_lang/words.txt > $odir/$name.ctm || exit 1;
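To see what "not accurate" looks like in the output, you can scan the CTM for overlapping segments, gaps and zero-length phones. The helper below is a minimal sketch, not part of Kaldi: the function name check_ctm is hypothetical, and it assumes the standard CTM column layout (utterance, channel, start, duration, symbol, confidence) with each utterance's lines sorted by start time. Gaps can be legitimate where nothing was output for a stretch of audio, so it only reports them and does not treat them as errors.

```shell
# check_ctm FILE: hypothetical helper, not a Kaldi tool.
# Assumes CTM columns: utt channel start duration symbol [confidence],
# with lines of the same utterance sorted by start time.
check_ctm() {
  awk '
    {
      utt = $1; start = $3 + 0; dur = $4 + 0
      # Report segments whose duration is zero or negative.
      if (dur <= 0)
        printf "%s: %s at %.2f: non-positive duration %.2f\n", utt, $5, start, dur
      # Within an utterance, report any gap (>0) or overlap (<0) larger
      # than half a 10 ms frame between this segment and the previous one.
      if (utt == prev_utt) {
        gap = start - prev_end
        if (gap < -0.005 || gap > 0.005)
          printf "%s: %s at %.2f: gap of %.2f s after previous segment\n", utt, $5, start, gap
      }
      prev_utt = utt; prev_end = start + dur
    }' "$1"
}
```

For example, `check_ctm $odir/$name.ctm | head` lists the first suspicious segments; a negative gap means the segment overlaps the one before it.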
Note that when I use the same acoustic models for word decoding, I get very
good word start times. In that case I use lattice-align-words in step 4
instead; could this be the problem?
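Since the word-level start times are good, one way to quantify the phone-level problem is to compare each word start in the word CTM with the nearest phone start in the phone CTM for the same utterance. This is a minimal sketch under stated assumptions: compare_starts is a hypothetical helper (not a Kaldi tool), both inputs are assumed to be standard CTMs decoded from the same audio, and it uses a plain linear search, which is fine for utterance-length files.

```shell
# compare_starts PHONE_CTM WORD_CTM: hypothetical helper, not a Kaldi tool.
# For every word in WORD_CTM, prints: utt word word_start offset, where
# offset is (nearest phone start - word start) in seconds, taken from the
# same utterance in PHONE_CTM. Assumes PHONE_CTM is non-empty.
compare_starts() {
  awk '
    # First file: collect phone start times per utterance.
    NR == FNR { n[$1]++; start[$1, n[$1]] = $3 + 0; next }
    # Second file: find the closest phone start for each word start.
    {
      utt = $1; w = $3 + 0
      if (!(utt in n)) { printf "%s %s %.2f no-phones\n", utt, $5, w; next }
      best = start[utt, 1] - w
      for (i = 2; i <= n[utt]; i++) {
        d = start[utt, i] - w
        if ((d < 0 ? -d : d) < (best < 0 ? -best : best)) best = d
      }
      printf "%s %s %.2f %.2f\n", utt, $5, w, best
    }' "$1" "$2"
}
```

Consistently large offsets would point at the phone-level pipeline rather than the acoustic models, since the word decode with the same models lines up well.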
Thanks,
X. Anguera