Re: [Kaldi-developers] Phonetic decoding

Thanks Dan,
it worked perfectly!

X.

On Mon, Jan 5, 2015 at 9:21 PM, Daniel Povey <dp...@gm...> wrote:

> Your whole pipeline is based on using the words in the lattices, not
> the phones.  In your case the words *are* the phones, because you're
> using a phone bigram LM.  So you need to do lattice-align-words, not
> lattice-align-phones.  The confidence algorithm only works on words so
> you need to use words.
> Alternatively, if you don't need the confidences, a more efficient way
> to do it without lattice-align-words is to simply do lattice-1best |
> nbest-to-linear [only keeping the alignment output] | ali-to-phones
> (--write-lengths=true).  You'll have to write a script to convert the
> output of ali-to-phones to ctm format.
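
[Editor's sketch] The conversion script mentioned above could look roughly like the following. It is a minimal sketch, not part of Kaldi. It assumes that ali-to-phones --write-lengths=true, written in text mode, emits one line per utterance of the form "utt-id phone len ; phone len ; ...", with lengths in frames. The function name pairs_to_ctm, the fixed channel "1", and the fixed confidence "1" are choices made for this illustration.

```shell
#!/bin/sh
# Hypothetical helper (not a Kaldi tool): turn "utt p1 n1 ; p2 n2 ; ..." lines
# on stdin into CTM lines "utt channel start duration phone confidence".
# $1 is the frame shift in seconds (default 0.01).
pairs_to_ctm() {
  awk -v frame_shift="${1:-0.01}" '{
    utt = $1
    t = 0                                # frames consumed so far in this utterance
    for (i = 2; i + 1 <= NF; i += 3) {   # fields: phone, length, then a ";" separator
      printf "%s 1 %.3f %.3f %s 1\n", utt, t * frame_shift, $(i + 1) * frame_shift, $i
      t += $(i + 1)
    }
  }'
}

# Possible use with the pipeline described above (assumes the Kaldi binaries are
# on PATH; $hmm, $decodedir, $acwt, $odir and $name are the variables from the
# original post, and $lang/phones.txt is an assumed phone symbol table):
#
#   lattice-1best --acoustic-scale=$acwt "ark:gunzip -c $decodedir/lat.*.gz |" ark:- | \
#     nbest-to-linear ark:- ark:- | \
#     ali-to-phones --write-lengths=true $hmm/final.mdl ark:- ark,t:- | \
#     pairs_to_ctm 0.01 | \
#     utils/int2sym.pl -f 5 $lang/phones.txt > $odir/$name.ctm
```

Since ali-to-phones writes integer phone ids, the symbol table in the last step would be phones.txt rather than words.txt. Giving nbest-to-linear only its first output argument keeps just the alignments, as suggested above.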
>
> Wei, if you have time, could you please work on adding a boolean
> option --ctm-output to the program ali-to-phones (and an option
> --frame-shift, default 0.01, to control the times of the ctm output)?
> The confidences can just be 1.  This issue seems to come up
> repeatedly.
>
>
> Dan
>
>
> On Mon, Jan 5, 2015 at 12:10 PM, Xavier Anguera <xan...@gm...> wrote:
> > Hi,
> > I am trying to perform phonetic decoding in Kaldi where I would like to
> > obtain a final ctm file with a time-aligned 1-best phone sequence given
> > my input audio. I must be missing something, as the decoded phones look
> > good but their timings are not accurate at all. Here is what I am doing:
> >
> > 1) I create a phone bigram LM with utils/make_phone_bigram_lang.sh
> > 2) I combine LM and acoustic models into a recognition graph with
> > utils/mkgraph.sh
> > 3) I perform the decoding of the input audio with steps/decode_si.sh
> > 4) Obtain the 1-best CTM using the following command:
> >     lattice-align-phones --output-error-lats=true $hmm/final.mdl "ark:gunzip -c $decodedir/lat.*.gz |" ark:- | \
> >      lattice-to-ctm-conf --decode-mbr=true --acoustic-scale=$acwt ark:- - | \
> >      utils/int2sym.pl -f 5 $graph_or_lang/words.txt > $odir/$name.ctm || exit 1;
> >
> > Note that when using the same acoustic models for word decoding I get
> > very good word-starting times. In that case I use lattice-align-words
> > in step 4 instead. Could this be the problem?
> >
> > Thanks,
> >
> > X. Anguera
> >