Re: [Kaldi-developers] Phonetic decoding

I think the most likely difference relates to the acoustic scales you
used.  The output from gmm-decode-faster should be the same as that of
the other pipeline, but only if the --acoustic-scale option was
identical in all the stages (gmm-decode-faster, gmm-latgen-faster,
lattice-1best).  Note that the --lm-scale option, if provided, is the
inverse of --acoustic-scale; it's an alternative way to set it.  The
beams should also be the same for the output to be (almost) exactly
identical.
Dan
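
To make the point about scales concrete, here is a toy sketch in Python. It is not Kaldi code: the hypothesis names and cost values are made up, and the scoring rule (lower total cost wins, total = lm_scale * LM cost + acoustic_scale * acoustic cost) is a simplifying assumption. It only illustrates that an LM scale of 1/a ranks paths the same way as an acoustic scale of a, while a different acoustic scale can change the 1-best.

```python
# Toy illustration only; the hypotheses and costs below are invented.
# Each hypothesis maps to (acoustic_cost, lm_cost); lower total cost is better.
HYPS = {
    "A": (100.0, 5.0),
    "B": (90.0, 8.0),
}


def one_best(hyps, acoustic_scale=1.0, lm_scale=1.0):
    """Return the name of the hypothesis with the lowest scaled total cost."""
    def total(name):
        acoustic_cost, lm_cost = hyps[name]
        return lm_scale * lm_cost + acoustic_scale * acoustic_cost
    return min(hyps, key=total)


# acoustic_scale=0.1 and lm_scale=10 differ only by a constant factor of 10
# in the total cost, so they pick the same path.
same_a = one_best(HYPS, acoustic_scale=0.1)
same_b = one_best(HYPS, lm_scale=10.0)

# A different acoustic scale (here 1.0) can select a different path.
other = one_best(HYPS, acoustic_scale=1.0)
```

In this toy setup the first two calls agree with each other and the third does not, which is the kind of mismatch to expect when two pipelines are run with different scales.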


On Tue, Jan 6, 2015 at 10:18 AM, Xavier Anguera <xan...@gm...> wrote:
> Dan, all,
> while yesterday's solution worked perfectly for my needs, I later found a
> "simpler" way to obtain the time codes of a phoneme decoding that seems to
> also work, but is giving me a slightly different output.
> In your proposal the set of steps is: steps/decode_si.sh -> lattice-1best ->
> nbest-to-linear -> ali-to-phones
> This generates lattices, which are then converted into a 1-best decoding.
> Instead, I found the following, which avoids outputting lattices and writes
> just the alignments:
> steps/decode_nolats.sh -> ali-to-phones
>
> While this second solution is faster (fewer steps), it does not return the
> exact same output. I see that internally it is based on gmm-decode-faster
> instead of gmm-latgen-faster. Should I worry? Which one is the best solution
> (if any)?
>
> thanks,
>
> X. Anguera
>
> On Tue, Jan 6, 2015 at 12:41 AM, Xavier Anguera <xan...@gm...> wrote:
>>
>> Thanks Dan,
>> it worked perfectly!
>>
>> X.
>>
>>
>> On Mon, Jan 5, 2015 at 9:21 PM, Daniel Povey <dp...@gm...> wrote:
>>>
>>> Your whole pipeline is based on using the words in the lattices, not
>>> the phones.  In your case the words *are* the phones, because you're
>>> using a phone bigram LM.  So you need to do lattice-align-words, not
>>> lattice-align-phones.  The confidence algorithm only works on words so
>>> you need to use words.
>>> Alternatively, if you don't need the confidences, a more efficient way
>>> to do it without lattice-align-words is to simply do lattice-1best |
>>> nbest-to-linear [only keeping the alignment output] | ali-to-phones
>>> (--write-lengths=true).  You'll have to write a script to convert the
>>> output of ali-to-phones to ctm format.
>>>
>>> Wei, if you have time, could you please work on adding a boolean
>>> option --ctm-output to the program ali-to-phones (and an option
>>> --frame-shift, default 0.01, to control the times of the ctm output)?
>>> The confidences can just be 1.  This issue seems to come up
>>> repeatedly.
>>>
>>>
>>> Dan
>>>
>>>
>>> On Mon, Jan 5, 2015 at 12:10 PM, Xavier Anguera <xan...@gm...> wrote:
>>> > Hi,
>>> > I am trying to perform phonetic decoding in Kaldi, where I would like to
>>> > obtain a final ctm file with a time-aligned 1-best phone sequence given
>>> > my input audio. I must be missing something, as the decoded phones look
>>> > good but their timings are not accurate at all. Here is what I am doing:
>>> >
>>> > 1) I create a phone bigram LM with utils/make_phone_bigram_lang.sh
>>> > 2) I combine LM and acoustic models into a recognition graph with
>>> > utils/mkgraph.sh
>>> > 3) I perform the decoding of the input audio with steps/decode_si.sh
>>> > 4) Obtain the 1-best CTM using the following command:
>>> >     lattice-align-phones --output-error-lats=true $hmm/final.mdl \
>>> >       "ark:gunzip -c $decodedir/lat.*.gz |" ark:- | \
>>> >     lattice-to-ctm-conf --decode-mbr=true --acoustic-scale=$acwt ark:- - | \
>>> >     utils/int2sym.pl -f 5 $graph_or_lang/words.txt > $odir/$name.ctm \
>>> >       || exit 1;
>>> >
>>> > Note that when using the same acoustic models for word decoding I get
>>> > very good word start times. In that case I use lattice-align-words in
>>> > step 4 instead. Could this be the problem?
>>> >
>>> > Thanks,
>>> >
>>> > X. Anguera
>>> >
>>> >
>>> >
>>
>>
>
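
The conversion script mentioned in the thread (turning the output of ali-to-phones --write-lengths=true into ctm format) could be sketched roughly as follows. This is a minimal Python sketch, not part of Kaldi: the function names are hypothetical, the input is assumed to be text lines of the form "utt-id phone length ; phone length ; ...", the channel is assumed to be "1", the frame shift defaults to 0.01 s, and the confidence is fixed at 1 as suggested above.

```python
def lengths_to_ctm(line, frame_shift=0.01, channel="1"):
    """Convert one line of phone/length pairs into a list of CTM rows.

    Assumed input:  "<utt-id> <phone> <n_frames> ; <phone> <n_frames> ; ..."
    Output rows:    "<utt-id> <channel> <start> <duration> <phone> <conf>"
    """
    utt, _, rest = line.strip().partition(" ")
    rows = []
    start_frame = 0
    for pair in rest.split(";"):
        fields = pair.split()
        if not fields:
            continue  # tolerate empty segments, e.g. a trailing ';'
        phone, n_frames = fields[0], int(fields[1])
        rows.append("%s %s %.3f %.3f %s 1.00" % (
            utt, channel,
            start_frame * frame_shift,   # start time in seconds
            n_frames * frame_shift,      # duration in seconds
            phone))
        start_frame += n_frames
    return rows


def convert(lines, frame_shift=0.01):
    """Yield CTM rows for every non-empty input line."""
    for line in lines:
        if line.strip():
            for row in lengths_to_ctm(line, frame_shift):
                yield row


example = list(convert(["utt1 4 10 ; 12 3 ; 7 12\n"]))
```

To use it as a filter, one could pass sys.stdin to convert() and print each row. The phone field still holds integer ids at that point, so a step such as utils/int2sym.pl -f 5 with the phones.txt symbol table would map them back to phone names.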