Re: [Kaldi-users] timings don't match decoding

From what you described, something seems to be not matching up with
something else.  I am thinking that both of these somethings perhaps
derive from Kaldi lattices and maybe one of them had had
lattice-align-words run on it, and one had not, and that might be
responsible for the mismatch.
Dan
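
Editorial sketch, not part of the original exchange: one way to make the mismatch concrete is to diff the word sequence in the decode output against the word column of the CTM file. The function names below are hypothetical, and the sample data is copied from the 02.cut1 example quoted later in this thread. It assumes the five-column CTM layout shown there (utterance, channel, start, duration, word).

```python
import difflib


def ctm_words(ctm_lines, utt_id):
    """Collect the word column (5th field) of the CTM lines for one utterance."""
    words = []
    for line in ctm_lines:
        fields = line.split()
        if len(fields) >= 5 and fields[0] == utt_id:
            words.append(fields[4])
    return words


def word_diff(hyp_words, ctm_word_list):
    """Return (tag, hyp_span, ctm_span) for every region where the two differ."""
    matcher = difflib.SequenceMatcher(a=hyp_words, b=ctm_word_list, autojunk=False)
    return [
        (tag, hyp_words[i1:i2], ctm_word_list[j1:j2])
        for tag, i1, i2, j1, j2 in matcher.get_opcodes()
        if tag != "equal"
    ]


# First words of the decode output and of timings.all.txt, as quoted in the thread.
hyp = "YOU ARE STANDING ON A SANDY WHITE BEACH".split()
ctm = """02.cut1 1 2.64 0.03 <UNK>
02.cut1 1 3.06 0.11 YOU
02.cut1 1 3.17 0.22 ARE
02.cut1 1 3.39 0.40 STANDING
02.cut1 1 4.06 0.23 ON
02.cut1 1 4.29 0.06 A
02.cut1 1 4.35 0.12 IS
02.cut1 1 4.57 0.75 INVIOLATE""".splitlines()

for tag, hyp_span, ctm_span in word_diff(hyp, ctm_words(ctm, "02.cut1")):
    print(tag, hyp_span, ctm_span)
```

This only shows where the two outputs diverge. It does not say which of the two is closer to what was actually spoken.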


On Thu, Aug 1, 2013 at 7:42 PM, Nathan Dunn <nd...@uo...> wrote:
>
> On Aug 1, 2013, at 2:04 PM, Daniel Povey wrote:
>
>> Your stuff based on lattice-align-words seemed like it could give the
>> right answer.  The mismatch you described is what I would expect and
>> is not a problem, it's because you didn't run lattice-align-words in
>> the baseline.
>
> I'm trying to understand what you mean by "run lattice-align-words in the baseline".   Are you saying that I should align the lattices prior to running them?
>
>
> in wsj/s5/steps/word_align_lattices.sh , I see:
>
> $cmd JOB=1:$nj $outdir/log/align.JOB.log \
>   lattice-align-words --silence-label=$silence_label --test=true \
>    $wbfile $mdl "ark:gunzip -c $indir/lat.JOB.gz|" "ark,t:|gzip -c >$outdir/lat.JOB.gz" || exit 1;
>
> I'm assuming this aligns the lattice that can later be used below:
>
>> lattice-1best "ark:gunzip -c exp/tri1/decode_g300_test/lat.*.gz|" ark:- | lattice-align-words ./data/local/g300_test/lang/phones/word_boundary.int exp/tri1/final.mdl ark:- ark:- | nbest-to-ctm ark:- - | ./utils/int2sym.pl -f 5 ./data/local/g300_test/lang/words.txt > exp/tri1/decode_g300_test/timings.all.txt
>
>
> Am I on the right track, or is there a better place for you to point me?
>
>>  I don't know exactly what the problem is-- perhaps the
>> times are too far from the human-generated reference?
>
> We can look into that.
>
>> In future if you ask questions about this, please paste the output of
>> the programs concerned and the corresponding command lines.
>
> Sorry, I'll do that in the future.
>
> Thanks,
>
> Nathan
>
>>
>> Dan
>>
>>
>>
>>> So, this was based on this thread:
>>>
>>> https://sourceforge.net/p/kaldi/mailman/message/31160057/
>>>
>>> Our "goal" is to determine what the last word spoken is at 1 minute for a given audio file.   How would you recommend doing that?
>>>
>>> This seems very close to what we need, but not quite there.
>>>
>>> Thanks,
>>>
>>> Nathan
>>>
>>> On Aug 1, 2013, at 1:12 PM, Daniel Povey wrote:
>>>
>>>> I wasn't aware we had any decoder that prints out per-word timings.
>>>> Anyway, even if we did, those timings would not be accurate because of
>>>> the word-symbols being "pushed around" in the graph.
>>>> Dan
>>>>
>>>>
>>>>
>>>> On Thu, Aug 1, 2013 at 4:08 PM, Nathan Dunn <nd...@uo...> wrote:
>>>>>
>>>>> Sorry for my misuse of terminology.  Let me know if there are better words for these.
>>>>>
>>>>> What I am calling the "decoding file" is the file generated during the decoding process.  Maybe the transcription or hypothesis would be more accurate.  This is what I use for that.    I have it start at a beam of 5 and go up to 20.  I get pretty good results for the most part.
>>>>>
>>>>>>>>> steps/decode.sh --nj 10 --model exp/tri1/final.mdl --num-threads 1 --acwt 0.1 --cmd "$decode_cmd" --config conf/decode.config exp/tri1/graph data/local/g300_test exp/tri1/decode_g300_test
>>>>>
>>>>>
>>>>> The "timings" file is what I call the file that shows the time each word starts for a decoded phrase.   It is based largely on swbd/s5/local/score_sclite.sh
>>>>>
>>>>>>>>> lattice-1best "ark:gunzip -c exp/tri1/decode_g300_test/lat.*.gz|" ark:- | lattice-align-words ./data/local/g300_test/lang/phones/word_boundary.int exp/tri1/final.mdl ark:- ark:- | nbest-to-ctm ark:- - | ./utils/int2sym.pl -f 5 ./data/local/g300_test/lang/words.txt > exp/tri1/decode_g300_test/timings.all.txt
>>>>>
>>>>> For some reason the "timings file" and "decoding file" do not match.   I'm wondering whether they should match, or whether there is likely an upstream error.
>>>>>
>>>>> Thanks,
>>>>>
>>>>> Nathan
>>>>>
>>>>> On Aug 1, 2013, at 12:38 PM, Daniel Povey wrote:
>>>>>
>>>>>> I am not sure what you mean by "timing file" and "decoding file".
>>>>>> Dan
>>>>>>
>>>>>>
>>>>>> On Thu, Aug 1, 2013 at 3:37 PM, Nathan Dunn <nd...@uo...> wrote:
>>>>>>> I'm wondering why the timing file doesn't match my decoding file.
>>>>>>>
>>>>>>> They should match, right?
>>>>>>>
>>>>>>> Nathan
>>>>>>>
>>>>>>> On Aug 1, 2013, at 12:15 PM, Daniel Povey <dp...@gm...> wrote:
>>>>>>>
>>>>>>>> Nathan-- I don't really understand what you are saying or what you are asking.
>>>>>>>> Dan
>>>>>>>>
>>>>>>>>
>>>>>>>> On Thu, Aug 1, 2013 at 2:45 PM, Nathan Dunn <nd...@uo...> wrote:
>>>>>>>>>
>>>>>>>>> Following the formal s5 scripts in wsj and the directions below, I was able to get word timings that ROUGHLY matched the decoding values I was getting.
>>>>>>>>>
>>>>>>>>> in the decode file (20.txt , decoding with a beam of 20) :
>>>>>>>>>
>>>>>>>>> 02.cut1 YOU ARE STANDING ON A SANDY WHITE BEACH OF THE ASSISTED AND END A THAN ANY OTHER KIND CONSISTED . . .
>>>>>>>>>
>>>>>>>>> in the timings.all.txt file:
>>>>>>>>> 02.cut1 1 2.64 0.03 <UNK>
>>>>>>>>> 02.cut1 1 3.06 0.11 YOU
>>>>>>>>> 02.cut1 1 3.17 0.22 ARE
>>>>>>>>> 02.cut1 1 3.39 0.40 STANDING
>>>>>>>>> 02.cut1 1 4.06 0.23 ON
>>>>>>>>> 02.cut1 1 4.29 0.06 A
>>>>>>>>> 02.cut1 1 4.35 0.12 IS
>>>>>>>>> 02.cut1 1 4.57 0.75 INVIOLATE
>>>>>>>>> 02.cut1 1 5.43 1.24 ECOSYSTEM
>>>>>>>>> 02.cut1 1 6.88 0.35 NESTS
>>>>>>>>> 02.cut1 1 7.64 0.65 ASSISTED(2)
>>>>>>>>> 02.cut1 1 8.29 0.19 OUT
>>>>>>>>> 02.cut1 1 9.76 1.06 ENLISTED(2)
>>>>>>>>> 02.cut1 1 10.82 0.41 WOULD
>>>>>>>>> 02.cut1 1 11.23 0.59 AN
>>>>>>>>> 02.cut1 1 11.82 0.79 ALAN'S
>>>>>>>>> 02.cut1 1 12.67 1.03 NETTLESOME
>>>>>>>>> 02.cut1 1 13.84 1.00 INSISTED
>>>>>>>>> 02.cut1 1 14.84 0.21 AND
>>>>>>>>>
>>>>>>>>> Here are the commands:
>>>>>>>>>
>>>>>>>>> %utils/mkgraph.sh data/local/g300_test/lang exp/tri1 exp/tri1/graph
>>>>>>>>> %steps/decode.sh --nj 10 --model exp/tri1/final.mdl --num-threads 1 --acwt 0.1 --cmd "$decode_cmd" --config conf/decode.config exp/tri1/graph data/local/g300_test exp/tri1/decode_g300_test
>>>>>>>>> %lattice-1best "ark:gunzip -c exp/tri1/decode_g300_test/lat.*.gz|" ark:- | lattice-align-words ./data/local/g300_test/lang/phones/word_boundary.int exp/tri1/final.mdl ark:- ark:- | nbest-to-ctm ark:- - | ./utils/int2sym.pl -f 5 ./data/local/g300_test/lang/words.txt > exp/tri1/decode_g300_test/timings.all.txt
>>>>>>>>> LOG (lattice-1best:main():lattice-1best.cc:88) Done converting 339 to best path, 0 had errors.
>>>>>>>>> LOG (lattice-align-words:main():lattice-align-words.cc:117) Successfully aligned 339 lattices; 0 had errors.
>>>>>>>>> LOG (nbest-to-ctm:main():nbest-to-ctm.cc:95) Converted 339 linear lattices to ctm format; 0 had errors.
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> It should use the lattices generated from the decoding and the word_boundary and word files.   I can apply weights for language model and acoustic model, but I doubt that will have a great effect.   The words.txt file must be correct if I am getting "similar" results.
>>>>>>>>>
>>>>>>>>> Anyway, any help is appreciated.
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> Nathan
>>>>>>>>
>>>>>>
>>>>>
>>>>
>>>
>>
>
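
Editorial sketch, not part of the original exchange: the stated goal of finding the last word spoken at a given time (for example 1 minute) can be approached by post-processing the CTM output that nbest-to-ctm writes. The function name and the must_be_finished option are hypothetical. The code assumes the five-column CTM layout shown in the thread, and it assumes that the times are in seconds relative to the start of each utterance. If the utterances are cuts of a longer recording, an offset per cut would be needed as well.

```python
def last_word_at(ctm_lines, utt_id, t_seconds, must_be_finished=False):
    """Return (word, start, duration) for the latest word at time t_seconds.

    By default a word counts if it has started by t_seconds. With
    must_be_finished=True it counts only if it has also ended by then.
    Returns None when no word qualifies.
    """
    best = None
    for line in ctm_lines:
        fields = line.split()
        if len(fields) < 5 or fields[0] != utt_id:
            continue  # skip malformed lines and other utterances
        start, dur, word = float(fields[2]), float(fields[3]), fields[4]
        limit = start + dur if must_be_finished else start
        if limit <= t_seconds and (best is None or start >= best[1]):
            best = (word, start, dur)
    return best


# A few lines of timings.all.txt as quoted in the thread.
sample = """02.cut1 1 4.29 0.06 A
02.cut1 1 4.35 0.12 IS
02.cut1 1 4.57 0.75 INVIOLATE
02.cut1 1 5.43 1.24 ECOSYSTEM""".splitlines()

print(last_word_at(sample, "02.cut1", 5.0))
print(last_word_at(sample, "02.cut1", 5.0, must_be_finished=True))
```

The two calls differ because "last word spoken at time t" is ambiguous: the first returns the word in progress at t, the second the last word completed before t. For the 1-minute question in the thread, t_seconds would be 60.0.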