From: Mailing l. u. f. U. C. a. U. <kal...@li...> - 2013-08-01 21:04:13
Your stuff based on lattice-align-words seemed like it could give the right answer. The mismatch you described is what I would expect and is not a problem; it's because you didn't run lattice-align-words in the baseline. I don't know exactly what the problem is-- perhaps the times are too far from the human-generated reference? In future, if you ask questions about this, please paste the output of the programs concerned and the corresponding command lines.
Dan

> So, this was based on this thread:
>
> https://sourceforge.net/p/kaldi/mailman/message/31160057/
>
> Our "goal" is to determine what the last word spoken is at 1 minute for a given audio file. How would you recommend doing that?
>
> This seems very close to what we need, but not quite there.
>
> Thanks,
>
> Nathan
>
> On Aug 1, 2013, at 1:12 PM, Daniel Povey wrote:
>
>> I wasn't aware we had any decoder that prints out per-word timings.
>> Anyway, even if we did, those timings would not be accurate because of
>> the word-symbols being "pushed around" in the graph.
>> Dan
>>
>>
>>
>> On Thu, Aug 1, 2013 at 4:08 PM, Nathan Dunn <nd...@uo...> wrote:
>>>
>>> Sorry for my misuse of terminology. Let me know if there are better words for these.
>>>
>>> What I am calling the "decoding file" is the file generated during the decoding process. Maybe "transcription" or "hypothesis" would be more accurate. This is what I use for that. I have it start at a beam of 5 and go up to 20. I get pretty good results for the most part.
>>>
>>>>>>> steps/decode.sh --nj 10 --model exp/tri1/final.mdl --num-threads 1 --acwt 0.1 --cmd "$decode_cmd" --config conf/decode.config exp/tri1/graph data/local/g300_test exp/tri1/decode_g300_test
>>>
>>>
>>> The "timings" file is what I call the file that shows the time each word starts for a decoded phrase. It is based largely on swbd/s5/local/score_sclite.sh
>>>
>>>>>>> lattice-1best "ark:gunzip -c exp/tri1/decode_g300_test/lat.*.gz|" ark:- | lattice-align-words ./data/local/g300_test/lang/phones/word_boundary.int exp/tri1/final.mdl ark:- ark:- | nbest-to-ctm ark:- - | ./utils/int2sym.pl -f 5 ./data/local/g300_test/lang/words.txt > exp/tri1/decode_g300_test/timings.all.txt
>>>
>>> For some reason the "timings file" and "decoding file" do not match. I'm wondering whether they should match or whether there is likely an upstream error.
>>>
>>> Thanks,
>>>
>>> Nathan
>>>
>>> On Aug 1, 2013, at 12:38 PM, Daniel Povey wrote:
>>>
>>>> I am not sure what you mean by "timing file" and "decoding file".
>>>> Dan
>>>>
>>>>
>>>> On Thu, Aug 1, 2013 at 3:37 PM, Nathan Dunn <nd...@uo...> wrote:
>>>>> I'm wondering why the timing file doesn't match my decoding file.
>>>>>
>>>>> They should match, right?
>>>>>
>>>>> Nathan
>>>>>
>>>>> On Aug 1, 2013, at 12:15 PM, Daniel Povey <dp...@gm...> wrote:
>>>>>
>>>>>> Nathan-- I don't really understand what you are saying or what you are asking.
>>>>>> Dan
>>>>>>
>>>>>>
>>>>>> On Thu, Aug 1, 2013 at 2:45 PM, Nathan Dunn <nd...@uo...> wrote:
>>>>>>>
>>>>>>> Following the formal s5 scripts in wsj and the directions below, I was able to get word timings that ROUGHLY matched the decoding values I was getting.
>>>>>>>
>>>>>>> in the decode file (20.txt, decoding with a beam of 20):
>>>>>>>
>>>>>>> 02.cut1 YOU ARE STANDING ON A SANDY WHITE BEACH OF THE ASSISTED AND END A THAN ANY OTHER KIND CONSISTED . . .
>>>>>>>
>>>>>>> in the timings.all.txt file:
>>>>>>> 02.cut1 1 2.64 0.03 <UNK>
>>>>>>> 02.cut1 1 3.06 0.11 YOU
>>>>>>> 02.cut1 1 3.17 0.22 ARE
>>>>>>> 02.cut1 1 3.39 0.40 STANDING
>>>>>>> 02.cut1 1 4.06 0.23 ON
>>>>>>> 02.cut1 1 4.29 0.06 A
>>>>>>> 02.cut1 1 4.35 0.12 IS
>>>>>>> 02.cut1 1 4.57 0.75 INVIOLATE
>>>>>>> 02.cut1 1 5.43 1.24 ECOSYSTEM
>>>>>>> 02.cut1 1 6.88 0.35 NESTS
>>>>>>> 02.cut1 1 7.64 0.65 ASSISTED(2)
>>>>>>> 02.cut1 1 8.29 0.19 OUT
>>>>>>> 02.cut1 1 9.76 1.06 ENLISTED(2)
>>>>>>> 02.cut1 1 10.82 0.41 WOULD
>>>>>>> 02.cut1 1 11.23 0.59 AN
>>>>>>> 02.cut1 1 11.82 0.79 ALAN'S
>>>>>>> 02.cut1 1 12.67 1.03 NETTLESOME
>>>>>>> 02.cut1 1 13.84 1.00 INSISTED
>>>>>>> 02.cut1 1 14.84 0.21 AND
>>>>>>>
>>>>>>> Here are the commands:
>>>>>>>
>>>>>>> %utils/mkgraph.sh data/local/g300_test/lang exp/tri1 exp/tri1/graph
>>>>>>> %steps/decode.sh --nj 10 --model exp/tri1/final.mdl --num-threads 1 --acwt 0.1 --cmd "$decode_cmd" --config conf/decode.config exp/tri1/graph data/local/g300_test exp/tri1/decode_g300_test
>>>>>>> %lattice-1best "ark:gunzip -c exp/tri1/decode_g300_test/lat.*.gz|" ark:- | lattice-align-words ./data/local/g300_test/lang/phones/word_boundary.int exp/tri1/final.mdl ark:- ark:- | nbest-to-ctm ark:- - | ./utils/int2sym.pl -f 5 ./data/local/g300_test/lang/words.txt > exp/tri1/decode_g300_test/timings.all.txt
>>>>>>> LOG (lattice-1best:main():lattice-1best.cc:88) Done converting 339 to best path, 0 had errors.
>>>>>>> LOG (lattice-align-words:main():lattice-align-words.cc:117) Successfully aligned 339 lattices; 0 had errors.
>>>>>>> LOG (nbest-to-ctm:main():nbest-to-ctm.cc:95) Converted 339 linear lattices to ctm format; 0 had errors.
>>>>>>>
>>>>>>>
>>>>>>> It should use the lattices generated from the decoding and the word_boundary and word files. I can apply weights for language model and acoustic model, but I doubt that will have a great effect. The words.txt file must be correct if I am getting "similar" results.
>>>>>>>
>>>>>>> Anyway, any help is appreciated.
>>>>>>>
>>>>>>>
>>>>>>> Nathan
>>>>>>
>>>>
>>>
>>
>
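For the stated goal (the last word spoken at a given time), one possible way to post-process the CTM-style output quoted in the thread is sketched below. This is not something from the thread itself: the helper names `parse_ctm` and `last_word_at` are hypothetical, the five-column layout (utterance, channel, start, duration, word) is read off the quoted timings.all.txt lines, and the times are assumed to be in seconds relative to the start of each utterance. Skipping `<UNK>` is also an assumption about what counts as a "word".

```python
from typing import Iterable, List, NamedTuple, Optional, Sequence


class CtmEntry(NamedTuple):
    utt: str       # utterance id, e.g. "02.cut1"
    channel: str   # channel field, "1" in the quoted output
    start: float   # word start time in seconds (assumed utterance-relative)
    dur: float     # word duration in seconds
    word: str      # word symbol after int2sym.pl


def parse_ctm(lines: Iterable[str]) -> List[CtmEntry]:
    """Parse whitespace-separated CTM lines, ignoring short or blank lines."""
    entries = []
    for line in lines:
        fields = line.split()
        if len(fields) < 5:
            continue
        utt, channel, start, dur, word = fields[:5]
        entries.append(CtmEntry(utt, channel, float(start), float(dur), word))
    return entries


def last_word_at(entries: Sequence[CtmEntry], utt: str, t: float,
                 skip: Sequence[str] = ("<UNK>",)) -> Optional[CtmEntry]:
    """Return the latest entry of `utt` that starts at or before time `t`."""
    best = None
    for e in entries:
        if e.utt != utt or e.word in skip or e.start > t:
            continue
        if best is None or e.start > best.start:
            best = e
    return best


# First few lines of the timings quoted in the thread.
SAMPLE = """\
02.cut1 1 2.64 0.03 <UNK>
02.cut1 1 3.06 0.11 YOU
02.cut1 1 3.17 0.22 ARE
02.cut1 1 3.39 0.40 STANDING
02.cut1 1 4.06 0.23 ON
"""

if __name__ == "__main__":
    hit = last_word_at(parse_ctm(SAMPLE.splitlines()), "02.cut1", 3.5)
    print(hit.word if hit else None)  # STANDING
```

On a full file one would read timings.all.txt instead of `SAMPLE` and pass `t=60.0`. If an utterance is a segment cut from a longer recording, the segment's offset within the recording would have to be added to each start time first; that offset is not available in the CTM lines shown here.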