From: Daniel P. <dp...@gm...> - 2015-01-06 19:04:46
I think the most likely difference relates to the acoustic scales you used.
The output from decode-faster should be the same as the other pipeline, but
only if the --acoustic-scale option was identical in all the stages
(gmm-decode-faster, gmm-latgen-faster, lattice-1best). Note that the
--lm-scale option, if provided, is the inverse of --acoustic-scale; it's an
alternative way to set it. The beams should also be the same for the output
to be (almost) exactly identical.

Dan

On Tue, Jan 6, 2015 at 10:18 AM, Xavier Anguera <xan...@gm...> wrote:
> Dan, all,
> while yesterday's solution worked perfectly for my needs, I later found a
> "simpler" way to obtain the time codes of a phoneme decoding that seems to
> also work, but is giving me a slightly different output.
> In your proposal the set of steps is:
> steps/decode_si.sh -> lattice-1best -> nbest-to-linear -> ali-to-phones
> This generates lattices that are then converted into a 1-best decoding.
> Instead, I found the following, which avoids outputting lattices and
> produces just the alignments:
> steps/decode_nolats.sh -> ali-to-phones
>
> While this second solution is faster (fewer steps), it is not returning the
> exact same output. I see that internally it is based on gmm-decode-faster
> instead of gmm-latgen-faster. Should I worry? Which one is the best
> solution (if any)?
>
> thanks,
>
> X. Anguera
>
> On Tue, Jan 6, 2015 at 12:41 AM, Xavier Anguera <xan...@gm...> wrote:
>>
>> Thanks Dan,
>> it worked perfectly!
>>
>> X.
>>
>>
>> On Mon, Jan 5, 2015 at 9:21 PM, Daniel Povey <dp...@gm...> wrote:
>>>
>>> Your whole pipeline is based on using the words in the lattices, not
>>> the phones. In your case the words *are* the phones, because you're
>>> using a phone bigram LM. So you need to do lattice-align-words, not
>>> lattice-align-phones. The confidence algorithm only works on words so
>>> you need to use words.
>>> Alternatively, if you don't need the confidences, a more efficient way
>>> to do it without lattice-align-words is to simply do lattice-1best |
>>> nbest-to-linear [only keeping the alignment output] | ali-to-phones
>>> (--write-lengths=true). You'll have to write a script to convert the
>>> output of ali-to-phones to ctm format.
>>>
>>> Wei, if you have time, could you please work on adding a boolean
>>> option --ctm-output to the program ali-to-phones (and an option
>>> --frame-shift, default 0.01, to control the times of the ctm output)?
>>> The confidences can just be 1. This issue seems to come up
>>> repeatedly.
>>>
>>>
>>> Dan
>>>
>>>
>>> On Mon, Jan 5, 2015 at 12:10 PM, Xavier Anguera <xan...@gm...>
>>> wrote:
>>> > Hi,
>>> > I am trying to perform phonetic decoding in Kaldi where I would like to
>>> > obtain a final ctm file with a time-aligned 1-best phone sequence given
>>> > my input audio. I must be missing something, as the decoded phones look
>>> > good but their timings are not accurate at all. Here is what I am doing:
>>> >
>>> > 1) I create a phone bigram LM with utils/make_phone_bigram_lang.sh
>>> > 2) I combine LM and acoustic models into a recognition graph with
>>> > utils/mkgraph.sh
>>> > 3) I perform the decoding of the input audio with steps/decode_si.sh
>>> > 4) Obtain the 1-best CTM using the following command:
>>> > lattice-align-phones --output-error-lats=true $hmm/final.mdl "ark:gunzip -c $decodedir/lat.*.gz |" ark:- | \
>>> > lattice-to-ctm-conf --decode-mbr=true --acoustic-scale=$acwt ark:- - | \
>>> > utils/int2sym.pl -f 5 $graph_or_lang/words.txt > $odir/$name.ctm || exit 1;
>>> >
>>> > Note that when using the same acoustic models for word decoding I get
>>> > very good word-starting times. In this case I am using, in step 4,
>>> > lattice-align-words instead, could this be the problem?
>>> >
>>> > Thanks,
>>> >
>>> > X. Anguera
>>> >
>>> > _______________________________________________
>>> > Kaldi-developers mailing list
>>> > Kal...@li...
>>> > https://lists.sourceforge.net/lists/listinfo/kaldi-developers
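
For reference, here is a minimal sketch of the kind of conversion script mentioned in the thread, turning the output of ali-to-phones (--write-lengths=true) into ctm lines. It rests on several assumptions that are not stated above: that the output is in text form with one utterance per line, laid out as "utt-id phone1 len1 ; phone2 len2 ; ..." with lengths in frames; that the frame shift is 0.01 s; and that the channel field is always 1. The function name ali_phones_to_ctm and the optional id2phone mapping are hypothetical names chosen for illustration, not part of Kaldi.

```python
def ali_phones_to_ctm(lines, frame_shift=0.01, channel="1", id2phone=None):
    """Convert phone/length alignments to ctm lines (illustrative sketch).

    Assumed input: one utterance per line, in the form
        utt-id phone1 len1 ; phone2 len2 ; ...
    where each len is a duration in frames.
    Output lines have the form: utt-id channel start duration phone
    """
    ctm = []
    for line in lines:
        parts = line.strip().split(None, 1)
        if len(parts) < 2:
            continue  # empty line, or an utterance with no phones
        utt, rest = parts
        frame = 0  # running start frame of the current phone
        for segment in rest.split(";"):
            fields = segment.split()
            if len(fields) != 2:
                continue  # skip empty or malformed segments
            phone, n_frames = fields[0], int(fields[1])
            if id2phone is not None:
                # Optionally map integer phone ids to symbols (hypothetical dict,
                # e.g. built from a phones.txt symbol table).
                phone = id2phone.get(phone, phone)
            start = frame * frame_shift
            duration = n_frames * frame_shift
            ctm.append(f"{utt} {channel} {start:.2f} {duration:.2f} {phone}")
            frame += n_frames
    return ctm
```

Start times are accumulated in frames and only multiplied by the frame shift at the end, so rounding error does not build up over a long utterance. If a confidence column is wanted, a constant 1 could be appended to each line, as suggested above for the proposed --ctm-output option.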