From: 石伟 <sh...@sz...> - 2015-01-13 15:29:12
Dan, please see the patch. I first used BaseFloatVectorWriter so that the arguments of ali-to-phones would remain unchanged, but the resulting output was not quite CTM format (each vector is wrapped in square brackets '[' and ']'). So I changed the last argument so that it is either a wspecifier or a wxfilename, depending on whether the --ctm-output option is set.

Xavier: you can test the patch on your local machine. With a pipeline like

  lattice-1best | nbest-to-linear | ali-to-phones --ctm-output

I get phone alignments like:

  2 1 0.00 0.36 SIL
  2 1 0.36 0.08 l_B
  2 1 0.44 0.06 i_E
  2 1 0.50 0.05 k_B
  2 1 0.55 0.09 e_I
  2 1 0.64 0.07 j_I
  2 1 0.71 0.20 iang_E
  2 1 0.91 0.08 zh_B
  2 1 0.99 0.08 u_I
  2 1 1.07 0.06 ch_I
  2 1 1.13 0.04 ib_E
  2 1 1.17 0.06 zh_B
  2 1 1.23 0.11 ao_I
  2 1 1.34 0.09 k_I
  2 1 1.43 0.13 ai_E
  2 1 1.56 0.06 g_B
  2 1 1.62 0.06 uo_I
  2 1 1.68 0.07 w_I
  2 1 1.75 0.06 u_I
  2 1 1.81 0.09 uxs_I
  2 1 1.90 0.10 an_E
  2 1 2.00 0.14 ch_B
  2 1 2.14 0.10 ang_I
  2 1 2.24 0.08 w_I
  2 1 2.32 0.04 u_E
  2 1 2.36 0.11 h_B
  2 1 2.47 0.06 ui_I
  2 1 2.53 0.07 y_I
  2 1 2.60 0.14 i_E
  2 1 2.74 0.40 SIL

I believe this is what you want.

Wei

------------------ Original ------------------
From: "Daniel Povey"<dp...@gm...>;
Date: Tue, Jan 6, 2015 04:21 AM
To: "Xavier Anguera"<xan...@gm...>; "wei.shi"<we...@im...>; "shiwei"<sh...@sz...>;
Cc: "kal...@li..."<kal...@li...>;
Subject: Re: [Kaldi-developers] Phonetic decoding

Your whole pipeline is based on using the words in the lattices, not the phones. In your case the words *are* the phones, because you're using a phone bigram LM. So you need to do lattice-align-words, not lattice-align-phones. The confidence algorithm only works on words so you need to use words.

Alternatively, if you don't need the confidences, a more efficient way to do it without lattice-align-words is to simply do lattice-1best | nbest-to-linear [only keeping the alignment output] | ali-to-phones (--write-lengths=true).
You'll have to write a script to convert the output of ali-to-phones to ctm format.

Wei, if you have time, could you please work on adding a boolean option --ctm-output to the program ali-to-phones (and an option --frame-shift, default 0.01, to control the times of the ctm output)? The confidences can just be 1. This issue seems to come up repeatedly.

Dan

On Mon, Jan 5, 2015 at 12:10 PM, Xavier Anguera <xan...@gm...> wrote:
> Hi,
> I am trying to perform phonetic decoding in Kaldi where I would like to
> obtain a final ctm file with a time-aligned 1-best phone sequence given my
> input audio. I must be missing something, as the decoded phones look good
> but their timings are not accurate at all. Here is what I am doing:
>
> 1) I create a phone bigram LM with utils/make_phone_bigram_lang.sh
> 2) I combine LM and acoustic models into a recognition graph with utils/mkgraph.sh
> 3) I perform the decoding of the input audio with steps/decode_si.sh
> 4) Obtain the 1-best CTM using the following command:
>    lattice-align-phones --output-error-lats=true $hmm/final.mdl "ark:gunzip -c $decodedir/lat.*.gz |" ark:- | \
>      lattice-to-ctm-conf --decode-mbr=true --acoustic-scale=$acwt ark:- - | \
>      utils/int2sym.pl -f 5 $graph_or_lang/words.txt > $odir/$name.ctm || exit 1;
>
> Note that when using the same acoustic models for word decoding I get very
> good word-starting times. In this case I am using, in step 4,
> lattice-align-words instead, could this be the problem?
>
> Thanks,
>
> X. Anguera
>
> _______________________________________________
> Kaldi-developers mailing list
> Kal...@li...
> https://lists.sourceforge.net/lists/listinfo/kaldi-developers
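
Note: for anyone without the patch, the conversion script Dan mentions could look roughly like the Python sketch below. It is only an illustration, not the code in the patch. It assumes that the text output of ali-to-phones --write-lengths=true has the form "utt-id phone1 len1 ; phone2 len2 ; ..." with lengths in frames, and that the phone ids have already been mapped to symbols (for example with utils/int2sym.pl). The function name lengths_to_ctm and the fixed channel "1" are made up for this example; the frame shift defaults to 0.01 as Dan suggests, and the five-column layout follows Wei's output above (no confidence column).

```python
def lengths_to_ctm(line, frame_shift=0.01, channel="1"):
    """Turn one 'utt phone len ; phone len ; ...' line into CTM rows.

    The input layout is an assumption about the text output of
    ali-to-phones --write-lengths=true; lengths are counted in frames.
    """
    parts = line.split(None, 1)
    if not parts:
        return []
    utt = parts[0]
    rest = parts[1] if len(parts) > 1 else ""

    rows = []
    start_frame = 0  # keep an integer frame counter to avoid float drift
    for pair in rest.split(";"):
        fields = pair.split()
        if not fields:
            continue
        phone, num_frames = fields[0], int(fields[1])
        start = start_frame * frame_shift
        duration = num_frames * frame_shift
        rows.append(f"{utt} {channel} {start:.2f} {duration:.2f} {phone}")
        start_frame += num_frames
    return rows


# Small demo using the first three phones of Wei's example.
for row in lengths_to_ctm("2 SIL 36 ; l_B 8 ; i_E 6"):
    print(row)
```

Start times are computed from the running frame count rather than by summing durations, so rounding errors do not accumulate over long utterances.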