From: David M. <da...@ca...> - 2014-02-05 15:19:02
Hi Karel,

utils/int2sym.pl seems to expect the -f option. This command line produced sensible-looking lattices:

  lattice-align-words exp/tri3b/graph_bd_tgpr/phones/word_boundary.int exp/tri3b/final.mdl \
    'ark:gunzip -c exp/tri3b/decode_bd_tgpr_eval92_fg/lat.1.gz |' ark,t:- | \
    utils/int2sym.pl -f 3 exp/tri3b/graph_bd_tgpr/words.txt | \
    utils/convert_slf.pl --word-to-node - ~/slf/

On the other hand, this one with the new tool produced lattices with word indexes:

  lattice-align-words-lexicon data/lang_bd/phones/align_lexicon.int exp/tri3b/final.mdl \
    'ark:gunzip -c exp/tri3b/decode_bd_tgpr_eval92_fg/lat.1.gz |' ark,t:- | \
    utils/convert_slf.pl --word-to-node - ~/slf/

The first pipe worked so I will go with that.

Thanks,
David

On 05/02/2014 13:37, Vesely Karel wrote:
> Hello David,
> I just modified the example to include the int->word mapping,
> the lattices will now contain words, it is committed.
>
> Eventually you can also try the newer tool 'lattice-align-words-lexicon',
> which uses the lexicon to do the aligning. For this you need to have
> data/lang/phones/align_lexicon.int
>
> K.
>
>
> On 02/05/2014 01:32 PM, Korbinian Riedhammer wrote:
>> Hi David,
>>
>> yes, the numbers associated with the words are indices, you can
>> resolve them using the data/lang/words.txt file. The warnings you see
>> relate to the process of aligning the lattices to words (i.e. making
>> sure that the word arcs (transitions) reflect the actual word
>> boundaries in terms of time), and not to the SLF conversion. To get
>> rid of the epsilon transitions you may need to determinize the
>> lattices first (please correct me, if I'm wrong, Dan).
>>
>> Korbinian.
>>
>> On Wed, Feb 5, 2014 at 12:34 PM, David Mrva
>> <da...@ca...> wrote:
>>> Hi Karel,
>>>
>>> Thanks for your help. Based on your new example in the convert_slf.pl
>>> script I made up a command line that now produces HTK lattices from
>>> wsj example kaldi lattices. It gives me a warning. Is it anything to
>>> worry about? The resulting lattices have words labelled with numbers,
>>> see below. Are the numbers some indexes? How can I get words to the
>>> HTK lattice?
>>>
>>> David
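
For readers following the thread: the second pipeline has no int2sym.pl stage, which is consistent with its lattices still carrying integer word ids. As a rough illustration of what the `-f 3` mapping step does to a text-format lattice, here is a minimal sketch in shell and awk. It is not the Kaldi script itself: the function name `map_field3` and the sample symbol table and arc lines are made up for the example, and it assumes the `<word> <integer-id>` layout of words.txt with the word id in the third field of each arc line.

```shell
# Hypothetical stand-in for `utils/int2sym.pl -f 3 words.txt`; illustration only.
# $1: symbol table with one "<word> <integer-id>" pair per line.
# stdin: text-format lattice lines; stdout: same lines with field 3 mapped.
map_field3() {
  awk 'NR == FNR { sym[$2] = $1; next }            # first file: build id -> word table
       NF >= 3 && ($3 in sym) { $3 = sym[$3] }     # arc lines: replace the word id
       { print }' "$1" -
}

# Made-up sample data, not taken from the wsj recipe.
tmp=$(mktemp)
printf '%s\n' '<eps> 0' 'HELLO 1' 'WORLD 2' > "$tmp"
printf '%s\n' 'utt1' '0 1 1 4.5,2.1,7_8' '1 2 2 3.0,1.2,9' '2' | map_field3 "$tmp"
rm -f "$tmp"
```

Lines with fewer than three fields (the utterance key and final-state lines) pass through unchanged, so only arc lines are rewritten. In a real pipeline the existing utils/int2sym.pl call from the first command is the tool to use; the sketch only shows why the mapping stage has to be present.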