From: Daniel P. <dp...@gm...> - 2015-06-16 01:38:18
What you want to do is possible in principle, and Kaldi has no objection in principle to a G.fst that is not an acceptor, but you have to be careful to ensure that the resulting FST is determinizable. You should probably look for hbka.pdf online and read it to get some idea of the issues involved.

Basically, it's not allowed to have two different states with two different loops where the same input-symbol sequence is on both loops but with different costs (this restriction is called the "twins property"). The FST also needs to be functional, meaning that any given input-label sequence generates only one output-label sequence. There are certain additional restrictions required to make sure that LG is determinizable; that is why we insert "disambiguation symbols" in the lexicon.

Having a different symbol table for the olabels is probably not a good solution, as the scripts do assume that words.txt is good for both sides of G.fst. It is better to have a single symbol table that covers both sides of the FST; there is no assumption that L.fst covers all the words in words.txt.

Dan

On Mon, Jun 15, 2015 at 9:21 PM, Kirill Katsnelson <kir...@sm...> wrote:
> Sources often call the G FST an acceptor, assuming i- and o-labels are the same.
>
> I want to treat it as a transducer with o-labels encoding more information than just a word, only during the decode. (Think, for example, of a grammar tagging words in context.) I understand I am looking at 2 different symbol tables (instead of the single words.txt normally).
>
> Does Kaldi support that out of the box? Do, for one, the *-latgen-* decoders actually put olabels into the lattices?
>
> -kkm
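
As a rough illustration of the single-symbol-table suggestion above, here is a minimal Python sketch that merges the input labels (plain words) and the output labels (tagged words) of a transducer G into one words.txt-style table. The helper names, the example words, and the "word/TAG" output-label convention are all hypothetical and chosen only for illustration; the one fixed convention assumed is that <eps> gets id 0 and that each line of the table reads "symbol id".

```python
# Hypothetical helpers: build one words.txt-style symbol table that covers
# both the input side and the output side of a transducer G.


def build_symbol_table(ilabels, olabels):
    """Return a dict mapping symbol -> integer id, with <eps> fixed at 0.

    Symbols are numbered in order of first appearance, input labels first.
    A symbol that occurs on both sides receives a single id.
    """
    table = {"<eps>": 0}
    for sym in list(ilabels) + list(olabels):
        if sym not in table:
            table[sym] = len(table)
    return table


def format_symbol_table(table):
    """Render the table as 'symbol id' lines, sorted by id."""
    rows = sorted(table.items(), key=lambda kv: kv[1])
    return "\n".join(f"{sym} {idx}" for sym, idx in rows)


# Illustrative data only: words on the input side, tagged words on the
# output side (the 'word/TAG' naming is an assumption, not a Kaldi rule).
ilabels = ["call", "john", "smith"]
olabels = ["call", "john/FIRST", "smith/LAST"]

table = build_symbol_table(ilabels, olabels)
print(format_symbol_table(table))
```

Because every label on either side of G appears in the one table, scripts that expect a single words.txt can read both sides. The tagged output symbols never need lexicon entries, which fits the point that L.fst is not assumed to cover every word in words.txt.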