Re: [Kaldi-users] Word-tagging grammars


What you want to do is possible in principle, and Kaldi has no
objection in principle to a G.fst that is not an acceptor, but you
have to be careful to ensure that the resulting FST is determinizable.
You should probably look for hbka.pdf online and read it to get some
idea of the issues involved.  Basically it's not allowed to have 2
different states, reachable by the same input-symbol sequence, with 2
different loops where the same input-symbol sequence is on both of the
loops, with different cost.  [This is called the "twins property".]
Also it needs to be functional, meaning that
any given input-label sequence generates only one output-label
sequence.  There are certain additional restrictions required to make
sure that LG is determinizable; that is why we insert "disambiguation
symbols" in the lexicon.
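
To make the "functional" requirement concrete, here is a minimal
sketch in plain Python (not Kaldi or OpenFst code). It enumerates
accepting paths of a toy tagging transducer up to a bounded length and
reports any input-label sequence that yields more than one output-label
sequence. The arc format, the function names and the tag-style output
labels such as book_NOUN are all assumptions made for illustration,
and a bounded search like this can only find counterexamples, it cannot
prove functionality for a cyclic grammar.

```python
from collections import defaultdict

EPS = "<eps>"  # epsilon label, as conventionally written in symbol tables


def outputs_by_input(arcs, start, finals, max_arcs=6):
    """Map each input sequence to the set of output sequences it produces.

    arcs is a list of (src, dst, ilabel, olabel) tuples (hypothetical format).
    Only accepting paths with at most max_arcs arcs are explored.
    """
    out_arcs = defaultdict(list)
    for src, dst, ilabel, olabel in arcs:
        out_arcs[src].append((dst, ilabel, olabel))

    result = defaultdict(set)
    stack = [(start, (), (), 0)]
    while stack:
        state, ins, outs, n = stack.pop()
        if state in finals:
            result[ins].add(outs)
        if n == max_arcs:
            continue
        for dst, ilabel, olabel in out_arcs[state]:
            new_ins = ins if ilabel == EPS else ins + (ilabel,)
            new_outs = outs if olabel == EPS else outs + (olabel,)
            stack.append((dst, new_ins, new_outs, n + 1))
    return result


def non_functional_inputs(arcs, start, finals, max_arcs=6):
    """Return the input sequences that map to more than one output sequence."""
    table = outputs_by_input(arcs, start, finals, max_arcs)
    return {ins: outs for ins, outs in table.items() if len(outs) > 1}


# Toy grammar: "book" is tagged by context, so each input has one output.
GOOD_ARCS = [
    (0, 1, "the", "the_DET"),
    (1, 2, "book", "book_NOUN"),
    (0, 3, "please", "please_ADV"),
    (3, 2, "book", "book_VERB"),
]

# Adding a second tag for "book" in the same context breaks functionality.
BAD_ARCS = GOOD_ARCS + [(1, 2, "book", "book_VERB")]

if __name__ == "__main__":
    print(non_functional_inputs(GOOD_ARCS, 0, {2}))
    print(non_functional_inputs(BAD_ARCS, 0, {2}))
```

In the second grammar the input "the book" maps to two different tag
sequences, which is the kind of ambiguity that would have to be removed
(or encoded on the input side) before determinization.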

Having a different symbol table for the olabels is probably not a good
solution, as the scripts do assume that words.txt is good for both
sides of it.  It is better to have a single symbol table that covers
both sides of the FST; there is no assumption that L.fst covers all
the words in words.txt.
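
One way to get such a single table is to append the extra output
symbols to the existing word list. The following is a minimal sketch in
plain Python, assuming the usual "symbol integer-id" line format of
words.txt; the function name and the tag symbols (book_NOUN, book_VERB)
are hypothetical, and whether the rest of a given recipe is happy with
ids appended after the existing entries is something to check for your
setup.

```python
def merge_symbol_tables(word_lines, extra_symbols):
    """Return symbol-table lines covering the existing words plus extras.

    word_lines: lines in "symbol id" format, as in a words.txt file.
    extra_symbols: additional output labels (hypothetical tag symbols).
    Existing ids are kept; new symbols get fresh ids after the largest one.
    """
    table = {}
    for line in word_lines:
        line = line.strip()
        if not line:
            continue
        symbol, idx = line.split()
        table[symbol] = int(idx)

    next_id = max(table.values(), default=-1) + 1
    for symbol in extra_symbols:
        if symbol not in table:  # symbols already present keep their id
            table[symbol] = next_id
            next_id += 1

    ordered = sorted(table.items(), key=lambda item: item[1])
    return [f"{symbol} {idx}" for symbol, idx in ordered]


if __name__ == "__main__":
    words = ["<eps> 0", "book 1", "the 2", "#0 3"]
    tags = ["book_NOUN", "book_VERB", "the"]
    for line in merge_symbol_tables(words, tags):
        print(line)
```

The merged list can then be used on both sides of the grammar FST, with
the plain words appearing as input labels and the tagged symbols as
output labels.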

Dan

On Mon, Jun 15, 2015 at 9:21 PM, Kirill Katsnelson
<kir...@sm...> wrote:
> Sources often call the G FST an acceptor, assuming i- and o-labels are the same.
>
> I want to treat it as a transducer, with o-labels encoding more information than just a word, only during the decode. (Think, for example, of a grammar tagging words in context.) I understand I am looking at 2 different symbol tables (instead of the single words.txt normally).
>
> Does Kaldi support that out of the box? Do, for one, the *-latgen-* decoders actually put olabels into the lattices?
>
>  -kkm