From: Michael R. <ri...@go...> - 2013-10-20 18:30:19
It is included, but you need to compile with *--enable-ngram-fsts*. Similarly
for the other extension formats.

My own suggestion would be to use a large LM represented as an NGramFst in
2nd-pass lattice rescoring, and to entropy-prune that model to a modest size
for a first pass with a static (H)CLG. With a reasonable lattice size, you'll
get virtually no search errors.

The NGramFst expects a specific ngram format; www.opengrm.org describes it
and offers tools to train models and to convert from ARPA format.

-m

On Sun, Oct 20, 2013 at 2:10 PM, Daniel Povey <dp...@gm...> wrote:

> Mike: NgramFst isn't included in the standard OpenFst distribution, is it?
> Dan
>
> On Sun, Oct 20, 2013 at 2:08 PM, Michael Riley <ri...@go...> wrote:
> > I don't know much about how ngram FSTs are used in Kaldi or the
> > characteristics of their implementations, but I do know that for sheer
> > OpenFst representation:
> >
> > VectorFst and ConstFst (the standard mutable and immutable OpenFst reps)
> > use about 20 bytes per state and 16 bytes per arc, while NGramFst,
> > specific to ngram models (see here), uses about 8 bytes per state and
> > 8 bytes per arc. The last, for representing at the word level, is the
> > most compact and is used, in general, in a rescoring or on-the-fly
> > composition mode. There are also various compact FST formats (see here)
> > that can represent other specific FSTs (and are extensible).
> >
> > With 64-bit compilation, you are limited by how much memory you have
> > for those reps, of course. Hope that helps.
> >
> > On Fri, Oct 18, 2013 at 12:06 AM, E <oth...@ao...> wrote:
> >>
> >> Thanks a lot for the answers and possible solutions.
> >>
> >> A few questions:
> >>
> >> What is the maximum size trigram language model supported by FST? I
> >> tried to use the Gigaword LM (64k vocab); mkgraph.sh ran for a long
> >> time but crashed afterwards. So I want to know if there is a
> >> theoretical limit on the size of language model that can be integrated
> >> with Kaldi.
> >>
> >> I will try to make HCLG.fst with Gigaword again (with a triphone AM),
> >> but has anyone successfully built it with an LM of this size? If so,
> >> what were the system requirements (RAM) and the final FST size in
> >> mega/gigabytes?
> >>
> >> _______________________________________________
> >> Kaldi-users mailing list
> >> Kal...@li...
> >> https://lists.sourceforge.net/lists/listinfo/kaldi-users
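
To make the per-state and per-arc figures quoted in the thread concrete, here is a minimal Python sketch that turns them into a rough memory estimate. The byte costs are the approximate ones given above (about 20 and 16 for VectorFst/ConstFst, about 8 and 8 for NGramFst). The function names and the state and arc counts are hypothetical, chosen only for illustration; they are not measurements of any Gigaword model. The estimate covers the language-model FST alone, not a composed HCLG, and ignores any fixed overhead.

```python
# Approximate (bytes per state, bytes per arc), as quoted in the thread.
BYTES_PER_REP = {
    "VectorFst/ConstFst": (20, 16),
    "NGramFst": (8, 8),
}


def estimate_bytes(num_states, num_arcs, rep):
    """Rough size of an FST with the given state and arc counts."""
    per_state, per_arc = BYTES_PER_REP[rep]
    return num_states * per_state + num_arcs * per_arc


def format_gib(num_bytes):
    """Render a byte count in GiB with two decimals."""
    return f"{num_bytes / 2**30:.2f} GiB"


if __name__ == "__main__":
    # Hypothetical counts for a large trigram LM; substitute the real
    # numbers for your model (for example, as reported by fstinfo).
    num_states = 50_000_000
    num_arcs = 200_000_000
    for rep in BYTES_PER_REP:
        size = estimate_bytes(num_states, num_arcs, rep)
        print(f"{rep}: {format_gib(size)}")
```

With the quoted costs, the NGramFst estimate is always at most half of the VectorFst/ConstFst estimate for the same counts, which is consistent with the suggestion to keep the large LM in that form for rescoring.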