|
From: Al Z. <al...@gm...> - 2013-10-20 18:56:33
|
> My own suggestion would be to use a large LM represented as an NGramFst in
> 2nd pass lattice rescoring and entropy prune that model to a modest size for
> a first pass with a static (H)CLG. With a reasonable lattice size, you'll get
> virtually no search errors.

But in my experience, the size of the first-pass LM is important not only for accuracy but also for speed. So it is better to make the first-pass LM as big as possible. Is it just me, or do you have the same experience?

On Sun, Oct 20, 2013 at 10:29 PM, Michael Riley <ri...@go...> wrote:

> It is included but you need to compile with *--enable-ngram-fsts*.
> Similarly for the other extension formats. My own suggestion would be to
> use a large LM represented as an NGramFst in 2nd pass lattice rescoring and
> entropy prune that model to a modest size for a first pass with a static
> (H)CLG. With a reasonable lattice size, you'll get virtually no search
> errors.
>
> The NGramFst expects a specific ngram format; www.opengrm.org describes
> and offers tools to train and convert from DARPA format.
>
> -m
>
> On Sun, Oct 20, 2013 at 2:10 PM, Daniel Povey <dp...@gm...> wrote:
>
>> Mike: NgramFst isn't included in the standard OpenFst distribution, is it?
>> Dan
>>
>> On Sun, Oct 20, 2013 at 2:08 PM, Michael Riley <ri...@go...> wrote:
>> > I don't know much about how ngram FSTs are used in Kaldi or the
>> > characteristics of their implementations, but I do know that for sheer
>> > OpenFst representation:
>> >
>> > VectorFst and ConstFst (the standard mutable and immutable OpenFst reps)
>> > use about 20 bytes per state and 16 bytes per arc while NGramFst, specific
>> > to ngram models (see here), use about 8 bytes per state and 8 bytes per arc.
>> > The last, for representing at the word level, is the most compact and is
>> > used, in general, in a rescoring or on-the-fly composition mode. There are
>> > also various compact FST formats (see here) that can represent other
>> > specific FSTs (and is extensible).
>> >
>> > With 64 bit compilation, you are limited by how much memory you have for
>> > those reps and of course. Hope that helps.
>> >
>> > On Fri, Oct 18, 2013 at 12:06 AM, E <oth...@ao...> wrote:
>> >>
>> >> Thanks a lot for the answers and possible solutions.
>> >>
>> >> Few questions-
>> >>
>> >> What is the maximum size trigram language model supported by FST? I tried
>> >> to use Gigaword LM (64k vocab), mkgraph.sh ran for a long time but crashed
>> >> afterwards. So I want to know if there is a theoretical limit on size of
>> >> language model that can be integrated with Kaldi.
>> >>
>> >> I will try to make HCLG.fst with gigaword again (with triphone AM), but
>> >> has anyone tried to build it with LM of this size successfully, if so, what
>> >> were the system requirements (RAM) and final FST size in mega/gigabytes?
>> >>
>> >> _______________________________________________
>> >> Kaldi-users mailing list
>> >> Kal...@li...
>> >> https://lists.sourceforge.net/lists/listinfo/kaldi-users
|
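For a rough sense of what the per-state and per-arc figures quoted above mean in practice, here is a minimal Python sketch. The byte costs are the approximate ones given in the thread; the state and arc counts are made-up placeholders, not measurements of any real Gigaword model, and the estimate covers only the LM FST itself, not a composed HCLG.

```python
# Approximate (bytes per state, bytes per arc) as quoted in the thread.
BYTES = {
    "VectorFst/ConstFst": (20, 16),
    "NGramFst": (8, 8),
}


def estimate_bytes(num_states, num_arcs, fst_type):
    """Back-of-the-envelope size of an FST in bytes for a given representation."""
    per_state, per_arc = BYTES[fst_type]
    return num_states * per_state + num_arcs * per_arc


# Hypothetical LM dimensions, chosen only for illustration.
num_states = 10_000_000
num_arcs = 100_000_000

for fst_type in BYTES:
    gib = estimate_bytes(num_states, num_arcs, fst_type) / 2**30
    print(f"{fst_type}: about {gib:.2f} GiB")
```

Plugging in the state and arc counts of your own LM FST (for example as reported by `fstinfo`) gives a quick check on whether a given representation is likely to fit in RAM before a long graph build is attempted.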