|
From: Daniel P. <dp...@gm...> - 2013-10-20 18:34:22
|
OK, I remember now: I included --enable-ngram-fsts in the standard compilation
options for Kaldi, but I never got round to creating a recipe that uses the
OpenGrm toolkit. Doing this is definitely possible in Kaldi, but it will
require both some scripting and some coding work, and I don't think I have
time for it in the next 2-3 months.
Dan

On Sun, Oct 20, 2013 at 2:29 PM, Michael Riley <ri...@go...> wrote:
> It is included but you need to compile with --enable-ngram-fsts. Similarly
> for the other extension formats. My own suggestion would be to use a large
> LM represented as an NGramFst in 2nd pass lattice rescoring and entropy
> prune that model to a modest size for a first pass with a static (H)CLG.
> With a reasonable lattice size, you'll get virtually no search errors.
>
> The NGramFst expects a specific ngram format; www.opengrm.org describes and
> offers tools to train and convert from DARPA format.
>
> -m
>
> On Sun, Oct 20, 2013 at 2:10 PM, Daniel Povey <dp...@gm...> wrote:
>>
>> Mike: NGramFst isn't included in the standard OpenFst distribution, is it?
>> Dan
>>
>>
>> On Sun, Oct 20, 2013 at 2:08 PM, Michael Riley <ri...@go...> wrote:
>> > I don't know much about how ngram FSTs are used in Kaldi or the
>> > characteristics of their implementations, but I do know that for sheer
>> > OpenFst representation:
>> >
>> > VectorFst and ConstFst (the standard mutable and immutable OpenFst
>> > reps) use about 20 bytes per state and 16 bytes per arc, while
>> > NGramFst, specific to ngram models (see here), uses about 8 bytes per
>> > state and 8 bytes per arc. The last, for representing at the word
>> > level, is the most compact and is used, in general, in a rescoring or
>> > on-the-fly composition mode. There are also various compact FST
>> > formats (see here) that can represent other specific FSTs (and are
>> > extensible).
>> >
>> > With 64 bit compilation, you are limited by how much memory you have
>> > for those reps and of course.
>> > Hope that helps.
>> >
>> > On Fri, Oct 18, 2013 at 12:06 AM, E <oth...@ao...> wrote:
>> >>
>> >> Thanks a lot for the answers and possible solutions.
>> >>
>> >> Few questions-
>> >>
>> >> What is the maximum size trigram language model supported by FST? I
>> >> tried to use Gigaword LM (64k vocab), mkgraph.sh ran for a long time
>> >> but crashed afterwards. So I want to know if there is a theoretical
>> >> limit on size of language model that can be integrated with Kaldi.
>> >>
>> >> I will try to make HCLG.fst with gigaword again (with triphone AM),
>> >> but has anyone tried to build it with LM of this size successfully,
>> >> if so, what were the system requirements (RAM) and final FST size in
>> >> mega/gigabytes?
>> >>
>> >> _______________________________________________
>> >> Kaldi-users mailing list
>> >> Kal...@li...
>> >> https://lists.sourceforge.net/lists/listinfo/kaldi-users
|
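The per-state and per-arc figures quoted in the thread lend themselves to a quick back-of-the-envelope estimate. The sketch below is only that: the byte costs are the approximate numbers Michael gives (about 20/16 for VectorFst and ConstFst, about 8/8 for NGramFst), while the function names and the state and arc counts are hypothetical placeholders, not measurements of any real Gigaword model. It also covers only the language model FST itself, not a composed HCLG.

```python
# Approximate costs quoted in the thread: (bytes per state, bytes per arc).
# These are rough figures, not exact OpenFst internals.
APPROX_COSTS = {
    "VectorFst/ConstFst": (20, 16),
    "NGramFst": (8, 8),
}


def estimate_bytes(num_states, num_arcs, bytes_per_state, bytes_per_arc):
    """Return the estimated in-memory size of an FST in bytes."""
    return num_states * bytes_per_state + num_arcs * bytes_per_arc


def estimate_all(num_states, num_arcs):
    """Estimate the size of the same FST under each representation."""
    return {
        name: estimate_bytes(num_states, num_arcs, per_state, per_arc)
        for name, (per_state, per_arc) in APPROX_COSTS.items()
    }


if __name__ == "__main__":
    # Hypothetical model size, chosen only for illustration.
    states, arcs = 10_000_000, 100_000_000
    for name, size in estimate_all(states, arcs).items():
        print(f"{name}: about {size / 1e9:.2f} GB")
```

To apply this to a real model, the state and arc counts could be taken from the output of OpenFst's fstinfo on the compiled LM and substituted for the placeholder values.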