From: E <oth...@ao...> - 2013-10-22 07:20:32
|
> -----Original Message----- From: BLUCHE, Théodore <The...@a2...>
> I guess you should generate lattices (with gmm-latgen-faster, e.g.) and then you have lattice-add-penalty; see for example egs/rm/s5/local/score.sh. Hope that's correct and that it'll help.

Thanks. But I'm not sure this is the best way. I thought that if the decoder also had such a parameter, then lattice creation itself could be affected (in a preferable way, depending on the WIP).
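For reference, the lattice-based WIP approach suggested above might look roughly like this (a sketch only; file names, beams, and the penalty value 0.5 are placeholders, assuming a standard Kaldi GMM setup):

```shell
# Generate lattices once, then apply a word insertion penalty (WIP)
# afterwards, without re-decoding. 0.5 is an arbitrary example value.
gmm-latgen-faster --beam=13.0 --lattice-beam=6.0 --acoustic-scale=0.0833 \
  final.mdl HCLG.fst ark:feats.ark ark:lat.ark

lattice-add-penalty --word-ins-penalty=0.5 ark:lat.ark ark:lat_wip.ark

lattice-best-path --word-symbol-table=words.txt \
  ark:lat_wip.ark ark,t:trans.txt
```

Scoring scripts such as the egs/rm/s5/local/score.sh mentioned above typically sweep --word-ins-penalty over a small grid to tune it.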
|
From: E <oth...@ao...> - 2013-10-22 07:07:32
|
Hi, I was trying to search the Kaldi source for the equivalent of HTK's word insertion penalty factor. It is shown here, http://kaldi.sourceforge.net/structkaldi_1_1KaldiDecoderOptions.html, but I did not find it in the downloaded source code. How might one access this parameter and control insertions? Thanks.
|
From: Nagendra G. <nag...@go...> - 2013-10-20 23:22:58
|
Gigaword is a big LM. Nobody has done that, to my knowledge. Try pruning it before compiling.

On Oct 18, 2013 12:07 AM, "E" <oth...@ao...> wrote:
> What is the maximum size trigram language model supported by FST? I tried
> to use the Gigaword LM (64k vocab); mkgraph.sh ran for a long time but
> crashed afterwards.
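One way to do that pruning, assuming SRILM is available (the thread does not name a toolkit, and the threshold here is an arbitrary example, not a recommendation):

```shell
# Entropy-prune a large ARPA LM before compiling the decoding graph.
# Larger -prune thresholds prune more aggressively.
ngram -order 3 -lm gigaword.arpa.gz -prune 1e-8 \
  -write-lm gigaword.pruned.arpa.gz
```

mkgraph.sh would then be rerun on the pruned LM.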
|
From: Daniel P. <dp...@gm...> - 2013-10-20 19:00:35
|
Yes, as big as you can given memory availability. At some point in the future we need to implement some kind of dynamic decoder, either based on online composition of FSTs or some other principle. Nickolay Shmyrev (cc'd) has a lot of decoder experience and may be able to help with this at some point. Personally, like I say, I am tied up for the next few months and can't really get to this.

Dan

On Sun, Oct 20, 2013 at 2:56 PM, Al Zatv <al...@gm...> wrote:
> But, in my experience, the size of the first-pass LM is important not only
> for accuracy, but also for speed. So it is better to make the 1st-pass LM
> as big as possible. Is it just me, or do you have the same experience?
|
From: Al Z. <al...@gm...> - 2013-10-20 18:56:33
|
> My own suggestion would be to use a large LM represented as an NGramFst in
> 2nd-pass lattice rescoring and entropy-prune that model to a modest size
> for a first pass with a static (H)CLG. With a reasonable lattice size,
> you'll get virtually no search errors.

But, in my experience, the size of the first-pass LM is important not only for accuracy, but also for speed. So it is better to make the 1st-pass LM as big as possible. Is it just me, or do you have the same experience?
|
From: Daniel P. <dp...@gm...> - 2013-10-20 18:34:22
|
OK, I remember now: I included --enable-ngram-fsts in the standard compilation options for Kaldi, but I never got round to creating a recipe to use the OpenGrm toolkit. Doing this is definitely possible in Kaldi, but it will require both some scripting and coding work, and I don't think I have time for it in the next 2-3 months.

Dan
|
From: Michael R. <ri...@go...> - 2013-10-20 18:30:19
|
It is included, but you need to compile with --enable-ngram-fsts; similarly for the other extension formats. My own suggestion would be to use a large LM represented as an NGramFst in 2nd-pass lattice rescoring, and entropy-prune that model to a modest size for a first pass with a static (H)CLG. With a reasonable lattice size, you'll get virtually no search errors.

The NGramFst expects a specific n-gram format; www.opengrm.org describes it and offers tools to train and convert from DARPA format.

-m
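A sketch of the build-and-convert path implied above, assuming OpenFst is built from source and the OpenGrm NGram tools are installed (file names are placeholders):

```shell
# 1) Build OpenFst with the n-gram extension enabled, so the "ngram"
#    FST type is registered.
./configure --enable-ngram-fsts && make && make install

# 2) Convert an ARPA/DARPA-format LM to an FST with OpenGrm NGram,
#    then to the compact ngram representation.
ngramread --ARPA lm.arpa > lm.fst
fstconvert --fst_type=ngram lm.fst lm.ngram.fst
```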
|
From: Daniel P. <dp...@gm...> - 2013-10-20 18:10:51
|
Mike: NgramFst isn't included in the standard OpenFst distribution, is it?

Dan
|
From: Michael R. <ri...@go...> - 2013-10-20 18:09:18
|
I don't know much about how ngram FSTs are used in Kaldi or the characteristics of their implementations, but I do know the sheer OpenFst representation costs:

VectorFst and ConstFst (the standard mutable and immutable OpenFst representations) use about 20 bytes per state and 16 bytes per arc, while NGramFst, specific to n-gram models (see http://www.openfst.org/twiki/bin/view/FST/FstExtensions), uses about 8 bytes per state and 8 bytes per arc. The last, representing the model at the word level, is the most compact and is used, in general, in a rescoring or on-the-fly composition mode. There are also various compact FST formats (see the same page) that can represent other specific FSTs (and the mechanism is extensible).

With 64-bit compilation, you are limited by how much memory you have for those representations, of course. Hope that helps.
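As a back-of-the-envelope illustration of those per-state and per-arc figures (the state and arc counts below are invented for illustration, not measured from any real graph):

```shell
# Memory estimate from ~20 B/state + 16 B/arc (ConstFst) versus
# ~8 B/state + 8 B/arc (NGramFst). Counts are made up.
states=5000000
arcs=50000000
const_mb=$(( (states * 20 + arcs * 16) / 1024 / 1024 ))
ngram_mb=$(( (states * 8 + arcs * 8) / 1024 / 1024 ))
echo "ConstFst: ~${const_mb} MB"   # ~858 MB
echo "NGramFst: ~${ngram_mb} MB"   # ~419 MB
```

With these (fictitious) counts the NGramFst representation is roughly half the size, which matches the per-byte figures above.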
|
From: Andreas S. K. <as...@cb...> - 2013-10-18 08:21:52
|
Hi E,

I have successfully trained and used 40k language models, and I run into the same problem as you when trying to increase the vocab. Because of Danish productive compounding, I have decided to try the 'lmrescore' approach that Dan suggested in an earlier mail.

-Andreas
|
From: E <oth...@ao...> - 2013-10-18 04:07:03
|
Thanks a lot for the answers and possible solutions. A few questions:

What is the maximum size trigram language model supported by FST? I tried to use the Gigaword LM (64k vocab); mkgraph.sh ran for a long time but crashed afterwards. So I want to know if there is a theoretical limit on the size of language model that can be integrated with Kaldi.

I will try to make HCLG.fst with Gigaword again (with a triphone AM), but has anyone built it with an LM of this size successfully? If so, what were the system requirements (RAM) and the final FST size in mega/gigabytes?
|
From: Daniel P. <dp...@gm...> - 2013-10-17 18:11:11
|
BTW, one option is to decode with a fairly tightly pruned LM, and then rescore the lattices with a larger one. Search for "lmrescore" in egs/wsj/s5/run.sh.

Dan

On Thu, Oct 17, 2013 at 1:03 AM, Arnab Ghoshal <ar...@gm...> wrote:
> No redundancy. HCLG is the *composition* of the individual FSTs, and its
> size is roughly the product of the sizes of the FSTs being composed
> (modulo determinization & minimization). Try pruning your LM if HCLG
> is too big.
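In recipe terms, the decode-then-rescore flow might look like this (a sketch; directory names follow the WSJ s5 layout referenced above and are illustrative):

```shell
# First pass with the pruned trigram LM (tgpr), then rescore the
# resulting lattices with the unpruned trigram LM (tg).
steps/decode.sh --nj 8 exp/tri2b/graph_tgpr data/test_eval92 \
  exp/tri2b/decode_tgpr_eval92

steps/lmrescore.sh data/lang_test_tgpr data/lang_test_tg \
  data/test_eval92 exp/tri2b/decode_tgpr_eval92 \
  exp/tri2b/decode_tgpr_tg_eval92
```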
|
From: Arnab G. <ar...@gm...> - 2013-10-17 08:03:37
|
On Thu, Oct 17, 2013 at 8:26 AM, E <oth...@ao...> wrote:
> but in short why size of HCLG.fst >> sum of size of
> individual *.fsts)? Is there some redundancy involved?

No redundancy. HCLG is the *composition* of the individual FSTs, and its size is roughly the product of the sizes of the FSTs being composed (modulo determinization & minimization). Try pruning your LM if HCLG is too big.
|
From: Vassil P. <vas...@gm...> - 2013-10-17 07:47:32
|
There are people on this list who are far more knowledgeable than me, and perhaps someone can give a more complete/correct answer, but my understanding is that it boils down to the search-space representation. In the traditional recognizers the search space is expanded on demand, whereas with the (vanilla) WFST approach the entire search graph is statically built before recognition begins. As far as I know there are ways to construct the graph on the fly, but I am not sure if something like this is currently implemented in Kaldi (I think not).

Depending on your needs/constraints you may want to have a look at, say, PocketSphinx; on the other hand, Nuance released a WFST-based recognizer for Android some time ago (not sure how well it works): https://github.com/android/platform_external_srec

Vassil
|
From: E <oth...@ao...> - 2013-10-17 07:26:35
|
Hello,

I have been playing with the Kaldi online recognizers (great work!) and wanted to ask whether the FST approach is useful if I'm running under memory constraints. If I use a traditional ARPA language model + acoustic models, the total size of the models is < 100 MB (for a 20,000 vocab size), but the HCLG.fst takes a whopping 500 MB! Why is this so? (Perhaps I should read the papers to find the answers, but in short: why is the size of HCLG.fst >> the sum of the sizes of the individual *.fsts? Is there some redundancy involved?)

What might be alternatives if one wants to further reduce the size of HCLG.fst?

Thanks.
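To see where the size actually goes, the component and composed graphs can be compared with OpenFst's fstinfo tool (the file paths below assume a typical lang/graph directory layout and are placeholders):

```shell
# Compare state/arc counts of the component FSTs with the composed HCLG.
for f in lang/L.fst lang/G.fst graph/HCLG.fst; do
  echo "== $f =="
  fstinfo "$f" | grep -E '^# of (states|arcs)'
done
```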
|
From: E <oth...@ao...> - 2013-10-17 07:19:20
|
> This particular code was actually written by Danijel Korzinek.
> Just to make that clear: did you only run the server?

Thanks a lot. Stupid mistake, I should have run "online-gmm-decode-faster"

- e |
|
From: Vassil P. <vas...@gm...> - 2013-10-17 05:22:07
|
[almost awake now]

This particular code was actually written by Danijel Korzinek.

Just to make that clear: did you only run the server? The binary you are using implements a client/server model, and you need to send it audio with "online-audio-client" (or the Java client), as described in http://kaldi.sourceforge.net/online_programs.html

Vassil

On Thu, Oct 17, 2013 at 6:23 AM, Daniel Povey <dp...@gm...> wrote:
> Cc'ing Vassil as he wrote this code; he may be asleep right now as he
> lives in Bulgaria.
> Dan
>
> On Wed, Oct 16, 2013 at 8:21 PM, E <oth...@ao...> wrote:
>> Hi!
>>
>> When I run the online decoder I see the output below on screen:
>>
>> online-audio-server-decode-faster --verbose=5 --rt-min=0.7 --rt-max=0.9
>> --max-active=10000 --beam=72.0 --acoustic-scale=0.0769 ./final.mdl
>> graph_tgpr/HCLG.fst graph_tgpr/words.txt 1:2:3:4:5:6:7:8:9:10:11:12:13:14:15
>> 5010 graph_tgpr/phones/word_boundary.int final.mat
>> TcpServer: Listening on port: 5010
>> Reading LDA matrix: final.mat...
>> Reading acoustic model: ./final.mdl...
>> Reading word list: graph_tgpr/words.txt...
>> Reading word boundary file: graph_tgpr/phones/word_boundary.int...
>> Reading FST: graph_tgpr/HCLG.fst...
>> Waiting for client...
>>
>> Then it stays like that for a long time without recognizing. Could this be
>> because the HCLG.fst file is too large (585M)?
>>
>> I checked the mic by recording in Audacity. Recording works.
>>
>> How do I debug this issue?
>>
>> Thanks!
|
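The client/server model Vassil describes ("Waiting for client..." means the server blocks until something connects and streams audio to it) can be sketched with a minimal, self-contained example using plain sockets. This is NOT Kaldi's actual wire protocol: the real online-audio-client adds its own framing and headers, so the host, port, and payload below are placeholders.

```python
# Minimal sketch of a client/server audio pipe (NOT Kaldi's protocol):
# the server waits for one client; the client connects and streams bytes.
import socket
import threading

HOST, PORT = "127.0.0.1", 15010  # placeholder, like the 5010 in the log above

received = bytearray()
ready = threading.Event()

def server():
    # Bind, listen, then accept one client and read everything it sends.
    with socket.create_server((HOST, PORT)) as srv:
        ready.set()  # tell the client we are listening
        conn, _ = srv.accept()
        with conn:
            while chunk := conn.recv(4096):
                received.extend(chunk)

t = threading.Thread(target=server)
t.start()
ready.wait()

# The "client": connect and stream fake 16-bit audio samples.
fake_audio = bytes(3200)  # 0.1 s of silence at 16 kHz, 16-bit mono
with socket.create_connection((HOST, PORT)) as c:
    c.sendall(fake_audio)

t.join()
print(len(received))  # prints 3200: every byte arrived at the server
```

The point of the sketch: running only the server (as in the original question) leaves it blocked in accept(), exactly the "Waiting for client..." state, until a client like online-audio-client connects.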
|
From: Vassil P. <vas...@gm...> - 2013-10-17 05:21:12
|
The online decoders are inherently constrained to run faster than real time - otherwise the audio input buffers overflow and, in general, the recognizer can't keep up with the speaker. These real-time factors were meant to let the user specify minimum and maximum bounds within which the inner loop of the decoder is allowed to run, by decreasing the beam if the upper bound is exceeded and increasing it if the decoding becomes too fast. In practice, I think the current implementation isn't perfect.

Vassil

On Thu, Oct 17, 2013 at 5:22 AM, Daniel Povey <dp...@gm...> wrote:
> I think the latter; this is probably in the online decoder.
> Dan
>
> On Wed, Oct 16, 2013 at 7:06 PM, E <oth...@ao...> wrote:
>> Hello,
>>
>> I'm a bit flummoxed by the rt-min and rt-max factors. I thought these factors
>> are computed after recognition is complete, depending on how much time it
>> took to decode. How then are they supplied as arguments to the decoder? Does the
>> decoder do more pruning if it seems that rt-max is getting exceeded?
|
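The beam-adaptation idea Vassil describes can be sketched as a simple control loop. This is a hedged, self-contained illustration, not Kaldi's actual implementation: the update factors, bounds, and simulated timings are all invented.

```python
# Sketch of rt-min/rt-max beam control (NOT Kaldi's actual code):
# after each chunk, compare elapsed decoding time to the audio duration;
# shrink the beam when decoding is too slow, widen it when too fast.

def adapt_beam(beam, elapsed_s, audio_s,
               rt_min=0.7, rt_max=0.9,      # bounds as in the command line above
               shrink=0.9, grow=1.1,        # invented update factors
               min_beam=8.0, max_beam=72.0):
    """Return an updated beam given the observed real-time factor."""
    rt_factor = elapsed_s / audio_s
    if rt_factor > rt_max:    # too slow: prune harder (narrower beam)
        beam *= shrink
    elif rt_factor < rt_min:  # too fast: search wider (larger beam)
        beam *= grow
    return min(max(beam, min_beam), max_beam)

beam = 16.0
# Simulated (audio seconds, decoding seconds) per chunk.
for audio_s, elapsed_s in [(1.0, 0.5), (1.0, 0.6), (1.0, 1.2), (1.0, 0.95)]:
    beam = adapt_beam(beam, elapsed_s, audio_s)
print(round(beam, 2))  # beam grew twice, then shrank twice: 15.68
```

This also answers the original confusion: the real-time factor is indeed measured during decoding, but rt-min/rt-max are supplied up front as targets for this feedback loop rather than computed afterwards.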
|
From: Daniel P. <dp...@gm...> - 2013-10-17 03:23:44
|
Cc'ing Vassil as he wrote this code; he may be asleep right now as he lives in Bulgaria.
Dan

On Wed, Oct 16, 2013 at 8:21 PM, E <oth...@ao...> wrote:
> Hi!
>
> When I run the online decoder I see the output below on screen:
>
> online-audio-server-decode-faster --verbose=5 --rt-min=0.7 --rt-max=0.9
> --max-active=10000 --beam=72.0 --acoustic-scale=0.0769 ./final.mdl
> graph_tgpr/HCLG.fst graph_tgpr/words.txt 1:2:3:4:5:6:7:8:9:10:11:12:13:14:15
> 5010 graph_tgpr/phones/word_boundary.int final.mat
> TcpServer: Listening on port: 5010
> Reading LDA matrix: final.mat...
> Reading acoustic model: ./final.mdl...
> Reading word list: graph_tgpr/words.txt...
> Reading word boundary file: graph_tgpr/phones/word_boundary.int...
> Reading FST: graph_tgpr/HCLG.fst...
> Waiting for client...
>
> Then it stays like that for a long time without recognizing. Could this be
> because the HCLG.fst file is too large (585M)?
>
> I checked the mic by recording in Audacity. Recording works.
>
> How do I debug this issue?
>
> Thanks!
|
|
From: E <oth...@ao...> - 2013-10-17 03:21:48
|
Hi!
When I run the online decoder I see the output below on screen:
online-audio-server-decode-faster --verbose=5 --rt-min=0.7 --rt-max=0.9 --max-active=10000 --beam=72.0 --acoustic-scale=0.0769 ./final.mdl graph_tgpr/HCLG.fst graph_tgpr/words.txt 1:2:3:4:5:6:7:8:9:10:11:12:13:14:15 5010 graph_tgpr/phones/word_boundary.int final.mat
TcpServer: Listening on port: 5010
Reading LDA matrix: final.mat...
Reading acoustic model: ./final.mdl...
Reading word list: graph_tgpr/words.txt...
Reading word boundary file: graph_tgpr/phones/word_boundary.int...
Reading FST: graph_tgpr/HCLG.fst...
Waiting for client...
Then it stays like that for a long time without recognizing. Could this be because the HCLG.fst file is too large (585M)?
I checked the mic by recording in Audacity. Recording works.
How do I debug this issue?
Thanks!
|
|
From: Daniel P. <dp...@gm...> - 2013-10-17 02:22:52
|
I think the latter; this is probably in the online decoder.
Dan

On Wed, Oct 16, 2013 at 7:06 PM, E <oth...@ao...> wrote:
> Hello,
>
> I'm a bit flummoxed by the rt-min and rt-max factors. I thought these factors
> are computed after recognition is complete, depending on how much time it
> took to decode. How then are they supplied as arguments to the decoder? Does the
> decoder do more pruning if it seems that rt-max is getting exceeded?
|
|
From: E <oth...@ao...> - 2013-10-17 02:06:49
|
Hello,

I'm a bit flummoxed by the rt-min and rt-max factors. I thought these factors are computed after recognition is complete, depending on how much time it took to decode. How then are they supplied as arguments to the decoder? Does the decoder do more pruning if it seems that rt-max is getting exceeded? |
|
From: Mailing l. u. f. U. C. a. U. <kal...@li...> - 2013-10-15 08:00:12
|
Hi all, there are two open positions at the University of Edinburgh. For the ASR position, experience with Kaldi will be highly valued. Please forward this to people who may be interested.

-Arnab

Postdoctoral Research Positions
The Centre for Speech Technology Research
University of Edinburgh
http://www.cstr.ed.ac.uk/opportunities/
Closing date: 18th November 2013

We have open postdoctoral research positions in speech recognition and speech synthesis, which are part of the large UK project "Natural Speech Technology", http://www.natural-speech-technology.org. The topics of research to be undertaken in these positions are flexible. We encourage applicants with their own research agenda, provided that it fits within the objective of the project, which is to advance the state of the art in speech technology by making it more natural, approaching human levels of reliability, adaptability, and conversational richness.

SPEECH SYNTHESIS: Our specific interests in speech synthesis include - but are not limited to - the following:
* Machine learning for vocoding.
* Fluent speech synthesis.
* Shallow stochastic natural language generation to improve fluency.
* Beyond decision tree parameter tying, including neural network approaches or tree intersect models.
* Use of synthetic speech in assistive technologies.

SPEECH RECOGNITION: Our specific interests in speech recognition include - but are not limited to - the following:
* Wide domain coverage and models which make use of rich contexts.
* Cross-lingual speech recognition.
* Neural network models.
* Adaptation and canonical modelling techniques for acoustic or language modelling.
* Distant speech recognition.
* Approaches based on models incorporating articulatory data.

The Centre for Speech Technology Research (CSTR) is an exciting, vibrant, interdisciplinary research centre and a great place to work. We are part of the University of Edinburgh (QS world ranking 17th), linking the world-class subject areas of informatics / computer science (QS world ranking 15th) and linguistics (QS world ranking 5th). Founded in 1984, CSTR is concerned with research in all areas of speech technology, including speech recognition, speech synthesis, speech signal processing, information access, multimodal interfaces and dialogue systems. We have many significant collaborations with the wider community of researchers in speech science, language, cognition and machine learning for which Edinburgh is renowned, and a wide network of collaborators across the globe.

For further details, and links to the online application procedure, please visit http://www.cstr.ed.ac.uk/opportunities/

Informal enquiries about these positions should be made to Prof Steve Renals (s.r...@ed...) or to Prof Simon King (Sim...@ed...). |
|
From: Mailing l. u. f. U. C. a. U. <kal...@li...> - 2013-10-15 07:15:02
|
Hi,

Unfortunately I think there is no other way to do this but to change the source code. As you might have already seen, the feature extraction parameters are hard-coded - e.g. here is a snippet from the main() function:

// We are not properly registering/exposing MFCC and frame extraction options,
// because there are parts of the online decoding code, where some of these
// options are hardwired (ToDo: we should fix this at some point)
MfccOptions mfcc_opts;
mfcc_opts.use_energy = false;
int32 frame_length = mfcc_opts.frame_opts.frame_length_ms = 25;
int32 frame_shift = mfcc_opts.frame_opts.frame_shift_ms = 10;

To be honest I can't even remember which parts of the online code should be fixed, and I don't have time to refactor the code at the moment.

Vassil

On Tue, Oct 15, 2013 at 9:30 AM, Mailing list used for User Communication and Updates <kal...@li...> wrote:
> Hi,
>
> I wanted to try online-audio-server-decode-faster with some *.mdl files
> (trained using non-default FE params) I have. There don't seem to be any
> feature extraction arguments (window length, number of FFT bins). How should
> I go about this task?
>
> Thanks!
|
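The fix Vassil alludes to (registering the hard-coded parameters as command-line options instead of hardwiring them) can be sketched in a self-contained way with Python's argparse. The option names mirror the C++ fields in the snippet above but are otherwise invented; Kaldi's real fix would go through its own ParseOptions mechanism in C++, so treat this only as an illustration of the pattern.

```python
# Sketch of exposing hard-coded feature-extraction parameters as
# command-line options (illustrative only, NOT Kaldi code).
import argparse

def make_parser():
    p = argparse.ArgumentParser(description="toy online-decoder options")
    # Mirrors the hard-coded MfccOptions fields from the snippet above.
    p.add_argument("--frame-length-ms", type=int, default=25,
                   help="analysis window length in milliseconds")
    p.add_argument("--frame-shift-ms", type=int, default=10,
                   help="frame shift in milliseconds")
    p.add_argument("--use-energy", action="store_true",
                   help="use energy instead of C0 (off by default)")
    return p

# A user who trained with non-default FE params could then pass them in
# instead of editing and recompiling the decoder:
opts = make_parser().parse_args(["--frame-length-ms", "32"])
print(opts.frame_length_ms, opts.frame_shift_ms)  # 32 10
```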
|
From: Mailing l. u. f. U. C. a. U. <kal...@li...> - 2013-10-15 06:30:38
|
Hi,

I wanted to try online-audio-server-decode-faster with some *.mdl files (trained using non-default FE params) I have. There don't seem to be any feature extraction arguments (window length, number of FFT bins). How should I go about this task?

Thanks! |