From: Daniel P. <dp...@gm...> - 2015-06-23 03:59:00
Usually, if there is a lot of acoustic context in your model, you will require a larger LM weight. Also, if for some reason there tend to be a lot of insertions in decoding (e.g. something weird went wrong in training, or there is some kind of normalization problem), a large LM weight can help reduce insertions and so improve the WER.

Dan
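A minimal sketch of the kind of LM-weight sweep being discussed here, using standard Kaldi lattice tools; the lattice directory, word-symbol table and reference text below are placeholders for whatever your own decoding run produced, and the usual word-insertion penalty is omitted for brevity:

    # Sweep the LM weight over existing decoding lattices and report the WER
    # for each value (placeholder paths; adapt to your experiment layout).
    lats="ark:gunzip -c exp/tri4/decode_test/lat.*.gz |"
    words=data/lang/words.txt
    ref=data/test/text        # reference transcripts, one utterance per line

    for lmwt in 8 10 12 14 16 18 20; do
      lattice-scale --inv-acoustic-scale=$lmwt "$lats" ark:- \
        | lattice-best-path --word-symbol-table=$words ark:- ark,t:- 2>/dev/null \
        | utils/int2sym.pl -f 2- $words \
        | compute-wer --text --mode=present ark:$ref ark,p:- \
        | grep WER | sed "s/^/LMWT=$lmwt /"
    done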
From: Kirill K. <kir...@sm...> - 2015-06-23 03:36:34
I am getting the same ratio with both a small, more targeted LM and a quite large general LM. I do not understand what to make of it!

-kkm
From: Nagendra G. <nag...@go...> - 2015-06-23 03:31:37
Or maybe your domain is limited and the LM is very nicely matched to the task at hand?
From: Kirill K. <kir...@sm...> - 2015-06-23 03:29:26
In my test sets I am getting the best WER at an LM/acoustic weight in the range of 18-19, with multiple LMs of different size and origin. I had always assumed the usual ballpark figure was about 10, give or take. In your experience, does this larger LM weight mean anything, and if so, what? I am guessing an inadequate acoustic model, requiring more LM "pull" -- am I making sense?

-kkm
From: Daniel P. <dp...@gm...> - 2015-06-18 21:08:31
The lack of length normalization is actually on purpose. It is the only way to make the system, in principle, completely invariant to data offsets. It also enables more robust backoff when you have no adaptation data at all, because the estimate smoothly approaches the zero iVector (due to the prior term in the iVector estimation objective function).

I think you should just not use the iVectors at all if your utterances are very short. For the CTS task, you can always use previous utterances of the same speaker in the iVector estimation. The setup that's checked in does that unless you decode with --per-utt.

Dan
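A minimal sketch of the two decoding modes Dan contrasts, assuming the standard online-nnet2 decoding script and the --per-utt option he mentions (all directory names are placeholders):

    graph=exp/tri4/graph
    data=data/test_cts
    srcdir=exp/nnet2_online/nnet_a_online

    # Per-speaker iVectors: earlier utterances of a speaker inform later ones.
    steps/online/nnet2/decode.sh --nj 8 --cmd run.pl \
      "$graph" "$data" "$srcdir/decode_test_cts"

    # Per-utterance iVectors: each (possibly very short) utterance stands alone.
    steps/online/nnet2/decode.sh --nj 8 --cmd run.pl --per-utt true \
      "$graph" "$data" "$srcdir/decode_test_cts_per_utt"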
From: Nagendra G. <nag...@go...> - 2015-06-18 14:00:21
I think it would make sense. Would you like to contribute that to the recipe?
From: David v. L. <dav...@gm...> - 2015-06-18 09:18:43
Hello,

We're using the nnet2-online setup on a CTS task. We have had good experience with the same setup on a BN task. However, on the CTS task, where utterances can be very short ("yes", "mmm", etc.), we observe a very strong dependence of the iVector length on duration (which makes sense) and a very strong dependence of ASR performance on iVector length (which also makes sense).

It seems that in the nnet2-online setup the iVectors are not normalized to length as is customary in speaker recognition. The nnet doesn't seem to like the duration dependence -- what would be an approach to deal with this? Would it make sense to train the nnet with length-normalized iVectors?

Cheers,

---david

--
David van Leeuwen
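For reference, the length normalization David mentions is done in the Kaldi speaker-recognition recipes with a dedicated binary; the sketch below only illustrates that step on an archive of per-utterance iVectors (placeholder paths), and is not a drop-in change for the online-decoding setup, which stores its iVectors differently:

    # Scale each iVector in an archive to unit Euclidean length.
    src=exp/ivectors_test     # placeholder: directory containing ivector.scp
    ivector-normalize-length scp:$src/ivector.scp \
      ark,scp:$src/ivector_lennorm.ark,$src/ivector_lennorm.scp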
From: Sandeep R. <san...@go...> - 2015-06-17 21:00:47
Ondrej,

I'll run the Vystadial recipe and see what opportunities are there. Did somebody already make a class LM on it, or at least define what the potential classes are? I hadn't looked into it earlier.

Thanks,
Nagendra

On Wed, Jun 17, 2015 at 3:42 AM, Ondrej Platek <ond...@gm...> wrote:
> Dear all,
>
> thanks to the reminder from Dimitris, I realized that the Vystadial dataset is very convenient for class-based LM / LM grafting. As the scripts for Vystadial Cs & En are already in Kaldi, it may be a convenient starting point, because the data contain transcriptions of user utterances from interactions with a spoken dialogue system where we have the classes defined.
>
> See scripts:
> https://github.com/kaldi-asr/kaldi/tree/master/egs/vystadial_en
> https://github.com/kaldi-asr/kaldi/tree/master/egs/vystadial_cz
>
> See data (scroll to the bottom to download the datasets):
> http://hdl.handle.net/11858/00-097C-0000-0023-4671-4 (en)
> http://hdl.handle.net/11858/00-097C-0000-0023-4670-6 (cs)
>
> We can probably recreate / find the list of words in the classes for English if there is interest. For Czech this should be no problem at all.
>
> Please let me know if you are interested in these datasets and the lists of classes and their members.
>
> Ondra
>
> PS: Currently we use a class-based (CB) LM which we later expand to a full LM in ARPA format, and then create G.fst as in the standard use case. It is not the optimal approach, but it works for us. If you want to know how we are modeling the CBLM, just let me know; I am working on a slight improvement of it right now, so I am interested in improving it.
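The route Ondra describes in his postscript needs no special support at decode time: once the class-based LM has been expanded into an ordinary ARPA n-gram model, it is compiled into G.fst the standard way. A minimal sketch of that last step, with placeholder filenames and assuming the class expansion has already been done by whatever tool produced the ARPA file:

    # Compile an already-expanded ARPA LM into G.fst for decoding.
    lang=data/lang            # placeholder lang directory
    gunzip -c lm.arpa.gz \
      | arpa2fst --disambig-symbol='#0' \
                 --read-symbol-table=$lang/words.txt - $lang/G.fst

    # Sanity check: G.fst should be (close to) stochastic.
    fstisstochastic $lang/G.fst || echo "warning: G.fst is not stochastic"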
From: Sandeep R. <san...@go...> - 2015-06-17 20:43:21
Does the Kaldi recipe do a class LM? Or can you add it to the recipe? That would make the whole process so much easier. I don't mind if the words are Czech.

On Wed, Jun 17, 2015 at 4:08 PM, Ondrej Platek <ond...@gm...> wrote:
> For the Czech data we are running the system live with Kaldi, and we use a class LM. For the English data I will give you a few examples off the top of my head:
>
> PRICE_RANGE - cheap, middle price-range, ...
> FOOD_TYPE - Indian, Chinese, ...
> LOCATION - city center, Chesterton area, ...
>
> We will try to find the class definitions, since we are not running that system.
>
> Ondrej
>>>> > >> >>>> > http://ad.doubleclick.net/ddm/clk/290420510;117567292;y >>>> > >> _______________________________________________ >>>> > >> Kaldi-users mailing list >>>> > >> Kal...@li... >>>> > >> >>>> > https://lists.sourceforge.net/lists/listinfo/kaldi-users >>>> > >> >>>> > >>>> > >>>> > >>>> ----------------------------------------------------------------------- >>>> > - >>>> > ------ >>>> > One dashboard for servers and applications across >>>> Physical- >>>> > Virtual-Cloud >>>> > Widest out-of-the-box monitoring support with 50+ >>>> > applications >>>> > Performance metrics, stats and reports that give you >>>> > Actionable Insights >>>> > Deep dive visibility with transaction tracing using APM >>>> > Insight. >>>> > http://ad.doubleclick.net/ddm/clk/290420510;117567292;y >>>> > _______________________________________________ >>>> > Kaldi-users mailing list >>>> > Kal...@li... >>>> > >>>> https://lists.sourceforge.net/lists/listinfo/kaldi-users >>>> > >>>> > >>>> > >>>> > >>>> > >>>> > >>>> ----------------------------------------------------------------------- >>>> > - >>>> > ------ >>>> > One dashboard for servers and applications across Physical- >>>> > Virtual-Cloud >>>> > Widest out-of-the-box monitoring support with 50+ applications >>>> > Performance metrics, stats and reports that give you Actionable >>>> > Insights >>>> > Deep dive visibility with transaction tracing using APM Insight. >>>> > http://ad.doubleclick.net/ddm/clk/290420510;117567292;y >>>> > _______________________________________________ >>>> > Kaldi-users mailing list >>>> > Kal...@li... >>>> > https://lists.sourceforge.net/lists/listinfo/kaldi-users >>>> > >>>> > >>>> >>>> >>>> ------------------------------------------------------------------------------ >>>> One dashboard for servers and applications across Physical-Virtual-Cloud >>>> Widest out-of-the-box monitoring support with 50+ applications >>>> Performance metrics, stats and reports that give you Actionable Insights >>>> Deep dive visibility with transaction tracing using APM Insight. >>>> http://ad.doubleclick.net/ddm/clk/290420510;117567292;y >>>> _______________________________________________ >>>> Kaldi-users mailing list >>>> Kal...@li... >>>> https://lists.sourceforge.net/lists/listinfo/kaldi-users >>>> >>> >>> >>> >>> -- >>> Ondřej Plátek, +420 737 758 650, skype:ondrejplatek, >>> ond...@gm... >>> >>> >>> ------------------------------------------------------------------------------ >>> >>> _______________________________________________ >>> Kaldi-users mailing list >>> Kal...@li... >>> https://lists.sourceforge.net/lists/listinfo/kaldi-users >>> >>> >> > > > -- > Ondřej Plátek, +420 737 758 650, skype:ondrejplatek, > ond...@gm... > |
From: Ondrej P. <ond...@gm...> - 2015-06-17 20:25:36
|
We currently use the script below for creating an ARPA LM based on a class-based (CB) LM, mixed with LMs built from out-of-domain and in-domain data which are not class-based. Given the ARPA file we convert it with this script:
https://github.com/UFAL-DSG/alex/blob/master/alex/applications/PublicTransportInfoCS/lm/build.py

Note that this CB-model estimation has several drawbacks. The biggest one is that we do not compute bigram (or higher n-gram) estimates when there are two classes in one bigram, e.g. "I want connection CITY CITY". I am working on improving this as a side project. Another important problem is that we need to expand the LM with instances of the classes, which significantly increases the size of the lexicon and also the number of higher-order n-grams in the LM. I was not sure whether you want to do it this way in Kaldi or whether you want to do it on the FST level.

PS: I attached the Czech class file classes.txt.zip
<https://drive.google.com/file/d/0B_cd-iN3UhaVOFpIWGlic0F5cUU/edit?usp=drive_web>

On Wed, Jun 17, 2015 at 10:11 PM, Sandeep Reddy <san...@go...> wrote:
> Does the kaldi recipe do Class LM? Or can you add it to the recipe? That
> would make the whole process so much easier. I don't mind if the words
> are Czech.
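A minimal, illustrative sketch of the class-expansion step described at the top of this message (this is not the actual build.py from the Alex repository; the class names, their members, and the brute-force enumeration of all combinations are assumptions made only for illustration):

```python
# Sketch of expanding class tokens into member words for word-level LM training.
# NOT the Alex build.py; classes and members below are made up for illustration.
import itertools

CLASSES = {
    "CITY": ["boston", "prague", "seattle"],
    "FOOD_TYPE": ["indian", "chinese", "thai"],
}

def expand_sentence(tokens):
    """Yield every word-level variant of a sentence containing class tokens."""
    slots = [CLASSES.get(t, [t]) for t in tokens]   # non-class tokens stay as-is
    for combo in itertools.product(*slots):
        yield " ".join(combo)

if __name__ == "__main__":
    for variant in expand_sentence("i want connection from CITY to CITY".split()):
        print(variant)
```

With only three members per class, the two CITY slots in this toy sentence already expand to nine variants, which is exactly the lexicon and higher-order n-gram blow-up mentioned above.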
--
Ondřej Plátek, +420 737 758 650, skype:ondrejplatek, ond...@gm...
|
From: Ondrej P. <ond...@gm...> - 2015-06-17 20:08:21
|
For the Czech data we are running the system live with Kaldi and we use a class LM. For the English data I will give you a few examples off the top of my head:

PRICE_RANGE - cheap, middle price-range, ...
FOOD_TYPE - Indian, Chinese, ...
LOCATION - city center, Chesterton area, ...

We will try to find the class definitions, since we are not running that system.

Ondrej

On Wed, Jun 17, 2015 at 10:01 PM, Sandeep Reddy <san...@go...> wrote:
> Ondrej,
> I'll run the Vystadial recipe and see what opportunities are there. Did
> somebody already make a class LM on it or at least define what the
> potential classes are? I hadn't looked into it earlier.
> Thanks
> Nagendra
>
> On Wed, Jun 17, 2015 at 3:42 AM, Ondrej Platek <ond...@gm...> wrote:
>> Dear all,
>>
>> thanks to the reminder from Dimitris, I realized that the Vystadial
>> dataset is very convenient for class-based LM / LM grafting. As the
>> scripts for Vystadial CS & EN are already in Kaldi, it may be a
>> convenient starting dataset, because it contains transcriptions of user
>> utterances from communication with a spoken dialogue system where we
>> have the classes defined.
>>
>> See scripts:
>> https://github.com/kaldi-asr/kaldi/tree/master/egs/vystadial_en
>> https://github.com/kaldi-asr/kaldi/tree/master/egs/vystadial_cz
>>
>> See data (scroll to the bottom to download the datasets):
>> http://hdl.handle.net/11858/00-097C-0000-0023-4671-4 (en)
>> http://hdl.handle.net/11858/00-097C-0000-0023-4670-6 (cs)
>>
>> We can probably recreate / find the list of words in the classes for
>> English if there is interest. For Czech this should be no problem at all.
>>
>> Please let me know if you are interested in these datasets and the
>> lists of classes and their members.
>>
>> Ondra
>>
>> PS: Currently we use a class-based (CB) LM which we later expand to a
>> full LM in ARPA format and then create G.fst as in the standard use
>> case. It is not an optimal approach, but it works for us. If you want to
>> know how we are modeling the CB LM just let me know; I am working on a
>> slight improvement of it right now, so I am interested in improving it.
>>
>> On Tue, May 26, 2015 at 8:11 PM, Kirill Katsnelson
>> <kir...@sm...> wrote:
>>> Speaking about data set preprocessing only, will the Stanford NLP POS
>>> tagger pull the trick?
>>>
>>> -kkm
>>>
>>> > -----Original Message-----
>>> > From: Nagendra Goel [mailto:nag...@go...]
>>> > Sent: 2015-05-24 1511
>>> > Subject: Re: [Kaldi-users] LM grafting
>>> >
>>> > A systematic way of identifying special elements in text would be
>>> > very useful. Currently NSW-EXPAND from Festival conflicts with this
>>> > sub-grammar approach, although otherwise it's a good LM
>>> > pre-processing step.
>>> >
>>> > Nagendra Kumar Goel
>>> >
>>> > On May 24, 2015 4:45 PM, "Matthew Aylett" <mat...@gm...> wrote:
>>> >
>>> > Not sure if this is relevant to this thread, but in the speech
>>> > synthesis system branch we have a very early text normaliser which
>>> > (when complete) will detect things like phone numbers, addresses,
>>> > currencies etc. The output from this could then be used to inform
>>> > language model building. Currently it deals with symbols and
>>> > tokenisations in English.
>>> >
>>> > Potentially (although I wasn't currently planning on this), the text
>>> > normaliser could be written in Thrax (based on OpenFst, authored by
>>> > Richard Sproat, I believe).
>>> > However, if this approach would benefit ASR as well, then it might
>>> > be worth doing it this way rather than my plan of a simple greedy
>>> > normaliser.
>>> >
>>> > v best
>>> >
>>> > Matthew Aylett
>>> >
>>> > On Sun, May 24, 2015 at 8:34 AM, Dimitris Vassos
>>> > <dva...@gm...> wrote:
>>> >
>>> > We have access to several corpora and we are trying to put together
>>> > something appropriate. In the next couple of days, we will also
>>> > volunteer a server to set it all up and run the tests.
>>> >
>>> > Dimitris
>>> >
>>> > > On 24 May 2015, at 02:06, Daniel Povey <dp...@gm...> wrote:
>>> > >
>>> > > One possibility is to use a completely open-source setup, e.g.
>>> > > Voxforge, and forget about the "has a clear advantage" requirement.
>>> > > E.g. target anything that looks like a year, and make a grammar
>>> > > for years.
>>> > > Dan
>>> > >
>>> > >> On Fri, May 22, 2015 at 6:32 AM, Nagendra Goel
>>> > >> <nag...@go...> wrote:
>>> > >> Since I cannot volunteer my environment, do you recommend another
>>> > >> environment where this can be prototyped and where you can check
>>> > >> in some class LM recipe that has an advantage?
>>> > >>
>>> > >> Nagendra Kumar Goel
>>> > >>
>>> > >>> On May 21, 2015 11:01 PM, "Dimitris Vassos"
>>> > >>> <dva...@gm...> wrote:
>>> > >>>
>>> > >>> +1 for the class-based LMs. I have also been interested in this
>>> > >>> functionality for some time now, so I will be more than happy to
>>> > >>> try out the current implementation, if possible.
>>> > >>>
>>> > >>> Thanks
>>> > >>> Dimitris
>>> > >>>
>>> > >>>> On 22 May 2015, at 01:34, kal...@li... wrote
>>> > >>>> (Kaldi-users Digest, Vol 29, Issue 15):
>>> > >>>>
>>> > >>>> Message: 1
>>> > >>>> Date: Thu, 21 May 2015 15:04:04 -0400
>>> > >>>> From: Daniel Povey <dp...@gm...>
>>> > >>>> Subject: Re: [Kaldi-users] LM grafting
>>> > >>>>
>>> > >>>> The general approach is to create an FST for the little
>>> > >>>> language model, and then to use fstreplace to replace instances
>>> > >>>> of a particular symbol in the top-level language model with
>>> > >>>> that FST. The tricky part is ensuring that the result is
>>> > >>>> determinizable after composing with the lexicon. In general our
>>> > >>>> solution is to add special disambiguation symbols at the
>>> > >>>> beginning and end of each of the sub-FSTs, and of course making
>>> > >>>> sure that the sub-FSTs are themselves determinizable.
>>> > >>>> Dan
>>> > >>>>
>>> > >>>>> On Thu, May 21, 2015 at 3:01 PM, Sean True
>>> > >>>>> <se...@se...> wrote:
>>> > >>>>> That's a subject of some general interest. Is there a
>>> > >>>>> discussion of the general approach that was taken somewhere?
>>> > >>>>>
>>> > >>>>> -- Sean True, Semantic Machines
>>> > >>>>>
>>> > >>>>>> On Thu, May 21, 2015 at 2:14 PM, Daniel Povey
>>> > >>>>>> <dp...@gm...> wrote:
>>> > >>>>>> Nagendra Goel has worked on some example scripts for this
>>> > >>>>>> type of thing, and with Hainan we were working on trying to
>>> > >>>>>> get it cleaned up and checked in, but he's going for an
>>> > >>>>>> internship so it will have to wait. But Nagendra might be
>>> > >>>>>> willing to share it with you.
>>> > >>>>>> Dan
>>> > >>>>>>
>>> > >>>>>>> On Thu, May 21, 2015 at 2:10 PM, Kirill Katsnelson
>>> > >>>>>>> <kir...@sm...> wrote:
>>> > >>>>>>>> Suppose I have a language model where one token (a "word")
>>> > >>>>>>>> is a pointer to a whole another LM. This is a practical
>>> > >>>>>>>> case when you expect an abrupt change in model, a clear
>>> > >>>>>>>> example being "my phone number is..." and then you'd expect
>>> > >>>>>>>> them rattling a string of digits. Is there any support in
>>> > >>>>>>>> kaldi for this?
>>> > >>>>>>>>
>>> > >>>>>>>> Thanks,
>>> > >>>>>>>> -kkm
>>> > >>>>
>>> > >>>> Message: 2
>>> > >>>> Date: Thu, 21 May 2015 19:24:38 +0000
>>> > >>>> From: Kirill Katsnelson <kir...@sm...>
>>> > >>>> Subject: Re: [Kaldi-users] LM grafting
>>> > >>>>
>>> > >>>> Also, from the practical standpoint, backoff/discounting
>>> > >>>> weights usually need to be massaged. Otherwise, when the
>>> > >>>> grafted LM is small and the main LM is large, the little model
>>> > >>>> will tend to shoehorn an utterance into itself rather than let
>>> > >>>> go of it. In my phone number example, everything becomes digits
>>> > >>>> once the phone number starts.
>>> > >>>>
>>> > >>>> -kkm
>>> > >>>>
>>> > >>>> Message: 3
>>> > >>>> Date: Thu, 21 May 2015 15:29:54 -0400
>>> > >>>> From: Hainan Xu <hai...@gm...>
>>> > >>>> Subject: Re: [Kaldi-users] LM grafting
>>> > >>>>
>>> > >>>> There is a paper in ICASSP 2015 that described some very
>>> > >>>> similar idea: "Improved recognition of contact names in voice
>>> > >>>> commands".
>>> > >>>>
>>> > >>>> -- Hainan
>>> > >>>> >>> > >>>> ------------------------------ >>> > >>>> >>> > >>>> >>> > >>>> >>> > ----------------------------------------------------------------------- >>> > - >>> > ------ >>> > >>>> One dashboard for servers and applications across >>> > Physical-Virtual-Cloud >>> > >>>> Widest out-of-the-box monitoring support with 50+ >>> > applications >>> > >>>> Performance metrics, stats and reports that give you >>> > Actionable Insights >>> > >>>> Deep dive visibility with transaction tracing using >>> > APM Insight. >>> > >>>> >>> > http://ad.doubleclick.net/ddm/clk/290420510;117567292;y >>> > >>>> >>> > >>>> ------------------------------ >>> > >>>> >>> > >>>> _______________________________________________ >>> > >>>> Kaldi-users mailing list >>> > >>>> Kal...@li... >>> > >>>> >>> > https://lists.sourceforge.net/lists/listinfo/kaldi-users >>> > >>>> >>> > >>>> >>> > >>>> End of Kaldi-users Digest, Vol 29, Issue 15 >>> > >>>> ******************************************* >>> > >>> >>> > >>> >>> > >>> >>> > ----------------------------------------------------------------------- >>> > - >>> > ------ >>> > >>> One dashboard for servers and applications across >>> > Physical-Virtual-Cloud >>> > >>> Widest out-of-the-box monitoring support with 50+ >>> > applications >>> > >>> Performance metrics, stats and reports that give you >>> > Actionable Insights >>> > >>> Deep dive visibility with transaction tracing using >>> APM >>> > Insight. >>> > >>> >>> > http://ad.doubleclick.net/ddm/clk/290420510;117567292;y >>> > >>> _______________________________________________ >>> > >>> Kaldi-users mailing list >>> > >>> Kal...@li... >>> > >>> >>> > https://lists.sourceforge.net/lists/listinfo/kaldi-users >>> > >> >>> > >> >>> > >> >>> > ----------------------------------------------------------------------- >>> > - >>> > ------ >>> > >> One dashboard for servers and applications across >>> > Physical-Virtual-Cloud >>> > >> Widest out-of-the-box monitoring support with 50+ >>> > applications >>> > >> Performance metrics, stats and reports that give you >>> > Actionable Insights >>> > >> Deep dive visibility with transaction tracing using >>> APM >>> > Insight. >>> > >> >>> > http://ad.doubleclick.net/ddm/clk/290420510;117567292;y >>> > >> _______________________________________________ >>> > >> Kaldi-users mailing list >>> > >> Kal...@li... >>> > >> >>> > https://lists.sourceforge.net/lists/listinfo/kaldi-users >>> > >> >>> > >>> > >>> > ----------------------------------------------------------------------- >>> > - >>> > ------ >>> > One dashboard for servers and applications across >>> Physical- >>> > Virtual-Cloud >>> > Widest out-of-the-box monitoring support with 50+ >>> > applications >>> > Performance metrics, stats and reports that give you >>> > Actionable Insights >>> > Deep dive visibility with transaction tracing using APM >>> > Insight. >>> > http://ad.doubleclick.net/ddm/clk/290420510;117567292;y >>> > _______________________________________________ >>> > Kaldi-users mailing list >>> > Kal...@li... 
>>> > https://lists.sourceforge.net/lists/listinfo/kaldi-users >>> > >>> > >>> > >>> > >>> > >>> > ----------------------------------------------------------------------- >>> > - >>> > ------ >>> > One dashboard for servers and applications across Physical- >>> > Virtual-Cloud >>> > Widest out-of-the-box monitoring support with 50+ applications >>> > Performance metrics, stats and reports that give you Actionable >>> > Insights >>> > Deep dive visibility with transaction tracing using APM Insight. >>> > http://ad.doubleclick.net/ddm/clk/290420510;117567292;y >>> > _______________________________________________ >>> > Kaldi-users mailing list >>> > Kal...@li... >>> > https://lists.sourceforge.net/lists/listinfo/kaldi-users >>> > >>> > >>> >>> >>> ------------------------------------------------------------------------------ >>> One dashboard for servers and applications across Physical-Virtual-Cloud >>> Widest out-of-the-box monitoring support with 50+ applications >>> Performance metrics, stats and reports that give you Actionable Insights >>> Deep dive visibility with transaction tracing using APM Insight. >>> http://ad.doubleclick.net/ddm/clk/290420510;117567292;y >>> _______________________________________________ >>> Kaldi-users mailing list >>> Kal...@li... >>> https://lists.sourceforge.net/lists/listinfo/kaldi-users >>> >> >> >> >> -- >> Ondřej Plátek, +420 737 758 650, skype:ondrejplatek, >> ond...@gm... >> >> >> ------------------------------------------------------------------------------ >> >> _______________________________________________ >> Kaldi-users mailing list >> Kal...@li... >> https://lists.sourceforge.net/lists/listinfo/kaldi-users >> >> > -- Ondřej Plátek, +420 737 758 650, skype:ondrejplatek, ond...@gm... |
From: Kirill K. <kir...@sm...> - 2015-06-17 17:53:33
|
> From: David Warde-Farley [mailto:d.w...@gm...] > Sent: 2015-06-17 0028 > Subject: Re: [Kaldi-users] non-cluster usage of Librispeech s5 recipe? > > Many thanks for the pointers. On your setup, how long does the entire > recipe take without decoding? A few hours to train the tri5 model (10 to 15 hours I guess, on a 6-core CPU), then maybe 4-5 days to train the nnet2 on the 460 hour data set on the 980 GPU board. I did not go any further than that. Guess it would take at least twice that time to process the 1000 hour set. > For the life of me I can't figure out where num_jobs_nnet is being set > (it's being written in the egs_dir as 4, I've changed it everywhere I > could find it.) I did not have to change anything in this regard, except for the number of jobs argument to train_multisplice_accel2 in run_nnet2_ms.sh. What file the number of jobs was saved into? Some steps rely on the number of jobs in previous steps. Sometimes the number of jobs sticks in the file which is not recreated. It may be easier to start clean. Do you run the discriminative training script (run_nnet2_ms_disc.sh)? I did not. -kkm > On Fri, Jun 12, 2015 at 7:00 PM, Kirill Katsnelson > <kir...@sm...> wrote: > >> From: David Warde-Farley [mailto:d.w...@gm...] > >> Subject: [Kaldi-users] non-cluster usage of Librispeech s5 recipe? > >> > >> I'm trying to > >> use the s5 recipe for LibriSpeech on a single machine with a single > >> GPU. I've modified cmd.sh to use run.pl. > > > > I ran it on a single machine, it requires a few modifications. Note > that it took almost a week on a 6-core 4.1GHz overclocked i7-5930K CPU > and GeForce 980 to train on the 500 hour set. > > > >> After about a day, I see a lot of background processes like > >> gmm-latgen- faster, lattice-add-penalty, lattice-scale, etc. that > >> have been launched in the background (the terminal is actually free, > >> which suggests the run.sh script has terminated...). I'm not totally > >> sure what's going on, or how to find out. > > > > In librispeech/s5/run.sh, look for decode commands in subshells, like > > > > ( > > utils/mkgraph.sh data/lang_nosp_test_tgsmall \ > > exp/tri4b exp/tri4b/graph_nosp_tgsmall || exit 1; > > for test in test_clean test_other dev_clean dev_other; do > > steps/decode_fmllr.sh --nj 20 --cmd "$decode_cmd" \ > > . . . > > )& > > > > These decodes are quite slow, if you run them on your machine. They > are slower than other part of the script. In the end, they are > accumulating, eating CPU and blowing up out of memory. They are not > essential for NN training, except possibly for the mkgraph script. The > results are useful to check if you are getting expected WER, but really > not essential. You may either disable these decode blocks completely > (except mkgraph invocations) or remove the '&' at the end to run them > synchronously. NB they will take the most preparation time prior to NN > training step. Dunno about your machine but give it an extra couple > days to complete with these. > > > >> One thing I noticed earlier is that the script was trying to spawn > >> multiple GPU jobs, but this GPU is configured (by administrators) to > >> permit at most one CUDA process, and so I saw "3 of 4 jobs failed" > >> messages. Would these jobs have been retried? > > > > They will not, but you can restart NN training from the last step. > Modify local/online/run_nnet2_ms.sh so that > steps/nnet2/train_multisplice_accel2.sh is invoked with switches "-- > num-jobs-initial 1 --num-jobs-final 1" (the defaults are larger). 
When > running local/online/run_nnet2_ms.sh, pass it "--stage 7" (this is the > default) and "--train_stage N", where N is the iteration you are > restarting from. > > > > Even without the 1-job limit, you probably won't benefit from running > more than 1 at a time. > > > > -kkm |
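To make the restart procedure above concrete, here is a minimal sketch. The experiment directory name (exp/nnet2_online/nnet_ms_a) and the <iteration>.mdl naming are assumptions about a typical online-nnet2 setup, not something stated in the thread, so check what your copy of local/online/run_nnet2_ms.sh actually writes; the --stage and --train_stage switches are the ones described above.

  cd egs/librispeech/s5

  # 1. Edit local/online/run_nnet2_ms.sh so that
  #    steps/nnet2/train_multisplice_accel2.sh is invoked with
  #    --num-jobs-initial 1 --num-jobs-final 1 (a single CUDA job at a time).

  # 2. Find the last training iteration that completed (path is hypothetical):
  dir=exp/nnet2_online/nnet_ms_a
  n=$(ls $dir/[0-9]*.mdl 2>/dev/null | sed 's|.*/||; s|\.mdl$||' | sort -n | tail -n1)

  # 3. Resume: --stage 7 skips the already-finished preparation stages, and
  #    --train_stage resumes neural-net training from iteration $n.
  #    (If no iteration model exists yet, run without --train_stage.)
  local/online/run_nnet2_ms.sh --stage 7 --train_stage "$n"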
From: Ondrej P. <ond...@gm...> - 2015-06-17 07:42:28
|
Dear all, thanks to reminder of Dimitris, I realized that the Vystadial dataset is very convenient for Class based LM/ LM grafting. As the scripts for Vystadial Cs & En are already in Kaldi it may be convenient starting data because they contain transcription of user utterances from communication with spoken dialogue system where we have the classes defined. See scritps: https://github.com/kaldi-asr/kaldi/tree/master/egs/vystadial_en https://github.com/kaldi-asr/kaldi/tree/master/egs/vystadial_cz See data (scroll to the bottom to download the datasets): http://hdl.handle.net/11858/00-097C-0000-0023-4671-4 (en) http://hdl.handle.net/11858/00-097C-0000-0023-4670-6 (cs) We can probably recreate / find the list of words in the classes for English if there is interest. For Czech this should be no problem at all. Please, let me know if you are interested in these datasets and the lists of classes and their members. Ondra PS: Currently, we used classed based (CB) LM which we later expand to full LM in arpa format than create G.fst as in standard use case. It is not optimal attitude but it works for us. If you want to know how we are modeling the CBLM just let me know, I am working on slight improvement of it right now, so I am interested in improving it. On Tue, May 26, 2015 at 8:11 PM, Kirill Katsnelson < kir...@sm...> wrote: > Speaking about data set preprocessing only, will Stanford NLP POS tagger > pull the trick? > > -kkm > > > -----Original Message----- > > From: Nagendra Goel [mailto:nag...@go...] > > Sent: 2015-05-24 1511 > > To: Matthew Aylett > > Cc: Dimitris Vassos; kal...@li... > > Subject: Re: [Kaldi-users] LM grafting > > > > A systematic way for identifying special elements in text will be very > > useful. Currently NSW-EXPAND from festival conflicts with this sub- > > grammar approach although otherwise it's a good lm pre-processing step. > > > > Nagendra Kumar Goel > > > > On May 24, 2015 4:45 PM, "Matthew Aylett" <mat...@gm...> > > wrote: > > > > > > Not sure if this is relevant to this thread. But in the speech > > synthesis system branch we have a very early text normaliser which > > (when > > complete) will detect things like phone numbers addresses, currencies > > etc. The output form this could then be used to inform language model > > building. Currently it deals with symbols and tokenisations in English. > > > > Potentially `(although I wasn't currently planning on this), the > > text normaliser could be written in thrax - based on openfst - authored > > by Richard Sproat I believe). However if this approach would benefit > > ASR as well then it might be worth doing it this way rather than my > > plan of a simple greedy normaliser. > > > > > > v best > > > > Matthew Aylett > > > > > > On Sun, May 24, 2015 at 8:34 AM, Dimitris Vassos > > <dva...@gm...> wrote: > > > > > > We have access to several corpora and we are trying to put > > together something appropriate. > > > > In the next couple of days, we will also volunteer a server > > to set it all up and run the tests. > > > > Dimitris > > > > > On 24 Μαΐ 2015, at 02:06, Daniel Povey <dp...@gm... > > > > wrote: > > > > > > One possibility is to use a completely open-source setup, > > e.g. > > > Voxforge, and forget about the "has a clear advantage" > > requirement. > > > E.g. target anything that looks like a year, and make a > > grammar for > > > years. 
> > > Dan > > > > > > > > > On Fri, May 22, 2015 at 6:32 AM, Nagendra Goel > > > <nag...@go...> wrote: > > >> Since I cannot volunteer my enviornment, do you > > recommend another > > >> enviornment where this can be prototyped and where you > > can check in some > > >> class lm recipe that has advantage. > > >> > > >> Nagendra > > >> > > >> Nagendra Kumar Goel > > >> > > >>> On May 21, 2015 11:01 PM, "Dimitris Vassos" > > <dva...@gm...> wrote: > > >>> > > >>> +1 for the class-based LMs. I have also been interested > > in this > > >>> functionality for some time now, so will be more than > > happy to try out the > > >>> current implementation, if possible. > > >>> > > >>> Thanks > > >>> Dimitris > > >>> > > >>>> On 22 Μαΐ 2015, at 01:34, > > kal...@li... > > >>>> wrote: > > >>>> > > >>>> Send Kaldi-users mailing list submissions to > > >>>> kal...@li... > > >>>> > > >>>> To subscribe or unsubscribe via the World Wide Web, > > visit > > >>>> > > https://lists.sourceforge.net/lists/listinfo/kaldi-users > > >>>> or, via email, send a message with subject or body > > 'help' to > > >>>> kal...@li... > > >>>> > > >>>> You can reach the person managing the list at > > >>>> kal...@li... > > >>>> > > >>>> When replying, please edit your Subject line so it is > > more specific > > >>>> than "Re: Contents of Kaldi-users digest..." > > >>>> > > >>>> > > >>>> Today's Topics: > > >>>> > > >>>> 1. Re: LM grafting (Daniel Povey) > > >>>> 2. Re: LM grafting (Kirill Katsnelson) > > >>>> 3. Re: LM grafting (Hainan Xu) > > >>>> 4. Re: LM grafting (Sean True) > > >>>> > > >>>> > > >>>> > > ---------------------------------------------------------------------- > > >>>> > > >>>> Message: 1 > > >>>> Date: Thu, 21 May 2015 15:04:04 -0400 > > >>>> From: Daniel Povey <dp...@gm...> > > >>>> Subject: Re: [Kaldi-users] LM grafting > > >>>> To: Sean True <se...@se...> > > >>>> Cc: Hainan Xu <hai...@gm...>, > > >>>> "kal...@li..." > > >>>> <kal...@li...>, Kirill > > Katsnelson > > >>>> <kir...@sm...> > > >>>> Message-ID: > > >>>> > > <CAEWAuySHaXwdNJZAoL6CanzHth=k4Y...@ma... > > <mailto:k4YJVsBiAfEuFDFMvY%2B...@ma...> > > > >>>> Content-Type: text/plain; charset=UTF-8 > > >>>> > > >>>> The general approach is to create an FST for the > > little language > > >>>> model, and then to use fstreplace to replace instances > > of a particular > > >>>> symbol in the top-level language model, with that FST. > > >>>> The tricky part is ensuring that the result is > > determinizable after > > >>>> composing with the lexicon. In general our solution > > is to add special > > >>>> disambiguation symbols at the beginning and end of > > each of the > > >>>> sub-FSTs, and of course making sure that the sub-FSTs > > are themselves > > >>>> determinizable. > > >>>> Dan > > >>>> > > >>>> > > >>>>> On Thu, May 21, 2015 at 3:01 PM, Sean True > > <se...@se...> > > >>>>> wrote: > > >>>>> That's a subject of some general interest. Is there a > > discussion of the > > >>>>> general approach that was taken somewhere? > > >>>>> > > >>>>> -- Sean > > >>>>> > > >>>>> Sean True > > >>>>> Semantic Machines > > >>>>> > > >>>>>> On Thu, May 21, 2015 at 2:14 PM, Daniel Povey > > <dp...@gm...> > > >>>>>> wrote: > > >>>>>> > > >>>>>> Nagendra Goel has worked on some example scripts for > > this type of > > >>>>>> thing, and with Hainan we were working on trying to > > get it cleaned up > > >>>>>> and checked in, but he's going for an internship so > > it will have to > > >>>>>> wait. But Nagendra might be willing to share it > > with you. 
> > >>>>>> Dan > > >>>>>> > > >>>>>> > > >>>>>> On Thu, May 21, 2015 at 2:10 PM, Kirill Katsnelson > > >>>>>> <kir...@sm...> wrote: > > >>>>>>> Suppose I have a language model where one token (a > > "word") is a > > >>>>>>> pointer > > >>>>>>> to a whole another LM. This is a practical case > > when you expect an > > >>>>>>> abrupt > > >>>>>>> change in model, a clear example being "my phone > > number is..." and > > >>>>>>> then > > >>>>>>> you'd expect them rattling a string of digits. > > Is there any support > > >>>>>>> in kaldi > > >>>>>>> for this? > > >>>>>>> > > >>>>>>> Thanks, > > >>>>>>> > > >>>>>>> -kkm > > >>>>>>> > > >>>>>>> > > >>>>>>> > > ----------------------------------------------------------------------- > > - > > ------ > > >>>>>>> One dashboard for servers and applications across > > >>>>>>> Physical-Virtual-Cloud > > >>>>>>> Widest out-of-the-box monitoring support with > > 50+ applications > > >>>>>>> Performance metrics, stats and reports that give > > you Actionable > > >>>>>>> Insights > > >>>>>>> Deep dive visibility with transaction tracing using > > APM Insight. > > >>>>>>> > > http://ad.doubleclick.net/ddm/clk/290420510;117567292;y > > >>>>>>> _______________________________________________ > > >>>>>>> Kaldi-users mailing list > > >>>>>>> Kal...@li... > > >>>>>>> > > https://lists.sourceforge.net/lists/listinfo/kaldi-users > > >>>>>> > > >>>>>> > > >>>>>> > > >>>>>> > > ----------------------------------------------------------------------- > > - > > ------ > > >>>>>> One dashboard for servers and applications across > > >>>>>> Physical-Virtual-Cloud > > >>>>>> Widest out-of-the-box monitoring support with 50+ > > applications > > >>>>>> Performance metrics, stats and reports that give you > > Actionable > > >>>>>> Insights > > >>>>>> Deep dive visibility with transaction tracing using > > APM Insight. > > >>>>>> > > http://ad.doubleclick.net/ddm/clk/290420510;117567292;y > > >>>>>> _______________________________________________ > > >>>>>> Kaldi-users mailing list > > >>>>>> Kal...@li... > > >>>>>> > > https://lists.sourceforge.net/lists/listinfo/kaldi-users > > >>>> > > >>>> > > >>>> > > >>>> ------------------------------ > > >>>> > > >>>> Message: 2 > > >>>> Date: Thu, 21 May 2015 19:24:38 +0000 > > >>>> From: Kirill Katsnelson > > <kir...@sm...> > > >>>> Subject: Re: [Kaldi-users] LM grafting > > >>>> To: "dp...@gm..." <dp...@gm...>, Sean True > > >>>> <se...@se...> > > >>>> Cc: Hainan Xu <hai...@gm...>, > > >>>> "kal...@li..." > > >>>> <kal...@li...> > > >>>> Message-ID: > > >>>> > > >>>> > > <CY1...@CY...d.out > > l > > ook.com> > > >>>> > > >>>> Content-Type: text/plain; charset="utf-8" > > >>>> > > >>>> Also, from the practical standpoint, > > backoff/discounting weights usually > > >>>> need to be massaged. Otherwise when the grafted LM is > > small and the main LM > > >>>> is large, the little model will tend to shoehorn an > > utterance into itself > > >>>> rather than let go of it. In my phone number example, > > everything becomes > > >>>> digits once the phone number starts. > > >>>> > > >>>> -kkm > > >>>> > > >>>>> -----Original Message----- > > >>>>> From: Daniel Povey [mailto:dp...@gm...] > > >>>>> Sent: 2015-05-21 1204 > > >>>>> To: Sean True > > >>>>> Cc: Kirill Katsnelson; Nagendra Goel; Hainan Xu; > > kaldi- > > >>>>> us...@li... 
> > >>>>> Subject: Re: [Kaldi-users] LM grafting > > >>>>> > > >>>>> The general approach is to create an FST for the > > little language model, > > >>>>> and then to use fstreplace to replace instances of a > > particular symbol > > >>>>> in the top-level language model, with that FST. > > >>>>> The tricky part is ensuring that the result is > > determinizable after > > >>>>> composing with the lexicon. In general our solution > > is to add special > > >>>>> disambiguation symbols at the beginning and end of > > each of the sub- > > >>>>> FSTs, and of course making sure that the sub-FSTs are > > themselves > > >>>>> determinizable. > > >>>>> Dan > > >>>>> > > >>>>> > > >>>>> On Thu, May 21, 2015 at 3:01 PM, Sean True > > <se...@se...> > > >>>>> wrote: > > >>>>>> That's a subject of some general interest. Is there > > a discussion of > > >>>>>> the general approach that was taken somewhere? > > >>>>>> > > >>>>>> -- Sean > > >>>>>> > > >>>>>> Sean True > > >>>>>> Semantic Machines > > >>>>>> > > >>>>>> On Thu, May 21, 2015 at 2:14 PM, Daniel Povey > > <dp...@gm...> > > >>>>> wrote: > > >>>>>>> > > >>>>>>> Nagendra Goel has worked on some example scripts > > for this type of > > >>>>>>> thing, and with Hainan we were working on trying to > > get it cleaned > > >>>>> up > > >>>>>>> and checked in, but he's going for an internship so > > it will have to > > >>>>>>> wait. But Nagendra might be willing to share it > > with you. > > >>>>>>> Dan > > >>>>>>> > > >>>>>>> > > >>>>>>> On Thu, May 21, 2015 at 2:10 PM, Kirill Katsnelson > > >>>>>>> <kir...@sm...> wrote: > > >>>>>>>> Suppose I have a language model where one token (a > > "word") is a > > >>>>>>>> pointer to a whole another LM. This is a practical > > case when you > > >>>>>>>> expect an abrupt change in model, a clear example > > being "my phone > > >>>>>>>> number is..." and then you'd expect them rattling > > a string of > > >>>>>>>> digits. Is there any support in kaldi for this? > > >>>>>>>> > > >>>>>>>> Thanks, > > >>>>>>>> > > >>>>>>>> -kkm > > >>>>>>>> > > >>>>>>>> > > ------------------------------------------------------------------ > > >>>>> - > > >>>>>>>> ----------- One dashboard for servers and > > applications across > > >>>>>>>> Physical-Virtual-Cloud Widest out-of-the-box > > monitoring support > > >>>>>>>> with 50+ applications Performance metrics, stats > > and reports that > > >>>>>>>> give you Actionable Insights Deep dive visibility > > with transaction > > >>>>>>>> tracing using APM Insight. > > >>>>>>>> > > http://ad.doubleclick.net/ddm/clk/290420510;117567292;y > > >>>>>>>> _______________________________________________ > > >>>>>>>> Kaldi-users mailing list > > >>>>>>>> Kal...@li... > > >>>>>>>> > > https://lists.sourceforge.net/lists/listinfo/kaldi-users > > >>>>>>> > > >>>>>>> > > >>>>>>> > > -------------------------------------------------------------------- > > >>>>> - > > >>>>>>> --------- One dashboard for servers and > > applications across > > >>>>>>> Physical-Virtual-Cloud Widest out-of-the-box > > monitoring support with > > >>>>>>> 50+ applications Performance metrics, stats and > > reports that give > > >>>>> you > > >>>>>>> Actionable Insights Deep dive visibility with > > transaction tracing > > >>>>>>> using APM Insight. > > >>>>>>> > > http://ad.doubleclick.net/ddm/clk/290420510;117567292;y > > >>>>>>> _______________________________________________ > > >>>>>>> Kaldi-users mailing list > > >>>>>>> Kal...@li... 
> > >>>>>>> > > https://lists.sourceforge.net/lists/listinfo/kaldi-users > > >>>> > > >>>> ------------------------------ > > >>>> > > >>>> Message: 3 > > >>>> Date: Thu, 21 May 2015 15:29:54 -0400 > > >>>> From: Hainan Xu <hai...@gm...> > > >>>> Subject: Re: [Kaldi-users] LM grafting > > >>>> To: Daniel Povey <dp...@gm...> > > >>>> Cc: Sean True <se...@se...>, > > >>>> "kal...@li..." > > >>>> <kal...@li...>, Kirill > > Katsnelson > > >>>> <kir...@sm...> > > >>>> Message-ID: > > >>>> > > <CALP+BDZvJP-2cZ+fEJEXaMaVWzgy63mtc=J1E...@ma...> > > >>>> Content-Type: text/plain; charset="utf-8" > > >>>> > > >>>> There is a paper in ICASSP 2015 that described some > > very similar idea: > > >>>> > > >>>> Improved recognition of contact names in voice > > commands > > >>>> > > >>>>> On Thu, May 21, 2015 at 3:04 PM, Daniel Povey > > <dp...@gm...> wrote: > > >>>>> > > >>>>> The general approach is to create an FST for the > > little language > > >>>>> model, and then to use fstreplace to replace > > instances of a particular > > >>>>> symbol in the top-level language model, with that > > FST. > > >>>>> The tricky part is ensuring that the result is > > determinizable after > > >>>>> composing with the lexicon. In general our solution > > is to add special > > >>>>> disambiguation symbols at the beginning and end of > > each of the > > >>>>> sub-FSTs, and of course making sure that the sub-FSTs > > are themselves > > >>>>> determinizable. > > >>>>> Dan > > >>>>> > > >>>>> > > >>>>> On Thu, May 21, 2015 at 3:01 PM, Sean True > > <se...@se...> > > >>>>> wrote: > > >>>>>> That's a subject of some general interest. Is there > > a discussion of > > >>>>>> the > > >>>>>> general approach that was taken somewhere? > > >>>>>> > > >>>>>> -- Sean > > >>>>>> > > >>>>>> Sean True > > >>>>>> Semantic Machines > > >>>>>> > > >>>>>>> On Thu, May 21, 2015 at 2:14 PM, Daniel Povey > > <dp...@gm...> > > >>>>>>> wrote: > > >>>>>>> > > >>>>>>> Nagendra Goel has worked on some example scripts > > for this type of > > >>>>>>> thing, and with Hainan we were working on trying to > > get it cleaned up > > >>>>>>> and checked in, but he's going for an internship so > > it will have to > > >>>>>>> wait. But Nagendra might be willing to share it > > with you. > > >>>>>>> Dan > > >>>>>>> > > >>>>>>> > > >>>>>>> On Thu, May 21, 2015 at 2:10 PM, Kirill Katsnelson > > >>>>>>> <kir...@sm...> wrote: > > >>>>>>>> Suppose I have a language model where one token (a > > "word") is a > > >>>>> pointer > > >>>>>>>> to a whole another LM. This is a practical case > > when you expect an > > >>>>> abrupt > > >>>>>>>> change in model, a clear example being "my phone > > number is..." and > > >>>>> then > > >>>>>>>> you'd expect them rattling a string of digits. > > Is there any support > > >>>>> in kaldi > > >>>>>>>> for this? > > >>>>>>>> > > >>>>>>>> Thanks, > > >>>>>>>> > > >>>>>>>> -kkm > > >>>>> > > >>>>> > > ----------------------------------------------------------------------- > > - > > ------ > > >>>>>>>> One dashboard for servers and applications across > > >>>>> Physical-Virtual-Cloud > > >>>>>>>> Widest out-of-the-box monitoring support with > > 50+ applications > > >>>>>>>> Performance metrics, stats and reports that give > > you Actionable > > >>>>> Insights > > >>>>>>>> Deep dive visibility with transaction tracing > > using APM Insight. 
> > >>>>>>>> > > http://ad.doubleclick.net/ddm/clk/290420510;117567292;y > > >>>>>>>> _______________________________________________ > > >>>>>>>> Kaldi-users mailing list > > >>>>>>>> Kal...@li... > > >>>>>>>> > > https://lists.sourceforge.net/lists/listinfo/kaldi-users > > >>>>> > > >>>>> > > ----------------------------------------------------------------------- > > - > > ------ > > >>>>>>> One dashboard for servers and applications across > > >>>>>>> Physical-Virtual-Cloud > > >>>>>>> Widest out-of-the-box monitoring support with > > 50+ applications > > >>>>>>> Performance metrics, stats and reports that give > > you Actionable > > >>>>>>> Insights > > >>>>>>> Deep dive visibility with transaction tracing using > > APM Insight. > > >>>>>>> > > http://ad.doubleclick.net/ddm/clk/290420510;117567292;y > > >>>>>>> _______________________________________________ > > >>>>>>> Kaldi-users mailing list > > >>>>>>> Kal...@li... > > >>>>>>> > > https://lists.sourceforge.net/lists/listinfo/kaldi-users > > >>>> > > >>>> > > >>>> > > >>>> -- > > >>>> - Hainan > > >>>> -------------- next part -------------- > > >>>> An HTML attachment was scrubbed... > > >>>> > > >>>> ------------------------------ > > >>>> > > >>>> Message: 4 > > >>>> Date: Thu, 21 May 2015 15:01:51 -0400 > > >>>> From: Sean True <se...@se...> > > >>>> Subject: Re: [Kaldi-users] LM grafting > > >>>> To: Daniel Povey <dp...@gm...> > > >>>> Cc: Hainan Xu <hai...@gm...>, > > >>>> "kal...@li..." > > >>>> <kal...@li...>, Kirill > > Katsnelson > > >>>> <kir...@sm...> > > >>>> Message-ID: > > >>>> > > <CALtEaHntdAcmO_Ji5dxsPnT8i9M_LVuGnY0UjkJUPp=pY...@ma...> > > >>>> Content-Type: text/plain; charset="utf-8" > > >>>> > > >>>> That's a subject of some general interest. Is there a > > discussion of the > > >>>> general approach that was taken somewhere? > > >>>> > > >>>> -- Sean > > >>>> > > >>>> Sean True > > >>>> Semantic Machines > > >>>> > > >>>>> On Thu, May 21, 2015 at 2:14 PM, Daniel Povey > > <dp...@gm...> wrote: > > >>>>> > > >>>>> Nagendra Goel has worked on some example scripts for > > this type of > > >>>>> thing, and with Hainan we were working on trying to > > get it cleaned up > > >>>>> and checked in, but he's going for an internship so > > it will have to > > >>>>> wait. But Nagendra might be willing to share it with > > you. > > >>>>> Dan > > >>>>> > > >>>>> > > >>>>> On Thu, May 21, 2015 at 2:10 PM, Kirill Katsnelson > > >>>>> <kir...@sm...> wrote: > > >>>>>> Suppose I have a language model where one token (a > > "word") is a > > >>>>>> pointer > > >>>>> to a whole another LM. This is a practical case when > > you expect an > > >>>>> abrupt > > >>>>> change in model, a clear example being "my phone > > number is..." and then > > >>>>> you'd expect them rattling a string of digits. Is > > there any support in > > >>>>> kaldi for this? > > >>>>>> > > >>>>>> Thanks, > > >>>>>> > > >>>>>> -kkm > > >>>>> > > >>>>> > > ----------------------------------------------------------------------- > > - > > ------ > > >>>>>> One dashboard for servers and applications across > > >>>>>> Physical-Virtual-Cloud > > >>>>>> Widest out-of-the-box monitoring support with 50+ > > applications > > >>>>>> Performance metrics, stats and reports that give you > > Actionable > > >>>>>> Insights > > >>>>>> Deep dive visibility with transaction tracing using > > APM Insight. 
> > >>>>>> > > http://ad.doubleclick.net/ddm/clk/290420510;117567292;y > > >>>>>> _______________________________________________ > > >>>>>> Kaldi-users mailing list > > >>>>>> Kal...@li... > > >>>>>> > > https://lists.sourceforge.net/lists/listinfo/kaldi-users > > >>>>> > > >>>>> > > >>>>> > > >>>>> > > ----------------------------------------------------------------------- > > - > > ------ > > >>>>> One dashboard for servers and applications across > > >>>>> Physical-Virtual-Cloud > > >>>>> Widest out-of-the-box monitoring support with 50+ > > applications > > >>>>> Performance metrics, stats and reports that give you > > Actionable > > >>>>> Insights > > >>>>> Deep dive visibility with transaction tracing using > > APM Insight. > > >>>>> > > http://ad.doubleclick.net/ddm/clk/290420510;117567292;y > > >>>>> _______________________________________________ > > >>>>> Kaldi-users mailing list > > >>>>> Kal...@li... > > >>>>> > > https://lists.sourceforge.net/lists/listinfo/kaldi-users > > >>>> -------------- next part -------------- > > >>>> An HTML attachment was scrubbed... > > >>>> > > >>>> ------------------------------ > > >>>> > > >>>> > > >>>> > > ----------------------------------------------------------------------- > > - > > ------ > > >>>> One dashboard for servers and applications across > > Physical-Virtual-Cloud > > >>>> Widest out-of-the-box monitoring support with 50+ > > applications > > >>>> Performance metrics, stats and reports that give you > > Actionable Insights > > >>>> Deep dive visibility with transaction tracing using > > APM Insight. > > >>>> > > http://ad.doubleclick.net/ddm/clk/290420510;117567292;y > > >>>> > > >>>> ------------------------------ > > >>>> > > >>>> _______________________________________________ > > >>>> Kaldi-users mailing list > > >>>> Kal...@li... > > >>>> > > https://lists.sourceforge.net/lists/listinfo/kaldi-users > > >>>> > > >>>> > > >>>> End of Kaldi-users Digest, Vol 29, Issue 15 > > >>>> ******************************************* > > >>> > > >>> > > >>> > > ----------------------------------------------------------------------- > > - > > ------ > > >>> One dashboard for servers and applications across > > Physical-Virtual-Cloud > > >>> Widest out-of-the-box monitoring support with 50+ > > applications > > >>> Performance metrics, stats and reports that give you > > Actionable Insights > > >>> Deep dive visibility with transaction tracing using APM > > Insight. > > >>> > > http://ad.doubleclick.net/ddm/clk/290420510;117567292;y > > >>> _______________________________________________ > > >>> Kaldi-users mailing list > > >>> Kal...@li... > > >>> > > https://lists.sourceforge.net/lists/listinfo/kaldi-users > > >> > > >> > > >> > > ----------------------------------------------------------------------- > > - > > ------ > > >> One dashboard for servers and applications across > > Physical-Virtual-Cloud > > >> Widest out-of-the-box monitoring support with 50+ > > applications > > >> Performance metrics, stats and reports that give you > > Actionable Insights > > >> Deep dive visibility with transaction tracing using APM > > Insight. > > >> > > http://ad.doubleclick.net/ddm/clk/290420510;117567292;y > > >> _______________________________________________ > > >> Kaldi-users mailing list > > >> Kal...@li... 
> > >> > > https://lists.sourceforge.net/lists/listinfo/kaldi-users -- Ondřej Plátek, +420 737 758 650, skype:ondrejplatek, ond...@gm... |
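As an aside on the grafting mechanics described in the quoted thread, the fstreplace step can be sketched roughly as follows. Every name and label id here is hypothetical (the placeholder word $PHONENUM, its id 12345, the unused root label 1000000, the file names), and the sketch deliberately leaves out the part Dan calls tricky, namely adding disambiguation symbols at the start and end of the sub-FST and massaging its backoff weights so that composition with the lexicon stays determinizable.

  # G_top.fst    : top-level LM over words.txt, with a placeholder word
  #                "$PHONENUM" (word id 12345 in words.txt) wherever a
  #                phone number may occur.
  # G_digits.fst : the small digit-string LM, compiled over the same words.txt.
  # 1000000      : an otherwise unused label standing for the root FST.
  fstreplace G_top.fst 1000000 G_digits.fst 12345 | \
    fstrmepsilon | fstarcsort --sort_type=ilabel > G_grafted.fst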
From: David Warde-F. <d.w...@gm...> - 2015-06-17 07:27:49
|
Kirill, Many thanks for the pointers. On your setup, how long does the entire recipe take without decoding? For the life of me I can't figure out where num_jobs_nnet is being set (it's being written in the egs_dir as 4, I've changed it everywhere I could find it.) |
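For reference, the background decode blocks discussed above (the subshells in egs/librispeech/s5/run.sh that end in '&') can simply be made synchronous; the sketch below shows the edited block, with the elided decode_fmllr.sh arguments filled in with illustrative directory names that may not match the actual script.

  # Same block as in run.sh, but without the trailing '&', so the decodes run
  # one at a time instead of piling up in the background. Alternatively, keep
  # only the mkgraph.sh call (later stages still need the graph) and comment
  # out the decode loop entirely.
  (
    utils/mkgraph.sh data/lang_nosp_test_tgsmall \
      exp/tri4b exp/tri4b/graph_nosp_tgsmall || exit 1;
    for test in test_clean test_other dev_clean dev_other; do
      steps/decode_fmllr.sh --nj 20 --cmd "$decode_cmd" \
        exp/tri4b/graph_nosp_tgsmall data/$test \
        exp/tri4b/decode_nosp_tgsmall_$test   # decode dir name is illustrative
    done
  )   # <-- no '&' here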
From: Daniel P. <dp...@gm...> - 2015-06-16 22:59:09
|
Guoguo is going to fix arpa2fst tonight so that it will detect that. Later when we rewrite it we'll include that feature. Dan On Tue, Jun 16, 2015 at 6:58 PM, Kirill Katsnelson <kir...@sm...> wrote: > Holy guacamole! That was it. Thank you very very much. > > Perhaps arpa2fst v2.0 would detect such bloopers. > >> -----Original Message----- >> From: Daniel Povey [mailto:dp...@gm...] >> Sent: 2015-06-16 1526 >> To: Kirill Katsnelson >> Cc: kal...@li... >> Subject: Re: [Kaldi-users] fstdeterminizestar (L*G) never completes >> >> It turns out the problem was probably caused by the end of-sentence >> symbol </s> appearing in inappropriate places in the LM, at the start >> of n-grams rather than the end. Probably the training data was >> contaminated somehow by </s>. >> Dan >> >> >> On Tue, Jun 16, 2015 at 2:07 PM, Daniel Povey <dp...@gm...> wrote: >> >> I am currently trying to get a minimal reproduction with a script. >> Let it run for a while. I'll send you what remains of it, and hope it >> might give me an idea too. >> >> >> >> Looks like that fstdeterminize may have completed on this grammar >> >> (how do you call the thing symbolized as $G$? "grammar" sounded >> >> confusing, as I understand, but I have no other word not exceeding 2 >> >> syllables :)) >> > >> > I would call it an LM. >> > >> >>> I have left one running by mistake before going to sleep, and it >> was done. I am running one again with the time command to make sure >> this is not a fluke. So it is possible that it is not exactly non- >> determinizable, but instead takes enormous time (hours on one LM, < 1 >> sec on another). Which is the same thing from the engineering >> standpoint, close enough, as those engineering vs mathematics jokes go. >> But jokes aside, I want something more bounded for a production system, >> so I need to understand what throws it off so badly. >> > >> > I would still call it a problem. Check if your ARPA contains <eps> >> or >> > #0. I may need to add checks for this into arpa2fst (which we will >> > rewrite at some point anyway). Another problem could be weird things >> > like stray \r's which make one word seem like two in some >> > circumstances. >> > If I saw the output of arpa2fst I could probably figure out fairly >> > quickly what the problem was. The way I would debug this is to trace >> > through your LM FST from the start and follow those symbols (or >> > epsilons) on that trace from the determinization failure, and see how >> > there are two different paths. >> > It's better if you share a couple different traces, not just one, so >> > we can see what's in common. >> > >> >> Is fstdeterminizestar more than fstrmepsilon ∘ fstdeterminize (the >> latter with the kaldi patch)? >> > >> > No, it should be faster. fstrmepsilon ∘ fstdeterminize should fail >> too. >> > >> >> Ah, and this is a Linux machine. So everything looks very very >> standard (oops. Did I just create an infinite loop by repeating a >> word?). >> > >> > I am considering changing the way the LM disambig symbols are used to >> > make this kind of problem less likely to happen in future, by having >> > several disambig symbols for the LM, one per order, instead of just >> > one. >> > >> > Dan >> > >> > >> > >> >>> -----Original Message----- >> >>> From: Daniel Povey [mailto:dp...@gm...] >> >>> Sent: 2015-06-15 2340 >> >>> To: Kirill Katsnelson >> >>> Cc: kal...@li... 
>> >>> Subject: Re: [Kaldi-users] fstdeterminizestar (L*G) never completes
>> >>>
>> >>> In general SRILM language models are OK, but something weird could have
>> >>> happened, especially on an unusual platform like Windows.
>> >>> Look for duplicate lines with apparently the same n-gram on, and also
>> >>> send to me (but not to kaldi-user) the arpa LM.
>> >>> Dan
|
From: Kirill K. <kir...@sm...> - 2015-06-16 22:58:36
|
Holy guacamole! That was it. Thank you very very much. Perhaps arpa2fst v2.0 would detect such bloopers.

> -----Original Message-----
> From: Daniel Povey [mailto:dp...@gm...]
> Sent: 2015-06-16 1526
> To: Kirill Katsnelson
> Cc: kal...@li...
> Subject: Re: [Kaldi-users] fstdeterminizestar (L*G) never completes
>
> It turns out the problem was probably caused by the end-of-sentence
> symbol </s> appearing in inappropriate places in the LM, at the start
> of n-grams rather than the end. Probably the training data was
> contaminated somehow by </s>.
> Dan
|
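On the graph-building side, the guards that the WSJ recipe puts in front of arpa2fst can be bolted onto the same kind of pipeline. This is only a sketch with placeholder names (lm.arpa, lang/), with the OOV filtering step (find_arpa_oovs.pl / remove_oovs.pl) left out for brevity; note that the greps drop only the adjacent boundary-symbol pairs, so an n-gram that merely starts with </s> before an ordinary word still needs a separate check like the one sketched earlier.

# Guard greps as in the WSJ recipe; OOV filtering omitted here.
cat lm.arpa | tr -d '\r' | \
  grep -v '<s> <s>' | \
  grep -v '</s> <s>' | \
  grep -v '</s> </s>' | \
  arpa2fst - | fstprint | \
  utils/eps2disambig.pl | utils/s2eps.pl | \
  fstcompile --isymbols=lang/words.txt --osymbols=lang/words.txt \
    --keep_isymbols=false --keep_osymbols=false | \
  fstrmepsilon | fstarcsort --sort_type=ilabel > lang/G.fst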
From: Daniel P. <dp...@gm...> - 2015-06-16 22:26:29
|
It turns out the problem was probably caused by the end-of-sentence symbol </s> appearing in inappropriate places in the LM, at the start of n-grams rather than the end. Probably the training data was contaminated somehow by </s>.
Dan
|
From: Daniel P. <dp...@gm...> - 2015-06-16 18:07:29
|
> I am currently trying to get a minimal reproduction with a script. Let it run for a while. I'll send you what remains of it, and hope it might give me an idea too.
>
> Looks like that fstdeterminize may have completed on this grammar (how do you call the thing symbolized as $G$? "grammar" sounded confusing, as I understand, but I have no other word not exceeding 2 syllables :))

I would call it an LM.

>> I have left one running by mistake before going to sleep, and it was done. I am running one again with the time command to make sure this is not a fluke. So it is possible that it is not exactly non-determinizable, but instead takes enormous time (hours on one LM, < 1 sec on another). Which is the same thing from the engineering standpoint, close enough, as those engineering vs mathematics jokes go. But jokes aside, I want something more bounded for a production system, so I need to understand what throws it off so badly.

I would still call it a problem. Check if your ARPA contains <eps> or #0. I may need to add checks for this into arpa2fst (which we will rewrite at some point anyway). Another problem could be weird things like stray \r's which make one word seem like two in some circumstances.
If I saw the output of arpa2fst I could probably figure out fairly quickly what the problem was. The way I would debug this is to trace through your LM FST from the start and follow those symbols (or epsilons) on that trace from the determinization failure, and see how there are two different paths.
It's better if you share a couple different traces, not just one, so we can see what's in common.

> Is fstdeterminizestar more than fstrmepsilon ∘ fstdeterminize (the latter with the kaldi patch)?

No, it should be faster. fstrmepsilon ∘ fstdeterminize should fail too.

> Ah, and this is a Linux machine. So everything looks very very standard (oops. Did I just create an infinite loop by repeating a word?).

I am considering changing the way the LM disambig symbols are used to make this kind of problem less likely to happen in future, by having several disambig symbols for the LM, one per order, instead of just one.

Dan
>> >> >> >> >> >> Dan >> >> >> >> >> > -kkm |
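A minimal sketch of the check discussed above — running fstdeterminizestar on G.fst by itself and, if it does not return, asking it for the symbol-sequence dump. It assumes a built $lang/G.fst and the Kaldi tools on PATH; the five-minute wait is only illustrative, since a healthy trigram G normally determinizes in seconds.

  # Run determinization of G.fst alone in the background.
  fstdeterminizestar --use-log=true $lang/G.fst > /dev/null &
  pid=$!

  sleep 300    # illustrative; a healthy G.fst is usually done within seconds

  if kill -0 $pid 2>/dev/null; then
    # Still running: request the symbol-sequence dump described above,
    # give it a moment to print, then stop it.
    kill -SIGUSR1 $pid
    sleep 5
    kill $pid
  else
    echo "G.fst determinized on its own; look at the L o G composition instead"
  fi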
From: Kirill K. <kir...@sm...> - 2015-06-16 16:15:17
|
I am currently trying to get a minimal reproduction with a script. Let it run for a while. I'll send you what remains of it, and hope it might give me an idea too. Looks like that fstdeterminize may have completed on this grammar (how do you call the thing symbolized as $G$? "grammar" sounded confusing, as I understand, but I have no other word not exceeding 2 syllables :)). I have left one running by mistake before going to sleep, and it was done. I am running one again with the time command to make sure this is not a fluke. So it is possible that it is not exactly non-determinizable, but instead takes enormous time (hours on one LM, < 1 sec on another). Which is the same thing from the engineering standpoint, close enough, as those engineering vs mathematics jokes go. But jokes aside, I want something more bounded for a production system, so I need to understand what throws it off so badly. Is fstdeterminizestar more than fstrmepsilon ∘ fstdeterminize (the latter with the kaldi patch)? Ah, and this is a Linux machine. So everything looks very very standard (oops. Did I just create an infinite loop by repeating a word?). -kkm > -----Original Message----- > From: Daniel Povey [mailto:dp...@gm...] > Sent: 2015-06-15 2340 > To: Kirill Katsnelson > Cc: kal...@li... > Subject: Re: [Kaldi-users] fstdeterminizestar (L*G) never completes > > In general SRILM language models are OK, but something weird could have > happened, especially on an unusual platform like Windows. > Look for duplicate lines with apparently the same n-gram on, and also > send to me (but not to kaldi-user) the arpa LM. > Dan > > > On Tue, Jun 16, 2015 at 2:03 AM, Kirill Katsnelson > <kir...@sm...> wrote: > > Nope. The only thing I am thinking of doing is to bisect it somehow, > to get a minimal grammar that still refuses to determinize. I tried > different smoothing and played with other switches to ngram_count, but > it still does loop. Are there any known problems with srilm-generated > models? > > > > -kkm > > > >> -----Original Message----- > >> From: Daniel Povey [mailto:dp...@gm...] > >> Sent: 2015-06-15 2248 > >> To: Kirill Katsnelson > >> Cc: kal...@li... > >> Subject: Re: [Kaldi-users] fstdeterminizestar (L*G) never completes > >> > >> OOVs should be OK. > >> Make sure there are no n-grams with things like <s> <s> > >> > >> e.g. see the lines > >> grep -v '<s> <s>' | \ > >> grep -v '</s> <s>' | \ > >> grep -v '</s> </s>' | \ > >> > >> in the WSJ script: > >> > >> gunzip -c $lmdir/lm_${lm_suffix}.arpa.gz | \ > >> grep -v '<s> <s>' | \ > >> grep -v '</s> <s>' | \ > >> grep -v '</s> </s>' | \ > >> arpa2fst - | fstprint | \ > >> utils/remove_oovs.pl $tmpdir/oovs_${lm_suffix}.txt | \ > >> utils/eps2disambig.pl | utils/s2eps.pl | fstcompile -- > >> isymbols=$test/words.txt \ > >> --osymbols=$test/words.txt --keep_isymbols=false -- > >> keep_osymbols=false | \ > >> fstrmepsilon | fstarcsort --sort_type=ilabel > $test/G.fst > >> > >> Dan > >> > >> > >> On Tue, Jun 16, 2015 at 1:42 AM, Kirill Katsnelson > >> <kir...@sm...> wrote: > >> > Bingo. G.fst is not determinizable (the "good" G.fst takes under a > >> > second to determinize). And the bad one loops at the word "zero" > >> > like this > >> > > >> > #0 > >> > unsure unsure > >> > #0 > >> > of of > >> > #0 > >> > yours yours > >> > #0 > >> > is is > >> > #0 > >> > your your > >> > #0 > >> > zip zip > >> > #0 > >> > wrong wrong > >> > #0 > >> > with with > >> > #0 > >> > zero zero > >> > #0 > >> > zero zero > >> > .... 
> >> > > >> > I am taking the LM straight from ngram_counts to the standard > >> pipeline, nothing fancy. The only thing is it has a lot of OOVs: > >> > > >> > remove_oovs.pl: removed 4646 lines. > >> > > >> > Is this generally a problem? So does my "good" arpa LM. I grepped > >> both for the word zero, but could not spot anything outrageous. Can > >> you think of anything I can look for? > >> > > >> > My source is no longer than 10 days old. Here's the pipeline, just > >> > in > >> case. > >> > > >> > cat $src/$arpalm | tr -d '\r' | \ > >> > utils/find_arpa_oovs.pl $lang/words.txt > $lang/lm_oovs.txt > >> > > >> > cat $src/$arpalm | tr -d '\r' | \ > >> > arpa2fst - | fstprint | \ > >> > utils/remove_oovs.pl $lang/lm_oovs.txt | \ > >> > utils/eps2disambig.pl | utils/s2eps.pl | fstcompile -- > >> isymbols=$lang/words.txt \ > >> > --osymbols=$lang/words.txt --keep_isymbols=false -- > >> keep_osymbols=false | \ > >> > fstrmepsilon | fstarcsort --sort_type=ilabel > $lang/G.fst > >> > > >> > -kkm > >> > > >> > > >> >> -----Original Message----- > >> >> From: Daniel Povey [mailto:dp...@gm...] > >> >> Sent: 2015-06-15 2206 > >> >> To: Kirill Katsnelson > >> >> Cc: kal...@li... > >> >> Subject: Re: [Kaldi-users] fstdeterminizestar (L*G) never > >> >> completes > >> >> > >> >> I don't recommend to look at the fstdeterminizestar algorithm > >> itself- > >> >> it's very complicated. Instead focus on the definition of > >> >> "determinizable" and the twins property, and figure out what path > >> you > >> >> are taking through L.fst and G.fst. Trying to fstdeterminizestar > >> >> G.fst directly, and seeing whether it terminates or not, may tell > >> you > >> >> something; if it fails, send the signal and see what happens. > >> >> fstdeterminizestar does care about the weights, but only to the > >> >> extent that they are the same or different from each other; and > if > >> >> your G.fst is generated from arpa2fst the pipeline should work > for > >> >> any ARPA-format language model- make sure you are using an up-to- > >> date > >> >> Kaldi though, there have been fixes as recently as a few months > ago. > >> >> The presence of SIL is not surprising, it is the optional-silence > >> >> added by the lexicon. I think that script is adding #16 if it > >> >> does > >> >> *not* take the optional silence, otherwise it adds the phone SIL. > >> >> Since you are calling your FST a "grammar" I'm wondering whether > >> >> you have done something fancy with mapping words to FSTs or > >> >> something like that, which is causing the result to not be > determinizable. > >> >> > >> >> Dan > >> >> > >> >> > >> >> On Tue, Jun 16, 2015 at 12:55 AM, Kirill Katsnelson > >> >> <kir...@sm...> wrote: > >> >> > Thank you very much for your help Dan, but I am still stuck. > >> >> > > >> >> > First of all, a question: does the fstdeterminizestar algorithm > >> >> depend on actual backoff and n-gram probabilities, i.e. will it > >> >> behave differently if the numbers in arpa model file are > different? > >> >> Or does it depend only on arc labels but not weights? I am > looking > >> at > >> >> the code but certainly I am far from being able to understand it. > >> >> I cheated by looking at all if conditions in it, and this one in > >> >> EpsilonClosure is seemingly the only one dealing with weights: > >> >> > > >> >> > if (! ApproxEqual(weight, iter->second.weight, > >> delta_)) > >> >> > { > >> >> // add extra part of weight to queue. 
> >> >> > > >> >> > (In ProcessFinal it also has "if (this_final_weight != > >> >> > Weight::Zero())" but I do not believe it is relevant?) > >> >> > > >> >> > I am trying to understand how to dig into the problem--are > >> >> > weights in > >> >> the picture actually. > >> >> > > >> >> > Also, just for a test, I ran the grammar trough a "grep -v > 'real > >> >> real'", and indeed got a similar loop on the word "very" which is > >> >> also often repeated. But the "real real" 2- and 3-grams are there > >> >> in the "good" grammar too. > >> >> > > >> >> > Another thing I do not understand is the presence of the SIL > >> ilabel > >> >> in the backtrace. Here's the beginning of the trace that leads to > >> the > >> >> infinite loop as decoded with a little script I wrote (format is > >> >> ilabel [ TAB olabel ]: > >> >> > > >> >> > #16 > >> >> > #0 > >> >> > V_B > >> >> > Y_I > >> >> > UW1_I > >> >> > Z_E views > >> >> > #2 > >> >> > SIL > >> >> > #0 > >> >> > AH0_B > >> >> > N_I > >> >> > SH_I unsure > >> >> > UH1_I > >> >> > R_E > >> >> > > >> >> > Note the presence of SIL at line 8. This is not in lexicon: > >> >> > > >> >> > $ grep SIL > >> >> data/lang_sa_generic_test/dict/lexiconp_silprob_disambig.txt > >> >> > !SIL 1 0.20 1.00 1.00 SIL_S > >> >> > $ > >> >> > > >> >> > Is this a hint? How did it get there at all? I am using a > >> >> > standard > >> >> script to build the L_disambig.fst: > >> >> > > >> >> > phone_disambig_symbol=$(awk '$1=="#0"{print $2}' > >> >> > $lang/phones.txt) word_disambig_symbol=$(awk '$1=="#0"{print > >> >> > $2}' $lang/words.txt) utils/make_lexicon_fst_silprob.pl > >> >> $lang/dict/lexiconp_silprob_disambig.txt \ > >> >> > data/local/dict/silprob.txt $silphone > >> >> > '#'$ndisambig > >> | \ > >> >> > fstcompile --isymbols=$lang/phones.txt -- > >> >> osymbols=$lang/words.txt \ > >> >> > --keep_isymbols=false --keep_osymbols=false | \ > >> >> > fstaddselfloops "echo $phone_disambig_symbol |" "echo > >> >> $word_disambig_symbol |" | \ > >> >> > fstarcsort --sort_type=olabel > $lang/L_disambig.fst || > >> >> > exit 1; > >> >> > > >> >> > I checked the lexicon, and there are indeed only real phones at > >> the > >> >> beginning of each word, no empty positions and no #N symbols. > >> >> > > >> >> > -kkm > >> >> > > >> >> >> -----Original Message----- > >> >> >> From: Daniel Povey [mailto:dp...@gm...] > >> >> >> Sent: 2015-06-15 1944 > >> >> >> To: Kirill Katsnelson > >> >> >> Cc: kal...@li... > >> >> >> Subject: Re: [Kaldi-users] fstdeterminizestar (L*G) never > >> >> >> completes > >> >> >> > >> >> >> I think the confusion is probably between two loops with > "real" > >> on > >> >> >> them in G.fst: one loop where you always take the bigram > >> >> probability, > >> >> >> and one where you always take the unigram probability. Or > >> >> >> maybe > >> a > >> >> >> similar confusion between a loop where you use the trigram > >> >> >> "real > >> >> real > >> >> >> real" and the bigram "real real". Those loops are expected to > >> >> exist. > >> >> >> Probably the issue is that something happened at the start of > >> >> >> the sequence which caused the FST to be confused about which > of > >> >> >> those > >> >> two > >> >> >> states it was in. If you have any empty words (words with > >> >> >> empty > >> >> >> pronunciation) in your lexicon this could possibly happen, as > >> >> >> it would be confused between taking a normal word, then the > >> >> >> backoff > >> >> symbol, vs. > >> >> >> taking a normal word, then the empty word, then the backoff > >> symbol. 
> >> >> >> I think the current Kaldi graph-creation script check for > empty > >> >> words > >> >> >> in the lexicon, for this reason. > >> >> >> > >> >> >> Dan > >> >> >> > >> >> >> > >> >> >> > >> >> >> > The sequence R_B ( ) IY1_I ( ) L_E (real) #1 ( ) #16 ( ) #0 > ( > >> >> >> > ) > >> >> >> generally almost makes sense, given that #16 is the last one > in > >> >> >> table, the silence disambiguation symbol. (Not sure why "real" > >> >> >> is emitted at L_E--I would rather expect it to be emitted at > >> >> >> #1.) What > >> >> I > >> >> >> do not understand is what exactly the debug trace represents, > >> >> >> and what should I make out if it. It is a path through the FST > >> >> >> graph, > >> >> but > >> >> >> I do not understand what is this path exactly, and what does > >> >> >> this endless walk of this loop mean. > >> >> >> > > >> >> >> > -kkm > >> >> >> > > >> >> >> >> -----Original Message----- > >> >> >> >> From: Daniel Povey [mailto:dp...@gm...] > >> >> >> >> Sent: 2015-06-15 1858 > >> >> >> >> To: Kirill Katsnelson > >> >> >> >> Cc: kal...@li... > >> >> >> >> Subject: Re: [Kaldi-users] fstdeterminizestar (L*G) never > >> >> >> >> completes > >> >> >> >> > >> >> >> >> Look into the "backoff disambiguation symbol", normally > >> >> >> >> called > >> >> #0. > >> >> >> >> The reason why it is needed should be explained in the > >> hbka.pdf > >> >> >> paper. > >> >> >> >> Dan > >> >> >> >> > >> >> >> >> > >> >> >> >> On Mon, Jun 15, 2015 at 9:54 PM, Kirill Katsnelson > >> >> >> >> <kir...@sm...> wrote: > >> >> >> >> > Thank you! The output consists of some sequences as you > >> >> >> >> > described, > >> >> >> >> quickly falling into a short ever repeated loop. > >> >> >> >> > > >> >> >> >> > The non-repeated section ends up with osymbols (excluding > >> >> >> epsilons) > >> >> >> >> "whatsoever on vacation up", and then the repeated part > >> >> >> >> looks > >> >> like " > >> >> >> >> #1 ( ) #16 ( ) #0 ( ) R_B ( ) IY1_I ( ) L_E (real)". The > >> >> >> >> word > >> >> "real" > >> >> >> >> is spelled "R_B IY1_I L_E #1" in L_disambig. > >> >> >> >> > > >> >> >> >> > Both LMs contain a bigram for "vacation up" and a trigram > >> >> >> "vacation > >> >> >> >> up there". "up real" is a bigram in both, with 3-grams "up > >> real > >> >> >> quick" > >> >> >> >> and "up real quickly". "up real" is also a tail of a few > >> >> >> >> other 3-grams, but these are also same in both models (up > to > >> >> >> >> their > >> >> >> weights). > >> >> >> >> > > >> >> >> >> > It looks I do not understand what should I make in the > end > >> >> >> >> > out of > >> >> >> >> this > >> >> >> >> > debug data :( > >> >> >> >> > > >> >> >> >> > -kkm > >> >> >> >> > > >> >> >> >> >> -----Original Message----- > >> >> >> >> >> From: Daniel Povey [mailto:dp...@gm...] > >> >> >> >> >> Sent: 2015-06-15 1821 > >> >> >> >> >> To: Kirill Katsnelson > >> >> >> >> >> Cc: kal...@li... > >> >> >> >> >> Subject: Re: [Kaldi-users] fstdeterminizestar (L*G) > never > >> >> >> >> >> completes > >> >> >> >> >> > >> >> >> >> >> > I have a small set of sentences with repeat counts, > and > >> >> >> >> >> > generating an > >> >> >> >> >> LM out of it. One is generated by a horrible local tool > I > >> >> >> >> >> have trouble tracing exactly how. For this one L*G > >> >> >> >> >> composition > >> >> takes > >> >> >> >> about > >> >> >> >> >> 20 seconds on my CPU. Another LM I just generated out of > >> the > >> >> >> >> >> same files with srilm 1.7.1 ngram-count. 
This one has > >> >> >> >> >> been sitting in mkgraphs.sh on L_disambig*G composition > >> >> >> >> >> step for about 30 > >> >> >> minutes, > >> >> >> >> >> and still churning. fstdeterminizestar --use-log=true is > >> >> >> >> >> running at > >> >> >> >> 100%. > >> >> >> >> >> L_disambig.fst is the same file in both cases. Looks > like > >> >> >> >> >> the > >> >> G > >> >> >> >> >> making it not determinizable, although I have no idea > how > >> it > >> >> >> >> >> came to > >> >> >> >> be. > >> >> >> >> >> > > >> >> >> >> >> > Anyone could share an advice on tracking down the > >> problem? > >> >> >> Thanks. > >> >> >> >> >> > >> >> >> >> >> You can send a signal to that program like kill - > SIGUSR1 > >> >> >> >> >> process-id and it will print out some info about the > >> >> >> >> >> symbol sequences involved, I think it is like > >> >> >> >> >> isymbol1 (osymbol1) isymbol2 (osymbol2) and so on. > >> >> >> >> >> Usually there is a particular word sequence that is > >> >> problematic. > >> >> >> >> >> Dan > >> >> >> >> >> > >> >> >> >> >> > >> >> >> >> >> > >> >> >> >> >> > >> >> >> >> >> > > >> >> >> >> >> > -kkm > >> >> >> >> >> > > >> >> >> >> >> > ------------------------------------------------------ > - > >> >> >> >> >> > -- > >> - > >> >> >> >> >> > -- > >> >> - > >> >> >> >> >> > -- > >> >> >> - > >> >> >> >> >> > -- > >> >> >> >> - > >> >> >> >> >> > -- > >> >> >> >> >> - > >> >> >> >> >> > -------- > >> >> >> >> >> > _______________________________________________ > >> >> >> >> >> > Kaldi-users mailing list > >> >> >> >> >> > Kal...@li... > >> >> >> >> >> > https://lists.sourceforge.net/lists/listinfo/kaldi- > user > >> >> >> >> >> > s |
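A rough way to turn "hours on one LM, under a second on another" into numbers, and to compare the combined Kaldi tool with the two-step fstrmepsilon-then-fstdeterminize route asked about above. The directory names, the temporary file and the ten-minute limit are placeholders.

  for g in lang_good/G.fst lang_bad/G.fst; do
    echo "=== $g ==="
    # Kaldi's combined epsilon-removal + determinization
    time timeout 600 fstdeterminizestar --use-log=true "$g" > /dev/null
    # two-step OpenFst route, for comparison
    time timeout 600 fstrmepsilon "$g" rmeps.fst
    time timeout 600 fstdeterminize rmeps.fst > /dev/null
  done
  # timeout exits with status 124 when it had to kill the tool, which is
  # how a "never finishes" case shows up in the output.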
From: Daniel P. <dp...@gm...> - 2015-06-16 06:40:06
|
In general SRILM language models are OK, but something weird could have happened, especially on an unusual platform like Windows. Look for duplicate lines with apparently the same n-gram on, and also send to me (but not to kaldi-user) the arpa LM. Dan On Tue, Jun 16, 2015 at 2:03 AM, Kirill Katsnelson <kir...@sm...> wrote: > Nope. The only thing I am thinking of doing is to bisect it somehow, to get a minimal grammar that still refuses to determinize. I tried different smoothing and played with other switches to ngram_count, but it still does loop. Are there any known problems with srilm-generated models? > > -kkm > >> -----Original Message----- >> From: Daniel Povey [mailto:dp...@gm...] >> Sent: 2015-06-15 2248 >> To: Kirill Katsnelson >> Cc: kal...@li... >> Subject: Re: [Kaldi-users] fstdeterminizestar (L*G) never completes >> >> OOVs should be OK. >> Make sure there are no n-grams with things like <s> <s> >> >> e.g. see the lines >> grep -v '<s> <s>' | \ >> grep -v '</s> <s>' | \ >> grep -v '</s> </s>' | \ >> >> in the WSJ script: >> >> gunzip -c $lmdir/lm_${lm_suffix}.arpa.gz | \ >> grep -v '<s> <s>' | \ >> grep -v '</s> <s>' | \ >> grep -v '</s> </s>' | \ >> arpa2fst - | fstprint | \ >> utils/remove_oovs.pl $tmpdir/oovs_${lm_suffix}.txt | \ >> utils/eps2disambig.pl | utils/s2eps.pl | fstcompile -- >> isymbols=$test/words.txt \ >> --osymbols=$test/words.txt --keep_isymbols=false -- >> keep_osymbols=false | \ >> fstrmepsilon | fstarcsort --sort_type=ilabel > $test/G.fst >> >> Dan >> >> >> On Tue, Jun 16, 2015 at 1:42 AM, Kirill Katsnelson >> <kir...@sm...> wrote: >> > Bingo. G.fst is not determinizable (the "good" G.fst takes under a >> > second to determinize). And the bad one loops at the word "zero" like >> > this >> > >> > #0 >> > unsure unsure >> > #0 >> > of of >> > #0 >> > yours yours >> > #0 >> > is is >> > #0 >> > your your >> > #0 >> > zip zip >> > #0 >> > wrong wrong >> > #0 >> > with with >> > #0 >> > zero zero >> > #0 >> > zero zero >> > .... >> > >> > I am taking the LM straight from ngram_counts to the standard >> pipeline, nothing fancy. The only thing is it has a lot of OOVs: >> > >> > remove_oovs.pl: removed 4646 lines. >> > >> > Is this generally a problem? So does my "good" arpa LM. I grepped >> both for the word zero, but could not spot anything outrageous. Can you >> think of anything I can look for? >> > >> > My source is no longer than 10 days old. Here's the pipeline, just in >> case. >> > >> > cat $src/$arpalm | tr -d '\r' | \ >> > utils/find_arpa_oovs.pl $lang/words.txt > $lang/lm_oovs.txt >> > >> > cat $src/$arpalm | tr -d '\r' | \ >> > arpa2fst - | fstprint | \ >> > utils/remove_oovs.pl $lang/lm_oovs.txt | \ >> > utils/eps2disambig.pl | utils/s2eps.pl | fstcompile -- >> isymbols=$lang/words.txt \ >> > --osymbols=$lang/words.txt --keep_isymbols=false -- >> keep_osymbols=false | \ >> > fstrmepsilon | fstarcsort --sort_type=ilabel > $lang/G.fst >> > >> > -kkm >> > >> > >> >> -----Original Message----- >> >> From: Daniel Povey [mailto:dp...@gm...] >> >> Sent: 2015-06-15 2206 >> >> To: Kirill Katsnelson >> >> Cc: kal...@li... >> >> Subject: Re: [Kaldi-users] fstdeterminizestar (L*G) never completes >> >> >> >> I don't recommend to look at the fstdeterminizestar algorithm >> itself- >> >> it's very complicated. Instead focus on the definition of >> >> "determinizable" and the twins property, and figure out what path >> you >> >> are taking through L.fst and G.fst. 
Trying to fstdeterminizestar >> >> G.fst directly, and seeing whether it terminates or not, may tell >> you >> >> something; if it fails, send the signal and see what happens. >> >> fstdeterminizestar does care about the weights, but only to the >> >> extent that they are the same or different from each other; and if >> >> your G.fst is generated from arpa2fst the pipeline should work for >> >> any ARPA-format language model- make sure you are using an up-to- >> date >> >> Kaldi though, there have been fixes as recently as a few months ago. >> >> The presence of SIL is not surprising, it is the optional-silence >> >> added by the lexicon. I think that script is adding #16 if it does >> >> *not* take the optional silence, otherwise it adds the phone SIL. >> >> Since you are calling your FST a "grammar" I'm wondering whether you >> >> have done something fancy with mapping words to FSTs or something >> >> like that, which is causing the result to not be determinizable. >> >> >> >> Dan >> >> >> >> >> >> On Tue, Jun 16, 2015 at 12:55 AM, Kirill Katsnelson >> >> <kir...@sm...> wrote: >> >> > Thank you very much for your help Dan, but I am still stuck. >> >> > >> >> > First of all, a question: does the fstdeterminizestar algorithm >> >> depend on actual backoff and n-gram probabilities, i.e. will it >> >> behave differently if the numbers in arpa model file are different? >> >> Or does it depend only on arc labels but not weights? I am looking >> at >> >> the code but certainly I am far from being able to understand it. I >> >> cheated by looking at all if conditions in it, and this one in >> >> EpsilonClosure is seemingly the only one dealing with weights: >> >> > >> >> > if (! ApproxEqual(weight, iter->second.weight, >> delta_)) >> >> > { >> >> // add extra part of weight to queue. >> >> > >> >> > (In ProcessFinal it also has "if (this_final_weight != >> >> > Weight::Zero())" but I do not believe it is relevant?) >> >> > >> >> > I am trying to understand how to dig into the problem--are weights >> >> > in >> >> the picture actually. >> >> > >> >> > Also, just for a test, I ran the grammar trough a "grep -v 'real >> >> real'", and indeed got a similar loop on the word "very" which is >> >> also often repeated. But the "real real" 2- and 3-grams are there in >> >> the "good" grammar too. >> >> > >> >> > Another thing I do not understand is the presence of the SIL >> ilabel >> >> in the backtrace. Here's the beginning of the trace that leads to >> the >> >> infinite loop as decoded with a little script I wrote (format is >> >> ilabel [ TAB olabel ]: >> >> > >> >> > #16 >> >> > #0 >> >> > V_B >> >> > Y_I >> >> > UW1_I >> >> > Z_E views >> >> > #2 >> >> > SIL >> >> > #0 >> >> > AH0_B >> >> > N_I >> >> > SH_I unsure >> >> > UH1_I >> >> > R_E >> >> > >> >> > Note the presence of SIL at line 8. This is not in lexicon: >> >> > >> >> > $ grep SIL >> >> data/lang_sa_generic_test/dict/lexiconp_silprob_disambig.txt >> >> > !SIL 1 0.20 1.00 1.00 SIL_S >> >> > $ >> >> > >> >> > Is this a hint? How did it get there at all? 
I am using a standard >> >> script to build the L_disambig.fst: >> >> > >> >> > phone_disambig_symbol=$(awk '$1=="#0"{print $2}' $lang/phones.txt) >> >> > word_disambig_symbol=$(awk '$1=="#0"{print $2}' $lang/words.txt) >> >> > utils/make_lexicon_fst_silprob.pl >> >> $lang/dict/lexiconp_silprob_disambig.txt \ >> >> > data/local/dict/silprob.txt $silphone '#'$ndisambig >> | \ >> >> > fstcompile --isymbols=$lang/phones.txt -- >> >> osymbols=$lang/words.txt \ >> >> > --keep_isymbols=false --keep_osymbols=false | \ >> >> > fstaddselfloops "echo $phone_disambig_symbol |" "echo >> >> $word_disambig_symbol |" | \ >> >> > fstarcsort --sort_type=olabel > $lang/L_disambig.fst || exit >> >> > 1; >> >> > >> >> > I checked the lexicon, and there are indeed only real phones at >> the >> >> beginning of each word, no empty positions and no #N symbols. >> >> > >> >> > -kkm >> >> > >> >> >> -----Original Message----- >> >> >> From: Daniel Povey [mailto:dp...@gm...] >> >> >> Sent: 2015-06-15 1944 >> >> >> To: Kirill Katsnelson >> >> >> Cc: kal...@li... >> >> >> Subject: Re: [Kaldi-users] fstdeterminizestar (L*G) never >> >> >> completes >> >> >> >> >> >> I think the confusion is probably between two loops with "real" >> on >> >> >> them in G.fst: one loop where you always take the bigram >> >> probability, >> >> >> and one where you always take the unigram probability. Or maybe >> a >> >> >> similar confusion between a loop where you use the trigram "real >> >> real >> >> >> real" and the bigram "real real". Those loops are expected to >> >> exist. >> >> >> Probably the issue is that something happened at the start of the >> >> >> sequence which caused the FST to be confused about which of those >> >> two >> >> >> states it was in. If you have any empty words (words with empty >> >> >> pronunciation) in your lexicon this could possibly happen, as it >> >> >> would be confused between taking a normal word, then the backoff >> >> symbol, vs. >> >> >> taking a normal word, then the empty word, then the backoff >> symbol. >> >> >> I think the current Kaldi graph-creation script check for empty >> >> words >> >> >> in the lexicon, for this reason. >> >> >> >> >> >> Dan >> >> >> >> >> >> >> >> >> >> >> >> > The sequence R_B ( ) IY1_I ( ) L_E (real) #1 ( ) #16 ( ) #0 ( ) >> >> >> generally almost makes sense, given that #16 is the last one in >> >> >> table, the silence disambiguation symbol. (Not sure why "real" is >> >> >> emitted at L_E--I would rather expect it to be emitted at #1.) >> >> >> What >> >> I >> >> >> do not understand is what exactly the debug trace represents, and >> >> >> what should I make out if it. It is a path through the FST graph, >> >> but >> >> >> I do not understand what is this path exactly, and what does this >> >> >> endless walk of this loop mean. >> >> >> > >> >> >> > -kkm >> >> >> > >> >> >> >> -----Original Message----- >> >> >> >> From: Daniel Povey [mailto:dp...@gm...] >> >> >> >> Sent: 2015-06-15 1858 >> >> >> >> To: Kirill Katsnelson >> >> >> >> Cc: kal...@li... >> >> >> >> Subject: Re: [Kaldi-users] fstdeterminizestar (L*G) never >> >> >> >> completes >> >> >> >> >> >> >> >> Look into the "backoff disambiguation symbol", normally called >> >> #0. >> >> >> >> The reason why it is needed should be explained in the >> hbka.pdf >> >> >> paper. >> >> >> >> Dan >> >> >> >> >> >> >> >> >> >> >> >> On Mon, Jun 15, 2015 at 9:54 PM, Kirill Katsnelson >> >> >> >> <kir...@sm...> wrote: >> >> >> >> > Thank you! 
The output consists of some sequences as you >> >> >> >> > described, >> >> >> >> quickly falling into a short ever repeated loop. >> >> >> >> > >> >> >> >> > The non-repeated section ends up with osymbols (excluding >> >> >> epsilons) >> >> >> >> "whatsoever on vacation up", and then the repeated part looks >> >> like " >> >> >> >> #1 ( ) #16 ( ) #0 ( ) R_B ( ) IY1_I ( ) L_E (real)". The word >> >> "real" >> >> >> >> is spelled "R_B IY1_I L_E #1" in L_disambig. >> >> >> >> > >> >> >> >> > Both LMs contain a bigram for "vacation up" and a trigram >> >> >> "vacation >> >> >> >> up there". "up real" is a bigram in both, with 3-grams "up >> real >> >> >> quick" >> >> >> >> and "up real quickly". "up real" is also a tail of a few other >> >> >> >> 3-grams, but these are also same in both models (up to their >> >> >> weights). >> >> >> >> > >> >> >> >> > It looks I do not understand what should I make in the end >> >> >> >> > out of >> >> >> >> this >> >> >> >> > debug data :( >> >> >> >> > >> >> >> >> > -kkm >> >> >> >> > >> >> >> >> >> -----Original Message----- >> >> >> >> >> From: Daniel Povey [mailto:dp...@gm...] >> >> >> >> >> Sent: 2015-06-15 1821 >> >> >> >> >> To: Kirill Katsnelson >> >> >> >> >> Cc: kal...@li... >> >> >> >> >> Subject: Re: [Kaldi-users] fstdeterminizestar (L*G) never >> >> >> >> >> completes >> >> >> >> >> >> >> >> >> >> > I have a small set of sentences with repeat counts, and >> >> >> >> >> > generating an >> >> >> >> >> LM out of it. One is generated by a horrible local tool I >> >> >> >> >> have trouble tracing exactly how. For this one L*G >> >> >> >> >> composition >> >> takes >> >> >> >> about >> >> >> >> >> 20 seconds on my CPU. Another LM I just generated out of >> the >> >> >> >> >> same files with srilm 1.7.1 ngram-count. This one has been >> >> >> >> >> sitting in mkgraphs.sh on L_disambig*G composition step for >> >> >> >> >> about 30 >> >> >> minutes, >> >> >> >> >> and still churning. fstdeterminizestar --use-log=true is >> >> >> >> >> running at >> >> >> >> 100%. >> >> >> >> >> L_disambig.fst is the same file in both cases. Looks like >> >> >> >> >> the >> >> G >> >> >> >> >> making it not determinizable, although I have no idea how >> it >> >> >> >> >> came to >> >> >> >> be. >> >> >> >> >> > >> >> >> >> >> > Anyone could share an advice on tracking down the >> problem? >> >> >> Thanks. >> >> >> >> >> >> >> >> >> >> You can send a signal to that program like kill -SIGUSR1 >> >> >> >> >> process-id and it will print out some info about the symbol >> >> >> >> >> sequences involved, I think it is like >> >> >> >> >> isymbol1 (osymbol1) isymbol2 (osymbol2) and so on. >> >> >> >> >> Usually there is a particular word sequence that is >> >> problematic. >> >> >> >> >> Dan >> >> >> >> >> >> >> >> >> >> >> >> >> >> >> >> >> >> >> >> >> >> >> >> >> > >> >> >> >> >> > -kkm >> >> >> >> >> > >> >> >> >> >> > --------------------------------------------------------- >> - >> >> >> >> >> > -- >> >> - >> >> >> >> >> > -- >> >> >> - >> >> >> >> >> > -- >> >> >> >> - >> >> >> >> >> > -- >> >> >> >> >> - >> >> >> >> >> > -------- _______________________________________________ >> >> >> >> >> > Kaldi-users mailing list >> >> >> >> >> > Kal...@li... >> >> >> >> >> > https://lists.sourceforge.net/lists/listinfo/kaldi-users |
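A small sketch of the duplicate-n-gram check suggested above, run on the ARPA text itself (the file name lm.arpa is made up). It drops the log-probability column and flags any n-gram line that then repeats, and also lists n-grams with sentence-boundary symbols in orders that should not occur.

  # N-grams that appear more than once.  Entries differing only in the
  # back-off column are not caught by this crude first pass.
  # (Pipe through tr -d '\r' first if the file has Windows line endings.)
  awk 'NF > 1 && $1 ~ /^-?[0-9.]+$/ { $1 = ""; print }' lm.arpa \
    | sort | uniq -d | head

  # N-grams with impossible sentence-boundary sequences.
  grep -E '<s> <s>|</s> <s>|</s> </s>' lm.arpa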
From: Kirill K. <kir...@sm...> - 2015-06-16 06:03:59
|
Nope. The only thing I am thinking of doing is to bisect it somehow, to get a minimal grammar that still refuses to determinize. I tried different smoothing and played with other switches to ngram_count, but it still does loop. Are there any known problems with srilm-generated models? -kkm > -----Original Message----- > From: Daniel Povey [mailto:dp...@gm...] > Sent: 2015-06-15 2248 > To: Kirill Katsnelson > Cc: kal...@li... > Subject: Re: [Kaldi-users] fstdeterminizestar (L*G) never completes > > OOVs should be OK. > Make sure there are no n-grams with things like <s> <s> > > e.g. see the lines > grep -v '<s> <s>' | \ > grep -v '</s> <s>' | \ > grep -v '</s> </s>' | \ > > in the WSJ script: > > gunzip -c $lmdir/lm_${lm_suffix}.arpa.gz | \ > grep -v '<s> <s>' | \ > grep -v '</s> <s>' | \ > grep -v '</s> </s>' | \ > arpa2fst - | fstprint | \ > utils/remove_oovs.pl $tmpdir/oovs_${lm_suffix}.txt | \ > utils/eps2disambig.pl | utils/s2eps.pl | fstcompile -- > isymbols=$test/words.txt \ > --osymbols=$test/words.txt --keep_isymbols=false -- > keep_osymbols=false | \ > fstrmepsilon | fstarcsort --sort_type=ilabel > $test/G.fst > > Dan > > > On Tue, Jun 16, 2015 at 1:42 AM, Kirill Katsnelson > <kir...@sm...> wrote: > > Bingo. G.fst is not determinizable (the "good" G.fst takes under a > > second to determinize). And the bad one loops at the word "zero" like > > this > > > > #0 > > unsure unsure > > #0 > > of of > > #0 > > yours yours > > #0 > > is is > > #0 > > your your > > #0 > > zip zip > > #0 > > wrong wrong > > #0 > > with with > > #0 > > zero zero > > #0 > > zero zero > > .... > > > > I am taking the LM straight from ngram_counts to the standard > pipeline, nothing fancy. The only thing is it has a lot of OOVs: > > > > remove_oovs.pl: removed 4646 lines. > > > > Is this generally a problem? So does my "good" arpa LM. I grepped > both for the word zero, but could not spot anything outrageous. Can you > think of anything I can look for? > > > > My source is no longer than 10 days old. Here's the pipeline, just in > case. > > > > cat $src/$arpalm | tr -d '\r' | \ > > utils/find_arpa_oovs.pl $lang/words.txt > $lang/lm_oovs.txt > > > > cat $src/$arpalm | tr -d '\r' | \ > > arpa2fst - | fstprint | \ > > utils/remove_oovs.pl $lang/lm_oovs.txt | \ > > utils/eps2disambig.pl | utils/s2eps.pl | fstcompile -- > isymbols=$lang/words.txt \ > > --osymbols=$lang/words.txt --keep_isymbols=false -- > keep_osymbols=false | \ > > fstrmepsilon | fstarcsort --sort_type=ilabel > $lang/G.fst > > > > -kkm > > > > > >> -----Original Message----- > >> From: Daniel Povey [mailto:dp...@gm...] > >> Sent: 2015-06-15 2206 > >> To: Kirill Katsnelson > >> Cc: kal...@li... > >> Subject: Re: [Kaldi-users] fstdeterminizestar (L*G) never completes > >> > >> I don't recommend to look at the fstdeterminizestar algorithm > itself- > >> it's very complicated. Instead focus on the definition of > >> "determinizable" and the twins property, and figure out what path > you > >> are taking through L.fst and G.fst. Trying to fstdeterminizestar > >> G.fst directly, and seeing whether it terminates or not, may tell > you > >> something; if it fails, send the signal and see what happens. 
> >> fstdeterminizestar does care about the weights, but only to the > >> extent that they are the same or different from each other; and if > >> your G.fst is generated from arpa2fst the pipeline should work for > >> any ARPA-format language model- make sure you are using an up-to- > date > >> Kaldi though, there have been fixes as recently as a few months ago. > >> The presence of SIL is not surprising, it is the optional-silence > >> added by the lexicon. I think that script is adding #16 if it does > >> *not* take the optional silence, otherwise it adds the phone SIL. > >> Since you are calling your FST a "grammar" I'm wondering whether you > >> have done something fancy with mapping words to FSTs or something > >> like that, which is causing the result to not be determinizable. > >> > >> Dan > >> > >> > >> On Tue, Jun 16, 2015 at 12:55 AM, Kirill Katsnelson > >> <kir...@sm...> wrote: > >> > Thank you very much for your help Dan, but I am still stuck. > >> > > >> > First of all, a question: does the fstdeterminizestar algorithm > >> depend on actual backoff and n-gram probabilities, i.e. will it > >> behave differently if the numbers in arpa model file are different? > >> Or does it depend only on arc labels but not weights? I am looking > at > >> the code but certainly I am far from being able to understand it. I > >> cheated by looking at all if conditions in it, and this one in > >> EpsilonClosure is seemingly the only one dealing with weights: > >> > > >> > if (! ApproxEqual(weight, iter->second.weight, > delta_)) > >> > { > >> // add extra part of weight to queue. > >> > > >> > (In ProcessFinal it also has "if (this_final_weight != > >> > Weight::Zero())" but I do not believe it is relevant?) > >> > > >> > I am trying to understand how to dig into the problem--are weights > >> > in > >> the picture actually. > >> > > >> > Also, just for a test, I ran the grammar trough a "grep -v 'real > >> real'", and indeed got a similar loop on the word "very" which is > >> also often repeated. But the "real real" 2- and 3-grams are there in > >> the "good" grammar too. > >> > > >> > Another thing I do not understand is the presence of the SIL > ilabel > >> in the backtrace. Here's the beginning of the trace that leads to > the > >> infinite loop as decoded with a little script I wrote (format is > >> ilabel [ TAB olabel ]: > >> > > >> > #16 > >> > #0 > >> > V_B > >> > Y_I > >> > UW1_I > >> > Z_E views > >> > #2 > >> > SIL > >> > #0 > >> > AH0_B > >> > N_I > >> > SH_I unsure > >> > UH1_I > >> > R_E > >> > > >> > Note the presence of SIL at line 8. This is not in lexicon: > >> > > >> > $ grep SIL > >> data/lang_sa_generic_test/dict/lexiconp_silprob_disambig.txt > >> > !SIL 1 0.20 1.00 1.00 SIL_S > >> > $ > >> > > >> > Is this a hint? How did it get there at all? 
I am using a standard > >> script to build the L_disambig.fst: > >> > > >> > phone_disambig_symbol=$(awk '$1=="#0"{print $2}' $lang/phones.txt) > >> > word_disambig_symbol=$(awk '$1=="#0"{print $2}' $lang/words.txt) > >> > utils/make_lexicon_fst_silprob.pl > >> $lang/dict/lexiconp_silprob_disambig.txt \ > >> > data/local/dict/silprob.txt $silphone '#'$ndisambig > | \ > >> > fstcompile --isymbols=$lang/phones.txt -- > >> osymbols=$lang/words.txt \ > >> > --keep_isymbols=false --keep_osymbols=false | \ > >> > fstaddselfloops "echo $phone_disambig_symbol |" "echo > >> $word_disambig_symbol |" | \ > >> > fstarcsort --sort_type=olabel > $lang/L_disambig.fst || exit > >> > 1; > >> > > >> > I checked the lexicon, and there are indeed only real phones at > the > >> beginning of each word, no empty positions and no #N symbols. > >> > > >> > -kkm > >> > > >> >> -----Original Message----- > >> >> From: Daniel Povey [mailto:dp...@gm...] > >> >> Sent: 2015-06-15 1944 > >> >> To: Kirill Katsnelson > >> >> Cc: kal...@li... > >> >> Subject: Re: [Kaldi-users] fstdeterminizestar (L*G) never > >> >> completes > >> >> > >> >> I think the confusion is probably between two loops with "real" > on > >> >> them in G.fst: one loop where you always take the bigram > >> probability, > >> >> and one where you always take the unigram probability. Or maybe > a > >> >> similar confusion between a loop where you use the trigram "real > >> real > >> >> real" and the bigram "real real". Those loops are expected to > >> exist. > >> >> Probably the issue is that something happened at the start of the > >> >> sequence which caused the FST to be confused about which of those > >> two > >> >> states it was in. If you have any empty words (words with empty > >> >> pronunciation) in your lexicon this could possibly happen, as it > >> >> would be confused between taking a normal word, then the backoff > >> symbol, vs. > >> >> taking a normal word, then the empty word, then the backoff > symbol. > >> >> I think the current Kaldi graph-creation script check for empty > >> words > >> >> in the lexicon, for this reason. > >> >> > >> >> Dan > >> >> > >> >> > >> >> > >> >> > The sequence R_B ( ) IY1_I ( ) L_E (real) #1 ( ) #16 ( ) #0 ( ) > >> >> generally almost makes sense, given that #16 is the last one in > >> >> table, the silence disambiguation symbol. (Not sure why "real" is > >> >> emitted at L_E--I would rather expect it to be emitted at #1.) > >> >> What > >> I > >> >> do not understand is what exactly the debug trace represents, and > >> >> what should I make out if it. It is a path through the FST graph, > >> but > >> >> I do not understand what is this path exactly, and what does this > >> >> endless walk of this loop mean. > >> >> > > >> >> > -kkm > >> >> > > >> >> >> -----Original Message----- > >> >> >> From: Daniel Povey [mailto:dp...@gm...] > >> >> >> Sent: 2015-06-15 1858 > >> >> >> To: Kirill Katsnelson > >> >> >> Cc: kal...@li... > >> >> >> Subject: Re: [Kaldi-users] fstdeterminizestar (L*G) never > >> >> >> completes > >> >> >> > >> >> >> Look into the "backoff disambiguation symbol", normally called > >> #0. > >> >> >> The reason why it is needed should be explained in the > hbka.pdf > >> >> paper. > >> >> >> Dan > >> >> >> > >> >> >> > >> >> >> On Mon, Jun 15, 2015 at 9:54 PM, Kirill Katsnelson > >> >> >> <kir...@sm...> wrote: > >> >> >> > Thank you! The output consists of some sequences as you > >> >> >> > described, > >> >> >> quickly falling into a short ever repeated loop. 
> >> >> >> > > >> >> >> > The non-repeated section ends up with osymbols (excluding > >> >> epsilons) > >> >> >> "whatsoever on vacation up", and then the repeated part looks > >> like " > >> >> >> #1 ( ) #16 ( ) #0 ( ) R_B ( ) IY1_I ( ) L_E (real)". The word > >> "real" > >> >> >> is spelled "R_B IY1_I L_E #1" in L_disambig. > >> >> >> > > >> >> >> > Both LMs contain a bigram for "vacation up" and a trigram > >> >> "vacation > >> >> >> up there". "up real" is a bigram in both, with 3-grams "up > real > >> >> quick" > >> >> >> and "up real quickly". "up real" is also a tail of a few other > >> >> >> 3-grams, but these are also same in both models (up to their > >> >> weights). > >> >> >> > > >> >> >> > It looks I do not understand what should I make in the end > >> >> >> > out of > >> >> >> this > >> >> >> > debug data :( > >> >> >> > > >> >> >> > -kkm > >> >> >> > > >> >> >> >> -----Original Message----- > >> >> >> >> From: Daniel Povey [mailto:dp...@gm...] > >> >> >> >> Sent: 2015-06-15 1821 > >> >> >> >> To: Kirill Katsnelson > >> >> >> >> Cc: kal...@li... > >> >> >> >> Subject: Re: [Kaldi-users] fstdeterminizestar (L*G) never > >> >> >> >> completes > >> >> >> >> > >> >> >> >> > I have a small set of sentences with repeat counts, and > >> >> >> >> > generating an > >> >> >> >> LM out of it. One is generated by a horrible local tool I > >> >> >> >> have trouble tracing exactly how. For this one L*G > >> >> >> >> composition > >> takes > >> >> >> about > >> >> >> >> 20 seconds on my CPU. Another LM I just generated out of > the > >> >> >> >> same files with srilm 1.7.1 ngram-count. This one has been > >> >> >> >> sitting in mkgraphs.sh on L_disambig*G composition step for > >> >> >> >> about 30 > >> >> minutes, > >> >> >> >> and still churning. fstdeterminizestar --use-log=true is > >> >> >> >> running at > >> >> >> 100%. > >> >> >> >> L_disambig.fst is the same file in both cases. Looks like > >> >> >> >> the > >> G > >> >> >> >> making it not determinizable, although I have no idea how > it > >> >> >> >> came to > >> >> >> be. > >> >> >> >> > > >> >> >> >> > Anyone could share an advice on tracking down the > problem? > >> >> Thanks. > >> >> >> >> > >> >> >> >> You can send a signal to that program like kill -SIGUSR1 > >> >> >> >> process-id and it will print out some info about the symbol > >> >> >> >> sequences involved, I think it is like > >> >> >> >> isymbol1 (osymbol1) isymbol2 (osymbol2) and so on. > >> >> >> >> Usually there is a particular word sequence that is > >> problematic. > >> >> >> >> Dan > >> >> >> >> > >> >> >> >> > >> >> >> >> > >> >> >> >> > >> >> >> >> > > >> >> >> >> > -kkm > >> >> >> >> > > >> >> >> >> > --------------------------------------------------------- > - > >> >> >> >> > -- > >> - > >> >> >> >> > -- > >> >> - > >> >> >> >> > -- > >> >> >> - > >> >> >> >> > -- > >> >> >> >> - > >> >> >> >> > -------- _______________________________________________ > >> >> >> >> > Kaldi-users mailing list > >> >> >> >> > Kal...@li... > >> >> >> >> > https://lists.sourceforge.net/lists/listinfo/kaldi-users |
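One way to mechanize the bisection idea mentioned above, as a sketch only: the training sentence list is halved for as long as one half still reproduces the hang. make_G.sh stands in for the arpa2fst pipeline quoted earlier in the thread and does not exist as such; the ngram-count options and the timeout are likewise placeholders.

  cp sentences.txt bad.txt
  while [ "$(wc -l < bad.txt)" -gt 1 ]; do
    half=$(( ( $(wc -l < bad.txt) + 1 ) / 2 ))
    head -n "$half" bad.txt > top.txt
    tail -n +"$(( half + 1 ))" bad.txt > bottom.txt
    for part in top.txt bottom.txt; do
      ngram-count -order 3 -text "$part" -lm part.arpa   # illustrative options
      ./make_G.sh part.arpa lang                         # hypothetical wrapper around the arpa2fst pipeline
      if ! timeout 300 fstdeterminizestar --use-log=true lang/G.fst > /dev/null; then
        cp "$part" bad.txt      # this half still reproduces the problem
        continue 2
      fi
    done
    break    # neither half reproduces it alone; the interaction needs both
  done
  wc -l bad.txt    # minimal failing sentence set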
From: Daniel P. <dp...@gm...> - 2015-06-16 05:48:28
|
OOVs should be OK. Make sure there are no n-grams with things like <s> <s> e.g. see the lines grep -v '<s> <s>' | \ grep -v '</s> <s>' | \ grep -v '</s> </s>' | \ in the WSJ script: gunzip -c $lmdir/lm_${lm_suffix}.arpa.gz | \ grep -v '<s> <s>' | \ grep -v '</s> <s>' | \ grep -v '</s> </s>' | \ arpa2fst - | fstprint | \ utils/remove_oovs.pl $tmpdir/oovs_${lm_suffix}.txt | \ utils/eps2disambig.pl | utils/s2eps.pl | fstcompile --isymbols=$test/words.txt \ --osymbols=$test/words.txt --keep_isymbols=false --keep_osymbols=false | \ fstrmepsilon | fstarcsort --sort_type=ilabel > $test/G.fst Dan On Tue, Jun 16, 2015 at 1:42 AM, Kirill Katsnelson <kir...@sm...> wrote: > Bingo. G.fst is not determinizable (the "good" G.fst takes under a second to determinize). And the bad one loops at the word "zero" like this > > #0 > unsure unsure > #0 > of of > #0 > yours yours > #0 > is is > #0 > your your > #0 > zip zip > #0 > wrong wrong > #0 > with with > #0 > zero zero > #0 > zero zero > .... > > I am taking the LM straight from ngram_counts to the standard pipeline, nothing fancy. The only thing is it has a lot of OOVs: > > remove_oovs.pl: removed 4646 lines. > > Is this generally a problem? So does my "good" arpa LM. I grepped both for the word zero, but could not spot anything outrageous. Can you think of anything I can look for? > > My source is no longer than 10 days old. Here's the pipeline, just in case. > > cat $src/$arpalm | tr -d '\r' | \ > utils/find_arpa_oovs.pl $lang/words.txt > $lang/lm_oovs.txt > > cat $src/$arpalm | tr -d '\r' | \ > arpa2fst - | fstprint | \ > utils/remove_oovs.pl $lang/lm_oovs.txt | \ > utils/eps2disambig.pl | utils/s2eps.pl | fstcompile --isymbols=$lang/words.txt \ > --osymbols=$lang/words.txt --keep_isymbols=false --keep_osymbols=false | \ > fstrmepsilon | fstarcsort --sort_type=ilabel > $lang/G.fst > > -kkm > > >> -----Original Message----- >> From: Daniel Povey [mailto:dp...@gm...] >> Sent: 2015-06-15 2206 >> To: Kirill Katsnelson >> Cc: kal...@li... >> Subject: Re: [Kaldi-users] fstdeterminizestar (L*G) never completes >> >> I don't recommend to look at the fstdeterminizestar algorithm itself- >> it's very complicated. Instead focus on the definition of >> "determinizable" and the twins property, and figure out what path you >> are taking through L.fst and G.fst. Trying to fstdeterminizestar G.fst >> directly, and seeing whether it terminates or not, may tell you >> something; if it fails, send the signal and see what happens. >> fstdeterminizestar does care about the weights, but only to the extent >> that they are the same or different from each other; and if your G.fst >> is generated from arpa2fst the pipeline should work for any ARPA-format >> language model- make sure you are using an up-to-date Kaldi though, >> there have been fixes as recently as a few months ago. >> The presence of SIL is not surprising, it is the optional-silence added >> by the lexicon. I think that script is adding #16 if it does >> *not* take the optional silence, otherwise it adds the phone SIL. >> Since you are calling your FST a "grammar" I'm wondering whether you >> have done something fancy with mapping words to FSTs or something like >> that, which is causing the result to not be determinizable. >> >> Dan >> >> >> On Tue, Jun 16, 2015 at 12:55 AM, Kirill Katsnelson >> <kir...@sm...> wrote: >> > Thank you very much for your help Dan, but I am still stuck. 
>> > >> > First of all, a question: does the fstdeterminizestar algorithm >> depend on actual backoff and n-gram probabilities, i.e. will it behave >> differently if the numbers in arpa model file are different? Or does it >> depend only on arc labels but not weights? I am looking at the code but >> certainly I am far from being able to understand it. I cheated by >> looking at all if conditions in it, and this one in EpsilonClosure is >> seemingly the only one dealing with weights: >> > >> > if (! ApproxEqual(weight, iter->second.weight, delta_)) { >> // add extra part of weight to queue. >> > >> > (In ProcessFinal it also has "if (this_final_weight != >> > Weight::Zero())" but I do not believe it is relevant?) >> > >> > I am trying to understand how to dig into the problem--are weights in >> the picture actually. >> > >> > Also, just for a test, I ran the grammar trough a "grep -v 'real >> real'", and indeed got a similar loop on the word "very" which is also >> often repeated. But the "real real" 2- and 3-grams are there in the >> "good" grammar too. >> > >> > Another thing I do not understand is the presence of the SIL ilabel >> in the backtrace. Here's the beginning of the trace that leads to the >> infinite loop as decoded with a little script I wrote (format is ilabel >> [ TAB olabel ]: >> > >> > #16 >> > #0 >> > V_B >> > Y_I >> > UW1_I >> > Z_E views >> > #2 >> > SIL >> > #0 >> > AH0_B >> > N_I >> > SH_I unsure >> > UH1_I >> > R_E >> > >> > Note the presence of SIL at line 8. This is not in lexicon: >> > >> > $ grep SIL >> data/lang_sa_generic_test/dict/lexiconp_silprob_disambig.txt >> > !SIL 1 0.20 1.00 1.00 SIL_S >> > $ >> > >> > Is this a hint? How did it get there at all? I am using a standard >> script to build the L_disambig.fst: >> > >> > phone_disambig_symbol=$(awk '$1=="#0"{print $2}' $lang/phones.txt) >> > word_disambig_symbol=$(awk '$1=="#0"{print $2}' $lang/words.txt) >> > utils/make_lexicon_fst_silprob.pl >> $lang/dict/lexiconp_silprob_disambig.txt \ >> > data/local/dict/silprob.txt $silphone '#'$ndisambig | \ >> > fstcompile --isymbols=$lang/phones.txt -- >> osymbols=$lang/words.txt \ >> > --keep_isymbols=false --keep_osymbols=false | \ >> > fstaddselfloops "echo $phone_disambig_symbol |" "echo >> $word_disambig_symbol |" | \ >> > fstarcsort --sort_type=olabel > $lang/L_disambig.fst || exit 1; >> > >> > I checked the lexicon, and there are indeed only real phones at the >> beginning of each word, no empty positions and no #N symbols. >> > >> > -kkm >> > >> >> -----Original Message----- >> >> From: Daniel Povey [mailto:dp...@gm...] >> >> Sent: 2015-06-15 1944 >> >> To: Kirill Katsnelson >> >> Cc: kal...@li... >> >> Subject: Re: [Kaldi-users] fstdeterminizestar (L*G) never completes >> >> >> >> I think the confusion is probably between two loops with "real" on >> >> them in G.fst: one loop where you always take the bigram >> probability, >> >> and one where you always take the unigram probability. Or maybe a >> >> similar confusion between a loop where you use the trigram "real >> real >> >> real" and the bigram "real real". Those loops are expected to >> exist. >> >> Probably the issue is that something happened at the start of the >> >> sequence which caused the FST to be confused about which of those >> two >> >> states it was in. If you have any empty words (words with empty >> >> pronunciation) in your lexicon this could possibly happen, as it >> >> would be confused between taking a normal word, then the backoff >> symbol, vs. 
>> >> taking a normal word, then the empty word, then the backoff symbol. >> >> I think the current Kaldi graph-creation script check for empty >> words >> >> in the lexicon, for this reason. >> >> >> >> Dan >> >> >> >> >> >> >> >> > The sequence R_B ( ) IY1_I ( ) L_E (real) #1 ( ) #16 ( ) #0 ( ) >> >> generally almost makes sense, given that #16 is the last one in >> >> table, the silence disambiguation symbol. (Not sure why "real" is >> >> emitted at L_E--I would rather expect it to be emitted at #1.) What >> I >> >> do not understand is what exactly the debug trace represents, and >> >> what should I make out if it. It is a path through the FST graph, >> but >> >> I do not understand what is this path exactly, and what does this >> >> endless walk of this loop mean. >> >> > >> >> > -kkm >> >> > >> >> >> -----Original Message----- >> >> >> From: Daniel Povey [mailto:dp...@gm...] >> >> >> Sent: 2015-06-15 1858 >> >> >> To: Kirill Katsnelson >> >> >> Cc: kal...@li... >> >> >> Subject: Re: [Kaldi-users] fstdeterminizestar (L*G) never >> >> >> completes >> >> >> >> >> >> Look into the "backoff disambiguation symbol", normally called >> #0. >> >> >> The reason why it is needed should be explained in the hbka.pdf >> >> paper. >> >> >> Dan >> >> >> >> >> >> >> >> >> On Mon, Jun 15, 2015 at 9:54 PM, Kirill Katsnelson >> >> >> <kir...@sm...> wrote: >> >> >> > Thank you! The output consists of some sequences as you >> >> >> > described, >> >> >> quickly falling into a short ever repeated loop. >> >> >> > >> >> >> > The non-repeated section ends up with osymbols (excluding >> >> epsilons) >> >> >> "whatsoever on vacation up", and then the repeated part looks >> like " >> >> >> #1 ( ) #16 ( ) #0 ( ) R_B ( ) IY1_I ( ) L_E (real)". The word >> "real" >> >> >> is spelled "R_B IY1_I L_E #1" in L_disambig. >> >> >> > >> >> >> > Both LMs contain a bigram for "vacation up" and a trigram >> >> "vacation >> >> >> up there". "up real" is a bigram in both, with 3-grams "up real >> >> quick" >> >> >> and "up real quickly". "up real" is also a tail of a few other >> >> >> 3-grams, but these are also same in both models (up to their >> >> weights). >> >> >> > >> >> >> > It looks I do not understand what should I make in the end out >> >> >> > of >> >> >> this >> >> >> > debug data :( >> >> >> > >> >> >> > -kkm >> >> >> > >> >> >> >> -----Original Message----- >> >> >> >> From: Daniel Povey [mailto:dp...@gm...] >> >> >> >> Sent: 2015-06-15 1821 >> >> >> >> To: Kirill Katsnelson >> >> >> >> Cc: kal...@li... >> >> >> >> Subject: Re: [Kaldi-users] fstdeterminizestar (L*G) never >> >> >> >> completes >> >> >> >> >> >> >> >> > I have a small set of sentences with repeat counts, and >> >> >> >> > generating an >> >> >> >> LM out of it. One is generated by a horrible local tool I have >> >> >> >> trouble tracing exactly how. For this one L*G composition >> takes >> >> >> about >> >> >> >> 20 seconds on my CPU. Another LM I just generated out of the >> >> >> >> same files with srilm 1.7.1 ngram-count. This one has been >> >> >> >> sitting in mkgraphs.sh on L_disambig*G composition step for >> >> >> >> about 30 >> >> minutes, >> >> >> >> and still churning. fstdeterminizestar --use-log=true is >> >> >> >> running at >> >> >> 100%. >> >> >> >> L_disambig.fst is the same file in both cases. Looks like the >> G >> >> >> >> making it not determinizable, although I have no idea how it >> >> >> >> came to >> >> >> be. >> >> >> >> > >> >> >> >> > Anyone could share an advice on tracking down the problem? >> >> Thanks. 
>> >> >> >> >> >> >> >> You can send a signal to that program like kill -SIGUSR1 >> >> >> >> process-id and it will print out some info about the symbol >> >> >> >> sequences involved, I think it is like >> >> >> >> isymbol1 (osymbol1) isymbol2 (osymbol2) and so on. >> >> >> >> Usually there is a particular word sequence that is >> problematic. >> >> >> >> Dan >> >> >> >> >> >> >> >> >> >> >> >> >> >> >> >> >> >> >> >> > >> >> >> >> > -kkm >> >> >> >> > >> >> >> >> > ------------------------------------------------------------ >> - >> >> >> >> > -- >> >> - >> >> >> >> > -- >> >> >> - >> >> >> >> > -- >> >> >> >> - >> >> >> >> > -------- _______________________________________________ >> >> >> >> > Kaldi-users mailing list >> >> >> >> > Kal...@li... >> >> >> >> > https://lists.sourceforge.net/lists/listinfo/kaldi-users |
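For anyone following along, here is a minimal sketch of the debugging procedure described above: run fstdeterminizestar on G.fst alone, and if it spins, send it SIGUSR1 so it dumps the symbol sequence it is stuck on. The --use-log=true option and the signal are the ones mentioned in this thread; the graph path and the timeout are placeholders.

  fstdeterminizestar --use-log=true graph/G.fst /dev/null &
  pid=$!
  sleep 300    # give it far longer than a healthy G.fst should need
  if kill -0 "$pid" 2>/dev/null; then
    # still running: ask it to print the ilabel/olabel sequence it is cycling on
    kill -SIGUSR1 "$pid"
  fi

If this loops on G.fst by itself, the problem is already in the grammar, before the lexicon is ever composed in.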
From: Kirill K. <kir...@sm...> - 2015-06-16 05:42:30
|
Bingo. G.fst is not determinizable (the "good" G.fst takes under a second to determinize). The bad one loops at the word "zero", like this:

  #0 unsure unsure #0 of of #0 yours yours #0 is is #0 your your #0 zip zip #0 wrong wrong #0 with with #0 zero zero #0 zero zero ....

I am taking the LM straight from ngram-count into the standard pipeline, nothing fancy. The only thing is that it has a lot of OOVs ("remove_oovs.pl: removed 4646 lines")--is that generally a problem? My "good" ARPA LM has a lot of OOVs too. I grepped both for the word "zero" but could not spot anything outrageous. Can you think of anything I can look for? My Kaldi source is no more than 10 days old.

Here's the pipeline, just in case:

cat $src/$arpalm | tr -d '\r' | \
  utils/find_arpa_oovs.pl $lang/words.txt > $lang/lm_oovs.txt
cat $src/$arpalm | tr -d '\r' | \
  arpa2fst - | fstprint | \
  utils/remove_oovs.pl $lang/lm_oovs.txt | \
  utils/eps2disambig.pl | utils/s2eps.pl | \
  fstcompile --isymbols=$lang/words.txt --osymbols=$lang/words.txt \
    --keep_isymbols=false --keep_osymbols=false | \
  fstrmepsilon | fstarcsort --sort_type=ilabel > $lang/G.fst

-kkm
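One way to make the grep mentioned above more systematic (a sketch, not from the thread; the ARPA file names are placeholders for the "good" and "bad" models) is to pull out every n-gram line that mentions the looping word from both models and diff them:

  for lm in good.arpa bad.arpa; do
    grep -w -- 'zero' "$lm" | sort > "${lm%.arpa}.zero-ngrams.txt"
  done
  diff good.zero-ngrams.txt bad.zero-ngrams.txt | head -40

Any n-gram present in one model but not the other is a natural place to start looking.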
From: Daniel P. <dp...@gm...> - 2015-06-16 05:06:16
|
I don't recommend looking at the fstdeterminizestar algorithm itself--it's very complicated. Instead, focus on the definition of "determinizable" and the twins property, and figure out what path you are taking through L.fst and G.fst. Trying to fstdeterminizestar G.fst directly, and seeing whether it terminates or not, may tell you something; if it fails, send the signal and see what happens.

fstdeterminizestar does care about the weights, but only to the extent that they are the same as or different from each other; and if your G.fst is generated from arpa2fst, the pipeline should work for any ARPA-format language model--but make sure you are using an up-to-date Kaldi, as there have been fixes as recently as a few months ago.

The presence of SIL is not surprising: it is the optional silence added by the lexicon. I think that script adds #16 if it does *not* take the optional silence, and otherwise adds the phone SIL.

Since you are calling your FST a "grammar", I'm wondering whether you have done something fancy with mapping words to FSTs or something like that, which is causing the result to not be determinizable.

Dan
From: Kirill K. <kir...@sm...> - 2015-06-16 04:55:25
|
Thank you very much for your help Dan, but I am still stuck.

First of all, a question: does the fstdeterminizestar algorithm depend on the actual backoff and n-gram probabilities, i.e. will it behave differently if the numbers in the ARPA model file are different? Or does it depend only on arc labels, not weights? I am looking at the code, but I am certainly far from being able to understand it. I cheated by looking at all the if conditions in it, and this one in EpsilonClosure is seemingly the only one dealing with weights:

  if (! ApproxEqual(weight, iter->second.weight, delta_)) {  // add extra part of weight to queue.

(In ProcessFinal it also has "if (this_final_weight != Weight::Zero())", but I do not believe that is relevant?) I am trying to understand how to dig into the problem--whether weights are actually in the picture.

Also, just for a test, I ran the grammar through a "grep -v 'real real'", and indeed got a similar loop on the word "very", which is also often repeated. But the "real real" 2- and 3-grams are there in the "good" grammar too.

Another thing I do not understand is the presence of the SIL ilabel in the backtrace. Here's the beginning of the trace that leads to the infinite loop, as decoded with a little script I wrote (format is ilabel [ TAB olabel ]):

#16
#0
V_B
Y_I
UW1_I
Z_E	views
#2
SIL
#0
AH0_B
N_I
SH_I	unsure
UH1_I
R_E

Note the presence of SIL at line 8. This is not in the lexicon:

$ grep SIL data/lang_sa_generic_test/dict/lexiconp_silprob_disambig.txt
!SIL 1 0.20 1.00 1.00 SIL_S
$

Is this a hint? How did it get there at all? I am using a standard script to build the L_disambig.fst:

phone_disambig_symbol=$(awk '$1=="#0"{print $2}' $lang/phones.txt)
word_disambig_symbol=$(awk '$1=="#0"{print $2}' $lang/words.txt)
utils/make_lexicon_fst_silprob.pl $lang/dict/lexiconp_silprob_disambig.txt \
  data/local/dict/silprob.txt $silphone '#'$ndisambig | \
  fstcompile --isymbols=$lang/phones.txt --osymbols=$lang/words.txt \
    --keep_isymbols=false --keep_osymbols=false | \
  fstaddselfloops "echo $phone_disambig_symbol |" "echo $word_disambig_symbol |" | \
  fstarcsort --sort_type=olabel > $lang/L_disambig.fst || exit 1;

I checked the lexicon, and there are indeed only real phones at the beginning of each word, no empty positions and no #N symbols.

-kkm

> -----Original Message----- > From: Daniel Povey [mailto:dp...@gm...] > Sent: 2015-06-15 1944 > To: Kirill Katsnelson > Cc: kal...@li... > Subject: Re: [Kaldi-users] fstdeterminizestar (L*G) never completes > > I think the confusion is probably between two loops with "real" on them > in G.fst: one loop where you always take the bigram probability, and > one where you always take the unigram probability. Or maybe a similar > confusion between a loop where you use the trigram "real real real" and > the bigram "real real". Those loops are expected to exist. > Probably the issue is that something happened at the start of the > sequence which caused the FST to be confused about which of those two > states it was in. If you have any empty words (words with empty > pronunciation) in your lexicon this could possibly happen, as it would > be confused between taking a normal word, then the backoff symbol, vs. > taking a normal word, then the empty word, then the backoff symbol. > I think the current Kaldi graph-creation script check for empty words > in the lexicon, for this reason. > > Dan > > > > > The sequence R_B ( ) IY1_I ( ) L_E (real) #1 ( ) #16 ( ) #0 ( ) > generally almost makes sense, given that #16 is the last one in table, > the silence disambiguation symbol. 
(Not sure why "real" is emitted at > L_E--I would rather expect it to be emitted at #1.) What I do not > understand is what exactly the debug trace represents, and what should > I make out if it. It is a path through the FST graph, but I do not > understand what is this path exactly, and what does this endless walk > of this loop mean. > > > > -kkm > > > >> -----Original Message----- > >> From: Daniel Povey [mailto:dp...@gm...] > >> Sent: 2015-06-15 1858 > >> To: Kirill Katsnelson > >> Cc: kal...@li... > >> Subject: Re: [Kaldi-users] fstdeterminizestar (L*G) never completes > >> > >> Look into the "backoff disambiguation symbol", normally called #0. > >> The reason why it is needed should be explained in the hbka.pdf > paper. > >> Dan > >> > >> > >> On Mon, Jun 15, 2015 at 9:54 PM, Kirill Katsnelson > >> <kir...@sm...> wrote: > >> > Thank you! The output consists of some sequences as you described, > >> quickly falling into a short ever repeated loop. > >> > > >> > The non-repeated section ends up with osymbols (excluding > epsilons) > >> "whatsoever on vacation up", and then the repeated part looks like " > >> #1 ( ) #16 ( ) #0 ( ) R_B ( ) IY1_I ( ) L_E (real)". The word "real" > >> is spelled "R_B IY1_I L_E #1" in L_disambig. > >> > > >> > Both LMs contain a bigram for "vacation up" and a trigram > "vacation > >> up there". "up real" is a bigram in both, with 3-grams "up real > quick" > >> and "up real quickly". "up real" is also a tail of a few other > >> 3-grams, but these are also same in both models (up to their > weights). > >> > > >> > It looks I do not understand what should I make in the end out of > >> this > >> > debug data :( > >> > > >> > -kkm > >> > > >> >> -----Original Message----- > >> >> From: Daniel Povey [mailto:dp...@gm...] > >> >> Sent: 2015-06-15 1821 > >> >> To: Kirill Katsnelson > >> >> Cc: kal...@li... > >> >> Subject: Re: [Kaldi-users] fstdeterminizestar (L*G) never > >> >> completes > >> >> > >> >> > I have a small set of sentences with repeat counts, and > >> >> > generating an > >> >> LM out of it. One is generated by a horrible local tool I have > >> >> trouble tracing exactly how. For this one L*G composition takes > >> about > >> >> 20 seconds on my CPU. Another LM I just generated out of the same > >> >> files with srilm 1.7.1 ngram-count. This one has been sitting in > >> >> mkgraphs.sh on L_disambig*G composition step for about 30 > minutes, > >> >> and still churning. fstdeterminizestar --use-log=true is running > >> >> at > >> 100%. > >> >> L_disambig.fst is the same file in both cases. Looks like the G > >> >> making it not determinizable, although I have no idea how it came > >> >> to > >> be. > >> >> > > >> >> > Anyone could share an advice on tracking down the problem? > Thanks. > >> >> > >> >> You can send a signal to that program like kill -SIGUSR1 > >> >> process-id and it will print out some info about the symbol > >> >> sequences involved, I think it is like > >> >> isymbol1 (osymbol1) isymbol2 (osymbol2) and so on. > >> >> Usually there is a particular word sequence that is problematic. > >> >> Dan > >> >> > >> >> > >> >> > >> >> > >> >> > > >> >> > -kkm > >> >> > > >> >> > --------------------------------------------------------------- > - > >> >> > -- > >> - > >> >> > -- > >> >> - > >> >> > -------- _______________________________________________ > >> >> > Kaldi-users mailing list > >> >> > Kal...@li... > >> >> > https://lists.sourceforge.net/lists/listinfo/kaldi-users |