Re: [Kaldi-users] LM grafting


We currently use the script below to create an ARPA LM. It is based on a
class-based (CB) LM mixed with LMs estimated on out-of-domain and in-domain
data, which are not class based. Given the ARPA file, we convert it to G.fst
as in the standard use case.
https://github.com/UFAL-DSG/alex/blob/master/alex/applications/PublicTransportInfoCS/lm/build.py

Note that this CB model estimation has several drawbacks.
The biggest one is that we do not compute bigram (or higher-order n-gram)
estimates if there are two classes in one bigram, e.g. "I want connection
CITY CITY".
I am working on improving this as a side project.

Another important problem is that we need to expand the LM with the instances
of the classes, which significantly increases the size of the lexicon and
also the number of higher-order n-grams in the LM.
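
To make the size concern concrete, here is a minimal sketch of expanding one
class-based n-gram into word n-grams. It is not the logic of build.py: the
class table, the uniform split of probability among class members, and the
function name expand_ngram are all assumptions made for illustration, and
backoff weights are ignored.

```python
import itertools
import math

# Hypothetical class table; the real member lists are in the attached file.
CLASSES = {
    "CITY": ["praha", "brno", "ostrava"],
}


def expand_ngram(ngram, logprob, classes):
    """Expand one class-based n-gram into word n-grams.

    Every class token is replaced by each member of its class. Only the
    predicted (last) token changes the probability: if it is a class, the
    mass is split uniformly among its members (an assumption), so
    log10(1/|class|) is added. Class tokens in the history only multiply
    the number of entries.
    """
    choices = [classes.get(token, [token]) for token in ngram]
    last = ngram[-1]
    penalty = -math.log10(len(classes[last])) if last in classes else 0.0
    return [(words, logprob + penalty) for words in itertools.product(*choices)]


expanded = expand_ngram(("connection", "CITY", "CITY"), -1.0, CLASSES)
print(len(expanded))  # 3 members in two slots give 3 * 3 = 9 word n-grams
```

With realistic class sizes (hundreds of city names) a single trigram with two
class slots already expands to tens of thousands of entries, which is the
growth described above.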

I was not sure whether you want to do it this way in Kaldi or whether you
want to do it at the FST level.

PS: I attached the Czech class file
 classes.txt.zip
<https://drive.google.com/file/d/0B_cd-iN3UhaVOFpIWGlic0F5cUU/edit?usp=drive_web>

On Wed, Jun 17, 2015 at 10:11 PM, Sandeep Reddy <san...@go...>
wrote:

> Does the Kaldi recipe do class LMs? Or can you add it to the recipe? That
> would make the whole process so much easier. I don't mind if the words are
> Czech.
>
> On Wed, Jun 17, 2015 at 4:08 PM, Ondrej Platek <ond...@gm...>
> wrote:
>
>> For the Czech data we are running the system live with Kaldi and we use
>> a class LM.
>> For the English data I will give you a few examples off the top of my head:
>>
>> PRICE_RANGE - cheap, middle price-range...
>> FOOD_TYPE - Indian, Chinese,
>> LOCATION - city center, Chesterton area, ..
>> ....
>>
>> We will try to find the class definitions, since we are not running that
>> system.
>>
>> Ondrej
>>
>> On Wed, Jun 17, 2015 at 10:01 PM, Sandeep Reddy <
>> san...@go...> wrote:
>>
>>> Ondrej,
>>>    I'll run the Vystadial recipe and see what opportunities are there.
>>> Did somebody already make a class LM on it or at least define what
>>> potential classes are? I hadn't looked into it earlier.
>>> Thanks
>>> Nagendra
>>>
>>> On Wed, Jun 17, 2015 at 3:42 AM, Ondrej Platek <ond...@gm...>
>>> wrote:
>>>
>>>> Dear all,
>>>>
>>>> thanks to a reminder from Dimitris, I realized that the Vystadial dataset
>>>> is very convenient for class-based LMs / LM grafting.
>>>> As the scripts for Vystadial Cs & En are already in Kaldi, it may be
>>>> convenient starting data, because
>>>> it contains transcriptions of user utterances from communication with
>>>> a spoken dialogue system for which we have the classes defined.
>>>>
>>>> See scripts:
>>>> https://github.com/kaldi-asr/kaldi/tree/master/egs/vystadial_en
>>>> https://github.com/kaldi-asr/kaldi/tree/master/egs/vystadial_cz
>>>>
>>>> See data (scroll to the bottom to download the datasets):
>>>> http://hdl.handle.net/11858/00-097C-0000-0023-4671-4  (en)
>>>> http://hdl.handle.net/11858/00-097C-0000-0023-4670-6 (cs)
>>>>
>>>>
>>>> We can probably recreate / find the list of words in the classes for
>>>> English if there is interest.
>>>> For Czech this should be no problem at all.
>>>>
>>>> Please, let me know if you are interested in these datasets and the
>>>> lists of classes and their members.
>>>>
>>>> Ondra
>>>>
>>>> PS: Currently, we use a class-based (CB) LM, which we later expand to a
>>>> full LM in ARPA format and then create G.fst as in the standard use case.
>>>> It is not an optimal approach, but it works for us.
>>>> If you want to know how we model the CB LM, just let me know. I am
>>>> working on a slight improvement of it right now,
>>>> so I am interested in improving it.
>>>>
>>>>
>>>> On Tue, May 26, 2015 at 8:11 PM, Kirill Katsnelson <
>>>> kir...@sm...> wrote:
>>>>
>>>>> Speaking about data set preprocessing only, will the Stanford NLP POS
>>>>> tagger do the trick?
>>>>>
>>>>>  -kkm
>>>>>
>>>>> > -----Original Message-----
>>>>> > From: Nagendra Goel [mailto:nag...@go...]
>>>>> > Sent: 2015-05-24 1511
>>>>> > To: Matthew Aylett
>>>>> > Cc: Dimitris Vassos; kal...@li...
>>>>> > Subject: Re: [Kaldi-users] LM grafting
>>>>> >
>>>>> > A systematic way of identifying special elements in text would be very
>>>>> > useful. Currently NSW-EXPAND from Festival conflicts with this
>>>>> > sub-grammar approach, although otherwise it's a good LM pre-processing
>>>>> > step.
>>>>> >
>>>>> > Nagendra Kumar Goel
>>>>> >
>>>>> > On May 24, 2015 4:45 PM, "Matthew Aylett" <mat...@gm...>
>>>>> > wrote:
>>>>> >
>>>>> >
>>>>> >       Not sure if this is relevant to this thread. But in the speech
>>>>> > synthesis system branch we have a very early text normaliser which
>>>>> > (when complete) will detect things like phone numbers, addresses,
>>>>> > currencies etc. The output from this could then be used to inform
>>>>> > language model building. Currently it deals with symbols and
>>>>> > tokenisation in English.
>>>>> >
>>>>> >       Potentially (although I wasn't currently planning on this), the
>>>>> > text normaliser could be written in Thrax (based on OpenFst, authored
>>>>> > by Richard Sproat I believe). However, if this approach would benefit
>>>>> > ASR as well, then it might be worth doing it this way rather than my
>>>>> > plan of a simple greedy normaliser.
>>>>> >
>>>>> >
>>>>> >       v best
>>>>> >
>>>>> >       Matthew Aylett
>>>>> >
>>>>> >
>>>>> >       On Sun, May 24, 2015 at 8:34 AM, Dimitris Vassos
>>>>> > <dva...@gm...> wrote:
>>>>> >
>>>>> >
>>>>> >               We have access to several corpora and we are trying to
>>>>> > put together something appropriate.
>>>>> >
>>>>> >               In the next couple of days, we will also volunteer a
>>>>> > server to set it all up and run the tests.
>>>>> >
>>>>> >               Dimitris
>>>>> >
>>>>> >               > On 24 May 2015, at 02:06, Daniel Povey <dp...@gm...> wrote:
>>>>> >               >
>>>>> >               > One possibility is to use a completely open-source setup,
>>>>> >               > e.g. Voxforge, and forget about the "has a clear advantage"
>>>>> >               > requirement.
>>>>> >               > E.g. target anything that looks like a year, and make a
>>>>> >               > grammar for years.
>>>>> >               > Dan
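
As an illustration of the suggestion to target anything that looks like a
year, here is a minimal sketch of the text preprocessing side: year-like
tokens in LM training text are replaced by a class token, and the matched
years are collected so that a small year grammar could be built from them.
The <YEAR> token, the 1000-2099 digit pattern, and the function name
tag_years are assumptions; transcripts that spell years out in words would
need a different matcher.

```python
import re

# Four-digit tokens from 1000 to 2099 are treated as years (an assumption).
YEAR_RE = re.compile(r"\b(1[0-9]{3}|20[0-9]{2})\b")


def tag_years(line, class_token="<YEAR>"):
    """Replace year-like tokens with a class token.

    Returns the rewritten line and the list of years that were replaced,
    in order of appearance.
    """
    years = YEAR_RE.findall(line)
    return YEAR_RE.sub(class_token, line), years


tagged, found = tag_years("born in 1984 and moved in 2015")
print(tagged)
print(found)
```

The rewritten text would be used to train the top-level LM, and the collected
years (or a hand-written grammar) would define the sub-model behind <YEAR>.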
>>>>> >               >
>>>>> >               >
>>>>> >               > On Fri, May 22, 2015 at 6:32 AM, Nagendra Goel
>>>>> >               > <nag...@go...> wrote:
>>>>> >               >> Since I cannot volunteer my environment, do you
>>>>> >               >> recommend another environment where this can be prototyped
>>>>> >               >> and where you can check in some class LM recipe that has
>>>>> >               >> an advantage?
>>>>> >               >>
>>>>> >               >> Nagendra
>>>>> >               >>
>>>>> >               >> Nagendra Kumar Goel
>>>>> >               >>
>>>>> >               >>> On May 21, 2015 11:01 PM, "Dimitris Vassos" <dva...@gm...> wrote:
>>>>> >               >>>
>>>>> >               >>> +1 for the class-based LMs. I have also been interested in this
>>>>> >               >>> functionality for some time now, so will be more than happy to
>>>>> >               >>> try out the current implementation, if possible.
>>>>> >               >>>
>>>>> >               >>> Thanks
>>>>> >               >>> Dimitris
>>>>> >               >>>
>>>>> >               >>>> On 22 May 2015, at 01:34, kal...@li... wrote:
>>>>> >               >>>>
>>>>> >               >>>> Today's Topics:
>>>>> >               >>>>
>>>>> >               >>>>  1. Re: LM grafting (Daniel Povey)
>>>>> >               >>>>  2. Re: LM grafting (Kirill Katsnelson)
>>>>> >               >>>>  3. Re: LM grafting (Hainan Xu)
>>>>> >               >>>>  4. Re: LM grafting (Sean True)
>>>>> >               >>>>
>>>>> >               >>>>
>>>>> >               >>>>
>>>>> >
>>>>> ----------------------------------------------------------------------
>>>>> >               >>>>
>>>>> >               >>>> Message: 1
>>>>> >               >>>> Date: Thu, 21 May 2015 15:04:04 -0400
>>>>> >               >>>> From: Daniel Povey <dp...@gm...>
>>>>> >               >>>> Subject: Re: [Kaldi-users] LM grafting
>>>>> >               >>>> To: Sean True <se...@se...>
>>>>> >               >>>> Cc: Hainan Xu <hai...@gm...>,
>>>>> >               >>>>   "kal...@li..." <kal...@li...>,
>>>>> >               >>>>   Kirill Katsnelson <kir...@sm...>
>>>>> >               >>>> Message-ID: <CAE...@ma...>
>>>>> >               >>>> Content-Type: text/plain; charset=UTF-8
>>>>> >               >>>>
>>>>> >               >>>> The general approach is to create an FST for the little language
>>>>> >               >>>> model, and then to use fstreplace to replace instances of a
>>>>> >               >>>> particular symbol in the top-level language model with that FST.
>>>>> >               >>>> The tricky part is ensuring that the result is determinizable after
>>>>> >               >>>> composing with the lexicon.  In general our solution is to add
>>>>> >               >>>> special disambiguation symbols at the beginning and end of each of
>>>>> >               >>>> the sub-FSTs, and of course making sure that the sub-FSTs are
>>>>> >               >>>> themselves determinizable.
>>>>> >               >>>> Dan
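
The replacement step described in this message can be sketched on a toy
acceptor in plain Python. This is not how fstreplace is implemented and not
the Kaldi scripts mentioned in the thread; the Arc/Fst tuples, the marker
names #GRAFT_BEGIN and #GRAFT_END, and the functions graft and accepts are
hypothetical. It only shows where the begin/end disambiguation symbols end up
when a class arc is replaced by a sub-FST.

```python
from collections import namedtuple

# Toy acceptor: states are integers, each arc has a single label.
Arc = namedtuple("Arc", "src dst label weight")
Fst = namedtuple("Fst", "start finals arcs num_states")


def graft(top, sub, class_label, begin_sym="#GRAFT_BEGIN", end_sym="#GRAFT_END"):
    """Replace every arc labelled class_label in `top` with a copy of `sub`.

    Each copy is entered through an arc labelled begin_sym and left through
    an arc labelled end_sym. Final weights of `sub` are assumed to be zero.
    """
    arcs = []
    next_state = top.num_states
    for arc in top.arcs:
        if arc.label != class_label:
            arcs.append(arc)
            continue
        offset = next_state  # states of this copy are shifted by offset
        next_state += sub.num_states
        # The entry arc keeps the weight of the class arc it replaces.
        arcs.append(Arc(arc.src, sub.start + offset, begin_sym, arc.weight))
        for s in sub.arcs:
            arcs.append(Arc(s.src + offset, s.dst + offset, s.label, s.weight))
        # Every final state of the copy returns to the class arc's destination.
        for final in sub.finals:
            arcs.append(Arc(final + offset, arc.dst, end_sym, 0.0))
    return Fst(top.start, top.finals, arcs, next_state)


def accepts(fst, symbols):
    """Return True if some path from the start to a final state spells symbols."""
    frontier = {fst.start}
    for sym in symbols:
        frontier = {a.dst for a in fst.arcs if a.src in frontier and a.label == sym}
    return bool(frontier & set(fst.finals))


# Top-level model: "my number is DIGITS"; sub-model: one or more digits.
top = Fst(0, (4,), [Arc(0, 1, "my", 0.0), Arc(1, 2, "number", 0.0),
                    Arc(2, 3, "is", 0.0), Arc(3, 4, "DIGITS", 0.5)], 5)
digits = Fst(0, (1,), [Arc(0, 1, "one", 0.7), Arc(0, 1, "two", 0.7),
                       Arc(1, 1, "one", 0.7), Arc(1, 1, "two", 0.7)], 2)
grafted = graft(top, digits, "DIGITS")
```

In this sketch the markers make the entry into and exit from the sub-model
explicit on the label sequence; in a real setup they would have to be handled
as disambiguation symbols when composing with the lexicon.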
>>>>> >               >>>>
>>>>> >               >>>>
>>>>> >               >>>>> On Thu, May 21, 2015 at 3:01 PM, Sean True <se...@se...> wrote:
>>>>> >               >>>>> That's a subject of some general interest. Is there a discussion
>>>>> >               >>>>> of the general approach that was taken somewhere?
>>>>> >               >>>>>
>>>>> >               >>>>> -- Sean
>>>>> >               >>>>>
>>>>> >               >>>>> Sean True
>>>>> >               >>>>> Semantic Machines
>>>>> >               >>>>>
>>>>> >               >>>>>> On Thu, May 21, 2015 at 2:14 PM, Daniel Povey <dp...@gm...> wrote:
>>>>> >               >>>>>>
>>>>> >               >>>>>> Nagendra Goel has worked on some example scripts for this type of
>>>>> >               >>>>>> thing, and with Hainan we were working on trying to get it cleaned
>>>>> >               >>>>>> up and checked in, but he's going for an internship so it will
>>>>> >               >>>>>> have to wait.  But Nagendra might be willing to share it with you.
>>>>> >               >>>>>> Dan
>>>>> >               >>>>>>
>>>>> >               >>>>>>
>>>>> >               >>>>>> On Thu, May 21, 2015 at 2:10 PM, Kirill Katsnelson
>>>>> >               >>>>>> <kir...@sm...> wrote:
>>>>> >               >>>>>>> Suppose I have a language model where one token (a "word") is a
>>>>> >               >>>>>>> pointer to a whole other LM. This is a practical case when you
>>>>> >               >>>>>>> expect an abrupt change in model, a clear example being "my phone
>>>>> >               >>>>>>> number is..." and then you'd expect them rattling off a string of
>>>>> >               >>>>>>> digits. Is there any support in Kaldi for this?
>>>>> >               >>>>>>>
>>>>> >               >>>>>>> Thanks,
>>>>> >               >>>>>>>
>>>>> >               >>>>>>> -kkm
>>>>> >               >>>>>>>
>>>>> >               >>>>>>>
>>>>> >               >>>>>>>
>>>>> >
>>>>> -----------------------------------------------------------------------
>>>>> > -
>>>>> > ------
>>>>> >               >>>>>>> One dashboard for servers and applications
>>>>> across
>>>>> >               >>>>>>> Physical-Virtual-Cloud
>>>>> >               >>>>>>> Widest out-of-the-box monitoring support with
>>>>> > 50+ applications
>>>>> >               >>>>>>> Performance metrics, stats and reports that
>>>>> give
>>>>> > you Actionable
>>>>> >               >>>>>>> Insights
>>>>> >               >>>>>>> Deep dive visibility with transaction tracing
>>>>> using
>>>>> > APM Insight.
>>>>> >               >>>>>>>
>>>>> > http://ad.doubleclick.net/ddm/clk/290420510;117567292;y
>>>>> >               >>>>>>> _______________________________________________
>>>>> >               >>>>>>> Kaldi-users mailing list
>>>>> >               >>>>>>> Kal...@li...
>>>>> >               >>>>>>>
>>>>> > https://lists.sourceforge.net/lists/listinfo/kaldi-users
>>>>> >               >>>>
>>>>> >               >>>>
>>>>> >               >>>>
>>>>> >               >>>> ------------------------------
>>>>> >               >>>>
>>>>> >               >>>> Message: 2
>>>>> >               >>>> Date: Thu, 21 May 2015 19:24:38 +0000
>>>>> >               >>>> From: Kirill Katsnelson
>>>>> > <kir...@sm...>
>>>>> >               >>>> Subject: Re: [Kaldi-users] LM grafting
>>>>> >               >>>> To: "dp...@gm..." <dp...@gm...>, Sean True <se...@se...>
>>>>> >               >>>> Cc: Hainan Xu <hai...@gm...>,
>>>>> >               >>>>   "kal...@li..." <kal...@li...>
>>>>> >               >>>> Message-ID: <CY1...@CY...look.com>
>>>>> >               >>>>
>>>>> >               >>>> Content-Type: text/plain; charset="utf-8"
>>>>> >               >>>>
>>>>> >               >>>> Also, from the practical standpoint, backoff/discounting weights
>>>>> >               >>>> usually need to be massaged. Otherwise, when the grafted LM is
>>>>> >               >>>> small and the main LM is large, the little model will tend to
>>>>> >               >>>> shoehorn an utterance into itself rather than let go of it. In my
>>>>> >               >>>> phone number example, everything becomes digits once the phone
>>>>> >               >>>> number starts.
>>>>> >               >>>>
>>>>> >               >>>> -kkm
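
The imbalance described in this message can be shown with a little
arithmetic. The thread does not say how the weights are adjusted in practice;
the sketch below assumes two uniform models with made-up vocabulary sizes and
one hypothetical adjustment, a fixed per-word penalty inside the grafted
model. The names word_cost, DIGIT_VOCAB and MAIN_VOCAB are illustrative only.

```python
import math

# Assumed toy sizes: a 10-word digit model and a 50,000-word main model,
# both uniform. Real models are not uniform; only the ratio matters here.
DIGIT_VOCAB = 10
MAIN_VOCAB = 50000


def word_cost(vocab_size, penalty=0.0):
    """Negative log probability of one word under a uniform model, plus a penalty."""
    return -math.log(1.0 / vocab_size) + penalty


digit_cost = word_cost(DIGIT_VOCAB)
main_cost = word_cost(MAIN_VOCAB)

# LM advantage, per word, of staying inside the digit model.
gap = main_cost - digit_cost

# Hypothetical adjustment: charge half of the gap for every word that is
# emitted from the grafted model, so that leaving it becomes less costly
# in relative terms.
balanced = word_cost(DIGIT_VOCAB, penalty=0.5 * gap)

print(round(digit_cost, 2), round(main_cost, 2), round(balanced, 2))
```

Without the penalty, every word kept inside the small model is cheaper by
the full gap, which is why the decoder tends to keep producing digits; how
large the correction should be would have to be tuned on held-out data.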
>>>>> >               >>>>
>>>>> >               >>>>
>>>>> >               >>>> ------------------------------
>>>>> >               >>>>
>>>>> >               >>>> Message: 3
>>>>> >               >>>> Date: Thu, 21 May 2015 15:29:54 -0400
>>>>> >               >>>> From: Hainan Xu <hai...@gm...>
>>>>> >               >>>> Subject: Re: [Kaldi-users] LM grafting
>>>>> >               >>>> To: Daniel Povey <dp...@gm...>
>>>>> >               >>>> Cc: Sean True <se...@se...>,
>>>>> >               >>>>   "kal...@li..." <kal...@li...>,
>>>>> >               >>>>   Kirill Katsnelson <kir...@sm...>
>>>>> >               >>>> Message-ID: <CAL...@ma...>
>>>>> >               >>>> Content-Type: text/plain; charset="utf-8"
>>>>> >               >>>>
>>>>> >               >>>> There is a paper in ICASSP 2015 that described a very similar idea:
>>>>> >               >>>>
>>>>> >               >>>> Improved recognition of contact names in voice commands
>>>>> >               >>>>
>>>>> >               >>>> --
>>>>> >               >>>> - Hainan
>>>>> >               >>>>
>>>>> >               >>>> ------------------------------
>>>>> >               >>>>
>>>>> >               >>>> Message: 4
>>>>> >               >>>> Date: Thu, 21 May 2015 15:01:51 -0400
>>>>> >               >>>> From: Sean True <se...@se...>
>>>>> >               >>>> Subject: Re: [Kaldi-users] LM grafting
>>>>> >               >>>> To: Daniel Povey <dp...@gm...>
>>>>> >               >>>> Cc: Hainan Xu <hai...@gm...>,
>>>>> >               >>>>   "kal...@li..." <kal...@li...>,
>>>>> >               >>>>   Kirill Katsnelson <kir...@sm...>
>>>>> >               >>>> Message-ID: <CAL...@ma...>
>>>>> >               >>>> Content-Type: text/plain; charset="utf-8"
>>>>> >               >>>>
>>>>> >               >>>> That's a subject of some general interest. Is there a discussion
>>>>> >               >>>> of the general approach that was taken somewhere?
>>>>> >               >>>>
>>>>> >               >>>> -- Sean
>>>>> >               >>>>
>>>>> >               >>>> Sean True
>>>>> >               >>>> Semantic Machines
>>>>> >               >>>>
>>>>> >               >>>>
>>>>> >               >>>> ------------------------------
>>>>> >               >>>>
>>>>> >               >>>>
>>>>> >               >>>>
>>>>> >
>>>>> -----------------------------------------------------------------------
>>>>> > -
>>>>> > ------
>>>>> >               >>>> One dashboard for servers and applications across
>>>>> > Physical-Virtual-Cloud
>>>>> >               >>>> Widest out-of-the-box monitoring support with 50+
>>>>> > applications
>>>>> >               >>>> Performance metrics, stats and reports that give
>>>>> you
>>>>> > Actionable Insights
>>>>> >               >>>> Deep dive visibility with transaction tracing
>>>>> using
>>>>> > APM Insight.
>>>>> >               >>>>
>>>>> > http://ad.doubleclick.net/ddm/clk/290420510;117567292;y
>>>>> >               >>>>
>>>>> >               >>>> ------------------------------
>>>>> >               >>>>
>>>>> >               >>>> _______________________________________________
>>>>> >               >>>> Kaldi-users mailing list
>>>>> >               >>>> Kal...@li...
>>>>> >               >>>>
>>>>> > https://lists.sourceforge.net/lists/listinfo/kaldi-users
>>>>> >               >>>>
>>>>> >               >>>>
>>>>> >               >>>> End of Kaldi-users Digest, Vol 29, Issue 15
>>>>> >               >>>> *******************************************
>>>>
>>>>
>>>>
>>>> --
>>>> Ondřej Plátek, +420 737 758 650, skype:ondrejplatek,
>>>> ond...@gm...
>>>>
>>>>
>>>>
>>>
>>
>>
>> --
>> Ondřej Plátek, +420 737 758 650, skype:ondrejplatek,
>> ond...@gm...
>>
>
>

-- 
Ondřej Plátek, +420 737 758 650, skype:ondrejplatek, ond...@gm...