From: Nagendra G. <nag...@go...> - 2015-05-24 22:42:28
|
A systematic way of identifying special elements in text would be very useful. Currently NSW-EXPAND from Festival conflicts with this sub-grammar approach, although otherwise it's a good LM pre-processing step. Nagendra Kumar Goel On May 24, 2015 4:45 PM, "Matthew Aylett" <mat...@gm...> wrote: > Not sure if this is relevant to this thread. But in the speech synthesis > system branch we have a very early text normaliser which (when complete) > will detect things like phone numbers, addresses, currencies etc. The output > from this could then be used to inform language model building. Currently > it deals with symbols and tokenisations in English. > > Potentially (although I wasn't currently planning on this), the text > normaliser could be written in Thrax (based on OpenFst, and authored by > Richard Sproat, I believe). However, if this approach would benefit ASR as > well then it might be worth doing it this way rather than my plan of a > simple greedy normaliser. > > v best > > Matthew Aylett > > > On Sun, May 24, 2015 at 8:34 AM, Dimitris Vassos <dva...@gm...> > wrote: > >> We have access to several corpora and we are trying to put together >> something appropriate. >> >> In the next couple of days, we will also volunteer a server to set it all >> up and run the tests. >> >> Dimitris >> >> > On 24 May 2015, at 02:06, Daniel Povey <dp...@gm...> wrote: >> > >> > One possibility is to use a completely open-source setup, e.g. >> > Voxforge, and forget about the "has a clear advantage" requirement. >> > E.g. target anything that looks like a year, and make a grammar for >> > years. >> > Dan >> > >> > >> > On Fri, May 22, 2015 at 6:32 AM, Nagendra Goel >> > <nag...@go...> wrote: >> >> Since I cannot volunteer my environment, do you recommend another >> >> environment where this can be prototyped and where you can check in >> some >> >> class lm recipe that has advantage. 
>> >> >> >> Nagendra >> >> >> >> Nagendra Kumar Goel >> >> >> >>> On May 21, 2015 11:01 PM, "Dimitris Vassos" <dva...@gm...> >> wrote: >> >>> >> >>> +1 for the class-based LMs. I have also been interested in this >> >>> functionality for some time now, so will be more than happy to try >> out the >> >>> current implementation, if possible. >> >>> >> >>> Thanks >> >>> Dimitris >> >>> >> >>>> On 22 May 2015, at 01:34, kal...@li... >> >>>> wrote: >> >>>> >> >>>> Send Kaldi-users mailing list submissions to >> >>>> kal...@li... >> >>>> >> >>>> To subscribe or unsubscribe via the World Wide Web, visit >> >>>> https://lists.sourceforge.net/lists/listinfo/kaldi-users >> >>>> or, via email, send a message with subject or body 'help' to >> >>>> kal...@li... >> >>>> >> >>>> You can reach the person managing the list at >> >>>> kal...@li... >> >>>> >> >>>> When replying, please edit your Subject line so it is more specific >> >>>> than "Re: Contents of Kaldi-users digest..." >> >>>> >> >>>> >> >>>> Today's Topics: >> >>>> >> >>>> 1. Re: LM grafting (Daniel Povey) >> >>>> 2. Re: LM grafting (Kirill Katsnelson) >> >>>> 3. Re: LM grafting (Hainan Xu) >> >>>> 4. Re: LM grafting (Sean True) >> >>>> >> >>>> >> >>>> >> ---------------------------------------------------------------------- >> >>>> >> >>>> Message: 1 >> >>>> Date: Thu, 21 May 2015 15:04:04 -0400 >> >>>> From: Daniel Povey <dp...@gm...> >> >>>> Subject: Re: [Kaldi-users] LM grafting >> >>>> To: Sean True <se...@se...> >> >>>> Cc: Hainan Xu <hai...@gm...>, >> >>>> "kal...@li..." >> >>>> <kal...@li...>, Kirill Katsnelson >> >>>> <kir...@sm...> >> >>>> Message-ID: >> >>>> <CAEWAuySHaXwdNJZAoL6CanzHth= >> k4Y...@ma...> >> >>>> Content-Type: text/plain; charset=UTF-8 >> >>>> >> >>>> The general approach is to create an FST for the little language >> >>>> model, and then to use fstreplace to replace instances of a >> particular >> >>>> symbol in the top-level language model, with that FST. 
>> >>>> The tricky part is ensuring that the result is determinizable after >> >>>> composing with the lexicon. In general our solution is to add >> special >> >>>> disambiguation symbols at the beginning and end of each of the >> >>>> sub-FSTs, and of course making sure that the sub-FSTs are themselves >> >>>> determinizable. >> >>>> Dan >> >>>> >> >>>> >> >>>>> On Thu, May 21, 2015 at 3:01 PM, Sean True < >> se...@se...> >> >>>>> wrote: >> >>>>> That's a subject of some general interest. Is there a discussion of >> the >> >>>>> general approach that was taken somewhere? >> >>>>> >> >>>>> -- Sean >> >>>>> >> >>>>> Sean True >> >>>>> Semantic Machines >> >>>>> >> >>>>>> On Thu, May 21, 2015 at 2:14 PM, Daniel Povey <dp...@gm...> >> >>>>>> wrote: >> >>>>>> >> >>>>>> Nagendra Goel has worked on some example scripts for this type of >> >>>>>> thing, and with Hainan we were working on trying to get it cleaned >> up >> >>>>>> and checked in, but he's going for an internship so it will have to >> >>>>>> wait. But Nagendra might be willing to share it with you. >> >>>>>> Dan >> >>>>>> >> >>>>>> >> >>>>>> On Thu, May 21, 2015 at 2:10 PM, Kirill Katsnelson >> >>>>>> <kir...@sm...> wrote: >> >>>>>>> Suppose I have a language model where one token (a "word") is a >> >>>>>>> pointer >> >>>>>>> to a whole another LM. This is a practical case when you expect an >> >>>>>>> abrupt >> >>>>>>> change in model, a clear example being "my phone number is..." and >> >>>>>>> then >> >>>>>>> you'd expect them rattling a string of digits. Is there any >> support >> >>>>>>> in kaldi >> >>>>>>> for this? 
>> >>>>>>> >> >>>>>>> Thanks, >> >>>>>>> >> >>>>>>> -kkm >> >>>>>>> >> >>>>>>> >> >>>>>>> >> ------------------------------------------------------------------------------ >> >>>>>>> One dashboard for servers and applications across >> >>>>>>> Physical-Virtual-Cloud >> >>>>>>> Widest out-of-the-box monitoring support with 50+ applications >> >>>>>>> Performance metrics, stats and reports that give you Actionable >> >>>>>>> Insights >> >>>>>>> Deep dive visibility with transaction tracing using APM Insight. >> >>>>>>> http://ad.doubleclick.net/ddm/clk/290420510;117567292;y >> >>>>>>> _______________________________________________ >> >>>>>>> Kaldi-users mailing list >> >>>>>>> Kal...@li... >> >>>>>>> https://lists.sourceforge.net/lists/listinfo/kaldi-users >> >>>> >> >>>> ------------------------------ >> >>>> >> >>>> Message: 2 >> >>>> Date: Thu, 21 May 2015 19:24:38 +0000 >> >>>> From: Kirill Katsnelson <kir...@sm...> >> >>>> Subject: Re: [Kaldi-users] LM grafting >> >>>> To: "dp...@gm..." <dp...@gm...>, Sean True >> >>>> <se...@se...> >> >>>> Cc: Hainan Xu <hai...@gm...>, >> >>>> "kal...@li..." >> >>>> <kal...@li...> >> >>>> Message-ID: >> >>>> >> >>>> < >> CY1...@CY... 
>> > >> >>>> >> >>>> Content-Type: text/plain; charset="utf-8" >> >>>> >> >>>> Also, from the practical standpoint, backoff/discounting weights >> usually >> >>>> need to be massaged. Otherwise when the grafted LM is small and the >> main LM >> >>>> is large, the little model will tend to shoehorn an utterance into >> itself >> >>>> rather than let go of it. In my phone number example, everything >> becomes >> >>>> digits once the phone number starts. >> >>>> >> >>>> -kkm 
>> >>>> ------------------------------ >> >>>> >> >>>> Message: 3 >> >>>> Date: Thu, 21 May 2015 15:29:54 -0400 >> >>>> From: Hainan Xu <hai...@gm...> >> >>>> Subject: Re: [Kaldi-users] LM grafting >> >>>> To: Daniel Povey <dp...@gm...> >> >>>> Cc: Sean True <se...@se...>, >> >>>> "kal...@li..." >> >>>> <kal...@li...>, Kirill Katsnelson >> >>>> <kir...@sm...> >> >>>> Message-ID: >> >>>> <CALP+BDZvJP-2cZ+fEJEXaMaVWzgy63mtc= >> J1E...@ma...> >> >>>> Content-Type: text/plain; charset="utf-8" >> >>>> >> >>>> There is a paper in ICASSP 2015 that described some very similar >> idea: >> >>>> >> >>>> Improved recognition of contact names in voice commands >> >>>> >> >>>> -- >> >>>> - Hainan >> >>>> >> >>>> ------------------------------ >> >>>> >> >>>> Message: 4 >> >>>> Date: Thu, 21 May 2015 15:01:51 -0400 >> >>>> From: Sean True <se...@se...> >> >>>> Subject: Re: [Kaldi-users] LM grafting >> >>>> To: Daniel Povey <dp...@gm...> >> >>>> Cc: Hainan Xu <hai...@gm...>, >> >>>> "kal...@li..." 
>> >>>> <kal...@li...>, Kirill Katsnelson >> >>>> <kir...@sm...> >> >>>> Message-ID: >> >>>> <CALtEaHntdAcmO_Ji5dxsPnT8i9M_LVuGnY0UjkJUPp= >> pY...@ma...> >> >>>> Content-Type: text/plain; charset="utf-8" >> >>>> >> >>>> That's a subject of some general interest. Is there a discussion of >> the >> >>>> general approach that was taken somewhere? >> >>>> >> >>>> -- Sean >> >>>> >> >>>> Sean True >> >>>> Semantic Machines >> >>>> >> >>>> ------------------------------ >> >>>> >> >>>> _______________________________________________ >> >>>> Kaldi-users mailing list >> >>>> Kal...@li... 
>> >>>> https://lists.sourceforge.net/lists/listinfo/kaldi-users >> >>>> >> >>>> >> >>>> End of Kaldi-users Digest, Vol 29, Issue 15 >> >>>> ******************************************* |
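For readers following the thread, here is a minimal pure-Python sketch of the grafting idea Dan describes in the quoted digest: each arc carrying a class symbol in the top-level LM is replaced by a fresh copy of a small sub-grammar, bracketed by begin/end marker symbols. It is a toy stand-in, not OpenFst's fstreplace and not the example scripts Nagendra worked on. The names `Fst`, `replace_class`, `best_cost`, `$DIGITS`, `#DIGITS_BEGIN` and `#DIGITS_END` are all assumptions made up for illustration, and the `entry_cost` knob is only a crude nod to Kirill's remark that the weights usually need to be massaged.

```python
from dataclasses import dataclass


@dataclass
class Fst:
    """Toy weighted acceptor. Arcs are (src, label, cost, dst); costs add up."""
    start: int
    finals: set
    arcs: list

    def states(self):
        found = {self.start, *self.finals}
        for src, _, _, dst in self.arcs:
            found.update((src, dst))
        return found


def replace_class(top, sub, class_label, begin, end, entry_cost=0.0):
    """Splice a fresh copy of `sub` in place of each `class_label` arc of `top`."""
    next_state = max(top.states()) + 1
    sub_states = sorted(sub.states())
    arcs = []
    for src, label, cost, dst in top.arcs:
        if label != class_label:
            arcs.append((src, label, cost, dst))
            continue
        # Renumber the sub-grammar's states so each copy is independent.
        remap = {s: next_state + i for i, s in enumerate(sub_states)}
        next_state += len(sub_states)
        # Marker arcs bracket the sub-grammar (hypothetical symbol names).
        arcs.append((src, begin, cost + entry_cost, remap[sub.start]))
        arcs.extend((remap[s], l, c, remap[d]) for s, l, c, d in sub.arcs)
        arcs.extend((remap[f], end, 0.0, dst) for f in sorted(sub.finals))
    return Fst(top.start, set(top.finals), arcs)


def best_cost(fst, words):
    """Lowest total cost of accepting `words`; '#'-prefixed arcs consume no word."""
    def follow_markers(costs):
        changed = True
        while changed:
            changed = False
            for src, label, cost, dst in fst.arcs:
                if label.startswith("#") and src in costs:
                    if costs[src] + cost < costs.get(dst, float("inf")):
                        costs[dst] = costs[src] + cost
                        changed = True
        return costs

    costs = follow_markers({fst.start: 0.0})
    for word in words:
        step = {}
        for src, label, cost, dst in fst.arcs:
            if label == word and src in costs:
                step[dst] = min(step.get(dst, float("inf")), costs[src] + cost)
        costs = follow_markers(step)
    reached = [c for s, c in costs.items() if s in fst.finals]
    return min(reached) if reached else None


# Top-level LM: "my number is $DIGITS thanks".
top = Fst(0, {5}, [(0, "my", 1.0, 1), (1, "number", 1.0, 2), (2, "is", 1.0, 3),
                   (3, "$DIGITS", 1.0, 4), (4, "thanks", 1.0, 5)])
# Sub-grammar: one or more digit words.
digits = Fst(0, {1}, [(s, w, 0.5, 1) for s in (0, 1) for w in ("one", "two", "three")])

grafted = replace_class(top, digits, "$DIGITS", "#DIGITS_BEGIN", "#DIGITS_END",
                        entry_cost=2.0)
print(best_cost(grafted, "my number is one two three thanks".split()))  # 8.5
print(best_cost(grafted, "my number is thanks".split()))                # None
```

Raising `entry_cost` makes the path into the digit grammar more expensive, which is one simple way to stop a small grafted model from swallowing the rest of the utterance. In a real Kaldi/OpenFst setup the marker arcs would correspond to the disambiguation symbols Dan mentions, which is what keeps the result determinizable after composing with the lexicon; this sketch does not attempt determinization.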