From: Daniel P. <dp...@gm...> - 2015-06-23 17:22:05
There is no theory, it's just experience. But sometimes, if there are too many insertions, it can be because of OOV words in the vocabulary, problems with normalization or training-data alignment, or other problems with specific words. So look carefully at the output.

Dan
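[A minimal sketch of one way to look at the output for bad insertions, assuming a reasonably recent Kaldi with the align-text binary; the paths and the hypothesis file name below are made up:]

    # List the words the decoder inserts most often. Assumes plain-text
    # reference and hypothesis files of the form "utt-id word word ...".
    ref=data/dev/text                               # hypothetical reference
    hyp=exp/nnet2_online/decode_dev/scoring/17.txt  # hypothetical best-LMWT hyps

    # align-text prints "utt ref1 hyp1 ; ref2 hyp2 ; ..."; a reference-side
    # "<eps>" marks an insertion, so count the hypothesis words paired with it.
    align-text ark:$ref ark:$hyp ark,t:- | \
      awk '{ for (i = 2; i < NF; i += 3) if ($i == "<eps>") print $(i+1) }' | \
      sort | uniq -c | sort -rn | head -20

[A word near the top of that list with a suspiciously high count is a good candidate for the kind of OOV or normalization problem described above.]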
On Tue, Jun 23, 2015 at 12:09 PM, Kirill Katsnelson <kir...@sm...> wrote:
> Aha, thanks, I see a pattern! I've got roughly the same number of insertions and deletions at the original weight of 0.5. I'll split off a dev set and try to tune the penalty.
>
> Is there any theory behind this optimal ins/del ratio, or is it just a trick of the art?
>
> -kkm
>
>> -----Original Message-----
>> From: Daniel Povey [mailto:dp...@gm...]
>> Sent: 2015-06-22 2357
>> To: Kirill Katsnelson
>> Cc: Nagendra Goel; kal...@li...
>> Subject: Re: [Kaldi-users] LM weight
>>
>> It could still be about insertion errors. Typically you want insertion rates about 1/3 to 1/2 as big as deletion rates. If your setup is getting too many insertions, it could be using the LM scale to compensate. Playing with an insertion penalty may help (see the more recent scoring scripts).
>> Dan
>>
>> On Tue, Jun 23, 2015 at 1:04 AM, Kirill Katsnelson <kir...@sm...> wrote:
>>> Yes, I am using the pretty standard nnet2_online model with the librispeech data, with an 8 kHz conversion and a squished frequency range for the high-res features, as I am finding there is a lot of rather useless variance in the very low range, given that the data come mostly from cell phones. But nothing fancy there overall.
>>>
>>> -kkm
>>>
>>>> -----Original Message-----
>>>> From: Daniel Povey [mailto:dp...@gm...]
>>>> Sent: 2015-06-22 2131
>>>> To: Kirill Katsnelson
>>>> Cc: Nagendra Goel; kal...@li...
>>>> Subject: Re: [Kaldi-users] LM weight
>>>>
>>>> By a lot of context I mean left-context and right-context, in the splicing. But I guess you are using one of the standard types of model.
>>>> Dan
>>>>
>>>> On Tue, Jun 23, 2015 at 12:24 AM, Kirill Katsnelson <kir...@sm...> wrote:
>>>>> The majority of the WER comes from subs, so this part looks pretty normal.
>>>>>
>>>>> A lot of acoustic context--probably, depending on the definition of "a lot." :-) Not sure I understand this part. How can I tell? It makes sense, looking at the base dev-set figures I got training the model on the first 500 hr of the librispeech corpus (best weights in the 16-17 range), which are still higher than the reference in the RESULTS for the full 1k-hour corpus, which is rather in the 12-15 range.
>>>>>
>>>>> -kkm
>>>>>
>>>>>> -----Original Message-----
>>>>>> From: Daniel Povey [mailto:dp...@gm...]
>>>>>> Sent: 2015-06-22 2059
>>>>>> To: Kirill Katsnelson
>>>>>> Cc: Nagendra Goel; kal...@li...
>>>>>> Subject: Re: [Kaldi-users] LM weight
>>>>>>
>>>>>> Usually if there is a lot of acoustic context in your model you will require a larger LM weight.
>>>>>> Also, if for some reason there tend to be a lot of insertions in decoding (e.g. something weird went wrong in training, or there is some kind of normalization problem), a large LM weight can help reduce insertions and so improve the WER.
>>>>>>
>>>>>> Dan
>>>>>>
>>>>>> On Mon, Jun 22, 2015 at 11:36 PM, Kirill Katsnelson <kir...@sm...> wrote:
>>>>>>> I am getting the same ratio on both a small, more targeted LM and a quite large general one. I do not understand what to make of it!
>>>>>>>
>>>>>>> -kkm
>>>>>>>
>>>>>>>> -----Original Message-----
>>>>>>>> From: Nagendra Goel [mailto:nag...@go...]
>>>>>>>> Sent: 2015-06-22 2032
>>>>>>>> To: Kirill Katsnelson; kal...@li...
>>>>>>>> Subject: RE: [Kaldi-users] LM weight
>>>>>>>>
>>>>>>>> Or maybe your domain is limited and the LM very nicely matched to the task at hand?
>>>>>>>>
>>>>>>>> -----Original Message-----
>>>>>>>> From: Kirill Katsnelson [mailto:kir...@sm...]
>>>>>>>> Sent: Monday, June 22, 2015 11:29 PM
>>>>>>>> To: kal...@li...
>>>>>>>> Subject: [Kaldi-users] LM weight
>>>>>>>>
>>>>>>>> In my test sets I am getting the best WER at an LM/acoustic weight in the 18-19 range, with multiple LMs of different size and origin. I was thinking the usual ballpark figure was about 10, give or take. From your experience, does this larger LM weight mean anything, and if so, what? I am guessing an inadequate acoustic model, requiring more LM "pull"--am I making sense?
>>>>>>>>
>>>>>>>> -kkm
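[A rough sketch of the LM-weight/insertion-penalty sweep being discussed, along the lines of what the newer scoring scripts (e.g. local/score.sh) automate; the directories, the weight range, and the penalty values below are illustrative, not prescriptive:]

    # Tune LM weight (lmwt) and word insertion penalty (wip) on a dev set.
    dir=exp/nnet2_online/decode_dev        # hypothetical decode directory
    symtab=exp/nnet2_online/graph/words.txt
    mkdir -p $dir/scoring

    for lmwt in $(seq 9 20); do
      for wip in 0.0 0.5 1.0; do
        # Scale the acoustics by 1/lmwt, charge wip per output word,
        # take the best path, and map word ids back to words.
        lattice-scale --inv-acoustic-scale=$lmwt "ark:gunzip -c $dir/lat.*.gz|" ark:- | \
          lattice-add-penalty --word-ins-penalty=$wip ark:- ark:- | \
          lattice-best-path --word-symbol-table=$symtab ark:- ark,t:- 2>/dev/null | \
          utils/int2sym.pl -f 2- $symtab > $dir/scoring/${lmwt}_${wip}.txt
        # compute-wer prints the ins/del/sub breakdown, so the ins:del
        # ratio mentioned above can be read off directly.
        echo -n "lmwt=$lmwt wip=$wip "
        compute-wer --text --mode=present \
          ark:data/dev/text ark:$dir/scoring/${lmwt}_${wip}.txt | grep '%WER'
      done
    done

[Picking the (lmwt, wip) pair with the lowest dev-set WER, and checking that insertions come out somewhere around a third to a half of deletions, is the tuning trick discussed in the thread.]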