From: Xavier A. <xan...@gm...> - 2013-04-08 19:31:06
I know, the problem happened when not applying this standard method while
adapting the code to perform MAP adaptation. In my opinion the posteriors
should always be > 0, even if their value is just the minimum representable
float value (as in my code). I do agree, though, that in standard ASR model
training this is not relevant; I am hitting it because I am using Kaldi for
things other than ASR.

BTW, thanks for helping put together such a nice tool!

X.

On Mon, Apr 8, 2013 at 9:23 PM, Daniel Povey <dp...@gm...> wrote:
> It's the normal practice when dealing with Gaussians to get rid of small
> counts.
> Dan
>
>
> On Mon, Apr 8, 2013 at 3:23 PM, Xavier Anguera <xan...@gm...> wrote:
>
>> I agree my "hack" is not the solution.
>> I see that when performing EM training there is a check for very small
>> occupancy or weight, and a Gaussian is eliminated if it falls below the
>> threshold. I am not happy with that approach, though, and commented out
>> that line some time ago (I am implementing a MAP adaptation function
>> that needs to deal with these cases).
>>
>> X.
>>
>>
>> On Mon, Apr 8, 2013 at 9:19 PM, Daniel Povey <dp...@gm...> wrote:
>>
>>> Hm, thanks, but I don't think this is the right way to fix the problem.
>>> Update code should always take into account the possibility that
>>> occupancies will be zero. It's expected that exp() on very negative
>>> values will produce zero.
>>> Dan
>>>
>>>
>>> On Mon, Apr 8, 2013 at 3:15 PM, Xavier Anguera <xan...@gm...> wrote:
>>>
>>>> Hi Dan,
>>>> the segmentation fault comes from a division by zero when using the
>>>> occupancy of the Gaussians, which is computed by summing the posterior
>>>> probabilities of each Gaussian over a set of features. When all the
>>>> posteriors for a given Gaussian over all features are 0, there is a
>>>> division by zero.
>>>> I am pasting the "hack" I wrote to prevent this. I believe, though,
>>>> that maybe the exp() function should be revisited. Tell me what you
>>>> think.
>>>>
>>>> template<typename Real>
>>>> Real VectorBase<Real>::ApplySoftMax() {
>>>>   Real max = this->Max(), sum = 0.0;
>>>>   for (MatrixIndexT i = 0; i < dim_; i++) {
>>>>     data_[i] = exp(data_[i] - max);
>>>>     if (data_[i] < FLT_MIN)
>>>>       data_[i] = FLT_MIN;  // clamp to the smallest positive float
>>>>     sum += data_[i];
>>>>   }
>>>>   this->Scale(1.0 / sum);
>>>>   return max + log(sum);
>>>> }
>>>>
>>>>
>>>> On Mon, Apr 8, 2013 at 8:58 PM, Daniel Povey <dp...@gm...> wrote:
>>>>
>>>>> Firstly, this should give you numerical problems but not a
>>>>> segmentation fault.
>>>>> You'll have to look in the code and see if it's behaving as expected.
>>>>> E.g. is it due to a number so small that it cannot be represented in
>>>>> floating point, or is it larger than that and unexpectedly becoming
>>>>> zero? It might be an issue with your algorithm design.
>>>>> Let me know if that function needs to be fixed.
>>>>>
>>>>> Dan
>>>>>
>>>>>
>>>>> On Mon, Apr 8, 2013 at 2:54 PM, Xavier Anguera <xan...@gm...> wrote:
>>>>>
>>>>>> Hi,
>>>>>> when using the function template<typename Real> Real
>>>>>> VectorBase<Real>::ApplySoftMax() in kaldi-vector.cc, I noticed that
>>>>>> very small likelihoods are rounded to a posterior probability of
>>>>>> 0.0. Is this expected behavior? I am trying to perform EM training
>>>>>> of a simple GMM and I keep running into a segmentation fault because
>>>>>> of this.
>>>>>>
>>>>>> Thanks
>>>>>>
>>>>>> Xavi Anguera
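What Dan is suggesting — update code that tolerates zero occupancies rather
than clamping the softmax output — might look like the minimal sketch below.
The names here (UpdateMeansSafe, kMinOccupancy) are hypothetical
illustrations, not Kaldi's actual update code; the printf at the end shows
why exact zeros are expected when taking exp() of very negative
log-likelihoods.

#include <cmath>
#include <cstdio>
#include <vector>

// Hypothetical guard: skip the mean update for any Gaussian whose
// accumulated occupancy (sum of posteriors over all frames) is too
// small to divide by safely. It keeps its old mean instead.
void UpdateMeansSafe(const std::vector<double> &occupancy,
                     const std::vector<std::vector<double>> &mean_accum,
                     std::vector<std::vector<double>> *means) {
  const double kMinOccupancy = 1.0e-10;  // hypothetical floor
  for (size_t j = 0; j < occupancy.size(); ++j) {
    if (occupancy[j] < kMinOccupancy)
      continue;  // no posterior mass for this Gaussian: skip, don't divide.
    for (size_t d = 0; d < mean_accum[j].size(); ++d)
      (*means)[j][d] = mean_accum[j][d] / occupancy[j];
  }
}

int main() {
  // A one-dimensional, two-Gaussian toy: the second Gaussian received
  // zero occupancy, so its mean is left untouched (prints "2 3").
  std::vector<double> occ = {5.0, 0.0};
  std::vector<std::vector<double>> accum = {{10.0}, {0.0}};
  std::vector<std::vector<double>> means = {{0.0}, {3.0}};
  UpdateMeansSafe(occ, accum, &means);
  std::printf("means: %g %g\n", means[0][0], means[1][0]);

  // Underflow: exp(-800.0) is far below the smallest denormal double
  // and evaluates to exactly 0.0, as Dan notes is expected behavior.
  std::printf("exp(-800.0) = %g\n", std::exp(-800.0));
  return 0;
}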