GIS algorithm Implementation

sivant
2006-08-22
2013-04-16
  • sivant

    sivant - 2006-08-22

    Hi
    It seems like there is some inconsistency between the GIS implementation in opennlp.maxent.GISTrainer class and the known definition of the algorithm.

    In the definition, when updating the parameters, one should raise the expectation ratio to the power 1/C, or in log form: log_alpha += (1/C) * (log_sample_E - log_model_E).

    In the implementation I only see multiplication by (1/C) in the eval function, where the model distribution is calculated.
    It seems that the implementation calculates
    log_alpha += log_sample_E - (1/C) * log_model_E.

    Can someone explain the difference?

    Thanks.
    Sivan 

     
    • Thomas Morton

      Thomas Morton - 2006-08-28

      Hi,
      So after a careful look at the code and the formula I can explain why the code does what it does.

      Summing the updates over the iterations k before n:

      log_alpha(n) = sum_k (1/C) * [log_sample_E - log_model_E(k)]
      log_alpha(n) = (1/C) * sum_k [log_sample_E - log_model_E(k)]
      C * log_alpha(n) = sum_k [log_sample_E - log_model_E(k)]

      So the code computes C * log_alpha(n). Then in the eval method alpha is divided by C as p is computed as:
      p(a|b) = sum e ^ log_alphaj(n) * f(b) * 1/C

      This helps keep more precision in the computation of log_alpha by performing the division by C only once.

      Hope this helps...Tom
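
      To illustrate the point of this reply, here is a minimal Java sketch, not the OpenNLP code. It trains a toy model twice: once dividing by C in every update (the textbook form), and once accumulating C * log_alpha and multiplying by 1/C only at evaluation time (the form described above). The class name, method names, and toy data are all hypothetical, and every event is assumed to have exactly C = 2 active predicates so that no correction feature is needed.

```java
import java.util.Arrays;

class GisScalingSketch {
    // Hypothetical toy data: each event has exactly C = 2 active predicates.
    static final int[][] CONTEXTS = {
        {0, 1}, {0, 1}, {0, 1}, {0, 2}, {0, 2}, {0, 2}, {1, 2}, {1, 2}};
    static final int[] OUTCOMES = {0, 0, 1, 1, 1, 0, 0, 1};
    static final int NUM_PREDS = 3;
    static final int NUM_OUTCOMES = 2;
    static final double C = 2.0;

    // p(outcome | context); the summed parameters are multiplied by 'scale'
    // (1 when params hold log_alpha, 1/C when they hold C * log_alpha).
    static double[] eval(double[][] params, int[] context, double scale) {
        double[] p = new double[NUM_OUTCOMES];
        double z = 0.0;
        for (int o = 0; o < NUM_OUTCOMES; o++) {
            double sum = 0.0;
            for (int pred : context) {
                sum += params[pred][o];
            }
            p[o] = Math.exp(sum * scale);
            z += p[o];
        }
        for (int o = 0; o < NUM_OUTCOMES; o++) {
            p[o] /= z;
        }
        return p;
    }

    // divideInUpdate = true:  params hold log_alpha, each update is scaled by 1/C.
    // divideInUpdate = false: params hold C * log_alpha, 1/C is applied in eval.
    static double[][] train(boolean divideInUpdate, int iterations) {
        double evalScale = divideInUpdate ? 1.0 : 1.0 / C;
        double updateScale = divideInUpdate ? 1.0 / C : 1.0;

        // Observed (sample) feature counts.
        double[][] observed = new double[NUM_PREDS][NUM_OUTCOMES];
        for (int e = 0; e < CONTEXTS.length; e++) {
            for (int pred : CONTEXTS[e]) {
                observed[pred][OUTCOMES[e]] += 1.0;
            }
        }

        double[][] params = new double[NUM_PREDS][NUM_OUTCOMES];
        for (int it = 0; it < iterations; it++) {
            // Expected feature counts under the current model.
            double[][] expected = new double[NUM_PREDS][NUM_OUTCOMES];
            for (int[] context : CONTEXTS) {
                double[] p = eval(params, context, evalScale);
                for (int pred : context) {
                    for (int o = 0; o < NUM_OUTCOMES; o++) {
                        expected[pred][o] += p[o];
                    }
                }
            }
            for (int pred = 0; pred < NUM_PREDS; pred++) {
                for (int o = 0; o < NUM_OUTCOMES; o++) {
                    params[pred][o] += updateScale
                        * (Math.log(observed[pred][o]) - Math.log(expected[pred][o]));
                }
            }
        }
        return params;
    }

    static double[] predict(boolean divideInUpdate, int[] context) {
        double[][] params = train(divideInUpdate, 50);
        return eval(params, context, divideInUpdate ? 1.0 : 1.0 / C);
    }

    public static void main(String[] args) {
        int[] context = {0, 1};
        System.out.println(Arrays.toString(predict(true, context)));
        System.out.println(Arrays.toString(predict(false, context)));
    }
}
```

      The two printed distributions should agree up to floating-point rounding, since the second variant simply stores every parameter multiplied by C.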

       
    • yustian01

      yustian01 - 2007-03-08

      I am not familiar with GIS, so I hope you don't mind a basic question.
      I cannot understand this code:
      outsums[oid] += constantInverse * activeParameters[j];
      outsums[oid] += ((1.0 - ((double) numfeats[oid] / constant)) * correctionParam);
      and I still do not follow the explanation above:
      p(a|b) = sum e ^ log_alphaj(n) * f(b) * 1/C  ???
      I cannot match this formula with the original formula p(n)(a|b)=1/Z(b)П(log_alphaj(n)) * f(b) in the paper.

      Can anyone explain them for me?

      thanks,

       
      • Thomas Morton

        Thomas Morton - 2007-03-14

        Hi,
           Sorry for the belated reply; I was on vacation.  So the formula you list is incorrect.  I think you mean:

        p(n)(a|b)=1/Z(b)П(alphaj(n)) ^ f(b)

        Note it's alpha and not log_alpha, and the f(b) is an exponent.

        The formula in the previous post is also incorrect; it has exp and sum switched; it should be:

        p(a|b) = e ^ (sum(log_alphaj(n) * f(b) * 1/C))

        It got that way via this (ignoring the 1/Z part for a moment)

        П(alphaj(n)) ^ f(b) = e ^ (log(П(alphaj(n)) ^ f(b))) = e ^ (sum(log(alphaj(n) ^ f(b)))) =
        e ^ (sum(log_alphaj(n) * f(b)))

        So basically you sum all the active parameters (f(b) is 0 for inactive ones), and since we computed C*log_alpha in training (see previous post), we multiply by 1/C.

        The code involving C and the correction parameter is only used when that option is on and is based on a requirement of the original proof which said that you needed to have the same number of active parameters for each event.  This is no longer used by default so perhaps the code should be updated to reflect that.

        Hope this helps...Tom
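
        As a rough check of the derivation in this reply, the following Java sketch (hypothetical names and made-up parameter values, not the OpenNLP code) evaluates the product form from the paper and the exp-of-sum form over stored C * log_alpha values, and shows that they give the same distribution. Features are assumed to be binary, and the optional correction term is left out.

```java
class GisEvalSketch {
    // Product form: p(a|b) = (1/Z(b)) * prod_j alpha_j ^ f_j(a,b).
    // With binary features, an inactive feature contributes a factor of 1.
    static double[] productForm(double[][] alpha, int[] active) {
        int numOutcomes = alpha[0].length;
        double[] p = new double[numOutcomes];
        double z = 0.0;
        for (int o = 0; o < numOutcomes; o++) {
            p[o] = 1.0;
            for (int j : active) {
                p[o] *= alpha[j][o];
            }
            z += p[o];
        }
        for (int o = 0; o < numOutcomes; o++) {
            p[o] /= z;
        }
        return p;
    }

    // Convert alpha to the stored form C * log(alpha).
    static double[][] toStored(double[][] alpha, double c) {
        double[][] stored = new double[alpha.length][alpha[0].length];
        for (int j = 0; j < alpha.length; j++) {
            for (int o = 0; o < alpha[j].length; o++) {
                stored[j][o] = c * Math.log(alpha[j][o]);
            }
        }
        return stored;
    }

    // Log form: p(a|b) = (1/Z(b)) * e ^ (sum_j stored_j * f_j(a,b) * 1/C),
    // where stored_j = C * log_alpha_j.
    static double[] logForm(double[][] stored, int[] active, double c) {
        double constantInverse = 1.0 / c;
        int numOutcomes = stored[0].length;
        double[] p = new double[numOutcomes];
        double z = 0.0;
        for (int o = 0; o < numOutcomes; o++) {
            double sum = 0.0;
            for (int j : active) {
                sum += constantInverse * stored[j][o];
            }
            p[o] = Math.exp(sum);
            z += p[o];
        }
        for (int o = 0; o < numOutcomes; o++) {
            p[o] /= z;
        }
        return p;
    }

    public static void main(String[] args) {
        // Made-up alpha values: 3 predicates, 2 outcomes.
        double[][] alpha = {{2.0, 0.5}, {1.5, 1.0}, {0.8, 1.25}};
        double c = 2.0;
        int[] active = {0, 1};
        double[] viaProduct = productForm(alpha, active);
        double[] viaLog = logForm(toStored(alpha, c), active, c);
        System.out.println(viaProduct[0] + " " + viaLog[0]);
    }
}
```

        For the active predicates {0, 1}, the unnormalized scores are 2.0 * 1.5 = 3.0 and 0.5 * 1.0 = 0.5, so both forms should give p = 6/7 for the first outcome, up to rounding.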

         
    • yustian01

      yustian01 - 2007-03-15

      thanks for your reply..
      That makes sense...^^

      And could you also answer the thread titled "please help, about GIS algorithm" that I posted in the Help forum?
      Thanks,
      yustian..

       
