OpenNLP / Discussion / Open Discussion: GIS algorithm Implementation

sivant - 2006-08-22

Hi
It seems like there is some inconsistency between the GIS implementation in opennlp.maxent.GISTrainer class and the known definition of the algorithm.

In the definition, when updating the parameters, one should raise the expectation ratio to the power 1/C, or in log form: log_alpha += (1/C) * (log_sample_E - log_model_E).

In the implementation I only see multiplication by (1/C) in the eval function, where the model distribution is calculated.
It seems that the implementation calculates
log_alpha += log_sample_E - (1/c) * log_model_E.

Can someone explain the difference?

Thanks.
Sivan

- Thomas Morton - 2006-08-28
  
  Hi,
  So after a careful look at the code and the formula I can explain why the code does what it does.
  
  log_alpha(n) = sum 1/c [log_sample_E - log_model_E(n)]
  log_alpha(n) = 1/c sum [log_sample_E - log_model_E(n)]
  C * log_alpha(n) = sum [log_sample_E - log_model_E(n)]
  
  So the code computes C * log_alpha(n). Then, in the eval method, that stored value is divided by C when p is computed:
  p(a|b) = sum e ^ log_alphaj(n) * f(b) * 1/C
  
  This helps keep more precision in the computation of log_alpha by performing the division by C only once.
  
  Hope this helps...Tom
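
The equivalence described in this reply can be checked with a small Java sketch. It is an illustration only, not the GISTrainer code: the class and method names are hypothetical, and the per-iteration model expectations are fixed made-up numbers (in real training they depend on the current parameters). It compares the textbook accumulation, which multiplies by 1/C at every iteration, with the scaled accumulation that is divided by C once at the end.

```java
public class ScaledGisUpdate {

    /** Textbook GIS: add (1/C) * (log observed - log model) at every iteration. */
    static double textbook(double logObserved, double[] logModel, double c) {
        double logAlpha = 0.0;
        for (double lm : logModel) {
            logAlpha += (1.0 / c) * (logObserved - lm);
        }
        return logAlpha;
    }

    /** Scaled variant: accumulate C * log_alpha and leave the division by C for later. */
    static double scaled(double logObserved, double[] logModel) {
        double scaledLogAlpha = 0.0;
        for (double lm : logModel) {
            scaledLogAlpha += logObserved - lm;
        }
        return scaledLogAlpha;
    }

    public static void main(String[] args) {
        double c = 4.0; // assumed correction constant
        double logObserved = Math.log(0.30); // assumed observed expectation
        // Assumed model expectations over three iterations.
        double[] logModel = { Math.log(0.10), Math.log(0.22), Math.log(0.28) };

        double perIteration = textbook(logObserved, logModel, c);
        double dividedOnce = scaled(logObserved, logModel) / c;

        System.out.println("textbook log_alpha     = " + perIteration);
        System.out.println("scaled log_alpha / C   = " + dividedOnce);
        System.out.println("agree within tolerance = "
                + (Math.abs(perIteration - dividedOnce) < 1e-12));
    }
}
```

Both routes give the same log_alpha up to rounding; the scaled route simply performs one division instead of one per iteration.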
  
- yustian01 - 2007-03-08
  
  I am not well versed in GIS, so I hope you do not mind a basic question.
  I cannot understand this code:
  outsums[oid] += constantInverse * activeParameters[j];
  outsums[oid] += ((1.0 - ((double) numfeats[oid] / constant)) * correctionParam);
  I also still do not follow the explanation above:
  p(a|b) = sum e ^ log_alphaj(n) * f(b) * 1/C
  I cannot match this formula with the original formula p(n)(a|b)=1/Z(b)П(log_alphaj(n)) * f(b) in the paper.
  
  Can anyone explain them for me?
  
  thanks,
  
  - Thomas Morton - 2007-03-14
    
    Hi,
    Sorry for the belated reply; I was on vacation. So the formula you list is incorrect. I think you mean:
    
    p(n)(a|b)=1/Z(b)П(alphaj(n)) ^ f(b)
    
    Note it's alpha and not log_alpha, and f(b) is an exponent.
    
    The formula in the previous post is also incorrect; it has exp and sum switched; it should be:
    
    p(a|b) = e ^ (sum(log_alphaj(n) * f(b) * 1/C))
    
    It got that way via this (ignoring the 1/Z part for a moment)
    
    П(alphaj(n)) ^ f(b) = e ^ (log(П(alphaj(n)) ^ f(b))) = e ^ (sum(log(alphaj(n) ^ f(b)))) =
    e ^ (sum(log_alphaj(n) * f(b)))
    
    So basically you sum all the active parameters (f(b) is 0 for inactive ones), and since we computed C*log_alpha in training (see previous post), we multiply by 1/C.
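
The identity used in this derivation can be checked numerically. The following Java sketch is illustrative only: the class name, method names, and sample values are assumptions, and the 1/Z(b) normalization is ignored, as in the derivation. It computes the unnormalized score once as a product of alpha_j ^ f_j and once as e raised to the sum of the stored C*log_alpha_j values times f_j times 1/C.

```java
public class ProductVsExpSum {

    /** Unnormalized score in product form: prod_j alpha_j ^ f_j. */
    static double productForm(double[] alpha, double[] f) {
        double p = 1.0;
        for (int j = 0; j < alpha.length; j++) {
            p *= Math.pow(alpha[j], f[j]);
        }
        return p;
    }

    /** Same score from stored values C * log(alpha_j): exp(sum_j stored_j * f_j * 1/C). */
    static double expSumForm(double[] scaledLogAlpha, double[] f, double c) {
        double sum = 0.0;
        for (int j = 0; j < scaledLogAlpha.length; j++) {
            sum += scaledLogAlpha[j] * f[j]; // inactive features (f_j = 0) add nothing
        }
        return Math.exp(sum / c);
    }

    public static void main(String[] args) {
        double c = 3.0;                    // assumed constant C
        double[] alpha = { 1.5, 0.8, 2.0 };  // assumed parameters
        double[] f = { 1.0, 0.0, 1.0 };      // second feature is inactive

        // What training would store under the scaling described in the thread.
        double[] scaledLogAlpha = new double[alpha.length];
        for (int j = 0; j < alpha.length; j++) {
            scaledLogAlpha[j] = c * Math.log(alpha[j]);
        }

        System.out.println("product form = " + productForm(alpha, f));
        System.out.println("exp-sum form = " + expSumForm(scaledLogAlpha, f, c));
    }
}
```

The two printed values agree up to floating-point rounding.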
    
    The code involving C and the correction parameter is only used when that option is on and is based on a requirement of the original proof which said that you needed to have the same number of active parameters for each event. This is no longer used by default so perhaps the code should be updated to reflect that.
    
    Hope this helps...Tom
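
To connect this with the two quoted lines of code, here is a minimal Java sketch of an evaluation step written around them. It is not the OpenNLP eval method: the class name, the method signature, and the sample values are hypothetical, and it assumes binary features, so that the number of active features stands in for the sum of feature values. Under that assumption, one reading of the second quoted line is that the correction feature takes the value C minus the number of active features, which after the 1/C scaling becomes (1 - numfeats/C) * correctionParam.

```java
public class EvalWithCorrection {

    /**
     * activeParams[oid] holds the stored (C-scaled) parameters of the features
     * that are active for outcome oid. Returns normalized probabilities.
     */
    static double[] eval(double[][] activeParams, double constant, double correctionParam) {
        double constantInverse = 1.0 / constant;
        double[] outsums = new double[activeParams.length];
        double normal = 0.0;

        for (int oid = 0; oid < activeParams.length; oid++) {
            // Sum the active parameters, undoing the factor C kept during training.
            for (double param : activeParams[oid]) {
                outsums[oid] += constantInverse * param;
            }
            // Correction term: vanishes when exactly `constant` features are active.
            int numfeats = activeParams[oid].length;
            outsums[oid] += (1.0 - numfeats / constant) * correctionParam;

            outsums[oid] = Math.exp(outsums[oid]);
            normal += outsums[oid];
        }

        // Divide by Z(b) so the outcomes sum to one.
        for (int oid = 0; oid < outsums.length; oid++) {
            outsums[oid] /= normal;
        }
        return outsums;
    }

    public static void main(String[] args) {
        // Outcome 0 has two active features, outcome 1 has one (assumed values).
        double[][] activeParams = { { 0.6, 1.2 }, { -0.3 } };
        double[] p = eval(activeParams, 2.0, 0.5);
        System.out.println("p(outcome 0) = " + p[0]);
        System.out.println("p(outcome 1) = " + p[1]);
    }
}
```

When every outcome has exactly C active features, the correction term is zero for all of them and the value of correctionParam no longer affects the result.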
    
- yustian01 - 2007-03-15
  
  thanks for your reply..
  That makes sense...^^
  
  And could you also answer the thread titled "please help, about GIS algorithm" that I posted in the Help forum?
  Thanks,
  yustian..
  