The OpenNLP Maximum Entropy Package / Discussion / Help: A Gaussian Prior for Smoothing MaxEnt

yifan peng - 2009-03-13

Hi everyone,

I would like to know whether MaxEnt supports "applying a Gaussian prior on the parameters to smooth maximum entropy models."

I found a Prior interface in MaxEnt and I thought it represents p_0 in the following equation:

p(x) = 1/Z p_0 exp [ sum_i param_i * feat_i ]

which adds a prior on the outcomes based on some contexts.
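
For readers following the thread, the equation above can be sketched in Java. This is only an illustration, not code from the MaxEnt package: the class name PriorMaxentSketch, the method distribution, and the dense params[outcome][feature] layout are all assumptions made for the example. It shows p_0 acting as a per-outcome multiplier before normalization by Z.

```java
import java.util.Arrays;

// Hypothetical sketch; none of these names come from the MaxEnt package.
class PriorMaxentSketch {

    /**
     * Returns p(o) = (1/Z) * p0(o) * exp(sum_i params[o][i] * feats[i])
     * for every outcome o.
     */
    static double[] distribution(double[] prior, double[][] params, double[] feats) {
        int numOutcomes = prior.length;
        double[] p = new double[numOutcomes];
        double z = 0.0; // normalization constant Z
        for (int o = 0; o < numOutcomes; o++) {
            double score = 0.0;
            for (int i = 0; i < feats.length; i++) {
                score += params[o][i] * feats[i];
            }
            p[o] = prior[o] * Math.exp(score); // prior multiplies the exponential
            z += p[o];
        }
        for (int o = 0; o < numOutcomes; o++) {
            p[o] /= z;
        }
        return p;
    }

    public static void main(String[] args) {
        double[] prior = {0.7, 0.3};                  // illustrative p_0 over two outcomes
        double[][] params = {{0.0, 0.0}, {0.0, 0.0}}; // untrained parameters
        double[] feats = {1.0, 1.0};
        System.out.println(Arrays.toString(distribution(prior, params, feats)));
    }
}
```

With all parameters at zero the exponent vanishes and the result is just the normalized prior, which is why p_0 is a prior on the outcomes rather than on the parameters.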

So is there any other interface that can apply a prior on the parameters (param_i)?

Thanks and regards,

Yifan

- Thomas Morton - 2009-03-13
  
  Hi,
  There is an option (useGaussianSmoothing) in the trainer (opennlp.maxent.GISTrainer) to use a Gaussian prior when updating the parameters. You'd have to change the code to turn it on. I've done some experiments with POS tagging, but it didn't improve performance, so I didn't bother making it a formal option.
  
  The sigma variable is set based on the results in:
  Investigating GIS and Smoothing for Maximum Entropy Taggers, Clark and Curran (2002).
  http://acl.ldc.upenn.edu/E/E03/E03-1071.pdf
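  
  As a rough illustration of what a Gaussian prior does to a GIS-style update, here is a minimal Java sketch. It is not the GISTrainer code; the class and method names and the use of Newton's method are assumptions for the example. It assumes the smoothed step delta for one parameter solves observed = model * exp(C * delta) + (param + delta) / sigma^2, where observed and model are the empirical and model expected feature counts, C is the GIS correction constant, and sigma^2 is the prior variance. Check the paper above for the exact normalization.
  
  ```java
  // Hypothetical sketch; not taken from opennlp.maxent.GISTrainer.
  class GaussianGisUpdateSketch {
  
      /** Plain GIS step: delta = (1/C) * log(observed / model). */
      static double plainUpdate(double observed, double model, double c) {
          return Math.log(observed / model) / c;
      }
  
      /**
       * Smoothed step: solves
       *   model * exp(c * delta) + (param + delta) / sigmaSquared - observed = 0
       * for delta with Newton's method, starting from delta = 0.
       * No guard against overflow for extreme inputs.
       */
      static double gaussianUpdate(double observed, double model, double param,
                                   double c, double sigmaSquared) {
          double delta = 0.0;
          for (int iter = 0; iter < 50; iter++) {
              double expTerm = model * Math.exp(c * delta);
              double f = expTerm + (param + delta) / sigmaSquared - observed;
              double fPrime = expTerm * c + 1.0 / sigmaSquared; // always > 0
              double next = delta - f / fPrime;
              if (Math.abs(next - delta) < 1e-9) {
                  return next;
              }
              delta = next;
          }
          return delta;
      }
  
      public static void main(String[] args) {
          // Illustrative numbers only.
          System.out.println("plain:    " + plainUpdate(10.0, 5.0, 1.0));
          System.out.println("smoothed: " + gaussianUpdate(10.0, 5.0, 0.0, 1.0, 2.0));
      }
  }
  ```
  
  A small sigma^2 pulls the parameter toward zero, so the smoothed step is shorter than the plain one; as sigma^2 grows, the two coincide.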
  
  FYI: There is a prior class, but it applies a prior to the distribution over the outcomes (the distribution to which the model minimizes the KL-distance); it is not a prior on the parameters themselves.
  
  Hope this helps...Tom
  
  - yifan peng - 2009-03-13
    
    Hi Tom,
    
    Thanks very much, it does help.
    
    I am doing experiments with POS tagging too. I compared the performance of MaxEnt with another implementation contributed by Zhang Le (http://homepages.inf.ed.ac.uk/s0450736/maxent.html). The training set is exactly the same, but the latter gets a higher accuracy.
    
    I found that Zhang Le uses a Gaussian prior for smoothing, so I would like to test whether that is the reason.
    
    I will keep you posted once I have the results.
    
    Regards,
    Yifan
    
    - Thomas Morton - 2009-03-13
      
      Hi,
      I'd be curious what data set you used and what the differences in results were. The experiments I did were on a larger set of data than Penn Treebank 02-21, but I may have messed something up.
      
      Thanks...Tom
      
- yifan peng - 2009-03-14
  
  Hi,
  
  I am working on a Chinese POS tagging task. The training set is China Daily 2000, which has about 15M words (about 60M in size), and there are 39 POS tags in total. MaxEnt worked well on it, but not any more. It may depend on the performance of my server. Therefore, I tried to use incremental learning. I think the method "setParameter" in GISTrain is a good start, but I am not sure whether the results would be better.
  
  Any suggestions? If you would like to know more details, maybe we could chat by email or IM such as MSN.
  
  Email: pengyf@cis.pku.edu.cn
  MSN: pengyifan0803@hotmail.com
  
  Regards,
  
  Yifan
  
  - Thomas Morton - 2009-03-15
    
    Hi,
    
    > China Daily 2000, which has about 15M words ...
    > MaxEnt worked well on it, but not any more. It may depend on the performance of my server.
    
    Yeah, you would need a lot of memory to get maxent to load something of that size. My POS corpus is only 1.5M words.
    
    I've been working on a perceptron classifier for the next version of the maxent package, which could be run in an online fashion. I have been doing some experiments this evening and have integrated it into the POS tagger with pretty good results. I didn't set it up to work in an online fashion (it loads all the events into memory the way maxent does), but it would probably make sense to do that.
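    
    To make the idea of an online learner concrete, here is a minimal Java sketch of a multiclass perceptron that updates its weights one event at a time, so the full event set never has to sit in memory. It is not the classifier being written for the maxent package; the class name, the string-feature representation, and the unaveraged update are assumptions for illustration.
    
    ```java
    import java.util.HashMap;
    import java.util.Map;
    
    // Hypothetical sketch; not the maxent package's perceptron.
    class OnlinePerceptronSketch {
        private final int numOutcomes;
        // One weight per (feature, outcome) pair, created lazily.
        private final Map<String, double[]> weights = new HashMap<>();
    
        OnlinePerceptronSketch(int numOutcomes) {
            this.numOutcomes = numOutcomes;
        }
    
        /** Returns the highest-scoring outcome; ties go to the lowest index. */
        int predict(String[] context) {
            double[] scores = new double[numOutcomes];
            for (String feature : context) {
                double[] w = weights.get(feature);
                if (w == null) {
                    continue; // unseen feature contributes nothing
                }
                for (int o = 0; o < numOutcomes; o++) {
                    scores[o] += w[o];
                }
            }
            int best = 0;
            for (int o = 1; o < numOutcomes; o++) {
                if (scores[o] > scores[best]) {
                    best = o;
                }
            }
            return best;
        }
    
        /** Processes a single event; weights change only on a mistake. */
        void update(String[] context, int gold) {
            int guess = predict(context);
            if (guess == gold) {
                return;
            }
            for (String feature : context) {
                double[] w = weights.computeIfAbsent(feature, k -> new double[numOutcomes]);
                w[gold] += 1.0;  // reward the correct outcome
                w[guess] -= 1.0; // penalize the wrong guess
            }
        }
    
        public static void main(String[] args) {
            OnlinePerceptronSketch tagger = new OnlinePerceptronSketch(2);
            tagger.update(new String[]{"w=the"}, 0);
            tagger.update(new String[]{"w=dog"}, 1);
            System.out.println(tagger.predict(new String[]{"w=dog"}));
        }
    }
    ```
    
    Because update only needs the current event, events could be streamed from disk in several passes instead of being loaded up front.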
    
    Thanks...Tom
    
    - yifan peng - 2009-03-15
      
      Hi
      
      I am really fascinated by the online approach. The off-line version is tedious when applied to a larger corpus.
      
      I am trying to load an initial set of parameters from my previous model instead of starting from 0. It converges, but I am not sure whether it is still a kind of "off-line" training.
      
      Yifan
      