
A Gaussian Prior for Smoothing MaxEnt

yifan peng
2009-03-13
2013-04-11
  • yifan peng

    yifan peng - 2009-03-13

    Hi everyone,

    I would like to know whether MaxEnt supports "applying a Gaussian prior on the parameters to smooth maximum entropy models."

    I found a Prior interface in MaxEnt and I thought it represents p_0 in the following equation:

    p(x) = 1/Z p_0 exp [ sum_i param_i * feat_i ]

    which adds a prior on the outcomes based on some contexts.

    So is there any other interface that can apply a prior on the parameters?

    Thanks and regards,

    Yifan
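
To make the equation in the post above concrete, here is a minimal Python sketch of a distribution with a prior p_0 on the outcomes. It is a toy illustration only, not OpenNLP code: the function name `maxent_with_prior`, the tag names, and all numbers are hypothetical.

```python
import math

def maxent_with_prior(params, feats_by_outcome, prior):
    """Return p(o) proportional to prior[o] * exp(sum_i param_i * feat_i(o))."""
    scores = {
        o: prior[o] * math.exp(sum(params[i] * v for i, v in feats.items()))
        for o, feats in feats_by_outcome.items()
    }
    z = sum(scores.values())  # the normalizer Z
    return {o: s / z for o, s in scores.items()}

# Hypothetical toy setup: two outcomes, two features indexed 0 and 1.
params = [0.5, -0.25]
feats_by_outcome = {"NN": {0: 1.0}, "VB": {1: 1.0}}

p_uniform = maxent_with_prior(params, feats_by_outcome, {"NN": 0.5, "VB": 0.5})
p_skewed = maxent_with_prior(params, feats_by_outcome, {"NN": 0.9, "VB": 0.1})
```

With the same parameters, the skewed prior shifts probability mass toward "NN". The prior acts on the outcomes, while the parameters themselves are left untouched, which is why it does not smooth them.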

     
    • Thomas Morton

      Thomas Morton - 2009-03-13

      Hi,
         There is an option (useGaussianSmoothing) in the trainer (opennlp.maxent.GISTrainer) to use a Gaussian prior when updating the parameters.  You'd have to change the code to turn it on.  I've done some experiments with POS tagging, but it didn't improve performance, so I didn't bother making it a formal option.

      The sigma variable is set based on the results in:
      Investigating GIS and Smoothing for Maximum Entropy Taggers, Curran and Clark (2003).
      http://acl.ldc.upenn.edu/E/E03/E03-1071.pdf

      FYI: There is a prior class, but it applies a prior to the distribution over the outcomes, to which the model minimizes the KL-distance; it is not a prior on the parameters themselves.

      Hope this helps...Tom
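
As a rough illustration of what a Gaussian prior on the parameters does, the sketch below subtracts sum_i w_i^2 / (2 * sigma2) from the log-likelihood, which adds a -w_i / sigma2 term to each gradient component. Note the assumptions: it uses plain gradient ascent rather than GIS, it is not the GISTrainer code, and the names `probs`, `train`, `sigma2` and the toy events are hypothetical.

```python
import math

def probs(w, feats_by_outcome):
    """Conditional maxent distribution over outcomes for one context."""
    scores = {o: math.exp(sum(w[i] for i in fs)) for o, fs in feats_by_outcome.items()}
    z = sum(scores.values())
    return {o: s / z for o, s in scores.items()}

def train(events, n_feats, sigma2=None, steps=500, lr=0.1):
    """Gradient ascent on the log-likelihood, with a Gaussian prior if sigma2 is set."""
    w = [0.0] * n_feats
    for _ in range(steps):
        grad = [0.0] * n_feats
        for feats_by_outcome, gold in events:
            p = probs(w, feats_by_outcome)
            for i in feats_by_outcome[gold]:
                grad[i] += 1.0        # observed feature count
            for o, fs in feats_by_outcome.items():
                for i in fs:
                    grad[i] -= p[o]   # expected feature count under the model
        if sigma2 is not None:
            # Gaussian prior term: pulls every weight back toward zero.
            grad = [g - wi / sigma2 for g, wi in zip(grad, w)]
        w = [wi + lr * g for wi, g in zip(w, grad)]
    return w

# Toy data: feature 0 fires for outcome "A", feature 1 for "B"; "A" is always correct.
events = [({"A": [0], "B": [1]}, "A")] * 3
w_plain = train(events, 2)
w_smooth = train(events, 2, sigma2=1.0)
```

On this separable toy data the unsmoothed weights keep growing with more iterations, while the smoothed weights settle at a finite value. Whether that helps on a real tagging task is an empirical question, as the experiments mentioned above suggest.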

       
      • yifan peng

        yifan peng - 2009-03-13

        Hi Tom,

        Thanks very much, it does help.

        I am doing experiments with POS tagging too. I compared the performance of MaxEnt with another implementation contributed by Zhang Le (http://homepages.inf.ed.ac.uk/s0450736/maxent.html). The training set is exactly the same, but the latter gets a higher accuracy.

        I found that Zhang Le uses a Gaussian prior for smoothing. Therefore, I would like to test whether that is the reason.

        I will keep you posted once I have the results.

        Regards,
        Yifan

         
        • Thomas Morton

          Thomas Morton - 2009-03-13

          Hi,
             I'd be curious what data set you used and what the differences in results were.  The experiments I did were on a larger set of data than Penn Treebank 02-21, but I may have messed something up.

          Thanks...Tom

           
          • yifan peng

            yifan peng - 2009-03-14

            Hi,

            I am working on a Chinese POS tagging task, and the training set is the China Daily 2000, which has about 15M words, 60M in size. The number of POS tags is 39 in total. The MaxEnt works well on it, but not any more. It may depend on the performance of my server. Therefore, I tried to use incremental learning. I think the method "setParameter" in GISTrainer is a good start, but I am not sure if the results would be better.

            Any suggestions? If you would like to know more details, maybe we could chat by email or IM such as MSN.

            Email: pengyf@cis.pku.edu.cn
            MSN: pengyifan0803@hotmail.com

            Regards,

            Yifan

             
      • Thomas Morton

        Thomas Morton - 2009-03-15

        Hi,

        > China Daily 2000, which has about 15M words ...
        > The MaxEnt works well on it, but not any more. It may depend on the performance of my server.

        Yeah, you would need a lot of memory to get maxent to load something of that size.  My POS corpus is only 1.5M words.

        I've been working on a perceptron classifier for the next version of the maxent package, which could be run in an online fashion.  I did some experiments this evening and integrated it into the POS tagger with pretty good results.  I didn't set it up to work in an online fashion (it loads all the events into memory the way maxent does), but it would probably make sense to do that.

        Thanks...Tom
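
For readers unfamiliar with the idea, a perceptron can be trained online because each update looks at a single event. The following is a minimal sketch of a multiclass perceptron that consumes events from a generator instead of a list held in memory. It is not the classifier described above; `train_perceptron`, `predict`, `toy_events` and the feature strings are hypothetical names.

```python
def predict(w, feats, outcomes):
    """Pick the outcome with the highest summed weight over the active features."""
    return max(outcomes, key=lambda o: sum(w.get((f, o), 0.0) for f in feats))

def train_perceptron(event_stream, outcomes, epochs=1):
    """Update the weights after every misclassified event, one event at a time."""
    w = {}
    for _ in range(epochs):
        for feats, gold in event_stream():   # events are streamed, not stored
            guess = predict(w, feats, outcomes)
            if guess != gold:
                for f in feats:
                    w[(f, gold)] = w.get((f, gold), 0.0) + 1.0
                    w[(f, guess)] = w.get((f, guess), 0.0) - 1.0
    return w

def toy_events():
    """Stand-in for reading events from disk one line at a time."""
    yield ["w=the"], "DT"
    yield ["w=dog", "prev=DT"], "NN"
    yield ["w=runs", "prev=NN"], "VB"

outcomes = ["DT", "NN", "VB"]
weights = train_perceptron(toy_events, outcomes, epochs=2)
```

Only the weight table has to fit in memory, so the size of the corpus affects training time rather than memory use.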

         
        • yifan peng

          yifan peng - 2009-03-15

          Hi

          I am really interested in the online approach. The off-line version is tedious when applied to a larger corpus.

          I am trying to load an initial set of parameters from my previous model instead of starting from 0.  It will converge, but I am not sure whether it is still a kind of "off-line" training.

          Yifan
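
The warm-start idea in the post above can be sketched with a tiny one-feature, two-outcome model. Because the maxent log-likelihood is concave, starting from a previous model's parameters should reach the same optimum as starting from 0 and mainly changes the number of iterations; each pass still uses the full data, so it remains batch training. The sketch assumes plain gradient ascent rather than GIS, and `fit` and the toy events are hypothetical.

```python
import math

def fit(events, w=0.0, lr=0.1, tol=1e-6, max_steps=10000):
    """Gradient ascent on the log-likelihood; returns (weight, steps taken)."""
    for step in range(max_steps):
        # Gradient: sum over events of (observed outcome - predicted probability) * feature.
        grad = sum((y - 1.0 / (1.0 + math.exp(-w * x))) * x for x, y in events)
        if abs(grad) < tol:
            return w, step
        w += lr * grad
    return w, max_steps

# Toy data: (feature value, outcome). The "old" model saw only the first four events.
old_events = [(1.0, 1)] * 3 + [(1.0, 0)]
all_events = old_events + [(1.0, 1)] * 2 + [(1.0, 0)]

w_old, _ = fit(old_events)
w_cold, cold_steps = fit(all_events)            # start from 0
w_warm, warm_steps = fit(all_events, w=w_old)   # start from the previous model
```

Comparing `cold_steps` with `warm_steps` shows how much the initialization saves on this toy problem; the fitted weights agree up to the tolerance.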

           
