Hi everyone,
I would like to know whether it is possible in MaxEnt to apply a Gaussian prior on the parameters to smooth maximum entropy models.
I found a Prior interface in MaxEnt, and I thought it represents p_0 in the following equation:
p(x) = (1/Z) * p_0(x) * exp( sum_i param_i * feat_i(x) )
which adds a prior over the outcomes given the context.
So is there any other interface that can apply a prior on the parameters?
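To make the equation concrete, here is a toy numeric check (my own illustration, not OpenNLP code): the prior p_0 rescales the exponential model for each outcome before normalization by Z.

```java
// Toy check of: p(x) = (1/Z) * p0(x) * exp(sum_i param_i * feat_i(x)).
// Illustrative only; class and method names are hypothetical.
public class PriorEquation {

    // Normalized distribution over outcomes, given a prior p0 per outcome
    // and the linear score sum_i param_i * feat_i for each outcome.
    public static double[] distribution(double[] p0, double[] scores) {
        double[] p = new double[p0.length];
        double z = 0.0; // normalization constant Z
        for (int i = 0; i < p0.length; i++) {
            p[i] = p0[i] * Math.exp(scores[i]); // prior times exponential model
            z += p[i];
        }
        for (int i = 0; i < p.length; i++) {
            p[i] /= z;
        }
        return p;
    }

    public static void main(String[] args) {
        // With identical scores, a uniform prior leaves outcomes equally likely,
        // while a skewed prior shifts probability mass toward its favored outcome.
        double[] uniform = distribution(new double[]{0.5, 0.5}, new double[]{1.0, 1.0});
        double[] skewed = distribution(new double[]{0.9, 0.1}, new double[]{1.0, 1.0});
        System.out.println(uniform[0] + " " + skewed[0]);
    }
}
```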
Thanks and regards,
Yifan
Hi,
There is an option (useGaussianSmoothing) in the trainer (opennlp.maxent.GISTrainer) to use a Gaussian prior when updating the parameters. You'd have to change the code to turn it on. I've done some experiments with POS tagging, but it didn't improve performance, so I didn't bother making it a formal option.
The sigma variable is set based on the results in:
Investigating GIS and Smoothing for Maximum Entropy Taggers, Clark and Curran (2002).
http://acl.ldc.upenn.edu/E/E03/E03-1071.pdf
FYI: there is a Prior class, but that applies a prior to the distribution over the outcomes, to which the model minimizes the KL-distance; it is not a prior on the parameters themselves.
Hope this helps...Tom
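For readers following along, here is a minimal sketch of what a Gaussian prior does to a maxent parameter update (my own illustration under simplified assumptions, not the actual GISTrainer code): with prior variance sigma^2, the gradient of the penalized log-likelihood gains a -param/sigma^2 term, which shrinks parameters toward zero.

```java
// Illustrative only: the effect of Gaussian smoothing on one parameter.
// Without smoothing, the update is driven by (observed - expected) feature
// counts; the Gaussian prior adds a -param/sigma^2 shrinkage term.
public class GaussianPriorSketch {

    // One gradient-style update for a single parameter.
    static double update(double param, double observed, double expected,
                         double sigmaSquared, double stepSize) {
        double gradient = observed - expected - param / sigmaSquared;
        return param + stepSize * gradient;
    }

    public static void main(String[] args) {
        // Even when observed and expected counts agree, a large parameter
        // is pulled back toward zero when smoothing is on.
        double smoothed = update(5.0, 10.0, 10.0, 1.0, 0.5);
        // With infinite variance the prior term vanishes (no smoothing).
        double unsmoothed = update(5.0, 10.0, 10.0, Double.POSITIVE_INFINITY, 0.5);
        System.out.println(smoothed + " " + unsmoothed);
    }
}
```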
Hi Tom,
Thanks very much, it does help.
I am doing experiments with POS tagging too. I compared the performance of MaxEnt with another package contributed by Zhang Le (http://homepages.inf.ed.ac.uk/s0450736/maxent.html). The training set is exactly the same, but the latter gets a higher accuracy.
I found that Zhang Le uses a Gaussian prior for smoothing, so I would like to test whether that is the reason.
I will keep you posted once I have the results.
Regards,
Yifan
Hi,
I'd be curious what data set you used and what the differences in results were. The experiments I did were on a larger set of data than Penn Treebank sections 02-21, but I may have messed something up.
Thanks...Tom
Hi,
I am working on a Chinese POS-tagging task, and the training set is the China Daily 2000 corpus, which has about 15M words and is 60M in size; the number of POS tags is 39 in total. MaxEnt works well on smaller sets, but not any more on this one; it may depend on the capacity of my server. Therefore, I tried the technique of incremental learning. I think the "setParameter" method in GISTrainer is a good starting point, but I am not sure whether the results would be better.
Any suggestions? If you would like to know more details, maybe we could chat through email or IM such as MSN.
Email: pengyf@cis.pku.edu.cn
MSN: pengyifan0803@hotmail.com
Regards,
Yifan
Hi,
> China Daily 2000, which has about 15M words ...
> The MaxEnt works well on it, but not any more. It may depends on the performance of my server.
Yeah, you would need a lot of memory to get maxent to load something of that size. My POS corpus is only 1.5M words.
I've been working on a perceptron classifier for the next version of the maxent package, which could be run in an online fashion. I did some experiments this evening and integrated it into the POS tagger with pretty good results. I didn't set it up to work in an online fashion (it loads all the events into memory the way maxent does), but it would probably make sense to do that.
Thanks...Tom
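The online idea can be sketched roughly like this (my own illustration, not the actual OpenNLP perceptron code): each training event is processed once and the weights are updated immediately, so the full event set never has to be held in memory.

```java
// A minimal online multiclass perceptron sketch (hypothetical class name).
// On a mistake, promote the correct outcome's weights over the active
// features and demote the wrongly predicted outcome's weights.
public class OnlinePerceptronSketch {
    private final double[][] weights; // weights[outcome][feature]

    public OnlinePerceptronSketch(int numOutcomes, int numFeatures) {
        weights = new double[numOutcomes][numFeatures];
    }

    // Score each outcome as the sum of its weights over the active features;
    // ties break toward the lowest-numbered outcome.
    public int predict(int[] activeFeatures) {
        int best = 0;
        double bestScore = Double.NEGATIVE_INFINITY;
        for (int o = 0; o < weights.length; o++) {
            double score = 0.0;
            for (int f : activeFeatures) score += weights[o][f];
            if (score > bestScore) { bestScore = score; best = o; }
        }
        return best;
    }

    // One online update: only touches the weights, not a stored event list.
    public void update(int[] activeFeatures, int correctOutcome) {
        int predicted = predict(activeFeatures);
        if (predicted != correctOutcome) {
            for (int f : activeFeatures) {
                weights[correctOutcome][f] += 1.0;
                weights[predicted][f] -= 1.0;
            }
        }
    }
}
```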
Hi
I am really fascinated by the online fashion; the off-line version is tedious when applied to a larger corpus.
I am trying to load an initial set of parameters from my previous model instead of starting from 0. It will converge, but I am not sure whether that is still just a kind of "off-line" learning.
Yifan
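The warm-start idea above can be sketched like this (a hypothetical illustration, not actual OpenNLP code): initialize the parameter vector from a previously trained model instead of zeros, then continue training from there.

```java
// Hypothetical warm-start sketch: copy saved parameters into the new
// parameter vector where available, leaving the rest at the zero default.
public class WarmStartSketch {

    // Initialize a parameter vector of the given size, optionally seeding
    // it from a previous model's parameters.
    public static double[] initParams(int size, double[] previous) {
        double[] params = new double[size]; // Java zero-initializes arrays
        if (previous != null) {
            int n = Math.min(size, previous.length);
            System.arraycopy(previous, 0, params, 0, n);
        }
        return params;
    }

    public static void main(String[] args) {
        double[] cold = initParams(3, null);                          // fresh start
        double[] warm = initParams(3, new double[]{0.7, -0.2, 0.1});  // warm start
        System.out.println(cold[0] + " " + warm[0]);
    }
}
```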