The OpenNLP Maximum Entropy Package

Code

Programming Languages: C++, Java

License: Apache License V2.0, GNU Library or Lesser General Public License (LGPL)

Repositories

cvs -d:pserver:anonymous@maxent.cvs.sourceforge.net:/cvsroot/maxent login

cvs -z3 -d:pserver:anonymous@maxent.cvs.sourceforge.net:/cvsroot/maxent co -P modulename

What's happening?

  • Interpreting model file

    Hi, I have been using maxent for a sentiment analysis application for a while and it has been working well. I am quite familiar with maximum entropy modelling, but I am trying to better understand the model file to see where things may be going wrong, so that I can improve my application. I have read...

    2009-12-17 22:17:05 UTC by chrisnicholls

  • Followup: RE: Training models with large datasets

    Yeah, it may well be that even getting it to deal with multimillion events takes prohibitively long (that being said, I'm happy to let it run for half a day or so if it gets a high quality model out at the end of it! I don't need fast training). If I can get this working I'll report back on training times.

    2009-12-13 15:36:09 UTC by drmaciver

  • Followup: RE: Training models with large datasets

    Note that the MaxEnt parameter optimization algorithm (GIS) requires many sums over the whole data set to compute model expectations. If you are using large amounts of data, computing those sums might kill your performance. To my knowledge the convergence characteristics of GIS are not well understood. It might require only a few iterations to converge, or it might require many iterations. I.

    2009-12-13 14:39:29 UTC by d_burfoot
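The point above about GIS needing sums over the whole data set can be illustrated with a minimal sketch. This is not the maxent package's code: the class `ExpectationSketch`, its method names, and the dense `weights[outcome][feature]` layout are all assumptions made for illustration, and features are assumed to be binary. It shows only the expectation pass that each iteration has to repeat, not the full GIS update.

```java
import java.util.Arrays;

// Hypothetical illustration, not part of the maxent package.
final class ExpectationSketch {

    // p(outcome | event) for binary features; weights[o][f] is the
    // weight of feature f for outcome o.
    static double[] probabilities(double[][] weights, int[] activeFeatures) {
        double[] scores = new double[weights.length];
        double norm = 0.0;
        for (int o = 0; o < weights.length; o++) {
            double sum = 0.0;
            for (int f : activeFeatures) {
                sum += weights[o][f];
            }
            scores[o] = Math.exp(sum);
            norm += scores[o];
        }
        for (int o = 0; o < scores.length; o++) {
            scores[o] /= norm;
        }
        return scores;
    }

    // One full pass over the data: expected[o][f] accumulates p(o | event)
    // for every event in which feature f is active. Each training
    // iteration repeats this pass, so its cost grows with the event count.
    static double[][] modelExpectations(double[][] weights, int[][] events) {
        double[][] expected = new double[weights.length][weights[0].length];
        for (int[] activeFeatures : events) {
            double[] p = probabilities(weights, activeFeatures);
            for (int o = 0; o < p.length; o++) {
                for (int f : activeFeatures) {
                    expected[o][f] += p[o];
                }
            }
        }
        return expected;
    }

    public static void main(String[] args) {
        double[][] weights = new double[2][3]; // all zeros: a uniform model
        int[][] events = { {0, 1}, {1, 2}, {0} };
        System.out.println(Arrays.deepToString(modelExpectations(weights, events)));
    }
}
```

With millions of events, the outer loop in `modelExpectations` is the part that dominates each iteration.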

  • Followup: RE: Training models with large datasets

    Here's a first draft: It requires some work to make fully polished. In particular I've reused ComparableEvent for part of a public facing API, and from the looks of it it was designed to be internal only. But let me know what you think: http://github.com/DRMacIver/maxent/commit/723af563aec0c844419826a1ff647fac409c8b27 Basically I've modified this in a backwards compatible way so that as...

    2009-12-13 14:22:31 UTC by drmaciver

  • Followup: RE: Training models with large datasets

    The model is the one for http://github.com/DRMacIver/term-extractor. I've got a training set built out of sentences from wikipedia and an earlier rule based version of the term extractor. I am using a TwoPassDataIndexer. The problem seems to be not the number of features (which is large but not ridiculous - probably of a similar order to what you'd expect for a pos tagger). But there are a...

    2009-12-12 19:05:15 UTC by drmaciver

  • Followup: RE: Training models with large datasets

    Hi, are you using the TwoPassDataIndexer? This will typically allow you to load larger event spaces. It makes one pass over the event space to determine feature count cut-offs and writes the events to a temp file. Then in the second pass it loads the events into memory but represents them as ints, so the string representations never need to be loaded into memory. If you...

    2009-12-12 18:42:04 UTC by tsmorton
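The two-pass idea described above can be sketched roughly as follows. This is an illustration of the technique only, not the actual TwoPassDataIndexer: the class `TwoPassSketch` and its methods are hypothetical, and for brevity the events are re-read from an in-memory `Iterable` rather than from a temp file.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical illustration, not the maxent TwoPassDataIndexer.
final class TwoPassSketch {

    // Pass 1: count feature occurrences and give an int id to every
    // feature seen at least `cutoff` times.
    static Map<String, Integer> buildIndex(Iterable<String[]> events, int cutoff) {
        Map<String, Integer> counts = new HashMap<>();
        for (String[] features : events) {
            for (String f : features) {
                counts.merge(f, 1, Integer::sum);
            }
        }
        Map<String, Integer> index = new HashMap<>();
        for (Map.Entry<String, Integer> e : counts.entrySet()) {
            if (e.getValue() >= cutoff) {
                index.put(e.getKey(), index.size());
            }
        }
        return index;
    }

    // Pass 2: re-read the events and keep only the int ids, dropping
    // features that fell below the cut-off.
    static List<int[]> encode(Iterable<String[]> events, Map<String, Integer> index) {
        List<int[]> encoded = new ArrayList<>();
        for (String[] features : events) {
            int[] ids = new int[features.length];
            int n = 0;
            for (String f : features) {
                Integer id = index.get(f);
                if (id != null) {
                    ids[n++] = id;
                }
            }
            encoded.add(Arrays.copyOf(ids, n));
        }
        return encoded;
    }

    public static void main(String[] args) {
        List<String[]> events = Arrays.asList(
                new String[] {"a", "b"}, new String[] {"a", "c"}, new String[] {"a", "b"});
        Map<String, Integer> index = buildIndex(events, 2);
        for (int[] ids : encode(events, index)) {
            System.out.println(Arrays.toString(ids));
        }
    }
}
```

Only the int arrays returned by `encode` need to stay in memory for training; the feature strings are needed only while building the index.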

  • Training models with large datasets

    Hi. I've been trying to train a model with a very large number of events: on the order of a few million. I have them stored in a file suitable for use with FileEventStream. So far I've been unable to get this to work. It ends up using a really dramatic amount of memory - more than I can reasonably give to the JVM to use. Most of this seems to be taken up with arrays of ints for the...

    2009-12-12 14:54:31 UTC by drmaciver

  • Comment: temporary files remain after training

    It might be better to introduce a more generic fix, such as a .close() method in AbstractEventStream, so that the resources the stream allocates on creation are released explicitly when .close() is called.

    2009-10-02 14:17:40 UTC by autayeu

  • temporary files remain after training

    After training, temporary files like events37537375696294525.tmp remain in the temporary directory. AFAIK, the problem is: TwoPassDataIndexer -> opens a reader in the FileEventStream -> reader remains open -> file.delete() does not work. A possible solution might be to remove deleteOnExit(), due to its unreliability and memory consumption, and add an explicit .close() call: Left base folder...

    2009-10-02 12:21:26 UTC by autayeu
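The explicit-close approach suggested in this thread might look something like the sketch below. It is not the maxent code: `TempEventFile` and its methods are hypothetical names, and the sketch assumes a Java version with `AutoCloseable` and `java.nio.file` (Java 7 or later). The reader is closed before the file is deleted, since an open reader is what the report above identifies as blocking the delete.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Arrays;

// Hypothetical illustration, not part of the maxent package.
final class TempEventFile implements AutoCloseable {
    private final Path path;
    private final BufferedReader reader;

    // Writes the events to a temp file and opens a reader on it.
    TempEventFile(Iterable<String> lines) throws IOException {
        path = Files.createTempFile("events", ".tmp");
        Files.write(path, lines, StandardCharsets.UTF_8);
        reader = Files.newBufferedReader(path, StandardCharsets.UTF_8);
    }

    // Returns the next event line, or null at end of file.
    String nextEvent() throws IOException {
        return reader.readLine();
    }

    Path path() {
        return path;
    }

    // Close the reader first, then delete the file, rather than
    // relying on deleteOnExit().
    @Override
    public void close() throws IOException {
        try {
            reader.close();
        } finally {
            Files.deleteIfExists(path);
        }
    }

    public static void main(String[] args) throws IOException {
        Path p;
        try (TempEventFile events = new TempEventFile(Arrays.asList("a b", "c d"))) {
            p = events.path();
            System.out.println(events.nextEvent());
        }
        System.out.println("still exists: " + Files.exists(p));
    }
}
```

Using try-with-resources, as in `main`, means the temp file is removed as soon as training code is done with the stream, even if an exception is thrown.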

  • Followup: RE: why results are so weird

    Hi, take a look at this thread: https://sourceforge.net/projects/maxent/forums/forum/18385/topic/1925312 Also note that a feature with a value of "0" is ignored by the model. Hope this helps...Tom.

    2009-09-20 13:26:41 UTC by tsmorton