Thomas Morton

What's happening?

  • Followup: RE: Name Finder XML

    Hi, Sorry this fell off my radar for a couple of days. I was mistaken and this isn't really a bug so much as the way things work when you tag the sentence independently. This thread describes a way to remove one of the names in cases like this: http://sourceforge.net/projects/opennlp/forums/forum/9943/topic/2639839 Hope this helps...Tom.

    2009-12-16 02:50:58 UTC in OpenNLP

  • Followup: RE: Training models with large datasets

    Hi, Are you using the TwoPassDataIndexer? This will typically allow you to load larger event spaces. It makes one pass over the event space to determine feature count cut-offs and writes the events to a temp file. Then in the second pass it loads the events into memory but represents them as ints, so the string representations never need to be loaded into memory. If you...

    2009-12-12 18:42:04 UTC in The OpenNLP Maximum Entropy Package
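
The two-pass idea described in this post can be sketched roughly as follows. This is a minimal, self-contained illustration and not the TwoPassDataIndexer code itself: the function name `two_pass_index`, the event format (an outcome string plus a list of whitespace-free feature strings), and the one-event-per-line temp file layout are all assumptions made for the example.

```python
import tempfile
from collections import Counter


def two_pass_index(events, cutoff):
    """Index (outcome, features) events in two passes.

    Hypothetical helper: assumes outcome and feature strings contain
    no whitespace. Returns (indexed_events, feature_ids, outcome_ids).
    """
    counts = Counter()
    with tempfile.TemporaryFile(mode="w+", encoding="utf-8") as tmp:
        # Pass 1: count feature occurrences and spool each event to disk,
        # so the full list of string events is never held in memory.
        for outcome, features in events:
            counts.update(features)
            tmp.write(" ".join([outcome, *features]) + "\n")

        # Assign an int id to every feature that meets the cut-off.
        feature_ids = {}
        for feature, count in counts.items():
            if count >= cutoff:
                feature_ids[feature] = len(feature_ids)

        # Pass 2: re-read the events and keep only int representations.
        tmp.seek(0)
        outcome_ids = {}
        indexed = []
        for line in tmp:
            outcome, *features = line.split()
            kept = [feature_ids[f] for f in features if f in feature_ids]
            if kept:  # drop events whose features were all cut
                oid = outcome_ids.setdefault(outcome, len(outcome_ids))
                indexed.append((oid, kept))
    return indexed, feature_ids, outcome_ids
```

Because `events` can be a generator, the strings for each event exist only briefly during the first pass; only the feature and outcome vocabularies stay in memory as strings.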

  • Followup: RE: coreference resolution

    Hi, I think this is a case where it's hard for the coreference component to distinguish this "it" from a pleonastic one, because the adjective "noisy" is pushed out of the 5-word feature window by the adverb "very". ... [quiz , it was very] noisy looks identical to: ... [quiz , it was very] sunny So this is just a hard example for the model, not a bug. Hope this helps...Tom.

    2009-12-11 14:03:47 UTC in OpenNLP

  • Followup: RE: Full Parsing

    Hi, Yeah the problem is that you are providing pos-tags and the opennlp parser doesn't want them. Just give it the tokens, and it will provide the pos-tags. The tag sequence is optimized based on the parse so it is more accurate to perform these steps together. Here is what the input and output will look like: java -mx500M opennlp.tools.lang.english.TreebankParser -d...

    2009-12-11 02:39:37 UTC in OpenNLP

  • Followup: RE: Name Finder XML

    Hi, That looks like a bug. I'll take a look this evening. Thanks...Tom.

    2009-12-10 16:20:42 UTC in OpenNLP

  • Followup: RE: Stemming

    I'm not familiar with these tools. You might check out the Lucene forums/code base, where you are more likely to find an Arabic stemmer. Good luck...Tom.

    2009-11-23 14:41:32 UTC in OpenNLP

  • Followup: RE: OpenNLP for search Query parsing

    Hi, OpenNLP is designed more for full text, and as such its models rely on capitalization and punctuation. Search queries typically don't contain either. You probably just want to use some sort of dictionary approach, preferring the longest matching entry in your dictionary over shorter entries that are sub-strings of it. Hope this helps...Tom.

    2009-11-23 14:39:54 UTC in OpenNLP
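
One way to read the dictionary suggestion in this post is a greedy longest-match scan over the query tokens. The sketch below is only an illustration under that reading: the function name `longest_match`, the `max_len` limit, and the lowercased, space-joined dictionary entries are assumptions, not anything taken from OpenNLP.

```python
def longest_match(tokens, dictionary, max_len=5):
    """Greedy longest-match lookup over a token list.

    Hypothetical helper: `dictionary` is a set of lowercased entries
    whose words are joined by single spaces. Returns a list of
    (start_token, end_token_exclusive, entry) tuples.
    """
    matches = []
    i = 0
    while i < len(tokens):
        # Try the longest candidate first, then shrink one token at a time.
        for n in range(min(max_len, len(tokens) - i), 0, -1):
            candidate = " ".join(tokens[i:i + n]).lower()
            if candidate in dictionary:
                matches.append((i, i + n, candidate))
                i += n  # skip past the match so sub-strings are not re-matched
                break
        else:
            i += 1  # no entry starts at this token
    return matches
```

With a dictionary containing both "new york" and "new york city", the query "cheap hotels in new york city" yields only the longer entry, because the scan tries three-token candidates before two-token ones.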

  • Followup: RE: NameFinder usage

    Hi, In the last released version, NameFinderME.find takes an array of tokens, which are strings for the words. The spans returned are offsets into that token array. If you want to map these onto character spans (which are the kind returned by the tokenizer) then you: * Get your token spans (array of spans) * Create a parallel array of token strings (array of strings) * Call...

    2009-11-23 14:35:55 UTC in OpenNLP
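
The post is cut off before its remaining steps, so the following only sketches the general idea of mapping a span over token indexes back onto character offsets. It does not call OpenNLP: the whitespace tokenizer, the function names, and the end-exclusive span convention are assumptions for illustration, and the name span is supplied by hand instead of coming from a name finder.

```python
import re


def token_char_spans(text):
    """Character (start, end) spans of whitespace-separated tokens.

    Hypothetical stand-in for a tokenizer that returns character spans.
    """
    return [(m.start(), m.end()) for m in re.finditer(r"\S+", text)]


def to_char_span(token_span, char_spans):
    """Map a (start_token, end_token) span onto character offsets.

    Assumes the token span is end-exclusive, so the last covered
    token is at index end_token - 1.
    """
    start_token, end_token = token_span
    return (char_spans[start_token][0], char_spans[end_token - 1][1])


text = "Mr. Thomas Morton wrote OpenNLP ."
char_spans = token_char_spans(text)                     # spans from the tokenizer
tokens = [text[start:end] for start, end in char_spans]  # parallel token strings
name_span = (1, 3)                                       # hand-written token span
start, end = to_char_span(name_span, char_spans)
name_text = text[start:end]
```

Keeping the token strings and the character spans in parallel arrays is what makes the lookup a pair of index operations.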

  • OpenNLP

    tsmorton committed patchset 1196 of module opennlp to the OpenNLP CVS repository, changing 2 files.

    2009-11-19 03:39:14 UTC in OpenNLP

  • Followup: RE: Chunk and Parse speed is very slow even wrapp

    Hi, I checked on this and a beam size of 5 will speed up the parser by 2-3x and reduce parsing accuracy by less than 1% in F-measure. Hope this helps...Tom.

    2009-11-17 13:45:28 UTC in OpenNLP

About Me

  • 2000-06-08 (10 years ago)
  • 39289
  • tsmorton (My Site)
  • Thomas Morton
