-
Hi,
Sorry this fell off my radar for a couple of days. I was mistaken and this isn't really a bug so much as the way things work when you tag the sentence independently. This thread describes a way to remove one of the names in cases like this:
http://sourceforge.net/projects/opennlp/forums/forum/9943/topic/2639839
Hope this helps...Tom.
2009-12-16 02:50:58 UTC in OpenNLP
-
Hi,
Are you using the TwoPassDataIndexer? This will typically allow you to load larger event spaces. It makes one pass over the event space to determine feature count cut-offs and writes the events to a temp file. Then, in the second pass, it loads the events into memory but represents them as ints, so the string representations never need to be loaded into memory.
If you...
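For illustration, the two-pass idea can be sketched roughly like this. This is not the TwoPassDataIndexer source; the class name TwoPassSketch, the event format (a list of feature strings without spaces) and the cutoff parameter are assumptions made up for the example.

```java
import java.io.BufferedReader;
import java.io.BufferedWriter;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch of two-pass indexing; not the OpenNLP implementation.
final class TwoPassSketch {

    // Each event is a list of feature strings (assumed to contain no spaces).
    // Returns every event as an array of int feature ids.
    static List<int[]> index(Iterable<List<String>> events, int cutoff) {
        Path tmp = null;
        try {
            tmp = Files.createTempFile("events", ".txt");

            // Pass 1: count features and spill the raw events to a temp file.
            Map<String, Integer> counts = new HashMap<>();
            try (BufferedWriter out = Files.newBufferedWriter(tmp, StandardCharsets.UTF_8)) {
                for (List<String> event : events) {
                    for (String feature : event) {
                        counts.merge(feature, 1, Integer::sum);
                    }
                    out.write(String.join(" ", event));
                    out.newLine();
                }
            }

            // Give an int id to every feature that meets the cutoff.
            Map<String, Integer> ids = new HashMap<>();
            for (Map.Entry<String, Integer> e : counts.entrySet()) {
                if (e.getValue() >= cutoff) {
                    ids.put(e.getKey(), ids.size());
                }
            }

            // Pass 2: re-read one event at a time and keep only the int ids.
            List<int[]> indexed = new ArrayList<>();
            try (BufferedReader in = Files.newBufferedReader(tmp, StandardCharsets.UTF_8)) {
                String line;
                while ((line = in.readLine()) != null) {
                    List<Integer> kept = new ArrayList<>();
                    for (String feature : line.split(" ")) {
                        Integer id = ids.get(feature);
                        if (id != null) {
                            kept.add(id);
                        }
                    }
                    int[] event = new int[kept.size()];
                    for (int i = 0; i < event.length; i++) {
                        event[i] = kept.get(i);
                    }
                    indexed.add(event);
                }
            }
            return indexed;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        } finally {
            if (tmp != null) {
                try {
                    Files.deleteIfExists(tmp);
                } catch (IOException ignored) {
                    // Best-effort cleanup of the temp file.
                }
            }
        }
    }
}
```

In this sketch only the feature-to-id map holds strings during the second pass; each event is stored as an int array, which is where the memory saving would come from.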
2009-12-12 18:42:04 UTC in The OpenNLP Maximum Entropy Package
-
Hi,
I think this is a case where it's hard for the coreference component to distinguish this "it" from a pleonastic one because the adjective, "noisy", is pushed out of the 5-word feature window by the adverb "very".
... [quiz , it was very] noisy
looks identical to:
... [quiz , it was very] sunny
So this is just a hard example for the model but not a bug.
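A tiny sketch of the effect, assuming the window is simply the five tokens shown in brackets above. The class WindowSketch and its helper are hypothetical and are not part of the coreference code.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical illustration of a fixed-size token window.
final class WindowSketch {

    // Returns at most `size` tokens starting at index `from`.
    static List<String> window(List<String> tokens, int from, int size) {
        int to = Math.min(tokens.size(), from + size);
        return tokens.subList(from, to);
    }

    public static void main(String[] args) {
        List<String> noisy = Arrays.asList("quiz", ",", "it", "was", "very", "noisy");
        List<String> sunny = Arrays.asList("quiz", ",", "it", "was", "very", "sunny");
        // Both windows stop at "very", so the adjective is never seen.
        System.out.println(window(noisy, 0, 5).equals(window(sunny, 0, 5))); // prints true
    }
}
```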
Hope this helps...Tom.
2009-12-11 14:03:47 UTC in OpenNLP
-
Hi,
Yeah, the problem is that you are providing POS tags and the OpenNLP parser doesn't want them. Just give it the tokens, and it will provide the POS tags. The tag sequence is optimized based on the parse, so it is more accurate to perform these steps together. Here is what the input and output will look like:
java -mx500M opennlp.tools.lang.english.TreebankParser -d...
2009-12-11 02:39:37 UTC in OpenNLP
-
Hi,
That looks like a bug. I'll take a look this evening. Thanks...Tom.
2009-12-10 16:20:42 UTC in OpenNLP
-
I'm not familiar with these tools. You might check out the Lucene forums/code base, where you are more likely to find an Arabic stemmer.
Good luck...Tom.
2009-11-23 14:41:32 UTC in OpenNLP
-
Hi,
OpenNLP is designed more for full text, and as such its models rely on capitalization and punctuation. Search queries typically don't contain either. You probably just want to use some sort of dictionary approach, matching the longest entry in your dictionary when possible over shorter entries which are sub-strings of it.
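A minimal sketch of that kind of longest-match lookup, assuming the query is already split into lowercase tokens and the dictionary holds space-joined phrases. LongestMatchSketch, its match method and the maxLen parameter are made up for this example and are not OpenNLP API.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Set;

// Hypothetical greedy longest-match dictionary lookup.
final class LongestMatchSketch {

    // Scans left to right; at each position tries the longest phrase first,
    // so a long entry wins over shorter entries contained in it.
    static List<String> match(String[] tokens, Set<String> dictionary, int maxLen) {
        List<String> found = new ArrayList<>();
        int i = 0;
        while (i < tokens.length) {
            int matched = 0;
            for (int len = Math.min(maxLen, tokens.length - i); len >= 1; len--) {
                String candidate = String.join(" ", Arrays.copyOfRange(tokens, i, i + len));
                if (dictionary.contains(candidate)) {
                    found.add(candidate);
                    matched = len;
                    break;
                }
            }
            // Skip past the match, or move one token on if nothing matched.
            i += Math.max(matched, 1);
        }
        return found;
    }
}
```

With a dictionary containing "new york", "york" and "new york city", the query tokens "hotels in new york city" would yield only "new york city" under this scheme.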
Hope this helps...Tom.
2009-11-23 14:39:54 UTC in OpenNLP
-
Hi,
In the last released version NameFinderME.find takes an array of tokens which are strings for the words. The spans returned are offsets into that token array. If you want to map these onto character spans (which are the kind returned by the tokenizer) then you:
* Get your token spans (array of spans)
* Create a parallel array of token strings (array of strings)
* Call...
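The mapping from a token-index span back to a character span can be sketched as below. This uses a stand-in Span class rather than the OpenNLP one, and it assumes span ends are exclusive; SpanMappingSketch, tokenStrings and toCharSpan are hypothetical names for illustration only.

```java
// Hypothetical sketch of mapping token-index spans onto character spans.
final class SpanMappingSketch {

    // Stand-in span type covering [start, end); the exclusive end is an assumption.
    static final class Span {
        final int start;
        final int end;

        Span(int start, int end) {
            this.start = start;
            this.end = end;
        }
    }

    // Builds the parallel array of token strings from the character spans.
    static String[] tokenStrings(String text, Span[] tokenSpans) {
        String[] tokens = new String[tokenSpans.length];
        for (int i = 0; i < tokenSpans.length; i++) {
            tokens[i] = text.substring(tokenSpans[i].start, tokenSpans[i].end);
        }
        return tokens;
    }

    // `name` indexes into the token array; the result indexes into the text.
    static Span toCharSpan(Span name, Span[] tokenSpans) {
        int charStart = tokenSpans[name.start].start;   // first token's start
        int charEnd = tokenSpans[name.end - 1].end;     // last token's end
        return new Span(charStart, charEnd);
    }
}
```

If the name finder you are using reports inclusive end indices instead, the `name.end - 1` lookup would need to become `name.end`.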
2009-11-23 14:35:55 UTC in OpenNLP
-
tsmorton committed patchset 1196 of module opennlp to the OpenNLP CVS repository, changing 2 files.
2009-11-19 03:39:14 UTC in OpenNLP
-
Hi,
I checked on this, and a beam size of 5 will speed up the parser by 2-3x and only reduce parsing accuracy by less than 1% in F-measure. Hope this helps...Tom.
2009-11-17 13:45:28 UTC in OpenNLP