-
Yeah, I did worry about the bulk issue. It adds a lot of largish features. Perhaps a better approach would be to actually stem? Requires there to be a stemmer in OpenNLP though, either your own or an external dependency, which may be more trouble than it's worth.
Adding it to the tagdict is an interesting idea which hadn't occurred to me. I might look into that. The only problem is that it's...
2009-08-15 08:53:50 UTC in OpenNLP
-
Actually, potentially useful thought:
The feature generation seems to look at suffixes, but it doesn't look at the word being suffixed (it looks at initial prefixes of up to 4 characters, but that's not so useful here).
So for "promotes" it generates the features
w=promotes suf=s suf=es suf=tes suf=otes
but perhaps it should also be generating
stem=promote stem=promot stem=promo...
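A rough sketch of the idea, not OpenNLP's actual feature generator: for each suffix of up to 4 characters, also emit whatever is left of the word once that suffix is stripped. The class name, method name and the `MAX_SUFFIX_LENGTH` constant are made up for illustration; only the `w=`, `suf=` and `stem=` labels come from the examples above.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, not part of OpenNLP.
class StemFeatures {
    // Assumed limit, matching the four suffix features shown for "promotes".
    static final int MAX_SUFFIX_LENGTH = 4;

    static List<String> features(String word) {
        List<String> features = new ArrayList<>();
        features.add("w=" + word);
        // Suffix features: the last 1..4 characters of the word.
        for (int n = 1; n <= MAX_SUFFIX_LENGTH && n < word.length(); n++) {
            features.add("suf=" + word.substring(word.length() - n));
        }
        // Proposed extra features: the word with each of those suffixes removed.
        for (int n = 1; n <= MAX_SUFFIX_LENGTH && n < word.length(); n++) {
            features.add("stem=" + word.substring(0, word.length() - n));
        }
        return features;
    }

    public static void main(String[] args) {
        System.out.println(features("promotes"));
    }
}
```

This is plain truncation rather than real stemming, so it needs no stemmer dependency, at the cost of adding one more feature per suffix length.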
2009-08-07 04:30:53 UTC in OpenNLP
-
This is probably also in the category of "not much that can be done about it", but the following part-of-speech tagging is a bit odd:
"Message/NN passing/VBG concurrency/NN promotes/NNS loosely/RB coupled/VBN application/NN components/NNS ./."
In particular the tagging of "promotes" as an NNS rather than a VBZ is surprising.
Looking at the model, the word...
2009-08-07 04:11:30 UTC in OpenNLP
-
The following code will throw an NPE:
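import opennlp.maxent.*;

class NPE {
    public static void main(String[] args) {
        // An event stream that contains no events at all.
        EventStream es = new EventStream() {
            public boolean hasNext() { return false; }
            public Event nextEvent() { throw new RuntimeException(); }
        };
        // Training on the empty stream triggers the NullPointerException.
        GIS.trainModel(es, 100, 5);
    }
}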
This isn't a disaster, as clearly the resulting model isn't going...
2009-05-20 16:01:13 UTC in The OpenNLP Maximum Entropy Package
-
Hi,
I've only just noticed this thread.
I ran into this bug yesterday and spent some time tracking down the root cause. I think the snippet you posted from the error log is misleading: the issue is not the low memory detector, but in fact that HotSpot is segfaulting when trying to compile the previousSpaceIndex method. Here's an example of the full error...
2009-05-20 13:59:50 UTC in OpenNLP
-
By the way, here's a reproducible test case for the JVM bug this triggers: http://pastebin.com/f25f43809 (this produces a crasher every time). Here's a patch for replacing StringBuffer with StringBuilder: http://drmaciver.com/stringbuilder.patch (I can't seem to figure out how to attach a file after the fact).
2009-05-20 13:22:31 UTC in OpenNLP
-
Fair enough. That makes sense. :-)
I'll just special case this sort of construct at the pre and post processing stages for my code. Thanks for your help.
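For illustration, a post-processing pass of this kind might look roughly like the sketch below. It is an assumption about how one could do it, not code from OpenNLP or from my project: the class name, the `split` method and the `HEADING` pattern are all hypothetical, and the pattern only covers the "Chapter X. ..." case discussed in this thread.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical post-processor applied to the sentence detector's output.
class HeadingSplitter {
    // "Chapter", one token, a period, then whitespace and at least one more character.
    private static final Pattern HEADING =
        Pattern.compile("^(Chapter\\s+\\S+\\.)\\s+(\\S.*)$", Pattern.DOTALL);

    static List<String> split(List<String> sentences) {
        List<String> result = new ArrayList<>();
        for (String sentence : sentences) {
            Matcher m = HEADING.matcher(sentence);
            if (m.matches()) {
                // Emit the heading and the text after it as separate sentences.
                result.add(m.group(1));
                result.add(m.group(2));
            } else {
                result.add(sentence);
            }
        }
        return result;
    }
}
```

With this, a detected sentence such as "Chapter 3. Stuff happened." would come out as "Chapter 3." and "Stuff happened.", while sentences that don't start with the heading pattern pass through unchanged.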
2009-02-27 11:29:38 UTC in OpenNLP
-
Fair enough. That's more or less what I'm doing now. :-) ('though what I'm writing can't afford to be too domain specific, so I'm trying to keep things reasonably general).
Could you elaborate on what sort of things the sentence detector is and isn't good at dealing with?
2009-02-26 18:52:22 UTC in OpenNLP
-
The English sentence detector in OpenNLP seems to treat the sequence "Chapter 3. Stuff happened." as a single sentence. "Chapter A. Stuff happened" is treated similarly, but "Chapter Fish. Stuff happened" is considered two sentences. In general it seems like things of this form should always be two sentences ('though of course always the possibility that we are...
2009-02-26 16:59:37 UTC in OpenNLP
-
GraphML allows you to specify attributes with nested elements, but the GraphMLFileHandler doesn't currently deal with this. Here's a patch to let it do so.
2007-10-10 15:26:07 UTC in Java Universal Network/Graph Framework