David R. MacIver

What's happening?

  • Followup: RE: Another noun verb confusion

    Yeah, I did worry about the bulk issue. It adds a lot of largish features. Perhaps a better approach would be to actually stem? Requires there to be a stemmer in OpenNLP though, either your own or an external dependency, which may be more trouble than it's worth. Adding it to the tagdict is an interesting idea which hadn't occurred to me. I might look into that. The only problem is that it's...

    2009-08-15 08:53:50 UTC in OpenNLP

  • Followup: RE: Another noun verb confusion

    Actually, potentially useful thought: the feature generation seems to look at suffixes, but it doesn't look at the word being suffixed (it looks at initial prefixes of up to 4 characters, but that's not so useful here). So for "promotes" it generates the features w=promotes, suf=s, suf=es, suf=tes, suf=otes, but perhaps it should also be generating stem=promote stem=promot stem=promo...

    2009-08-07 04:30:53 UTC in OpenNLP

  • Another noun verb confusion

    This is probably also in the category of "not much that can be done about it", but the following part-of-speech tagging is a bit odd: "Message/NN passing/VBG concurrency/NN promotes/NNS loosely/RB coupled/VBN application/NN components/NNS ./." In particular, the tagging of "promotes" as an NNS rather than a VBZ is surprising. Looking at the model, the word...

    2009-08-07 04:11:30 UTC in OpenNLP

  • Training a model off an empty EventStream NPEs

    The following code will throw an NPE:

        import opennlp.maxent.*;

        class NPE {
            public static void main(String[] args) {
                EventStream es = new EventStream() {
                    public boolean hasNext() { return false; }
                    public Event nextEvent() { throw new RuntimeException(); }
                };
                GIS.trainModel(es, 100, 5);
            }
        }

    This isn't a disaster, as clearly the resulting model isn't going...

    2009-05-20 16:01:13 UTC in The OpenNLP Maximum Entropy Package

  • Followup: RE: Sentence Detector bug

    Hi, I've only just noticed this thread. I ran into this bug yesterday and spent some time tracking down the root cause. I think the snippet you posted from the error log is misleading: the issue is not the low memory detector, but in fact that hotspot is segfaulting when trying to compile the previousSpaceIndex method. Here's an example of the full error...

    2009-05-20 13:59:50 UTC in OpenNLP

  • Comment: previousSpaceIndex triggers a hotspot bug

    By the way, here's a reproducible test case for the JVM bug this triggers: http://pastebin.com/f25f43809 (this produces a crasher every time). Here's a patch for replacing StringBuffer with StringBuilder: http://drmaciver.com/stringbuilder.patch (I can't seem to figure out how to attach a file after the fact).

    2009-05-20 13:22:31 UTC in OpenNLP

  • Followup: RE: Sentence breaking on chapter titles

    Fair enough. That makes sense. :-) I'll just special-case this sort of construct at the pre- and post-processing stages for my code. Thanks for your help.

    2009-02-27 11:29:38 UTC in OpenNLP

  • Followup: RE: Sentence breaking on chapter titles

    Fair enough. That's more or less what I'm doing now. :-) ('though what I'm writing can't afford to be too domain-specific, so I'm trying to keep things reasonably general). Could you elaborate on what sort of things the sentence detector is and isn't good at dealing with?

    2009-02-26 18:52:22 UTC in OpenNLP

  • Sentence breaking on chapter titles

    The English sentence detector in OpenNLP seems to treat the sequence "Chapter 3. Stuff happened." as a single sentence. "Chapter A. Stuff happened" is treated similarly, but "Chapter Fish. Stuff happened" is considered two sentences. In general it seems like things of this form should always be two sentences ('though of course always the possibility that we are...

    2009-02-26 16:59:37 UTC in OpenNLP

  • GraphML handler doesn't deal with nested data

    GraphML allows you to specify attributes with nested elements, but the GraphMLFileHandler doesn't currently deal with this. Here's a patch to let it do so.

    2007-10-10 15:26:07 UTC in Java Universal Network/Graph Framework
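
The 2009-08-07 followup on "Another noun verb confusion" suggests pairing each suffix feature with a feature for the part of the word left over. A minimal sketch of that idea follows. It is not OpenNLP's feature generator: the class name StemFeatureSketch, the method features, and the cap of four characters are assumptions made for illustration. The post is cut off after stem=promo, so the fourth stem produced here (the remainder for suf=otes) is also an assumption. Prefix features are left out because the post says they are not useful in this case.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, not part of OpenNLP: sketches the suffix and
// proposed "stem" features discussed in the post.
class StemFeatureSketch {
    // Assumed cap, chosen to match the four suffixes listed in the post.
    static final int MAX_SUFFIX = 4;

    // Returns w=<word>, then for each suffix length n a suf= feature and
    // a stem= feature holding the word with that suffix removed.
    static List<String> features(String word) {
        List<String> out = new ArrayList<>();
        out.add("w=" + word);
        // Always leave at least one character in the stem.
        int limit = Math.min(MAX_SUFFIX, word.length() - 1);
        for (int n = 1; n <= limit; n++) {
            int cut = word.length() - n;
            out.add("suf=" + word.substring(cut));
            out.add("stem=" + word.substring(0, cut));
        }
        return out;
    }

    public static void main(String[] args) {
        // Prints the feature list for the word from the post.
        System.out.println(features("promotes"));
    }
}
```

Note that these "stems" are plain truncations, not the output of a real stemmer, which is the alternative raised in the 2009-08-15 followup.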
