2009-08-15 08:53:50 UTC
Yeah, I did worry about the bulk issue. It adds a lot of largish features. Perhaps a better approach would be to actually stem? That would require a stemmer in OpenNLP, though, either one of your own or an external dependency, which may be more trouble than it's worth.
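To make the trade-off concrete, here is a rough sketch of what I mean by stemming instead. It assumes, purely for illustration, that the bulk comes from per-token substring features such as suffixes. None of these names are OpenNLP API: `suffix_features`, `toy_stem`, `stem_feature` and the `SUFFIXES` list are all made up, and the stemmer is a toy, not Porter or anything similarly serious.

```python
# Hypothetical sketch: nothing here is part of OpenNLP.

SUFFIXES = ("ing", "ed", "es", "s")  # toy suffix list, assumed for illustration


def suffix_features(token, max_len=4):
    """Emit one feature per trailing substring, up to max_len characters."""
    longest = min(max_len, len(token))
    return [f"suf={token[-n:]}" for n in range(1, longest + 1)]


def toy_stem(token):
    """Strip the first matching suffix, keeping at least three characters.

    A real stemmer handles far more cases than this.
    """
    lower = token.lower()
    for suffix in SUFFIXES:
        if lower.endswith(suffix) and len(lower) - len(suffix) >= 3:
            return lower[: -len(suffix)]
    return lower


def stem_feature(token):
    """Emit a single stem feature per token instead of several suffix features."""
    return [f"stem={toy_stem(token)}"]
```

With these definitions, `suffix_features("running")` yields four features, whereas `stem_feature("running")` yields just one (`stem=runn`, which also shows how crude the toy stemmer is). Whether that reduction is worth a new dependency is the open question.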
Adding it to the tagdict is an interesting idea which hadn't occurred to me. I might look into that. The only problem is that it's not a very general solution: I care less about this particular example than about the overall impact on our term extraction.