From: ted p. <tpederse@d.umn.edu> - 2004-11-12 21:37:35
Hi Jason,

This is really interesting. I hadn't thought of this before, but I see exactly what you are referring to.

I'd suggest the following - my experience has been that compounds are very often nouns (not always, but more often than they are verbs). So if we can't do any better, I'd suggest assuming that a compound is a noun.

Actually, now that I think of it - I wonder if there are many compounds that are both nouns and verbs (at least those known to WordNet)? I would doubt it. Would that be a useful fact?

I'll think about this some more...

Thanks!
Ted

On Fri, 12 Nov 2004, Jason Michelizzi wrote:

> I've come across a slight difficulty in working with compoundifying
> and converting POS tags from the Penn Treebank format to WN format.
> If we do compoundification on tagged words, it seems that we have to
> discard the POS tags. The problem is that there are compound words
> that belong to more than one part of speech, such as machine_gun and
> goose_step (both of them can be either nouns or verbs).
>
> So if we came across text such as "goose/NN step/NN" or "machine/NN
> gun/NN", we could only turn that into "goose_step" and "machine_gun",
> but not "goose_step#n" or "machine_gun#v". (The fact that step is
> tagged as a noun isn't much of a help, the Brill tagger always seems
> to tag it as a noun in the few experiments I tried, except when I had
> "stepped or stepping" instead).
>
> Jason
>

--
Ted Pedersen
http://www.d.umn.edu/~tpederse
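
For illustration only, here is a rough Python sketch of the fallback suggested above: join adjacent tagged words into a compound, keep the POS when the compound has only one, and assume noun when it could be either. The COMPOUND_POS table is a hypothetical stand-in for a real WordNet lookup, the function names are made up for this example, the Penn Treebank mapping is deliberately partial, and only two-word compounds are handled. It is not the code under discussion in this thread.

```python
# Hypothetical stand-in for a WordNet lookup: compound -> set of WN POS letters.
COMPOUND_POS = {
    "machine_gun": {"n", "v"},
    "goose_step": {"n", "v"},
    "ice_cream": {"n"},
}

# Partial Penn Treebank -> WordNet mapping, matched on the tag prefix.
PTB_PREFIX_TO_WN = (("NN", "n"), ("VB", "v"), ("JJ", "a"), ("RB", "r"))


def ptb_to_wn(tag):
    """Return the WordNet POS letter for a Penn Treebank tag, or None."""
    for prefix, wn_pos in PTB_PREFIX_TO_WN:
        if tag.startswith(prefix):
            return wn_pos
    return None


def compoundify(tagged, compound_pos=COMPOUND_POS, default_pos="n"):
    """Join adjacent word pairs that form a known compound.

    tagged is a list of (word, penn_tag) pairs. An unambiguous compound
    gets its single POS; an ambiguous one gets default_pos if that is one
    of its options (the "assume noun" fallback), otherwise no tag at all.
    """
    out = []
    i = 0
    while i < len(tagged):
        if i + 1 < len(tagged):
            candidate = tagged[i][0].lower() + "_" + tagged[i + 1][0].lower()
            options = compound_pos.get(candidate)
            if options:
                if len(options) == 1:
                    pos = next(iter(options))
                elif default_pos in options:
                    pos = default_pos
                else:
                    pos = None
                out.append(candidate + "#" + pos if pos else candidate)
                i += 2
                continue
        word, tag = tagged[i]
        wn_pos = ptb_to_wn(tag)
        out.append(word + "#" + wn_pos if wn_pos else word)
        i += 1
    return out


def noun_verb_compounds(compound_pos=COMPOUND_POS):
    """List compounds that are both nouns and verbs in the lookup table."""
    return sorted(c for c, pos in compound_pos.items() if {"n", "v"} <= pos)
```

With this toy table, compoundify([("machine", "NN"), ("gun", "NN")]) gives ["machine_gun#n"], and noun_verb_compounds() lists the ambiguous entries, which is one way the "how many compounds are both nouns and verbs" question could be checked against a real lexicon.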