From: ted p. <tpederse@d.umn.edu> - 2005-06-17 17:17:05
A very interesting post from a user, who kindly agreed to let me forward it to the list. Read on!

--
http://www.d.umn.edu/~tpederse

---------- Forwarded message ----------
Date: Thu, 16 Jun 2005 15:03:55 +1200
From: Corrin Lakeland <lak...@cs...>
To: tpe...@um...
Subject: fast word sense tagging

Hello,

I recently put together some code you wrote into a pipeline to implement a word sense tagger. It was very easy to write, and it works well, so thank you very much for writing and releasing the components that made it easy.

My problem is that it is much too slow for my needs. I would like to sense-tag a gigaword corpus as a preliminary step for building word clusters, but the sense-tagging code takes several seconds per word. Is there any way of getting faster performance, similar to that of a POS tagger? Perhaps by precaching distances or similar?

Also, before I thought of POS tagging first, I noticed the WSD algorithm was getting the incorrect sense for "can" in "the can can hold water". In POS tagging this can be easily solved by replacing the for loop with a simple search (mathematically, by using argmax). I haven't seen this done along these lines in the WSD literature, which seemed a little odd to me.

In case you're curious, here's how I turned your code into a tagger:

    use WordNet::SenseRelate::AllWords;
    use WordNet::QueryData;
    use Lingua::EN::Tagger;

    my $qd = WordNet::QueryData->new;
    my $wsd = WordNet::SenseRelate::AllWords->new (wordnet => $qd,
        measure => 'WordNet::Similarity::lesk');
    my $tagger = new Lingua::EN::Tagger;
    my $text = "the can can hold water";
    my $tagged = $tagger->add_tags($text);
    my @results = $wsd->disambiguate (window => 3, tagged => 1,
        context => [$tagged]);

Thanks for your time,
Corrin Lakeland
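
One way to read "precaching distances" above is to cache each pairwise score the first time it is computed, so that sense pairs that recur across a large corpus are only scored once. The following is a minimal sketch of that idea, not anything taken from WordNet::SenseRelate itself. It assumes the score is symmetric in its two arguments, and slow_relatedness is a hypothetical stand-in for an expensive measure; it returns a dummy value and only counts how often it is called.

```perl
use strict;
use warnings;

# Hypothetical stand-in for an expensive relatedness measure.
# It returns a dummy score and counts its own invocations.
my $calls = 0;
sub slow_relatedness {
    my ($sense_a, $sense_b) = @_;
    $calls++;
    return length($sense_a) + length($sense_b);
}

# Cache keyed on the unordered sense pair. Treating the score as
# symmetric is an assumption made for this sketch.
my %cache;
sub cached_relatedness {
    my ($sense_a, $sense_b) = @_;
    my $key = join "\t", sort { $a cmp $b } $sense_a, $sense_b;
    $cache{$key} = slow_relatedness($sense_a, $sense_b)
        unless exists $cache{$key};
    return $cache{$key};
}

cached_relatedness('can#n#1', 'water#n#1');   # computed
cached_relatedness('water#n#1', 'can#n#1');   # same pair, served from cache
cached_relatedness('can#n#1', 'water#n#1');   # served from cache
print "measure called $calls time(s)\n";
```

Whether this helps in practice depends on how often the same sense pairs recur and on whether the cache fits in memory; for a gigaword corpus it could be persisted to disk between runs.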