From: Ted P. <dul...@gm...> - 2008-03-02 20:45:08
As best I can tell, WordNet is not case sensitive. So any input we give it should be treated as case insensitive, since in the end WordNet will not care what the case of a word is. In our stoplist, we should be careful not to make case distinctions; that is, a stopword of "it" should eliminate IT, It, it, or even iT from a context.

Here's a small demonstration of this; see my input file below. BTW, I find the fact that wsd.pl does not have a format default a little strange - I think if we make a "plain" format (as described in another note) then we ought to have that as the default.

marengo(224): wsd.pl --context file --format raw
Current configuration:
context file : file
format : raw
scheme : normal
tagged text : no
measure : WordNet::Similarity::lesk
window : 4
contextScore : 0
pairScore : 0
measure config: (none)
trace : no
forcepos : no
compound file : (none)
stoplist : (none)
Loading WordNet... done.
The bridge#n#1 be#v#1 hold#v#1 up#a#1 by#r#2 red_tape#n#1
The Bridge#n#1 be#v#1 hold#v#1 Up#a#1 by#r#2 Red_Tape#n#1
THE BRIDGE#n#1 be#v#1 hold#v#1 UP#a#1 BY#r#2 RED_TAPE#n#1

marengo(225): cat file
The bridge is held up by red_tape.
The Bridge Is Held Up by Red_Tape.
THE BRIDGE IS HELD UP BY RED_TAPE.

Note that all sense assignments are in the end the same, regardless of the case of the input text.

--
Ted Pedersen
http://www.d.umn.edu/~tpederse
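
P.S. To make the stoplist point concrete, here is a minimal sketch of case-insensitive stopword removal. It is written in Python purely for illustration (wsd.pl itself is Perl), and the names load_stoplist and remove_stopwords are hypothetical, not anything that exists in wsd.pl. The idea is only that both the stoplist and the context words are folded to one case before they are compared.

```python
def load_stoplist(words):
    """Fold every stopword to a single case once, up front.

    Hypothetical helper: takes any iterable of stopwords.
    """
    return {w.casefold() for w in words}


def remove_stopwords(tokens, stoplist):
    """Drop tokens whose case-folded form is in the stoplist.

    Tokens that survive keep their original case.
    """
    return [t for t in tokens if t.casefold() not in stoplist]


# A stopword of "it" should eliminate IT, It, it and iT alike.
stoplist = load_stoplist(["it", "The"])
context = "IT It it iT held THE bridge".split()
filtered = remove_stopwords(context, stoplist)
print(filtered)  # ['held', 'bridge']
```

Folding the stoplist once when it is loaded keeps the per-word check to a single set lookup, and the words that remain are passed along unchanged.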