Re: [Senserelate-users] All Words

I submitted your entire file, and did get an error, and I think I
understand the problem.

First, the problem isn't caused by your file not having one sentence
per line (and one line per sentence), although you really should
format it that way in order to get better results.

The problem turns out to be a ' (apostrophe) that appears in your text.
WordNet-SenseRelate does not use punctuation marks or function words
when doing disambiguation, since it relies on WordNet words (and
punctuation marks aren't in WordNet). So WordNet-SenseRelate
essentially ignores or removes anything not found in WordNet. For
example, you'll notice in your tagged output that "of" and "the" are
not assigned senses; that is because they are not content words.
WordNet-SenseRelate will only assign senses to nouns, verbs,
adjectives and adverbs that are known to WordNet.
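
As a rough illustration (this is not part of WordNet-SenseRelate), here
is a small Python sketch that reads one line of wsd.pl output and lists
the tokens that were not assigned a sense. It assumes the
word#pos#sense layout seen in the output quoted in this thread; the
function names are made up for the example. It splits on whitespace, so
a compound printed with a space in it will be seen as two tokens.

```python
def parse_token(token):
    """Split one output token into (word, pos, sense).

    Assumed layout: word#pos#sense, where pos and sense may be missing,
    e.g. "sale#n#5", "TimeWarner#n", or plain "the".
    """
    parts = token.split("#")
    word = parts[0]
    pos = parts[1] if len(parts) > 1 else None
    sense = int(parts[2]) if len(parts) > 2 else None
    return word, pos, sense


def words_without_sense(line):
    """Return the words on a line that carry no sense number."""
    parsed = (parse_token(tok) for tok in line.split())
    return [word for word, _pos, sense in parsed if sense is None]


if __name__ == "__main__":
    line = "one#a#1 of the biggest investor#n#1 in Google#n#1"
    print(words_without_sense(line))
```

Function words such as "of" and "the" show up in that list, as do
tokens like TimeWarner#n that have a part of speech but no sense.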

For some reason the ' punctuation mark is sneaking past
WordNet-SenseRelate and causing a problem, because WordNet has no
entry for '. WordNet-SenseRelate should in fact remove these kinds of
punctuation marks, but it doesn't seem to handle this case. We'll fix
that in a future release, but for the moment there is a pretty simple
workaround.

Put your POS-tagged output in one-sentence-per-line format, and then
remove all of the punctuation marks before submitting it to
WordNet-SenseRelate.
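
If it helps, the cleanup could be scripted along these lines. This is
only a sketch in Python (since you are already using NLTK), and it
rests on two assumptions that you should check against your own file:
that tokens are whitespace-separated word/TAG pairs, and that
sentence-final punctuation carries the Penn Treebank tag ".". The
function names are hypothetical. Sentences are split first, because
once the "." tokens are removed the sentence boundaries are gone.

```python
def to_sentences(tagged_text):
    """Group word/TAG tokens into sentences, ending each at a "." tag."""
    sentences, current = [], []
    for token in tagged_text.split():
        current.append(token)
        # The tag is whatever follows the last "/" in the token.
        if token.rpartition("/")[2] == ".":
            sentences.append(current)
            current = []
    if current:  # trailing tokens with no sentence-final tag
        sentences.append(current)
    return sentences


def drop_punctuation(tokens):
    """Keep only tokens whose word part has a letter or digit."""
    kept = []
    for token in tokens:
        word, sep, _tag = token.rpartition("/")
        if not sep:
            word = token  # untagged token: treat it all as the word
        if any(ch.isalnum() for ch in word):
            kept.append(token)
    return kept


def clean(tagged_text):
    """Return the text with one sentence per line and no punctuation."""
    lines = (" ".join(drop_punctuation(s)) for s in to_sentences(tagged_text))
    return "\n".join(line for line in lines if line)


if __name__ == "__main__":
    sample = ("Profits/NNS rose/VBD ,/, he/PRP said/VBD '/'' ./. "
              "AOL/NNP lost/VBD users/NNS ./.")
    print(clean(sample))
```

Note that this also drops symbols such as $ and %, which goes a little
further than removing punctuation marks; adjust the test in
drop_punctuation if you want to keep them.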

I hope this helps! Below you can see the error that I got when running
on your original input file.

marimba(36): wsd.pl --context myfile --format tagged
Current configuration:
    context file  : myfile
    format        : tagged
    scheme        : normal
    tagged text   : yes
    measure       : WordNet::Similarity::lesk
    window        : 4
    contextScore  : 0
    pairScore     : 0
    measure config: (none)
    trace         : no
    forcepos      : no
    compound file : (none)
    stoplist      : (none)
Loading WordNet... done.
(valid_forms) Invalid part-of-speech: ' at
/usr/local/lib/perl5/site_perl/5.8.5/WordNet/QueryData.pm line 887.

On 10/28/07, Ted Pedersen <tpederse@d.umn.edu> wrote:
> I added a few more lines of your file (one line per sentence, one
> sentence per line) and the following is the output that I got...
>
> wsd.pl --context file --format tagged
> Current configuration:
>     context file  : file
>     format        : tagged
>     scheme        : normal
>     tagged text   : yes
>     measure       : WordNet::Similarity::lesk
>     window        : 4
>     contextScore  : 0
>     pairScore     : 0
>     measure config: (none)
>     trace         : no
>     forcepos      : no
>     compound file : (none)
>     stoplist      : (none)
> Loading WordNet... done.
>
> Ad#n#1 sale#n#5 boost#v#3 Time Warner#n#1 profit#n#1 Quarterly#r#2 profits#n#1 at US#n#1 medium#n#1 giant#n#6 TimeWarner#n jump#v#1 76#a#1 %#n to $#v 1#a#1 .
> 13bn#n ( =C2#n =A3#n 600m#n ) for the three#a#1 month#n#2 to December , from $#n 639m#n year#n#3 -#n earlier#r#2 .
> The firm#n#1 , which is now#r#3 one#a#1 of the biggest investor#n#1 in Google#n#1 , benefit#v#1 from sale#n#5 of high#a#2 -#n speed#n#1 internet#n#1 connection#n#9 and high#a#2 advert#n#1 sale#n#4 .
> TimeWarner#n say#v#8 fourth quarter#n#2 sale#n#5 rise#v#1 2#a#1 %#n to $#v 11#a#1 . 1bn#n from $#n 10#a#1 .
> 9bn#n .
> Its profits#n#1 were buoy#v#3 by one#a#1 -#n off#r#2 gain#n#3 which offset#n#1 a profit#n#2 dip#n#6 at Warner#n#2 Bros#n , and less user#n#1 for AOL#n .
> Time Warner#n#2 say#v#2 on Friday that it now#r#3 own#v#1 8#a#1 %#n of search#n#1 -#n engine#n#3 Google#n#1 .
> But its own#a#1 internet#n#1 business#n#2 , AOL#n , had has mix#v#6 fortune#n#4 . It lose#v#3 464#a , 000#a subscriber#n#2 in the fourth quarter#n#2 profits#n#1 were low#a#2 than in the precede#v#5 three#a#1 quarter#n#2 .
>
>
>
> On 10/28/07, Ted Pedersen <tpederse@d.umn.edu> wrote:
> > Your input file should have just one sentence per line. I don't know
> > if that explains the problem exactly or not, but when I ran with just
> > one sentence on the first line, I got the output as shown below:
> >
> > marimba(6): wsd.pl --context file --format tagged
> > Current configuration:
> >     context file  : file
> >     format        : tagged
> >     scheme        : normal
> >     tagged text   : yes
> >     measure       : WordNet::Similarity::lesk
> >     window        : 4
> >     contextScore  : 0
> >     pairScore     : 0
> >     measure config: (none)
> >     trace         : no
> >     forcepos      : no
> >     compound file : (none)
> >     stoplist      : (none)
> > Loading WordNet... done.
> > Ad#n#1 sale#n#5 boost#v#3 Time Warner#n#1 profit#n#1 Quarterly#r#2
> > profits#n#1 at US#n#1 medium#n#1 giant#n#6 TimeWarner#n jump#v#1
> > 76#a#1 %#n to $#v 1#a#1 .
> >
> >
> > On 10/27/07, wael gomaa <drw...@ya...> wrote:
> > >
> > > I used the Brill Tagger in NLTK to tag my corpora.
> > > When I typed this command to perform WSD:
> > >     wsd.pl -context myfile.txt -format tagged
> > > I got this error: [Invalid Part Of Speech in QueryData.pm at line 887]
> > > I know that the wsd modules use the Penn Treebank tagset. Is there a
> > > difference between the Brill tagset and the Penn Treebank tagset? If
> > > so, how can I convert from Brill to Treebank to support the wsd module?
> > >
> > > 0001.txt Example of my tagged data is attached with my message.
> > > _______________________________________________
> > > senserelate-users mailing list
> > > sen...@li...
> > > https://lists.sourceforge.net/lists/listinfo/senserelate-users
> > >
> > >
> > >
> >
> >
> > --
> > Ted Pedersen
> > http://www.d.umn.edu/~tpederse
> >
>
>
> --
> Ted Pedersen
> http://www.d.umn.edu/~tpederse
>

--
Ted Pedersen
http://www.d.umn.edu/~tpederse