Hi Arun,
I'd try the --config option, which lets you provide the lesk measure with
a stoplist of its own. If you aren't using a configuration file, then I don't
think lesk is getting a stoplist at all (note that this is separate from
the --stoplist option of wsd.pl), so adding one might help.
The configuration options for lesk are documented here:
http://search.cpan.org/dist/WordNet-Similarity/lib/WordNet/Similarity/lesk.pm
As a quick primer, you specify a configuration file that can look like this:
WordNet::Similarity::lesk
stop::stoplist.txt
An example stoplist file can be found here:
http://cpansearch.perl.org/src/TPEDERSE/WordNet-Similarity-2.05/samples/stoplist.txt
So you would then simply run wsd.pl with the --config option pointing to
that file. That adds a stoplist to the lesk measure, so that stop words
are removed from the glosses it matches.
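To make that concrete, here is a minimal Python sketch (not part of the SenseRelate distribution) that produces the two-line configuration file shown above and assembles a wsd.pl command line around it. The file names lesk.conf and senseval3.wntagged are hypothetical, and --context as the input-file option is my assumption, so please check it against the wsd.pl documentation. The remaining options are the ones already mentioned in this thread. Nothing is executed; the command is only printed.

```python
def lesk_config_text(stoplist_path):
    """Return the text of a lesk configuration file that names a stoplist."""
    # First line names the measure, second line points lesk at its stoplist.
    return "WordNet::Similarity::lesk\n" + f"stop::{stoplist_path}\n"


def build_wsd_command(context_file, config_path):
    """Assemble a wsd.pl argument list; this only builds it, it does not run it."""
    return [
        "wsd.pl",
        "--context", context_file,  # ASSUMPTION: input-file option name
        "--format", "wntagged",
        "--type", "WordNet::Similarity::lesk",
        "--config", config_path,    # gives lesk its own stoplist
        "--window", "15",
        "--backoff",
    ]


if __name__ == "__main__":
    # Save this text as lesk.conf (hypothetical name) before running wsd.pl.
    print(lesk_config_text("stoplist.txt"), end="")
    print(" ".join(build_wsd_command("senseval3.wntagged", "lesk.conf")))
```

The wsd.pl --stoplist option can still be passed alongside --config, since the two stoplists are used at different stages.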
Otherwise, I think it might just be worthwhile to experiment a bit -
there are quite a few options to wsd.pl that have a fairly significant
impact on the algorithm (for better or worse), so you might even
discover some combination of options that works even better (in which
case we'd be happy to hear about that).
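If it helps with that kind of experimentation, the following is a small Python sketch that enumerates combinations of two of the options discussed in this thread (--window and --backoff). The window sizes 3 and 15 are simply the two values that have come up here, not recommendations, and the function name option_grid is made up for illustration. It only prints option strings to append to a wsd.pl command.

```python
from itertools import product


def option_grid(windows, backoff_choices):
    """Yield one list of wsd.pl option strings per (window, backoff) pair."""
    for window, backoff in product(windows, backoff_choices):
        opts = ["--window", str(window)]
        if backoff:
            opts.append("--backoff")
        yield opts


if __name__ == "__main__":
    # Each printed line is one option combination to try.
    for opts in option_grid([3, 15], [False, True]):
        print(" ".join(opts))
```

Scoring each run with the same key file keeps the comparisons meaningful.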
Best of luck, and keep us posted on how things go.
Cordially,
Ted
On Wed, May 11, 2011 at 3:56 PM, Arun N <arunn3.14@...> wrote:
> I tested on Senseval-3 with the following options, as described in Varada's
> thesis:
> --format wntagged --type WordNet::Similarity::lesk --window 15 --backoff
> --contextScore=0.0 --pairScore=0.0 --stoplist default-stoplist-raw.txt
> --nocompoundify
> I got an F-measure of 51.6 (using allwords-scorer.pl).
> I have attached my output file.
> How can this be improved to 54?
>
> Arun,
> On Wed, May 11, 2011 at 2:22 PM, Ted Pedersen <tpederse@...> wrote:
>>
>> Hi Arun,
>>
>> See comments inline...
>>
>> On Wed, May 11, 2011 at 12:25 PM, Arun N <arunn3.14@...> wrote:
>> > Thanks for the reply.
>> > I agree that results improve when the backoff option is used.
>> > On page 186, the table has results for nouns, verbs, adjectives, and
>> > adverbs.
>> > There is no column for all-words results.
>>
>> That's correct.
>>
>> > Also, I guess the demo paper's results are for all parts of speech.
>>
>> Correct - that was a very short paper so we tried to make it as
>> condensed as possible.
>>
>> > Page 174, on the other hand, has backoff results for window 15, and it
>> > has results for all POS.
>>
>> Correct.
>>
>> > My question is: does the table in the demo paper correspond to the
>> > all-POS results?
>>
>> Yes.
>>
>> > If yes, then I guess page 174 has those results, because page 186 does
>> > not have all-POS results.
>>
>> Yes, I think the overall results are presented first, with the more
>> detailed scores later.
>>
>> > Moreover, on page 174, table 143 has results with the backoff option set,
>> > and the result for the lesk algorithm is 50.9.
>>
>> Yes, that's true.
>>
>> I think Varada's thesis specifies completely the options we used, so
>> that's your best starting point. I don't recall what options were used
>> in the NAACL demo paper. I think the main thing that could differ
>> might be the window size, or perhaps the stoplist used by lesk when
>> measuring its overlaps, or the stoplist used by wsd.pl.
>>
>> But I'm confident that the results in both the NAACL paper and Varada's
>> thesis are as reported. There is some variation in the experiments that
>> appears to be important, although I can't reconstruct exactly what it
>> is. I think the best thing might be to run the wsd.pl program on the
>> Senseval-3 data with the options described in Varada's thesis and see
>> what results from that. I'd be happy to look at those results and
>> comment further once we know what happens there.
>>
>> Hope this helps.
>>
>> Good luck,
>> Ted
>>
>> > Arun,
>> >
>> > On Wed, May 11, 2011 at 8:21 AM, Ted Pedersen <tpederse@...>
>> > wrote:
>> >>
>> >> Hi Arun,
>> >>
>> >> See comments inline...
>> >>
>> >> On Wed, May 11, 2011 at 12:18 AM, Arun N <arunn3.14@...> wrote:
>> >> > Hi Guys,
>> >> > I need a small clarification.
>> >> >
>> >> > In the paper
>> >> >
>> >> > http://www.d.umn.edu/~kolha002/publications/pedersenk09-demo-final.pdf
>> >> > the F-measure for Senseval-3 (lesk) with window size 15 is 54
>> >> > [P = 54, R = 53].
>> >> >
>> >> > But I cannot find a similar F-measure value in Varada's thesis:
>> >> > http://www.d.umn.edu/~kolha002/publications/Kolhatkar-thesis.pdf
>> >> >
>> >> > Pages 173-174 have the results for window size 15.
>> >> >
>> >> > None of the results for window = 15 with the lesk measure is above 51.
>> >>
>> >> If you look on the last page of Varada's thesis (page 186) I think
>> >> you'll see the source of the results in the NAACL demo paper - note
>> >> that in this case we use the --backoff option, which means default to
>> >> sense 1 when we can't establish anything with the SenseRelate
>> >> algorithm. In the earlier results you mention (pages 173-174) there is
>> >> no such backoff, so you see somewhat lower results.
>> >>
>> >> >
>> >> > Could you tell me what options you set to get the highest F-measure
>> >> > of 54, as reported in the paper?
>> >>
>> >> See page 186 of Varada's thesis.
>> >>
>> >> >
>> >> >
>> >> > Secondly,
>> >> > Agirre et al
>> >> > http://www.aclweb.org/anthology/E/E09/E09-1005.pdf
>> >> >
>> >> > The authors claim that they get better results when WordNet 1.7 is used
>> >> > instead of WordNet 3.0.
>> >> > So, did you experiment with SR-AW using WordNet 1.7?
>> >>
>> >> No. The WordNet group at Princeton doesn't support 1.7 any longer, so
>> >> we don't use it. Overall WordNet 3.0 is much improved on earlier
>> >> versions of WordNet, so I think it makes sense to use it.
>> >>
>> >> However, remember that the SemCor data is based on version 1.5 of
>> >> WordNet, so in some ways it makes sense that an earlier version would
>> >> work better (since as the versions progress those mappings back to 1.5
>> >> become more and more noisy). But, I think that tells us more about the
>> >> evaluation data than it does WordNet.
>> >>
>> >> > Also, I would like to know whether the actual key given for Senseval-2
>> >> > and Senseval-3 was based on WordNet 1.7 or WordNet 3.0.
>> >>
>> >> To be honest I just don't recall. You might need to dig around a bit
>> >> for some answers to that - http://senseval.org will be a good starting
>> >> point for that. Also remember that WordNet 2.0 was quite popular for
>> >> some time, and could have been used (especially for Senseval-2, since
>> >> I don't think 3.0 was released at that time).
>> >>
>> >> > I downloaded the Senseval data sets from Rada Mihalcea's website, which
>> >> > was actually suggested by Varada.
>> >>
>> >> Great! That's a very useful resource.
>> >> ( http://www.cse.unt.edu/~rada/downloads.html#sensevalsemcor )
>> >>
>> >> Hope this helps!
>> >>
>> >> Good luck,
>> >> Ted
>> >>
>> >> >
>> >> > Arun,
>> >> >
>> >> > On Mon, Apr 25, 2011 at 12:16 PM, Arun N <arunn3.14@...> wrote:
>> >> >>
>> >> >> Thanks Varadha. This is what I was searching for.
>> >> >> Arun,
>> >> >>
>> >> >> On Mon, Apr 25, 2011 at 10:47 AM, Ted Pedersen <tpederse@...>
>> >> >> wrote:
>> >> >>>
>> >> >>> Hi Varada,
>> >> >>>
>> >> >>> Ah......that's the part I was forgetting!!!!!!!!!!!!!!!!!!!!!!!!!!!
>> >> >>> :)
>> >> >>> Thanks very much for clarifying this.
>> >> >>>
>> >> >>> Arun, I hope this works out, and please let us know if additional
>> >> >>> questions arise.
>> >> >>>
>> >> >>> Thanks!
>> >> >>> Ted
>> >> >>>
>> >> >>> On Mon, Apr 25, 2011 at 10:23 AM, varada kolhatkar
>> >> >>> <varada.kolhatkar@...> wrote:
>> >> >>> > Hi Arun,
>> >> >>> > semcor-reformat.pl needs SemCor formatted input. For my experiments I
>> >> >>> > used Senseval data converted into SemCor format by Rada Mihalcea.
>> >> >>> > You can download it from her webpage.
>> >> >>> > http://www.cse.unt.edu/~rada/downloads.html
>> >> >>> > Search for 'Senseval-3 English all-words converted into SemCor format'
>> >> >>> > Hope that helps,
>> >> >>> > Varada
>> >> >>> >
>> >> >>> > On Mon, Apr 25, 2011 at 7:39 AM, Ted Pedersen
>> >> >>> > <tpederse@...>
>> >> >>> > wrote:
>> >> >>> >>
>> >> >>> >> Thanks for these additional details Arun! We'll investigate further
>> >> >>> >> and report back asap, I hope later today (Monday).
>> >> >>> >>
>> >> >>> >> Cordially,
>> >> >>> >> Ted
>> >> >>> >>
>> >> >>> >> On Sun, Apr 24, 2011 at 10:16 PM, Arun N <arunn3.14@...>
>> >> >>> >> wrote:
>> >> >>> >> > @Ted,
>> >> >>> >> > This is the command that I used and the corresponding error message.
>> >> >>> >> > $ semcor-reformat.pl --file english-all-words.xml
>> >> >>> >> > Nameless tag: '?xml version="1.0"?'
>> >> >>> >> > Nameless tag: '!DOCTYPE corpus SYSTEM "all-words.dtd"'
>> >> >>> >> > Use of uninitialized value in subroutine entry at /usr/local/bin/semcor-reformat.pl line 222, <FH> chunk 1.
>> >> >>> >> > Can't use string ("") as a subroutine ref while "strict refs" in use at /usr/local/bin/semcor-reformat.pl line 222, <FH> chunk 1.
>> >> >>> >> > Arun,
>> >> >>> >> > On Sun, Apr 24, 2011 at 10:12 PM, Arun N <arunn3.14@...>
>> >> >>> >> > wrote:
>> >> >>> >> >>
>> >> >>> >> >> @Varadha
>> >> >>> >> >> The results in Varada's thesis (p. 193) say that SENSEVAL 3 was given
>> >> >>> >> >> in wntagged format.
>> >> >>> >> >> I just want to know how you converted it to wntagged format.
>> >> >>> >> >> The .xml file doesn't have POS tags at all, as Ted mentioned in the
>> >> >>> >> >> earlier mail.
>> >> >>> >> >> So I guess I am using the wrong file for SENSEVAL 3, but I am sure that
>> >> >>> >> >> I downloaded it from the SENSEVAL 3 site.
>> >> >>> >> >> Arun,
>> >> >>> >> >>
>> >> >>> >> >> On Sun, Apr 24, 2011 at 10:08 PM, Arun N
>> >> >>> >> >> <arunn3.14@...>
>> >> >>> >> >> wrote:
>> >> >>> >> >>>
>> >> >>> >> >>> One quick clarification: was the .xml file that I sent the one that
>> >> >>> >> >>> Varada experimented with for SENSEVAL 3?
>> >> >>> >> >>> Or, Varada, can you give me the link where you downloaded the data set
>> >> >>> >> >>> for evaluating SR-AW on SENSEVAL 3?
>> >> >>> >> >>>
>> >> >>> >> >>> Arun,
>> >> >>> >> >>> On Sun, Apr 24, 2011 at 9:41 PM, Ted Pedersen
>> >> >>> >> >>> <tpederse@...>
>> >> >>> >> >>> wrote:
>> >> >>> >> >>>>
>> >> >>> >> >>>> Hi Arun,
>> >> >>> >> >>>>
>> >> >>> >> >>>> BTW, I might be wrong about not having this functionality in
>> >> >>> >> >>>> SenseRelate::AllWords. Can you send the command that you try to run
>> >> >>> >> >>>> and the error that you get? I'll check on a few things in the
>> >> >>> >> >>>> meantime.
>> >> >>> >> >>>>
>> >> >>> >> >>>> Thanks!
>> >> >>> >> >>>> Ted
>> >> >>> >> >>>>
>> >> >>> >> >>>> On Sun, Apr 24, 2011 at 9:33 PM, Ted Pedersen
>> >> >>> >> >>>> <tpederse@...>
>> >> >>> >> >>>> wrote:
>> >> >>> >> >>>> > Hi Arun,
>> >> >>> >> >>>> >
>> >> >>> >> >>>> > You can format input to WordNet::SenseRelate::AllWords as wntagged
>> >> >>> >> >>>> > (four part of speech tags: n, v, a, r)
>> >> >>> >> >>>> >
>> >> >>> >> >>>> > cats#n run#v
>> >> >>> >> >>>> >
>> >> >>> >> >>>> > or raw (plain text)
>> >> >>> >> >>>> >
>> >> >>> >> >>>> > cats run
>> >> >>> >> >>>> >
>> >> >>> >> >>>> > or tagged (penn treebank)
>> >> >>> >> >>>> >
>> >> >>> >> >>>> > cats/NP run/VB
>> >> >>> >> >>>> >
>> >> >>> >> >>>> > Based on what I see in the xml file you sent, I think you probably
>> >> >>> >> >>>> > just want to convert this to a raw text format (where you have one
>> >> >>> >> >>>> > sentence per line, one line per sentence) since there are no pos tags
>> >> >>> >> >>>> > (so no point in using wntagged or tagged).
>> >> >>> >> >>>> >
>> >> >>> >> >>>> > We don't have a converter from SensEval-3 format in
>> >> >>> >> >>>> > SenseRelate::AllWords...however, I think I might know of one I can
>> >> >>> >> >>>> > refer you to....let me check on that and report back on Monday.
>> >> >>> >> >>>> >
>> >> >>> >> >>>> > Cordially,
>> >> >>> >> >>>> > Ted
>> >> >>> >> >>>> >
>> >> >>> >> >>>> > On Sun, Apr 24, 2011 at 9:21 PM, Arun N
>> >> >>> >> >>>> > <arunn3.14@...>
>> >> >>> >> >>>> > wrote:
>> >> >>> >> >>>> >> I am planning to experiment on the SENSEVAL 3 all-words data set.
>> >> >>> >> >>>> >> But it is in a different format from SemCor.
>> >> >>> >> >>>> >> When I tried to use extract-semcor.pl on the file, it showed some error.
>> >> >>> >> >>>> >> I downloaded the Senseval-3 all-words test data from the site
>> >> >>> >> >>>> >> http://www.senseval.org/senseval3/data.html
>> >> >>> >> >>>> >> I have also attached the file.
>> >> >>> >> >>>> >> I just want to know how I should format the SENSEVAL 3 all-words data
>> >> >>> >> >>>> >> to give it to wsd.pl.
>> >> >>> >> >>>> >> Arun,
>> >> >>> >> >>>> >>
>> >> >>> >> >>>> >> On Sun, Apr 24, 2011 at 8:20 PM, Ted Pedersen
>> >> >>> >> >>>> >> <tpederse@...>
>> >> >>> >> >>>> >> wrote:
>> >> >>> >> >>>> >>>
>> >> >>> >> >>>> >>> Hi Arun,
>> >> >>> >> >>>> >>>
>> >> >>> >> >>>> >>> I'm not sure what you mean by senseval...do you mean the semcor format?
>> >> >>> >> >>>> >>> Or... ?
>> >> >>> >> >>>> >>>
>> >> >>> >> >>>> >>> BTW, for wntagged, do you mean text that looks like this:
>> >> >>> >> >>>> >>>
>> >> >>> >> >>>> >>> cats#n run#v fast#r
>> >> >>> >> >>>> >>>
>> >> >>> >> >>>> >>> Just wanted to clarify since there are a few different formats...
>> >> >>> >> >>>> >>>
>> >> >>> >> >>>> >>> Thanks!
>> >> >>> >> >>>> >>> Ted
>> >> >>> >> >>>> >>>
>> >> >>> >> >>>> >>>
>> >> >>> >> >>>> >>> On Sun, Apr 24, 2011 at 7:41 PM, Arun N
>> >> >>> >> >>>> >>> <arunn3.14@...>
>> >> >>> >> >>>> >>> wrote:
>> >> >>> >> >>>> >>> > Thanks for the reply guys.
>> >> >>> >> >>>> >>> > Is there any Perl script to convert Senseval format to wntagged
>> >> >>> >> >>>> >>> > for SenseRelate?
>> >> >>> >> >>>> >>> >
>> >> >>> >> >>>> >>> >
>> >> >>> >> >>>> >>> > Arun,
>> >> >>> >> >>>> >>> >
>> >> >>> >> >>>> >>> > On Sun, Apr 24, 2011 at 6:30 PM, varada kolhatkar
>> >> >>> >> >>>> >>> > <varada.kolhatkar@...> wrote:
>> >> >>> >> >>>> >>> >>
>> >> >>> >> >>>> >>> >> Yes, semcor-reformat.pl is the script which can be used to generate the
>> >> >>> >> >>>> >>> >> wsd key file. We also provide scorer2-sort.pl, as it's easier to compare
>> >> >>> >> >>>> >>> >> sorted lists.
>> >> >>> >> >>>> >>> >> extract-semcor-plaintext.pl can be used to extract plain text (text
>> >> >>> >> >>>> >>> >> without POS tag info) from SemCor. If you want to experiment with the
>> >> >>> >> >>>> >>> >> effect of POS tagging on wsd, you can use this script.
>> >> >>> >> >>>> >>> >>
>> >> >>> >> >>>> >>> >> Varada
>> >> >>> >> >>>> >>> >> On Sat, Apr 23, 2011 at 11:13 PM, Ted Pedersen
>> >> >>> >> >>>> >>> >> <tpederse@...>
>> >> >>> >> >>>> >>> >> wrote:
>> >> >>> >> >>>> >>> >>>
>> >> >>> >> >>>> >>> >>> Hi Arun,
>> >> >>> >> >>>> >>> >>>
>> >> >>> >> >>>> >>> >>> Try out the following commands to create a key file... Note that I'm
>> >> >>> >> >>>> >>> >>> using semcor-sample.txt as the source of the key.
>> >> >>> >> >>>> >>> >>>
>> >> >>> >> >>>> >>> >>> This is my key...
>> >> >>> >> >>>> >>> >>>
>> >> >>> >> >>>> >>> >>> marengo(22): more semcor-sample.txt
>> >> >>> >> >>>> >>> >>> <contextfile concordance=brown>
>> >> >>> >> >>>> >>> >>> <context filename=br-e24 paras=yes>
>> >> >>> >> >>>> >>> >>> <p pnum=1>
>> >> >>> >> >>>> >>> >>> <s snum=1>
>> >> >>> >> >>>> >>> >>> <wf cmd=ignore pos=DT>The</wf>
>> >> >>> >> >>>> >>> >>> <wf cmd=done pos=JJ lemma=russian wnsn=1 lexsn=3:01:00::>Russian</wf>
>> >> >>> >> >>>> >>> >>> <wf cmd=done pos=NN lemma=gymnast wnsn=1 lexsn=1:18:00::>gymnasts</wf>
>> >> >>> >> >>>> >>> >>> <wf cmd=done pos=IN ot=idiom>beat_the_tar_out_of</wf>
>> >> >>> >> >>>> >>> >>> <wf cmd=ignore pos=DT>the</wf>
>> >> >>> >> >>>> >>> >>> </s>
>> >> >>> >> >>>> >>> >>> </p>
>> >> >>> >> >>>> >>> >>> </context>
>> >> >>> >> >>>> >>> >>> </contextfile>
>> >> >>> >> >>>> >>> >>>
>> >> >>> >> >>>> >>> >>> marengo(20): semcor-reformat.pl --file semcor-sample.txt --key | scorer2-sort.pl > key.txt
>> >> >>> >> >>>> >>> >>>
>> >> >>> >> >>>> >>> >>> marengo(21): cat key.txt
>> >> >>> >> >>>> >>> >>> gymnast.n 2 1
>> >> >>> >> >>>> >>> >>> russian.a 1
>> >> >>> >> >>>> >>> >>>
>> >> >>> >> >>>> >>> >>> Assume that these are the answers generated by my
>> >> >>> >> >>>> >>> >>> system...
>> >> >>> >> >>>> >>> >>>
>> >> >>> >> >>>> >>> >>> marengo(23): more answers.txt
>> >> >>> >> >>>> >>> >>> gymnast.n 2 3
>> >> >>> >> >>>> >>> >>> russian.a 1 1
>> >> >>> >> >>>> >>> >>>
>> >> >>> >> >>>> >>> >>> Then I could run the scorer like this...
>> >> >>> >> >>>> >>> >>>
>> >> >>> >> >>>> >>> >>> marengo(29): allwords-scorer2.pl --ansfile answers.txt --keyfile key.txt
>> >> >>> >> >>>> >>> >>>
>> >> >>> >> >>>> >>> >>> score for "answers.txt" using key "key.txt" :
>> >> >>> >> >>>> >>> >>> precision: 0.500 (1 correct of 2 attempted.)
>> >> >>> >> >>>> >>> >>> recall: 0.500 (1 correct of 2 in total)
>> >> >>> >> >>>> >>> >>> F-measure: 0.500
>> >> >>> >> >>>> >>> >>>
>> >> >>> >> >>>> >>> >>> attempted: 100.00%(2 attempted of 2 in total)
>> >> >>> >> >>>> >>> >>> part of speech tag mismatch in attempted instances: 0.00% (0 mismatches of 2 attempted instances)
>> >> >>> >> >>>> >>> >>> skipped instances : 0.00% (skipped 0 instances of total 2 instances because the instance id or the word was not found in the answer file)
>> >> >>> >> >>>> >>> >>>
>> >> >>> >> >>>> >>> >>> Nouns:
>> >> >>> >> >>>> >>> >>> Precision : 0.000 (0 correct of 1 nouns attempted.)
>> >> >>> >> >>>> >>> >>> Recall : 0.000 (0 correct of 1 noun instances in total)
>> >> >>> >> >>>> >>> >>> F-measure: 0.000
>> >> >>> >> >>>> >>> >>>
>> >> >>> >> >>>> >>> >>> Verbs:
>> >> >>> >> >>>> >>> >>> Precision : 0.000 (0 correct of 0 verbs attempted.)
>> >> >>> >> >>>> >>> >>> Recall : 0.000 (0 correct of 0 verb instances in total)
>> >> >>> >> >>>> >>> >>> F-measure: 0.000
>> >> >>> >> >>>> >>> >>>
>> >> >>> >> >>>> >>> >>> Adjectives:
>> >> >>> >> >>>> >>> >>> Precision : 1.000 (1 correct of 1 adjectives attempted.)
>> >> >>> >> >>>> >>> >>> Recall : 1.000 (1 correct of 1 adjective instances in total)
>> >> >>> >> >>>> >>> >>> F-measure: 1.000
>> >> >>> >> >>>> >>> >>>
>> >> >>> >> >>>> >>> >>> Adverbs:
>> >> >>> >> >>>> >>> >>> Precision : 0.000 (0 correct of 0 adverbs attempted.)
>> >> >>> >> >>>> >>> >>> Recall : 0.000 (0 correct of 0 adverb instances in total)
>> >> >>> >> >>>> >>> >>> F-measure: 0.000
>> >> >>> >> >>>> >>> >>>
>> >> >>> >> >>>> >>> >>> Confusion Matrix for part of speech tags :
>> >> >>> >> >>>> >>> >>>
>> >> >>> >> >>>> >>> >>>         Noun  Verb   Adj   Adv | Key
>> >> >>> >> >>>> >>> >>> Noun       1     0     0     0 |   1
>> >> >>> >> >>>> >>> >>> Verb       0     0     0     0 |   0
>> >> >>> >> >>>> >>> >>> Adj        0     0     1     0 |   1
>> >> >>> >> >>>> >>> >>> Adv        0     0     0     0 |   0
>> >> >>> >> >>>> >>> >>> -------------------------------|-----
>> >> >>> >> >>>> >>> >>> Ans        1     0     1     0 |   2
>> >> >>> >> >>>> >>> >>>
>> >> >>> >> >>>> >>> >>> I hope this is of some help. Please let us know though if there are
>> >> >>> >> >>>> >>> >>> additional issues to resolve!
>> >> >>> >> >>>> >>> >>>
>> >> >>> >> >>>> >>> >>> Cordially,
>> >> >>> >> >>>> >>> >>> Ted
>> >> >>> >> >>>> >>> >>>
>> >> >>> >> >>>> >>> >>> On Sat, Apr 23, 2011 at 7:54 PM, Arun N
>> >> >>> >> >>>> >>> >>> <arunn3.14@...>
>> >> >>> >> >>>> >>> >>> wrote:
>> >> >>> >> >>>> >>> >>> > Hi,
>> >> >>> >> >>>> >>> >>> > Thanks, for the quick reply.
>> >> >>> >> >>>> >>> >>> >
>> >> >>> >> >>>> >>> >>> > Actually, I wrote my own scorer. But I am thinking of using the scorer
>> >> >>> >> >>>> >>> >>> > provided in the SenseRelate package.
>> >> >>> >> >>>> >>> >>> >
>> >> >>> >> >>>> >>> >>> > By the way, it would be great if you could tell me how to generate the
>> >> >>> >> >>>> >>> >>> > key file for a SemCor input file.
>> >> >>> >> >>>> >>> >>> > There is a Perl script, extract-semcor-plaintext.pl, with a --key flag,
>> >> >>> >> >>>> >>> >>> > but it generates a key with just the POS tags and not the WordNet senses.
>> >> >>> >> >>>> >>> >>> >
>> >> >>> >> >>>> >>> >>> > Do you have any Perl code to generate the key file (in a format suitable
>> >> >>> >> >>>> >>> >>> > for the scorer) from a SemCor file, so that it can be passed to
>> >> >>> >> >>>> >>> >>> > allwords-scorer.pl?
>> >> >>> >> >>>> >>> >>> >
>> >> >>> >> >>>> >>> >>> > Arun,
>> >> >>> >> >>>> >>> >>> >
>> >> >>> >> >>>> >>> >>> > On Sat, Apr 23, 2011 at 3:16 PM, Ted Pedersen
>> >> >>> >> >>>> >>> >>> > <tpederse@...>
>> >> >>> >> >>>> >>> >>> > wrote:
>> >> >>> >> >>>> >>> >>> >>
>> >> >>> >> >>>> >>> >>> >> Hi Arun,
>> >> >>> >> >>>> >>> >>> >>
>> >> >>> >> >>>> >>> >>> >> Nice to hear from you. You may also wish to consult Varada Kolhatkar's
>> >> >>> >> >>>> >>> >>> >> MS thesis, which is a more recent use of
>> >> >>> >> >>>> >>> >>> >> WordNet::SenseRelate::AllWords. While some differences in results are
>> >> >>> >> >>>> >>> >>> >> to be expected as the years go by (due to changes in WordNet, for
>> >> >>> >> >>>> >>> >>> >> example), they should be fairly minor.
>> >> >>> >> >>>> >>> >>> >>
>> >> >>> >> >>>> >>> >>> >> An Extended Analysis of a Method of All Words Sense Disambiguation
>> >> >>> >> >>>> >>> >>> >> (Kolhatkar) - Master of Science Thesis, Department of Computer
>> >> >>> >> >>>> >>> >>> >> Science, University of Minnesota, Duluth, August, 2009.
>> >> >>> >> >>>> >>> >>> >>
>> >> >>> >> >>>> >>> >>> >> http://www.d.umn.edu/~tpederse/Pubs/varada-thesis.pdf
>> >> >>> >> >>>> >>> >>> >>
>> >> >>> >> >>>> >>> >>> >> Regarding your results, what were your precision and recall values?
>> >> >>> >> >>>> >>> >>> >> Did you use the scoring program that comes with
>> >> >>> >> >>>> >>> >>> >> WordNet::SenseRelate::AllWords? Also, if you could send the exact
>> >> >>> >> >>>> >>> >>> >> command you ran, that would help us understand what might be happening.
>> >> >>> >> >>>> >>> >>> >>
>> >> >>> >> >>>> >>> >>> >> Thanks!
>> >> >>> >> >>>> >>> >>> >> Ted
>> >> >>> >> >>>> >>> >>> >>
>> >> >>> >> >>>> >>> >>> >> On Sat, Apr 23, 2011 at 1:07 PM, Arun N
>> >> >>> >> >>>> >>> >>> >> <arunn3.14@...>
>> >> >>> >> >>>> >>> >>> >> wrote:
>> >> >>> >> >>>> >>> >>> >> > Hi Jason and Ted,
>> >> >>> >> >>>> >>> >>> >> > I am Arun Nedunchezhian, a graduate student at UT Austin. I am working
>> >> >>> >> >>>> >>> >>> >> > on a project which uses the WordNet::SenseRelate::AllWords package.
>> >> >>> >> >>>> >>> >>> >> > I read the results section in your (Jason's) MS thesis. You mention
>> >> >>> >> >>>> >>> >>> >> > that precision and recall for Semcor5 (5 documents from
>> >> >>> >> >>>> >>> >>> >> > semcor [br-a01, br-a02, br-k18, br-m02, br-r05]) are .63 and .51.
>> >> >>> >> >>>> >>> >>> >> > I tried to run the SR-AW package over the same set of documents and I
>> >> >>> >> >>>> >>> >>> >> > got much lower values for precision and recall.
>> >> >>> >> >>>> >>> >>> >> > Precision = no. of words sense-tagged correctly / no. of words
>> >> >>> >> >>>> >>> >>> >> > sense-tagged.
>> >> >>> >> >>>> >>> >>> >> > Recall = no. of words sense-tagged correctly / no. of words in the
>> >> >>> >> >>>> >>> >>> >> > documents (tagged as cmd=done).
>> >> >>> >> >>>> >>> >>> >> > SR-AW tags a word either as <word#pos#senseid> or <word#ND>.
>> >> >>> >> >>>> >>> >>> >> > No. of words sense-tagged = count of <word#pos#senseid>.
>> >> >>> >> >>>> >>> >>> >> > Are the above equations correct?
>> >> >>> >> >>>> >>> >>> >> > Is this the way to compute precision and recall?
>> >> >>> >> >>>> >>> >>> >> > What options did you set for the SR-AW run?
>> >> >>> >> >>>> >>> >>> >> > I set the following:
>> >> >>> >> >>>> >>> >>> >> > Window = 3
>> >> >>> >> >>>> >>> >>> >> > type = WordNet::SenseRelate::lesk
>> >> >>> >> >>>> >>> >>> >> > I used a stoplist of articles and prepositions.
>> >> >>> >> >>>> >>> >>> >> >
>> >> >>> >> >>>> >>> >>> >> > Arun,
>> >> >>> >> >>>> >>> >>> >> >
>> >> >>> >> >>>> >>> >>> >> > --
>> >> >>> >> >>>> >>> >>> >> > The mind is everything.
>> >> >>> >> >>>> >>> >>> >> > What you think you become. - Buddha
>> >> >>> >> >>>> >>> >>> >> >
>> >> >>> >> >>>> >>> >>> >> >
>> >> >>> >> >>>> >>> >>> >>
>> >> >>> >> >>>> >>> >>> >>
>> >> >>> >> >>>> >>> >>> >>
>> >> >>> >> >>>> >>> >>> >> --
>> >> >>> >> >>>> >>> >>> >> Ted Pedersen
>> >> >>> >> >>>> >>> >>> >> http://www.d.umn.edu/~tpederse
--
Ted Pedersen
http://www.d.umn.edu/~tpederse