Thanks for your reply.
At the moment I'm just searching for certain words and counting the
number of their occurrences in my documents. The corpus and word lists
that I'm using in GATE and in Java are the same. Since it seems to be a
straightforward procedure, as you said, the results should be almost the
same, but I still can't work out what causes such a big difference.
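
A minimal sketch of the kind of Java-side counting described above, for comparison with the GATE pipeline. The class name, the method names and the tiny word lists are made up for illustration; the real 'Positives' and 'Negatives' lists and the actual program are not shown in this thread. Case folding, whole-word matching and the "undecided" tie result are assumptions of the sketch, not known properties of either setup.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Locale;
import java.util.Set;

public class PolarityCounter {

    // Hypothetical word lists, standing in for the real 'Positives' and 'Negatives' lists.
    static final Set<String> POSITIVES = new HashSet<>(Arrays.asList("fantastic", "great", "good"));
    static final Set<String> NEGATIVES = new HashSet<>(Arrays.asList("boring", "awful", "bad"));

    /** Counts the tokens of the text that appear in the word list (case-insensitive, whole words only). */
    static int countMatches(String text, Set<String> words) {
        int count = 0;
        // Lower-case the text, then split on anything that is not a letter or an apostrophe.
        for (String token : text.toLowerCase(Locale.ROOT).split("[^a-z']+")) {
            if (words.contains(token)) {
                count++;
            }
        }
        return count;
    }

    /** Decides the polarity of a document by comparing the two counts. */
    static String classify(String text) {
        int pos = countMatches(text, POSITIVES);
        int neg = countMatches(text, NEGATIVES);
        if (pos > neg) return "positive";
        if (neg > pos) return "negative";
        return "undecided"; // assumption: ties are left unclassified
    }

    public static void main(String[] args) {
        String review = "A great cast, but the plot is boring and the ending is awful.";
        System.out.println(countMatches(review, POSITIVES) + " positive, "
                + countMatches(review, NEGATIVES) + " negative -> " + classify(review));
    }
}
```

Small choices like these (lower-casing, matching whole tokens rather than substrings, how ties are handled) change the counts, so they may be worth comparing against how the gazetteer and JAPE rules behave on the same documents.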
On Wed, Oct 6, 2010 at 6:18 PM, Shekhar Pradhan
> It depends on how you've written your JAPE rules. Clearly, something about
> your rules causes GATE to miss certain occurrences of positive and negative
> words.
> If all you are doing is counting the occurrences of certain terms in a
> bunch of documents, there is no advantage to using the machinery of GATE
> over a Java program. But if you want to correctly identify that in a
> sentence like "this movie can hardly be said to be fantastic" the author is
> not praising the movie, using GATE can be very helpful. Or take a sentence like
> "some people think the movie is fantastic but I heartily disagree": correctly
> identifying this as a negative sentiment requires some sophisticated JAPE
> rules.
> Shekhar Pradhan
> -----Maral Dadvar <dadvar.maral@...> wrote: -----
> To: Diana Maynard <d.maynard@...>
> From: Maral Dadvar <dadvar.maral@...>
> Date: 10/05/2010 08:32AM
> cc: gate-users@...
> Subject: Re: [gate-users] Java vs. GATE
> Thank you for your reply Diana.
> I have added two new word lists ('Positives' and 'Negatives') to the GATE
> gazetteer lists. They contain a list of synonyms for positive and negative
> words which are mostly used for expressing opinions. I also have a JAPE file,
> which looks for these words in the documents and for each group creates a
> new annotation set called Negatives and Positives.
> Another JAPE file counts the number of words labeled with
> each of these new annotation sets and, according to their frequency,
> determines whether a document is positive or negative. The total number of
> documents determined to be negative is named Totalnegatives and
> the total number of documents determined to be positive is named
> At the end, by using ANNIC, I can see these numbers.
> My ground truth is a collection of 1000 positive reviews and 1000 negative
> reviews. I used these documents as my corpus and I use cross-validation
> for my evaluation.
> I hope this information can be helpful.
> On Tue, Oct 5, 2010 at 1:53 PM, Diana Maynard <d.maynard@...> wrote:
>> Hi Maral
>> How exactly are you processing them in GATE?
>> On 05/10/2010 12:16, Maral Dadvar wrote:
>>> Hi everyone,
>>> At the moment I'm implementing an experiment, movie review analysis,
>>> using the GATE toolkit. This experiment classifies the documents into
>>> positive and negative groups by counting the positive and negative words
>>> in the documents. These words are also listed in two word lists which
>>> are added to gazetteer lists. The number of occurrences of each type of
>>> word is counted and compared with each other and accordingly the
>>> polarity of the document is decided.
>>> I repeated exactly the same procedure with Java. But this time the
>>> precision, recall and accuracy have increased significantly.
>>> I went through some of the documents, but I couldn't find any
>>> explanation.
>>> Any ideas what causes this big difference between Java and GATE?
> GATE-users mailing list