[Classifier4j-devel] How to Classify Subject Field with defaultStopWords.txt


Hi 

The filter is now working with the blacklist and whitelist when I compare the "from"
field.

Now I want to apply the filtering to the "subject" field, but it gives me 0.5
or 0.99 no matter what subject I use.

At the moment I am doing this:

1)       I read each line (a single word) of "defaultStopWords.txt" into an
array, stopWordListArray[].

2)       Then I create another instance of IWordsDataSource (swds) and of
ITrainableClassifier (sclassifier).

3)       I use a for loop to teach matches. I know I should also train
non-matches, but I am not sure with what.

4)       I was also wondering: does C4J use defaultStopWords.txt automatically,
or do we have to load the list somehow?
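Regarding steps 1 and 4: a stop-word list is usually applied as a filter (the listed words are dropped from the text before it is scored) rather than taught to a classifier as matches. A minimal plain-Java sketch of that idea, assuming a one-word-per-line file; it does not use or describe Classifier4J's own stop-word handling, and the class and method names (StopWordFilter, loadStopWords, significantWords) and the tokenising rule are hypothetical.

```java
import java.io.BufferedReader;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;
import java.util.Set;
import java.util.stream.Collectors;

// Hypothetical helper for illustration; not part of Classifier4J.
public class StopWordFilter {

    // Reads a one-word-per-line list (such as defaultStopWords.txt),
    // skipping blank lines and normalising to lower case.
    // The caller owns (and closes) the reader.
    static Set<String> loadStopWords(BufferedReader in) {
        return in.lines()
                .map(line -> line.trim().toLowerCase(Locale.ROOT))
                .filter(word -> !word.isEmpty())
                .collect(Collectors.toSet());
    }

    // Assumed tokenising rule: lower-case the subject, split on anything
    // that is not a letter or digit, then drop the stop words.
    static List<String> significantWords(String subject, Set<String> stopWords) {
        List<String> kept = new ArrayList<>();
        for (String token : subject.toLowerCase(Locale.ROOT).split("[^a-z0-9]+")) {
            if (!token.isEmpty() && !stopWords.contains(token)) {
                kept.add(token);
            }
        }
        return kept;
    }

    public static void main(String[] args) {
        // In a real program the reader would come from the file, e.g.
        // Files.newBufferedReader(Paths.get("defaultStopWords.txt"))
        // inside a try-with-resources block.
        Set<String> stop = loadStopWords(
                new BufferedReader(new StringReader("the\nof\n\nis\n")));
        System.out.println(significantWords("Minutes of the project meeting", stop));
    }
}
```

Only the remaining words (here "minutes", "project" and "meeting") would then be worth teaching or scoring.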

Here's my code:

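            IWordsDataSource swds = new SimpleWordsDataSource();
            ITrainableClassifier sclassifier = new BayesianClassifier(swds);

            // Teach every word from the stop-word list as a "match"
            for (int i = 0; i < stopWordListArray.length; i++) {
                sclassifier.teachMatch(stopWordListArray[i]);
            }

            // Allocate the result array once, outside the loop, so earlier
            // results are not thrown away on every iteration
            double[] result = new double[n];
            for (int i = 0; i < n; i++) {
                result[i] = sclassifier.classify(message[i].getSubject());
                // Keep the string literal on one line so it compiles
                System.out.println("The probability of message no. " + i
                        + " is: " + result[i]);
            }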
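Regarding the 0.5 / 0.99 results and step 3: the toy sketch below shows how a Bayesian-style filter trained only on matches can produce exactly that two-valued pattern, and why non-match examples (for instance, subjects of mail you want to keep) are needed. It is hypothetical plain Java, not Classifier4J's implementation. It assumes that unseen words score a neutral 0.5, that seen words are clamped to [0.01, 0.99], and that per-word scores are combined with Paul Graham's product rule; the class name ToySubjectClassifier and the sample subjects are invented.

```java
import java.util.HashMap;
import java.util.Locale;
import java.util.Map;

// Hypothetical toy classifier for illustration only; NOT Classifier4J's code.
public class ToySubjectClassifier {

    // How often each word has been seen in match / non-match examples.
    private final Map<String, Integer> matchCounts = new HashMap<>();
    private final Map<String, Integer> nonMatchCounts = new HashMap<>();

    // Lower-cases the text and splits it on anything that is not a letter or digit.
    private static String[] words(String text) {
        return text.toLowerCase(Locale.ROOT).split("[^a-z0-9]+");
    }

    private static void count(Map<String, Integer> counts, String text) {
        for (String w : words(text)) {
            if (!w.isEmpty()) {
                counts.merge(w, 1, Integer::sum);
            }
        }
    }

    void teachMatch(String text)    { count(matchCounts, text); }
    void teachNonMatch(String text) { count(nonMatchCounts, text); }

    // Assumed rule: an unseen word is neutral (0.5); a seen word scores the
    // fraction of its occurrences that were matches, clamped to [0.01, 0.99].
    double wordProbability(String w) {
        int m = matchCounts.getOrDefault(w, 0);
        int n = nonMatchCounts.getOrDefault(w, 0);
        if (m + n == 0) {
            return 0.5;
        }
        double p = (double) m / (m + n);
        return Math.min(0.99, Math.max(0.01, p));
    }

    // Combines per-word probabilities as p1..pk / (p1..pk + (1-p1)..(1-pk)).
    double classify(String text) {
        double product = 1.0;
        double inverseProduct = 1.0;
        for (String w : words(text)) {
            if (w.isEmpty()) {
                continue;
            }
            double p = wordProbability(w);
            product *= p;
            inverseProduct *= 1.0 - p;
        }
        return product / (product + inverseProduct);
    }

    public static void main(String[] args) {
        // Trained on matches only: subjects with no taught word sit at 0.5,
        // and a subject containing one taught word jumps to about 0.99.
        ToySubjectClassifier onlyMatches = new ToySubjectClassifier();
        onlyMatches.teachMatch("cheap pills online");
        System.out.printf("%.4f%n", onlyMatches.classify("project meeting"));
        System.out.printf("%.4f%n", onlyMatches.classify("cheap meeting"));

        // Trained on both: words seen in wanted mail pull the score down.
        ToySubjectClassifier both = new ToySubjectClassifier();
        both.teachMatch("cheap pills online");
        both.teachNonMatch("project meeting minutes");
        System.out.printf("%.4f%n", both.classify("project meeting"));
        System.out.printf("%.4f%n", both.classify("pills online"));
    }
}
```

In this toy, after a single non-match example the subject "project meeting" drops from 0.5 to near zero, while subjects made of taught match words stay close to 1.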

Thanks heaps for your help