[Classifier4j-devel] How to Classify Subject Field with defaultStopWords.txt
From: Kashif <ks...@ai...> - 2004-07-16 08:11:23
Hi,
The filter now works with the black list and the white list when I compare the "from" field.
Now I want to apply the filtering to the "subject" field, but it gives me 0.5 or 0.99 no matter what subject I use.
At the moment I am doing this:
1) I transfer each line (which is a single word) of "defaultStopWords.txt" into an array, stopWordListArray[].
2) Then I create another instance of IWordsDataSource (swds) and of ITrainableClassifier (sclassifier).
3) I use a for loop to teach matches. I know that I should also train non-matches, but I am not sure with what.
4) I was also wondering whether C4J uses defaultStopWords.txt automatically, or whether we have to load the list somehow.
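Step 1 above can be sketched roughly as follows. This is only an illustration using the standard Java library, not the code from my project: the class name StopWordLoader, its method names, the file path, and the UTF-8 encoding are all assumptions.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper: turns a one-word-per-line file into a String array.
class StopWordLoader {

    // Keeps one trimmed word per non-empty line.
    static String[] toWordArray(List<String> lines) {
        List<String> words = new ArrayList<>();
        for (String line : lines) {
            String word = line.trim();
            if (!word.isEmpty()) {
                words.add(word);
            }
        }
        return words.toArray(new String[0]);
    }

    // Reads the whole file (assumed to be UTF-8) and converts it to an array.
    static String[] load(Path file) throws IOException {
        return toWordArray(Files.readAllLines(file, StandardCharsets.UTF_8));
    }

    public static void main(String[] args) throws IOException {
        // The path is an assumption; point it at wherever the file actually lives.
        String[] stopWordListArray = load(Paths.get("defaultStopWords.txt"));
        System.out.println("Loaded " + stopWordListArray.length + " stop words");
    }
}
```

Trimming each line and skipping empty ones keeps stray whitespace or a trailing blank line from ending up in the array as a "word".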
Here's my code:
    IWordsDataSource swds = new SimpleWordsDataSource();
    ITrainableClassifier sclassifier = new BayesianClassifier(swds);
    // Teach every stop word as a match
    for (int i = 0; i < stopWordListArray.length; i++) {
        sclassifier.teachMatch(stopWordListArray[i]);
    }
    // Classify the subject of each of the n messages
    double[] result = new double[n];
    for (int i = 0; i < n; i++) {
        result[i] = sclassifier.classify(message[i].getSubject());
        System.out.println("The probability of message no. " + i + " is: " + result[i]);
    }
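For intuition about why only 0.5 and 0.99 might show up, here is a toy model. It is not Classifier4J's code. It assumes a per-word probability that is a neutral 0.5 for unseen words, is clamped to the range 0.01 to 0.99, and is combined with the usual naive Bayes product formula. The class ToyBayes and all of its members are hypothetical.

```java
import java.util.HashMap;
import java.util.Map;

// Toy naive Bayes model (an assumption for illustration, not the library).
class ToyBayes {
    static final double NEUTRAL = 0.5, LOWER = 0.01, UPPER = 0.99;

    // word -> {match count, non-match count}
    private final Map<String, int[]> counts = new HashMap<>();

    void teachMatch(String text) { teach(text, 0); }
    void teachNonMatch(String text) { teach(text, 1); }

    private void teach(String text, int slot) {
        for (String w : text.toLowerCase().split("\\W+")) {
            if (!w.isEmpty()) {
                counts.computeIfAbsent(w, k -> new int[2])[slot]++;
            }
        }
    }

    // Unseen words are neutral; seen words use the clamped match ratio.
    private double wordProbability(String w) {
        int[] c = counts.get(w);
        if (c == null) {
            return NEUTRAL;
        }
        return clamp((double) c[0] / (c[0] + c[1]));
    }

    // Combines word probabilities as p / (p + q), with p = prod(w), q = prod(1 - w).
    double classify(String text) {
        double p = 1.0, q = 1.0;
        for (String w : text.toLowerCase().split("\\W+")) {
            if (w.isEmpty()) {
                continue;
            }
            double wp = wordProbability(w);
            p *= wp;
            q *= 1.0 - wp;
        }
        return clamp(p / (p + q));
    }

    static double clamp(double v) {
        return Math.max(LOWER, Math.min(UPPER, v));
    }
}
```

In this toy model, a classifier that has only ever been taught matches scores a subject at 0.5 when it contains no known word and at 0.99 when it contains at least one. Other values only appear once non-match examples have been taught as well.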
Thanks heaps for your help