[Classifier4j-devel] pseudocode


Hi,
I looked at the code again and have gone through the archives of the
mailing list (first 5 months). I am writing down pseudocode for a Bayesian
classifier that maybe you could help me map onto Classifier4J.
I am also uploading it to www.satmeet.com/bayesian.html
It is optimized for IE (sorry, I was in a hurry).

I would like to know the sequence for making tokens from text and then
using them for training and classification. If you could just tell me the
flow of Classifier4J in terms of the pattern given here, I would be very
grateful. I know you are busy people.

In pseudocode, training is:
Given: an email message, X, and a label Ci ∈ {CN, CS},
1. break X into its tokens, <x1, ..., xk>
2. for each token, xj
(a) Increment the counter for token xj for class Ci
(b) Increment the count of total tokens in class Ci
3. Increment the total number of email messages for class Ci   	
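
The training steps above could be sketched in Java roughly as follows.
This is only an illustration of the three counters, not Classifier4J's
own code: the class name TrainingCounts, the string labels, and the
simple tokenizer (lower-case, split on anything that is not a letter or
digit) are all assumptions made for the example.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch of the training pseudocode; not Classifier4J's API.
class TrainingCounts {
    // tokenCounts.get(label).get(token) = times the token was seen in that class
    final Map<String, Map<String, Integer>> tokenCounts = new HashMap<>();
    // totalTokens.get(label) = total number of tokens seen in that class
    final Map<String, Integer> totalTokens = new HashMap<>();
    // messageCounts.get(label) = number of messages trained for that class
    final Map<String, Integer> messageCounts = new HashMap<>();

    // Assumed tokenizer: lower-case the text and split on non-alphanumerics.
    static List<String> tokenize(String text) {
        List<String> tokens = new ArrayList<>();
        for (String t : text.toLowerCase().split("[^a-z0-9]+")) {
            if (!t.isEmpty()) {
                tokens.add(t);
            }
        }
        return tokens;
    }

    void train(String message, String label) {
        Map<String, Integer> counts =
                tokenCounts.computeIfAbsent(label, k -> new HashMap<>());
        for (String token : tokenize(message)) {       // steps 1 and 2
            counts.merge(token, 1, Integer::sum);      // step 2(a)
            totalTokens.merge(label, 1, Integer::sum); // step 2(b)
        }
        messageCounts.merge(label, 1, Integer::sum);   // step 3
    }
}
```

For example, after train("Cheap pills, cheap offer", "SPAM") the SPAM
counter for "cheap" is 2, the SPAM token total is 4, and the SPAM message
count is 1.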

Classification can be written as:
Given: an UNLABELED email message, X
1. PN := Pr[CN]
2. PS := Pr[CS]
3. break X into its tokens, <x1, ..., xk>
4. for each token, xj
(a) PN := PN • Pr[xj |CN]
(b) PS := PS • Pr[xj |CS]
5. if PN >PS then return NORMAL
6. else return SPAM
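
As a rough illustration, the classification steps above might look like
this in Java. It is a sketch under assumptions, not how Classifier4J
does it: the class NaiveBayesSketch, its method names, and the tokenizer
are invented for the example, and the training counters are included
only so the block is self-contained.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch of the classification pseudocode; not Classifier4J's API.
class NaiveBayesSketch {
    static final String NORMAL = "NORMAL";
    static final String SPAM = "SPAM";

    private final Map<String, Map<String, Integer>> tokenCounts = new HashMap<>();
    private final Map<String, Integer> totalTokens = new HashMap<>();
    private final Map<String, Integer> messageCounts = new HashMap<>();

    // Assumed tokenizer: lower-case the text and split on non-alphanumerics.
    static List<String> tokenize(String text) {
        List<String> tokens = new ArrayList<>();
        for (String t : text.toLowerCase().split("[^a-z0-9]+")) {
            if (!t.isEmpty()) {
                tokens.add(t);
            }
        }
        return tokens;
    }

    // Update the per-class token counts, token totals, and message counts.
    void train(String message, String label) {
        Map<String, Integer> counts =
                tokenCounts.computeIfAbsent(label, k -> new HashMap<>());
        for (String token : tokenize(message)) {
            counts.merge(token, 1, Integer::sum);
            totalTokens.merge(label, 1, Integer::sum);
        }
        messageCounts.merge(label, 1, Integer::sum);
    }

    // Pr[Ci] = (# emails in class Ci) / (# SPAM + # NORMAL)
    double prior(String label) {
        int total = messageCounts.getOrDefault(NORMAL, 0)
                + messageCounts.getOrDefault(SPAM, 0);
        return total == 0 ? 0.0 : (double) messageCounts.getOrDefault(label, 0) / total;
    }

    // Pr[xj|Ci] = (# tokens of type xj in class Ci) / (total # tokens in class Ci)
    double likelihood(String token, String label) {
        int total = totalTokens.getOrDefault(label, 0);
        if (total == 0) {
            return 0.0;
        }
        int count = tokenCounts.getOrDefault(label, new HashMap<>())
                .getOrDefault(token, 0);
        return (double) count / total;
    }

    String classify(String message) {
        double pn = prior(NORMAL);                 // step 1
        double ps = prior(SPAM);                   // step 2
        for (String token : tokenize(message)) {   // steps 3 and 4
            pn *= likelihood(token, NORMAL);       // step 4(a)
            ps *= likelihood(token, SPAM);         // step 4(b)
        }
        return pn > ps ? NORMAL : SPAM;            // steps 5 and 6
    }
}
```

Note that, followed literally, the pseudocode lets a token never seen in
a class drive that class's product to zero, and a tie (including 0 vs 0)
falls through to SPAM. Long messages can also underflow a double, which
is why implementations often add the logarithms of the probabilities
instead of multiplying them.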

Where

Pr[CN] = (# NORMAL emails) / (total # emails)
       = (# NORMAL emails) / (# SPAM + # NORMAL)

Pr[CS] = (# SPAM emails) / (total # emails)
       = (# SPAM emails) / (# SPAM + # NORMAL)

Pr[xj|Ci] = (# of tokens of type xj seen in class Ci)
            / (total # of tokens seen in class Ci)
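
To make the three estimates concrete, here is a small Java sketch that
computes them from raw counts. The class EstimateExample and its method
names are invented for the illustration, and the numbers in the note
after the code are made up, not measured data.

```java
// Hypothetical helpers for the three estimates; not Classifier4J's API.
class EstimateExample {
    // Pr[Ci] = (# emails in class Ci) / (# SPAM + # NORMAL)
    static double prior(int emailsInClass, int spamEmails, int normalEmails) {
        return (double) emailsInClass / (spamEmails + normalEmails);
    }

    // Pr[xj|Ci] = (# tokens of type xj in class Ci) / (total # tokens in class Ci)
    static double tokenProbability(int tokenCountInClass, int totalTokensInClass) {
        return (double) tokenCountInClass / totalTokensInClass;
    }
}
```

With 30 SPAM and 70 NORMAL emails, prior(30, 30, 70) gives Pr[CS] = 0.3
and prior(70, 30, 70) gives Pr[CN] = 0.7. If a token appears 5 times
among 200 SPAM tokens, tokenProbability(5, 200) gives 0.025.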

Thanking you

Satmeet
