[Classifier4j-devel] pseudocode
From: satmeet <ja...@sa...> - 2004-08-03 10:54:41
Hi,

I looked at the code again and have gone through the archives of the mailing list (the first 5 months). I am writing down pseudocode for a Bayesian classifier that maybe you could help me visualize in terms of Classifier4J. I am also uploading it to www.satmeet.com/bayesian.html (it is optimized for IE; sorry, I was in a hurry).

I would like to know the sequence for making tokens from text and then using them for training and classification. Could you tell me how the flow of Classifier4J follows this pattern? I know you are busy people, but I would be very grateful if you could help me.

In pseudocode, training is:

Given: an email message X and a label Ci ∈ {CN, CS}
1. Break X into its tokens ⟨x1, ..., xk⟩
2. For each token xj:
   (a) Increment the counter for token xj for class Ci
   (b) Increment the count of total tokens in class Ci
3. Increment the total number of email messages for class Ci

And classification can be written as:

Given: an UNLABELED email message X
1. PN := Pr[CN]
2. PS := Pr[CS]
3. Break X into its tokens ⟨x1, ..., xk⟩
4. For each token xj:
   (a) PN := PN • Pr[xj | CN]
   (b) PS := PS • Pr[xj | CS]
5. If PN > PS then return NORMAL
6. Else return SPAM

Where:

Pr[CN] = (# NORMAL emails) / (total # emails)
       = (# NORMAL emails) / (# SPAM + # NORMAL)

Pr[CS] = (# SPAM emails) / (total # emails)
       = (# SPAM emails) / (# SPAM + # NORMAL)

Pr[xj | Ci] = (# of tokens of type xj seen in class Ci) / (total # of tokens seen in class Ci)

Thanking you,
Satmeet

--
Using M2, Opera's revolutionary e-mail client: http://www.opera.com/m2/
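
For readers following the thread, the training and classification steps in the pseudocode above can be sketched in a few lines of Python. This is only an illustration of the pseudocode as written, not Classifier4J's API or its actual implementation. The class name `NaiveBayesSketch`, the `tokenize` helper, and the whitespace tokenizer are all assumptions made for the example, and the zero-denominator guards are an addition the pseudocode does not specify.

```python
from collections import Counter

NORMAL, SPAM = "NORMAL", "SPAM"


def tokenize(text):
    # Assumption: lowercased whitespace splitting stands in for a real tokenizer.
    return text.lower().split()


class NaiveBayesSketch:
    """Hypothetical class mirroring the pseudocode's counters."""

    def __init__(self):
        self.token_counts = {NORMAL: Counter(), SPAM: Counter()}  # per-token count per class
        self.total_tokens = {NORMAL: 0, SPAM: 0}                  # total tokens per class
        self.message_counts = {NORMAL: 0, SPAM: 0}                # emails per class

    def train(self, text, label):
        for token in tokenize(text):              # training steps 1 and 2
            self.token_counts[label][token] += 1  # step 2(a)
            self.total_tokens[label] += 1         # step 2(b)
        self.message_counts[label] += 1           # step 3

    def prior(self, label):
        # Pr[Ci] = (# emails in class Ci) / (total # emails)
        total = sum(self.message_counts.values())
        return self.message_counts[label] / total if total else 0.0

    def token_probability(self, token, label):
        # Pr[xj | Ci] = (# tokens xj in class Ci) / (total # tokens in class Ci)
        total = self.total_tokens[label]
        return self.token_counts[label][token] / total if total else 0.0

    def classify(self, text):
        pn = self.prior(NORMAL)                   # classification step 1
        ps = self.prior(SPAM)                     # step 2
        for token in tokenize(text):              # steps 3 and 4
            pn *= self.token_probability(token, NORMAL)
            ps *= self.token_probability(token, SPAM)
        return NORMAL if pn > ps else SPAM        # steps 5 and 6


classifier = NaiveBayesSketch()
classifier.train("meeting at noon tomorrow", NORMAL)
classifier.train("cheap pills buy now", SPAM)
print(classifier.classify("meeting tomorrow"))
```

Because the estimates are raw counts, any token never seen in a class drives that class's product to zero, and a tie (including 0 versus 0) falls through to SPAM, exactly as steps 5 and 6 of the pseudocode specify.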