Re: [Classifier4j-devel] pseudocode
From: Nick L. <ni...@ma...> - 2004-08-07 12:30:16
Sorry - I've been away for a few days.

To train Classifier4J, do something like this:
http://sourceforge.net/mailarchive/forum.php?thread_id=5110155&forum_id=34026

That will tokenize the text passed to the teachMatch & teachNonMatch methods as it goes.

To classify after training, use the classify(String) and/or isMatch(String) methods.

Nick

satmeet wrote:
> Hi,
> I looked at the code again and have gone through the archives of the
> mailing list (first 5 months). I am writing down pseudocode for
> Bayesian classification that maybe you could help me visualize in
> terms of CLASSIFIER4J.
> I am uploading this to www.satmeet.com/bayesian.html
> It's optimized for IE (sorry, I was in a hurry).
>
> I would like to know the sequence for making tokens from text, then
> using them for training and classification. If you could just tell me
> the flow of Classifier4J according to this given pattern. I know you
> are busy people, but I will be very grateful if you could help me.
>
> In pseudocode, training is:
> Given: an email message, X, and a label Ci ∈ {CN, CS}
> 1. break X into its tokens, <x1, ..., xk>
> 2. for each token, xj
>    (a) Increment the counter for token xj for class Ci
>    (b) Increment the count of total tokens in class Ci
> 3. Increment the total number of email messages for class Ci
>
> And classification can be written as:
> Given: an UNLABELED email message, X
> 1. PN := Pr[CN]
> 2. PS := Pr[CS]
> 3. break X into its tokens, <x1, ..., xk>
> 4. for each token, xj
>    (a) PN := PN • Pr[xj | CN]
>    (b) PS := PS • Pr[xj | CS]
> 5. if PN > PS then return NORMAL
> 6. else return SPAM
>
> Where
>
> Pr[CN] = (# NORMAL emails) / (total # emails)
>        = (# NORMAL emails) / (# SPAM + # NORMAL)
>
> Pr[CS] = (# SPAM emails) / (total # emails)
>        = (# SPAM emails) / (# SPAM + # NORMAL)
>
> Pr[xj | Ci] = (# of tokens of type xj seen in class Ci)
>               / (total # of tokens seen in class Ci)
>
> Thanking you
>
> Satmeet
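
For readers following the thread, the quoted pseudocode can be sketched in plain Java roughly as below. This is only an illustration of the steps as written in the quoted message, not Classifier4J's own code: the class name NaiveBayesSketch, its method names, and the whitespace tokenizer are all assumptions made for the example.

```java
import java.util.HashMap;
import java.util.Locale;
import java.util.Map;

// Hypothetical illustration of the quoted pseudocode; not part of Classifier4J.
class NaiveBayesSketch {
    enum Label { NORMAL, SPAM }

    // Per-class counter for each token xj.
    private final Map<Label, Map<String, Integer>> tokenCounts = new HashMap<>();
    // Per-class count of all tokens seen.
    private final Map<Label, Integer> totalTokens = new HashMap<>();
    // Per-class count of training messages.
    private final Map<Label, Integer> messageCounts = new HashMap<>();

    NaiveBayesSketch() {
        for (Label c : Label.values()) {
            tokenCounts.put(c, new HashMap<>());
            totalTokens.put(c, 0);
            messageCounts.put(c, 0);
        }
    }

    // Assumption: a token is a lower-cased run of non-whitespace characters.
    static String[] tokenize(String message) {
        String trimmed = message.trim().toLowerCase(Locale.ROOT);
        return trimmed.isEmpty() ? new String[0] : trimmed.split("\\s+");
    }

    // Training, steps 1-3 of the pseudocode.
    void train(String message, Label label) {
        for (String token : tokenize(message)) {
            tokenCounts.get(label).merge(token, 1, Integer::sum); // step 2(a)
            totalTokens.merge(label, 1, Integer::sum);            // step 2(b)
        }
        messageCounts.merge(label, 1, Integer::sum);              // step 3
    }

    // Pr[Ci] = messages in class Ci / total messages.
    double prior(Label c) {
        int total = messageCounts.get(Label.NORMAL) + messageCounts.get(Label.SPAM);
        return total == 0 ? 0.0 : (double) messageCounts.get(c) / total;
    }

    // Pr[xj | Ci] = count of xj in class Ci / total tokens in class Ci.
    double likelihood(String token, Label c) {
        int total = totalTokens.get(c);
        if (total == 0) {
            return 0.0;
        }
        return (double) tokenCounts.get(c).getOrDefault(token, 0) / total;
    }

    // Classification, steps 1-6 of the pseudocode.
    Label classify(String message) {
        double pn = prior(Label.NORMAL);
        double ps = prior(Label.SPAM);
        for (String token : tokenize(message)) {
            pn *= likelihood(token, Label.NORMAL);
            ps *= likelihood(token, Label.SPAM);
        }
        return pn > ps ? Label.NORMAL : Label.SPAM;
    }
}
```

Because the sketch follows the pseudocode literally, a token never seen in a class drives that class's product to zero, and a tie (including zero against zero) falls through to SPAM. Long messages can also underflow a double, so a practical version would probably smooth the counts and sum logarithms instead.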