Re: [Classifier4j-devel] pseudocode
From: Nick L. <ni...@ma...> - 2004-08-07 15:58:08
That should read:

That will _tokenise_ the text passed to the teachMatch & teachNonMatch
methods as it goes.

> Sorry - I've been away for a few days.
>
> To train Classifier4J, do something like this:
>
> http://sourceforge.net/mailarchive/forum.php?thread_id=5110155&forum_id=34026
>
> That will y the text passed to the teachMatch & teachNonMatch methods
> as it goes.
>
> To classify after training use the classify(String) and/or
> isMatch(String) methods.
>
> Nick
>
> satmeet wrote:
>
>> Hi,
>> I looked at the code again and have gone through the archives of the
>> mailing list (first 5 months). I am writing down pseudocode for a
>> Bayesian classifier that maybe you could help me visualize in terms
>> of Classifier4J.
>> I am uploading this to
>> www.satmeet.com/bayesian.html
>> It's optimized for IE (sorry, I was in a hurry).
>>
>> I would like to know the sequence for making tokens from text, then
>> using them for training and classification. If you could just tell me
>> the flow of Classifier4J according to this pattern, that would help.
>> I know you are busy people, but I would be very grateful if you could
>> help me.
>>
>> In pseudocode, training is:
>> Given: an email message, X, and a label Ci ∈ {CN, CS}
>> 1. break X into its tokens, <x1, ..., xk>
>> 2. for each token, xj
>>    (a) Increment the counter for token xj for class Ci
>>    (b) Increment the count of total tokens in class Ci
>> 3. Increment the total number of email messages for class Ci
>>
>> And classification can be written as:
>> Given: an UNLABELED email message, X
>> 1. PN := Pr[CN]
>> 2. PS := Pr[CS]
>> 3. break X into its tokens, <x1, ..., xk>
>> 4. for each token, xj
>>    (a) PN := PN • Pr[xj | CN]
>>    (b) PS := PS • Pr[xj | CS]
>> 5. if PN > PS then return NORMAL
>> 6. else return SPAM
>>
>> Where
>> Pr[CN] = (# NORMAL emails) / (total # emails)
>>        = (# NORMAL emails) / (# SPAM + # NORMAL)
>>
>> Pr[CS] = (# SPAM emails) / (total # emails)
>>        = (# SPAM emails) / (# SPAM + # NORMAL)
>>
>> Pr[xj | Ci] = (# of tokens of type xj seen in class Ci)
>>               / (total # of tokens seen in class Ci)
>>
>> Thanking you
>>
>> Satmeet
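
For anyone following the thread, the training and classification pseudocode quoted above can be sketched in plain Java roughly as follows. This is only an illustration of those steps, not Classifier4J's own implementation: the class name TokenCountBayes, its methods, and the tokenizer (lowercase, split on anything that is not a letter or digit) are all assumptions made for the example. It multiplies raw probabilities exactly as the pseudocode does, so a token never seen in a class drives that class's score to zero.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch of the quoted pseudocode; NOT Classifier4J's implementation.
class TokenCountBayes {
    static final int NORMAL = 0; // class CN
    static final int SPAM = 1;   // class CS

    // token -> {count in NORMAL, count in SPAM}
    private final Map<String, int[]> tokenCounts = new HashMap<>();
    private final int[] totalTokens = new int[2];   // total tokens seen per class
    private final int[] totalMessages = new int[2]; // messages seen per class

    // Assumed tokenizer: lowercase, split on runs of non-alphanumeric characters.
    static List<String> tokenize(String text) {
        List<String> tokens = new ArrayList<>();
        for (String t : text.toLowerCase().split("[^a-z0-9]+")) {
            if (!t.isEmpty()) {
                tokens.add(t);
            }
        }
        return tokens;
    }

    // Training steps 1-3 of the pseudocode.
    void train(String message, int label) {
        for (String token : tokenize(message)) {
            tokenCounts.computeIfAbsent(token, k -> new int[2])[label]++; // 2(a)
            totalTokens[label]++;                                         // 2(b)
        }
        totalMessages[label]++;                                           // 3
    }

    // Pr[Ci] times the product of Pr[xj | Ci] over the message's tokens.
    // Assumes each class has been trained with at least one message.
    double score(String message, int label) {
        int allMessages = totalMessages[NORMAL] + totalMessages[SPAM];
        double p = (double) totalMessages[label] / allMessages;
        for (String token : tokenize(message)) {
            int[] counts = tokenCounts.get(token);
            int seen = (counts == null) ? 0 : counts[label];
            p *= (double) seen / totalTokens[label];
        }
        return p;
    }

    // Classification steps 5-6: NORMAL only if PN is strictly greater than PS.
    int classify(String message) {
        return score(message, NORMAL) > score(message, SPAM) ? NORMAL : SPAM;
    }
}
```

Usage would be along the lines of calling train(text, TokenCountBayes.SPAM) or train(text, TokenCountBayes.NORMAL) for each labelled message, then classify(text) on new mail. In Classifier4J itself the corresponding calls are the teachMatch, teachNonMatch, classify(String) and isMatch(String) methods mentioned above, and the tokenising happens inside the teach methods.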