[Tess-developers] TheSpamSecretary TheSpamSecretary.py,1.13,1.14
From: <kw...@us...> - 2003-05-04 20:34:01
Update of /cvsroot/tess/TheSpamSecretary
In directory sc8-pr-cvs1:/tmp/cvs-serv19652
Modified Files:
TheSpamSecretary.py
Log Message:
Added subject tagging, which double-counts subject words and marks them as SUBJECT:word in the key/value dicts.
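
As a rough illustration of the subject tagging described in the log message, here is a minimal standalone sketch. It is not the code from TheSpamSecretary.py: the function name `add_tokens`, the tokenizing regex, and the `MAX_WORD_LENGTH` value are assumptions made for the example, and the `None` guard is an addition that does not appear in the diff.

```python
import re
from collections import Counter

# Hypothetical limit; the real MAX_WORD_LENGTH value is not shown in the diff.
MAX_WORD_LENGTH = 20


def add_tokens(text, counts, text_type=""):
    """Count the tokens in text, prefixing each key with text_type.

    text_type is '' for body text and 'SUBJECT:' for subject text.
    """
    if not text:
        # A message without the header yields None; skip it (assumed guard).
        return
    # Assumed tokenizer: runs of letters, digits, quotes, dollars and dashes.
    for word in re.findall(r"[a-z0-9'$-]+", text.lower()):
        if len(word) > MAX_WORD_LENGTH:
            continue
        counts[text_type + word] += 1


counts = Counter()
# The subject is tokenized on its own with the SUBJECT: prefix ...
add_tokens("Cheap meds", counts, "SUBJECT:")
# ... and again, unprefixed, as part of the whole message text.
add_tokens("Subject: Cheap meds\n\nBuy cheap meds today", counts)
```

With this input, each subject word ends up under two keys, for example `SUBJECT:cheap` and `cheap`, which is one way to read "double-count" in the log message.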
Index: TheSpamSecretary.py
===================================================================
RCS file: /cvsroot/tess/TheSpamSecretary/TheSpamSecretary.py,v
retrieving revision 1.13
retrieving revision 1.14
diff -C2 -d -r1.13 -r1.14
*** TheSpamSecretary.py 12 Apr 2003 17:33:38 -0000 1.13
--- TheSpamSecretary.py 4 May 2003 20:33:58 -0000 1.14
***************
*** 457,460 ****
--- 457,462 ----
outputData = StringIO.StringIO()
outputData.write(aMessage)
+ #sys.stderr.write("Subject: %s\n" % aMessage.getheader('Subject'))
+ self.addTokensFromTextToDict(aMessage.getheader('Subject'), self.tempDict, "SUBJECT:")
#print("MS:%s:ME" % outputData.getvalue())
#deal with mime messages
***************
*** 519,525 ****
##################################################
! def addTokensFromTextToDict(self, someText, someDict):
"""
! Find all the tokens in the text and add them to the given dict
"""
someText = someText.lower()
--- 521,528 ----
##################################################
! def addTokensFromTextToDict(self, someText, someDict, textType = ''):
"""
! Find all the tokens in the text and add them to the given dict.
! textType is the type of text being added - '' for body text, SUBJECT: for subject text.
"""
someText = someText.lower()
***************
*** 532,535 ****
--- 535,540 ----
if (len(one_word) > self.MAX_WORD_LENGTH):
continue
+ one_word = textType + one_word
+ #sys.stderr.write("One word: %s\n" % one_word)
word_count = someDict.get(one_word)
try:
***************
*** 634,638 ****
interestValue = .5 - one_prob
interestValue *= 2.0
! #print("%s %s %s" % (one_key, interestValue, one_prob))
if ((interestValue > leastInteresting) or (len(interestingListValues) < 15)):
#INSERT SORT - FIX ME - will sorting be a win?
--- 639,644 ----
interestValue = .5 - one_prob
interestValue *= 2.0
! if (self.debugFilter):
! print("%s %s" % (one_key, one_prob))
if ((interestValue > leastInteresting) or (len(interestingListValues) < 15)):
#INSERT SORT - FIX ME - will sorting be a win?
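
The last hunk keeps a short list of the most "interesting" tokens and asks in a comment whether sorting would be a win. The sketch below shows one possible alternative using `heapq.nlargest` from the standard library. It assumes that interest means the absolute distance of a token's probability from 0.5, scaled to the range 0..1; the hunk does not show how the sign is handled, so that part is a guess. The function name `most_interesting` and the sample probabilities are made up for the example.

```python
import heapq


def most_interesting(probabilities, limit=15):
    """Return up to `limit` (token, probability) pairs, most interesting first."""

    def interest(item):
        _, prob = item
        # Assumed measure: distance from the neutral 0.5, scaled to 0..1.
        return abs(0.5 - prob) * 2.0

    return heapq.nlargest(limit, probabilities.items(), key=interest)


# Made-up sample data, including one SUBJECT:-tagged token.
probs = {"SUBJECT:viagra": 0.99, "meeting": 0.05, "the": 0.5, "free": 0.8}
top = most_interesting(probs, limit=2)
```

`heapq.nlargest` avoids maintaining the list by hand and does not need a full sort of every token, which may or may not matter at the sizes involved here.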