assorted-commits Mailing List for Assorted projects (Page 48)
Brought to you by: yangzhang
From: <yan...@us...> - 2008-05-08 08:36:51

Revision: 727
http://assorted.svn.sourceforge.net/assorted/?rev=727&view=rev
Author: yangzhang
Date: 2008-05-08 01:36:43 -0700 (Thu, 08 May 2008)

Log Message:
-----------
added custom valgrind (instead of using a non-preemptible valgrindrc in
configs); still figuring out perl paths and manpaths

Modified Paths:
--------------
    shell-tools/trunk/src/bash-commons/bashrc.bash

Modified: shell-tools/trunk/src/bash-commons/bashrc.bash
===================================================================
--- shell-tools/trunk/src/bash-commons/bashrc.bash  2008-05-08 08:35:45 UTC (rev 726)
+++ shell-tools/trunk/src/bash-commons/bashrc.bash  2008-05-08 08:36:43 UTC (rev 727)
@@ -110,6 +110,8 @@
 # TODO fix this to not use an explicit ruby version
 prepend_std PATH bin sbin /sbin /usr/sbin /var/lib/gems/1.8/bin
+# TODO fix this, things like /usr/local/ should get precedence over default
+# locations
 prepend_std MANPATH man ''
 prepend_std INFOPATH info
 # TODO fix this to not use an explicit ruby version
@@ -138,7 +140,7 @@

 # perl

-prepend_std PERL5LIB lib/perl5/site_perl
+prepend_std PERL5LIB lib/perl5/site_perl lib/perl share/perl

 # python
@@ -561,6 +563,17 @@
   make -sj2
 }

+xvalgrind() {
+  valgrind \
+    --track-fds=yes \
+    --db-attach=yes \
+    --verbose \
+    --tool=memcheck \
+    --leak-check=yes \
+    --leak-resolution=high \
+    --show-reachable=yes
+}
+
 #function set_title() {
 #  if [ $# -eq 0 ] ; then
 #    eval set -- "$PWD"

This was sent by the SourceForge.net collaborative development platform, the world's largest Open Source development site.
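The xvalgrind function in the commit above hard-codes a set of memcheck options, but as committed it never passes a target program to valgrind. A minimal Python sketch of the same option set as a command builder (the function name and the forwarding of target arguments are my additions, not part of the commit):

```python
import shlex

def xvalgrind_cmd(target_args):
    """Build an argv for valgrind using the options from xvalgrind.

    target_args is the program (plus its arguments) to run under
    valgrind; the committed bash function does not forward one.
    """
    opts = [
        "--track-fds=yes",
        "--db-attach=yes",
        "--verbose",
        "--tool=memcheck",
        "--leak-check=yes",
        "--leak-resolution=high",
        "--show-reachable=yes",
    ]
    return ["valgrind"] + opts + list(target_args)

# Render the full command line as a shell string.
print(shlex.join(xvalgrind_cmd(["./a.out", "--flag"])))
```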
From: <yan...@us...> - 2008-05-08 08:35:57

Revision: 726
http://assorted.svn.sourceforge.net/assorted/?rev=726&view=rev
Author: yangzhang
Date: 2008-05-08 01:35:45 -0700 (Thu, 08 May 2008)

Log Message:
-----------
fixed pypi urls

Modified Paths:
--------------
    shell-tools/trunk/src/bash-commons/assorted.bash

Modified: shell-tools/trunk/src/bash-commons/assorted.bash
===================================================================
--- shell-tools/trunk/src/bash-commons/assorted.bash  2008-05-08 08:29:57 UTC (rev 725)
+++ shell-tools/trunk/src/bash-commons/assorted.bash  2008-05-08 08:35:45 UTC (rev 726)
@@ -48,9 +48,9 @@
   elif kind == 'pypi':
     # TODO this should be more robust
     yield ( '[download $version egg]',
-            'http://pypi.python.org/packages/2.5/p/$project/$( echo $project | sed s/-/_/g )-$version-py2.5.egg' )
+            'http://pypi.python.org/packages/2.5/${project:0:1}/$project/$( echo $project | sed s/-/_/g )-$version-py2.5.egg' )
     yield ( '[download $version src tgz]',
-            'http://pypi.python.org/packages/source/p/$project/$package.tar.gz' )
+            'http://pypi.python.org/packages/source/${project:0:1}/$project/$package.tar.gz' )
     yield ( '[PyPI page]', 'http://pypi.python.org/pypi/$project/' )
   tail = '|' if 'pypi' in kinds() else '| [more downloads] |'
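The fix in revision 726 replaces the hard-coded `p` path component with the first letter of the project name (`${project:0:1}`). A sketch of the resulting old-style PyPI URL layout in Python (the function name is mine, and the source-tarball name `project-version.tar.gz` is an assumption about the `$package` variable in the diff):

```python
def pypi_download_urls(project, version, py="2.5"):
    """Sketch of the old-style PyPI download URL scheme used above.

    Packages live under the first letter of the project name; egg file
    names use underscores where the project name has hyphens.
    """
    first = project[0]                       # e.g. 'm' for mailing-list-filter
    egg_name = project.replace("-", "_")     # hyphens -> underscores in eggs
    egg = ("http://pypi.python.org/packages/%s/%s/%s/%s-%s-py%s.egg"
           % (py, first, project, egg_name, version, py))
    src = ("http://pypi.python.org/packages/source/%s/%s/%s-%s.tar.gz"
           % (first, project, project, version))
    return egg, src

egg_url, src_url = pypi_download_urls("mailing-list-filter", "0.1")
```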
From: <yan...@us...> - 2008-05-08 08:29:51

Revision: 725
http://assorted.svn.sourceforge.net/assorted/?rev=725&view=rev
Author: yangzhang
Date: 2008-05-08 01:29:57 -0700 (Thu, 08 May 2008)

Log Message:
-----------
tagged 0.1 release

Added Paths:
-----------
    mailing-list-filter/tags/
    mailing-list-filter/tags/0.1/
    mailing-list-filter/tags/0.1/README
    mailing-list-filter/tags/0.1/publish.bash
    mailing-list-filter/tags/0.1/setup.py
    mailing-list-filter/tags/0.1/src/mlf.py

Removed Paths:
-------------
    mailing-list-filter/tags/0.1/src/filter.py

Copied: mailing-list-filter/tags/0.1 (from rev 704, mailing-list-filter/trunk)

Copied: mailing-list-filter/tags/0.1/README (from rev 720, mailing-list-filter/trunk/README)
===================================================================
--- mailing-list-filter/tags/0.1/README  (rev 0)
+++ mailing-list-filter/tags/0.1/README  2008-05-08 08:29:57 UTC (rev 725)
@@ -0,0 +1,63 @@
+Overview
+--------
+
+I have a Gmail account that I use for subscribing to and posting to mailing
+lists. When dealing with high-volume mailing lists, I am typically only
+interested in those threads that I participated in. This is a simple filter
+for starring and marking unread any messages belonging to such threads.
+
+This is accomplished by looking at the set of messages that were either sent
+from me or explicitly addressed to me. From this "root set" of messages, we
+can use the `Message-ID`, `References`, and `In-Reply-To` headers to determine
+threads, and thus the other messages that we care about.
+
+I have found this to be more accurate than my two original approaches. I used
+to have Gmail filters that starred/marked unread any messages containing my
+name anywhere in the message. This worked OK since my name is not too common,
+but it produced some false positives (not that bad, just unstar messages) and
+some false negatives (much harder to detect).
+
+A second approach is to tag all subjects with some signature string. This
+usually is fine, but it doesn't work when you did not start the thread (and
+thus determine the subject). You can try to change the subject line, but this
+is (1) poor netiquette, (2) unreliable because your reply may not register in
+other mail clients as being part of the same thread (and thus other
+participants may miss your reply), and (3) unreliable because replies might
+not directly reference your post (either intentionally or unintentionally).
+It also fails when others change the subject. Finally, this approach is
+unsatisfactory because it pollutes subject lines, and it essentially
+replicates exactly what Message-ID was intended for.
+
+This script is not intended to be a replacement for the Gmail filters. I
+still keep those active so that I can get immediate first-pass filtering. I
+execute this script on a daily basis to perform second-pass
+filtering/unfiltering to catch those false negatives that may have been
+missed.
+
+Setup
+-----
+
+Requirements:
+
+- [argparse](http://argparse.python-hosting.com/)
+- [Python Commons](http://assorted.sf.net/python-commons/) 0.4
+- [path](http://www.jorendorff.com/articles/python/path/)
+
+Install the program using the standard `setup.py` program.
+
+Future Work Ideas
+-----------------
+
+- Currently, we assume that the server specification points to a mailbox
+  containing all messages (both sent and received), and a message is
+  determined to have been sent by you by looking at the From: header field.
+  This works well with Gmail. An alternative strategy is to look through two
+  folders, one that's the Inbox and one that's the Sent mailbox, and treat
+  all messages in Sent as having been sent by you. This is presumably how
+  most other IMAP servers work.
+
+- Implement incremental maintenance of local cache.
+
+- Accept custom operations for filtered/unfiltered messages
+  (trashing/untrashing, labeling/unlabeling, etc.).
+
+- Refactor the message fetching/management part out into its own library.
Copied: mailing-list-filter/tags/0.1/publish.bash (from rev 717, mailing-list-filter/trunk/publish.bash) =================================================================== --- mailing-list-filter/tags/0.1/publish.bash (rev 0) +++ mailing-list-filter/tags/0.1/publish.bash 2008-05-08 08:29:57 UTC (rev 725) @@ -0,0 +1,8 @@ +#!/usr/bin/env bash + +fullname='Mailing List Filter' +version=0.1 +license=psf +websrcs=( README ) +rels=( pypi: ) +. assorted.bash "$@" Copied: mailing-list-filter/tags/0.1/setup.py (from rev 721, mailing-list-filter/trunk/setup.py) =================================================================== --- mailing-list-filter/tags/0.1/setup.py (rev 0) +++ mailing-list-filter/tags/0.1/setup.py 2008-05-08 08:29:57 UTC (rev 725) @@ -0,0 +1,28 @@ +#!/usr/bin/env python + +from commons.setup import run_setup + +pkg_info_text = """ +Metadata-Version: 1.1 +Name: mailing-list-filter +Version: 0.1 +Author: Yang Zhang +Author-email: yaaang NOSPAM at REMOVECAPS gmail +Home-page: http://assorted.sourceforge.net/mailing-list-filter/ +Download-url: http://pypi.python.org/pypi/mailing-list-filter/ +Summary: Mailing List Filter +License: Python Software Foundation License +Description: Filter mailing list email for relevant threads only. 
+Keywords: mailing,list,email,filter,IMAP,Gmail +Platform: any +Provides: commons +Classifier: Development Status :: 4 - Beta +Classifier: Environment :: No Input/Output (Daemon) +Classifier: Intended Audience :: End Users/Desktop +Classifier: License :: OSI Approved :: Python Software Foundation License +Classifier: Operating System :: OS Independent +Classifier: Programming Language :: Python +Classifier: Topic :: Communications :: Email +""" + +run_setup(pkg_info_text, scripts = ['src/mlf.py']) Deleted: mailing-list-filter/tags/0.1/src/filter.py =================================================================== --- mailing-list-filter/trunk/src/filter.py 2008-05-07 16:06:28 UTC (rev 704) +++ mailing-list-filter/tags/0.1/src/filter.py 2008-05-08 08:29:57 UTC (rev 725) @@ -1,149 +0,0 @@ -#!/usr/bin/env python - -""" -Given an IMAP mailbox, mark all messages as read except for those threads in -which you were a participant, where thread grouping is performed via the -In-Reply-To and References headers. - -Currently, we assume that the server specification points to a mailbox -containing all messages (both sent and received), and a message is determined -to have been sent by you by looking at the From: header field. This should work -well with Gmail. An alternative strategy is to look through two folders, one -that's the Inbox and one that's the Sent mailbox, and treat all messages in -Sent as having been sent by you. 
-""" - -from __future__ import with_statement -from collections import defaultdict -from email import message_from_string -from getpass import getpass -from imaplib import IMAP4_SSL -from argparse import ArgumentParser -from path import path -from re import match -from functools import partial -from commons.decs import pickle_memoized -from commons.log import * -from commons.files import cleanse_filename, soft_makedirs -from commons.misc import default_if_none -from commons.networking import logout -from commons.seqs import concat, grouper -from commons.startup import run_main -from contextlib import closing - -info = partial(info, '') -debug = partial(debug, '') -error = partial(error, '') -die = partial(die, '') - -def getmail(imap): - info( 'finding max seqno' ) - ok, [seqnos] = imap.search(None, 'ALL') - maxseqno = int( seqnos.split()[-1] ) - del seqnos - - info( 'actually fetching the messages in chunks' ) - # The syntax/fields of the FETCH command is documented in RFC 2060. Also, - # this article contains a brief overview: - # http://www.devshed.com/c/a/Python/Python-Email-Libraries-part-2-IMAP/3/ - # BODY.PEEK prevents the message from automatically being flagged as \Seen. - query = '(FLAGS BODY.PEEK[HEADER.FIELDS (Message-ID References In-Reply-To From Subject)])' - step = 1000 - return list( concat( - imap.fetch('%d:%d' % (start, start + step - 1), query)[1] - for start in xrange(1, maxseqno + 1, step) ) ) - -def main(argv): - import logging - config_logging(level = logging.INFO, do_console = True) - - p = ArgumentParser(description = __doc__) - p.add_argument('--credfile', default = path( '~/.mlf.auth' ).expanduser(), - help = """File containing your login credentials, with the username on the - first line and the password on the second line. 
Ignored iff --prompt.""") - p.add_argument('--cachedir', default = path( '~/.mlf.cache' ).expanduser(), - help = "Directory to use for caching our data.") - p.add_argument('--prompt', action = 'store_true', - help = "Interactively prompt for the username and password.") - p.add_argument('sender', - help = "Your email address.") - p.add_argument('server', - help = "The server in the format: <host>[:<port>][/<mailbox>].") - - cfg = p.parse_args(argv[1:]) - - if cfg.prompt: - print "username:", - cfg.user = raw_input() - print "password:", - cfg.passwd = getpass() - else: - with file(cfg.credfile) as f: - [cfg.user, cfg.passwd] = map(lambda x: x.strip('\r\n'), f.readlines()) - - try: - m = match( r'(?P<host>[^:/]+)(:(?P<port>\d+))?(/(?P<mailbox>.+))?$', cfg.server ) - cfg.host = m.group('host') - cfg.port = int( default_if_none(m.group('port'), 993) ) - cfg.mailbox = default_if_none(m.group('mailbox'), 'INBOX') - except: - p.error('Need to specify the server in the correct format.') - - soft_makedirs(cfg.cachedir) - - with logout(IMAP4_SSL(cfg.host, cfg.port)) as imap: - imap.login(cfg.user, cfg.passwd) - with closing(imap) as imap: - # Select the main mailbox (INBOX). - imap.select(cfg.mailbox) - - # Fetch message IDs, references, and senders. - xs = pickle_memoized \ - (lambda imap: cfg.cachedir / cleanse_filename(cfg.sender)) \ - (getmail) \ - (imap) - - debug('fetched:', xs) - - info('determining the set of messages that were sent by you') - - sent = set() - for (envelope, data), paren in grouper(2, xs): - msg = message_from_string(data) - if cfg.sender in msg['From']: - sent.add( msg['Message-ID'] ) - - info( 'find the threads in which I am a participant' ) - - # Every second item is just a closing paren. 
- # Example data: - # [('13300 (BODY[HEADER.FIELDS (Message-ID References In-Reply-To)] {67}', - # 'Message-ID: <mai...@py...>\r\n\r\n'), - # ')', - # ('13301 (BODY[HEADER.FIELDS (Message-ID References In-Reply-To)] {59}', - # 'Message-Id: <200...@hv...>\r\n\r\n'), - # ')', - # ('13302 (BODY[HEADER.FIELDS (Message-ID References In-Reply-To)] {92}', - # 'Message-ID: <C43EAFC0.2E3AE%ni...@ya...>\r\nIn-Reply-To: <481...@gm...>\r\n\r\n')] - for (envelope, data), paren in grouper(2, xs): - m = match( r"(?P<seqno>\d+) \(FLAGS \((?P<flags>[^)]+)\)", envelope ) - seqno = m.group('seqno') - flags = m.group('flags') - if r'\Flagged' in flags: # flags != r'\Seen' and flags != r'\Seen NonJunk': - print 'FLAG' - print seqno, flags - print '\n'.join( map( str, msg.items() ) ) - print - msg = message_from_string(data) - id = msg['Message-ID'] - irt = default_if_none( msg.get_all('In-Reply-To'), [] ) - refs = default_if_none( msg.get_all('References'), [] ) - refs = set( ' '.join( irt + refs ).split() ) - if refs & sent: - print 'SENT' - print seqno, flags - print '\n'.join( map( str, msg.items() ) ) - print -# if refs & sent: - -run_main() Copied: mailing-list-filter/tags/0.1/src/mlf.py (from rev 722, mailing-list-filter/trunk/src/mlf.py) =================================================================== --- mailing-list-filter/tags/0.1/src/mlf.py (rev 0) +++ mailing-list-filter/tags/0.1/src/mlf.py 2008-05-08 08:29:57 UTC (rev 725) @@ -0,0 +1,236 @@ +#!/usr/bin/env python + +""" +Given a Gmail IMAP mailbox, star all messages in which you were a participant +(either a sender or an explicit recipient in To: or Cc:), where thread grouping +is performed via the In-Reply-To and References headers. 
+""" + +from __future__ import with_statement +from collections import defaultdict +from email import message_from_string +from getpass import getpass +from imaplib import IMAP4_SSL +from argparse import ArgumentParser +from path import path +from re import match +from functools import partial +from itertools import count +from commons.decs import pickle_memoized +from commons.files import cleanse_filename, soft_makedirs +from commons.log import * +from commons.misc import default_if_none, seq +from commons.networking import logout +from commons.seqs import concat, grouper +from commons.startup import run_main +from contextlib import closing +import logging +from commons import log + +info = partial(log.info, 'main') +debug = partial(log.debug, 'main') +warning = partial(log.warning, 'main') +error = partial(log.error, 'main') +die = partial(log.die, 'main') + +def thread_dfs(msg, tid, tid2msgs): + assert msg.tid is None + msg.tid = tid + tid2msgs[tid].append(msg) + for ref in msg.refs: + if ref.tid is None: + thread_dfs(ref, tid, tid2msgs) + else: + assert ref.tid == tid + +def getmail(imap): + info( 'finding max UID' ) + # We use UIDs rather than the default of sequence numbers because UIDs are + # guaranteed to be persistent across sessions. This means that we can, for + # instance, fetch messages in one session and operate on this locally cached + # data before marking messages in a separate session. + ok, [uids] = imap.uid('SEARCH', None, 'ALL') + maxuid = int( uids.split()[-1] ) + del uids + + info( 'actually fetching the messages in chunks up to max', maxuid ) + # The syntax/fields of the FETCH command is documented in RFC 2060. Also, + # this article contains a brief overview: + # http://www.devshed.com/c/a/Python/Python-Email-Libraries-part-2-IMAP/3/ + # BODY.PEEK prevents the message from automatically being flagged as \Seen. 
+ query = '(FLAGS BODY.PEEK[HEADER.FIELDS ' \ + '(Message-ID References In-Reply-To From To Cc Subject)])' + step = 1000 + return list( concat( + seq( lambda: info('fetching', start, 'to', start + step - 1), + lambda: imap.uid('FETCH', '%d:%d' % (start, start + step - 1), + query)[1] ) + for start in xrange(1, maxuid + 1, step) ) ) + +def main(argv): + p = ArgumentParser(description = __doc__) + p.add_argument('--credfile', default = path( '~/.mlf.auth' ).expanduser(), + help = """File containing your login credentials, with the username on the + first line and the password on the second line. Ignored iff --prompt.""") + p.add_argument('--cachedir', default = path( '~/.mlf.cache' ).expanduser(), + help = "Directory to use for caching our data.") + p.add_argument('--prompt', action = 'store_true', + help = "Interactively prompt for the username and password.") + p.add_argument('--pretend', action = 'store_true', + help = """Do not actually carry out any updates to the server. Use in + conjunction with --debug to observe what would happen.""") + p.add_argument('--no-mark-unseen', action = 'store_true', + help = "Do not mark newly revelant threads as unread.") + p.add_argument('--no-mark-seen', action = 'store_true', + help = "Do not mark newly irrevelant threads as read.") + p.add_argument('--debug', action = 'append', + help = """Enable logging for messages of the given flags. 
Flags include: + refs (references to missing Message-IDs), dups (duplicate Message-IDs), + main (the main program logic), and star (which messages are being + starred), unstar (which messages are being unstarred).""") + p.add_argument('sender', + help = "Your email address.") + p.add_argument('server', + help = "The server in the format: <host>[:<port>][/<mailbox>].") + + cfg = p.parse_args(argv[1:]) + + config_logging(level = logging.ERROR, do_console = True, flags = cfg.debug) + + if cfg.prompt: + print "username:", + cfg.user = raw_input() + print "password:", + cfg.passwd = getpass() + else: + with file(cfg.credfile) as f: + [cfg.user, cfg.passwd] = map(lambda x: x.strip('\r\n'), f.readlines()) + + try: + m = match( r'(?P<host>[^:/]+)(:(?P<port>\d+))?(/(?P<mailbox>.+))?$', + cfg.server ) + cfg.host = m.group('host') + cfg.port = int( default_if_none(m.group('port'), 993) ) + cfg.mailbox = default_if_none(m.group('mailbox'), 'INBOX') + except: + p.error('Need to specify the server in the correct format.') + + soft_makedirs(cfg.cachedir) + + with logout(IMAP4_SSL(cfg.host, cfg.port)) as imap: + imap.login(cfg.user, cfg.passwd) + # Close is only valid in the authenticated state. + with closing(imap) as imap: + # Select the main mailbox (INBOX). + imap.select(cfg.mailbox) + + # Fetch message IDs, references, and senders. + xs = pickle_memoized \ + (lambda imap: cfg.cachedir / cleanse_filename(cfg.sender)) \ + (getmail) \ + (imap) + + log.debug('fetched', xs) + + info('building message-id map and determining the set of messages sent ' + 'by you or addressed to you (the "source set")') + + srcs = [] + mid2msg = {} + # Every second item is just a closing paren. 
+ # Example data: + # [('13300 (BODY[HEADER.FIELDS (Message-ID References In-Reply-To)] {67}', + # 'Message-ID: <mai...@py...>\r\n\r\n'), + # ')', + # ('13301 (BODY[HEADER.FIELDS (Message-ID References In-Reply-To)] {59}', + # 'Message-Id: <200...@hv...>\r\n\r\n'), + # ')', + # ('13302 (BODY[HEADER.FIELDS (Message-ID References In-Reply-To)] {92}', + # 'Message-ID: <C43EAFC0.2E3AE%ni...@ya...>\r\nIn-Reply-To: <481...@gm...>\r\n\r\n')] + for (envelope, data), paren in grouper(2, xs): + # Parse the body. + msg = message_from_string(data) + + # Parse the envelope. + m = match( + r"(?P<seqno>\d+) \(UID (?P<uid>\d+) FLAGS \((?P<flags>[^)]+)\)", + envelope ) + msg.seqno = m.group('seqno') + msg.uid = m.group('uid') + msg.flags = m.group('flags').split() + + # Prepare a container for references to other msgs, and initialize the + # thread ID. + msg.refs = [] + msg.tid = None + + # Add these to the map. + if msg['Message-ID'] in mid2msg: + log.warning( 'dups', 'duplicate message IDs:', + msg['Message-ID'], msg['Subject'] ) + mid2msg[ msg['Message-ID'] ] = msg + + # Add to "srcs" set if sent by us or addressed to us. + if ( cfg.sender in default_if_none( msg['From'], '' ) or + cfg.sender in default_if_none( msg['To'], '' ) or + cfg.sender in default_if_none( msg['Cc'], '' ) ): + srcs.append( msg ) + + info( 'constructing undirected graph' ) + + for mid, msg in mid2msg.iteritems(): + # Extract any references. + irt = default_if_none( msg.get_all('In-Reply-To'), [] ) + refs = default_if_none( msg.get_all('References'), [] ) + refs = set( ' '.join( irt + refs ).replace('><', '> <').split() ) + + # Connect nodes in graph bidirectionally. Ignore references to MIDs + # that don't exist. + for ref in refs: + try: + refmsg = mid2msg[ref] + # We can use lists/append (not worry about duplicates) because the + # original sources should be acyclic. If a -> b, then there is no b -> + # a, so when crawling a we can add a <-> b without worrying that later + # we may re-add b -> a. 
+ msg.refs.append(refmsg) + refmsg.refs.append(msg) + except: + log.warning( 'refs', ref ) + + info('finding connected components (grouping the messages into threads)') + + tids = count() + tid2msgs = defaultdict(list) + for mid, msg in mid2msg.iteritems(): + if msg.tid is None: + thread_dfs(msg, tids.next(), tid2msgs) + + info( 'starring the relevant threads, in which I am a participant' ) + + rel_tids = set() + for srcmsg in srcs: + if srcmsg.tid not in rel_tids: + rel_tids.add(srcmsg.tid) + for msg in tid2msgs[srcmsg.tid]: + if r'\Flagged' not in msg.flags: + log.info( 'star', '\n', msg ) + if not cfg.pretend: + imap.uid('STORE', msg.uid, '+FLAGS', r'\Flagged') + if not cfg.no_mark_unseen and r'\Seen' in msg.flags: + imap.uid('STORE', msg.uid, '-FLAGS', r'\Seen') + + info( 'unstarring irrelevant threads, in which I am not a participant' ) + + all_tids = set( tid2msgs.iterkeys() ) + irrel_tids = all_tids - rel_tids + for tid in irrel_tids: + for msg in tid2msgs[tid]: + if r'\Flagged' in msg.flags: + log.info( 'unstar', '\n', msg ) + if not cfg.pretend: + imap.uid('STORE', msg.uid, '-FLAGS', r'\Flagged') + if not cfg.no_mark_seen and r'\Seen' not in msg.flags: + imap.uid('STORE', msg.uid, '+FLAGS', r'\Seen') + +run_main() This was sent by the SourceForge.net collaborative development platform, the world's largest Open Source development site. |
From: <yan...@us...> - 2008-05-08 08:29:17
Revision: 724 http://assorted.svn.sourceforge.net/assorted/?rev=724&view=rev Author: yangzhang Date: 2008-05-08 01:29:25 -0700 (Thu, 08 May 2008) Log Message: ----------- tagged release 0.4 Added Paths: ----------- python-commons/tags/0.4/ python-commons/tags/0.4/README python-commons/tags/0.4/publish.bash python-commons/tags/0.4/setup.py python-commons/tags/0.4/src/commons/decs.py python-commons/tags/0.4/src/commons/files.py python-commons/tags/0.4/src/commons/misc.py python-commons/tags/0.4/src/commons/networking.py python-commons/tags/0.4/src/commons/seqs.py python-commons/tags/0.4/src/commons/setup.py Removed Paths: ------------- python-commons/tags/0.4/README python-commons/tags/0.4/publish.bash python-commons/tags/0.4/setup.py python-commons/tags/0.4/src/commons/decs.py python-commons/tags/0.4/src/commons/files.py python-commons/tags/0.4/src/commons/misc.py python-commons/tags/0.4/src/commons/networking.py python-commons/tags/0.4/src/commons/seqs.py python-commons/tags/0.4/src/commons/setup.py Copied: python-commons/tags/0.4 (from rev 679, python-commons/trunk) Deleted: python-commons/tags/0.4/README =================================================================== --- python-commons/trunk/README 2008-04-24 16:13:04 UTC (rev 679) +++ python-commons/tags/0.4/README 2008-05-08 08:29:25 UTC (rev 724) @@ -1,35 +0,0 @@ -[documentation](doc) - -Overview --------- - -Python Commons is a general-purpose library for Python. To get a sense of -what it provides, please glance over the [documentation](doc). - -Requirements ------------- - -- [Python](http://python.org/) 2.5 -- [setuptools](http://peak.telecommunity.com/DevCenter/setuptools) 0.6 - -Certain sub-modules have extra requirements: - -- `async` requires [Twisted](http://twistedmatrix.com/trac/) 2.5 -- `files` requires [path](http://www.jorendorff.com/articles/python/path/) 2.2 - -This library has only been tested on Linux. 
- -Setup ------ - -To install, run `easy_install python-commons`, or download the source tarball -and run `python setup.py install`. - -Related Work ------------- - -- [ASPN Cookbook]: a valuable repository of Python snippets -- [AIMA Utilities]: accompaniment to a popular AI textbook - -[ASPN Cookbook]: http://aspn.activestate.com/ASPN/Cookbook/Python -[AIMA Utilities]: http://aima.cs.berkeley.edu/python/utils.py Copied: python-commons/tags/0.4/README (from rev 723, python-commons/trunk/README) =================================================================== --- python-commons/tags/0.4/README (rev 0) +++ python-commons/tags/0.4/README 2008-05-08 08:29:25 UTC (rev 724) @@ -0,0 +1,65 @@ +[documentation](doc) + +Overview +-------- + +Python Commons is a general-purpose library for Python. To get a sense of +what it provides, please glance over the [documentation](doc). + +Requirements +------------ + +- [Python](http://python.org/) 2.5 +- [setuptools](http://peak.telecommunity.com/DevCenter/setuptools) 0.6 + +Certain sub-modules have extra requirements: + +- `async` requires [Twisted](http://twistedmatrix.com/trac/) 2.5 +- `files` requires [path](http://www.jorendorff.com/articles/python/path/) 2.2 + +This library has only been tested on Linux. + +Setup +----- + +To install, run `easy_install python-commons`, or download the source tarball +and run `python setup.py install`. 
+ +Related Work +------------ + +- [ASPN Cookbook]: a valuable repository of Python snippets +- [AIMA Utilities]: accompaniment to a popular AI textbook + +[ASPN Cookbook]: http://aspn.activestate.com/ASPN/Cookbook/Python +[AIMA Utilities]: http://aima.cs.berkeley.edu/python/utils.py + +Changes +------- + +version 0.4 + +- removed extraneous debug print statements +- added `logout()` context manager +- added `seq()`, `default_if_none()` +- fixed missing `import` bug +- released for [Mailing List + Filter](http://assorted.sf.net/mailing-list-filter/) + +version 0.3 + +- added versioned guards +- added file memoization +- added retry with exp backoff +- added `countstep()` +- released for + [gbookmark2delicious](http://gbookmark2delicious.googlecode.com/) + +version 0.2 + +- added `clients`, `setup` +- released for [icedb](http://cartel.csail.mit.edu/icedb/) + +version 0.1 + +- initial release Deleted: python-commons/tags/0.4/publish.bash =================================================================== --- python-commons/trunk/publish.bash 2008-04-24 16:13:04 UTC (rev 679) +++ python-commons/tags/0.4/publish.bash 2008-05-08 08:29:25 UTC (rev 724) @@ -1,12 +0,0 @@ -#!/usr/bin/env bash - -post-stage() { - epydoc -o $stagedir/doc src/commons/ -} - -fullname='Python Commons' -version=0.2 -license=psf -websrcs=( README ) -rels=( pypi: ) -. assorted.bash "$@" Copied: python-commons/tags/0.4/publish.bash (from rev 723, python-commons/trunk/publish.bash) =================================================================== --- python-commons/tags/0.4/publish.bash (rev 0) +++ python-commons/tags/0.4/publish.bash 2008-05-08 08:29:25 UTC (rev 724) @@ -0,0 +1,12 @@ +#!/usr/bin/env bash + +post-stage() { + epydoc -o $stagedir/doc src/commons/ +} + +fullname='Python Commons' +version=0.4 +license=psf +websrcs=( README ) +rels=( pypi: ) +. 
assorted.bash "$@" Deleted: python-commons/tags/0.4/setup.py =================================================================== --- python-commons/trunk/setup.py 2008-04-24 16:13:04 UTC (rev 679) +++ python-commons/tags/0.4/setup.py 2008-05-08 08:29:25 UTC (rev 724) @@ -1,43 +0,0 @@ -#!/usr/bin/env python -# -*- mode: python; tab-width: 4; indent-tabs-mode: nil; py-indent-offset: 4; -*- -# vim:ft=python:et:sw=4:ts=4 - -import os,sys -sys.path.insert( 0, os.path.join( os.path.dirname( sys.argv[0] ), 'src' ) ) -from commons import setup - -pkg_info_text = """ -Metadata-Version: 1.1 -Name: python-commons -Version: 0.2 -Author: Yang Zhang -Author-email: yaaang NOSPAM at REMOVECAPS gmail -Home-page: http://assorted.sourceforge.net/python-commons -Summary: Python Commons -License: Python Software Foundation License -Description: General-purpose library of utilities and extensions to the - standard library. -Keywords: Python,common,commons,utility,utilities,library,libraries -Platform: any -Provides: commons -Classifier: Development Status :: 4 - Beta -Classifier: Environment :: No Input/Output (Daemon) -Classifier: Intended Audience :: Developers -Classifier: License :: OSI Approved :: Python Software Foundation License -Classifier: Operating System :: OS Independent -Classifier: Programming Language :: Python -Classifier: Topic :: Communications -Classifier: Topic :: Database -Classifier: Topic :: Internet -Classifier: Topic :: Software Development :: Libraries :: Python Modules -Classifier: Topic :: System -Classifier: Topic :: System :: Filesystems -Classifier: Topic :: System :: Logging -Classifier: Topic :: System :: Networking -Classifier: Topic :: Text Processing -Classifier: Topic :: Utilities -""" - -setup.run_setup( pkg_info_text, - #scripts = ['frontend/py_hotshot.py'], - ) Copied: python-commons/tags/0.4/setup.py (from rev 723, python-commons/trunk/setup.py) =================================================================== --- 
python-commons/tags/0.4/setup.py (rev 0) +++ python-commons/tags/0.4/setup.py 2008-05-08 08:29:25 UTC (rev 724) @@ -0,0 +1,43 @@ +#!/usr/bin/env python +# -*- mode: python; tab-width: 4; indent-tabs-mode: nil; py-indent-offset: 4; -*- +# vim:ft=python:et:sw=4:ts=4 + +import os,sys +sys.path.insert( 0, os.path.join( os.path.dirname( sys.argv[0] ), 'src' ) ) +from commons import setup + +pkg_info_text = """ +Metadata-Version: 1.1 +Name: python-commons +Version: 0.4 +Author: Yang Zhang +Author-email: yaaang NOSPAM at REMOVECAPS gmail +Home-page: http://assorted.sourceforge.net/python-commons +Summary: Python Commons +License: Python Software Foundation License +Description: General-purpose library of utilities and extensions to the + standard library. +Keywords: Python,common,commons,utility,utilities,library,libraries +Platform: any +Provides: commons +Classifier: Development Status :: 4 - Beta +Classifier: Environment :: No Input/Output (Daemon) +Classifier: Intended Audience :: Developers +Classifier: License :: OSI Approved :: Python Software Foundation License +Classifier: Operating System :: OS Independent +Classifier: Programming Language :: Python +Classifier: Topic :: Communications +Classifier: Topic :: Database +Classifier: Topic :: Internet +Classifier: Topic :: Software Development :: Libraries :: Python Modules +Classifier: Topic :: System +Classifier: Topic :: System :: Filesystems +Classifier: Topic :: System :: Logging +Classifier: Topic :: System :: Networking +Classifier: Topic :: Text Processing +Classifier: Topic :: Utilities +""" + +setup.run_setup( pkg_info_text, + #scripts = ['frontend/py_hotshot.py'], + ) Deleted: python-commons/tags/0.4/src/commons/decs.py =================================================================== --- python-commons/trunk/src/commons/decs.py 2008-04-24 16:13:04 UTC (rev 679) +++ python-commons/tags/0.4/src/commons/decs.py 2008-05-08 08:29:25 UTC (rev 724) @@ -1,92 +0,0 @@ -# -*- mode: python; tab-width: 4; 
indent-tabs-mode: nil; py-indent-offset: 4; -*- -# vim:ft=python:et:sw=4:ts=4 - -""" -Decorators and decorator utilities. - -@todo: Move the actual decorators to modules based on their topic. -""" - -import functools, inspect, xmlrpclib - -def wrap_callable(any_callable, before, after): - """ - Wrap any callable with before/after calls. - - From the Python Cookbook. Modified to support C{None} for - C{before} or C{after}. - - @copyright: O'Reilly Media - - @param any_callable: The function to decorate. - @type any_callable: function - - @param before: The pre-processing procedure. If this is C{None}, then no pre-processing will be done. - @type before: function - - @param after: The post-processing procedure. If this is C{None}, then no post-processing will be done. - @type after: function - """ - def _wrapped(*a, **kw): - if before is not None: - before( ) - try: - return any_callable(*a, **kw) - finally: - if after is not None: - after( ) - # In 2.4, only: _wrapped.__name__ = any_callable.__name__ - return _wrapped - -class GenericWrapper( object ): - """ - Wrap all of an object's methods with before/after calls. This is - like a decorator for objects. - - From the I{Python Cookbook}. 
- - @copyright: O'Reilly Media - """ - def __init__(self, obj, before, after, ignore=( )): - # we must set into __dict__ directly to bypass __setattr__; so, - # we need to reproduce the name-mangling for double-underscores - clasname = 'GenericWrapper' - self.__dict__['_%s__methods' % clasname] = { } - self.__dict__['_%s__obj' % clasname] = obj - for name, method in inspect.getmembers(obj, inspect.ismethod): - if name not in ignore and method not in ignore: - self.__methods[name] = wrap_callable(method, before, after) - def __getattr__(self, name): - try: - return self.__methods[name] - except KeyError: - return getattr(self.__obj, name) - def __setattr__(self, name, value): - setattr(self.__obj, name, value) - -########################################################## - -def xmlrpc_safe(func): - """ - Makes a procedure "XMLRPC-safe" by returning 0 whenever the inner - function returns C{None}. This is useful because XMLRPC requires - return values, and 0 is commonly used when functions don't intend - to return anything. - - Also, if the procedure returns a boolean, it will be wrapped in - L{xmlrpclib.Boolean}. - - @param func: The procedure to decorate. - @type func: function - """ - @functools.wraps(func) - def wrapper(*args,**kwargs): - result = func(*args,**kwargs) - if result is not None: - if type( result ) == bool: - return xmlrpclib.Boolean( result ) - else: - return result - else: - return 0 - return wrapper Copied: python-commons/tags/0.4/src/commons/decs.py (from rev 687, python-commons/trunk/src/commons/decs.py) =================================================================== --- python-commons/tags/0.4/src/commons/decs.py (rev 0) +++ python-commons/tags/0.4/src/commons/decs.py 2008-05-08 08:29:25 UTC (rev 724) @@ -0,0 +1,156 @@ +# -*- mode: python; tab-width: 4; indent-tabs-mode: nil; py-indent-offset: 4; -*- +# vim:ft=python:et:sw=4:ts=4 + +""" +Decorators and decorator utilities. 
+ +@todo: Move the actual decorators to modules based on their topic. +""" + +from __future__ import with_statement +import functools, inspect, xmlrpclib +from cPickle import * + +def wrap_callable(any_callable, before, after): + """ + Wrap any callable with before/after calls. + + From the Python Cookbook. Modified to support C{None} for + C{before} or C{after}. + + @copyright: O'Reilly Media + + @param any_callable: The function to decorate. + @type any_callable: function + + @param before: The pre-processing procedure. If this is C{None}, then no pre-processing will be done. + @type before: function + + @param after: The post-processing procedure. If this is C{None}, then no post-processing will be done. + @type after: function + """ + def _wrapped(*a, **kw): + if before is not None: + before( ) + try: + return any_callable(*a, **kw) + finally: + if after is not None: + after( ) + # In 2.4, only: _wrapped.__name__ = any_callable.__name__ + return _wrapped + +class GenericWrapper( object ): + """ + Wrap all of an object's methods with before/after calls. This is + like a decorator for objects. + + From the I{Python Cookbook}. 
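The C{wrap_callable} recipe above is Python 2 (note its comment about copying C{__name__} being 2.4-only). A rough Python 3 sketch of the same before/after hook idea, for illustration only:

```python
# Python 3 sketch of the wrap_callable pattern: run optional before/after
# hooks around any callable, preserving its return value.
import functools

def wrap_callable(any_callable, before=None, after=None):
    @functools.wraps(any_callable)
    def _wrapped(*args, **kwargs):
        if before is not None:
            before()
        try:
            return any_callable(*args, **kwargs)
        finally:
            # the finally clause guarantees 'after' runs even on exceptions
            if after is not None:
                after()
    return _wrapped

calls = []
traced_len = wrap_callable(len,
                           before=lambda: calls.append('before'),
                           after=lambda: calls.append('after'))
result = traced_len([1, 2, 3])
```

The C{try}/C{finally} structure is what lets the C{after} hook double as cleanup code when the wrapped call raises.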
+ + @copyright: O'Reilly Media + """ + def __init__(self, obj, before, after, ignore=( )): + # we must set into __dict__ directly to bypass __setattr__; so, + # we need to reproduce the name-mangling for double-underscores + clasname = 'GenericWrapper' + self.__dict__['_%s__methods' % clasname] = { } + self.__dict__['_%s__obj' % clasname] = obj + for name, method in inspect.getmembers(obj, inspect.ismethod): + if name not in ignore and method not in ignore: + self.__methods[name] = wrap_callable(method, before, after) + def __getattr__(self, name): + try: + return self.__methods[name] + except KeyError: + return getattr(self.__obj, name) + def __setattr__(self, name, value): + setattr(self.__obj, name, value) + +########################################################## + +def xmlrpc_safe(func): + """ + Makes a procedure "XMLRPC-safe" by returning 0 whenever the inner + function returns C{None}. This is useful because XMLRPC requires + return values, and 0 is commonly used when functions don't intend + to return anything. + + Also, if the procedure returns a boolean, it will be wrapped in + L{xmlrpclib.Boolean}. + + @param func: The procedure to decorate. + @type func: function + """ + @functools.wraps(func) + def wrapper(*args,**kwargs): + result = func(*args,**kwargs) + if result is not None: + if type( result ) == bool: + return xmlrpclib.Boolean( result ) + else: + return result + else: + return 0 + return wrapper + +########################################################## + +def file_memoized(serializer, deserializer, pathfunc): + """ + The string result of the given function is saved to the given path. + + Example:: + + @file_memoized(lambda x,f: f.write(x), + lambda f: f.read(), + lambda: "/tmp/cache") + def foo(): return "hello" + + @file_memoized(pickle.dump, + pickle.load, + lambda x,y: "/tmp/cache-%d-%d" % (x,y)) + def foo(x,y): return "hello %d %d" % (x,y) + + @param serializer: The function to serialize the return value into a + string. 
This should take the return value object and + the file object. + @type serializer: function + + @param deserializer: The function to deserialize the cache file contents + into the return value. This should take the file + object and return a string. + @type deserializer: function + + @param pathfunc: Returns the path where the files should be saved. This + should be able to take the same arguments as the original + function. + @type pathfunc: function + """ + def dec(func): + @functools.wraps(func) + def wrapper(*args, **kwargs): + p = pathfunc(*args, **kwargs) + try: + with file(p) as f: + return deserializer(f) + except IOError, (errno, errstr): + if errno != 2: raise + with file(p, 'w') as f: + x = func(*args, **kwargs) + serializer(x, f) + return x + return wrapper + return dec + +def file_string_memoized(pathfunc): + """ + Wrapper around L{file_memoized} that expects the decorated function to + return strings, so the string is written verbatim. + """ + return file_memoized(lambda x,f: f.write(x), lambda f: f.read(), pathfunc) + +def pickle_memoized(pathfunc): + """ + Wrapper around L{file_memoized} that uses pickle. + """ + return file_memoized(dump, load, pathfunc) Deleted: python-commons/tags/0.4/src/commons/files.py =================================================================== --- python-commons/trunk/src/commons/files.py 2008-04-24 16:13:04 UTC (rev 679) +++ python-commons/tags/0.4/src/commons/files.py 2008-05-08 08:29:25 UTC (rev 724) @@ -1,115 +0,0 @@ -# -*- mode: python; tab-width: 4; indent-tabs-mode: nil; py-indent-offset: 4; -*- -# vim:ft=python:et:sw=4:ts=4 - -""" -File and directory manipulation. - -@var invalid_filename_chars: The characters which are usually -prohibited on most modern file systems. - -@var invalid_filename_chars_regex: A regex character class constructed -from L{invalid_filename_chars}.
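The C{file_memoized} decorator added to decs.py above caches a function's result on disk at a path derived from its arguments. A condensed Python 3 sketch of that pattern (the names and the C{FileNotFoundError} handling are illustrative, not the commit's API):

```python
# Sketch: memoize a function's pickled result at a per-arguments file path.
import functools, os, pickle, tempfile

def file_memoized(pathfunc):
    def dec(func):
        @functools.wraps(func)
        def wrapper(*args, **kwargs):
            p = pathfunc(*args, **kwargs)
            try:
                with open(p, 'rb') as f:
                    return pickle.load(f)   # cache hit
            except FileNotFoundError:
                result = func(*args, **kwargs)
                with open(p, 'wb') as f:    # cache miss: persist result
                    pickle.dump(result, f)
                return result
        return wrapper
    return dec

tmp = tempfile.mkdtemp()
calls = []

@file_memoized(lambda x, y: os.path.join(tmp, 'cache-%d-%d' % (x, y)))
def add(x, y):
    calls.append((x, y))
    return x + y

first = add(2, 3)
second = add(2, 3)  # served from the cache file; add() body not re-run
```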
-""" - -from __future__ import with_statement - -import os, re, tempfile - -from path import path - -def soft_makedirs( path ): - """ - Emulate C{mkdir -p} (doesn't complain if it already exists). - - @param path: The path of the directory to create. - @type path: str - - @raise OSError: If it cannot create the directory. It only - swallows OS error 17. - """ - try: - os.makedirs( path ) - except OSError, ex: - if ex.errno == 17: - pass - else: - raise - -def temp_dir( base_dir_name, do_create_subdir = True ): - """ - Get a temporary directory without polluting top-level /tmp. This follows - Ubuntu's conventions, choosing a temporary directory name based on - the given name plus the user name to avoid user conflicts. - - @param base_dir_name: The "name" of the temporary directory. This - is usually identifies the purpose of the directory, or the - application to which the temporary directory belongs. E.g., if joe - calls passes in C{"ssh-agent"} on a standard Linux/Unix system, - then the full path of the temporary directory will be - C{"/tmp/ssh-agent-joe"}. - @type base_dir_name: str - - @param do_create_subdir: If C{True}, then creates a - sub-sub-directory within the temporary sub-directory (and returns - the path to that). The sub-sub-directory's name is randomized - (uses L{tempfile.mkdtemp}). - @type do_create_subdir: bool - - @return: The path to the temporary (sub-)sub-directory. - @rtype: str - """ - base_dir_name += '-' + os.environ[ 'USER' ] - base_dir = path( tempfile.gettempdir() ) / base_dir_name - soft_makedirs( base_dir ) - if do_create_subdir: - return tempfile.mkdtemp( dir = base_dir ) - else: - return base_dir - -invalid_filename_chars = r'*|\/:<>?' -invalid_filename_chars_regex = r'[*|\\\/:<>?]' - -def cleanse_filename( filename ): - """ - Replaces all problematic characters in a filename with C{"_"}, as - specified by L{invalid_filename_chars}. - - @param filename: The filename to cleanse. 
- @type filename: str - """ - pattern = invalid_filename_chars_regex - return re.sub( pattern, '_', filename ) - -class disk_double_buffer( object ): - """ - A simple disk double-buffer. One file is for reading, the other is for - writing, and a facility for swapping the two roles is provided. - """ - def __init__( self, path_base, do_persist = True ): - self.paths = map( path, [ path_base + '.0', path_base + '.1' ] ) - self.do_persist = do_persist - self.switch_status = path( path_base + '.switched' ) - if not do_persist or not self.switch_status.exists(): - self.w, self.r = 0, 1 # default - else: - self.w, self.r = 1, 0 - self.reload_files() - def reload_files( self ): - self.writer = file( self.paths[ self.w ], 'w' ) - if not self.paths[ self.r ].exists(): - self.paths[ self.r ].touch() - self.reader = file( self.paths[ self.r ] ) - def switch( self ): - self.close() - if self.do_persist: - if self.w == 0: self.switch_status.touch() - else: self.switch_status.remove() - self.r, self.w = self.w, self.r - self.reload_files() - def write( self, x ): - self.writer.write( x ) - def read( self, len = 8192 ): - return self.reader.read( len ) - def close( self ): - self.reader.close() - self.writer.close() Copied: python-commons/tags/0.4/src/commons/files.py (from rev 693, python-commons/trunk/src/commons/files.py) =================================================================== --- python-commons/tags/0.4/src/commons/files.py (rev 0) +++ python-commons/tags/0.4/src/commons/files.py 2008-05-08 08:29:25 UTC (rev 724) @@ -0,0 +1,171 @@ +# -*- mode: python; tab-width: 4; indent-tabs-mode: nil; py-indent-offset: 4; -*- +# vim:ft=python:et:sw=4:ts=4 + +""" +File and directory manipulation. + +@var invalid_filename_chars: The characters which are usually +prohibited on most modern file systems. + +@var invalid_filename_chars_regex: A regex character class constructed +from L{invalid_filename_chars}. 
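The files.py helpers above predate C{os.makedirs(exist_ok=True)}, which removes the need to special-case errno 17. A hypothetical modern sketch of C{soft_makedirs} and C{cleanse_filename}, for comparison only:

```python
# Modern equivalents of soft_makedirs and cleanse_filename (illustrative).
import os, re, tempfile

invalid_filename_chars_regex = r'[*|\\/:<>?]'

def cleanse_filename(filename):
    # replace every prohibited character with '_'
    return re.sub(invalid_filename_chars_regex, '_', filename)

def soft_makedirs(path):
    # mkdir -p semantics: no error if the directory already exists
    os.makedirs(path, exist_ok=True)

d = os.path.join(tempfile.mkdtemp(), 'a', 'b')
soft_makedirs(d)
soft_makedirs(d)  # second call is a no-op rather than an error
cleaned = cleanse_filename('a:b*c?d')
```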
+""" + +from __future__ import with_statement + +import os, re, tempfile +from cPickle import * +from path import path + +def soft_makedirs( path ): + """ + Emulate C{mkdir -p} (doesn't complain if it already exists). + + @param path: The path of the directory to create. + @type path: str + + @raise OSError: If it cannot create the directory. It only + swallows OS error 17. + """ + try: + os.makedirs( path ) + except OSError, ex: + if ex.errno == 17: + pass + else: + raise + +def temp_dir( base_dir_name, do_create_subdir = True ): + """ + Get a temporary directory without polluting top-level /tmp. This follows + Ubuntu's conventions, choosing a temporary directory name based on + the given name plus the user name to avoid user conflicts. + + @param base_dir_name: The "name" of the temporary directory. This + usually identifies the purpose of the directory, or the + application to which the temporary directory belongs. E.g., if joe + passes in C{"ssh-agent"} on a standard Linux/Unix system, + then the full path of the temporary directory will be + C{"/tmp/ssh-agent-joe"}. + @type base_dir_name: str + + @param do_create_subdir: If C{True}, then creates a + sub-sub-directory within the temporary sub-directory (and returns + the path to that). The sub-sub-directory's name is randomized + (uses L{tempfile.mkdtemp}). + @type do_create_subdir: bool + + @return: The path to the temporary (sub-)sub-directory. + @rtype: str + """ + base_dir_name += '-' + os.environ[ 'USER' ] + base_dir = path( tempfile.gettempdir() ) / base_dir_name + soft_makedirs( base_dir ) + if do_create_subdir: + return tempfile.mkdtemp( dir = base_dir ) + else: + return base_dir + +invalid_filename_chars = r'*|\/:<>?' +invalid_filename_chars_regex = r'[*|\\\/:<>?]' + +def cleanse_filename( filename ): + """ + Replaces all problematic characters in a filename with C{"_"}, as + specified by L{invalid_filename_chars}. + + @param filename: The filename to cleanse.
+ @type filename: str + """ + pattern = invalid_filename_chars_regex + return re.sub( pattern, '_', filename ) + +class disk_double_buffer( object ): + """ + A simple disk double-buffer. One file is for reading, the other is for + writing, and a facility for swapping the two roles is provided. + """ + def __init__( self, path_base, do_persist = True ): + self.paths = map( path, [ path_base + '.0', path_base + '.1' ] ) + self.do_persist = do_persist + self.switch_status = path( path_base + '.switched' ) + if not do_persist or not self.switch_status.exists(): + self.w, self.r = 0, 1 # default + else: + self.w, self.r = 1, 0 + self.reload_files() + def reload_files( self ): + self.writer = file( self.paths[ self.w ], 'w' ) + if not self.paths[ self.r ].exists(): + self.paths[ self.r ].touch() + self.reader = file( self.paths[ self.r ] ) + def switch( self ): + self.close() + if self.do_persist: + if self.w == 0: self.switch_status.touch() + else: self.switch_status.remove() + self.r, self.w = self.w, self.r + self.reload_files() + def write( self, x ): + self.writer.write( x ) + def read( self, len = 8192 ): + return self.reader.read( len ) + def close( self ): + self.reader.close() + self.writer.close() + +def versioned_guard(path, fresh_version): + """ + Maintain a version object. This is useful for working with versioned + caches. + + @param path: The path to the file containing the cached version object. + @type path: str + + @param fresh_version: The actual latest version that the cached version + should be compared against. + @type fresh_version: object (any type that can be compared) + + @return: True iff the cached version is obsolete (less than the fresh + version or doesn't exist). 
+ @rtype: bool + """ + cache_version = None + try: + with file( path ) as f: cache_version = load(f) + except IOError, (errno, errstr): + if errno != 2: raise + if cache_version is None or fresh_version > cache_version: + with file( path, 'w' ) as f: dump(fresh_version, f) + return True + else: + return False + +def versioned_cache(version_path, fresh_version, cache_path, cache_func): + """ + If fresh_version is newer than the version in version_path, then invoke + cache_func and cache the result in cache_path (using pickle). + + Note the design flaw with L{versioned_guard}: the updated version value is + stored immediately, rather than after updating the cache. + + @param version_path: The path to the file version. + @type version_path: str + + @param fresh_version: The actual, up-to-date version value. + @type fresh_version: object (any type that can be compared) + + @param cache_path: The path to the cached data. + @type cache_path: str + + @param cache_func: The function that produces the fresh data to be cached. + @type cache_func: function (no arguments) + """ + if versioned_guard( version_path, fresh_version ): + # cache obsolete, force-fetch new data + result = cache_func() + with file(cache_path, 'w') as f: dump(result, f) + return result + else: + # cache up-to-date (should be available since dlcs-timestamp exists!) + with file(cache_path) as f: return load(f) Deleted: python-commons/tags/0.4/src/commons/misc.py =================================================================== --- python-commons/trunk/src/commons/misc.py 2008-04-24 16:13:04 UTC (rev 679) +++ python-commons/tags/0.4/src/commons/misc.py 2008-05-08 08:29:25 UTC (rev 724) @@ -1,48 +0,0 @@ -# -*- mode: python; tab-width: 4; indent-tabs-mode: nil; py-indent-offset: 4; -*- -# vim:ft=python:et:sw=4:ts=4 - -from contextlib import * -from time import * - -""" -Miscellanea. -""" - -def generate_bit_fields(count): - """ - A generator of [2^i] for i from 0 to (count - 1). 
Useful for, - e.g., enumerating bitmask flags:: - - red, yellow, green, blue = generate_bit_fields(4) - color1 = blue - color2 = red | yellow - - @param count: The number of times to perform the left-shift. - @type count: int - """ - j = 1 - for i in xrange( count ): - yield j - j <<= 1 - -@contextmanager -def wall_clock(output): - """ - A simple timer for code sections. - - @param output: The resulting time is put into index 0 of L{output}. - @type output: index-writeable - - Example: - - t = [0] - with wall_clock(t): - sleep(1) - print "the sleep operation took %d seconds" % t[0] - """ - start = time() - try: - yield - finally: - end = time() - output[0] = end - start Copied: python-commons/tags/0.4/src/commons/misc.py (from rev 705, python-commons/trunk/src/commons/misc.py) =================================================================== --- python-commons/tags/0.4/src/commons/misc.py (rev 0) +++ python-commons/tags/0.4/src/commons/misc.py 2008-05-08 08:29:25 UTC (rev 724) @@ -0,0 +1,62 @@ +# -*- mode: python; tab-width: 4; indent-tabs-mode: nil; py-indent-offset: 4; -*- +# vim:ft=python:et:sw=4:ts=4 + +from contextlib import * +from time import * + +""" +Miscellanea. +""" + +def generate_bit_fields(count): + """ + A generator of [2^i] for i from 0 to (count - 1). Useful for, + e.g., enumerating bitmask flags:: + + red, yellow, green, blue = generate_bit_fields(4) + color1 = blue + color2 = red | yellow + + @param count: The number of times to perform the left-shift. + @type count: int + """ + j = 1 + for i in xrange( count ): + yield j + j <<= 1 + +@contextmanager +def wall_clock(output): + """ + A simple timer for code sections. + + @param output: The resulting time is put into index 0 of L{output}. 
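The C{generate_bit_fields} generator above yields successive powers of two. A Python 3 rendering of its docstring example:

```python
# Yield 1, 2, 4, ... 2^(count-1): handy for defining bitmask flags.
def generate_bit_fields(count):
    j = 1
    for _ in range(count):
        yield j
        j <<= 1

red, yellow, green, blue = generate_bit_fields(4)
color = red | yellow  # combine flags with bitwise OR
```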
+ @type output: index-writeable + + Example: + + t = [0] + with wall_clock(t): + sleep(1) + print "the sleep operation took %d seconds" % t[0] + """ + start = time() + try: + yield + finally: + end = time() + output[0] = end - start + +def default_if_none(x, d): + """ + Returns L{x} if it's not None, otherwise returns L{d}. + """ + if x is None: return d + else: return x + +def seq(f, g): + """ + Evaluate 0-ary functions L{f} then L{g}, returning L{g()}. + """ + f() + return g() Deleted: python-commons/tags/0.4/src/commons/networking.py =================================================================== --- python-commons/trunk/src/commons/networking.py 2008-04-24 16:13:04 UTC (rev 679) +++ python-commons/tags/0.4/src/commons/networking.py 2008-05-08 08:29:25 UTC (rev 724) @@ -1,39 +0,0 @@ -# -*- mode: python; tab-width: 4; indent-tabs-mode: nil; py-indent-offset: 4; -*- -# vim:ft=python:et:sw=4:ts=4 - -""" -Networking tools. -""" - -import os, sys - -class NoMacAddrError( Exception ): pass - -def get_mac_addr(): - """ - Simply parses the output of C{ifconfig} or C{ipconfig} to estimate - this machine's IP address. This is not at all reliable, but tends - to work "well enough" for my own purposes. - - From U{http://mail.python.org/pipermail/python-list/2005-December/357300.html}. - - @copyright: Frank Millman - - Note that U{http://libdnet.sf.net/} provides this functionality and much - more. 
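The C{wall_clock} context manager above reports elapsed time through a writable slot, since a C{with} block cannot hand back a return value. A Python 3 sketch of the same idea:

```python
# Time a code section; the elapsed seconds land in output[0] even if
# the timed block raises, thanks to the finally clause.
from contextlib import contextmanager
import time

@contextmanager
def wall_clock(output):
    start = time.time()
    try:
        yield
    finally:
        output[0] = time.time() - start

t = [0.0]
with wall_clock(t):
    time.sleep(0.01)
```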
- """ - mac = None - if sys.platform == 'win32': - for line in os.popen("ipconfig /all"): - if line.lstrip().startswith('Physical Address'): - mac = line.split(':')[1].strip().replace('-',':') - break - else: - for line in os.popen("/sbin/ifconfig"): - if line.find('Ether') > -1: - mac = line.split()[4] - break - if mac is None: - raise NoMacAddrError - return mac - Copied: python-commons/tags/0.4/src/commons/networking.py (from rev 706, python-commons/trunk/src/commons/networking.py) =================================================================== --- python-commons/tags/0.4/src/commons/networking.py (rev 0) +++ python-commons/tags/0.4/src/commons/networking.py 2008-05-08 08:29:25 UTC (rev 724) @@ -0,0 +1,74 @@ +# -*- mode: python; tab-width: 4; indent-tabs-mode: nil; py-indent-offset: 4; -*- +# vim:ft=python:et:sw=4:ts=4 + +""" +Networking tools. +""" + +import os, sys +from time import * +from contextlib import contextmanager + +class NoMacAddrError( Exception ): pass + +def get_mac_addr(): + """ + Simply parses the output of C{ifconfig} or C{ipconfig} to estimate + this machine's MAC address. This is not at all reliable, but tends + to work "well enough" for my own purposes. + + From U{http://mail.python.org/pipermail/python-list/2005-December/357300.html}. + + @copyright: Frank Millman + + Note that U{http://libdnet.sf.net/} provides this functionality and much + more. + """ + mac = None + if sys.platform == 'win32': + for line in os.popen("ipconfig /all"): + if line.lstrip().startswith('Physical Address'): + mac = line.split(':')[1].strip().replace('-',':') + break + else: + for line in os.popen("/sbin/ifconfig"): + if line.find('Ether') > -1: + mac = line.split()[4] + break + if mac is None: + raise NoMacAddrError + return mac + +def retry_exp_backoff(initial_backoff, multiplier, func): + """ + Repeatedly invoke L{func} until it succeeds (returns non-None), with + exponentially growing backoff delay between each try.
+ + @param initial_backoff: The initial backoff. + @type initial_backoff: float + + @param multiplier: The amount by which the backoff is multiplied on each + failure. + @type multiplier: float + + @param func: The zero-argument function to be invoked that returns True on + success and False on failure. + @type func: function + + @return: The result of the function + """ + backoff = initial_backoff + while True: + res = func() + if res is not None: return res + print 'backing off for', backoff + sleep(backoff) + backoff = multiplier * backoff + +@contextmanager +def logout(x): + """ + A context manager for finally calling the C{logout()} method of an object. + """ + try: yield x + finally: x.logout() Deleted: python-commons/tags/0.4/src/commons/seqs.py =================================================================== --- python-commons/trunk/src/commons/seqs.py 2008-04-24 16:13:04 UTC (rev 679) +++ python-commons/tags/0.4/src/commons/seqs.py 2008-05-08 08:29:25 UTC (rev 724) @@ -1,357 +0,0 @@ -# -*- mode: python; tab-width: 4; indent-tabs-mode: nil; py-indent-offset: 4; -*- -# vim:ft=python:et:sw=4:ts=4 - -from __future__ import ( absolute_import, with_statement ) - -from cStringIO import StringIO -from cPickle import * -from struct import pack, unpack -from contextlib import closing -from itertools import ( chain, count, ifilterfalse, islice, - izip, tee ) -from .log import warning - -""" -Sequences, streams, and generators. - -@var default_chunk_size: The default chunk size used by L{chunkify}. -""" - -default_chunk_size = 8192 - -def read_pickle( read, init = '', length_thresh = 100000 ): - """ - Given a reader function L{read}, reads in pickled objects from it. I am a - generator which yields unpickled objects. I assume that the pickling - is "safe," done using L{safe_pickle}. - - @param read: The reader function that reads from a stream. It should take - a single argument, the number of bytes to consume. 
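The C{retry_exp_backoff} helper above retries until the callable returns non-None, multiplying the delay after each failure. A Python 3 sketch with an injectable C{sleep} (an assumption added here so the example runs instantly; the commit's version calls the real C{sleep}):

```python
# Retry func() with exponentially growing delays until it returns non-None.
def retry_exp_backoff(initial_backoff, multiplier, func, sleep=lambda s: None):
    backoff = initial_backoff
    while True:
        res = func()
        if res is not None:
            return res
        sleep(backoff)          # back off before the next attempt
        backoff *= multiplier   # exponential growth

attempts = []
def flaky():
    attempts.append(len(attempts))
    return 'ok' if len(attempts) >= 3 else None

delays = []
result = retry_exp_backoff(1, 2, flaky, sleep=delays.append)
```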
- @type read: function - - @return: A tuple whose first element is the deserialized object or None if - EOF was encountered, and whose second element is the remainder bytes until - the EOF that were not consumed by unpickling. - @rtype: (object, str) - """ - with closing( StringIO() ) as sio: - obj = None # return this if we hit eof (not enough bytes read) - sio.write( init ) - - def read_until( target ): - remain = target - streamlen( sio ) - if remain > 0: - chunk = read( remain ) - # append to end - sio.seek(0,2) - sio.write( chunk ) - offset = streamlen( sio ) - sio.seek(0) - return offset >= target - - if read_until(4): - lengthstr = sio.read(4) - (length,) = unpack('i4', lengthstr) - if length_thresh is not None and length > length_thresh or \ - length <= 0: - warning( 'read_pickle', - 'got length', length, - 'streamlen', streamlen(sio), - 'first bytes %x %x %x %x' % tuple(map(ord,lengthstr)) ) - if read_until(length+4): - # start reading from right after header - sio.seek(4) - obj = load(sio) - - return ( obj, sio.read() ) - -def read_pickles( read ): - """ - Reads all the consecutively pickled objects from the L{read} function. - """ - while True: - pair = ( obj, rem ) = read_pickle( read ) - if obj is None: break - yield pair - -class safe_pickler( object ): - def __init__( self, protocol = HIGHEST_PROTOCOL ): - self.sio = StringIO() - self.pickler = Pickler( self.sio, protocol ) - def dumps( self, obj ): - """ - Pickle L{obj} but prepends the serialized length in bytes. - """ - self.pickler.clear_memo() - self.sio.seek(0) - self.pickler.dump(obj) - self.sio.truncate() - msg = self.sio.getvalue() - return pack('i4', self.sio.tell()) + msg - -def write_pickle( obj, write ): - """ - Write L{obj} using function L{write}, in a safe, pickle-able fashion. - """ - return write( safe_pickle( obj ) ) - -def streamlen( stream ): - """ - Get the length of a stream (e.g. file stream or StringIO). - Tries to restore the original position in the stream. 
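The C{safe_pickler}/C{read_pickle} pair above implements length-prefixed pickle framing over a byte stream. A compact Python 3 sketch of that framing, using an explicit little-endian 4-byte header (the C{'<i'} format is this sketch's choice, not necessarily the commit's wire format):

```python
# Length-prefixed pickle framing: a 4-byte length header, then the payload.
import io, pickle, struct

def write_framed(obj, stream):
    payload = pickle.dumps(obj)
    stream.write(struct.pack('<i', len(payload)) + payload)

def read_framed(stream):
    header = stream.read(4)
    if len(header) < 4:
        return None  # EOF: no complete header available
    (length,) = struct.unpack('<i', header)
    return pickle.loads(stream.read(length))

buf = io.BytesIO()
write_framed({'seq': 1}, buf)
write_framed([2, 3], buf)
buf.seek(0)
first = read_framed(buf)
second = read_framed(buf)
```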
- """ - orig_pos = stream.tell() - stream.seek(0,2) # seek to 0 relative to eof - length = stream.tell() # get the position - stream.seek(orig_pos) # return to orig_pos - return length - -def chunkify( stream, chunk_size = default_chunk_size ): - """ - Given an input stream (an object exposing a file-like interface), - reads data in from it one chunk at a time. This is a generator - which yields those chunks as they come. - - @param stream: The input stream. - @type stream: stream - - @param chunk_size: The size of the chunk (usually the number of - bytes to read). - @type chunk_size: int - """ - offset = 0 - while True: - chunk = stream.read( chunk_size ) - if not chunk: - break - yield offset, chunk - offset += len( chunk ) - -def total( iterable ): - """ - Counts the number of items in an iterable. Note that this will - consume the elements of the iterable, and if the iterable is - infinite, this will not halt. - - @param iterable: The iterable to count. - @type iterable - - @return: The number of elements consumed. - @rtype: int - """ - return sum( 1 for i in iterable ) - -#class FilePersistence(): -# def __init__( self ): -# -# -#class DbPersistence(): -# def __init__( self ): -# - -class ClosedError( Exception ): pass - -class PersistentConsumedSeq( object ): - """ - I generate C{[0, 1, ...]}, like L{count}, but I can also - save my state to disk. Similar to L{PersistentSeq}, but instead of - committing on each call to L{next}, require manual explicit calls - to L{commit}. I'm useful for generating unique IDs. - - Why not simply use L{PersistentSeq} instead of me? You usually - can. However, some applications use me for efficiency. For - instance, consider an application that generates a lot of network - packets (with sequence numbers), but only sends a small fraction - of them out onto the network. 
If we only want to guarantee the - uniqueness of sequence numbers that are exposed to the world, we - need only commit upon sending a packet, and not on generating - a packet (L{next}). This could avoid excessive writes. - - @ivar seqno: The next sequence number to be generated. - @type seqno: int - """ - def __init__( self, path ): - """ - @param path: File to save my state in. I keep this file open. - @type path: str - """ - try: - self.log = file( path, 'r+' ) - except IOError, ex: - if ex.errno == 2: - self.log = file( path, 'w+' ) - else: - raise - contents = self.log.read() - if len( contents ) > 0: - self.seqno = int( contents ) - else: - self.seqno = 0 - self.max_commit = self.seqno - def next( self ): - """ - @return: The next number in the sequence. - @rtype: int - - @throw ClosedError: If I was previously L{close}d. - """ - if self.log is None: - raise ClosedError() - self.seqno += 1 - return self.seqno - 1 - def commit( self, seqno ): - """ - @param seqno: If this is the maximum committed sequence - number, then commit this sequence number (to disk). The - semantics will get weird if you pass in sequence numbers that - haven't been generated yet. - - @type seqno: int - - @return: The maximum sequence number ever committed (possibly - L{seqno}). - @rtype: int - - @throw ClosedError: If I was previously L{close}d. - """ - if self.log is None: - raise ClosedError() - if seqno > self.max_commit: - # TODO use a more flexible logging system that can switch - # between Python's logging module and Twisted's log module - self.max_commit = seqno - self.log.seek( 0 ) - # yes I write +1 here - self.log.write( str( seqno + 1 ) ) - self.log.truncate() - self.log.flush() - return self.max_commit - def close( self ): - """ - Closes the log file. No more operations can be performed. - """ - self.log.close() - self.log = None - -class PersistentSeq( PersistentConsumedSeq ): - """ - I generate C{[0, 1, ...]}, like L{count}, but I can also - save my state to disk.
I save my state immediately to disk on each - call to L{next}. - """ - def __init__( self, path ): - """ - @param path: File to save my state in. I keep this file open. - @type path: str - """ - PersistentConsumedSeq.__init__( self, path ) - def next( self ): - """ - Generates the next number in the sequence and immediately - commits it. - """ - cur = PersistentConsumedSeq.next( self ) - self.commit( cur ) - return cur - -def pairwise(iterable): - "s -> (s0,s1), (s1,s2), (s2, s3), ..." - a, b = tee(iterable) - try: - b.next() - except StopIteration: - pass - return izip(a, b) - -def argmax(sequence, fn=None): - """Two usage patterns: - C{argmax([s0, s1, ...], fn)} - C{argmax([(fn(s0), s0), (fn(s1), s1), ...])} - Both return the si with greatest fn(si)""" - if fn is None: - return max(sequence)[1] - else: - return max((fn(e), e) for e in sequence)[1] - -def argmin(sequence, fn=None): - """Two usage patterns: - C{argmin([s0, s1, ...], fn)} - C{argmin([(fn(s0), s0), (fn(s1), s1), ...])} - Both return the si with smallest fn(si)""" - if fn is None: - return min(sequence)[1] - else: - return min((fn(e), e) for e in sequence)[1] - -def all(seq, pred=bool): - """ - Returns C{True} if C{pred(x) is True} for every element in the - iterable - """ - for elem in ifilterfalse(pred, seq): - return False - return True - -def concat(listOfLists): - return list(chain(*listOfLists)) - -def flatten( stream ): - """ - For each item yielded by L{gen}, if that item is itself an - iterator/generator, then I will recurse into C{flatten(gen)}; - otherwise, I'll yield the yielded item. Thus, I essentially - "flatten" out a tree of iterators. - - I test whether something is an iterator/generator simply by - checking to see if it has a C{next} attribute. Note that this - won't include any iterable, so things like L{list}s are yielded - like any regular item. This is my author's desired behavior! - - I am useful for coroutines, a la DeferredGenerators from Twisted. 
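The C{pairwise} and C{argmax} recipes above port directly to Python 3 (where iterators expose C{__next__} and C{izip} becomes C{zip}). A quick illustrative check:

```python
# pairwise: s -> (s0,s1), (s1,s2), ...; argmax: element with greatest fn(e).
from itertools import tee

def pairwise(iterable):
    a, b = tee(iterable)
    next(b, None)       # advance the second iterator by one
    return zip(a, b)

def argmax(sequence, fn=None):
    if fn is None:
        return max(sequence)[1]
    return max((fn(e), e) for e in sequence)[1]

pairs = list(pairwise([1, 2, 3, 4]))
longest = argmax(['a', 'ccc', 'bb'], len)
```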
- - See also: - U{http://mail.python.org/pipermail/python-list/2003-October/232874.html} - """ - for item in stream: - if hasattr( item, 'next' ): - for item in flatten( item ): - yield item - else: - yield item - -def grouper(n, iterable, padvalue=None): - "grouper(3, 'abcdefg', 'x') --> ('a','b','c'), ('d','e','f'), ('g','x','x')" - return izip(*[chain(iterable, repeat(padvalue, n-1))]*n) - -def chunker( n, iterable, in_place = False ): - """ - Like L{grouper} but designed to scale for larger L{n}. Also, does - not perform padding. The end of the stream is reached when we - yield a chunk with fewer than L{n} items. - """ - i = -1 - chunk = [ None ] * n - for i, item in enumerate( iterable ): - chunk[ i % n ] = item - if ( i + 1 ) % n == 0: - yield chunk - if not in_place: chunk = [ None ] * n - else: - if i % n < n - 1: - del chunk[ ( i + 1 ) % n : ] - yield chunk - -def take(n, seq): - return list(islice(seq, n)) - -def delimit(sep, xs): - for x in xs: - yield x - break - for x in xs: - yield sep - yield x - -# TODO not quite right -def interleave(xs, ys): - return concat(izip( xs, ys )) Copied: python-commons/tags/0.4/src/commons/seqs.py (from rev 707, python-commons/trunk/src/commons/seqs.py) =================================================================== --- python-commons/tags/0.4/src/commons/seqs.py (rev 0) +++ python-commons/tags/0.4/src/commons/seqs.py 2008-05-08 08:29:25 UTC (rev 724) @@ -0,0 +1,366 @@ +# -*- mode: python; tab-width: 4; indent-tabs-mode: nil; py-indent-offset: 4; -*- +# vim:ft=python:et:sw=4:ts=4 + +from __future__ import ( absolute_import, with_statement ) + +from cStringIO import StringIO +from cPickle import * +from struct import pack, unpack +from contextlib import closing +from itertools import ( chain, count, ifilterfalse, islice, + izip, repeat, tee ) +from .log import warning + +""" +Sequences, streams, and generators. + +@var default_chunk_size: The default chunk size used by L{chunkify}. 
+""" + +default_chunk_size = 8192 + +def read_pickle( read, init = '', length_thresh = 100000 ): + """ + Given a reader function L{read}, reads in pickled objects from it. I am a + generator which yields unpickled objects. I assume that the pickling + is "safe," done using L{safe_pickle}. + + @param read: The reader function that reads from a stream. It should take + a single argument, the number of bytes to consume. + @type read: function + + @return: A tuple whose first element is the deserialized object or None if + EOF was encountered, and whose second element is the remainder bytes until + the EOF that were not consumed by unpickling. + @rtype: (object, str) + """ + with closing( StringIO() ) as sio: + obj = None # return this if we hit eof (not enough bytes read) + sio.write( init ) + + def read_until( target ): + remain = target - streamlen( sio ) + if remain > 0: + chunk = read( remain ) + # append to end + sio.seek(0,2) + sio.write( chunk ) + offset = streamlen( sio ) + sio.seek(0) + return offset >= target + + if read_until(4): + lengthstr = sio.read(4) + (length,) = unpack('i4', lengthstr) + if length_thresh is not None and length > length_thresh or \ + length <= 0: + warning( 'read_pickle', + 'got length', length, + 'streamlen', streamlen(sio), + 'first bytes %x %x %x %x' % tuple(map(ord,lengthstr)) ) + if read_until(length+4): + # start reading from right after header + sio.seek(4) + obj = load(sio) + + return ( obj, sio.read() ) + +def read_pickles( read ): + """ + Reads all the consecutively pickled objects from the L{read} function. + """ + while True: + pair = ( obj, rem ) = read_pickle( read ) + if obj is None: break + yield pair + +class safe_pickler( object ): + def __init__( self, protocol = HIGHEST_PROTOCOL ): + self.sio = StringIO() + self.pickler = Pickler( self.sio, protocol ) + def dumps( self, obj ): + """ + Pickle L{obj} but prepends the serialized length in bytes. 
+ """ + self.pickler.clear_memo() + self.sio.seek(0) + self.pickler.dump(obj) + self.sio.truncate() + msg = self.sio.getvalue() + return pack('i4', self.sio.tell()) + msg + +def write_pickle( obj, write ): + """ + Write L{obj} using function L{write}, in a safe, pickle-able fashion. + """ + return write( safe_pickle( obj ) ) + +def streamlen( stream ): + """ + Get the length of a stream (e.g. file stream or StringIO). + Tries to restore the original position in the stream. + """ + orig_pos = stream.tell() + stream.seek(0,2) # seek to 0 relative to eof + length = stream.tell() # get the position + stream.seek(orig_pos) # return to orig_pos + return length + +def chunkify( stream, chunk_size = default_chunk_size ): + """ + Given an input stream (an object exposing a file-like interface), + reads data in from it one chunk at a time. This is a generator + which yields those chunks as they come. + + @param stream: The input stream. + @type stream: stream + + @param chunk_size: The size of the chunk (usually the number of + bytes to read). + @type chunk_size: int + """ + offset = 0 + while True: + chunk = stream.read( chunk_size ) + if not chunk: + break + yield offset, chunk + offset += len( chunk ) + +def total( iterable ): + """ + Counts the number of items in an iterable. Note that this will + consume the elements of the iterable, and if the iterable is + infinite, this will not halt. + + @param iterable: The iterable to count. + @type iterable + + @return: The number of elements consumed. + @rtype: int + """ + return sum( 1 for i in iterable ) + +#class FilePersistence(): +# def __init__( self ): +# +# +#class DbPersistence(): +# def __init__( self ): +# + +class ClosedError( Exception ): pass + +class PersistentConsumedSeq( object ): + """ + I generate C{[0, 1, ...]}, like L{count}, but I can also + save my state to disk. Similar to L{PersistentSeq}, but instead of + committing on each call to L{next}, require manual explicit calls + to L{commit}. 
I'm useful for generating unique IDs. + + Why not simply use L{PersistentSeq} instead of me? You usually + can. However, some applications use me for efficiency. For + instance, consider an application that generates a lot of network + packets (with sequence numbers), but only sends a small fraction + of them out onto the network. If we only want to guarantee the + uniqueness of sequence numbers that are exposed to the world, we + need only commit when upon sending a packet, and not on generating + a packet (L{next}). This could avoid excessive writes. + + @ivar seqno: The next sequence number to be generated. + @type seqno: int + """ + def __init__( self, path ): + """ + @param path: File to save my state in. I keep this file open. + @type path: str + """ + try: + self.log = file( path, 'r+' ) + except IOError, ex: + if ex.errno == 2: + self.log = file( path, 'w+' ) + else: + raise + contents = self.log.read() + if len( contents ) > 0: + self.seqno = int( contents ) + else: + self.seqno = 0 + self.max_commit = self.seqno + def next( self ): + """ + @return: The next number in the sequence. + @rtype: int + + @throw ClosedError: If I was previously L{close}d. + """ + if self.log is None: + raise ClosedError() + self.seqno += 1 + return self.seqno - 1 + def commit( self, seqno ): + """ + @param seqno: If this is the maximum committed sequence + number, then commit this sequence number (to disk). The + semantics will get weird if you pass in sequence numbers that + haven't been generated yet. + + @type seqno: int + + @return: The maximum sequence number ever committed (possibly + L{seqno}). + @rtype: int + + @throw ClosedError: If I was previously L{close}d. 
+ """ + if self.log is None: + raise ClosedError() + if seqno > self.max_commit: + # TODO use a more flexible logging system that can switch + # between Python's logging module and Twisted's log module + self.max_commit = seqno + self.log.seek( 0 ) + # yes I write +1 here + self.log.write( str( seqno + 1 ) ) + self.log.truncate() + self.log.flush() + return self.max_commit + def close( self ): + """ + Closes the log file. No more operations can be performed. + """ + self.log.close() + self.log = None + +class PersistentSeq( PersistentConsumedSeq ): + """ + I generate C{[0, 1, ...]}, like L{count}, but I can also + save my state to disk. I save my state immediately to disk on each + call to L{next}. + """ + def __init__( self, path ): + """ + @param path: File to save my state in. I keep this file open. + @type path: str + """ + PersistentConsumedSeq.__init__( self, path ) + def next( self ): + """ + Generates the next number in the sequence and immediately + commits it. + """ + cur = PersistentConsumedSeq.next( self ) + self.commit( cur ) + return cur + +def pairwise(iterable): + "s -> (s0,s1), (s1,s2), (s2, s3), ..." 
+ a, b = tee(iterable) + try: + b.next() + except StopIteration: + pass + return izip(a, b) + +def argmax(sequence, fn=None): + """Two usage patterns: + C{argmax([s0, s1, ...], fn)} + C{argmax([(fn(s0), s0), (fn(s1), s1), ...])} + Both return the si with greatest fn(si)""" + if fn is None: + return max(sequence)[1] + else: + return max((fn(e), e) for e in sequence)[1] + +def argmin(sequence, fn=None): + """Two usage patterns: + C{argmin([s0, s1, ...], fn)} + C{argmin([(fn(s0), s0), (fn(s1), s1), ...])} + Both return the si with smallest fn(si)""" + if fn is None: + return min(sequence)[1] + else: + return min((fn(e), e) for e in sequence)[1] + +def all(seq, pred=bool): + """ + Returns C{True} if C{pred(x) is True} for every element in the + iterable + """ + for elem in ifilterfalse(pred, seq): + return False + return True + +def concat(listOfLists): + return list(chain(*listOfLists)) + +def flatten( stream ): + """ + For each item yielded by L{gen}, if that item is itself an + iterator/generator, then I will recurse into C{flatten(gen)}; + otherwise, I'll yield the yielded item. Thus, I essentially + "flatten" out a tree of iterators. + + I test whether something is an iterator/generator simply by + checking to see if it has a C{next} attribute. Note that this + won't include any iterable, so things like L{list}s are yielded + like any regular item. This is my author's desired behavior! + + I am useful for coroutines, a la DeferredGenerators from Twisted. + + See also: + U{http://mail.python.org/pipermail/python-list/2003-October/232874.html} + """ + for item in stream: + if hasattr( item, 'next' ): + for item in flatten( item ): + yield item + else: + yield item + +def grouper(n, iterable, padvalue=None): + "grouper(3, 'abcdefg', 'x') --> ('a','b','c'), ('d','e','f'), ('g','x','x')" + return izip(*[chain(iterable, repeat(padvalue, n-1))]*n) + +def chunker( n, iterable, in_place = False ): + """ + Like L{grouper} but designed to scale for larger L{n}. 
Also, does + not perform padding. The end of the stream is reached when we + yield a chunk with fewer than L{n} items. + """ + i = -1 + chunk = [ None ] * n + for i, item in enumerate( iterable ): + chunk[ i % n ] = item + if ( i + 1 ) % n == 0: + yield chunk + if not in_place: chunk = [ None ] * n + else: + if i % n < n - 1: + del chunk[ ( i + 1 ) % n : ] + yield chunk + +def countstep(start, step): + """ + Generate [start, start+step, start+2*step, start+3*step, ...]. + """ + i = start + while True: + yield i + i += step + +def take(n, seq): + return list(islice(seq, n)) + +def delimit(sep, xs): + for x in xs: + yield x + break + for x in xs: + yield sep + yield x + +# TODO not quite right +def interleave(xs, ys): + return concat(izip( xs, ys )) Deleted: python-commons/tags/0.4/src/commons/setup.py =================================================================== --- python-commons/trunk/src/commons/setup.py 2008-04-24 16:13:04 UTC (rev 679) +++ python-commons/tags/0.4/src/commons/setup.py 2008-05-08 08:29:25 UTC (rev 724) @@ -1,98 +0,0 @@ -#!/usr/bin/env python -# -*- mode: python; tab-width: 4; indent-tabs-mode: nil; py-indent-offset: 4; -*- -# vim:ft=python:et:sw=4:ts=4 - -""" -Common code for setup.py files. -""" - -arg_keys = """ -name -version -author -author_email -description: Summary -download_url: Download-url -long_description: Description -keywords: Keywords -url: Home-page -license -classifiers: Classifier -platforms: Platform -""" - -import sys -if not hasattr(sys, "version_info") or sys.version_info < (2, 3): - from distutils.core import setup - _setup = setup - def setup(**kwargs): - for key in [ - # distutils >= Python 2.3 args - # XXX probably download_url came... [truncated message content] |
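The `safe_pickler`/`read_pickle` pair in the seqs.py module above implements length-prefixed pickle framing over a byte stream: each object is serialized with a 4-byte length header so the reader knows how many bytes to consume. A minimal sketch of the same idea in modern Python 3 follows; the `'!i'` header format and the `dump_framed`/`load_framed` names are illustrative choices, not the module's actual API.

```python
import pickle
import struct
from io import BytesIO

# Sketch of length-prefixed pickle framing, as in safe_pickler/read_pickle
# above. Hypothetical helper names; '!i' (big-endian int) is an assumption.

def dump_framed(obj):
    """Serialize obj, prefixed with the pickled payload's length in bytes."""
    payload = pickle.dumps(obj)
    return struct.pack('!i', len(payload)) + payload

def load_framed(read):
    """Read one framed object via read(n); return None on a short read."""
    header = read(4)
    if len(header) < 4:
        return None
    (length,) = struct.unpack('!i', header)
    payload = read(length)
    if len(payload) < length:
        return None
    return pickle.loads(payload)

stream = BytesIO(dump_framed({'seqno': 42}) + dump_framed([1, 2, 3]))
first = load_framed(stream.read)   # {'seqno': 42}
second = load_framed(stream.read)  # [1, 2, 3]
```

Framing like this lets a reader pull complete objects out of a socket or file without guessing where one pickle ends and the next begins, which is what `read_pickles` above relies on.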
From: <yan...@us...> - 2008-05-08 08:27:31
|
Revision: 723 http://assorted.svn.sourceforge.net/assorted/?rev=723&view=rev Author: yangzhang Date: 2008-05-08 01:27:37 -0700 (Thu, 08 May 2008) Log Message: ----------- updated version Modified Paths: -------------- python-commons/trunk/README python-commons/trunk/publish.bash python-commons/trunk/setup.py Modified: python-commons/trunk/README =================================================================== --- python-commons/trunk/README 2008-05-08 08:26:56 UTC (rev 722) +++ python-commons/trunk/README 2008-05-08 08:27:37 UTC (rev 723) @@ -37,9 +37,14 @@ Changes ------- -version 0.3.1 +version 0.4 - removed extraneous debug print statements +- added `logout()` context manager +- added `seq()`, `default_if_none()` +- fixed missing `import` bug +- released for [Mailing List + Filter](http://assorted.sf.net/mailing-list-filter/) version 0.3 Modified: python-commons/trunk/publish.bash =================================================================== --- python-commons/trunk/publish.bash 2008-05-08 08:26:56 UTC (rev 722) +++ python-commons/trunk/publish.bash 2008-05-08 08:27:37 UTC (rev 723) @@ -5,7 +5,7 @@ } fullname='Python Commons' -version=0.3.1 +version=0.4 license=psf websrcs=( README ) rels=( pypi: ) Modified: python-commons/trunk/setup.py =================================================================== --- python-commons/trunk/setup.py 2008-05-08 08:26:56 UTC (rev 722) +++ python-commons/trunk/setup.py 2008-05-08 08:27:37 UTC (rev 723) @@ -9,7 +9,7 @@ pkg_info_text = """ Metadata-Version: 1.1 Name: python-commons -Version: 0.3.1 +Version: 0.4 Author: Yang Zhang Author-email: yaaang NOSPAM at REMOVECAPS gmail Home-page: http://assorted.sourceforge.net/python-commons This was sent by the SourceForge.net collaborative development platform, the world's largest Open Source development site. |
From: <yan...@us...> - 2008-05-08 08:26:51
|
Revision: 722 http://assorted.svn.sourceforge.net/assorted/?rev=722&view=rev Author: yangzhang Date: 2008-05-08 01:26:56 -0700 (Thu, 08 May 2008) Log Message: ----------- amoved ideas list to README Modified Paths: -------------- mailing-list-filter/trunk/src/mlf.py Modified: mailing-list-filter/trunk/src/mlf.py =================================================================== --- mailing-list-filter/trunk/src/mlf.py 2008-05-08 08:26:38 UTC (rev 721) +++ mailing-list-filter/trunk/src/mlf.py 2008-05-08 08:26:56 UTC (rev 722) @@ -6,15 +6,6 @@ is performed via the In-Reply-To and References headers. """ -# Currently, we assume that the server specification points to a mailbox -# containing all messages (both sent and received), and a message is determined -# to have been sent by you by looking at the From: header field. This should -# work well with Gmail. An alternative strategy is to look through two folders, -# one that's the Inbox and one that's the Sent mailbox, and treat all messages -# in Sent as having been sent by you. -# -# Possible future tasks: implement incremental maintenance of local cache. - from __future__ import with_statement from collections import defaultdict from email import message_from_string This was sent by the SourceForge.net collaborative development platform, the world's largest Open Source development site. |
From: <yan...@us...> - 2008-05-08 08:26:32
|
Revision: 721 http://assorted.svn.sourceforge.net/assorted/?rev=721&view=rev Author: yangzhang Date: 2008-05-08 01:26:38 -0700 (Thu, 08 May 2008) Log Message: ----------- accounting for rename Modified Paths: -------------- mailing-list-filter/trunk/setup.py Modified: mailing-list-filter/trunk/setup.py =================================================================== --- mailing-list-filter/trunk/setup.py 2008-05-08 08:04:04 UTC (rev 720) +++ mailing-list-filter/trunk/setup.py 2008-05-08 08:26:38 UTC (rev 721) @@ -25,4 +25,4 @@ Classifier: Topic :: Communications :: Email """ -run_setup(pkg_info_text, scripts = ['src/filter.py']) +run_setup(pkg_info_text, scripts = ['src/mlf.py']) This was sent by the SourceForge.net collaborative development platform, the world's largest Open Source development site. |
From: <yan...@us...> - 2008-05-08 08:04:03
|
Revision: 720 http://assorted.svn.sourceforge.net/assorted/?rev=720&view=rev Author: yangzhang Date: 2008-05-08 01:04:04 -0700 (Thu, 08 May 2008) Log Message: ----------- added todos and setup to readme Modified Paths: -------------- mailing-list-filter/trunk/README Modified: mailing-list-filter/trunk/README =================================================================== --- mailing-list-filter/trunk/README 2008-05-08 08:03:44 UTC (rev 719) +++ mailing-list-filter/trunk/README 2008-05-08 08:04:04 UTC (rev 720) @@ -1,6 +1,3 @@ -% Mailing List Filter -% Yang Zhang - Overview -------- @@ -35,3 +32,32 @@ keep those active so that I can get immediate first-pass filtering. I execute this script on a daily basis to perform second-pass filtering/unfiltering to catch those false negatives that may have been missed. + +Setup +----- + +Requirements: + +- [argparse](http://argparse.python-hosting.com/) +- [Python Commons](http://assorted.sf.net/python-commons/) 0.4 +- [path](http://www.jorendorff.com/articles/python/path/) + +Install the program using the standard `setup.py` program. + +Future Work Ideas +----------------- + +- Currently, we assume that the server specification points to a mailbox + containing all messages (both sent and received), and a message is determined + to have been sent by you by looking at the From: header field. This works + well with Gmail. An alternative strategy is to look through two folders, one + that's the Inbox and one that's the Sent mailbox, and treat all messages in + Sent as having been sent by you. This is presumably how most other IMAP + servers work. + +- Implement incremental maintenance of local cache. + +- Accept custom operations for filtered/unfiltered messages + (trashing/untrashing, labeling/unlabeling, etc.). + +- Refactor the message fetching/management part out into its own library. This was sent by the SourceForge.net collaborative development platform, the world's largest Open Source development site. |
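The README above describes grouping messages into threads by following the In-Reply-To and References headers. That grouping amounts to taking connected components of an undirected graph over Message-IDs, which can be sketched as below; the `group_threads` helper and the sample messages are made up for illustration and are not part of the script.

```python
from collections import defaultdict

# Sketch of thread grouping via In-Reply-To/References, as the README
# describes: build an undirected graph over Message-IDs, then collect
# connected components with an iterative DFS. Sample data is invented.

def group_threads(messages):
    """messages: dict mapping Message-ID -> list of referenced IDs."""
    graph = defaultdict(set)
    for mid, refs in messages.items():
        graph[mid]  # ensure isolated messages appear as nodes
        for ref in refs:
            if ref in messages:  # ignore references to unknown IDs
                graph[mid].add(ref)
                graph[ref].add(mid)
    seen, threads = set(), []
    for mid in graph:
        if mid in seen:
            continue
        stack, component = [mid], []
        while stack:  # iterative DFS over one connected component
            cur = stack.pop()
            if cur in seen:
                continue
            seen.add(cur)
            component.append(cur)
            stack.extend(graph[cur] - seen)
        threads.append(sorted(component))
    return sorted(threads)

msgs = {'<a>': [], '<b>': ['<a>'], '<c>': ['<b>'], '<d>': []}
threads = group_threads(msgs)  # [['<a>', '<b>', '<c>'], ['<d>']]
```

Once messages are bucketed into threads this way, a whole thread can be starred or unstarred as soon as any one message in it involves the user, which is the filtering behavior the README outlines.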
From: <yan...@us...> - 2008-05-08 08:03:39
|
Revision: 719 http://assorted.svn.sourceforge.net/assorted/?rev=719&view=rev Author: yangzhang Date: 2008-05-08 01:03:44 -0700 (Thu, 08 May 2008) Log Message: ----------- added note on trove classifier reference Modified Paths: -------------- python-commons/trunk/src/commons/setup.py Modified: python-commons/trunk/src/commons/setup.py =================================================================== --- python-commons/trunk/src/commons/setup.py 2008-05-08 07:48:18 UTC (rev 718) +++ python-commons/trunk/src/commons/setup.py 2008-05-08 08:03:44 UTC (rev 719) @@ -3,7 +3,9 @@ # vim:ft=python:et:sw=4:ts=4 """ -Common code for setup.py files. +Common code for setup.py files. Details about the Trove classifiers are +available at +U{http://pypi.python.org/pypi?%3Aaction=list_classifiers}. """ arg_keys = """ This was sent by the SourceForge.net collaborative development platform, the world's largest Open Source development site. |
From: <yan...@us...> - 2008-05-08 07:48:12
|
Revision: 718 http://assorted.svn.sourceforge.net/assorted/?rev=718&view=rev Author: yangzhang Date: 2008-05-08 00:48:18 -0700 (Thu, 08 May 2008) Log Message: ----------- renamed filter script Added Paths: ----------- mailing-list-filter/trunk/src/mlf.py Removed Paths: ------------- mailing-list-filter/trunk/src/filter.py Deleted: mailing-list-filter/trunk/src/filter.py =================================================================== --- mailing-list-filter/trunk/src/filter.py 2008-05-08 06:55:12 UTC (rev 717) +++ mailing-list-filter/trunk/src/filter.py 2008-05-08 07:48:18 UTC (rev 718) @@ -1,245 +0,0 @@ -#!/usr/bin/env python - -""" -Given a Gmail IMAP mailbox, star all messages in which you were a participant -(either a sender or an explicit recipient in To: or Cc:), where thread grouping -is performed via the In-Reply-To and References headers. -""" - -# Currently, we assume that the server specification points to a mailbox -# containing all messages (both sent and received), and a message is determined -# to have been sent by you by looking at the From: header field. This should -# work well with Gmail. An alternative strategy is to look through two folders, -# one that's the Inbox and one that's the Sent mailbox, and treat all messages -# in Sent as having been sent by you. -# -# Possible future tasks: implement incremental maintenance of local cache. 
- -from __future__ import with_statement -from collections import defaultdict -from email import message_from_string -from getpass import getpass -from imaplib import IMAP4_SSL -from argparse import ArgumentParser -from path import path -from re import match -from functools import partial -from itertools import count -from commons.decs import pickle_memoized -from commons.files import cleanse_filename, soft_makedirs -from commons.log import * -from commons.misc import default_if_none, seq -from commons.networking import logout -from commons.seqs import concat, grouper -from commons.startup import run_main -from contextlib import closing -import logging -from commons import log - -info = partial(log.info, 'main') -debug = partial(log.debug, 'main') -warning = partial(log.warning, 'main') -error = partial(log.error, 'main') -die = partial(log.die, 'main') - -def thread_dfs(msg, tid, tid2msgs): - assert msg.tid is None - msg.tid = tid - tid2msgs[tid].append(msg) - for ref in msg.refs: - if ref.tid is None: - thread_dfs(ref, tid, tid2msgs) - else: - assert ref.tid == tid - -def getmail(imap): - info( 'finding max UID' ) - # We use UIDs rather than the default of sequence numbers because UIDs are - # guaranteed to be persistent across sessions. This means that we can, for - # instance, fetch messages in one session and operate on this locally cached - # data before marking messages in a separate session. - ok, [uids] = imap.uid('SEARCH', None, 'ALL') - maxuid = int( uids.split()[-1] ) - del uids - - info( 'actually fetching the messages in chunks up to max', maxuid ) - # The syntax/fields of the FETCH command is documented in RFC 2060. Also, - # this article contains a brief overview: - # http://www.devshed.com/c/a/Python/Python-Email-Libraries-part-2-IMAP/3/ - # BODY.PEEK prevents the message from automatically being flagged as \Seen. 
- query = '(FLAGS BODY.PEEK[HEADER.FIELDS ' \ - '(Message-ID References In-Reply-To From To Cc Subject)])' - step = 1000 - return list( concat( - seq( lambda: info('fetching', start, 'to', start + step - 1), - lambda: imap.uid('FETCH', '%d:%d' % (start, start + step - 1), - query)[1] ) - for start in xrange(1, maxuid + 1, step) ) ) - -def main(argv): - p = ArgumentParser(description = __doc__) - p.add_argument('--credfile', default = path( '~/.mlf.auth' ).expanduser(), - help = """File containing your login credentials, with the username on the - first line and the password on the second line. Ignored iff --prompt.""") - p.add_argument('--cachedir', default = path( '~/.mlf.cache' ).expanduser(), - help = "Directory to use for caching our data.") - p.add_argument('--prompt', action = 'store_true', - help = "Interactively prompt for the username and password.") - p.add_argument('--pretend', action = 'store_true', - help = """Do not actually carry out any updates to the server. Use in - conjunction with --debug to observe what would happen.""") - p.add_argument('--no-mark-unseen', action = 'store_true', - help = "Do not mark newly revelant threads as unread.") - p.add_argument('--no-mark-seen', action = 'store_true', - help = "Do not mark newly irrevelant threads as read.") - p.add_argument('--debug', action = 'append', - help = """Enable logging for messages of the given flags. 
Flags include: - refs (references to missing Message-IDs), dups (duplicate Message-IDs), - main (the main program logic), and star (which messages are being - starred), unstar (which messages are being unstarred).""") - p.add_argument('sender', - help = "Your email address.") - p.add_argument('server', - help = "The server in the format: <host>[:<port>][/<mailbox>].") - - cfg = p.parse_args(argv[1:]) - - config_logging(level = logging.ERROR, do_console = True, flags = cfg.debug) - - if cfg.prompt: - print "username:", - cfg.user = raw_input() - print "password:", - cfg.passwd = getpass() - else: - with file(cfg.credfile) as f: - [cfg.user, cfg.passwd] = map(lambda x: x.strip('\r\n'), f.readlines()) - - try: - m = match( r'(?P<host>[^:/]+)(:(?P<port>\d+))?(/(?P<mailbox>.+))?$', - cfg.server ) - cfg.host = m.group('host') - cfg.port = int( default_if_none(m.group('port'), 993) ) - cfg.mailbox = default_if_none(m.group('mailbox'), 'INBOX') - except: - p.error('Need to specify the server in the correct format.') - - soft_makedirs(cfg.cachedir) - - with logout(IMAP4_SSL(cfg.host, cfg.port)) as imap: - imap.login(cfg.user, cfg.passwd) - # Close is only valid in the authenticated state. - with closing(imap) as imap: - # Select the main mailbox (INBOX). - imap.select(cfg.mailbox) - - # Fetch message IDs, references, and senders. - xs = pickle_memoized \ - (lambda imap: cfg.cachedir / cleanse_filename(cfg.sender)) \ - (getmail) \ - (imap) - - log.debug('fetched', xs) - - info('building message-id map and determining the set of messages sent ' - 'by you or addressed to you (the "source set")') - - srcs = [] - mid2msg = {} - # Every second item is just a closing paren. 
- # Example data: - # [('13300 (BODY[HEADER.FIELDS (Message-ID References In-Reply-To)] {67}', - # 'Message-ID: <mai...@py...>\r\n\r\n'), - # ')', - # ('13301 (BODY[HEADER.FIELDS (Message-ID References In-Reply-To)] {59}', - # 'Message-Id: <200...@hv...>\r\n\r\n'), - # ')', - # ('13302 (BODY[HEADER.FIELDS (Message-ID References In-Reply-To)] {92}', - # 'Message-ID: <C43EAFC0.2E3AE%ni...@ya...>\r\nIn-Reply-To: <481...@gm...>\r\n\r\n')] - for (envelope, data), paren in grouper(2, xs): - # Parse the body. - msg = message_from_string(data) - - # Parse the envelope. - m = match( - r"(?P<seqno>\d+) \(UID (?P<uid>\d+) FLAGS \((?P<flags>[^)]+)\)", - envelope ) - msg.seqno = m.group('seqno') - msg.uid = m.group('uid') - msg.flags = m.group('flags').split() - - # Prepare a container for references to other msgs, and initialize the - # thread ID. - msg.refs = [] - msg.tid = None - - # Add these to the map. - if msg['Message-ID'] in mid2msg: - log.warning( 'dups', 'duplicate message IDs:', - msg['Message-ID'], msg['Subject'] ) - mid2msg[ msg['Message-ID'] ] = msg - - # Add to "srcs" set if sent by us or addressed to us. - if ( cfg.sender in default_if_none( msg['From'], '' ) or - cfg.sender in default_if_none( msg['To'], '' ) or - cfg.sender in default_if_none( msg['Cc'], '' ) ): - srcs.append( msg ) - - info( 'constructing undirected graph' ) - - for mid, msg in mid2msg.iteritems(): - # Extract any references. - irt = default_if_none( msg.get_all('In-Reply-To'), [] ) - refs = default_if_none( msg.get_all('References'), [] ) - refs = set( ' '.join( irt + refs ).replace('><', '> <').split() ) - - # Connect nodes in graph bidirectionally. Ignore references to MIDs - # that don't exist. - for ref in refs: - try: - refmsg = mid2msg[ref] - # We can use lists/append (not worry about duplicates) because the - # original sources should be acyclic. If a -> b, then there is no b -> - # a, so when crawling a we can add a <-> b without worrying that later - # we may re-add b -> a. 
- msg.refs.append(refmsg) - refmsg.refs.append(msg) - except: - log.warning( 'refs', ref ) - - info('finding connected components (grouping the messages into threads)') - - tids = count() - tid2msgs = defaultdict(list) - for mid, msg in mid2msg.iteritems(): - if msg.tid is None: - thread_dfs(msg, tids.next(), tid2msgs) - - info( 'starring the relevant threads, in which I am a participant' ) - - rel_tids = set() - for srcmsg in srcs: - if srcmsg.tid not in rel_tids: - rel_tids.add(srcmsg.tid) - for msg in tid2msgs[srcmsg.tid]: - if r'\Flagged' not in msg.flags: - log.info( 'star', '\n', msg ) - if not cfg.pretend: - imap.uid('STORE', msg.uid, '+FLAGS', r'\Flagged') - if not cfg.no_mark_unseen and r'\Seen' in msg.flags: - imap.uid('STORE', msg.uid, '-FLAGS', r'\Seen') - - info( 'unstarring irrelevant threads, in which I am not a participant' ) - - all_tids = set( tid2msgs.iterkeys() ) - irrel_tids = all_tids - rel_tids - for tid in irrel_tids: - for msg in tid2msgs[tid]: - if r'\Flagged' in msg.flags: - log.info( 'unstar', '\n', msg ) - if not cfg.pretend: - imap.uid('STORE', msg.uid, '-FLAGS', r'\Flagged') - if not cfg.no_mark_seen and r'\Seen' not in msg.flags: - imap.uid('STORE', msg.uid, '+FLAGS', r'\Seen') - -run_main() Copied: mailing-list-filter/trunk/src/mlf.py (from rev 716, mailing-list-filter/trunk/src/filter.py) =================================================================== --- mailing-list-filter/trunk/src/mlf.py (rev 0) +++ mailing-list-filter/trunk/src/mlf.py 2008-05-08 07:48:18 UTC (rev 718) @@ -0,0 +1,245 @@ +#!/usr/bin/env python + +""" +Given a Gmail IMAP mailbox, star all messages in which you were a participant +(either a sender or an explicit recipient in To: or Cc:), where thread grouping +is performed via the In-Reply-To and References headers. 
+""" + +# Currently, we assume that the server specification points to a mailbox +# containing all messages (both sent and received), and a message is determined +# to have been sent by you by looking at the From: header field. This should +# work well with Gmail. An alternative strategy is to look through two folders, +# one that's the Inbox and one that's the Sent mailbox, and treat all messages +# in Sent as having been sent by you. +# +# Possible future tasks: implement incremental maintenance of local cache. + +from __future__ import with_statement +from collections import defaultdict +from email import message_from_string +from getpass import getpass +from imaplib import IMAP4_SSL +from argparse import ArgumentParser +from path import path +from re import match +from functools import partial +from itertools import count +from commons.decs import pickle_memoized +from commons.files import cleanse_filename, soft_makedirs +from commons.log import * +from commons.misc import default_if_none, seq +from commons.networking import logout +from commons.seqs import concat, grouper +from commons.startup import run_main +from contextlib import closing +import logging +from commons import log + +info = partial(log.info, 'main') +debug = partial(log.debug, 'main') +warning = partial(log.warning, 'main') +error = partial(log.error, 'main') +die = partial(log.die, 'main') + +def thread_dfs(msg, tid, tid2msgs): + assert msg.tid is None + msg.tid = tid + tid2msgs[tid].append(msg) + for ref in msg.refs: + if ref.tid is None: + thread_dfs(ref, tid, tid2msgs) + else: + assert ref.tid == tid + +def getmail(imap): + info( 'finding max UID' ) + # We use UIDs rather than the default of sequence numbers because UIDs are + # guaranteed to be persistent across sessions. This means that we can, for + # instance, fetch messages in one session and operate on this locally cached + # data before marking messages in a separate session. 
+ ok, [uids] = imap.uid('SEARCH', None, 'ALL') + maxuid = int( uids.split()[-1] ) + del uids + + info( 'actually fetching the messages in chunks up to max', maxuid ) + # The syntax/fields of the FETCH command is documented in RFC 2060. Also, + # this article contains a brief overview: + # http://www.devshed.com/c/a/Python/Python-Email-Libraries-part-2-IMAP/3/ + # BODY.PEEK prevents the message from automatically being flagged as \Seen. + query = '(FLAGS BODY.PEEK[HEADER.FIELDS ' \ + '(Message-ID References In-Reply-To From To Cc Subject)])' + step = 1000 + return list( concat( + seq( lambda: info('fetching', start, 'to', start + step - 1), + lambda: imap.uid('FETCH', '%d:%d' % (start, start + step - 1), + query)[1] ) + for start in xrange(1, maxuid + 1, step) ) ) + +def main(argv): + p = ArgumentParser(description = __doc__) + p.add_argument('--credfile', default = path( '~/.mlf.auth' ).expanduser(), + help = """File containing your login credentials, with the username on the + first line and the password on the second line. Ignored iff --prompt.""") + p.add_argument('--cachedir', default = path( '~/.mlf.cache' ).expanduser(), + help = "Directory to use for caching our data.") + p.add_argument('--prompt', action = 'store_true', + help = "Interactively prompt for the username and password.") + p.add_argument('--pretend', action = 'store_true', + help = """Do not actually carry out any updates to the server. Use in + conjunction with --debug to observe what would happen.""") + p.add_argument('--no-mark-unseen', action = 'store_true', + help = "Do not mark newly revelant threads as unread.") + p.add_argument('--no-mark-seen', action = 'store_true', + help = "Do not mark newly irrevelant threads as read.") + p.add_argument('--debug', action = 'append', + help = """Enable logging for messages of the given flags. 
Flags include: + refs (references to missing Message-IDs), dups (duplicate Message-IDs), + main (the main program logic), and star (which messages are being + starred), unstar (which messages are being unstarred).""") + p.add_argument('sender', + help = "Your email address.") + p.add_argument('server', + help = "The server in the format: <host>[:<port>][/<mailbox>].") + + cfg = p.parse_args(argv[1:]) + + config_logging(level = logging.ERROR, do_console = True, flags = cfg.debug) + + if cfg.prompt: + print "username:", + cfg.user = raw_input() + print "password:", + cfg.passwd = getpass() + else: + with file(cfg.credfile) as f: + [cfg.user, cfg.passwd] = map(lambda x: x.strip('\r\n'), f.readlines()) + + try: + m = match( r'(?P<host>[^:/]+)(:(?P<port>\d+))?(/(?P<mailbox>.+))?$', + cfg.server ) + cfg.host = m.group('host') + cfg.port = int( default_if_none(m.group('port'), 993) ) + cfg.mailbox = default_if_none(m.group('mailbox'), 'INBOX') + except: + p.error('Need to specify the server in the correct format.') + + soft_makedirs(cfg.cachedir) + + with logout(IMAP4_SSL(cfg.host, cfg.port)) as imap: + imap.login(cfg.user, cfg.passwd) + # Close is only valid in the authenticated state. + with closing(imap) as imap: + # Select the main mailbox (INBOX). + imap.select(cfg.mailbox) + + # Fetch message IDs, references, and senders. + xs = pickle_memoized \ + (lambda imap: cfg.cachedir / cleanse_filename(cfg.sender)) \ + (getmail) \ + (imap) + + log.debug('fetched', xs) + + info('building message-id map and determining the set of messages sent ' + 'by you or addressed to you (the "source set")') + + srcs = [] + mid2msg = {} + # Every second item is just a closing paren. 
+ # Example data: + # [('13300 (BODY[HEADER.FIELDS (Message-ID References In-Reply-To)] {67}', + # 'Message-ID: <mai...@py...>\r\n\r\n'), + # ')', + # ('13301 (BODY[HEADER.FIELDS (Message-ID References In-Reply-To)] {59}', + # 'Message-Id: <200...@hv...>\r\n\r\n'), + # ')', + # ('13302 (BODY[HEADER.FIELDS (Message-ID References In-Reply-To)] {92}', + # 'Message-ID: <C43EAFC0.2E3AE%ni...@ya...>\r\nIn-Reply-To: <481...@gm...>\r\n\r\n')] + for (envelope, data), paren in grouper(2, xs): + # Parse the body. + msg = message_from_string(data) + + # Parse the envelope. + m = match( + r"(?P<seqno>\d+) \(UID (?P<uid>\d+) FLAGS \((?P<flags>[^)]+)\)", + envelope ) + msg.seqno = m.group('seqno') + msg.uid = m.group('uid') + msg.flags = m.group('flags').split() + + # Prepare a container for references to other msgs, and initialize the + # thread ID. + msg.refs = [] + msg.tid = None + + # Add these to the map. + if msg['Message-ID'] in mid2msg: + log.warning( 'dups', 'duplicate message IDs:', + msg['Message-ID'], msg['Subject'] ) + mid2msg[ msg['Message-ID'] ] = msg + + # Add to "srcs" set if sent by us or addressed to us. + if ( cfg.sender in default_if_none( msg['From'], '' ) or + cfg.sender in default_if_none( msg['To'], '' ) or + cfg.sender in default_if_none( msg['Cc'], '' ) ): + srcs.append( msg ) + + info( 'constructing undirected graph' ) + + for mid, msg in mid2msg.iteritems(): + # Extract any references. + irt = default_if_none( msg.get_all('In-Reply-To'), [] ) + refs = default_if_none( msg.get_all('References'), [] ) + refs = set( ' '.join( irt + refs ).replace('><', '> <').split() ) + + # Connect nodes in graph bidirectionally. Ignore references to MIDs + # that don't exist. + for ref in refs: + try: + refmsg = mid2msg[ref] + # We can use lists/append (not worry about duplicates) because the + # original sources should be acyclic. If a -> b, then there is no b -> + # a, so when crawling a we can add a <-> b without worrying that later + # we may re-add b -> a. 
+ msg.refs.append(refmsg) + refmsg.refs.append(msg) + except: + log.warning( 'refs', ref ) + + info('finding connected components (grouping the messages into threads)') + + tids = count() + tid2msgs = defaultdict(list) + for mid, msg in mid2msg.iteritems(): + if msg.tid is None: + thread_dfs(msg, tids.next(), tid2msgs) + + info( 'starring the relevant threads, in which I am a participant' ) + + rel_tids = set() + for srcmsg in srcs: + if srcmsg.tid not in rel_tids: + rel_tids.add(srcmsg.tid) + for msg in tid2msgs[srcmsg.tid]: + if r'\Flagged' not in msg.flags: + log.info( 'star', '\n', msg ) + if not cfg.pretend: + imap.uid('STORE', msg.uid, '+FLAGS', r'\Flagged') + if not cfg.no_mark_unseen and r'\Seen' in msg.flags: + imap.uid('STORE', msg.uid, '-FLAGS', r'\Seen') + + info( 'unstarring irrelevant threads, in which I am not a participant' ) + + all_tids = set( tid2msgs.iterkeys() ) + irrel_tids = all_tids - rel_tids + for tid in irrel_tids: + for msg in tid2msgs[tid]: + if r'\Flagged' in msg.flags: + log.info( 'unstar', '\n', msg ) + if not cfg.pretend: + imap.uid('STORE', msg.uid, '-FLAGS', r'\Flagged') + if not cfg.no_mark_seen and r'\Seen' not in msg.flags: + imap.uid('STORE', msg.uid, '+FLAGS', r'\Seen') + +run_main() This was sent by the SourceForge.net collaborative development platform, the world's largest Open Source development site. |
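The listing above groups messages into threads by depth-first search over the undirected reference graph (`thread_dfs` plus the `tid2msgs` loop). The same idea can be sketched in isolation with plain dicts instead of the script's message objects; `assign_threads` is an illustrative name, not part of the script, and an iterative DFS is used here to sidestep Python's recursion limit on very long threads:

```python
from collections import defaultdict

def assign_threads(refs_by_mid):
    """Group Message-IDs into threads (connected components).

    refs_by_mid maps each Message-ID to the set of Message-IDs it
    references; edges are treated as undirected, as in the script.
    """
    # Build an undirected adjacency map, dropping references to
    # Message-IDs we never fetched (the script logs these as 'refs').
    adj = defaultdict(set)
    for mid, refs in refs_by_mid.items():
        adj[mid]  # make sure isolated messages get a node too
        for ref in refs:
            if ref in refs_by_mid:
                adj[mid].add(ref)
                adj[ref].add(mid)

    tid_by_mid = {}
    tid2msgs = defaultdict(list)
    next_tid = 0
    for root in adj:
        if root in tid_by_mid:
            continue
        # Iterative DFS: mark nodes as visited when pushed, so
        # nothing is processed twice.
        stack = [root]
        tid_by_mid[root] = next_tid
        while stack:
            mid = stack.pop()
            tid2msgs[next_tid].append(mid)
            for other in adj[mid]:
                if other not in tid_by_mid:
                    tid_by_mid[other] = next_tid
                    stack.append(other)
        next_tid += 1
    return tid_by_mid, dict(tid2msgs)
```

The recursive `thread_dfs` in the script performs the same walk; the iterative variant only matters for unusually deep reference chains.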
From: <yan...@us...> - 2008-05-08 06:55:12
|
Revision: 717 http://assorted.svn.sourceforge.net/assorted/?rev=717&view=rev Author: yangzhang Date: 2008-05-07 23:55:12 -0700 (Wed, 07 May 2008) Log Message: ----------- removed epydoc since this is not a lib Modified Paths: -------------- mailing-list-filter/trunk/publish.bash Modified: mailing-list-filter/trunk/publish.bash =================================================================== --- mailing-list-filter/trunk/publish.bash 2008-05-08 06:54:53 UTC (rev 716) +++ mailing-list-filter/trunk/publish.bash 2008-05-08 06:55:12 UTC (rev 717) @@ -1,9 +1,5 @@ #!/usr/bin/env bash -post-stage() { - epydoc -o $stagedir/doc src/commons/ -} - fullname='Mailing List Filter' version=0.1 license=psf This was sent by the SourceForge.net collaborative development platform, the world's largest Open Source development site. |
From: <yan...@us...> - 2008-05-08 06:54:52
|
Revision: 716 http://assorted.svn.sourceforge.net/assorted/?rev=716&view=rev Author: yangzhang Date: 2008-05-07 23:54:53 -0700 (Wed, 07 May 2008) Log Message: ----------- it's alive! ready to release... Modified Paths: -------------- mailing-list-filter/trunk/src/filter.py Modified: mailing-list-filter/trunk/src/filter.py =================================================================== --- mailing-list-filter/trunk/src/filter.py 2008-05-08 06:54:29 UTC (rev 715) +++ mailing-list-filter/trunk/src/filter.py 2008-05-08 06:54:53 UTC (rev 716) @@ -1,18 +1,20 @@ #!/usr/bin/env python """ -Given an IMAP mailbox, mark all messages as read except for those threads in -which you were a participant, where thread grouping is performed via the -In-Reply-To and References headers. - -Currently, we assume that the server specification points to a mailbox -containing all messages (both sent and received), and a message is determined -to have been sent by you by looking at the From: header field. This should work -well with Gmail. An alternative strategy is to look through two folders, one -that's the Inbox and one that's the Sent mailbox, and treat all messages in -Sent as having been sent by you. +Given a Gmail IMAP mailbox, star all messages in which you were a participant +(either a sender or an explicit recipient in To: or Cc:), where thread grouping +is performed via the In-Reply-To and References headers. """ +# Currently, we assume that the server specification points to a mailbox +# containing all messages (both sent and received), and a message is determined +# to have been sent by you by looking at the From: header field. This should +# work well with Gmail. An alternative strategy is to look through two folders, +# one that's the Inbox and one that's the Sent mailbox, and treat all messages +# in Sent as having been sent by you. +# +# Possible future tasks: implement incremental maintenance of local cache. 
+ from __future__ import with_statement from collections import defaultdict from email import message_from_string @@ -22,41 +24,59 @@ from path import path from re import match from functools import partial +from itertools import count from commons.decs import pickle_memoized +from commons.files import cleanse_filename, soft_makedirs from commons.log import * -from commons.files import cleanse_filename, soft_makedirs -from commons.misc import default_if_none +from commons.misc import default_if_none, seq from commons.networking import logout from commons.seqs import concat, grouper from commons.startup import run_main from contextlib import closing +import logging +from commons import log -info = partial(info, '') -debug = partial(debug, '') -error = partial(error, '') -die = partial(die, '') +info = partial(log.info, 'main') +debug = partial(log.debug, 'main') +warning = partial(log.warning, 'main') +error = partial(log.error, 'main') +die = partial(log.die, 'main') +def thread_dfs(msg, tid, tid2msgs): + assert msg.tid is None + msg.tid = tid + tid2msgs[tid].append(msg) + for ref in msg.refs: + if ref.tid is None: + thread_dfs(ref, tid, tid2msgs) + else: + assert ref.tid == tid + def getmail(imap): - info( 'finding max seqno' ) - ok, [seqnos] = imap.search(None, 'ALL') - maxseqno = int( seqnos.split()[-1] ) - del seqnos + info( 'finding max UID' ) + # We use UIDs rather than the default of sequence numbers because UIDs are + # guaranteed to be persistent across sessions. This means that we can, for + # instance, fetch messages in one session and operate on this locally cached + # data before marking messages in a separate session. + ok, [uids] = imap.uid('SEARCH', None, 'ALL') + maxuid = int( uids.split()[-1] ) + del uids - info( 'actually fetching the messages in chunks' ) + info( 'actually fetching the messages in chunks up to max', maxuid ) # The syntax/fields of the FETCH command is documented in RFC 2060. 
Also, # this article contains a brief overview: # http://www.devshed.com/c/a/Python/Python-Email-Libraries-part-2-IMAP/3/ # BODY.PEEK prevents the message from automatically being flagged as \Seen. - query = '(FLAGS BODY.PEEK[HEADER.FIELDS (Message-ID References In-Reply-To From Subject)])' + query = '(FLAGS BODY.PEEK[HEADER.FIELDS ' \ + '(Message-ID References In-Reply-To From To Cc Subject)])' step = 1000 return list( concat( - imap.fetch('%d:%d' % (start, start + step - 1), query)[1] - for start in xrange(1, maxseqno + 1, step) ) ) + seq( lambda: info('fetching', start, 'to', start + step - 1), + lambda: imap.uid('FETCH', '%d:%d' % (start, start + step - 1), + query)[1] ) + for start in xrange(1, maxuid + 1, step) ) ) def main(argv): - import logging - config_logging(level = logging.INFO, do_console = True) - p = ArgumentParser(description = __doc__) p.add_argument('--credfile', default = path( '~/.mlf.auth' ).expanduser(), help = """File containing your login credentials, with the username on the @@ -65,6 +85,18 @@ help = "Directory to use for caching our data.") p.add_argument('--prompt', action = 'store_true', help = "Interactively prompt for the username and password.") + p.add_argument('--pretend', action = 'store_true', + help = """Do not actually carry out any updates to the server. Use in + conjunction with --debug to observe what would happen.""") + p.add_argument('--no-mark-unseen', action = 'store_true', + help = "Do not mark newly relevant threads as unread.") + p.add_argument('--no-mark-seen', action = 'store_true', + help = "Do not mark newly irrelevant threads as read.") + p.add_argument('--debug', action = 'append', + help = """Enable logging for messages of the given flags.
Flags include: + refs (references to missing Message-IDs), dups (duplicate Message-IDs), + main (the main program logic), and star (which messages are being + starred), unstar (which messages are being unstarred).""") p.add_argument('sender', help = "Your email address.") p.add_argument('server', @@ -72,6 +104,8 @@ cfg = p.parse_args(argv[1:]) + config_logging(level = logging.ERROR, do_console = True, flags = cfg.debug) + if cfg.prompt: print "username:", cfg.user = raw_input() @@ -82,7 +116,8 @@ [cfg.user, cfg.passwd] = map(lambda x: x.strip('\r\n'), f.readlines()) try: - m = match( r'(?P<host>[^:/]+)(:(?P<port>\d+))?(/(?P<mailbox>.+))?$', cfg.server ) + m = match( r'(?P<host>[^:/]+)(:(?P<port>\d+))?(/(?P<mailbox>.+))?$', + cfg.server ) cfg.host = m.group('host') cfg.port = int( default_if_none(m.group('port'), 993) ) cfg.mailbox = default_if_none(m.group('mailbox'), 'INBOX') @@ -93,6 +128,7 @@ with logout(IMAP4_SSL(cfg.host, cfg.port)) as imap: imap.login(cfg.user, cfg.passwd) + # Close is only valid in the authenticated state. with closing(imap) as imap: # Select the main mailbox (INBOX). imap.select(cfg.mailbox) @@ -103,18 +139,13 @@ (getmail) \ (imap) - debug('fetched:', xs) + log.debug('fetched', xs) - info('determining the set of messages that were sent by you') + info('building message-id map and determining the set of messages sent ' + 'by you or addressed to you (the "source set")') - sent = set() - for (envelope, data), paren in grouper(2, xs): - msg = message_from_string(data) - if cfg.sender in msg['From']: - sent.add( msg['Message-ID'] ) - - info( 'find the threads in which I am a participant' ) - + srcs = [] + mid2msg = {} # Every second item is just a closing paren. 
# Example data: # [('13300 (BODY[HEADER.FIELDS (Message-ID References In-Reply-To)] {67}', @@ -126,24 +157,89 @@ # ('13302 (BODY[HEADER.FIELDS (Message-ID References In-Reply-To)] {92}', # 'Message-ID: <C43EAFC0.2E3AE%ni...@ya...>\r\nIn-Reply-To: <481...@gm...>\r\n\r\n')] for (envelope, data), paren in grouper(2, xs): - m = match( r"(?P<seqno>\d+) \(FLAGS \((?P<flags>[^)]+)\)", envelope ) - seqno = m.group('seqno') - flags = m.group('flags') - if r'\Flagged' in flags: # flags != r'\Seen' and flags != r'\Seen NonJunk': - print 'FLAG' - print seqno, flags - print '\n'.join( map( str, msg.items() ) ) - print - msg = message_from_string(data) - id = msg['Message-ID'] + # Parse the body. + msg = message_from_string(data) + + # Parse the envelope. + m = match( + r"(?P<seqno>\d+) \(UID (?P<uid>\d+) FLAGS \((?P<flags>[^)]+)\)", + envelope ) + msg.seqno = m.group('seqno') + msg.uid = m.group('uid') + msg.flags = m.group('flags').split() + + # Prepare a container for references to other msgs, and initialize the + # thread ID. + msg.refs = [] + msg.tid = None + + # Add these to the map. + if msg['Message-ID'] in mid2msg: + log.warning( 'dups', 'duplicate message IDs:', + msg['Message-ID'], msg['Subject'] ) + mid2msg[ msg['Message-ID'] ] = msg + + # Add to "srcs" set if sent by us or addressed to us. + if ( cfg.sender in default_if_none( msg['From'], '' ) or + cfg.sender in default_if_none( msg['To'], '' ) or + cfg.sender in default_if_none( msg['Cc'], '' ) ): + srcs.append( msg ) + + info( 'constructing undirected graph' ) + + for mid, msg in mid2msg.iteritems(): + # Extract any references. 
irt = default_if_none( msg.get_all('In-Reply-To'), [] ) refs = default_if_none( msg.get_all('References'), [] ) - refs = set( ' '.join( irt + refs ).split() ) - if refs & sent: - print 'SENT' - print seqno, flags - print '\n'.join( map( str, msg.items() ) ) - print -# if refs & sent: + refs = set( ' '.join( irt + refs ).replace('><', '> <').split() ) + # Connect nodes in graph bidirectionally. Ignore references to MIDs + # that don't exist. + for ref in refs: + try: + refmsg = mid2msg[ref] + # We can use lists/append (not worry about duplicates) because the + # original sources should be acyclic. If a -> b, then there is no b -> + # a, so when crawling a we can add a <-> b without worrying that later + # we may re-add b -> a. + msg.refs.append(refmsg) + refmsg.refs.append(msg) + except: + log.warning( 'refs', ref ) + + info('finding connected components (grouping the messages into threads)') + + tids = count() + tid2msgs = defaultdict(list) + for mid, msg in mid2msg.iteritems(): + if msg.tid is None: + thread_dfs(msg, tids.next(), tid2msgs) + + info( 'starring the relevant threads, in which I am a participant' ) + + rel_tids = set() + for srcmsg in srcs: + if srcmsg.tid not in rel_tids: + rel_tids.add(srcmsg.tid) + for msg in tid2msgs[srcmsg.tid]: + if r'\Flagged' not in msg.flags: + log.info( 'star', '\n', msg ) + if not cfg.pretend: + imap.uid('STORE', msg.uid, '+FLAGS', r'\Flagged') + if not cfg.no_mark_unseen and r'\Seen' in msg.flags: + imap.uid('STORE', msg.uid, '-FLAGS', r'\Seen') + + info( 'unstarring irrelevant threads, in which I am not a participant' ) + + all_tids = set( tid2msgs.iterkeys() ) + irrel_tids = all_tids - rel_tids + for tid in irrel_tids: + for msg in tid2msgs[tid]: + if r'\Flagged' in msg.flags: + log.info( 'unstar', '\n', msg ) + if not cfg.pretend: + imap.uid('STORE', msg.uid, '-FLAGS', r'\Flagged') + if not cfg.no_mark_seen and r'\Seen' not in msg.flags: + imap.uid('STORE', msg.uid, '+FLAGS', r'\Seen') + run_main() This was sent by the 
SourceForge.net collaborative development platform, the world's largest Open Source development site. |
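The revision above switches `getmail` from sequence numbers to UIDs and fetches in chunks of 1000 via `'%d:%d'` ranges. That range arithmetic is easy to check on its own; `chunk_ranges` below is a hypothetical helper (not in the script) that isolates it:

```python
def chunk_ranges(maxuid, step=1000):
    """Yield inclusive (start, end) UID ranges covering 1..maxuid,
    mirroring the '%d:%d' % (start, start + step - 1) FETCH argument.

    The last range may overshoot maxuid; IMAP servers simply return
    no data for UIDs that do not exist.
    """
    for start in range(1, maxuid + 1, step):
        yield start, start + step - 1

# Typical use (sketch):
#   for start, end in chunk_ranges(maxuid):
#       imap.uid('FETCH', '%d:%d' % (start, end), query)
```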
From: <yan...@us...> - 2008-05-08 06:54:30
|
Revision: 714 http://assorted.svn.sourceforge.net/assorted/?rev=714&view=rev Author: yangzhang Date: 2008-05-07 23:54:08 -0700 (Wed, 07 May 2008) Log Message: ----------- added to readme Modified Paths: -------------- mailing-list-filter/trunk/README Modified: mailing-list-filter/trunk/README =================================================================== --- mailing-list-filter/trunk/README 2008-05-08 06:05:12 UTC (rev 713) +++ mailing-list-filter/trunk/README 2008-05-08 06:54:08 UTC (rev 714) @@ -30,3 +30,8 @@ also fails when others change the subject. Finally, this approach is unsatisfactory because it pollutes subject lines, and it essentially replicates exactly what Message-ID was intended for. + +This script is not intended to be a replacement for the Gmail filters. I still +keep those active so that I can get immediate first-pass filtering. I execute +this script on a daily basis to perform second-pass filtering/unfiltering to +catch those false negatives that may have been missed. This was sent by the SourceForge.net collaborative development platform, the world's largest Open Source development site. |
From: <yan...@us...> - 2008-05-08 06:54:30
|
Revision: 715 http://assorted.svn.sourceforge.net/assorted/?rev=715&view=rev Author: yangzhang Date: 2008-05-07 23:54:29 -0700 (Wed, 07 May 2008) Log Message: ----------- specified executable script; still not quite right since a lib is unnecessarily installed Modified Paths: -------------- mailing-list-filter/trunk/setup.py Modified: mailing-list-filter/trunk/setup.py =================================================================== --- mailing-list-filter/trunk/setup.py 2008-05-08 06:54:08 UTC (rev 714) +++ mailing-list-filter/trunk/setup.py 2008-05-08 06:54:29 UTC (rev 715) @@ -25,4 +25,4 @@ Classifier: Topic :: Communications :: Email """ -run_setup(pkg_info_text) +run_setup(pkg_info_text, scripts = ['src/filter.py']) This was sent by the SourceForge.net collaborative development platform, the world's largest Open Source development site. |
From: <yan...@us...> - 2008-05-08 06:05:09
|
Revision: 713 http://assorted.svn.sourceforge.net/assorted/?rev=713&view=rev Author: yangzhang Date: 2008-05-07 23:05:12 -0700 (Wed, 07 May 2008) Log Message: ----------- added a publisher script Added Paths: ----------- mailing-list-filter/trunk/publish.bash Added: mailing-list-filter/trunk/publish.bash =================================================================== --- mailing-list-filter/trunk/publish.bash (rev 0) +++ mailing-list-filter/trunk/publish.bash 2008-05-08 06:05:12 UTC (rev 713) @@ -0,0 +1,12 @@ +#!/usr/bin/env bash + +post-stage() { + epydoc -o $stagedir/doc src/commons/ +} + +fullname='Mailing List Filter' +version=0.1 +license=psf +websrcs=( README ) +rels=( pypi: ) +. assorted.bash "$@" Property changes on: mailing-list-filter/trunk/publish.bash ___________________________________________________________________ Name: svn:executable + * This was sent by the SourceForge.net collaborative development platform, the world's largest Open Source development site. |
From: <yan...@us...> - 2008-05-08 06:03:23
|
Revision: 712 http://assorted.svn.sourceforge.net/assorted/?rev=712&view=rev Author: yangzhang Date: 2008-05-07 23:03:28 -0700 (Wed, 07 May 2008) Log Message: ----------- added an overview readme Added Paths: ----------- mailing-list-filter/trunk/README Added: mailing-list-filter/trunk/README =================================================================== --- mailing-list-filter/trunk/README (rev 0) +++ mailing-list-filter/trunk/README 2008-05-08 06:03:28 UTC (rev 712) @@ -0,0 +1,32 @@ +% Mailing List Filter +% Yang Zhang + +Overview +-------- + +I have a Gmail account that I use for subscribing to and posting to mailing +lists. When dealing with high-volume mailing lists, I am typically only +interested in those threads that I participated in. This is a simple filter +for starring and marking unread any messages belonging to such threads. + +This is accomplished by looking at the set of messages that were either sent +from me or explicitly addressed to me. From this "root set" of messages, we +can use the `Message-ID`, `References`, and `In-Reply-To` headers to determine +threads, and thus the other messages that we care about. + +I have found this to be more accurate than my two original approaches. I used +to have Gmail filters that starred/marked unread any messages containing my +name anywhere in the message. This worked OK since my name is not too common, +but it produced some false positives (not that bad, just unstar messages) and +some false negatives (much harder to detect). + +A second approach is to tag all subjects with some signature string. This +usually is fine, but it doesn't work when you did not start the thread (and +thus determine the subject). 
You can try to change the subject line, but this +is (1) poor netiquette, (2) unreliable because your reply may not register in +other mail clients as being part of the same thread (and thus other +participants may miss your reply), and (3) unreliable because replies might not +directly reference your post (either intentionally or unintentionally). It +also fails when others change the subject. Finally, this approach is +unsatisfactory because it pollutes subject lines, and it essentially replicates +exactly what Message-ID was intended for. This was sent by the SourceForge.net collaborative development platform, the world's largest Open Source development site. |
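The header-based grouping this README describes ultimately reduces to pulling Message-IDs out of the `References` and `In-Reply-To` fields. A minimal sketch of that extraction, including the `'><'` workaround the script applies for mail agents that concatenate IDs without whitespace (`extract_refs` is an illustrative name):

```python
def extract_refs(in_reply_to, references):
    """Return the set of Message-IDs referenced by the two headers.

    Both arguments are lists of raw header values (possibly empty),
    as returned by Message.get_all(...) in the script.
    """
    joined = ' '.join(in_reply_to + references)
    # Some mail agents emit '<a@x><b@y>' with no separator between
    # IDs; reinsert a space before splitting on whitespace.
    return set(joined.replace('><', '> <').split())
```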
From: <yan...@us...> - 2008-05-08 05:48:33
|
Revision: 711 http://assorted.svn.sourceforge.net/assorted/?rev=711&view=rev Author: yangzhang Date: 2008-05-07 22:48:40 -0700 (Wed, 07 May 2008) Log Message: ----------- added setup Added Paths: ----------- mailing-list-filter/trunk/setup.py Added: mailing-list-filter/trunk/setup.py =================================================================== --- mailing-list-filter/trunk/setup.py (rev 0) +++ mailing-list-filter/trunk/setup.py 2008-05-08 05:48:40 UTC (rev 711) @@ -0,0 +1,28 @@ +#!/usr/bin/env python + +from commons.setup import run_setup + +pkg_info_text = """ +Metadata-Version: 1.1 +Name: mailing-list-filter +Version: 0.1 +Author: Yang Zhang +Author-email: yaaang NOSPAM at REMOVECAPS gmail +Home-page: http://assorted.sourceforge.net/mailing-list-filter/ +Download-url: http://pypi.python.org/pypi/mailing-list-filter/ +Summary: Mailing List Filter +License: Python Software Foundation License +Description: Filter mailing list email for relevant threads only. +Keywords: mailing,list,email,filter,IMAP,Gmail +Platform: any +Provides: commons +Classifier: Development Status :: 4 - Beta +Classifier: Environment :: No Input/Output (Daemon) +Classifier: Intended Audience :: End Users/Desktop +Classifier: License :: OSI Approved :: Python Software Foundation License +Classifier: Operating System :: OS Independent +Classifier: Programming Language :: Python +Classifier: Topic :: Communications :: Email +""" + +run_setup(pkg_info_text) Property changes on: mailing-list-filter/trunk/setup.py ___________________________________________________________________ Name: svn:executable + * This was sent by the SourceForge.net collaborative development platform, the world's largest Open Source development site. |
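The `pkg_info_text` above is RFC 822-style metadata, so the stdlib email parser can read it; presumably `commons.setup.run_setup` does something along these lines before handing fields to distutils. A hedged sketch (the real `run_setup` internals are not shown in this commit):

```python
from email import message_from_string

def parse_pkg_info(text):
    """Parse PKG-INFO-style (RFC 822) metadata into a dict.

    Repeatable fields such as Classifier become lists; fields that
    appear once stay as plain strings.
    """
    msg = message_from_string(text.strip())
    info = {}
    for key in msg.keys():
        values = msg.get_all(key)
        info[key] = values if len(values) > 1 else values[0]
    return info
```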
From: <yan...@us...> - 2008-05-08 04:46:59
|
Revision: 710 http://assorted.svn.sourceforge.net/assorted/?rev=710&view=rev Author: yangzhang Date: 2008-05-07 21:47:03 -0700 (Wed, 07 May 2008) Log Message: ----------- added simple changelog to readme Modified Paths: -------------- python-commons/trunk/README Modified: python-commons/trunk/README =================================================================== --- python-commons/trunk/README 2008-05-08 03:20:46 UTC (rev 709) +++ python-commons/trunk/README 2008-05-08 04:47:03 UTC (rev 710) @@ -33,3 +33,28 @@ [ASPN Cookbook]: http://aspn.activestate.com/ASPN/Cookbook/Python [AIMA Utilities]: http://aima.cs.berkeley.edu/python/utils.py + +Changes +------- + +version 0.3.1 + +- removed extraneous debug print statements + +version 0.3 + +- added versioned guards +- added file memoization +- added retry with exp backoff +- added `countstep()` +- released for + [gbookmark2delicious](http://gbookmark2delicious.googlecode.com/) + +version 0.2 + +- added `clients`, `setup` +- released for [icedb](http://cartel.csail.mit.edu/icedb/) + +version 0.1 + +- initial release This was sent by the SourceForge.net collaborative development platform, the world's largest Open Source development site. |
From: <yan...@us...> - 2008-05-08 03:20:42
|
Revision: 709 http://assorted.svn.sourceforge.net/assorted/?rev=709&view=rev Author: yangzhang Date: 2008-05-07 20:20:46 -0700 (Wed, 07 May 2008) Log Message: ----------- preferring verdana Modified Paths: -------------- assorted-site/trunk/main.css Modified: assorted-site/trunk/main.css =================================================================== --- assorted-site/trunk/main.css 2008-05-08 03:20:39 UTC (rev 708) +++ assorted-site/trunk/main.css 2008-05-08 03:20:46 UTC (rev 709) @@ -10,7 +10,7 @@ padding:0; background-color: white; color: black; - font-family: Georgia, Verdana, sans-serif; + font-family: Verdana, sans-serif; font-size: medium; line-height: 1.3em; color: #333; @@ -70,6 +70,7 @@ margin-bottom: 0.5em; } +/* TODO: make this larger? */ pre { padding: 0; margin: 0; This was sent by the SourceForge.net collaborative development platform, the world's largest Open Source development site. |
From: <yan...@us...> - 2008-05-08 03:20:35
|
Revision: 708 http://assorted.svn.sourceforge.net/assorted/?rev=708&view=rev Author: yangzhang Date: 2008-05-07 20:20:39 -0700 (Wed, 07 May 2008) Log Message: ----------- added scala doc search, js beautify, mailing list filter Modified Paths: -------------- assorted-site/trunk/index.txt Modified: assorted-site/trunk/index.txt =================================================================== --- assorted-site/trunk/index.txt 2008-05-08 03:19:12 UTC (rev 707) +++ assorted-site/trunk/index.txt 2008-05-08 03:20:39 UTC (rev 708) @@ -82,8 +82,14 @@ - Sandbox: heap of small test cases to explore (mostly programming language details, bugs, corner cases, features, etc.) (passive) - Miscellanea + - [Mailing List Filter](mailing-list-filter): deal with high-volume mailing + lists by filtering your mailbox for threads in which you were a participant + (active) + - [Scala Doc Search](http://scripts.mit.edu/~y_z/sds/): navigate the Scala + API documentation by class or object name (done) - Bibliography: my pan-paper BibTeX; i.e., stalling for ZDB (active) - Subtitle adjuster: for time-shifting SRTs (done) + - Javascript Beautifier: a thin [Tamarin] wrapper for [js_beautify]. 
- Programming Problems: my workspace for solving programming puzzles (hiatus) - Source management: various tools for cleaning up and maintaining a source @@ -92,24 +98,6 @@ - [This website](http://assorted.sf.net/) (passive) - [My personal website](http://www.mit.edu/~y_z/) (passive) -What the statuses mean: - -- done: no more active development planned, but will generally maintain/fix - issues -- passive: under continual but gradual growth -- active: development is happening at a faster pace -- abandoned: incomplete; no plans to pick it up again -- hiatus: incomplete; plan to resume development - -Other links: - -- [SourceForge Project Page](http://sf.net/projects/assorted/): - download file releases, discuss on the forums, report bugs/request features, - [browse the repository] -- [Simple Publications Manager](http://pubmgr.sf.net/): another SF-hosted - mini-project of mine -- [TinyOS](http://tinyos.net/): SF-hosted project I've been involved in - [BattleCode]: http://battlecode.mit.edu/ [BattleCode 2007]: http://battlecode.mit.edu/2007/ [BattleCode 2008]: http://battlecode.mit.edu/2008/ @@ -124,8 +112,34 @@ [Facebook]: http://www.facebook.com/ [YouTube]: http://www.youtube.com/ [MySpace]: http://www.myspace.com/ +[Tamarin]: http://www.mozilla.org/projects/tamarin/ +[js_beautify]: http://elfz.laacz.lv/beautify/ + +What the statuses mean: + +- done: no more active development planned, but will generally maintain/fix + issues +- passive: under continual but gradual growth +- active: development is happening at a faster pace +- abandoned: incomplete; no plans to pick it up again +- hiatus: incomplete; plan to resume development + +Project pages: + +- [SourceForge Project Page]: view summary, [browse the repository] +- [Google Code Page]: download file releases, report bugs/request features +- [Google Groups Page]: discussions and support + +[SourceForge Project Page]: http://sf.net/projects/assorted/ +[Google Code Page]: http://code.google.com/p/assorted/ +[Google 
Groups Page]: http://groups.google.com/group/assorted-projects/ [browse the repository]: http://assorted.svn.sourceforge.net/viewvc/assorted/ +Copyright 2008 [Yang Zhang]. +All rights reserved. + +[Yang Zhang]: http://www.mit.edu/~y_z/ + <!-- -vim:nocin:et:sw=2:ts=2 +vim:nocin:et:ft=mkd:sw=2:ts=2 --> This was sent by the SourceForge.net collaborative development platform, the world's largest Open Source development site. |
From: <yan...@us...> - 2008-05-08 03:19:13
|
Revision: 707 http://assorted.svn.sourceforge.net/assorted/?rev=707&view=rev Author: yangzhang Date: 2008-05-07 20:19:12 -0700 (Wed, 07 May 2008) Log Message: ----------- fixed missing import Modified Paths: -------------- python-commons/trunk/src/commons/seqs.py Modified: python-commons/trunk/src/commons/seqs.py =================================================================== --- python-commons/trunk/src/commons/seqs.py 2008-05-08 03:18:57 UTC (rev 706) +++ python-commons/trunk/src/commons/seqs.py 2008-05-08 03:19:12 UTC (rev 707) @@ -8,7 +8,7 @@ from struct import pack, unpack from contextlib import closing from itertools import ( chain, count, ifilterfalse, islice, - izip, tee ) + izip, repeat, tee ) from .log import warning """ This was sent by the SourceForge.net collaborative development platform, the world's largest Open Source development site. |
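`seqs.py` is the module providing the `grouper(2, xs)` that filter.py uses to pair each FETCH tuple with its trailing paren. The call sites are consistent with the classic itertools `grouper` recipe, sketched below; the actual `commons.seqs` implementation may differ:

```python
from itertools import zip_longest  # izip_longest on the Python 2 of the era

def grouper(n, iterable, fillvalue=None):
    """grouper(2, 'ABCDE') --> ('A','B') ('C','D') ('E', None)

    Classic itertools recipe: n references to one shared iterator,
    zipped together, so each tuple consumes n consecutive items.
    """
    args = [iter(iterable)] * n
    return zip_longest(*args, fillvalue=fillvalue)
```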
From: <yan...@us...> - 2008-05-08 03:18:52
|
Revision: 706 http://assorted.svn.sourceforge.net/assorted/?rev=706&view=rev Author: yangzhang Date: 2008-05-07 20:18:57 -0700 (Wed, 07 May 2008) Log Message: ----------- added logout Modified Paths: -------------- python-commons/trunk/src/commons/networking.py Modified: python-commons/trunk/src/commons/networking.py =================================================================== --- python-commons/trunk/src/commons/networking.py 2008-05-08 03:18:47 UTC (rev 705) +++ python-commons/trunk/src/commons/networking.py 2008-05-08 03:18:57 UTC (rev 706) @@ -7,6 +7,7 @@ import os, sys from time import * +from contextlib import contextmanager class NoMacAddrError( Exception ): pass @@ -63,3 +64,11 @@ print 'backing off for', backoff sleep(backoff) backoff = multiplier * backoff + +@contextmanager +def logout(x): + """ + A context manager for finally calling the C{logout()} method of an object. + """ + try: yield x + finally: x.logout() This was sent by the SourceForge.net collaborative development platform, the world's largest Open Source development site. |
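A usage sketch of the `logout` context manager added above, restated in modern Python with a stand-in object in place of a real `IMAP4_SSL` connection (`FakeIMAP` is purely illustrative):

```python
from contextlib import contextmanager

@contextmanager
def logout(x):
    """Yield x, then call x.logout() on exit -- even if the
    with-body raises (same shape as the commit above)."""
    try:
        yield x
    finally:
        x.logout()

class FakeIMAP:
    """Illustrative stand-in for imaplib.IMAP4_SSL; not a real client."""
    def __init__(self):
        self.logged_out = False
    def logout(self):
        self.logged_out = True

# Typical use: guarantee logout around a session.
conn = FakeIMAP()
with logout(conn) as imap:
    pass  # imap.login(...), imap.select(...), etc. would go here
assert conn.logged_out
```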
From: <yan...@us...> - 2008-05-08 03:18:40
Revision: 705
          http://assorted.svn.sourceforge.net/assorted/?rev=705&view=rev
Author:   yangzhang
Date:     2008-05-07 20:18:47 -0700 (Wed, 07 May 2008)

Log Message:
-----------
added seq, default_if_none

Modified Paths:
--------------
    python-commons/trunk/src/commons/misc.py

Modified: python-commons/trunk/src/commons/misc.py
===================================================================
--- python-commons/trunk/src/commons/misc.py	2008-05-07 16:06:28 UTC (rev 704)
+++ python-commons/trunk/src/commons/misc.py	2008-05-08 03:18:47 UTC (rev 705)
@@ -46,3 +46,17 @@
     finally:
         end = time()
         output[0] = end - start
+
+def default_if_none(x, d):
+    """
+    Returns L{x} if it's not None, otherwise returns L{d}.
+    """
+    if x is None: return d
+    else: return x
+
+def seq(f, g):
+    """
+    Evaluate 0-ary functions L{f} then L{g}, returning L{g()}.
+    """
+    f()
+    return g()
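Both helpers are small enough to demonstrate directly. A Python 3 rendering with usage: note that `default_if_none` differs from `x or d` in that it preserves falsy non-None values like `0` and `''`.

```python
def default_if_none(x, d):
    """Return x unless it is None, in which case return d."""
    return d if x is None else x

def seq(f, g):
    """Evaluate the 0-ary functions f then g, returning g()."""
    f()
    return g()

# seq sequences two effects and keeps only the second result.
calls = []
result = seq(lambda: calls.append('f'), lambda: 42)
```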
From: <yan...@us...> - 2008-05-07 16:06:46
Revision: 704
          http://assorted.svn.sourceforge.net/assorted/?rev=704&view=rev
Author:   yangzhang
Date:     2008-05-07 09:06:28 -0700 (Wed, 07 May 2008)

Log Message:
-----------
added mailing list filter! still exploring how gmail imap starred messages work

Added Paths:
-----------
    mailing-list-filter/
    mailing-list-filter/trunk/
    mailing-list-filter/trunk/src/
    mailing-list-filter/trunk/src/filter.py

Added: mailing-list-filter/trunk/src/filter.py
===================================================================
--- mailing-list-filter/trunk/src/filter.py	(rev 0)
+++ mailing-list-filter/trunk/src/filter.py	2008-05-07 16:06:28 UTC (rev 704)
@@ -0,0 +1,149 @@
+#!/usr/bin/env python
+
+"""
+Given an IMAP mailbox, mark all messages as read except for those threads in
+which you were a participant, where thread grouping is performed via the
+In-Reply-To and References headers.
+
+Currently, we assume that the server specification points to a mailbox
+containing all messages (both sent and received), and a message is determined
+to have been sent by you by looking at the From: header field. This should work
+well with Gmail. An alternative strategy is to look through two folders, one
+that's the Inbox and one that's the Sent mailbox, and treat all messages in
+Sent as having been sent by you.
+"""
+
+from __future__ import with_statement
+from collections import defaultdict
+from email import message_from_string
+from getpass import getpass
+from imaplib import IMAP4_SSL
+from argparse import ArgumentParser
+from path import path
+from re import match
+from functools import partial
+from commons.decs import pickle_memoized
+from commons.log import *
+from commons.files import cleanse_filename, soft_makedirs
+from commons.misc import default_if_none
+from commons.networking import logout
+from commons.seqs import concat, grouper
+from commons.startup import run_main
+from contextlib import closing
+
+info = partial(info, '')
+debug = partial(debug, '')
+error = partial(error, '')
+die = partial(die, '')
+
+def getmail(imap):
+    info( 'finding max seqno' )
+    ok, [seqnos] = imap.search(None, 'ALL')
+    maxseqno = int( seqnos.split()[-1] )
+    del seqnos
+
+    info( 'actually fetching the messages in chunks' )
+    # The syntax/fields of the FETCH command is documented in RFC 2060. Also,
+    # this article contains a brief overview:
+    # http://www.devshed.com/c/a/Python/Python-Email-Libraries-part-2-IMAP/3/
+    # BODY.PEEK prevents the message from automatically being flagged as \Seen.
+    query = '(FLAGS BODY.PEEK[HEADER.FIELDS (Message-ID References In-Reply-To From Subject)])'
+    step = 1000
+    return list( concat(
+        imap.fetch('%d:%d' % (start, start + step - 1), query)[1]
+        for start in xrange(1, maxseqno + 1, step) ) )
+
+def main(argv):
+    import logging
+    config_logging(level = logging.INFO, do_console = True)
+
+    p = ArgumentParser(description = __doc__)
+    p.add_argument('--credfile', default = path( '~/.mlf.auth' ).expanduser(),
+        help = """File containing your login credentials, with the username on the
+        first line and the password on the second line. Ignored iff --prompt.""")
+    p.add_argument('--cachedir', default = path( '~/.mlf.cache' ).expanduser(),
+        help = "Directory to use for caching our data.")
+    p.add_argument('--prompt', action = 'store_true',
+        help = "Interactively prompt for the username and password.")
+    p.add_argument('sender',
+        help = "Your email address.")
+    p.add_argument('server',
+        help = "The server in the format: <host>[:<port>][/<mailbox>].")
+
+    cfg = p.parse_args(argv[1:])
+
+    if cfg.prompt:
+        print "username:",
+        cfg.user = raw_input()
+        print "password:",
+        cfg.passwd = getpass()
+    else:
+        with file(cfg.credfile) as f:
+            [cfg.user, cfg.passwd] = map(lambda x: x.strip('\r\n'), f.readlines())
+
+    try:
+        m = match( r'(?P<host>[^:/]+)(:(?P<port>\d+))?(/(?P<mailbox>.+))?$', cfg.server )
+        cfg.host = m.group('host')
+        cfg.port = int( default_if_none(m.group('port'), 993) )
+        cfg.mailbox = default_if_none(m.group('mailbox'), 'INBOX')
+    except:
+        p.error('Need to specify the server in the correct format.')
+
+    soft_makedirs(cfg.cachedir)
+
+    with logout(IMAP4_SSL(cfg.host, cfg.port)) as imap:
+        imap.login(cfg.user, cfg.passwd)
+        with closing(imap) as imap:
+            # Select the main mailbox (INBOX).
+            imap.select(cfg.mailbox)
+
+            # Fetch message IDs, references, and senders.
+            xs = pickle_memoized \
+                (lambda imap: cfg.cachedir / cleanse_filename(cfg.sender)) \
+                (getmail) \
+                (imap)
+
+            debug('fetched:', xs)
+
+            info('determining the set of messages that were sent by you')
+
+            sent = set()
+            for (envelope, data), paren in grouper(2, xs):
+                msg = message_from_string(data)
+                if cfg.sender in msg['From']:
+                    sent.add( msg['Message-ID'] )
+
+            info( 'find the threads in which I am a participant' )
+
+            # Every second item is just a closing paren.
+            # Example data:
+            # [('13300 (BODY[HEADER.FIELDS (Message-ID References In-Reply-To)] {67}',
+            #   'Message-ID: <mai...@py...>\r\n\r\n'),
+            #  ')',
+            #  ('13301 (BODY[HEADER.FIELDS (Message-ID References In-Reply-To)] {59}',
+            #   'Message-Id: <200...@hv...>\r\n\r\n'),
+            #  ')',
+            #  ('13302 (BODY[HEADER.FIELDS (Message-ID References In-Reply-To)] {92}',
+            #   'Message-ID: <C43EAFC0.2E3AE%ni...@ya...>\r\nIn-Reply-To: <481...@gm...>\r\n\r\n')]
+            for (envelope, data), paren in grouper(2, xs):
+                m = match( r"(?P<seqno>\d+) \(FLAGS \((?P<flags>[^)]+)\)", envelope )
+                seqno = m.group('seqno')
+                flags = m.group('flags')
+                if r'\Flagged' in flags: # flags != r'\Seen' and flags != r'\Seen NonJunk':
+                    print 'FLAG'
+                    print seqno, flags
+                    print '\n'.join( map( str, msg.items() ) )
+                    print
+                msg = message_from_string(data)
+                id = msg['Message-ID']
+                irt = default_if_none( msg.get_all('In-Reply-To'), [] )
+                refs = default_if_none( msg.get_all('References'), [] )
+                refs = set( ' '.join( irt + refs ).split() )
+                if refs & sent:
+                    print 'SENT'
+                    print seqno, flags
+                    print '\n'.join( map( str, msg.items() ) )
+                    print
+# if refs & sent:
+
+run_main()

Property changes on: mailing-list-filter/trunk/src/filter.py
___________________________________________________________________
Name: svn:executable
   + *
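The core of the filter above is the membership test: a message belongs to one of your threads if any ID in its References or In-Reply-To headers appears in the set of Message-IDs you sent. A simplified, self-contained Python 3 sketch of just that test, using the standard `email` module as the script does (the sample headers and IDs here are invented):

```python
from email import message_from_string

# Hypothetical set of Message-IDs of messages you sent.
sent = {"<abc@example>"}

# A raw header block as returned by an IMAP HEADER.FIELDS fetch.
raw = (
    "Message-ID: <def@example>\r\n"
    "In-Reply-To: <abc@example>\r\n"
    "\r\n"
)
msg = message_from_string(raw)

# Collect every ID referenced by this message; both headers may be absent.
irt = msg.get_all("In-Reply-To") or []
refs = msg.get_all("References") or []
ref_ids = set(" ".join(irt + refs).split())

# The message is in one of your threads iff it references something you sent.
in_my_thread = bool(ref_ids & sent)
```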
From: <yan...@us...> - 2008-05-04 17:46:36
Revision: 703
          http://assorted.svn.sourceforge.net/assorted/?rev=703&view=rev
Author:   yangzhang
Date:     2008-05-04 10:46:40 -0700 (Sun, 04 May 2008)

Log Message:
-----------
uncommented cscope autoload

Modified Paths:
--------------
    configs/trunk/src/vim/plugin/cscope_maps.vim

Modified: configs/trunk/src/vim/plugin/cscope_maps.vim
===================================================================
--- configs/trunk/src/vim/plugin/cscope_maps.vim	2008-05-04 17:46:02 UTC (rev 702)
+++ configs/trunk/src/vim/plugin/cscope_maps.vim	2008-05-04 17:46:40 UTC (rev 703)
@@ -37,13 +37,13 @@
 " if you want the reverse search order.
 set csto=0
 
-" " add any cscope database in current directory
-" if filereadable("cscope.out")
-"     cs add cscope.out
-" " else add the database pointed to by environment variable
-" elseif $CSCOPE_DB != ""
-"     cs add $CSCOPE_DB
-" endif
+" add any cscope database in current directory
+if filereadable("cscope.out")
+    cs add cscope.out
+" else add the database pointed to by environment variable
+elseif $CSCOPE_DB != ""
+    cs add $CSCOPE_DB
+endif
 
 " show msg when any other cscope db added
 set cscopeverbose