assorted-commits Mailing List for Assorted projects (Page 48)
Brought to you by: yangzhang
From: <yan...@us...> - 2008-05-08 08:36:51

Revision: 727
http://assorted.svn.sourceforge.net/assorted/?rev=727&view=rev
Author: yangzhang
Date: 2008-05-08 01:36:43 -0700 (Thu, 08 May 2008)

Log Message:
-----------
added custom valgrind (instead of using a non-preemptible valgrindrc in
configs); still figuring out perl paths and manpaths

Modified Paths:
--------------
    shell-tools/trunk/src/bash-commons/bashrc.bash

Modified: shell-tools/trunk/src/bash-commons/bashrc.bash
===================================================================
--- shell-tools/trunk/src/bash-commons/bashrc.bash  2008-05-08 08:35:45 UTC (rev 726)
+++ shell-tools/trunk/src/bash-commons/bashrc.bash  2008-05-08 08:36:43 UTC (rev 727)
@@ -110,6 +110,8 @@
 # TODO fix this to not use an explicit ruby version
 prepend_std PATH bin sbin /sbin /usr/sbin /var/lib/gems/1.8/bin
+# TODO fix this, things like /usr/local/ should get precedence over default
+# locations
 prepend_std MANPATH man ''
 prepend_std INFOPATH info
 # TODO fix this to not use an explicit ruby version
@@ -138,7 +140,7 @@

 # perl

-prepend_std PERL5LIB lib/perl5/site_perl
+prepend_std PERL5LIB lib/perl5/site_perl lib/perl share/perl

 # python
@@ -561,6 +563,17 @@
   make -sj2
 }

+xvalgrind() {
+  valgrind \
+    --track-fds=yes \
+    --db-attach=yes \
+    --verbose \
+    --tool=memcheck \
+    --leak-check=yes \
+    --leak-resolution=high \
+    --show-reachable=yes
+}
+
 #function set_title() {
 #  if [ $# -eq 0 ] ; then
 #    eval set -- "$PWD"

This was sent by the SourceForge.net collaborative development platform, the world's largest Open Source development site.
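The xvalgrind function in the commit above hard-codes a set of memcheck options, but as committed it never passes a target program to valgrind. A minimal Python sketch of the same option set as a command builder (the function name and the forwarding of target arguments are my additions, not part of the commit):

```python
import shlex

def xvalgrind_cmd(target_args):
    """Build an argv for valgrind using the options from xvalgrind.

    target_args is the program (plus its arguments) to run under
    valgrind; the committed bash function does not forward one.
    """
    opts = [
        "--track-fds=yes",
        "--db-attach=yes",
        "--verbose",
        "--tool=memcheck",
        "--leak-check=yes",
        "--leak-resolution=high",
        "--show-reachable=yes",
    ]
    return ["valgrind"] + opts + list(target_args)

# Render the full command line as a shell string.
print(shlex.join(xvalgrind_cmd(["./a.out", "--flag"])))
```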
From: <yan...@us...> - 2008-05-08 08:35:57

Revision: 726
http://assorted.svn.sourceforge.net/assorted/?rev=726&view=rev
Author: yangzhang
Date: 2008-05-08 01:35:45 -0700 (Thu, 08 May 2008)

Log Message:
-----------
fixed pypi urls

Modified Paths:
--------------
    shell-tools/trunk/src/bash-commons/assorted.bash

Modified: shell-tools/trunk/src/bash-commons/assorted.bash
===================================================================
--- shell-tools/trunk/src/bash-commons/assorted.bash  2008-05-08 08:29:57 UTC (rev 725)
+++ shell-tools/trunk/src/bash-commons/assorted.bash  2008-05-08 08:35:45 UTC (rev 726)
@@ -48,9 +48,9 @@
   elif kind == 'pypi':
     # TODO this should be more robust
     yield ( '[download $version egg]',
-            'http://pypi.python.org/packages/2.5/p/$project/$( echo $project | sed s/-/_/g )-$version-py2.5.egg' )
+            'http://pypi.python.org/packages/2.5/${project:0:1}/$project/$( echo $project | sed s/-/_/g )-$version-py2.5.egg' )
     yield ( '[download $version src tgz]',
-            'http://pypi.python.org/packages/source/p/$project/$package.tar.gz' )
+            'http://pypi.python.org/packages/source/${project:0:1}/$project/$package.tar.gz' )
     yield ( '[PyPI page]', 'http://pypi.python.org/pypi/$project/' )
   tail = '|' if 'pypi' in kinds() else '| [more downloads] |'
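The fix in revision 726 replaces the hard-coded `p` path component with the first letter of the project name (`${project:0:1}`). A sketch of the resulting old-style PyPI URL layout in Python (the function name is mine, and the source-tarball name `project-version.tar.gz` is an assumption about the `$package` variable in the diff):

```python
def pypi_download_urls(project, version, py="2.5"):
    """Sketch of the old-style PyPI download URL scheme used above.

    Packages live under the first letter of the project name; egg file
    names use underscores where the project name has hyphens.
    """
    first = project[0]                       # e.g. 'm' for mailing-list-filter
    egg_name = project.replace("-", "_")     # hyphens -> underscores in eggs
    egg = ("http://pypi.python.org/packages/%s/%s/%s/%s-%s-py%s.egg"
           % (py, first, project, egg_name, version, py))
    src = ("http://pypi.python.org/packages/source/%s/%s/%s-%s.tar.gz"
           % (first, project, project, version))
    return egg, src

egg_url, src_url = pypi_download_urls("mailing-list-filter", "0.1")
```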
From: <yan...@us...> - 2008-05-08 08:29:51

Revision: 725
http://assorted.svn.sourceforge.net/assorted/?rev=725&view=rev
Author: yangzhang
Date: 2008-05-08 01:29:57 -0700 (Thu, 08 May 2008)

Log Message:
-----------
tagged 0.1 release

Added Paths:
-----------
    mailing-list-filter/tags/
    mailing-list-filter/tags/0.1/
    mailing-list-filter/tags/0.1/README
    mailing-list-filter/tags/0.1/publish.bash
    mailing-list-filter/tags/0.1/setup.py
    mailing-list-filter/tags/0.1/src/mlf.py

Removed Paths:
-------------
    mailing-list-filter/tags/0.1/src/filter.py

Copied: mailing-list-filter/tags/0.1 (from rev 704, mailing-list-filter/trunk)

Copied: mailing-list-filter/tags/0.1/README (from rev 720, mailing-list-filter/trunk/README)
===================================================================
--- mailing-list-filter/tags/0.1/README  (rev 0)
+++ mailing-list-filter/tags/0.1/README  2008-05-08 08:29:57 UTC (rev 725)
@@ -0,0 +1,63 @@
+Overview
+--------
+
+I have a Gmail account that I use for subscribing to and posting to mailing
+lists. When dealing with high-volume mailing lists, I am typically only
+interested in those threads that I participated in. This is a simple filter
+for starring and marking unread any messages belonging to such threads.
+
+This is accomplished by looking at the set of messages that were either sent
+from me or explicitly addressed to me. From this "root set" of messages, we
+can use the `Message-ID`, `References`, and `In-Reply-To` headers to determine
+threads, and thus the other messages that we care about.
+
+I have found this to be more accurate than my two original approaches. I used
+to have Gmail filters that starred/marked unread any messages containing my
+name anywhere in the message. This worked OK since my name is not too common,
+but it produced some false positives (not that bad, just unstar messages) and
+some false negatives (much harder to detect).
+
+A second approach is to tag all subjects with some signature string. This
+usually is fine, but it doesn't work when you did not start the thread (and
+thus determine the subject). You can try to change the subject line, but this
+is (1) poor netiquette, (2) unreliable because your reply may not register in
+other mail clients as being part of the same thread (and thus other
+participants may miss your reply), and (3) unreliable because replies might
+not directly reference your post (either intentionally or unintentionally).
+It also fails when others change the subject. Finally, this approach is
+unsatisfactory because it pollutes subject lines, and it essentially
+replicates exactly what Message-ID was intended for.
+
+This script is not intended to be a replacement for the Gmail filters. I
+still keep those active so that I can get immediate first-pass filtering. I
+execute this script on a daily basis to perform second-pass
+filtering/unfiltering to catch those false negatives that may have been
+missed.
+
+Setup
+-----
+
+Requirements:
+
+- [argparse](http://argparse.python-hosting.com/)
+- [Python Commons](http://assorted.sf.net/python-commons/) 0.4
+- [path](http://www.jorendorff.com/articles/python/path/)
+
+Install the program using the standard `setup.py` program.
+
+Future Work Ideas
+-----------------
+
+- Currently, we assume that the server specification points to a mailbox
+  containing all messages (both sent and received), and a message is
+  determined to have been sent by you by looking at the From: header field.
+  This works well with Gmail. An alternative strategy is to look through two
+  folders, one that's the Inbox and one that's the Sent mailbox, and treat
+  all messages in Sent as having been sent by you. This is presumably how
+  most other IMAP servers work.
+
+- Implement incremental maintenance of local cache.
+
+- Accept custom operations for filtered/unfiltered messages
+  (trashing/untrashing, labeling/unlabeling, etc.).
+
+- Refactor the message fetching/management part out into its own library.
Copied: mailing-list-filter/tags/0.1/publish.bash (from rev 717, mailing-list-filter/trunk/publish.bash) =================================================================== --- mailing-list-filter/tags/0.1/publish.bash (rev 0) +++ mailing-list-filter/tags/0.1/publish.bash 2008-05-08 08:29:57 UTC (rev 725) @@ -0,0 +1,8 @@ +#!/usr/bin/env bash + +fullname='Mailing List Filter' +version=0.1 +license=psf +websrcs=( README ) +rels=( pypi: ) +. assorted.bash "$@" Copied: mailing-list-filter/tags/0.1/setup.py (from rev 721, mailing-list-filter/trunk/setup.py) =================================================================== --- mailing-list-filter/tags/0.1/setup.py (rev 0) +++ mailing-list-filter/tags/0.1/setup.py 2008-05-08 08:29:57 UTC (rev 725) @@ -0,0 +1,28 @@ +#!/usr/bin/env python + +from commons.setup import run_setup + +pkg_info_text = """ +Metadata-Version: 1.1 +Name: mailing-list-filter +Version: 0.1 +Author: Yang Zhang +Author-email: yaaang NOSPAM at REMOVECAPS gmail +Home-page: http://assorted.sourceforge.net/mailing-list-filter/ +Download-url: http://pypi.python.org/pypi/mailing-list-filter/ +Summary: Mailing List Filter +License: Python Software Foundation License +Description: Filter mailing list email for relevant threads only. 
+Keywords: mailing,list,email,filter,IMAP,Gmail +Platform: any +Provides: commons +Classifier: Development Status :: 4 - Beta +Classifier: Environment :: No Input/Output (Daemon) +Classifier: Intended Audience :: End Users/Desktop +Classifier: License :: OSI Approved :: Python Software Foundation License +Classifier: Operating System :: OS Independent +Classifier: Programming Language :: Python +Classifier: Topic :: Communications :: Email +""" + +run_setup(pkg_info_text, scripts = ['src/mlf.py']) Deleted: mailing-list-filter/tags/0.1/src/filter.py =================================================================== --- mailing-list-filter/trunk/src/filter.py 2008-05-07 16:06:28 UTC (rev 704) +++ mailing-list-filter/tags/0.1/src/filter.py 2008-05-08 08:29:57 UTC (rev 725) @@ -1,149 +0,0 @@ -#!/usr/bin/env python - -""" -Given an IMAP mailbox, mark all messages as read except for those threads in -which you were a participant, where thread grouping is performed via the -In-Reply-To and References headers. - -Currently, we assume that the server specification points to a mailbox -containing all messages (both sent and received), and a message is determined -to have been sent by you by looking at the From: header field. This should work -well with Gmail. An alternative strategy is to look through two folders, one -that's the Inbox and one that's the Sent mailbox, and treat all messages in -Sent as having been sent by you. 
-""" - -from __future__ import with_statement -from collections import defaultdict -from email import message_from_string -from getpass import getpass -from imaplib import IMAP4_SSL -from argparse import ArgumentParser -from path import path -from re import match -from functools import partial -from commons.decs import pickle_memoized -from commons.log import * -from commons.files import cleanse_filename, soft_makedirs -from commons.misc import default_if_none -from commons.networking import logout -from commons.seqs import concat, grouper -from commons.startup import run_main -from contextlib import closing - -info = partial(info, '') -debug = partial(debug, '') -error = partial(error, '') -die = partial(die, '') - -def getmail(imap): - info( 'finding max seqno' ) - ok, [seqnos] = imap.search(None, 'ALL') - maxseqno = int( seqnos.split()[-1] ) - del seqnos - - info( 'actually fetching the messages in chunks' ) - # The syntax/fields of the FETCH command is documented in RFC 2060. Also, - # this article contains a brief overview: - # http://www.devshed.com/c/a/Python/Python-Email-Libraries-part-2-IMAP/3/ - # BODY.PEEK prevents the message from automatically being flagged as \Seen. - query = '(FLAGS BODY.PEEK[HEADER.FIELDS (Message-ID References In-Reply-To From Subject)])' - step = 1000 - return list( concat( - imap.fetch('%d:%d' % (start, start + step - 1), query)[1] - for start in xrange(1, maxseqno + 1, step) ) ) - -def main(argv): - import logging - config_logging(level = logging.INFO, do_console = True) - - p = ArgumentParser(description = __doc__) - p.add_argument('--credfile', default = path( '~/.mlf.auth' ).expanduser(), - help = """File containing your login credentials, with the username on the - first line and the password on the second line. 
Ignored iff --prompt.""") - p.add_argument('--cachedir', default = path( '~/.mlf.cache' ).expanduser(), - help = "Directory to use for caching our data.") - p.add_argument('--prompt', action = 'store_true', - help = "Interactively prompt for the username and password.") - p.add_argument('sender', - help = "Your email address.") - p.add_argument('server', - help = "The server in the format: <host>[:<port>][/<mailbox>].") - - cfg = p.parse_args(argv[1:]) - - if cfg.prompt: - print "username:", - cfg.user = raw_input() - print "password:", - cfg.passwd = getpass() - else: - with file(cfg.credfile) as f: - [cfg.user, cfg.passwd] = map(lambda x: x.strip('\r\n'), f.readlines()) - - try: - m = match( r'(?P<host>[^:/]+)(:(?P<port>\d+))?(/(?P<mailbox>.+))?$', cfg.server ) - cfg.host = m.group('host') - cfg.port = int( default_if_none(m.group('port'), 993) ) - cfg.mailbox = default_if_none(m.group('mailbox'), 'INBOX') - except: - p.error('Need to specify the server in the correct format.') - - soft_makedirs(cfg.cachedir) - - with logout(IMAP4_SSL(cfg.host, cfg.port)) as imap: - imap.login(cfg.user, cfg.passwd) - with closing(imap) as imap: - # Select the main mailbox (INBOX). - imap.select(cfg.mailbox) - - # Fetch message IDs, references, and senders. - xs = pickle_memoized \ - (lambda imap: cfg.cachedir / cleanse_filename(cfg.sender)) \ - (getmail) \ - (imap) - - debug('fetched:', xs) - - info('determining the set of messages that were sent by you') - - sent = set() - for (envelope, data), paren in grouper(2, xs): - msg = message_from_string(data) - if cfg.sender in msg['From']: - sent.add( msg['Message-ID'] ) - - info( 'find the threads in which I am a participant' ) - - # Every second item is just a closing paren. 
- # Example data: - # [('13300 (BODY[HEADER.FIELDS (Message-ID References In-Reply-To)] {67}', - # 'Message-ID: <mai...@py...>\r\n\r\n'), - # ')', - # ('13301 (BODY[HEADER.FIELDS (Message-ID References In-Reply-To)] {59}', - # 'Message-Id: <200...@hv...>\r\n\r\n'), - # ')', - # ('13302 (BODY[HEADER.FIELDS (Message-ID References In-Reply-To)] {92}', - # 'Message-ID: <C43EAFC0.2E3AE%ni...@ya...>\r\nIn-Reply-To: <481...@gm...>\r\n\r\n')] - for (envelope, data), paren in grouper(2, xs): - m = match( r"(?P<seqno>\d+) \(FLAGS \((?P<flags>[^)]+)\)", envelope ) - seqno = m.group('seqno') - flags = m.group('flags') - if r'\Flagged' in flags: # flags != r'\Seen' and flags != r'\Seen NonJunk': - print 'FLAG' - print seqno, flags - print '\n'.join( map( str, msg.items() ) ) - print - msg = message_from_string(data) - id = msg['Message-ID'] - irt = default_if_none( msg.get_all('In-Reply-To'), [] ) - refs = default_if_none( msg.get_all('References'), [] ) - refs = set( ' '.join( irt + refs ).split() ) - if refs & sent: - print 'SENT' - print seqno, flags - print '\n'.join( map( str, msg.items() ) ) - print -# if refs & sent: - -run_main() Copied: mailing-list-filter/tags/0.1/src/mlf.py (from rev 722, mailing-list-filter/trunk/src/mlf.py) =================================================================== --- mailing-list-filter/tags/0.1/src/mlf.py (rev 0) +++ mailing-list-filter/tags/0.1/src/mlf.py 2008-05-08 08:29:57 UTC (rev 725) @@ -0,0 +1,236 @@ +#!/usr/bin/env python + +""" +Given a Gmail IMAP mailbox, star all messages in which you were a participant +(either a sender or an explicit recipient in To: or Cc:), where thread grouping +is performed via the In-Reply-To and References headers. 
+""" + +from __future__ import with_statement +from collections import defaultdict +from email import message_from_string +from getpass import getpass +from imaplib import IMAP4_SSL +from argparse import ArgumentParser +from path import path +from re import match +from functools import partial +from itertools import count +from commons.decs import pickle_memoized +from commons.files import cleanse_filename, soft_makedirs +from commons.log import * +from commons.misc import default_if_none, seq +from commons.networking import logout +from commons.seqs import concat, grouper +from commons.startup import run_main +from contextlib import closing +import logging +from commons import log + +info = partial(log.info, 'main') +debug = partial(log.debug, 'main') +warning = partial(log.warning, 'main') +error = partial(log.error, 'main') +die = partial(log.die, 'main') + +def thread_dfs(msg, tid, tid2msgs): + assert msg.tid is None + msg.tid = tid + tid2msgs[tid].append(msg) + for ref in msg.refs: + if ref.tid is None: + thread_dfs(ref, tid, tid2msgs) + else: + assert ref.tid == tid + +def getmail(imap): + info( 'finding max UID' ) + # We use UIDs rather than the default of sequence numbers because UIDs are + # guaranteed to be persistent across sessions. This means that we can, for + # instance, fetch messages in one session and operate on this locally cached + # data before marking messages in a separate session. + ok, [uids] = imap.uid('SEARCH', None, 'ALL') + maxuid = int( uids.split()[-1] ) + del uids + + info( 'actually fetching the messages in chunks up to max', maxuid ) + # The syntax/fields of the FETCH command is documented in RFC 2060. Also, + # this article contains a brief overview: + # http://www.devshed.com/c/a/Python/Python-Email-Libraries-part-2-IMAP/3/ + # BODY.PEEK prevents the message from automatically being flagged as \Seen. 
+ query = '(FLAGS BODY.PEEK[HEADER.FIELDS ' \ + '(Message-ID References In-Reply-To From To Cc Subject)])' + step = 1000 + return list( concat( + seq( lambda: info('fetching', start, 'to', start + step - 1), + lambda: imap.uid('FETCH', '%d:%d' % (start, start + step - 1), + query)[1] ) + for start in xrange(1, maxuid + 1, step) ) ) + +def main(argv): + p = ArgumentParser(description = __doc__) + p.add_argument('--credfile', default = path( '~/.mlf.auth' ).expanduser(), + help = """File containing your login credentials, with the username on the + first line and the password on the second line. Ignored iff --prompt.""") + p.add_argument('--cachedir', default = path( '~/.mlf.cache' ).expanduser(), + help = "Directory to use for caching our data.") + p.add_argument('--prompt', action = 'store_true', + help = "Interactively prompt for the username and password.") + p.add_argument('--pretend', action = 'store_true', + help = """Do not actually carry out any updates to the server. Use in + conjunction with --debug to observe what would happen.""") + p.add_argument('--no-mark-unseen', action = 'store_true', + help = "Do not mark newly revelant threads as unread.") + p.add_argument('--no-mark-seen', action = 'store_true', + help = "Do not mark newly irrevelant threads as read.") + p.add_argument('--debug', action = 'append', + help = """Enable logging for messages of the given flags. 
Flags include: + refs (references to missing Message-IDs), dups (duplicate Message-IDs), + main (the main program logic), and star (which messages are being + starred), unstar (which messages are being unstarred).""") + p.add_argument('sender', + help = "Your email address.") + p.add_argument('server', + help = "The server in the format: <host>[:<port>][/<mailbox>].") + + cfg = p.parse_args(argv[1:]) + + config_logging(level = logging.ERROR, do_console = True, flags = cfg.debug) + + if cfg.prompt: + print "username:", + cfg.user = raw_input() + print "password:", + cfg.passwd = getpass() + else: + with file(cfg.credfile) as f: + [cfg.user, cfg.passwd] = map(lambda x: x.strip('\r\n'), f.readlines()) + + try: + m = match( r'(?P<host>[^:/]+)(:(?P<port>\d+))?(/(?P<mailbox>.+))?$', + cfg.server ) + cfg.host = m.group('host') + cfg.port = int( default_if_none(m.group('port'), 993) ) + cfg.mailbox = default_if_none(m.group('mailbox'), 'INBOX') + except: + p.error('Need to specify the server in the correct format.') + + soft_makedirs(cfg.cachedir) + + with logout(IMAP4_SSL(cfg.host, cfg.port)) as imap: + imap.login(cfg.user, cfg.passwd) + # Close is only valid in the authenticated state. + with closing(imap) as imap: + # Select the main mailbox (INBOX). + imap.select(cfg.mailbox) + + # Fetch message IDs, references, and senders. + xs = pickle_memoized \ + (lambda imap: cfg.cachedir / cleanse_filename(cfg.sender)) \ + (getmail) \ + (imap) + + log.debug('fetched', xs) + + info('building message-id map and determining the set of messages sent ' + 'by you or addressed to you (the "source set")') + + srcs = [] + mid2msg = {} + # Every second item is just a closing paren. 
+ # Example data: + # [('13300 (BODY[HEADER.FIELDS (Message-ID References In-Reply-To)] {67}', + # 'Message-ID: <mai...@py...>\r\n\r\n'), + # ')', + # ('13301 (BODY[HEADER.FIELDS (Message-ID References In-Reply-To)] {59}', + # 'Message-Id: <200...@hv...>\r\n\r\n'), + # ')', + # ('13302 (BODY[HEADER.FIELDS (Message-ID References In-Reply-To)] {92}', + # 'Message-ID: <C43EAFC0.2E3AE%ni...@ya...>\r\nIn-Reply-To: <481...@gm...>\r\n\r\n')] + for (envelope, data), paren in grouper(2, xs): + # Parse the body. + msg = message_from_string(data) + + # Parse the envelope. + m = match( + r"(?P<seqno>\d+) \(UID (?P<uid>\d+) FLAGS \((?P<flags>[^)]+)\)", + envelope ) + msg.seqno = m.group('seqno') + msg.uid = m.group('uid') + msg.flags = m.group('flags').split() + + # Prepare a container for references to other msgs, and initialize the + # thread ID. + msg.refs = [] + msg.tid = None + + # Add these to the map. + if msg['Message-ID'] in mid2msg: + log.warning( 'dups', 'duplicate message IDs:', + msg['Message-ID'], msg['Subject'] ) + mid2msg[ msg['Message-ID'] ] = msg + + # Add to "srcs" set if sent by us or addressed to us. + if ( cfg.sender in default_if_none( msg['From'], '' ) or + cfg.sender in default_if_none( msg['To'], '' ) or + cfg.sender in default_if_none( msg['Cc'], '' ) ): + srcs.append( msg ) + + info( 'constructing undirected graph' ) + + for mid, msg in mid2msg.iteritems(): + # Extract any references. + irt = default_if_none( msg.get_all('In-Reply-To'), [] ) + refs = default_if_none( msg.get_all('References'), [] ) + refs = set( ' '.join( irt + refs ).replace('><', '> <').split() ) + + # Connect nodes in graph bidirectionally. Ignore references to MIDs + # that don't exist. + for ref in refs: + try: + refmsg = mid2msg[ref] + # We can use lists/append (not worry about duplicates) because the + # original sources should be acyclic. If a -> b, then there is no b -> + # a, so when crawling a we can add a <-> b without worrying that later + # we may re-add b -> a. 
+ msg.refs.append(refmsg) + refmsg.refs.append(msg) + except: + log.warning( 'refs', ref ) + + info('finding connected components (grouping the messages into threads)') + + tids = count() + tid2msgs = defaultdict(list) + for mid, msg in mid2msg.iteritems(): + if msg.tid is None: + thread_dfs(msg, tids.next(), tid2msgs) + + info( 'starring the relevant threads, in which I am a participant' ) + + rel_tids = set() + for srcmsg in srcs: + if srcmsg.tid not in rel_tids: + rel_tids.add(srcmsg.tid) + for msg in tid2msgs[srcmsg.tid]: + if r'\Flagged' not in msg.flags: + log.info( 'star', '\n', msg ) + if not cfg.pretend: + imap.uid('STORE', msg.uid, '+FLAGS', r'\Flagged') + if not cfg.no_mark_unseen and r'\Seen' in msg.flags: + imap.uid('STORE', msg.uid, '-FLAGS', r'\Seen') + + info( 'unstarring irrelevant threads, in which I am not a participant' ) + + all_tids = set( tid2msgs.iterkeys() ) + irrel_tids = all_tids - rel_tids + for tid in irrel_tids: + for msg in tid2msgs[tid]: + if r'\Flagged' in msg.flags: + log.info( 'unstar', '\n', msg ) + if not cfg.pretend: + imap.uid('STORE', msg.uid, '-FLAGS', r'\Flagged') + if not cfg.no_mark_seen and r'\Seen' not in msg.flags: + imap.uid('STORE', msg.uid, '+FLAGS', r'\Seen') + +run_main() This was sent by the SourceForge.net collaborative development platform, the world's largest Open Source development site. |
From: <yan...@us...> - 2008-05-08 08:29:17
Revision: 724 http://assorted.svn.sourceforge.net/assorted/?rev=724&view=rev Author: yangzhang Date: 2008-05-08 01:29:25 -0700 (Thu, 08 May 2008) Log Message: ----------- tagged release 0.4 Added Paths: ----------- python-commons/tags/0.4/ python-commons/tags/0.4/README python-commons/tags/0.4/publish.bash python-commons/tags/0.4/setup.py python-commons/tags/0.4/src/commons/decs.py python-commons/tags/0.4/src/commons/files.py python-commons/tags/0.4/src/commons/misc.py python-commons/tags/0.4/src/commons/networking.py python-commons/tags/0.4/src/commons/seqs.py python-commons/tags/0.4/src/commons/setup.py Removed Paths: ------------- python-commons/tags/0.4/README python-commons/tags/0.4/publish.bash python-commons/tags/0.4/setup.py python-commons/tags/0.4/src/commons/decs.py python-commons/tags/0.4/src/commons/files.py python-commons/tags/0.4/src/commons/misc.py python-commons/tags/0.4/src/commons/networking.py python-commons/tags/0.4/src/commons/seqs.py python-commons/tags/0.4/src/commons/setup.py Copied: python-commons/tags/0.4 (from rev 679, python-commons/trunk) Deleted: python-commons/tags/0.4/README =================================================================== --- python-commons/trunk/README 2008-04-24 16:13:04 UTC (rev 679) +++ python-commons/tags/0.4/README 2008-05-08 08:29:25 UTC (rev 724) @@ -1,35 +0,0 @@ -[documentation](doc) - -Overview --------- - -Python Commons is a general-purpose library for Python. To get a sense of -what it provides, please glance over the [documentation](doc). - -Requirements ------------- - -- [Python](http://python.org/) 2.5 -- [setuptools](http://peak.telecommunity.com/DevCenter/setuptools) 0.6 - -Certain sub-modules have extra requirements: - -- `async` requires [Twisted](http://twistedmatrix.com/trac/) 2.5 -- `files` requires [path](http://www.jorendorff.com/articles/python/path/) 2.2 - -This library has only been tested on Linux. 
- -Setup ------ - -To install, run `easy_install python-commons`, or download the source tarball -and run `python setup.py install`. - -Related Work ------------- - -- [ASPN Cookbook]: a valuable repository of Python snippets -- [AIMA Utilities]: accompaniment to a popular AI textbook - -[ASPN Cookbook]: http://aspn.activestate.com/ASPN/Cookbook/Python -[AIMA Utilities]: http://aima.cs.berkeley.edu/python/utils.py Copied: python-commons/tags/0.4/README (from rev 723, python-commons/trunk/README) =================================================================== --- python-commons/tags/0.4/README (rev 0) +++ python-commons/tags/0.4/README 2008-05-08 08:29:25 UTC (rev 724) @@ -0,0 +1,65 @@ +[documentation](doc) + +Overview +-------- + +Python Commons is a general-purpose library for Python. To get a sense of +what it provides, please glance over the [documentation](doc). + +Requirements +------------ + +- [Python](http://python.org/) 2.5 +- [setuptools](http://peak.telecommunity.com/DevCenter/setuptools) 0.6 + +Certain sub-modules have extra requirements: + +- `async` requires [Twisted](http://twistedmatrix.com/trac/) 2.5 +- `files` requires [path](http://www.jorendorff.com/articles/python/path/) 2.2 + +This library has only been tested on Linux. + +Setup +----- + +To install, run `easy_install python-commons`, or download the source tarball +and run `python setup.py install`. 
+ +Related Work +------------ + +- [ASPN Cookbook]: a valuable repository of Python snippets +- [AIMA Utilities]: accompaniment to a popular AI textbook + +[ASPN Cookbook]: http://aspn.activestate.com/ASPN/Cookbook/Python +[AIMA Utilities]: http://aima.cs.berkeley.edu/python/utils.py + +Changes +------- + +version 0.4 + +- removed extraneous debug print statements +- added `logout()` context manager +- added `seq()`, `default_if_none()` +- fixed missing `import` bug +- released for [Mailing List + Filter](http://assorted.sf.net/mailing-list-filter/) + +version 0.3 + +- added versioned guards +- added file memoization +- added retry with exp backoff +- added `countstep()` +- released for + [gbookmark2delicious](http://gbookmark2delicious.googlecode.com/) + +version 0.2 + +- added `clients`, `setup` +- released for [icedb](http://cartel.csail.mit.edu/icedb/) + +version 0.1 + +- initial release Deleted: python-commons/tags/0.4/publish.bash =================================================================== --- python-commons/trunk/publish.bash 2008-04-24 16:13:04 UTC (rev 679) +++ python-commons/tags/0.4/publish.bash 2008-05-08 08:29:25 UTC (rev 724) @@ -1,12 +0,0 @@ -#!/usr/bin/env bash - -post-stage() { - epydoc -o $stagedir/doc src/commons/ -} - -fullname='Python Commons' -version=0.2 -license=psf -websrcs=( README ) -rels=( pypi: ) -. assorted.bash "$@" Copied: python-commons/tags/0.4/publish.bash (from rev 723, python-commons/trunk/publish.bash) =================================================================== --- python-commons/tags/0.4/publish.bash (rev 0) +++ python-commons/tags/0.4/publish.bash 2008-05-08 08:29:25 UTC (rev 724) @@ -0,0 +1,12 @@ +#!/usr/bin/env bash + +post-stage() { + epydoc -o $stagedir/doc src/commons/ +} + +fullname='Python Commons' +version=0.4 +license=psf +websrcs=( README ) +rels=( pypi: ) +. 
assorted.bash "$@" Deleted: python-commons/tags/0.4/setup.py =================================================================== --- python-commons/trunk/setup.py 2008-04-24 16:13:04 UTC (rev 679) +++ python-commons/tags/0.4/setup.py 2008-05-08 08:29:25 UTC (rev 724) @@ -1,43 +0,0 @@ -#!/usr/bin/env python -# -*- mode: python; tab-width: 4; indent-tabs-mode: nil; py-indent-offset: 4; -*- -# vim:ft=python:et:sw=4:ts=4 - -import os,sys -sys.path.insert( 0, os.path.join( os.path.dirname( sys.argv[0] ), 'src' ) ) -from commons import setup - -pkg_info_text = """ -Metadata-Version: 1.1 -Name: python-commons -Version: 0.2 -Author: Yang Zhang -Author-email: yaaang NOSPAM at REMOVECAPS gmail -Home-page: http://assorted.sourceforge.net/python-commons -Summary: Python Commons -License: Python Software Foundation License -Description: General-purpose library of utilities and extensions to the - standard library. -Keywords: Python,common,commons,utility,utilities,library,libraries -Platform: any -Provides: commons -Classifier: Development Status :: 4 - Beta -Classifier: Environment :: No Input/Output (Daemon) -Classifier: Intended Audience :: Developers -Classifier: License :: OSI Approved :: Python Software Foundation License -Classifier: Operating System :: OS Independent -Classifier: Programming Language :: Python -Classifier: Topic :: Communications -Classifier: Topic :: Database -Classifier: Topic :: Internet -Classifier: Topic :: Software Development :: Libraries :: Python Modules -Classifier: Topic :: System -Classifier: Topic :: System :: Filesystems -Classifier: Topic :: System :: Logging -Classifier: Topic :: System :: Networking -Classifier: Topic :: Text Processing -Classifier: Topic :: Utilities -""" - -setup.run_setup( pkg_info_text, - #scripts = ['frontend/py_hotshot.py'], - ) Copied: python-commons/tags/0.4/setup.py (from rev 723, python-commons/trunk/setup.py) =================================================================== --- 
python-commons/tags/0.4/setup.py (rev 0) +++ python-commons/tags/0.4/setup.py 2008-05-08 08:29:25 UTC (rev 724) @@ -0,0 +1,43 @@ +#!/usr/bin/env python +# -*- mode: python; tab-width: 4; indent-tabs-mode: nil; py-indent-offset: 4; -*- +# vim:ft=python:et:sw=4:ts=4 + +import os,sys +sys.path.insert( 0, os.path.join( os.path.dirname( sys.argv[0] ), 'src' ) ) +from commons import setup + +pkg_info_text = """ +Metadata-Version: 1.1 +Name: python-commons +Version: 0.4 +Author: Yang Zhang +Author-email: yaaang NOSPAM at REMOVECAPS gmail +Home-page: http://assorted.sourceforge.net/python-commons +Summary: Python Commons +License: Python Software Foundation License +Description: General-purpose library of utilities and extensions to the + standard library. +Keywords: Python,common,commons,utility,utilities,library,libraries +Platform: any +Provides: commons +Classifier: Development Status :: 4 - Beta +Classifier: Environment :: No Input/Output (Daemon) +Classifier: Intended Audience :: Developers +Classifier: License :: OSI Approved :: Python Software Foundation License +Classifier: Operating System :: OS Independent +Classifier: Programming Language :: Python +Classifier: Topic :: Communications +Classifier: Topic :: Database +Classifier: Topic :: Internet +Classifier: Topic :: Software Development :: Libraries :: Python Modules +Classifier: Topic :: System +Classifier: Topic :: System :: Filesystems +Classifier: Topic :: System :: Logging +Classifier: Topic :: System :: Networking +Classifier: Topic :: Text Processing +Classifier: Topic :: Utilities +""" + +setup.run_setup( pkg_info_text, + #scripts = ['frontend/py_hotshot.py'], + ) Deleted: python-commons/tags/0.4/src/commons/decs.py =================================================================== --- python-commons/trunk/src/commons/decs.py 2008-04-24 16:13:04 UTC (rev 679) +++ python-commons/tags/0.4/src/commons/decs.py 2008-05-08 08:29:25 UTC (rev 724) @@ -1,92 +0,0 @@ -# -*- mode: python; tab-width: 4; 
indent-tabs-mode: nil; py-indent-offset: 4; -*- -# vim:ft=python:et:sw=4:ts=4 - -""" -Decorators and decorator utilities. - -@todo: Move the actual decorators to modules based on their topic. -""" - -import functools, inspect, xmlrpclib - -def wrap_callable(any_callable, before, after): - """ - Wrap any callable with before/after calls. - - From the Python Cookbook. Modified to support C{None} for - C{before} or C{after}. - - @copyright: O'Reilly Media - - @param any_callable: The function to decorate. - @type any_callable: function - - @param before: The pre-processing procedure. If this is C{None}, then no pre-processing will be done. - @type before: function - - @param after: The post-processing procedure. If this is C{None}, then no post-processing will be done. - @type after: function - """ - def _wrapped(*a, **kw): - if before is not None: - before( ) - try: - return any_callable(*a, **kw) - finally: - if after is not None: - after( ) - # In 2.4, only: _wrapped.__name__ = any_callable.__name__ - return _wrapped - -class GenericWrapper( object ): - """ - Wrap all of an object's methods with before/after calls. This is - like a decorator for objects. - - From the I{Python Cookbook}. 
- - @copyright: O'Reilly Media - """ - def __init__(self, obj, before, after, ignore=( )): - # we must set into __dict__ directly to bypass __setattr__; so, - # we need to reproduce the name-mangling for double-underscores - clasname = 'GenericWrapper' - self.__dict__['_%s__methods' % clasname] = { } - self.__dict__['_%s__obj' % clasname] = obj - for name, method in inspect.getmembers(obj, inspect.ismethod): - if name not in ignore and method not in ignore: - self.__methods[name] = wrap_callable(method, before, after) - def __getattr__(self, name): - try: - return self.__methods[name] - except KeyError: - return getattr(self.__obj, name) - def __setattr__(self, name, value): - setattr(self.__obj, name, value) - -########################################################## - -def xmlrpc_safe(func): - """ - Makes a procedure "XMLRPC-safe" by returning 0 whenever the inner - function returns C{None}. This is useful because XMLRPC requires - return values, and 0 is commonly used when functions don't intend - to return anything. - - Also, if the procedure returns a boolean, it will be wrapped in - L{xmlrpclib.Boolean}. - - @param func: The procedure to decorate. - @type func: function - """ - @functools.wraps(func) - def wrapper(*args,**kwargs): - result = func(*args,**kwargs) - if result is not None: - if type( result ) == bool: - return xmlrpclib.Boolean( result ) - else: - return result - else: - return 0 - return wrapper Copied: python-commons/tags/0.4/src/commons/decs.py (from rev 687, python-commons/trunk/src/commons/decs.py) =================================================================== --- python-commons/tags/0.4/src/commons/decs.py (rev 0) +++ python-commons/tags/0.4/src/commons/decs.py 2008-05-08 08:29:25 UTC (rev 724) @@ -0,0 +1,156 @@ +# -*- mode: python; tab-width: 4; indent-tabs-mode: nil; py-indent-offset: 4; -*- +# vim:ft=python:et:sw=4:ts=4 + +""" +Decorators and decorator utilities. 
+ +@todo: Move the actual decorators to modules based on their topic. +""" + +from __future__ import with_statement +import functools, inspect, xmlrpclib +from cPickle import * + +def wrap_callable(any_callable, before, after): + """ + Wrap any callable with before/after calls. + + From the Python Cookbook. Modified to support C{None} for + C{before} or C{after}. + + @copyright: O'Reilly Media + + @param any_callable: The function to decorate. + @type any_callable: function + + @param before: The pre-processing procedure. If this is C{None}, then no pre-processing will be done. + @type before: function + + @param after: The post-processing procedure. If this is C{None}, then no post-processing will be done. + @type after: function + """ + def _wrapped(*a, **kw): + if before is not None: + before( ) + try: + return any_callable(*a, **kw) + finally: + if after is not None: + after( ) + # In 2.4, only: _wrapped.__name__ = any_callable.__name__ + return _wrapped + +class GenericWrapper( object ): + """ + Wrap all of an object's methods with before/after calls. This is + like a decorator for objects. + + From the I{Python Cookbook}. 
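The C{wrap_callable} recipe above is Python 2 (note its comment about copying C{__name__} being 2.4-only). A rough Python 3 sketch of the same before/after hook idea, for illustration only:

```python
# Python 3 sketch of the wrap_callable pattern: run optional before/after
# hooks around any callable, preserving its return value.
import functools

def wrap_callable(any_callable, before=None, after=None):
    @functools.wraps(any_callable)
    def _wrapped(*args, **kwargs):
        if before is not None:
            before()
        try:
            return any_callable(*args, **kwargs)
        finally:
            # the finally clause guarantees 'after' runs even on exceptions
            if after is not None:
                after()
    return _wrapped

calls = []
traced_len = wrap_callable(len,
                           before=lambda: calls.append('before'),
                           after=lambda: calls.append('after'))
result = traced_len([1, 2, 3])
```

The C{try}/C{finally} structure is what lets the C{after} hook double as cleanup code when the wrapped call raises.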
+ + @copyright: O'Reilly Media + """ + def __init__(self, obj, before, after, ignore=( )): + # we must set into __dict__ directly to bypass __setattr__; so, + # we need to reproduce the name-mangling for double-underscores + clasname = 'GenericWrapper' + self.__dict__['_%s__methods' % clasname] = { } + self.__dict__['_%s__obj' % clasname] = obj + for name, method in inspect.getmembers(obj, inspect.ismethod): + if name not in ignore and method not in ignore: + self.__methods[name] = wrap_callable(method, before, after) + def __getattr__(self, name): + try: + return self.__methods[name] + except KeyError: + return getattr(self.__obj, name) + def __setattr__(self, name, value): + setattr(self.__obj, name, value) + +########################################################## + +def xmlrpc_safe(func): + """ + Makes a procedure "XMLRPC-safe" by returning 0 whenever the inner + function returns C{None}. This is useful because XMLRPC requires + return values, and 0 is commonly used when functions don't intend + to return anything. + + Also, if the procedure returns a boolean, it will be wrapped in + L{xmlrpclib.Boolean}. + + @param func: The procedure to decorate. + @type func: function + """ + @functools.wraps(func) + def wrapper(*args,**kwargs): + result = func(*args,**kwargs) + if result is not None: + if type( result ) == bool: + return xmlrpclib.Boolean( result ) + else: + return result + else: + return 0 + return wrapper + +########################################################## + +def file_memoized(serializer, deserializer, pathfunc): + """ + The string result of the given function is saved to the given path. + + Example:: + + @file_memoized(lambda x,f: f.write(x), + lambda f: f.read(), + lambda: "/tmp/cache") + def foo(): return "hello" + + @file_memoized(pickle.dump, + pickle.load, + lambda x,y: "/tmp/cache-%d-%d" % (x,y)) + def foo(x,y): return "hello %d %d" % (x,y) + + @param serializer: The function to serialize the return value into a + string. 
This should take the return value object and + the file object. + @type serializer: function + + @param deserializer: The function to deserialize the cache file contents + into the return value. This should take the file + object and return a string. + @type deserializer: function + + @param pathfunc: Returns the path where the files should be saved. This + should be able to take the same arguments as the original + function. + @type pathfunc: function + """ + def dec(func): + @functools.wraps(func) + def wrapper(*args, **kwargs): + p = pathfunc(*args, **kwargs) + try: + with file(p) as f: + return deserializer(f) + except IOError, (errno, errstr): + if errno != 2: raise + with file(p, 'w') as f: + x = func(*args, **kwargs) + serializer(x, f) + return x + return wrapper + return dec + +def file_string_memoized(pathfunc): + """ + Wrapper around L{file_memoized} that expects the decorated function to + return strings, so the string is written verbatim. + """ + return file_memoized(lambda x,f: f.write(x), lambda f: f.read(), pathfunc) + +def pickle_memoized(pathfunc): + """ + Wrapper around L{file_memoized} that uses pickle. + """ + return file_memoized(dump, load, pathfunc) Deleted: python-commons/tags/0.4/src/commons/files.py =================================================================== --- python-commons/trunk/src/commons/files.py 2008-04-24 16:13:04 UTC (rev 679) +++ python-commons/tags/0.4/src/commons/files.py 2008-05-08 08:29:25 UTC (rev 724) @@ -1,115 +0,0 @@ -# -*- mode: python; tab-width: 4; indent-tabs-mode: nil; py-indent-offset: 4; -*- -# vim:ft=python:et:sw=4:ts=4 - -""" -File and directory manipulation. - -@var invalid_filename_chars: The characters which are usually -prohibited on most modern file systems. - -@var invalid_filename_chars_regex: A regex character class constructed -from L{invalid_filename_chars}.
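The C{file_memoized} decorator added to decs.py above caches a function's result on disk at a path derived from its arguments. A condensed Python 3 sketch of that pattern (the names and the C{FileNotFoundError} handling are illustrative, not the commit's API):

```python
# Sketch: memoize a function's pickled result at a per-arguments file path.
import functools, os, pickle, tempfile

def file_memoized(pathfunc):
    def dec(func):
        @functools.wraps(func)
        def wrapper(*args, **kwargs):
            p = pathfunc(*args, **kwargs)
            try:
                with open(p, 'rb') as f:
                    return pickle.load(f)   # cache hit
            except FileNotFoundError:
                result = func(*args, **kwargs)
                with open(p, 'wb') as f:    # cache miss: persist result
                    pickle.dump(result, f)
                return result
        return wrapper
    return dec

tmp = tempfile.mkdtemp()
calls = []

@file_memoized(lambda x, y: os.path.join(tmp, 'cache-%d-%d' % (x, y)))
def add(x, y):
    calls.append((x, y))
    return x + y

first = add(2, 3)
second = add(2, 3)  # served from the cache file; add() body not re-run
```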
-""" - -from __future__ import with_statement - -import os, re, tempfile - -from path import path - -def soft_makedirs( path ): - """ - Emulate C{mkdir -p} (doesn't complain if it already exists). - - @param path: The path of the directory to create. - @type path: str - - @raise OSError: If it cannot create the directory. It only - swallows OS error 17. - """ - try: - os.makedirs( path ) - except OSError, ex: - if ex.errno == 17: - pass - else: - raise - -def temp_dir( base_dir_name, do_create_subdir = True ): - """ - Get a temporary directory without polluting top-level /tmp. This follows - Ubuntu's conventions, choosing a temporary directory name based on - the given name plus the user name to avoid user conflicts. - - @param base_dir_name: The "name" of the temporary directory. This - is usually identifies the purpose of the directory, or the - application to which the temporary directory belongs. E.g., if joe - calls passes in C{"ssh-agent"} on a standard Linux/Unix system, - then the full path of the temporary directory will be - C{"/tmp/ssh-agent-joe"}. - @type base_dir_name: str - - @param do_create_subdir: If C{True}, then creates a - sub-sub-directory within the temporary sub-directory (and returns - the path to that). The sub-sub-directory's name is randomized - (uses L{tempfile.mkdtemp}). - @type do_create_subdir: bool - - @return: The path to the temporary (sub-)sub-directory. - @rtype: str - """ - base_dir_name += '-' + os.environ[ 'USER' ] - base_dir = path( tempfile.gettempdir() ) / base_dir_name - soft_makedirs( base_dir ) - if do_create_subdir: - return tempfile.mkdtemp( dir = base_dir ) - else: - return base_dir - -invalid_filename_chars = r'*|\/:<>?' -invalid_filename_chars_regex = r'[*|\\\/:<>?]' - -def cleanse_filename( filename ): - """ - Replaces all problematic characters in a filename with C{"_"}, as - specified by L{invalid_filename_chars}. - - @param filename: The filename to cleanse. 
- @type filename: str - """ - pattern = invalid_filename_chars_regex - return re.sub( pattern, '_', filename ) - -class disk_double_buffer( object ): - """ - A simple disk double-buffer. One file is for reading, the other is for - writing, and a facility for swapping the two roles is provided. - """ - def __init__( self, path_base, do_persist = True ): - self.paths = map( path, [ path_base + '.0', path_base + '.1' ] ) - self.do_persist = do_persist - self.switch_status = path( path_base + '.switched' ) - if not do_persist or not self.switch_status.exists(): - self.w, self.r = 0, 1 # default - else: - self.w, self.r = 1, 0 - self.reload_files() - def reload_files( self ): - self.writer = file( self.paths[ self.w ], 'w' ) - if not self.paths[ self.r ].exists(): - self.paths[ self.r ].touch() - self.reader = file( self.paths[ self.r ] ) - def switch( self ): - self.close() - if self.do_persist: - if self.w == 0: self.switch_status.touch() - else: self.switch_status.remove() - self.r, self.w = self.w, self.r - self.reload_files() - def write( self, x ): - self.writer.write( x ) - def read( self, len = 8192 ): - return self.reader.read( len ) - def close( self ): - self.reader.close() - self.writer.close() Copied: python-commons/tags/0.4/src/commons/files.py (from rev 693, python-commons/trunk/src/commons/files.py) =================================================================== --- python-commons/tags/0.4/src/commons/files.py (rev 0) +++ python-commons/tags/0.4/src/commons/files.py 2008-05-08 08:29:25 UTC (rev 724) @@ -0,0 +1,171 @@ +# -*- mode: python; tab-width: 4; indent-tabs-mode: nil; py-indent-offset: 4; -*- +# vim:ft=python:et:sw=4:ts=4 + +""" +File and directory manipulation. + +@var invalid_filename_chars: The characters which are usually +prohibited on most modern file systems. + +@var invalid_filename_chars_regex: A regex character class constructed +from L{invalid_filename_chars}. 
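The files.py helpers above predate C{os.makedirs(exist_ok=True)}, which removes the need to special-case errno 17. A hypothetical modern sketch of C{soft_makedirs} and C{cleanse_filename}, for comparison only:

```python
# Modern equivalents of soft_makedirs and cleanse_filename (illustrative).
import os, re, tempfile

invalid_filename_chars_regex = r'[*|\\/:<>?]'

def cleanse_filename(filename):
    # replace every prohibited character with '_'
    return re.sub(invalid_filename_chars_regex, '_', filename)

def soft_makedirs(path):
    # mkdir -p semantics: no error if the directory already exists
    os.makedirs(path, exist_ok=True)

d = os.path.join(tempfile.mkdtemp(), 'a', 'b')
soft_makedirs(d)
soft_makedirs(d)  # second call is a no-op rather than an error
cleaned = cleanse_filename('a:b*c?d')
```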
+""" + +from __future__ import with_statement + +import os, re, tempfile +from cPickle import * +from path import path + +def soft_makedirs( path ): + """ + Emulate C{mkdir -p} (doesn't complain if it already exists). + + @param path: The path of the directory to create. + @type path: str + + @raise OSError: If it cannot create the directory. It only + swallows OS error 17. + """ + try: + os.makedirs( path ) + except OSError, ex: + if ex.errno == 17: + pass + else: + raise + +def temp_dir( base_dir_name, do_create_subdir = True ): + """ + Get a temporary directory without polluting top-level /tmp. This follows + Ubuntu's conventions, choosing a temporary directory name based on + the given name plus the user name to avoid user conflicts. + + @param base_dir_name: The "name" of the temporary directory. This + usually identifies the purpose of the directory, or the + application to which the temporary directory belongs. E.g., if joe + passes in C{"ssh-agent"} on a standard Linux/Unix system, + then the full path of the temporary directory will be + C{"/tmp/ssh-agent-joe"}. + @type base_dir_name: str + + @param do_create_subdir: If C{True}, then creates a + sub-sub-directory within the temporary sub-directory (and returns + the path to that). The sub-sub-directory's name is randomized + (uses L{tempfile.mkdtemp}). + @type do_create_subdir: bool + + @return: The path to the temporary (sub-)sub-directory. + @rtype: str + """ + base_dir_name += '-' + os.environ[ 'USER' ] + base_dir = path( tempfile.gettempdir() ) / base_dir_name + soft_makedirs( base_dir ) + if do_create_subdir: + return tempfile.mkdtemp( dir = base_dir ) + else: + return base_dir + +invalid_filename_chars = r'*|\/:<>?' +invalid_filename_chars_regex = r'[*|\\\/:<>?]' + +def cleanse_filename( filename ): + """ + Replaces all problematic characters in a filename with C{"_"}, as + specified by L{invalid_filename_chars}. + + @param filename: The filename to cleanse.
+ @type filename: str + """ + pattern = invalid_filename_chars_regex + return re.sub( pattern, '_', filename ) + +class disk_double_buffer( object ): + """ + A simple disk double-buffer. One file is for reading, the other is for + writing, and a facility for swapping the two roles is provided. + """ + def __init__( self, path_base, do_persist = True ): + self.paths = map( path, [ path_base + '.0', path_base + '.1' ] ) + self.do_persist = do_persist + self.switch_status = path( path_base + '.switched' ) + if not do_persist or not self.switch_status.exists(): + self.w, self.r = 0, 1 # default + else: + self.w, self.r = 1, 0 + self.reload_files() + def reload_files( self ): + self.writer = file( self.paths[ self.w ], 'w' ) + if not self.paths[ self.r ].exists(): + self.paths[ self.r ].touch() + self.reader = file( self.paths[ self.r ] ) + def switch( self ): + self.close() + if self.do_persist: + if self.w == 0: self.switch_status.touch() + else: self.switch_status.remove() + self.r, self.w = self.w, self.r + self.reload_files() + def write( self, x ): + self.writer.write( x ) + def read( self, len = 8192 ): + return self.reader.read( len ) + def close( self ): + self.reader.close() + self.writer.close() + +def versioned_guard(path, fresh_version): + """ + Maintain a version object. This is useful for working with versioned + caches. + + @param path: The path to the file containing the cached version object. + @type path: str + + @param fresh_version: The actual latest version that the cached version + should be compared against. + @type fresh_version: object (any type that can be compared) + + @return: True iff the cached version is obsolete (less than the fresh + version or doesn't exist). 
+ @rtype: bool + """ + cache_version = None + try: + with file( path ) as f: cache_version = load(f) + except IOError, (errno, errstr): + if errno != 2: raise + if cache_version is None or fresh_version > cache_version: + with file( path, 'w' ) as f: dump(fresh_version, f) + return True + else: + return False + +def versioned_cache(version_path, fresh_version, cache_path, cache_func): + """ + If fresh_version is newer than the version in version_path, then invoke + cache_func and cache the result in cache_path (using pickle). + + Note the design flaw with L{versioned_guard}: the updated version value is + stored immediately, rather than after updating the cache. + + @param version_path: The path to the file version. + @type version_path: str + + @param fresh_version: The actual, up-to-date version value. + @type fresh_version: object (any type that can be compared) + + @param cache_path: The path to the cached data. + @type cache_path: str + + @param cache_func: The function that produces the fresh data to be cached. + @type cache_func: function (no arguments) + """ + if versioned_guard( version_path, fresh_version ): + # cache obsolete, force-fetch new data + result = cache_func() + with file(cache_path, 'w') as f: dump(result, f) + return result + else: + # cache up-to-date (should be available since dlcs-timestamp exists!) + with file(cache_path) as f: return load(f) Deleted: python-commons/tags/0.4/src/commons/misc.py =================================================================== --- python-commons/trunk/src/commons/misc.py 2008-04-24 16:13:04 UTC (rev 679) +++ python-commons/tags/0.4/src/commons/misc.py 2008-05-08 08:29:25 UTC (rev 724) @@ -1,48 +0,0 @@ -# -*- mode: python; tab-width: 4; indent-tabs-mode: nil; py-indent-offset: 4; -*- -# vim:ft=python:et:sw=4:ts=4 - -from contextlib import * -from time import * - -""" -Miscellanea. -""" - -def generate_bit_fields(count): - """ - A generator of [2^i] for i from 0 to (count - 1). 
Useful for, - e.g., enumerating bitmask flags:: - - red, yellow, green, blue = generate_bit_fields(4) - color1 = blue - color2 = red | yellow - - @param count: The number of times to perform the left-shift. - @type count: int - """ - j = 1 - for i in xrange( count ): - yield j - j <<= 1 - -@contextmanager -def wall_clock(output): - """ - A simple timer for code sections. - - @param output: The resulting time is put into index 0 of L{output}. - @type output: index-writeable - - Example: - - t = [0] - with wall_clock(t): - sleep(1) - print "the sleep operation took %d seconds" % t[0] - """ - start = time() - try: - yield - finally: - end = time() - output[0] = end - start Copied: python-commons/tags/0.4/src/commons/misc.py (from rev 705, python-commons/trunk/src/commons/misc.py) =================================================================== --- python-commons/tags/0.4/src/commons/misc.py (rev 0) +++ python-commons/tags/0.4/src/commons/misc.py 2008-05-08 08:29:25 UTC (rev 724) @@ -0,0 +1,62 @@ +# -*- mode: python; tab-width: 4; indent-tabs-mode: nil; py-indent-offset: 4; -*- +# vim:ft=python:et:sw=4:ts=4 + +from contextlib import * +from time import * + +""" +Miscellanea. +""" + +def generate_bit_fields(count): + """ + A generator of [2^i] for i from 0 to (count - 1). Useful for, + e.g., enumerating bitmask flags:: + + red, yellow, green, blue = generate_bit_fields(4) + color1 = blue + color2 = red | yellow + + @param count: The number of times to perform the left-shift. + @type count: int + """ + j = 1 + for i in xrange( count ): + yield j + j <<= 1 + +@contextmanager +def wall_clock(output): + """ + A simple timer for code sections. + + @param output: The resulting time is put into index 0 of L{output}. 
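The C{generate_bit_fields} generator above yields successive powers of two. A Python 3 rendering of its docstring example:

```python
# Yield 1, 2, 4, ... 2^(count-1): handy for defining bitmask flags.
def generate_bit_fields(count):
    j = 1
    for _ in range(count):
        yield j
        j <<= 1

red, yellow, green, blue = generate_bit_fields(4)
color = red | yellow  # combine flags with bitwise OR
```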
+ @type output: index-writeable + + Example: + + t = [0] + with wall_clock(t): + sleep(1) + print "the sleep operation took %d seconds" % t[0] + """ + start = time() + try: + yield + finally: + end = time() + output[0] = end - start + +def default_if_none(x, d): + """ + Returns L{x} if it's not None, otherwise returns L{d}. + """ + if x is None: return d + else: return x + +def seq(f, g): + """ + Evaluate 0-ary functions L{f} then L{g}, returning L{g()}. + """ + f() + return g() Deleted: python-commons/tags/0.4/src/commons/networking.py =================================================================== --- python-commons/trunk/src/commons/networking.py 2008-04-24 16:13:04 UTC (rev 679) +++ python-commons/tags/0.4/src/commons/networking.py 2008-05-08 08:29:25 UTC (rev 724) @@ -1,39 +0,0 @@ -# -*- mode: python; tab-width: 4; indent-tabs-mode: nil; py-indent-offset: 4; -*- -# vim:ft=python:et:sw=4:ts=4 - -""" -Networking tools. -""" - -import os, sys - -class NoMacAddrError( Exception ): pass - -def get_mac_addr(): - """ - Simply parses the output of C{ifconfig} or C{ipconfig} to estimate - this machine's IP address. This is not at all reliable, but tends - to work "well enough" for my own purposes. - - From U{http://mail.python.org/pipermail/python-list/2005-December/357300.html}. - - @copyright: Frank Millman - - Note that U{http://libdnet.sf.net/} provides this functionality and much - more. 
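The C{wall_clock} context manager above reports elapsed time through a writable slot, since a C{with} block cannot hand back a return value. A Python 3 sketch of the same idea:

```python
# Time a code section; the elapsed seconds land in output[0] even if
# the timed block raises, thanks to the finally clause.
from contextlib import contextmanager
import time

@contextmanager
def wall_clock(output):
    start = time.time()
    try:
        yield
    finally:
        output[0] = time.time() - start

t = [0.0]
with wall_clock(t):
    time.sleep(0.01)
```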
- """ - mac = None - if sys.platform == 'win32': - for line in os.popen("ipconfig /all"): - if line.lstrip().startswith('Physical Address'): - mac = line.split(':')[1].strip().replace('-',':') - break - else: - for line in os.popen("/sbin/ifconfig"): - if line.find('Ether') > -1: - mac = line.split()[4] - break - if mac is None: - raise NoMacAddrError - return mac - Copied: python-commons/tags/0.4/src/commons/networking.py (from rev 706, python-commons/trunk/src/commons/networking.py) =================================================================== --- python-commons/tags/0.4/src/commons/networking.py (rev 0) +++ python-commons/tags/0.4/src/commons/networking.py 2008-05-08 08:29:25 UTC (rev 724) @@ -0,0 +1,74 @@ +# -*- mode: python; tab-width: 4; indent-tabs-mode: nil; py-indent-offset: 4; -*- +# vim:ft=python:et:sw=4:ts=4 + +""" +Networking tools. +""" + +import os, sys +from time import * +from contextlib import contextmanager + +class NoMacAddrError( Exception ): pass + +def get_mac_addr(): + """ + Simply parses the output of C{ifconfig} or C{ipconfig} to estimate + this machine's MAC address. This is not at all reliable, but tends + to work "well enough" for my own purposes. + + From U{http://mail.python.org/pipermail/python-list/2005-December/357300.html}. + + @copyright: Frank Millman + + Note that U{http://libdnet.sf.net/} provides this functionality and much + more. + """ + mac = None + if sys.platform == 'win32': + for line in os.popen("ipconfig /all"): + if line.lstrip().startswith('Physical Address'): + mac = line.split(':')[1].strip().replace('-',':') + break + else: + for line in os.popen("/sbin/ifconfig"): + if line.find('Ether') > -1: + mac = line.split()[4] + break + if mac is None: + raise NoMacAddrError + return mac + +def retry_exp_backoff(initial_backoff, multiplier, func): + """ + Repeatedly invoke L{func} until it succeeds (returns non-None), with + exponentially growing backoff delay between each try.
+ + @param initial_backoff: The initial backoff. + @type initial_backoff: float + + @param multiplier: The amount by which the backoff is multiplied on each + failure. + @type multiplier: float + + @param func: The zero-argument function to be invoked that returns True on + success and False on failure. + @type func: function + + @return: The result of the function + """ + backoff = initial_backoff + while True: + res = func() + if res is not None: return res + print 'backing off for', backoff + sleep(backoff) + backoff = multiplier * backoff + +@contextmanager +def logout(x): + """ + A context manager for finally calling the C{logout()} method of an object. + """ + try: yield x + finally: x.logout() Deleted: python-commons/tags/0.4/src/commons/seqs.py =================================================================== --- python-commons/trunk/src/commons/seqs.py 2008-04-24 16:13:04 UTC (rev 679) +++ python-commons/tags/0.4/src/commons/seqs.py 2008-05-08 08:29:25 UTC (rev 724) @@ -1,357 +0,0 @@ -# -*- mode: python; tab-width: 4; indent-tabs-mode: nil; py-indent-offset: 4; -*- -# vim:ft=python:et:sw=4:ts=4 - -from __future__ import ( absolute_import, with_statement ) - -from cStringIO import StringIO -from cPickle import * -from struct import pack, unpack -from contextlib import closing -from itertools import ( chain, count, ifilterfalse, islice, - izip, tee ) -from .log import warning - -""" -Sequences, streams, and generators. - -@var default_chunk_size: The default chunk size used by L{chunkify}. -""" - -default_chunk_size = 8192 - -def read_pickle( read, init = '', length_thresh = 100000 ): - """ - Given a reader function L{read}, reads in pickled objects from it. I am a - generator which yields unpickled objects. I assume that the pickling - is "safe," done using L{safe_pickle}. - - @param read: The reader function that reads from a stream. It should take - a single argument, the number of bytes to consume. 
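The C{retry_exp_backoff} helper above retries until the callable returns non-None, multiplying the delay after each failure. A Python 3 sketch with an injectable C{sleep} (an assumption added here so the example runs instantly; the commit's version calls the real C{sleep}):

```python
# Retry func() with exponentially growing delays until it returns non-None.
def retry_exp_backoff(initial_backoff, multiplier, func, sleep=lambda s: None):
    backoff = initial_backoff
    while True:
        res = func()
        if res is not None:
            return res
        sleep(backoff)          # back off before the next attempt
        backoff *= multiplier   # exponential growth

attempts = []
def flaky():
    attempts.append(len(attempts))
    return 'ok' if len(attempts) >= 3 else None

delays = []
result = retry_exp_backoff(1, 2, flaky, sleep=delays.append)
```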
- @type read: function - - @return: A tuple whose first element is the deserialized object or None if - EOF was encountered, and whose second element is the remainder bytes until - the EOF that were not consumed by unpickling. - @rtype: (object, str) - """ - with closing( StringIO() ) as sio: - obj = None # return this if we hit eof (not enough bytes read) - sio.write( init ) - - def read_until( target ): - remain = target - streamlen( sio ) - if remain > 0: - chunk = read( remain ) - # append to end - sio.seek(0,2) - sio.write( chunk ) - offset = streamlen( sio ) - sio.seek(0) - return offset >= target - - if read_until(4): - lengthstr = sio.read(4) - (length,) = unpack('i4', lengthstr) - if length_thresh is not None and length > length_thresh or \ - length <= 0: - warning( 'read_pickle', - 'got length', length, - 'streamlen', streamlen(sio), - 'first bytes %x %x %x %x' % tuple(map(ord,lengthstr)) ) - if read_until(length+4): - # start reading from right after header - sio.seek(4) - obj = load(sio) - - return ( obj, sio.read() ) - -def read_pickles( read ): - """ - Reads all the consecutively pickled objects from the L{read} function. - """ - while True: - pair = ( obj, rem ) = read_pickle( read ) - if obj is None: break - yield pair - -class safe_pickler( object ): - def __init__( self, protocol = HIGHEST_PROTOCOL ): - self.sio = StringIO() - self.pickler = Pickler( self.sio, protocol ) - def dumps( self, obj ): - """ - Pickle L{obj} but prepends the serialized length in bytes. - """ - self.pickler.clear_memo() - self.sio.seek(0) - self.pickler.dump(obj) - self.sio.truncate() - msg = self.sio.getvalue() - return pack('i4', self.sio.tell()) + msg - -def write_pickle( obj, write ): - """ - Write L{obj} using function L{write}, in a safe, pickle-able fashion. - """ - return write( safe_pickle( obj ) ) - -def streamlen( stream ): - """ - Get the length of a stream (e.g. file stream or StringIO). - Tries to restore the original position in the stream. 
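The C{safe_pickler}/C{read_pickle} pair above implements length-prefixed pickle framing over a byte stream. A compact Python 3 sketch of that framing, using an explicit little-endian 4-byte header (the C{'<i'} format is this sketch's choice, not necessarily the commit's wire format):

```python
# Length-prefixed pickle framing: a 4-byte length header, then the payload.
import io, pickle, struct

def write_framed(obj, stream):
    payload = pickle.dumps(obj)
    stream.write(struct.pack('<i', len(payload)) + payload)

def read_framed(stream):
    header = stream.read(4)
    if len(header) < 4:
        return None  # EOF: no complete header available
    (length,) = struct.unpack('<i', header)
    return pickle.loads(stream.read(length))

buf = io.BytesIO()
write_framed({'seq': 1}, buf)
write_framed([2, 3], buf)
buf.seek(0)
first = read_framed(buf)
second = read_framed(buf)
```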
- """ - orig_pos = stream.tell() - stream.seek(0,2) # seek to 0 relative to eof - length = stream.tell() # get the position - stream.seek(orig_pos) # return to orig_pos - return length - -def chunkify( stream, chunk_size = default_chunk_size ): - """ - Given an input stream (an object exposing a file-like interface), - reads data in from it one chunk at a time. This is a generator - which yields those chunks as they come. - - @param stream: The input stream. - @type stream: stream - - @param chunk_size: The size of the chunk (usually the number of - bytes to read). - @type chunk_size: int - """ - offset = 0 - while True: - chunk = stream.read( chunk_size ) - if not chunk: - break - yield offset, chunk - offset += len( chunk ) - -def total( iterable ): - """ - Counts the number of items in an iterable. Note that this will - consume the elements of the iterable, and if the iterable is - infinite, this will not halt. - - @param iterable: The iterable to count. - @type iterable - - @return: The number of elements consumed. - @rtype: int - """ - return sum( 1 for i in iterable ) - -#class FilePersistence(): -# def __init__( self ): -# -# -#class DbPersistence(): -# def __init__( self ): -# - -class ClosedError( Exception ): pass - -class PersistentConsumedSeq( object ): - """ - I generate C{[0, 1, ...]}, like L{count}, but I can also - save my state to disk. Similar to L{PersistentSeq}, but instead of - committing on each call to L{next}, require manual explicit calls - to L{commit}. I'm useful for generating unique IDs. - - Why not simply use L{PersistentSeq} instead of me? You usually - can. However, some applications use me for efficiency. For - instance, consider an application that generates a lot of network - packets (with sequence numbers), but only sends a small fraction - of them out onto the network. 
If we only want to guarantee the - uniqueness of sequence numbers that are exposed to the world, we - need only commit upon sending a packet, and not on generating - a packet (L{next}). This could avoid excessive writes. - - @ivar seqno: The next sequence number to be generated. - @type seqno: int - """ - def __init__( self, path ): - """ - @param path: File to save my state in. I keep this file open. - @type path: str - """ - try: - self.log = file( path, 'r+' ) - except IOError, ex: - if ex.errno == 2: - self.log = file( path, 'w+' ) - else: - raise - contents = self.log.read() - if len( contents ) > 0: - self.seqno = int( contents ) - else: - self.seqno = 0 - self.max_commit = self.seqno - def next( self ): - """ - @return: The next number in the sequence. - @rtype: int - - @throw ClosedError: If I was previously L{close}d. - """ - if self.log is None: - raise ClosedError() - self.seqno += 1 - return self.seqno - 1 - def commit( self, seqno ): - """ - @param seqno: If this is the maximum committed sequence - number, then commit this sequence number (to disk). The - semantics will get weird if you pass in sequence numbers that - haven't been generated yet. - - @type seqno: int - - @return: The maximum sequence number ever committed (possibly - L{seqno}). - @rtype: int - - @throw ClosedError: If I was previously L{close}d. - """ - if self.log is None: - raise ClosedError() - if seqno > self.max_commit: - # TODO use a more flexible logging system that can switch - # between Python's logging module and Twisted's log module - self.max_commit = seqno - self.log.seek( 0 ) - # yes I write +1 here - self.log.write( str( seqno + 1 ) ) - self.log.truncate() - self.log.flush() - return self.max_commit - def close( self ): - """ - Closes the log file. No more operations can be performed. - """ - self.log.close() - self.log = None - -class PersistentSeq( PersistentConsumedSeq ): - """ - I generate C{[0, 1, ...]}, like L{count}, but I can also - save my state to disk.
I save my state immediately to disk on each - call to L{next}. - """ - def __init__( self, path ): - """ - @param path: File to save my state in. I keep this file open. - @type path: str - """ - PersistentConsumedSeq.__init__( self, path ) - def next( self ): - """ - Generates the next number in the sequence and immediately - commits it. - """ - cur = PersistentConsumedSeq.next( self ) - self.commit( cur ) - return cur - -def pairwise(iterable): - "s -> (s0,s1), (s1,s2), (s2, s3), ..." - a, b = tee(iterable) - try: - b.next() - except StopIteration: - pass - return izip(a, b) - -def argmax(sequence, fn=None): - """Two usage patterns: - C{argmax([s0, s1, ...], fn)} - C{argmax([(fn(s0), s0), (fn(s1), s1), ...])} - Both return the si with greatest fn(si)""" - if fn is None: - return max(sequence)[1] - else: - return max((fn(e), e) for e in sequence)[1] - -def argmin(sequence, fn=None): - """Two usage patterns: - C{argmin([s0, s1, ...], fn)} - C{argmin([(fn(s0), s0), (fn(s1), s1), ...])} - Both return the si with smallest fn(si)""" - if fn is None: - return min(sequence)[1] - else: - return min((fn(e), e) for e in sequence)[1] - -def all(seq, pred=bool): - """ - Returns C{True} if C{pred(x) is True} for every element in the - iterable - """ - for elem in ifilterfalse(pred, seq): - return False - return True - -def concat(listOfLists): - return list(chain(*listOfLists)) - -def flatten( stream ): - """ - For each item yielded by L{gen}, if that item is itself an - iterator/generator, then I will recurse into C{flatten(gen)}; - otherwise, I'll yield the yielded item. Thus, I essentially - "flatten" out a tree of iterators. - - I test whether something is an iterator/generator simply by - checking to see if it has a C{next} attribute. Note that this - won't include any iterable, so things like L{list}s are yielded - like any regular item. This is my author's desired behavior! - - I am useful for coroutines, a la DeferredGenerators from Twisted. 
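The C{pairwise} and C{argmax} recipes above port directly to Python 3 (where iterators expose C{__next__} and C{izip} becomes C{zip}). A quick illustrative check:

```python
# pairwise: s -> (s0,s1), (s1,s2), ...; argmax: element with greatest fn(e).
from itertools import tee

def pairwise(iterable):
    a, b = tee(iterable)
    next(b, None)       # advance the second iterator by one
    return zip(a, b)

def argmax(sequence, fn=None):
    if fn is None:
        return max(sequence)[1]
    return max((fn(e), e) for e in sequence)[1]

pairs = list(pairwise([1, 2, 3, 4]))
longest = argmax(['a', 'ccc', 'bb'], len)
```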
- - See also: - U{http://mail.python.org/pipermail/python-list/2003-October/232874.html} - """ - for item in stream: - if hasattr( item, 'next' ): - for item in flatten( item ): - yield item - else: - yield item - -def grouper(n, iterable, padvalue=None): - "grouper(3, 'abcdefg', 'x') --> ('a','b','c'), ('d','e','f'), ('g','x','x')" - return izip(*[chain(iterable, repeat(padvalue, n-1))]*n) - -def chunker( n, iterable, in_place = False ): - """ - Like L{grouper} but designed to scale for larger L{n}. Also, does - not perform padding. The end of the stream is reached when we - yield a chunk with fewer than L{n} items. - """ - i = -1 - chunk = [ None ] * n - for i, item in enumerate( iterable ): - chunk[ i % n ] = item - if ( i + 1 ) % n == 0: - yield chunk - if not in_place: chunk = [ None ] * n - else: - if i % n < n - 1: - del chunk[ ( i + 1 ) % n : ] - yield chunk - -def take(n, seq): - return list(islice(seq, n)) - -def delimit(sep, xs): - for x in xs: - yield x - break - for x in xs: - yield sep - yield x - -# TODO not quite right -def interleave(xs, ys): - return concat(izip( xs, ys )) Copied: python-commons/tags/0.4/src/commons/seqs.py (from rev 707, python-commons/trunk/src/commons/seqs.py) =================================================================== --- python-commons/tags/0.4/src/commons/seqs.py (rev 0) +++ python-commons/tags/0.4/src/commons/seqs.py 2008-05-08 08:29:25 UTC (rev 724) @@ -0,0 +1,366 @@ +# -*- mode: python; tab-width: 4; indent-tabs-mode: nil; py-indent-offset: 4; -*- +# vim:ft=python:et:sw=4:ts=4 + +from __future__ import ( absolute_import, with_statement ) + +from cStringIO import StringIO +from cPickle import * +from struct import pack, unpack +from contextlib import closing +from itertools import ( chain, count, ifilterfalse, islice, + izip, repeat, tee ) +from .log import warning + +""" +Sequences, streams, and generators. + +@var default_chunk_size: The default chunk size used by L{chunkify}. 
+""" + +default_chunk_size = 8192 + +def read_pickle( read, init = '', length_thresh = 100000 ): + """ + Given a reader function L{read}, reads in pickled objects from it. I am a + generator which yields unpickled objects. I assume that the pickling + is "safe," done using L{safe_pickle}. + + @param read: The reader function that reads from a stream. It should take + a single argument, the number of bytes to consume. + @type read: function + + @return: A tuple whose first element is the deserialized object or None if + EOF was encountered, and whose second element is the remainder bytes until + the EOF that were not consumed by unpickling. + @rtype: (object, str) + """ + with closing( StringIO() ) as sio: + obj = None # return this if we hit eof (not enough bytes read) + sio.write( init ) + + def read_until( target ): + remain = target - streamlen( sio ) + if remain > 0: + chunk = read( remain ) + # append to end + sio.seek(0,2) + sio.write( chunk ) + offset = streamlen( sio ) + sio.seek(0) + return offset >= target + + if read_until(4): + lengthstr = sio.read(4) + (length,) = unpack('i4', lengthstr) + if length_thresh is not None and length > length_thresh or \ + length <= 0: + warning( 'read_pickle', + 'got length', length, + 'streamlen', streamlen(sio), + 'first bytes %x %x %x %x' % tuple(map(ord,lengthstr)) ) + if read_until(length+4): + # start reading from right after header + sio.seek(4) + obj = load(sio) + + return ( obj, sio.read() ) + +def read_pickles( read ): + """ + Reads all the consecutively pickled objects from the L{read} function. + """ + while True: + pair = ( obj, rem ) = read_pickle( read ) + if obj is None: break + yield pair + +class safe_pickler( object ): + def __init__( self, protocol = HIGHEST_PROTOCOL ): + self.sio = StringIO() + self.pickler = Pickler( self.sio, protocol ) + def dumps( self, obj ): + """ + Pickle L{obj} but prepends the serialized length in bytes. 
+ """ + self.pickler.clear_memo() + self.sio.seek(0) + self.pickler.dump(obj) + self.sio.truncate() + msg = self.sio.getvalue() + return pack('i4', self.sio.tell()) + msg + +def write_pickle( obj, write ): + """ + Write L{obj} using function L{write}, in a safe, pickle-able fashion. + """ + return write( safe_pickle( obj ) ) + +def streamlen( stream ): + """ + Get the length of a stream (e.g. file stream or StringIO). + Tries to restore the original position in the stream. + """ + orig_pos = stream.tell() + stream.seek(0,2) # seek to 0 relative to eof + length = stream.tell() # get the position + stream.seek(orig_pos) # return to orig_pos + return length + +def chunkify( stream, chunk_size = default_chunk_size ): + """ + Given an input stream (an object exposing a file-like interface), + reads data in from it one chunk at a time. This is a generator + which yields those chunks as they come. + + @param stream: The input stream. + @type stream: stream + + @param chunk_size: The size of the chunk (usually the number of + bytes to read). + @type chunk_size: int + """ + offset = 0 + while True: + chunk = stream.read( chunk_size ) + if not chunk: + break + yield offset, chunk + offset += len( chunk ) + +def total( iterable ): + """ + Counts the number of items in an iterable. Note that this will + consume the elements of the iterable, and if the iterable is + infinite, this will not halt. + + @param iterable: The iterable to count. + @type iterable + + @return: The number of elements consumed. + @rtype: int + """ + return sum( 1 for i in iterable ) + +#class FilePersistence(): +# def __init__( self ): +# +# +#class DbPersistence(): +# def __init__( self ): +# + +class ClosedError( Exception ): pass + +class PersistentConsumedSeq( object ): + """ + I generate C{[0, 1, ...]}, like L{count}, but I can also + save my state to disk. Similar to L{PersistentSeq}, but instead of + committing on each call to L{next}, require manual explicit calls + to L{commit}. 
I'm useful for generating unique IDs. + + Why not simply use L{PersistentSeq} instead of me? You usually + can. However, some applications use me for efficiency. For + instance, consider an application that generates a lot of network + packets (with sequence numbers), but only sends a small fraction + of them out onto the network. If we only want to guarantee the + uniqueness of sequence numbers that are exposed to the world, we + need only commit when upon sending a packet, and not on generating + a packet (L{next}). This could avoid excessive writes. + + @ivar seqno: The next sequence number to be generated. + @type seqno: int + """ + def __init__( self, path ): + """ + @param path: File to save my state in. I keep this file open. + @type path: str + """ + try: + self.log = file( path, 'r+' ) + except IOError, ex: + if ex.errno == 2: + self.log = file( path, 'w+' ) + else: + raise + contents = self.log.read() + if len( contents ) > 0: + self.seqno = int( contents ) + else: + self.seqno = 0 + self.max_commit = self.seqno + def next( self ): + """ + @return: The next number in the sequence. + @rtype: int + + @throw ClosedError: If I was previously L{close}d. + """ + if self.log is None: + raise ClosedError() + self.seqno += 1 + return self.seqno - 1 + def commit( self, seqno ): + """ + @param seqno: If this is the maximum committed sequence + number, then commit this sequence number (to disk). The + semantics will get weird if you pass in sequence numbers that + haven't been generated yet. + + @type seqno: int + + @return: The maximum sequence number ever committed (possibly + L{seqno}). + @rtype: int + + @throw ClosedError: If I was previously L{close}d. 
+ """ + if self.log is None: + raise ClosedError() + if seqno > self.max_commit: + # TODO use a more flexible logging system that can switch + # between Python's logging module and Twisted's log module + self.max_commit = seqno + self.log.seek( 0 ) + # yes I write +1 here + self.log.write( str( seqno + 1 ) ) + self.log.truncate() + self.log.flush() + return self.max_commit + def close( self ): + """ + Closes the log file. No more operations can be performed. + """ + self.log.close() + self.log = None + +class PersistentSeq( PersistentConsumedSeq ): + """ + I generate C{[0, 1, ...]}, like L{count}, but I can also + save my state to disk. I save my state immediately to disk on each + call to L{next}. + """ + def __init__( self, path ): + """ + @param path: File to save my state in. I keep this file open. + @type path: str + """ + PersistentConsumedSeq.__init__( self, path ) + def next( self ): + """ + Generates the next number in the sequence and immediately + commits it. + """ + cur = PersistentConsumedSeq.next( self ) + self.commit( cur ) + return cur + +def pairwise(iterable): + "s -> (s0,s1), (s1,s2), (s2, s3), ..." 
+ a, b = tee(iterable) + try: + b.next() + except StopIteration: + pass + return izip(a, b) + +def argmax(sequence, fn=None): + """Two usage patterns: + C{argmax([s0, s1, ...], fn)} + C{argmax([(fn(s0), s0), (fn(s1), s1), ...])} + Both return the si with greatest fn(si)""" + if fn is None: + return max(sequence)[1] + else: + return max((fn(e), e) for e in sequence)[1] + +def argmin(sequence, fn=None): + """Two usage patterns: + C{argmin([s0, s1, ...], fn)} + C{argmin([(fn(s0), s0), (fn(s1), s1), ...])} + Both return the si with smallest fn(si)""" + if fn is None: + return min(sequence)[1] + else: + return min((fn(e), e) for e in sequence)[1] + +def all(seq, pred=bool): + """ + Returns C{True} if C{pred(x) is True} for every element in the + iterable + """ + for elem in ifilterfalse(pred, seq): + return False + return True + +def concat(listOfLists): + return list(chain(*listOfLists)) + +def flatten( stream ): + """ + For each item yielded by L{gen}, if that item is itself an + iterator/generator, then I will recurse into C{flatten(gen)}; + otherwise, I'll yield the yielded item. Thus, I essentially + "flatten" out a tree of iterators. + + I test whether something is an iterator/generator simply by + checking to see if it has a C{next} attribute. Note that this + won't include any iterable, so things like L{list}s are yielded + like any regular item. This is my author's desired behavior! + + I am useful for coroutines, a la DeferredGenerators from Twisted. + + See also: + U{http://mail.python.org/pipermail/python-list/2003-October/232874.html} + """ + for item in stream: + if hasattr( item, 'next' ): + for item in flatten( item ): + yield item + else: + yield item + +def grouper(n, iterable, padvalue=None): + "grouper(3, 'abcdefg', 'x') --> ('a','b','c'), ('d','e','f'), ('g','x','x')" + return izip(*[chain(iterable, repeat(padvalue, n-1))]*n) + +def chunker( n, iterable, in_place = False ): + """ + Like L{grouper} but designed to scale for larger L{n}. 
Also, does + not perform padding. The end of the stream is reached when we + yield a chunk with fewer than L{n} items. + """ + i = -1 + chunk = [ None ] * n + for i, item in enumerate( iterable ): + chunk[ i % n ] = item + if ( i + 1 ) % n == 0: + yield chunk + if not in_place: chunk = [ None ] * n + else: + if i % n < n - 1: + del chunk[ ( i + 1 ) % n : ] + yield chunk + +def countstep(start, step): + """ + Generate [start, start+step, start+2*step, start+3*step, ...]. + """ + i = start + while True: + yield i + i += step + +def take(n, seq): + return list(islice(seq, n)) + +def delimit(sep, xs): + for x in xs: + yield x + break + for x in xs: + yield sep + yield x + +# TODO not quite right +def interleave(xs, ys): + return concat(izip( xs, ys )) Deleted: python-commons/tags/0.4/src/commons/setup.py =================================================================== --- python-commons/trunk/src/commons/setup.py 2008-04-24 16:13:04 UTC (rev 679) +++ python-commons/tags/0.4/src/commons/setup.py 2008-05-08 08:29:25 UTC (rev 724) @@ -1,98 +0,0 @@ -#!/usr/bin/env python -# -*- mode: python; tab-width: 4; indent-tabs-mode: nil; py-indent-offset: 4; -*- -# vim:ft=python:et:sw=4:ts=4 - -""" -Common code for setup.py files. -""" - -arg_keys = """ -name -version -author -author_email -description: Summary -download_url: Download-url -long_description: Description -keywords: Keywords -url: Home-page -license -classifiers: Classifier -platforms: Platform -""" - -import sys -if not hasattr(sys, "version_info") or sys.version_info < (2, 3): - from distutils.core import setup - _setup = setup - def setup(**kwargs): - for key in [ - # distutils >= Python 2.3 args - # XXX probably download_url came... [truncated message content] |
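The `safe_pickler`/`read_pickle` pair in the seqs.py module above implements length-prefixed pickle framing over a byte stream: each object is serialized with a 4-byte length header so the reader knows how many bytes to consume. A minimal sketch of the same idea in modern Python 3 follows; the `'!i'` header format and the `dump_framed`/`load_framed` names are illustrative choices, not the module's actual API.

```python
import pickle
import struct
from io import BytesIO

# Sketch of length-prefixed pickle framing, as in safe_pickler/read_pickle
# above. Hypothetical helper names; '!i' (big-endian int) is an assumption.

def dump_framed(obj):
    """Serialize obj, prefixed with the pickled payload's length in bytes."""
    payload = pickle.dumps(obj)
    return struct.pack('!i', len(payload)) + payload

def load_framed(read):
    """Read one framed object via read(n); return None on a short read."""
    header = read(4)
    if len(header) < 4:
        return None
    (length,) = struct.unpack('!i', header)
    payload = read(length)
    if len(payload) < length:
        return None
    return pickle.loads(payload)

stream = BytesIO(dump_framed({'seqno': 42}) + dump_framed([1, 2, 3]))
first = load_framed(stream.read)   # {'seqno': 42}
second = load_framed(stream.read)  # [1, 2, 3]
```

Framing like this lets a reader pull complete objects out of a socket or file without guessing where one pickle ends and the next begins, which is what `read_pickles` above relies on.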
From: <yan...@us...> - 2008-05-08 08:27:31
|
Revision: 723 http://assorted.svn.sourceforge.net/assorted/?rev=723&view=rev Author: yangzhang Date: 2008-05-08 01:27:37 -0700 (Thu, 08 May 2008) Log Message: ----------- updated version Modified Paths: -------------- python-commons/trunk/README python-commons/trunk/publish.bash python-commons/trunk/setup.py Modified: python-commons/trunk/README =================================================================== --- python-commons/trunk/README 2008-05-08 08:26:56 UTC (rev 722) +++ python-commons/trunk/README 2008-05-08 08:27:37 UTC (rev 723) @@ -37,9 +37,14 @@ Changes ------- -version 0.3.1 +version 0.4 - removed extraneous debug print statements +- added `logout()` context manager +- added `seq()`, `default_if_none()` +- fixed missing `import` bug +- released for [Mailing List + Filter](http://assorted.sf.net/mailing-list-filter/) version 0.3 Modified: python-commons/trunk/publish.bash =================================================================== --- python-commons/trunk/publish.bash 2008-05-08 08:26:56 UTC (rev 722) +++ python-commons/trunk/publish.bash 2008-05-08 08:27:37 UTC (rev 723) @@ -5,7 +5,7 @@ } fullname='Python Commons' -version=0.3.1 +version=0.4 license=psf websrcs=( README ) rels=( pypi: ) Modified: python-commons/trunk/setup.py =================================================================== --- python-commons/trunk/setup.py 2008-05-08 08:26:56 UTC (rev 722) +++ python-commons/trunk/setup.py 2008-05-08 08:27:37 UTC (rev 723) @@ -9,7 +9,7 @@ pkg_info_text = """ Metadata-Version: 1.1 Name: python-commons -Version: 0.3.1 +Version: 0.4 Author: Yang Zhang Author-email: yaaang NOSPAM at REMOVECAPS gmail Home-page: http://assorted.sourceforge.net/python-commons This was sent by the SourceForge.net collaborative development platform, the world's largest Open Source development site. |
From: <yan...@us...> - 2008-05-08 08:26:51
|
Revision: 722 http://assorted.svn.sourceforge.net/assorted/?rev=722&view=rev Author: yangzhang Date: 2008-05-08 01:26:56 -0700 (Thu, 08 May 2008) Log Message: ----------- amoved ideas list to README Modified Paths: -------------- mailing-list-filter/trunk/src/mlf.py Modified: mailing-list-filter/trunk/src/mlf.py =================================================================== --- mailing-list-filter/trunk/src/mlf.py 2008-05-08 08:26:38 UTC (rev 721) +++ mailing-list-filter/trunk/src/mlf.py 2008-05-08 08:26:56 UTC (rev 722) @@ -6,15 +6,6 @@ is performed via the In-Reply-To and References headers. """ -# Currently, we assume that the server specification points to a mailbox -# containing all messages (both sent and received), and a message is determined -# to have been sent by you by looking at the From: header field. This should -# work well with Gmail. An alternative strategy is to look through two folders, -# one that's the Inbox and one that's the Sent mailbox, and treat all messages -# in Sent as having been sent by you. -# -# Possible future tasks: implement incremental maintenance of local cache. - from __future__ import with_statement from collections import defaultdict from email import message_from_string This was sent by the SourceForge.net collaborative development platform, the world's largest Open Source development site. |
From: <yan...@us...> - 2008-05-08 08:26:32
|
Revision: 721 http://assorted.svn.sourceforge.net/assorted/?rev=721&view=rev Author: yangzhang Date: 2008-05-08 01:26:38 -0700 (Thu, 08 May 2008) Log Message: ----------- accounting for rename Modified Paths: -------------- mailing-list-filter/trunk/setup.py Modified: mailing-list-filter/trunk/setup.py =================================================================== --- mailing-list-filter/trunk/setup.py 2008-05-08 08:04:04 UTC (rev 720) +++ mailing-list-filter/trunk/setup.py 2008-05-08 08:26:38 UTC (rev 721) @@ -25,4 +25,4 @@ Classifier: Topic :: Communications :: Email """ -run_setup(pkg_info_text, scripts = ['src/filter.py']) +run_setup(pkg_info_text, scripts = ['src/mlf.py']) This was sent by the SourceForge.net collaborative development platform, the world's largest Open Source development site. |
From: <yan...@us...> - 2008-05-08 08:04:03
|
Revision: 720 http://assorted.svn.sourceforge.net/assorted/?rev=720&view=rev Author: yangzhang Date: 2008-05-08 01:04:04 -0700 (Thu, 08 May 2008) Log Message: ----------- added todos and setup to readme Modified Paths: -------------- mailing-list-filter/trunk/README Modified: mailing-list-filter/trunk/README =================================================================== --- mailing-list-filter/trunk/README 2008-05-08 08:03:44 UTC (rev 719) +++ mailing-list-filter/trunk/README 2008-05-08 08:04:04 UTC (rev 720) @@ -1,6 +1,3 @@ -% Mailing List Filter -% Yang Zhang - Overview -------- @@ -35,3 +32,32 @@ keep those active so that I can get immediate first-pass filtering. I execute this script on a daily basis to perform second-pass filtering/unfiltering to catch those false negatives that may have been missed. + +Setup +----- + +Requirements: + +- [argparse](http://argparse.python-hosting.com/) +- [Python Commons](http://assorted.sf.net/python-commons/) 0.4 +- [path](http://www.jorendorff.com/articles/python/path/) + +Install the program using the standard `setup.py` program. + +Future Work Ideas +----------------- + +- Currently, we assume that the server specification points to a mailbox + containing all messages (both sent and received), and a message is determined + to have been sent by you by looking at the From: header field. This works + well with Gmail. An alternative strategy is to look through two folders, one + that's the Inbox and one that's the Sent mailbox, and treat all messages in + Sent as having been sent by you. This is presumably how most other IMAP + servers work. + +- Implement incremental maintenance of local cache. + +- Accept custom operations for filtered/unfiltered messages + (trashing/untrashing, labeling/unlabeling, etc.). + +- Refactor the message fetching/management part out into its own library. This was sent by the SourceForge.net collaborative development platform, the world's largest Open Source development site. |
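The README above describes grouping messages into threads by following the In-Reply-To and References headers. That grouping amounts to taking connected components of an undirected graph over Message-IDs, which can be sketched as below; the `group_threads` helper and the sample messages are made up for illustration and are not part of the script.

```python
from collections import defaultdict

# Sketch of thread grouping via In-Reply-To/References, as the README
# describes: build an undirected graph over Message-IDs, then collect
# connected components with an iterative DFS. Sample data is invented.

def group_threads(messages):
    """messages: dict mapping Message-ID -> list of referenced IDs."""
    graph = defaultdict(set)
    for mid, refs in messages.items():
        graph[mid]  # ensure isolated messages appear as nodes
        for ref in refs:
            if ref in messages:  # ignore references to unknown IDs
                graph[mid].add(ref)
                graph[ref].add(mid)
    seen, threads = set(), []
    for mid in graph:
        if mid in seen:
            continue
        stack, component = [mid], []
        while stack:  # iterative DFS over one connected component
            cur = stack.pop()
            if cur in seen:
                continue
            seen.add(cur)
            component.append(cur)
            stack.extend(graph[cur] - seen)
        threads.append(sorted(component))
    return sorted(threads)

msgs = {'<a>': [], '<b>': ['<a>'], '<c>': ['<b>'], '<d>': []}
threads = group_threads(msgs)  # [['<a>', '<b>', '<c>'], ['<d>']]
```

Once messages are bucketed into threads this way, a whole thread can be starred or unstarred as soon as any one message in it involves the user, which is the filtering behavior the README outlines.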
From: <yan...@us...> - 2008-05-08 08:03:39
|
Revision: 719 http://assorted.svn.sourceforge.net/assorted/?rev=719&view=rev Author: yangzhang Date: 2008-05-08 01:03:44 -0700 (Thu, 08 May 2008) Log Message: ----------- added note on trove classifier reference Modified Paths: -------------- python-commons/trunk/src/commons/setup.py Modified: python-commons/trunk/src/commons/setup.py =================================================================== --- python-commons/trunk/src/commons/setup.py 2008-05-08 07:48:18 UTC (rev 718) +++ python-commons/trunk/src/commons/setup.py 2008-05-08 08:03:44 UTC (rev 719) @@ -3,7 +3,9 @@ # vim:ft=python:et:sw=4:ts=4 """ -Common code for setup.py files. +Common code for setup.py files. Details about the Trove classifiers are +available at +U{http://pypi.python.org/pypi?%3Aaction=list_classifiers}. """ arg_keys = """ This was sent by the SourceForge.net collaborative development platform, the world's largest Open Source development site. |
From: <yan...@us...> - 2008-05-08 07:48:12
|
Revision: 718 http://assorted.svn.sourceforge.net/assorted/?rev=718&view=rev Author: yangzhang Date: 2008-05-08 00:48:18 -0700 (Thu, 08 May 2008) Log Message: ----------- renamed filter script Added Paths: ----------- mailing-list-filter/trunk/src/mlf.py Removed Paths: ------------- mailing-list-filter/trunk/src/filter.py Deleted: mailing-list-filter/trunk/src/filter.py =================================================================== --- mailing-list-filter/trunk/src/filter.py 2008-05-08 06:55:12 UTC (rev 717) +++ mailing-list-filter/trunk/src/filter.py 2008-05-08 07:48:18 UTC (rev 718) @@ -1,245 +0,0 @@ -#!/usr/bin/env python - -""" -Given a Gmail IMAP mailbox, star all messages in which you were a participant -(either a sender or an explicit recipient in To: or Cc:), where thread grouping -is performed via the In-Reply-To and References headers. -""" - -# Currently, we assume that the server specification points to a mailbox -# containing all messages (both sent and received), and a message is determined -# to have been sent by you by looking at the From: header field. This should -# work well with Gmail. An alternative strategy is to look through two folders, -# one that's the Inbox and one that's the Sent mailbox, and treat all messages -# in Sent as having been sent by you. -# -# Possible future tasks: implement incremental maintenance of local cache. 
- -from __future__ import with_statement -from collections import defaultdict -from email import message_from_string -from getpass import getpass -from imaplib import IMAP4_SSL -from argparse import ArgumentParser -from path import path -from re import match -from functools import partial -from itertools import count -from commons.decs import pickle_memoized -from commons.files import cleanse_filename, soft_makedirs -from commons.log import * -from commons.misc import default_if_none, seq -from commons.networking import logout -from commons.seqs import concat, grouper -from commons.startup import run_main -from contextlib import closing -import logging -from commons import log - -info = partial(log.info, 'main') -debug = partial(log.debug, 'main') -warning = partial(log.warning, 'main') -error = partial(log.error, 'main') -die = partial(log.die, 'main') - -def thread_dfs(msg, tid, tid2msgs): - assert msg.tid is None - msg.tid = tid - tid2msgs[tid].append(msg) - for ref in msg.refs: - if ref.tid is None: - thread_dfs(ref, tid, tid2msgs) - else: - assert ref.tid == tid - -def getmail(imap): - info( 'finding max UID' ) - # We use UIDs rather than the default of sequence numbers because UIDs are - # guaranteed to be persistent across sessions. This means that we can, for - # instance, fetch messages in one session and operate on this locally cached - # data before marking messages in a separate session. - ok, [uids] = imap.uid('SEARCH', None, 'ALL') - maxuid = int( uids.split()[-1] ) - del uids - - info( 'actually fetching the messages in chunks up to max', maxuid ) - # The syntax/fields of the FETCH command is documented in RFC 2060. Also, - # this article contains a brief overview: - # http://www.devshed.com/c/a/Python/Python-Email-Libraries-part-2-IMAP/3/ - # BODY.PEEK prevents the message from automatically being flagged as \Seen. 
- query = '(FLAGS BODY.PEEK[HEADER.FIELDS ' \ - '(Message-ID References In-Reply-To From To Cc Subject)])' - step = 1000 - return list( concat( - seq( lambda: info('fetching', start, 'to', start + step - 1), - lambda: imap.uid('FETCH', '%d:%d' % (start, start + step - 1), - query)[1] ) - for start in xrange(1, maxuid + 1, step) ) ) - -def main(argv): - p = ArgumentParser(description = __doc__) - p.add_argument('--credfile', default = path( '~/.mlf.auth' ).expanduser(), - help = """File containing your login credentials, with the username on the - first line and the password on the second line. Ignored iff --prompt.""") - p.add_argument('--cachedir', default = path( '~/.mlf.cache' ).expanduser(), - help = "Directory to use for caching our data.") - p.add_argument('--prompt', action = 'store_true', - help = "Interactively prompt for the username and password.") - p.add_argument('--pretend', action = 'store_true', - help = """Do not actually carry out any updates to the server. Use in - conjunction with --debug to observe what would happen.""") - p.add_argument('--no-mark-unseen', action = 'store_true', - help = "Do not mark newly revelant threads as unread.") - p.add_argument('--no-mark-seen', action = 'store_true', - help = "Do not mark newly irrevelant threads as read.") - p.add_argument('--debug', action = 'append', - help = """Enable logging for messages of the given flags. 
Flags include: - refs (references to missing Message-IDs), dups (duplicate Message-IDs), - main (the main program logic), and star (which messages are being - starred), unstar (which messages are being unstarred).""") - p.add_argument('sender', - help = "Your email address.") - p.add_argument('server', - help = "The server in the format: <host>[:<port>][/<mailbox>].") - - cfg = p.parse_args(argv[1:]) - - config_logging(level = logging.ERROR, do_console = True, flags = cfg.debug) - - if cfg.prompt: - print "username:", - cfg.user = raw_input() - print "password:", - cfg.passwd = getpass() - else: - with file(cfg.credfile) as f: - [cfg.user, cfg.passwd] = map(lambda x: x.strip('\r\n'), f.readlines()) - - try: - m = match( r'(?P<host>[^:/]+)(:(?P<port>\d+))?(/(?P<mailbox>.+))?$', - cfg.server ) - cfg.host = m.group('host') - cfg.port = int( default_if_none(m.group('port'), 993) ) - cfg.mailbox = default_if_none(m.group('mailbox'), 'INBOX') - except: - p.error('Need to specify the server in the correct format.') - - soft_makedirs(cfg.cachedir) - - with logout(IMAP4_SSL(cfg.host, cfg.port)) as imap: - imap.login(cfg.user, cfg.passwd) - # Close is only valid in the authenticated state. - with closing(imap) as imap: - # Select the main mailbox (INBOX). - imap.select(cfg.mailbox) - - # Fetch message IDs, references, and senders. - xs = pickle_memoized \ - (lambda imap: cfg.cachedir / cleanse_filename(cfg.sender)) \ - (getmail) \ - (imap) - - log.debug('fetched', xs) - - info('building message-id map and determining the set of messages sent ' - 'by you or addressed to you (the "source set")') - - srcs = [] - mid2msg = {} - # Every second item is just a closing paren. 
- # Example data: - # [('13300 (BODY[HEADER.FIELDS (Message-ID References In-Reply-To)] {67}', - # 'Message-ID: <mai...@py...>\r\n\r\n'), - # ')', - # ('13301 (BODY[HEADER.FIELDS (Message-ID References In-Reply-To)] {59}', - # 'Message-Id: <200...@hv...>\r\n\r\n'), - # ')', - # ('13302 (BODY[HEADER.FIELDS (Message-ID References In-Reply-To)] {92}', - # 'Message-ID: <C43EAFC0.2E3AE%ni...@ya...>\r\nIn-Reply-To: <481...@gm...>\r\n\r\n')] - for (envelope, data), paren in grouper(2, xs): - # Parse the body. - msg = message_from_string(data) - - # Parse the envelope. - m = match( - r"(?P<seqno>\d+) \(UID (?P<uid>\d+) FLAGS \((?P<flags>[^)]+)\)", - envelope ) - msg.seqno = m.group('seqno') - msg.uid = m.group('uid') - msg.flags = m.group('flags').split() - - # Prepare a container for references to other msgs, and initialize the - # thread ID. - msg.refs = [] - msg.tid = None - - # Add these to the map. - if msg['Message-ID'] in mid2msg: - log.warning( 'dups', 'duplicate message IDs:', - msg['Message-ID'], msg['Subject'] ) - mid2msg[ msg['Message-ID'] ] = msg - - # Add to "srcs" set if sent by us or addressed to us. - if ( cfg.sender in default_if_none( msg['From'], '' ) or - cfg.sender in default_if_none( msg['To'], '' ) or - cfg.sender in default_if_none( msg['Cc'], '' ) ): - srcs.append( msg ) - - info( 'constructing undirected graph' ) - - for mid, msg in mid2msg.iteritems(): - # Extract any references. - irt = default_if_none( msg.get_all('In-Reply-To'), [] ) - refs = default_if_none( msg.get_all('References'), [] ) - refs = set( ' '.join( irt + refs ).replace('><', '> <').split() ) - - # Connect nodes in graph bidirectionally. Ignore references to MIDs - # that don't exist. - for ref in refs: - try: - refmsg = mid2msg[ref] - # We can use lists/append (not worry about duplicates) because the - # original sources should be acyclic. If a -> b, then there is no b -> - # a, so when crawling a we can add a <-> b without worrying that later - # we may re-add b -> a. 
- msg.refs.append(refmsg) - refmsg.refs.append(msg) - except: - log.warning( 'refs', ref ) - - info('finding connected components (grouping the messages into threads)') - - tids = count() - tid2msgs = defaultdict(list) - for mid, msg in mid2msg.iteritems(): - if msg.tid is None: - thread_dfs(msg, tids.next(), tid2msgs) - - info( 'starring the relevant threads, in which I am a participant' ) - - rel_tids = set() - for srcmsg in srcs: - if srcmsg.tid not in rel_tids: - rel_tids.add(srcmsg.tid) - for msg in tid2msgs[srcmsg.tid]: - if r'\Flagged' not in msg.flags: - log.info( 'star', '\n', msg ) - if not cfg.pretend: - imap.uid('STORE', msg.uid, '+FLAGS', r'\Flagged') - if not cfg.no_mark_unseen and r'\Seen' in msg.flags: - imap.uid('STORE', msg.uid, '-FLAGS', r'\Seen') - - info( 'unstarring irrelevant threads, in which I am not a participant' ) - - all_tids = set( tid2msgs.iterkeys() ) - irrel_tids = all_tids - rel_tids - for tid in irrel_tids: - for msg in tid2msgs[tid]: - if r'\Flagged' in msg.flags: - log.info( 'unstar', '\n', msg ) - if not cfg.pretend: - imap.uid('STORE', msg.uid, '-FLAGS', r'\Flagged') - if not cfg.no_mark_seen and r'\Seen' not in msg.flags: - imap.uid('STORE', msg.uid, '+FLAGS', r'\Seen') - -run_main() Copied: mailing-list-filter/trunk/src/mlf.py (from rev 716, mailing-list-filter/trunk/src/filter.py) =================================================================== --- mailing-list-filter/trunk/src/mlf.py (rev 0) +++ mailing-list-filter/trunk/src/mlf.py 2008-05-08 07:48:18 UTC (rev 718) @@ -0,0 +1,245 @@ +#!/usr/bin/env python + +""" +Given a Gmail IMAP mailbox, star all messages in which you were a participant +(either a sender or an explicit recipient in To: or Cc:), where thread grouping +is performed via the In-Reply-To and References headers. 
+""" + +# Currently, we assume that the server specification points to a mailbox +# containing all messages (both sent and received), and a message is determined +# to have been sent by you by looking at the From: header field. This should +# work well with Gmail. An alternative strategy is to look through two folders, +# one that's the Inbox and one that's the Sent mailbox, and treat all messages +# in Sent as having been sent by you. +# +# Possible future tasks: implement incremental maintenance of local cache. + +from __future__ import with_statement +from collections import defaultdict +from email import message_from_string +from getpass import getpass +from imaplib import IMAP4_SSL +from argparse import ArgumentParser +from path import path +from re import match +from functools import partial +from itertools import count +from commons.decs import pickle_memoized +from commons.files import cleanse_filename, soft_makedirs +from commons.log import * +from commons.misc import default_if_none, seq +from commons.networking import logout +from commons.seqs import concat, grouper +from commons.startup import run_main +from contextlib import closing +import logging +from commons import log + +info = partial(log.info, 'main') +debug = partial(log.debug, 'main') +warning = partial(log.warning, 'main') +error = partial(log.error, 'main') +die = partial(log.die, 'main') + +def thread_dfs(msg, tid, tid2msgs): + assert msg.tid is None + msg.tid = tid + tid2msgs[tid].append(msg) + for ref in msg.refs: + if ref.tid is None: + thread_dfs(ref, tid, tid2msgs) + else: + assert ref.tid == tid + +def getmail(imap): + info( 'finding max UID' ) + # We use UIDs rather than the default of sequence numbers because UIDs are + # guaranteed to be persistent across sessions. This means that we can, for + # instance, fetch messages in one session and operate on this locally cached + # data before marking messages in a separate session. 
+ ok, [uids] = imap.uid('SEARCH', None, 'ALL') + maxuid = int( uids.split()[-1] ) + del uids + + info( 'actually fetching the messages in chunks up to max', maxuid ) + # The syntax/fields of the FETCH command is documented in RFC 2060. Also, + # this article contains a brief overview: + # http://www.devshed.com/c/a/Python/Python-Email-Libraries-part-2-IMAP/3/ + # BODY.PEEK prevents the message from automatically being flagged as \Seen. + query = '(FLAGS BODY.PEEK[HEADER.FIELDS ' \ + '(Message-ID References In-Reply-To From To Cc Subject)])' + step = 1000 + return list( concat( + seq( lambda: info('fetching', start, 'to', start + step - 1), + lambda: imap.uid('FETCH', '%d:%d' % (start, start + step - 1), + query)[1] ) + for start in xrange(1, maxuid + 1, step) ) ) + +def main(argv): + p = ArgumentParser(description = __doc__) + p.add_argument('--credfile', default = path( '~/.mlf.auth' ).expanduser(), + help = """File containing your login credentials, with the username on the + first line and the password on the second line. Ignored iff --prompt.""") + p.add_argument('--cachedir', default = path( '~/.mlf.cache' ).expanduser(), + help = "Directory to use for caching our data.") + p.add_argument('--prompt', action = 'store_true', + help = "Interactively prompt for the username and password.") + p.add_argument('--pretend', action = 'store_true', + help = """Do not actually carry out any updates to the server. Use in + conjunction with --debug to observe what would happen.""") + p.add_argument('--no-mark-unseen', action = 'store_true', + help = "Do not mark newly revelant threads as unread.") + p.add_argument('--no-mark-seen', action = 'store_true', + help = "Do not mark newly irrevelant threads as read.") + p.add_argument('--debug', action = 'append', + help = """Enable logging for messages of the given flags. 
Flags include: + refs (references to missing Message-IDs), dups (duplicate Message-IDs), + main (the main program logic), and star (which messages are being + starred), unstar (which messages are being unstarred).""") + p.add_argument('sender', + help = "Your email address.") + p.add_argument('server', + help = "The server in the format: <host>[:<port>][/<mailbox>].") + + cfg = p.parse_args(argv[1:]) + + config_logging(level = logging.ERROR, do_console = True, flags = cfg.debug) + + if cfg.prompt: + print "username:", + cfg.user = raw_input() + print "password:", + cfg.passwd = getpass() + else: + with file(cfg.credfile) as f: + [cfg.user, cfg.passwd] = map(lambda x: x.strip('\r\n'), f.readlines()) + + try: + m = match( r'(?P<host>[^:/]+)(:(?P<port>\d+))?(/(?P<mailbox>.+))?$', + cfg.server ) + cfg.host = m.group('host') + cfg.port = int( default_if_none(m.group('port'), 993) ) + cfg.mailbox = default_if_none(m.group('mailbox'), 'INBOX') + except: + p.error('Need to specify the server in the correct format.') + + soft_makedirs(cfg.cachedir) + + with logout(IMAP4_SSL(cfg.host, cfg.port)) as imap: + imap.login(cfg.user, cfg.passwd) + # Close is only valid in the authenticated state. + with closing(imap) as imap: + # Select the main mailbox (INBOX). + imap.select(cfg.mailbox) + + # Fetch message IDs, references, and senders. + xs = pickle_memoized \ + (lambda imap: cfg.cachedir / cleanse_filename(cfg.sender)) \ + (getmail) \ + (imap) + + log.debug('fetched', xs) + + info('building message-id map and determining the set of messages sent ' + 'by you or addressed to you (the "source set")') + + srcs = [] + mid2msg = {} + # Every second item is just a closing paren. 
+ # Example data: + # [('13300 (BODY[HEADER.FIELDS (Message-ID References In-Reply-To)] {67}', + # 'Message-ID: <mai...@py...>\r\n\r\n'), + # ')', + # ('13301 (BODY[HEADER.FIELDS (Message-ID References In-Reply-To)] {59}', + # 'Message-Id: <200...@hv...>\r\n\r\n'), + # ')', + # ('13302 (BODY[HEADER.FIELDS (Message-ID References In-Reply-To)] {92}', + # 'Message-ID: <C43EAFC0.2E3AE%ni...@ya...>\r\nIn-Reply-To: <481...@gm...>\r\n\r\n')] + for (envelope, data), paren in grouper(2, xs): + # Parse the body. + msg = message_from_string(data) + + # Parse the envelope. + m = match( + r"(?P<seqno>\d+) \(UID (?P<uid>\d+) FLAGS \((?P<flags>[^)]+)\)", + envelope ) + msg.seqno = m.group('seqno') + msg.uid = m.group('uid') + msg.flags = m.group('flags').split() + + # Prepare a container for references to other msgs, and initialize the + # thread ID. + msg.refs = [] + msg.tid = None + + # Add these to the map. + if msg['Message-ID'] in mid2msg: + log.warning( 'dups', 'duplicate message IDs:', + msg['Message-ID'], msg['Subject'] ) + mid2msg[ msg['Message-ID'] ] = msg + + # Add to "srcs" set if sent by us or addressed to us. + if ( cfg.sender in default_if_none( msg['From'], '' ) or + cfg.sender in default_if_none( msg['To'], '' ) or + cfg.sender in default_if_none( msg['Cc'], '' ) ): + srcs.append( msg ) + + info( 'constructing undirected graph' ) + + for mid, msg in mid2msg.iteritems(): + # Extract any references. + irt = default_if_none( msg.get_all('In-Reply-To'), [] ) + refs = default_if_none( msg.get_all('References'), [] ) + refs = set( ' '.join( irt + refs ).replace('><', '> <').split() ) + + # Connect nodes in graph bidirectionally. Ignore references to MIDs + # that don't exist. + for ref in refs: + try: + refmsg = mid2msg[ref] + # We can use lists/append (not worry about duplicates) because the + # original sources should be acyclic. If a -> b, then there is no b -> + # a, so when crawling a we can add a <-> b without worrying that later + # we may re-add b -> a. 
+ msg.refs.append(refmsg) + refmsg.refs.append(msg) + except: + log.warning( 'refs', ref ) + + info('finding connected components (grouping the messages into threads)') + + tids = count() + tid2msgs = defaultdict(list) + for mid, msg in mid2msg.iteritems(): + if msg.tid is None: + thread_dfs(msg, tids.next(), tid2msgs) + + info( 'starring the relevant threads, in which I am a participant' ) + + rel_tids = set() + for srcmsg in srcs: + if srcmsg.tid not in rel_tids: + rel_tids.add(srcmsg.tid) + for msg in tid2msgs[srcmsg.tid]: + if r'\Flagged' not in msg.flags: + log.info( 'star', '\n', msg ) + if not cfg.pretend: + imap.uid('STORE', msg.uid, '+FLAGS', r'\Flagged') + if not cfg.no_mark_unseen and r'\Seen' in msg.flags: + imap.uid('STORE', msg.uid, '-FLAGS', r'\Seen') + + info( 'unstarring irrelevant threads, in which I am not a participant' ) + + all_tids = set( tid2msgs.iterkeys() ) + irrel_tids = all_tids - rel_tids + for tid in irrel_tids: + for msg in tid2msgs[tid]: + if r'\Flagged' in msg.flags: + log.info( 'unstar', '\n', msg ) + if not cfg.pretend: + imap.uid('STORE', msg.uid, '-FLAGS', r'\Flagged') + if not cfg.no_mark_seen and r'\Seen' not in msg.flags: + imap.uid('STORE', msg.uid, '+FLAGS', r'\Seen') + +run_main() This was sent by the SourceForge.net collaborative development platform, the world's largest Open Source development site. |
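The listing above groups messages into threads by depth-first search over the undirected reference graph (`thread_dfs` plus the `tid2msgs` loop). The same idea can be sketched in isolation with plain dicts instead of the script's message objects; `assign_threads` is an illustrative name, not part of the script, and an iterative DFS is used here to sidestep Python's recursion limit on very long threads:

```python
from collections import defaultdict

def assign_threads(refs_by_mid):
    """Group Message-IDs into threads (connected components).

    refs_by_mid maps each Message-ID to the set of Message-IDs it
    references; edges are treated as undirected, as in the script.
    """
    # Build an undirected adjacency map, dropping references to
    # Message-IDs we never fetched (the script logs these as 'refs').
    adj = defaultdict(set)
    for mid, refs in refs_by_mid.items():
        adj[mid]  # make sure isolated messages get a node too
        for ref in refs:
            if ref in refs_by_mid:
                adj[mid].add(ref)
                adj[ref].add(mid)

    tid_by_mid = {}
    tid2msgs = defaultdict(list)
    next_tid = 0
    for root in adj:
        if root in tid_by_mid:
            continue
        # Iterative DFS: mark nodes as visited when pushed, so
        # nothing is processed twice.
        stack = [root]
        tid_by_mid[root] = next_tid
        while stack:
            mid = stack.pop()
            tid2msgs[next_tid].append(mid)
            for other in adj[mid]:
                if other not in tid_by_mid:
                    tid_by_mid[other] = next_tid
                    stack.append(other)
        next_tid += 1
    return tid_by_mid, dict(tid2msgs)
```

The recursive `thread_dfs` in the script performs the same walk; the iterative variant only matters for unusually deep reference chains.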
From: <yan...@us...> - 2008-05-08 06:55:12
|
Revision: 717 http://assorted.svn.sourceforge.net/assorted/?rev=717&view=rev Author: yangzhang Date: 2008-05-07 23:55:12 -0700 (Wed, 07 May 2008) Log Message: ----------- removed epydoc since this is not a lib Modified Paths: -------------- mailing-list-filter/trunk/publish.bash Modified: mailing-list-filter/trunk/publish.bash =================================================================== --- mailing-list-filter/trunk/publish.bash 2008-05-08 06:54:53 UTC (rev 716) +++ mailing-list-filter/trunk/publish.bash 2008-05-08 06:55:12 UTC (rev 717) @@ -1,9 +1,5 @@ #!/usr/bin/env bash -post-stage() { - epydoc -o $stagedir/doc src/commons/ -} - fullname='Mailing List Filter' version=0.1 license=psf This was sent by the SourceForge.net collaborative development platform, the world's largest Open Source development site. |
From: <yan...@us...> - 2008-05-08 06:54:52
|
Revision: 716 http://assorted.svn.sourceforge.net/assorted/?rev=716&view=rev Author: yangzhang Date: 2008-05-07 23:54:53 -0700 (Wed, 07 May 2008) Log Message: ----------- it's alive! ready to release... Modified Paths: -------------- mailing-list-filter/trunk/src/filter.py Modified: mailing-list-filter/trunk/src/filter.py =================================================================== --- mailing-list-filter/trunk/src/filter.py 2008-05-08 06:54:29 UTC (rev 715) +++ mailing-list-filter/trunk/src/filter.py 2008-05-08 06:54:53 UTC (rev 716) @@ -1,18 +1,20 @@ #!/usr/bin/env python """ -Given an IMAP mailbox, mark all messages as read except for those threads in -which you were a participant, where thread grouping is performed via the -In-Reply-To and References headers. - -Currently, we assume that the server specification points to a mailbox -containing all messages (both sent and received), and a message is determined -to have been sent by you by looking at the From: header field. This should work -well with Gmail. An alternative strategy is to look through two folders, one -that's the Inbox and one that's the Sent mailbox, and treat all messages in -Sent as having been sent by you. +Given a Gmail IMAP mailbox, star all messages in which you were a participant +(either a sender or an explicit recipient in To: or Cc:), where thread grouping +is performed via the In-Reply-To and References headers. """ +# Currently, we assume that the server specification points to a mailbox +# containing all messages (both sent and received), and a message is determined +# to have been sent by you by looking at the From: header field. This should +# work well with Gmail. An alternative strategy is to look through two folders, +# one that's the Inbox and one that's the Sent mailbox, and treat all messages +# in Sent as having been sent by you. +# +# Possible future tasks: implement incremental maintenance of local cache. 
+ from __future__ import with_statement from collections import defaultdict from email import message_from_string @@ -22,41 +24,59 @@ from path import path from re import match from functools import partial +from itertools import count from commons.decs import pickle_memoized +from commons.files import cleanse_filename, soft_makedirs from commons.log import * -from commons.files import cleanse_filename, soft_makedirs -from commons.misc import default_if_none +from commons.misc import default_if_none, seq from commons.networking import logout from commons.seqs import concat, grouper from commons.startup import run_main from contextlib import closing +import logging +from commons import log -info = partial(info, '') -debug = partial(debug, '') -error = partial(error, '') -die = partial(die, '') +info = partial(log.info, 'main') +debug = partial(log.debug, 'main') +warning = partial(log.warning, 'main') +error = partial(log.error, 'main') +die = partial(log.die, 'main') +def thread_dfs(msg, tid, tid2msgs): + assert msg.tid is None + msg.tid = tid + tid2msgs[tid].append(msg) + for ref in msg.refs: + if ref.tid is None: + thread_dfs(ref, tid, tid2msgs) + else: + assert ref.tid == tid + def getmail(imap): - info( 'finding max seqno' ) - ok, [seqnos] = imap.search(None, 'ALL') - maxseqno = int( seqnos.split()[-1] ) - del seqnos + info( 'finding max UID' ) + # We use UIDs rather than the default of sequence numbers because UIDs are + # guaranteed to be persistent across sessions. This means that we can, for + # instance, fetch messages in one session and operate on this locally cached + # data before marking messages in a separate session. + ok, [uids] = imap.uid('SEARCH', None, 'ALL') + maxuid = int( uids.split()[-1] ) + del uids - info( 'actually fetching the messages in chunks' ) + info( 'actually fetching the messages in chunks up to max', maxuid ) # The syntax/fields of the FETCH command is documented in RFC 2060. 
Also, # this article contains a brief overview: # http://www.devshed.com/c/a/Python/Python-Email-Libraries-part-2-IMAP/3/ # BODY.PEEK prevents the message from automatically being flagged as \Seen. - query = '(FLAGS BODY.PEEK[HEADER.FIELDS (Message-ID References In-Reply-To From Subject)])' + query = '(FLAGS BODY.PEEK[HEADER.FIELDS ' \ + '(Message-ID References In-Reply-To From To Cc Subject)])' step = 1000 return list( concat( - imap.fetch('%d:%d' % (start, start + step - 1), query)[1] - for start in xrange(1, maxseqno + 1, step) ) ) + seq( lambda: info('fetching', start, 'to', start + step - 1), + lambda: imap.uid('FETCH', '%d:%d' % (start, start + step - 1), + query)[1] ) + for start in xrange(1, maxuid + 1, step) ) ) def main(argv): - import logging - config_logging(level = logging.INFO, do_console = True) - p = ArgumentParser(description = __doc__) p.add_argument('--credfile', default = path( '~/.mlf.auth' ).expanduser(), help = """File containing your login credentials, with the username on the @@ -65,6 +85,18 @@ help = "Directory to use for caching our data.") p.add_argument('--prompt', action = 'store_true', help = "Interactively prompt for the username and password.") + p.add_argument('--pretend', action = 'store_true', + help = """Do not actually carry out any updates to the server. Use in + conjunction with --debug to observe what would happen.""") + p.add_argument('--no-mark-unseen', action = 'store_true', + help = "Do not mark newly relevant threads as unread.") + p.add_argument('--no-mark-seen', action = 'store_true', + help = "Do not mark newly irrelevant threads as read.") + p.add_argument('--debug', action = 'append', + help = """Enable logging for messages of the given flags.
Flags include: + refs (references to missing Message-IDs), dups (duplicate Message-IDs), + main (the main program logic), and star (which messages are being + starred), unstar (which messages are being unstarred).""") p.add_argument('sender', help = "Your email address.") p.add_argument('server', @@ -72,6 +104,8 @@ cfg = p.parse_args(argv[1:]) + config_logging(level = logging.ERROR, do_console = True, flags = cfg.debug) + if cfg.prompt: print "username:", cfg.user = raw_input() @@ -82,7 +116,8 @@ [cfg.user, cfg.passwd] = map(lambda x: x.strip('\r\n'), f.readlines()) try: - m = match( r'(?P<host>[^:/]+)(:(?P<port>\d+))?(/(?P<mailbox>.+))?$', cfg.server ) + m = match( r'(?P<host>[^:/]+)(:(?P<port>\d+))?(/(?P<mailbox>.+))?$', + cfg.server ) cfg.host = m.group('host') cfg.port = int( default_if_none(m.group('port'), 993) ) cfg.mailbox = default_if_none(m.group('mailbox'), 'INBOX') @@ -93,6 +128,7 @@ with logout(IMAP4_SSL(cfg.host, cfg.port)) as imap: imap.login(cfg.user, cfg.passwd) + # Close is only valid in the authenticated state. with closing(imap) as imap: # Select the main mailbox (INBOX). imap.select(cfg.mailbox) @@ -103,18 +139,13 @@ (getmail) \ (imap) - debug('fetched:', xs) + log.debug('fetched', xs) - info('determining the set of messages that were sent by you') + info('building message-id map and determining the set of messages sent ' + 'by you or addressed to you (the "source set")') - sent = set() - for (envelope, data), paren in grouper(2, xs): - msg = message_from_string(data) - if cfg.sender in msg['From']: - sent.add( msg['Message-ID'] ) - - info( 'find the threads in which I am a participant' ) - + srcs = [] + mid2msg = {} # Every second item is just a closing paren. 
# Example data: # [('13300 (BODY[HEADER.FIELDS (Message-ID References In-Reply-To)] {67}', @@ -126,24 +157,89 @@ # ('13302 (BODY[HEADER.FIELDS (Message-ID References In-Reply-To)] {92}', # 'Message-ID: <C43EAFC0.2E3AE%ni...@ya...>\r\nIn-Reply-To: <481...@gm...>\r\n\r\n')] for (envelope, data), paren in grouper(2, xs): - m = match( r"(?P<seqno>\d+) \(FLAGS \((?P<flags>[^)]+)\)", envelope ) - seqno = m.group('seqno') - flags = m.group('flags') - if r'\Flagged' in flags: # flags != r'\Seen' and flags != r'\Seen NonJunk': - print 'FLAG' - print seqno, flags - print '\n'.join( map( str, msg.items() ) ) - print - msg = message_from_string(data) - id = msg['Message-ID'] + # Parse the body. + msg = message_from_string(data) + + # Parse the envelope. + m = match( + r"(?P<seqno>\d+) \(UID (?P<uid>\d+) FLAGS \((?P<flags>[^)]+)\)", + envelope ) + msg.seqno = m.group('seqno') + msg.uid = m.group('uid') + msg.flags = m.group('flags').split() + + # Prepare a container for references to other msgs, and initialize the + # thread ID. + msg.refs = [] + msg.tid = None + + # Add these to the map. + if msg['Message-ID'] in mid2msg: + log.warning( 'dups', 'duplicate message IDs:', + msg['Message-ID'], msg['Subject'] ) + mid2msg[ msg['Message-ID'] ] = msg + + # Add to "srcs" set if sent by us or addressed to us. + if ( cfg.sender in default_if_none( msg['From'], '' ) or + cfg.sender in default_if_none( msg['To'], '' ) or + cfg.sender in default_if_none( msg['Cc'], '' ) ): + srcs.append( msg ) + + info( 'constructing undirected graph' ) + + for mid, msg in mid2msg.iteritems(): + # Extract any references. 
irt = default_if_none( msg.get_all('In-Reply-To'), [] ) refs = default_if_none( msg.get_all('References'), [] ) - refs = set( ' '.join( irt + refs ).split() ) - if refs & sent: - print 'SENT' - print seqno, flags - print '\n'.join( map( str, msg.items() ) ) - print -# if refs & sent: + refs = set( ' '.join( irt + refs ).replace('><', '> <').split() ) + # Connect nodes in graph bidirectionally. Ignore references to MIDs + # that don't exist. + for ref in refs: + try: + refmsg = mid2msg[ref] + # We can use lists/append (not worry about duplicates) because the + # original sources should be acyclic. If a -> b, then there is no b -> + # a, so when crawling a we can add a <-> b without worrying that later + # we may re-add b -> a. + msg.refs.append(refmsg) + refmsg.refs.append(msg) + except: + log.warning( 'refs', ref ) + + info('finding connected components (grouping the messages into threads)') + + tids = count() + tid2msgs = defaultdict(list) + for mid, msg in mid2msg.iteritems(): + if msg.tid is None: + thread_dfs(msg, tids.next(), tid2msgs) + + info( 'starring the relevant threads, in which I am a participant' ) + + rel_tids = set() + for srcmsg in srcs: + if srcmsg.tid not in rel_tids: + rel_tids.add(srcmsg.tid) + for msg in tid2msgs[srcmsg.tid]: + if r'\Flagged' not in msg.flags: + log.info( 'star', '\n', msg ) + if not cfg.pretend: + imap.uid('STORE', msg.uid, '+FLAGS', r'\Flagged') + if not cfg.no_mark_unseen and r'\Seen' in msg.flags: + imap.uid('STORE', msg.uid, '-FLAGS', r'\Seen') + + info( 'unstarring irrelevant threads, in which I am not a participant' ) + + all_tids = set( tid2msgs.iterkeys() ) + irrel_tids = all_tids - rel_tids + for tid in irrel_tids: + for msg in tid2msgs[tid]: + if r'\Flagged' in msg.flags: + log.info( 'unstar', '\n', msg ) + if not cfg.pretend: + imap.uid('STORE', msg.uid, '-FLAGS', r'\Flagged') + if not cfg.no_mark_seen and r'\Seen' not in msg.flags: + imap.uid('STORE', msg.uid, '+FLAGS', r'\Seen') + run_main() This was sent by the 
SourceForge.net collaborative development platform, the world's largest Open Source development site. |
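The revision above switches `getmail` from sequence numbers to UIDs and fetches in chunks of 1000 via `'%d:%d'` ranges. That range arithmetic is easy to check on its own; `chunk_ranges` below is a hypothetical helper (not in the script) that isolates it:

```python
def chunk_ranges(maxuid, step=1000):
    """Yield inclusive (start, end) UID ranges covering 1..maxuid,
    mirroring the '%d:%d' % (start, start + step - 1) FETCH argument.

    The last range may overshoot maxuid; IMAP servers simply return
    no data for UIDs that do not exist.
    """
    for start in range(1, maxuid + 1, step):
        yield start, start + step - 1

# Typical use (sketch):
#   for start, end in chunk_ranges(maxuid):
#       imap.uid('FETCH', '%d:%d' % (start, end), query)
```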
From: <yan...@us...> - 2008-05-08 06:54:30
|
Revision: 714 http://assorted.svn.sourceforge.net/assorted/?rev=714&view=rev Author: yangzhang Date: 2008-05-07 23:54:08 -0700 (Wed, 07 May 2008) Log Message: ----------- added to readme Modified Paths: -------------- mailing-list-filter/trunk/README Modified: mailing-list-filter/trunk/README =================================================================== --- mailing-list-filter/trunk/README 2008-05-08 06:05:12 UTC (rev 713) +++ mailing-list-filter/trunk/README 2008-05-08 06:54:08 UTC (rev 714) @@ -30,3 +30,8 @@ also fails when others change the subject. Finally, this approach is unsatisfactory because it pollutes subject lines, and it essentially replicates exactly what Message-ID was intended for. + +This script is not intended to be a replacement for the Gmail filters. I still +keep those active so that I can get immediate first-pass filtering. I execute +this script on a daily basis to perform second-pass filtering/unfiltering to +catch those false negatives that may have been missed. This was sent by the SourceForge.net collaborative development platform, the world's largest Open Source development site. |
From: <yan...@us...> - 2008-05-08 06:54:30
|
Revision: 715 http://assorted.svn.sourceforge.net/assorted/?rev=715&view=rev Author: yangzhang Date: 2008-05-07 23:54:29 -0700 (Wed, 07 May 2008) Log Message: ----------- specified executable script; still not quite right since a lib is unnecessarily installed Modified Paths: -------------- mailing-list-filter/trunk/setup.py Modified: mailing-list-filter/trunk/setup.py =================================================================== --- mailing-list-filter/trunk/setup.py 2008-05-08 06:54:08 UTC (rev 714) +++ mailing-list-filter/trunk/setup.py 2008-05-08 06:54:29 UTC (rev 715) @@ -25,4 +25,4 @@ Classifier: Topic :: Communications :: Email """ -run_setup(pkg_info_text) +run_setup(pkg_info_text, scripts = ['src/filter.py']) This was sent by the SourceForge.net collaborative development platform, the world's largest Open Source development site. |
From: <yan...@us...> - 2008-05-08 06:05:09
|
Revision: 713 http://assorted.svn.sourceforge.net/assorted/?rev=713&view=rev Author: yangzhang Date: 2008-05-07 23:05:12 -0700 (Wed, 07 May 2008) Log Message: ----------- added a publisher script Added Paths: ----------- mailing-list-filter/trunk/publish.bash Added: mailing-list-filter/trunk/publish.bash =================================================================== --- mailing-list-filter/trunk/publish.bash (rev 0) +++ mailing-list-filter/trunk/publish.bash 2008-05-08 06:05:12 UTC (rev 713) @@ -0,0 +1,12 @@ +#!/usr/bin/env bash + +post-stage() { + epydoc -o $stagedir/doc src/commons/ +} + +fullname='Mailing List Filter' +version=0.1 +license=psf +websrcs=( README ) +rels=( pypi: ) +. assorted.bash "$@" Property changes on: mailing-list-filter/trunk/publish.bash ___________________________________________________________________ Name: svn:executable + * This was sent by the SourceForge.net collaborative development platform, the world's largest Open Source development site. |
From: <yan...@us...> - 2008-05-08 06:03:23
|
Revision: 712 http://assorted.svn.sourceforge.net/assorted/?rev=712&view=rev Author: yangzhang Date: 2008-05-07 23:03:28 -0700 (Wed, 07 May 2008) Log Message: ----------- added an overview readme Added Paths: ----------- mailing-list-filter/trunk/README Added: mailing-list-filter/trunk/README =================================================================== --- mailing-list-filter/trunk/README (rev 0) +++ mailing-list-filter/trunk/README 2008-05-08 06:03:28 UTC (rev 712) @@ -0,0 +1,32 @@ +% Mailing List Filter +% Yang Zhang + +Overview +-------- + +I have a Gmail account that I use for subscribing to and posting to mailing +lists. When dealing with high-volume mailing lists, I am typically only +interested in those threads that I participated in. This is a simple filter +for starring and marking unread any messages belonging to such threads. + +This is accomplished by looking at the set of messages that were either sent +from me or explicitly addressed to me. From this "root set" of messages, we +can use the `Message-ID`, `References`, and `In-Reply-To` headers to determine +threads, and thus the other messages that we care about. + +I have found this to be more accurate than my two original approaches. I used +to have Gmail filters that starred/marked unread any messages containing my +name anywhere in the message. This worked OK since my name is not too common, +but it produced some false positives (not that bad, just unstar messages) and +some false negatives (much harder to detect). + +A second approach is to tag all subjects with some signature string. This +usually is fine, but it doesn't work when you did not start the thread (and +thus determine the subject). 
You can try to change the subject line, but this +is (1) poor netiquette, (2) unreliable because your reply may not register in +other mail clients as being part of the same thread (and thus other +participants may miss your reply), and (3) unreliable because replies might not +directly reference your post (either intentionally or unintentionally). It +also fails when others change the subject. Finally, this approach is +unsatisfactory because it pollutes subject lines, and it essentially replicates +exactly what Message-ID was intended for. This was sent by the SourceForge.net collaborative development platform, the world's largest Open Source development site. |
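The header-based grouping this README describes ultimately reduces to pulling Message-IDs out of the `References` and `In-Reply-To` fields. A minimal sketch of that extraction, including the `'><'` workaround the script applies for mail agents that concatenate IDs without whitespace (`extract_refs` is an illustrative name):

```python
def extract_refs(in_reply_to, references):
    """Return the set of Message-IDs referenced by the two headers.

    Both arguments are lists of raw header values (possibly empty),
    as returned by Message.get_all(...) in the script.
    """
    joined = ' '.join(in_reply_to + references)
    # Some mail agents emit '<a@x><b@y>' with no separator between
    # IDs; reinsert a space before splitting on whitespace.
    return set(joined.replace('><', '> <').split())
```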
From: <yan...@us...> - 2008-05-08 05:48:33
|
Revision: 711 http://assorted.svn.sourceforge.net/assorted/?rev=711&view=rev Author: yangzhang Date: 2008-05-07 22:48:40 -0700 (Wed, 07 May 2008) Log Message: ----------- added setup Added Paths: ----------- mailing-list-filter/trunk/setup.py Added: mailing-list-filter/trunk/setup.py =================================================================== --- mailing-list-filter/trunk/setup.py (rev 0) +++ mailing-list-filter/trunk/setup.py 2008-05-08 05:48:40 UTC (rev 711) @@ -0,0 +1,28 @@ +#!/usr/bin/env python + +from commons.setup import run_setup + +pkg_info_text = """ +Metadata-Version: 1.1 +Name: mailing-list-filter +Version: 0.1 +Author: Yang Zhang +Author-email: yaaang NOSPAM at REMOVECAPS gmail +Home-page: http://assorted.sourceforge.net/mailing-list-filter/ +Download-url: http://pypi.python.org/pypi/mailing-list-filter/ +Summary: Mailing List Filter +License: Python Software Foundation License +Description: Filter mailing list email for relevant threads only. +Keywords: mailing,list,email,filter,IMAP,Gmail +Platform: any +Provides: commons +Classifier: Development Status :: 4 - Beta +Classifier: Environment :: No Input/Output (Daemon) +Classifier: Intended Audience :: End Users/Desktop +Classifier: License :: OSI Approved :: Python Software Foundation License +Classifier: Operating System :: OS Independent +Classifier: Programming Language :: Python +Classifier: Topic :: Communications :: Email +""" + +run_setup(pkg_info_text) Property changes on: mailing-list-filter/trunk/setup.py ___________________________________________________________________ Name: svn:executable + * This was sent by the SourceForge.net collaborative development platform, the world's largest Open Source development site. |
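The `pkg_info_text` above is RFC 822-style metadata, so the stdlib email parser can read it; presumably `commons.setup.run_setup` does something along these lines before handing fields to distutils. A hedged sketch (the real `run_setup` internals are not shown in this commit):

```python
from email import message_from_string

def parse_pkg_info(text):
    """Parse PKG-INFO-style (RFC 822) metadata into a dict.

    Repeatable fields such as Classifier become lists; fields that
    appear once stay as plain strings.
    """
    msg = message_from_string(text.strip())
    info = {}
    for key in msg.keys():
        values = msg.get_all(key)
        info[key] = values if len(values) > 1 else values[0]
    return info
```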
From: <yan...@us...> - 2008-05-08 04:46:59
|
Revision: 710 http://assorted.svn.sourceforge.net/assorted/?rev=710&view=rev Author: yangzhang Date: 2008-05-07 21:47:03 -0700 (Wed, 07 May 2008) Log Message: ----------- added simple changelog to readme Modified Paths: -------------- python-commons/trunk/README Modified: python-commons/trunk/README =================================================================== --- python-commons/trunk/README 2008-05-08 03:20:46 UTC (rev 709) +++ python-commons/trunk/README 2008-05-08 04:47:03 UTC (rev 710) @@ -33,3 +33,28 @@ [ASPN Cookbook]: http://aspn.activestate.com/ASPN/Cookbook/Python [AIMA Utilities]: http://aima.cs.berkeley.edu/python/utils.py + +Changes +------- + +version 0.3.1 + +- removed extraneous debug print statements + +version 0.3 + +- added versioned guards +- added file memoization +- added retry with exp backoff +- added `countstep()` +- released for + [gbookmark2delicious](http://gbookmark2delicious.googlecode.com/) + +version 0.2 + +- added `clients`, `setup` +- released for [icedb](http://cartel.csail.mit.edu/icedb/) + +version 0.1 + +- initial release This was sent by the SourceForge.net collaborative development platform, the world's largest Open Source development site. |
From: <yan...@us...> - 2008-05-08 03:20:42
|
Revision: 709 http://assorted.svn.sourceforge.net/assorted/?rev=709&view=rev Author: yangzhang Date: 2008-05-07 20:20:46 -0700 (Wed, 07 May 2008) Log Message: ----------- preferring verdana Modified Paths: -------------- assorted-site/trunk/main.css Modified: assorted-site/trunk/main.css =================================================================== --- assorted-site/trunk/main.css 2008-05-08 03:20:39 UTC (rev 708) +++ assorted-site/trunk/main.css 2008-05-08 03:20:46 UTC (rev 709) @@ -10,7 +10,7 @@ padding:0; background-color: white; color: black; - font-family: Georgia, Verdana, sans-serif; + font-family: Verdana, sans-serif; font-size: medium; line-height: 1.3em; color: #333; @@ -70,6 +70,7 @@ margin-bottom: 0.5em; } +/* TODO: make this larger? */ pre { padding: 0; margin: 0; This was sent by the SourceForge.net collaborative development platform, the world's largest Open Source development site. |
From: <yan...@us...> - 2008-05-08 03:20:35
|
Revision: 708 http://assorted.svn.sourceforge.net/assorted/?rev=708&view=rev Author: yangzhang Date: 2008-05-07 20:20:39 -0700 (Wed, 07 May 2008) Log Message: ----------- added scala doc search, js beautify, mailing list filter Modified Paths: -------------- assorted-site/trunk/index.txt Modified: assorted-site/trunk/index.txt =================================================================== --- assorted-site/trunk/index.txt 2008-05-08 03:19:12 UTC (rev 707) +++ assorted-site/trunk/index.txt 2008-05-08 03:20:39 UTC (rev 708) @@ -82,8 +82,14 @@ - Sandbox: heap of small test cases to explore (mostly programming language details, bugs, corner cases, features, etc.) (passive) - Miscellanea + - [Mailing List Filter](mailing-list-filter): deal with high-volume mailing + lists by filtering your mailbox for threads in which you were a participant + (active) + - [Scala Doc Search](http://scripts.mit.edu/~y_z/sds/): navigate the Scala + API documentation by class or object name (done) - Bibliography: my pan-paper BibTeX; i.e., stalling for ZDB (active) - Subtitle adjuster: for time-shifting SRTs (done) + - Javascript Beautifier: a thin [Tamarin] wrapper for [js_beautify]. 
- Programming Problems: my workspace for solving programming puzzles (hiatus) - Source management: various tools for cleaning up and maintaining a source @@ -92,24 +98,6 @@ - [This website](http://assorted.sf.net/) (passive) - [My personal website](http://www.mit.edu/~y_z/) (passive) -What the statuses mean: - -- done: no more active development planned, but will generally maintain/fix - issues -- passive: under continual but gradual growth -- active: development is happening at a faster pace -- abandoned: incomplete; no plans to pick it up again -- hiatus: incomplete; plan to resume development - -Other links: - -- [SourceForge Project Page](http://sf.net/projects/assorted/): - download file releases, discuss on the forums, report bugs/request features, - [browse the repository] -- [Simple Publications Manager](http://pubmgr.sf.net/): another SF-hosted - mini-project of mine -- [TinyOS](http://tinyos.net/): SF-hosted project I've been involved in - [BattleCode]: http://battlecode.mit.edu/ [BattleCode 2007]: http://battlecode.mit.edu/2007/ [BattleCode 2008]: http://battlecode.mit.edu/2008/ @@ -124,8 +112,34 @@ [Facebook]: http://www.facebook.com/ [YouTube]: http://www.youtube.com/ [MySpace]: http://www.myspace.com/ +[Tamarin]: http://www.mozilla.org/projects/tamarin/ +[js_beautify]: http://elfz.laacz.lv/beautify/ + +What the statuses mean: + +- done: no more active development planned, but will generally maintain/fix + issues +- passive: under continual but gradual growth +- active: development is happening at a faster pace +- abandoned: incomplete; no plans to pick it up again +- hiatus: incomplete; plan to resume development + +Project pages: + +- [SourceForge Project Page]: view summary, [browse the repository] +- [Google Code Page]: download file releases, report bugs/request features +- [Google Groups Page]: discussions and support + +[SourceForge Project Page]: http://sf.net/projects/assorted/ +[Google Code Page]: http://code.google.com/p/assorted/ +[Google 
Groups Page]: http://groups.google.com/group/assorted-projects/ [browse the repository]: http://assorted.svn.sourceforge.net/viewvc/assorted/ +Copyright 2008 [Yang Zhang]. +All rights reserved. + +[Yang Zhang]: http://www.mit.edu/~y_z/ + <!-- -vim:nocin:et:sw=2:ts=2 +vim:nocin:et:ft=mkd:sw=2:ts=2 --> This was sent by the SourceForge.net collaborative development platform, the world's largest Open Source development site. |
From: <yan...@us...> - 2008-05-08 03:19:13
|
Revision: 707 http://assorted.svn.sourceforge.net/assorted/?rev=707&view=rev Author: yangzhang Date: 2008-05-07 20:19:12 -0700 (Wed, 07 May 2008) Log Message: ----------- fixed missing import Modified Paths: -------------- python-commons/trunk/src/commons/seqs.py Modified: python-commons/trunk/src/commons/seqs.py =================================================================== --- python-commons/trunk/src/commons/seqs.py 2008-05-08 03:18:57 UTC (rev 706) +++ python-commons/trunk/src/commons/seqs.py 2008-05-08 03:19:12 UTC (rev 707) @@ -8,7 +8,7 @@ from struct import pack, unpack from contextlib import closing from itertools import ( chain, count, ifilterfalse, islice, - izip, tee ) + izip, repeat, tee ) from .log import warning """ This was sent by the SourceForge.net collaborative development platform, the world's largest Open Source development site. |
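`seqs.py` is the module providing the `grouper(2, xs)` that filter.py uses to pair each FETCH tuple with its trailing paren. The call sites are consistent with the classic itertools `grouper` recipe, sketched below; the actual `commons.seqs` implementation may differ:

```python
from itertools import zip_longest  # izip_longest on the Python 2 of the era

def grouper(n, iterable, fillvalue=None):
    """grouper(2, 'ABCDE') --> ('A','B') ('C','D') ('E', None)

    Classic itertools recipe: n references to one shared iterator,
    zipped together, so each tuple consumes n consecutive items.
    """
    args = [iter(iterable)] * n
    return zip_longest(*args, fillvalue=fillvalue)
```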
From: <yan...@us...> - 2008-05-08 03:18:52
|
Revision: 706 http://assorted.svn.sourceforge.net/assorted/?rev=706&view=rev Author: yangzhang Date: 2008-05-07 20:18:57 -0700 (Wed, 07 May 2008) Log Message: ----------- added logout Modified Paths: -------------- python-commons/trunk/src/commons/networking.py Modified: python-commons/trunk/src/commons/networking.py =================================================================== --- python-commons/trunk/src/commons/networking.py 2008-05-08 03:18:47 UTC (rev 705) +++ python-commons/trunk/src/commons/networking.py 2008-05-08 03:18:57 UTC (rev 706) @@ -7,6 +7,7 @@ import os, sys from time import * +from contextlib import contextmanager class NoMacAddrError( Exception ): pass @@ -63,3 +64,11 @@ print 'backing off for', backoff sleep(backoff) backoff = multiplier * backoff + +@contextmanager +def logout(x): + """ + A context manager for finally calling the C{logout()} method of an object. + """ + try: yield x + finally: x.logout() This was sent by the SourceForge.net collaborative development platform, the world's largest Open Source development site. |
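A usage sketch of the `logout` context manager added above, restated in modern Python with a stand-in object in place of a real `IMAP4_SSL` connection (`FakeIMAP` is purely illustrative):

```python
from contextlib import contextmanager

@contextmanager
def logout(x):
    """Yield x, then call x.logout() on exit -- even if the
    with-body raises (same shape as the commit above)."""
    try:
        yield x
    finally:
        x.logout()

class FakeIMAP:
    """Illustrative stand-in for imaplib.IMAP4_SSL; not a real client."""
    def __init__(self):
        self.logged_out = False
    def logout(self):
        self.logged_out = True

# Typical use: guarantee logout around a session.
conn = FakeIMAP()
with logout(conn) as imap:
    pass  # imap.login(...), imap.select(...), etc. would go here
assert conn.logged_out
```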
From: <yan...@us...> - 2008-05-08 03:18:40
Revision: 705
          http://assorted.svn.sourceforge.net/assorted/?rev=705&view=rev
Author:   yangzhang
Date:     2008-05-07 20:18:47 -0700 (Wed, 07 May 2008)

Log Message:
-----------
added seq, default_if_none

Modified Paths:
--------------
    python-commons/trunk/src/commons/misc.py

Modified: python-commons/trunk/src/commons/misc.py
===================================================================
--- python-commons/trunk/src/commons/misc.py	2008-05-07 16:06:28 UTC (rev 704)
+++ python-commons/trunk/src/commons/misc.py	2008-05-08 03:18:47 UTC (rev 705)
@@ -46,3 +46,17 @@
     finally:
         end = time()
         output[0] = end - start
+
+def default_if_none(x, d):
+    """
+    Returns L{x} if it's not None, otherwise returns L{d}.
+    """
+    if x is None: return d
+    else: return x
+
+def seq(f, g):
+    """
+    Evaluate 0-ary functions L{f} then L{g}, returning L{g()}.
+    """
+    f()
+    return g()
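Both helpers are small enough to demonstrate directly. A Python 3 rendering with usage: note that `default_if_none` differs from `x or d` in that it preserves falsy non-None values like `0` and `''`.

```python
def default_if_none(x, d):
    """Return x unless it is None, in which case return d."""
    return d if x is None else x

def seq(f, g):
    """Evaluate the 0-ary functions f then g, returning g()."""
    f()
    return g()

# seq sequences two effects and keeps only the second result.
calls = []
result = seq(lambda: calls.append('f'), lambda: 42)
```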
From: <yan...@us...> - 2008-05-07 16:06:46
Revision: 704
          http://assorted.svn.sourceforge.net/assorted/?rev=704&view=rev
Author:   yangzhang
Date:     2008-05-07 09:06:28 -0700 (Wed, 07 May 2008)

Log Message:
-----------
added mailing list filter! still exploring how gmail imap starred messages work

Added Paths:
-----------
    mailing-list-filter/
    mailing-list-filter/trunk/
    mailing-list-filter/trunk/src/
    mailing-list-filter/trunk/src/filter.py

Added: mailing-list-filter/trunk/src/filter.py
===================================================================
--- mailing-list-filter/trunk/src/filter.py	(rev 0)
+++ mailing-list-filter/trunk/src/filter.py	2008-05-07 16:06:28 UTC (rev 704)
@@ -0,0 +1,149 @@
+#!/usr/bin/env python
+
+"""
+Given an IMAP mailbox, mark all messages as read except for those threads in
+which you were a participant, where thread grouping is performed via the
+In-Reply-To and References headers.
+
+Currently, we assume that the server specification points to a mailbox
+containing all messages (both sent and received), and a message is determined
+to have been sent by you by looking at the From: header field. This should work
+well with Gmail. An alternative strategy is to look through two folders, one
+that's the Inbox and one that's the Sent mailbox, and treat all messages in
+Sent as having been sent by you.
+"""
+
+from __future__ import with_statement
+from collections import defaultdict
+from email import message_from_string
+from getpass import getpass
+from imaplib import IMAP4_SSL
+from argparse import ArgumentParser
+from path import path
+from re import match
+from functools import partial
+from commons.decs import pickle_memoized
+from commons.log import *
+from commons.files import cleanse_filename, soft_makedirs
+from commons.misc import default_if_none
+from commons.networking import logout
+from commons.seqs import concat, grouper
+from commons.startup import run_main
+from contextlib import closing
+
+info = partial(info, '')
+debug = partial(debug, '')
+error = partial(error, '')
+die = partial(die, '')
+
+def getmail(imap):
+    info( 'finding max seqno' )
+    ok, [seqnos] = imap.search(None, 'ALL')
+    maxseqno = int( seqnos.split()[-1] )
+    del seqnos
+
+    info( 'actually fetching the messages in chunks' )
+    # The syntax/fields of the FETCH command is documented in RFC 2060. Also,
+    # this article contains a brief overview:
+    # http://www.devshed.com/c/a/Python/Python-Email-Libraries-part-2-IMAP/3/
+    # BODY.PEEK prevents the message from automatically being flagged as \Seen.
+    query = '(FLAGS BODY.PEEK[HEADER.FIELDS (Message-ID References In-Reply-To From Subject)])'
+    step = 1000
+    return list( concat(
+        imap.fetch('%d:%d' % (start, start + step - 1), query)[1]
+        for start in xrange(1, maxseqno + 1, step) ) )
+
+def main(argv):
+    import logging
+    config_logging(level = logging.INFO, do_console = True)
+
+    p = ArgumentParser(description = __doc__)
+    p.add_argument('--credfile', default = path( '~/.mlf.auth' ).expanduser(),
+        help = """File containing your login credentials, with the username on the
+        first line and the password on the second line. Ignored iff --prompt.""")
+    p.add_argument('--cachedir', default = path( '~/.mlf.cache' ).expanduser(),
+        help = "Directory to use for caching our data.")
+    p.add_argument('--prompt', action = 'store_true',
+        help = "Interactively prompt for the username and password.")
+    p.add_argument('sender',
+        help = "Your email address.")
+    p.add_argument('server',
+        help = "The server in the format: <host>[:<port>][/<mailbox>].")
+
+    cfg = p.parse_args(argv[1:])
+
+    if cfg.prompt:
+        print "username:",
+        cfg.user = raw_input()
+        print "password:",
+        cfg.passwd = getpass()
+    else:
+        with file(cfg.credfile) as f:
+            [cfg.user, cfg.passwd] = map(lambda x: x.strip('\r\n'), f.readlines())
+
+    try:
+        m = match( r'(?P<host>[^:/]+)(:(?P<port>\d+))?(/(?P<mailbox>.+))?$', cfg.server )
+        cfg.host = m.group('host')
+        cfg.port = int( default_if_none(m.group('port'), 993) )
+        cfg.mailbox = default_if_none(m.group('mailbox'), 'INBOX')
+    except:
+        p.error('Need to specify the server in the correct format.')
+
+    soft_makedirs(cfg.cachedir)
+
+    with logout(IMAP4_SSL(cfg.host, cfg.port)) as imap:
+        imap.login(cfg.user, cfg.passwd)
+        with closing(imap) as imap:
+            # Select the main mailbox (INBOX).
+            imap.select(cfg.mailbox)
+
+            # Fetch message IDs, references, and senders.
+            xs = pickle_memoized \
+                (lambda imap: cfg.cachedir / cleanse_filename(cfg.sender)) \
+                (getmail) \
+                (imap)
+
+            debug('fetched:', xs)
+
+            info('determining the set of messages that were sent by you')
+
+            sent = set()
+            for (envelope, data), paren in grouper(2, xs):
+                msg = message_from_string(data)
+                if cfg.sender in msg['From']:
+                    sent.add( msg['Message-ID'] )
+
+            info( 'find the threads in which I am a participant' )
+
+            # Every second item is just a closing paren.
+            # Example data:
+            # [('13300 (BODY[HEADER.FIELDS (Message-ID References In-Reply-To)] {67}',
+            #   'Message-ID: <mai...@py...>\r\n\r\n'),
+            #  ')',
+            #  ('13301 (BODY[HEADER.FIELDS (Message-ID References In-Reply-To)] {59}',
+            #   'Message-Id: <200...@hv...>\r\n\r\n'),
+            #  ')',
+            #  ('13302 (BODY[HEADER.FIELDS (Message-ID References In-Reply-To)] {92}',
+            #   'Message-ID: <C43EAFC0.2E3AE%ni...@ya...>\r\nIn-Reply-To: <481...@gm...>\r\n\r\n')]
+            for (envelope, data), paren in grouper(2, xs):
+                m = match( r"(?P<seqno>\d+) \(FLAGS \((?P<flags>[^)]+)\)", envelope )
+                seqno = m.group('seqno')
+                flags = m.group('flags')
+                if r'\Flagged' in flags: # flags != r'\Seen' and flags != r'\Seen NonJunk':
+                    print 'FLAG'
+                    print seqno, flags
+                    print '\n'.join( map( str, msg.items() ) )
+                    print
+                msg = message_from_string(data)
+                id = msg['Message-ID']
+                irt = default_if_none( msg.get_all('In-Reply-To'), [] )
+                refs = default_if_none( msg.get_all('References'), [] )
+                refs = set( ' '.join( irt + refs ).split() )
+                if refs & sent:
+                    print 'SENT'
+                    print seqno, flags
+                    print '\n'.join( map( str, msg.items() ) )
+                    print
+# if refs & sent:
+
+run_main()

Property changes on: mailing-list-filter/trunk/src/filter.py
___________________________________________________________________
Name: svn:executable
   + *
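The core of the filter above is the membership test: a message belongs to one of your threads if any ID in its References or In-Reply-To headers appears in the set of Message-IDs you sent. A simplified, self-contained Python 3 sketch of just that test, using the standard `email` module as the script does (the sample headers and IDs here are invented):

```python
from email import message_from_string

# Hypothetical set of Message-IDs of messages you sent.
sent = {"<abc@example>"}

# A raw header block as returned by an IMAP HEADER.FIELDS fetch.
raw = (
    "Message-ID: <def@example>\r\n"
    "In-Reply-To: <abc@example>\r\n"
    "\r\n"
)
msg = message_from_string(raw)

# Collect every ID referenced by this message; both headers may be absent.
irt = msg.get_all("In-Reply-To") or []
refs = msg.get_all("References") or []
ref_ids = set(" ".join(irt + refs).split())

# The message is in one of your threads iff it references something you sent.
in_my_thread = bool(ref_ids & sent)
```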
From: <yan...@us...> - 2008-05-04 17:46:36
Revision: 703
          http://assorted.svn.sourceforge.net/assorted/?rev=703&view=rev
Author:   yangzhang
Date:     2008-05-04 10:46:40 -0700 (Sun, 04 May 2008)

Log Message:
-----------
uncommented cscope autoload

Modified Paths:
--------------
    configs/trunk/src/vim/plugin/cscope_maps.vim

Modified: configs/trunk/src/vim/plugin/cscope_maps.vim
===================================================================
--- configs/trunk/src/vim/plugin/cscope_maps.vim	2008-05-04 17:46:02 UTC (rev 702)
+++ configs/trunk/src/vim/plugin/cscope_maps.vim	2008-05-04 17:46:40 UTC (rev 703)
@@ -37,13 +37,13 @@
 " if you want the reverse search order.
 set csto=0
 
-" " add any cscope database in current directory
-" if filereadable("cscope.out")
-"     cs add cscope.out
-" " else add the database pointed to by environment variable
-" elseif $CSCOPE_DB != ""
-"     cs add $CSCOPE_DB
-" endif
+" add any cscope database in current directory
+if filereadable("cscope.out")
+    cs add cscope.out
+" else add the database pointed to by environment variable
+elseif $CSCOPE_DB != ""
+    cs add $CSCOPE_DB
+endif
 
 " show msg when any other cscope db added
 set cscopeverbose