File Date Author Commit
 acinclude 2019-11-02 Tomas Pospisek Tomas Pospisek [4a3774] remove checks for transient libraries
 debian 2020-11-15 Tomas Pospisek Tomas Pospisek [e5da5b] run autoreconf as the first step of the Debian ...
 doc 2020-10-28 Tomas Pospisek Tomas Pospisek [c021b0] use the proper adjective
 src 2020-11-15 Tomas Pospisek Tomas Pospisek [676755] fix wrong indent
 tests 2020-11-09 Tomas Pospisek Tomas Pospisek [d758f6] fix mail
 .cvsignore 2004-08-20 tpo tpo [a5394e] ignore stamp-in, signal to gentoo which autobui...
 AUTHORS 2003-01-24 tpo tpo [266d55] and yet more touches to our autotools files
 COPYING 2003-01-23 tpo tpo [08db84] *** empty log message ***
 Makefile.am 2020-10-28 Tomas Pospisek Tomas Pospisek [d3257b] let autoreconf know that aclocal needs
 NEWS 2020-10-28 Tomas Pospisek Tomas Pospisek [8ec3f4] debian/changelog should only contain user visib...
 README 2020-10-28 Tomas Pospisek Tomas Pospisek [e09d42] add chapter on mailsync development
 THANKS 2020-11-06 Tomas Pospisek Tomas Pospisek [6955c9] refer to debian/changelog that contains further...
 TODO 2004-06-09 tpo tpo [5a6566] maintaining message Status is broken again - ne...
 autogen.sh 2004-08-20 tpo tpo [a5394e] ignore stamp-in, signal to gentoo which autobui...
 configure.ac 2020-11-10 Tomas Pospisek Tomas Pospisek [0ea801] support Message-IDs folded with a linefeed

Read Me

Mailsync
--------

1.   Compiling mailsync
2.   Configuring mailsync
2.1     Mailbox specification
2.2     Stores
2.3     Channels
3.   How does mailsync work?
4.   Running mailsync
4.1     Verbosity
4.2     Various messages from mailsync
5.   Limitations
5.1     Problems with concurrent mailbox access
5.2     Problems with "New" messages (Mutt)
5.3     Empty Message-ID headers
6.   More about the algorithm
6.1     Message id algorithms
7.   History
8.   Further reading
9.   Support, development, new versions and homepage
9.1     Developing mailsync
10.  Author, disclaimer, etc.



1. Compiling mailsync
---------------------

Mailsync has been built on Linux, FreeBSD and on SGI-Irix 6.5. With a
little coaxing, though, it should compile on any unix-like system on
which c-client compiles.  On a non-unix-like system, you may have to
come up with alternatives to a few functions like stat() and getenv().

1. Download, unpack, and compile the c-client from
   ftp://ftp.cac.washington.edu/imap.  Currently I'm using
   imap-2002.RC8, which allows linking from a C++ program.  If you
   already have an older version of c-client installed, you can still
   link it to Mailsync.  Just get c-client.h from either the latest IMAP
   distribution or from http://mailsync.sourceforge.net.

If you're building mailsync from git you'll need an automake version later
than 1.6 and pkg-config. The following should do the deal:

2. ./autogen.sh

After that the build procedure is the same as building from a released
source package:

3. ./configure && make && make install



2. Configuring mailsync
-----------------------

To use mailsync, you first have to make a configuration file, which by
default is `$HOME/.mailsync'.  The config file specifies two kinds of
things: "stores" and "channels".  

Lines in the mailsync configuration file starting with a `#' are
treated as comments and ignored.

A "store" describes a mailbox and which parts of that mailbox you want
to have synchronized.  A channel is a pair of stores which you want to
get synchronized along with a file where mailsync can save
synchronization info.

Additionally, mailsync initializes and reads in the c-client configuration
to fine-tune its behaviour.  You will very likely not need to bother with
that.  Before you tinker with it, make sure to read the imaprc.txt
documentation file from c-client.


2.1 Mailbox specification
-------------------------

Mailsync uses the c-client library for manipulation of mailboxes.  Please
have a look at the c-client library documentation for details of the
format of a mailbox specification.  In particular, see
docs/naming.txt and docs/drivers.txt in its documentation.

Briefly, a mailbox specification looks like this:

 {imap.unc.edu/user=culver}           refers to INBOX by default
 {imap.unc.edu/user=culver}INBOX.foo  a mailbox on a Cyrus server
 {imap.unc.edu/user=culver}foo        a mailbox on a UW or Netscape server
 mbox                                 the file $HOME/mbox
 Mail/foo                             a mailbox in $HOME/Mail
 /tmp/foo                             some other file



2.2 Stores
----------

A store is a collection of mail folders.  Exactly what kind of
collection you can use depends on your IMAP server, but one thing that
always works is a bunch of folders in a single directory.
Unfortunately there is no general, concise way to specify even this
kind of store.  I have settled on a fairly general but redundant
method.

Here are examples of store specifiers that work with some servers I
have access to.

store cyrus {
	server	{imap.unc.edu/user=culver}
	ref	{imap.unc.edu}
	pat	INBOX.sync.%
	prefix	INBOX.sync.
	passwd  secret
}
store netscape-or-uw {
	server	{imap.cs.unc.edu/user=culver}
	ref	{imap.cs.unc.edu}
	pat	sync/%
	prefix	sync/
}
store localdirectory {
	pat	Mail/%
	prefix	Mail/
}

To test a store specification, put it in your .mailsync and run

  mailsync <storename>

Mailsync will list the mailboxes it thinks are in the store.  (In this
mode, it will not touch anything or even open the mailboxes.)  The
names it returns should be stripped of any store-specific information.

`Pat' describes the pattern of the boxes you want to synchronize.  It
matches names exactly except for:

* the delimiter character which is used to delimit hierarchies (folders)
  in your mailbox and which depends on the mailstore you use (unix
  filesystems use `/', Cyrus, UW and Netscape use `.').
  
* `%' is a globbing operator.  It will match all the items in a hierarchy
  (folder) but it will not descend down the hierarchy (into folders).
  
* `*' acts like `%' but descends into the hierarchy.  Be careful with `*'
  as it can take a lot of time to traverse a deep hierarchy.
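
The difference between `%' and `*' can be sketched in a few lines of
Python.  This is only an illustration of the matching rules described
above; in practice the matching is done by the mail server or by c-client,
not by code like this.  The function names are made up for the example.

```python
import re

def pattern_to_regex(pat, delimiter):
    """Translate a `pat' value into a regular expression (illustrative)."""
    parts = []
    for ch in pat:
        if ch == "%":
            # `%' matches within one hierarchy level only.
            parts.append("[^" + re.escape(delimiter) + "]*")
        elif ch == "*":
            # `*' also matches across hierarchy delimiters.
            parts.append(".*")
        else:
            # Everything else must match exactly.
            parts.append(re.escape(ch))
    return re.compile("".join(parts))

def matches(pat, name, delimiter):
    """Return True if the full mailbox name matches the pattern."""
    return pattern_to_regex(pat, delimiter).fullmatch(name) is not None
```

With the delimiter `/', the pattern `Mail/%' matches `Mail/foo' but not
`Mail/foo/bar', while `Mail/*' matches both.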

If you omit the `prefix' specification, you will see full mailbox
names.  Whatever you specify in `prefix' is stripped off of these
names to form what I shall call a "boxname".

Two mailboxes on different stores will be synchronized if and only if
they have the same "boxname".  Finally, the full c-client name is formed
by <server><prefix><boxname>.
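
The relation between full names, prefixes and boxnames can be written down
as a small Python sketch.  The helper names are hypothetical and the code
only mirrors the description above; it is not taken from the mailsync
sources.

```python
def boxname(full_mailbox_name, prefix):
    """Strip the store's prefix from a mailbox name to get the boxname."""
    if prefix and full_mailbox_name.startswith(prefix):
        return full_mailbox_name[len(prefix):]
    # Without a (matching) prefix the full name is shown as is.
    return full_mailbox_name

def cclient_name(server, prefix, box):
    """Form the full c-client name as <server><prefix><boxname>."""
    return server + prefix + box
```

With the example stores above, `INBOX.sync.work' on the cyrus store and
`Mail/work' in the local directory both have the boxname `work', so they
would be synchronized with each other.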

`passwd' is the password to use when accessing the box.  If you omit it and
the store requires a password, mailsync will ask you for it.

If you want to use MH files, or some other format, you should check
out the "docs/" directory in the imap distribution, particularly
naming.txt, drivers.txt, and formats.txt.



2.3 Channels
------------

A channel just specifies two stores that are to be synchronized, and
one c-client mailbox `msinfo' which is used by mailsync to remember
what messages it has seen (it stores sets of message-ids there
between runs).  Any c-client mailbox specification is fine.  If one
or both of the stores is a local file, then you might as well use a
local file for `msinfo':

channel local-cyrus localdirectory cyrus {
	msinfo	.msinfo
}

The message-id list is kept as a message in the mailbox, with the
channel tag ("local-cyrus") as the subject.  So you can use the same
`msinfo' file for many channels, as long as the channels have
different names.

If, on the other hand, both stores are on remote imap servers, you may 
consider putting `msinfo' on one or on the other server:

channel uw-cyrus netscape-or-uw cyrus {
	msinfo	{imap.cs.unc.edu/user=culver}msinfo
	passwd secret
}

The `msinfo' mailbox stores a bunch of message-ids in a mail message.
The message is formed without a message-id, so you can even put the
`msinfo' inside the store if you like, and mailsync will not copy or
delete the `msinfo' message.  (This may change, though.  Better to
keep it separate from the stores.)

As with stores, `passwd' can be omitted; if a password is required,
mailsync will ask for it.

In the channel specification you can optionally set a message size limit.
Messages with size exceeding the limit (in bytes) will be skipped when
synchronizing:

channel uw-cyrus netscape-or-uw cyrus {
	msinfo {imap.cs.unc.edu/user=culver}msinfo
	passwd secret
	sizelimit 81920
}

The example above will not synchronize messages bigger than 80k.
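
The arithmetic behind the value, and the check it implies, can be sketched
as follows.  This is only an illustration of the rule stated above;
`within_limit' is a made-up name and not a function from mailsync.

```python
# 80k expressed in bytes, as used in the channel example above.
SIZELIMIT = 80 * 1024

def within_limit(size_in_bytes, sizelimit=SIZELIMIT):
    """Return True if a message of this size would be synchronized.

    Only messages whose size exceeds the limit are skipped.
    """
    return size_in_bytes <= sizelimit
```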


3. How does mailsync work?
--------------------------

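First, it loads the state of the store at the last sync.  

If the "lasttime" mailbox (the channel's `msinfo') doesn't exist, it is
assumed empty.  This is the right thing to do: on a first sync every
message then counts as "new" and nothing is deleted.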

Then it iterates through every mailbox on either store.  In each
mailbox, it applies the following 3-way diff algorithm.

  If a message exists on both stores, it is left alone.

  If a message exists on one store but not the other, and it is a
  "new" message (it's not recorded in `msinfo'), it is copied to the
  other store.

  If a message exists on one store but not the other, and it is an 
  "old" message (it was recorded in `msinfo' at last synchronization), 
  it is assumed that the message was deliberately deleted from one
  store, and is removed from the other.

Finally, it saves the set of remaining messages to the `msinfo' file.
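
The rules above can be expressed with plain set operations.  The following
Python sketch works on sets of message-ids for a single mailbox and only
plans the actions; it is a simplified illustration, not the code mailsync
actually runs, and the function and key names are invented for the
example.

```python
def three_way_sync(store_a, store_b, msinfo):
    """Plan a sync of one mailbox.

    store_a, store_b: sets of message-ids currently on each store.
    msinfo: set of message-ids recorded at the last synchronization.
    """
    only_a = store_a - store_b
    only_b = store_b - store_a
    plan = {
        # "New" messages (not in msinfo) are copied to the other store.
        "copy_to_b": only_a - msinfo,
        "copy_to_a": only_b - msinfo,
        # "Old" messages (in msinfo) were deleted on the other side.
        "delete_from_a": only_a & msinfo,
        "delete_from_b": only_b & msinfo,
    }
    # What remains afterwards is saved as the new msinfo.
    plan["new_msinfo"] = (
        (store_a | store_b) - plan["delete_from_a"] - plan["delete_from_b"]
    )
    return plan
```

Messages present on both stores appear in none of the copy or delete sets,
which corresponds to "left alone".  With an empty `msinfo' nothing is ever
deleted, only copied.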



4. Running mailsync
-------------------

There are three modes.


"Sync" mode:

    mailsync [options] <channel>

Synchronizes the two stores specified by the channel,
doing a 3-way diff with the message-ids stashed in the channel's
msinfo box.


"List" mode:

    mailsync [options] <store>

Simply list the boxnames specified by the given store.  Don't change
or write anything.  Useful when writing the .mailsync config file.


"Diff" mode:

    mailsync [options] <channel> <store>
 or mailsync [options] <store> <channel> 

Compare the messages in the store to the information in the channel's
msinfo file.  This mode does not disturb any mail.  This is useful if,
for instance, <store> is local, <channel>'s msinfo is a local file,
but <channel>'s other store is remote and you're not dialed up.
Mailsync will tell you how many new messages and deletions would be
propagated away from <store> through <channel>.  It also reports
duplicate messages (without deleting them).
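
What "Diff" mode reports can be approximated by comparing one store against
`msinfo' alone.  The Python sketch below assumes a single mailbox and uses
invented names; it is not how mailsync itself computes the numbers.

```python
from collections import Counter

def diff_store(store_ids, msinfo):
    """Summarize what would be propagated away from a store.

    store_ids: list of message-ids found in the store (may repeat).
    msinfo: set of message-ids recorded at the last synchronization.
    """
    counts = Counter(store_ids)
    present = set(counts)
    return {
        # In the store but unknown to msinfo: would be copied.
        "new": len(present - msinfo),
        # Known to msinfo but gone from the store: deletions.
        "deleted": len(msinfo - present),
        # Extra copies of a message-id; reported, not removed.
        "duplicates": sum(n - 1 for n in counts.values()),
    }
```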

The options change from time to time, and are described if you just
type "mailsync".

The `-D' option deletes empty mailboxes after synchronizing (and works 
only in "Sync" mode).  Mailsync doesn't differentiate between empty
and missing mailboxes.  Suppose you delete a mailbox on store A but
not on store B.  Without `-D', mailsync will delete all messages on B, 
but it will also resurrect the mailbox on A.  With `-D', both
mailboxes will be deleted.  Your choice.



4.1 Verbosity
-------------

Mailsync knows a series of log levels.  If you encounter a problem with
mailsync, you should enable them to let mailsync tell you more about
what's going on.  Run:

    mailsync --help

to see what logging options mailsync offers. The different log levels
are described in the c-client documentation "internal.txt" under mm_log.



4.2 Various messages from mailsync
----------------------------------

When running mailsync with the "-m" option you can see which messages
mailsync is copying back and forth.  Below is an example session with an
explanation of the actions that mailsync is taking:

    $ mailsync -vw -m inbox
    Synchronizing stores "local-inbox" <-> "remote-inbox"...
    Authorizing against {[192.168.1.2]/imap}
    
     *** INBOX ***
    
      deleted    >  Wed Jun  9 tony kwami             FROM MR TONY KWAMI.
      duplicate     Wed Jun  9 Some Hero              Re: comment
    1 duplicate for deletion in INBOX/remote-inbox
    Authorizing against {[192.168.1.2]/imap}
      ign. del   <  Wed Jun  9 Mail Delivery System   Mail failure - no recipient a
      copied     <- Wed Jun  9 Debian Bug Tracking Sy Bug#253290 acknowledged by de
    
    1 copied remote-inbox->local-inbox.
    1 deleted on remote-inbox.
    69 messages remain in INBOX
    Expunged 2 mail(s) in store remote-inbox

Above you can see that:

* a spam from some "tony kwami" was deleted on the remote mailbox, since I have
  deleted it locally

* "Some Hero" sent me the same message twice - probably he sent it to me
   and to some mailing list in the "Cc:".  Since I'm using the msg-id
   algorithm, the duplicate is detected and automatically deleted.

* copying a message from a mail daemon was not done, since it is marked as
  deleted in the remote mailbox and mailsync by default doesn't copy
  deleted mails

* a mail from the Debian Bug Tracking System was copied to the local mailbox.



5. Limitations
--------------

Mailsync assumes that the message-id is a global unique identifier.
If it isn't, then it won't work.  Here are some situations where the
message-id assumption doesn't quite hold.

1. Two different messages in the same mailbox with the same
   message-id.  Mailsync will delete the one that comes later in the
   mailbox.

2. Two different messages in different mailboxes with the same
   message-id.  Mailsync will never notice; everything works.

3. A message with no message-id.  Mailsync prints a warning and leaves 
   the message alone.  It will never be copied to any other store, nor
   deleted for any reason.

4. A message that resides on both stores which is then edited on one
   store.  The edit will not be propagated to the other store.  A
   workaround is to edit the message-id when you edit the message.
   This will be interpreted by mailsync as a delete and a new message,
   and everything will work.
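
Cases 1 and 3 can be illustrated with a short Python sketch that walks a
mailbox in order.  It is a simplified model with a made-up function name;
the real handling in mailsync has more checks.

```python
def classify_messages(message_ids):
    """Classify the messages of one mailbox by position.

    message_ids: list of message-ids in mailbox order; an empty string
    or None stands for a message without a message-id.
    Returns (duplicates, untouched) as lists of positions.
    """
    seen = set()
    duplicates = []  # later copies of an already seen message-id
    untouched = []   # no message-id: never copied, never deleted
    for position, message_id in enumerate(message_ids):
        if not message_id:
            untouched.append(position)
        elif message_id in seen:
            duplicates.append(position)
        else:
            seen.add(message_id)
    return duplicates, untouched
```

The first message with a given message-id is kept; only the ones that come
later in the mailbox are flagged as duplicates.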

Another limitation is that when a message is moved from one mailbox to
another, mailsync interprets this as a deletion from one box and a new
message in the other.  This simplifies the algorithm a lot, but wastes 
some network bandwidth if you move large messages around between
syncs.  Workaround: try to put large messages in their final resting
place before you run mailsync.



5.1 Problems with concurrent mailbox access
-------------------------------------------

Mailsync is not coded with concurrent access in mind. It shouldn't
break in a horrible way, since it tries to double-check before it
deletes something, but it is not really fit for the task either.
Efforts to make it concurrency capable are certainly appreciated.



5.2 Problems with "New" messages (Mutt)
---------------------------------------

The mail header line "Status: O" is interpreted differently depending
on which mail client you're using.  Pine and its mailbox manipulation
library c-client assume that a new message is new as long as you
have not read it.

Mutt and possibly some other clients interpret only a mail with
"Status: " as unseen.  If you want mutt to see all "new" messages as
"new", then you must use the "-n" (do not delete messages) option of
mailsync.  This will have the effect of not deleting any messages, not
even duplicates, on either store you're synchronizing.

Explanation:

c-client's author says the following:

"The definition of "Recent" [that is "Status: " - tpo] is that the
 message has arrived since the last time the mailbox was opened
 readwrite."
 
Strangely enough, appending new messages to a mailbox can be done without
opening the mailbox read-write.  However, any other operation that changes
anything in a mailbox, such as tagging a message as deleted or expunging
old messages, needs to be done in read-write mode.  As soon as that happens
*all* messages will be tagged with "Status: O" and hence lose their
"Recent" status.  That means that if we want to be able to maintain that
status, we are not allowed to tag messages as deleted, to expunge messages
or similar.



5.3 Empty Message-ID headers
----------------------------

Some mail software, Microsoft Outlook in particular, produces messages with
missing or empty Message-ID headers, especially for draft or unsent messages.
That means that mailsync will not be able to uniquely identify such a message
using the Message-ID header.  If you need to synchronize such messages you
should use the md5 message id algorithm.  You can achieve this by passing
"-t md5" to mailsync.

Mailsync deletes duplicate messages by default - assuming that they're the
same. However if it finds multiple messages with empty Message-IDs it will
display the message:

        "Not deleting duplicate message with empty Message-ID - see README"

and leave that message alone, supposing that this is just another email
produced by weird software.



6. More about the algorithm
---------------------------

The sync operation is symmetric---it doesn't matter which store you
specify first.  This is fundamentally different from the IMAP
disconnected-mode mail reading model, where there is a primary store,
and a client is responsible for synchronizing its local copy with the
primary store.  There is no limitation to the number of stores you
sync with each other.  Think of the stores as nodes of a graph, and
put links between the stores you want to sync between.  If there are
no cycles, then everything will work.  If there are cycles, some weird
things can happen that I haven't totally worked out yet: I think that
a message could be passed back as a new message to a store which
deleted it, for example.
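
Whether a set of channels contains a cycle can be checked mechanically.
The Python sketch below treats every channel as an undirected link between
two store names and uses a union-find structure; it is an illustration
with invented names, and mailsync itself performs no such check.

```python
def channels_have_cycle(channels):
    """Return True if the channel graph contains a cycle.

    channels: iterable of (store_name, store_name) pairs.
    """
    parent = {}

    def find(store):
        # Follow parent links to the representative of the store's group.
        parent.setdefault(store, store)
        while parent[store] != store:
            parent[store] = parent[parent[store]]  # path halving
            store = parent[store]
        return store

    for first, second in channels:
        root_first, root_second = find(first), find(second)
        if root_first == root_second:
            # Both stores are already connected: this link closes a cycle.
            return True
        parent[root_first] = root_second
    return False
```

Three stores linked in a chain are fine; adding a third channel that links
the two ends of the chain closes a cycle.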

Since both stores can be local, and in different formats, you can also
use mailsync as a method for keeping all of your messages accessible
by two MUAs, even if neither is IMAP-aware.  So mailsync could help
you transition between MUAs.



6.1 Message id algorithms
-------------------------
By default mailsync uses the Message-ID header to uniquely identify email
messages.  In case this is not possible (see 5.3) mailsync can use an md5
hash of a number of mail headers (defined in msgid.c) to identify a message.
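
The idea of hashing headers into an identifier can be sketched in Python.
Note that the header selection below is an assumption made for this
example; the headers mailsync really uses, and the exact way it feeds them
to md5, are defined in msgid.c, so the ids produced here will not match
mailsync's.

```python
import hashlib

# Assumed header selection, for illustration only (see msgid.c for the
# headers mailsync actually hashes).
ASSUMED_HEADERS = ("From", "To", "Subject", "Date")

def md5_message_id(headers, names=ASSUMED_HEADERS):
    """Derive an identifier from selected headers of one message.

    headers: dict mapping header names to their values.
    """
    digest = hashlib.md5()
    for name in names:
        # Include the name so that values cannot shift between headers.
        digest.update(name.encode("utf-8"))
        digest.update(b":")
        digest.update(headers.get(name, "").encode("utf-8"))
        digest.update(b"\n")
    return digest.hexdigest()
```

Two messages with identical values in the selected headers get the same
identifier, even if neither carries a Message-ID header.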



7. History
----------

Mailsync-1 was a bunch of shell scripts.  Mail was parsed using awk
and transferred using ftp.

Mailsync-2 was a Java program that suffered from the Second System
Effect.  It was never completed.

Mailsync-3 was written in C over the c-client library.  I used it for
over a year without any surprises.

Mailsync-4 is a C++ program based on mailsync-3.  I was able to remove 
a lot of cruft and make things a little more sensible, safe, and
efficient. 

Mailsync-4.2 and on has seen extended documentation and some
reengineering efforts to make it easier to read, patch and understand.



8. Further reading
------------------

RFC 2076 - Common Internet Message Headers
RFC 1176 - IMAP v2
RFC 1203 - IMAP v3
RFC 2060 - Internet Message Access Protocol - Version 4rev1
         - being updated: http://www.imc.org/draft-crispin-imapv
RFC 2342 - NAMESPACES (discovery of where my "*&/%* mailboxes are)
http://www.iana.org/assignments/imap4-capabilities
http://www.ics.uci.edu/~mh/book/mh/tocs/jump.htm



9. Support, development, new versions and homepage
--------------------------------------------------

Mailsync's homepage is at http://sourceforge.net/projects/mailsync.
There's also a CVS repository there in which mailsync development
happens. Announcements about new versions are usually posted on
http://freshmeat.net. If you need help or have a patch or
suggestion to contribute then subscribe to the mailsync mailing
list and post it there. Instructions on how to do so can be found
on mailsync's homepage. Tomas Pospisek will also provide commercial
support if needed (see below).



9.1 Developing mailsync
-----------------------

If you make changes then:

* change the version in configure.ac
* log the change in debian/changelog
  * don't forget to use the same version as in configure.ac
* run `autoreconf -fi`



10. Author, disclaimer, etc.
---------------------------

Mailsync's author is Tim Culver <fullcity@sourceforge.net>.  
Since version 4.2 Tomas Pospisek <tpo_deb@sourcepole.ch> maintains
the code.  Mailsync is copylefted under the GNU General Public
License: http://www.gnu.org/copyleft/gpl.html
Linking with openssl is explicitly permitted!