From: Bob L. <li...@is...> - 2002-09-12 18:27:50
I've posted my MD5 digest patch at http://www.isi.edu/~lindell/mailsync/. For the benefit of the list, I'll try to recap the problem I'm trying to solve and the outstanding issues.

I ran into a problem synchronizing my drafts folder: messages in it do not have a Message-Id field. After giving the problem some thought, I decided it would be better to have a "strong" digest for each message rather than relying on the Message-Id. I modified mailsync to compute an MD5 digest over the fields From:, To:, Subject:, Date:, and Message-Id:. Any of these fields that is not present is ignored, and the digest is computed over the remaining ones. Although this method requires some additional computation, it seems more reliable than assuming the Message-Id is unique. Instead of the Message-Id, I store the ASCII hex representation of the MD5 digest (32 bytes long) in the msinfo message.

Information about the uniqueness of Message-Ids (http://cr.yp.to/immhf/thread.html):

  In practice, Message-IDs are not necessarily unique. For example, Internet Mail Service 5.0.1457.3 reportedly copies Message-ID into a bounce message from the message being bounced; and Microsoft Internet Mail reportedly uses the same Message-ID for every message.

Furthermore, from a security perspective, an attacker can easily forge a message with a duplicate Message-ID. I'm still not convinced that mailsync should rely exclusively on the Message-Id field, since I believe RFC 822 doesn't require a message to contain it. Bernstein's site also mentions:

  Any message that starts or continues a thread needs a Message-ID. Not all messages contain Message-ID; for example, bounce messages from qmail do not contain Message-ID, and the Bell Labs upas mailer never creates Message-ID.

Outstanding issues for my patch:

1) Mail clients may modify some message header fields due to character set conversions. This would cause the message to have a different digest value.
   If a mail client modifies the Message-Id field, the existing mailsync algorithm would have difficulties as well.

2) The current code ignores the msinfo message because it has no Message-Id field. My modifications operate on messages without Message-Id fields, so a different mechanism is needed to exclude the msinfo message from synchronization.

3) Using the digest should probably be an option. My patch currently replaces the Message-Id with the digest outright.

4) Currently, mailsync warns users when it sees two messages with the same Message-Id. In a similar vein, mailsync could warn users when two messages have the same Message-Id but different digests (for messages that have Message-Id fields). Those messages could be ignored for syncing purposes unless the user asks to force them to be synced (with a new command-line option).

5) How would one transition from the current Message-Id based scheme to the MD5 one? Assuming an MD5 digest string can be distinguished from a Message-Id string in the msinfo list, it should be straightforward: if the msinfo entry previously contained the Message-Id, it is matched against the message during the sync process and written back out as a digest instead.

6) The msinfo list entry should probably have the form:

  md5digest DELIMITER_CHAR Message_ID

with no spaces between the three elements. DELIMITER_CHAR should be a character that cannot appear in a Message-Id or an MD5 digest. A test for the old or new format would check the 33rd character of the list element: if it is DELIMITER_CHAR, the entry is in the new format; otherwise it is in the old format (i.e. a bare Message-Id).

Bob
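For illustration, a minimal Python sketch of the scheme described above (not the patch itself, which changes mailsync's own code). It hashes whichever of From, To, Subject, Date and Message-Id are present, builds an msinfo entry in the proposed digest/DELIMITER_CHAR/Message-Id form, tells old entries from new ones by the 33rd character, and flags Message-Ids that appear with more than one digest (issue 4). The exact bytes fed to MD5 (each field as "Name: value" plus a newline), the tab used as DELIMITER_CHAR, and the helper names message_digest, make_entry, is_new_format, split_entry and conflicting_ids are all hypothetical choices for this sketch.

```python
import hashlib
from email.parser import HeaderParser

# Header fields covered by the digest, hashed in this fixed order.
DIGEST_FIELDS = ("From", "To", "Subject", "Date", "Message-Id")

# Hypothetical DELIMITER_CHAR: assumed never to appear in a Message-Id
# or in a lowercase hex digest.
DELIMITER_CHAR = "\t"
DIGEST_LEN = 32  # an MD5 digest is 16 bytes, i.e. 32 hex characters


def message_digest(raw_message: str) -> str:
    """MD5 over the DIGEST_FIELDS that are present; absent fields are skipped.

    Each present field is fed to MD5 as "Name: value" plus a newline
    (an assumption of this sketch, not necessarily the patch's layout).
    """
    headers = HeaderParser().parsestr(raw_message)
    md5 = hashlib.md5()
    for name in DIGEST_FIELDS:
        value = headers.get(name)  # case-insensitive lookup; None if absent
        if value is None:
            continue
        md5.update(f"{name}: {value}\n".encode("utf-8"))
    return md5.hexdigest()


def make_entry(raw_message: str) -> str:
    """Build an msinfo entry: digest, DELIMITER_CHAR, Message-Id (may be empty)."""
    headers = HeaderParser().parsestr(raw_message)
    message_id = headers.get("Message-Id", "").strip()
    return message_digest(raw_message) + DELIMITER_CHAR + message_id


def is_new_format(entry: str) -> bool:
    """New-format entries have DELIMITER_CHAR as their 33rd character (index 32)."""
    return len(entry) > DIGEST_LEN and entry[DIGEST_LEN] == DELIMITER_CHAR


def split_entry(entry: str):
    """Return (digest, message_id); digest is None for an old bare Message-Id."""
    if is_new_format(entry):
        return entry[:DIGEST_LEN], entry[DIGEST_LEN + 1:]
    return None, entry


def conflicting_ids(entries):
    """Message-Ids that occur with more than one distinct digest (issue 4)."""
    digests_by_id = {}
    for entry in entries:
        digest, message_id = split_entry(entry)
        if digest is None or not message_id:
            continue  # skip old-format entries and messages without a Message-Id
        digests_by_id.setdefault(message_id, set()).add(digest)
    return {mid for mid, ds in digests_by_id.items() if len(ds) > 1}


if __name__ == "__main__":
    draft = "From: bob@example.org\nSubject: draft\n\nbody\n"  # no Message-Id
    entry = make_entry(draft)
    print(is_new_format(entry), split_entry("<1234@example.org>"))
    # prints: True (None, '<1234@example.org>')
```

Because the sketch ignores the body and hashes fields in a fixed order, two copies of a message whose headers are merely reordered get the same digest, while a client that rewrites any hashed field (issue 1) still changes it.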