From: Tim C. <ful...@do...> - 2002-09-10 15:28:45
Hi all,

(Please use my spam-protected address "ful...@do..." on the list.)

On Tue, 10 Sep 2002, Tomas Pospisek wrote:
> I went through the forums and deleted those that were obsolete. In the one
> forum that remains are some open questions. So I'll close the forum as
> soon as they're cleared.

Thanks!

> > > * why does mailsync need a procedure (simplify_message_id) that is
> > > removing spaces in messages at all? AFAI can see a message id is defined
> > > by the regular expression:
> > >
> > > ^message-id: .*
> > >
> > > So to manipulate a message I just have to process line by line, strip
> > > the string "^message-id:" from them if necessary (it wouldn't be
> > > necessary if we'd be reading it from a msinfo file) and take the rest,
> > > whatever it's composed of, as the message id.
> >
> > This is important. The .msinfo parser is very stupid and relies on
> > whitespace to separate message-ids, so message-ids can't be allowed to
> > have spaces. Since they are compared to each other, it's crucial that if
> > I modify message-ids, I always modify them in exactly the same way.
>
> Yeah, I've seen that. It's a very creative hack ;-)

A hack, at least.

> > Eventually we are going to have to change the .msinfo format. When we do,
> > we should probably use... (wait for it...) XML. I like the fact that it
> > has such a small number of magic characters (just <, >, &, is that right?)
> > which is important for us since message-ids have all manner of
> > punctuation.
>
> Yeah - on the other side, the msinfo format is trivial enough, I do not
> think we need the syntactic candy if the structure is _that_ simple.

You're right. I think the only thing wrong with the config file is that it
might be difficult to specify folders with interesting punctuation.

> Additionaly since message-ids contain a lot of <,>,"'s we'll have to
> escape them. Yuck!!

I'm hoping it's just three characters to escape: <, >, &. In fact we could
strip off the standard <, > from the message-id!
s/^<|>$//g.

Note for the record: I'm talking about the hypothetical XML format here.

> Have you ever tried to read XML config files? IMHO it's not easy to read.
> So wrt to msinfo I'm against XML.

OK.

> message-ids can't start with a space. So we could use starting spaces to
> determine what a message-id is.

I like that idea!

> Since IMAP is so "flexible" I'm not sure which character can _not_ be
> contained in a mailbox name. In case there is one such character, then we
> could use it as a syntactic element. F.ex.:
>
> private/tim
> <Pin...@pa...>

Yes, I don't know whether such a character exists. Perhaps NUL is the only
one!

> If we decide to put all message-ids into <> (a fabulous idea). Then we can
> define whatever we want later. F.ex.:
>
> private/tim
> mid: <Pin...@pa...>
> md5: 1234456
> md5, mid: <Pin...@pa...>, 983745
>
> etc.

Yes, that's good.

> I think the config parser is the lesser evil now. Mailsync has quite some
> bugs. I want to kill those first. Then it'd be nice to get the code into
> some shape (a global variable can pop up at any line right now in
> the code ;-)

An excellent idea. One kind of global variable is the container used to
collect results from c-client callbacks. These could be made static
variables in a separate c-client adaptor file.

> AFAIK (I'll have to check in the code to verify this) the whole header is
> downloaded anyway, so it doesn't hurt if we compare md5's as well.

It won't hurt performance, true, I'm just worried that the algorithm might
get a little too complicated.

> True, we can make it optional and/or warn the user if mailsync finds two
> identical message-ids with differing headers.
>
> > The more I think about it, the less I like md5ing for a unique identifier.
> > If we don't get the header selection heuristic exactly right, we won't be
> > able to fix it without invalidating everybody's msinfo.
>
> Ack.
Well, we can alleviate this by versioning the header-selection algorithm:

<mailbox>
  <name>mail/foo</name>
  <message>
    <md5 version="1">325252ae267a</md5>
    <message-id>my_lame_client<&>@patroclus.doppke.com</message-id>
  </message>
  ...
</mailbox>

(I assume if we did <mailbox name="mail/foo"> we'd just have one more
character to escape in mailbox names.)

> > How about incorporating some functionality to add message-ids where
> > necessary? The new message-ids could be computed by an md5, so that
> > duplicate messages will tend to get the same message-ids, and when the md5
> > heuristic fails (some header changed in transit), the result will be no
> > worse than duplicate messages.
>
> Do you want to retransmit the message with a newly inserted message id or
> what are you thinking about here?

Unfortunately, imap would require retransmitting the message in this
scenario. :-(

> > Seriously: the < may be required by the RFC, I can't remember. Anyway,
> > can the clean_message_id() function just add < > when necessary? Not many
> > mailsync users can be depending on its buggy behavior here.
>
> Yes, I think we could do that.

Good idea. For the record, we're talking about *adding* < > to the current
msinfo format, while *removing* them from the hypothetical future XML
format.

> > Possibly for the < > problem, we won't have to change the format. But
> > eventually we will, and it won't be a big deal. Just tag the new format,
> > allow mailsync to permanently read both formats, but only write the new
> > one. Msinfo is always rewritten from scratch each time anyway.
>
> I'll have to look at the code to find where it's rewritten. In such
> a case we'd have less problems (well, downgrading mailsync wouldn't be
> possible, but in case we get it right, who cares to do that anyway).

Microsoft Word supports downgrading, I think. :-)

Tim
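
P.S. To make the two ideas above concrete, here is a rough sketch in Python (not mailsync's actual C++ code, and not a proposal for the final behaviour). It assumes that "cleaning" a message-id means dropping all whitespace and adding the surrounding < > when they are missing, and that the md5 is taken over a fixed list of headers tagged with a version number. The function names, the header list and the version constant are all made up for illustration.

```python
import hashlib
import re

# Hypothetical: which headers feed the digest, and the version tag that
# would be stored next to it (cf. <md5 version="1"> in the XML example).
HEADER_SELECTION_VERSION = 1
SELECTED_HEADERS = ("from", "to", "date", "subject")


def normalize_message_id(raw):
    """Drop all whitespace and make sure the id is wrapped in < >.

    The same input must always give the same output, because the
    resulting strings are compared to each other.
    """
    mid = re.sub(r"\s+", "", raw)
    if not mid:
        return ""
    if not mid.startswith("<"):
        mid = "<" + mid
    if not mid.endswith(">"):
        mid = mid + ">"
    return mid


def strip_brackets(mid):
    """Python equivalent of s/^<|>$//g, for the hypothetical XML format."""
    return re.sub(r"^<|>$", "", mid)


def header_digest(headers):
    """Return (version, md5 hex digest) over the selected headers.

    `headers` maps header names to values. Names are matched
    case-insensitively and runs of whitespace in values are collapsed,
    so trivial reformatting in transit does not change the digest.
    """
    lowered = {name.lower(): value for name, value in headers.items()}
    digest = hashlib.md5()
    for name in SELECTED_HEADERS:
        value = " ".join(lowered.get(name, "").split())
        digest.update(f"{name}:{value}\n".encode("utf-8"))
    return HEADER_SELECTION_VERSION, digest.hexdigest()
```

If the header selection ever has to change, bumping the version constant would let old digests be recognised as stale rather than silently mismatching, which is the point of the version attribute.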