From: Tomas P. <tp...@so...> - 2002-09-10 09:20:01
I'm Cc:ing Bob on this and also the mailing list. I'm inviting you both
to subscribe to the mailing list so we can continue there.
On Mon, 9 Sep 2002, Tim Culver wrote:
> > I suggest merging the existing forums into the mailing list and deleting
> > the three forums. I don't know about you, but I don't ever come around
> > monitoring what's going on in the forums. And there's not as much traffic
> > that would justify 3 of them (IMHO). Once they're merged into the mailing
> > list, I'd delete them.
>
> That sounds great. I think the 3 forums was the default setup at the
> time. Forum posts are cc:ed to me, and sometimes I'm lazy and just reply
> by email.
I went through the forums and deleted those that were obsolete. The one
forum that remains has some open questions, so I'll close it as
soon as they're cleared.
> > * why does mailsync need a procedure (simplify_message_id) that is
> > removing spaces in messages at all? AFAI can see a message id is defined
> > by the regular expression:
> >
> > ^message-id: .*
> >
> > So to manipulate a message I just have to process line by line, strip
> > the string "^message-id:" from them if necessary (it wouldn't be
> > necessary if we'd be reading it from a msinfo file) and take the rest,
> > whatever it's composed of, as the message id.
>
> This is important. The .msinfo parser is very stupid and relies on
> whitespace to separate message-ids, so message-ids can't be allowed to
> have spaces. Since they are compared to each other, it's crucial that if
> I modify message-ids, I always modify them in exactly the same way.
Yeah, I've seen that. It's a very creative hack ;-)
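To make the constraint concrete, here is a minimal sketch of such a normalisation. The name strip_whitespace is hypothetical and this is not the actual simplify_message_id() code; it only shows that the same deterministic transformation has to be applied to every message-id before it is stored or compared.

```cpp
#include <algorithm>
#include <cctype>
#include <iterator>
#include <string>

// Hypothetical helper, NOT mailsync's real simplify_message_id():
// drop every whitespace character so that the id can live in a
// whitespace-delimited .msinfo file.
std::string strip_whitespace(const std::string& message_id) {
    std::string result;
    result.reserve(message_id.size());
    std::copy_if(message_id.begin(), message_id.end(),
                 std::back_inserter(result),
                 [](unsigned char c) { return !std::isspace(c); });
    return result;
}
```

Applying the function twice gives the same result as applying it once, which is what makes ids read back from msinfo comparable with ids freshly read from a mailbox.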
> Eventually we are going to have to change the .msinfo format. When we do,
> we should probably use... (wait for it...) XML. I like the fact that it
> has such a small number of magic characters (just <, >, &, is that right?)
> which is important for us since message-ids have all manner of
> punctuation.
Yeah - on the other hand, the msinfo format is trivial enough that I do not
think we need the syntactic candy if the structure is _that_ simple.
Additionally, since message-ids contain a lot of <, > and " characters, we'd
have to escape them. Yuck!!
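A rough sketch of what that escaping would involve, assuming the message-id is stored as XML text or as an attribute value. xml_escape is a hypothetical helper written for illustration, not existing mailsync code.

```cpp
#include <string>

// Hypothetical helper: replace the characters that are special in XML
// text and attribute values with their predefined entities.
std::string xml_escape(const std::string& text) {
    std::string out;
    out.reserve(text.size());
    for (char c : text) {
        switch (c) {
            case '<': out += "&lt;";   break;
            case '>': out += "&gt;";   break;
            case '&': out += "&amp;";  break;
            case '"': out += "&quot;"; break;
            default:  out += c;        break;
        }
    }
    return out;
}
```

With this, an id such as <a&b@host> would be stored as &lt;a&amp;b@host&gt;, which is not exactly pleasant to read in a text editor.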
> Also I assume there is a decent free parser out there (though I haven't
> looked).
Certainly. The simplest XML parser will be 10 lines of C, and the most
complex (being able to parse all the DTD candy etc.) will be.
Have you ever tried to read XML config files? IMHO it's not easy to read.
So wrt msinfo I'm against XML.
message-ids can't start with a space. So we could use starting spaces to
determine what a message-id is.
Since IMAP is so "flexible" I'm not sure which character can _not_ be
contained in a mailbox name. In case there is one such character, then we
could use it as a syntactic element. F.ex.:
private/tim
<Pin...@pa...>
If we decide to put all message-ids into <> (a fabulous idea), then we can
define whatever we want later. F.ex.:
private/tim
mid: <Pin...@pa...>
md5: 1234456
md5, mid: <Pin...@pa...>, 983745
etc.
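A minimal sketch of a reader for such a layout, under the assumption that mailbox names start in column one and every message-id line is indented with at least one space or tab. The names MsInfo and parse_indented_msinfo are made up for this example, and the "mid:" / "md5:" prefixes are left unparsed as part of the entry.

```cpp
#include <cstddef>
#include <istream>
#include <map>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical in-memory form: mailbox name -> entries listed under it.
using MsInfo = std::map<std::string, std::vector<std::string>>;

// Assumption: an unindented line is a mailbox name, an indented line is
// an entry belonging to the most recent mailbox.
MsInfo parse_indented_msinfo(std::istream& in) {
    MsInfo result;
    std::string line;
    std::string current_mailbox;
    while (std::getline(in, line)) {
        if (line.empty()) continue;
        if (line[0] == ' ' || line[0] == '\t') {
            std::size_t start = line.find_first_not_of(" \t");
            if (start == std::string::npos) continue;  // whitespace-only line
            if (current_mailbox.empty()) continue;     // entry before any mailbox
            result[current_mailbox].push_back(line.substr(start));
        } else {
            current_mailbox = line;
            result[current_mailbox];  // remember mailboxes that have no entries
        }
    }
    return result;
}
```

Since a message-id can't start with a space, the indentation alone tells the two kinds of line apart, whatever characters the mailbox name contains.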
> While we're at it, let's make the config file XML too and get rid of all
> of my cheesy parsing code.
Your parsing code is IMO OK, I've commented it. Now it's pretty readable
and understandable. But getting rid of the msinfo parser would be a good
thing (IMHO).
> > "Store", "Channel" etc.
> > clearly look like objects, but they are structures. This results in some
> > ugly code:
> >
> > struct ConfigItem {
> > int is_Store;
> > Store* store;
> > Channel* channel;
> > ...
> >
> > Is there something that I'm missing that doesn't allow those structures
> > to be real classes (like f.ex. some interdependency between them and the
> > c-client code which I guess is C only).
> > If there's nothing that speaks against it, I'm going to slowly hack the
> > code into object oriented form ;-)
>
> Please do, but be aware that I may have strong opinions about how OO code
> should work. ConfigItem is ugly, but I don't think that store and channel
> have enough in common to require a common base class. The only thing they
> have in common is that they are specified in a config file. If we get rid
> of my cheesy parser, we probably won't need it at all.
I think the config parser is the lesser evil now. Mailsync has quite a few
bugs. I want to kill those first. Then it'd be nice to get the code into
some shape (a global variable can pop up at any line right now in
the code ;-)
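For the record, a sketch of how stores and channels could become real classes without a common base class and without the is_Store flag. Everything here (the members, Config, find_by_name) is an assumption for illustration and does not reflect the actual mailsync structures.

```cpp
#include <string>
#include <utility>
#include <vector>

// Hypothetical classes: the real Store and Channel carry more data.
class Store {
public:
    explicit Store(std::string name) : name_(std::move(name)) {}
    const std::string& name() const { return name_; }
private:
    std::string name_;
};

class Channel {
public:
    explicit Channel(std::string name) : name_(std::move(name)) {}
    const std::string& name() const { return name_; }
private:
    std::string name_;
};

// Two typed containers instead of one list of tagged ConfigItems.
struct Config {
    std::vector<Store> stores;
    std::vector<Channel> channels;
};

// Works for any item type that offers name(); no shared base class needed.
template <typename Item>
const Item* find_by_name(const std::vector<Item>& items,
                         const std::string& name) {
    for (const Item& item : items) {
        if (item.name() == name) return &item;
    }
    return nullptr;
}
```

A lookup then reads find_by_name(config.stores, "home") and yields a typed pointer, or a null pointer if nothing matches.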
> > * what does "tdc" mean (as in tdc_mail_list_dest, tdc_mail_list_store,
> > tdc_mail_list etc.)?
>
> Those are my initials. :-) Probably those functions were slight
> variations on things that existed in the c-client example programs. Feel
> free to come up with better names.
I see. I've started some renaming already.
> > * Regarding the message-id vs. md5 problem, I'm proposing the following
> > solution:
> >
> > if ( enabled_md5 or message_id_not_available)
> > calculate_compare_and_save_md5
> > if ( enabled_message_id and message_id_available)
> > compare_and_save_message_id
> >
> > What do you think?
>
> I don't like fallbacks. It should use message-id or md5sum at the user's
> discretion. Also, the user should be able to switch back and forth
> without worrying about his msinfo acting corrupted. If md5sum turns out
> to work better after some experience, we'll make it the default.
Yeah. But did you read Bob Lindell's email regarding message ids? It seems
that a) message id's aren't even required by any RFC and b) that there is
software that produces the same message-id for all emails.
AFAIK (I'll have to check in the code to verify this) the whole header is
downloaded anyway, so it doesn't hurt if we compare md5's as well.
True, we can make it optional and/or warn the user if mailsync finds two
identical message-ids with differing headers.
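A sketch of such a check. Message and find_conflicting_ids are hypothetical names, and std::hash is used purely as a stand-in for a real MD5 digest, so this only illustrates the bookkeeping, not the hashing.

```cpp
#include <algorithm>
#include <cstddef>
#include <functional>
#include <map>
#include <string>
#include <vector>

// Hypothetical record: just what the check needs.
struct Message {
    std::string message_id;
    std::string headers;
};

// Returns every message-id that shows up with more than one distinct
// header digest. std::hash is a placeholder for MD5 here.
std::vector<std::string> find_conflicting_ids(
        const std::vector<Message>& messages) {
    std::map<std::string, std::size_t> first_digest;
    std::vector<std::string> conflicts;
    std::hash<std::string> digest;
    for (const Message& m : messages) {
        const std::size_t d = digest(m.headers);
        auto it = first_digest.find(m.message_id);
        if (it == first_digest.end()) {
            first_digest.emplace(m.message_id, d);
        } else if (it->second != d &&
                   std::find(conflicts.begin(), conflicts.end(),
                             m.message_id) == conflicts.end()) {
            conflicts.push_back(m.message_id);
        }
    }
    return conflicts;
}
```

Each id returned would be a candidate for a warning to the user, since the message-id alone can't tell those messages apart.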
> The more I think about it, the less I like md5ing for a unique identifier.
> If we don't get the header selection heuristic exactly right, we won't be
> able to fix it without invalidating everybody's msinfo.
Ack.
> How about incorporating some functionality to add message-ids where
> necessary? The new message-ids could be computed by an md5, so that
> duplicate messages will tend to get the same message-ids, and when the md5
> heuristic fails (some header changed in transit), the result will be no
> worse than duplicate messages.
Do you want to retransmit the message with a newly inserted message id or
what are you thinking about here?
> > * Regarding the msinfo format. Currently it looks like this:
> >
> > Headers: blabla
> > More_Headers: blabli
> >
> > mailbox/submailbox
> > WHATEVER_MESSAGE_ID
> > mailbox/submailbox2
> > ...
> >
> > You are distinguishing between a mailbox name and a message id this way:
> >
> > if (text[k] != '<') { /* Mailbox name */ }
> > else { /* Message id */ }
> >
> > The problem is, that there are mailclients that don't give a damn
> > about that convention. An excerpt from my msinfo:
> >
> > j-spin
> > TCPSMTP_GEN.12.550@194.235.177.92
> >
> > Let's look at the email message in question:
> >
> > From xx...@go... Sat Aug 30 08:24:23 1997
> > Received: from bbs.datacomm.ch (datacomm.ch [194.148.11.200]) by spin.ch (8.7.5/
> > New SPIN) with SMTP id IAA29983 for <xx...@sp...>; Sat, 30 Aug 1997 08:24:19 GMT
> > X-ROUTED: Sat, 30 Aug 1997 08:23:06 -0200
> > X-TCP-IDENTITY: Edo
> > Received: from octum-880-mhz- [194.235.177.92] by bbs.datacomm.ch with smtp
> > id AIBGBHBL ; Sat, 30 Aug 1997 08:22:22 -0200
> > From: "ed" <xx...@go...>
> > To: <xx...@sp...>
> > Subject: Es ist doch ganz klar...
> > Date: Sat, 30 Aug 1997 08:13:22 +0200
> > X-MSMail-Priority: Normal
> > X-Priority: 3
> > X-Mailer: Microsoft Internet Mail 4.70.1161
> > MIME-Version: 1.0
> > Content-Type: text/plain; charset=Windows-1250
> > Content-Transfer-Encoding: 8bit
> > message-id: TCPSMTP_GEN.12.550@194.235.177.92
> > Status: RO
> > X-Status:
> > X-Keywords:
> > X-UID: 123
>
> See, there's your problem. You must not accept any incoming mail from
> Outlook Express.
Heh :-)
> Seriously: the < may be required by the RFC, I can't remember. Anyway,
> can the clean_message_id() function just add < > when necessary? Not many
> mailsync users can be depending on its buggy behavior here.
Yes, I think we could do that. Good idea.
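Roughly like this, as a sketch only. add_angle_brackets is a hypothetical name; I haven't checked how clean_message_id() is actually structured.

```cpp
#include <string>

// Hypothetical helper: wrap a message-id in < > unless the bracket is
// already there. An empty id is returned unchanged.
std::string add_angle_brackets(const std::string& message_id) {
    if (message_id.empty()) return message_id;
    std::string result = message_id;
    if (result.front() != '<') result.insert(result.begin(), '<');
    if (result.back() != '>') result.push_back('>');
    return result;
}
```

The id from the example above, TCPSMTP_GEN.12.550@194.235.177.92, would come out as <TCPSMTP_GEN.12.550@194.235.177.92>, while ids that already carry their brackets pass through untouched.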
> > AFAI can see, the only thing we can do here is to define a new msinfo
> > format, release a new mailsync and test for the new format and if it
> > doesn't fit either transform it automatically (blegh) or tell the user to
> > please transform it by hand (using some nice script, that catches some
> > mistakes).
>
> Possibly for the < > problem, we won't have to change the format. But
> eventually we will, and it won't be a big deal. Just tag the new format,
> allow mailsync to permanently read both formats, but only write the new
> one. Msinfo is always rewritten from scratch each time anyway.
I'll have to look at the code to find where it's rewritten. In that
case we'd have fewer problems (well, downgrading mailsync wouldn't be
possible, but if we get it right, who cares to do that anyway).
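A sketch of the "read both, write only the new one" idea. The tag line "msinfo-format: 2" and the names below are invented for this example; the real tag is still to be decided.

```cpp
#include <istream>
#include <ostream>
#include <sstream>
#include <string>

enum class MsInfoFormat { Legacy, Tagged };

// Hypothetical tag: assumed to be the first line of every new-format file.
const char kFormatTag[] = "msinfo-format: 2";

// Reads the first line and decides which parser should handle the rest.
MsInfoFormat detect_format(std::istream& in) {
    std::string first_line;
    std::getline(in, first_line);
    return first_line == kFormatTag ? MsInfoFormat::Tagged
                                    : MsInfoFormat::Legacy;
}

// The writer always emits the tag, so every rewrite upgrades the file.
void write_format_tag(std::ostream& out) {
    out << kFormatTag << '\n';
}
```

Note that detect_format() consumes the first line, so a legacy reader would have to rewind the stream or be handed that line separately.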
*t
-----------------------------------------------------------------------
Tomas Pospisek
sourcepole - Linux & Open Source Solutions
http://sourcepole.com
Elestastrasse 18, 7310 Bad Ragaz, Switzerland
Tel:+41 (81) 330 77 13, Fax:+41 (81) 330 77 12
------------------------------------------------------------------------