We are beginning to look at archiving email and are realizing that it is not a simple form of text.

First off, of course, email needn't be just plain text; it could exist as HTML (or presumably in other formats, but for now we're comfortable worrying only about text, in both marked-up and plain forms).

Then there are the threads, the recipients (cc, and bcc where known), whether the message was forwarded, and any attachments.
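
To make the question concrete, here is a rough sketch in Python, using only the standard library's email package, of the kind of record we have in mind. The ArchivedEmail class, its field names, and the describe helper are all hypothetical, and the forwarded check is only a guess based on the subject prefix; this is an illustration of the question, not an established content model.

```python
from dataclasses import dataclass
from email import message_from_string, policy
from email.utils import getaddresses
from typing import List, Optional


@dataclass
class ArchivedEmail:
    """Hypothetical archival record; field names are assumptions."""
    message_id: str
    in_reply_to: str            # parent message, for threading
    references: List[str]       # ancestor message IDs, for threading
    to: List[str]
    cc: List[str]
    bcc: List[str]              # usually empty: bcc is rarely preserved
    plain_body: Optional[str]
    html_body: Optional[str]
    attachments: List[str]      # attachment filenames only
    looks_forwarded: bool       # heuristic, not a reliable fact


def describe(raw: str) -> ArchivedEmail:
    """Parse one raw message and pull out the fields listed above."""
    msg = message_from_string(raw, policy=policy.default)

    def addrs(header: str) -> List[str]:
        values = [str(v) for v in msg.get_all(header, [])]
        return [addr for _name, addr in getaddresses(values)]

    plain = msg.get_body(preferencelist=("plain",))
    html = msg.get_body(preferencelist=("html",))
    subject = str(msg.get("Subject", "")).strip().lower()

    return ArchivedEmail(
        message_id=str(msg.get("Message-ID", "")).strip(),
        in_reply_to=str(msg.get("In-Reply-To", "")).strip(),
        references=str(msg.get("References", "")).split(),
        to=addrs("To"),
        cc=addrs("Cc"),
        bcc=addrs("Bcc"),
        plain_body=plain.get_content() if plain is not None else None,
        html_body=html.get_content() if html is not None else None,
        attachments=[part.get_filename() or "" for part in msg.iter_attachments()],
        # Assumption: a "Fwd:" or "Fw:" subject prefix suggests forwarding.
        looks_forwarded=subject.startswith(("fwd:", "fw:")),
    )
```

Even this much leaves open how to store the thread structure itself and what to do with the attachment contents, which is part of what we are asking about.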

Has anyone worked on a content model to describe email for archival purposes, even at this simple level? Where might I look for more information?

Many thanks,
Ari