From: Stephen E. <Ste...@it...> - 2004-07-30 12:37:09
Tom,

The simplest, but least efficient, way of reading messages is to grab the entire RFC822 formatted message and pass it to a MIME::Parser object. You can then loop over the parts of the resultant MIME::Entity. This is exactly the approach that I'm using for a simple nntp client where the messages are virtually guaranteed to contain only a text body with no attachments.

A more efficient, and much more complicated, approach is to take advantage of the IMAP server's ability to parse the message and return only the requested parts. This is done through the get_bodystructure() method on a Mail::IMAPClient object, which returns a Mail::IMAPClient::BodyStructure object.

When I was doing this two and a half years ago there was some reason that the methods on the bodystructure object weren't giving me useful data (or more likely I was doing something wrong), so I resorted to manually looping over the internal structure of the object. This meant I had to go through some trial and error to determine when to increment the part number and when to perform a recursive call...

Anyway, as the structure is looped over, the decision tree looks something like this:

* if it is an arrayref, do a recursive call on each item in the arrayref
* else it is a hashref of data
  * if the bodytype is an arrayref then it's a multipart/*
    * if the bodysubtype is 'alternative' then loop over the parts in the bodytype arrayref, find the last one that we know how to handle, and do a recursive call on it. In my case the options were text/plain, text/html, or /related/. This is because MIME says that you could have (for example) the same message body as text, html and a word document, with the preferred (richest) form last.
    * else treat it as multipart/mixed and do a recursive call on each item in the bodytype arrayref
  * if the bodytype is message/*
    * if the bodysubtype is 'rfc822' then the part is a forwarded message, so output the headers (from the envelopestruct entry) and do a recursive call on each item in the bodystructure arrayref
    * if the bodysubtype is 'delivery-status' then you may want to output it (these are mail server messages)
  * else standard single parts...
    * if text/plain and bodydisp is NIL then output a text part
    * if text/html and bodydisp is NIL then output a html part
    * if image/gif or image/jpe?g then output an image tag
    * else output a generic attachment link and info such as filename, size and encoding

I checked the disposition of the part (the bodydisp value) because the decision was made to only output text or html parts if they were the body of the message; attachments were to get the generic link.

Each part of a MIME message has a number. For a simple message this could be as simple as having parts with numbers 1, 2 and 3. However, when another complete message is contained as a part you can get numbers such as 1.3, 3.5 and 1.3.2. This part number is then used with the fetch method to get the data for a specific part of the message, which can then be decoded:

    my $head = ($imap->fetch($id, "BODY[$part.MIME]"))[1];
    my @body = $imap->fetch($id, "BODY[$part]");

At certain points in the above logic the part number either needs to be incremented or another level added. While I was looking through some documentation about how Squirrelmail (PHP) handled MIME data I found out about the MIME torture test [1], which is a complicated message with various part types including multiple levels of messages. I simply wrote it into my inbox and then used good old trial and error until all the part numbers came out ok.

So far we have a page output that contains headers, bodies and links to attachments (either img or a tags). For each of the attachments we know the folder, the message uid (it's much easier if you tell Mail::IMAPClient to use the unique message IDs instead of sequence IDs), the part number and maybe a filename.
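The rule for when to increment the part number and when to add a level can be sketched roughly as follows. This is only an illustration of the IMAP numbering scheme, not the code described in this message: the node layout (hashrefs with `type`, `parts` and `message` keys) and the `number_parts` helper are hypothetical, and are not the internal structure of Mail::IMAPClient::BodyStructure.

```perl
use strict;
use warnings;

# Hypothetical, simplified node shape (NOT the real BodyStructure internals):
#   { type => 'multipart/mixed', parts   => [ ...nodes... ] }
#   { type => 'message/rfc822',  message => { ...node... } }
#   { type => 'text/plain' }                  # any other single part

# Returns a list of [part_number, type] pairs, one per leaf part and
# one per embedded message. Multipart containers are not listed.
sub number_parts {
    my ($node, $prefix) = @_;

    # A multipart body numbers its children 1..N. A single-part body
    # counts as part 1 of the message that contains it.
    my @kids = $node->{type} =~ m{^multipart/}i ? @{ $node->{parts} } : ($node);

    my @out;
    my $n = 0;
    for my $kid (@kids) {
        $n++;    # increment the number at this level
        my $num = length($prefix) ? "$prefix.$n" : "$n";

        if ($kid->{type} =~ m{^multipart/}i) {
            # Nested multipart: its children get one more level.
            push @out, number_parts($kid, $num);
        }
        elsif (lc($kid->{type}) eq 'message/rfc822') {
            # Embedded message: record it, then number its body beneath it.
            push @out, [ $num, $kid->{type} ], number_parts($kid->{message}, $num);
        }
        else {
            push @out, [ $num, $kid->{type} ];
        }
    }
    return @out;
}

# Made-up example: a text body, a forwarded message and a PDF attachment.
my $tree = {
    type  => 'multipart/mixed',
    parts => [
        { type => 'text/plain' },
        {
            type    => 'message/rfc822',
            message => {
                type  => 'multipart/mixed',
                parts => [ { type => 'text/plain' }, { type => 'image/gif' } ],
            },
        },
        { type => 'application/pdf' },
    ],
};

printf "%-6s %s\n", @$_ for number_parts($tree, '');
```

For this made-up tree the numbers come out as 1, 2, 2.1, 2.2 and 3. Each number is what would be substituted for $part in a BODY[$part] fetch.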

The path I took was to generate a url such as:

    /email/attachments/document.pdf?folder=INBOX&id=1375&part=1.6

A dhandler (/email/attachments/dhandler) was used so that the attachment filename could be part of the url, which makes IE and Netscape 4.x use that filename in the save file dialog box. The dhandler ignores the filename, downloads and decodes the appropriate part (keeping the content type of the part from the message), and then prints it out to the browser.

I hope this helps ... :)

Looking back at the code even six months after I wrote it, there are heaps of things that I would do differently now, but since there are around 160 thousand email page requests a day (both folder listings and message views, for around 20 thousand users) there is no compelling reason to allocate resources to rewrite it. I have been toying with the idea of asking for permission to rewrite it in my own time and release it as open source...

Another thing to note about Mail::IMAPClient::BodyStructure is that it creates a Parse::RecDescent parser in order to parse what is returned by the IMAP server. For me this was taking at least three seconds per page load, so I had to make sure it was preloaded in the handler. However, I think that this might have had something to do with this being pre version 1 Mason running under an old version of Solaris. Things are so much better since we switched from an E10K domain to a farm of Linux boxes for the frontend...

I so didn't intend to write so much ... :)

Stephen

[1] It appears that it is available to download through the kmMail sourceforge site:
http://sourceforge.net/project/showfiles.php?group_id=32721

Tom Allison wrote:
> I've decided to take on something strange for my "first" project in
> HTML::Mason.
>
> IMAP web interface.
>
> So far so good, but I don't have any great time table here.
>
> But I do need to solicit some pointers.
>
> I need to figure out how to read email. Normally this isn't too hard
> until you get to MIME.
> Long term my only goal is to be able to read
> HTML and a variety of image attachments. Everything else would be a
> download link only.
>
> But Perl has many ways of dealing with MIME.
>
> Suggestions would be appreciated.
>
> _______________________________________________
> Mason-users mailing list
> Mas...@li...
> https://lists.sourceforge.net/lists/listinfo/mason-users