Re: [Mason] IMAP progress

Tom,

The simplest, but least efficient, way of reading messages is to grab 
the entire RFC822 formatted message and pass it to a MIME::Parser 
object.  You can then loop over the parts of the resultant MIME::Entity. 
  This is exactly the approach that I'm using for a simple NNTP client 
where the messages are virtually guaranteed to only contain a text body 
with no attachments.

A more efficient, and much more complicated, approach is to take 
advantage of the IMAP server's ability to parse the message and only 
return requested parts.  This is done through the get_bodystructure() 
method on a Mail::IMAPClient object that returns a 
Mail::IMAPClient::BodyStructure object.

When I was doing this two and a half years ago there was some reason 
that the methods on the bodystructure object weren't giving me useful 
data (or more likely I was doing something wrong) so I resorted to 
manually looping over the internal structure of the object.  This meant 
I had to go through some trial and error to determine when to increment 
the part number and when to perform a recursive call...

Anyway, as the structure is looped over, the decision tree looks 
something like this:

* if it is an arrayref, do a recursive call on each item in the arrayref
* else it is a hashref of data
   * if the bodytype is an arrayref then it's a multipart/*
     * if the bodysubtype is 'alternative' then loop over the
       parts in the bodytype arrayref, find the last one that we
       know how to handle and do a recursive call on it.  In my case
       the options were text/plain, text/html, or /related/.  This is
       because MIME says that you could have (for example) the same
       message body as text, HTML and a Word document.
     * else treat it as multipart/mixed and do a recursive call on
       each item in the bodytype arrayref
   * if the bodytype is message/*
     * if the bodysubtype is 'rfc822' then the part is a forwarded
       message, so output the headers (from the envelopestruct entry)
       and do a recursive call on each item in the bodystructure
       arrayref.
     * if the bodysubtype is 'delivery-status' then you may want to
       output it (these are mail server messages)
   * else standard single parts...
     * if text/plain and bodydisp is NIL then output a text part
     * if text/html and bodydisp is NIL then output an HTML part
     * if image/gif or image/jpe?g then output an image tag
     * else output a generic attachment link and info such as
       filename, size and encoding.

I checked for the disposition of the part (bodydisp value) because the 
decision was made to only output text or HTML parts if they were the 
body of the message; attachments were to get the generic link.
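
As a rough illustration of the decision tree above, here is a minimal 
sketch in Perl.  It does not use the real internals of 
Mail::IMAPClient::BodyStructure: the part records (hashrefs with type, 
subtype, disposition, parts and body keys) and the names render_plan() 
and full_type() are made up for the example, and /related/ is assumed 
to mean multipart/related.

```perl
use strict;
use warnings;

# Hypothetical, simplified part records; NOT the real internals of
# Mail::IMAPClient::BodyStructure.  Keys: type, subtype, optional
# disposition, parts (multipart children), body (inside message/rfc822).
my %can_show = map { $_ => 1 } qw(text/plain text/html multipart/related);

sub full_type {
    my ($p) = @_;
    return lc "$p->{type}/$p->{subtype}";
}

# Returns a flat list of "action: type" strings describing how each
# part would be rendered.
sub render_plan {
    my ($part) = @_;
    my $type = full_type($part);

    if ($type =~ m{^multipart/}) {
        my @kids = @{ $part->{parts} || [] };
        if ($type eq 'multipart/alternative') {
            # Take the last alternative that we know how to display.
            my ($best) = grep { $can_show{ full_type($_) } } reverse @kids;
            return $best ? render_plan($best) : ();
        }
        # Everything else is treated like multipart/mixed.
        return map { render_plan($_) } @kids;
    }
    if ($type eq 'message/rfc822') {
        # Forwarded message: headers first, then its own body.
        return ('headers: message/rfc822',
                $part->{body} ? render_plan($part->{body}) : ());
    }
    return "status: $type" if $type eq 'message/delivery-status';

    my $inline = !defined $part->{disposition};    # bodydisp is NIL
    return "text: $type"  if $type eq 'text/plain' && $inline;
    return "html: $type"  if $type eq 'text/html'  && $inline;
    return "image: $type" if $type =~ m{^image/(?:gif|jpe?g)$};
    return "link: $type";                          # generic attachment
}

my $msg = {
    type => 'multipart', subtype => 'mixed', parts => [
        { type => 'multipart', subtype => 'alternative', parts => [
            { type => 'text',        subtype => 'plain'  },
            { type => 'text',        subtype => 'html'   },
            { type => 'application', subtype => 'msword' },
        ] },
        { type => 'image', subtype => 'jpeg',  disposition => 'attachment' },
        { type => 'text',  subtype => 'plain', disposition => 'attachment' },
    ],
};

print "$_\n" for render_plan($msg);
# prints:
#   html: text/html
#   image: image/jpeg
#   link: text/plain
```

The attached text/plain part gets a generic link rather than being 
shown inline, which is the effect of the bodydisp check described 
above.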

Each part of a MIME message has a number.  For a simple message this 
could be as simple as having parts with numbers 1, 2 and 3.  However, 
when another complete message is contained as a part, you can get 
numbers such as 1.3, 3.5 and 1.3.2.  This part number is then used with 
the fetch method to get the data for a specific part of the message, 
which can then be decoded:

   my $head = ($imap->fetch($id, "BODY[$part.MIME]"))[1];
   my @body = $imap->fetch($id, "BODY[$part]");

At certain points in the above logic the part number either needs to be 
incremented or another level added.  While I was looking through some 
documentation about how SquirrelMail (PHP) handled MIME data I found out 
about the MIME torture test [1], which is a complicated message with 
various part types including multiple levels of messages.  I simply 
wrote it into my inbox and then used good old trial and error until all 
the part numbers came out ok.
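
For illustration, the numbering can be sketched as below.  This is 
based on a reading of the section numbering rules in RFC 3501 rather 
than on the code described here, and it uses hypothetical, simplified 
part records (hashrefs with type, subtype, parts and body keys) and a 
made-up function name, so check its output against the torture test 
before trusting it.

```perl
use strict;
use warnings;

# Walk a simplified body structure and return "number type" strings.
# Hypothetical records: multiparts carry {parts}, message/rfc822
# carries the enclosed message's structure in {body}.
sub number_parts {
    my ($part, $num) = @_;
    $num = '' unless defined $num;    # '' means "top level, no number yet"
    my $type = lc "$part->{type}/$part->{subtype}";

    if ($type =~ m{^multipart/}) {
        # Add a level: children are numbered 1..n under this prefix.
        my $n = 0;
        my @out;
        for my $kid (@{ $part->{parts} }) {
            $n++;
            push @out, number_parts($kid, length($num) ? "$num.$n" : $n);
        }
        return @out;
    }

    $num = 1 unless length $num;      # a lone top-level body is part 1
    my @out = ("$num $type");
    if ($type eq 'message/rfc822') {
        my $body = $part->{body};
        push @out, lc($body->{type}) eq 'multipart'
            ? number_parts($body, $num)         # parts become $num.1, $num.2, ...
            : number_parts($body, "$num.1");    # a single body is $num.1
    }
    return @out;
}

my $msg = {
    type => 'multipart', subtype => 'mixed', parts => [
        { type => 'text', subtype => 'plain' },
        { type => 'message', subtype => 'rfc822', body => {
            type => 'multipart', subtype => 'mixed', parts => [
                { type => 'text',  subtype => 'plain' },
                { type => 'image', subtype => 'gif'   },
            ] } },
        { type => 'message', subtype => 'rfc822', body => {
            type => 'text', subtype => 'plain' } },
        { type => 'multipart', subtype => 'alternative', parts => [
            { type => 'text', subtype => 'plain' },
            { type => 'text', subtype => 'html'  },
        ] },
    ],
};

print "$_\n" for number_parts($msg);
# prints:
#   1 text/plain
#   2 message/rfc822
#   2.1 text/plain
#   2.2 image/gif
#   3 message/rfc822
#   3.1 text/plain
#   4.1 text/plain
#   4.2 text/html
```

In this sketch a multipart adds a level without consuming a number of 
its own inside a message, which is the kind of case where the 
"increment or add a level" decision is easy to get wrong.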

So far we have a page output that contains headers, bodies and links to 
attachments (either img or a tags).  For each of the attachments we know 
the folder, message UID (it's much easier if you tell Mail::IMAPClient 
to use the unique message IDs instead of sequence IDs), part number 
and maybe a filename.  The path I took was to generate a URL such as:

  /email/attachments/document.pdf?folder=INBOX&id=1375&part=1.6
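
A minimal sketch of building such a URL follows.  The helper names 
(uri_escape_min, attachment_url) and the 'attachment' fallback filename 
are assumptions for the example, and the escaping is done by hand only 
to keep the snippet free of non-core modules.

```perl
use strict;
use warnings;

# Percent-encode everything outside the RFC 3986 unreserved set.
sub uri_escape_min {
    my ($s) = @_;
    utf8::encode($s) if utf8::is_utf8($s);
    $s =~ s/([^A-Za-z0-9\-._~])/sprintf '%%%02X', ord $1/ge;
    return $s;
}

# Build the attachment link from folder, id, part and optional filename.
sub attachment_url {
    my (%arg) = @_;
    my $name = (defined $arg{filename} && length($arg{filename}))
             ? $arg{filename}
             : 'attachment';    # hypothetical fallback name
    my $query = join '&',
        map { $_ . '=' . uri_escape_min($arg{$_}) } qw(folder id part);
    return '/email/attachments/' . uri_escape_min($name) . "?$query";
}

print attachment_url(
    filename => 'document.pdf',
    folder   => 'INBOX',
    id       => 1375,
    part     => '1.6',
), "\n";
# prints: /email/attachments/document.pdf?folder=INBOX&id=1375&part=1.6
```

Folder names and filenames can contain spaces, slashes and other 
characters, which is why both are escaped here.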

A dhandler (/email/attachments/dhandler) was used so that the attachment 
filename could be used in the URL to make IE and Netscape 4.x use the 
filename in the save file dialog box.  The dhandler ignores the 
filename, downloads and decodes the appropriate part (keeping the 
content type of the part from the message), and then prints it out to 
the browser.
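
The decoding step could look roughly like the sketch below.  It 
assumes you already have the raw part data and its 
Content-Transfer-Encoding (for example from the body structure or the 
part's MIME headers); decode_part() is a made-up name, and only the 
core MIME::Base64 and MIME::QuotedPrint modules are used.

```perl
use strict;
use warnings;
use MIME::Base64      qw(decode_base64);
use MIME::QuotedPrint qw(decode_qp);

# Undo the transfer encoding of one fetched part.
sub decode_part {
    my ($raw, $encoding) = @_;
    $encoding = defined $encoding ? lc $encoding : '7bit';
    return decode_base64($raw) if $encoding eq 'base64';
    return decode_qp($raw)     if $encoding eq 'quoted-printable';
    return $raw;    # 7bit, 8bit and binary need no decoding
}

print decode_part('SGVsbG8sIHdvcmxkIQ==', 'BASE64'), "\n";
# prints: Hello, world!
```

The decoded bytes would then be sent to the browser with the content 
type taken from the part, as described above.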

I hope this helps ... :)

Looking back at the code even six months after I wrote it there are 
heaps of things that I would do differently now but since there are 
around 160 thousand email page requests a day (both folder listing and 
message views for around 20 thousand users) there is no compelling 
reason to allocate resources to rewrite it.  I have been toying with the 
idea of asking for permission to rewrite it in my own time and release 
it as open source...

Another thing to note about Mail::IMAPClient::BodyStructure is that it 
creates a Parse::RecDescent parser in order to parse what is returned by 
the IMAP server.  For me this was taking at least three seconds per 
page load so I had to make sure it was preloaded in the handler.  However, 
I think that this might have had something to do with this being 
pre-version-1 Mason running under an old version of Solaris.  Things are 
so much better since we switched from an E10K domain to a farm of Linux 
boxes for the frontend...

I so didn't intend to write so much ... :)

Stephen

[1] It appears that it is available to download through the kmMail 
SourceForge site:
  http://sourceforge.net/project/showfiles.php?group_id=32721

Tom Allison wrote:

> I've decided to take on something strange for my "first" project in 
> HTML::Mason.
> 
> IMAP web interface.
> 
> So far so good, but I don't have any great time table here.
> 
> But I do need to solicit some pointers.
> 
> I need to figure out how to read email.  Normally this isn't too hard 
> until you get to MIME.  Long term my only goal is to be able to read 
> HTML and a variety of image attachments.  Everything else would be a 
> download link only.
> 
> But Perl has many ways of dealing with MIME.
> 
> Suggestions would be appreciated.
> 
> 
> 
> _______________________________________________
> Mason-users mailing list
> Mas...@li...
> https://lists.sourceforge.net/lists/listinfo/mason-users