Importing emails from mbox

2004-06-23
2004-07-01
  • Ron Goodwin

    Ron Goodwin - 2004-06-23

    You mention that MMImport has been added for that purpose.  How is it used?  I have a 7MB mbx file of the old support emails waiting to become a knowledgebase within MM.

     
    • James Henderson

      James Henderson - 2004-06-24

      There seem to be at least three different formats with the extension "mbx".  If your file is from Outlook or (PC-)Pine then that format is not supported by the MMImport script.  If you could send me a sample to look at, or just tell me exactly how the file was created, I may be able to advise on the best course of action.

      If your mbx file is from Eudora you're probably in luck because it sounds as though it's close enough to the Unix mbox format.  Try it anyway - here are the instructions.

      First, I would recommend that you use the latest CVS version of the MMImport script (I can send it to you if that's easier), especially if you're on Windows, because the current release doesn't open files in binary mode and risks corrupting them.
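
To illustrate the binary-mode point, here is a minimal sketch in present-day Python 3. It is not taken from MMImport; the helper name `read_both_ways` and the sample message are made up for the illustration. It writes a small message with CRLF line endings to a temporary file and reads it back both ways: binary mode returns the bytes unchanged, while text mode translates the line endings, so the data no longer matches what is on disk.

```python
import os
import tempfile


def read_both_ways(raw):
    """Write raw bytes to a temp file, then read it back in binary and text mode."""
    fd, path = tempfile.mkstemp(suffix=".mbx")
    try:
        with os.fdopen(fd, "wb") as handle:
            handle.write(raw)
        # Binary mode: bytes come back exactly as stored.
        with open(path, "rb") as handle:
            binary_data = handle.read()
        # Text mode: Python translates "\r\n" to "\n" while reading.
        with open(path, "r", encoding="latin-1") as handle:
            text_data = handle.read()
    finally:
        os.remove(path)
    return binary_data, text_data


# Hypothetical sample message with Windows-style line endings.
sample = (
    b"From sender@example.com Wed Jun 23 10:00:00 2004\r\n"
    b"Subject: test\r\n"
    b"\r\n"
    b"body\r\n"
)

binary_data, text_data = read_both_ways(sample)
print("binary read is byte-for-byte identical:", binary_data == sample)
print("text read still contains CR characters:", "\r" in text_data)
```

For a mailbox file, where byte offsets and message boundaries matter, reading in binary mode is the safer default.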

      Zope must be running when you run the script.  To run it enter a command something like this:

      python MMImport.py -u http://localhost:8080/mail test@example.com test.mbx

      (Note that if you do use the current release version you must put "-a" before the account name:

      python MMImport.py -u http://localhost:8080/mail -a test@example.com test.mbx
      )
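
Before running an import it can help to confirm that the file really parses as Unix mbox. The sketch below uses Python 3's standard `mailbox` module, not MMImport itself; the function name `summarize_mbox` and the two sample messages are assumptions made for the example. It counts the messages and lists their subjects, which gives a rough idea of what an importer expecting mbox would see.

```python
import mailbox
import os
import tempfile

# Two tiny messages in Unix mbox layout; addresses and subjects are made up.
SAMPLE_MBOX = (
    b"From alice@example.com Wed Jun 23 10:00:00 2004\n"
    b"From: alice@example.com\n"
    b"Subject: First question\n"
    b"\n"
    b"Hello.\n"
    b"\n"
    b"From bob@example.com Thu Jun 24 11:00:00 2004\n"
    b"From: bob@example.com\n"
    b"Subject: Second question\n"
    b"\n"
    b"Hi again.\n"
)


def summarize_mbox(path):
    """Return (message_count, subjects) for a Unix-style mbox file."""
    box = mailbox.mbox(path, create=False)
    try:
        subjects = [message.get("Subject", "(no subject)") for message in box]
    finally:
        box.close()
    return len(subjects), subjects


def demo():
    """Write the sample to a temporary file and summarize it."""
    fd, path = tempfile.mkstemp(suffix=".mbx")
    try:
        with os.fdopen(fd, "wb") as handle:
            handle.write(SAMPLE_MBOX)
        return summarize_mbox(path)
    finally:
        os.remove(path)


count, subjects = demo()
print(count, subjects)
```

If the count is far lower than the number of emails you expect, the file is probably in one of the other "mbx" formats.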

      If you need more specific help you'll need to tell me what operating system you're using.  Hope this helps.

       
      • Ron Goodwin

        Ron Goodwin - 2004-06-24

        I am using a Eudora mbx file.  My test server is on Windows and I am running the 1.0 release. My live server is on Linux.

        I tried the example and for each email in the file received an error: message could not be imported.

        Is the CVS version of MMImport newer than the 1.0 rel version?  If so, I would appreciate you sending me a copy.  Thanks

         
      • Ron Goodwin

        Ron Goodwin - 2004-06-30

        I have noticed one problem with the importing of some emails.  When imported there is no body text, only a line indicating an empty message, but there is an attachment which appears to contain the missing body.  In some of the cases I have looked at the emails have been composed in MSWord 11 and the body is in X-HTML.  However, when sent normally via the email server they are displayed correctly.  Any ideas?

         
    • James Henderson

      James Henderson - 2004-06-24

      The CVS version is newer - I only looked at the script after reading your posting.  I've sent a copy to your SourceForge e-mail address.

      However, I realize that the script is only likely to give an error message when it has the wrong URL for MailManager or the wrong account address; hideously mutilated imports will still be reported as a success.  This is certainly a shortcoming in the script that I will address when I have a moment.  In the meantime could you check that you can browse to the URL you passed on the command line and that you have the right e-mail address for the account that you want to add the messages to?
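
The URL check can also be done from a script rather than a browser. This is a rough sketch using Python 3's standard `urllib`; the helper name `url_is_reachable` is hypothetical and nothing here is part of MMImport. Note that it treats any HTTP error status, including a 401 from a password-protected Zope folder, as "not reachable", so a False result means "look more closely" rather than "the URL is definitely wrong".

```python
import urllib.error
import urllib.request


def url_is_reachable(url, timeout=5.0):
    """Return True if an HTTP GET to url succeeds with a 2xx or 3xx status."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            return 200 <= response.status < 400
    except (urllib.error.URLError, ValueError, OSError):
        # URLError covers HTTP error statuses and connection failures;
        # ValueError covers strings that are not URLs at all.
        return False


# A string that is not a URL is rejected without any network access.
print(url_is_reachable("not a url"))
```

With Zope running, calling `url_is_reachable("http://localhost:8080/mail")` would exercise the same URL that is passed to the script with `-u`.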

       
    • Ron Goodwin

      Ron Goodwin - 2004-06-28

      Thanks James, the script worked a treat.  One suggestion for a future increment would be the ability to set a status on import.  I was importing 2400 old support emails which form part of a knowledgebase and user history.

       
    • James Henderson

      James Henderson - 2004-06-30

      At first I thought this might be a case of messages that could not be parsed still being reported as successes, the problem I mentioned above.  This has now been addressed; see
      http://tinyurl.com/2ufze.

      However, reading more carefully I don't think it is an importing problem.  What you describe happens when a mailer sends you a message in an extended text format, such as HTML, with no plain text alternative, something which is very bad practice (but all Microsoft mail clients are notorious for having the least respect for standards).  This may be either a multipart message with a single HTML subpart (most of my spam is like this) or a non-multipart message with content type "text/html" (I had some junk from Yahoo like this the other day).
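
The two cases described here can be told apart with Python 3's standard `email` package. The following is only a sketch and makes no claim about how MailManager does it; `has_plain_text_part` and both sample messages are invented for the illustration. It walks the parts of a message and reports whether any leaf part is plain text.

```python
from email import message_from_string


def has_plain_text_part(message):
    """Return True if any non-container part of the message is text/plain."""
    return any(
        part.get_content_type() == "text/plain"
        for part in message.walk()
        if not part.is_multipart()
    )


# Hypothetical non-multipart message that is HTML only.
HTML_ONLY = (
    "From: sender@example.com\n"
    "Subject: HTML only\n"
    "Content-Type: text/html\n"
    "\n"
    "<p>Hello</p>\n"
)

# Hypothetical message that offers both plain text and HTML.
WITH_ALTERNATIVE = (
    "From: sender@example.com\n"
    "Subject: Both formats\n"
    'Content-Type: multipart/alternative; boundary="XYZ"\n'
    "\n"
    "--XYZ\n"
    "Content-Type: text/plain\n"
    "\n"
    "Hello\n"
    "--XYZ\n"
    "Content-Type: text/html\n"
    "\n"
    "<p>Hello</p>\n"
    "--XYZ--\n"
)

html_only_result = has_plain_text_part(message_from_string(HTML_ONLY))
alternative_result = has_plain_text_part(message_from_string(WITH_ALTERNATIVE))
print(html_only_result, alternative_result)
```

A message for which this returns False has no plain text body for a mail client to fall back on.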

      If this is what is happening then MailManager is behaving as expected.  The different result when you send the message "normally" is probably due to your mail client fixing up the outgoing message by adding a plain text alternative.

      If this doesn't sound like what is happening perhaps you could send me the raw text of a message (use the "export" link in MailManager) both as it is when imported and when sent through the internet.

      If you don't like MailManager's handling of such ill-formed messages, open an RFE.  An alternative I have in mind is to show HTML messages by default, but that won't be done till we've implemented RFE 859106 "Display HTML messages safely".  (This RFE gives a brief rationale for the current behaviour.)

       
      • James Henderson

        James Henderson - 2004-07-01

        Thank you for sending me the samples.  I am happy that the one you imported eventually arrived in MailManager just as it was when it first arrived in your mail server.

        It was, however, so badly formed in the first place by Microsoft Exchange that I'm actually quite proud of how MailManager handles it. :)  It has a content-type of "multipart-alternative".  This means that the message should have several subparts, each containing the same content in a different format. This email though has no subparts at all!  (Subparts should be surrounded by the boundary string specified in the first content-type header and each have their own content-type header, unless they are plain text.)   There is just some XML text in a Microsoft schema with no further content-type header to say what it is.
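
For comparison, this is roughly what a well-formed message of this kind looks like when built with Python 3's standard `email` package, next to a message that only claims to be multipart. It is a sketch under stated assumptions: the boundary string, headers and body text are invented, and the broken sample is a simplified stand-in for the Exchange message, not a copy of it.

```python
from email import message_from_string
from email.mime.multipart import MIMEMultipart
from email.mime.text import MIMEText

# Well-formed: one container with two subparts carrying the same content.
well_formed = MIMEMultipart("alternative")
well_formed["Subject"] = "Same content, two formats"
well_formed.attach(MIMEText("Hello", "plain"))
well_formed.attach(MIMEText("<p>Hello</p>", "html"))
subpart_types = [part.get_content_type() for part in well_formed.get_payload()]

# Simplified broken sample: the header declares a multipart type and a
# boundary, but the body never uses that boundary, so there are no subparts.
BROKEN = (
    "Subject: Claims to be multipart\n"
    'Content-Type: multipart/alternative; boundary="XYZ"\n'
    "\n"
    "<html><body>Just some markup, no boundary lines.</body></html>\n"
)
broken = message_from_string(BROKEN)

print(subpart_types)
print(broken.is_multipart(), len(broken.defects))
```

Printing `well_formed.as_string()` shows the boundary lines and the per-part content-type headers that the broken message lacks. The parser does not raise an error for the broken message; it records the problem in `broken.defects` and leaves the body as a single undivided payload.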

        Since there is no plain text part MailManager indicates that there is no body to display (I accept that saying the message is empty is misleading - I'll change the text) and that there is a part of the message with type "multipart-alternative".  It even gives a helpful title to this other part.  I note with pleasure that KMail, a very standards-compliant mail client, handles this message in the same way, right down to the title it gives it.  Pine, another standards-compliant mail client, just says "malformed message" and refuses to show you any of the content at all!  (I was amused to notice just now that I have prefaced the code that handles this kind of message, which starts at line 183 in Message.py, with the comment "This shouldn't often happen.")

        When you redirected this message using Eudora it was fixed up by having the content-type changed from "multipart-alternative" to "text/html".  (The Microsoft-specific markup was also removed from the HTML.)  I would think that strictly a redirect should not change the message, but in a world where the most popular e-mail software creates such abominations I suppose you have to be pragmatic.

         
