
Archiving MS Exchange

Help
James
2007-07-02
2013-05-20
  • James

    James - 2007-07-02

    Hello,
    I ran across this program while looking for a way to archive some Microsoft Exchange public folder postings (emails with attachments).  I'm hoping I can make this work, but I'm having trouble just now.

    I've got XENA installed (WinXP Pro w/ SP2, Java 1.6.0_01).  I've taken the relevant posts from the public folder to a PST file via the export tool in Outlook.  (Outlook tools don't really do much else; the other export formats drop the attachments.)  I load the file into XENA, but hitting Normalise (with normalise option "guess type for all files") throws the error "au.gov.naa.digipres.xena.kernel.XenaInputSource cannot be cast to au.gov.naa.digipres.xena.kernel.MultiInputSource".  (I can post the full trace if you'd like.)  The other normalisation option (binary only) does work, but that just appears to encode the entire PST file, not the individual items therein.

    Am I using this correctly?  If not, any suggestions on what I might try instead?

     
    • Michael Carden

      Michael Carden - 2007-07-02

      G'day James.

      Can I check that it's the Xena 4.0 preview that you're using? If any attachments are MS Office files, you'll also need OpenOffice 2 installed. In addition, in the Xena menu: Tools -> Plugin Preferences -> Email : make sure you set the path to readpst.exe (which will be in the /winx86 folder of the Xena 4.0 preview download).

      And yes, a stack trace would be most useful.

      Thanks,

      Michael Carden

       
    • James

      James - 2007-07-03

      Hi Michael,

      Thanks for replying.  It seems the issue has changed a little this morning:  it's kinda working -- no cast error -- but the outcome isn't much better, I'm sorry to say.  First, to answer your questions:
      - Yes, it's the 4.0 preview -- or so I'm thinking.  The download file name is xena4pre and the date of xena.jar is 5/30/2007, but everything else in the interface -- the title bar, about panel -- says Xena 3.0 Lite.
      - None of the attachments (in _this_ PST file at least) are Office files; however, that might not always be the case, so I'll be sure to set that preference.
      - I had previously found and set the path for readpst.exe.

      Regrettably, I've already deleted the log file from yesterday with the trace.  Sorry.  I've tried to recreate the error this morning (including blowing away the installation completely, registry entries and all), but to no avail.  Instead, I'm getting a slightly different outcome.

      Now, instead of failing, the file gets a successful normalisation*, but the .xena file is empty.  The results pane shows good stuff -- Guessed Type = Pst Outlook, Normaliser = Email -- but the XML is just this:
      <?xml version="1.0" encoding="UTF-8"?>
      <package:package xmlns:package="http://preservation.naa.gov.au/package/1.0">
      <package:meta>
        <naa:wrapper xmlns:naa="http://preservation.naa.gov.au/naa/1.0">NAA Package</naa:wrapper>
        <dcterms:created xmlns:dcterms="http://purl.org/dc/terms/">2007-07-03T11:39:47</dcterms:created>
        <dc:identifier xmlns:dc="http://purl.org/dc/elements/1.1/">8b9dd088468a97cf</dc:identifier>
        <naa:datasources xmlns:naa="http://preservation.naa.gov.au/naa/1.0">
         <naa:datasource>
          <naa:last-modified>2007-07-03T11:15:15</naa:last-modified>
          <dc:source xmlns:dc="http://purl.org/dc/elements/1.1/">file:/fspost200701proc.pst</dc:source>
         </naa:datasource>
        </naa:datasources>
      </package:meta>
      <package:content>
        <mailbox:mailbox xmlns:mailbox="http://preservation.naa.gov.au/mailbox/1.0"/>
      </package:content>
      </package:package>
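A quick way to spot this symptom across many output files is to count the children of the `mailbox:mailbox` element. The following is a minimal sketch, not part of Xena: the function name `mailbox_item_count` is hypothetical, and the namespace URIs are copied from the output above. It makes no assumption about what a populated mailbox element contains beyond "it has child elements".

```python
import xml.etree.ElementTree as ET

# Namespace URIs taken from the .xena output quoted in this thread.
NS = {
    "package": "http://preservation.naa.gov.au/package/1.0",
    "mailbox": "http://preservation.naa.gov.au/mailbox/1.0",
}


def mailbox_item_count(xena_path):
    """Return the number of child elements under mailbox:mailbox.

    Returns -1 if the file has no package:content/mailbox:mailbox element.
    (Hypothetical helper for illustration only.)
    """
    root = ET.parse(xena_path).getroot()
    mailbox = root.find("package:content/mailbox:mailbox", NS)
    if mailbox is None:
        return -1
    return len(list(mailbox))
```

A result of 0 corresponds to the empty, self-closed `<mailbox:mailbox/>` shown above.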

      I've tried archiving a couple other PST files as well, just in case I had a bad one, but the results are the same.

      Any thoughts, or things I might look at or try?  I hope we can figure it out; this seems like a very promising application for projects that I need/want to backup.

      Thanks again for the reply.  Have a great day.
      james

      * I can't tell you how much trouble I have spelling this word with an "s".  The en-US version has a "z" there instead.  Oh well, just another quirk from the land up over. :-)

       
      • Michael Carden

        Michael Carden - 2007-07-03

        James,

        You certainly have found some odd behaviour here. Just to make sure that I haven't left out anything important, I normalised a couple of pst files this morning and all behaved quite well.

        So I have two ideas in mind.

        First, if you have a pst sample that you don't mind us looking at (nothing confidential etc), please zip it up and email it to us so that we can see how it behaves.

        Second, could you start Xena from a command prompt:

        java -jar xena.jar

        ...then go through the normalising process with a pst that fails. Once it's done, grab a java stack trace by pressing ctrl-break in the command prompt window and:

        1. Right click the icon at the left hand end of the command prompt window's title bar.
        2. Select Edit -> Select All
        3. Press Enter
        4. Open Notepad
        5. Paste
        6. Save As <filename.txt>
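If copying text out of the console window is awkward, the console output can be sent straight to a file instead. This is a rough sketch under assumptions: `run_and_log` is a hypothetical helper, the log file name is made up, and it only captures whatever the Java process writes to its standard output and error streams.

```python
import subprocess


def run_and_log(cmd, log_path):
    """Run cmd, write its stdout and stderr to log_path, return the exit code.

    (Hypothetical helper for illustration only.)
    """
    with open(log_path, "wb") as log:
        completed = subprocess.run(cmd, stdout=log, stderr=subprocess.STDOUT)
    return completed.returncode


# Assumed usage, run from the folder that contains xena.jar:
# run_and_log(["java", "-jar", "xena.jar"], "xena-console.txt")
```

The call blocks until Xena is closed; the resulting text file can then be attached to the bug report.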

        Log a bug at http://sourceforge.net/tracker/?group_id=85722&atid=577089 and attach the file(s).

        Anything you can send us will help us sort it out for you.

        Thanks,
        MC

         
        • James

          James - 2007-07-05

          Alright, I've done as requested and created a posting with relevant data on this issue (https://sourceforge.net/tracker/index.php?func=detail&aid=1748614&group_id=85722&atid=577089).  I'm sorry, but I had to duplicate vombatus' bug posting from yesterday (https://sourceforge.net/tracker/index.php?func=detail&aid=1748125&group_id=85722&atid=577089); there doesn't appear to be a way to attach files to a report once it's been created.  I think vombatus has it right, tho'; in Googling around, it seems that readpst having trouble with Outlook 2003 files is a known issue.  This might be bad.

          In any case, I've attached a file with trace logs, input files (I had to mock something up, but I built it from the actual items I'm trying to archive), and the XENA output files.

          Additionally, as a test, I tried creating a PST in the older format, but I'm not sure it worked right.  OL doesn't have a means to directly export in that format, so I created a new data file in the older format and copied the relevant posts over to it.  Processing through XENA didn't give me any better result, so I'm not sure if OL is actually using the older format or just creating a "compatible" file.
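One way to check which format a given PST actually uses is to read the version field in the file header. The sketch below relies on the header layout in Microsoft's later-published [MS-PST] specification (magic bytes `!BDN` at offset 0, a 16-bit little-endian `wVer` at offset 10, with 14 or 15 meaning ANSI and 23 or higher meaning Unicode). Treat those offsets and values as assumptions to verify against the spec; `pst_format` is a hypothetical name, not part of readpst or Xena.

```python
import struct


def pst_format(path):
    """Classify a PST file as ANSI or Unicode from its header.

    Assumes the [MS-PST] header layout: b"!BDN" at offset 0 and a
    little-endian 16-bit wVer at offset 10. (Illustrative helper only.)
    """
    with open(path, "rb") as f:
        header = f.read(12)
    if len(header) < 12 or header[:4] != b"!BDN":
        return "unrecognised"
    (w_ver,) = struct.unpack_from("<H", header, 10)
    if w_ver in (14, 15):
        return "ANSI (pre-2003 format)"
    if w_ver >= 23:
        return "Unicode (Outlook 2003+ format)"
    return "unknown (wVer=%d)" % w_ver
```

Running this on the "older format" data file would show whether Outlook really wrote the ANSI format or only a compatible-looking Unicode file.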

          I have to run to a meeting now, but please let me know if there's anything more I can provide to help sort this issue out.

          Thanks again for all the help.
          james

           
    • Nobody/Anonymous

      I appreciate the help and the suggestions, Michael.  I'll give them a go, and try to pull together some samples and debug information for you when I'm back in the office tomorrow.

      Out of curiosity, would the version of Outlook (or more specifically, the type of PST being created) be a factor?  IIRC, Outlook 2003 onward has a somewhat different PST format than previous versions.  When creating my PST files for XENA, I've been using the current format (I've got Outlook 2003); I haven't tried it with the older format.  Perhaps that will give different results.  I'll give that a try as well and let you know what I find out.

      Thanks again for the assistance,
      james

       
    • John

      John - 2007-07-05

      I have posted this as a bug https://sourceforge.net/tracker/index.php?func=detail&aid=1748125&group_id=85722&atid=577089

      Looks like the readpst.exe cannot deal with Outlook 2003 psts

       
    • James

      James - 2007-07-13

      I think I've got a work-around -- actually two -- if anyone's interested.  Both are a bit manual and, of course, may not work for everyone, but they seemed to do the trick for me.  (And since it'll probably be a while before readpst gets updated to handle Outlook 2003... well, this is better than nothing.)

      The first work-around doesn't require any extra software.  Xena can correctly guess and process .msg ("Outlook Message Format") files from Outlook.  Therefore, you can go through your Outlook folder(s) and save all your message(s) or post(s) as .msg files; you might want to use the Unicode .msg type to better preserve all text characters.  Unfortunately, you have to do them one by one -- the only Save As file type available with multiple items selected appears to be text.  But after that, start up Xena, add your .msg files to a session and presto, you've got .xena files of your messages.
      Pluses:  Straightforward, no extra software needed, easy enough a cross-eyed wallaby could do it.
      Minuses:  It's tedious.

      The second way is a little easier, kinda, but not quite as simple and has a slightly lower chance of working for everyone (depending on Exchange/mail server configuration).  Using Mozilla Thunderbird, or possibly another email client, it's possible to connect to Exchange and access your mailbox.  The bonus here is that Thunderbird stores messages in the more common mbox format, which Xena can understand.  So for this option, install Thunderbird, connect to your mailbox (Tools > Import), download your messages/posts, and exit Thunderbird.  Then in Xena, load the physical mailbox file (\Documents and Settings\Username\Application Data\Thunderbird\Profiles\{something}.default\Mail -- from here you may have to look around a bit, but it's probably under Local Folders and/or possibly a few additional folders, depending on how you imported your mail and your Outlook/Exchange folder structure) and you should be able to normalize it.  Please note, the file WON'T have a .mbox extension, but it is in the proper format.  (If you wanted to be safe, you could make a copy of the file and/or add the extension.)
      Pluses:  No need to save off every message.
      Minuses:  If you're not a geek, it's a lot harder.
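Since the Thunderbird mailbox file has no extension, it can help to confirm it really parses as mbox before handing it to Xena. Here is a minimal sketch using Python's standard `mailbox` module; `summarize_mbox` is a hypothetical helper, and the path you pass in is whichever file you located under the profile's Mail folder.

```python
import mailbox


def summarize_mbox(path):
    """Return (message_count, first_five_subjects) for an mbox file.

    (Hypothetical helper for illustration only.)
    """
    # create=False makes a missing file an error instead of creating it.
    box = mailbox.mbox(path, create=False)
    try:
        subjects = [msg.get("Subject", "(no subject)") for msg in box]
    finally:
        box.close()
    return len(subjects), subjects[:5]
```

A count of zero on a non-empty file suggests you picked the wrong file (for example, an index file rather than the mailbox itself).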

      The second option is more like processing the PST file, but obviously you'll need to be a little more savvy about moving around your hard drive.  I've tried both methods and either works just fine.  (Additionally, if you go the Thunderbird route, you could save the messages -- again, one by one -- out of it into .eml files.  Eml is a text-based, RFC 2822 compliant format that can be opened in Outlook Express and other email clients.)
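The one-by-one .eml step can also be scripted outside Thunderbird. This is a sketch under assumptions, using Python's standard `mailbox` module: `export_eml` and the `message-00001.eml` naming scheme are made up for illustration, and each message is written without the mbox "From " envelope line.

```python
import mailbox
from pathlib import Path


def export_eml(mbox_path, out_dir):
    """Write each message in an mbox file to its own .eml file.

    Returns the list of paths written. (Hypothetical helper for
    illustration only; file names are an arbitrary numbering scheme.)
    """
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    box = mailbox.mbox(mbox_path, create=False)
    written = []
    try:
        for i, msg in enumerate(box, start=1):
            target = out / ("message-%05d.eml" % i)
            # as_bytes() serialises headers and body without the envelope line.
            target.write_bytes(msg.as_bytes())
            written.append(target)
    finally:
        box.close()
    return written
```

Work on a copy of the mailbox file rather than the live one in the Thunderbird profile.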

      It would have been exceptionally cool if I could have just processed the PST, but the end result is the important thing.  I'm glad I'm able to put this fine tool to work for me.

      Thanks to everyone for the help and information on this issue.
      james

       
      • Michael Carden

        Michael Carden - 2007-07-16

        Thanks for that James.

        We did explore solutions that involve having access to Outlook but for our purposes we can't have a piece of proprietary software as a dependency. Our hope had also been that Thunderbird might be able to parse pst all by itself, but it can't and again for our purposes, we can't assume access to the exchange server hosting mail that's to be preserved. So it's great to hear that you have some useful workarounds that fit your needs, but we can't really employ them.

        What we're hoping is that the current pst format is documented and published so that we, or the original author of readpst can get on with the job of writing a parser for it.

        Meantime, it looks like we're stuck with no xena-fying of post-2003 pst files.

        --
        MC

         
        • James

          James - 2007-07-17

          I'm sorry, I don't think I was clear about the role of Exchange (or even Outlook) in my Thunderbird "solution."  There is no dependency on either piece of software.

          Here's my setup/situation:  I'm in a corporate environment, Outlook 2003 is the email client, Exchange is the email server.  (We're a Microsoft shop; not necessarily my choice, but since "CIO" doesn't follow my name, what can you do?)  The email & public folder postings are going to a company with an unknown setup, and so I too needed something that wasn't dependent on a specific environment, software or format.  The current readpst program can't deal with the Outlook 2003 format of the PST files (nor, apparently, the "compatible" PSTs that the client supposedly creates), so with that avenue closed, I was looking to "move" or otherwise access the messages in a manner that Xena could handle.  Enter Thunderbird.

          The principal caveat here is that to use Thunderbird with an Exchange server, the server must have either the POP3 or IMAP protocol enabled.  This isn't always the case; no surprise, MS favors their own mail protocols whenever possible.  However, if POP3 or IMAP is enabled on Exchange -- fortunately, we had both -- Thunderbird can be used to access the mailbox and folders available to the user.  Thunderbird puts the messages into a standard mbox format, which is then parseable by Xena.  Exchange isn't needed once the messages are in Thunderbird, and Outlook doesn't enter the picture as it's not being used as the email client.  There is no dependency on proprietary MS software or formats; at this point it's all open source.

          Obviously it would be far nicer if readpst were updated to parse the newer PST format.  Then this middle step of getting the messages out of a proprietary MS format could be avoided.  In any case, the Thunderbird step done, Xena can parse the mailbox and output the messages into the Xena XML format.  They show up just fine in the Xena viewer, and so far there's been no issue with OpenOffice opening or displaying any of the messages' attachments.

          Again, there is no Microsoft dependency.  I hate to sound like a broken record, but this key point was apparently missed.  I'm sorry I wasn't more explicit in my initial explanation, but my workaround post was already getting pretty wordy.  I hope this clears up any confusion.

          Thanks again for your assistance in all this.
          james

           
          • Justin Waddell

            Justin Waddell - 2007-07-18

            Hi,

            I'm a developer working on Xena at the NAA. Michael is away at the moment so I thought I'd reply this time!

            We cannot control the format of the files that we will receive from various agencies. This means that we may receive exports from Exchange or Outlook in the 2003 PST format on a set of CDs or a hard drive etc, without any way of referencing the original Exchange server. From our point of view it would be fantastic if all agencies used Thunderbird to export all their mail into the mbox format, however this is extremely unlikely. We have to assume that we will receive PST-2003 files at some stage, and we will need to normalise them using only open source software.

            Still, I imagine that our use case is unusual - most users of Xena would have access to the original Exchange server - so it is nice to know that there is a reasonably simple solution to the PST-2003 problem.

            Justin

             
            • James

              James - 2007-07-18

              Hi Justin, (great name, by the way -- same as my brother)

              Your explanation definitely clears things up.  It sounds like we're on opposite sides of the equation -- in terms of use, that is, not philosophy.  In your scenario, I guess I'd be the agency sending the information.  As such, there is the additional perk of having more immediate access to the source of items being archived; just getting a random PST file decidedly limits your options.

              Unfortunately, it seems that absent an updated version of readpst, your (or anyone's) only choice in processing a rogue PST file is by using Outlook.  There you could open the PST data file and save off the messages one by one for processing by Xena (the "tedious" option in my workaround post).  But that, obviously, requires a copy of Outlook, which makes for a proprietary dependency.  The upside is that you only need Outlook to output the messages to .msg files; after that it's not required.

              Thanks for the reply; your explanation was enlightening and helped me understand the NAA perspective on this subject.  And hopefully there's a new readpst coming along that will make all this moot.

              Cheers,
              james

               
