From: Alister P. <gsp...@gm...> - 2014-08-31 08:40:02
Hi,

I’m trying to finish off the mail:get-messages function and have hit a nasty little bug. It has nothing to do with eXist and is (surprise) related to mail from Microsoft Outlook, via Apple Mail. I now suspect that Apple Mail is the culprit. The error appears when trying to store an email as XML in /db.

I forwarded an email from my Inbox to another account and then retrieved it using mail:get-messages. The original HTML section is full of <o:p></o:p> tags. These are end-of-paragraph markers inserted by MS Word when creating HTML. In the original email, the o: prefix and its namespace are declared, but in the forwarded message the declaration is missing. Consequently I get an error from SAXParser when trying to store this content in the DB.

Is there some way to tell the parser to skip these (empty) elements? Or will I have to write a filter for the text before parsing it?

I’m using Wolfgang’s suggestion:

    DocumentImpl html = ModuleUtils.htmlToXHtml(context, "alternative", new StreamSource(part.getInputStream()), null, null);
    ElementImpl rootElem = (ElementImpl)html.getDocumentElement();

(Otherwise, mail:get-messages is working quite nicely.)

Thanks,
Alister.
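For illustration, the "filter the text before parsing" option could look roughly like the sketch below. It is only a sketch under stated assumptions, not part of eXist or the mail module: the class name OfficeTagFilter and both of its methods are hypothetical, and it assumes the HTML part is small enough to hold in memory and that the caller already knows the part's charset. It strips the <o:p> start, end, and self-closing tags with a regular expression and leaves any text between them in place.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.StringReader;
import java.nio.charset.Charset;
import java.util.regex.Pattern;
import javax.xml.transform.stream.StreamSource;

// Hypothetical helper (not an eXist class): removes Word's <o:p> markers
// from an HTML string before it is handed to an XML-aware parser.
final class OfficeTagFilter {

    // Matches <o:p>, </o:p>, <o:p/> and <o:p ...attributes...>.
    // The \b keeps longer names such as <o:prefix> from matching.
    private static final Pattern O_P_TAG =
            Pattern.compile("</?o:p\\b[^>]*>", Pattern.CASE_INSENSITIVE);

    private OfficeTagFilter() {}

    // Removes only the tags; any text between them is kept.
    static String stripOfficeParagraphTags(String html) {
        return O_P_TAG.matcher(html).replaceAll("");
    }

    // Reads the whole stream into memory, filters it, and wraps the result
    // in a StreamSource. The charset is an assumption supplied by the caller.
    static StreamSource filteredSource(InputStream in, Charset charset) throws IOException {
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        byte[] chunk = new byte[8192];
        int n;
        while ((n = in.read(chunk)) != -1) {
            buffer.write(chunk, 0, n);
        }
        String html = new String(buffer.toByteArray(), charset);
        return new StreamSource(new StringReader(stripOfficeParagraphTags(html)));
    }
}
```

With something like this in place, new StreamSource(part.getInputStream()) in the snippet above could be swapped for OfficeTagFilter.filteredSource(part.getInputStream(), charset), where charset would have to come from the part's Content-Type header. Whether htmlToXHtml behaves the same when given a Reader-based StreamSource is untested here.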