From: Jenny B. <sk...@gm...> - 2008-04-15 20:08:09
On Wed, Apr 9, 2008 at 1:02 AM, Jacob Kjome <ho...@vi...> wrote:
> You should avoid the direct use of implementation classes. Go through standard
> API's. And if you put xalan-2.7.1.jar and serializer.jar (and, I suggest,
> xercesImpl-2.9.1.jar) in the classpath, you will end up using the very latest
> implementations (better than the buggy versions that ship with the JVM).
>
> //Using String writer for output for convenience.
> //Usually better to use an OutputStream.
> StringWriter sw = new StringWriter();
>
>
> //JAXP Transformer API
> Transformer t = TransformerFactory.newInstance().newTransformer();
>
> //for HTML output
> t.setOutputProperty(OutputKeys.METHOD, "html");
> t.setOutputProperty(OutputKeys.MEDIA_TYPE, "text/html");
> t.setOutputProperty(OutputKeys.ENCODING, "ISO-8859-1");

Thanks for the tip there. That approach is working for me now. I'm running into a quirk in how it outputs some things, though, and I'm not sure how to interpret it.

(Background: I've got a lot of Java servlet and web programming experience, but less with XML and the various versions of XHTML and related specifications, so I'm a bit lost on what to expect from the browser here.)

I have a DOM document. I've passed it through JTidy and NekoHTML for cleanup, and the result is pretty nice. However, the original HTML I was parsing contained constructs like this:

    <P>Some text goes <strong></strong> here making a paragraph.</P>

When that comes back out of the serializer, it comes out as this, which Firefox chokes on:

    <P>Some text goes <strong/> here making a paragraph.</P>

Likewise for <textarea/> and a few other tags: Firefox's rendering gets completely thrown off when it encounters certain tags in XML empty-element style rather than HTML style.
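To narrow this down, here is a minimal, self-contained sketch that rebuilds the example paragraph in memory and serializes it with the html output method. It uses whatever JAXP transformer is on the classpath (possibly the JDK's bundled one rather than the Xalan 2.7.1 jars, so details may differ). The class and method names are hypothetical, and the XHTML namespace variant is only an assumption about what a cleaned-up DOM might contain.

```java
import java.io.StringWriter;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import org.w3c.dom.Document;
import org.w3c.dom.Element;

// Hypothetical demo class: reproduces the empty <strong> case.
public class EmptyTagDemo {

    // Builds <p>Some text goes <strong></strong> here making a paragraph.</p>.
    // nsUri == null -> elements have no namespace (DOM Level 1 createElement);
    // otherwise the elements are created in the given namespace.
    static Document buildParagraph(String nsUri) throws Exception {
        DocumentBuilderFactory f = DocumentBuilderFactory.newInstance();
        f.setNamespaceAware(true);
        Document doc = f.newDocumentBuilder().newDocument();
        Element p = (nsUri == null) ? doc.createElement("p")
                                    : doc.createElementNS(nsUri, "p");
        Element strong = (nsUri == null) ? doc.createElement("strong")
                                         : doc.createElementNS(nsUri, "strong");
        p.appendChild(doc.createTextNode("Some text goes "));
        p.appendChild(strong); // empty element, no children
        p.appendChild(doc.createTextNode(" here making a paragraph."));
        doc.appendChild(p);
        return doc;
    }

    // Identity transform with the same output properties as in this thread.
    static String serialize(Document doc) throws Exception {
        Transformer t = TransformerFactory.newInstance().newTransformer();
        t.setOutputProperty(OutputKeys.METHOD, "html");
        t.setOutputProperty(OutputKeys.MEDIA_TYPE, "text/html");
        t.setOutputProperty(OutputKeys.ENCODING, "UTF-8");
        StringWriter sw = new StringWriter();
        t.transform(new DOMSource(doc), new StreamResult(sw));
        return sw.toString();
    }

    public static void main(String[] args) throws Exception {
        // No namespace: the html method should write <strong></strong>.
        System.out.println(serialize(buildParagraph(null)));
        // XHTML namespace, for comparison; output may vary by serializer.
        System.out.println(serialize(buildParagraph("http://www.w3.org/1999/xhtml")));
    }
}
```

If the first line comes out as `<strong></strong>` and the second does not, the namespace on the elements is the likely difference.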
The code I'm using to set up the transformer for output is this:

    transformer.setOutputProperty(OutputKeys.METHOD, "html");
    transformer.setOutputProperty(OutputKeys.MEDIA_TYPE, "text/html");
    transformer.setOutputProperty(OutputKeys.ENCODING, "UTF-8");
    transformer.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "no");
    transformer.transform(new DOMSource(domDocument), new StreamResult(stringWriter));

So I'm not sure why empty elements are coming out looking like XML. I'm using Xalan 2.7.1 and Xerces 2.9.1, and the code base is small enough that I'm fairly sure no jar conflicts are sneaking in old versions. Rather, I suspect I'm misunderstanding something about the serialization process or the XML/HTML specifications.

Thanks for any help you can provide.

Jenny Brown
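One thing that may be worth checking (an assumption, not something I've confirmed for the JTidy/NekoHTML output): the XSLT 1.0 spec says the html output method should only treat elements with a null namespace URI as HTML, and elements with a non-null namespace URI should be output as XML. If the cleaned-up DOM has its elements in the XHTML namespace, that could produce `<strong/>`. Below is a rough sketch of a hypothetical helper that walks a DOM and lists every element carrying a namespace URI, using only standard `org.w3c.dom` calls.

```java
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;

// Hypothetical diagnostic: which elements would the html method treat as XML?
public class NamespaceCheck {

    // Returns "name {uri}" for every element whose namespace URI is non-null.
    static List<String> namespacedElements(Node root) {
        List<String> found = new ArrayList<>();
        collect(root, found);
        return found;
    }

    // Depth-first walk over the node and all of its descendants.
    private static void collect(Node node, List<String> found) {
        if (node.getNodeType() == Node.ELEMENT_NODE && node.getNamespaceURI() != null) {
            found.add(node.getNodeName() + " {" + node.getNamespaceURI() + "}");
        }
        for (Node c = node.getFirstChild(); c != null; c = c.getNextSibling()) {
            collect(c, found);
        }
    }

    // Small sample document <p><strong/></p>, optionally in a namespace.
    static Document sample(String nsUri) throws Exception {
        DocumentBuilderFactory f = DocumentBuilderFactory.newInstance();
        f.setNamespaceAware(true);
        Document doc = f.newDocumentBuilder().newDocument();
        Element p = (nsUri == null) ? doc.createElement("p") : doc.createElementNS(nsUri, "p");
        Element s = (nsUri == null) ? doc.createElement("strong") : doc.createElementNS(nsUri, "strong");
        p.appendChild(s);
        doc.appendChild(p);
        return doc;
    }

    public static void main(String[] args) throws Exception {
        System.out.println(namespacedElements(sample(null)));
        System.out.println(namespacedElements(sample("http://www.w3.org/1999/xhtml")));
    }
}
```

Running `namespacedElements(domDocument)` on the real document before the transform would show whether this is a plausible cause.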