From: SourceForge.net <no...@so...> - 2010-04-03 14:25:01
The following forum message was posted by rajorshi at http://sourceforge.net/projects/jtidy/forums/forum/41437/topic/3658377:

Hello,

I had posted this on jtidy-user mailing list but got no response :( , so trying here...

I'm trying to use jtidy to format/cleanup some HTML contained in a Java String. What I see is that often, spaces are lost. For instance, suppose the markup is

    <span style="color:red;">hello</span><span style="color:blue"> world</span>

The space (not nbsp, but it's rendered by browsers and mail clients nevertheless) is lost, and it transforms into:

    <span style="color:red;">hello</span><span style="color:blue">world</span>

And hence shows up in a browser as "helloworld" instead of "hello world".

The following is my code. Am I doing something obviously wrong here?

Code:

    InputStream is = new ByteArrayInputStream(rawHtml.getBytes("utf-8"));
    Tidy tidy = new Tidy();
    tidy.setInputEncoding("utf-8");
    ByteArrayOutputStream baos = new ByteArrayOutputStream();
    tidy.parseDOM(is, baos);
    String pure = baos.toString("utf-8");

Thanks in advance!
Raj