I'm unable to get the Unicode characters. My code looks as follows:
```java
import org.htmlcleaner.CleanerProperties;
import org.htmlcleaner.DomSerializer;
import org.htmlcleaner.HtmlCleaner;
import org.htmlcleaner.TagNode;

// set up HtmlCleaner, clean and create a W3C document
HtmlCleaner htmlCleaner = new HtmlCleaner();
CleanerProperties htmlCleanerProperties = htmlCleaner.getProperties();
htmlCleanerProperties.setTranslateSpecialEntities(true);
htmlCleanerProperties.setAllowHtmlInsideAttributes(true);
htmlCleanerProperties.setAllowMultiWordAttributes(true);
htmlCleanerProperties.setRecognizeUnicodeChars(true);
htmlCleanerProperties.setOmitComments(true);
TagNode root = htmlCleaner.clean(rosterHTML);
// return the W3C document
return new DomSerializer(htmlCleanerProperties).createDOM(root);
```
However, it just occurred to me that I'm actually working with a W3C DOM Document.
Could it be that the Unicode characters get translated back into special entities at that point?
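One way to narrow this down, sketched with only the JDK's built-in `javax.xml` DOM and identity `Transformer` (not HtmlCleaner itself; the `UnicodeDomCheck` class and the sample string are hypothetical): in a plain W3C DOM, a `Text` node holds an ordinary Java `String`, so a character such as é is stored as the character itself, and character references only appear when a serializer writes the tree out in an encoding that cannot represent that character. Whether HtmlCleaner's `DomSerializer` puts raw characters or already-escaped text into its nodes is a separate question this sketch does not answer.

```java
import java.io.StringWriter;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import org.w3c.dom.Document;
import org.w3c.dom.Element;

public class UnicodeDomCheck {

    // Hypothetical helper: builds a one-element DOM whose text node holds `text`.
    static Document buildDoc(String text) {
        try {
            Document doc = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .newDocument();
            Element p = doc.createElement("p");
            p.appendChild(doc.createTextNode(text));
            doc.appendChild(p);
            return doc;
        } catch (ParserConfigurationException e) {
            throw new IllegalStateException(e);
        }
    }

    // Serializes the DOM with the JDK's identity Transformer; `encoding` decides
    // which characters the serializer must escape as character references.
    static String serialize(Document doc, String encoding) {
        try {
            Transformer t = TransformerFactory.newInstance().newTransformer();
            t.setOutputProperty(OutputKeys.ENCODING, encoding);
            t.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
            StringWriter out = new StringWriter();
            t.transform(new DOMSource(doc), new StreamResult(out));
            return out.toString();
        } catch (TransformerException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        Document doc = buildDoc("caf\u00e9 \u00a9");
        // In memory the text is a plain Java String, so this prints the characters.
        System.out.println(doc.getDocumentElement().getTextContent());
        // UTF-8 can represent every character, so no escaping is needed.
        System.out.println(serialize(doc, "UTF-8"));
        // US-ASCII cannot, so the serializer has to emit character references.
        System.out.println(serialize(doc, "US-ASCII"));
    }
}
```

Printing `getTextContent()` on the Document returned by `createDOM(root)` in the same way would show whether the characters are already escaped inside the DOM or only turn into entities later, when the document is written out.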
J