HtmlCleaner / Discussion / Unescaping special characters such as åäö

Unescaping special characters such as åäö

Scott,

I'm unable to get the Unicode characters to come out unescaped. My code looks as follows:

    // configure HtmlCleaner and clean the HTML into a TagNode tree
    HtmlCleaner htmlCleaner = new HtmlCleaner();
    CleanerProperties htmlCleanerProperties = htmlCleaner.getProperties();
    htmlCleanerProperties.setTranslateSpecialEntities(true);
    htmlCleanerProperties.setAllowHtmlInsideAttributes(true);
    htmlCleanerProperties.setAllowMultiWordAttributes(true);
    htmlCleanerProperties.setRecognizeUnicodeChars(true);
    htmlCleanerProperties.setOmitComments(true);
    TagNode root = htmlCleaner.clean(rosterHTML);

    // convert the cleaned tree into a W3C DOM Document and return it
    return new DomSerializer(htmlCleanerProperties).createDOM(root);

However, it just occurred to me that I'm actually working with a W3C DOM Document.
Could it be that the Unicode characters are translated back into special entities at that point?
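
One way to narrow this down might be to check whether the DOM itself holds the real characters and whether the entities only show up when the document is written out. Below is a minimal sketch using only the JDK, with no HtmlCleaner involved. The class name DomEntityCheck and both helper methods are made up for illustration. It assumes that the serializer's output encoding decides whether åäö are written as-is or as numeric character references, which is how the JDK's built-in Transformer appears to behave.

```java
import java.io.ByteArrayOutputStream;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import org.w3c.dom.Document;
import org.w3c.dom.Element;

// Hypothetical helper class, not part of HtmlCleaner.
final class DomEntityCheck {

    // Builds a one-element W3C DOM document whose root holds the given text.
    static Document buildDocument(String text) {
        try {
            Document doc = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .newDocument();
            Element root = doc.createElement("name");
            root.setTextContent(text);
            doc.appendChild(root);
            return doc;
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    // Serializes the document with the given output encoding and
    // decodes the resulting bytes with that same encoding.
    static String serialize(Document doc, String encoding) {
        try {
            Transformer transformer = TransformerFactory.newInstance().newTransformer();
            transformer.setOutputProperty(OutputKeys.ENCODING, encoding);
            transformer.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
            ByteArrayOutputStream out = new ByteArrayOutputStream();
            transformer.transform(new DOMSource(doc), new StreamResult(out));
            return out.toString(encoding);
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        // "\u00e5\u00e4\u00f6" is åäö, written with escapes so the source file encoding does not matter.
        Document doc = buildDocument("\u00e5\u00e4\u00f6");
        // What the DOM node itself contains:
        System.out.println(doc.getDocumentElement().getTextContent());
        // What the serializer writes for two different output encodings:
        System.out.println(serialize(doc, "UTF-8"));
        System.out.println(serialize(doc, "US-ASCII"));
    }
}
```

If getTextContent() on the Document returned by createDOM(root) already contains escaped text, the escaping would be happening inside DomSerializer. If it contains the real characters, the entities are more likely introduced by whatever writes the Document out afterwards.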
