Scott,

I'm unable to get the Unicode characters to come through. My code looks as follows:

    // Configure HtmlCleaner and clean the HTML into a TagNode tree
    HtmlCleaner htmlCleaner = new HtmlCleaner();
    CleanerProperties htmlCleanerProperties = htmlCleaner.getProperties();
    htmlCleanerProperties.setTranslateSpecialEntities(true);
    htmlCleanerProperties.setAllowHtmlInsideAttributes(true);
    htmlCleanerProperties.setAllowMultiWordAttributes(true);
    htmlCleanerProperties.setRecognizeUnicodeChars(true);
    htmlCleanerProperties.setOmitComments(true);
    TagNode root = htmlCleaner.clean(rosterHTML);

    // Convert the TagNode tree to a W3C DOM Document and return it
    return new DomSerializer(htmlCleanerProperties).createDOM(root);

However, it just occurred to me that I'm actually working with a W3C DOM Document.
Could it be that the Unicode characters are translated back into special entities at that point?
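
To narrow this down, here is a minimal sketch of how I could check it. It uses only the JDK's own XML APIs, not HtmlCleaner, so it says nothing about what DomSerializer itself does. The class name DomUnicodeCheck, the helper names, and the sample text "café" are made up for illustration. The idea is to see whether the character is stored as-is in the DOM and whether it is only the output encoding chosen at serialization time that turns it into a character reference.

```java
import java.io.StringWriter;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import org.w3c.dom.Document;
import org.w3c.dom.Element;

// Hypothetical helper class, for illustration only.
class DomUnicodeCheck {

    // Build a tiny W3C DOM document whose text holds a non-ASCII character.
    static Document buildSample() throws Exception {
        Document doc = DocumentBuilderFactory.newInstance()
                .newDocumentBuilder()
                .newDocument();
        Element p = doc.createElement("p");
        p.setTextContent("caf\u00e9"); // "café"
        doc.appendChild(p);
        return doc;
    }

    // Serialize the document to a string using the given output encoding.
    static String serialize(Document doc, String encoding) throws Exception {
        Transformer t = TransformerFactory.newInstance().newTransformer();
        t.setOutputProperty(OutputKeys.ENCODING, encoding);
        t.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
        StringWriter out = new StringWriter();
        t.transform(new DOMSource(doc), new StreamResult(out));
        return out.toString();
    }

    public static void main(String[] args) throws Exception {
        Document doc = buildSample();
        // The DOM node itself holds the real character, not an entity.
        System.out.println(doc.getDocumentElement().getTextContent());
        // With UTF-8 the character is written out unchanged.
        System.out.println(serialize(doc, "UTF-8"));
        // With US-ASCII the serializer has to fall back to a character reference.
        System.out.println(serialize(doc, "US-ASCII"));
    }
}
```

If the same pattern holds for my document, the DOM itself would be fine and the entities would only appear when the document is written out with an encoding that cannot represent the characters.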

J