HtmlUnit uses Text.DEFAULT_CHARSET as the default encoding when no clue can be found in the HTTP header, the BOM, or the content. For HTML pages this may be correct, but for XML pages (such as RSS) it does not follow the standard. Under the XML specification, a document with neither a BOM nor an encoding declaration must be treated as UTF-8 (see http://www.w3.org/TR/xml/#charencoding, or the more readable http://www.opentag.com/xfaq_enc.htm#enc_default).
This would require changes to at least the following method:
    public String getContentType() {
        final String contentTypeHeader = getResponseHeaderValue("content-type");
        if (contentTypeHeader == null) {
            // Not technically legal, but some servers don't return a content-type
            return "";
        }
        // Strip parameters such as "; charset=..." and keep only the MIME type
        final int index = contentTypeHeader.indexOf(';');
        if (index == -1) {
            return contentTypeHeader;
        }
        return contentTypeHeader.substring(0, index);
    }
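The fallback described above could be sketched roughly as follows. This is a minimal illustration, not HtmlUnit's actual code: the class and method names are hypothetical, the value ISO-8859-1 for Text.DEFAULT_CHARSET is an assumption, and the input is assumed to be the parameter-free MIME type returned by getContentType() (so an empty string means no content-type header was sent).

    import java.nio.charset.Charset;
    import java.nio.charset.StandardCharsets;
    import java.util.Locale;

    public class DefaultCharsetSketch {

        // Hypothetical stand-in for Text.DEFAULT_CHARSET; ISO-8859-1 is an assumption
        static final Charset HTML_DEFAULT = StandardCharsets.ISO_8859_1;

        // True for XML media types such as text/xml, application/xml, application/rss+xml.
        // Assumes parameters like "; charset=..." were already stripped by getContentType().
        static boolean isXmlContentType(String contentType) {
            String type = contentType.trim().toLowerCase(Locale.ROOT);
            return type.equals("text/xml") || type.equals("application/xml") || type.endsWith("+xml");
        }

        // Charset to use only when neither the HTTP header, the BOM, nor the content declares one:
        // UTF-8 for XML (per the XML spec), the HTML default otherwise (including a missing header).
        static Charset defaultCharsetFor(String contentType) {
            return isXmlContentType(contentType) ? StandardCharsets.UTF_8 : HTML_DEFAULT;
        }

        public static void main(String[] args) {
            System.out.println(defaultCharsetFor("application/rss+xml").name()); // UTF-8
            System.out.println(defaultCharsetFor("text/html").name());           // ISO-8859-1
            System.out.println(defaultCharsetFor("").name());                    // ISO-8859-1
        }
    }

Keying the decision on the MIME type keeps the HTML behaviour unchanged while letting RSS and other XML responses fall back to UTF-8.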
Now fixed in SVN. Thanks for reporting.