From: Johann P. <joh...@ch...> - 2010-06-25 09:29:04
I have an odd problem when accessing a German page encoded in ISO-8859-1 which contains umlauts and also other special characters. Here is the simple code:

    import com.gargoylesoftware.htmlunit.WebClient;
    import com.gargoylesoftware.htmlunit.html.HtmlPage;

    class Test1 {
        public static void main(String[] args) throws Exception {
            WebClient w = new WebClient();
            HtmlPage p = w.getPage("http://www.berufslexikon.at/bhs_beruf3045_5");
            System.out.println(p.asText());
        }
    }

This program outputs the umlauts just fine, but it gets the quote characters wrong. There are other pages with the same encoding which contain, e.g., bullets that are also not correctly decoded.

The odd thing is that the umlauts are OK, which seems to indicate that ISO-8859-1 is detected correctly, but why are the quotes and bullets wrong then? As one can easily try, the web page at that URL is displayed correctly in all browsers, including Firefox, IE, Chrome, and even Lynx.

Is this a bug? Is there any workaround?

Cheers,
Johann
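
One possible explanation, not verified against this particular page, is that the quotes and bullets are stored as Windows-1252 bytes in the range 0x80 to 0x9F. Strict ISO-8859-1 maps that range to control characters, while browsers commonly treat a declared ISO-8859-1 as windows-1252. Umlauts sit at the same byte values in both charsets, so they would come out right either way. The sketch below uses only the JDK, not HtmlUnit; the byte values and the class name `CharsetDemo` are chosen for illustration.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class CharsetDemo {
    // Decode the same raw bytes with the given charset.
    static String decode(byte[] raw, Charset charset) {
        return new String(raw, charset);
    }

    // Render each char of the string as U+XXXX so control characters are visible.
    static String codePoints(String s) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < s.length(); i++) {
            if (i > 0) sb.append(' ');
            sb.append(String.format("U+%04X", (int) s.charAt(i)));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // Illustrative bytes (an assumption, not taken from the page):
        // 0x93 and 0x94 are curly quotes and 0x95 is a bullet in windows-1252;
        // 0xE4 is the umlaut a in both charsets.
        byte[] raw = { (byte) 0x93, (byte) 0xE4, (byte) 0x94, (byte) 0x95 };

        // prints "ISO-8859-1:   U+0093 U+00E4 U+0094 U+0095"
        System.out.println("ISO-8859-1:   "
                + codePoints(decode(raw, StandardCharsets.ISO_8859_1)));

        // prints "windows-1252: U+201C U+00E4 U+201D U+2022"
        System.out.println("windows-1252: "
                + codePoints(decode(raw, Charset.forName("windows-1252"))));
    }
}
```

If the page really does contain such bytes, the umlaut decodes identically in both cases while the quotes and the bullet only survive the windows-1252 decoding, which would match the symptoms described above.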