[Htmlparser-user] Character Encoding
From: <bo...@ti...> - 2008-05-12 11:56:08
Thanks Derrick,

The page in question includes the following tags:

<META http-equiv=Content-Type content="text/html; charset=utf-8">
<META http-equiv=content-type>

I don't understand why the second one is there, but it really is. With that information, can you suggest a resolution? I am not entirely sure how to verify your point (1).

Best Regards
Brian

-----------------------------------------------------------------------------

There are two possibilities:

1) The HTTP server is/is not sending content-type information in the HTTP header, like so:
   Content-Type: text/html; charset=utf-8
2) The source HTML does/does not contain a meta tag like so:
   <meta http-equiv="Content-type" content="text/html; charset=utf-8" />

You need to determine which one it is so the appropriate 'fix' can be applied.

----- Original Message ----
From: "bo...@ti..." <bo...@ti...>
To: htm...@li...
Sent: Monday, May 12, 2008 7:31:39 AM
Subject: [Htmlparser-user] Character Encoding

Hi,

I have a strange problem and I can’t get my head around it. Hopefully someone can point me in the right direction.

I’m using the following code with HTMLParser 1.6 to retrieve web pages:

    parser = new Parser (URL);
    ThePage = parser.parse (null);
    MyPage = ThePage.toHtml();

On some pages (not all), if the HTML page contains “£10 Free”, MyPage contains “?10 Free”; on other pages it works fine. I guess it has something to do with character encoding? Can someone suggest what I should add, and where, to get this working correctly? (I would like to keep the “£10 Free”.)

Thanks in advance
Brian
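Brian asks how to verify point (1), i.e. whether the server declares a charset in its HTTP header. One way is sketched below using only the JDK: open a URLConnection to the page and read the charset parameter of its Content-Type header. The class and method names are hypothetical, not part of HTMLParser, and the sketch assumes a plain GET from the JDK sees the same headers the parser would.

```java
import java.io.IOException;
import java.net.HttpURLConnection;
import java.net.URI;
import java.net.URLConnection;

// Hypothetical helper: reports the charset the HTTP server declares, if any.
class HeaderCharsetCheck {

    // Extracts the charset parameter from a Content-Type value such as
    // "text/html; charset=utf-8". Returns null when no charset is present.
    static String charsetFromContentType(String contentType) {
        if (contentType == null) {
            return null;
        }
        for (String param : contentType.split(";")) {
            String p = param.trim();
            // Case-insensitive match on the "charset=" prefix.
            if (p.regionMatches(true, 0, "charset=", 0, 8)) {
                // Strip optional quotes, e.g. charset="utf-8".
                return p.substring(8).trim().replace("\"", "");
            }
        }
        return null;
    }

    // Connects to the URL and returns the charset from its Content-Type header.
    static String charsetFromServer(String url) throws IOException {
        URLConnection conn = URI.create(url).toURL().openConnection();
        String contentType = conn.getContentType(); // connects and reads the headers
        if (conn instanceof HttpURLConnection) {
            ((HttpURLConnection) conn).disconnect();
        }
        return charsetFromContentType(contentType);
    }

    public static void main(String[] args) throws IOException {
        if (args.length == 0) {
            System.out.println("usage: java HeaderCharsetCheck <url>");
            return;
        }
        System.out.println("Header charset: " + charsetFromServer(args[0]));
    }
}
```

A null result means the server sends no charset, so only the meta tag (point 2) can supply one. When both are present, HTML 4.01 gives the HTTP header precedence over the meta tag.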
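For point (2), the two META tags Brian quotes can be checked for a declared charset. The sketch below is a rough, regex-based illustration using only the JDK (the class and method names are hypothetical); it is not a real HTML parser and ignores comments, scripts and attribute-quoting edge cases.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper: lists the charsets declared in <meta> tags of an HTML string.
class MetaCharsetCheck {

    // Any <meta ...> tag, case-insensitive (matches <META ...> too).
    private static final Pattern META_TAG =
            Pattern.compile("<meta\\b[^>]*>", Pattern.CASE_INSENSITIVE);

    // A charset=value pair inside a tag, with optional quotes before the value.
    private static final Pattern CHARSET =
            Pattern.compile("charset\\s*=\\s*[\"']?([A-Za-z0-9._:-]+)", Pattern.CASE_INSENSITIVE);

    static List<String> declaredCharsets(String html) {
        List<String> found = new ArrayList<>();
        Matcher tag = META_TAG.matcher(html);
        while (tag.find()) {
            Matcher cs = CHARSET.matcher(tag.group());
            if (cs.find()) {
                found.add(cs.group(1)); // tags without a charset are skipped
            }
        }
        return found;
    }

    public static void main(String[] args) {
        String html = "<META http-equiv=Content-Type content=\"text/html; charset=utf-8\">\n"
                + "<META http-equiv=content-type>";
        System.out.println(declaredCharsets(html)); // [utf-8]
    }
}
```

In this sketch the second, value-less tag contributes nothing; how HTMLParser 1.6 itself treats that tag is not covered in the thread.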
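Brian guesses the “?10 Free” symptom is a character-encoding problem. The JDK-only sketch below (class and method names are illustrative) shows three ways a charset mismatch can mangle “£10 Free”; the thread does not establish which of them applies to his pages.

```java
import java.nio.charset.StandardCharsets;

// Illustrates how "£10 Free" can be mangled when the charset used to decode
// or encode the text does not match the actual bytes.
class PoundSignDemo {

    static final String ORIGINAL = "\u00A310 Free"; // the pound sign followed by "10 Free"

    // Bytes really are ISO-8859-1 (0xA3 for the pound sign) but are decoded as UTF-8:
    // the lone 0xA3 byte is malformed and becomes the replacement character U+FFFD,
    // which many consoles and fonts display as "?".
    static String latin1ReadAsUtf8() {
        byte[] bytes = ORIGINAL.getBytes(StandardCharsets.ISO_8859_1);
        return new String(bytes, StandardCharsets.UTF_8);
    }

    // Bytes really are UTF-8 (0xC2 0xA3) but are decoded as ISO-8859-1,
    // giving two characters where one was intended.
    static String utf8ReadAsLatin1() {
        byte[] bytes = ORIGINAL.getBytes(StandardCharsets.UTF_8);
        return new String(bytes, StandardCharsets.ISO_8859_1);
    }

    // Text is decoded correctly but written out as US-ASCII, which has no
    // pound sign, so the encoder substitutes a literal '?'.
    static String writtenAsAscii() {
        byte[] bytes = ORIGINAL.getBytes(StandardCharsets.US_ASCII);
        return new String(bytes, StandardCharsets.US_ASCII);
    }

    public static void main(String[] args) {
        System.out.println(latin1ReadAsUtf8());
        System.out.println(utf8ReadAsLatin1());
        System.out.println(writtenAsAscii()); // ?10 Free
    }
}
```

If the first case matches, the charset declared by the header or meta tag would disagree with the bytes the server actually sends. The last case would point at the output side (the console or file encoding used to display MyPage) rather than at the parser.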