Thanks Derrick,
The page in question includes the following tags:
<META http-equiv=Content-Type content="text/html; charset=utf-8">
<META http-equiv=content-type>
I don't understand why the second one is there but it really is. With
that information can you suggest a resolution? I am not entirely sure
how to verify your point (1).
Best Regards
Brian
-----------------------------------------------------------------------------
There are two possibilities.
1) The HTTP server is/is not serving up content type meta information
in the HTTP header like so:
Content-Type: text/html; charset=utf-8
2) The source HTML does/does not contain a meta tag like so:
<meta http-equiv="Content-type" content="text/html; charset=utf-8" />
You need to determine which one so the appropriate 'fix' can be
applied.
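
One way to check both possibilities is sketched below, using only the Java standard library. The class name CharsetCheck, the helper charsetOf, and the example.com address are hypothetical placeholders, not part of HTMLParser. The helper pulls the charset out of a Content-Type value, so it can be fed either the HTTP header reported by URLConnection.getContentType() or the text of a meta tag copied from the page source.

```java
import java.net.URI;
import java.net.URLConnection;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class CharsetCheck {
    // Matches "charset=..." inside a Content-Type value or a meta tag, case-insensitively.
    private static final Pattern CHARSET =
        Pattern.compile("charset\\s*=\\s*[\"']?([^\\s;\"'>]+)", Pattern.CASE_INSENSITIVE);

    /** Returns the charset named in the given text, or null if none is declared. */
    public static String charsetOf(String contentType) {
        if (contentType == null) {
            return null;
        }
        Matcher m = CHARSET.matcher(contentType);
        return m.find() ? m.group(1) : null;
    }

    public static void main(String[] args) throws Exception {
        // Placeholder address (an assumption); pass the real page URL as the first argument.
        String address = args.length > 0 ? args[0] : "http://example.com/";
        URLConnection connection = URI.create(address).toURL().openConnection();

        // Possibility 1: the value of the Content-Type response header, or null if absent.
        String header = connection.getContentType();
        System.out.println("Content-Type header: " + header);
        System.out.println("Charset from header: " + charsetOf(header));

        // Possibility 2: the same helper applied to a meta tag copied from the page source.
        String meta = "<META http-equiv=Content-Type content=\"text/html; charset=utf-8\">";
        System.out.println("Charset from meta tag: " + charsetOf(meta));
    }
}
```

If the header carries no charset, or names a different one from the meta tag, that points to where the mismatch lies. A tag such as <META http-equiv=content-type> with no content attribute declares no charset, so the helper returns null for it.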
----- Original Message ----
From: "bo...@ti..." <bo...@ti...>
To: htm...@li...
Sent: Monday, May 12, 2008 7:31:39 AM
Subject: [Htmlparser-user] Character Encoding
Hi,
I have a strange problem and I can’t get my head around it. Hopefully
someone can point me in the right direction. I’m using the following
code with HTMLParser 1.6 to retrieve web pages:
parser = new Parser(URL);
ThePage = parser.parse(null);
MyPage = ThePage.toHtml();
On some pages (not all…) if the HTML page contains:
£10 Free
“MyPage” contains “?10 Free”. On other pages it works fine.
I guess it has something to do with character encoding? Can someone
suggest what I should add, and where, to get this to work correctly? (I
would like to keep the “£10 Free”.)
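
As an illustration of how a “?” can take the place of “£”, here is a minimal sketch using only the Java standard library. It is not a diagnosis of what HTMLParser does on these particular pages. The class PoundSignDemo and the method roundTrip are hypothetical names. The sketch relies on the documented behaviour of String.getBytes(Charset), which substitutes the charset's replacement byte for any character the charset cannot represent.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class PoundSignDemo {
    /** Encodes the text in the given charset, then decodes the bytes with the same charset. */
    public static String roundTrip(String text, Charset charset) {
        return new String(text.getBytes(charset), charset);
    }

    public static void main(String[] args) {
        // "£10 Free", written with a Unicode escape so the source file encoding does not matter.
        String original = "\u00A310 Free";

        // US-ASCII has no pound sign, so it is replaced by '?' during encoding.
        System.out.println(roundTrip(original, StandardCharsets.US_ASCII));

        // UTF-8 and ISO-8859-1 can both represent the pound sign, so the text survives.
        System.out.println(roundTrip(original, StandardCharsets.UTF_8).equals(original));
        System.out.println(roundTrip(original, StandardCharsets.ISO_8859_1).equals(original));
    }
}
```

The same kind of loss can occur anywhere along the path from server to output if one step uses a charset that lacks the character, which is why it helps to know which charset each step assumes.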
Thanks in advance
Brian