[Htmlparser-user] Character Encoding


Thanks Derrick,

The page in question includes the following tags:

<META http-equiv=Content-Type content="text/html; charset=utf-8">
<META http-equiv=content-type>

I don't understand why the second one is there, but it really is in the
page. With that information, can you suggest a resolution? I am not
entirely sure how to verify your point (1).

Best Regards

Brian
-----------------------------------------------------------------------------

There are two possibilities.

1) The HTTP server either is or is not sending the content type in the
HTTP response header, like so:
Content-Type: text/html; charset=utf-8

2) The source HTML either does or does not contain a meta tag that
declares the content type and charset.
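
A rough way to check both possibilities with nothing but the JDK is sketched below. The class and method names (CharsetCheck, headerCharset, metaCharset, fetchContentType) are made up for illustration and are not part of HTMLParser. The meta-tag check is a simple pattern match, not a real HTML parse, so treat its answer as a hint only.

```java
import java.io.IOException;
import java.net.URL;
import java.net.URLConnection;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper, not part of HTMLParser.
public class CharsetCheck {

    // Matches a charset declaration inside a <meta ...> tag, ignoring case.
    private static final Pattern META_CHARSET = Pattern.compile(
            "<meta[^>]*charset\\s*=\\s*[\"']?([\\w-]+)", Pattern.CASE_INSENSITIVE);

    // Possibility 1: pull the charset parameter out of a Content-Type
    // header value. Returns null when there is no charset parameter.
    static String headerCharset(String contentType) {
        if (contentType == null) {
            return null;
        }
        for (String part : contentType.split(";")) {
            String p = part.trim();
            if (p.regionMatches(true, 0, "charset=", 0, 8)) {
                String value = p.substring(8).trim();
                // Strip optional surrounding double quotes.
                if (value.length() >= 2 && value.startsWith("\"") && value.endsWith("\"")) {
                    value = value.substring(1, value.length() - 1);
                }
                return value.isEmpty() ? null : value;
            }
        }
        return null;
    }

    // Possibility 2: look for a charset declared in a meta tag.
    // Returns null when no meta tag carries a charset.
    static String metaCharset(String html) {
        Matcher m = META_CHARSET.matcher(html);
        return m.find() ? m.group(1) : null;
    }

    // Asks the server for the page and returns its Content-Type header value.
    static String fetchContentType(String address) throws IOException {
        URLConnection connection = new URL(address).openConnection();
        return connection.getContentType();
    }

    public static void main(String[] args) throws IOException {
        if (args.length == 0) {
            System.out.println("usage: java CharsetCheck <url>");
            return;
        }
        String contentType = fetchContentType(args[0]);
        System.out.println("Content-Type header: " + contentType);
        System.out.println("charset in header:   " + headerCharset(contentType));
    }
}
```

If the header reports no charset, or a different one from the meta tag, that mismatch is a likely place to start looking.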

----- Original Message ----
From: "bo...@ti..." <bo...@ti...>
To: htm...@li...
Sent: Monday, May 12, 2008 7:31:39 AM
Subject: [Htmlparser-user] Character Encoding

Hi,

I have a strange problem and I can’t get my head around it. Hopefully 
someone can point me in the right direction. I’m using the following 
code with HTMLParser 1.6 to retrieve web pages:

    parser  = new Parser(URL);
    ThePage = parser.parse(null);
    MyPage  = ThePage.toHtml();

On some pages (not all…), if the HTML page contains:

£10 Free

MyPage contains “?10 Free”. On other pages it works fine.

I guess it has something to do with character encoding? Can someone
suggest what I should add, and where, to get this to work correctly?
(I would like to keep the “£10 Free”.)
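
For illustration only, here is a small JDK-only sketch of how a pound sign can turn into “?” or pick up a stray “Â” when the charset used to write the bytes does not match the one used to read them. The class and method names (PoundSignDemo, recode) are made up, and this shows the general mechanism in Java, not what HTMLParser does internally.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

// Hypothetical demo class, not part of HTMLParser.
public class PoundSignDemo {

    // Encodes text with one charset and decodes the resulting bytes with another.
    static String recode(String text, Charset encodeAs, Charset decodeAs) {
        return new String(text.getBytes(encodeAs), decodeAs);
    }

    public static void main(String[] args) {
        // The pound sign is U+00A3; escapes keep this source file pure ASCII.
        String original = "\u00A310 Free";

        // US-ASCII has no pound sign, so getBytes substitutes '?'.
        String lossy = recode(original, StandardCharsets.US_ASCII, StandardCharsets.US_ASCII);

        // UTF-8 stores the pound sign as bytes C2 A3; reading those bytes as
        // ISO-8859-1 yields two characters, U+00C2 followed by U+00A3.
        String garbled = recode(original, StandardCharsets.UTF_8, StandardCharsets.ISO_8859_1);

        // Using the same charset on both sides preserves the text.
        String intact = recode(original, StandardCharsets.UTF_8, StandardCharsets.UTF_8);

        System.out.println("lossy starts with '?': " + lossy.startsWith("?"));
        System.out.println("garbled length:        " + garbled.length());
        System.out.println("intact equals input:   " + intact.equals(original));
    }
}
```

So a “?” in the output suggests the text passed through a charset that cannot represent the character at all.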

Thanks in advance

Brian
