Derrick, Amit - any ideas?
----- Original Message -----
From: "Joe Lin" <gu...@ya...>
To: <htm...@li...>
Sent: Saturday, March 08, 2003 1:32 AM
Subject: [Htmlparser-user] problem parsing Chinese character website
> Hi,
>
> It seems that the parser has a problem handling Chinese
> characters. I experimented with a simple web page as
> follows (I saved it as "test.html"):
>
> <HTML>
> <HEAD>
> <TITLE>Hello</TITLE>
> <META http-equiv=Content-Type content="text/html; charset=gb2312">
> </HEAD>
> <BODY bgColor=#ffffff>
> <h1>Hello</h1><br>
> </body>
> </html>
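For reference, here is a minimal sketch of how a parser might pick up the encoding declared in the META tag above: it pulls the `charset` parameter out of the `content` attribute value. This is not htmlparser's own code; the class `MetaCharset` and method `charsetOf` are hypothetical names used only for illustration.

```java
import java.util.Locale;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class MetaCharset {
    // Matches "charset=NAME" in a Content-Type value; an optional opening quote is skipped
    // and the name stops at a quote, ';' or whitespace.
    private static final Pattern CHARSET =
        Pattern.compile("charset\\s*=\\s*[\"']?([A-Za-z0-9._:-]+)", Pattern.CASE_INSENSITIVE);

    /** Returns the lower-cased charset named in a Content-Type value, or null if none. */
    static String charsetOf(String contentType) {
        Matcher m = CHARSET.matcher(contentType);
        return m.find() ? m.group(1).toLowerCase(Locale.ROOT) : null;
    }

    public static void main(String[] args) {
        // Value of the content attribute from the META tag in test.html
        String content = "text/html; charset=gb2312";
        System.out.println(charsetOf(content)); // prints gb2312
    }
}
```

The regex approach is deliberately simple; a real parser would read the attribute from the parsed tag rather than scanning raw text.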
>
> I then ran the parser as:
> java -jar htmlparser.jar file:test.html
> The parser printed nothing but:
> HTMLParser v1.3 (Integration Build Mar 02, 2003)
> Parsing file:test.html
> INFO: detected charset "gb2312", using "EUC-CN"
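The INFO line shows the parser mapping "gb2312" to "EUC-CN". One way to rule out the JVM's own charset support is to check how the JDK resolves both labels and to decode test.html directly with java.nio, outside the parser. The sketch below assumes Java 7 or later (for `Files`/`Paths`); `CharsetResolve` and `canonical` are hypothetical names, and GB2312 is not one of the charsets every JVM is required to ship, so the code checks for it first.

```java
import java.io.IOException;
import java.nio.charset.Charset;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

public class CharsetResolve {
    /** Returns the JDK's canonical name for a charset label, or null if it is unsupported or illegal. */
    static String canonical(String label) {
        try {
            return Charset.isSupported(label) ? Charset.forName(label).name() : null;
        } catch (IllegalArgumentException e) { // also covers IllegalCharsetNameException
            return null;
        }
    }

    public static void main(String[] args) throws IOException {
        // Both labels from the INFO line; on JDKs that ship GB2312 they should resolve to the same charset
        for (String label : new String[] {"gb2312", "EUC-CN"}) {
            System.out.println(label + " -> " + canonical(label));
        }
        // If a path is given, decode the file with the declared charset and echo it,
        // to see whether the page itself decodes cleanly without the parser involved.
        if (args.length > 0 && canonical("gb2312") != null) {
            Path page = Paths.get(args[0]);
            String text = new String(Files.readAllBytes(page), Charset.forName("gb2312"));
            System.out.println(text);
        }
    }
}
```

Compile with `javac CharsetResolve.java` and run `java CharsetResolve test.html`. If the page prints correctly here, the problem is more likely in how the parser handles the charset switch than in the file or the JVM.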
>
> Thanks for any help.
>
> Joe
>
> _______________________________________________
> Htmlparser-user mailing list
> Htm...@li...
> https://lists.sourceforge.net/lists/listinfo/htmlparser-user