[Htmlparser-user] problem parsing Chinese character website
From: Joe L. <gu...@ya...> - 2003-03-08 09:32:43
Hi,

It seems that the parser has a problem handling Chinese characters. I experimented with a simple web page as follows (saved as "test.html"):

    <HTML>
    <HEAD>
    <TITLE>Hello</TITLE>
    <META http-equiv=Content-Type content="text/html; charset=gb2312">
    </HEAD>
    <BODY bgColor=#ffffff>
    <h1>Hello</h1><br>
    </body>
    </html>

I then ran the parser as:

    java -jar htmlparser.jar file:test.html

The parser output nothing but:

    HTMLParser v1.3 (Integration Build Mar 02, 2003)
    Parsing file:test.html
    INFO: detected charset "gb2312", using "EUC-CN"

Thanks for any help.

Joe
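
One way to narrow this down (a diagnostic sketch, not a fix): the log line shows the parser mapping "gb2312" to "EUC-CN", so it may be worth checking whether the local JVM recognizes either name at all. The class below is hypothetical and not part of htmlparser; it relies only on java.nio.charset.Charset from the JDK (available since Java 1.4). Whether either label resolves depends on the JVM and the charsets installed with it.

```java
import java.nio.charset.Charset;

// Hypothetical helper, not part of htmlparser.
class CharsetCheck {
    // Returns the JVM's canonical name for a charset label,
    // or null if this JVM does not recognize or support the label.
    static String canonicalName(String label) {
        try {
            return Charset.isSupported(label) ? Charset.forName(label).name() : null;
        } catch (IllegalArgumentException e) {
            // Thrown for syntactically illegal names (e.g. names containing spaces).
            return null;
        }
    }

    public static void main(String[] args) {
        // The two names that appear in the parser's INFO line.
        String[] labels = {"gb2312", "EUC-CN"};
        for (String label : labels) {
            String name = canonicalName(label);
            System.out.println(label + " -> "
                    + (name == null ? "not supported by this JVM" : name));
        }
    }
}
```

If either label is reported as unsupported, the empty output may come from the JVM's charset support rather than from the parser itself; if both resolve, the problem more likely lies elsewhere.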