Re: [Htmlparser-user] problem parsing Chinese character website
From: Somik R. <so...@ya...> - 2003-03-12 06:36:20
Derrick, Amit - any ideas?

----- Original Message -----
From: "Joe Lin" <gu...@ya...>
To: <htm...@li...>
Sent: Saturday, March 08, 2003 1:32 AM
Subject: [Htmlparser-user] problem parsing Chinese character website

> Hi,
>
> It seems that the parser has a problem handling Chinese
> characters. I experimented with a simple web page as
> follows (I saved it as "test.html"):
>
> <HTML>
> <HEAD>
> <TITLE>Hello</TITLE>
> <META http-equiv=Content-Type content="text/html; charset=gb2312">
> </HEAD>
> <BODY bgColor=#ffffff>
> <h1>Hello</h1><br>
> </body>
> </html>
>
> I then ran the parser as
> java -jar htmlparser.jar file:test.html
> The parser output nothing but:
> HTMLParser v1.3 (Integration Build Mar 02, 2003)
> Parsing file:test.html
> INFO: detected charset "gb2312", using "EUC-CN"
>
> Thanks for any help.
>
> Joe