Re: [Htmlparser-user] problem parsing Chinese character website
From: Derrick O. <Der...@ro...> - 2003-03-13 12:30:07
I gave this problem a cursory look and it appears that the input stream
opened with that charset doesn't return any lines. I didn't have time to
construct a simple test case, and I'm not sure a US-English system is the
best platform to test this.

Derrick

>Message: 3
>From: "Somik Raha" <so...@ya...>
>To: <htm...@li...>
>Subject: Re: [Htmlparser-user] problem parsing Chinese character website
>Date: Tue, 11 Mar 2003 22:37:51 -0800
>Reply-To: htm...@li...
>
>Derrick, Amit - any ideas ?
>
>----- Original Message -----
>From: "Joe Lin" <gu...@ya...>
>To: <htm...@li...>
>Sent: Saturday, March 08, 2003 1:32 AM
>Subject: [Htmlparser-user] problem parsing Chinese character website
>
>>Hi,
>>
>>It seems that the parser has a problem handling Chinese
>>characters. I experimented with a simple web page as
>>follows (I saved it as "test.html"):
>>
>><HTML>
>><HEAD>
>><TITLE>Hello</TITLE>
>><META http-equiv=Content-Type content="text/html;
>>charset=gb2312">
>></HEAD>
>><BODY bgColor=#ffffff>
>><h1>Hello</h1><br>
>></body>
>></html>
>>
>>I then ran the parser as
>>java -jar htmlparser.jar file:test.html.
>>The parser output nothing but:
>>HTMLParser v1.3 (Integration Build Mar 02, 2003)
>>Parsing file:test.html
>>INFO: detected charset "gb2312", using "EUC-CN"
>>
>>Thanks for any help.
>>
>>Joe
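
A minimal sketch of the kind of simple test case mentioned in the reply, in case it helps narrow things down. It bypasses the parser entirely: it opens the file with each candidate charset name and counts how many lines the reader returns. The class name CharsetLineCheck, the helper countLines, and the list of candidate names are illustrative assumptions, not part of htmlparser, and which names a given JVM accepts may vary by platform and version.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.nio.charset.Charset;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

// Hypothetical standalone check; not part of htmlparser.
class CharsetLineCheck {

    // Decode the file with the named charset and count the lines the reader yields.
    static int countLines(Path file, String charsetName) throws IOException {
        // Charset.forName throws if this JVM does not know the name.
        Charset charset = Charset.forName(charsetName);
        try (BufferedReader reader = new BufferedReader(
                new InputStreamReader(Files.newInputStream(file), charset))) {
            int lines = 0;
            while (reader.readLine() != null) {
                lines++;
            }
            return lines;
        }
    }

    public static void main(String[] args) throws IOException {
        // Defaults to the test.html from the original report.
        Path file = Paths.get(args.length > 0 ? args[0] : "test.html");
        // Candidate names are assumptions: the declared charset plus two spellings of the mapped one.
        String[] candidates = {"gb2312", "EUC-CN", "EUC_CN"};
        for (String name : candidates) {
            if (!Charset.isSupported(name)) {
                System.out.println(name + ": not supported by this JVM");
                continue;
            }
            System.out.println(name + ": " + countLines(file, name) + " line(s)");
        }
    }
}
```

If a name is reported as unsupported, or yields zero lines for a file that clearly has content, that would point at the JVM's charset handling rather than at the parser itself.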