Re: [Htmlparser-developer] htmlparser 1.0 (Issue with mtv3 is that of internationalization)
From: Somik R. <so...@ya...> - 2002-01-09 11:50:17
Dear Kaarle,

Thank you very much! You are quite right. I forgot that I was using Shift-JIS for Japanese encoding support, and SJIS is a Microsoft-specific standard, not Unicode. If I use a Unicode encoding, it should be fine. I will try UTF-8, and will need your help to coordinate some more tests.

Meanwhile, this style thing is proving to be a headache: I just got a report that it's crashing on Google. I need to add more test cases.

Regards,
Somik

----- Original Message -----
From: Kaarle Kaila
To: Somik Raha
Sent: Wednesday, January 09, 2002 2:40 AM
Subject: Re: [Htmlparser-developer] htmlparser 1.0 (Issue with mtv3 is that of internationalization)

At 22:37 8.1.2002 +0530, Somik Raha wrote:

Hi Kaarle,

I found the reason for the last problem: the site http://www.mtv3.fi has a link in Finnish. That link is not being interpreted correctly by the parser. The link is:

<a href="/ks/ks_20020701b.shtml">Palveluun pääset tästä</a>

hi Somik,

HTMLParser reads lines from the net. It opens the connection with this command:

reader = new HTMLReader(new BufferedReader(new InputStreamReader(uc.getInputStream(),"SJIS")),resourceLocn);

I don't know what SJIS stands for. The Java API does not list it, but it does list ISO-8859-1, among others. Check the InputStreamReader constructor. With ISO-8859-1 it does not hang like it did with SJIS! SJIS seems to turn everything into 7-bit ASCII.

reader = new HTMLReader(new BufferedReader(new InputStreamReader(uc.getInputStream(),"ISO-8859-1")),resourceLocn);

With this setting, at least the Finnish characters come through correctly.

I also downloaded from CVS the two files you had changed, and I could read www.mtv3.fi. It even reads my webpage (rather strange output, though).

In Japan I would expect internationalization to be an issue. Wouldn't Unicode be required there?

regards
Kaarle

What's happening is that the last < is being corrupted. I haven't faced a problem with internationalization until now, and I am kind of stuck with this one. Maybe you'd be in a better position to solve it than me. I will make the release with the other bug fixed, and I'd be grateful if you can proceed from there.

Regards,
Somik

---------------------------------------------
Kaarle Kaila
http://www.iki.fi/kaila
mailto:kaa...@ik...
tel: +358 50 3725844
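
The charset effect discussed in this thread can be reproduced offline. What follows is a minimal sketch, not code from HTMLParser: CharsetDemo and readFirstLine are hypothetical names, a ByteArrayInputStream stands in for uc.getInputStream(), and the page is assumed to be served as ISO-8859-1. It decodes the same bytes twice, once with the matching charset and once with UTF-8, to show that only the matching charset reproduces the Finnish link text. SJIS itself is left out because it is not among the charsets every Java platform is guaranteed to support. A plausible cause of the corrupted "<" is that byte 0xE4 (ä in ISO-8859-1) is a lead byte of a two-byte character in Shift-JIS, so a Shift-JIS decoder may consume the byte that follows it.

```java
import java.io.BufferedReader;
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.UncheckedIOException;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

class CharsetDemo {
    // Hypothetical helper: read the first line of a stream, decoding it
    // with the given charset (mirrors the InputStreamReader usage above).
    static String readFirstLine(InputStream in, Charset charset) {
        try (BufferedReader reader =
                 new BufferedReader(new InputStreamReader(in, charset))) {
            return reader.readLine();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        // The link from the thread; \u00e4 is the letter a-umlaut.
        String link = "<a href=\"/ks/ks_20020701b.shtml\">"
            + "Palveluun p\u00e4\u00e4set t\u00e4st\u00e4</a>";

        // Assumption: the server sends the page encoded as ISO-8859-1.
        byte[] pageBytes = link.getBytes(StandardCharsets.ISO_8859_1);

        // Decoding with the matching charset restores the text.
        String matching = readFirstLine(
            new ByteArrayInputStream(pageBytes), StandardCharsets.ISO_8859_1);
        System.out.println(matching.equals(link));   // prints "true"

        // Decoding with a different charset damages the accented letters.
        String mismatched = readFirstLine(
            new ByteArrayInputStream(pageBytes), StandardCharsets.UTF_8);
        System.out.println(mismatched.equals(link)); // prints "false"
    }
}
```

In practice the charset would ideally come from the Content-Type header of the response or from the page's own meta tag, not from a hard-coded value.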