Re: [Htmlparser-developer] htmlparser 1.0 (Issue with mtv3 is that of internationalization)
From: Somik R. <so...@ya...> - 2002-01-09 11:50:17
Dear Kaarle,
Thank you very much! You are quite right. I forgot that I was using
Shift-JIS for Japanese encoding support, and SJIS is a Microsoft-specific
standard, not Unicode. If I use a Unicode encoding, it should be fine.
I will try UTF-8, and I will need your help to coordinate some more tests.
Meanwhile, this style thing is proving to be a headache: I just got a
report that it's crashing on Google. Need to add more test cases.
Regards,
Somik
----- Original Message -----
From: Kaarle Kaila
To: Somik Raha
Sent: Wednesday, January 09, 2002 2:40 AM
Subject: Re: [Htmlparser-developer] htmlparser 1.0 (Issue with mtv3 is that of internationalization)
At 22:37 8.1.2002 +0530, Somik Raha wrote:
Hi Kaarle,
I found the reason for the last problem: the site http://www.mtv3.fi
has a link in Finnish. That link is not being interpreted correctly by the
parser. The link is:
<a href="/ks/ks_20020701b.shtml">Palveluun pääset tästä</a>
Hi Somik,
HTMLParser reads lines from the net. It sets up the reader for that
connection with this command:
reader = new HTMLReader(new BufferedReader(new InputStreamReader(uc.getInputStream(),"SJIS")),resourceLocn);
I don't know what SJIS stands for. The Java API does not list it, but it
does list ISO-8859-1, among others.
Check the InputStreamReader constructor. With ISO-8859-1 the parser does
not hang like it did with SJIS!
SJIS seems to make everything 7-bit ASCII.
reader = new HTMLReader(new BufferedReader(new InputStreamReader(uc.getInputStream(),"ISO-8859-1")),resourceLocn);
With this setting, at least the Finnish characters come through correctly.
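To make the effect of the charset argument concrete, here is a minimal sketch that pushes the bytes of the Finnish link text through the same InputStreamReader/BufferedReader chain with different charsets. The class name CharsetDemo and the decode helper are made up for illustration and are not part of HTMLParser. The byte array stands in for uc.getInputStream(), on the assumption that the page is served as ISO-8859-1. One plausible explanation for the corrupted "<" (not verified against the parser) is that Shift-JIS treats bytes such as 0xE4 as the first half of a two-byte character, so the byte that follows gets swallowed.

```java
import java.io.BufferedReader;
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.UncheckedIOException;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class CharsetDemo {
    // Hypothetical helper: decode one line of raw bytes with the given charset,
    // using the same reader chain as the HTMLReader call in the message.
    public static String decode(byte[] raw, Charset cs) {
        try (BufferedReader in = new BufferedReader(
                new InputStreamReader(new ByteArrayInputStream(raw), cs))) {
            String line = in.readLine();
            return line == null ? "" : line;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        // The link from www.mtv3.fi; \u00e4 is the letter a with diaeresis.
        String link = "<a href=\"/ks/ks_20020701b.shtml\">"
                + "Palveluun p\u00e4\u00e4set t\u00e4st\u00e4</a>";
        // Assumption: the server sends the page as ISO-8859-1 bytes.
        byte[] raw = link.getBytes(StandardCharsets.ISO_8859_1);

        // Matching charset: the text survives the round trip.
        System.out.println("ISO-8859-1 ok: "
                + decode(raw, StandardCharsets.ISO_8859_1).equals(link));
        // Mismatched charset: the 0xE4 bytes are not valid UTF-8 here.
        System.out.println("UTF-8 ok:      "
                + decode(raw, StandardCharsets.UTF_8).equals(link));
        // Shift_JIS is an optional charset, so check before using it.
        if (Charset.isSupported("Shift_JIS")) {
            System.out.println("Shift_JIS ok:  "
                    + decode(raw, Charset.forName("Shift_JIS")).equals(link));
        }
    }
}
```

Note that the UTF-8 case fails as well: a Unicode encoding on the reader side only helps if the server actually sends the page in that encoding.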
I also downloaded from CVS the two files you had changed, and I could read
www.mtv3.fi. It even reads my own webpage (with rather strange output,
though).
In Japan I would expect internationalization to be an issue. Wouldn't
Unicode be required there?
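One way to avoid hard-coding either SJIS or ISO-8859-1 would be to take the charset from the Content-Type header that the server sends (for example via URLConnection.getContentType()) and fall back to a default when none is declared. The sketch below only shows the header parsing. The class ContentTypeCharset and its method are hypothetical names, not part of HTMLParser, and pages that declare their encoding only in a meta tag are not handled.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class ContentTypeCharset {
    // Hypothetical helper: pick the charset named in a Content-Type value such as
    // "text/html; charset=ISO-8859-1", or return the fallback if none is usable.
    public static Charset fromContentType(String contentType, Charset fallback) {
        if (contentType == null) {
            return fallback;
        }
        for (String part : contentType.split(";")) {
            String p = part.trim();
            // Case-insensitive match on the "charset=" parameter name.
            if (p.regionMatches(true, 0, "charset=", 0, 8)) {
                String name = p.substring(8).trim().replace("\"", "");
                try {
                    if (Charset.isSupported(name)) {
                        return Charset.forName(name);
                    }
                } catch (IllegalArgumentException e) {
                    // Empty or syntactically illegal charset name: use the fallback.
                }
            }
        }
        return fallback;
    }

    public static void main(String[] args) {
        Charset latin1 = StandardCharsets.ISO_8859_1;
        System.out.println(fromContentType("text/html; charset=UTF-8", latin1));
        System.out.println(fromContentType("text/html", latin1));
    }
}
```

The resulting Charset could then be passed to the InputStreamReader constructor in place of the fixed encoding name.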
regards
Kaarle
What's happening is that the last < is being corrupted. I haven't faced a
problem with internationalization until now, and I am kind of stuck with
this one. Maybe you'd be in a better position to solve it than me. I will
make the release with the other bug fixed, and I'd be grateful if you can
proceed from there.
Regards,
Somik
---------------------------------------------
Kaarle Kaila
http://www.iki.fi/kaila
mailto:kaa...@ik...
tel: +358 50 3725844