Hi,
To read a Japanese site, this is what I do:
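// Usual code to create the HTMLParser, then loop over the nodes to pick out, say, text nodes only
HTMLStringNode stringNode = (HTMLStringNode) node;
// Turn the text back into bytes using the parser's encoding, then decode those bytes as Shift_JIS
System.out.println(new String(stringNode.getText().getBytes("ISO8859_4"), "Shift_JIS"));
// Printing the text directly does not work:
// System.out.println(stringNode.getText());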
Is this the right way to read Japanese characters?
If not, what is the other way?
If yes, then shouldn't we have a method in HTMLParser or HTMLReader that returns the default encoding used by HTMLParser, i.e. '8859_4' currently?
Unless I know this encoding, I cannot convert the text back to bytes correctly, so I cannot get the desired result.
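To make the round trip concrete, here is a minimal standalone sketch of the idea, without HTMLParser at all. The class RedecodeSketch and the method redecode are hypothetical names made up for illustration, and I use ISO-8859-1 as the "wrong" encoding only as an assumption for the demo; in my real code it would be whatever encoding the parser actually used.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

// Hypothetical helper class for illustration; not part of HTMLParser.
class RedecodeSketch {

    // Undo a wrong decode: turn the text back into the bytes it came from
    // (using the charset that was wrongly applied), then decode those bytes
    // with the charset the page was really written in.
    static String redecode(String misdecoded, Charset wronglyUsed, Charset actual) {
        byte[] raw = misdecoded.getBytes(wronglyUsed);
        return new String(raw, actual);
    }

    public static void main(String[] args) {
        Charset shiftJis = Charset.forName("Shift_JIS");
        String original = "\u65e5\u672c\u8a9e"; // the word for "Japanese language" in kanji

        // Simulate a reader that decodes Shift_JIS bytes as ISO-8859-1
        // (an assumption for this demo, not the parser's confirmed default).
        String garbled = new String(original.getBytes(shiftJis), StandardCharsets.ISO_8859_1);

        String recovered = redecode(garbled, StandardCharsets.ISO_8859_1, shiftJis);
        System.out.println(recovered.equals(original));
    }
}
```

The trick only works when the wrongly applied encoding maps every byte to a distinct character, as ISO-8859-1 does, and when I pass that same encoding to getBytes. That is why I need to know what the parser used.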
Comments?
regards,
amit.