Thread: [Htmlparser-developer] reading japanese via htmlParser

Hi, 

To read a Japanese site, this is what I do:

// usual code to create the HTMLParser, then loop over the nodes and pick out, say, only the text nodes

 HTMLStringNode stringNode = (HTMLStringNode) node;
 System.out.println(new String(stringNode.getText().getBytes("ISO8859_4"),"Shift_JIS"));
 //System.out.println(stringNode.getText()); -- this does not print the Japanese text correctly.

Is this the right way to read Japanese characters?

If not, what is the alternative?

If yes, then shouldn't we have a method in HTMLParser or HTMLReader that returns the default encoding used by HTMLParser, i.e. '8859_4' currently?
Unless I know this encoding, I cannot get the desired result.
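
To make the question concrete, here is a small standalone sketch of the same round trip using only the JDK, without HTMLParser. The class name ReDecodeSketch and the helper redecode are made up for illustration, and the sample string stands in for a page's text. It assumes the JDK in use ships the Shift_JIS charset, and that the parser really decoded the page bytes as ISO8859_4.

```java
import java.io.UnsupportedEncodingException;

public class ReDecodeSketch {

    // Hypothetical helper: undo a decode done with the wrong charset by
    // turning the chars back into the original bytes, then decoding those
    // bytes with the charset the page was actually written in.
    static String redecode(String misdecoded, String assumedCharset, String actualCharset)
            throws UnsupportedEncodingException {
        byte[] raw = misdecoded.getBytes(assumedCharset);
        return new String(raw, actualCharset);
    }

    public static void main(String[] args) throws UnsupportedEncodingException {
        // Sample Japanese text (three kanji), written with escapes so the
        // source file encoding does not matter.
        String original = "\u65e5\u672c\u8a9e";

        // Simulate the page: Shift_JIS bytes on the wire ...
        byte[] pageBytes = original.getBytes("Shift_JIS");
        // ... decoded by a reader that assumes ISO8859_4.
        String garbled = new String(pageBytes, "ISO8859_4");

        // Same trick as in the snippet above.
        String recovered = redecode(garbled, "ISO8859_4", "Shift_JIS");
        System.out.println("round trip ok: " + recovered.equals(original));
    }
}
```

As far as I can tell, this only works because the assumed charset maps every byte to a distinct character, so getBytes can reproduce the original bytes. With a charset that cannot represent some byte values, the original bytes would be lost at the first decode.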

Comments?

regards,
amit.
