Re: [Htmlparser-developer] JIS encoding problem
From: Yuta O. <ok...@ar...> - 2006-04-20 09:20:39
Thank you for your advice! I modified our code to reset the parser and call visitAllNodesWith() again, and parsing now completes successfully with the corrected encoding.

I also have a correction regarding JIS handling. I made scanJIS() recognize "[ESC] ( I" as the end of a JIS-encoded string, but that was a mistake. According to ISO-2022-JP, the text must return to the ASCII charset at the end of each line and at the end of the text. The JIS X 0201-1976 "Kana" charset selected by that sequence is a single-byte charset, but it is not the ASCII charset, so "[ESC] ( I" does not end the JIS-encoded string.

Note that the code I modified supports only the Japanese charsets. Many other charsets (e.g. Chinese, Korean, Latin) use other escape sequences. If another problem with escape sequence handling comes up, the following URLs should help resolve it:

Wikipedia - ISO/IEC 2022
http://en.wikipedia.org/wiki/ISO_2022

International Register of Coded Character Sets
http://www.itscj.ipsj.or.jp/ISO-IR/
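To illustrate the correction, here is a minimal sketch of the end-of-run check. It is not the actual scanJIS() code from the parser: the class and method names are hypothetical, and it assumes the only sequence that ends a JIS run is "[ESC] ( B", the ISO-2022-JP return to ASCII. "[ESC] ( I" is skipped over as an ordinary charset switch.

```java
// Hypothetical helper for illustration only; not part of HTMLParser.
final class Iso2022JpSketch {
    private static final byte ESC = 0x1B;

    /** True if the three bytes at pos are ESC ( B, the return to ASCII. */
    static boolean isReturnToAscii(byte[] buf, int pos) {
        return pos + 2 < buf.length
            && buf[pos] == ESC
            && buf[pos + 1] == '('
            && buf[pos + 2] == 'B';
    }

    /**
     * Returns the index just past the ESC ( B that closes the run
     * beginning at start, or -1 if the buffer never returns to ASCII.
     * Other escape sequences, such as ESC ( I (Kana), do not end the run.
     */
    static int endOfJisRun(byte[] buf, int start) {
        for (int i = start; i + 2 < buf.length; i++) {
            if (isReturnToAscii(buf, i)) {
                return i + 3;
            }
        }
        return -1;
    }
}
```

Scanning byte by byte is safe here because ESC (0x1B) is never used as a data byte in the ISO-2022-JP charsets. A real scanner would probably also want to handle a buffer that ends in the middle of an escape sequence.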