#1800 iso2022-jp conversion problems

Milestone: obsolete: 8.4a4

Status: closed-fixed

Owner: Jeffrey Hobbs

Labels: 44. UTF-8 Strings (46)

Priority: 5

Updated: 2002-04-18

Created: 2002-03-06

Creator: kazuro furukawa

Private: No

tcl8.4a4 addressed several problems with the iso2022-jp encoding. For example, the bugs that I submitted in the past were mostly fixed:
[ BugID: 218099 ] iso2022-jp encoding does not work.
[ BugID: 219283 ] iso2022-jp encoding is broken

However, it still has problems when I convert relatively long (longer than several kilobytes) Japanese texts (e.g. Unix Japanese manual pages) into iso2022-jp. I'll attach a script to reproduce that.
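
As a rough illustration of the kind of conversion being tested (this is not the attached eucjis.tcl, whose contents are not reproduced here), the Python sketch below converts EUC-JP bytes to ISO-2022-JP both in one piece and line by line. The function names are hypothetical, and Python's built-in euc_jp and iso2022_jp codecs stand in for Tcl's encodings. A correct converter should give the same bytes either way, since ISO-2022-JP text returns to ASCII before each line break.

```python
def euc_to_jis_whole(data):
    """Convert EUC-JP bytes to ISO-2022-JP in a single pass."""
    return data.decode("euc_jp").encode("iso2022_jp")


def euc_to_jis_by_line(data):
    """Convert EUC-JP bytes to ISO-2022-JP one line at a time.

    Each line is encoded independently, so the encoder starts every
    line in ASCII mode and switches back to ASCII at the end of it.
    """
    text = data.decode("euc_jp")
    lines = text.splitlines(keepends=True)
    return b"".join(line.encode("iso2022_jp") for line in lines)
```

If the two functions disagree for some input, or if the result does not decode back to the original text, the converter is mishandling shift state somewhere.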

Some details follow.

(1) euc-jp to iso2022-jp gets-puts conversion

When I convert a text with "tclsh8.4 eucjis.tcl -eucjis -gets infile outfile", sometimes "esc ( B" is missing and sometimes an extra "esc ( B" appears. While an extra "esc ( B" does not matter, a missing "esc ( B" causes missing characters on reading. The error is reproducible if I use the same file, but I don't know how and when it happens.

"od -x -a" output of an example error is below; lines marked with "!" differ between the two outputs. If I extract the erroneous line and convert it on its own, the error does not occur. Thus the error does not depend on the character codes but on the context.

[ output from eucjis.tcl -eucjis -gets euc.txt jis-n3.txt ]

            %   H   $   7   $   ^ esc   (   B  nl  sp  sp  sp  sp  sp  sp
  0007760    241b    2442    2139    1b23    4228    0a0a    2020    752d
          esc   $   B   $   9   !   # esc   (   B  nl  nl  sp  sp   -   u
! 0010000    2020    241b    2542    213d    253c    2148    2d4a    2074
!          sp  sp esc   $   B   %   =   !   <   %   H   !   J   -   t  sp
! 0010020    241b    2442    3b48    4d48    2451    2439    246b    2448
!         esc   $   B   $   H   ;   H   M   Q   $   9   $   k   $   H   $
[ correct output produced by a program called nkf ]

            %   H   $   7   $   ^ esc   (   B  nl  sp  sp  sp  sp  sp  sp
  0007760    241b    2442    2139    1b23    4228    0a0a    2020    752d
          esc   $   B   $   9   !   # esc   (   B  nl  nl  sp  sp   -   u
! 0010000    2020    241b    2542    213d    253c    2148    1b4a    4228
!          sp  sp esc   $   B   %   =   !   <   %   H   !   J esc   (   B
! 0010020    742d    1b20    4224    4824    483b    514d    3924    6b24
!           -   t  sp esc   $   B   $   H   ;   H   M   Q   $   9   $   k
! 0010040    4824    2d24    4b21    5e24    3f24    4f24    3d49    283c
!           $   H   $   -   !   K   $   ^   $   ?   $   O   I   =   <   (
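
The difference between the two dumps above can also be detected mechanically. The Python sketch below is a minimal checker, written for illustration only: it tracks the escape state of an ISO-2022-JP byte stream and reports offsets where a byte that cannot belong to a two-byte JIS X 0208 character (such as a space or newline) shows up while the stream is still in two-byte mode. The function name is hypothetical and only the four most common escape sequences are recognized. Note that the "-t" in the faulty output is itself accepted as a byte pair, so the first detectable symptom is the space after it.

```python
TWO_BYTE_ESCAPES = (b"\x1b$B", b"\x1b$@")  # JIS X 0208-1983, JIS C 6226-1978
ONE_BYTE_ESCAPES = (b"\x1b(B", b"\x1b(J")  # ASCII, JIS X 0201 Roman


def find_stray_bytes(data):
    """Return offsets of bytes that are invalid in two-byte mode."""
    stray = []
    two_byte = False
    i = 0
    while i < len(data):
        seq = data[i:i + 3]
        if seq in TWO_BYTE_ESCAPES:
            two_byte = True
            i += 3
        elif seq in ONE_BYTE_ESCAPES:
            two_byte = False
            i += 3
        elif two_byte:
            pair = data[i:i + 2]
            if len(pair) == 2 and all(0x21 <= b <= 0x7E for b in pair):
                i += 2  # plausible JIS X 0208 byte pair
            else:
                stray.append(i)  # e.g. space or newline without "esc ( B"
                i += 1
        else:
            i += 1  # ordinary single-byte character
    return stray


# Fragments modeled on the dumps: the first lacks "esc ( B" before "-t".
faulty = b"  \x1b$B%=!<%H!J-t \x1b$B$H;HMQ$9$k\x1b(B\n"
correct = b"  \x1b$B%=!<%H!J\x1b(B-t \x1b$B$H;HMQ$9$k\x1b(B\n"
```

Calling find_stray_bytes(faulty) flags the space that follows "-t", while find_stray_bytes(correct) returns an empty list.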

(2) euc-jp to iso2022-jp read-puts conversion

When I convert a text with "tclsh8.4 eucjis.tcl -eucjis -read infile outfile", sometimes an extra "esc $ B" appears in the middle of the output. It seems to always appear at around character number 4096, 8192, etc. (That is the character number, not the byte number.) Thus, if the Tcl internal buffer for Unicode storage is 8192 bytes long (4096 characters), the boundary handling at the beginning of each internal buffer presumably has a bug.
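
How a buffer boundary can produce redundant escape sequences is easy to show in general terms. The Python sketch below is only an analogy and says nothing about Tcl's actual internals: it encodes a string in fixed-size chunks, once with a fresh encoder per chunk (shift state lost at every boundary) and once with a single incremental encoder that carries the shift state across chunks. The function names and the chunk size are assumptions made for the example.

```python
import codecs


def encode_per_chunk(text, chunk_size):
    """Encode each chunk independently, so shift state is lost
    at every chunk boundary."""
    return b"".join(
        text[i:i + chunk_size].encode("iso2022_jp")
        for i in range(0, len(text), chunk_size)
    )


def encode_stateful(text, chunk_size):
    """Encode chunk by chunk with one incremental encoder, so shift
    state is carried across chunk boundaries."""
    encoder = codecs.getincrementalencoder("iso2022_jp")()
    parts = []
    for i in range(0, len(text), chunk_size):
        final = i + chunk_size >= len(text)  # flush state on last chunk
        parts.append(encoder.encode(text[i:i + chunk_size], final))
    return b"".join(parts)
```

For a run of Japanese characters that spans a chunk boundary, encode_per_chunk emits an additional "esc ( B" and "esc $ B" pair at the boundary, whereas encode_stateful emits the escape sequence only once. Both results decode to the same text, which matches the observation that surplus escapes are harmless while missing ones are not.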

(3) font selection mechanism

Under tk8.4a4 some characters are not displayed correctly with a font like "*-jisx0208.1983-1". This is a minor problem, since we normally use "*-jisx0208.1983-0".

Discussion

kazuro furukawa - 2002-03-06

A script to convert Japanese texts between euc-jp and iso2022-jp encodings

eucjis.tcl

kazuro furukawa - 2002-03-06

assigned_to: nijtmans --> hobbs

kazuro furukawa - 2002-03-08

Logged In: YES
user_id=49637

Problems (1) and (2) were found to be fixed by a patch from Koichi Yamamoto (private communication). He may submit the patch after he refines it.

Koichi Yamamoto - 2002-03-12

Logged In: YES
user_id=475117

Hi,
I sent Mr. Furukawa an additional patch to fix this problem, and he replied that problems (1) and (2) were solved.

My additional patch is available from:
http://www3.ocn.ne.jp/~yamako/tcl/iso2022-jp.tcl84a4.2002mar12.patch

Jeffrey Hobbs - 2002-04-18

yamako-endenc.patch

Jeffrey Hobbs - 2002-04-18

Logged In: YES
user_id=72656

Applied patch to 8.4 head on 2002-04-17. Attached patch for posterity.

Jeffrey Hobbs - 2002-04-18

status: open --> closed-fixed