
#3309 binary scan unicode error

Milestone: obsolete: 8.4.11
Status: closed-invalid
Priority: 5
Updated: 2005-12-16
Created: 2005-11-22
Private: No

Windows 2000 / ActiveTcl 8.4.11

test

set b1 \u3042\u3044
set b2 [encoding convertto unicode $b1]

binary scan $b1 H* result1;# 4244
binary scan $b2 H* result2;# 42304430
if {$result1 ne $result2} {error error}

Is result2 perhaps the correct value?
It seems to me that every second byte was lost in result1.
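A minimal sketch of the behaviour behind result1 and result2, assuming the Tcl 8.4 rule explained in the discussion (only the low byte of each character is used). The procs `lowByteHex` and `bytesHex` are hypothetical helpers written for illustration: the first reproduces result1 by masking each code point with 0xFF, the second converts the characters to real bytes with [encoding convertto] before scanning. UTF-8 is used instead of the original `unicode` encoding so the output does not depend on the machine's byte order.

```tcl
# Hypothetical helper: mimics the rule described in this ticket, where
# [binary scan] keeps only the low byte (code point & 0xFF) of each character.
proc lowByteHex {str} {
    set hex ""
    foreach ch [split $str ""] {
        # [scan $ch %c] returns the character's code point as an integer
        append hex [format %02x [expr {[scan $ch %c] & 0xFF}]]
    }
    return $hex
}

# Hypothetical helper: convert the characters to real bytes first,
# then scan the resulting byte string as hex.
proc bytesHex {str encodingName} {
    binary scan [encoding convertto $encodingName $str] H* hex
    return $hex
}

set s \u3042\u3044
puts [lowByteHex $s]       ;# 4244 (the 0x30 high bytes are dropped)
puts [bytesHex $s utf-8]   ;# e38182e38184
```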

Discussion

  • Donal K. Fellows

    • status: open --> open-invalid
     
  • Donal K. Fellows


    That's exactly the way [binary scan] is defined to work: it
    uses just the low byte of each character. [encoding
    convertto] merely transforms the characters into a sequence
    of bytes (each resulting byte stored as a single character
    in the range \u0000-\u00ff), which are then handled in the
    way you describe.

    The key point is that Tcl is using a different
    interpretation of bytes and (especially) characters to what
    you expected.

    Keeping bug open while I consider if a documentation update
    is needed.

     
  • Donal K. Fellows

    • status: open-invalid --> closed-invalid
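To illustrate the distinction between characters and bytes drawn in the discussion, here is a short sketch (UTF-8 is an arbitrary choice of encoding for illustration). The same text has one length when viewed as characters and another once converted to bytes, and [encoding convertfrom] reverses the conversion.

```tcl
# The same text viewed as characters and as bytes.
set chars \u3042\u3044
set bytes [encoding convertto utf-8 $chars]

puts [string length $chars]   ;# 2 characters
puts [string length $bytes]   ;# 6 bytes, each held as a char in \u0000-\u00ff

# Converting back recovers the original characters.
puts [expr {[encoding convertfrom utf-8 $bytes] eq $chars}]   ;# 1
```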