#1 iconv problem with UTF-16 files

closed-invalid
nobody
None
5
2003-06-12
2003-02-19
No

iconv has major problems with UTF-16 files on the
Windows platform.

Apparently the files are opened in text mode instead of binary mode.
With UTF-16, this means control characters are seen
where they do not exist (the byte is only one half of a
16-bit code unit), and the modifications the OS makes to the
input/output generate invalid UTF-16.

Conversion of a valid UTF-16-LE file to the same format :
iconv -f utf-16LE -t utf-16LE index.txt > ind.txt
shows that :
0D 00 - 0A 00 - 32 00 in input changes to :
0D 00 - 0D 0A 00 - 32 00 in output.

The erroneous 0D desynchronizes all the data after it,
and the output becomes invalid.
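
The byte-level effect can be reproduced without iconv. This is only a sketch: it assumes the output stream does what a Windows text-mode stream does (each 0A byte is written as 0D 0A), and `text_mode_write` is a hypothetical helper, not part of iconv.

```python
def text_mode_write(data: bytes) -> bytes:
    """Assumed model of a text-mode output stream: every LF byte becomes CR LF."""
    return data.replace(b"\n", b"\r\n")


# The UTF-16LE input from the report: CR, LF, '2'.
original = bytes.fromhex("0D00 0A00 3200")
damaged = text_mode_write(original)

print(damaged.hex(" "))  # 0d 00 0d 0a 00 32 00

# The extra 0D shifts every following 16-bit unit by one byte.
try:
    damaged.decode("utf-16-le")
    decoded_ok = True
except UnicodeDecodeError:
    decoded_ok = False

print(decoded_ok)  # False: the byte count is now odd
```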

When the input contains:
3A 00 - 1A 90 - 3A 00
the output stops after
3A 00
without any error message and nothing to say that invalid
input was seen. It seems to treat 1A as an end-of-file marker.
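
The truncation is consistent with a text-mode input stream treating the byte 1A (Ctrl-Z) as end of file. A minimal sketch under that assumption follows; `text_mode_read` is a hypothetical helper that models only the Ctrl-Z behaviour and ignores any newline translation on input.

```python
CTRL_Z = b"\x1a"


def text_mode_read(data: bytes) -> bytes:
    """Assumed model of a text-mode input stream: stop at the first Ctrl-Z byte."""
    return data.split(CTRL_Z, 1)[0]


# ':' followed by the code unit U+901A and another ':', in UTF-16LE.
original = bytes.fromhex("3A00 1A90 3A00")
seen = text_mode_read(original)

print(seen.hex(" "))  # 3a 00
```

Here 1A is just the low byte of the 16-bit unit 901A, so a byte-oriented end-of-file check fires inside a valid character.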

When converting to utf-8 (there should not be a problem
outputting utf-8)
iconv -f utf-16LE -t utf-8 index.txt > ind_u8.txt
the input sequence :
0D 00 - 0A 00
changes to
0D - 0D - 0A
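
The UTF-8 case follows the same pattern: the conversion itself is correct, and the damage happens when the result is written. As a sketch, assuming again that the output stream rewrites each 0A as 0D 0A:

```python
# UTF-16LE input: CR, LF.
utf16_input = bytes.fromhex("0D00 0A00")

# A correct conversion yields the two bytes 0D 0A.
utf8 = utf16_input.decode("utf-16-le").encode("utf-8")
print(utf8.hex(" "))  # 0d 0a

# Assumed text-mode write: the LF that is already preceded by CR gets a second CR.
written = utf8.replace(b"\n", b"\r\n")
print(written.hex(" "))  # 0d 0d 0a
```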

Discussion

  • Test UTF-16 file to demonstrate problem

     
    Attachments
  • Perry
    2003-05-24

    Logged In: YES
    user_id=60964

    Use the --binary option to iconv.exe, e.g.,

    iconv --binary -f UTF-16LE -t UTF-16 index.txt > index_twin.txt

    (I have tested this with your index file, btw, using iconv-1.9.)

    Hopefully you will be able to confirm that this solves the
    problem, and you can close the bug ? (Ordinarily iconv bugs
    should go to the iconv bug address, I think, but if you
    concur that this is not really a bug, just an annoyance of
    the MS-Windows platform, then we can forget about it ?)
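
For comparison, the same idea in a script: open both files in binary mode so the OS never touches the bytes. This is an illustrative sketch, not iconv's implementation; `convert_file` and the file names are hypothetical.

```python
import os
import tempfile


def convert_file(src: str, dst: str, from_enc: str, to_enc: str) -> None:
    """Re-encode src into dst, reading and writing raw bytes only."""
    with open(src, "rb") as f:   # "rb": no newline translation, no Ctrl-Z handling
        raw = f.read()
    with open(dst, "wb") as f:   # "wb": bytes are written exactly as given
        f.write(raw.decode(from_enc).encode(to_enc))


# Text containing CR LF and U+901A, the two troublesome cases from the report.
payload = "a\r\n2:\u901a:".encode("utf-16-le")

with tempfile.TemporaryDirectory() as tmp:
    src = os.path.join(tmp, "index.txt")
    dst = os.path.join(tmp, "index_twin.txt")
    with open(src, "wb") as f:
        f.write(payload)
    convert_file(src, dst, "utf-16-le", "utf-16-le")
    with open(dst, "rb") as f:
        round_trip = f.read()

print(round_trip == payload)  # True
```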

     
  • Logged In: YES
    user_id=670775

    That's ok, I didn't realize there was that option, and it
    does solve the problem.

    Sorry about that.

     
    • status: open --> open-invalid
     
    • status: open-invalid --> closed-invalid