#1 iconv problem with UTF-16 files


iconv has major problems with UTF-16 files on the
Windows platform.

Apparently the files are opened in text mode instead of
binary mode. With UTF-16, this means control characters
will be seen where they do not exist (the byte was only one
half of a 16-bit code unit), and the modifications the OS
makes to the input/output will generate invalid UTF-16.

Converting a valid UTF-16LE file to the same format:
iconv -f utf-16LE -t utf-16LE index.txt > ind.txt
shows that:
0D 00 - 0A 00 - 32 00 in the input changes to:
0D 00 - 0D 0A 00 - 32 00 in the output.

The erroneous 0D desynchronizes all the data after it,
so the rest of the output becomes invalid.
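
The effect can be reproduced without iconv. The following is a minimal Python sketch, assuming the only thing a text-mode output stream does is turn every 0A byte into 0D 0A; the helper names text_mode_write and is_valid_utf16le are hypothetical and are not part of iconv.

```python
def text_mode_write(data: bytes) -> bytes:
    """Mimic a text-mode output stream: every LF byte gains a CR in front.

    Assumption: this byte-level replacement is the only change made.
    """
    return data.replace(b"\n", b"\r\n")


def is_valid_utf16le(data: bytes) -> bool:
    """Return True if the bytes decode cleanly as UTF-16LE."""
    try:
        data.decode("utf-16-le")
        return True
    except UnicodeDecodeError:
        return False


# UTF-16LE bytes for CR, LF, "2", as in the report.
original = bytes.fromhex("0d00 0a00 3200")
written = text_mode_write(original)

print(written.hex(" "))           # 0d 00 0d 0a 00 32 00
print(is_valid_utf16le(written))  # False: 7 bytes is not a whole number of code units
```

The inserted byte shifts every following 16-bit unit by one byte, which is why everything after the first line break is damaged.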

When the input contains:
3A 00 - 1A 90 - 3A 00
the output stops after
3A 00
without any error message and with nothing saying that
invalid input was seen. iconv seems to treat 1A as an
end-of-file marker.
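
A minimal Python sketch of that truncation, assuming a text-mode read simply stops at the first 1A byte (Ctrl-Z) and ignoring any newline translation; text_mode_read is a hypothetical helper, not an iconv function.

```python
CTRL_Z = b"\x1a"  # byte treated as an end-of-file marker in this sketch


def text_mode_read(data: bytes) -> bytes:
    """Mimic a text-mode read: everything from the first 0x1A on is dropped."""
    end = data.find(CTRL_Z)
    return data if end == -1 else data[:end]


# 1A 90 is the single UTF-16LE code unit U+901A, not a control character.
raw = bytes.fromhex("3a00 1a90 3a00")
seen = text_mode_read(raw)

print(seen.hex(" "))  # 3a 00
```

The 1A byte here is only the low half of one code unit, so a byte-oriented end-of-file check cuts the stream off in the middle of valid data.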

When converting to UTF-8 (there should not be a problem
outputting UTF-8):
iconv -f utf-16LE -t utf-8 index.txt > ind_u8.txt
the input sequence:
0D 00 - 0A 00
changes to
0D - 0D - 0A
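
The same byte-level translation explains this case too. A minimal Python sketch, assuming the conversion itself is correct and only the text-mode output stream changes 0A into 0D 0A:

```python
# UTF-16LE bytes for CR LF, as in the report.
utf16 = bytes.fromhex("0d00 0a00")

# A correct conversion to UTF-8 yields exactly 0d 0a.
utf8 = utf16.decode("utf-16-le").encode("utf-8")

# Assumed text-mode behaviour on output: each LF byte gains a CR in front.
written = utf8.replace(b"\n", b"\r\n")

print(utf8.hex(" "))     # 0d 0a
print(written.hex(" "))  # 0d 0d 0a
```

The result is still valid UTF-8, but every line break carries an extra CR.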


  • Jean-Marc Desperrier

    Test UTF-16 file to demonstrate problem

  • Anonymous - 2003-05-24

    Use the --binary option to iconv.exe, e.g.:

    iconv --binary -f UTF-16LE -t UTF-16 index.txt > index_twin.txt

    (I have tested this with your index file, btw, using iconv-1.9.)

    Hopefully you will be able to confirm that this solves the
    problem, and you can close the bug? (Ordinarily iconv bugs
    should go to the iconv bug address, I think, but if you
    concur that this is not really a bug, just an annoyance of
    the MS-Windows platform, then we can forget about it?)
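
For comparison, here is a minimal Python sketch of a conversion that handles both streams as raw bytes, which is the behaviour the binary option is meant to restore. It is not iconv's implementation; convert_stream and its parameters are hypothetical names.

```python
import io


def convert_stream(src, dst, from_enc="utf-16-le", to_enc="utf-8"):
    """Re-encode everything in src into dst.

    Both objects must be binary streams, so no newline translation
    and no end-of-file byte handling takes place.
    """
    text = src.read().decode(from_enc)
    dst.write(text.encode(to_enc))


# With real files, open both in binary mode, for example:
#   with open("index.txt", "rb") as src, open("ind_u8.txt", "wb") as dst:
#       convert_stream(src, dst)
src = io.BytesIO(bytes.fromhex("0d00 0a00 3200"))
dst = io.BytesIO()
convert_stream(src, dst)

print(dst.getvalue().hex(" "))  # 0d 0a 32
```

Because the bytes are never passed through a text-mode layer, the CR LF pair comes out unchanged.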

  • Jean-Marc Desperrier

    That's ok, I didn't realize there was that option, and it
    does solve the problem.

    Sorry about that.

  • Jean-Marc Desperrier

    • status: open --> open-invalid
  • Jean-Marc Desperrier

    • status: open-invalid --> closed-invalid
