#211 Double BOM rendered

happens every time
open
nobody
5
2012-12-29
2012-12-29
Bruce
No

When a UTF-16 encoded field is rendered, a double BOM is inserted in the mp3 file. The first 8 bytes following the text encoding description byte are: FF FE FE FF 00 4F 00 63. Many readers get confused by the two conflicting BOMs.
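To check a rendered field for this symptom, a small helper along the following lines could be used. This is only a sketch: the names bomAt, hasDoubleBom and hasConflictingBoms are made up for illustration and are not part of id3lib, and the input is assumed to be the field bytes that follow the text encoding description byte.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Byte order indicated by a UTF-16 byte order mark (BOM).
enum class Bom { None, LittleEndian, BigEndian };

// Classify the two bytes starting at `pos`.
// Returns Bom::None when fewer than two bytes remain or no BOM is present.
Bom bomAt(const std::vector<std::uint8_t>& data, std::size_t pos) {
    if (pos + 2 > data.size()) return Bom::None;
    if (data[pos] == 0xFF && data[pos + 1] == 0xFE) return Bom::LittleEndian;
    if (data[pos] == 0xFE && data[pos + 1] == 0xFF) return Bom::BigEndian;
    return Bom::None;
}

// True when the text payload starts with two consecutive BOMs.
bool hasDoubleBom(const std::vector<std::uint8_t>& text) {
    return bomAt(text, 0) != Bom::None && bomAt(text, 2) != Bom::None;
}

// True when the two leading BOMs also disagree about the byte order,
// as in the byte sequence quoted in this report.
bool hasConflictingBoms(const std::vector<std::uint8_t>& text) {
    return hasDoubleBom(text) && bomAt(text, 0) != bomAt(text, 2);
}
```

Fed the bytes FF FE FE FF 00 4F 00 63, both checks report true. A field whose text legitimately begins with U+FEFF would also be flagged, so this is a diagnostic aid rather than a proof of the bug.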

This occurs with the id3v2 distribution (id3v2.x86_64 0:0.1.12-2.fc15) on Fedora 15, which uses id3lib.x86_64 0:3.8.3-25.fc15. It also occurs with id3v2 and id3lib built from source (id3v2-0.1.12-5.fc18.src.rpm and id3lib-3.8.3.tar.gz).

To duplicate:
Run id3v2 -r PRIV 21.mp3 (the file must contain UTF-16 fields with a BOM; files from amazon.com do).

My environment:
g++ (GCC) 4.6.3 20120306 (Red Hat 4.6.3-2)
ldd (GNU libc) 2.14.1
Linux 2.6.41.4-1.fc15.x86_64
id3lib version 3.8.3

Detail:
The method io::writeUnicodeText() (io_helpers.cpp) always prepends a LE BOM, implying the internal field data is LE. The convert method dami::convert() (utils.cpp) passes a text string to iconv_open() to set its output encoding. This string is determined by getFormat() from the field's ID3_TextEnc. When ID3_TextEnc is ID3TE_UTF16, getFormat() provides the string "UTF16" to iconv_open() as the output encoding. iconv() then returns a BOM, which is inserted in the internal data. This results in the double BOM.

For a consistent LE internal field data representation, getFormat() should return "UTF-16LE" for all UTF-16 enumerators of ID3_TextEnc. No BOM is returned by iconv() with an output encoding of "UTF-16LE", eliminating the double BOM. In addition, the byte ordering in io::writeUnicodeText() must be reversed.
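The difference between the two iconv output encodings can be observed outside id3lib with a sketch like the one below. It assumes glibc's iconv (where the input buffer parameter is char **) and uses the spelling "UTF-16" rather than "UTF16". The helpers convertFromUtf8 and startsWithBom are hypothetical names for this illustration, not id3lib functions.

```cpp
#include <iconv.h>

#include <cstddef>
#include <stdexcept>
#include <string>
#include <vector>

// Convert UTF-8 text to the encoding named by `toCode` and return the raw bytes.
std::vector<unsigned char> convertFromUtf8(const std::string& text, const char* toCode) {
    iconv_t cd = iconv_open(toCode, "UTF-8");
    if (cd == (iconv_t)-1) throw std::runtime_error("iconv_open failed");

    std::string input = text;                    // iconv() needs a non-const pointer
    std::vector<char> out(text.size() * 4 + 4);  // generous room, including a BOM

    char* inPtr = &input[0];
    std::size_t inLeft = input.size();
    char* outPtr = out.data();
    std::size_t outLeft = out.size();

    std::size_t rc = iconv(cd, &inPtr, &inLeft, &outPtr, &outLeft);
    iconv_close(cd);
    if (rc == (std::size_t)-1) throw std::runtime_error("iconv failed");

    // Keep only the bytes that iconv() actually wrote.
    return std::vector<unsigned char>(out.begin(), out.begin() + (out.size() - outLeft));
}

// True when the buffer begins with a UTF-16 BOM in either byte order.
bool startsWithBom(const std::vector<unsigned char>& bytes) {
    if (bytes.size() < 2) return false;
    return (bytes[0] == 0xFF && bytes[1] == 0xFE) ||
           (bytes[0] == 0xFE && bytes[1] == 0xFF);
}
```

Calling convertFromUtf8("Oc", "UTF-16LE") yields the four bytes 4F 00 63 00 with no BOM. With "UTF-16", glibc is expected to prepend a BOM, which startsWithBom() can confirm; other iconv implementations may behave differently.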

Discussion