#63 Non ISO-8859-1 charsets bugfix

Status: open

Owner: nobody

Labels: None

Priority: 8

Updated: 2004-08-24

Created: 2004-08-24

Creator: Vladimir Petrov

Private: No

That's right. At present id3lib cannot support character
encodings other than ISO-8859-1, because it is too buggy.

Here are the reasons:

In field.cpp, field_string_ascii.cpp and
field_string_unicode.cpp, statements like 'enc ==
ID3TE_ASCII' and 'enc == ID3TE_UNICODE' are used instead
of the recommended 'ID3TE_IS_SINGLE_BYTE_ENC(enc)' and
'ID3TE_IS_DOUBLE_BYTE_ENC(enc)'. This causes the unicode
methods in io_helpers.cpp to be applied every time,
except when the current field encoding is set to
ID3TE_ASCII. But UTF-8 is a single-byte encoding too. As
a result, every attempt to write a UTF-8 frame prepends
the BOM character to the field's text. It also seems
that the reading and writing methods can lose the last
byte when the UTF-8 string has an odd number of bytes.
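
To illustrate the dispatch problem described above, here is a minimal
sketch. It does not use id3lib itself: the enum TextEnc and every
function in it are hypothetical stand-ins, and the predicate
definitions are assumptions written in the spirit of
ID3TE_IS_SINGLE_BYTE_ENC and ID3TE_IS_DOUBLE_BYTE_ENC. The numeric
values follow the ID3v2.4 text-encoding byte.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Hypothetical stand-in for the library's encoding enum (not id3lib code).
enum TextEnc { ENC_ISO8859_1 = 0, ENC_UTF16 = 1, ENC_UTF16BE = 2, ENC_UTF8 = 3 };

// Class-based predicates: test the width of the code unit, not one value.
inline bool isSingleByteEnc(TextEnc enc) {
    return enc == ENC_ISO8859_1 || enc == ENC_UTF8;
}
inline bool isDoubleByteEnc(TextEnc enc) {
    return enc == ENC_UTF16 || enc == ENC_UTF16BE;
}

// Buggy dispatch: anything that is not ISO-8859-1 takes the 16-bit path,
// so UTF-8 text is treated as if it were UTF-16.
inline bool buggyUsesUnicodePath(TextEnc enc) { return enc != ENC_ISO8859_1; }

// Fixed dispatch: only the double-byte encodings take the 16-bit path.
inline bool fixedUsesUnicodePath(TextEnc enc) { return isDoubleByteEnc(enc); }

// Size in bytes of one code unit for the given encoding.
inline std::size_t codeUnitSize(TextEnc enc) {
    assert(isSingleByteEnc(enc) || isDoubleByteEnc(enc));
    return isDoubleByteEnc(enc) ? 2 : 1;
}

// A 16-bit path handles whole 2-byte units only, so a byte string of odd
// length loses its last byte when it is routed there by mistake.
inline std::size_t bytesSeenBy16BitPath(const std::string& data) {
    return data.size() / 2 * 2;
}
```

With the buggy check a UTF-8 field is sent down the 16-bit path, which
is where both the stray BOM and the lost trailing byte come from; with
the class-based check it stays on the single-byte path.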

In io::writeUnicodeText in io_helpers.cpp the BOM
character is always written in the host byte order. But
for external calls the unicode data is always written in
the order opposite to the host byte order. It seems that
id3lib internally handles double-byte data as MSB first,
independent of the host byte order. As a result, in some
cases an incorrect BOM was written. I rewrote the method
so that it assumes the data fed to it is always in
big-endian byte order or has a prepended BOM. Tag data
will always be in big-endian order too. Handling for the
UTF-16BE field encoding has also been added, so that
BOMs are not allowed at the beginning of the data.
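
The behaviour described above can be sketched roughly as follows. This
is not the code from the patch and not id3lib's io::writeUnicodeText;
the names Utf16Kind and serializeUtf16 are hypothetical, and the sketch
assumes the input is a raw byte string that is either big-endian or
starts with a BOM announcing its byte order.

```cpp
#include <cstddef>
#include <string>

// Hypothetical names: UTF-16 written with a BOM, or UTF-16BE without one.
enum Utf16Kind { UTF16_WITH_BOM, UTF16BE_NO_BOM };

// Serialize UTF-16 text for a tag. The output is always big-endian,
// whatever the host byte order is, because it is built byte by byte.
inline std::string serializeUtf16(const std::string& in, Utf16Kind kind) {
    std::size_t pos = 0;
    bool littleEndian = false;  // input is big-endian unless a BOM says otherwise

    // Consume a leading BOM, if any, and remember the order it announces.
    if (in.size() >= 2) {
        const unsigned char b0 = static_cast<unsigned char>(in[0]);
        const unsigned char b1 = static_cast<unsigned char>(in[1]);
        if (b0 == 0xFE && b1 == 0xFF) {
            pos = 2;
        } else if (b0 == 0xFF && b1 == 0xFE) {
            pos = 2;
            littleEndian = true;
        }
    }

    std::string out;
    // Plain UTF-16 gets exactly one big-endian BOM; UTF-16BE gets none.
    if (kind == UTF16_WITH_BOM) {
        out += static_cast<char>(0xFE);
        out += static_cast<char>(0xFF);
    }

    // Copy whole 2-byte units, swapping when the input was little-endian.
    // A dangling odd byte at the end is not a complete unit and is skipped.
    for (; pos + 1 < in.size(); pos += 2) {
        if (littleEndian) {
            out += in[pos + 1];
            out += in[pos];
        } else {
            out += in[pos];
            out += in[pos + 1];
        }
    }
    return out;
}
```

Since the BOM and the text are emitted by the same byte-wise code, they
cannot disagree about byte order, which is the mismatch reported for
the original method.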

Discussion

Vladimir Petrov - 2004-08-24

charsets bugfix

id3lib-3.8.3-write_encoded_utf8_bugfix-0.1.patch.gz

Vladimir Petrov - 2004-08-24

priority: 5 --> 8