
#63 Non ISO-8859-1 charsets bugfix

Status: open
Owner: nobody
Labels: None
Priority: 8
Updated: 2004-08-24
Created: 2004-08-24
Private: No

That's right. At present id3lib cannot support character
encodings other than ISO-8859-1, because the relevant
code is too buggy.

Here are the reasons:

In field.cpp, field_string_ascii.cpp and
field_string_unicode.cpp, comparisons like 'enc ==
ID3TE_ASCII' and 'enc == ID3TE_UNICODE' are used instead
of the recommended 'ID3TE_IS_SINGLE_BYTE_ENC(enc)' and
'ID3TE_IS_DOUBLE_BYTE_ENC(enc)'. As a result, the
unicode methods in io_helpers.cpp are applied in every
case except when the current field encoding is set to
ID3TE_ASCII. But UTF-8 is a single-byte encoding too.
So every time a UTF-8 frame is written, a BOM character
is prepended to the field's text. :o It also seems that
the reading and writing methods can lose the last byte
when the UTF-8 string has an odd number of bytes.
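
To make the difference between the two kinds of check concrete,
here is a minimal sketch. The enum and helper names below are
stand-ins invented for illustration; they are not copied from
id3lib's headers, and the numeric values are assumptions. The last
helper only illustrates the suspected reason for the lost byte
(a single-byte string handled in two-byte units).

```cpp
#include <cstddef>

// Stand-in encoding enum for illustration only (hypothetical names and
// values; not taken from id3lib's headers).
enum TextEnc { ENC_ISO8859_1 = 0, ENC_UTF16 = 1, ENC_UTF16BE = 2, ENC_UTF8 = 3 };

// Stand-ins for what the single-byte / double-byte macros are meant to test.
inline bool isSingleByteEnc(TextEnc e) { return e == ENC_ISO8859_1 || e == ENC_UTF8; }
inline bool isDoubleByteEnc(TextEnc e) { return e == ENC_UTF16 || e == ENC_UTF16BE; }

// Pattern described in the report: everything that is not ISO-8859-1
// falls through to the unicode (two-byte) code path, including UTF-8.
inline bool usesUnicodePathBuggy(TextEnc e) { return e != ENC_ISO8859_1; }

// Pattern the report recommends: only real double-byte encodings
// take the unicode code path.
inline bool usesUnicodePathFixed(TextEnc e) { return isDoubleByteEnc(e); }

// If a byte string is processed in two-byte units, an odd trailing
// byte has no partner and is dropped.
inline std::size_t bytesKeptAsTwoByteUnits(std::size_t byteCount)
{
    return (byteCount / 2) * 2;
}
```

With these stand-ins, usesUnicodePathBuggy(ENC_UTF8) is true while
usesUnicodePathFixed(ENC_UTF8) is false, and a 5-byte UTF-8 string
keeps only 4 bytes when handled in two-byte units.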

In io::writeUnicodeText in io_helpers.cpp, the BOM
character is always written in host byte order. But
when the method is called externally, the unicode data
is always written in the byte order opposite to the
host's. It seems that id3lib internally handles
double-byte data as MSB first, independent of the host
byte order. As a result, an incorrect BOM was written
in some cases. I rewrote the method so that it assumes
the data fed to it is always in big-endian byte order
or has a BOM prepended. Tag data will always be in
big-endian too. I also added handling for the UTF-16BE
field encoding so that no BOM is allowed at the
beginning of the data.
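
The behaviour described above can be sketched roughly as follows.
This is not the actual patch and not id3lib's API: toTagBytes and
its bomAllowed parameter are hypothetical names used only for
illustration. The sketch assumes the input is either big-endian
or starts with a BOM, always emits big-endian data, and writes a
BOM only when the field encoding permits one (that is, not for
UTF-16BE).

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical helper, for illustration only.
// in:         raw UTF-16 bytes, either big-endian or starting with a BOM
// bomAllowed: true for a UTF-16 field, false for a UTF-16BE field
std::vector<std::uint8_t> toTagBytes(const std::vector<std::uint8_t>& in,
                                     bool bomAllowed)
{
    std::size_t pos = 0;
    bool swapBytes = false;

    // Consume a leading BOM if present; FF FE means the input is little-endian.
    if (in.size() >= 2) {
        if (in[0] == 0xFE && in[1] == 0xFF) {
            pos = 2;
        } else if (in[0] == 0xFF && in[1] == 0xFE) {
            pos = 2;
            swapBytes = true;
        }
    }

    std::vector<std::uint8_t> out;

    // The output is always big-endian, so the only valid BOM is FE FF.
    if (bomAllowed) {
        out.push_back(0xFE);
        out.push_back(0xFF);
    }

    // Copy complete two-byte units, most significant byte first.
    // A trailing odd byte is ignored.
    for (; pos + 1 < in.size(); pos += 2) {
        out.push_back(swapBytes ? in[pos + 1] : in[pos]);
        out.push_back(swapBytes ? in[pos] : in[pos + 1]);
    }
    return out;
}
```

For example, the little-endian input FF FE 41 00 with a BOM allowed
comes out as FE FF 00 41, and the same text for a UTF-16BE field
comes out as 00 41 with no BOM.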

Discussion

  • Vladimir Petrov

    Vladimir Petrov - 2004-08-24
    • priority: 5 --> 8
     
