MediaTomb / Bugs / #78 extraction of UTF-8 encoded aux-data leads to truncation

Status: open

Owner: Jin

Labels: Media Import/Metadata Handling (32)

Priority: 5

Updated: 2010-12-05

Created: 2010-12-05

Creator: airflow

Private: No

Auxiliary data is truncated if it contains special characters (although the encoding and the corresponding setting in config.xml are right).

I found that the truncation depends on how many special characters are contained in the string.

E.g. "Björk" --> "Björ" (1 character truncated)
but "José Gonzáles" --> "José Gonzál" (2 characters truncated)

UTF-8 uses more than one byte for a "special" character such as an umlaut (two bytes for "ö"). To me it looks like a bug in the code mixes up the number of human-readable characters in a string (5 for "Björk") with the number of bytes actually needed to store it in UTF-8 (6 in that example).
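To make the character/byte mismatch concrete, here is a minimal sketch in plain C++. It is not MediaTomb code: the helper names utf8_codepoints and truncate_bytes_to_char_count are hypothetical, and std::string merely stands in for whatever string type the real code uses. It assumes valid UTF-8 input.

```cpp
#include <cstddef>
#include <string>

// Hypothetical helper: counts Unicode code points in a UTF-8 string by
// skipping continuation bytes (bit pattern 10xxxxxx).
// Assumes the input is valid UTF-8.
std::size_t utf8_codepoints(const std::string& s) {
    std::size_t n = 0;
    for (unsigned char c : s) {
        if ((c & 0xC0) != 0x80) {
            ++n;  // lead byte or plain ASCII byte starts a new character
        }
    }
    return n;
}

// Hypothetical helper reproducing the suspected mix-up: keep only as many
// *bytes* as the string has *characters*.
std::string truncate_bytes_to_char_count(const std::string& s) {
    return s.substr(0, utf8_codepoints(s));
}
```

With "Björk" encoded as UTF-8 (6 bytes, 5 characters), the second helper keeps 5 bytes and so drops the final "k". With "José Gonzáles" (15 bytes, 13 characters) it drops the final "es". Both match the truncation reported above.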

For the record: I used the newest SVN-version in combination with taglib.

Discussion

gamicoulas - 2011-08-24

I can confirm that airflow is right about that. I think the error is in taglib_handler.cc, more precisely in lines 228-230. Somewhere there the size of the string is getting mixed up. I'm not very competent as a developer, but maybe someone can shed some light or apply the fix...

Igor Testen - 2011-12-19

I can also confirm this issue. As gamicoulas points out, the problem is in taglib_handler.cc, on line 230. Specifically:

String value(frameContents.toCString(true), frameContents.size());

I now use the following workaround:

String value(frameContents.toCString(true));

Note: It seems to work for me, but I'm not sure whether it breaks something else, so use at your own risk. @Devs: maybe a comment on this?
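The difference between the two constructor calls can be sketched as follows. This is only an illustration under stated assumptions: std::string stands in for MediaTomb's own String class (whose exact behavior is not shown in this ticket), and the function names copy_with_length and copy_until_nul are hypothetical.

```cpp
#include <cstddef>
#include <string>

// Pattern of the original line: copy exactly `char_count` bytes from a
// UTF-8 buffer. If `char_count` is a character count rather than a byte
// count, multi-byte characters make the copy come up short.
// The caller must ensure the buffer holds at least `char_count` bytes.
std::string copy_with_length(const char* utf8, std::size_t char_count) {
    return std::string(utf8, char_count);
}

// Pattern of the workaround: copy everything up to the terminating NUL,
// so the byte length is taken from the buffer itself.
std::string copy_until_nul(const char* utf8) {
    return std::string(utf8);
}
```

For "Björk" (5 characters, 6 UTF-8 bytes), copy_with_length(buf, 5) yields "Björ", while copy_until_nul(buf) yields the full string. One possible caveat of the workaround: a NUL-terminated copy stops at the first NUL byte, so a frame whose content contains an embedded NUL would be cut there. Whether that can happen with the data handled here is not established in this ticket.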
