#78 extraction of UTF-8 encoded aux-data leads to truncation

open
Jin
5
2010-12-05
2010-12-05
airflow
No

Auxiliary data is truncated if it contains special characters, even though the encoding and the corresponding setting in config.xml are correct.

I found that the amount of truncation depends on how many special characters the string contains: one character is lost per special character.

E.g. "Björk" --> "Björ" (1 character truncated)
but "José Gonzáles" --> "José Gonzál" (2 characters truncated)

UTF-8 encodes a "special" character such as an umlaut with an additional byte. It looks to me like a bug in the code mixes up the number of human-readable characters in a string (5 for "Björk") with the number of bytes needed to store it in UTF-8 (6 in that example).
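A minimal sketch in plain C++ (no TagLib involved) that checks this character-versus-byte hypothesis. The helper name utf8CharCount and the constants kBjork and kJose are made up for illustration; the strings are spelled with explicit UTF-8 byte escapes so the result does not depend on the source file's encoding, and the code assumes valid UTF-8 input.

```cpp
#include <cstddef>
#include <string>

// Hypothetical helper: counts Unicode code points in a UTF-8 string by
// skipping continuation bytes (bit pattern 10xxxxxx). Assumes valid UTF-8.
std::size_t utf8CharCount(const std::string& s) {
    std::size_t count = 0;
    for (unsigned char c : s) {
        if ((c & 0xC0) != 0x80) {
            ++count;  // an ASCII byte or a lead byte starts a new character
        }
    }
    return count;
}

// The two examples from the report, as raw UTF-8 bytes.
const std::string kBjork = "Bj\xC3\xB6rk";                // "Björk": 5 characters, 6 bytes
const std::string kJose = "Jos\xC3\xA9 Gonz\xC3\xA1les";  // "José Gonzáles": 13 characters, 15 bytes
```

Keeping only the first utf8CharCount(s) bytes of each string, e.g. kBjork.substr(0, utf8CharCount(kBjork)), gives exactly the truncated values reported above: "Björ" and "José Gonzál".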

For the record: I used the newest SVN version together with TagLib.

Discussion

  • gamicoulas
    2011-08-24

    I can confirm airflow's report. I think the error is in taglib_handler.cc, more precisely in lines 228-230; somewhere there the size of the string gets mixed up. I'm not a very experienced developer, but maybe someone can shed some light on it or apply a fix...

  • Igor Testen
    2011-12-19

    I can also confirm this issue. As gamicoulas points out, the problem is in taglib_handler.cc, on line 230. Specifically:

    String value(frameContents.toCString(true), frameContents.size());  // length = character count, but the buffer holds UTF-8 bytes

    I now use the following workaround:

    String value(frameContents.toCString(true));  // length is determined from the NUL-terminated UTF-8 buffer

    Note: It seems to work for me, but I'm not sure whether it breaks something else, so use it at your own risk. @Devs: could you comment on this?
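Why the workaround helps can be sketched in plain C++ without TagLib or the project sources. This is a minimal illustration under two assumptions: that frameContents.size() returns a character count, and that toCString(true) returns a NUL-terminated UTF-8 buffer. The names makeValue, valueBuggy and valueFixed are hypothetical; makeValue only mimics the shape of a String(const char*, length) constructor.

```cpp
#include <cstddef>
#include <cstring>
#include <string>

// Hypothetical stand-in for a string constructor that takes a buffer plus
// an explicit length, like the two-argument call on line 230.
std::string makeValue(const char* buf, std::size_t len) {
    return std::string(buf, len);
}

// Buggy pattern: utf8 holds UTF-8 bytes, but the length passed is a
// character count, so one byte is dropped per two-byte character.
std::string valueBuggy(const char* utf8, std::size_t charCount) {
    return makeValue(utf8, charCount);
}

// Workaround pattern: the length is derived from the NUL-terminated
// UTF-8 buffer itself, so it is a byte count.
std::string valueFixed(const char* utf8) {
    return makeValue(utf8, std::strlen(utf8));
}
```

For example, valueBuggy("Bj\xC3\xB6rk", 5) yields "Björ", while valueFixed("Bj\xC3\xB6rk") keeps all 6 bytes of "Björk". Relying on strlen stops at the first NUL byte, which should be harmless for text tags; a variant that avoids it would take both the data and its byte length from a single std::string, such as the one returned by TagLib's String::to8Bit(true).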