From: Joel S. <jsf...@jk...> - 2012-11-16 03:39:41
|
On 11/13/2012 02:10 AM, Javier Kohen wrote: > > On Tue, Nov 13, 2012 at 2:44 AM, Joel Smith <jsf...@jk... > <mailto:jsf...@jk...>> wrote: > > On 11/11/2012 03:09 AM, Daniele Forsi wrote: > > I don't know that code but in this commit > > > https://gitorious.org/~joelsmith/gtkpod/joelsmiths-sandbox/commit/6157a8c10646ff25c98651755c6dbbafa63170ba > <https://gitorious.org/%7Ejoelsmith/gtkpod/joelsmiths-sandbox/commit/6157a8c10646ff25c98651755c6dbbafa63170ba> > > > > you are using g_strdup() while existing code uses charset_to_utf8() > > > > exisitng code: > > 182 // MP4 Title > > 183 value = find_atom_value(TITLE); > > 184 if (value) { > > 185 track->title = charset_to_utf8(value); > > 186 free(value); > > 187 } > > > > new code: > > 383 // MP4 Sort Title > > 384 value = find_atom_value(SORT_TITLE); > > 385 if (value) { > > 386 track->sort_title = g_strdup(value); > > 387 free(value); > > 388 } > > > > if that's ok can you please explain why with a comment in the code? > > I think the existing code is wrong. MP4 tags are already stored as > UTF-8 (or at least they're supposed to be). And AP doesn't do any > further conversions before returning the strings to us, so we > shouldn't > be trying to convert an already UTF-8 string from local charset to > UTF-8. If run with a non-UTF-8 charset locale, that would corrupt the > data. For example, with ISO-8859-1, all but the lower 128 char values > would get corrupted. Funny that I didn't even notice the other ones > doing it wrong. Thanks for reviewing the patch so we could catch the > other bug. I'll commit a fix. I'm no expert on i18n, charsets, etc, so > maybe we'll end up reverting my change right away, but that's how > I see it. > > > I'm no expert in i18n and Unicode either, but I have some experience. > I can review the change if you'd like. > The more eyes, the the better. Here's the patch: http://gtkpod.git.sourceforge.net/git/gitweb.cgi?p=gtkpod/gtkpod;a=commitdiff;h=0fd91b5fe7e93e5b2409b7b5a36e9dd052042237 My reasoning is that find_atom_value() calls APar_ExtractDataAtom() which for text types, simply seeks to an offset in the MP4 file and reads the atom value data into the return buffer, without doing any kind of charset conversion. Since MP4 atom text values are supposed to be stored as UTF-8, we should just copy these UTF-8 strings directly, rather than calling charset_to_utf8() on them. Regards, Joel |