From: Javier K. <jk...@gm...> - 2012-11-16 06:26:28
|
On Fri, Nov 16, 2012 at 4:39 AM, Joel Smith <jsf...@jk...>wrote: > On 11/13/2012 02:10 AM, Javier Kohen wrote: > > > On Tue, Nov 13, 2012 at 2:44 AM, Joel Smith <jsf...@jk...>wrote: > >> On 11/11/2012 03:09 AM, Daniele Forsi wrote: >> > I don't know that code but in this commit >> > >> https://gitorious.org/~joelsmith/gtkpod/joelsmiths-sandbox/commit/6157a8c10646ff25c98651755c6dbbafa63170ba >> > >> > you are using g_strdup() while existing code uses charset_to_utf8() >> > >> > exisitng code: >> > 182 // MP4 Title >> > 183 value = find_atom_value(TITLE); >> > 184 if (value) { >> > 185 track->title = charset_to_utf8(value); >> > 186 free(value); >> > 187 } >> > >> > new code: >> > 383 // MP4 Sort Title >> > 384 value = find_atom_value(SORT_TITLE); >> > 385 if (value) { >> > 386 track->sort_title = g_strdup(value); >> > 387 free(value); >> > 388 } >> > >> > if that's ok can you please explain why with a comment in the code? >> >> I think the existing code is wrong. MP4 tags are already stored as >> UTF-8 (or at least they're supposed to be). And AP doesn't do any >> further conversions before returning the strings to us, so we shouldn't >> be trying to convert an already UTF-8 string from local charset to >> UTF-8. If run with a non-UTF-8 charset locale, that would corrupt the >> data. For example, with ISO-8859-1, all but the lower 128 char values >> would get corrupted. Funny that I didn't even notice the other ones >> doing it wrong. Thanks for reviewing the patch so we could catch the >> other bug. I'll commit a fix. I'm no expert on i18n, charsets, etc, so >> maybe we'll end up reverting my change right away, but that's how I see >> it. >> > > I'm no expert in i18n and Unicode either, but I have some experience. I > can review the change if you'd like. > > > The more eyes, the the better. Here's the patch: > > http://gtkpod.git.sourceforge.net/git/gitweb.cgi?p=gtkpod/gtkpod;a=commitdiff;h=0fd91b5fe7e93e5b2409b7b5a36e9dd052042237 > > My reasoning is that find_atom_value() calls APar_ExtractDataAtom() which > for text types, simply seeks to an offset in the MP4 file and reads the > atom value data into the return buffer, without doing any kind of charset > conversion. Since MP4 atom text values are supposed to be stored as UTF-8, > we should just copy these UTF-8 strings directly, rather than calling > charset_to_utf8() on them. > Sounds good to me. Here's what the official site<http://atomicparsley.sourceforge.net/>says on this topic: > With the exception of artwork, all iTunes-style tags support only 1 piece > of metadata - you can't have for example 3 title tags. 3gp assets and > copyright notices support multiple tags, but they must differ in the > language setting for that tag. You can have for example 3 title tags, but 1 > in english, 1 in spanish, 1 in undefined. *All iTunes-style text metadata > is always in UTF-8*; 3gp assets & copyright notices can be in set in > UTF-8 or UTF-16. *All strings are converted internally to UTF-8*, > converting as necessary. On *nix platforms, input is in UTF-8; the native > Windows port supports full UTF-16 input. |