Re: [Gtkpod-devel] Patches: mp4 chapters, sorting/compilation/media type tags

SourceForge Headquarters 225 Broadway Suite 1600 San Diego, CA 92101 +1 (858) 454-5900

On Fri, Nov 16, 2012 at 4:39 AM, Joel Smith <jsf...@jk...>wrote:

>  On 11/13/2012 02:10 AM, Javier Kohen wrote:
>
>
> On Tue, Nov 13, 2012 at 2:44 AM, Joel Smith <jsf...@jk...>wrote:
>
>> On 11/11/2012 03:09 AM, Daniele Forsi wrote:
>> > I don't know that code but in this commit
>> >
>> https://gitorious.org/~joelsmith/gtkpod/joelsmiths-sandbox/commit/6157a8c10646ff25c98651755c6dbbafa63170ba
>> >
>> > you are using g_strdup() while existing code uses charset_to_utf8()
>> >
>> > exisitng code:
>> > 182        // MP4 Title
>> > 183           value = find_atom_value(TITLE);
>> > 184           if (value) {
>> > 185               track->title = charset_to_utf8(value);
>> > 186               free(value);
>> > 187           }
>> >
>> > new code:
>> > 383        // MP4 Sort Title
>> > 384           value = find_atom_value(SORT_TITLE);
>> > 385           if (value) {
>> > 386               track->sort_title = g_strdup(value);
>> > 387               free(value);
>> > 388           }
>> >
>> > if that's ok can you please explain why with a comment in the code?
>>
>>  I think the existing code is wrong.  MP4 tags are already stored as
>> UTF-8 (or at least they're supposed to be).  And AP doesn't do any
>> further conversions before returning the strings to us, so we shouldn't
>> be trying to convert an already UTF-8 string from local charset to
>> UTF-8.  If run with a non-UTF-8 charset locale, that would corrupt the
>> data.  For example, with ISO-8859-1, all but the lower 128 char values
>> would get corrupted.  Funny that I didn't even notice the other ones
>> doing it wrong. Thanks for reviewing the patch so we could catch the
>> other bug. I'll commit a fix. I'm no expert on i18n, charsets, etc, so
>> maybe we'll end up reverting my change right away, but that's how I see
>> it.
>>
>
>  I'm no expert in i18n and Unicode either, but I have some experience. I
> can review the change if you'd like.
>
>
> The more eyes, the the better.  Here's the patch:
>
> http://gtkpod.git.sourceforge.net/git/gitweb.cgi?p=gtkpod/gtkpod;a=commitdiff;h=0fd91b5fe7e93e5b2409b7b5a36e9dd052042237
>
> My reasoning is that find_atom_value() calls APar_ExtractDataAtom() which
> for text types, simply seeks to an offset in the MP4 file and reads the
> atom value data into the return buffer, without doing any kind of charset
> conversion.  Since MP4 atom text values are supposed to be stored as UTF-8,
> we should just copy these UTF-8 strings directly, rather than calling
> charset_to_utf8() on them.
>

Sounds good to me. Here's what the official
site<http://atomicparsley.sourceforge.net/>says on this topic:

> With the exception of artwork, all iTunes-style tags support only 1 piece
> of metadata - you can't have for example 3 title tags. 3gp assets and
> copyright notices support multiple tags, but they must differ in the
> language setting for that tag. You can have for example 3 title tags, but 1
> in english, 1 in spanish, 1 in undefined. *All iTunes-style text metadata
> is always in UTF-8*; 3gp assets & copyright notices can be in set in
> UTF-8 or UTF-16. *All strings are converted internally to UTF-8*,
> converting as necessary. On *nix platforms, input is in UTF-8; the native
> Windows port supports full UTF-16 input.