Re: [Gtkpod-devel] Patches: mp4 chapters, sorting/compilation/media type tags

SourceForge Headquarters 1320 Columbia Street Suite 310 San Diego, CA 92101 +1 (858) 422-6466

On 11/13/2012 02:10 AM, Javier Kohen wrote:
>
> On Tue, Nov 13, 2012 at 2:44 AM, Joel Smith <jsf...@jk... 
> <mailto:jsf...@jk...>> wrote:
>
>     On 11/11/2012 03:09 AM, Daniele Forsi wrote:
>     > I don't know that code but in this commit
>     >
>     https://gitorious.org/~joelsmith/gtkpod/joelsmiths-sandbox/commit/6157a8c10646ff25c98651755c6dbbafa63170ba
>     <https://gitorious.org/%7Ejoelsmith/gtkpod/joelsmiths-sandbox/commit/6157a8c10646ff25c98651755c6dbbafa63170ba>
>     >
>     > you are using g_strdup() while existing code uses charset_to_utf8()
>     >
>     > exisitng code:
>     > 182        // MP4 Title
>     > 183           value = find_atom_value(TITLE);
>     > 184           if (value) {
>     > 185               track->title = charset_to_utf8(value);
>     > 186               free(value);
>     > 187           }
>     >
>     > new code:
>     > 383        // MP4 Sort Title
>     > 384           value = find_atom_value(SORT_TITLE);
>     > 385           if (value) {
>     > 386               track->sort_title = g_strdup(value);
>     > 387               free(value);
>     > 388           }
>     >
>     > if that's ok can you please explain why with a comment in the code?
>
>     I think the existing code is wrong.  MP4 tags are already stored as
>     UTF-8 (or at least they're supposed to be).  And AP doesn't do any
>     further conversions before returning the strings to us, so we
>     shouldn't
>     be trying to convert an already UTF-8 string from local charset to
>     UTF-8.  If run with a non-UTF-8 charset locale, that would corrupt the
>     data.  For example, with ISO-8859-1, all but the lower 128 char values
>     would get corrupted.  Funny that I didn't even notice the other ones
>     doing it wrong. Thanks for reviewing the patch so we could catch the
>     other bug. I'll commit a fix. I'm no expert on i18n, charsets, etc, so
>     maybe we'll end up reverting my change right away, but that's how
>     I see it.
>
>
> I'm no expert in i18n and Unicode either, but I have some experience. 
> I can review the change if you'd like.
>

The more eyes, the the better.  Here's the patch:
http://gtkpod.git.sourceforge.net/git/gitweb.cgi?p=gtkpod/gtkpod;a=commitdiff;h=0fd91b5fe7e93e5b2409b7b5a36e9dd052042237

My reasoning is that find_atom_value() calls APar_ExtractDataAtom() 
which for text types, simply seeks to an offset in the MP4 file and 
reads the atom value data into the return buffer, without doing any kind 
of charset conversion.  Since MP4 atom text values are supposed to be 
stored as UTF-8, we should just copy these UTF-8 strings directly, 
rather than calling charset_to_utf8() on them.
Regards,
Joel