
#99 support for ID3v2.4 (UTF-8)

None
closed-accepted
None
3
2023-11-21
2023-03-21
kris
No

It would be really great if LAME supported writing MP3 metadata as ID3v2.4 tags (with the 0x03 encoding byte for UTF-8). Adoption of ID3v2.4 is growing (it came out over 20 years ago!) and, in general, UTF-8 is definitely the way forward.

Discussion

  • kris

    kris - 2023-03-21

    Apologies, meant to post this as a feature request.
    Did so here: https://sourceforge.net/p/lame/feature-requests/84/

     
  • kris

    kris - 2023-10-12

    I have taken a stab at adding support for UTF-8 ID3v2.4 tag - see the ID3v2.4_UTF8_support.patch file attached. Some details below.

    Background as I understand it
    ID3v2.4 is the first ID3 tag version to explicitly support UTF-8 encoding; this is signalled by setting the encoding byte to 0x03. An encoding byte of 0x00 indicates ISO-8859-1 (latin1) encoding.

    For context, there are 3 versions of ID3 metadata tags at play here:

    ID3v1 - doesn't explicitly support UTF-8. It seems you can use any encoding you like, but there is no way to signal to a decoder how to interpret it. The assumed/spec-compliant encoding is ISO-8859-1 (latin1).

    ID3v2.2/ID3v2.3 - added Unicode support (16-bit unicode 2.0, UCS-2) in addition to ISO-8859-1 (latin1). The encoding is specified with the encoding byte:

    0x00 for ISO-8859-1 (latin1)
    0x01 UTF-16. Unicode strings must begin with the Unicode BOM ($FF FE or $FE FF) to identify the byte order.

    ID3v2.4 - added official UTF-8 support. The encoding byte now can have the following values:

    0x00 for ISO-8859-1 (latin1)
    0x01 UTF-16. Unicode strings must begin with the Unicode BOM ($FF FE or $FE FF) to identify the byte order.
    0x02 UTF-16BE - Unicode without BOM.
    0x03 UTF-8
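
    To make the encoding byte concrete, here is a minimal sketch in C of how the body of a text frame starts with that byte, followed by the text itself. The enum and both helper names (encoding_allowed, build_text_payload) are hypothetical and are not part of LAME or of the patch; the sketch only covers the 8-bit encodings (0x00 and 0x03), since it measures the text with strlen.

```c
#include <stddef.h>
#include <string.h>

/* Text-encoding byte values as listed above. */
enum id3_text_encoding {
    ID3_ENC_LATIN1  = 0x00, /* ISO-8859-1 */
    ID3_ENC_UTF16   = 0x01, /* UTF-16 with BOM */
    ID3_ENC_UTF16BE = 0x02, /* UTF-16BE without BOM (ID3v2.4 only) */
    ID3_ENC_UTF8    = 0x03  /* UTF-8 (ID3v2.4 only) */
};

/* Hypothetical helper: returns 1 if the encoding byte is valid for the
   given ID3v2 minor version (2, 3 or 4), otherwise 0. */
static int encoding_allowed(unsigned char enc, int minor_version)
{
    if (minor_version >= 4)
        return enc <= ID3_ENC_UTF8;
    return enc <= ID3_ENC_UTF16;
}

/* Hypothetical helper: writes the encoding byte followed by the text
   bytes into out. Only meaningful for 8-bit encodings (0x00, 0x03).
   Returns the number of bytes written, or 0 if out is too small. */
static size_t build_text_payload(unsigned char enc, const char *text,
                                 unsigned char *out, size_t out_size)
{
    size_t len = strlen(text);

    if (out_size < len + 1)
        return 0;
    out[0] = enc;
    memcpy(out + 1, text, len);
    return len + 1;
}
```

    For example, calling build_text_payload(ID3_ENC_UTF8, "Zo\xC3\xAB", buf, sizeof buf) with a large enough buffer yields the five bytes 03 5A 6F C3 AB: the encoding byte, then "Zoë" in UTF-8.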

    Details of the patch

    The patch adds support for writing UTF-8-encoded ID3v2.4 by adding two new externally-exposed functions, id3tag_add_v2_4_UTF8 and id3tag_v2_4_UTF8_only.

    Specifics:

    1) Added a new flag, V2_4_UTF8_FLAG, in libmp3lame/id3tag.h to indicate that when we are writing ID3v2 tags, they should be written as ID3v2.4 with UTF-8 encoding (encoding byte = 3). This fits in with existing flags like ADD_V2_FLAG and V2_ONLY_FLAG.

    2) Added two new externally-exposed functions, id3tag_add_v2_4_UTF8 and id3tag_v2_4_UTF8_only, that turn on the V2_4_UTF8_FLAG flag. Using these functions ensures that ID3v2 tags will be written as UTF-8-encoded ID3v2.4 tags. The id3tag_add_v2_4_UTF8 function means writing both ID3v1 and ID3v2.4 tags, while id3tag_v2_4_UTF8_only means writing only ID3v2.4 tags. This works similarly to the existing id3tag_add_v2 and id3tag_v2_only functions. Changes for adding the two new functions and exposing them are in the following files:
    include/lame.def
    include/lame.h
    include/libmp3lame.sym
    libmp3lame/id3tag.c (implementation of the functions)

    3) Several minor changes in libmp3lame/id3tag.c to implement id3tag_add_v2_4_UTF8 and id3tag_v2_4_UTF8_only in a way that uses existing code and tries to avoid duplicating logic as much as possible.
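
    The flag mechanism described in the three points above can be sketched roughly as follows. This is an illustration only, not the patch itself: the bit positions, the tag_spec_sketch struct and all sketch_* function names are assumptions made for the example, and the real definitions in libmp3lame/id3tag.h and id3tag.c may differ.

```c
/* Illustrative bit positions only; the real values are defined in
   libmp3lame/id3tag.h and may differ. */
#define ADD_V2_FLAG    (1U << 1)
#define V2_ONLY_FLAG   (1U << 3)
#define V2_4_UTF8_FLAG (1U << 6) /* hypothetical bit for the new flag */

/* Stand-in for the tag state that LAME keeps internally. */
struct tag_spec_sketch {
    unsigned int flags;
};

/* Sketch of id3tag_add_v2_4_UTF8: ID3v1 plus UTF-8 ID3v2.4. */
static void sketch_add_v2_4_utf8(struct tag_spec_sketch *t)
{
    t->flags |= ADD_V2_FLAG | V2_4_UTF8_FLAG;
}

/* Sketch of id3tag_v2_4_UTF8_only: UTF-8 ID3v2.4 only. */
static void sketch_v2_4_utf8_only(struct tag_spec_sketch *t)
{
    t->flags |= V2_ONLY_FLAG | V2_4_UTF8_FLAG;
}

/* ID3v2 major version the writer would emit: 4 with the new flag,
   otherwise 3 as before. */
static unsigned char sketch_major_version(const struct tag_spec_sketch *t)
{
    return (t->flags & V2_4_UTF8_FLAG) ? 4 : 3;
}

/* Encoding byte for text supplied as 8-bit strings: 0x03 (UTF-8)
   with the new flag, otherwise 0x00 (ISO-8859-1). */
static unsigned char sketch_encoding_byte(const struct tag_spec_sketch *t)
{
    return (t->flags & V2_4_UTF8_FLAG) ? 0x03 : 0x00;
}
```

    The point of keeping this as one extra bit is that code paths which never set the flag see the same version and encoding byte as before.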

    Applying and using the patch

    For me the patch seems to work correctly when applied to LAME beta-2 v3.101.2 (taken from this snapshot https://sourceforge.net/p/lame/svn/6505/tree/trunk/lame/). It can be applied as follows:

    patch -u -b -d /path/to/where/lame/source/is < ID3v2.4_UTF8_support.patch
    

    After the LAME library is rebuilt with the patch, a single call to one of the new functions (id3tag_add_v2_4_UTF8 or id3tag_v2_4_UTF8_only) should set the LAME flags appropriately, e.g.:

        /* force writing ID3v2.4 with UTF-8 encoding */
        id3tag_v2_4_UTF8_only(lame_flags);
    

    Then just use LAME functions such as id3tag_set_title or id3tag_set_artist as normal (but now passing UTF-8-encoded data to them). The patch does not handle conversions between encodings. Unfortunately, this means that id3tag_add_v2_4_UTF8 writes the same bytes you provide to both the ID3v1 and ID3v2.4 tags (i.e. either both latin1 or both UTF-8), which is not quite spec-compliant.
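
    Since the patch leaves conversion to the caller, here is a minimal sketch of converting ISO-8859-1 text to UTF-8 before handing it to the tag setters. The helper name latin1_to_utf8 is hypothetical and not part of LAME; it relies on the fact that every latin1 byte at or above 0x80 maps to the Unicode code point with the same value, which takes exactly two bytes in UTF-8.

```c
#include <stddef.h>

/* Hypothetical helper: converts a NUL-terminated ISO-8859-1 string to
   NUL-terminated UTF-8. Returns the number of bytes written, excluding
   the terminator, or (size_t)-1 if out is too small. */
static size_t latin1_to_utf8(const unsigned char *in, char *out,
                             size_t out_size)
{
    size_t n = 0;

    for (; *in != 0; in++) {
        if (*in < 0x80) {
            /* ASCII range: one byte, unchanged. */
            if (n + 1 >= out_size)
                return (size_t)-1;
            out[n++] = (char)*in;
        } else {
            /* 0x80..0xFF: two-byte UTF-8 sequence 110xxxxx 10xxxxxx. */
            if (n + 2 >= out_size)
                return (size_t)-1;
            out[n++] = (char)(0xC0 | (*in >> 6));
            out[n++] = (char)(0x80 | (*in & 0x3F));
        }
    }
    if (n >= out_size)
        return (size_t)-1;
    out[n] = '\0';
    return n;
}
```

    The converted buffer could then be passed to id3tag_set_title or a similar setter after calling one of the new functions. Going the other way (UTF-8 to latin1 for the ID3v1 tag) is lossy for characters outside latin1 and is not covered by this sketch.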

    My intention was not to modify any existing LAME functionality, so hopefully writing ID3v1 and ID3v2.3 tags (when using functions like id3tag_add_v2, id3tag_v1_only, id3tag_v2_only, etc.) should still work the same after the patch is applied.

     
  • Alexander Leidinger

    • status: open --> closed-accepted
    • assigned_to: Alexander Leidinger
    • Group: -->
     
  • Alexander Leidinger

    Committed. With some code in the frontend to use it. Could you please check/test and see if this is ok (the frontend uses iconv to convert the characters if needed)?

     
