
#215 AudioPlayer: support Unicode in ID3v2 tags

Done
None
Medium
Defect
2023-09-08
2023-09-03
No

The plugin uses the TAGS_ReadEx function to read tags:

const char* WINAPI TAGS_ReadEx( DWORD dwHandle, const char* fmt, DWORD tagtype, int codepage );

The tags library reads the ID3v2 tag correctly at first, but then squashes it into the system ANSI codepage, which is not necessarily compatible with the text. This sometimes results in something like "???????? - ????????".
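
To illustrate where the question marks come from, here is a minimal sketch of what narrowing to a single-byte code page does to characters it cannot represent. ISO-8859-1 stands in for the system ANSI codepage, and `squash_to_latin1` is a hypothetical helper; the library's actual conversion is not shown in this ticket.

```cpp
#include <string>

// Hypothetical helper: narrows UTF-32 text to a single-byte code page.
// ISO-8859-1 stands in for the "system ANSI codepage" here (an assumption);
// anything outside its range is replaced with '?', as a lossy conversion does.
std::string squash_to_latin1(const std::u32string& text)
{
    std::string out;
    out.reserve(text.size());
    for (char32_t cp : text)
        out.push_back(cp <= 0xFF ? static_cast<char>(cp) : '?');
    return out;
}
```

With this stand-in, a Cyrillic "artist - title" pair keeps only its spaces and the hyphen, while plain ASCII text passes through unchanged.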

Luckily, there is a loophole: the library also provides the TAGS_SetUTF8 function:

BOOL TAGS_SetUTF8(
    BOOL enable // enable UTF-8?
);

Purpose:
Determines whether the TAGS_Read/Ex function returns a UTF-8 string
(true) or an ANSI string (false). The default is ANSI (false) on Windows
and UTF-8 (true) on other platforms.

Conveniently, the client code already supports and autodetects UTF-8 tags, so calling TAGS_SetUTF8(TRUE) is enough; no further changes are required.

A possible implementation:

--- "a/AudioPlayer\\audio_system.cpp"
+++ "b/AudioPlayer\\audio_system.cpp"
@@ -52,6 +52,13 @@ static const char* WINAPI dummy_tags(DWORD, const char *, DWORD, int)
    return nullptr;
 }

+typedef BOOL(WINAPI* TAGS_SETUTF8)(BOOL enable);
+
+static BOOL WINAPI dummy_setutf8(BOOL)
+{
+   return FALSE;
+}
+
 const char *audio_system::get_tags(DWORD stream, const char *fmt)
 {
    const auto tags_func = []() {
@@ -61,6 +68,17 @@ const char *audio_system::get_tags(DWORD stream, const char *fmt)
        return pf ? reinterpret_cast<TAGS_READEX>(pf) : &dummy_tags;
    };
    static const auto tags_func_ptr = tags_func();
+
+   const auto setutf8_func = []() {
+       if (instance()->_bass_tags_lib == nullptr)
+           return &dummy_setutf8;
+       const auto pf = GetProcAddress(instance()->_bass_tags_lib, "TAGS_SetUTF8");
+       return pf ? reinterpret_cast<TAGS_SETUTF8>(pf) : &dummy_setutf8;
+   };
+   static const auto setutf8_func_ptr = setutf8_func();
+
+   (*setutf8_func_ptr)(TRUE);
+
    return (*tags_func_ptr)(stream, fmt, (DWORD)-1, (int)ap_settings.default_cp);
 }
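
The patch above calls TAGS_SetUTF8 on every get_tags invocation. If the setting is global to the library, which the quoted description suggests but this ticket does not confirm, the call could instead be made once in a static initializer. A portable sketch of that pattern follows; `BOOL` and `TAGS_SETUTF8` are stand-ins for the Windows definitions, and `resolve_setutf8`, `fake_setutf8` and `get_tags_sketch` are hypothetical names used in place of GetProcAddress and the real functions.

```cpp
// Stand-ins so the sketch compiles anywhere; the real code would use the
// Windows BOOL/WINAPI definitions and GetProcAddress.
using BOOL = int;
using TAGS_SETUTF8 = BOOL (*)(BOOL enable);

static int g_setutf8_calls = 0;   // instrumentation for this sketch only

// Pretends to be the library export and counts how often it is called.
static BOOL fake_setutf8(BOOL)
{
    ++g_setutf8_calls;
    return 1;
}

// Fallback used when the export cannot be resolved, as in the patch.
static BOOL dummy_setutf8(BOOL)
{
    return 0;
}

// Hypothetical resolver: returns the export if "found", else the dummy.
static TAGS_SETUTF8 resolve_setutf8(bool library_loaded)
{
    return library_loaded ? &fake_setutf8 : &dummy_setutf8;
}

const char* get_tags_sketch()
{
    // The lambda runs once, on first use; the static keeps its result.
    static const BOOL utf8_enabled = [] {
        return (*resolve_setutf8(true))(1 /* TRUE */);
    }();
    return utf8_enabled ? "utf-8" : "ansi";
}
```

The trade-off is that a one-time call assumes nothing else toggles the setting later; calling it before every read, as the patch does, is the more defensive choice.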

Discussion

  • Vladimir Surguchev

    • status: New --> Accepted
     
  • Vladimir Surguchev

    3.48.14

     
  • Alex Alabuzhev

    Alex Alabuzhev - 2023-09-04

    Thanks

     
  • Vladimir Surguchev

    Honestly, I am not sure if this change is always a good idea.
    Maybe we need a config option or some kind of heuristic to decide whether to read tags with UTF-8 enabled or disabled.

     
  • Alex Alabuzhev

    Alex Alabuzhev - 2023-09-04

    I am not sure if this change is always a good idea

    Why?

     
  • Vladimir Surguchev

    I didn't check.
    But as far as I remember, text tags in ID3v2 are not always Unicode.
    What if 'squashes it into the system ANSI codepage' is required sometimes?

     
  • Alex Alabuzhev

    Alex Alabuzhev - 2023-09-05

    ID3v2 frame overview
    Possible encodings:

    $00 ISO-8859-1 [ISO-8859-1]. Terminated with $00.
    $01 UTF-16 [UTF-16] encoded Unicode [UNICODE] with BOM. All
    strings in the same frame SHALL have the same byteorder.
    Terminated with $00 00.
    $02 UTF-16BE [UTF-16] encoded Unicode [UNICODE] without BOM.
    Terminated with $00 00.
    $03 UTF-8 [UTF-8] encoded Unicode [UNICODE]. Terminated with $00.

    I.e. ID3v2 text is always either some form of Unicode or ISO-8859-1, which in practice usually means plain ASCII.

    Without the change the library converts the strings to the system code page.
    For ASCII this is basically a no-op, for all Unicode forms it's a potentially lossy conversion.

    With the change the library converts the strings to UTF-8.
    For ASCII this is still a no-op, for a UTF-8 tag it is also a no-op, and for other forms of Unicode it is a lossless conversion.
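
As a rough illustration of why every ID3v2 encoding maps to UTF-8 without loss, here is a sketch that converts a text frame payload according to its encoding byte. `id3v2_text_to_utf8` and `append_utf8` are hypothetical helpers, not the tags library's code; the sketch assumes the encoding byte and terminator are already stripped, and it does not validate unpaired surrogates.

```cpp
#include <cstddef>
#include <cstdint>
#include <string>
#include <vector>

// Appends one Unicode code point to `out` as UTF-8.
static void append_utf8(std::string& out, char32_t cp)
{
    if (cp < 0x80) {
        out.push_back(static_cast<char>(cp));
    } else if (cp < 0x800) {
        out.push_back(static_cast<char>(0xC0 | (cp >> 6)));
        out.push_back(static_cast<char>(0x80 | (cp & 0x3F)));
    } else if (cp < 0x10000) {
        out.push_back(static_cast<char>(0xE0 | (cp >> 12)));
        out.push_back(static_cast<char>(0x80 | ((cp >> 6) & 0x3F)));
        out.push_back(static_cast<char>(0x80 | (cp & 0x3F)));
    } else {
        out.push_back(static_cast<char>(0xF0 | (cp >> 18)));
        out.push_back(static_cast<char>(0x80 | ((cp >> 12) & 0x3F)));
        out.push_back(static_cast<char>(0x80 | ((cp >> 6) & 0x3F)));
        out.push_back(static_cast<char>(0x80 | (cp & 0x3F)));
    }
}

// Hypothetical helper: converts an ID3v2 text payload to UTF-8.
std::string id3v2_text_to_utf8(std::uint8_t encoding,
                               const std::vector<std::uint8_t>& data)
{
    std::string out;
    if (encoding == 0x00) {            // $00 ISO-8859-1: byte == code point
        for (std::uint8_t b : data)
            append_utf8(out, b);
        return out;
    }
    if (encoding == 0x03)              // $03 UTF-8: copy as-is
        return std::string(data.begin(), data.end());

    // $01 (UTF-16 with BOM) or $02 (UTF-16BE without BOM).
    std::size_t i = 0;
    bool big_endian = true;            // also the fallback if $01 lacks a BOM
    if (encoding == 0x01 && data.size() >= 2) {
        if (data[0] == 0xFF && data[1] == 0xFE) { big_endian = false; i = 2; }
        else if (data[0] == 0xFE && data[1] == 0xFF) { i = 2; }
    }
    // Reads one 16-bit code unit at `pos` in the detected byte order.
    auto unit = [&](std::size_t pos) {
        const unsigned first = data[pos], second = data[pos + 1];
        return static_cast<char32_t>(big_endian ? (first << 8) | second
                                                : (second << 8) | first);
    };
    for (; i + 1 < data.size(); i += 2) {
        char32_t cp = unit(i);
        // Combine a surrogate pair into one code point when both halves exist.
        if (cp >= 0xD800 && cp <= 0xDBFF && i + 3 < data.size()) {
            const char32_t low = unit(i + 2);
            if (low >= 0xDC00 && low <= 0xDFFF) {
                cp = 0x10000 + ((cp - 0xD800) << 10) + (low - 0xDC00);
                i += 2;
            }
        }
        append_utf8(out, cp);
    }
    return out;
}
```

Each branch either copies bytes or re-encodes whole code points, so no character has to be replaced along the way.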

    I'd say overall it's an improvement.

    The only case where the change would cause an extra conversion is when you have, say, an MP3 with only ID3v1 in your system codepage and no ID3v2 at all. However, such files are likely rare these days, and the cost of such a conversion is negligible compared to what happens right after reading the tags, i.e. codepage detection and conversion to wstring in normalize_title.
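
The ticket does not show how normalize_title detects the encoding. One common approach, sketched here purely as an assumption, is a structural check that tells UTF-8 apart from single-byte codepage text; `looks_like_utf8` is a hypothetical name.

```cpp
#include <cstddef>
#include <string>

// Hypothetical check: returns true if `s` is structurally well-formed UTF-8
// (each lead byte is followed by the right number of continuation bytes).
// It does not reject overlong forms or surrogates, so it is a heuristic,
// not a full validator.
bool looks_like_utf8(const std::string& s)
{
    std::size_t i = 0;
    while (i < s.size()) {
        const unsigned char c = static_cast<unsigned char>(s[i]);
        std::size_t extra = 0;
        if (c < 0x80)                extra = 0;   // ASCII
        else if ((c & 0xE0) == 0xC0) extra = 1;   // 110xxxxx
        else if ((c & 0xF0) == 0xE0) extra = 2;   // 1110xxxx
        else if ((c & 0xF8) == 0xF0) extra = 3;   // 11110xxx
        else return false;                        // stray continuation or invalid lead byte

        if (i + extra >= s.size())
            return false;                         // sequence is truncated
        for (std::size_t k = 1; k <= extra; ++k) {
            const unsigned char cc = static_cast<unsigned char>(s[i + k]);
            if ((cc & 0xC0) != 0x80)
                return false;                     // not a 10xxxxxx continuation byte
        }
        i += extra + 1;
    }
    return true;
}
```

Text in a single-byte codepage such as Windows-1251 rarely passes this check by accident, which is why a detection step of this kind can coexist with tags that arrive as UTF-8.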

     
  • Vladimir Surguchev

    • status: Accepted --> Done
     
