The plugin uses the TAGS_ReadEx function to read tags:
const char* WINAPI TAGS_ReadEx( DWORD dwHandle, const char* fmt, DWORD tagtype, int codepage );
The tags library correctly reads the ID3v2 tag first, but then squashes it into the system ANSI codepage, which is not necessarily compatible with the text. The result is sometimes something like "???????? - ????????".
Luckily, there is a loophole: the library also provides the TAGS_SetUTF8 function:
BOOL TAGS_SetUTF8(
BOOL enable // enable UTF-8?
);
Purpose:
Determines whether the TAGS_Read/Ex function returns a UTF-8 string
(true) or an ANSI string (false). The default is ANSI (false) on Windows
and UTF-8 (true) on other platforms.
Conveniently, the client code already supports and autodetects UTF-8 tags, so it is enough to call TAGS_SetUTF8(TRUE); no further changes are required.
A possible implementation:
--- "a/AudioPlayer\\audio_system.cpp"
+++ "b/AudioPlayer\\audio_system.cpp"
@@ -52,6 +52,13 @@ static const char* WINAPI dummy_tags(DWORD, const char *, DWORD, int)
     return nullptr;
 }
+typedef BOOL(WINAPI* TAGS_SETUTF8)(BOOL enable);
+
+static BOOL WINAPI dummy_setutf8(BOOL)
+{
+    return FALSE;
+}
+
 const char *audio_system::get_tags(DWORD stream, const char *fmt)
 {
     const auto tags_func = []() {
@@ -61,6 +68,17 @@ const char *audio_system::get_tags(DWORD stream, const char *fmt)
         return pf ? reinterpret_cast<TAGS_READEX>(pf) : &dummy_tags;
     };
     static const auto tags_func_ptr = tags_func();
+
+    const auto setutf8_func = []() {
+        if (instance()->_bass_tags_lib == nullptr)
+            return &dummy_setutf8;
+        const auto pf = GetProcAddress(instance()->_bass_tags_lib, "TAGS_SetUTF8");
+        return pf ? reinterpret_cast<TAGS_SETUTF8>(pf) : &dummy_setutf8;
+    };
+    static const auto setutf8_func_ptr = setutf8_func();
+
+    (*setutf8_func_ptr)(TRUE);
+
     return (*tags_func_ptr)(stream, fmt, (DWORD)-1, (int)ap_settings.default_cp);
 }
Anonymous
3.48.14
Thanks
Honestly, I am not sure this change is always a good idea.
Maybe it needs a config option or some kind of heuristic to decide whether to read tags with UTF-8 enabled or disabled.
Why?
I didn't check.
But as far as I remember, text tags in ID3v2 are not always Unicode.
What if 'squashes it into the system ANSI codepage' is required sometimes?
I.e. ID3v2 is always either some Unicode form or pure ASCII.
Without the change the library converts the strings to the system code page.
For ASCII this is basically a no-op, for all Unicode forms it's a potentially lossy conversion.
With the change the library converts the strings to UTF-8.
For ASCII this is still a no-op, for a UTF-8 tag it is also a no-op, and for other forms of Unicode it is a lossless conversion.
I'd say overall it's an improvement.
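To make the lossy versus lossless distinction concrete, here is a minimal, portable sketch. It is not the tags library's code: to_latin1_lossy and to_utf8 are hypothetical helper names, ISO-8859-1 merely stands in for "the system ANSI codepage", and the input is assumed to contain only valid Unicode code points.

```cpp
#include <string>

// Hypothetical stand-in for an ANSI conversion: code points that fit into
// ISO-8859-1 become one byte each, everything else is replaced with '?'.
std::string to_latin1_lossy(const std::u32string& text)
{
    std::string out;
    for (char32_t cp : text)
        out += cp <= 0xFF ? static_cast<char>(cp) : '?';
    return out;
}

// Encodes code points as UTF-8. Assumes valid code points (<= U+10FFFF,
// no surrogates); every such code point is representable, so nothing is lost.
std::string to_utf8(const std::u32string& text)
{
    std::string out;
    for (char32_t cp : text) {
        if (cp < 0x80) {
            out += static_cast<char>(cp);
        } else if (cp < 0x800) {
            out += static_cast<char>(0xC0 | (cp >> 6));
            out += static_cast<char>(0x80 | (cp & 0x3F));
        } else if (cp < 0x10000) {
            out += static_cast<char>(0xE0 | (cp >> 12));
            out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
            out += static_cast<char>(0x80 | (cp & 0x3F));
        } else {
            out += static_cast<char>(0xF0 | (cp >> 18));
            out += static_cast<char>(0x80 | ((cp >> 12) & 0x3F));
            out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
            out += static_cast<char>(0x80 | (cp & 0x3F));
        }
    }
    return out;
}
```

With a Cyrillic title, the first function yields question marks for every letter while ASCII characters such as " - " survive, which matches the "???????? - ????????" symptom; the second keeps every character.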
The only case where the change would cause an extra conversion is when you have, say, an MP3 with only an ID3v1 tag in your system codepage and no ID3v2 tag at all. However, such files are likely rare these days, and the price of such a conversion is negligible compared to the stuff that happens right after reading the tags, i.e. codepage detection and conversion to wstring in normalize_title.
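For readers unfamiliar with the codepage detection step mentioned above, one common approach is a UTF-8 well-formedness check. The sketch below is an assumption about how such autodetection can work in general, not the plugin's actual normalize_title code; looks_like_utf8 is a hypothetical name.

```cpp
#include <cstddef>
#include <string>

// Returns true if `s` is well-formed UTF-8. Rejects stray continuation
// bytes, truncated sequences, overlong forms, surrogates and values
// above U+10FFFF.
bool looks_like_utf8(const std::string& s)
{
    // Smallest code point that legitimately needs a sequence of each length.
    static const unsigned long min_cp[] = {0, 0, 0x80, 0x800, 0x10000};

    const std::size_t n = s.size();
    std::size_t i = 0;
    while (i < n) {
        const unsigned char b = static_cast<unsigned char>(s[i]);
        std::size_t len;
        unsigned long cp;
        if (b < 0x80)                { len = 1; cp = b; }
        else if ((b & 0xE0) == 0xC0) { len = 2; cp = b & 0x1F; }
        else if ((b & 0xF0) == 0xE0) { len = 3; cp = b & 0x0F; }
        else if ((b & 0xF8) == 0xF0) { len = 4; cp = b & 0x07; }
        else return false;                       // invalid lead byte

        if (i + len > n) return false;           // truncated sequence
        for (std::size_t k = 1; k < len; ++k) {
            const unsigned char c = static_cast<unsigned char>(s[i + k]);
            if ((c & 0xC0) != 0x80) return false; // not a continuation byte
            cp = (cp << 6) | (c & 0x3F);
        }
        if (cp < min_cp[len]) return false;       // overlong encoding
        if (cp > 0x10FFFF || (cp >= 0xD800 && cp <= 0xDFFF)) return false;
        i += len;
    }
    return true;
}
```

Pure ASCII passes the check trivially, and a short legacy-codepage string can occasionally pass by coincidence, so a check like this is a heuristic rather than a proof. Text in a single-byte codepage such as CP1251 usually fails it quickly, which is what makes a fallback to the configured default codepage workable.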