incorrect check/decoding for utf-16 surrogates in id3 parser
Brought to you by:
sobukus
Forwarding bug #786718 [1] from the Debian bug tracker:
"utf-16 decoder in id3 parser improperly identifies surrogate pairs,
resulting in improper identification of characters in 0xf800-0xfffe
range as leading surrogate and decoding failure.
E.g. attempt to decode title "「x」~y~" results in:
[id3.c:1065] error: Invalid UTF16 surrogate pair at 0 (0xff62).
and empty parsed title.
Attempt to decode title "xy&zte" results in:
[id3.c:1065] error: Invalid UTF16 surrogate pair at 4 (0xff06).
and "xy" in parsed title.
Patch attached, verified to work.
This bug still present in the latest upstream version (1.22.2)."
Patch from the bug report.
Ah, many thanks for forwarding this bug. I didn't expect that and replied to the debian one already. I'll repeat for the record here:
Regarding the patch: Oh, yes, I see stupid me not getting the proper
idea about bit masks back in 2006/2007 in this case.
Let's recap to be on the safe side:
high surrogate range: 0xD800 to 0xDBFF
low suggogate range: 0xDC00 to 0xDFFF
Do we agree on that or is my knowledge of UTF-16 outdated?
I sense that the mask 0xf800 doesn't cover the first range properly,
neither. We need to detect bit sequences between
0b1101100000000000
0b1101101111111111
We don't want to catch
0b110111xxxxxxxxxx
in there. So a proper mask should be
0b1111110000000000
which is 0xfc00 in hex, too. Verifying the low surrogate range:
0b1101110000000000
0b1101111111111111
The mask
0b1111110000000000
seems appropriate here, too. How convenient. This smells of intelligent
design, doesn't it? ;-) So 0xfc00 should be used both for low and high
surrogates to properly tell them apart with the additional bit.
I'm attaching a revised patch that should enter mpg123 trunk shortly.
Feel free to yell and show the error in my current reasoning …
Hm, I kindof expected some reaction, also from the original reporter ... so I guess I'll just roll out the fix?
I was hoping that the original reporter would reply to your mail. But since I don't have a file to test the patches I won't be of much help.