Menu

#216 incorrect check/decoding for utf-16 surrogates in id3 parser

svn
closed-fixed
nobody
None
5
2015-08-04
2015-05-25
No

Forwarding bug #786718 [1] from the Debian bug tracker:

"utf-16 decoder in id3 parser improperly identifies surrogate pairs,
resulting in improper identification of characters in 0xf800-0xfffe
range as leading surrogate and decoding failure.

E.g. attempt to decode title "「x」~y~" results in:
[id3.c:1065] error: Invalid UTF16 surrogate pair at 0 (0xff62).
and empty parsed title.
Attempt to decode title "xy&zte" results in:
[id3.c:1065] error: Invalid UTF16 surrogate pair at 4 (0xff06).
and "xy" in parsed title.

Patch attached, verified to work.
This bug still present in the latest upstream version (1.22.2)."

[1] https://bugs.debian.org/786718

Discussion

  • Sebastian Ramacher

    Patch from the bug report.

     
  • Thomas Orgis

    Thomas Orgis - 2015-05-25

    Ah, many thanks for forwarding this bug. I didn't expect that and replied to the debian one already. I'll repeat for the record here:

    Regarding the patch: Oh, yes, I see stupid me not getting the proper
    idea about bit masks back in 2006/2007 in this case.

    Let's recap to be on the safe side:

    high surrogate range: 0xD800 to 0xDBFF
    low suggogate range: 0xDC00 to 0xDFFF

    Do we agree on that or is my knowledge of UTF-16 outdated?

    I sense that the mask 0xf800 doesn't cover the first range properly,
    neither. We need to detect bit sequences between

    0b1101100000000000
    0b1101101111111111

    We don't want to catch

    0b110111xxxxxxxxxx

    in there. So a proper mask should be

    0b1111110000000000

    which is 0xfc00 in hex, too. Verifying the low surrogate range:

    0b1101110000000000
    0b1101111111111111

    The mask

    0b1111110000000000

    seems appropriate here, too. How convenient. This smells of intelligent
    design, doesn't it? ;-) So 0xfc00 should be used both for low and high
    surrogates to properly tell them apart with the additional bit.

    I'm attaching a revised patch that should enter mpg123 trunk shortly.

    Feel free to yell and show the error in my current reasoning …

     
  • Thomas Orgis

    Thomas Orgis - 2015-06-28

    Hm, I kindof expected some reaction, also from the original reporter ... so I guess I'll just roll out the fix?

     
  • Sebastian Ramacher

    I was hoping that the original reporter would reply to your mail. But since I don't have a file to test the patches I won't be of much help.

     
  • Thomas Orgis

    Thomas Orgis - 2015-08-04
    • status: open --> closed-fixed
     

Log in to post a comment.

MongoDB Logo MongoDB