#16 UTF-8 BOM triggers a crash

closed-fixed
None
5
2001-02-16
2000-11-29
Anonymous
No

Files that start with a UTF-8 byte order mark (BOM, the bytes 0xEF 0xBB 0xBF) can trigger a crash,
at least in the case of the file I was working with.

My best guess is that the problem lies in initScan() in xmltok.c.
Specifically, the "case 0xEFBB:" section appears to neglect to set "*nextTokPtr = ptr + 3;",
so it returns XML_TOK_BOM without telling the caller where the next token starts.
Setting it would be consistent with the other XML_TOK_BOM cases in this function,
and the offset is 3 because the UTF-8 BOM is 3 bytes long.

I am working with an old version of Expat, but I checked the latest version
of xmltok.c and this code apparently has not changed.

The proposed fix is listed below.

Bruce Kaskel
Adobe SVG Viewer Engineering Lead
Adobe Systems Incorporated

case 0xEFBB:
  /* Maybe a UTF-8 BOM (EF BB BF) */
  /* If there's an explicitly specified (external) encoding
     of ISO-8859-1 or some flavour of UTF-16
     and this is an external text entity,
     don't look for the BOM,
     because it might be legal data. */
  if (state == XML_CONTENT_STATE) {
    int e = INIT_ENC_INDEX(enc);
    if (e == ISO_8859_1_ENC || e == UTF_16BE_ENC || e == UTF_16LE_ENC || e == UTF_16_ENC)
      break;
  }
  if (ptr + 2 == end)
    return XML_TOK_PARTIAL;
  if ((unsigned char)ptr[2] == 0xBF) {
    *nextTokPtr = ptr + 3;   /* <<-- PROPOSED FIX */
    *encPtr = encodingTable[UTF_8_ENC];
    return XML_TOK_BOM;
  }
  break;

Discussion

  • Sam TH

    Sam TH - 2001-02-02

    Here's the change as a context diff:

    Index: xmltok.c

    RCS file: /cvsroot/expat/expat/lib/xmltok.c,v
    retrieving revision 1.5
    diff -u -c -r1.5 xmltok.c
    cvs server: conflicting specifications of output style
    *** xmltok.c 2000/10/22 19:20:23 1.5
    --- xmltok.c 2001/02/02 14:24:48
    ***************
    *** 1500,1505 ****
    --- 1500,1506 ----
        if (ptr + 2 == end)
          return XML_TOK_PARTIAL;
        if ((unsigned char)ptr[2] == 0xBF) {
    +     *nextTokPtr = ptr + 3;
          *encPtr = encodingTable[UTF_8_ENC];
          return XML_TOK_BOM;
        }

     
  • Fred L. Drake, Jr.

    Suggested fix confirmed and checked in as lib/xmltok.c revision 1.6.

     
  • Fred L. Drake, Jr.

    • assigned_to: nobody --> fdrake
    • status: open --> closed-fixed
     
