#173 Memory corruption with non-ASCII names

Status: closed-accepted
Owner: Karl Waclawek
Labels: None
Priority: 6
Updated: 2002-07-09
Created: 2002-07-09
Creator: Karl Waclawek
Private: No

I ran into a problem with Expat overwriting my
application memory. This happens when the content
model in an element declaration contains names
that are non-ASCII (e.g. Japanese).

This bug is hard to find, because it will not
always bite.

I could trace this down to the following section
in the function doProlog, under switch case
XML_ROLE_CONTENT_ELEMENT_PLUS:
...
el = getElementType(parser, enc, s, nxt);
if (!el)
  return XML_ERROR_NO_MEMORY;
dtd.scaffold[myindex].name = el->name;
dtd.contentStringLen += nxt - s + 1;
...
dtd.contentStringLen is supposed to be incremented
by the length of el->name. However, for non-ASCII
names, the input length, nxt - s + 1, is not the
same as the encoded length. The function
poolStoreString within getElementType encodes the
name from the input encoding to the working
encoding of Expat (UTF-8 or UTF-16).

Specifically, in my test case, using a DTD encoded
in UTF-16BE and a working encoding of UTF-8, this
problem manifested itself in my app crashing badly.

Therefore I suggest this fix:
...
const XML_Char *name;
int nameLen;
...
el = getElementType(parser, enc, s, nxt);
if (!el)
  return XML_ERROR_NO_MEMORY;
name = el->name;
dtd.scaffold[myindex].name = name;
nameLen = 0;
for (; name[nameLen++]; );
dtd.contentStringLen += nameLen;
...

Karl

Discussion

  • status: open --> open-accepted

  • Logged In: YES
    user_id=3066

    I like this change; feel free to check it in and close the bug.

  • status: open-accepted --> closed-accepted

  • Logged In: YES
    user_id=3066

    Karl checked this in as lib/xmlparse.c revision 1.49.