I am getting a segv when asking pyexpat to parse some bad xml. I'm not
sure whether this is a problem with pyexpat or with expat, but I'm leaning
towards expat after finding a (partial?) fix.
The stack of the segv is:
(gdb) bt
#0 0x010245f8 in normal_updatePosition (enc=0x1035fc0, ptr=0x73b000
<Address 0x73b000 out of bounds>, end=0x186cc50 "</<die", 'x' <repeats 76
times>, "</<die", 'x' <repeats 76 times>, "</<die", 'x' \
<repeats 30 times>..., pos=0x1317668) at
/Volumes/extra/dvitek/trunk/third-party/python/Modules/expat/xmltok_impl.c:
1748
#1 0x01011545 in XML_GetCurrentLineNumber (parser=0x13174d0) at
/Volumes/extra/dvitek/trunk/third-party/python/Modules/expat/xmlparse.c:179
3
#2 0x010093ca in set_error (self=0x725540, code=XML_ERROR_INVALID_TOKEN)
at /Volumes/extra/dvitek/trunk/third-party/python/Modules/pyexpat.c:126
#3 0x0100c633 in get_parse_result (self=0x725540, rv=0) at
/Volumes/extra/dvitek/trunk/third-party/python/Modules/pyexpat.c:918
#4 0x0100c6e0 in xmlparse_Parse (self=0x725540, args=0x7701e8) at
/Volumes/extra/dvitek/trunk/third-party/python/Modules/pyexpat.c:940
#5 0x00068067 in PyCFunction_Call (func=0x770a30, arg=0x7701e8, kw=0x0) at
Objects/methodobject.c:81
...
What appears to be going on is something like this:
There is an existing parser object whose m_positionPtr and m_eventPtr
properly point somewhere inside m_buffer. Everything is fine at this
point.
Someone calls XML_Parse on this parser. It proceeds to call XML_GetBuffer.
XML_GetBuffer allocates a new buffer, but doesn't touch m_positionPtr or
m_eventPtr. So these two pointers now point to freed memory (the old
buffer).
After XML_GetBuffer returns, there is a call to XML_ParseBuffer. This
function has a failure case just before it sets m_positionPtr and
m_eventPtr to point somewhere better. It returns XML_STATUS_ERROR before
touching m_positionPtr.
pyexpat sees this error code and proceeds to call XML_GetErrorLineNumber.
XML_GetLineNumber dereferences m_positionPtr in the stack we see above,
causing the crash.
I'm using the pyexpat (part of Python 2.6) interface to expat. Python 2.6
embeds a copy of expat's lib subdirectory at Modules/expat. To confirm
that things were not better in 2.0.1 I just copied the 2.0.1 sources over
the old ones. I was pleasantly surprised when it built 2.0.1 fine without
any tinkering. Anyway, the problem still happened with 2.0.1.
My fix was to add these three lines near the end of XML_GetBuffer, under
the assumption that nothing good can come of the dangling pointers:
}
+ positionPtr = NULL;
+ eventPtr = NULL;
+ eventEndPtr = NULL;
}
return bufferEnd;
}
My test case was:
from xml.parsers.expat import ParserCreate
for abc in range(1,1000):
print abc
p = ParserCreate()
for i in xrange(100):
try:
p.Parse('<die%s</' % ('x' * abc), False)
except:
pass
It only caused a crash on MacOSX; it worked OK on Linux. That's probably
just because glibc's free isn't giving back pages to the OS as aggressively
as the BSD one though.
Karl Waclawek
None
Test Required
Public
| Field | Old Value | Date | By |
|---|---|---|---|
| resolution_id | None | 2009-09-29 02:51 | kwaclaw |
| artifact_group_id | None | 2009-09-29 02:51 | kwaclaw |
| assigned_to | nobody | 2009-09-29 02:51 | kwaclaw |
Copyright © 2010 Geeknet, Inc. All rights reserved. Terms of Use