Menu

#464 Parser crash with specially formatted UTF-8 sequences

Test Required
closed-fixed
5
2016-03-12
2008-06-11
No

I have discovered a way to crash libexpat's xml parser with certain specially formatted UTF-8 sequences. All applications that link w/ expat and use it to render user-provided XML files are affected. As far as I see, the issue is not exploitable, just denial of service.

This is the patch that I have come up with, also attached to this email:

+++ lib/xmltok_impl.c 2007-12-21 11:11:42.054417000 -0800
@@ -1745,6 +1745,9 @@
 switch (BYTE_TYPE(enc, ptr)) {
 #define LEAD_CASE(n) \
 case BT_LEAD ## n: \
+ if (end - ptr < n) { \
+   return; \
+ } \
 ptr += n; \
 break;
 LEAD_CASE(2) LEAD_CASE(3) LEAD_CASE(4)

The parser's updatePosition function which keeps track of the current position pointer increments the ptr by {2, 3, 4} to skip past multibyte character ombinations, and this causes ptr in the "while (ptr != end)" loop to jump past the terminating condition, causing the loop to continue reading past 'end' and into out of bounds memory until a crash.

In general this parser does not appear the most robust and could be the source of some security issues.

A fault file is attached. To reproduce, compile examples/outline.c and run against it. This patch may not be 100% complete...

Contact:
Peter Valchev <pvalchev@google.com>

Discussion

  • Peter Valchev

    Peter Valchev - 2008-06-11

    Example of crash

     
  • Karl Waclawek

    Karl Waclawek - 2008-06-11

    Logged In: YES
    user_id=290026
    Originator: NO

    Can reproduce. The problem is that this code can be called *after* an error has been found (to report line and column number). Therefore it should not rely on correct byte counts for multibyte characters.

    Patch applied in xmltok_impl.c rev. 1.14.

    Would you please also report all the other issues you have found?

     
  • Karl Waclawek

    Karl Waclawek - 2008-06-11
    • assigned_to: fdrake --> kwaclaw
    • status: open --> open-fixed
     
  • Peter Valchev

    Peter Valchev - 2008-06-12

    Another crash case... seems to be same area

     
  • Peter Valchev

    Peter Valchev - 2008-06-12
    • assigned_to: kwaclaw --> fdrake
    • status: open-fixed --> open
     
  • Peter Valchev

    Peter Valchev - 2008-06-12

    Logged In: YES
    user_id=2114255
    Originator: YES

    Thanks.

    Actually rechecking in this area again, I think I found another issue, which seems to not be covered with the patch I provided :( I am attaching the new test case.

    I haven't found many other issues at this point... I stumbled upon this one fairly quickly and my above observations were more general than anything. If I do find more I'll be sure to tell you.

    File Added: expat-fault2.xml

     
  • Karl Waclawek

    Karl Waclawek - 2008-06-13
    • assigned_to: fdrake --> kwaclaw
     
  • Karl Waclawek

    Karl Waclawek - 2008-06-13

    Logged In: YES
    user_id=290026
    Originator: NO

    What is supposed to happen with the second test case?
    It reports an error, but does not crash with xmlwf or outline.
    IE reports the same error, so that is probably OK.

     
  • Peter Valchev

    Peter Valchev - 2008-06-13
    • status: open --> open-fixed
     
  • Peter Valchev

    Peter Valchev - 2008-06-13

    Logged In: YES
    user_id=2114255
    Originator: YES

    Ugh the padding I inserted broke the example. New attached tarball with expat-fault3.xml crashes both outline and xmlwf

    (gdb) r < /tmp/expat-fault3.
    Starting program: /usr/src/lib/libexpat/examples/outline < /tmp/expat-fault3.
    /bin/ksh: cannot open /tmp/expat-fault3.: No such file or directory

    Program exited with code 01.
    You can't do that without a process to debug.
    (gdb) r < /tmp/expat-fault3.xml
    Starting program: /usr/src/lib/libexpat/examples/outline < /tmp/expat-fault3.xml

    Program received signal SIGSEGV, Segmentation fault.
    little2_updatePosition (enc=0x2676dec0, ptr=0x7f4ba000 <Address 0x7f4ba000 out of bounds>,
    end=0x7f4b9003 "", pos=0x7de87198) at xmltok_impl.c:1748
    1748 switch (BYTE_TYPE(enc, ptr)) {
    (gdb) bt
    #0 little2_updatePosition (enc=0x2676dec0, ptr=0x7f4ba000 <Address 0x7f4ba000 out of bounds>,
    end=0x7f4b9003 "", pos=0x7de87198) at xmltok_impl.c:1748
    #1 0x0676b8a3 in XML_GetCurrentLineNumber (parser=0x7de87000)
    at /usr/src/lib/libexpat/lib/xmlparse.c:1799
    #2 0x1c000a24 in main ()

    File Added: expat-fault3.tar.gz

     
  • Peter Valchev

    Peter Valchev - 2008-06-13
     
  • Peter Valchev

    Peter Valchev - 2008-06-13

    Logged In: YES
    user_id=2114255
    Originator: YES

    Ugh the padding I inserted broke the example. New attached tarball with expat-fault3.xml crashes both outline and xmlwf

    (gdb) r < /tmp/expat-fault3.
    Starting program: /usr/src/lib/libexpat/examples/outline < /tmp/expat-fault3.
    /bin/ksh: cannot open /tmp/expat-fault3.: No such file or directory

    Program exited with code 01.
    You can't do that without a process to debug.
    (gdb) r < /tmp/expat-fault3.xml
    Starting program: /usr/src/lib/libexpat/examples/outline < /tmp/expat-fault3.xml

    Program received signal SIGSEGV, Segmentation fault.
    little2_updatePosition (enc=0x2676dec0, ptr=0x7f4ba000 <Address 0x7f4ba000 out of bounds>,
    end=0x7f4b9003 "", pos=0x7de87198) at xmltok_impl.c:1748
    1748 switch (BYTE_TYPE(enc, ptr)) {
    (gdb) bt
    #0 little2_updatePosition (enc=0x2676dec0, ptr=0x7f4ba000 <Address 0x7f4ba000 out of bounds>,
    end=0x7f4b9003 "", pos=0x7de87198) at xmltok_impl.c:1748
    #1 0x0676b8a3 in XML_GetCurrentLineNumber (parser=0x7de87000)
    at /usr/src/lib/libexpat/lib/xmlparse.c:1799
    #2 0x1c000a24 in main ()

    File Added: expat-fault3.tar.gz

     
  • Peter Valchev

    Peter Valchev - 2008-06-13
     
  • Karl Waclawek

    Karl Waclawek - 2008-06-13

    Logged In: YES
    user_id=290026
    Originator: NO

    I applied the original fix I had in mind, which I didn't use because I thought yours might perform better. It is quite simple: replace "while (ptr != end)" with "while (ptr < end)".
    Seems to work so far. Fixed in xmltok_impl.c rev. 1.15.

    Karl

     
  • Peter Valchev

    Peter Valchev - 2008-06-13

    Logged In: YES
    user_id=2114255
    Originator: YES

    Yes, I originally had the same patch (while ptr < end) but chose the latter to keep in line with the rest of the file, sigh.

    Now the next question is, there are many while (ptr != end) loops still left in that file if you search.. some of the look like they could have the same problem. Should all of them be changed, to be on the paranoid side? The original construct is just evil and waiting for bugs like this to happen...

     
  • Karl Waclawek

    Karl Waclawek - 2008-06-13

    Logged In: YES
    user_id=290026
    Originator: NO

    I know what you mean and I had the exact same thoughts. I think at least for older CPUs doing a "<" comparison might have been slower than doing a comparison for equality, so I am not sure how that would work out now. Also, we have to consider that this could modify the logic, with a slight risk of introducing an error.

    Anyway, I think there are other parts of the code (like the encoding checking) that ensures that pathological situations like the one you found don't happen. It is also important to observe that the crash seems to happen only *after* the parser returned with an error, and the code gets called again to get the error location while the parser is already in an error state.

    Don't forget, Expat is by default installed on most Linux distributions and it is used by Firefox/Mozilla, so Expat has been given a real good workout with millions of installations. The bugs you reported are the first bug reports we received in a long time (except for platform specific build issues).

    Nevertheless, please keep digging, we really want Expat to be resistant to pathological input.

    Thanks for the efforts,

    Karl

     
  • Karl Waclawek

    Karl Waclawek - 2009-01-17
    • milestone: --> Test Required
     
  • Karl Waclawek

    Karl Waclawek - 2009-01-17

    Has anyone tested current CVS for this issue's fix?

     
  • Sebastian Pipping

    • Description has changed:

    Diff:

    --- old
    +++ new
    @@ -2,17 +2,17 @@
    
     This is the patch that I have come up with, also attached to this email:
    
    -+++ lib/xmltok\_impl.c 2007-12-21 11:11:42.054417000 -0800
    -@@ -1745,6 +1745,9 @@
    -switch \(BYTE\_TYPE\(enc, ptr\)\) \{
    -\#define LEAD\_CASE\(n\) \
    -case BT\_LEAD \#\# n: \
    -\+ if \(end - ptr &lt; n\) \{ \
    -\+   return; \
    -\+ \} \
    -ptr += n; \
    -break;
    -LEAD\_CASE\(2\) LEAD\_CASE\(3\) LEAD\_CASE\(4\)
    +    +++ lib/xmltok\_impl.c 2007-12-21 11:11:42.054417000 -0800
    +    @@ -1745,6 +1745,9 @@
    +    switch \(BYTE\_TYPE\(enc, ptr\)\) \{
    +    \#define LEAD\_CASE\(n\) \
    +    case BT\_LEAD \#\# n: \
    +    \+ if \(end - ptr &lt; n\) \{ \
    +    \+   return; \
    +    \+ \} \
    +    ptr += n; \
    +    break;
    +    LEAD\_CASE\(2\) LEAD\_CASE\(3\) LEAD\_CASE\(4\)
    
     The parser's updatePosition function which keeps track of the current position pointer increments the ptr by \{2, 3, 4\} to skip past multibyte character ombinations, and this causes ptr in the "while \(ptr \!= end\)" loop to jump past the terminating condition, causing the loop to continue reading past 'end' and into out of bounds memory until a crash.
    
     
  • Sebastian Pipping

    • Description has changed:

    Diff:

    --- old
    +++ new
    @@ -2,17 +2,17 @@
    
     This is the patch that I have come up with, also attached to this email:
    
    -    +++ lib/xmltok\_impl.c 2007-12-21 11:11:42.054417000 -0800
    +    +++ lib/xmltok_impl.c 2007-12-21 11:11:42.054417000 -0800
         @@ -1745,6 +1745,9 @@
    -    switch \(BYTE\_TYPE\(enc, ptr\)\) \{
    -    \#define LEAD\_CASE\(n\) \
    -    case BT\_LEAD \#\# n: \
    -    \+ if \(end - ptr &lt; n\) \{ \
    -    \+   return; \
    -    \+ \} \
    -    ptr += n; \
    -    break;
    -    LEAD\_CASE\(2\) LEAD\_CASE\(3\) LEAD\_CASE\(4\)
    +     switch (BYTE_TYPE(enc, ptr)) {
    +     #define LEAD_CASE(n) \
    +     case BT_LEAD ## n: \
    +    + if (end - ptr < n) { \
    +    +   return; \
    +    + } \
    +     ptr += n; \
    +     break;
    +     LEAD_CASE(2) LEAD_CASE(3) LEAD_CASE(4)
    
     The parser's updatePosition function which keeps track of the current position pointer increments the ptr by \{2, 3, 4\} to skip past multibyte character ombinations, and this causes ptr in the "while \(ptr \!= end\)" loop to jump past the terminating condition, causing the loop to continue reading past 'end' and into out of bounds memory until a crash.
    
     
  • Sebastian Pipping

    Releated commits in Git are:
    5d3f2b833e849f67b97411fd57007ee7e5b665ca -- Fix for bug #1990430.
    a247ccd476970b9a0036379c584cb592cf08d79b -- Better fix for bug #1990430.

    Closing.

     
  • Sebastian Pipping

    • status: open-fixed --> closed-fixed
     

Log in to post a comment.