The function ezxml_char_content() stores in xml->txt a pointer to an address inside a larger allocated block. That pointer is later passed to free(), which leads to a segmentation fault.
ASAN report (segmentation fault occurs also without ASAN):
=================================================================
==6853==ERROR: AddressSanitizer: attempting free on address which was not malloc()-ed: 0x60300000efee in thread T0
    #0 0x7f2123ee72ca in __interceptor_free (/usr/lib/x86_64-linux-gnu/libasan.so.2+0x982ca)
    #1 0x410bb8 in ezxml_free ezxml/ezxml.c:821
    #2 0x4106fd in ezxml_free ezxml/ezxml.c:791
    #3 0x401808 in main ezxml/test_ezxml.c:102
    #4 0x7f2123aa582f in __libc_start_main (/lib/x86_64-linux-gnu/libc.so.6+0x2082f)
    #5 0x4018f8 in _start (ezxml/test_ezxml_asan.exe+0x4018f8)
0x60300000efee is located 14 bytes inside of 17-byte region [0x60300000efe0,0x60300000eff1)
allocated by thread T0 here:
    #0 0x7f2123ee7602 in malloc (/usr/lib/x86_64-linux-gnu/libasan.so.2+0x98602)
    #1 0x4171cc in ezxml_parse_fd ezxml/ezxml.c:646
    #2 0x4171cc in ezxml_parse_file ezxml/ezxml.c:659
SUMMARY: AddressSanitizer: bad-free ??:0 __interceptor_free
==6853==ABORTING
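To make the report easier to read: the freed address lies 14 bytes past the start of a 17-byte malloc() region, so it is an interior pointer, and free() only accepts the pointer that malloc() originally returned. The following is a minimal sketch of that relationship. The helper name is_interior_pointer is hypothetical and is not part of ezxml.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper (not part of ezxml): returns 1 if p points inside the
 * block [base, base + len) but not at its start. Such a pointer must never
 * be passed to free(); only base itself is a valid argument. */
static int is_interior_pointer(const char *base, size_t len, const char *p)
{
    assert(base != NULL);
    return p > base && p < base + len;
}
```

With the values from the report (a 17-byte region and an offset of 14), is_interior_pointer(buf, 17, buf + 14) returns 1, whereas is_interior_pointer(buf, 17, buf) returns 0.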
Reproduction:
Sample XML file leading to crash:
crash_006_bad_free.xml
Code snippet for reproduction:
ezxml_t result = ezxml_parse_file("crash_006_bad_free.xml");
ezxml_free(result); /* the bad free is reported inside this call (see the ASAN trace) */
Additional files for reproduction of this crash.
The error is caused by a bogus character reference that cannot be encoded as UTF-8. The proposed patch checks whether the character fits in 36 bits, the maximum value that the UTF-8 encoding scheme can represent (note: the standard only uses 21 bits).
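The 36-bit limit follows from the byte layout: an n-byte sequence (n >= 2) carries 5n + 1 payload bits, so the longest form, 7 bytes with lead byte 0xFE, carries 36 bits. The following is a minimal sketch of such a range check together with a generalized encoder. It is not the actual patch, and the names MAX_UTF8_BITS, fits_in_utf8 and encode_utf8_ext are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Assumption: 36 bits is the limit named in the report. */
#define MAX_UTF8_BITS 36

/* Returns 1 if c can be represented in MAX_UTF8_BITS bits, 0 otherwise. */
static int fits_in_utf8(uint64_t c)
{
    return c < ((uint64_t)1 << MAX_UTF8_BITS);
}

/* Encodes c into out (which must hold at least 7 bytes) using the
 * generalized UTF-8 byte scheme. Returns the number of bytes written,
 * or 0 if c is out of range and nothing was written. */
static size_t encode_utf8_ext(uint64_t c, unsigned char *out)
{
    size_t n, i;

    assert(out != NULL);
    if (!fits_in_utf8(c)) return 0;          /* reject bogus references */
    if (c < 0x80) { out[0] = (unsigned char)c; return 1; }

    /* An n-byte sequence holds 5n + 1 bits; pick the shortest that fits. */
    n = 2;
    while (n < 7 && c >= ((uint64_t)1 << (5 * n + 1))) n++;

    /* Lead byte: n high bits set, then the topmost payload bits. */
    out[0] = (unsigned char)(((0xFFu << (8 - n)) & 0xFFu) | (c >> (6 * (n - 1))));

    /* Continuation bytes: 10xxxxxx, six payload bits each. */
    for (i = 1; i < n; i++)
        out[i] = (unsigned char)(0x80 | ((c >> (6 * (n - 1 - i))) & 0x3F));

    return n;
}
```

A caller would skip or report the character reference when encode_utf8_ext returns 0. A parser that wants to follow the Unicode standard strictly could instead reject every value above 0x10FFFF, which is the 21-bit limit mentioned above.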
This fix also resolves CVE-2019-20202 (bug 17) as well as CVE-2021-31598 (bug 28).
Note: The solution as proposed in bug 28 is incorrect. It causes valid XML data to be rejected as well.