From: ISHIKAWA,chiaki <ish...@yk...> - 2022-05-11 03:10:24
|
Hi, I have been analyzing thunderbird mail client under valgrind for sometime. memcheck has been so useful for me to find memory-related errors. Thank you for releasing this great tool. Recently, I noticed an invalid read of 8 bytes warning, which should be familiar to all of us. Interestingly, the initial part of the stack trace is found in a report in Qt bug database. It comes from dynamic loading library support. https://bugreports.qt.io/browse/QTBUG-90374 It was filed last year. My system is Debian GNU/Linux and I used gcc to compile thunderbird. The report was done by someone who uses clang. I believe the issue lies in a certain version of dl-library, glibc OR valgrind? The reason I say valgrind might be to blame, too, is as follows. (Debian is known to release toolchains very conservatively. I think that is why I did not see this issue last year.) Actually, mine has line numbers slight off due to version differences I suspect. 143:39.43 GECKO(115765) ==115769== Invalid read of size 8 143:39.64 GECKO(115765) ==115769== at 0x4021BF4: strncmp (strcmp.S:175) 143:39.64 GECKO(115765) ==115769== by 0x400655D: is_dst (dl-load.c:214) 143:39.64 GECKO(115765) ==115769== by 0x4007666: _dl_dst_count (dl-load.c:251) 143:39.64 GECKO(115765) ==115769== by 0x4007857: expand_dynamic_string_token (dl-load.c:393) 143:39.64 GECKO(115765) ==115769== by 0x40079C7: fillin_rpath.isra.0 (dl-load.c:465) 143:39.68 GECKO(115765) ==115769== by 0x4007CC2: decompose_rpath (dl-load.c:636) 143:39.68 GECKO(115765) ==115769== by 0x4009E9D: cache_rpath (dl-load.c:678) 143:39.68 GECKO(115765) ==115769== by 0x4009E9D: cache_rpath (dl-load.c:659) ... [omitted] ... My local valgrind dump tells me where the address was allocated. 143:40.60 GECKO(115765) ==115769== Address 0x27ba3819 is 9 bytes inside a block of size 15 alloc'd 143:40.65 GECKO(115765) ==115769== at 0x483CF9B: malloc (vg_replace_malloc.c:380) 143:40.65 GECKO(115765) ==115769== by 0x402074B: malloc (rtld-malloc.h:56) 143:40.65 GECKO(115765) ==115769== by 0x402074B: strdup (strdup.c:42) 143:40.65 GECKO(115765) ==115769== by 0x4007C54: decompose_rpath (dl-load.c:611) 143:40.65 GECKO(115765) ==115769== by 0x4009E9D: cache_rpath (dl-load.c:678) 143:40.65 GECKO(115765) ==115769== by 0x4009E9D: cache_rpath (dl-load.c:659) 143:40.65 GECKO(115765) ==115769== by 0x4009E9D: _dl_map_object (dl-load.c:2174) 143:40.65 GECKO(115765) ==115769== by 0x400E4B0: openaux (dl-deps.c:64) ... [omission] ... I *think* this is a valid error case of large-sized READ used in strncmp reading beyond the allocated memory boundary. (strcmp.S shows 8 octets read instead of one octet at a time.) I think such a usage of strdup/str{n}cmp combination is abound in C source codes. So I thought maybe valgrind was reporting something different. Otherwise, many application programs have to create suppression for this type of issue. That is what I thought initially. A different type of error I thought initially was, say, for example, 9 bytes inside a block of size 15 might mean somehow the data contains uninitialized data in the string area in that position. However, come to think of it, if so, strdup would have triggered a valgrind warning before this. There is no warning from valgrind for strdup. Also, I created a test program and realized that in that case, valgrind prints ==120076== Conditional jump or move depends on uninitialised value(s) ==120076== at 0x4843172: strncmp (vg_replace_strmem.c:663) ==120076== by 0x108778: main (in /home/ishikawa/Dropbox/TB-DIR/a.out) So the original problem must be the read beyond malloc'ed area boundary. Now, is dl-library to blame? I think dl-library has been used literally hundreds of million times or more daily and is hard to think that there is a bug there. (Famous last word). Dl-library does not have control how long each path strings are (I think it is trying to record the path components of a loading path), and thus cannot control valgrind messages generated due to 8-char read going beyond the malloced memory end. (So probably people have to create suppression after all. If the particular version has this issue.) As for valgrind, can valgrind be somehow more intelligent in this case? Maybe creating a substitute strcmp? (I know single char comparison at a time would be slower than comparing 8 characters at a time when appropriate). But at least, this type of surprise warning would be reduced. However, we may have a problem here for glibc.. If this read beyond the malloced region is for real, we have a problem. I have no idea how this behavior is constrained or sanctioned by C standard, C library standard or POSIX standard, but the use of 8 octets strcmp.S can lead to a real issue possibly unless malloc() does allocate memory chunks in 8 or larger unit uniformly. Unless glibc makes sure that there is a guard area between malloc area and the end of user virtual space. I have an experience where a bitblt-like CPU instruction expected us to create a bitmap with a horizontal bit length of multiple of 16 (or 8?). even if the really used screen size is less than that. So we had to round it up to the multiple of 16 (or 8?). I got a bit stingy on memory use and once created a bitmap data with the raster line not appended with this extra octets to make its length a multiple of 16 (or 8). Kaboom. I created this memory area using the C runtime library of the CPU/computer maker's OS. When the CPU bitblt-like instruction accessed the last raster line data, it fetches data 16 (or 8?) octets at a time and at the end, it accessed beyond the malloced area. And it was BEYOND the allocated user memory space by the OS. (The access of 8-byte read for intermediate ratlines ended up reading the next allocated rasterline area, and so it was OK.) So the program crashed due to memory violation. It took me a couple of weeks to figure this out since bitblt-like instruction did not offer any clue regarding where the address violation occurred. Also, only one of the screen bitmaps created thusly was at the end of user virtual space and it was difficult to realize why the instruction crashed seemingly in random manner when it handled other bitmaps without a problem. The CPU vendor intended to use the instruction only for the main display screen of its work station and in that case, the memory is preallocated in neat 16-multiple horizontal. I tried to use the bitblt-like instruction for arbitrary use-defined virtual screen. So my message here is that there *can* be a grave consequence of this malloc and reading larger than originally assumed chunk behavior, but I am not sure where to report this and alert the developers. Yeah, if malloc() allocates 8 or 16 byte chunks always, it should be OK [and we are better off it is built this way due to some standard, glibc manifest, or whatever published document which won't change overnight.] Even in this age of PC users having GBs of memory, I hate to think of programs which allocates memory using 3 or 4 octet length... Chiaki |