From: John R. <jr...@bi...> - 2017-09-06 17:33:13
> cat /proc/cpuinfo
[[snip]]
> processor : 1
> model name : ARMv7 Processor rev 10 (v7l)
> BogoMIPS : 132.00
> Features : half thumb fastmult vfp edsp neon vfpv3 tls vfpd32
> CPU implementer : 0x41
> CPU architecture: 7
> CPU variant : 0x2
> CPU part : 0xc09
> CPU revision : 10
>
> Hardware : Freescale i.MX6 Quad/DualLite (Device Tree)
> Revision : 0000
> Serial : 0000000000000000
>
> Operating system info:
> cat /etc/openwrt_release
> DISTRIB_ID='OpenWrt'
> DISTRIB_RELEASE='15.05'
> DISTRIB_REVISION='r48153'
> DISTRIB_CODENAME='chaos_calmer'
> DISTRIB_TARGET='imx6/generic'
> DISTRIB_DESCRIPTION='OpenWrt Chaos Calmer 15.05'
> DISTRIB_TAINTS='no-all busybox'
>
> cat /proc/version
> Linux version 3.18.23 (gcc version 5.3.0 (OpenWrt GCC 5.3.0 r48153) ) #6 SMP Tue Jul 11 16:35:20 CEST 2017
>
> Source code could be downloaded from:
> https://uclibc.org/downloads/uClibc-0.9.33.2.tar.bz2
>
> Extracted chunks from trace output can be downloaded from:
> https://github.com/KKoovalsky/Valgrind-problems
>
> The file is called vgtrace-shortened.txt. Full trace available in vgtrace.txt file. In this repo I also included compiled uClibc library.

Thank you for the detailed information, particularly the vgtrace*.txt.

It's a compiler "bug", and a "bug" in the memcheck implementation,
and a definite bug in the memcheck error reporting.
The workaround is to invoke valgrind(memcheck) with "--ignore-range-below-sp=0x0-0x14".

The problem can be seen here:
===== vgtrace-shortened.txt line 8308
(arm) 0x4817678: mov r12, r13   ## copy r12 from r13(==sp)

        ------ IMark(0x4817678, 4, 0) ------
        t1 = GET:I32(60)
        t0 = t1
        t2 = t0
        PUT(56) = t2
        PUT(68) = 0x481767C:I32

(arm) 0x481767C: stmdb r13!, {0xDFF0}   ## push r15(==pc),r14(==lr),r12,r11-r4 onto stack (sp==r13) *in that order*

        ------ IMark(0x481767C, 4, 0) ------
        t3 = GET:I32(60)
        t4 = t3
        PUT(60) = Sub32(t3,0x2C:I32)
        STle(Sub32(t4,0x4:I32)) = 0x4817684:I32
        STle(Sub32(t4,0x8:I32)) = GET:I32(64)
        STle(Sub32(t4,0xC:I32)) = GET:I32(56)
        STle(Sub32(t4,0x10:I32)) = GET:I32(52)
        STle(Sub32(t4,0x14:I32)) = GET:I32(48)
        STle(Sub32(t4,0x18:I32)) = GET:I32(44)
        STle(Sub32(t4,0x1C:I32)) = GET:I32(40)
        STle(Sub32(t4,0x20:I32)) = GET:I32(36)
        STle(Sub32(t4,0x24:I32)) = GET:I32(32)
        STle(Sub32(t4,0x28:I32)) = GET:I32(28)
        STle(Sub32(t4,0x2C:I32)) = GET:I32(24)
        PUT(68) = 0x4817680:I32

(arm) 0x4817680: sub r11, r12, #0x4   ## r12 has same value as sp before the 'stmdb'

        ------ IMark(0x4817680, 4, 0) ------
        t5 = GET:I32(56)
        t6 = 0x4:I32
        t7 = Sub32(t5,t6)
        PUT(52) = t7
        PUT(68) = 0x4817684:I32

(arm) 0x4817684: ldmdb r11, {0xAFF0}   ## load r15(==pc) from stored lr, r13(==sp) from stored r12, r11-r4 from stored original values

        ------ IMark(0x4817684, 4, 0) ------
        t8 = GET:I32(52)
        t9 = t8
        PUT(68) = LDle:I32(Sub32(t9,0x4:I32))
        PUT(60) = LDle:I32(Sub32(t9,0x8:I32))
        PUT(48) = LDle:I32(Sub32(t9,0x10:I32))
        PUT(44) = LDle:I32(Sub32(t9,0x14:I32))
        PUT(40) = LDle:I32(Sub32(t9,0x18:I32))
        PUT(36) = LDle:I32(Sub32(t9,0x1C:I32))
        PUT(32) = LDle:I32(Sub32(t9,0x20:I32))
        PUT(28) = LDle:I32(Sub32(t9,0x24:I32))
        PUT(24) = LDle:I32(Sub32(t9,0x28:I32))
        PUT(52) = LDle:I32(Sub32(t9,0xC:I32))
        PUT(68) = GET:I32(68)
        PUT(68) = GET:I32(68); exit-Boring

GuestBytes 4817678 16  0D C0 A0 E1 F0 DF 2D E9 04 B0 4C E2 F0 AF 1B E9  0028F343

VexExpansionRatio 16 952   595 :10

==26904== Invalid read of size 4
==26904==    at 0x4000E54: ??? (in /lib/ld-uClibc-0.9.33.2.so)
==26904==  Address 0x7dad09fc is on thread 1's stack
==26904==  20 bytes below stack pointer
=====

The net effect of those four instructions is:
   r0-r3 do not change; none of them was written
   r4-r10 do not change; each value is stored and fetched to/from the same
      (corresponding) address
   r11 does not change in the end; the 'sub' sets it to (r12 - 4) as the base
      for the 'ldmdb', which then reloads the value stored by the 'stmdb'
   r12 gets the original (and final) value of r13(==sp)
   r13(==sp) does not change.  It was decremented by 44 (11 registers
      times 4 bytes per register) but then loaded from the location
      which was written with the value of r12, which is the same as
      the original sp
   r14(==lr) does not change; it never was written
   r15(==pc) is loaded from the original value in r14(==lr)
      which is the return address

The bug in memcheck's error reporting is that the location "at 0x..." uses
the new value that was loaded into pc, instead of the pc of the instruction
which suffered the complaint.

The compiler bug is relying on a particular implementation of
poorly-specified hardware.  The "ldmdb r11, {0xAFF0}" reads 10 words from
memory, and changes the value of r13(==sp) among other registers.
The compiler assumes that the change to r13 does not become visible until
the entire instruction has completed, yet this is not guaranteed explicitly.
It is conceivable that the 'ldmdb' could be interrupted immediately after
writing r13(==sp), save internal state as part of servicing the interrupt,
and resume state upon return.  If so, then the fetches to load the remaining
registers are outside the boundary of the stack (namely, less than the
downward-growing sp), and that's a memcheck error.

On the other hand, no known hardware allows such an interrupt (all side
effects are "atomic"), so the memcheck implementation is not faithful:
it uses the new value of r13(==sp) to check the subsequent memory fetches
for the other registers before the 'ldmdb' instruction ends.

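To make the register bookkeeping above easier to follow, here is a minimal Python sketch that replays the four instructions in the order of the IR statements in the trace. It is only an illustration, not Valgrind code. The function name, the arbitrary initial register values, and the two constants are assumptions: ORIG_SP is the reported address plus 20, and RETURN_ADDR is taken from the "at" line of the report on the assumption that it shows the loaded pc.

```python
# Illustrative replay of the four ARM instructions, in VEX IR statement order.
ORIG_SP = 0x7DAD0A10      # assumption: reported address 0x7dad09fc + 20
RETURN_ADDR = 0x4000E54   # assumption: the "at" address is the loaded pc


def run_sequence(orig_sp=ORIG_SP, lr=RETURN_ADDR):
    regs = {n: 0x1000 + n for n in range(16)}  # arbitrary distinct values
    regs[13], regs[14], regs[15] = orig_sp, lr, 0x4817678
    before = dict(regs)
    mem = {}
    below_sp = []  # (register, bytes below the sp in effect at load time)

    # mov r12, r13
    regs[12] = regs[13]

    # stmdb r13!, {0xDFF0}: sp is decremented first, then 11 words stored
    base = regs[13]
    regs[13] = base - 0x2C
    mem[base - 0x4] = 0x4817684  # pc is stored as instruction address + 8
    stores = [(0x8, 14), (0xC, 12), (0x10, 11), (0x14, 10), (0x18, 9),
              (0x1C, 8), (0x20, 7), (0x24, 6), (0x28, 5), (0x2C, 4)]
    for off, r in stores:
        mem[base - off] = regs[r]

    # sub r11, r12, #0x4
    regs[11] = regs[12] - 4

    # ldmdb r11, {0xAFF0}: pc, then sp, then r10..r4, and r11 (the base) last
    base = regs[11]
    loads = [(0x4, 15), (0x8, 13), (0x10, 10), (0x14, 9), (0x18, 8),
             (0x1C, 7), (0x20, 6), (0x24, 5), (0x28, 4), (0xC, 11)]
    for off, r in loads:
        addr = base - off
        if addr < regs[13]:  # checked against the sp already updated mid-instruction
            below_sp.append((r, regs[13] - addr))
        regs[r] = mem[addr]

    return before, regs, below_sp


if __name__ == "__main__":
    before, after, below_sp = run_sequence()
    print("changed registers:", [r for r in range(16) if before[r] != after[r]])
    print("loads below the updated sp:", below_sp)
```

In this model only r12 and pc end up different from their starting values. Every load performed after sp has been reloaded falls below the restored sp, and the first of them is the load of r10, 20 bytes below sp, which matches the offset in the memcheck report.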
The compiler's choice of storing and re-loading r4-r11 is horribly
inefficient: 8 writes and 8 reads that only waste time.  The value stored
from r15(==pc) via the 'stmdb' never is read.  The entire sequence could be
replaced by "bx lr" or "mov pc,lr" (possibly preceded by "mov r12,sp"),
except that 'bx' is not implemented in some early hardware, and "mov pc,lr"
is frowned upon in hardware that does have 'bx'.  One possible solution that
works everywhere is to use "blx lr" and just ignore the value that is
written to lr.
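The counts above can be checked from the two register-list masks in the trace. This is a small Python sketch for illustration only. The helper names reg_list and slots_below are hypothetical, and the slot offsets assume the usual LDM/STM layout where the highest-numbered register sits at the highest address.

```python
def reg_list(mask):
    """Register numbers selected by a 16-bit ARM register-list mask."""
    return [n for n in range(16) if mask & (1 << n)]


def slots_below(regs, first_offset):
    """Map 'bytes below the original sp' to register, highest register first."""
    ordered = sorted(regs, reverse=True)
    return {first_offset + 4 * i: r for i, r in enumerate(ordered)}


STMDB_MASK = 0xDFF0  # from "stmdb r13!, {0xDFF0}" in the trace
LDMDB_MASK = 0xAFF0  # from "ldmdb r11, {0xAFF0}" in the trace

stored = reg_list(STMDB_MASK)  # registers written to the stack
loaded = reg_list(LDMDB_MASK)  # registers read back from the stack

# Registers in r4-r11 that are both stored and reloaded unchanged
round_trip = [r for r in stored if r in loaded and 4 <= r <= 11]

store_slots = slots_below(stored, 4)  # stmdb base is the original sp
load_slots = slots_below(loaded, 8)   # ldmdb base is r11 = original sp - 4
never_read = sorted(set(store_slots) - set(load_slots))

if __name__ == "__main__":
    print("stored:", len(stored), "loaded:", len(loaded))
    print("round trip:", round_trip)
    print("slots never read:", [(off, store_slots[off]) for off in never_read])
```

With these masks the 'stmdb' writes 11 words and the 'ldmdb' reads 10. Eight of them are the r4-r11 round trip, and the only stored slot that is never read is the one holding the pc value.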