|
From: Stephane H. <sho...@ni...> - 2007-06-12 23:39:50
|
I don't know if this is a known issue, a problem with our code, or a (so far) unknown bug in valgrind.

On a few projects we're working on that use both threads and fibers, valgrind starts to spew tons of invalid-access errors in the middle of allocated objects that have definitely not been freed, such as:

==22128== 6 errors in context 51 of 185:
==22128== Thread 1:
==22128== Invalid read of size 4
==22128==    at 0x4047C80: pthread_cond_signal@@GLIBC_2.3.2 (in /lib/libpthread-2.4.so)
==22128==    by 0x8053D74: DriverProcess(LDI_PATCHER*, unsigned, unsigned) (patcher.cpp:820)
==22128==    by 0x804C4EC: Main_Loop() (main_loop.cpp:59)
==22128==    by 0x804C967: main (base_kernel.cpp:154)
==22128==  Address 0x6F68D08 is 736 bytes inside a block of size 788 alloc'd
==22128==    at 0x402192C: operator new[](unsigned) (vg_replace_malloc.c:195)

==22128== 7 errors in context 54 of 185:
==22128== Invalid read of size 4
==22128==    at 0x4498DF5: swapcontext (in /lib/libc-2.4.so)
==22128==  Address 0x6F68F7C is 108 bytes inside a block of size 408 alloc'd
==22128==    at 0x402192C: operator new[](unsigned) (vg_replace_malloc.c:195)

==22128== 7 errors in context 57 of 185:
==22128== Invalid read of size 4
==22128==    at 0x4498DEC: swapcontext (in /lib/libc-2.4.so)
==22128==  Address 0x6F68F6C is 92 bytes inside a block of size 408 alloc'd
==22128==    at 0x402192C: operator new[](unsigned) (vg_replace_malloc.c:195)

==22128== 33053 errors in context 184 of 185:
==22128== Invalid read of size 4
==22128==    at 0x804F289: ReadAnswerPacket(XML_Node&, Fiber*) (resource_relay.cpp:87)
==22128==  Address 0x6F68F2C is 28 bytes inside a block of size 408 alloc'd
==22128==    at 0x402192C: operator new[](unsigned) (vg_replace_malloc.c:195)

etc...

==22128== IN SUMMARY: 171777 errors from 185 contexts (suppressed: 11 from 1)

Could it be that valgrind chokes on fibers implemented with swapcontext?
I tried with the latest version (valgrind-3.2.3) and an older one (valgrind-3.1.1).

gcc (GCC) 4.1.0
Linux shockenhull 2.6.16.13-4-smp #1 SMP Wed May 3 04:53:23 UTC 2006 i686 athlon i386 GNU/Linux

--
Stephane Hockenhull
SSC-Studios.com
|
From: Christoph B. <bar...@or...> - 2007-06-13 08:09:00
|
On Wednesday, 13 June 2007, Stephane Hockenhull wrote:
> [...]
> ==22128== Invalid read of size 4
> ==22128==    at 0x4047C80: pthread_cond_signal@@GLIBC_2.3.2 (in /lib/libpthread-2.4.so)
> ==22128==    by 0x8053D74: DriverProcess(LDI_PATCHER*, unsigned, unsigned) (patcher.cpp:820)
> [...]
> ==22128==  Address 0x6F68D08 is 736 bytes inside a block of size 788 alloc'd

Valgrind does not complain that the objects have already been freed. Are you sure that all objects are correctly initialized? For example, the mutex and the condition variable used here?

Christoph
|
From: Stephane H. <sho...@ni...> - 2007-06-13 16:23:02
|
On Wednesday 13 June 2007 04:08, Christoph Bartoschek wrote:
> Valgrind does not complain that the objects are already freed. Are you sure
> that all objects are correctly initialized? For example the mutex and the
> condition variable used here?

In that case it should say "use of uninitialized value", not "invalid access". Also:

==19109== Invalid write of size 4
==19109==    at 0x4415D74: swapcontext (in /lib/libc-2.4.so)
==19109==  Address 0x7F921C8 is 112 bytes inside a block of size 428 alloc'd
==19109==    at 0x402192C: operator new[](unsigned) (vg_replace_malloc.c:195)

If this were really an error, it should crash hard in either case: doing a swapcontext on random data (i.e. all CPU registers, including the instruction pointer and the stack pointer, become junk) and having it succeed is like winning the lottery... and I'm still not a millionaire ;) But when I run without valgrind the application churns away happily, and even under valgrind it complains but still runs fine.
And as I said, the data is not freed: there is no delete on that object in the code, as it is part of the application itself, and valgrind seems to confirm it isn't freed.

Maybe the problem isn't swapcontext-related but something else; in any case, it seems we're doing something that makes valgrind go paranoid. We believe there's a bug in our code that totally messes up valgrind. Also, we're not using client requests (memcheck.h) in our code.

We're actively investigating it. If some valgrind developers could give us some theories on what could cause this, it would help greatly.

--
Stephane Hockenhull
SSC-Studios.com
|
From: Stephane H. <sho...@ni...> - 2007-06-13 17:27:46
|
On Wednesday 13 June 2007 04:08, Christoph Bartoschek wrote:
> Are you sure that all objects are correctly initialized? For example the
> mutex and the condition variable used here?

It might be related to bug 125824:
http://bugs.kde.org/show_bug.cgi?id=125824

libpthread seems to cause trouble with valgrind. Can anyone confirm this?

--
Stephane Hockenhull
SSC-Studios.com
|
From: Julian S. <js...@ac...> - 2007-06-18 11:46:42
|
> It might be related to bug 125824:
> http://bugs.kde.org/show_bug.cgi?id=125824
>
> libpthread seems to cause trouble with valgrind. Can anyone confirm this?

No, I don't think so. It's much more likely to be a result of swapcontext. Maybe you can make a small test case?

J
|
From: Julian S. <js...@ac...> - 2007-06-18 11:33:37
|
Are you using self-modifying code or dynamic code generation? If yes, does --smc-check=all help?

It might be something to do with swapcontext. We've had strangeness with that before, although I can't immediately think how it would cause this problem.

J

On Wednesday 13 June 2007 00:38, Stephane Hockenhull wrote:
> I dont know if this is a known issue, a problem with our code, or an (up to
> now) unknown bug in valgrind.
> [...]
> could it be that valgrind chokes on fibers implemented with swapcontext ?
|
From: Stephane H. <sho...@ni...> - 2007-06-18 16:09:14
|
We're not using self-modifying code. I will try with --smc-check=all.

For now I've added a #define that removes all calls to swapcontext and replaces them with fibers made of threads: each thread tries to lock a single mutex and checks whether it is its turn, so they all act as fibers (quite inefficient, but they do act like fibers). That fixed all the weird stuff happening with swapcontext. valgrind reported 0 access errors, apart from a harmless but sloppy bug in the OpenGL library: an X+4-byte access to an X+3-byte allocated buffer, which malloc rounds up anyway (still sloppy coding, tsk tsk tsk ;) ).

So there's definitely something going on with swapcontext + threads.

We've found out that even without valgrind odd things happen when we use a mix of fibers and threads (there's an assert to make sure fibers of another thread don't get run accidentally; I checked that possibility already). But valgrind is no help in figuring it out. I'll make sure to post the bug when I figure it out.

On Monday 18 June 2007 07:32, Julian Seward wrote:
> Are you using self modifying code or dynamic code generation?
> If yes, does --smc-check=all help?
>
> It might be something to do with swapcontext. We've had strangeness
> with that before, although I can't immediately think how it would
> cause this problem.
>
> J
> [...]

--
Stephane Hockenhull
SSC-Studios.com
|
From: Stephane H. <sho...@ni...> - 2007-07-10 15:44:36
|
We found the problem: fiber stack switches mess up valgrind's initialized+accessible flags. We tried both Linux's built-in makecontext/swapcontext and our own custom version, and the problem comes up with both.

We have small stack buffers allocated among other data. When we execute the stack switch, valgrind clears the "accessible" bits over a large area that includes valid data. It also flags data above the stack pointer as uninitialized, even though it was initialized and accessed before.

[data][data][data][ allocated fiber_stack ][data][data]

The fiber stack in our code on x86 IA32 gets initialized as:

----- top -----
[parameterN]
...
[parameter2]
[parameter1]
[zero (null return address of fiber proc)]
[fiber entry point (injected return address of our SwapContext)]
[zero ebp]
[zero ebx]
[zero esi]
[zero edi]   <------------ context's initial ESP
[zeroes]
...
--- bottom ---
... here is other data in the application's heap and/or .bss section ...
-----

So the first SwapContext switches ESP, pops its callee-saved registers and then the return address, which jumps to the start of the fiber proc.

This is perfectly sane code, but the initial stack switch causes valgrind to clear the initialized bits above the initial ESP, so all parameters are considered uninitialized values, and data below the stack gets marked as inaccessible, so all valid accesses to data on the heap just below the fiber stack report bogus errors.

We would need a way to explicitly tell valgrind to mark an area as a stack region, and to remove that ugly hack where valgrind does that bogus automatic detection. This could be done directly inside a redirected makecontext, and in custom code as well. It would also allow valgrind to detect fiber stack overflow/underflow.

On Tuesday 12 June 2007 19:38, Stephane Hockenhull wrote:
> I dont know if this is a known issue, a problem with our code, or an (up to
> now) unknown bug in valgrind.
> [...]

--
Stephane Hockenhull
SSC-Studios.com
|
From: Nicholas N. <nj...@cs...> - 2007-07-10 22:13:53
|
On Tue, 10 Jul 2007, Stephane Hockenhull wrote:
> this is perfectly sane code, but the initial stack switch causes valgrind to
> clear the initialized bits above the initial ESP so all parameters are
> considered as uninitialized values, and data below the stack gets marked as
> unaccessible so all valid accesses to data on the heap just below the fiber
> stack reports bogus errors.

Distinguishing stack switches from large stack allocations is tricky. The --max-stackframe option might be helpful.

> we would need a way to explicitly tell valgrind to mark an area as a stack
> region and remove that ugly hack where valgrind does that bogus automatic
> detection.

If the above fails, the VALGRIND_STACK_REGISTER client request and its friends may be of help. See http://www.valgrind.org/docs/manual/manual-core.html#manual-core.clientreq.

Nick