|
From: Patrick S. <ps...@ci...> - 2007-12-04 15:08:16
|
Hi,

I'm trying to use the VALGRIND_NON_SIMD_CALL0() macro to cause a function
to be called natively (i.e. not simulated). However, when I do this,
Valgrind exits with signal 11 (SIGSEGV).

I have tried this with both valgrind 3.2.3 and the latest code from
Subversion, and get the same results. I am using Fedora Core 6 (but see
the same problem with Ubuntu 6.06).

I've searched the mailing list archives, and there are posts there that
mention a similar problem, but in those cases it was due to the function
passed to the VALGRIND_NON_SIMD_CALL*() macros not taking enough
arguments (there is always an extra thread_id parameter, so the CALLn
macro needs a function with n+1 arguments). However, I've accounted for
this in my code (example code below), and I'm still seeing the crash.

The crash only seems to occur if certain library functions are called. For
example, if the code below is run as-is, everything works fine, but if the
call to printf() is uncommented, the crash occurs.

Is this a bug in Valgrind, or is there a limitation on the sort of
functions that can be called? If the latter, what is the nature of the
limitation, and are there any workarounds?

Many thanks for your help,
Patrick
Example code
============
#include <stdio.h>
#include <valgrind/valgrind.h>
int called = 0;
int test_func(unsigned int thread_id)
{
called = 1;
//printf("Called\n");
return (0);
}
int main(int argc, char *argv[])
{
VALGRIND_NON_SIMD_CALL0(test_func);
printf("called = %d\n", called);
return (0);
}
Results of running under valgrind (printf disabled as above) - OK
============================================================
==2534== Memcheck, a memory error detector.
==2534== Copyright (C) 2002-2007, and GNU GPL'd, by Julian Seward et al.
==2534== Using LibVEX rev exported, a library for dynamic binary translation.
==2534== Copyright (C) 2004-2007, and GNU GPL'd, by OpenWorks LLP.
==2534== Using valgrind-3.3.0.RC1, a dynamic binary instrumentation framework.
==2534== Copyright (C) 2000-2007, and GNU GPL'd, by Julian Seward et al.
==2534== For more details, rerun with: -v
==2534==
called = 1
==2534==
==2534== ERROR SUMMARY: 0 errors from 0 contexts (suppressed: 12 from 1)
==2534== malloc/free: in use at exit: 0 bytes in 0 blocks.
==2534== malloc/free: 0 allocs, 0 frees, 0 bytes allocated.
==2534== For counts of detected errors, rerun with: -v
==2534== All heap blocks were freed -- no leaks are possible.
Results of running under valgrind (printf enabled) - segfaults
==================================================
==2750== Memcheck, a memory error detector.
==2750== Copyright (C) 2002-2007, and GNU GPL'd, by Julian Seward et al.
==2750== Using LibVEX rev exported, a library for dynamic binary translation.
==2750== Copyright (C) 2004-2007, and GNU GPL'd, by OpenWorks LLP.
==2750== Using valgrind-3.3.0.RC1, a dynamic binary instrumentation framework.
==2750== Copyright (C) 2000-2007, and GNU GPL'd, by Julian Seward et al.
==2750== For more details, rerun with: -v
==2750==
--2750-- VALGRIND INTERNAL ERROR: Valgrind received a signal 11 (SIGSEGV) - exiting
--2750-- si_code=80; Faulting address: 0x0; sp: 0x627D4E74
valgrind: the 'impossible' happened:
Killed by fatal signal
==2750== at 0x47692399: puts (ioputs.c:37)
==2750== by 0x804844F: test_func (in /home/psmears/valgrind-test/a.out)
==2750== by 0x38034961: do_client_request (scheduler.c:1261)
==2750== by 0x38035F4B: vgPlain_scheduler (scheduler.c:979)
==2750== by 0x380497A8: run_a_thread_NORETURN (syswrap-linux.c:89)
sched status:
running_tid=1
Thread 1: status = VgTs_Runnable
==2750== at 0x80484A4: main (in /home/psmears/valgrind-test/a.out)
Note: see also the FAQ.txt in the source distribution.
It contains workarounds to several common problems.
If that doesn't help, please report this bug to: www.valgrind.org
In the bug report, send all the above text, the valgrind
version, and what Linux distro you are using. Thanks.
|
|
From: Julian S. <js...@ac...> - 2007-12-04 15:17:29
|
> I'm trying to use the VALGRIND_NON_SIMD_CALL0() macro to cause a function
> to be called natively (i.e. not simulated).

That's all highly fragile. I suggest you don't do that. What problem
are you really trying to solve? Why do you need to run bits of code
natively?

J
|
From: Patrick S. <ps...@ci...> - 2007-12-04 15:32:58
|
On Tue, 4 Dec 2007, Julian Seward wrote:
>> I'm trying to use the VALGRIND_NON_SIMD_CALL0() macro to cause a function
>> to be called natively (i.e. not simulated).
>
> That's all highly fragile. I suggest you don't do that. What problem
> are you really trying to solve? Why do you need to run bits of code
> natively?

Heh... that's a fair question :)

The code in question uses shared memory for IPC. This works fine with
Valgrind, provided we give it enough hints to know when memory has been
initialised by another process - except for synchronisation: since
valgrind breaks up atomic memory accesses, we see concurrency problems
under valgrind that don't otherwise appear.

The idea was to run the (relatively small, and so "trustworthy")
synchronisation functions natively, to get around the atomicity problem.
From what you say it sounds like that might not be as easy a fix as I'd
hoped - I'd be very grateful for any better suggestions :)

Thanks for your help,
Patrick
|
From: Ashley P. <api...@co...> - 2007-12-04 16:55:10
|
On Tue, 2007-12-04 at 15:32 +0000, Patrick Smears wrote:
> On Tue, 4 Dec 2007, Julian Seward wrote:
>
> >> I'm trying to use the VALGRIND_NON_SIMD_CALL0() macro to cause a function
> >> to be called natively (i.e. not simulated).
> >
> > That's all highly fragile. I suggest you don't do that. What problem
> > are you really trying to solve? Why do you need to run bits of code
> > natively?
>
> Heh... that's a fair question :)
>
> The code in question uses shared memory for IPC. This works fine with
> Valgrind, provided we give it enough hints to know when memory has been
> initialised by another process - except for synchronisation: since
> valgrind breaks up atomic memory accesses, we see concurrency problems
> under valgrind that don't otherwise appear.
>
> The idea was to run the (relatively small, and so "trustworthy")
> synchronisation functions natively, to get around the atomicity problem.
> From what you say it sounds like that might not be as easy a fix as I'd
> hoped - I'd be very grateful for any better suggestions :)

I've been doing this successfully for years without problem. Our shared
memory code is a mix of staging buffers, producer-consumer FIFOs and
home-spun synchronisation primitives which rely on polling on whole
dwords.

Perhaps you could implement your locking using the bakery algorithm,
which doesn't require atomics?

Ashley,
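Ashley's suggestion can be sketched as follows: a minimal C11 version of Lamport's bakery lock, which needs only loads and stores of individual words rather than a read-modify-write instruction such as compare-and-swap. This is not Ashley's code. It assumes a fixed number of participants (BAKERY_N, hypothetical), that each process knows its own slot number, that struct bakery sits in the shared-memory segment, and that atomic_bool and atomic_uint are lock-free on the target (see ATOMIC_BOOL_LOCK_FREE and ATOMIC_INT_LOCK_FREE), which is what makes them usable across processes in practice. The sequentially consistent atomic_load/atomic_store calls supply the store-to-load ordering the algorithm depends on; on x86 that ordering needs a fence, and some compilers implement a seq_cst store with an implicitly locked xchg. Ticket overflow is ignored.

```c
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical fixed number of processes sharing the lock. */
#define BAKERY_N 4

/* Intended to live in the shared-memory segment. */
struct bakery {
    atomic_bool choosing[BAKERY_N];
    atomic_uint number[BAKERY_N];   /* 0 means "not trying to enter" */
};

static void bakery_init(struct bakery *b)
{
    for (int i = 0; i < BAKERY_N; i++) {
        atomic_init(&b->choosing[i], false);
        atomic_init(&b->number[i], 0u);
    }
}

/* 'me' is this process's slot, 0 <= me < BAKERY_N. */
static void bakery_lock(struct bakery *b, int me)
{
    unsigned max = 0;

    /* Take a ticket one higher than any ticket currently held. */
    atomic_store(&b->choosing[me], true);
    for (int j = 0; j < BAKERY_N; j++) {
        unsigned n = atomic_load(&b->number[j]);
        if (n > max)
            max = n;
    }
    atomic_store(&b->number[me], max + 1);
    atomic_store(&b->choosing[me], false);

    /* Wait for every process holding a smaller (ticket, slot) pair. */
    for (int j = 0; j < BAKERY_N; j++) {
        if (j == me)
            continue;
        while (atomic_load(&b->choosing[j]))
            ;   /* j is still picking its ticket: poll */
        for (;;) {
            unsigned nj = atomic_load(&b->number[j]);
            unsigned nme = atomic_load(&b->number[me]);
            if (nj == 0 || nj > nme || (nj == nme && j > me))
                break;   /* j is not ahead of us */
        }
    }
}

static void bakery_unlock(struct bakery *b, int me)
{
    atomic_store(&b->number[me], 0u);
}
```

Each process calls bakery_lock(b, slot) before its critical section and bakery_unlock(b, slot) afterwards; all waiting is done by polling whole words, in the spirit of the primitives Ashley describes.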
|
From: Julian S. <js...@ac...> - 2007-12-05 21:06:44
|
> >> Are there any guidelines for stuff that's likely to work with
> >> NON_SIMD_CALL, or is it just a case of "it works if it works"?
> >
> > That would be one for Julian although given the scale of the address
> > space magic that valgrind does I'd expect "if it works you're lucky" to
> > be a reasonable starting point. Perhaps you could construct your
> > synchronisation primitives such that they are inside their own self
> > contained functions and didn't do any memory accesses beyond what is
> > absolutely necessary?
>
> I'll wait for Julian's input, but it's sounding like whatever happens I'm
> going to have to replace the synchronisation one way or the other :)

Sorry not to have been very inputful on this so far.

I can't claim to have great insight into the robustness aspects
of the NON_SIMD_CALL mechanism. However, looking at your original message,
it occurs to me your prospects of that working are improved if you
ensure that the NON_SIMD_CALL'd code (test_func) does not refer to any
global variables, and certainly does not refer to any libc or other
functions (printf et al). Any kind of entanglement with libc or
dynamic linking is likely to have a bad outcome, for gnarly reasons
which we've grappled with a lot in the distant past.

If you simply make test_func be a wrapper around a LOCK-prefixed
instruction and literally nothing else, your prospects might improve
(or not, YMMV :-) Worth a try, I'd say.

AIUI the lock-prefixed insns (etc) are actually the only things that
you absolutely can't run on the simulator, right?

J
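A minimal sketch of the wrapper Julian suggests, for Valgrind versions that still split atomic instructions. It assumes GCC or Clang: __sync_val_compare_and_swap is a compiler builtin that becomes a lock cmpxchg on x86/amd64, and __has_include is used so the file also builds where valgrind.h is absent. The names native_cas and shared_cas are hypothetical. Following the n+1 rule from the original question, VALGRIND_NON_SIMD_CALL3 is given a four-parameter function whose first parameter receives the thread id; client-request arguments travel as machine words, so the pointer is passed as an unsigned long. The wrapper touches no globals and calls nothing in libc. When the program is not running under Valgrind, the client request would simply return its default of 0 without calling anything, so the sketch calls the function directly in that case.

```c
#if defined(__has_include)
#  if __has_include(<valgrind/valgrind.h>)
#    include <valgrind/valgrind.h>
#    define HAVE_VALGRIND_H 1
#  endif
#endif

/* Called as fn(tid, arg1, arg2, arg3), so CALL3 needs four parameters.
   No globals, no libc: just one LOCK-prefixed instruction (on x86). */
static unsigned long native_cas(unsigned int tid, unsigned long addr,
                                unsigned long expected, unsigned long desired)
{
    (void)tid;   /* thread id supplied by Valgrind; unused here */
    return __sync_val_compare_and_swap((volatile unsigned long *)addr,
                                       expected, desired);
}

/* Compare-and-swap on shared memory; returns the previous value of *addr. */
static unsigned long shared_cas(volatile unsigned long *addr,
                                unsigned long expected, unsigned long desired)
{
#ifdef HAVE_VALGRIND_H
    if (RUNNING_ON_VALGRIND)
        /* Run native_cas on the real CPU so the instruction stays atomic. */
        return VALGRIND_NON_SIMD_CALL3(native_cas, addr, expected, desired);
#endif
    /* Not under Valgrind (or no header): just call it. */
    return native_cas(0, (unsigned long)addr, expected, desired);
}
```

As Patrick notes later in the thread, the tool does not track memory accesses made inside the natively executed function, so the write performed by the compare-and-swap is invisible to Memcheck.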
|
From: Madhan S. <mad...@gm...> - 2007-12-07 06:56:00
|
> If you simply make test_func be a wrapper around a LOCK-prefixed
> instruction and literally nothing else, your prospects might improve
> (or not, YMMV :-) Worth a try, I'd say.

This actually works very well. I have all sorts of LOCK prefix usage
and NON_SIMD_CALL has been working perfectly fine, right from
Valgrind 2.4.1 to 3.2.3 version.

As Julian pointed out, in my case the functions have almost nothing
else other than the assembly code.

Thanks,
Madhan.
|
From: Nicholas N. <nj...@cs...> - 2007-12-07 07:06:37
|
On Fri, 7 Dec 2007, Madhan Sadasivam wrote:
> This actually works very well. I have all sorts of LOCK prefix usage
> and NON_SIMD_CALL has been working perfectly fine, right from
> Valgrind 2.4.1 to 3.2.3 version.
>
> As Julian pointed out, in my case the functions have almost nothing
> else other than the assembly code.

I think it's using libc functions in the NON_SIMD_CALLs that is the problem.

Nick
|
From: Patrick S. <ps...@ci...> - 2007-12-11 18:17:42
|
Hi all,

> Sorry not to have been very inputful on this so far.
>
> I can't claim to have great insight into the robustness aspects
> of the NON_SIMD_CALL mechanism. However, looking at your original message,
> it occurs to me your prospects of that working are improved if you
> ensure that the NON_SIMD_CALL'd code (test_func) does not refer to any
> global variables, and certainly does not refer to any libc or other
> functions (printf et al). Any kind of entanglement with libc or
> dynamic linking is likely to have a bad outcome, for gnarly reasons
> which we've grappled with a lot in the distant past.
>
> If you simply make test_func be a wrapper around a LOCK-prefixed
> instruction and literally nothing else, your prospects might improve
> (or not, YMMV :-) Worth a try, I'd say.
>
> AIUI the lock-prefixed insns (etc) are actually the only things that
> you absolutely can't run on the simulator, right?

That's certainly my understanding. The LOCK instructions are buried several
function layers down from where they'd most conveniently be wrapped, amongst
a number of libc calls, so extracting them isn't trivial (alas), but
certainly possible.

I'll put my thinking cap on about this and the other suggestions - thanks to
all who have responded! - and try to figure out what will work best for this
particular situation. If I come up with anything that might be of general
use, I'll report back...

Thanks again,
Patrick
|
From: Patrick S. <ps...@ci...> - 2009-07-24 15:00:12
|
OK - to follow up on a blast from the past:

On Tue, 11 Dec 2007, Patrick Smears wrote:
> Hi all,
>
>> Sorry not to have been very inputful on this so far.
>>
>> I can't claim to have great insight into the robustness aspects
>> of the NON_SIMD_CALL mechanism. However, looking at your original message,
>> it occurs to me your prospects of that working are improved if you
>> ensure that the NON_SIMD_CALL'd code (test_func) does not refer to any
>> global variables, and certainly does not refer to any libc or other
>> functions (printf et al). Any kind of entanglement with libc or
>> dynamic linking is likely to have a bad outcome, for gnarly reasons
>> which we've grappled with a lot in the distant past.
>>
>> If you simply make test_func be a wrapper around a LOCK-prefixed
>> instruction and literally nothing else, your prospects might improve
>> (or not, YMMV :-) Worth a try, I'd say.
>>
>> AIUI the lock-prefixed insns (etc) are actually the only things that
>> you absolutely can't run on the simulator, right?
>
> That's certainly my understanding. The LOCK instructions are buried several
> function layers down from where they'd most conveniently be wrapped, amongst
> a number of libc calls, so extracting them isn't trivial (alas), but
> certainly possible.
>
> I'll put my thinking cap on about this and the other suggestions - thanks to
> all who have responded! - and try to figure out what will work best for this
> particular situation. If I come up with anything that might be of general
> use, I'll report back...

OK, for the record, here is what I did.

To recap for anyone who missed the original discussion, the problem was
that I wanted to run Valgrind on a process that communicates with a
number of other processes via shared memory, and uses synchronisation
primitives (mutexes etc) in that shared memory. The synchronisation
primitives rely on using certain assembler instructions that perform
atomic operations (e.g. read a location, and if its value is equal to
register A, set it to register B, but don't let anyone change it between
comparing it to A and setting it to B). Because of the way Valgrind works
(simulating instructions), the atomic instructions get broken up into
separate loads/stores, meaning that they're no longer atomic (i.e. it's
possible for someone to change the location in question after it was
compared to register A, but before it was set to register B). The upshot
of this is that race conditions are introduced when running under
Valgrind that do not otherwise exist - and this leads to the process
becoming deadlocked :-(.

A number of potential solutions were suggested on the mailing list
(thanks!), but the one I decided to go with was the one from Julian
quoted above - to replace each atomic instruction with a function that
just uses that atomic instruction, and then call it using one of the
VALGRIND_NON_SIMD_CALL*() macros (which calls the function on the 'real'
CPU - rather than simulating it - and so you get the atomicity). Of
course, this means that no tracking is done of memory/cache/whatever used
by the atomic instruction, but this is acceptable in this application
(and probably most others).

Now, the locking primitives used by the application under test are
(wrappers round wrappers round) the standard pthread_*() calls, which are
provided (on Linux) by glibc, in the form of the nptl libpthread
library[1]. So the easiest[2] thing to do was to modify the nptl routines
to use VALGRIND_NON_SIMD_CALL*() when using atomic instructions. It turns
out that this isn't too hard - all the pthread_* synchronisation routines
in NPTL are built on top of a simpler, mutex-like primitive - the
"lowlevellock" - so only the implementation of that (and the atomic
instructions used) needs to be modified. (In fact that's not totally
true - some calls have hand-coded optimised assembler routines for
specific CPUs - but those can be disabled, leaving a simpler C
implementation in terms of lowlevellocks.)

This turned out to work beautifully - no more deadlocks, but plenty of
bugs found/fixed in the software under test :-)

In case anyone wants to use this, I have made the relevant files
available. I started with the RPM for the RHEL4 version of the C library
(since that was what was running on the servers in question - though it
is rather long in the tooth now). The original source RPM, a .spec file
that will make the modifications, plus the necessary files that it will
copy in, I've placed at http://valgrind.smears.org/

If anyone has any questions about how this all works etc I'll do my best
to answer them...

Patrick

[1] There also exists the older LinuxThreads pthread implementation, and
indeed other C libraries, but the system I needed to use was using nptl,
so that was what I used.
[2] For some definition of 'easiest'
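The failure mode Patrick describes can be modelled deterministically in a single thread. In this hypothetical sketch, split_cas performs the load, the comparison and the store as separate steps, and a callback (between) stands in for another process writing to the lock word in the gap; atomic_cas uses GCC's __sync_bool_compare_and_swap builtin, which performs the comparison and the store indivisibly. It illustrates the race only; it is not how Valgrind actually translates these instructions.

```c
#include <stdbool.h>

/* Hypothetical callback standing in for another process that runs
   between two of our memory operations. */
typedef void (*interleave_fn)(volatile long *loc);

/* Model of a compare-and-swap split into separate steps.
   Anything written by 'between' is silently overwritten. */
static bool split_cas(volatile long *loc, long a, long b, interleave_fn between)
{
    long seen = *loc;        /* separate load */
    if (between)
        between(loc);        /* another process writes in the gap */
    if (seen != a)           /* compare against the now-stale value */
        return false;
    *loc = b;                /* separate store */
    return true;
}

/* The real instruction: compare and store as one indivisible step. */
static bool atomic_cas(volatile long *loc, long a, long b)
{
    return __sync_bool_compare_and_swap(loc, a, b);
}

/* Hypothetical competitor: takes a free lock word by storing its id, 2. */
static void competitor_takes_lock(volatile long *loc)
{
    *loc = 2;
}
```

With split_cas, the caller believes it has acquired a lock that the competitor took first, and the competitor's write is lost. With atomic_cas, the competitor's write can only land before or after the operation, so the late caller correctly fails.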
|
From: Julian S. <js...@ac...> - 2009-07-24 15:47:36
|
> anyone change it between comparing it to A and setting it to B). Because
> of the way Valgrind works (simulating instructions), the atomic
> instructions get broken up into separate loads/stores, meaning that
> they're no longer atomic (i.e. it's possible for someone to change the
> location in question after it was compared to register A, but before it
> was set to register B).

That's fixed now, in the trunk (not 3.4.x). If you put in atomic
instructions then you should get out (something equivalent to)
atomic instructions. So V should now play nice with other processes
communicating via shared memory.

J
|
From: Patrick S. <ps...@ci...> - 2009-07-24 15:49:21
|
On Fri, 24 Jul 2009, Julian Seward wrote:
>
>> anyone change it between comparing it to A and setting it to B). Because
>> of the way Valgrind works (simulating instructions), the atomic
>> instructions get broken up into separate loads/stores, meaning that
>> they're no longer atomic (i.e. it's possible for someone to change the
>> location in question after it was compared to register A, but before it
>> was set to register B).
>
> That's fixed now, in the trunk (not 3.4.x). If you put in atomic
> instructions then you should get out (something equivalent to)
> atomic instructions. So V should now play nice with other processes
> communicating via shared memory.

Cool, that's good to know :) Is there a place I can read up on how this
works?

Patrick
|
From: Julian S. <js...@ac...> - 2009-07-24 16:13:56
|
> > That's fixed now, in the trunk (not 3.4.x). If you put in atomic
> > instructions then you should get out (something equivalent to)
> > atomic instructions. So V should now play nice with other processes
> > communicating via shared memory.
>
> Cool, that's good to know :) Is there a place I can read up on how this
> works?

Not really.

On x86 and amd64, all atomic instructions are translated using a loop,
which fetches the old value, computes the new value, and uses a
compare-and-swap to either stuff the new value in, or detect that the old
value has changed, in which case it starts over.

On ppc32/64, lwarx/stwcx. are simply carried through the JIT pipeline
and re-emitted (more or less) as-is.

J
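The x86/amd64 scheme Julian outlines can be pictured at C level using a locked add (fetch-and-add) as the example instruction. This is only an illustration of the idea, with GCC's __sync_bool_compare_and_swap builtin standing in for the compare-and-swap; Valgrind performs the equivalent transformation on its own intermediate representation, not in C, and fetch_add_via_cas is a hypothetical name.

```c
/* Fetch-and-add built from a compare-and-swap retry loop.
   Returns the value that was in *loc before the addition. */
static unsigned long fetch_add_via_cas(volatile unsigned long *loc,
                                       unsigned long delta)
{
    for (;;) {
        unsigned long old = *loc;             /* fetch the old value */
        unsigned long new_val = old + delta;  /* compute the new value */
        /* Store new_val only if *loc still holds old; if another writer
           changed it in the meantime, the CAS fails and we start over. */
        if (__sync_bool_compare_and_swap(loc, old, new_val))
            return old;
    }
}
```

Because the final store is conditional on the location still holding the value that was read, an update from another process in between is never overwritten; it just causes one more trip round the loop.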
|
From: Patrick S. <ps...@ci...> - 2009-07-27 09:03:22
|
On Fri, 24 Jul 2009, Patrick Smears wrote:
> On Fri, 24 Jul 2009, Julian Seward wrote:
>
>>> anyone change it between comparing it to A and setting it to B). Because
>>> of the way Valgrind works (simulating instructions), the atomic
>>> instructions get broken up into separate loads/stores, meaning that
>>> they're no longer atomic (i.e. it's possible for someone to change the
>>> location in question after it was compared to register A, but before it
>>> was set to register B).
>>
>> That's fixed now, in the trunk (not 3.4.x). If you put in atomic
>> instructions then you should get out (something equivalent to)
>> atomic instructions. So V should now play nice with other processes
>> communicating via shared memory.

I have tried this out and it works well - great :-)

Patrick
|
From: Julian S. <js...@ac...> - 2009-07-27 09:37:16
|
On Monday 27 July 2009, Patrick Smears wrote:
> On Fri, 24 Jul 2009, Patrick Smears wrote:
> > On Fri, 24 Jul 2009, Julian Seward wrote:
> >>> anyone change it between comparing it to A and setting it to B).
> >>> Because of the way Valgrind works (simulating instructions), the atomic
> >>> instructions get broken up into separate loads/stores, meaning that
> >>> they're no longer atomic (i.e. it's possible for someone to change the
> >>> location in question after it was compared to register A, but before it
> >>> was set to register B).
> >>
> >> That's fixed now, in the trunk (not 3.4.x). If you put in atomic
> >> instructions then you should get out (something equivalent to)
> >> atomic instructions. So V should now play nice with other processes
> >> communicating via shared memory.
>
> I have tried this out and it works well - great :-)

Good, thanks for the confirmation.

J
|
From: Madhan S. <mad...@gm...> - 2009-07-27 13:44:57
|
When is this release expected?
This solves a long-standing problem for me.

Thanks,
Madhan.

On Mon, Jul 27, 2009 at 3:14 PM, Julian Seward <js...@ac...> wrote:
> On Monday 27 July 2009, Patrick Smears wrote:
> > On Fri, 24 Jul 2009, Patrick Smears wrote:
> > > On Fri, 24 Jul 2009, Julian Seward wrote:
> > >>> anyone change it between comparing it to A and setting it to B).
> > >>> Because of the way Valgrind works (simulating instructions), the atomic
> > >>> instructions get broken up into separate loads/stores, meaning that
> > >>> they're no longer atomic (i.e. it's possible for someone to change the
> > >>> location in question after it was compared to register A, but before it
> > >>> was set to register B).
> > >>
> > >> That's fixed now, in the trunk (not 3.4.x). If you put in atomic
> > >> instructions then you should get out (something equivalent to)
> > >> atomic instructions. So V should now play nice with other processes
> > >> communicating via shared memory.
> >
> > I have tried this out and it works well - great :-)
>
> Good, thanks for the confirmation.
>
> J
|
From: Nicholas N. <n.n...@gm...> - 2009-07-27 22:14:37
|
On Mon, Jul 27, 2009 at 11:44 PM, Madhan Sadasivam <mad...@gm...> wrote:
> When is this release expected.
> This solves a long standing problem for me.

Mid-August, hopefully.

Nick