|
From: Evgeniy S. <eu...@go...> - 2010-05-28 11:41:43
|
Hi,
what is the best way to add a delay to the instrumented code that will not
block other threads of a program?
In more detail, I'm trying to slow down individual threads of a program in
specific places of code, and observe the behavior of other threads at that
time.
AFAIK, multithreading is implemented in Valgrind in such a way that only one
thread is active at any given moment, and threads are switched only
between IRSBs. Because of this, sleeping or busy-looping in helper
functions does not work: it blocks the whole program.
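To make the problem concrete, here is a toy model of such a serialized scheduler. It is only a sketch and not Valgrind code: the `SimThread` type, the tick-based clock and `other_progress()` are invented for illustration. Thread 0 wants a delay in its first block. It either spins inside the block (nothing else can be scheduled) or ends the block and stays off the CPU until the delay has elapsed.

```c
#include <assert.h>

/* Toy model, not Valgrind code: two threads, one block runs per tick,
   and the scheduler may switch threads only between blocks. */
typedef struct {
    long blocks_done; /* blocks this thread has executed so far */
    long wake_at;     /* thread is not runnable before this tick */
} SimThread;

/* Thread 0 asks for `delay` ticks of delay in its first block.
   Returns how many blocks thread 1 completed by the time the delay
   is over. */
static long other_progress(int yield_mode, long delay)
{
    SimThread t[2] = { {0, 0}, {0, 0} };
    long clock = 0;
    int cur = 0;

    assert(delay > 0);
    for (;;) {
        SimThread *th = &t[cur];
        if (clock >= th->wake_at) {
            if (cur == 0 && th->blocks_done == 1)
                return t[1].blocks_done;    /* delay elapsed, thread 0 resumes */
            clock += 1;                     /* running a block costs one tick */
            if (cur == 0) {
                if (!yield_mode) {
                    clock += delay;         /* helper spins inside the block */
                    return t[1].blocks_done; /* nobody else was scheduled */
                }
                th->wake_at = clock + delay; /* end the block, stay off the CPU */
            }
            th->blocks_done++;
        }
        cur = 1 - cur;                      /* round-robin between the two */
    }
}
```

In this model `other_progress(0, 10)` reports no progress for the other thread, while `other_progress(1, 10)` lets it run one block per tick of the delay.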
I've had some success with the following approach. I steal the address of the
usleep() function from the program with a client request from the vgpreload
part of the tool, and insert a call to that address during code instrumentation:
    // put sleep duration in %edi
    PUT(56) = 0xF4241:I64
    // put return address on the stack
    t15 = GET:I64(32)
    t16 = Sub64(t15,0x8:I64)
    PUT(32) = t16
    STle(t16) = 0x405F55:I64
    // call the stolen usleep()
    if (1:I1) goto {Call} 0x40A012:I64
This code must be placed immediately before the IMark whose address is the
return address of the call (0x405FF5 in this case). It also cannot be
placed at the beginning of an IRSB, because Valgrind complains about an
unknown PC. This approach is very arch-dependent and does not feel right.
There is also an inconvenience that all instrumentation after this call is
lost after the callee returns, since the superblock is then split into two
that must be instrumented separately one more time.
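For readers less used to VEX IR, the call sequence quoted in this message can be read as the following operations on the guest state. This is a standalone sketch, not tool code: the `Guest` struct, the tiny stack and the helper names are invented here, and host byte order stands in for the little-endian `STle`. Only the offsets 32 and 56 and the constants come from the listing.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

enum { OFF_RSP = 32, OFF_RDI = 56 };  /* offsets as used in the IR listing */

typedef struct {
    uint8_t  regs[64];    /* guest registers, addressed by byte offset */
    uint8_t  stack[64];   /* a few bytes of guest stack */
    uint64_t stack_base;  /* guest address of stack[0] (made up) */
} Guest;

static uint64_t get64(const Guest *g, size_t off)
{
    uint64_t v;
    assert(off + 8 <= sizeof g->regs);
    memcpy(&v, g->regs + off, 8);
    return v;
}

static void put64(Guest *g, size_t off, uint64_t v)
{
    assert(off + 8 <= sizeof g->regs);
    memcpy(g->regs + off, &v, 8);
}

static void store64(Guest *g, uint64_t addr, uint64_t v)
{
    assert(addr >= g->stack_base);
    assert(addr - g->stack_base + 8 <= sizeof g->stack);
    memcpy(g->stack + (addr - g->stack_base), &v, 8);
}

static uint64_t load64(const Guest *g, uint64_t addr)
{
    uint64_t v;
    assert(addr >= g->stack_base);
    assert(addr - g->stack_base + 8 <= sizeof g->stack);
    memcpy(&v, g->stack + (addr - g->stack_base), 8);
    return v;
}

/* Mirrors the IR statement by statement; returns the jump target. */
static uint64_t emulate_call(Guest *g, uint64_t arg,
                             uint64_t ret_addr, uint64_t callee)
{
    put64(g, OFF_RDI, arg);            /* PUT(56) = arg            */
    uint64_t t15 = get64(g, OFF_RSP);  /* t15 = GET:I64(32)        */
    uint64_t t16 = t15 - 8;            /* t16 = Sub64(t15,0x8)     */
    put64(g, OFF_RSP, t16);            /* PUT(32) = t16            */
    store64(g, t16, ret_addr);         /* STle(t16) = ret_addr     */
    return callee;                     /* goto {Call} callee       */
}
```

Calling `emulate_call(&g, 0xF4241, 0x405F55, 0x40A012)` on a zeroed `Guest` whose stack pointer sits at the top of `stack` leaves the argument in the %rdi slot, the stack pointer lowered by 8, and the return address in the new top-of-stack word, which is what a native `call` would have done.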
Is there a simpler way to do this? Is it possible to somehow tell the
valgrind scheduler to let the other threads run for a bit (some kind of
VG_(sched_yield) or VG_(sleep))?
|
|
From: Konstantin S. <kon...@gm...> - 2010-06-01 08:44:05
|
[a bit of explanation why we need this]

We are trying to implement an automated data race verifier, similar to
http://code.google.com/p/data-race-test/wiki/RaceCheckerClass and the one
mentioned in http://pages.cs.wisc.edu/~shanlu/paper/asplos184-zhang.pdf
(search for ConMem-v).
The idea is to put a short sleep around one racy access in thread T1 to give
the second racy access a chance to execute in thread T2 while T1 is sleeping.
We already did it with PIN -- it was simple because PIN is multi-threaded and
we just needed to call usleep().
See http://code.google.com/p/data-race-test/wiki/RaceVerifier

With Valgrind it seems to be a bit trickier: if we call sleep() inside a
helper function we will block the whole process because Valgrind is
single-threaded.
Question: how can I sleep in a helper function so that other threads get a
chance to run?

Thanks!
--kcc
|
From: Julian S. <js...@ac...> - 2010-06-03 08:42:42
|
> With Valgrind it seems to be a bit trickier: if we call sleep() inside a
> helper function we will block the whole process because Valgrind is
> single-threaded.
> Question: how can I sleep in a helper function so that other threads get a
> chance to run?

I don't think you can, unfortunately. I can't think of any remotely sane
way to do it. (Or even any insane ones).

Really Valgrind needs to be made multithreaded, but that's a big job.

J
|
From: Konstantin S. <kon...@gm...> - 2010-06-03 09:00:33
|
On Thu, Jun 3, 2010 at 1:04 PM, Julian Seward <js...@ac...> wrote:
>> With Valgrind it seems to be a bit trickier: if we call sleep() inside a
>> helper function we will block the whole process because Valgrind is
>> single-threaded.
>> Question: how can I sleep in a helper function so that other threads get a
>> chance to run?
>
> I don't think you can, unfortunately. I can't think of any remotely sane
> way to do it. (Or even any insane ones).

I see.
So, we need to sleep in the instrumented code using the native program's
sleep. This seems to work to some extent, but we had some problems (see the
first message from Evgeniy).
Do you have code somewhere that inserts a call to one of the native
program's functions into an IRSB?

> Really Valgrind needs to be made multithreaded, but that's a big job.

Haha. PIN *is* multithreaded, and that is the biggest headache for me.
But yes, I would love to have this headache with valgrind too :) :)

--kcc
|
From: Julian S. <js...@ac...> - 2010-06-03 09:40:48
|
> I've got some success with the following approach. I steal the address of
> usleep() function from the program with a client request from vgpreload
> part of the tool, and insert a call to that address during code
> instrumentation:
>
> // put sleep duration in %edi
> PUT(56) = 0xF4241:I64
> // put return address on the stack
> t15 = GET:I64(32)
> t16 = Sub64(t15,0x8:I64)
> PUT(32) = t16
> STle(t16) = 0x405F55:I64
> // call the stolen usleep()
> if (1:I1) goto {Call} 0x40A012:I64
>
> This code must be placed immediately before an IMark, whose address is the
> return address of the call (0x405FF5 in this case). It also can not be
> placed at the beginning of an IRSB, because valgrind complains about an
> unknown PC. This approach is very arch-dependent and does not feel right.
I think this will work, but as you say, it is ugly.
> Is there a simpler way to do this? Is it possible to somehow tell the
> valgrind scheduler to let the other threads run for a bit (some kind of
> VG_(sched_yield) or VG_(sleep))?
Yes (I think so .. I tried something like this a couple of months
back).
Let's suppose X is the client instruction after which you want to
let other threads run. After the translation of X, finish the
IRSB, and put a jump to the next instruction. (in the same way
that the front ends will translate an unconditional branch that
they don't chase into).
Except .. for this jump, mark it as Ijk_Yield, not _Boring.
In scheduler.c find this

      case VEX_TRC_JMP_YIELD:
         /* Explicit yield, because this thread is in a spin-lock
            or something. Only let the thread run for a short while
            longer. Because swapping to another thread is expensive,
            we're prepared to let this thread eat a little more CPU
            before swapping to another. That means that short term
            spins waiting for hardware to poke memory won't cause a
            thread swap. */
         if (VG_(dispatch_ctr) > 2000)
            VG_(dispatch_ctr) = 2000;
         break;

change '2000' to '1'

In scheduler.c find this

   /* ------------ now we don't have The Lock ------------ */
   ...
   /* ------------ now we do have The Lock ------------ */

in between these two comments add this

   VG_(do_syscall0)(__NR_sched_yield);
this should cause the thread to be placed to the back of the run queue
for threads of this priority, which will allow another thread to run.
but be careful, I think __NR_sched_yield on linux takes a parameter which
controls its behaviour. Google for that.
add debug printing to make sure this is really behaving as you expect.
(it's all pretty fragile, but I'm sure i had something like this working
earlier this year)
J
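The two changes suggested in this message can be tried out in isolation with the small sketch below. It is not scheduler.c: `clamp_dispatch_ctr()` and `yield_cpu()` are names made up here, and the libc wrapper `sched_yield()` stands in for the raw `VG_(do_syscall0)(__NR_sched_yield)` call, which only exists inside Valgrind. For what it is worth, the POSIX `sched_yield()` is declared as `int sched_yield(void)` and takes no arguments.

```c
#include <assert.h>
#include <sched.h>

/* Same shape as the `if (ctr > N) ctr = N;` test in the quoted case:
   a yielding thread keeps at most `limit` more dispatches. */
static unsigned clamp_dispatch_ctr(unsigned ctr, unsigned limit)
{
    return ctr > limit ? limit : ctr;
}

/* Ask the kernel to run something else. Stand-in for the raw syscall
   that would be issued while The Lock is not held. */
static int yield_cpu(void)
{
    int rc = sched_yield();
    assert(rc == 0 || rc == -1);
    return rc;
}
```

With a limit of 2000 a thread that hits the yield case may still run for a while; with a limit of 1 the remaining budget collapses to a single dispatch, which is the point of the suggested edit.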
|
|
From: Evgeniy S. <eu...@go...> - 2010-08-04 08:54:09
|
Thanks for the advice! Some comments inline.
On Thu, Jun 3, 2010 at 10:02 AM, Julian Seward <js...@ac...> wrote:
>
> > I've got some success with the following approach. I steal the address of
> > usleep() function from the program with a client request from vgpreload
> > part of the tool, and insert a call to that address during code
> > instrumentation:
> >
> > // put sleep duration in %edi
> > PUT(56) = 0xF4241:I64
> > // put return address on the stack
> > t15 = GET:I64(32)
> > t16 = Sub64(t15,0x8:I64)
> > PUT(32) = t16
> > STle(t16) = 0x405F55:I64
> > // call the stolen usleep()
> > if (1:I1) goto {Call} 0x40A012:I64
> >
> > This code must be placed immediately before an IMark, whose address is the
> > return address of the call (0x405FF5 in this case). It also can not be
> > placed at the beginning of an IRSB, because valgrind complains about an
> > unknown PC. This approach is very arch-dependent and does not feel right.
>
> I think this will work, but as you say, it is ugly.
>
> > Is there a simpler way to do this? Is it possible to somehow tell the
> > valgrind scheduler to let the other threads run for a bit (some kind of
> > VG_(sched_yield) or VG_(sleep))?
>
> Yes (I think so .. I tried something like this a couple of months
> back).
>
> Let's suppose X is the client instruction after which you want to
> let other threads run. After the translation of X, finish the
> IRSB, and put a jump to the next instruction. (in the same way
> that the front ends will translate an unconditional branch that
> they don't chase into).
>
It seems that, unless I pass --vex-iropt-level=0, the pre-instrumentation
optimization pass can move some instruction side effects past the IMark. For
example, this code:
0xD7DF7F7: movq (%rsi),%rax
------ IMark(0xD7DF7F7, 3) ------
t0 = GET:I64(48)
PUT(0) = LDle:I64(t0)
0xD7DF7FA: leaq 64(%rsp), %rcx
------ IMark(0xD7DF7FA, 5) ------
PUT(168) = 0xD7DF7FA:I64
t1 = Add64(GET:I64(32),0x40:I64)
PUT(8) = t1
..................
is translated to this even before the tool can take a look at it:
------ IMark(0xD7DF7F7, 3) ------
t0 = GET:I64(48)
t58 = LDle:I64(t0)
------ IMark(0xD7DF7FA, 5) ------
t60 = GET:I64(32)
t59 = Add64(t60,0x40:I64)
..............
If the IRSB is finished at the 0xD7DF7FA IMark, all effects of the previous
instruction will be lost. Disabling optimization helps, but obviously slows
down execution and still looks fragile. What is the correct way to finish an
IRSB at some point, making sure that all effects of instructions up to that
point have already happened?
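The hazard can be reproduced with a toy interpreter. This is a sketch only: the `Stmt` encoding, the values 42 and 7, and the assumption that the deferred PUTs are emitted later in the block are all made up for illustration, and only the two IMark addresses come from the listings above.

```c
#include <assert.h>
#include <stddef.h>

typedef enum { ST_IMARK, ST_WRTMP, ST_PUT } StmtKind;

/* IMARK: val = instruction address.
   WRTMP: idx = temporary number, val = value assigned to it.
   PUT:   idx = guest register slot, val = number of the source temporary. */
typedef struct { StmtKind kind; int idx; long val; } Stmt;

/* Each guest write sits right after the instruction that produced it. */
static const Stmt unoptimized[] = {
    { ST_IMARK, 0, 0xD7DF7F7 },
    { ST_WRTMP, 0, 42 },   /* t0 = loaded value (made up) */
    { ST_PUT,   0, 0 },    /* slot 0 = t0 */
    { ST_IMARK, 0, 0xD7DF7FA },
    { ST_WRTMP, 1, 7 },
    { ST_PUT,   1, 1 },
};

/* Assumed optimizer output: both guest writes pushed later in the block. */
static const Stmt optimized[] = {
    { ST_IMARK, 0, 0xD7DF7F7 },
    { ST_WRTMP, 0, 42 },
    { ST_IMARK, 0, 0xD7DF7FA },
    { ST_WRTMP, 1, 7 },
    { ST_PUT,   0, 0 },
    { ST_PUT,   1, 1 },
};

/* Run the block, but end it at the IMark for cut_addr; return the
   content of one guest register slot afterwards. */
static long slot_after_cut(const Stmt *s, size_t n, long cut_addr, int slot)
{
    long tmp[4] = {0}, reg[4] = {0};
    assert(slot >= 0 && slot < 4);
    for (size_t i = 0; i < n; i++) {
        assert(s[i].idx >= 0 && s[i].idx < 4);
        if (s[i].kind == ST_IMARK && s[i].val == cut_addr)
            break;                            /* block finished here */
        if (s[i].kind == ST_WRTMP)
            tmp[s[i].idx] = s[i].val;
        if (s[i].kind == ST_PUT) {
            assert(s[i].val >= 0 && s[i].val < 4);
            reg[s[i].idx] = tmp[s[i].val];
        }
    }
    return reg[slot];
}
```

Cutting the unoptimized block at 0xD7DF7FA leaves 42 in slot 0, while cutting the optimized one leaves the slot untouched: the value exists only in a temporary that dies with the block.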
> Except .. for this jump, mark it as Ijk_Yield, not _Boring.
>
> In scheduler.c find this
>
> case VEX_TRC_JMP_YIELD:
> /* Explicit yield, because this thread is in a spin-lock
> or something. Only let the thread run for a short while
> longer. Because swapping to another thread is expensive,
> we're prepared to let this thread eat a little more CPU
> before swapping to another. That means that short term
> spins waiting for hardware to poke memory won't cause a
> thread swap. */
> if (VG_(dispatch_ctr) > 2000)
> VG_(dispatch_ctr) = 2000;
> break;
>
> change '2000' to '1'
>
> In scheduler.c find this
>
> /* ------------ now we don't have The Lock ------------ */
> ...
> /* ------------ now we do have The Lock ------------ */
>
> in between these two comments add this
>
> VG_(do_syscall0)(__NR_sched_yield);
>
> this should cause the thread to be placed to the back of the run queue
> for threads of this priority, which will allow another thread to run.
>
> but be careful, I think __NR_sched_yield on linux takes a parameter which
> controls its behaviour. Google for that.
>
> add debug printing to make sure this is really behaving as you expect.
> (it's all pretty fragile, but I'm sure i had something like this working
> earlier this year)
>
I've noticed a VG_(vg_yield) function that seems to do exactly what is
needed. Is there a fundamental reason why it cannot be used in helper
functions? Is there an assumption somewhere that the_BigLock cannot be
released once a block of generated code has started executing and has not
finished yet?
|