|
From: Petar J. <mip...@gm...> - 2012-11-29 14:52:16
|
hi everyone,
here is another issue related to LL/SC, MIPS and Valgrind. The issue happens in
DRD in pthread_spin_lock related tests (annotate_spinlock, pth_spinlock).
On MIPS, pthread_spin_lock looks like this:
0000d130 <pthread_spin_lock>:
0x048869a0: ll a2,0(a0)
0x048869a4: bnez a2,0x48869a0
0x048869a8: li a1,1
0x048869ac: sc a1,0(a0)
0x048869b0: beqz a1,0x48869a0
0x048869b4: move at,at
As you can see, branch after SC will go directly to the start of the function.
However, V will not jump to the begging of the block, rather back again to the
wrapper function in drd_pthread_intercepts.c. V will chain the translated
blocks, and skip the original block with LL as it was marked T_NoRedir. This
will trigger endless loop with SC failing each time, and subsequent calls to the
wrapper function which also changes stack pointer each time and the test
crashes at the end.
For the test purposes, we recompiled glibc to include an extra NOP at the start
of the function pthread_spin_lock, and then we see no issues.
So, any ideas how to correct this in a nice way?
Regards,
Petar
LOG:
--24728-- REDIR: 0x48869a0 (pthread_spin_lock) redirected to 0x485384c
(pthread_spin_lock)
==== SB 1661 (evchecks 18997) [tid 1] 0x485384c pthread_spin_lock
/home/dejan/mnt-rtrkw137/valgrind_tmp/exec/lib/valgrind/vgpreload_drd-mips32-linux.so+0xe84c
==== SB 1662 (evchecks 18997) [tid 1] 0x48538d8 pthread_spin_lock+140
/home/dejan/mnt-rtrkw137/valgrind_tmp/exec/lib/valgrind/vgpreload_drd-mips32-linux.so+0xe8d8
==== SB 1663 (evchecks 18997) [tid 1] 0x48869a0 pthread_spin_lock
/lib/libpthread-2.11.2.so+0xd9a0
==== SB 1663 (evchecks 18997) [tid 1] 0x48869ac pthread_spin_lock+12
/lib/libpthread-2.11.2.so+0xd9ac
|
|
From: John R. <jr...@bi...> - 2012-11-29 15:29:13
|
> On MIPS, pthread_spin_lock looks like this:
>
> 0000d130 <pthread_spin_lock>:
> 0x048869a0: ll a2,0(a0)
> 0x048869a4: bnez a2,0x48869a0
> 0x048869a8: li a1,1
> 0x048869ac: sc a1,0(a0)
> 0x048869b0: beqz a1,0x48869a0
> 0x048869b4: move at,at
>
> As you can see, branch after SC will go directly to the start of the function.
So does the 'bnez' at 0x048869a4, immediately after the LL !
> For the test purposes, we recompiled glibc to include an extra NOP at the start
> of the function pthread_spin_lock, and then we see no issues.
>
> So, any ideas how to correct this in a nice way?
Um, translate any function which begins with LL, as if there were a NOP first?
Besides, you must be dealing with a MIPS chip that has interlocks on memory operations.
The code above has no delay slots!
top:
ll a2,0(a0)
nop # delay slot for LOAD
bnez a2,top
li a1,1
sc a1,0(a0)
nop # delay slot for the LOAD that is implicit in SC
beqz a1,top
--
|
|
From: Petar J. <mip...@gm...> - 2012-11-29 16:11:58
|
On Thu, Nov 29, 2012 at 4:30 PM, John Reiser <jr...@bi...> wrote: >> On MIPS, pthread_spin_lock looks like this: >> >> 0000d130 <pthread_spin_lock>: >> 0x048869a0: ll a2,0(a0) >> 0x048869a4: bnez a2,0x48869a0 >> 0x048869a8: li a1,1 >> 0x048869ac: sc a1,0(a0) >> 0x048869b0: beqz a1,0x48869a0 >> 0x048869b4: move at,at >> >> As you can see, branch after SC will go directly to the start of the function. > > So does the 'bnez' at 0x048869a4, immediately after the LL ! That is correct, but not that important for this issue. > >> For the test purposes, we recompiled glibc to include an extra NOP at the start >> of the function pthread_spin_lock, and then we see no issues. >> >> So, any ideas how to correct this in a nice way? > > Um, translate any function which begins with LL, as if there were a NOP first? The problem is not related to NOP or LL. Adding a nop will move the branch point one instruction down, so the position will not match the start of the block. That's the catch of this test attempt. > Besides, you must be dealing with a MIPS chip that has interlocks on memory operations. > The code above has no delay slots! It does, but this is not relevant for this issue. I would assume a similar issue could happen on another architecture too. Petar |
|
From: John R. <jr...@bi...> - 2012-11-29 16:57:17
|
On 11/29/2012 08:11 AM, Petar Jovanovic wrote:
> The problem is not related to NOP or LL. Adding a nop will move the branch
> point one instruction down, so the position will not match the start of the
> block. That's the catch of this test attempt.
In other words, you believe that the problem arises because coregrind
cannot tell the difference between:
function: .globl function
<code with no branches and no labels>
b_xxx function # conditional or unconditional branch
and
function: .globl function
L20: # local label
<code with no branches and no labels>
b_xxx L20 # conditional or unconditional branch
?
--
|
|
From: Julian S. <js...@ac...> - 2012-11-30 13:21:36
|
Petar, I can see what the problem is, but I can't think of a clean way to fix it. It is unrelated to chaining -- only the redirection is a problem. It could happen also in a pre-chaining version (eg, 3.7.x) The problem is that the function redirection system regards all control flow transfers to the label <pthread_spin_lock> (in this example) as something that it must redirect. It cannot distinguish between a transfer from outside the function -- which is what it correctly redirects -- and the two transfers from inside the function (the bnez and beqz) -- which it incorrectly redirects (to the wrapper). In fact I can't think of any good solutions right now. One possibility is to add some kind of hack that says that all targets reached from conditional branches are not to be redirected. That would fix this case, but could cause potential other failures on, in particular, conditional calls on ARM to a redirected function. Another possibility is to guarantee that all control flow transfers that stay within the same function are not subject to redirection. Problem is that this means the behaviour of the simulator becomes dependent on the details of what is in the symbol table of libpthread.so (in this example). For example, if the symbol table correctly describes both the start address and size of pthread_spin_lock, then this heuristic will work. But if the size field is missing or too small, it may be that the bnez / beqz are not recognised as being "inside" the function, and so this heuristic fails. Also, there is an ambiguity about the meaning of a jump back to the start of a function (in general). Is this just a loop back edge, that we do not want to redirect? Or is it a tail-recursive call to the function, that we do want to recursively redirect (so the recursive call goes via a recursive invokation of the wrapper) ? There's no easy way to distinguish these cases at the machine code level. Another possibility (which also doesn't work) is to have a way for a wrapper function to say "I can never be called recursively", so that an apparent recursive redirect attempt (as in this example) is detected and avoided. Problem is that this requires keeping track of the set of currently active wrapper functions in each thread, and correctly clearing them even when a thread longjumps (ugly but possible) or when it switches stacks arbitrarily (more or less impossible to do reliably). So, right now, no .. I can't think of a robust solution. Can you at least file a bug report about this, so it is tracked? J On Thursday, November 29, 2012, Petar Jovanovic wrote: > hi everyone, > > here is another issue related to LL/SC, MIPS and Valgrind. The issue > happens in DRD in pthread_spin_lock related tests (annotate_spinlock, > pth_spinlock). > > On MIPS, pthread_spin_lock looks like this: > > 0000d130 <pthread_spin_lock>: > 0x048869a0: ll a2,0(a0) > 0x048869a4: bnez a2,0x48869a0 > 0x048869a8: li a1,1 > 0x048869ac: sc a1,0(a0) > 0x048869b0: beqz a1,0x48869a0 > 0x048869b4: move at,at > > As you can see, branch after SC will go directly to the start of the > function. However, V will not jump to the begging of the block, rather > back again to the wrapper function in drd_pthread_intercepts.c. V will > chain the translated blocks, and skip the original block with LL as it was > marked T_NoRedir. This will trigger endless loop with SC failing each > time, and subsequent calls to the wrapper function which also changes > stack pointer each time and the test crashes at the end. > > For the test purposes, we recompiled glibc to include an extra NOP at the > start of the function pthread_spin_lock, and then we see no issues. > > So, any ideas how to correct this in a nice way? > > > Regards, > Petar > > LOG: > --24728-- REDIR: 0x48869a0 (pthread_spin_lock) redirected to 0x485384c > (pthread_spin_lock) > ==== SB 1661 (evchecks 18997) [tid 1] 0x485384c pthread_spin_lock > /home/dejan/mnt-rtrkw137/valgrind_tmp/exec/lib/valgrind/vgpreload_drd-mips3 > 2-linux.so+0xe84c ==== SB 1662 (evchecks 18997) [tid 1] 0x48538d8 > pthread_spin_lock+140 > /home/dejan/mnt-rtrkw137/valgrind_tmp/exec/lib/valgrind/vgpreload_drd-mips > 32-linux.so+0xe8d8 ==== SB 1663 (evchecks 18997) [tid 1] 0x48869a0 > pthread_spin_lock > /lib/libpthread-2.11.2.so+0xd9a0 > ==== SB 1663 (evchecks 18997) [tid 1] 0x48869ac pthread_spin_lock+12 > /lib/libpthread-2.11.2.so+0xd9ac > > --------------------------------------------------------------------------- > --- Keep yourself connected to Go Parallel: > VERIFY Test and improve your parallel project with help from experts > and peers. http://goparallel.sourceforge.net > _______________________________________________ > Valgrind-developers mailing list > Val...@li... > https://lists.sourceforge.net/lists/listinfo/valgrind-developers |
|
From: Josef W. <Jos...@gm...> - 2012-11-30 13:23:34
|
Am 29.11.2012 15:52, schrieb Petar Jovanovic: > hi everyone, > > here is another issue related to LL/SC, MIPS and Valgrind. The issue happens in > DRD in pthread_spin_lock related tests (annotate_spinlock, pth_spinlock). > > On MIPS, pthread_spin_lock looks like this: > > 0000d130 <pthread_spin_lock>: > 0x048869a0: ll a2,0(a0) > 0x048869a4: bnez a2,0x48869a0 > 0x048869a8: li a1,1 > 0x048869ac: sc a1,0(a0) > 0x048869b0: beqz a1,0x48869a0 > 0x048869b4: move at,at > > As you can see, branch after SC will go directly to the start of the function. > However, V will not jump to the begging of the block, rather back again to the > wrapper function in drd_pthread_intercepts.c. As far as I remember, doing redirection in a robust way is very tricky. I think you found a buggy corner case which did not happen yet. A solution could be this: for all branches/jumps within the instruction range of a function with redirection, never do a redirection for the function itself. But whether that is easily doable, that is a question for Julian. Josef |
|
From: Petar J. <mip...@gm...> - 2012-11-30 15:02:45
|
> It is unrelated to chaining -- only the redirection is a > problem. It could happen also in a pre-chaining version (eg, 3.7.x) It could happen there as well, but chaining made it worse as it removed a block with LL. > One possibility > is to add some kind of hack that says that all targets reached from > conditional branches are not to be redirected. That would fix this > case, but could cause potential other failures on, in particular, > conditional calls on ARM to a redirected function. If we introduce a change which would prevent redirections coming from branches and jumps (in contrast to branch-and-link and jump-and-link), would not that work for ARM as well? > Also, there is an ambiguity about the meaning of a jump back to the start > of a function (in general). Is this just a loop back edge, that we do not > want to redirect? Or is it a tail-recursive call to the function, that > we do want to recursively redirect (so the recursive call goes via > a recursive invokation of the wrapper) ? There's no easy way to distinguish > these cases at the machine code level. True, but are we aware of any function of interest that makes use of tail- recursive calls? Petar |
|
From: Julian S. <js...@ac...> - 2012-11-30 15:56:26
|
On Friday, November 30, 2012, Petar Jovanovic wrote: > It could happen there as well, but chaining made it worse as it removed > a block with LL. I'm not sure I understand what you mean. Chaining just makes it cheaper to get from one block to the next. It does not change the order in which the blocks are visited. Anyway .. not directly relevant. > If we introduce a change which would prevent redirections coming from > branches and jumps (in contrast to branch-and-link and jump-and-link), > would not that work for ARM as well? It is true that "write PC+4 to the link register and set PC to some new value" clearly indicates a call. No problems there. The difficulty is that "set PC to some new value but don't change the link register" usually, but not always, indicates a non-call. In particular, if we are doing a tail call then there will simply be a jump to the start of the new function (or this one, if recursive), which is no different from a conditional branch back to a loop that starts at the beginning of this function. So the presence/absence of writing the link register can't be used to reliably distinguish the call/non-call cases. J |
|
From: Petar J. <mip...@gm...> - 2012-11-30 17:33:33
|
> I'm not sure I understand what you mean. Chaining just makes it cheaper > to get from one block to the next. It does not change the order in which > the blocks are visited. > Anyway .. not directly relevant. Not directly relevant, just to make myself clear - we did saw one of the blocks not being chained (#3, see the first email), and that may have something to do with that block marked as T_NoRedir. Have not analyzed that further. > The difficulty is that "set PC to some new value but don't change the > link register" usually, but not always, indicates a non-call. True in general, but the functions we replace are in shared libraries, and there are some restrictions for them, as far as PIC/Linux is in question. > In > particular, if we are doing a tail call then there will simply be a jump > to the start of the new function (or this one, if recursive), which > is no different from a conditional branch back to a loop that starts > at the beginning of this function. Recursive tail call does not necessarily jump to the start of the function. So we may be already missing these calls (if there were any in functions that we replace). Also, some optimizations are not possible for shared libraries, since (i.e. calls to) functions may be overridden. Petar |
|
From: John R. <jr...@bi...> - 2012-11-30 17:43:40
|
On 11/30/2012 07:47 AM, Julian Seward wrote: > The difficulty is that "set PC to some new value but don't change the > link register" usually, but not always, indicates a non-call. In > particular, if we are doing a tail call then there will simply be a jump > to the start of the new function (or this one, if recursive), which > is no different from a conditional branch back to a loop that starts > at the beginning of this function. > > So the presence/absence of writing the link register can't be used to > reliably distinguish the call/non-call cases. But if neither side makes any provision for instantiating a new activation record, then that is a non-CALL. If the link register is not read and if the stack pointer [or, the head of the chain of activation records] is not read and written, then it is safe to classify the transfer as a non-CALL. -- |
|
From: Petar J. <mip...@gm...> - 2012-12-14 14:21:26
|
Bug has been created for this issue: https://bugs.kde.org/show_bug.cgi?id=311690 Until it is fixed, one of the workarounds could be to avoid pthread_spinlock interception for Helgrind and DRD for MIPS32. Let me know if anyone disagrees. Petar Index: helgrind/hg_intercepts.c =================================================================== --- helgrind/hg_intercepts.c (revision 13176) +++ helgrind/hg_intercepts.c (working copy) @@ -1095,7 +1095,7 @@ /*--- pthread_spinlock_t functions ---*/ /*----------------------------------------------------------------*/ -#if defined(HAVE_PTHREAD_SPIN_LOCK) +#if defined(HAVE_PTHREAD_SPIN_LOCK) && !defined(VGP_mips32_linux) /* Handled: pthread_spin_init pthread_spin_destroy pthread_spin_lock pthread_spin_trylock Index: drd/drd_pthread_intercepts.c =================================================================== --- drd/drd_pthread_intercepts.c (revision 13176) +++ drd/drd_pthread_intercepts.c (working copy) @@ -803,7 +803,7 @@ PTH_FUNCS(int, pthreadZucondZubroadcast, pthread_cond_broadcast_intercept, (pthread_cond_t* cond), (cond)); -#if defined(HAVE_PTHREAD_SPIN_LOCK) +#if defined(HAVE_PTHREAD_SPIN_LOCK) && !defined(VGP_mips32_linux) static __always_inline int pthread_spin_init_intercept(pthread_spinlock_t *spinlock, int pshared) { |
|
From: Bart V. A. <bva...@ac...> - 2012-12-15 10:05:02
|
On 12/14/12 15:21, Petar Jovanovic wrote: > Bug has been created for this issue: > > https://bugs.kde.org/show_bug.cgi?id=311690 > > Until it is fixed, one of the workarounds could be to avoid pthread_spinlock > interception for Helgrind and DRD for MIPS32. > > Let me know if anyone disagrees. > > Petar > > > Index: helgrind/hg_intercepts.c > =================================================================== > --- helgrind/hg_intercepts.c (revision 13176) > +++ helgrind/hg_intercepts.c (working copy) > @@ -1095,7 +1095,7 @@ > /*--- pthread_spinlock_t functions ---*/ > /*----------------------------------------------------------------*/ > > -#if defined(HAVE_PTHREAD_SPIN_LOCK) > +#if defined(HAVE_PTHREAD_SPIN_LOCK) && !defined(VGP_mips32_linux) > > /* Handled: pthread_spin_init pthread_spin_destroy > pthread_spin_lock pthread_spin_trylock > Index: drd/drd_pthread_intercepts.c > =================================================================== > --- drd/drd_pthread_intercepts.c (revision 13176) > +++ drd/drd_pthread_intercepts.c (working copy) > @@ -803,7 +803,7 @@ > PTH_FUNCS(int, pthreadZucondZubroadcast, pthread_cond_broadcast_intercept, > (pthread_cond_t* cond), (cond)); > > -#if defined(HAVE_PTHREAD_SPIN_LOCK) > +#if defined(HAVE_PTHREAD_SPIN_LOCK) && !defined(VGP_mips32_linux) > static __always_inline > int pthread_spin_init_intercept(pthread_spinlock_t *spinlock, int pshared) > { Changes like the above will leave anyone puzzled who tries to understand the Valgrind source code and who has not been following this discussion. How about the following: * Let configure.in define a preprocessor symbol on mips32 for the purpose of disabling intercepting pthread_spin_*(). * Test whether that symbol has been defined in hg_intercepts.c and drd_pthread_intercepts.c instead of testing VGP_mips32_linux. * Modify drd/tests/pth_spinlock.vgtest and helgrind/tests/pth_spinlock.vgtest such that these tests are skipped if intercepting pthread_spin_*() has been disabled. There are already examples of tests that are skipped based on the symbols defined in config.h, e.g. none/tests/tls.vgtest and none/tests/selfrun.vgtest. Bart. |
|
From: Petar J. <mip...@gm...> - 2012-12-20 19:00:02
|
> Changes like the above will leave anyone puzzled who tries to understand the > Valgrind source code and who has not been following this discussion. How > about the following: > * Let configure.in define a preprocessor symbol on mips32 for the > purpose of disabling intercepting pthread_spin_*(). > * Test whether that symbol has been defined in hg_intercepts.c and > drd_pthread_intercepts.c instead of testing VGP_mips32_linux. > * Modify drd/tests/pth_spinlock.vgtest and > helgrind/tests/pth_spinlock.vgtest such that these tests are skipped > if intercepting pthread_spin_*() has been disabled. There are already > examples of tests that are skipped based on the symbols defined in > config.h, e.g. none/tests/tls.vgtest and none/tests/selfrun.vgtest. > > Bart. > Done in r13190. |