From: Jeremy F. <je...@go...> - 2003-07-30 06:55:45
On Tue, 2003-07-29 at 08:56, Ashley Pittman wrote:
> It would be great if valgrind could support our use of clone out of the
> box and I welcome anything that would help this.

Ah, yes. Someone at Sandia was showing me the use of clone() in the
Quadrics driver. Some form of clone support is definitely on the todo
list; it just depends on exactly what flags you're passing to clone().
The closer it is to a pthreads thread, the easier it is.

What's wrong with using pthreads, BTW?

> I quickly hit a second problem though in that when valgrind is tracking
> two threads it appears to hold a mutex lock while one of them is in a
> system call. This unfortunately goes against our software model because
> the new thread that we create spends all it's time in kernel space only
> returning to user space to process signals.

Right. Valgrind's current way of handling syscalls is to assume that
they never block, and if they do block, try to recast them into a
non-blocking form. That's almost possible with standard syscalls, but
it completely fails to cope with all the various ioctls different
drivers use, which is what I'm assuming you're talking about here.

I've nearly finished a complete reworking of the syscalls/signals code
which fixes this by allowing syscalls to block in their own cloned LWP
while still allowing the scheduler and other threads to continue.

> As you can imagine this produces deadlock and the main thread never gets
> the chance to do anything. What would be ideal is if valgrind could
> somehow forget about this new thread that was generated and concentrate
> on the original one.

It can't forget about threads, but it can manage it like any other
thread.

J
|
From: Ashley P. <as...@qu...> - 2003-07-30 09:50:42
On Wednesday, Jul 30, 2003, at 07:55 Europe/London, Jeremy Fitzhardinge
wrote:
> On Tue, 2003-07-29 at 08:56, Ashley Pittman wrote:
>> It would be great if valgrind could support our use of clone out of the
>> box and I welcome anything that would help this.
>
> Ah, yes. Someone at Sandia was showing me the use of clone() in the
> Quadrics driver. Some form of clone support is definitely on the todo
> list; it just depends on exactly what flags you're passing to clone().
> The closer it is to a pthreads thread, the easier it is.
>
> What's wrong with using pthreads, BTW?
We produce a parallel programming library, "libelan", which is used as
the basis for our MPI implementation. As we are in the performance
market, we do anything we can to avoid additional latency, and linking
with pthreads isn't free. It's hard to justify to customers why they
should link with pthreads to run a single-threaded app.
As a workaround, it would be possible to produce a version of our
library which detects when it is being valground[0] and makes the
pthread call if so. I seem to remember from the archives that there is
some way of detecting this at runtime?
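
For reference, one way to make that runtime check is the
RUNNING_ON_VALGRIND client-request macro from valgrind.h. The following
is only a minimal sketch: it assumes the header is installed as
<valgrind/valgrind.h> (and falls back to "never under valgrind" if it
is not), and the enum and function names are hypothetical, not part of
libelan.

```c
/* Pull in valgrind.h when the compiler can find it; otherwise assume we
 * are never running under valgrind. The include path is an assumption. */
#if defined(__has_include)
#  if __has_include(<valgrind/valgrind.h>)
#    include <valgrind/valgrind.h>
#  endif
#endif
#ifndef RUNNING_ON_VALGRIND
#  define RUNNING_ON_VALGRIND 0
#endif

/* Hypothetical names: which mechanism the library uses for its helper thread. */
enum lwp_backend { BACKEND_CLONE, BACKEND_PTHREAD };

/* Pure decision function, kept separate so it can be checked in isolation. */
static enum lwp_backend pick_backend(int on_valgrind)
{
    return on_valgrind ? BACKEND_PTHREAD : BACKEND_CLONE;
}

/* Ask valgrind (if present) whether we are running under it, then decide. */
static enum lwp_backend detect_backend(void)
{
    return pick_backend((int)(RUNNING_ON_VALGRIND) != 0);
}
```

The library would call detect_backend() once at startup and only take
the pthread path when it returns BACKEND_PTHREAD, so normal runs keep
the raw clone path.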
Of course, longer term, support for our use of clone would be
preferred. Here are the flags we pass in:
    if ((res = __clone (elan3_lwp, stack + ELANLWP_STACK_SIZE,
                        CLONE_VM | CLONE_FS | CLONE_FILES | CLONE_SIGHAND,
                        (void *) ctx)) == -1)
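
To make the effect of those flags concrete, here is a small standalone
sketch (Linux with glibc assumed) that uses the public clone() wrapper
rather than __clone. The names lwp_main, spawn_lwp_and_wait and
LWP_STACK_SIZE are invented for illustration, and the child here just
sets a flag and exits, unlike the real elan3_lwp. Because no
termination signal is passed in the flags, the parent has to wait with
__WCLONE.

```c
#define _GNU_SOURCE
#include <sched.h>
#include <stdlib.h>
#include <sys/types.h>
#include <sys/wait.h>

#define LWP_STACK_SIZE (64 * 1024)   /* illustrative size, not ELANLWP_STACK_SIZE */

static volatile int shared_flag = 0; /* visible to the child because of CLONE_VM */

/* Entry point of the cloned LWP: write to memory shared with the parent. */
static int lwp_main(void *arg)
{
    *(volatile int *)arg = 1;
    return 0;
}

/* Spawn an LWP with the flags from the mail, wait for it, and return the
 * value it stored in shared memory (1 on success), or -1 on failure. */
static int spawn_lwp_and_wait(void)
{
    char *stack = malloc(LWP_STACK_SIZE);
    int status;
    pid_t pid;

    if (stack == NULL)
        return -1;

    /* The stack grows down on common Linux targets, so pass its top. */
    pid = clone(lwp_main, stack + LWP_STACK_SIZE,
                CLONE_VM | CLONE_FS | CLONE_FILES | CLONE_SIGHAND,
                (void *)&shared_flag);
    if (pid == -1) {
        free(stack);
        return -1;
    }

    /* No SIGCHLD was requested, so this is a "clone" child: use __WCLONE. */
    if (waitpid(pid, &status, __WCLONE) == -1) {
        free(stack);
        return -1;
    }

    free(stack);
    return shared_flag;
}
```

The write made by the child is seen by the parent only because CLONE_VM
puts both in the same address space, which is also why a tool tracking
memory has to treat the child as a thread rather than a separate
process.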
>> I quickly hit a second problem though in that when valgrind is tracking
>> two threads it appears to hold a mutex lock while one of them is in a
>> system call. This unfortunately goes against our software model because
>> the new thread that we create spends all it's time in kernel space only
>> returning to user space to process signals.
>
> Right. Valgrind's current way of handling syscalls is to assume that
> they never block, and if they do block, try to recast them into a
> non-blocking form. That's almost possible with standard syscalls, but
> it completely fails to cope with all the various ioctls different
> drivers use, which is what I'm assuming you're talking about here.
This is exactly the problem and prevents the use of valgrind with our
library. Our ioctl only returns on program exit.
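
To illustrate what "recasting into a non-blocking form" can look like
for a standard syscall, here is a generic sketch using read(). It is
not Valgrind's code; try_read_nonblocking and WOULD_BLOCK are names
made up for this example. The point is that read() has O_NONBLOCK to
fall back on, whereas a driver-specific ioctl offers no such handle.

```c
#include <errno.h>
#include <fcntl.h>
#include <unistd.h>

#define WOULD_BLOCK (-2)   /* hypothetical return code: call would have blocked */

/* Attempt a read without ever blocking the caller.
 * Returns the byte count, 0 on EOF, -1 on error, or WOULD_BLOCK. */
static long try_read_nonblocking(int fd, void *buf, size_t len)
{
    ssize_t n;
    int saved_errno;
    int flags = fcntl(fd, F_GETFL);

    if (flags == -1)
        return -1;
    if (fcntl(fd, F_SETFL, flags | O_NONBLOCK) == -1)
        return -1;

    n = read(fd, buf, len);
    saved_errno = errno;

    /* Put the descriptor back the way the program expects it. */
    fcntl(fd, F_SETFL, flags);

    if (n == -1 && (saved_errno == EAGAIN || saved_errno == EWOULDBLOCK))
        return WOULD_BLOCK;   /* a scheduler could run another thread and retry */

    errno = saved_errno;
    return (long)n;
}
```

A user-level scheduler can retry the call later whenever it gets
WOULD_BLOCK, which works for file descriptors but has no equivalent for
an ioctl that only returns on program exit.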
> I've nearly finished a complete reworking of the syscalls/signals code
> which fixes this by allowing syscalls to block in their own cloned LWP
> while still allowing the scheduler and other threads to continue.
It looks like you are a few steps ahead of me on this, then. Can I
offer my services as a beta tester?
Ashley,
0: Is that the correct spelling?