|
From: Tom H. <th...@cy...> - 2004-07-04 18:42:05
|
I've been looking at bug 83060 which provides a patch to implement
support for the new asynchronous I/O system calls in 2.6 kernels. The
patch is largely fine. I've got it applied and I've got a test case
which seems to work if --pointercheck=no is specified.

The reason it fails with pointer checking on is that the io_setup call
allocates an I/O context in user space - it actually allocates one or
more pages of user memory. There is no way to pass an address hint to
it so valgrind has no control over the address that the memory will be
allocated at.

As a result it typically seems to get allocated in the valgrind part of
the address space which can't be accessed by the client while pointer
checking is turned on. As libaio tries to peek at it in order to avoid
entering the kernel when io_getevents is called and there are no events
pending this is a problem...

Anybody got any suggestions for a way of handling this?

Tom

--
Tom Hughes (th...@cy...)
Software Engineer, Cyberscience Corporation
http://www.cyberscience.com/
|
From: Jeremy F. <je...@go...> - 2004-07-05 01:43:55
|
On Sun, 2004-07-04 at 19:38 +0100, Tom Hughes wrote:
> As a result it typically seems to get allocated in the valgrind part
> of the address space which can't be accessed by the client while
> pointer checking is turned on. As libaio tries to peek at it in order
> to avoid entering the kernel when io_getevents is called and there
> are no events pending this is a problem...
>
> Anybody got any suggestions for a way of handling this?

The most immediate is to try and get the AIO people to change such a
braindead API. There aren't any other instances of a syscall plonking
things down in memory without some way to control it.

Would mremap work on these things, so we can move them down into the
client address space?

J
|
From: Tom H. <th...@cy...> - 2004-07-05 07:47:47
|
In message <1088989847.6827.6.camel@localhost>
Jeremy Fitzhardinge <je...@go...> wrote:
> On Sun, 2004-07-04 at 19:38 +0100, Tom Hughes wrote:
>> As a result it typically seems to get allocated in the valgrind part
>> of the address space which can't be accessed by the client while
>> pointer checking is turned on. As libaio tries to peek at it in order
>> to avoid entering the kernel when io_getevents is called and there
>> are no events pending this is a problem...
>>
>> Anybody got any suggestions for a way of handling this?
>
> The most immediate is to try and get the AIO people to change such a
> braindead API. There aren't any other instances of a syscall plonking
> things down in memory without some way to control it.
I agree that it's pretty braindead. In fact older versions of libaio
don't seem to peek at the memory in this way, but the fact that it is
mapped in user space suggests that it was always intended.
> Would mremap work on these things, so we can move them down into the
> client address space?
I don't think it would, because the kernel has other pointers to the
memory in question that wouldn't be adjusted by the mremap call.
Tom
--
Tom Hughes (th...@cy...)
Software Engineer, Cyberscience Corporation
http://www.cyberscience.com/
|
|
From: Jeremy F. <je...@go...> - 2004-07-05 17:00:07
|
On Mon, 2004-07-05 at 08:47 +0100, Tom Hughes wrote:
> > Would mremap work on these things, so we can move them down into the
> > client address space?
>
> I don't think it would, because the kernel has other pointers to the
> memory in question that wouldn't be adjusted by the mremap call.

Hm, well that depends on which address space the kernel pokes at this
memory. If it's via a kernel mapping, it doesn't matter where it is
mapped in the user address space.

Or I wonder if there's some way of getting an aliased mapping. I think
mmap on /proc/self/mem doesn't work any more though...

J
|
From: Jeremy F. <je...@go...> - 2004-07-06 21:53:20
|
On Mon, 2004-07-05 at 08:47 +0100, Tom Hughes wrote:
> I agree that it's pretty braindead. In fact older versions of libaio
> don't seem to peek at the memory in this way, but the fact that it is
> mapped in user space suggests that it was always intended.

Well, I think the practical solution is to do what we do to convince
ld.so to put things in the right place: pad the Valgrind portion of the
address space before the io_setup, and remove the padding afterwards.
That will force io_setup to allocate its mapping in the client address
space.

J
|
From: Tom H. <th...@cy...> - 2004-07-11 11:11:47
|
In message <1089150608.3089.1.camel@localhost>
Jeremy Fitzhardinge <je...@go...> wrote:
> On Mon, 2004-07-05 at 08:47 +0100, Tom Hughes wrote:
> > I agree that it's pretty braindead. In fact older versions of libaio
> > don't seem to peek at the memory in this way, but the fact that it is
> > mapped in user space suggests that it was always intended.
>
> Well, I think the practical solution is to do what we do to convince ld.
> so to put things in the right place: pad the Valgrind portion of the
> address space before the io_setup, and remove the padding afterwards.
> That will force io_setup to allocate its mapping in the client address
> space.
I've been working on this, and it's certainly fun...
The first problem I ran into was that the space from valgrind_mmap_end
to valgrind_end is all in the segment list regardless of whether or not
it has anything mapped at the kernel level. I think this is to stop
valgrind using it for memory mapping.
I can work around that by marking the segments which are not mapped so
that I know I need to pad them, but it turns out that VG_(brk) doesn't
update the segment list when it gives out memory, and I've so far been
unable to make it do so as it just seems to either segfault or get into
an infinite loop.
I've kludged around that for now, but now I find that the kernel is
choosing to allocate the io_setup page at the top of the client address
space, immediately below the client stack, thus stopping it from being
extended!
So it looks like I will have to pad the client address space as well
in order to control where the kernel puts that page...
Tom
--
Tom Hughes (th...@cy...)
Software Engineer, Cyberscience Corporation
http://www.cyberscience.com/
|
|
From: Nicholas N. <nj...@ca...> - 2004-07-11 16:49:48
|
On Sun, 11 Jul 2004, Tom Hughes wrote:
> I can work around that by marking the segments which are not mapped so
> that I know I need to pad them, but it turns out that VG_(brk) doesn't
> update the segment list when it gives out memory, and I've so far been
> unable to make it do so as it just seems to either segfault or get into
> an infinite loop.
>
> I've kludged around that for now, but now I find that the kernel is
> choosing to allocate the io_setup page at the top of the client address
> space, immediately below the client stack, thus stopping it from being
> extended!
>
> So it looks like I will have to pad the client address space as well
> in order to control where the kernel puts that page...

Hmm, sounds like there's some interaction here with my embryonic layout
changes... for example, I've got rid of VG_(brk) by merging Valgrind's
heap and its mmap-segment, and I want to merge that with the shadow
memory area...

N
|
From: Tom H. <th...@cy...> - 2004-07-11 16:58:18
|
In message <Pin...@he...>
Nicholas Nethercote <nj...@ca...> wrote:
> On Sun, 11 Jul 2004, Tom Hughes wrote:
>
> > I can work around that by marking the segments which are not mapped so
> > that I know I need to pad them, but it turns out the VG_(brk) doesn't
> > update the segment list which it gives out memory and I've far been
> > unable to make it do so as it just seems to either segfault or get into
> > an infinite loop.
>
> Hmm, sounds like there's some interaction here with my embryonic layout
> changes... for example, I've got rid of VG_(brk) by merging Valgrind's
> heap and its mmap-segment, and I want to merge that with the shadow memory
> area...
Probably. It turns out that the cause of the infinite loop, if you try
to make VG_(brk) register the memory it allocates with the segment
list, is that the segment list code tries to allocate a new skip list
node, which winds up calling VG_(brk) again to get memory - and round
you go.
Tom
--
Tom Hughes (th...@cy...)
Software Engineer, Cyberscience Corporation
http://www.cyberscience.com/
|
|
From: Nicholas N. <nj...@ca...> - 2004-07-05 08:57:58
|
On Sun, 4 Jul 2004, Tom Hughes wrote:
> As a result it typically seems to get allocated in the valgrind part
> of the address space which can't be accessed by the client while
> pointer checking is turned on. As libaio tries to peek at it in order
> to avoid entering the kernel when io_getevents is called and there
> are no events pending this is a problem...
>
> Anybody got any suggestions for a way of handling this?

Something Julian suggested: instead of partitioning the address space,
as we currently do -- part for the client, part for Valgrind+tool --
instead virtualize it, by introducing a software MMU. Every address
would have to be converted before use, though... it would be a lot of
infrastructure and probably quite tricky, and possibly very slow.
However, it would get rid of the problems with the client and Valgrind
banging heads in the address space.

N
|
From: Julian S. <js...@ac...> - 2004-07-05 09:16:50
|
On Monday 05 July 2004 09:47, Nicholas Nethercote wrote:
> On Sun, 4 Jul 2004, Tom Hughes wrote:
> > As a result it typically seems to get allocated in the valgrind part
> > of the address space which can't be accessed by the client while
> > pointer checking is turned on. As libaio tries to peek at it in order
> > to avoid entering the kernel when io_getevents is called and there
> > are no events pending this is a problem...
> >
> > Anybody got any suggestions for a way of handling this?
>
> Something Julian suggested: instead of partitioning the address space, as
> we currently do -- part for the client, part for Valgrind+tool -- instead
> virtualize it, by introducing a software MMU. Every address would have to
> be converted before use, though... it would be a lot of infrastructure and
> probably quite tricky, and possibly very slow. However, it would get rid
> of the problems with the client and Valgrind banging heads in the address
> space.

This is something we discussed a few months ago. On the plus side it
completely decouples the client and V's address space and so sidesteps
what I see (perhaps incorrectly) as the somewhat fragile scheme we have
at the moment. On the minus side there is some overhead, plus there is
complication at syscall boundaries of doing the relevant address
translation and scatter/gather operations (copy_to_user, copy_from_user
in effect).

From the performance point of view, it might be possible to somehow
piggyback on what is effectively a simulated MMU maintained anyway by
memcheck/helgrind/addrcheck.

In any case I would be interested to know what the performance and
complexity impact of a soft MMU is. Without that data, at the moment we
don't really know what tradeoff we're making.

J
|
From: Jeremy F. <je...@go...> - 2004-07-05 17:18:59
|
On Mon, 2004-07-05 at 10:11 +0100, Julian Seward wrote:
> This is something we discussed a few months ago. On the plus side
> it completely decouples the client and V's address space and so
> sidesteps what I see (perhaps incorrectly) as the somewhat fragile
> scheme we have at the moment. On the minus side there is some
> overhead, plus there is complication at syscall boundaries of
> doing the relevant address translation and scatter/gather
> operations (copy_to_user, copy_from_user in effect).
Unfortunately I think a VMMU would be even more fragile. It would mean
being able to reliably intercept every single memory address passing
over the user/kernel boundary. At present we make a best attempt, but
if we miss something it's no big deal. With a VMMU we would have to be
100% accurate (this is the same reason I'm not keen on completely
virtualizing the file-descriptor space).
Aside from that, there are some much more common syscalls which would be
impossible to deal with in a VMMU scheme, like the SHM ones. They don't
allow a shared memory segment to be broken into separate pieces in the
user virtual address space, so you wouldn't be able to map them into the
Valgrind client's address space linearly. It may not even help with
this aio problem, if the shared aio memory has pointers in it.
The static address space partition is a pain, I agree. But I think it's
pain we can deal with, given the advantages of protecting Valgrind from
the client, being able to enforce the client's address space
limitations, and being able to scale to a 64-bit address space easily.
After all, all this is just a limitation of the 32-bit address space; in
a 64-bit space we have plenty of space to fit everything in, and a
simple address space partition is the simplest way to go.
I think for now, the best way to squeeze everything into the address
space is:
* take advantage of the 4G/4G patches which some distros are
shipping with; that gives us an extra Gbyte of address space to
play with
* use less memory internally; the big hog is the debug stuff.
stabs makes it hard to do anything but load everything at once,
but stabs is obsolete and dwarf allows pretty fine-grained
incremental loading
* fix things as they come up
J
|
|
From: Nicholas N. <nj...@ca...> - 2004-07-05 17:29:06
|
On Mon, 5 Jul 2004, Jeremy Fitzhardinge wrote:
> I think for now, the best way to squeeze everything into the address
> space is:
> * take advantage of the 4G/4G patches which some distros are
> shipping with; that gives us an extra Gbyte of address space to

Do you know how to detect this at configure-time?

N
|
From: Jeremy F. <je...@go...> - 2004-07-06 16:26:42
|
On Mon, 2004-07-05 at 10:29, Nicholas Nethercote wrote:
> On Mon, 5 Jul 2004, Jeremy Fitzhardinge wrote:
>
> > I think for now, the best way to squeeze everything into the address
> > space is:
> > * take advantage of the 4G/4G patches which some distros are
> > shipping with; that gives us an extra Gbyte of address space to
>
> Do you know how to detect this at configure-time?

I think you could write a script which looks at /proc/self/maps and has
a guess.

But I think I'd prefer it if we built versions of Valgrind for a
selection of address space shapes (2G, 3G, 4G) and selected one at
runtime. The address space shape can change too easily: it just depends
on which kernel you boot (and it could be an exec-time configuration).

J
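The /proc/self/maps guess might look something like the sketch below: find where the stack ends and compare it against the usual i386 split points. The hex prefixes are my assumptions about common 2004-era layouts, and the script illustrates exactly the weakness noted above - it classifies the booted kernel, not what the binary might later run under.

```shell
#!/bin/sh
# Heuristic guess at the user/kernel address-space split, based on the
# end address of the [stack] mapping in /proc/self/maps (here, awk's own).
top=$(awk '/\[stack\]/ { split($1, a, "-"); print a[2]; exit }' /proc/self/maps)
case "$top" in
    7*) echo "2G/2G split (stack ends at 0x$top)" ;;
    b*) echo "3G/1G split (stack ends at 0x$top)" ;;
    f*) echo "4G/4G split (stack ends at 0x$top)" ;;
    *)  echo "unknown layout (stack ends at 0x$top)" ;;
esac
```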
|
From: Nicholas N. <nj...@ca...> - 2004-07-06 16:31:27
|
On Tue, 6 Jul 2004, Jeremy Fitzhardinge wrote:
> But I think I'd prefer it if we built versions of Valgrind for a
> selection of address space shapes (2G, 3G, 4G) and selected one at
> runtime. The address space shape can change too easily: it just depends
> on which kernel you boot (and it could be an exec-time configuration).

Hmm, good point. Does anyone use 2G shapes? Are there any other common
shapes? Ideally we'd be able to detect this at run-time.

N
|
From: Jeremy F. <je...@go...> - 2004-07-06 21:56:30
|
On Tue, 2004-07-06 at 17:31 +0100, Nicholas Nethercote wrote:
> Hmm, good point. Does anyone use 2G shapes? Are there any other common
> shapes? Ideally we'd be able to detect this at run-time.

Yes, people do. If you have 2G of physical memory and you want the
kernel to use all of it directly (ie, not use highmem), you need to
give the kernel enough address space to map it all: hence 2G+2G.

J