|
From: Nicholas N. <nj...@ca...> - 2004-02-28 16:28:36
|
Hi,
Here are what I see as the current issues w.r.t a 2.1.1 release:
- Dirk's ulimit problem: Dirk, did Jeremy's commit fix this?
- The stale/zombie thread problem: Any ideas on how to fix? Jeremy?
- Bug 69616, the glibc 2.3.3 pthreadtypes.h problem: I just committed a
  fix, which hopefully puts that behind us.
- Tom's regression failures: lots for the RH test boxes. Any ideas what
  is causing them, Tom?
I'd also like the FAQ to be updated and possibly rearranged, but that's
not crucial.
Any other show-stoppers?
N
|
|
From: Tom H. <th...@cy...> - 2004-02-28 18:24:06
|
In message <Pin...@re...>
Nicholas Nethercote <nj...@ca...> wrote:
> Here are what I see as the current issues w.r.t a 2.1.1 release:
>
> - Dirk's ulimit problem: Dirk, did Jeremy's commit fix this?
It certainly looks like it is fixed here.
> - The stale/zombie thread problem: Any ideas on how to fix? Jeremy?
Well there is the fix posted to the users list, but it isn't
particularly nice.
> - Tom's regression failures: lots for the RH test boxes. Any ideas what
> is causing them, Tom?
I've been through these and posted some information and a few patches.
Tom
--
Tom Hughes (th...@cy...)
Software Engineer, Cyberscience Corporation
http://www.cyberscience.com/
|
|
From: Robert W. <rj...@du...> - 2004-02-28 20:58:13
|
> Any other show-stoppers?
If your code overrides operator new(), then 73655 is going to bite you.
I don't know how common this is, but it's definitely stopping us from
using Valgrind on our compiler at the moment.
I think I know how to fix this - I just haven't sat down and thought it
through yet. Drop me an email if you want to hear the gory details.
Regards,
Robert.
--
Robert Walsh
Amalgamated Durables, Inc. - "We don't make the things you buy."
Email: rj...@du...
|
|
From: Nicholas N. <nj...@ca...> - 2004-02-29 00:01:17
|
On Sat, 28 Feb 2004, Robert Walsh wrote:
> > Any other show-stoppers?
>
> If your code overrides operator new(), then 73655 is going to bite you.
> I don't know how common this is, but it's definitely stopping us from
> using Valgrind on our compiler at the moment.
Was that introduced with Jeremy's FV changes, ie. do 2.0.0/2.1.0 work ok?
> I think I know how to fix this - I just haven't sat down and thought it
> through yet. Drop me an email if you want to hear the gory details.
I'd be interested to hear, and you might as well tell the group while
you're at it?
Actually, I'm not certain what the problem is -- if a program overrides
new or new[], is the idea that Valgrind's new/new[] should *not* be
called... but sometimes it is?
N
|
|
From: Jeremy F. <je...@go...> - 2004-02-29 23:14:44
|
On Sat, 2004-02-28 at 12:54, Robert Walsh wrote:
> > Any other show-stoppers?
>
> If your code overrides operator new(), then 73655 is going to bite you.
> I don't know how common this is, but it's definitely stopping us from
> using Valgrind on our compiler at the moment.
>
> I think I know how to fix this - I just haven't sat down and thought it
> through yet. Drop me an email if you want to hear the gory details.
The bug is that Valgrind shouldn't be going around overriding symbols
any old place - it should be overriding them in specific libraries. The
new symbol intercept mechanism is careful to specify the intercepts in
terms of library/symbol, so that this is the case.
In this mechanism, the intercepts are provided by the library through
various client calls (for example, the init function in
vg_preloadmemcheck.so says "I override malloc!", etc). This keeps
things nice and modular, since the library which does the override tells
Valgrind about it itself, without having to scatter the knowledge
around.
The problem is that it relies on the init functions being run really
early - before any other code is run. It is *not* OK to use glibc
malloc for a while, and then intercept it with the Valgrind malloc -
things will get very confused.
So the quick fix is to make the intercept functions also globally
visible to the linker, so ld.so's override machinery can see it. This
happens early enough, but it causes unintended functions to be
overridden - in this case, operator new().
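To illustrate the difference between a global override and an intercept keyed on library/symbol, here is a minimal sketch. It is not Valgrind's code: the table contents, the library names, and `find_intercept` are hypothetical, and the replacement is represented by a name rather than a function pointer to keep the example small.

```c
#include <stddef.h>
#include <string.h>

/* One intercept: replace `symbol` only when it is defined in `library`. */
struct intercept {
    const char *library;     /* soname the symbol must come from */
    const char *symbol;      /* symbol to replace */
    const char *replacement; /* stand-in for the replacement function */
};

/* Hypothetical entries; in the scheme described above these would be
   registered by the preload library itself. */
static const struct intercept intercepts[] = {
    { "libc.so.6",      "malloc", "vg_malloc" },
    { "libstdc++.so.5", "_Znwj",  "vg_operator_new" },
};

/* Return the replacement for (library, symbol), or NULL when the symbol
   should be left alone, e.g. an operator new defined by the client. */
const char *find_intercept(const char *library, const char *symbol)
{
    size_t n = sizeof intercepts / sizeof intercepts[0];
    for (size_t i = 0; i < n; i++) {
        if (strcmp(intercepts[i].library, library) == 0 &&
            strcmp(intercepts[i].symbol, symbol) == 0)
            return intercepts[i].replacement;
    }
    return NULL;
}
```

With a lookup of this shape, an `operator new` defined in the client's own binary does not match any entry, whereas a globally visible symbol picked up by ld.so would replace it regardless of where it was defined.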
The correct fix needs to have two properties to be really satisfying:
1. the libraries containing intercepts must be self-describing, so
that there's no external knowledge of what functions they want
to intercept
2. it needs to take effect as soon as the library is mapped in,
before any code is run
The current code satisfies 1, but not 2.
I tried an experiment with adding a special valgrind-specific section to
intercept-containing .so files (.valgrind.intercept), which contains the
library and symbol name of the intercepted function, and the pointer to
the intercepting function. The idea is that when we see the client map
in a file which we identify as an ELF .so file, we check for a
.valgrind.intercept section, and set up the intercepts so described.
The problem with this scheme is that it is complex and ELF-specific.
The complexity comes from the fact that we need to interpret relocation
information to understand the string references in the
.valgrind.intercept section. Also, since an ELF file is mapped in with
several mmap calls, we need to delay evaluation until all the relevant
mmaps have been performed (normally this isn't an issue, since the
symtab is entirely contained within one mmaped chunk, and needs no
relocation; in the .valgrind.intercept case, we need to inspect the data
section for the strings making up the intercept description).
I guess the alternative to self-description is having a single source
file which is compiled for Valgrind-internal use and for client use,
containing the appropriate parts for each. At least this would
centralize things into a single source file, even if it has pieces
scattered between core and client. But it still has issues about when
to trigger the registration of the intercepts.
J
|
|
From: Dirk M. <dm...@gm...> - 2004-03-01 03:38:58
Attachments:
l
|
On Saturday 28 February 2004 17:25, Nicholas Nethercote wrote:
> - Dirk's ulimit problem: Dirk, did Jeremy's commit fix this?
not for me. Now valgrind instantly segfaults on any application. the core file
is of no use.. contains no useful information :(
attaching a strace. I'm running kernel 2.6.3. Other setups not yet tested.
Dirk
|
|
From: Jeremy F. <je...@go...> - 2004-03-01 08:09:17
|
On Sun, 2004-02-29 at 19:34, Dirk Mueller wrote:
> On Saturday 28 February 2004 17:25, Nicholas Nethercote wrote:
>
> > - Dirk's ulimit problem: Dirk, did Jeremy's commit fix this?
>
> not for me. Now valgrind instantly segfaults on any application. the core file
> is of no use.. contains no useful information :(
>
> attaching a strace. I'm running kernel 2.6.3. Other setups not yet tested.
Hm. A number of the mmap calls are failing. How much physical memory
and swap does this machine have? Is it stock 2.6.3?
J
|
|
From: Doug R. <df...@nl...> - 2004-03-01 09:30:07
|
On Mon, 2004-03-01 at 08:05, Jeremy Fitzhardinge wrote:
> On Sun, 2004-02-29 at 19:34, Dirk Mueller wrote:
> > On Saturday 28 February 2004 17:25, Nicholas Nethercote wrote:
> >
> > > - Dirk's ulimit problem: Dirk, did Jeremy's commit fix this?
> >
> > not for me. Now valgrind instantly segfaults on any application. the core file
> > is of no use.. contains no useful information :(
> >
> > attaching a strace. I'm running kernel 2.6.3. Other setups not yet tested.
>
> Hm. A number of the mmap calls are failing. How much physical memory
> and swap does this machine have? Is it stock 2.6.3?
I've been spending some time this weekend merging with valgrind cvs and
I've seen similar problems on FreeBSD. The two main problems I had were
that the first call to malloc in valgrind (a result of loading
.valgrindrc) wiped part of ld-elf.so (which was loaded at 0xb0000000)
since find_map_space didn't yet have a proper map of the address space.
I bodged this by changing info.map_base in stage1.c so that ld-elf.so
loaded somewhere else.
The next problem was that all the mmaps in load_client (actually mapelf)
failed miserably. This was because they were all trying to map client
space and they didn't include the VKI_MAP_CLIENT flag. I ended up or-ing
in 0x80000000 in a few places in ume.c.
Perhaps a simpler solution to both problems might be for VG_(mmap) to go
straight through to mmap_inner and not check the flags if valgrind is
still initialising.
|
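The suggestion of skipping the flag check while Valgrind is still initialising could look roughly like the sketch below. It models only the decision, not the mapping itself. All names are hypothetical, the boundary address is an arbitrary assumption, and the flag value is simply the 0x80000000 mentioned above.

```c
#include <stdbool.h>
#include <stdint.h>

/* Assumed flag value: the 0x80000000 reported as or-ed in for ume.c. */
#define SKETCH_MAP_CLIENT 0x80000000u

/* Hypothetical boundary: client space lies below, Valgrind space above. */
static const uintptr_t client_end = 0xb0000000u;

/* True until the address-space map has been built. */
static bool initialising = true;

void finish_init(void) { initialising = false; }

/* Decide whether a mapping request at [addr, addr+len) is acceptable. */
bool mapping_allowed(uintptr_t addr, uintptr_t len, unsigned flags)
{
    if (initialising)
        return true;  /* pass straight through during bootstrap */

    bool wants_client = (flags & SKETCH_MAP_CLIENT) != 0;
    if (wants_client)
        /* must fit entirely below the client boundary */
        return addr < client_end && len <= client_end - addr;

    /* without the flag, the mapping must stay out of client space */
    return addr >= client_end;
}
```

The open question raised in the thread remains in this form too: the switch from pass-through to checked mode has to happen at exactly the right moment, which is what `finish_init` stands in for.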
|
From: Dirk M. <dm...@gm...> - 2004-03-01 13:18:55
|
On Monday 01 March 2004 09:05, Jeremy Fitzhardinge wrote:
> Hm. A number of the mmap calls are failing. How much physical memory
> and swap does this machine have?
1 GB / 1 GB. ulimit -v 650MB.
> Is it stock 2.6.3?
yes.
Dirk
|
|
From: Jeremy F. <je...@go...> - 2004-03-02 00:00:09
Attachments:
fix-virtlim.patch
|
On Mon, 2004-03-01 at 05:14, Dirk Mueller wrote:
> On Monday 01 March 2004 09:05, Jeremy Fitzhardinge wrote:
>
> > Hm. A number of the mmap calls are failing. How much physical memory
> > and swap does this machine have?
>
> 1 GB / 1 GB. ulimit -v 650MB.
Ah, OK. Does this help, or do you also set the hard limit?
J
|
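The attached patch is not reproduced in the thread. Judging only from its name and the question about the hard limit, one plausible approach is to raise the soft address-space limit up to the hard limit at startup. The sketch below shows that idea with the standard `getrlimit`/`setrlimit` calls; the function name is hypothetical and this is an assumption about the approach, not the patch's contents.

```c
#include <sys/resource.h>

/* Raise the soft address-space limit to the hard limit.
   Returns 0 on success, -1 on failure. An unprivileged process may
   always raise its soft limit as far as the hard limit, but no further. */
int raise_as_soft_limit(void)
{
    struct rlimit rl;

    if (getrlimit(RLIMIT_AS, &rl) != 0)
        return -1;

    rl.rlim_cur = rl.rlim_max;
    return setrlimit(RLIMIT_AS, &rl);
}
```

This only helps when the hard limit is higher than the soft one. If both are set to the same value, as with a plain `ulimit -v`, there is no headroom to claim.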
|
From: Jeremy F. <je...@go...> - 2004-03-02 00:02:12
|
On Mon, 2004-03-01 at 01:25, Doug Rabson wrote:
> On Mon, 2004-03-01 at 08:05, Jeremy Fitzhardinge wrote:
> > On Sun, 2004-02-29 at 19:34, Dirk Mueller wrote:
> > > On Saturday 28 February 2004 17:25, Nicholas Nethercote wrote:
> > >
> > > > - Dirk's ulimit problem: Dirk, did Jeremy's commit fix this?
> > >
> > > not for me. Now valgrind instantly segfaults on any application. the core file
> > > is of no use.. contains no useful information :(
> > >
> > > attaching a strace. I'm running kernel 2.6.3. Other setups not yet tested.
> >
> > Hm. A number of the mmap calls are failing. How much physical memory
> > and swap does this machine have? Is it stock 2.6.3?
>
> I've been spending some time this weekend merging with valgrind cvs and
> I've seen similar problems on FreeBSD. The two main problems I had were
> that the first call to malloc in valgrind (a result of loading
> .valgrindrc) wiped part of ld-elf.so (which was loaded at 0xb0000000)
> since find_map_space didn't yet have a proper map of the address space.
> I bodged this by changing info.map_base in stage1.c so that ld-elf.so
> loaded somewhere else.
Yes, there's some tricky bootstrap stuff here. We should really try to
build the Segment list as soon as possible, so it's safe to use
VG_(mmap).
> The next problem was that all the mmaps in load_client (actually mapelf)
> failed miserably. This was because they were all trying to map client
> space and they didn't include the VKI_MAP_CLIENT flag. I ended up or-ing
> in 0x80000000 in a few places in ume.c.
That's a bit of a bug. vg_glibc.c shouldn't be redirecting mmap() to
VG_(mmap) yet for precisely this reason. I think Linux will complain if
it sees flags set it doesn't understand, so unconditionally or-ing in
VKI_MAP_CLIENT isn't going to work in general (and it only just barely
works in your case, because stage1's use of ume.c happens to link with
the plain old libc mmap, which presumably ignores VKI_MAP_CLIENT).
> Perhaps a simpler solution to both problems might be for VG_(mmap) to go
> straight through to mmap_inner and not check the flags if valgrind is
> still initialising.
Something like that. The goal is to make it so that libraries can use
brk and mmap freely, and they'll get results that will keep Valgrind
happy and sane. This means that we need to be sure that VG_(mmap)
changes operation mode between plain old kernel mmap and the internally
managed segment list mmap at precisely the right time, and that it
doesn't leave any valuable crud in the client address space early at the
start of time.
This is all very fiddly.
J
|
|
From: Dirk M. <dm...@gm...> - 2004-03-02 00:33:31
|
On Tuesday 02 March 2004 00:53, Jeremy Fitzhardinge wrote:
> Ah, OK. Does this help, or do you also set the hard limit?
hard limit is set as well. do we really need that much mapped memory?
Dirk
|
|
From: Jeremy F. <je...@go...> - 2004-03-02 00:58:53
|
On Mon, 2004-03-01 at 16:28, Dirk Mueller wrote:
> On Tuesday 02 March 2004 00:53, Jeremy Fitzhardinge wrote:
>
> > Ah, OK. Does this help, or do you also set the hard limit?
>
> hard limit is set as well. do we really need that much mapped memory?
Yes. At the very least, stage1 needs to reserve the chunks of the
address space needed for the client, before running the dynamic linker.
Since the dynamic linker is unmodified glibc code, we can't control
where it will place things, so we need to force its hand.
The other large mappings (like the shadow memory region) aren't really
necessary, but they make inspecting things with /proc/<pid>/maps much
easier.
What's the purpose of your virtual limits? Is it to stop runaways, or
actual malicious resource use?
J
|
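Reserving address space without consuming memory is commonly done with an inaccessible anonymous mapping. The sketch below shows the general technique on Linux-like systems; it is not stage1's actual code, and `reserve_range` and `release_range` are hypothetical helper names.

```c
#ifndef _DEFAULT_SOURCE
#define _DEFAULT_SOURCE 1   /* for MAP_ANONYMOUS under strict -std modes */
#endif
#include <stddef.h>
#include <sys/mman.h>

/* Reserve `len` bytes of address space that nothing else can be placed in.
   PROT_NONE means no access is permitted, and MAP_NORESERVE asks the
   kernel not to set aside swap for the range.
   Returns the start of the range, or NULL on failure. */
void *reserve_range(size_t len)
{
    void *p = mmap(NULL, len, PROT_NONE,
                   MAP_PRIVATE | MAP_ANONYMOUS | MAP_NORESERVE, -1, 0);
    return p == MAP_FAILED ? NULL : p;
}

/* Give the range back, e.g. just before the real contents are mapped. */
int release_range(void *p, size_t len)
{
    return munmap(p, len);
}
```

Such a reservation uses no physical memory, but it still counts against the address-space limit set with `ulimit -v`, which is why a tight virtual limit can make these mappings fail.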
|
From: Dirk M. <dm...@gm...> - 2004-03-02 01:20:10
|
On Tuesday 02 March 2004 01:52, Jeremy Fitzhardinge wrote:
> Yes. At the very least, stage1 needs to reserve the chunks of the
> address space needed for the client, before running the dynamic linker.
> Since the dynamic linker is unmodified glibc code, we can't control
> where it will place things, so we need to force its hand.
Hmm, and why do we have to force it to place things in a certain way?
> The other large mappings (like the shadow memory region) aren't really
> necessary, but they make inspecting things with /proc/<pid>/maps much
> easier.
inspecting which kind of things?
> What's the purpose of your virtual limits? Is it to stop runaways, or
> actual malicious resource use?
Well, both. it also helps debugging since you get a segfault instead of a
machine that swapped to death before you were able to attach a debugger.
Dirk
|
|
From: Jeremy F. <je...@go...> - 2004-03-02 22:22:23
|
On Mon, 2004-03-01 at 17:15, Dirk Mueller wrote:
> On Tuesday 02 March 2004 01:52, Jeremy Fitzhardinge wrote:
>
> > Yes. At the very least, stage1 needs to reserve the chunks of the
> > address space needed for the client, before running the dynamic linker.
> > Since the dynamic linker is unmodified glibc code, we can't control
> > where it will place things, so we need to force its hand.
>
> Hmm, and why do we have to force it to place things in a certain way?
Because the client address space is mapped 1:1 with the address space it
would normally get without Valgrind (the difference is that it stops a
bit short). FV creates a clear distinction between the client address
space and Valgrind's address space, so that there's no scope for the
client to start trashing Valgrind's memory. In order to achieve this, we
need to make sure that all the pieces of Valgrind get placed outside the
client address space (hence the need for the two-stage bootstrap).
> > The other large mappings (like the shadow memory region) aren't really
> > necessary, but they make inspecting things with /proc/<pid>/maps much
> > easier.
>
> inspecting which kind of things?
Debugging Valgrind - for looking at the state of the address space as
the program executes; in particular, you can see how shadow memory is
being consumed.
> > What's the purpose of your virtual limits? Is it to stop runaways, or
> > actual malicious resource use?
>
> Well, both. it also helps debugging since you get a segfault instead of a
> machine that swapped to death before you were able to attach a debugger.
Well, in that case only setting the soft limit should be enough to
prevent that failure.
J
|
|
From: Doug R. <df...@nl...> - 2004-03-02 09:19:24
|
On Mon, 2004-03-01 at 23:55, Jeremy Fitzhardinge wrote:
> On Mon, 2004-03-01 at 01:25, Doug Rabson wrote:
> > On Mon, 2004-03-01 at 08:05, Jeremy Fitzhardinge wrote:
> > > On Sun, 2004-02-29 at 19:34, Dirk Mueller wrote:
> > > > On Saturday 28 February 2004 17:25, Nicholas Nethercote wrote:
> > > >
> > > > > - Dirk's ulimit problem: Dirk, did Jeremy's commit fix this?
> > > >
> > > > not for me. Now valgrind instantly segfaults on any application. the core file
> > > > is of no use.. contains no useful information :(
> > > >
> > > > attaching a strace. I'm running kernel 2.6.3. Other setups not yet tested.
> > >
> > > Hm. A number of the mmap calls are failing. How much physical memory
> > > and swap does this machine have? Is it stock 2.6.3?
> >
> > I've been spending some time this weekend merging with valgrind cvs and
> > I've seen similar problems on FreeBSD. The two main problems I had were
> > that the first call to malloc in valgrind (a result of loading
> > .valgrindrc) wiped part of ld-elf.so (which was loaded at 0xb0000000)
> > since find_map_space didn't yet have a proper map of the address space.
> > I bodged this by changing info.map_base in stage1.c so that ld-elf.so
> > loaded somewhere else.
>
> Yes, there's some tricky bootstrap stuff here. We should really try to
> build the Segment list as soon as possible, so it's safe to use
> VG_(mmap).
>
> > The next problem was that all the mmaps in load_client (actually mapelf)
> > failed miserably. This was because they were all trying to map client
> > space and they didn't include the VKI_MAP_CLIENT flag. I ended up or-ing
> > in 0x80000000 in a few places in ume.c.
>
> That's a bit of a bug. vg_glibc.c shouldn't be redirecting mmap() to
> VG_(mmap) yet for precisely this reason. I think Linux will complain if
> it sees flags set it doesn't understand, so unconditionally or-ing in
> VKI_MAP_CLIENT isn't going to work in general (and it only just barely
> works in your case, because stage1's use of ume.c happens to link with
> the plain old libc mmap, which presumably ignores VKI_MAP_CLIENT).
The FreeBSD kernel seems to ignore the extra flags but it's certainly not
behaviour that we should rely on.
> > Perhaps a simpler solution to both problems might be for VG_(mmap) to go
> > straight through to mmap_inner and not check the flags if valgrind is
> > still initialising.
>
> Something like that. The goal is to make it so that libraries can use
> brk and mmap freely, and they'll get results that will keep Valgrind
> happy and sane. This means that we need to be sure that VG_(mmap)
> changes operation mode between plain old kernel mmap and the internally
> managed segment list mmap at precisely the right time, and that it
> doesn't leave any valuable crud in the client address space early at the
> start of time.
>
> This is all very fiddly.
I had another problem yesterday where layout_remaining_space was failing
to mmap the redzone and the shadow map. I added yet another flag to
VG_(mmap) to allow mapping the shadow area but it all starts to feel
wrong. The other idea I had yesterday was to add a new flag, e.g.
VKI_MAP_NOCHECKING. The mmap override would or this in to the flags
passed to VG_(mmap) and that would turn off the client/valgrind address
space checks. This would leave ume.c just using the plain standard mmap
api.
|
|
From: Jeremy F. <je...@go...> - 2004-03-02 22:15:02
|
On Tue, 2004-03-02 at 01:14, Doug Rabson wrote:
> I had another problem yesterday where layout_remaining_space was failing
> to mmap the redzone and the shadow map. I added yet another flag to
> VG_(mmap) to allow mapping the shadow area but it all starts to feel
> wrong. The other idea I had yesterday was to add a new flag, e.g.
> VKI_MAP_NOCHECKING. The mmap override would or this in to the flags
> passed to VG_(mmap) and that would turn off the client/valgrind address
> space checks. This would leave ume.c just using the plain standard mmap
> api.
Hm, yes. For the sake of correctness, I think it is a good idea if
we're clear about the intent of each mmap(), and whether it should be
allowed to go into the client address space or not. If we're using
!MAP_FIXED, then it's obviously critical that it explicitly says which
address space it should go into.
I was playing with having two flags rather than just one:
VKI_MAP_CLIENT, and VKI_MAP_VALGRIND (ie, all address space except the
client). This would allow us to say VG_(mmap)(...,
VKI_MAP_*|VKI_MAP_FIXED|VKI_MAP_CLIENT|VKI_MAP_VALGRIND,...), meaning
that we're OK with the mapping going into either address space, for the
use of ume.c. (We could make ume.c always use VG_(mmap) rather than
mmap, and provide an appropriate implementation for use in stage1.)
J
|
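The two-flag idea can be sketched as a placement check in which each flag names an address space the caller is willing to accept. The flag values, the boundary address, and `placement_ok` below are all hypothetical; only the semantics (client, Valgrind, or either) come from the message above.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical flag bits; the real values are not given in the thread. */
#define SK_MAP_CLIENT   0x1u  /* mapping may go into client space */
#define SK_MAP_VALGRIND 0x2u  /* mapping may go anywhere outside client space */

/* Hypothetical boundary between client and Valgrind address space. */
static const uintptr_t sk_client_end = 0xb0000000u;

/* Check a fixed-address request at [addr, addr+len) against the caller's
   stated intent. Passing both flags accepts either address space. */
bool placement_ok(uintptr_t addr, uintptr_t len, unsigned flags)
{
    bool in_client   = addr < sk_client_end && len <= sk_client_end - addr;
    bool in_valgrind = addr >= sk_client_end;

    if (in_client)
        return (flags & SK_MAP_CLIENT) != 0;
    if (in_valgrind)
        return (flags & SK_MAP_VALGRIND) != 0;

    return false;  /* straddles the boundary: never acceptable */
}
```

Compared with a single "turn off checking" flag, this keeps every call site explicit about where its mapping is allowed to land, which is the correctness argument made above.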
|
From: Doug R. <df...@nl...> - 2004-03-02 23:09:10
|
On Tue, 2004-03-02 at 22:08, Jeremy Fitzhardinge wrote:
> On Tue, 2004-03-02 at 01:14, Doug Rabson wrote:
> > I had another problem yesterday where layout_remaining_space was failing
> > to mmap the redzone and the shadow map. I added yet another flag to
> > VG_(mmap) to allow mapping the shadow area but it all starts to feel
> > wrong. The other idea I had yesterday was to add a new flag, e.g.
> > VKI_MAP_NOCHECKING. The mmap override would or this in to the flags
> > passed to VG_(mmap) and that would turn off the client/valgrind address
> > space checks. This would leave ume.c just using the plain standard mmap
> > api.
>
> Hm, yes. For the sake of correctness, I think it is a good idea if
> we're clear about the intent of each mmap(), and whether it should be
> allowed to go into the client address space or not. If we're using
> !MAP_FIXED, then it's obviously critical that it explicitly says which
> address space it should go into.
>
> I was playing with having two flags rather than just one:
> VKI_MAP_CLIENT, and VKI_MAP_VALGRIND (ie, all address space except the
> client). This would allow us to say VG_(mmap)(...,
> VKI_MAP_*|VKI_MAP_FIXED|VKI_MAP_CLIENT|VKI_MAP_VALGRIND,...), meaning
> that we're OK with the mapping going into either address space, for the
> use of ume.c. (We could make ume.c always use VG_(mmap) rather than
> mmap, and provide an appropriate implementation for use in stage1.)
I'm already doing something like this but I have problems with the bits
of vg_main.c which map red-zones and the shadow area since the checking
in VG_(mmap) is quite strict.