|
From: Jeremy F. <je...@go...> - 2005-03-08 19:44:52
|
I've made a test release of 2.4.0-rc1 and put it at http://www.goop.org/~jeremy/valgrind/dist. I've included the source tar, the source RPM and an FC3 i386 binary RPM. This is identical to CVS HEAD except for the version number. Please try it out. If it looks good, I think we should ship it. J |
|
From: Troels W. H. <tr...@th...> - 2005-03-08 21:12:49
|
Jeremy Fitzhardinge wrote:
>I've made a test release of 2.4.0-rc1 and put it at
>http://www.goop.org/~jeremy/valgrind/dist. I've included the source
>tar, the source RPM and an FC3 i386 binary RPM.
>
>This is identical to CVS HEAD except for the version number.
>
>Please try it out. If it looks good, I think we should ship it.
>
>
I found a very minor issue... The copyright date says 2004. :-)
==32548== Memcheck, a memory error detector for x86-linux.
==32548== Copyright (C) 2002-2004, and GNU GPL'd, by Julian Seward et al.
==32548== Using valgrind-2.4.0.rc1, a program supervision framework for x86-linux.
==32548== Copyright (C) 2000-2004, and GNU GPL'd, by Julian Seward et al.
Looking forward to trying the new architecture with some real programs.
Troels
|
|
From: Nicholas N. <nj...@cs...> - 2005-03-09 03:51:53
Attachments:
change-copyright-year
|
On Tue, 8 Mar 2005, Troels Walsted Hansen wrote: > I found a very minor issue... The copyright date says 2004. :-) The attached script fixes that. Jeremy, can you run it? Instructions are within. N |
|
From: Nicholas N. <nj...@cs...> - 2005-03-09 01:01:06
|
On Tue, 8 Mar 2005, Jeremy Fitzhardinge wrote:
> I've made a test release of 2.4.0-rc1 and put it at
> http://www.goop.org/~jeremy/valgrind/dist. I've included the source
> tar, the source RPM and an FC3 i386 binary RPM.
>
> This is identical to CVS HEAD except for the version number.
>
> Please try it out. If it looks good, I think we should ship it.
My machine: Debian 3.0.
Linux charco.cs.utexas.edu 2.4.29 #1 SMP Mon Jan 24 09:20:36 CST 2005 i686 unknown
GNU C Library stable release version 2.2.5, by Roland McGrath et al.
Copyright (C) 1992-2001, 2002 Free Software Foundation, Inc.
This is free software; see the source for copying conditions.
There is NO warranty; not even for MERCHANTABILITY or FITNESS FOR A
PARTICULAR PURPOSE.
Compiled by GNU CC version 2.95.4 20011002 (Debian prerelease).
Compiled on a Linux 2.4.18 system on 2005-01-07.
Available extensions:
GNU libio by Per Bothner
crypt add-on version 2.1 by Michael Glad and others
linuxthreads-0.9 by Xavier Leroy
BIND-8.2.3-T5B
libthread_db work sponsored by Alpha Processor Inc
NIS(YP)/NIS+ NSS modules 0.19 by Thorsten Kukuk
Report bugs using the `glibcbug' script to <bu...@gn...>.
----
I'm getting the following regtest failures:
== 198 tests, 6 stderr failures, 1 stdout failure =================
memcheck/tests/leak-tree (stderr)
memcheck/tests/manuel2 (stderr)
memcheck/tests/pth_once (stderr)
memcheck/tests/threadederrno (stderr)
memcheck/tests/vgtest_ume (stderr)
memcheck/tests/zeropage (stdout)
corecheck/tests/fdleak_creat (stderr)
leak-tree, manuel2, pth_once, threadederrno and vgtest_ume I've been
getting for a while, and I'm not very worried about them.
fdleak_creat is new. The diff is:
! at 0x........: creat (in /...libc...)
! by 0x........: __libc_start_main (in /...libc...)
! by 0x........: ...
--- 6,7 ----
! at 0x........: open (in /...libc...)
! by 0x........: main (fdleak_creat.c:18)
ie. it looks like something has changed on my system so that libc's
creat() now calls open(). So probably not a problem.
The zeropage one I've not seen before. Valgrind used to prevent clients
from doing an mmap(FIXED) in the bottom 64KB of memory. Jeremy, you
changed VG_(valid_client_addr)() on January 15 (rev 1.229) to disable
this. The commit log said:
Misc changes needed so that Valgrind can run itself.
I'm not sure if this was a good thing to do -- IIRC I originally put
that restriction in because some of the memory allocation functions
use 0 to represent failure, and so if a program did mmap(0x0, FIXED)
various problems could occur. I don't know why this test has been
succeeding since Jan 15, only to fail now.
Another problem... this program:
int main(void)
{
read(0,0,1);
}
behaves differently under Valgrind compared to native -- it doesn't
wait for user input.
The problem also arises from January 15, when you changed almost every
use of the PRE_MEM_WRITE macro to your new SYS_PRE_MEM_WRITE macro.
Did you write these new macros to address a particular problem? I
don't like this macro, because it assumes that any unaddressable
argument causes the syscall to fail, which is not necessarily the
case. And I'm not keen on the obvious fix -- change sys_read to use
PRE_MEM_WRITE -- because I imagine that this
non-failure-on-unaddressable-memory behaviour is possible on any
number of the syscalls.
N
|
|
From: Jeremy F. <je...@go...> - 2005-03-09 02:04:23
|
Nicholas Nethercote wrote:
> fdleak_creat is new. The diff is:
>
> ! at 0x........: creat (in /...libc...)
> ! by 0x........: __libc_start_main (in /...libc...)
> ! by 0x........: ...
> --- 6,7 ----
> ! at 0x........: open (in /...libc...)
> ! by 0x........: main (fdleak_creat.c:18)
>
> ie. it looks like something has changed on my system so that libc's
> creat() now calls open(). So probably not a problem.
What's the context? I presume it ends up calling the same syscall? It
doesn't look like anything more than framepointer backtrace confusion.
> I'm not sure if this was a good thing to do -- IIRC I originally put
> that restriction in because some of the memory allocation functions
> use 0 to represent failure, and so if a program did mmap(0x0, FIXED)
> various problems could occur.
Valgrind does a mmap(0, FIXED) as part of its address space padding; in
general this is a legitimate but unusual operation. We could make it a
clo which is off by default, but I'm not terribly keen on adding another
special case clo. I guess a client request is another option (please
let me mmap 0).
What problems are you thinking of? Other problems occurring within
Valgrind, or just that a program can get itself into a mess which we
would detect too late?
> Another problem... this program:
>
> int main(void)
> {
> read(0,0,1);
> }
>
> behaves differently under Valgrind compared to native -- it doesn't
> wait for user input.
>
> The problem also arises from January 15, when you changed almost every
> use of the PRE_MEM_WRITE macro to your new SYS_PRE_MEM_WRITE macro.
> Did you write these new macros to address a particular problem? I
> don't like this macro, because it assumes that any unaddressable
> argument causes the syscall to fail, which is not necessarily the
> case. And I'm not keen on the obvious fix -- change sys_read to use
> PRE_MEM_WRITE -- because I imagine that this
> non-failure-on-unaddressable-memory behaviour is possible on any
> number of the syscalls.
The specific problem I was trying to solve is where Valgrind crashes
with a SIGSEGV if you pass bogus arguments to a syscall. This isn't so
much a problem for simple buffers like read(), but it is if the memory
points to a structure which Valgrind inspects.
In this case, the read will fail either way, but it blocks first when
run natively. I guess there is the possibility that someone will map
memory under it while it is blocked so that it won't end up failing.
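To make the native behaviour concrete, here is a minimal sketch (not Valgrind code) of what the kernel does once the read actually has data to copy. It assumes Linux with nothing mapped at address 0, and the helper name errno_of_null_read is made up for illustration. A pipe with one byte already queued stands in for stdin so the call does not block.

```c
#include <assert.h>   /* for the checks in the usage note */
#include <errno.h>
#include <stddef.h>
#include <unistd.h>

/* Hypothetical helper: returns the errno left by read() into an
 * unaddressable buffer once one byte is already waiting in a pipe.
 * Returns 0 if the read unexpectedly succeeds, -1 if the setup fails. */
int errno_of_null_read(void)
{
    int fds[2];
    if (pipe(fds) != 0)
        return -1;

    /* Queue one byte so the read has something to copy and cannot block. */
    if (write(fds[1], "x", 1) != 1) {
        close(fds[0]);
        close(fds[1]);
        return -1;
    }

    /* volatile keeps the compiler from reasoning about the NULL argument. */
    void *volatile bad_buffer = NULL;

    errno = 0;
    ssize_t n = read(fds[0], bad_buffer, 1);  /* kernel cannot copy to 0x0 */
    int saved = (n == -1) ? errno : 0;

    close(fds[0]);
    close(fds[1]);
    return saved;
}
```

Run natively, this is expected to return EFAULT. The wait that read(0,0,1) shows on a terminal happens before the kernel attempts the copy, which is the part that is skipped when the wrapper fails the call up front.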
Rather than setting EFAULT, the alternative is for SYS_PRE_MEM_* to
return some flag saying "bad memory" to prevent any further tests on
it. This would complicate things, since it means the PRE and POST
wrappers would need to be made aware of it if they touch syscall memory.
As it is, there are possible races where a thread will remove memory
under a syscall anyway, so touching memory in a POST() function isn't
strictly safe, even if it was present in the PRE().
Is this read(fd, 0, 1) example from a real program? Do you think this
will cause a real problem? This case isn't particularly well defined,
and I think older kernels would have failed it immediately. I don't
like having variances from native behaviour, but I don't think this is
too serious.
J
|
|
From: Nicholas N. <nj...@cs...> - 2005-03-09 02:50:54
|
On Tue, 8 Mar 2005, Jeremy Fitzhardinge wrote:
>> ! at 0x........: creat (in /...libc...)
>> ! by 0x........: __libc_start_main (in /...libc...)
>> ! by 0x........: ...
>> --- 6,7 ----
>> ! at 0x........: open (in /...libc...)
>> ! by 0x........: main (fdleak_creat.c:18)
>>
>> ie. it looks like something has changed on my system so that libc's
>> creat() now calls open(). So probably not a problem.
>
> What's the context? I presume it ends up calling the same syscall? It
> doesn't look like anything more than framepointer backtrace confusion.
strace tells me the program (run natively) calls the syscall open() rather
than creat(). Weird.
>> I'm not sure if this was a good thing to do -- IIRC I originally put
>> that restriction in because some of the memory allocation functions
>> use 0 to represent failure, and so if a program did mmap(0x0, FIXED)
>> various problems could occur.
>
> Valgrind does a mmap(0, FIXED) as part of its address space padding; in
> general this is a legitimate but unusual operation. We could make it a
> clo which is off by default, but I'm not terribly keen on adding another
> special case clo. I guess a client request is another option (please
> let me mmap 0).
Neither of those are very appealing.
> What problems are you thinking of? Other problems occurring within
> Valgrind, or just that a program can get itself into a mess which we
> would detect too late?
The former. I think the issue was that it gets very confusing if you pass
0x0 as the 'addr' parameter to VG_(find_map_space)() -- because it
interprets that to mean "I don't care where you put it". Usually
VG_(find_map_space)() isn't called if you specify FIXED, but if you look
at the wrapper for mremap(), I think it will be an issue if you mremap()
some memory to address 0x0 with MREMAP_FIXED.
It's a type issue -- the problem comes from designating one particular
integer value as special, and then problems arise when you want to use
that value as a normal integer rather than the exceptional value.
VG_(find_map_space)() has inherited this problem from mmap() -- you can't
suggest to mmap(), without using MAP_FIXED, that you want the mapped
segment to go at 0x0.
The way to fix this is to add another Bool arg --
"use_address_as_suggestion" or something -- to VG_(find_map_space)(). If
it's true, we use the passed address as a suggestion (even if it's zero).
If it's false, we put the block anywhere.
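A minimal sketch of the extra-flag idea, under loud assumptions: this is not the real VG_(find_map_space)() signature or logic. The toy address range, the default base and the name toy_find_map_space are all invented for illustration. The only point is that address 0 becomes an ordinary value once "do I care where it goes" is carried by a separate flag.

```c
#include <assert.h>   /* for the checks in the usage note */
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

typedef uintptr_t Addr;

/* Toy address space: everything in [0, TOY_LIMIT) is free. Invented values. */
#define TOY_LIMIT        ((Addr)0x100000)
#define TOY_DEFAULT_BASE ((Addr)0x40000)

/* Hypothetical stand-in for a map-space finder. When
 * use_address_as_suggestion is true, addr is honoured if the range fits,
 * even when addr is 0. When it is false, addr is ignored entirely.
 * Returns false if no placement is possible; *result is then untouched. */
bool toy_find_map_space(Addr addr, bool use_address_as_suggestion,
                        size_t len, Addr *result)
{
    if (len == 0 || len > TOY_LIMIT)
        return false;

    Addr start = use_address_as_suggestion ? addr : TOY_DEFAULT_BASE;

    /* Suggestion does not fit: fall back to the default placement. */
    if (start > TOY_LIMIT - len)
        start = TOY_DEFAULT_BASE;
    if (start > TOY_LIMIT - len)
        return false;

    *result = start;
    return true;
}
```

With this shape, toy_find_map_space(0, true, 4096, &a) places the block at 0, while toy_find_map_space(0, false, 4096, &a) ignores the 0 and uses the default base, so the two meanings of zero no longer collide.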
Aside: malloc() suffers from a similar problem -- it can never put a heap
block at 0x0 because 0x0 means "no block allocated". Another consequence
of this idiom is that it's really easy to forget to check for the presence
of the exceptional value. That's why posix_memalign() returns a status code
(zero on success), and puts the address of the allocated block (if one was
allocated) in its first argument, passed by reference -- it doesn't steal
a value, and it's much harder to forget to check for failure.
Unfortunately, NULL-as-failure is so ingrained in general C programming
that this problem will never go
away. In contrast, in languages like Haskell you have a "Maybe" type that
looks like this:
Maybe a = Just a | Nothing
where 'a' can be any type. The nice thing here is that you don't have to
steal one of your normal values to represent failure, and also it's
impossible to forget to check for the "Nothing" failure case.
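The nearest C equivalent of that separation is the out-parameter idiom that posix_memalign() itself uses: success or failure travels in the return value, and the pointer travels through a separate argument. The sketch below wraps the real posix_memalign() call; the wrapper name alloc_aligned is hypothetical.

```c
#ifndef _POSIX_C_SOURCE
#define _POSIX_C_SOURCE 200112L
#endif
#include <assert.h>   /* for the checks in the usage note */
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical wrapper: allocates `size` bytes aligned to `alignment`.
 * Returns 0 on success or an errno-style code on failure. *out is written
 * only on success, so no pointer value has to double as "failed". */
int alloc_aligned(size_t alignment, size_t size, void **out)
{
    void *p = NULL;
    int rc = posix_memalign(&p, alignment, size);
    if (rc != 0)
        return rc;      /* e.g. a bad alignment or no memory */
    *out = p;
    return 0;
}
```

A caller writes `if (alloc_aligned(64, 128, &p) != 0) { ... }` and frees p with free() afterwards. Unlike the Maybe type, nothing forces the caller to look at the return value, so this only makes the check harder to forget, not impossible.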
>> Another problem... this program:
>>
>> int main(void)
>> {
>> read(0,0,1);
>> }
>>
>> behaves differently under Valgrind compared to native -- it doesn't
>> wait for user input.
>
> [...]
>
> Is this read(fd, 0, 1) example from a real program?
No, but it's my standard trick for forcing a program to pause at a
particular point -- usually to look at what /proc/self/maps looks like.
> Do you think this will cause a real problem? This case isn't
> particularly well defined, and I think older kernels would have failed
> it immediately. I don't like having variances from native behaviour,
> but I don't think this is too serious.
I'm not sure. I'm very uneasy about changing native behaviour. I'm
worried that in all the 250-odd syscalls there might be some cases that
are like this but occur in less contrived code.
N
|
|
From: Jeremy F. <je...@go...> - 2005-03-09 05:07:22
|
Nicholas Nethercote wrote:
> On Tue, 8 Mar 2005, Jeremy Fitzhardinge wrote:
> strace tells me the program (run natively) calls the syscall open()
> rather than creat(). Weird.
The creat() call was made obsolete by O_CREAT; if an architecture has a
creat() syscall, it's only for ancient backwards compatibility.
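For reference, POSIX defines creat(path, mode) as equivalent to an open() call with a fixed set of flags, which is why a libc can implement it on top of the open syscall. A minimal sketch of that equivalence (the function name creat_via_open is made up here):

```c
#include <assert.h>   /* for the checks in the usage note */
#include <fcntl.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <unistd.h>

/* Same effect as creat(path, mode): create the file if needed, truncate
 * it if it exists, and open it write-only. Returns a file descriptor,
 * or -1 with errno set. */
int creat_via_open(const char *path, mode_t mode)
{
    return open(path, O_WRONLY | O_CREAT | O_TRUNC, mode);
}
```

A backtrace taken inside such a libc shows open rather than creat as the innermost frame, which would match the changed fdleak_creat output.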
> Neither of those are very appealing.
I had a better idea. We could:
* put back the ban on the lower 64k
* create a PROT_NONE mapping there
The mapping would appear in /proc/self/maps, and prevent a sub-Valgrind
from trying to put padding there. The only downside is that other
programs might get confused if they see the mapping there, and if they
hit a NULL pointer they'd get a SEGV_ACCERR rather than a SEGV_MAPERR
(which could be fixed up in the signal handler, because that's a piece
of code which really needs some more special cases).
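The SEGV_ACCERR point can be checked with a small standalone sketch. It deliberately differs from the proposal in one respect: the PROT_NONE page is placed wherever the kernel chooses rather than in the bottom 64k, because many current kernels refuse unprivileged mappings near address 0. The function name and the handler are hypothetical, and Linux is assumed.

```c
#ifndef _DEFAULT_SOURCE
#define _DEFAULT_SOURCE
#endif
#include <assert.h>   /* for the checks in the usage note */
#include <setjmp.h>
#include <signal.h>
#include <string.h>
#include <sys/mman.h>
#include <unistd.h>

static sigjmp_buf jump_env;
static volatile sig_atomic_t last_si_code;

/* Record why the fault happened, then jump back out of the faulting read. */
static void on_segv(int sig, siginfo_t *info, void *ctx)
{
    (void)sig;
    (void)ctx;
    last_si_code = info->si_code;
    siglongjmp(jump_env, 1);
}

/* Hypothetical helper: maps one PROT_NONE page, reads from it, and returns
 * the si_code of the resulting SIGSEGV (or -1 if the setup failed). */
int si_code_for_noaccess_read(void)
{
    size_t page = (size_t)sysconf(_SC_PAGESIZE);
    volatile char *p = mmap(NULL, page, PROT_NONE,
                            MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (p == MAP_FAILED)
        return -1;

    struct sigaction sa, old;
    memset(&sa, 0, sizeof sa);
    sa.sa_sigaction = on_segv;
    sa.sa_flags = SA_SIGINFO;
    sigemptyset(&sa.sa_mask);
    if (sigaction(SIGSEGV, &sa, &old) != 0) {
        munmap((void *)p, page);
        return -1;
    }

    last_si_code = 0;
    if (sigsetjmp(jump_env, 1) == 0) {
        char c = *p;            /* faults: the page is mapped but unreadable */
        (void)c;
    }

    sigaction(SIGSEGV, &old, NULL);   /* restore the previous handler */
    munmap((void *)p, page);
    return last_si_code;
}
```

On Linux this is expected to report SEGV_ACCERR, whereas a read from an address with no mapping at all reports SEGV_MAPERR. That difference is what a client dereferencing NULL would observe if the low 64k were covered by a PROT_NONE mapping.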
> The former. I think the issue was that it gets very confusing if you
> pass 0x0 as the 'addr' parameter to VG_(find_map_space)() -- because
> it interprets that to mean "I don't care where you put it".
Yeah. mmap() uses smallish negative values to represent special pointer
values; that's why on x86-64, the 32-bit address space only goes up to
0xffff0000 (also the kernel internals represent the end of a mapping as
address-just-after, and having a mapping go from N-0 would be too
confusing).
> The way to fix this is to add another Bool arg --
> "use_address_as_suggestion" or something -- to VG_(find_map_space)().
> If it's true, we use the passed address as a suggestion (even if it's
> zero). If it's false, we put the block anywhere.
Or use (Addr)-1.
> In contrast, in languages like Haskell you have a "Maybe" type that
> looks like this:
>
> Maybe a = Just a | Nothing
Yep, it's one of my favorite things in CAML, along with pattern matching.
> No, but it's my standard trick for forcing a program to pause at a
> particular point -- usually to look at what /proc/self/maps looks like.
I generally use pause();
> I'm not sure. I'm very uneasy about changing native behaviour. I'm
> worried that in all the 250-odd syscalls there might be some cases
> that are like this but occur in less contrived code.
I'm not planning on worrying about it for 2.4.0 unless it breaks some
real code; we can reconsider it for 2.4.1+3.0.
J
|
|
From: Nicholas N. <nj...@cs...> - 2005-03-09 14:16:25
|
On Tue, 8 Mar 2005, Jeremy Fitzhardinge wrote:
>> The way to fix this is to add another Bool arg --
>> "use_address_as_suggestion" or something -- to VG_(find_map_space)(). If
>> it's true, we use the passed address as a suggestion (even if it's zero).
>> If it's false, we put the block anywhere.
>
> Or use (Addr)-1.
I prefer the extra Bool -- the code is clearer that way.
N
|
|
From: Brad H. <br...@fr...> - 2005-03-09 08:10:42
|
On Wed, 9 Mar 2005 06:44 am, Jeremy Fitzhardinge wrote:
> I've made a test release of 2.4.0-rc1 and put it at
> http://www.goop.org/~jeremy/valgrind/dist. I've included the source
> tar, the source RPM and an FC3 i386 binary RPM.
I downloaded the tarball and tried to build it.
It won't configure on my PPC box (well, I am an optimist - must speak to
Paulus).
On a fairly up-to-date FC2 box, it configures and builds fine. make regtest
fails three:
== 199 tests, 1 stderr failure, 2 stdout failures =================
memcheck/tests/zeropage (stdout)
corecheck/tests/sigkill (stderr)
none/tests/exec-sigmask (stdout)
Result of running each of those is below. I don't know how much this helps, or
what else I can do, so a bit of direction is probably required if more info
is required.
Brad
[bradh@banksia valgrind-2.4.0.rc1]$ ./memcheck/tests/zeropage
succeeded?!
succeeded?!
succeeded?!
[bradh@banksia valgrind-2.4.0.rc1]$ ./corecheck/tests/sigkill
setting signal 1: Success
getting signal 1: Success
setting signal 2: Success
getting signal 2: Success
setting signal 3: Success
getting signal 3: Success
setting signal 4: Success
getting signal 4: Success
setting signal 5: Success
getting signal 5: Success
setting signal 6: Success
getting signal 6: Success
setting signal 7: Success
getting signal 7: Success
setting signal 8: Success
getting signal 8: Success
setting signal 9: Invalid argument
getting signal 9: Success
setting signal 10: Success
getting signal 10: Success
setting signal 11: Success
getting signal 11: Success
setting signal 12: Success
getting signal 12: Success
setting signal 13: Success
getting signal 13: Success
setting signal 14: Success
getting signal 14: Success
setting signal 15: Success
getting signal 15: Success
setting signal 16: Success
getting signal 16: Success
setting signal 17: Success
getting signal 17: Success
setting signal 18: Success
getting signal 18: Success
setting signal 19: Invalid argument
getting signal 19: Success
setting signal 20: Success
getting signal 20: Success
setting signal 21: Success
getting signal 21: Success
setting signal 22: Success
getting signal 22: Success
setting signal 23: Success
getting signal 23: Success
setting signal 24: Success
getting signal 24: Success
setting signal 25: Success
getting signal 25: Success
setting signal 26: Success
getting signal 26: Success
setting signal 27: Success
getting signal 27: Success
setting signal 28: Success
getting signal 28: Success
setting signal 29: Success
getting signal 29: Success
setting signal 30: Success
getting signal 30: Success
setting signal 31: Success
getting signal 31: Success
setting signal 32: Invalid argument
getting signal 32: Invalid argument
setting signal 33: Success
getting signal 33: Success
setting signal 34: Success
getting signal 34: Success
setting signal 35: Success
getting signal 35: Success
setting signal 36: Success
getting signal 36: Success
setting signal 37: Success
getting signal 37: Success
setting signal 38: Success
getting signal 38: Success
setting signal 39: Success
getting signal 39: Success
setting signal 40: Success
getting signal 40: Success
setting signal 41: Success
getting signal 41: Success
setting signal 42: Success
getting signal 42: Success
setting signal 43: Success
getting signal 43: Success
setting signal 44: Success
getting signal 44: Success
setting signal 45: Success
getting signal 45: Success
setting signal 46: Success
getting signal 46: Success
setting signal 47: Success
getting signal 47: Success
setting signal 48: Success
getting signal 48: Success
setting signal 49: Success
getting signal 49: Success
setting signal 50: Success
getting signal 50: Success
setting signal 51: Success
getting signal 51: Success
setting signal 52: Success
getting signal 52: Success
setting signal 53: Success
getting signal 53: Success
setting signal 54: Success
getting signal 54: Success
setting signal 55: Success
getting signal 55: Success
setting signal 56: Success
getting signal 56: Success
setting signal 57: Success
getting signal 57: Success
setting signal 58: Success
getting signal 58: Success
setting signal 59: Success
getting signal 59: Success
setting signal 60: Success
getting signal 60: Success
setting signal 61: Success
getting signal 61: Success
setting signal 62: Success
getting signal 62: Success
setting signal 65: Invalid argument
getting signal 65: Invalid argument
[bradh@banksia valgrind-2.4.0.rc1]$ ./none/tests/exec-sigmask
full: signal 32 missing from mask
|
|
From: Tom H. <to...@co...> - 2005-03-09 08:32:01
|
In message <200...@fr...>
Brad Hards <br...@fr...> wrote:
> On Wed, 9 Mar 2005 06:44 am, Jeremy Fitzhardinge wrote:
>> I've made a test release of 2.4.0-rc1 and put it at
>> http://www.goop.org/~jeremy/valgrind/dist. I've included the source
>> tar, the source RPM and an FC3 i386 binary RPM.
> I downloaded the tarball and tried to build it.
>
> It won't configure on my PPC box (well, I am an optimist - must speak to
> Paulus).
It doesn't have PPC support, so that isn't surprising. Or did you
mean it didn't work after you applied the PPC patch?
> On a fairly up-to-date FC2 box, it configures and builds fine. make regtest
> fails three:
> == 199 tests, 1 stderr failure, 2 stdout failures =================
> memcheck/tests/zeropage (stdout)
> corecheck/tests/sigkill (stderr)
> none/tests/exec-sigmask (stdout)
>
> Result of running each of those is below. I don't know how much this helps, or
> what else I can do, so a bit of direction is probably required if more info
> is required.
None of those looks like a major problem - the zeropage one is already
being discussed and the other two are just caused by glibc details by
the looks of it.
Tom
--
Tom Hughes (to...@co...)
http://www.compton.nu/
|
|
From: Jeremy F. <je...@go...> - 2005-03-09 10:28:42
|
Brad Hards wrote:
>It won't configure on my PPC box (well, I am an optimist - must speak to
>Paulus).
>
>
I should have mentioned that this release is still ia32 only...
>On a fairly up-to-date FC2 box, it configures and builds fine. make regtest
>fails three:
>== 199 tests, 1 stderr failure, 2 stdout failures =================
>memcheck/tests/zeropage (stdout)
>corecheck/tests/sigkill (stderr)
>none/tests/exec-sigmask (stdout)
>
>Result of running each of those is below. I don't know how much this helps, or
>what else I can do, so a bit of direction is probably required if more info
>is required.
>
>
Those are expected, unfortunately. They don't represent real bugs, but
the regression test suite is a bit brittle about differences between
different libc implementations. The zeropage is testing for something
we no longer enforce (though that may change back).
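Part of the sigkill output is fixed by the kernel rather than by libc: on x86 Linux, signals 9 and 19 are SIGKILL and SIGSTOP, and their disposition can be queried but never changed. A minimal sketch of that check follows; the helper name try_set_handler is invented. Which of the signals from 32 upwards are rejected is left out, since that depends on the libc and threading library in use.

```c
#ifndef _DEFAULT_SOURCE
#define _DEFAULT_SOURCE
#endif
#include <assert.h>   /* for the checks in the usage note */
#include <errno.h>
#include <signal.h>
#include <string.h>

/* Hypothetical helper: tries to set the disposition of `signo` and
 * returns 0 on success or the errno from sigaction() on failure.
 * On success the previous disposition is put back before returning. */
int try_set_handler(int signo)
{
    struct sigaction sa, old;
    memset(&sa, 0, sizeof sa);
    sa.sa_handler = SIG_DFL;
    sigemptyset(&sa.sa_mask);

    if (sigaction(signo, &sa, &old) != 0)
        return errno;

    sigaction(signo, &old, NULL);
    return 0;
}
```

try_set_handler(SIGKILL) and try_set_handler(SIGSTOP) are expected to return EINVAL, matching the "setting signal 9" and "setting signal 19" lines above, while an ordinary signal such as SIGUSR1 returns 0.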
J
|
|
From: Brad H. <br...@fr...> - 2005-03-09 10:22:42
|
On Wed, 9 Mar 2005 07:31 pm, Tom Hughes wrote:
> > It won't configure on my PPC box (well, I am an optimist - must speak to
> > Paulus).
>
> It doesn't have PPC support, so that isn't surprising. Or did you
> mean it didn't work after you applied the PPC patch?
I was hoping that PPC support might have been merged into the main tree. I
didn't know if it would work or not. This is really just confirmation that
the check for CPU types works - I haven't tried any patches.
Brad
|
|
From: Tom H. <to...@co...> - 2005-03-09 10:38:44
|
In message <200...@fr...>
Brad Hards <br...@fr...> wrote:
> On Wed, 9 Mar 2005 07:31 pm, Tom Hughes wrote:
>> > It won't configure on my PPC box (well, I am an optimist - must speak to
>> > Paulus).
>>
>> It doesn't have PPC support, so that isn't surprising. Or did you
>> mean it didn't work after you applied the PPC patch?
>
> I was hoping that PPC support might have been merged into the main tree. I
> didn't know if it would work or not. This is really just confirmation that
> the check for CPU types works - I haven't tried any patches.
No, the hope is to get PPC32 support into the 3.0 release later
in the year along with AMD64 support.
Tom
--
Tom Hughes (to...@co...)
http://www.compton.nu/
|
|
From: Christian P. <tr...@ge...> - 2005-03-11 06:46:33
|
On Wednesday 09 March 2005 11:38 am, Tom Hughes wrote:
> In message <200...@fr...>
> > I was hoping that PPC support might have been merged into the main tree.
> > I didn't know if it would work or not. This is really just confirmation
> > that the check for CPU types works - I haven't tried any patches.
>
> No, the hope is to get PPC32 support into the 3.0 release later
> in the year along with AMD64 support.
Yay! That's what I wanted to read (the AMD64 part).
While waiting for 3.0 then, maybe you can tell me
what's new in 2.4?
((btw, can you provide me a state of amd64 porting? maybe,
that you already finished up to N% of porting? thanks))
Regards,
Christian Parpart.
--
Netiquette: http://www.ietf.org/rfc/rfc1855.txt
07:44:22 up 133 days, 14 min, 1 user, load average: 0.00, 0.02, 0.00
|
|
From: Nicholas N. <nj...@cs...> - 2005-03-11 14:11:16
|
On Fri, 11 Mar 2005, Christian Parpart wrote:
>> No, the hope is to get PPC32 support into the 3.0 release later
>> in the year along with AMD64 support.
>
> Yay! That's what I wanted to read (the AMD64 part).
> While waiting for 3.0 then, maybe you can tell me
> what's new in 2.4?
2.4 has 6 months worth of bug fixes. Also, the way in which Valgrind
handles threads has changed completely -- Valgrind no longer provides its
own implementation of libpthread. This is a big win because that part was
a total headache to maintain, and a frequent source of bugs. Getting rid
of it let us remove thousands of lines of the most difficult and
error-prone code in Valgrind.
The down-side is that the change broke Helgrind and the pthread bug
detection (eg. misuses of mutexes). We're working on reinstating those
features, but they're not in 2.4.0 -- we decided it had been long enough
between releases that 2.4.0 should ship without them.
> ((btw, can you provide me a state of amd64 porting? maybe,
> that you already finished up to N% of porting? thanks))
Percentages are hard to estimate. A little birdie told me that Mozilla has
been seen to run successfully. But there's still a *lot* of work to be
done to get it release-worthy -- getting AMD64 to work has required
rewriting the entire JITter, which is a pretty decent chunk of Valgrind.
N
|