|
From: Jeremy F. <je...@go...> - 2005-03-08 19:44:52
|
I've made a test release of 2.4.0-rc1 and put it at http://www.goop.org/~jeremy/valgrind/dist. I've included the source tar, the source RPM and an FC3 i386 binary RPM. This is identical to CVS HEAD except for the version number. Please try it out. If it looks good, I think we should ship it. J |
|
From: Nicholas N. <nj...@cs...> - 2005-03-09 01:01:07
|
On Tue, 8 Mar 2005, Jeremy Fitzhardinge wrote: > I've made a test release of 2.4.0-rc1 and put it at > http://www.goop.org/~jeremy/valgrind/dist. I've included the source > tar, the source RPM and an FC3 i386 binary RPM. > > This is identical to CVS HEAD except for the version number. > > Please try it out. If it looks good, I think we should ship it. My machine: Debian 3.0. Linux charco.cs.utexas.edu 2.4.29 #1 SMP Mon Jan 24 09:20:36 CST 2005 i686 unknown GNU C Library stable release version 2.2.5, by Roland McGrath et al. Copyright (C) 1992-2001, 2002 Free Software Foundation, Inc. This is free software; see the source for copying conditions. There is NO warranty; not even for MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. Compiled by GNU CC version 2.95.4 20011002 (Debian prerelease). Compiled on a Linux 2.4.18 system on 2005-01-07. Available extensions: GNU libio by Per Bothner crypt add-on version 2.1 by Michael Glad and others linuxthreads-0.9 by Xavier Leroy BIND-8.2.3-T5B libthread_db work sponsored by Alpha Processor Inc NIS(YP)/NIS+ NSS modules 0.19 by Thorsten Kukuk Report bugs using the `glibcbug' script to <bu...@gn...>. ---- I'm getting the following regtest failures: == 198 tests, 6 stderr failures, 1 stdout failure ================= memcheck/tests/leak-tree (stderr) memcheck/tests/manuel2 (stderr) memcheck/tests/pth_once (stderr) memcheck/tests/threadederrno (stderr) memcheck/tests/vgtest_ume (stderr) memcheck/tests/zeropage (stdout) corecheck/tests/fdleak_creat (stderr) leak-tree, manuel2, pth_once, threadederrno and vgtest_ume I've been getting for a while, and I'm not very worried about them. fdleak_creat is new. The diff is: ! at 0x........: creat (in /...libc...) ! by 0x........: __libc_start_main (in /...libc...) ! by 0x........: ... --- 6,7 ---- ! at 0x........: open (in /...libc...) ! by 0x........: main (fdleak_creat.c:18) ie. it looks like something has changed on my system so that libc's creat() no calls open(). So probably not a problem. The zeropage one I've not seen before. Valgrind used to prevent clients from doing an mmap(FIXED) in the bottom 64KB of memory. Jeremy, you changed VG_(valid_client_addr)() on January 15 (rev 1.229) to disable this. The commit log said: Misc changes needed so that Valgrind can run itself. I'm not sure if this was a good thing to do -- IIRC I original put that restriction in because some of the memory allocation functions use 0 to represent failure, and so if a program did mmap(0x0, FIXED) various problems could occur. I don't know why this test has been succeeding since Jan 15, only to fail now. Another problem... this program: int main(void) { read(0,0,1); } behaves differently under Valgrind compared to native -- it doesn't wait for user input. The problem also arises from January 15, when you changed almost every use of the PRE_MEM_WRITE macro to your new SYS_PRE_MEM_WRITE macro. Did you write these new macros to address a particular problem? I don't like like this macro, because it assumes that any unaddressable argument causes the syscall to fail, which is not necessarily the case. And I'm not keen on the obvious fix -- change sys_read to use PRE_MEM_WRITE -- because I imagine that this non-failure-on-unaddressable-memory behaviour is possible on any number of the syscalls. N |
|
From: Jeremy F. <je...@go...> - 2005-03-09 02:04:23
|
Nicholas Nethercote wrote:
> fdleak_creat is new. The diff is:
>
> ! at 0x........: creat (in /...libc...)
> ! by 0x........: __libc_start_main (in /...libc...)
> ! by 0x........: ...
> --- 6,7 ----
> ! at 0x........: open (in /...libc...)
> ! by 0x........: main (fdleak_creat.c:18)
>
> ie. it looks like something has changed on my system so that libc's
> creat() no calls open(). So probably not a problem.
What's the context? I presume it ends up calling the same syscall? It
doesn't look like anything more framepointer backtrace confusion.
> I'm not sure if this was a good thing to do -- IIRC I original put
> that restriction in because some of the memory allocation functions
> use 0 to represent failure, and so if a program did mmap(0x0, FIXED)
> various problems could occur.
Valgrind does a mmap(0, FIXED) as part of its address space padding; in
general this is a legitimate but unusual operation. We could make it a
clo which is off by default, but I'm not terribly keen on adding another
special case clo. I guess a client request is another option (please
let me mmap 0).
What problems are you thinking of? Other problems occurring within
Valgrind, or just that a program can get itself into a mess which we
would detect too late?
> Another problem... this program:
>
> int main(void)
> {
> read(0,0,1);
> }
>
> behaves differently under Valgrind compared to native -- it doesn't
> wait for user input.
>
> The problem also arises from January 15, when you changed almost every
> use of the PRE_MEM_WRITE macro to your new SYS_PRE_MEM_WRITE macro.
> Did you write these new macros to address a particular problem? I
> don't like like this macro, because it assumes that any unaddressable
> argument causes the syscall to fail, which is not necessarily the
> case. And I'm not keen on the obvious fix -- change sys_read to use
> PRE_MEM_WRITE -- because I imagine that this
> non-failure-on-unaddressable-memory behaviour is possible on any
> number of the syscalls.
The specific problem I was trying to solve is where Valgrind crashes
with a SIGSEGV if you pass bogus arguments to a syscall. This isn't so
much a problem for simple buffers like read(), but it is if the memory
points to a structure which Valgrind inspects.
In this case, the read will fail either way, but it blocks first when
run natively. I guess there is the possibility that someone will map
memory under it while it is blocked so that it won't end up failing.
Rather than setting EFAULT, the alternative is for SYS_PRE_MEM_* to
return some flag saying "bad memory" to prevent any further tests on
it. This would complicate things, since it means the PRE and POST
wrappers would need to be made aware of it if they touch syscall memory.
As it is, there are possible races where a thread will remove memory
under a syscall anyway, so touching memory in a POST() function isn't
strictly safe, even if it was present in the PRE().
Is this read(fd, 0, 1) example from a real program? Do you think this
will cause a real problem? This case isn't particularly well defined,
and I think older kernels would have failed it immediately. I don't
like having variences from native behaviour, but I don't think this is
too serious.
J
|
|
From: Nicholas N. <nj...@cs...> - 2005-03-09 02:50:55
|
On Tue, 8 Mar 2005, Jeremy Fitzhardinge wrote:
>> ! at 0x........: creat (in /...libc...)
>> ! by 0x........: __libc_start_main (in /...libc...)
>> ! by 0x........: ...
>> --- 6,7 ----
>> ! at 0x........: open (in /...libc...)
>> ! by 0x........: main (fdleak_creat.c:18)
>>
>> ie. it looks like something has changed on my system so that libc's
>> creat() no calls open(). So probably not a problem.
>
> What's the context? I presume it ends up calling the same syscall? It
> doesn't look like anything more framepointer backtrace confusion.
strace tells me the program (run natively) calls the syscall open() rather
than creat(). Weird.
>> I'm not sure if this was a good thing to do -- IIRC I original put
>> that restriction in because some of the memory allocation functions
>> use 0 to represent failure, and so if a program did mmap(0x0, FIXED)
>> various problems could occur.
>
> Valgrind does a mmap(0, FIXED) as part of its address space padding; in
> general this is a legitimate but unusual operation. We could make it a
> clo which is off by default, but I'm not terribly keen on adding another
> special case clo. I guess a client request is another option (please
> let me mmap 0).
Neither of those are very appealing.
> What problems are you thinking of? Other problems occurring within
> Valgrind, or just that a program can get itself into a mess which we
> would detect too late?
The former. I think the issue was that it gets very confusing if you pass
0x0 as the 'addr' parameter to VG_(find_map_space)() -- because it
interprets that to mean "I don't care where you put it". Usually
VG_(find_map_space)() isn't called if you specify FIXED, but If you look
at the wrapper for mremap(), I think it will be an issue if you mremap()
some memory to address 0x0 with MREMAP_FIXED.
It's a type issue -- the problem comes from designating one particular
integer value as special, and then problems arise when you want to use
that value as a normal integer rather than the exceptional value.
VG_(find_map_space)() has inherited this problem from mmap() -- you can't
suggest to mmap(), without using MAP_FIXED, that you want the mapped
segment to go at 0x0.
The way to fix this is to add another Bool arg --
"use_address_as_suggestion" or something -- to VG_(find_map_space)(). If
it's true, we use the passed address as a suggestion (even if it's zero).
If it's false, we put the block anywhere.
Aside: malloc() suffers from a similar problem -- it can never put a heap
block at 0x0 because 0x0 means "no block allocated". Another consequence
of this idiom is that it's really easy to forget to check for the presence
of the exceptional value. That's why posix_memalign() returns a Bool, and
puts the address of the allocated block (if one was allocated) in the 2nd
argument pass-by-reference -- it doesn't steal the value, and it's much
harder to forget to check for failure. Unfortunately, NULL-as-failure is
so engrained in general C programmming that this problem will never go
away. In contrast, in languages like Haskell you have a "Maybe" type that
looks like this:
Maybe a = Just a | Nothing
where 'a' can be any type. The nice thing here is that you don't have to
steal one of your normal values to represent failure, and also it's
impossible to forget to check for the "Nothing" failure case.
>> Another problem... this program:
>>
>> int main(void)
>> {
>> read(0,0,1);
>> }
>>
>> behaves differently under Valgrind compared to native -- it doesn't
>> wait for user input.
>
> [...]
>
> Is this read(fd, 0, 1) example from a real program?
No, but it's my standard trick for forcing a program to pause at a
particular point -- usually to look at what /proc/self/maps looks like.
> Do you think this will cause a real problem? This case isn't
> particularly well defined, and I think older kernels would have failed
> it immediately. I don't like having variences from native behaviour,
> but I don't think this is too serious.
I'm not sure. I'm very uneasy about changing native behaviour. I'm
worried that in all the 250-odd syscalls there might be some cases that
are like this but occur in less contrived code.
N
|
|
From: Jeremy F. <je...@go...> - 2005-03-09 05:07:22
|
Nicholas Nethercote wrote:
> On Tue, 8 Mar 2005, Jeremy Fitzhardinge wrote:
> strace tells me the program (run natively) calls the syscall open()
> rather than creat(). Weird.
The creat() call was made obsolete by O_CREAT; if an architecture has a
creat() syscall, its only for ancient backwards compatibility.
> Neither of those are very appealing.
I had a better idea. We could:
* put back the ban on the lower 64k
* create a PROT_NONE mapping there
The mapping would appear in /proc/self/maps, and prevent a sub-Valgrind
from trying to put padding there. The only downside is the other
programs might get confused if they see the mapping there, and if they
hit a NULL pointer they'd get a SEGV_ACCERR rather than a SEGV_MAPERR
(which could be fixed up in the signal handler, because that's a piece
of code which really needs some more special cases).
> The former. I think the issue was that it gets very confusing if you
> pass 0x0 as the 'addr' parameter to VG_(find_map_space)() -- because
> it interprets that to mean "I don't care where you put it".
Yeah. mmap() uses smallish negative values to represent special pointer
values; that's why on x86-64, the 32-bit address space only goes up to
0xffff0000 (also the kernel internals represent the end of a mapping as
address-just-after, and having a mapping go from N-0 would be too
confusing).
> The way to fix this is to add another Bool arg --
> "use_address_as_suggestion" or something -- to VG_(find_map_space)().
> If it's true, we use the passed address as a suggestion (even if it's
> zero). If it's false, we put the block anywhere.
Or use (Addr)-1.
> In contrast, in languages like Haskell you have a "Maybe" type that
> looks like this:
>
> Maybe a = Just a | Nothing
Yep, it's one of my favorite things in CAML, along with pattern matching.
> No, but it's my standard trick for forcing a program to pause at a
> particular point -- usually to look at what /proc/self/maps looks like.
I generally use pause();
> I'm not sure. I'm very uneasy about changing native behaviour. I'm
> worried that in all the 250-odd syscalls there might be some cases
> that are like this but occur in less contrived code.
I'm not planning on worrying about it for 2.4.0 unless it breaks some
real code; we can reconsider it for 2.4.1+3.0.
J
|
|
From: Nicholas N. <nj...@cs...> - 2005-03-09 14:16:25
|
On Tue, 8 Mar 2005, Jeremy Fitzhardinge wrote: >> The way to fix this is to add another Bool arg -- >> "use_address_as_suggestion" or something -- to VG_(find_map_space)(). If >> it's true, we use the passed address as a suggestion (even if it's zero). >> If it's false, we put the block anywhere. > > Or use (Addr)-1. I prefer the extra Bool -- the code is clearer that way. N |
|
From: Nicholas N. <nj...@cs...> - 2005-03-09 03:51:53
Attachments:
change-copyright-year
|
On Tue, 8 Mar 2005, Troels Walsted Hansen wrote: > I found a very minor issue... The copyright date says 2004. :-) The attached script fixes that. Jeremy, can you run it? Instructions are within. N |
|
From: Oswald B. <os...@kd...> - 2005-03-09 06:22:07
|
On Tue, Mar 08, 2005 at 09:51:45PM -0600, Nicholas Nethercote wrote: > On Tue, 8 Mar 2005, Troels Walsted Hansen wrote: > >I found a very minor issue... The copyright date says 2004. :-) > > The attached script fixes that. Jeremy, can you run it? Instructions > are within. > this is an evil hack. copyright years should be updated per file, not in bulk. ideally, a pre-checkin script would do it. but note also, that changing the case of a string is not exactly a copyright-worthy change. oh, well, i'm too pedantic. ;) > for i in `find . -name '*'` ; do > if [ -f $i ] ; then > IFS=' ' # paranoia. for bash, $'\n' would do, too for i in `find . -type f`; do -- Hi! I'm a .signature virus! Copy me into your ~/.signature, please! -- Chaos, panic, and disorder - my work here is done. |
|
From: Jeremy F. <je...@go...> - 2005-03-09 10:28:42
|
Brad Hards wrote:
>It won't configure on my PPC box (well, I am an optimist - must speak to
>Paulus).
>
>
I should have mentioned that this release is still ia32 only...
>On a fairly up-to-date FC2 box, it configures and builds fine. make regtest
>fails three:
>== 199 tests, 1 stderr failure, 2 stdout failures =================
>memcheck/tests/zeropage (stdout)
>corecheck/tests/sigkill (stderr)
>none/tests/exec-sigmask (stdout)
>
>Result of running each of those is below. I don't know how much this helps, or
>what else I can do, so a bit of direction is probably required if more info
>is required.
>
>
Those are expected, unfortunately. They don't represent real bugs, but
the regression test suite is a bit brittle about differences between
different libc implementations. The zeropage is testing for something
we no longer enforce (though that may change back).
J
|
|
From: Aleksander S. <A....@os...> - 2005-03-09 13:09:01
|
> I've made a test release of 2.4.0-rc1 and put it at > http://www.goop.org/~jeremy/valgrind/dist. I've included the source > [...] > Please try it out. If it looks good, I think we should ship it. One bug found: http://bugs.kde.org/show_bug.cgi?id=101173 Best regards, Aleksander -- Aleksander Salwa OSMOSYS Technologies ul. Rozdzienskiego 188B Katowice, Poland http://www.osmosys.tv |