|
From: Julian S. <js...@ac...> - 2006-12-31 20:08:27
|
I've been testing the 3.2 branch on OpenSUSE 10.2 (kernel 2.6.18.2, glibc-2.5)
and mostly it works pretty well. However, it can't run bash:

sewardj@imac:~/Vg32BRANCH/branch32$ ./Inst/bin/valgrind /bin/bash
[... preamble ...]
ERROR: ld.so: object '/home/sewardj/Vg32BRANCH/branch32/Inst/lib/valgrind/ppc32-linux/vgpreload_core.so' from LD_PRELOAD cannot be preloaded: ignored.
ERROR: ld.so: object '/home/sewardj/Vg32BRANCH/branch32/Inst/lib/valgrind/ppc32-linux/vgpreload_memcheck.so' from LD_PRELOAD cannot be preloaded: ignored.
/bin/bash: error while loading shared libraries: libreadline.so.5: failed to map segment from shared object: Cannot allocate memory

V starts up OK and loads /bin/bash and its ELF interpreter as usual, and
starts it. However, at some point ld.so is doing some fixed mmaps for the
3 .so's mentioned above, and these are failing:

sys_mmap ( 0xFFFDE000, 69636, 5, 2050, 3, 0 ) --> [pre-fail] Failure(0xC)
sys_mmap ( 0xFFFD9000, 90236, 5, 2050, 3, 0 ) --> [pre-fail] Failure(0xC)
sys_mmap ( 0xFFFA1000, 319664, 5, 2050, 3, 0 ) --> [pre-fail] Failure(0xC)

This strikes me as strange because the addresses (0xFFFDE000 etc) are almost
at the end of 4G. It's also strange because other programs, including large
ones, run just fine, and these .so's are mapped quite low, as is normal.

Is it legitimate for ld.so to attempt to map anything to the 0xFFFDxxxx
areas? Doing a plain 'cat /proc/self/maps' shows that the stack is placed
just below the 2G boundary and there is nothing above, which makes me
think this kernel is running in a 2G+2G configuration.
This machine is an old G3, if that's relevant:

processor       : 0
cpu             : 740/750
temperature     : 31-33 C (uncalibrated)
clock           : 400.000000MHz
revision        : 131.0 (pvr 0008 8300)
bogomips        : 49.79
timebase        : 24967326
platform        : PowerMac
machine         : PowerMac2,1
motherboard     : PowerMac2,1 MacRISC2 MacRISC Power Macintosh
detected as     : 66 (iMac FireWire)
pmac flags      : 00000014
L2 cache        : 512K unified
pmac-generation : NewWorld

I tested some other architecture variations and cannot reproduce it on
any other setup:

SuSE 10.2 on x86 and amd64
SuSE 10.1 on ppc32

Any ideas what might be going on?

J
|
Julian Seward wrote:
> I've been testing the 3.2 branch on OpenSUSE 10.2 (kernel 2.6.18.2, glibc-2.5)
> and mostly it works pretty well. However, it can't run bash:
...
> sys_mmap ( 0xFFFDE000, 69636, 5, 2050, 3, 0 ) --> [pre-fail] Failure(0xC)
> sys_mmap ( 0xFFFD9000, 90236, 5, 2050, 3, 0 ) --> [pre-fail] Failure(0xC)
> sys_mmap ( 0xFFFA1000, 319664, 5, 2050, 3, 0 ) --> [pre-fail] Failure(0xC)
>
> This strikes me as strange because the addresses (0xFFFDE000 etc) are almost
> at the end of 4G. It's also strange because other programs, including large
> ones, run just fine, and these .so's are mapped quite low, as is normal.

It's also strange because the file offset (6th argument) is 0 from the
same fd 3 (5th argument) but with three different sizes (2nd argument).
Is this three successive attempts at a single mapping, or serial
execution of a parallel mapping?

>
> Is it legitimate for ld.so to attempt to map anything to the 0xFFFDxxxx
> areas?

Without MAP_FIXED in the 4th argument, a non-zero 1st argument is just
a preference which the kernel may choose to ignore, and should ignore
if outside the apparent 2G user space. ld-linux sets such a preference
for a pre-linked .so. This can be done by mmap(phdr[0].p_vaddr, ...
which "just happens" to be what ET_EXEC also requires.

> Doing a plain 'cat /proc/self/maps' shows that the stack is placed
> just below the 2G boundary and there is nothing above, which makes me
> think this kernel is running in a 2G+2G configuration.
...

I have encountered similar symptoms on x86 without valgrind and with
earlier versions of kernel and glibc and my code. The
sys_mmap ( 0xFFFDE000, 69636, 5, 2050,,) with a first argument that is
close to 4G suggests a mixup with respect to pre-linking and/or -fPIE
(position-independent executable) and/or policy to randomize address
space. The address was formed by subtracting a size from 0.
2050 == 0x802 ==> (MAP_PRIVATE | MAP_DENYWRITE), which is the usual flag
combination for mapping shared .text [the protection is the 3rd argument,
5 == PROT_READ | PROT_EXEC], and does not include MAP_FIXED (0x10). This
also is suspicious, because without MAP_FIXED the kernel gives no
guarantee of relative position among the 3 PT_LOAD from the same file,
which /bin/ld [or the creator of that file] probably expected. And if
MAP_FIXED _is_ present, then there should have been a prior "reservation"
mmap() of size roundup(69636) + roundup(90236) + roundup(319664) in order
for ld-linux to preserve relative position.

Run under strace. Note which file corresponds to fd 3, and run
"readelf --segments" on that file. There should be 3 PT_LOAD with sizes
that match the second argument to mmap(), unless the 6th argument being 0
is telling us that these were successive attempts at a single mapping,
instead of serialized implementation of parallel mapping.

If the .p_vaddr of the lowest-addressed PT_LOAD is not 0, then pre-linking
definitely is involved. Un-prelink (and/or un-PIE, and/or un-set random
assignment policy) and try again. If .p_vaddr is 0, then consult the code
to ld-linux. Also check the recent history of fs/binfmt_elf.c and related
vma code in linux kernel.

--
|
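The decoding of the third and fourth mmap arguments can be checked with a few lines of Python. This is only a sketch: the bit values are hard-coded from what the Linux mman headers define (an assumption of the example, not something read from the running system), and the `decode` helper is hypothetical.

```python
# Bit values assumed from the Linux mman headers; hard-coded for illustration.
PROT_BITS = {0x1: "PROT_READ", 0x2: "PROT_WRITE", 0x4: "PROT_EXEC"}
MAP_BITS = {
    0x01: "MAP_SHARED",
    0x02: "MAP_PRIVATE",
    0x10: "MAP_FIXED",
    0x20: "MAP_ANONYMOUS",
    0x0800: "MAP_DENYWRITE",
}


def decode(value, table):
    """Return the names of the known bits set in value, lowest bit first."""
    names = [name for bit, name in sorted(table.items()) if value & bit]
    known = sum(bit for bit in table if value & bit)
    if value & ~known:
        # Report any bits this small table does not know about.
        names.append(hex(value & ~known))
    return names


prot = decode(5, PROT_BITS)      # 3rd argument of the failing sys_mmap calls
flags = decode(2050, MAP_BITS)   # 4th argument of the failing sys_mmap calls
print("prot :", " | ".join(prot))
print("flags:", " | ".join(flags))
print("MAP_FIXED requested:", "MAP_FIXED" in flags)
```

With these values, MAP_FIXED is not among the flags ld.so passed, so the address in the first argument is only a preference.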
|
From: Julian S. <js...@ac...> - 2007-01-01 21:49:45
|
On Sunday 31 December 2006 22:50, John Reiser wrote:
> Julian Seward wrote:
> > I've been testing the 3.2 branch on OpenSUSE 10.2 (kernel 2.6.18.2,
> > glibc-2.5) and mostly it works pretty well. However, it can't run bash:
>
> ...
>
> > sys_mmap ( 0xFFFDE000, 69636, 5, 2050, 3, 0 ) --> [pre-fail] Failure(0xC)
> > sys_mmap ( 0xFFFD9000, 90236, 5, 2050, 3, 0 ) --> [pre-fail] Failure(0xC)
> > sys_mmap ( 0xFFFA1000, 319664, 5, 2050, 3, 0 ) --> [pre-fail] Failure(0xC)
> >
> > This strikes me as strange because the addresses (0xFFFDE000 etc) are
> > almost at the end of 4G. It's also strange because other programs,
> > including large ones, run just fine, and these .so's are mapped quite
> > low, as is normal.
On further investigation I am more mystified. I have established that this
is not a regression, as 3.2.1 fails similarly on ppc32, and also that it is
not caused by the 64k page stuff added to 3.2.2.
I cannot afford to spend any more time on this, so maybe some Linux-on-PPC
person can chase it more if required.
Here's what else I discovered:
The bogus mmap addresses are produced by __elf_preferred_address in
glibc-2.5/sysdeps/powerpc/powerpc32/dl-machine.c (I assume). They are
handed off to the failing mmap in _dl_map_object_from_fd in dl-load.c:
    /* This is a position-independent shared object. We can let the
       kernel map it anywhere it likes, but we must have space for all
       the segments in their specified positions relative to the first.
       So we map the first segment without MAP_FIXED, but with its
       extent increased to cover all the segments. Then we remove
       access from excess portion, and there is known sufficient space
       there to remap from the later segments.
       As a refinement, sometimes we have an address that we would
       prefer to map such objects at; but this is only a preference,
       the OS can do whatever it likes. */
    ElfW(Addr) mappref;
    mappref = (ELF_PREFERRED_ADDRESS (loader, maplength,
                                      c->mapstart
                                      & GLRO(dl_use_load_bias))
               - MAP_BASE_ADDR (l));
    /* Remember which part of the address space this object uses. */
    l->l_map_start = (ElfW(Addr)) __mmap ((void *) mappref, maplength,
                                          c->prot,
                                          MAP_COPY|MAP_FILE,
                                          fd, c->mapoff);
    if (__builtin_expect ((void *) l->l_map_start == MAP_FAILED, 0))
      {
      map_error:
        errstring = N_("failed to map segment from shared object");
        goto call_lose_errno;
      }
From comparing against strace results, I see the failing mmap addresses are
exactly 0x80000000 greater than the corresponding values from strace.
This made me wonder if there is some sign-extend bug in the 32-bit virtual
ppc CPU code, but I could not find any such, and besides that code has been
extensively tested and hammered on this past year.
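To make the 0x80000000 observation concrete, here is a small Python sketch of the arithmetic. The native strace addresses are not quoted in this thread, so the values it prints are derived purely from the stated offset; the 2G user-space limit is likewise an assumption taken from the 2G+2G guess earlier in the thread.

```python
TWO_GIB = 0x80000000  # the stated offset, and the assumed user-space limit

# (address, length) pairs from the failing sys_mmap calls under Valgrind.
failing = [(0xFFFDE000, 69636), (0xFFFD9000, 90236), (0xFFFA1000, 319664)]


def shift_down(addr, length, delta=TWO_GIB):
    """Return (start, end) of the mapping moved down by delta bytes."""
    start = addr - delta
    return start, start + length


shifted = [shift_down(addr, length) for addr, length in failing]
for (addr, _), (start, end) in zip(failing, shifted):
    fits = end <= TWO_GIB
    print(f"{addr:#010x} -> {start:#010x}..{end:#010x}  fits below 2G: {fits}")
```

Under that assumption, all three shifted ranges end just short of the 2G boundary, while the addresses seen under Valgrind lie entirely above it.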
I also discovered the same mmap-fail problem occurs in various other
situations:
kernel in 32-bit mode, openSUSE 10.2, running bash
kernel in 32-bit mode, openSUSE 10.2, running ssh
kernel in 32-bit mode, openSUSE 10.1, running ssh (but bash is OK)
For a kernel in 64-bit mode (on ppc970), openSUSE 10.1, both bash and
ssh run fine, even though they are still 32-bit executables.
I notice that 32-bit mode kernels appear to have 2G+2G userspace split,
whereas a 64-bit kernel running a 32-bit executable can offer that exe
a full 4G of its own.
Anyway, this is all just for the record. Am not chasing it further.
J
|
|
From: Paul M. <pa...@sa...> - 2007-01-09 05:32:49
|
Julian Seward writes:

> V starts up OK and loads /bin/bash and its ELF interpreter as usual, and
> starts it. However, at some point ld.so is doing some fixed mmaps for the
> 3 .so's mentioned above, and these are failing:
>
> sys_mmap ( 0xFFFDE000, 69636, 5, 2050, 3, 0 ) --> [pre-fail] Failure(0xC)
> sys_mmap ( 0xFFFD9000, 90236, 5, 2050, 3, 0 ) --> [pre-fail] Failure(0xC)
> sys_mmap ( 0xFFFA1000, 319664, 5, 2050, 3, 0 ) --> [pre-fail] Failure(0xC)
>
> This strikes me as strange because the addresses (0xFFFDE000 etc) are almost
> at the end of 4G. It's also strange because other programs, including large
> ones, run just fine, and these .so's are mapped quite low, as is normal.

It looks like those libraries have been prelinked on a 64-bit system
and then moved to a 32-bit system. 32-bit processes get more address
space (4GB) on a 64-bit machine than on a 32-bit machine (where they
get either 2GB or 3GB).

However, I would expect the dynamic linker to recover from the mmap
failure and just retry by mapping the library somewhere else.
Strange.

Paul.
|
|
From: Tom H. <to...@co...> - 2007-01-09 08:41:24
|
In message <178...@ca...>
Paul Mackerras <pa...@sa...> wrote:
> Julian Seward writes:
>
>> V starts up OK and loads /bin/bash and its ELF interpreter as usual, and
>> starts it. However, at some point ld.so is doing some fixed mmaps for the
>> 3 .so's mentioned above, and these are failing:
>>
>> sys_mmap ( 0xFFFDE000, 69636, 5, 2050, 3, 0 ) --> [pre-fail] Failure(0xC)
>> sys_mmap ( 0xFFFD9000, 90236, 5, 2050, 3, 0 ) --> [pre-fail] Failure(0xC)
>> sys_mmap ( 0xFFFA1000, 319664, 5, 2050, 3, 0 ) --> [pre-fail] Failure(0xC)
>>
>> This strikes me as strange because the addresses (0xFFFDE000 etc) are almost
>> at the end of 4G. It's also strange because other programs, including large
>> ones, run just fine, and these .so's are mapped quite low, as is normal.
>
> It looks like those libraries have been prelinked on a 64-bit system
> and then moved to a 32-bit system. 32-bit processes get more address
> space (4GB) on a 64-bit machine than on a 32-bit machine (where they
> get either 2GB or 3GB).
>
> However, I would expect the dynamic linker to recover from the mmap
> failure and just retry by mapping the library somewhere else.
> Strange.
That's not how prelinking handles failures. If you look at those mmap
calls you will see that although there is an address given, the flags
do not include MAP_FIXED so the address is only a hint.
What mmap is supposed to do is that if it can't use the supplied
address then it should just choose another one at random. The dynamic
linker will then notice that the mapping didn't happen at the
prelinked address and do the relocations to fix things up.
The fact that valgrind is returning ENOMEM will make the dynamic
linker think that there really isn't enough memory anywhere in the
process space to map the library, so it will give up.
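The linker's side of this can be modelled in a few lines of Python. This is a toy, not glibc's code: `finish_load` is a hypothetical helper, the prelinked address used in the calls is made up, and only the error number (0xC, ENOMEM) and the error string come from the trace and source quoted earlier in the thread.

```python
ENOMEM = 12  # the Failure(0xC) reported for the sys_mmap calls


def finish_load(prelinked_addr, mmap_result):
    """Return the load bias the linker must relocate by.

    mmap_result is the address the mapping landed at, or a negative
    errno if the mmap call failed outright.
    """
    if mmap_result < 0:
        # No address at all came back: the linker has nothing to relocate
        # against, so it reports the error and gives up.
        raise OSError(-mmap_result, "failed to map segment from shared object")
    # Zero means the prelinked address was honoured; anything else means
    # the object must be relocated by this amount.
    return mmap_result - prelinked_addr


# Hypothetical prelinked address, used only to exercise the helper.
print(hex(finish_load(0x0FFDE000, 0x0FFDE000)))
print(hex(finish_load(0x0FFDE000, 0x40000000)))
```

The point of the model is that a mapping placed at a different address is recoverable, whereas an ENOMEM result is not.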
Tom
--
Tom Hughes (to...@co...)
http://www.compton.nu/
|
|
From: Julian S. <js...@ac...> - 2007-01-09 16:43:16
|
> Paul Mackerras <pa...@sa...> wrote:
> >
> >> This strikes me as strange because the addresses (0xFFFDE000 etc) are
> >> almost at the end of 4G. It's also strange because other programs,
> >> including large ones, run just fine, and these .so's are mapped quite
> >> low, as is normal.
> >
> > It looks like those libraries have been prelinked on a 64-bit system
> > and then moved to a 32-bit system. 32-bit processes get more address
> > space (4GB) on a 64-bit machine than on a 32-bit machine (where they
> > get either 2GB or 3GB).
> >
> > However, I would expect the dynamic linker to recover from the mmap
> > failure and just retry by mapping the library somewhere else.

Paul, Tom, thanks for the analysis. With that info the problem is obvious.

V's address space manager has a clear idea of what address ranges it
doesn't want the client to use, but it doesn't have a clear idea of what
ranges the kernel might refuse.

ld.so asks for a hinted mapping at 0xFFFDE000, aspacemgr says "ok by me",
ML_(generic_PRE_sys_mmap) duly presents that to the kernel, but now as
MAP_FIXED, and the kernel refuses. So the syscall wrapper for mmap hands
the failure back to ld.so.

My fix is: in ML_(generic_PRE_sys_mmap), if the kernel refuses what was
originally a hinted mapping, try again as a non-hinted mapping. That
appears to fix the problem.

J
|
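The shape of that fix can be sketched as a toy model in Python. None of this is Valgrind's code: `ToyKernel`, `advise` and `mmap_wrapper` are hypothetical stand-ins, the 2G limit is the configuration guessed at earlier in the thread, and the fallback address 0x40000000 is arbitrary.

```python
ENOMEM = 12


class ToyKernel:
    """Hypothetical kernel model: refuses any mapping ending above `limit`."""

    def __init__(self, limit=0x80000000):
        self.limit = limit

    def mmap_fixed(self, addr, length):
        if addr + length > self.limit:
            return -ENOMEM
        return addr


def advise(hint, length):
    """Stand-in for the address space manager: accept any hint it is given,
    otherwise pick an (arbitrary) address it believes to be free."""
    return hint if hint else 0x40000000


def mmap_wrapper(kernel, hint, length, client_wants_fixed):
    # Every request reaches the kernel as a fixed mapping.
    res = kernel.mmap_fixed(advise(hint, length), length)
    if res < 0 and hint and not client_wants_fixed:
        # The client only gave a hint, so retry as a non-hinted mapping
        # instead of handing the failure back.
        res = kernel.mmap_fixed(advise(0, length), length)
    return res


kernel = ToyKernel()
hinted = mmap_wrapper(kernel, 0xFFFDE000, 69636, client_wants_fixed=False)
fixed = mmap_wrapper(kernel, 0xFFFDE000, 69636, client_wants_fixed=True)
print(hex(hinted), fixed)
```

In the model, only a request that the client itself made with MAP_FIXED still sees the failure; a hinted request is placed elsewhere.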
|
From: Julian S. <js...@ac...> - 2007-01-09 15:09:32
|
> The fact that valgrind is returning ENOMEM will make the dynamic
> linker think that there really isn't enough memory anywhere in the
> process space to map the library, so it will give up.

So then it's a bug in m_aspacemgr's handling of hinted mappings that
can't be placed at the hint address, yes? I better peer at it I suppose.

J
|
|
From: Tom H. <to...@co...> - 2007-01-09 15:24:37
|
In message <200...@ac...>
Julian Seward <js...@ac...> wrote:
>> The fact that valgrind is returning ENOMEM will make the dynamic
>> linker think that there really isn't enough memory anywhere in the
>> process space to map the library, so it will give up.
>
> So then it's a bug in m_aspacemgr's handling of hinted mappings that
> can't be placed at the hint address, yes? I better peer at it I suppose.
I think it must be, yes.
Tom
--
Tom Hughes (to...@co...)
http://www.compton.nu/
|
Julian Seward wrote:
> V's address space manager has a clear idea of what address ranges it
> doesn't want the client to use, but it doesn't have a clear idea of what
> ranges the kernel might refuse.
It seems to me that there still may be some misunderstandings here.
These misunderstandings did cause me trouble in the past, particularly
with randomized placement for mmap() turned on by Fedora Core.
A pageframe that is mapped already is never ("re-")allocated by mmap(),
unless the mmap specifies MAP_FIXED and the interval covers that page.
The only reliable way to preserve an address range that V's address space
manager does not want the kernel to use, is to "reserve" it by using
mmap(addr, length, PROT_NONE, MAP_ANON|MAP_FIXED,,). Otherwise the kernel
is free to use any portion of the interval [addr, length + addr) for
_any_ mmap(), hinted or not, as long as MAP_FIXED is not requested.
Randomization is allowed to ignore hints, and sometimes does [has].
On a 32-bit machine this may require the address space manager to
track all pageframes; but there are at most 1M pageframes [4KB pages],
so a 128KB bitmap suffices. On a 64-bit machine the manager probably
must know something about segments anyway.
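The bitmap sizing is easy to verify, and a per-page bitmap takes only a few lines. The following Python sketch assumes a 32-bit address space and 4KB pages as stated above; the `PageBitmap` class and its method names are hypothetical.

```python
PAGE_SIZE = 4096                        # 4KB pages
ADDRESS_SPACE = 1 << 32                 # 32-bit address space
NUM_PAGES = ADDRESS_SPACE // PAGE_SIZE  # 1M pageframes
BITMAP_BYTES = NUM_PAGES // 8           # one bit per pageframe


class PageBitmap:
    """One bit per page: set means the page is in use or reserved."""

    def __init__(self):
        self.bits = bytearray(BITMAP_BYTES)

    def _pages(self, addr, length):
        first = addr // PAGE_SIZE
        last = (addr + length - 1) // PAGE_SIZE
        return range(first, last + 1)

    def mark(self, addr, length):
        for page in self._pages(addr, length):
            self.bits[page >> 3] |= 1 << (page & 7)

    def is_free(self, addr, length):
        return all(
            not self.bits[page >> 3] & (1 << (page & 7))
            for page in self._pages(addr, length)
        )


bitmap = PageBitmap()
bitmap.mark(0xFFFDE000, 69636)
print(NUM_PAGES, BITMAP_BYTES // 1024, "KB")
print(bitmap.is_free(0xFFFDE000, 1), bitmap.is_free(0x10000000, PAGE_SIZE))
```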
> ld.so asks for a hinted mapping at 0xFFFDE000, aspacemgr says "ok by me",
> ML_(generic_PRE_sys_mmap) duly presents that to the kernel, but now as
> MAP_FIXED, and the kernel refuses. So the syscall wrapper for mmap hands
> the failure back to ld.so.
If the virtualizer changes the arguments to a system call, then the
virtualizer should handle internally any resulting error conditions
that are due to the changes. The strategy above (handing the failure
of mmap(,,,MAP_FIXED,,) back to ld.so even though ld.so did not ask
for MAP_FIXED) violates this rule of good programming practice. The code
might work today on some systems, but probably it will fail mysteriously
on other systems or in the future.
> My fix is: in ML_(generic_PRE_sys_mmap), if the kernel refuses what was
> originally a hinted mapping, try again as a non-hinted mapping. That
> appears to fix the problem.
I agree that this is likely to work in many cases. However, a kernel which
actively randomizes placement of mmap still will cause trouble by violating
the assumed reservations that the address space manager has not communicated
to the kernel.
--
|
|
From: Tom H. <to...@co...> - 2007-01-09 18:15:44
|
In message <45A3D9DE.2050901@BitWagon.com>
John Reiser <jreiser@BitWagon.com> wrote:
> It seems to me that there still may be some misunderstandings here.
> These misunderstandings did cause me trouble in the past, particularly
> with randomized placement for mmap() turned on by Fedora Core.
>
> A pageframe that is mapped already is never ("re-")allocated by mmap(),
> unless the mmap specifies MAP_FIXED and the interval covers that page.
Agreed.
> The only reliable way to preserve an address range that V's address space
> manager does not want the kernel to use, is to "reserve" it by using
> mmap(addr, length, PROT_NONE, MAP_ANON|MAP_FIXED,,).
We've been there, done that, and got the T-Shirt. It proved not to be
a workable solution.
> Otherwise the kernel
> is free to use any portion of the interval [addr, length + addr) for
> _any_ mmap(), hinted or not, as long as MAP_FIXED is not requested.
> Randomization is allowed to ignore hints, and sometimes does [has].
> On a 32-bit machine this may require the address space manager to
> track all pageframes; but there are at most 1M pageframes [4KB pages],
> so a 128KB bitmap suffices. On a 64-bit machine the manager probably
> must know something about segments anyway.
What we do now is to maintain a shadow map of the address space, and
apply the allocation rules ourselves and translate all mmap calls into
calls with MAP_FIXED set.
So an mmap with no address hint will cause valgrind to pick an address
that it believes to be free and then do a MAP_FIXED call. With a hint
we try to use the hinted address and, if that is not possible, choose
another, then do a call with MAP_FIXED for the chosen address. A real
MAP_FIXED call is done as is.
So all maps wind up being done as fixed mappings.
> If the virtualizer changes the arguments to a system call, then the
> virtualizer should handle internally any resulting error conditions
> that are due to the changes. The strategy above (handing the failure
> of mmap(,,,MAP_FIXED,,) back to ld.so even though ld.so did not ask
> for MAP_FIXED) violates this rule of good programming practice. The code
> might work today on some systems, but probably it will fail mysteriously
> on other systems or in the future.
Exactly, and that is the bug Julian has fixed: if the fixed call
fails then we now try to find another address to use instead. That way
we are preserving the normal kernel behaviour when the hint address
is not available.
> > My fix is: in ML_(generic_PRE_sys_mmap), if the kernel refuses what was
> > originally a hinted mapping, try again as a non-hinted mapping. That
> > appears to fix the problem.
>
> I agree that this is likely to work in many cases. However, a kernel which
> actively randomizes placement of mmap still will cause trouble by violating
> the assumed reservations that the address space manager has not communicated
> to the kernel.
No, because we never let the kernel's address space randomiser get a
look in as we do all maps as fixed ones ;-)
Tom
--
Tom Hughes (to...@co...)
http://www.compton.nu/
|