|
From: John R.
|
Hi,
I'd like more support in valgrind for use of x86 segment registers
as on Linux. The current hot application is wine, the open-source
implementation of the Win32 API, but there are others. I'm exploring
changes which would do what I want. This message seeks comment.
Coregrind currently supports x86 segment registers via set_thread_area
and modify_ldt, and the instruction translator recognizes segment
prefix opcode bytes. But the initial guest environment has no gdt
(Global Descriptor Table) nor ldt (Local Descriptor Table), so any
use of a segment prefix fails before set_thread_area or modify_ldt.
In particular, this example program runs fine from shell, but gets
a [pseudo-]SIGSEGV at "movl %eax,%es:4(%edi)" when run under memcheck.
[Well, after I added support for lds/les/lfs/lgs/lss opcodes.]
-----
_start: .globl _start // gcc -o segpfx -nostartfiles -nostdlib segpfx.S
nop
push %ss; push %esp
les (%esp),%edi // %es= %ss; %edi= 4+ %esp
movl %eax,%es:4(%edi) // overwrites argc
movw (%esp),%es // %es= %ss {compare with 'les' above}
sub %ebx,%ebx
lea 1(%ebx),%eax
int $0x80 // exit(0)
-----
The exact failure mechanism is x86g_use_seg_selector finding 0==gdt,
thus the necessary 8-byte descriptor is missing.
For an even shorter testcase, just use a redundant %ds prefix:
-----
_start: .globl _start // gcc -nostartfiles -nostdlib
nop
movl %esp,%ecx // hardware always uses %ss when %esp is base reg
movl %ds:(%ecx),%ebx // argc as exit code ;-)
movl $1,%eax
int $0x80 // exit(argc)
-----
The change I'm considering is having the translator allocate a gdt
if there is none when a segment prefix is seen. I'll also set up
8-byte descriptors for %cs= 0x73 and %ds=%es=%ss= 0x7b
with base=0 and limit=0xbffff (0xC0000000 in 4KB pages); those are
the current initial values for a process in Linux with the most
common configuration of addressing (3GB user + 1GB kernel.)
Comments?
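A minimal sketch in C of the arithmetic behind the proposed selectors and
descriptors. The helper names are hypothetical, and the access bytes (0xFA for
a ring-3 code segment, 0xF2 for a ring-3 writable data segment) and the flags
nibble (0xC: page granularity, 32-bit) are illustrative choices that the
message above does not specify. The bit layout is the standard 8-byte x86
segment descriptor.

```c
#include <stdint.h>

/* The three fields packed into an x86 segment selector. */
typedef struct {
    unsigned index;  /* descriptor slot within the table */
    unsigned ti;     /* table indicator: 0 = GDT, 1 = LDT */
    unsigned rpl;    /* requested privilege level, 0..3 */
} Selector;

/* Hypothetical helper: split a selector such as 0x73 or 0x7b. */
static Selector decode_selector(uint16_t sel)
{
    Selector s = { (unsigned)(sel >> 3), (unsigned)((sel >> 2) & 1u),
                   (unsigned)(sel & 3u) };
    return s;
}

/* Hypothetical helper: pack base, 20-bit limit, access byte and the
 * 4 flag bits (G, D/B, L, AVL) into an 8-byte descriptor. */
static uint64_t make_descriptor(uint32_t base, uint32_t limit,
                                uint8_t access, uint8_t flags)
{
    uint64_t d = 0;
    d |= (uint64_t)(limit & 0xFFFFu);              /* limit 15:0  -> bits 15:0  */
    d |= (uint64_t)(base & 0xFFFFFFu) << 16;       /* base 23:0   -> bits 39:16 */
    d |= (uint64_t)access << 40;                   /* access byte -> bits 47:40 */
    d |= (uint64_t)((limit >> 16) & 0xFu) << 48;   /* limit 19:16 -> bits 51:48 */
    d |= (uint64_t)(flags & 0xFu) << 52;           /* flags       -> bits 55:52 */
    d |= (uint64_t)(base >> 24) << 56;             /* base 31:24  -> bits 63:56 */
    return d;
}

/* With page granularity the limit counts 4KB pages, and the segment
 * covers (limit + 1) pages. */
static uint64_t segment_size_bytes(uint32_t limit_pages)
{
    return ((uint64_t)limit_pages + 1) << 12;
}
```

Under these assumptions, 0x73 decodes to GDT index 14 at privilege level 3,
0x7b decodes to GDT index 15 at privilege level 3, and a limit of 0xbffff
pages covers exactly 0xC0000000 bytes.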
--
John Reiser, jreiser@BitWagon.com
|
|
From: Tom H. <to...@co...> - 2008-05-29 17:32:55
|
In message <483ED8DD.8070903@BitWagon.com>
John Reiser <jreiser@BitWagon.com> wrote:
> I'd like more support in valgrind for use of x86 segment registers
> as on Linux. The current hot application is wine, the open-source
> implementation of the Win32 API, but there are others. I'm exploring
> changes which would do what I want. This message seeks comment.
We seem to be running wine under valgrind without any segment
related problems - is there something in particular in wine that
is triggering this for you?
> The change I'm considering is having the translator allocate a gdt
> if there is none when a segment prefix is seen.
Is there any advantage to deferring it rather than just setting
it up upfront?
> I'll also set up
> 8-byte descriptors for %cs= 0x73 and %ds=%es=%ss= 0x7b
> with base=0 and limit=0xbffff (0xC0000000 in 4KB pages); those are
> the current initial values for a process in Linux with the most
> common configuration of addressing (3GB user + 1GB kernel.)
Is there no way to find out what a process's default descriptors
would be rather than having to make this assumption?
We know from past experience that it is quite common for valgrind
users to have all sorts of odd user/kernel splits. In fact as far
as I know most recent RedHat/Fedora systems have a 4G user space.
Tom
--
Tom Hughes (to...@co...)
http://www.compton.nu/
|
|
From: Julian S. <js...@ac...> - 2008-05-29 20:03:01
|
> So far, the wine regression tests "make test" for comctl32 image.c
> trigger several complaints from memcheck about referencing variables
> that were on the stack but are no longer on the protected side of
> the stack pointer. This is in the "relay" code which modulates
> subroutine calls between different calling conventions. Indeed there
> are 5 "push nnn(%ecx)" where (nnn+%ecx) < %esp now, although
> %esp <= (nnn+%ecx) a short while ago. Fixing the "relay" code
> to draw no complaints from memcheck requires using a segment prefix.
How far below %esp do these references extend? Memcheck routinely
deals with accesses below the stack pointer on {amd64,ppc64}-linux
and {ppc32,ppc64}-aix5. See VG_STACK_REDZONE_SZB in
include/pub_tool_machine.h.
J
|
|
From: John R.
|
>>So far, the wine regression tests "make test" for comctl32 image.c
>>trigger several complaints from memcheck about referencing variables
>>that were on the stack but are no longer on the protected side of
>>the stack pointer.
> How far below %esp do these references extend? Memcheck routinely
> deals with accesses below the stack pointer on {amd64,ppc64}-linux
> and {ppc32,ppc64}-aix5. See VG_STACK_REDZONE_SZB in
> include/pub_tool_machine.h.
In wine the redzone could be up to 0x2c8 bytes (sizeof(CONTEXT86)),
and today the size would have to be at least (0x2c8 - 0x98).
In this case I'm not [yet] enthusiastic about using a red zone. For one
reason, on i686 neither the Linux kernel nor glibc agrees that there is a
redzone at all. Both freely scribble on anything below %esp.
Yes, it should be possible to use sigprocmask to prevent kernel scribbling,
but keeping everything under control is problematic, and the result
of failure is non-reproducible behavior which is very costly in
developer time.
For another reason, the redzone consumes available space. Some of
the cases that matter have hard limits on stack space, such as 12KB,
and 0x2c8 bytes cannot be ignored when the limit is only 0x3000.
Also, the wine "relay" code exercised by comctl32 image.c is only the
first case. I believe that calls between 32-bit and 16-bit code
may be next. Of course 16-bit x86 code is not supported today
by V tools, but there may be other "bare" uses of segment prefixes
in 32-bit code.
--
John Reiser, jreiser@BitWagon.com
|
|
From: John R.
|
Tom Hughes wrote:
> In message <483ED8DD.8070903@BitWagon.com>
> John Reiser <jreiser@BitWagon.com> wrote:
>
>>I'd like more support in valgrind for use of x86 segment registers
>>as on Linux. The current hot application is wine, the open-source
>>implementation of the Win32 API, but there are others. I'm exploring
>>changes which would do what I want. This message seeks comment.
>
> We seem to be running wine under valgrind without any segment
> related problems - is there something in particular in wine that
> is triggering this for you?
So far, the wine regression tests "make test" for comctl32 image.c
trigger several complaints from memcheck about referencing variables
that were on the stack but are no longer on the protected side of
the stack pointer. This is in the "relay" code which modulates
subroutine calls between different calling conventions. Indeed there
are 5 "push nnn(%ecx)" where (nnn+%ecx) < %esp now, although
%esp <= (nnn+%ecx) a short while ago. Fixing the "relay" code
to draw no complaints from memcheck requires using a segment prefix.
Fixing the code is required because VEX "dirty helpers" (including
future ones such as getting FP and XMM state into signal frame),
and possibly delivery of Linux signals, might overwrite the source
of the 'push'. Indeed, I inserted various 'printf' to see what was
going on, and wound up overwriting the source for some of the 'push'.
[Enhancement opportunity: This stack is bounded (a 12KB limit that
is logically "hard"), so it would be useful to have an option of
checking when stack allocations run into an existing block. I have
applied a temporary partial development fix using my SET_BOGEY user
request at the hard lower bound.]
>
>>The change I'm considering is having the translator allocate a gdt
>>if there is none when a segment prefix is seen.
>
> Is there any advantage to deferring it rather than just setting
> it up upfront?
If deferred, then there is essentially no runtime cost (space, time)
unless a program uses segment prefixes. The only cost is some minor
complexity: a few 'if' statements sprinkled into the translator.
>
>>I'll also set up
>>8-byte descriptors for %cs= 0x73 and %ds=%es=%ss= 0x7b
>>with base=0 and limit=0xbffff (0xC0000000 in 4KB pages); those are
>>the current initial values for a process in Linux with the most
>>common configuration of addressing (3GB user + 1GB kernel.)
>
> Is there no way to find out what a process's default descriptors
> would be rather than having to make this assumption?
The default descriptors are an operating system policy. Before NPTL
(Native Posix Threading Library) there was even some use of descriptors
in the LDT. But for a few years now (>=kernel-2.6.9 ?) the Linux
defaults are in the GDT at 0x73 and 0x7B.
>
> We know from past experience that it is quite common for valgrind
> users to have all sorts of odd user/kernel splits. In fact as far
> as I know most recent RedHat/Fedora systems have a 4G user space.
"4G/4G" may be an option, but 3G/1G is the default on Fedora 6,7,8,9.
My reading is that the 10% to 25% slowdown in syscall performance
seen by many 4G/4G boxes was enough to relegate 4G/4G to niche cases.
x86_64 boxes became inexpensive enough a couple years ago, and
many corporate purchasing cycles have had enough time to replace
old i686 boxes. About the only use I see of 4G/4G is in academia,
the third world, or the tail end of 5-year corporate cycles. Even in
those cases, careful user-level management of 3G/1G often can be helpful;
see http://bitwagon.com/tub/tub.html .
When host and guest architecture agree, then rounding up the initial
stack pointer of the tool itself might be a good way to guess the
segment limit. Another way might be to inspect the Elf32_Phdr.p_type
for PT_GNU_STACK, then look at .p_vaddr; but the Phdr are not used
for this purpose today.
--
John Reiser, jreiser@BitWagon.com
|
|
From: Tom H. <to...@co...> - 2008-05-29 23:46:16
|
In message <483F0843.4050806@BitWagon.com>
John Reiser <jreiser@BitWagon.com> wrote:
> Tom Hughes wrote:
>
> > Is there any advantage to deferring it rather than just setting
> > it up upfront?
>
> If deferred, then there is essentially no runtime cost (space, time)
> unless a program uses segment prefixes. The only cost is some minor
> complexity: a few 'if' statements sprinkled into the translator.
Fine - it's a long time since I looked at this stuff, but if there
is a cost to having a GDT set up, then deferring it sounds good.
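A rough sketch in C of the deferred setup being discussed: the table is
allocated only the first time a segment prefix is encountered. Everything here
is hypothetical (the GuestState type, ensure_gdt, and the 16-entry table size);
it is not the valgrind implementation. The two descriptor constants encode
base=0 and limit=0xbffff pages, with access bytes chosen for illustration.

```c
#include <stdint.h>
#include <stdlib.h>

enum { GDT_ENTRIES = 16 };  /* hypothetical: just enough for indexes 14 and 15 */

typedef struct {
    uint64_t *gdt;          /* stays NULL until a segment prefix is first seen */
} GuestState;

/* Return the guest GDT, allocating and filling it on first use.
 * Returns NULL only if the allocation fails. */
static uint64_t *ensure_gdt(GuestState *st)
{
    if (st->gdt == NULL) {
        uint64_t *table = calloc(GDT_ENTRIES, sizeof *table);
        if (table == NULL)
            return NULL;
        /* Selector >> 3 gives the table index: 0x73 -> 14, 0x7b -> 15. */
        table[0x73 >> 3] = 0x00CBFA000000FFFFull;  /* %cs: assumed ring-3 code */
        table[0x7b >> 3] = 0x00CBF2000000FFFFull;  /* %ds/%es/%ss: assumed ring-3 data */
        st->gdt = table;
    }
    return st->gdt;
}
```

A program that never uses a segment prefix never reaches ensure_gdt, so it pays
neither the allocation nor the memory; the price is the NULL check at each
call site.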
> > Is there no way to find out what a process's default descriptors
> > would be rather than having to make this assumption?
>
> The default descriptors are an operating system policy. Before NPTL
> (Native Posix Threading Library) there was even some use of descriptors
> in the LDT. But for a few years now (>=kernel-2.6.9 ?) the Linux
> defaults are in the GDT at 0x73 and 0x7B.
Can valgrind not read the descriptors and/or tables it is given
and then emulate those to the client? or is that not an appropriate
assumption to make (or impossible for a user process)?
> > We know from past experience that it is quite common for valgrind
> > users to have all sorts of odd user/kernel splits. In fact as far
> > as I know most recent RedHat/Fedora systems have a 4G user space.
>
> "4G/4G" may be an option, but 3G/1G is the default on Fedora 6,7,8,9.
I'm obviously out of date then - I mostly work on 64 bit machines
these days anyway. I think they did have a 4G/4G default on 32 bit
at one point but obviously they decided it wasn't a good idea.
> My reading is that the 10% to 25% slowdown in syscall performance
> seen by many 4G/4G boxes was enough to relegate 4G/4G to niche cases.
> x86_64 boxes became inexpensive enough a couple years ago, and
> many corporate purchasing cycles have had enough time to replace
> old i686 boxes. About the only use I see of 4G/4G is in academia,
> the third world, or the tail end of 5-year corporate cycles. Even in
> those cases, careful user-level management of 3G/1G often can be helpful;
> see http://bitwagon.com/tub/tub.html .
I think the more common thing we see is 2G/2G and even 1G/3G on
occasions, though I'm not sure valgrind handles that at all. It
seems to be particularly common with embedded systems using odd
custom kernels.
> When host and guest architecture agree, then rounding up the initial
> stack pointer of the tool itself might be a good way to guess the
> segment limit. Another way might be to inspect the Elf32_Phdr.p_type
> for PT_GNU_STACK, then look at .p_vaddr; but the Phdr are not used
> for this purpose today.
Rounding up the initial stack pointer is probably a reasonable
idea if it's just a question of guessing the split. I think that's
what we've done in the past.
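A small sketch in C of the rounding heuristic. The function names and the
256MB rounding granularity are assumptions made for illustration; the thread
does not say what granularity to use. The result is a segment limit expressed
in 4KB pages.

```c
#include <stdint.h>

/* Round value up to the next multiple of align (align is a power of two). */
static uint64_t round_up(uint64_t value, uint64_t align)
{
    return (value + align - 1) & ~(align - 1);
}

/* Hypothetical helper: guess the user-space segment limit, in 4KB pages,
 * from the initial stack pointer. 64-bit arithmetic is used so that a
 * 4G user space does not wrap around to zero. */
static uint32_t guess_limit_pages(uint32_t initial_sp)
{
    const uint64_t align = 1ull << 28;  /* 256MB, an assumed granularity */
    uint64_t top = round_up((uint64_t)initial_sp, align);
    if (top == 0)
        return 0;                       /* degenerate input: no usable guess */
    return (uint32_t)((top >> 12) - 1); /* last valid page number */
}
```

For example, an initial stack pointer of 0xbffff000 rounds up to 0xC0000000 and
gives limit 0xbffff (the 3G/1G case), while 0x7ffff000 gives 0x7ffff (2G/2G)
and 0xffffe000 gives 0xfffff (a 4G user space).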
Tom
--
Tom Hughes (to...@co...)
http://www.compton.nu/
|
|
From: Julian S. <js...@ac...> - 2008-05-30 07:56:43
|
> I think the more common thing we see is 2G/2G and even 1G/3G on
> occasions, though I'm not sure valgrind handles that at all. It
> seems to be particularly common with embedded systems using odd
> custom kernels.
I think all those get handled correctly. That's why the default
load address is 0x38000000, so it falls just below the 1G boundary.
J
|
|
From: Julian S. <js...@ac...> - 2008-05-29 20:17:02
|
On Thursday 29 May 2008 21:56, Julian Seward wrote:
> > So far, the wine regression tests "make test" for comctl32 image.c
> > trigger several complaints from memcheck about referencing variables
> > that were on the stack but are no longer on the protected side of
> > the stack pointer. This is in the "relay" code which modulates
> > subroutine calls between different calling conventions.
Thinking about this more, why is it legitimate for Wine to access
memory below %esp on x86? Usually we would think of any program
that did so as broken. Can you clarify?
J
|
|
From: Julian S. <js...@ac...> - 2008-05-30 08:41:21
|
> >>>So far, the wine regression tests "make test" for comctl32 image.c
> >>>trigger several complaints from memcheck about referencing variables
> >>>that were on the stack but are no longer on the protected side of
> >>>the stack pointer. This is in the "relay" code which modulates
> >>>subroutine calls between different calling conventions.
> >
> > Thinking about this more, why is it legitimate for Wine to access
> > memory below %esp on x86? Usually we would think of any program
> > that did so as broken. Can you clarify?
>
> Perhaps such references are supposed to occur only when scribbling
> is impossible. I have seen things such as "enter critical section".
> But those calls are not immediately close to the references on the
> wrong side of the stack pointer, so there must be an assumption that
> the caller has guaranteed the lockout. And then there is the
> "semantic gap": a Win32 critical section is not necessarily the same
> on Linux, and in any case a VEX "dirty helper" violates it.
> Perhaps all Win32 signals must be delivered on an alternate
> signal stack.
I have to say I don't understand this at all.
Regardless of Win32 code, critical sections, or whatever, the
process is running as a Linux user space process and so is bound
by the kernel's signal delivery behaviour: the kernel can nuke
anything below 0(%esp) at any time. The only way I can see that this
could possibly be safe is to block all signal delivery at the time,
and I'm not even sure that's really possible -- I don't think it
is possible to block delivery of a synchronous signal.
It would be interesting to have a Wine programmer to explain what's
going on here and why it is considered to be safe.
J
|
|
From: John R.
|
>>>So far, the wine regression tests "make test" for comctl32 image.c
>>>trigger several complaints from memcheck about referencing variables
>>>that were on the stack but are no longer on the protected side of
>>>the stack pointer. This is in the "relay" code which modulates
>>>subroutine calls between different calling conventions.
>
> Thinking about this more, why is it legitimate for Wine to access
> memory below %esp on x86? Usually we would think of any program
> that did so as broken. Can you clarify?
Perhaps such references are supposed to occur only when scribbling
is impossible. I have seen things such as "enter critical section".
But those calls are not immediately close to the references on the
wrong side of the stack pointer, so there must be an assumption that
the caller has guaranteed the lockout. And then there is the
"semantic gap": a Win32 critical section is not necessarily the same
on Linux, and in any case a VEX "dirty helper" violates it.
Perhaps all Win32 signals must be delivered on an alternate
signal stack.
Also, many debugging techniques assume that the stack pointer itself
is the only authority. It is safer to write the code as if that
is true. In this case that requires segment override prefix.
--
John Reiser, jreiser@BitWagon.com
|
|
From: Eric P. <eri...@or...> - 2008-05-30 09:02:46
|
Julian Seward wrote:
>>>>> So far, the wine regression tests "make test" for comctl32 image.c
>>>>> trigger several complaints from memcheck about referencing variables
>>>>> that were on the stack but are no longer on the protected side of
>>>>> the stack pointer. This is in the "relay" code which modulates
>>>>> subroutine calls between different calling conventions.
>>>>>
>>> Thinking about this more, why is it legitimate for Wine to access
>>> memory below %esp on x86? Usually we would think of any program
>>> that did so as broken. Can you clarify?
>>>
>> Perhaps such references are supposed to occur only when scribbling
>> is impossible. I have seen things such as "enter critical section".
>> But those calls are not immediately close to the references on the
>> wrong side of the stack pointer, so there must be an assumption that
>> the caller has guaranteed the lockout. And then there is the
>> "semantic gap": a Win32 critical section is not necessarily the same
>> on Linux, and in any case a VEX "dirty helper" violates it.
>> Perhaps all Win32 signals must be delivered on an alternate
>> signal stack.
>>
>
> I have to say I don't understand this at all.
>
> Regardless of Win32 code, critical sections, or whatever, the
> process is running as a Linux user space process and so is bound
> by the kernel's signal delivery behaviour: the kernel can nuke
> anything below 0(%esp) at any time. The only way I can see that this
> could possibly be safe is to block all signal delivery at the time,
> and I'm not even sure that's really possible -- I don't think it
> is possible to block delivery of a synchronous signal.
>
> It would be interesting to have a Wine programmer to explain what's
> going on here and why it is considered to be safe.
>
or using a dedicated signal stack, which is what Wine does
A+
--
Eric Pouech
"The problem with designing something completely foolproof is to
underestimate the ingenuity of a complete idiot." (Douglas Adams)
|
|
From: Julian S. <js...@ac...> - 2008-05-30 09:04:27
|
> > It would be interesting to have a Wine programmer to explain what's
> > going on here and why it is considered to be safe.
>
> or using a dedicated signal stack, which is what Wine does
Ah, of course. My apologies.
J
|
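For readers unfamiliar with the mechanism, here is a minimal sketch in C of a
dedicated signal stack using the POSIX sigaltstack and SA_ONSTACK interfaces.
It is not Wine's code: the function names, the choice of SIGUSR1, and the 64KB
stack size are all assumptions made for illustration. A handler installed this
way runs on the alternate stack, so signal delivery does not write below the
interrupted thread's %esp.

```c
#define _XOPEN_SOURCE 700
#include <signal.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

enum { ALT_STACK_SIZE = 64 * 1024 };  /* assumed size, well above the minimum */

static char *alt_base;                             /* start of the alternate stack */
static volatile sig_atomic_t handler_on_alt_stack; /* set to 1 by the handler */

/* Record whether the handler's own frame lies inside the alternate stack. */
static void on_signal(int signo)
{
    char probe;  /* a local variable, so its address is in the handler's frame */
    uintptr_t p = (uintptr_t)&probe;
    uintptr_t lo = (uintptr_t)alt_base;
    (void)signo;
    handler_on_alt_stack = (p >= lo && p < lo + ALT_STACK_SIZE);
}

/* Register an alternate stack and a SIGUSR1 handler that runs on it.
 * Returns 0 on success, -1 on failure. */
static int install_alt_stack_handler(void)
{
    stack_t ss;
    struct sigaction sa;

    alt_base = malloc(ALT_STACK_SIZE);
    if (alt_base == NULL)
        return -1;
    ss.ss_sp = alt_base;
    ss.ss_size = ALT_STACK_SIZE;
    ss.ss_flags = 0;
    if (sigaltstack(&ss, NULL) != 0)
        return -1;

    memset(&sa, 0, sizeof sa);
    sa.sa_handler = on_signal;
    sa.sa_flags = SA_ONSTACK;  /* deliver this signal on the alternate stack */
    sigemptyset(&sa.sa_mask);
    return sigaction(SIGUSR1, &sa, NULL);
}
```

After install_alt_stack_handler() succeeds, calling raise(SIGUSR1) runs the
handler on the alternate stack. Handlers installed without SA_ONSTACK still run
on the interrupted stack, so every handler has to opt in for the memory below
the stack pointer to stay untouched by signal delivery.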