|
From: Eric P. <eri...@wa...> - 2006-01-26 20:31:30
|
Hi folks, I've just been trying to make Wine run under valgrind (3.1.0). Things greately evolved since our last tries: we can now run lots of applications, and a major part of our test suite under valgrind. congrats folks!! The down side being of course some more bugs to fix <g>. I've posted two patches in bugzilla to improve some parts, which hopefully can make for 3.2.0. There's still a showstopper for a better Wine support. The way we handle SEH (structured exception handling in Windows's terms) doesn't work with valgrind (tested on x86). Basically, Wine needs to translate some Unix signals into exceptions (in Windows language). For lots of reason, we decided to implement this situation with: - grabbing information about the signal (sigcontext, cause of the signal - btw, on the two patches I'm talking of is about trapno information that we do need). This is done in the unix signal handler - from the unix signal handler, push that information onto the stack (thread stack, not the signal handler stack) - change EIP in the sigcontext to the address of a dedicated function of ours - when the finish the signal handler, execution will resume in the dedicated function, where we can transfer all signal information, but executed on the thread stack (not the signal stack) We had to do it for reasons dealing mainly for race issues. What doesn't currently work in valgrind is changing the EIP in the sigcontext. Current implementation of restore_sigcontext() in coregrind/m_sigframe/sigframe-x86-linux.c doesn't restore EIP into VEX: when we return from the signal handler, execution resumes at the opcode with first generated the signal. Back to square one :-/ I simply tried to set EIP back in the restore_sigcontext() function, but this leads to a crash in IR (about unsupported opcode). I suppose I'm missing something (like resetting some other elements for execution in some other places). Anything I could do about this ? TIA -- Eric Pouech |
|
From: Julian S. <js...@ac...> - 2006-02-07 01:23:57
|
Eric
> I've just been trying to make Wine run under valgrind (3.1.0). Things
> greately evolved since our last tries: we can now run lots of
> applications, and a major part of our test suite under valgrind.
> congrats folks!!
That's great. In an ideal world we could make Wine run really well
under V and then be able to use V to debug Windows apps. Adam Gundy
did this a couple of years ago, but the required changes were never
merged in.
> I've posted two patches in bugzilla to improve some parts, which
> hopefully can make for 3.2.0.
Uh ... where did you post these, exactly? Can you send a URL ?
> What doesn't currently work in valgrind is changing the EIP in the
> sigcontext. Current implementation of restore_sigcontext() in
> coregrind/m_sigframe/sigframe-x86-linux.c doesn't restore EIP into VEX:
> when we return from the signal handler, execution resumes at the opcode
> with first generated the signal. Back to square one :-/
Sun's JVM also does nasty tricks like this, and I think Tom Hughes
got the same bad-opcode result when he tried restoring EIP into vex.
I would like to fix it for Java folks too.
> I simply tried to set EIP back in the restore_sigcontext() function, but
> this leads to a crash in IR (about unsupported opcode).
Hmm. Doing that works OK at least for my simple test program
on SuSE 10 (x86):
$ vTRUNK --trace-signals=yes --tool=none ./sig_change_pc
[...]
--1237-- Max kernel-supported signal is 64
++1237++ sys_sigaction: sigNo 40, new 0xBEB86D18, old 0x0, new flags 0x4000004
before
--1237-- kill: sent signal 40 to pid 1237
--1237-- poll_signals: got signal 40 for thread 1
--1237-- Polling found signal 40 for tid 1
--1237-- delivering signal 40 (SIGRT8):0 to thread 1
--1237-- push_signal_frame (thread 1): signal 40
in handler, setting EIP to 0x8048588
returned EIP is 0x8048588
--1237-- VG_(signal_return) (thread 1): isRT=1 valid magic; EIP=0x8048588
diversion!
==1237==
Test program is below. It also works correctly if I uncomment the
two commented-out bits, so as to get a segfault instead. The relevant
valgrind diff (against current svn head, versions 5616/1570) is:
Index: coregrind/m_sigframe/sigframe-x86-linux.c
===================================================================
--- coregrind/m_sigframe/sigframe-x86-linux.c (revision 5616)
+++ coregrind/m_sigframe/sigframe-x86-linux.c (working copy)
@@ -630,8 +630,8 @@
tst->arch.vex.guest_ESI = sc->esi;
tst->arch.vex.guest_EDI = sc->edi;
//:: tst->arch.vex.guest_eflags = sc->eflags;
-//:: tst->arch.vex.guest_EIP = sc->eip;
-
+ VG_(printf)("returned EIP is %p\n", sc->eip);
+ tst->arch.vex.guest_EIP = sc->eip;
tst->arch.vex.guest_CS = sc->cs;
tst->arch.vex.guest_SS = sc->ss;
tst->arch.vex.guest_DS = sc->ds;
> I suppose I'm
> missing something (like resetting some other elements for execution in
> some other places). Anything I could do about this ?
Does this work for you? I'd like to help but obviously I need to be
able to reproduce the problem first.
J
#include <string.h>
#include <stdio.h>
#include <stdlib.h>
#include <signal.h>
#include <assert.h>
#define __USE_GNU
#include <sys/ucontext.h>
void diversion ( void )
{
printf("diversion!\n");
exit(0);
}
void handler ( int signo, siginfo_t* info, void* ctxV )
{
ucontext_t* ctx = (ucontext_t*)ctxV;
printf("in handler, setting EIP to %p\n", (void*)&diversion);
ctx->uc_mcontext.gregs[REG_EIP] = (unsigned long)&diversion;
}
int main ( void )
{
int r;
struct sigaction act;
memset(&act, 0, sizeof(act));
act.sa_sigaction = handler;
act.sa_flags = SA_SIGINFO;
sigemptyset(&act.sa_mask);
r = sigaction(/*SIGSEGV*/40, &act, NULL);
assert(r == 0);
printf("before\n");
kill( getpid(), 40 );
/* *(int*)1 = 1; */
printf("after -- SHOULD NOT SEE THIS\n");
return 0;
}
|
|
From: Eric P. <eri...@wa...> - 2006-02-07 20:56:36
|
Julian Seward wrote: > Eric > Uh ... where did you post these, exactly? Can you send a URL ? http://bugs.kde.org/show_bug.cgi?id=120728 http://bugs.kde.org/show_bug.cgi?id=120732 >>What doesn't currently work in valgrind is changing the EIP in the >>sigcontext. Current implementation of restore_sigcontext() in >>coregrind/m_sigframe/sigframe-x86-linux.c doesn't restore EIP into VEX: >>when we return from the signal handler, execution resumes at the opcode >>with first generated the signal. Back to square one :-/ > > > Sun's JVM also does nasty tricks like this, and I think Tom Hughes > got the same bad-opcode result when he tried restoring EIP into vex. > I would like to fix it for Java folks too. > > > >>I simply tried to set EIP back in the restore_sigcontext() function, but >>this leads to a crash in IR (about unsupported opcode). > > > Hmm. Doing that works OK at least for my simple test program > on SuSE 10 (x86): > yup, your 'simple' test program works I slightly changed your test program, to something that's closer to what we do in Wine... and it doesn't work under VG :-( I left in the test code for both forms for linux signal handlers (< 2.2 and >= 2.1.48), and both fail identically however the error I get is not the as the one with real Wine (but the test framework is not the same either) #include <string.h> #include <stdio.h> #include <stdlib.h> #include <signal.h> #include <assert.h> #include <setjmp.h> #include "valgrind/memcheck.h" #define __USE_GNU #include <sys/ucontext.h> jmp_buf my_je; void diversion( int i, char* ptr ) { printf("diversion %d %s!\n", i, ptr); longjmp(my_je, i); } typedef struct { unsigned short sc_gs, __gsh; unsigned short sc_fs, __fsh; unsigned short sc_es, __esh; unsigned short sc_ds, __dsh; unsigned long sc_edi; unsigned long sc_esi; unsigned long sc_ebp; unsigned long sc_esp; unsigned long sc_ebx; unsigned long sc_edx; unsigned long sc_ecx; unsigned long sc_eax; unsigned long sc_trapno; unsigned long sc_err; unsigned long sc_eip; unsigned short sc_cs, __csh; unsigned long sc_eflags; unsigned long esp_at_signal; unsigned short sc_ss, __ssh; unsigned long i387; unsigned long oldmask; unsigned long cr2; } volatile SIGCONTEXT; struct stack_layout { void *ret_addr; int i; char *str; char buf[16]; unsigned long ebp; unsigned long eip; } *stack; void handler_old ( int signo, SIGCONTEXT __context ) { printf("in handler2, setting EIP to %p\n", (void*)&diversion); stack = (struct stack_layout*)__context.sc_esp; stack--; /* push the stack_layout structure */ VALGRIND_MAKE_WRITABLE(stack, sizeof(*stack)); stack->ret_addr = (void *)0xdeadbabe; /* raise_func must not return */ /* setting this to a reasonable value doesn't help stack->ret_addr = (unsigned long)&diversion; */ stack->i = 12; stack->str = stack->buf; strcpy(stack->buf, "foo-bar"); __context.sc_eip = (unsigned long)&diversion; __context.sc_esp = (unsigned long)stack; } void handler_new( int signo, siginfo_t* xx, void* uc) { ucontext_t* ctx = (ucontext_t*)uc; printf("in handler2, setting EIP to %p\n", (void*)&diversion); stack = (struct stack_layout*)ctx->uc_mcontext.gregs[REG_ESP]; stack--; /* push the stack_layout structure */ VALGRIND_MAKE_WRITABLE(stack, sizeof(*stack)); stack->ret_addr = (void *)0xdeadbabe; /* raise_func must not return */ /* setting this to a reasonable value doesn't help stack->ret_addr = (unsigned long)&diversion; */ stack->i = 12; stack->str = stack->buf; strcpy(stack->buf, "foo-bar"); ctx->uc_mcontext.gregs[REG_EIP] = (unsigned long)&diversion; ctx->uc_mcontext.gregs[REG_ESP] = (unsigned long)stack; } int main ( void ) { int r; struct sigaction act; memset(&act, 0, sizeof(act)); #if 0 act.sa_handler = (void (*)())handler_old; act.sa_flags = SA_RESTART | SA_ONSTACK; #else act.sa_sigaction = (void (*)())handler_new; act.sa_flags = SA_RESTART | SA_ONSTACK | SA_SIGINFO; #endif sigemptyset(&act.sa_mask); r = sigaction(SIGSEGV, &act, NULL); assert(r == 0); if ((r = setjmp(my_je)) == 0) { printf("before\n"); *(int*)1 = 1; } printf("after %d\n", r); return 0; } -- Eric Pouech |
|
From: Julian S. <js...@ac...> - 2006-02-11 02:15:23
|
In fact on further study I don't really understand how this is supposed
to work (safely), even natively.
> struct stack_layout
> {
> void *ret_addr;
> int i;
> char *str;
> char buf[16];
> unsigned long ebp;
> unsigned long eip;
> };
>
> void handler_new( int signo, siginfo_t* xx, void* uc)
> {
> ucontext_t* ctx = (ucontext_t*)uc;
> struct stack_layout* stack;
> printf("in handler2, setting EIP to %p\n", (void*)&diversion);
> stack = (struct stack_layout*)ctx->uc_mcontext.gregs[REG_ESP];
>
> stack--; /* push the stack_layout structure */
> VALGRIND_MAKE_WRITABLE(stack, sizeof(*stack));
> stack->ret_addr = (void *)0xdeadbabe; /* raise_func must not
> return */
> /* setting this to a reasonable value doesn't help
> stack->ret_addr = (unsigned long)&diversion;
> */
> stack->i = 12;
> stack->str = stack->buf;
> strcpy(stack->buf, "foo-bar");
>
> ctx->uc_mcontext.gregs[REG_EIP] = (unsigned long)&diversion;
> ctx->uc_mcontext.gregs[REG_ESP] = (unsigned long)stack;
> }
My understanding is:
- in main, the segfault happens
- kernel pushes a signal frame on the stack, saving the machine state
in it, and enters handler_new
- handler_new finds out what %esp was at the time of the fault
(stack = ctx->uc_mcontext.gregs[REG_ESP]).
My picture of the stack is now
------------
frame for main()
------------ <--- "stack"
kernel-constructed sigframe ...
...
kernel-constructed sigframe ...
- You then do "stack--", which moves "stack" down one frame unit. Now
it overlaps with the kernel-constructed sigframe.
- You write stuff in *stack, trashing part of the kernel-constructed frame.
- You set ctx->uc_mcontext.gregs[REG_EIP] and REG_ESP.
- handler_new returns. The machine state is restored from the partially
corrupted kernel-constructed sigframe. Execution resumes in diversion()
with the stack looking like this:
------------
frame for main()
------------
struct stack_layout
------------ <------- %esp
How do you know it is OK to overwrite the top of the
kernel-constructed signal frame with your struct stack_layout?
I suspect (if my analysis is right) that this could be a cause
of problems with V. V's signal delivery frames look different from
the kernel's one as they contain more information, and trashing
the top part of it will likely cause problems.
A perhaps safer approach is:
In the handler, do not write anything onto the stack and do not
change ctx->uc_mcontext.gregs[REG_ESP]. Instead copy all info you
need to construct the struct stack_layout, including the values
of ctx->uc_mcontext.gregs[REG_ESP] and [REG_EIP], to some safe place
(thread-local storage?). Set ctx->uc_mcontext.gregs[REG_EIP] to
point to a handwritten assembly function, and let the handler return.
You are now in your handwritten assembly function. Using the saved
info, construct the struct stack_layout, and jump the the EIP noted
in the saved info.
This relies on the ideas that
- overwriting the top end of the signal frame is likely to kill valgrind
- changing %ESP inside the signal handler is likely to cause memcheck to
emit lots of bogus messages, and these may be difficult to get rid of
(changing %ESP is really asking for trouble from memcheck :-)
My proposal avoids both problems.
J
|
>>struct stack_layout
>>{
>> void *ret_addr;
>> int i;
>> char *str;
>> char buf[16];
>> unsigned long ebp;
>> unsigned long eip;
>>};
>>
>>void handler_new( int signo, siginfo_t* xx, void* uc)
>>{
>> ucontext_t* ctx = (ucontext_t*)uc;
>> struct stack_layout* stack;
>> printf("in handler2, setting EIP to %p\n", (void*)&diversion);
>> stack = (struct stack_layout*)ctx->uc_mcontext.gregs[REG_ESP];
>>
>> stack--; /* push the stack_layout structure */
[snip]
> How do you know it is OK to overwrite the top of the
> kernel-constructed signal frame with your struct stack_layout?
It's **very** dirty, but it looks like it does "work" on recent Linux.
Look in linux-2.6.15/arch/i386/kernel/sigframe.h:
struct rt_sigframe
{
char __user *pretcode;
int sig;
struct siginfo __user *pinfo;
void __user *puc;
struct siginfo info;
struct ucontext uc;
struct _fpstate fpstate;
char retcode[8];
};
and in linux-2.6.15/include/asm-i386/sigcontext.h:
struct _fpstate {
/* Regular FPU environment */
unsigned long cw;
unsigned long sw;
unsigned long tag;
unsigned long ipoff;
unsigned long cssel;
unsigned long dataoff;
unsigned long datasel;
struct _fpreg _st[8];
unsigned short status;
unsigned short magic; /* 0xffff = regular FPU data only */
/* FXSR FPU environment */
unsigned long _fxsr_env[6]; /* FXSR FPU env is ignored */
unsigned long mxcsr;
unsigned long reserved;
struct _fpxreg _fxsr_st[8]; /* FXSR FPU reg data is ignored */
struct _xmmreg _xmm[8];
unsigned long padding[56];
};
and the comment in linux-2.6.15/arch/i386/kernel/signal.c:
/*
* This is movl $,%eax ; int $0x80
*
* WE DO NOT USE IT ANY MORE! It's only left here for historical
* reasons and because gdb uses it as a signature to notice
* signal handler stack frames.
*/
err |= __put_user(0xb8, (char __user *)(frame->retcode+0));
err |= __put_user(__NR_rt_sigreturn, (int __user *)(frame->retcode+1));
err |= __put_user(0x80cd, (short __user *)(frame->retcode+5));
So the 8 bytes of retcode is "available", as is the 4*56 bytes of
_fpstate.padding, assuming the space for the FXSR FPU environment
really is present. If so, then no "important" data gets overwritten
because sizeof(struct stack_layout) < (8 + 4*56) .
If the kernel actually pushes only struct sigframe (not struct rt_sigframe),
then you're much more likely to be in trouble:
struct sigframe
{
char __user *pretcode;
int sig;
struct sigcontext sc;
struct _fpstate fpstate;
unsigned long extramask[_NSIG_WORDS-1];
char retcode[8];
};
where _NSIG_WORDS equals 2. extramask[0] holds the bits for
the second (and last) group of 32 signals, and will be overwritten.
This can lead to _extremely_ hard-to-diagnose random behavior.
Notice that struct sigframe contains no siginfo; SA_SIGINFO was
omitted when asking the kernel to establish the handler.
This report:
rt_sigframe and vDSO inhibit virtualization of signal handling
https://bugzilla.redhat.com/bugzilla/show_bug.cgi?id=180351
might be of interest. Among other things: with the latest glibc-2.3.90
and Fedora Core kernel vDSO, then it is hard to avoid a race if you're
mucking with the frame when a pthread_cond_wait() gets canceled.
[snip]
> - changing %ESP inside the signal handler is likely to cause memcheck to
> emit lots of bogus messages, and these may be difficult to get rid of
> (changing %ESP is really asking for trouble from memcheck :-)
Any emulator must give special consideration to return from signal,
including noticing when %esp changes "unexpectedly."
--
|
|
From: Eric P. <eri...@wa...> - 2006-02-11 20:23:25
|
> My understanding is: > > - in main, the segfault happens > > - kernel pushes a signal frame on the stack, saving the machine state > in it, and enters handler_new > > - handler_new finds out what %esp was at the time of the fault > (stack = ctx->uc_mcontext.gregs[REG_ESP]). > > My picture of the stack is now > > ------------ > frame for main() > ------------ <--- "stack" > kernel-constructed sigframe ... > ... > kernel-constructed sigframe ... > > - You then do "stack--", which moves "stack" down one frame unit. Now > it overlaps with the kernel-constructed sigframe. No... the signal handler is called on a different stack than the one the thread... (sigaction is called with SA_ONSTACK flag) We push the stack structure on the thread's stack, not the sigaltstack Furthermore, since the stack grows downwards, it's actually pushed onto the thread's stack, as if the thread had called the function itself, and doesn't trash any data on the stack. A+ -- Eric Pouech |
>> - You then do "stack--", which moves "stack" down one frame unit. Now
>> it overlaps with the kernel-constructed sigframe.
>
> No... the signal handler is called on a different stack than the one the
> thread... (sigaction is called with SA_ONSTACK flag)
The code was not clear enough; it fooled at least a couple analysts.
Add a comment at the receiving end which documents the expectations.
For example:
-----
void handler_new( int signo, siginfo_t* xx, void* uc)
{
ucontext_t* ctx = (ucontext_t*)uc;
printf("in handler2, setting EIP to %p\n", (void*)&diversion);
stack = (struct stack_layout*)ctx->uc_mcontext.gregs[REG_ESP];
stack--; /* push the stack_layout structure */
/* handler_new was established with SA_ONSTACK. So we
are on the alternate stack, while 'stack' points to
the user stack, which the kernel left undisturbed.
*/
-----
--
|
|
From: Eric P. <eri...@wa...> - 2006-02-25 09:18:16
|
A summary of where we stand for making Valgrind and Wine work well
together. The starting point is V 3.1.0 and Wine 0.9.7.
First of all (reminder) V & Wine now work (mostly) out of the box together.
Here's a list of (known) problems faced:
a/ missing support for some ioctl's (HDMA, serial). Added to V (now in
branch, at least of x86)
b/ missing support for tkill syscall. Added to V (now in branch), Wine
fixed to use the better tgkill syscall (0.9.8)
c/ some missing instrumentation of Wine (regarding stack manipulation).
Wine fixed (0.9.8)
d/ possibility to change EIP in signal handler. Added to V (now in
branch, for x86).
e/ trapno is not passed in signal handlers. A hacky patch proposed, and
rightfully rejected. Solution in sight (Tom Hughes), not yet in branch.
f/ missing instruction emulation in VEX (x86, and likely amd 64). Wine
needs:
push/pop [ds,es,ss]
iret (on 32bit intercall, cs not changed)(*)
int 3
cli/sti
(*) actually, Wine would require much more for 16bit emulation, but
I don't think it's reasonable, nor a short term goal, to use V &
Wine for 16bit code.
The push/pop code was easy to fix (it's already written). However
iret will be a bit trickier (I have a dirty hack for iret and int 3,
but likely Julian would rather do it himself ;-). We can try to get
rid of the cli/sti pair.
g/ missing stack backtrace information
As already explained, the first thread of a process is handled
as follows: it starts using the stack created from the process
creation (from Unix). As the process may require a larger stack
(from the Windows definition of the PE header), the Wine loader
always create a second stack at the loader startup, and switch
execution to that second stack when calling the PE module entry
point.
The outcome of this, from a V point of view, is that we get two
stacks (values are important):
- first stack at 0xBE??????
- second stack at 0x20?????? (ie below the first one).
The second stack is declared to V by Wine.
When V needs to print a backtrace (for any issue), in
get_StackTrace2 there's a test:
if (fp_min + VG_(clo_max_stackframe) <= fp_max) {
Here fp_min gets its value from the second stack (in the
0x20?????? area, and fp_max in the 0xbe?????? area). The outcome
is that no backtrace is generated :-/. Trying to remove the test
(or increase the max_stackframe value) provides some other
issues as not all the area in the 0x20????? - 0xbe000000 is
mapped, hence having a some crashes.
This one is a bit more hairy to fix. I have a patch that mixes
the max_stackframe test, with some extra test about the stacks
that are known by V (and checking that info for the backtrace
really points to the stack). But since V doesn't make use of
this information, it must be for good reasons.
The net result of this is:
- because of g/, we cannot get a full backtrace of an issue found by V.
Currently, we disabled the offending test in get_StackTrace2 pointed out
in g. With the listed side effects (crashes), and we'd like to have
something working out of the box (from the V branches)
- because of e/ and f/, we cannot use Wine and V on programs triggering
exceptions (the missing parts in e/ and f/ are only used in those
occasions).
I'd really like to see e/, f/ and g/ fixed. Let me know if you need some
help.
The good side of the story:
- we ran the regression tests (from the Wine regression test suite) on 5
DLLs under V, and fixed something like 15 potential issues and bugs
(even if some tests don't work under V yet)
- that's worth going further and run the full Wine regression tests under V.
A+
--
Eric Pouech
|
|
From: Julian S. <js...@ac...> - 2006-02-11 01:31:18
|
> I slightly changed your test program, to something that's closer to what
> we do in Wine... and it doesn't work under VG :-(
Wow. That's a really ugly hack :-)
> I left in the test code for both forms for linux signal handlers (< 2.2
> and >= 2.1.48), and both fail identically
I can't make it fail using the current svn head + the 1-line patch to
restore EIP after the handler returns:
Index: coregrind/m_sigframe/sigframe-x86-linux.c
===================================================================
--- coregrind/m_sigframe/sigframe-x86-linux.c (revision 5629)
+++ coregrind/m_sigframe/sigframe-x86-linux.c (working copy)
@@ -630,8 +630,7 @@
tst->arch.vex.guest_ESI = sc->esi;
tst->arch.vex.guest_EDI = sc->edi;
//:: tst->arch.vex.guest_eflags = sc->eflags;
-//:: tst->arch.vex.guest_EIP = sc->eip;
-
+ tst->arch.vex.guest_EIP = sc->eip;
tst->arch.vex.guest_CS = sc->cs;
tst->arch.vex.guest_SS = sc->ss;
tst->arch.vex.guest_DS = sc->ds;
Here's what I get:
sewardj@suse10:~/VgTRUNK/trunk$ uname -a
Linux suse10 2.6.13-15-default #1 Tue Sep 13 14:56:15 UTC 2005 i686 i686
i386 GNU/Linux
sewardj@suse10:~/VgTRUNK/trunk$ gcc -g -Wall -o gruesome gruesome.c
-I./Inst/include
sewardj@suse10:~/VgTRUNK/trunk$ ./gruesome
before
in handler2, setting EIP to 0x8048608
diversion 12 foo-bar!
after 12
sewardj@suse10:~/VgTRUNK/trunk$ ./Inst/bin/valgrind --tool=none ./gruesome
==7115== Nulgrind, a binary JIT-compiler
==7115== Copyright (C) 2002-2005, and GNU GPL'd, by Nicholas Nethercote.
[...]
==7115==
before
in handler2, setting EIP to 0x8048608
diversion 12 foo-bar!
after 12
==7115==
Admittedly with --tool=memcheck there are a bunch of bogus-looking
error reports, but it still runs and gets the right answer.
This is SuSE 10.0 on x86.
When you say "valgrind fails", exactly what happens?
J
|
|
From: Eric P. <eri...@wa...> - 2006-02-11 20:33:28
|
Julian Seward wrote:
>>I slightly changed your test program, to something that's closer to what
>>we do in Wine... and it doesn't work under VG :-(
> Wow. That's a really ugly hack :-)
thanks ;-)
> When you say "valgrind fails", exactly what happens?
test program compiled with the old fashion(linux < 2.4 signal frame)
(NB: using the new rt-signal leads to the same message)
[eric@calliope ~]$ valgrind --tool=none ./vgt
==14133== Nulgrind, a binary JIT-compiler.
==14133== Copyright (C) 2002-2005, and GNU GPL'd, by Nicholas Nethercote.
==14133== Using LibVEX rev 1471, a library for dynamic binary translation.
==14133== Copyright (C) 2004-2005, and GNU GPL'd, by OpenWorks LLP.
==14133== Using valgrind-3.1.0, a dynamic binary instrumentation framework.
==14133== Copyright (C) 2000-2005, and GNU GPL'd, by Julian Seward et al.
==14133== For more details, rerun with: -v
==14133==
before
in handler2, setting EIP to 0x804859e
==14133==
==14133== Process terminating with default action of signal 11 (SIGSEGV)
==14133== Access not within mapped region at address 0xDEADBAC2
==14133== at 0x4063E55: vfprintf (in /lib/tls/libc-2.3.4.so)
==14133== by 0x406C3CF: printf (in /lib/tls/libc-2.3.4.so)
==14133== by 0x80485E1: handler_old (vgt.c:58)
==14133== by 0x40517D7: (within /lib/tls/libc-2.3.4.so)
==14133== by 0x406C3CF: printf (in /lib/tls/libc-2.3.4.so)
==14133== by 0x80485B6: diversion (vgt.c:16)
==14133== by 0xDEADBABD: ???
==14133== by 0x403EE4A: __libc_start_main (in /lib/tls/libc-2.3.4.so)
--14133-- INTERNAL ERROR: Valgrind received a signal 11 (SIGSEGV) - exiting
--14133-- si_code=1; Faulting address: 0xDEADBAB6; sp: 0x62511898
valgrind: the 'impossible' happened:
Killed by fatal signal
==14133== at 0xB0005F4D: vgPlain_arena_free (m_mallocfree.c:181)
==14133== by 0xB0033DC5: free_LDT_or_GDT (syswrap-x86-linux.c:448)
==14133== by 0xB0033EEF: deallocate_LGDTs_for_thread
(syswrap-x86-linux.c:489)
==14133== by 0xB003434C: vgPlain_cleanup_thread (syswrap-x86-linux.c:711)
==14133== by 0xB001C48E: mostly_clear_thread_record (scheduler.c:478)
==14133== by 0xB001BD2E: vgPlain_exit_thread (scheduler.c:258)
==14133== by 0xB0003DF2: shutdown_actions_NORETURN (m_main.c:2581)
==14133== by 0xB002C301: run_a_thread_NORETURN (syswrap-linux.c:138)
sched status:
running_tid=1
Thread 1: status = VgTs_Runnable
==14133== at 0x4063E55: vfprintf (in /lib/tls/libc-2.3.4.so)
==14133== by 0x406C3CF: printf (in /lib/tls/libc-2.3.4.so)
==14133== by 0x80485E1: handler_old (vgt.c:58)
==14133== by 0x40517D7: (within /lib/tls/libc-2.3.4.so)
==14133== by 0x406C3CF: printf (in /lib/tls/libc-2.3.4.so)
==14133== by 0x80485B6: diversion (vgt.c:16)
==14133== by 0xDEADBABD: ???
==14133== by 0x403EE4A: __libc_start_main (in /lib/tls/libc-2.3.4.so)
A+
--
Eric Pouech
|
|
From: Julian S. <js...@ac...> - 2006-02-12 01:14:19
|
> > When you say "valgrind fails", exactly what happens? > > ==14133== Using valgrind-3.1.0, a dynamic binary instrumentation framework. Ah. 3.1.0. That wasn't clear from before. Sure, I can repro the crash with 3.1.0, but I guess something or other got fixed after 3.1.0, since the program runs OK with the current trunk (pre 3.2.0) sources, as I said. So I suggest you dump 3.1.0 and check out and build the trunk, which is very easy: see http://www.valgrind.org/downloads/repository.html and let us know if you find any problems with that. J |
|
From: Eric P. <eri...@wa...> - 2006-02-13 09:29:26
|
Julian Seward wrote: >>>When you say "valgrind fails", exactly what happens? >> >>==14133== Using valgrind-3.1.0, a dynamic binary instrumentation framework. > > > Ah. 3.1.0. That wasn't clear from before. Sure, I can repro the crash > with 3.1.0, but I guess something or other got fixed after 3.1.0, since > the program runs OK with the current trunk (pre 3.2.0) sources, as I said. > > So I suggest you dump 3.1.0 and check out and build the trunk, which is > very easy: see http://www.valgrind.org/downloads/repository.html > and let us know if you find any problems with that. with svn trunk things are way better ;-) But, not yet 100% correct. Basically, with the attached patches, Wine works (almost - see below) correctly, even for SEH: 1/ trapno in sigframe (vg-trapno.diff) (already posted in the bug database, but enhanced from previous post). This patch sets the trapno field in sigframe structure. 2/ handling ESP change in sig handler (vg-sig.diff) This patch tries to fetch the cases where a sig handler changes ESP in ucontext, and lets V know about it. Ok, it's a bit black magic, but it prevents a tons of warnings. I only wrote it for old signal frame signatures, but it's rather straightforward to implement it to RT signal handlers. 3/ disabled support for tkill (vg-tkill.diff) This patch (partly) reenables tkill syscall under x86-linux. Basically, Wine is made to run either under (NPTL) pthread support, or old kernel thread support. We use tkill, which is transparent whatever the thread system we're running on (pthread or kthread). 4/ missing opcodes in X86 emulation (vg-vex.diff) Wine uses some implemented opcodes (push / pop of cs, ds, es, fs, and gs), as well as iret. The first ones were quite simple to implement. For iret, I put something together, which is wrong from emulation point of view: - eflags is not correctly restored, I simply copied the popf stuff - it doesn't restore cs With those patches applied to V, I can now run simply SEH programs in Wine under memcheck. However, this still generates some warnings (invalid read/write of size 4), and for some of them are in code reading/writing values onto the stack, which looks like V is still fooled with all the tricks we do in Wine. Comments welcome! A+ -- Eric Pouech |
|
From: Tom H. <to...@co...> - 2006-02-13 09:54:19
|
In message <43F...@wa...>
Eric Pouech <eri...@wa...> wrote:
> 1/ trapno in sigframe (vg-trapno.diff)
> (already posted in the bug database, but enhanced from previous
> post). This patch sets the trapno field in sigframe structure.
I did look at that bug yesterday but I'm not a big fan of your
solution and I'm trying to come up with a better approach.
How accurate is the mapping you provided - given that you are trying
to go backwards from signal number/reason to trap number is that
actually a reversible mapping? or does the kernel map multiple trap
numbers to a single signal+reason combination in some cases?
> 2/ handling ESP change in sig handler (vg-sig.diff)
> This patch tries to fetch the cases where a sig handler changes ESP in
> ucontext, and lets V know about it. Ok, it's a bit black magic, but it
> prevents a tons of warnings. I only wrote it for old signal frame
> signatures, but it's rather straightforward to implement it to RT
> signal handlers.
Can you put this on the bug tracker so we don't lose it please.
> 3/ disabled support for tkill (vg-tkill.diff)
> This patch (partly) reenables tkill syscall under
> x86-linux. Basically, Wine is made to run either under (NPTL) pthread
> support, or old kernel thread support. We use tkill, which is
> transparent whatever the thread system we're running on (pthread or
> kthread).
I think I've got your post in valgrind-users about this marked to
be looked at when I get a chance, but once again putting it on the
bug tracker is a better idea.
I will look at this, but I believe tkill is deprecated as it is
potentially dangerous and that tgkill is the preferred call now
where it is available as it ensures that you can only signal your
own threads.
> 4/ missing opcodes in X86 emulation (vg-vex.diff)
> Wine uses some implemented opcodes (push / pop of cs, ds, es, fs, and
> gs), as well as iret. The first ones were quite simple to
> implement. For iret, I put something together, which is wrong from
> emulation point of view:
> - eflags is not correctly restored, I simply copied the popf stuff
> - it doesn't restore cs
That's one for Julian.
Tom
--
Tom Hughes (to...@co...)
http://www.compton.nu/
|
|
From: Eric P. <eri...@wa...> - 2006-02-13 17:37:05
|
Tom Hughes wrote: > In message <43F...@wa...> > Eric Pouech <eri...@wa...> wrote: > > >>1/ trapno in sigframe (vg-trapno.diff) >>(already posted in the bug database, but enhanced from previous >>post). This patch sets the trapno field in sigframe structure. > > > I did look at that bug yesterday but I'm not a big fan of your > solution and I'm trying to come up with a better approach. > > How accurate is the mapping you provided - given that you are trying > to go backwards from signal number/reason to trap number is that > actually a reversible mapping? or does the kernel map multiple trap > numbers to a single signal+reason combination in some cases? I dunno (I do have the same thoughts about it). My point was not to make a 100% accurate patch, but at least to show the areas that needed to be improved. IMO, the best solution would be not to ask for a RT signal handler if the user doesn't ask for (there's nothing to be done here, just passing the information), but it complicates a bit the signal handling code on V side. >>2/ handling ESP change in sig handler (vg-sig.diff) >>This patch tries to fetch the cases where a sig handler changes ESP in >>ucontext, and lets V know about it. Ok, it's a bit black magic, but it >>prevents a tons of warnings. I only wrote it for old signal frame >>signatures, but it's rather straightforward to implement it to RT >>signal handlers. > > > Can you put this on the bug tracker so we don't lose it please. Done >>3/ disabled support for tkill (vg-tkill.diff) >>This patch (partly) reenables tkill syscall under >>x86-linux. Basically, Wine is made to run either under (NPTL) pthread >>support, or old kernel thread support. We use tkill, which is >>transparent whatever the thread system we're running on (pthread or >>kthread). > > > I think I've got your post in valgrind-users about this marked to > be looked at when I get a chance, but once again putting it on the > bug tracker is a better idea. Not sure I did post something to vg-users :-/ Done for putting it on the bug tracker. > I will look at this, but I believe tkill is deprecated as it is > potentially dangerous and that tgkill is the preferred call now > where it is available as it ensures that you can only signal your > own threads. Yes, but it won't work on Linux < 2.6. So, we need to keep using tkill (at least on those kernels). I'll upgrade Wine to use tgkill on >= 2.6, that's a prefered solution. A+ -- Eric Pouech |
|
From: Tom H. <to...@co...> - 2006-02-13 17:52:43
|
In message <43F...@wa...>
Eric Pouech <eri...@wa...> wrote:
> Tom Hughes wrote:
>
> > In message <43F...@wa...>
> > Eric Pouech <eri...@wa...> wrote:
> >
> > > 1/ trapno in sigframe (vg-trapno.diff)
> > > (already posted in the bug database, but enhanced from previous
> > > post). This patch sets the trapno field in sigframe structure.
> >
> > I did look at that bug yesterday but I'm not a big fan of your
> > solution and I'm trying to come up with a better approach.
> >
> > How accurate is the mapping you provided - given that you are trying
> > to go backwards from signal number/reason to trap number is that
> > actually a reversible mapping? or does the kernel map multiple trap
> > numbers to a single signal+reason combination in some cases?
>
> I dunno (I do have the same thoughts about it). My point was not to make
> a 100% accurate patch, but at least to show the areas that needed to be
> improved. IMO, the best solution would be not to ask for a RT signal
> handler if the user doesn't ask for (there's nothing to be done here,
> just passing the information), but it complicates a bit the signal
> handling code on V side.
I'm not sure I understand what you mean there - regardless of what
sort of handler valgrind asks the kernel for it should always be
presenting the right sort (rt or non-rt) of signal frame to the
application's handler.
> > > 3/ disabled support for tkill (vg-tkill.diff)
> > > This patch (partly) reenables tkill syscall under
> > > x86-linux. Basically, Wine is made to run either under (NPTL) pthread
> > > support, or old kernel thread support. We use tkill, which is
> > > transparent whatever the thread system we're running on (pthread or
> > > kthread).
> >
> > I think I've got your post in valgrind-users about this marked to
> > be looked at when I get a chance, but once again putting it on the
> > bug tracker is a better idea.
>
> Not sure I did post something to vg-users :-/
You're right, it was somebody else asking about tkill.
Tom
--
Tom Hughes (to...@co...)
http://www.compton.nu/
|
|
From: Eric P. <eri...@wa...> - 2006-02-13 20:25:22
|
Tom Hughes wrote: >>I dunno (I do have the same thoughts about it). My point was not to make >>a 100% accurate patch, but at least to show the areas that needed to be >>improved. IMO, the best solution would be not to ask for a RT signal >>handler if the user doesn't ask for (there's nothing to be done here, >>just passing the information), but it complicates a bit the signal >>handling code on V side. > > > I'm not sure I understand what you mean there - regardless of what > sort of handler valgrind asks the kernel for it should always be > presenting the right sort (rt or non-rt) of signal frame to the > application's handler. I was talking about the sig handler valgrinds asks for to the kernel. IMO, if we want to get the mappings 100% right from the kernel, we should do something like: - if the user asks for a non rt sig handler, vg should ask for a non rt sig handler - if the user asks for a rt sig handler, vg should ask for a rt sig handler of course, that would duplicate all the sig handler code A+ -- Eric Pouech |
|
From: Tom H. <to...@co...> - 2006-02-13 23:53:39
|
In message <43F...@wa...>
Eric Pouech <eri...@wa...> wrote:
> Tom Hughes wrote:
>
> > > I dunno (I do have the same thoughts about it). My point was not to make
> > > a 100% accurate patch, but at least to show the areas that needed to be
> > > improved. IMO, the best solution would be not to ask for a RT signal
> > > handler if the user doesn't ask for (there's nothing to be done here,
> > > just passing the information), but it complicates a bit the signal
> > > handling code on V side.
> >
> > I'm not sure I understand what you mean there - regardless of what
> > sort of handler valgrind asks the kernel for it should always be
> > presenting the right sort (rt or non-rt) of signal frame to the
> > application's handler.
>
> I was talking about the sig handler valgrinds asks for to the kernel.
> IMO, if we want to get the mappings 100% right from the kernel, we
> should do something like:
> - if the user asks for a non rt sig handler, vg should ask for a non rt
> sig handler
> - if the user asks for a rt sig handler, vg should ask for a rt sig handler
> of course, that would duplicate all the sig handler code
I understand what you are suggesting, but why do you think we need to
do it? and what does it have to do with the trap number stuff?
The rt handlers are effectvely a superset of the non-rt ones so we
can always construct a valid non-rt signal frame based on an rt
signal frames supplied to us by the kernel, which is what we do.
In other words it should be transparent to the application running
under valgrind that valgrind is always asking the kernel for rt
signal frames. You seem to think that it is not transparent for
some reason?
I think the correct solution to the trap number problem is to make
sure we propagate it from the original kernel delivered signal
frame to the one we create for the client program, but that is
slightly complicated. We would also need to fake something up when
we synthesise a signal.
Tom
--
Tom Hughes (to...@co...)
http://www.compton.nu/
|
|
From: Eric P. <eri...@wa...> - 2006-02-14 13:28:23
|
Tom Hughes wrote: > In message <43F...@wa...> > Eric Pouech <eri...@wa...> wrote: > > >>Tom Hughes wrote: >> >> >>>>I dunno (I do have the same thoughts about it). My point was not to make >>>>a 100% accurate patch, but at least to show the areas that needed to be >>>>improved. IMO, the best solution would be not to ask for a RT signal >>>>handler if the user doesn't ask for (there's nothing to be done here, >>>>just passing the information), but it complicates a bit the signal >>>>handling code on V side. >>> >>>I'm not sure I understand what you mean there - regardless of what >>>sort of handler valgrind asks the kernel for it should always be >>>presenting the right sort (rt or non-rt) of signal frame to the >>>application's handler. >> >>I was talking about the sig handler valgrinds asks for to the kernel. >>IMO, if we want to get the mappings 100% right from the kernel, we >>should do something like: >>- if the user asks for a non rt sig handler, vg should ask for a non rt >>sig handler >>- if the user asks for a rt sig handler, vg should ask for a rt sig handler >>of course, that would duplicate all the sig handler code > > > I understand what you are suggesting, but why do you think we need to > do it? and what does it have to do with the trap number stuff? > > The rt handlers are effectvely a superset of the non-rt ones so we > can always construct a valid non-rt signal frame based on an rt > signal frames supplied to us by the kernel, which is what we do. > > In other words it should be transparent to the application running > under valgrind that valgrind is always asking the kernel for rt > signal frames. You seem to think that it is not transparent for > some reason? > > I think the correct solution to the trap number problem is to make > sure we propagate it from the original kernel delivered signal > frame to the one we create for the client program, but that is > slightly complicated. We would also need to fake something up when > we synthesise a signal. I didn't see any place where you get the trap-no from a rt signal handler on x86, or did I miss something ? A+ -- Eric Pouech |
|
From: Tom H. <to...@co...> - 2006-02-14 14:57:40
|
In message <43F...@wa...>
Eric Pouech <eri...@wa...> wrote:
> Tom Hughes wrote:
>
> > I think the correct solution to the trap number problem is to make
> > sure we propagate it from the original kernel delivered signal
> > frame to the one we create for the client program, but that is
> > slightly complicated. We would also need to fake something up when
> > we synthesise a signal.
>
> I didn't see any place where you get the trap-no from a rt signal
> handler on x86, or did I miss something ?
Well that's the point... The path for a normal signal that originates
from the kernel is like this:
- signal delivered to sync_signalhandler or async_signalhandler
- at this point we have a ucontext with a trapno in it
- signal passed to deliver_signal
- here the ucontext is lost as it is not passed in
- push_signal_frame called to construct frame for client program
- platform specific VG_(sigframe_create) called to construct frame
- build_rt_sigframe or build_sigframe called
- synth_ucontext called
- here we need to store the trap number in the created ucontext
What you proposed is that synth_ucontext should try and guess the
trap number from the signal number and code.
I am suggesting that the original kernel ucontext should be propagated
down through that call stack so that synth_ucontext can take the
original kernel supplied trap number from it which is guaranteed to
be accurate.
The only problem is that deliver_signal is called from other places
like synth_fault_common, synth_sigill and synth_sigtrap which would
have to invent a ucontext with an appropriate trap number or something.
Tom
--
Tom Hughes (to...@co...)
http://www.compton.nu/
|
|
From: Eric P. <eri...@wa...> - 2006-02-14 20:44:32
|
> Well that's the point... The path for a normal signal that originates > from the kernel is like this: > > - signal delivered to sync_signalhandler or async_signalhandler > - at this point we have a ucontext with a trapno in it > > - signal passed to deliver_signal > - here the ucontext is lost as it is not passed in > > - push_signal_frame called to construct frame for client program > > - platform specific VG_(sigframe_create) called to construct frame > > - build_rt_sigframe or build_sigframe called > > - synth_ucontext called > - here we need to store the trap number in the created ucontext > > What you proposed is that synth_ucontext should try and guess the > trap number from the signal number and code. > > I am suggesting that the original kernel ucontext should be propagated > down through that call stack so that synth_ucontext can take the > original kernel supplied trap number from it which is guaranteed to > be accurate. > The only problem is that deliver_signal is called from other places > like synth_fault_common, synth_sigill and synth_sigtrap which would > have to invent a ucontext with an appropriate trap number or something. that'd be better A+ -- Eric Pouech |