From: Stephen M.
I've recently tried using callgrind and the self-hosting features of
recent Valgrind versions in order to profile our Valgrind-based tracing
and dynamic analysis tools (http://pag.csail.mit.edu/fjalar/). So far,
this seems to work pretty well. However, I ran into one problem with
an error coming from the following code in dispatch-x86-linux.S:

        /* We're leaving. Check that nobody messed with
           %mxcsr or %fpucw. We can't mess with %eax here as it
           holds the tentative return value, but any other is OK. */
    #if !defined(ENABLE_INNER)
        /* This check fails for self-hosting, so skip in that case */
        pushl   $0
        fstcw   (%esp)
        cmpl    $0x027F, (%esp)
        popl    %esi /* get rid of the word without trashing %eflags */
        jnz     invariant_violation
    #endif
        cmpl    $0, VG_(machine_x86_have_mxcsr)
        jz      L2
        pushl   $0
        stmxcsr (%esp)
        andl    $0xFFFFFFC0, (%esp) /* mask out status flags */
        cmpl    $0x1F80, (%esp)
        popl    %esi
        jnz     invariant_violation
    L2: /* otherwise we're OK */
        jmp     run_innerloop_exit_REALLY

(the eventual error message is:

valgrind: m_scheduler/scheduler.c:994 (vgPlain_scheduler): the 'impossible' happened.
valgrind: VG_(scheduler), phase 3: run_innerloop detected host state invariant failure
)

Since the %fpucw check is commented out, the problems must have been
coming from the %mxcsr check, and indeed when I ifdef'ed that out too,
things seemed to run fine.

I can't say I understand why the %fpucw check fails when you're
self-hosting, but it seems at least plausible that a similar issue
affects the %mxcsr check. I'm not sure what it is about our tool that
tickles this problem; I didn't see similar failures with a memcheck
built from a pristine valgrind source (our tool is based in part on
memcheck). One potentially significant difference is that we still
link with glibc (I get the impression that's discouraged, but we're
loathe to give up fprintf() and friends), now statically. The computer
I'm seeing this on is an older Athlon that I'm not sure even has an
MXCSR register, if that makes a difference.

Is ifdeffing out the second check the right fix?

-- Stephen
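For readers who find the assembly hard to follow, the two exit checks can be restated in C. This is only a sketch of the comparisons, not Valgrind code: the constants come from the assembly quoted in the message, while the function and macro names are hypothetical and the register values are passed in as plain integers rather than read from the CPU.

```c
#include <stdbool.h>
#include <stdint.h>

/* Constants taken from the quoted dispatcher assembly. */
#define EXPECTED_FPUCW     0x027Fu
#define EXPECTED_MXCSR     0x1F80u
#define MXCSR_STATUS_MASK  0xFFFFFFC0u  /* clears bits 0-5, the sticky exception flags */

/* Hypothetical helper: true if the x87 control word has the expected value. */
static bool fpucw_ok(uint16_t fpucw)
{
    return fpucw == EXPECTED_FPUCW;
}

/* Hypothetical helper: true if MXCSR matches once the status flags are masked out. */
static bool mxcsr_ok(uint32_t mxcsr)
{
    return (mxcsr & MXCSR_STATUS_MASK) == EXPECTED_MXCSR;
}
```

Under this reading, an MXCSR of 0x1FA0 (a sticky exception flag set) still passes, while any change to the mask or rounding-control bits, for example 0x7F80, fails. Likewise an x87 control word of 0x037F, the value FNINIT loads, fails the first check because it differs from 0x027F in the precision-control field.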
From: Nicholas N. <nj...@cs...> - 2006-02-25 01:45:48
On Fri, 24 Feb 2006, Stephen McCamant wrote:

> One potentially significant difference is that we still
> link with glibc (I get the impression that's discouraged, but we're
> loathe to give up fprintf() and friends), now statically.

What does fprintf() do that you need that VG_(printf)() doesn't?

I have no idea about the MXCSR question...

Nick
From: Stephen M.
>>>>> "NN" == Nicholas Nethercote <nj...@cs...> writes:

NN> On Fri, 24 Feb 2006, Stephen McCamant wrote:

SMcC> One potentially significant difference is that we still link
SMcC> with glibc (I get the impression that's discouraged, but we're
SMcC> loathe to give up fprintf() and friends), now statically.

NN> What does fprintf() do that you need that VG_(printf)() doesn't?

The usual, printing to file descriptors other than 2. So, you might
ask, what does fprintf() do that VG_(sprintf)() and VG_(write)()
don't? Buffering, though we haven't actually measured the speed
difference.

In general, there's nothing we're currently getting from glibc that
couldn't be reimplemented on top of the primitives Valgrind provides,
and we'd like to go in that direction eventually, but we're waiting
for a reason that makes it worth the effort, and we haven't found one
yet.

In fact, I just checked now the list of symbols we're using from glibc
(shown below). It's even longer than I'd remembered, and it includes a
number of functions that have perfectly good VG_()() versions. (Many
of them probably come from the readelf code that we've assimilated
without much editing.)

_IO_putc __assert_fail __ctype_b_loc __errno_location _exit abort atoi
calloc close dup dup2 execv fclose fcntl fdopen fflush fgets fopen
fopen64 fork fprintf fputc fputs fread free fscanf fseek ftell fwrite
getenv getopt_long gmtime mkdir mkfifo open64 optarg pipe printf
putchar puts remove setvbuf sprintf stat stderr stdout strerror
strncpy strstr strtok strtoul tfind tsearch vfprintf waitpid

-- Stephen
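The buffering that fprintf() provides is not hard to approximate on top of a raw write primitive. The sketch below is one possible shape for it, not code from the tool or from Valgrind: OutBuf and its functions are hypothetical names, and POSIX write() stands in for the write primitive a tool would actually use, since the thread does not show that primitive's signature.

```c
#include <stddef.h>
#include <string.h>
#include <unistd.h>   /* write(); stand-in for the tool's own write primitive */

#define OUTBUF_CAP 4096

/* Hypothetical buffered writer for an arbitrary file descriptor. */
typedef struct {
    int    fd;
    size_t used;
    char   data[OUTBUF_CAP];
} OutBuf;

static void outbuf_init(OutBuf *b, int fd)
{
    b->fd = fd;
    b->used = 0;
}

/* Write all n bytes, retrying on short writes (no EINTR handling).
   Returns 0 on success, -1 on error. */
static int write_all(int fd, const char *p, size_t n)
{
    while (n > 0) {
        ssize_t w = write(fd, p, n);
        if (w < 0)
            return -1;
        p += w;
        n -= (size_t)w;
    }
    return 0;
}

/* Push out whatever is buffered and mark the buffer empty. */
static int outbuf_flush(OutBuf *b)
{
    int rc = write_all(b->fd, b->data, b->used);
    b->used = 0;
    return rc;
}

/* Append n bytes, flushing only when the buffer would overflow. */
static int outbuf_put(OutBuf *b, const char *s, size_t n)
{
    if (n > OUTBUF_CAP - b->used && outbuf_flush(b) < 0)
        return -1;
    if (n >= OUTBUF_CAP)              /* too big to buffer: write directly */
        return write_all(b->fd, s, n);
    memcpy(b->data + b->used, s, n);
    b->used += n;
    return 0;
}
```

Formatted output would then be a matter of formatting into a temporary string and passing it to outbuf_put(), with an outbuf_flush() before the descriptor is closed. Whether this is measurably faster than unbuffered writes is the open question raised in the message.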
From: Julian S. <js...@ac...> - 2006-02-27 18:15:07
> I can't say I understand why the %fpucw check fails when you're
> self-hosting, but it seems at least plausible that a similar issue
> affects the %mxcsr check. I'm not sure what it is about our tool that
> tickles this problem; I didn't see similar failures with a memcheck
> built from a pristine valgrind source

Check that the helper functions your tool calls do not mess with
mxcsr. Technically it is caller saved, but saving it across all
helper function calls would be expensive and so vex doesn't.

> (our tool is based in part on
> memcheck). One potentially significant difference is that we still
> link with glibc (I get the impression that's discouraged,

Very much so. We too have been loathe to lose glibc support, but
after a long and somewhat confusing discussion some months ago, it
became clear that linking in glibc opens the possibility of various
kinds of bad interactions between glibc and V, since V needs to
control various kinds of resources (address space layout, signal
state) but glibc believes that it is the sole owner of such resources.
Getting rid of glibc was therefore a step forward from both a
stability and portability standpoint.

I would not be surprised to find that your use of glibc causes V to
bomb with an assertion failure when running large programs with the
flag --sanity-level=3 or above.

J
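Since vex does not save mxcsr around helper calls, a helper that really must change it would have to put the old value back itself. The following is a sketch of that pattern under stated assumptions: helper_truncating_sum() is a made-up helper, read_mxcsr()/write_mxcsr() are hypothetical wrappers around the _mm_getcsr()/_mm_setcsr() intrinsics, and on builds without SSE a plain variable stands in for the register so the example still compiles.

```c
#include <stdint.h>

#if defined(__SSE__)
#include <xmmintrin.h>   /* _mm_getcsr(), _mm_setcsr() */
static uint32_t read_mxcsr(void)        { return _mm_getcsr(); }
static void     write_mxcsr(uint32_t v) { _mm_setcsr(v); }
#else
/* No SSE available: a stand-in variable, for illustration only. */
static uint32_t fake_mxcsr = 0x1F80u;
static uint32_t read_mxcsr(void)        { return fake_mxcsr; }
static void     write_mxcsr(uint32_t v) { fake_mxcsr = v; }
#endif

#define MXCSR_ROUND_TO_ZERO 0x6000u   /* rounding-control bits 13-14 both set */

/* Hypothetical helper that wants round-toward-zero internally and
   restores the caller's MXCSR before returning. */
static int helper_truncating_sum(float a, float b)
{
    uint32_t saved = read_mxcsr();
    write_mxcsr(saved | MXCSR_ROUND_TO_ZERO);
    volatile float sum = a + b;       /* volatile: keep the add at run time */
    int result = (int)sum;
    write_mxcsr(saved);               /* leave MXCSR exactly as we found it */
    return result;
}
```

A helper written this way leaves the control bits untouched from the dispatcher's point of view. A helper that changes them and returns without the final write_mxcsr() would be the kind of thing the exit check is meant to catch.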
From: Stephen M.
>>>>> "JS" == Julian Seward <js...@ac...> writes:
SMcC> I can't say I understand why the %fpucw check fails when you're
SMcC> self-hosting, but it seems at least plausible that a similar
SMcC> issue affects the %mxcsr check. I'm not sure what it is about
SMcC> our tool that tickles this problem; I didn't see similar
SMcC> failures with a memcheck built from a pristine valgrind source
JS> Check that the helper functions your tool calls do not mess with
JS> mxcsr. Technically it is caller saved, but saving it across all
JS> helper function calls would be expensive and so vex doesn't.
I grepped for ldmxcsr in the disassembly of our statically linked tool
binary (is that a good way to check?) and found two uses: one in
vgPlain_run_innerloop(), which looks like it's setting up the value
that the failing check expects to find, and one in __setfpucw(), which
is called by glibc's init(). The latter seems likely to be part of the
problem, but it doesn't seem likely to be getting called from a helper
function. I also think I remember that the failure was happening on
the very first dispatch.
I suspect there's something subtle going on here with the multiple
levels of interpretation involved in self-hosting that I'm curious to
understand but can't put my finger on. However, if this doesn't ring
any bells with you, it'll be a while before I have the time to look
into it further, given that we already have both a workaround and a
strategy for a long-term fix.
SMcC> (our tool is based in part on
SMcC> memcheck). One potentially significant difference is that we still
SMcC> link with glibc (I get the impression that's discouraged,
JS> Very much so.
[...]
JS> I would not be surprised to find that your use of glibc causes V to
JS> bomb with an assertion failure when running large programs with the
JS> flag --sanity-level=3 or above.
Indeed it does, though we hadn't noticed this in the past, since we
don't often use the sanity checks, and we hadn't seen any other
consequences of the insanity. But I guess this does bump avoiding
glibc a bit up our priority list.
-- Stephen