|
From: Tom H. <to...@co...> - 2006-06-20 16:05:50
|
One of my colleagues was running callgrind from the 3.2.0 release
on an amd64 FC5 box and found he was encountering this assertion:

  valgrind: m_scheduler/scheduler.c:996 (vgPlain_scheduler): the 'impossible' happened.
  valgrind: VG_(scheduler), phase 3: run_innerloop detected host state invariant failure

The assertion doesn't seem to fire in a consistent location - in fact
sometimes the program runs to completion. It all seems to depend on
what else is running on the machine at the same time.

Instrumenting the code has shown that the problem is the floating
point control word which is 0x37f at the end of the inner loop instead
of the expected value of 0x27f. That corresponds to changing from
double precision to double extended precision.

Does anybody have any suggestions where I can go from here to track
this down? Is this likely to be a kernel bug with the kernel not
restoring the control word properly on a context switch? That seems
very unlikely somehow but it's the only scenario I've come up with
so far to explain what I'm seeing.

Tom

--
Tom Hughes (to...@co...)
http://www.compton.nu/
|
|
From: Josef W. <Jos...@gm...> - 2006-06-20 16:21:21
|
On Tuesday 20 June 2006 17:08, Tom Hughes wrote:
> Instrumenting the code has shown that the problem is the floating
> point control word which is 0x37f at the end of the inner loop instead
> of the expected value of 0x27f. That corresponds to changing from
> double precision to double extended precision.
>
> Does anybody have any suggestions where I can go from here to track
> this down? Is this likely to be a kernel bug with the kernel not
> restoring the control word properly on a context switch? That seems
> very unlikely somehow but it's the only scenario I've come up with
> so far to explain what I'm seeing.

Hmmm... Not that I understand the issue, but it worries me:
Can it theoretically be some corruption produced by callgrind's
instrumentation? Probably not if this depends on the kernel's
scheduling...

Josef
|
|
From: Tom H. <to...@co...> - 2006-06-20 16:37:27
|
In message <200...@gm...>
Josef Weidendorfer <Jos...@gm...> wrote:
> On Tuesday 20 June 2006 17:08, Tom Hughes wrote:
> > Instrumenting the code has shown that the problem is the floating
> > point control word which is 0x37f at the end of the inner loop instead
> > of the expected value of 0x27f. That corresponds to changing from
> > double precision to double extended precision.
> >
> > Does anybody have any suggestions where I can go from here to track
> > this down? Is this likely to be a kernel bug with the kernel not
> > restoring the control word properly on a context switch? That seems
> > very unlikely somehow but it's the only scenario I've come up with
> > so far to explain what I'm seeing.
>
> Hmmm... Not that I understand the issue, but it worries me:

Basically before executing each chunk of translated code valgrind
sets the FPU and vector unit control registers to known values and
then when the translated code finishes executing it checks that
they are the same and asserts if they aren't.

In this case it is bits 8+9 of the FPU control word which seem to
be changing - they control the precision to which x87 instructions
operate.
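To make the two values concrete, here is a minimal sketch in C of how
bits 8+9 (the precision control field) can be pulled out of an x87
control word value and compared. It is only an illustration, not
Valgrind's own code: the macro and function names are invented for
this example, and it works on plain integers rather than reading the
real FPU register.

```c
#include <stdint.h>

/* Hypothetical names, for illustration only (not from Valgrind). */
#define X87_PC_SHIFT 8     /* precision control starts at bit 8 */
#define X87_PC_MASK  0x3u  /* the field is two bits wide (bits 8 and 9) */

/* Extract the two-bit precision control field from a control word. */
unsigned x87_precision_control(uint16_t cw)
{
    return (cw >> X87_PC_SHIFT) & X87_PC_MASK;
}

/* Describe a precision control value in words. */
const char *x87_precision_name(unsigned pc)
{
    switch (pc) {
    case 0:  return "single";          /* 24-bit significand */
    case 2:  return "double";          /* 53-bit significand */
    case 3:  return "double extended"; /* 64-bit significand */
    default: return "reserved";        /* encoding 1 is not used */
    }
}

/* Sketch of the kind of check described above: returns 1 if the
   control word seen after running a chunk of code still matches
   the value that was set before running it, 0 otherwise. */
int control_word_unchanged(uint16_t expected, uint16_t observed)
{
    return expected == observed;
}
```

With these helpers, x87_precision_control(0x27f) gives 2 (double) and
x87_precision_control(0x37f) gives 3 (double extended), so
control_word_unchanged(0x27f, 0x37f) returns 0, which is the mismatch
the assertion is reporting.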
> Can it theoretically be some corruption produced by callgrinds
> instrumentation?

Well that was my first thought, but the behaviour should be consistent
in that case - each run should fail, and in the same place.
Tom
--
Tom Hughes (to...@co...)
http://www.compton.nu/
|
|
From: Tom H. <to...@co...> - 2006-06-21 08:45:08
|
In message <819...@lo...>
Tom Hughes <to...@co...> wrote:
> In message <200...@gm...>
> Josef Weidendorfer <Jos...@gm...> wrote:
>
>> Can it theoretically be some corruption produced by callgrinds
>> instrumentation?
>
> Well that was my first thought, but the behaviour should be consistent
> in that case - each run should fail, and in the same place.

I've reproduced it with --tool=none now so we can rule out callgrind
having any involvement in the problem.
Tom
--
Tom Hughes (to...@co...)
http://www.compton.nu/
|