From: Julian S. <js...@ac...> - 2011-11-08 12:55:48
On Tuesday, November 08, 2011, Christian Borntraeger wrote:
> Julian, do you know if the reason for the high retry value of 2000 is still
> valid?
2000 seemed to work well with various MPI libraries running on
late-model Pentium 4s and early Opterons, IIRC. This was some
years ago. The MPI libraries poke bits of hardware in high
performance network cards that are mapped into user space. It
might have been for the Quadrics cards.
I'm not sure that we can decide on any number that works well in
all circumstances.
I'd prefer to try to fix this problem by first adding
memory fences to the regtests that do inter-thread memory access
without synchronisation. This would be simple if we make a
function to force a complete load and store fence on all platforms,
e.g.
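void complete_mem_fence ( void )
{
#if defined(VGA_x86)
   /* "memory" clobber: also stops the compiler moving accesses across it */
   __asm__ __volatile__("mfence" : : : "memory");
#elif defined(VGA_s390x)
   /* equivalent on s390x */
#else
   /* etc, for the other platforms */
#endif
}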
and use it consistently in the tests that require it.
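
A minimal sketch of what such a helper might look like when built outside the Valgrind tree. It assumes GCC or Clang, and it keys off the compiler's own target macros (`__x86_64__`, `__i386__`, `__SSE2__`) rather than the `VGA_*` macros, which is my substitution and not something proposed above. The fallback uses the `__sync_synchronize()` builtin, which issues a full barrier on whatever target is being compiled for.

```c
/* Hypothetical helper: a complete load+store fence.
   Assumes GCC or Clang; not taken from the Valgrind sources. */
static inline void complete_mem_fence(void)
{
#if defined(__x86_64__) || (defined(__i386__) && defined(__SSE2__))
    /* mfence orders all earlier loads and stores before all later ones.
       The "memory" clobber also prevents the compiler from caching or
       reordering memory accesses across this point. */
    __asm__ __volatile__("mfence" : : : "memory");
#else
    /* GCC/Clang builtin: full hardware and compiler barrier. */
    __sync_synchronize();
#endif
}
```

On 32-bit x86 without SSE2 the `mfence` instruction is not available, which is why that case falls through to the builtin here.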
Without proper fencing, we don't have any assurance that such
test programs will finish in finite time even when running natively.
At least if they are properly fenced, then any failure to terminate
must be caused only by Valgrind's games with thread scheduling.
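
To illustrate the kind of fencing meant here, the following is a rough sketch of a flag hand-off between two threads with a fence on both sides. All names (`fence`, `producer`, `consumer`, `run_fenced_handoff`) are made up for the example, and the fence is simply `__sync_synchronize()`, assuming GCC or Clang. Strictly speaking, C11 still treats plain shared variables like these as a data race; the sketch mirrors the style of the existing tests rather than recommending it over proper atomics.

```c
#include <pthread.h>

static int ready = 0;    /* flag the consumer spins on */
static int payload = 0;  /* data published before the flag */

/* Hypothetical stand-in for complete_mem_fence(): full barrier. */
static void fence(void) { __sync_synchronize(); }

static void *producer(void *arg)
{
    (void)arg;
    payload = 42;
    fence();             /* payload is ordered before the flag store */
    ready = 1;
    fence();
    return NULL;
}

static void *consumer(void *arg)
{
    int *out = arg;
    for (;;) {
        fence();         /* forces 'ready' to be re-read each iteration */
        if (ready)
            break;
    }
    fence();             /* flag load is ordered before the payload load */
    *out = payload;
    return NULL;
}

/* Runs one hand-off; returns the value the consumer saw, or -1 on error. */
int run_fenced_handoff(void)
{
    pthread_t p, c;
    int seen = -1;

    ready = 0;
    payload = 0;
    if (pthread_create(&p, NULL, producer, NULL) != 0)
        return -1;
    if (pthread_create(&c, NULL, consumer, &seen) != 0) {
        pthread_join(p, NULL);
        return -1;
    }
    pthread_join(p, NULL);
    pthread_join(c, NULL);
    return seen;
}
```

Without the fence inside the spin loop, the compiler would be free to hoist the load of `ready` out of the loop, and the consumer could then spin forever even when run natively.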
Then, if that doesn't help, we might also need to use the fair-scheduling
stuff that Bart has been working on.
J