#147 blas-atlas-3.9.21 segfault in tuning - negative time delta

Milestone: Developer
Status: closed-fixed
Priority: 5
Updated: 2013-10-03
Created: 2010-02-26
Creator: Ed Catmur
Private: No

Downstream: https://bugs.gentoo.org/show_bug.cgi?id=303185

2300 make[3]: *** [res/dR1SUMM] Segmentation fault
make[3]: Leaving directory `/scratch/tmp/portage/sci-libs/blas-atlas-3.9.21/work/ATLAS/gentoo-build/tune/blas/ger'
make[2]: *** [/scratch/tmp/portage/sci-libs/blas-atlas-3.9.21/work/ATLAS/gentoo-build/tune/blas/ger/res/dR1SUMM] Error 2

The segfault arises as follows. r1time() calls time00() (implemented by
ATL_cputime on Gentoo x86) to read the start and end times. If both
calls fall within the resolution of the getrusage() clock, they can
still return slightly different values (presumably due to
binary/decimal rounding errors). The time delta can then be calculated
as a very small negative number, and the mflops as a very large
negative number.
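
To make the arithmetic concrete, here is a minimal sketch of how a tiny negative delta turns into a huge negative rate. The function name and signature are hypothetical and are not taken from the ATLAS sources; it only assumes that mflops is computed as operation count divided by elapsed seconds times 1e6.

```c
/* Hypothetical helper, NOT the ATLAS implementation: converts an
 * operation count and two timestamps (in seconds) into MFLOPS. */
double mflops_from_times(double flops, double t_start, double t_end)
{
    /* If the two clock reads are effectively simultaneous, rounding
     * can leave t_end slightly smaller than t_start. */
    double delta = t_end - t_start;

    /* A delta of about -1e-9 s makes the result large and negative. */
    return flops / (delta * 1.0e6);
}
```

With 2e6 operations and an end time one nanosecond before the start time, this returns roughly -2e9, which matches the "very large negative number" described above.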

If this happens often enough that the median of the calculated mflops
is negative, TimeR1Kernel returns a negative value.
TimeAllKernelsForContext then returns NULL, because no kernel has
mflops greater than the initial value of 0.

The tuning code unconditionally dereferences the return value of
TimeAllKernelsForContext, and segfaults.
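
The failure mode can be sketched in a few lines. The types and function names below (kernel_t, pick_fastest, fastest_name) are hypothetical stand-ins, not the real ATLAS data structures; the sketch only assumes a "keep the best so far, starting from 0" selection loop as described above.

```c
#include <stddef.h>

/* Hypothetical kernel record; the real ATLAS structures differ. */
typedef struct kernel {
    const char *name;
    double mflops;
} kernel_t;

/* Returns the kernel with the highest mflops above the initial
 * value of 0, or NULL if every timing is zero or negative. */
const kernel_t *pick_fastest(const kernel_t *k, size_t n)
{
    const kernel_t *best = NULL;
    double best_mf = 0.0;
    for (size_t i = 0; i < n; i++) {
        if (k[i].mflops > best_mf) {
            best_mf = k[i].mflops;
            best = &k[i];
        }
    }
    return best;
}

/* A caller that checks for NULL instead of dereferencing blindly. */
const char *fastest_name(const kernel_t *k, size_t n)
{
    const kernel_t *best = pick_fastest(k, n);
    return best ? best->name : NULL;
}
```

If every entry has negative mflops, pick_fastest returns NULL; a caller that wrote best->name without the check would crash in the way reported here.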

My proposed solution is to coerce the time delta calculated in r1time()
to be nonnegative. I will attach a proposed patch.
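
As an illustration of the idea only (the contents of the attached patch are not reproduced here, and the function name is hypothetical), coercing the delta could look like this:

```c
/* Hypothetical sketch of clamping a time delta to be nonnegative.
 * This is not the text of zerotime.patch. */
double nonneg_delta(double t_start, double t_end)
{
    double delta = t_end - t_start;

    /* Treat anything at or below zero as "no measurable time". */
    return (delta > 0.0) ? delta : 0.0;
}
```

A delta of exactly zero still needs care wherever it is later used as a divisor, so the caller would have to decide how to report a rate for an unmeasurably short run.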

Discussion

  • Ed Catmur
    2010-02-26

    Attachments: zerotime.patch

  • Ed,

    Thank you very much for this detailed bug report. I'm guessing I never see this error because I mostly install on unloaded systems, where I can use walltime.

    I'll have to look at the problem, but one solution is to keep rerunning the timing with more reps until a positive return value is achieved. If we accept timings that are essentially below clock resolution, ATLAS's decisions will all be arbitrary.

    I'm in the middle of a bunch of new development, so it may take me a bit to get this fixed. If your machine is not heavily loaded, I recommend you use the -D c -DPentiumCPS=Mhz flag discussed in the install guide to get cycle-accurate walltimes in the meantime.

    Thank you very much for this bug report,
    Clint
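
The retry idea in the comment above could be sketched roughly as follows. Everything here is an assumption for illustration: the timer_fn callback type, the doubling policy, and the fake_timer stand-in are hypothetical and do not come from ATLAS.

```c
#include <stddef.h>

/* Hypothetical timing callback: runs the kernel `reps` times and
 * returns the measured elapsed time in seconds. */
typedef double (*timer_fn)(long reps, void *ctx);

/* Doubles the repetition count until the measured time exceeds
 * min_time. Returns the reps that succeeded and stores the time in
 * *elapsed, or returns 0 if max_reps is reached first.
 * Assumes 1 <= start_reps <= max_reps. */
long time_until_resolved(timer_fn run, void *ctx, long start_reps,
                         long max_reps, double min_time, double *elapsed)
{
    long reps = start_reps;
    for (;;) {
        double t = run(reps, ctx);
        if (t > min_time) {
            *elapsed = t;
            return reps;
        }
        if (reps > max_reps / 2)   /* doubling would exceed the cap */
            return 0;
        reps *= 2;
    }
}

/* Deterministic stand-in for a real timer, for demonstration only:
 * 0.1 ms per rep minus a fixed 0.25 ms error, so short runs report
 * a negative time. */
double fake_timer(long reps, void *ctx)
{
    (void)ctx;
    return (double)reps * 1.0e-4 - 2.5e-4;
}
```

With fake_timer, a start of 1 rep and min_time of 0, the loop rejects 1 and 2 reps (both negative) and accepts 4. In practice min_time would be set well above the clock resolution, so that accepted timings are not arbitrary.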

     
    • labels: 360151 -->
    • assigned_to: nobody --> rwhaley
     
    • labels: --> Install problem
    • milestone: --> Developer
    • status: open --> open-accepted
     
    • status: open-accepted --> open-fixed
     
  • I believe this should be fixed in 3.9.33. Can you confirm?

    Thanks,
    Clint

     
    • status: open-fixed --> closed-fixed
     
  • very old, closing