From: Rick K. <rk...@nc...> - 2006-06-30 19:35:03
Appended is a message from Bron Nelson of SGI regarding a problem that can occur on large single-system-image Linux systems such as the SGI Altix. Note that this problem is not specific to the Altix; it is a general limitation of the kernel, as mentioned below. I'm passing this along to this list to alert people to potential problems and also to mention that a workaround should be available in the future.

I believe that the problem is related to the timeslice interval used for multiplexing within the PAPI library. By default it is 10 ms, an interval that is currently hardcoded into PAPI. The PAPI developers indicated today that a feature addition to PAPI that would allow user-selectable timeslices should appear in a later release. This in turn would allow PerfSuite to use longer timeslices, to avoid pummeling the kernel with interrupts from large numbers of threads and causing the system lockup reported below.

Many thanks to Bron Nelson of SGI and Phil Mucci/Dan Terpstra of PAPI/ICL for the report and the response!

Rick

---------- Forwarded message ----------
Date: Thu, 29 Jun 2006 16:26:12 -0700
From: Bron Nelson <br...@br...>
Subject: perfsuite locking up a 512 Altix

We recently saw an instance of the use of "perfsuite" locking up one of the 512p Altix machines at NASA Ames. The problem appears to be a generic limitation of the Linux community kernel: interrupts can only be delivered at a certain rate, and attempts to deliver signals at rates exceeding that limit cause the kernel to lock. We have generally found that doing 10ms sampling on a 64p job is pretty much right on the edge of what Linux is capable of handling.

I would like to suggest/request that perfsuite automatically "dial down" the rate at which it samples for larger user jobs (e.g. normal rate up to 32p, half the normal rate for 33-64p, a quarter the rate for 65-128p, etc.). While admittedly this is a bandage, not a cure, the likelihood of a major overhaul of the Linux signal handling code to fix the underlying problem seems vanishingly small.

--
Bron Campbell Nelson br...@sg...
These statements are my own, not those of Silicon Graphics.
"I tremble for my country when I reflect that God is just." Thomas Jefferson
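
For anyone who wants to experiment with the "dial-down" idea before a PAPI release with selectable timeslices appears, here is a rough sketch of the scaling rule as Bron describes it: full rate up to 32 processors, then half the rate (twice the interval) for each further doubling of the job size. The function name `scaled_interval_us` and its parameters are made up for illustration; nothing here is an existing PerfSuite or PAPI interface, and the 32p threshold and the 10 ms base interval are simply the figures quoted in the message.

```python
def scaled_interval_us(base_interval_us, nprocs, full_rate_limit=32):
    """Return a sampling interval (microseconds) lengthened for large jobs.

    Hypothetical helper: jobs with up to `full_rate_limit` processors keep
    the base interval; each doubling of the job size beyond that doubles
    the interval, i.e. halves the sampling rate.
    """
    if nprocs < 1:
        raise ValueError("nprocs must be at least 1")
    factor = 1
    limit = full_rate_limit
    # 33-64p -> factor 2, 65-128p -> factor 4, 129-256p -> factor 8, ...
    while nprocs > limit:
        factor *= 2
        limit *= 2
    return base_interval_us * factor


if __name__ == "__main__":
    base_us = 10_000  # 10 ms, the sampling interval quoted in the message
    for p in (16, 32, 64, 128, 512):
        print(p, scaled_interval_us(base_us, p))
```

Under this rule a 512p job would sample every 160 ms instead of every 10 ms. Whether that is slow enough to keep the kernel out of trouble on a given system is something only testing on that system can show.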