[LaunchMON-devel] BGP functional port
From: Dong H. A. <ah...@ll...> - 2009-03-17 16:36:52
All,

I committed all the modifications I made for the BGP functional port into the 0.7 branch, which I validated on Dawn up to 36K CN with 147,456 MPI tasks.

During testing, I found a performance/scalability problem with the BG RM and am trying to leverage SWL testing to address it; we will see (IBM PMR number and description below). It may also be interesting to see RM performance signatures towards extreme scales on other high-end systems such as Crays...

Dong

-------

Hi Dong,

FYI, this BG/P scaling defect is being tracked via PMR 64040, 49R.

Best regards
Paul

-----Original Message-----
From: Adam Bertsch [mailto:ad...@ll...]
Sent: Monday, March 16, 2009 15:53
To: Paul Szepietowski
Subject: another dawn PMR

Dong Ahn reports:

All,

I've just completed LaunchMON testing on Dawn, and functionality validation succeeded all the way to 36K CN (fully loaded). This exercise is to validate SOW items 5.7.1.5, 5.7.1.6, and a portion of 5.7.1.4.

With the provided efix, I am happy with the correctness of the system components, but not quite happy with their performance, in particular towards extreme scale (the high performance variations observed at lower scales aside). My reading of the logs suggests that the bottleneck for daemon launching is in the BGP control system's collecting, generating, and distributing a large proctable among its distributed components.

The following are the overheads of LaunchMON's launchAndSpawnDaemons service, along with the control system's proctable handling overheads (the percentage is the proctab share of the service time).

#CN     #dmons   tasks     service     proctab
16384   128      65536     201 secs    156 secs (78%)
20480   160      81920     215 secs    200 secs (93%)
32768   256      131072    515 secs    483 secs (94%)
36864   288      147456    653 secs    631 secs (97%)

If nothing else, this perf signature doesn't comply with the scalability requirements in 5.7.1.5: the proctab handling time looks to me almost polynomial in the scale.

-------- 5.7.1.5 --------
For example, daemon launch time shall vary by no more than the log of the daemon count. Similarly job launch time under the control of a CDT shall vary by no more than the log of the MPI task count.
-------- <CUT>
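
For readers who want to check the scaling claim against the numbers in the report, here is a minimal Python sketch. It is not part of LaunchMON or the original testing; the names ROWS, proctab_share, and loglog_slope are made up for illustration, and the only data used are the four rows of the table above. It recomputes the proctab share of the service time and fits a power-law exponent to proctab time versus daemon count on a log-log scale, which is one simple way (an assumption on my part, not necessarily how the original conclusion was reached) to compare the measurements against the logarithmic bound quoted from 5.7.1.5.

```python
import math

# Measurements copied from the table in the report above:
# (compute nodes, daemons, MPI tasks, service secs, proctab secs)
ROWS = [
    (16384, 128, 65536, 201, 156),
    (20480, 160, 81920, 215, 200),
    (32768, 256, 131072, 515, 483),
    (36864, 288, 147456, 653, 631),
]


def proctab_share(row):
    """Percentage of the service time spent in proctable handling."""
    _, _, _, service, proctab = row
    return round(100 * proctab / service)


def loglog_slope(xs, ys):
    """Least-squares slope of ln(y) against ln(x); about k if y ~ x**k."""
    lx = [math.log(x) for x in xs]
    ly = [math.log(y) for y in ys]
    mx = sum(lx) / len(lx)
    my = sum(ly) / len(ly)
    sxy = sum((a - mx) * (b - my) for a, b in zip(lx, ly))
    sxx = sum((a - mx) ** 2 for a in lx)
    return sxy / sxx


daemons = [r[1] for r in ROWS]
proctab = [r[4] for r in ROWS]

shares = [proctab_share(r) for r in ROWS]
slope = loglog_slope(daemons, proctab)

# Growth a log(daemon count) law would allow between the smallest and
# largest runs, versus the growth actually measured.
log_growth = math.log(daemons[-1]) / math.log(daemons[0])
observed_growth = proctab[-1] / proctab[0]

if __name__ == "__main__":
    print("proctab share (%):", shares)
    print("fitted exponent: %.2f" % slope)
    print("log-bound growth: %.2fx, observed: %.2fx"
          % (log_growth, observed_growth))
```

With only four data points the fitted exponent is a rough indicator at best, but it comes out well above 1, and the proctab time grows about fourfold from 128 to 288 daemons where a logarithmic law would allow less than 1.2x.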