launchmon-devel Mailing List for LaunchMON
Brought to you by:
dongahn
2009 archive: Mar (4), Aug (2), Oct (2); no messages in the other months.
From: Dong H. A. <ah...@ll...> - 2009-10-30 20:18:44
FYI, there appears to be a bug in a version of the NPTL thread debug library (libthread_db). Its td_thr_get_info call, which provides thread-related information to the debugger, contains a conditional statement whose condition depends on an uninitialized value. Looking at its source code, this condition seems to arise only when the target thread is the main thread, so I am working around it by not relying on this call for the main thread. I've added the workaround to the main 0.7 branch. If you see odd, intermittent thread/process tracing errors, you might want to pick it up from the main branch.

Best,
Dong
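The workaround can be sketched roughly as follows. This is a minimal illustration, not LaunchMON's actual code. It assumes Linux semantics, where the main thread's LWP ID equals the process ID. The names `thread_info_t`, `query_td_thr_get_info`, `fill_main_thread_info` and `get_thread_info` are hypothetical. A real tracer would call `td_thr_get_info` from `<thread_db.h>` in the non-main case instead of the stub shown here.

```c
#include <stdbool.h>
#include <sys/types.h>

/* Hypothetical, simplified thread record; the real td_thrinfo_t
 * from <thread_db.h> carries many more fields. */
typedef struct {
    pid_t lwpid;               /* kernel LWP (thread) ID */
    bool  from_libthread_db;   /* true if filled via td_thr_get_info */
} thread_info_t;

/* On Linux, the main thread's LWP ID equals the process ID. */
static bool is_main_thread(pid_t pid, pid_t lwpid)
{
    return lwpid == pid;
}

/* Hypothetical stand-in for a wrapper around td_thr_get_info(). */
static int query_td_thr_get_info(pid_t lwpid, thread_info_t *out)
{
    out->lwpid = lwpid;
    out->from_libthread_db = true;
    return 0;
}

/* Fill in main-thread information from what the tracer already knows
 * (here, just the PID) instead of asking libthread_db. */
static int fill_main_thread_info(pid_t pid, thread_info_t *out)
{
    out->lwpid = pid;
    out->from_libthread_db = false;
    return 0;
}

/* Route the main thread around the suspect library call. */
int get_thread_info(pid_t pid, pid_t lwpid, thread_info_t *out)
{
    if (is_main_thread(pid, lwpid))
        return fill_main_thread_info(pid, out);
    return query_td_thr_get_info(lwpid, out);
}
```

Only the main thread takes the bypass. Every other thread still goes through the library, so the workaround stays narrow.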
From: Dong H. A. <ah...@ll...> - 2009-10-30 02:01:13
Hi Jim and Ramya,

Today I committed some recent changes to the launchmon 0.7 dev branch, including the jobsnap BGP port. I also took this opportunity to create a branch for each of you, so that you can grow it with your OpenRTE port and Cray XT port work.

Jim: yours is launchmon-0.7-JimG, and you should be able to check out a working copy with

%svn co https://launchmon.svn.sourceforge.net/svnroot/launchmon/branches/launchmon-0.7-JimG launchmon

Ramya: yours is launchmon-0.7-Ramya-CrayXT:

%svn co https://launchmon.svn.sourceforge.net/svnroot/launchmon/branches/launchmon-0.7-Ramya-CrayXT launchmon

Once you feel your port is stable enough to begin merging the branches into the main 0.7, let me know. We should probably target releasing 0.7 sometime before the end of this year... hopefully.

Best,
Dong
From: Todd G. <tga...@gm...> - 2009-08-28 15:56:48
Thanks, Dong! This works for me and compiles with STAT w/o conflicts.

-Todd

On Aug 27, 2009, at 5:32 PM, Dong H. Ahn wrote:
> fyi, i just incorporated Todd's prefix config support to help other
> autoconf projects to use the launchmon package without having to
> suffer config.h conflicts. The branch should now export
> include/lmon-config.h as part of installation instead of config.h, and
> all other launchmon header files check system header files using
> prefixed macros (e.g., LAUNCHMON_HAVE_STDIO_H vs. HAVE_STDIO_H.)
>
> Dong
From: Dong H. A. <ah...@ll...> - 2009-08-28 00:33:08
FYI, I just incorporated Todd's prefix config support to help other autoconf projects use the launchmon package without suffering config.h conflicts. The branch should now export include/lmon-config.h as part of installation instead of config.h. All other launchmon header files now check for system header files using prefixed macros (e.g., LAUNCHMON_HAVE_STDIO_H instead of HAVE_STDIO_H).

Dong
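To illustrate why the prefix matters, here is a minimal sketch of a consumer translation unit. The two `#define` groups are written inline so that the example compiles on its own. In practice, the first group would come from the tool's own autoconf-generated config.h, and the second from the installed include/lmon-config.h. Only LAUNCHMON_HAVE_STDIO_H comes from the message above. `PACKAGE_NAME` values, `LAUNCHMON_PACKAGE_NAME`, and the two helper functions are hypothetical.

```c
/* Stand-in for the tool's own autoconf-generated config.h. */
#define HAVE_STDIO_H 1
#define PACKAGE_NAME "mytool"              /* hypothetical consumer package */

/* Stand-in for the installed include/lmon-config.h: every macro carries
 * the LAUNCHMON_ prefix, so nothing defined above is redefined.
 * (Had both headers defined PACKAGE_NAME with different values, the
 * preprocessor would flag a conflicting macro redefinition.) */
#define LAUNCHMON_HAVE_STDIO_H 1
#define LAUNCHMON_PACKAGE_NAME "launchmon"  /* hypothetical prefixed macro */

/* A launchmon header guards its system includes on prefixed macros. */
#if LAUNCHMON_HAVE_STDIO_H
#include <stdio.h>
#endif

/* Both packages' settings remain visible side by side. */
const char *consumer_package(void)  { return PACKAGE_NAME; }
const char *launchmon_package(void) { return LAUNCHMON_PACKAGE_NAME; }
```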
From: Bronis R. de S. <br...@ll...> - 2009-03-19 10:27:29
Mark:

This sounds like a good idea. Making things less complex for the user is generally desirable.

Bronis

On Thu, 19 Mar 2009, Mark O'Connor wrote:
> Hi all,
>
> In a recent conference call between Dong Ahn, David and myself were
> discussing ways to enhance LaunchMON's abstraction of the "launch a
> job and co-locate these daemons" task. [...]
From: Mark O'C. <ma...@al...> - 2009-03-19 08:45:00
Hi all,

In a recent conference call, Dong Ahn, David and I discussed ways to enhance LaunchMON's abstraction of the "launch a job and co-locate these daemons" task. Ideally we'd like to see LaunchMON provide an MPI-independent interface to the API user, at least for simple/common cases. I've summarized our thoughts here. Dong suggested bouncing this to the general distribution list because he thought it's worth thinking through carefully.

From a tool writer's perspective, LaunchMON would ideally abstract away enough of the resource manager (RM) to enable starting and attaching to jobs in the majority of cases. Currently, it abstracts the RM details away when attaching daemons to a running process: only the hostname and pid of the RM process are necessary. However, when starting an MPI job with co-located daemons, the path to the RM and a list of its arguments must be provided. This passes the RM dependency back to the tool. The tool then needs to discover the MPIs installed on the system, select the correct one, and understand whether it uses -np, -n or -proc to specify the number of processes, and so on.

Of course, RMs typically provide a vast array of options, many of which are unique to a particular RM. We don't expect LaunchMON to abstract away all differences between RMs, only the features they have in common and that are used most often. In this case, the path of the target program, its arguments and the number of processes might be enough.

To do this, LaunchMON could expose an extra, higher-level way to start a job. It would require only the path to the target executable, its arguments, the number of processes to launch, and an additional array of strings to pass to the RM as arguments. This is similar to the level of abstraction provided to the user in DDT's GUI: the RM is detected at install time, and after that most users simply choose the number of processes and their program. If they wish to use specific RM features not abstracted by the GUI, they have the option of passing extra arguments directly to the RM.

Note: I think it would be acceptable if LaunchMON did not attempt to abstract out the handling of queuing systems, since this can be very complex. Instead, the caller must ensure that, if necessary, it is executed by the batch system with the appropriate configuration options set.

As mentioned, DDT already contains the above abstractions for a long list of RMs. This experience might be something we could contribute to the LaunchMON project. From a purely technical perspective, repackaging these abstractions into a reusable bundle would not require an undue amount of effort.

Mark
Allinea Software
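The proposed higher-level launch call can be sketched as follows. Given an RM description detected once and an RM-independent job description, it builds the RM's command line. This is a hypothetical illustration of the idea, not an existing LaunchMON API. The types `rm_desc_t` and `job_spec_t`, the function `build_rm_argv`, and the argument ordering (count flag, then pass-through RM arguments, then target and its arguments) are all assumptions.

```c
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical per-RM description, detected once (e.g., at install time). */
typedef struct {
    const char *launcher;     /* e.g., "srun" or "mpirun" */
    const char *nprocs_flag;  /* flag this RM uses for the process count */
} rm_desc_t;

/* Hypothetical RM-independent job description, per the proposal above. */
typedef struct {
    const char  *exe;            /* path to the target executable */
    const char **exe_args;       /* NULL-terminated target arguments */
    int          nprocs;         /* number of processes to launch */
    const char **extra_rm_args;  /* NULL-terminated pass-through RM args */
} job_spec_t;

static size_t count_args(const char **v)
{
    size_t n = 0;
    while (v && v[n])
        n++;
    return n;
}

/* Build a NULL-terminated argv for the RM launcher.
 * The caller frees argv[2] (the formatted count) and then argv itself. */
char **build_rm_argv(const rm_desc_t *rm, const job_spec_t *job)
{
    size_t nx = count_args(job->extra_rm_args);
    size_t na = count_args(job->exe_args);
    /* launcher, flag, count, exe, NULL terminator: 5 slots plus args */
    char **argv = malloc((5 + nx + na) * sizeof *argv);
    char *count = malloc(16);
    if (!argv || !count) {
        free(argv);
        free(count);
        return NULL;
    }
    snprintf(count, 16, "%d", job->nprocs);

    size_t i = 0;
    argv[i++] = (char *)rm->launcher;
    argv[i++] = (char *)rm->nprocs_flag;
    argv[i++] = count;
    for (size_t k = 0; k < nx; k++)
        argv[i++] = (char *)job->extra_rm_args[k];
    argv[i++] = (char *)job->exe;
    for (size_t k = 0; k < na; k++)
        argv[i++] = (char *)job->exe_args[k];
    argv[i] = NULL;
    return argv;
}
```

A tool could hand the resulting array to `execvp()`, or to whichever launch entry point accepts an RM path and argument list. The per-RM table (srun takes `-n`; many mpirun implementations accept `-np`) is where DDT-style detection knowledge would live. The pass-through array preserves access to RM-specific features.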
From: Dong H. A. <ah...@ll...> - 2009-03-17 16:36:52
All,

I committed all the modifications I've made for the BGP functional port into the 0.7 branch, which I validated on Dawn up to 36K CNs with 147,456 MPI tasks. During testing, I found a performance/scalability problem with the BG RM. I am trying to leverage SWL testing to address it; we will see. (The IBM PMR number and description are below.) It may also be interesting to see RM performance signatures toward extreme scales on other high-end systems, such as Cray systems...

Dong

-------

Hi Dong,

FYI, this BG/P scaling defect is being tracked via PMR 64040, 49R.

Best regards
Paul

-----Original Message-----
From: Adam Bertsch [mailto:ad...@ll...]
Sent: Monday, March 16, 2009 15:53
To: Paul Szepietowski
Subject: another dawn PMR

Dong Ahn reports:

All, I've just completed LaunchMON testing on Dawn, and functionality validation succeeded all the way to 36K CNs (fully loaded). This exercise validates SOW items 5.7.1.5, 5.7.1.6, and a portion of 5.7.1.4. With the provided efix, I am happy with the correctness of the system components, but not with their performance, in particular toward extreme scale (high performance variations observed at lower scales aside). My reading of the logs suggests that the bottleneck for daemon launching is in the BGP control system: it collects, generates and distributes a large proctable among its distributed components. The following table lists the overheads of LaunchMON's launchAndSpawnDaemons service, along with the control system's proctable handling overheads.

Compute nodes  Daemons  MPI tasks  launchAndSpawnDaemons  Proctable handling (share)
16384          128      65536      201 s                  156 s (78%)
20480          160      81920      215 s                  200 s (93%)
32768          256      131072     515 s                  483 s (94%)
36864          288      147456     653 s                  631 s (97%)

If nothing else, this performance signature doesn't comply with the scalability requirements in 5.7.1.5: the proctable handling time looks almost polynomial to me.

-------- 5.7.1.5 --------
For example, daemon launch time shall vary by no more than the log of the daemon count. Similarly, job launch time under the control of a CDT shall vary by no more than the log of the MPI task count.
--------
<CUT>
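The "almost polynomial" reading can be checked against the numbers in the table. As an illustrative assumption (not part of the original analysis), suppose the proctable handling time follows a power law $t \propto n^k$ between the smallest and largest runs, and solve for $k$:

$$
k = \frac{\ln(t_2/t_1)}{\ln(n_2/n_1)} = \frac{\ln(631/156)}{\ln(147456/65536)} = \frac{\ln 4.045}{\ln 2.25} \approx \frac{1.397}{0.811} \approx 1.72
$$

Now read 5.7.1.5 as "launch time proportional to the log of the daemon count". Under that reading, the service time could grow by at most the following factor between 128 and 288 daemons, compared with the observed growth:

$$
\frac{\ln 288}{\ln 128} \approx \frac{5.663}{4.852} \approx 1.17
\qquad\text{versus observed}\qquad
\frac{653}{201} \approx 3.25
$$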
From: Dong H. A. <ah...@ll...> - 2009-03-12 20:13:54
Testing...