## [MonAMI-users] Torque efficiency?

 [MonAMI-users] Torque efficiency? From: Steve Young - 2008-07-01 19:18:09
```
Hi,

I was just wondering how does the efficiency of jobs get figured out
from the torque output of "Running Jobs by Efficiency" as seen on
this page:

http://chem.hamilton.edu/modules/myiframe/index.php?iframeid=2

Just thought I would start here... I'm guessing this might be
something for the torque list.

-Steve
```

 Re: [MonAMI-users] Torque efficiency? From: Paul Millar - 2008-07-02 23:16:49
```
Hi Steve,

On Tuesday 01 July 2008 21:17:53 Steve Young wrote:
> I was just wondering how does the efficiency of jobs get figured out
> from the torque output of "Running Jobs by Efficiency" as seen on
> this page:
>
> http://chem.hamilton.edu/modules/myiframe/index.php?iframeid=2

Yes, it's not immediately obvious, so I've added a section on this
within the User Guide, ready for the next release.

"Efficiency" is something the torque plugin calculates, rather than
something the Torque server reports. It's a simple calculation: just
divide a job's CPU time by its wall-clock time (both metrics are
reported by the Torque server). This calculates (something like) the
average efficiency of the job over its execution time so far. The
number is then bracketed into five "efficiency bins": less than 20%,
20%--40%, 40%--60%, 60%--80% and greater than 80%. What is plotted is
the number of running jobs in each bin.

There's a couple of problems with this: one trivial, the other
difficult.

The first (easy) problem is that the above calculation doesn't take
into account how many nodes a job is running on. The solution is to
count the number of nodes a job is using and divide the efficiency by
that number. I've fixed the code in CVS so the next version should
give more correct values.

The second (hard) problem is that, if a job "busy-waits" for something
(most likely network traffic) or is caught in a tight loop (i.e., a
bug) then the process will consume lots of CPU, but not make any
progress. So, although it appears to be 100% efficient, it might be
making no progress towards completing. As it happens, MPI libraries
tend to use busy-waits when waiting for network traffic, as this has
lower overheads (provided there's no contention for CPU usage).

So, the efficiency measurements should be taken with a pinch of salt.
If the measured efficiency is low, then the job really is poorly
utilising the CPU and making slow progress towards completing its
goal. If the measured efficiency is high then, unless you know the
code isn't using busy-waits, you can't be sure of its efficiency.

If the code does use busy-waits (like MPI jobs) then you need some
other means of estimating efficiency.

> Just thought I would start here... I'm guessing this might be
> something for the torque list.

I've asked them ... which is where I got my information from ;-)

Cheers,

Paul.
```
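The calculation Paul describes (CPU time divided by wall-clock time, divided again by the node count, then sorted into five bins) can be sketched in Python as below. This is only an illustration under stated assumptions, not the MonAMI plugin's actual code: the function names are hypothetical, the times are assumed to arrive as `HH:MM:SS` strings, and the choice to put a value that falls exactly on a bin boundary into the higher bin is a guess, since the message does not say how boundaries are handled.

```python
from collections import Counter

# The five efficiency bins named in the message.
BIN_LABELS = ["<20%", "20%-40%", "40%-60%", "60%-80%", ">80%"]


def hms_to_seconds(value):
    """Convert an HH:MM:SS string (assumed input format) to seconds."""
    hours, minutes, seconds = (int(part) for part in value.split(":"))
    return hours * 3600 + minutes * 60 + seconds


def job_efficiency(cpu_seconds, wall_seconds, nodes=1):
    """CPU time / wall-clock time, divided by the number of nodes.

    Returns None when the ratio is undefined (job has no elapsed time yet).
    """
    if wall_seconds <= 0 or nodes <= 0:
        return None
    return cpu_seconds / (wall_seconds * nodes)


def efficiency_bin(efficiency):
    """Map an efficiency ratio onto one of the five bin labels.

    Assumption: boundary values go to the higher bin, and ratios outside
    0..1 are clamped to the first or last bin.
    """
    index = min(max(int(efficiency * 5), 0), 4)
    return BIN_LABELS[index]


def count_running_jobs_by_bin(jobs):
    """Count jobs per bin; each job is (cpu_time, wall_time, nodes)."""
    counts = Counter()
    for cpu_time, wall_time, nodes in jobs:
        efficiency = job_efficiency(
            hms_to_seconds(cpu_time), hms_to_seconds(wall_time), nodes
        )
        if efficiency is not None:
            counts[efficiency_bin(efficiency)] += 1
    return counts


# Made-up example jobs, purely for illustration.
example_jobs = [
    ("00:30:00", "01:00:00", 1),  # 50% on one node
    ("03:36:00", "01:00:00", 4),  # 90% averaged over four nodes
    ("00:05:00", "01:00:00", 1),  # about 8%
]
example_counts = count_running_jobs_by_bin(example_jobs)
```

Without the division by `nodes`, the second example job would come out at 360%, which is the first problem described above. The second problem still applies to this sketch: a job that busy-waits lands in the top bin regardless of whether it is making progress.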
 Re: [MonAMI-users] Torque efficiency? From: Steve Young - 2008-07-03 00:57:06
```
Thanks Paul =). Exactly what I was looking for.

-Steve

On Jul 2, 2008, at 7:17 PM, Paul Millar wrote:

> Hi Steve,
>
> On Tuesday 01 July 2008 21:17:53 Steve Young wrote:
>> I was just wondering how does the efficiency of jobs get figured out
>> from the torque output of "Running Jobs by Efficiency" as seen on
>> this page:
>>
>> http://chem.hamilton.edu/modules/myiframe/index.php?iframeid=2
>
> Yes, it's not immediately obvious, so I've added a section on this
> within the User Guide, ready for the next release.
>
> "Efficiency" is something the torque plugin calculates, rather than
> something the Torque server reports. It's a simple calculation: just
> divide a job's CPU time by its wall-clock time (both metrics are
> reported by the Torque server). This calculates (something like) the
> average efficiency of the job over its execution time so far. The
> number is then bracketed into five "efficiency bins": less than 20%,
> 20%--40%, 40%--60%, 60%--80% and greater than 80%. What is plotted is
> the number of running jobs in each bin.
>
> There's a couple of problems with this: one trivial, the other
> difficult.
>
> The first (easy) problem is that the above calculation doesn't take
> into account how many nodes a job is running on. The solution is to
> count the number of nodes a job is using and divide the efficiency by
> that number. I've fixed the code in CVS so the next version should
> give more correct values.
>
> The second (hard) problem is that, if a job "busy-waits" for something
> (most likely network traffic) or is caught in a tight loop (i.e., a
> bug) then the process will consume lots of CPU, but not make any
> progress. So, although it appears to be 100% efficient, it might be
> making no progress towards completing. As it happens, MPI libraries
> tend to use busy-waits when waiting for network traffic, as this has
> lower overheads (provided there's no contention for CPU usage).
>
> So, the efficiency measurements should be taken with a pinch of salt.
> If the measured efficiency is low, then the job really is poorly
> utilising the CPU and making slow progress towards completing its
> goal. If the measured efficiency is high then, unless you know the
> code isn't using busy-waits, you can't be sure of its efficiency.
>
> If the code does use busy-waits (like MPI jobs) then you need some
> other means of estimating efficiency.
>
>> Just thought I would start here... I'm guessing this might be
>> something for the torque list.
>
> I've asked them ... which is where I got my information from ;-)
>
> Cheers,
>
> Paul.
```