From: Jie J. <jj...@nu...> - 2009-11-16 14:03:40
Hi Rick,

When processing the collected data with "psprocess", it always shows the "Wall clock time" result. I have two questions about the "Wall clock time".

First, it is much larger than the run time of the target program:

[root@node2 bin]# time psrun -c test_config1.xml ./cg.A
libpsrun.c:181 : SIGPROF ignored on startup. Handler=0x1, flags=14000000
PerfSuite debugging enabled (debug level: PS_DEBUG_OFF)
[PID 5562] Library version: threaded
[PID 5562] Environment (entry of psrun_init)
[PID 5562] PSRUN_DOFORK = (null)
[PID 5562] LD_PRELOAD = libpsrun.so.0
[PID 5562] PSRUN_PID = 5562
[PID 5562] PS_HWPC_FILE = cg.A

 NAS Parallel Benchmarks (NPB3.2-SER) - CG Benchmark

 Size:        14000
 Iterations:     15

 Initialization time = 0.656 seconds

   iteration   ||r||                 zeta
        1      0.25789587124191E-12  19.9997581277040
        2      0.25434985977194E-14  17.1140495745506
        3      0.25346577542259E-14  17.1296668946143
        4      0.25342984287709E-14  17.1302113581192
        5      0.25247550490803E-14  17.1302338856353
        6      0.25375789728060E-14  17.1302349879482
        7      0.25309911213776E-14  17.1302350498916
        8      0.24971158788969E-14  17.1302350537510
        9      0.24662516791025E-14  17.1302350540101
       10      0.25086578290790E-14  17.1302350540284
       11      0.24878397192172E-14  17.1302350540298
       12      0.24359141964394E-14  17.1302350540299
       13      0.24247346800617E-14  17.1302350540299
       14      0.24157219672237E-14  17.1302350540299
       15      0.24243304908282E-14  17.1302350540299
 Benchmark completed
 VERIFICATION SUCCESSFUL
 Zeta is   0.171302350540E+02
 Error is  0.526781606656E-13

 CG Benchmark Completed.
 Class           = A
 Size            = 14000
 Iterations      = 15
 Time in seconds = 2.06
 Mop/s total     = 724.79
 Operation type  = floating point
 Verification    = SUCCESSFUL
 Version         = 3.2.1
 Compile date    = 09 Nov 2009

 Compile options:
    F77          = ifort
    FLINK        = $(F77)
    F_LIB        = (none)
    F_INC        = (none)
    FFLAGS       = -O -g
    FLINKFLAGS   = -O
    RAND         = randi8

 Please send all errors/feedbacks to:

 NPB Development Team
 np...@na...
real    0m2.756s
user    0m2.711s
sys     0m0.022s

[root@node2 bin]# psprocess -m test_metric.xml cg.A.5562.node2.xml
PerfSuite Hardware Performance Summary Report

Version    : 1.0
Created    : Mon Nov 16 20:46:23 CST 2009
Generator  : psprocess 0.5
XML Source : cg.A.5562.node2.xml

Execution Information
============================================================================================
Collector  : libpshwpc
Date       : Mon Nov 16 20:45:34 2009
Host       : node2
Process ID : 5562
Thread     : 0
User       : root
Command    : cg.A

Processor and System Information
============================================================================================
Node CPUs     : 8
Vendor        : Intel
Family        : Pentium Pro (P6)
Brand         : Intel(R) Xeon(R) CPU E5540 @ 2.53GHz
CPU Revision  : 5
Clock (MHz)   : 1600.000
Memory (MB)   : 16078.69
Pagesize (KB) : 4

Cache Information
============================================================================================
Cache levels : 3

--------------------------------
Level 1
Type         : instruction
Size (KB)    : 32
Linesize (B) : 64
Assoc        : 4

Type         : data
Size (KB)    : 32
Linesize (B) : 64
Assoc        : 8

--------------------------------
Level 2
Type         : unified
Size (KB)    : 256
Linesize (B) : 64
Assoc        : 8

--------------------------------
Level 3
Type         : unified
Size (KB)    : 8192
Linesize (B) : 64
Assoc        : 16

Index  Description                                                        Counter Value
============================================================================================
    1  MEM_LOAD_RETIRED:LLC_UNSHARED_HIT (description not available)....      338818848
    2  MEM_LOAD_RETIRED:LLC_MISS (description not available)............        3219718
    3  UNHALTED_CORE_CYCLES (description not available).................     7312056865

Event Index
============================================================================================
1: MEM_LOAD_RETIRED:LLC_UNSHARED_HIT
2: MEM_LOAD_RETIRED:LLC_MISS
3: UNHALTED_CORE_CYCLES

Statistics
============================================================================================
Counting domain........................................................ user
Multiplexed............................................................ no
Wall clock time (seconds).............................................. 4.310

----------------------------------------------

Here we can see that the "Wall clock time" reported by psprocess (4.31 s) is considerably larger than the run time of cg.A, both as reported by cg.A itself (2.06 s) and as measured by the time command (about 2.7 s). Where does the rest of the time go? What causes this overhead? And what exactly does "Wall clock time" mean here?

Second, the XML file written by psrun contains the CPU time:

<cputime units="seconds">
  <usertime>2.002680</usertime>
  <systemtime>0.000010</systemtime>
</cputime>

This is quite close to the actual run time of cg.A. Why does psprocess not show these values? Will you add this feature in the upcoming ps-1.0?

Regards,
Jie
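P.S. Until psprocess reports it, the CPU time can be read straight from the psrun XML file. Below is a minimal Python sketch (standard library only) that does this. It assumes only the un-namespaced <cputime>, <usertime> and <systemtime> elements shown above; the script name cputime.py and the function read_cputime are hypothetical and not part of PerfSuite.

```python
import sys
import xml.etree.ElementTree as ET


def read_cputime(root):
    """Return (user, system) CPU seconds from the first <cputime> element.

    Assumes the un-namespaced layout shown in the psrun output above:
    <cputime units="seconds"><usertime>...</usertime><systemtime>...</systemtime></cputime>
    """
    # iter() walks the whole tree, so the exact nesting of <cputime> does not matter
    cpu = next(root.iter("cputime"), None)
    if cpu is None:
        raise ValueError("no <cputime> element found")
    # A missing child element is treated as 0 seconds
    user = float(cpu.findtext("usertime", default="0"))
    system = float(cpu.findtext("systemtime", default="0"))
    return user, system


if __name__ == "__main__" and len(sys.argv) > 1:
    # Hypothetical usage: python cputime.py cg.A.5562.node2.xml
    user, system = read_cputime(ET.parse(sys.argv[1]).getroot())
    print(f"user   {user:.3f} s")
    print(f"system {system:.3f} s")
    print(f"total  {user + system:.3f} s")
```

For this run it would be invoked as "python cputime.py cg.A.5562.node2.xml"; placing its total next to the 4.310 s wall clock figure gives a rough idea of how much of the reported wall time was not spent on the CPU by the profiled process.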