From: Maynard J. <may...@us...> - 2011-11-03 15:28:33
On 11/03/2011 6:56 AM, Andreas Krebbel wrote:
> This patch makes use of the event mechanism to allow for dynamic
> enabling and disabling of the System z hardware sampling facility with
> the OProfile user space tools.

Andreas,
Thanks for the patch. Other than a few minor things (which I can fix when I commit), I have just one question below concerning S390_HW_SAMPLER_BUFSIZE.

-Maynard

>
> A single virtual counter is created which can be used to
> enable/disable hardware sampling dynamically from user space. The
> counter can be used with the two events 0 and 1. Using event 0
> enables use of timer based sampling while event 1 turns on hardware
> sampling. These values have to stay like this since the
> /dev/oprofile/0/event always has to match
> /dev/oprofile/hwsampling/hwsampler content in order to support the
> existing interface in parallel.
>
> This adds the new opcontrol option --s390hwsampbufsize. This option
> specifies the number of 2MB memory areas allocated per CPU by the
> kernel module.
>
> Signed-off-by: Andreas Krebbel <kr...@li...>
> ---
>  doc/opcontrol.1.in | 10 ++++
>  doc/oprofile.xml | 58 +++++++++++++++++++++++
>  events/Makefile.am | 4 +-
>  events/s390x/basic_mode_sampling_v1/events | 8 +++
>  events/s390x/basic_mode_sampling_v1/unit_masks | 8 +++
>  libop/op_cpu_type.c | 1 +
>  libop/op_cpu_type.h | 1 +
>  libop/op_events.c | 3 +
>  libpp/op_header.cpp | 10 ++--
>  utils/opcontrol | 59 ++++++++++++++++++++++++
>  utils/ophelp.c | 6 ++
>  11 files changed, 162 insertions(+), 6 deletions(-)
>  create mode 100644 events/s390x/basic_mode_sampling_v1/events
>  create mode 100644 events/s390x/basic_mode_sampling_v1/unit_masks
>
> diff --git a/doc/opcontrol.1.in b/doc/opcontrol.1.in
> index e8e0bff..53afb4e 100644
> --- a/doc/opcontrol.1.in
> +++ b/doc/opcontrol.1.in
> @@ -164,6 +164,16 @@ that can coordinate a multi-domain profiling session. Including domain 0 in
>  the list of active domains is optional. (e.g. --active-domains=2,5,6 and
>  --active-domains=0,2,5,6 are equivalent)
>  .br
> +.SH OPTIONS (specific to System z)
> +.TP
> +.BI "--s390hwsampbufsize="<buffers>

    ^-- similar options use 'num', without angle-brackets.

> +Number of 2MB areas used per CPU for storing sample data. The best
> +size for the sample memory depends on the particular system and the
> +workload to be measured. Providing the sampler with too little memory
> +results in lost samples. Reserving too much system memory for the
> +sampler impacts the overall performance and, hence, also the workload
> +to be measured.
> +.br
>
>  .SH ENVIRONMENT
>  No special environment variables are recognised by opcontrol.
> diff --git a/doc/oprofile.xml b/doc/oprofile.xml
> index d6581a8..1eee88d 100644
> --- a/doc/oprofile.xml
> +++ b/doc/oprofile.xml
> @@ -1223,6 +1223,64 @@ Note: * All IBS fetch event must have the same event count and unitmask,
>
>  </sect2>
>
> +<sect2 id="systemz">
> +<title>IBM System z hardware sampling support</title>
> +<para>
> +IBM System z provides a facility which does instruction sampling as
> +part of the CPU. This has great advantages over the timer based
> +sampling approach like better sampling resolution with less overhead
> +and the possibility to get samples within code sections where
> +interrupts are disabled (useful especially for Linux kernel code).
> +</para>
> +<para>
> +A public description of the System z CPU-Measurement Facilities can be
> +found here:
> +<ulink url="http://www-01.ibm.com/support/docview.wss?uid=isg26fcd1cc32246f4c8852574ce0044734a">The Load-Program-Parameter and CPU-Measurement Facilities</ulink>
> +</para>
> +<para>
> +System z hardware sampling can be used for Linux instances in LPAR
> +mode. The hardware sampling support used by OProfile was introduced
> +for System z10 in October 2008.
> +</para>
> +<para>
> +To enable hardware sampling for an LPAR you must activate the LPAR
> +with authorization for basic sampling control. See the "Support
> +Element Operations Guide" for your mainframe system for more
> +information.
> +</para>
> +<para>
> +The hardware sampling facility can be enabled and disabled using the
> +event interface. A `virtual' counter 0 has been defined that supports
> +two events, TIMER and HWSAMPLING. By default the HWSAMPLING event is
> +used on machines providing the facility. For both events only the
> +`count', `kernel' and `user' options are evaluated by the kernel
> +module.
> +</para>
> +<para>
> +The `count' value is the sampling rate as it is passed to the CPU
> +measurement facility. A sample will be taken by the hardware every
> +`count' cycles. Using low values here will quickly fill up the
> +sampling buffers and will generate CPU load on the OProfile daemon and
> +the kernel module being busy flushing the hardware buffers. This
> +might considerably impact the workload to be profiled.

A closing "</para>" was needed here for the docs build to complete.

> +<para>
> +The unit mask `um' is required to be zero.
> +</para>
> +<para>
> +The opcontrol tool provides a new option specific to System z
> +hardware sampling:
> +</para>
> +
> +<itemizedlist>
> +<listitem>--s390hwsampbufsize="buffers"

    ^-- as I said above, let's use "num" instead.

> +Number of 2MB areas used per CPU for storing sample data. The best

The two lines above are rendered as one line in the HTML. We can just add a colon (:) to separate the option name from the long

> +size for the sample memory depends on the particular system and the
> +workload to be measured. Providing the sampler with too little memory
> +results in lost samples. Reserving too much system memory for the
> +sampler impacts the overall performance and, hence, also the workload
> +to be measured.</listitem>
> +</itemizedlist>
> +</sect2>
>
>  <sect2 id="misuse">
>  <title>Dangerous counter settings</title>
> diff --git a/events/Makefile.am b/events/Makefile.am
> index 0512afc..bb7729f 100644
> --- a/events/Makefile.am
> +++ b/events/Makefile.am
> @@ -71,8 +71,8 @@ event_files = \
>  	ppc/e300/events ppc/e300/unit_masks \
>  	tile/tile64/events tile/tile64/unit_masks \
>  	tile/tilepro/events tile/tilepro/unit_masks \
> -	tile/tilegx/events tile/tilegx/unit_masks
> -
> +	tile/tilegx/events tile/tilegx/unit_masks \
> +	s390x/basic_mode_sampling_v1/events s390x/basic_mode_sampling_v1/unit_masks
>
>  install-data-local:
>  	for i in ${event_files} ; do \
> diff --git a/events/s390x/basic_mode_sampling_v1/events b/events/s390x/basic_mode_sampling_v1/events
> new file mode 100644
> index 0000000..b3a908e
> --- /dev/null
> +++ b/events/s390x/basic_mode_sampling_v1/events
> @@ -0,0 +1,8 @@
> +# Copyright OProfile authors
> +# Copyright (c) International Business Machines, 2011.
> +# Contributed by Andreas Krebbel <kr...@li...>.
> +#
> +# S/390 Basic Mode Sampling events
> +#
> +event:0x00 counters:0 um:zero minimum:100 name:TIMER : Sampling using timer interrupt
> +event:0x01 counters:0 um:zero minimum:2202 name:HWSAMPLING : Sampling using Basic Mode Hardware Sampling
> diff --git a/events/s390x/basic_mode_sampling_v1/unit_masks b/events/s390x/basic_mode_sampling_v1/unit_masks
> new file mode 100644
> index 0000000..9e634f4
> --- /dev/null
> +++ b/events/s390x/basic_mode_sampling_v1/unit_masks
> @@ -0,0 +1,8 @@
> +# Copyright OProfile authors#
> +# Copyright (c) International Business Machines, 2011.
> +# Contributed by Andreas Krebbel <kr...@li...>.
> +#
> +# S/390 Basic Mode Hardware Sampling unit masks
> +#
> +name:zero type:mandatory default:0x0
> +	0x0 No unit mask
> diff --git a/libop/op_cpu_type.c b/libop/op_cpu_type.c
> index 6aa604f..c1bc0f8 100644
> --- a/libop/op_cpu_type.c
> +++ b/libop/op_cpu_type.c
> @@ -97,6 +97,7 @@ static struct cpu_descr const cpu_descrs[MAX_CPU_TYPE] = {
>  	{ "TILE64", "tile/tile64", CPU_TILE_TILE64, 2 },
>  	{ "TILEPro", "tile/tilepro", CPU_TILE_TILEPRO, 4 },
>  	{ "TILE-GX", "tile/tilegx", CPU_TILE_TILEGX, 4 },
> +	{ "IBM System z with basic mode hardware sampling support", "s390x/basic_mode_sampling_v1", CPU_S390_HWSAMPV1, 1 },
>  };
>
>  static size_t const nr_cpu_descrs = sizeof(cpu_descrs) / sizeof(struct cpu_descr);
> diff --git a/libop/op_cpu_type.h b/libop/op_cpu_type.h
> index 32454bb..91c457f 100644
> --- a/libop/op_cpu_type.h
> +++ b/libop/op_cpu_type.h
> @@ -94,6 +94,7 @@ typedef enum {
>  	CPU_TILE_TILE64, /**< Tilera TILE64 family */
>  	CPU_TILE_TILEPRO, /**< Tilera TILEPro family (Pro64 or Pro36) */
>  	CPU_TILE_TILEGX, /**< Tilera TILE-GX family */
> +	CPU_S390_HWSAMPV1, /* IBM System z with Basic Mode Sampling support */
>  	MAX_CPU_TYPE
>  } op_cpu;
>
> diff --git a/libop/op_events.c b/libop/op_events.c
> index c00fd79..49faf5d 100644
> --- a/libop/op_events.c
> +++ b/libop/op_events.c
> @@ -1125,6 +1125,9 @@ void op_default_event(op_cpu cpu_type, struct op_default_event_descr * descr)
>  		case CPU_PPC_E300:
>  			descr->name = "CPU_CLK";
>  			break;
> +		case CPU_S390_HWSAMPV1:
> +			descr->name = "HWSAMPLING";
> +			break;
>
>  		case CPU_TILE_TILE64:
>  		case CPU_TILE_TILEPRO:
> diff --git a/libpp/op_header.cpp b/libpp/op_header.cpp
> index 754015a..9b8c594 100644
> --- a/libpp/op_header.cpp
> +++ b/libpp/op_header.cpp
> @@ -254,11 +254,13 @@ string const describe_cpu(opd_header const& header)
>  		str = xml_utils::get_profile_header(cpu_name, header.cpu_speed);
>  	} else {
>  		str += string("CPU: ") + op_get_cpu_type_str(cpu);
> -		str += ", speed ";
> +		if (header.cpu_speed > 0) {
> +			ostringstream ss;
>
> -		ostringstream ss;
> -		ss << header.cpu_speed;
> -		str += ss.str() + " MHz (estimated)";
> +			str += ", speed ";
> +			ss << header.cpu_speed;
> +			str += ss.str() + " MHz (estimated)";
> +		}
>  	}
>  	return str;
>  }
> diff --git a/utils/opcontrol b/utils/opcontrol
> index 0951574..1441006 100644
> --- a/utils/opcontrol
> +++ b/utils/opcontrol
> @@ -239,6 +239,10 @@ opcontrol: usage:
>     --xen                         Xen image (for Xen only)
>     --active-domains=<list>       List of domains in profiling session (for Xen)
>                                   (list contains domain ids separated by commas)
> +
> +   System z specific options
> +
> +   --s390hwsampbufsize=buffers   Number of 2MB areas used per CPU for storing sample data.

    ^-- again, 'num' versus 'buffers'

>  EOF
>  }
>
> @@ -381,6 +385,10 @@ do_init()
>  	IBS_OP_COUNT=0
>  	IBS_OP_UNITMASK=0
>
> +	# System z specific values
> +	S390_HW_SAMPLER=0
> +	S390_HW_SAMPLER_BUFSIZE=0
> +
>  	OPROFILED="$OPDIR/oprofiled"
>
>  	# location for daemon setup information
> @@ -405,6 +413,9 @@ do_init()
>  		IS_TIMER=1
>  	else
>  		case "$CPUTYPE" in
> +			s390x/basic_mode_sampling_v1)
> +				S390_HW_SAMPLER=1
> +				;;
>  			ia64/*)
>  				IS_PERFMON=$KERNEL_SUPPORT
>  				;;
> @@ -496,6 +507,9 @@ do_save_setup()
>  	if test "$XEN_RANGE"; then
>  		echo "XEN_RANGE=$XEN_RANGE" >> $SETUP_FILE
>  	fi
> +	if test "$S390_HW_SAMPLER" = "1" -a "$S390_HWSAMPLER_BUFSIZE" != "0"; then
> +		echo "S390_HW_SAMPLER_BUFSIZE=$S390_HW_SAMPLER_BUFSIZE" >> $SETUP_FILE
> +	fi
>  	SETUP_FILE="$SAVE_SETUP_FILE"
>  }
>
> @@ -971,6 +985,13 @@ do_options()
>  				exec $OPHELP
>  				;;
>
> +			--s390hwsampbufsize)
> +				error_if_not_number "$arg" "$val"
> +				S390_HW_SAMPLER_BUFSIZE=$val
> +				DO_SETUP=yes
> +				;;
> +
> +
>  			*)
>  				echo "Unknown option \"$arg\". See opcontrol --help" >&2
>  				exit 1
> @@ -1354,6 +1375,32 @@ check_event_mapping_data()
>  			fi
>  		fi
>  	fi
> +	if test "$S390_HW_SAMPLER" = "1" -a "$EVENT" = "HWSAMPLING"; then
> +		if test "$CALLGRAPH" != "0"; then
> +			echo "Callgraph sample collection is not supported with " >&2
> +			echo "System z hardware sampling. Please use --callgraph=0 " >&2
> +			echo "or enable timer based sampling." >&2
> +			exit 1
> +		fi
> +		if test -r $MOUNT/hwsampling/hw_min_interval; then
> +			min_interval=`cat $MOUNT/hwsampling/hw_min_interval`
> +		else
> +			echo "$MOUNT/hwsampling/hw_min_interval could not be found"
> +			exit 1
> +		fi
> +		if test -r $MOUNT/hwsampling/hw_max_interval; then
> +			max_interval=`cat $MOUNT/hwsampling/hw_max_interval`
> +		else
> +			echo "$MOUNT/hwsampling/hw_max_interval could not be found"
> +			exit 1
> +		fi
> +		if test "$COUNT" -lt "$min_interval" -o "$COUNT" -gt "$max_interval"; then
> +			echo "Invalid value for hardware sampling rate" >&2

    ^-- insert "$COUNT" ^-- "."

> +			echo "should be between $min_interval and $max_interval" >&2

    ^-- s/should/Value must ^-- "."

> +			exit 1
> +		fi
> +	fi
> +
>  	len=`echo -n $event_num | wc -c`
>  	num_chars_in_grpid=`expr $len - 2`
>  	GRP_NUM_VAL=`echo | awk '{print substr("'"${event_num}"'",1,"'"${num_chars_in_grpid}"'")}'`
> @@ -1418,6 +1465,10 @@ do_param_setup()
>  		return
>  	fi
>
> +	if test "$S390_HW_SAMPLER" = "1" -a "$S390_HW_SAMPLER_BUFSIZE" != "0"; then
> +		echo $S390_HW_SAMPLER_BUFSIZE >$MOUNT/hwsampling/hw_sdbt_blocks
> +	fi
> +
>  	# use the default setup if none set
>  	if test "$NR_CHOSEN" = 0; then
>  		set_event 0 $DEFAULT_EVENT
> @@ -1701,6 +1752,14 @@ do_status()
>  			echo "CPU buffer size: $CPU_BUF_SIZE"
>  		fi
>  	fi
> +	if test "$S390_HW_SAMPLER" = "1"; then
> +		echo -n "System z hardware sampling buffer size (in 2MB areas): "

Using two echos to get the description and value displayed will end up printing two lines. We don't want that.

> +		if test "$S390_HW_SAMPLER_BUFSIZE" = "0"; then
> +			cat $MOUNT/hwsampling/hw_sdbt_blocks

I don't think this if/else is appropriate. The S390_HW_SAMPLER_BUFSIZE is cached in daemonrc, and at this point in opcontrol, we will have updated the S390_HW_SAMPLER_BUFSIZE variable with the cached value. So just always use S390_HW_SAMPLER_BUFSIZE. That's what we do for all other profiling values. Or am I missing something?

> +		else
> +			echo "$S390_HW_SAMPLER_BUFSIZE"
> +		fi
> +	fi
>
>  	exit 0
>  }
> diff --git a/utils/ophelp.c b/utils/ophelp.c
> index bffdfce..b80f3b1 100644
> --- a/utils/ophelp.c
> +++ b/utils/ophelp.c
> @@ -743,6 +743,12 @@ int main(int argc, char const * argv[])
>  			"http://www.tilera.com for more information.\n";
>  		break;
>
> +	case CPU_S390_HWSAMPV1:
> +		event_doc = "IBM System z Basic Mode Sampling\n"
> +			"http://www-01.ibm.com/support/docview.wss"
> +			"?uid=isg26fcd1cc32246f4c8852574ce0044734a\n";
> +		break;
> +
>  	case CPU_RTC:
>  		break;
>
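
For illustration only, here is a minimal shell sketch of how the two opcontrol review comments above (one-line status output that always uses the cached S390_HW_SAMPLER_BUFSIZE, and the reworded range-check messages) might look. The function names and their positional arguments are hypothetical and do not exist in utils/opcontrol; the exact position of the count value in the first message is also an assumption, since the review only says to insert "$COUNT".

```shell
#!/bin/sh
# Sketch only: function names and argument conventions are hypothetical,
# not taken from utils/opcontrol.

# Print the buffer-size status on a single line, always from the cached value.
# $1: S390_HW_SAMPLER flag, $2: S390_HW_SAMPLER_BUFSIZE
print_s390_bufsize_status()
{
	if test "$1" = "1"; then
		echo "System z hardware sampling buffer size (in 2MB areas): $2"
	fi
}

# Check a sampling interval against the limits reported by the kernel module.
# $1: COUNT, $2: min_interval, $3: max_interval
# Returns 1 (with messages on stderr) when COUNT is out of range.
check_s390_count()
{
	if test "$1" -lt "$2" || test "$1" -gt "$3"; then
		# Placement of the count value in this message is an assumption.
		echo "Invalid value $1 for hardware sampling rate." >&2
		echo "Value must be between $2 and $3." >&2
		return 1
	fi
	return 0
}
```

In opcontrol itself the caller would presumably pass "$COUNT", "$min_interval" and "$max_interval" and exit on failure, e.g. `check_s390_count "$COUNT" "$min_interval" "$max_interval" || exit 1`.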