From: Jason Y. <jas...@am...> - 2008-01-29 21:49:17
Attachments:
op_cvs_IBS_PATCH_1
|
This patch contains opcontrol changes.
---
 opcontrol |  139 ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++----
 1 file changed, 131 insertions(+), 8 deletions(-)

diff -uprN -X dontdiff oprofile-cvs-original/utils/opcontrol oprofile-cvs-ibs/utils/opcontrol
--- oprofile-cvs-original/utils/opcontrol	2008-01-28 16:05:29.000000000 -0600
+++ oprofile-cvs-ibs/utils/opcontrol	2008-01-29 09:39:46.000000000 -0600
@@ -120,6 +120,15 @@ do_help()
    --reset                       clears out data from current session
    --save=name                   save data from current session to session_name
    --deinit                      unload the oprofile module and oprofilefs
+   --no-event                    clear all events
+
+   --ibs-fetch=#count            enable AMD IBS Fetch sampling with maximum count.
+                                 Use 0 to disable IBS Fetch profiling.
+                                 ( Default = 250000, Min = 25000, Max = 1048575 )
+
+   --ibs-op=#count               enable AMD IBS OP sampling with maximum count.
+                                 Use 0 to disable IBS OP profiling.
+                                 ( Default = 250000, Min = 25000, Max = 1048575 )
 
    -e/--event=eventspec
 
@@ -294,6 +303,8 @@ do_init()
 	SEPARATE_THREAD=0
 	SEPARATE_CPU=0
 	CALLGRAPH=0
+	IBS_FETCH=0
+	IBS_OP=0
 
 	OPROFILED="$OPDIR/oprofiled"
 
@@ -357,6 +368,8 @@ do_init()
 	vecho "KERNEL_RANGE $KERNEL_RANGE"
 	vecho "XENIMAGE $XENIMAGE"
 	vecho "XEN_RANGE $XEN_RANGE"
+	vecho "IBS_FETCH $IBS_FETCH"
+	vecho "IBS_OP $IBS_OP"
 }
 
 
@@ -368,7 +381,7 @@ create_dir()
 			echo "Couldn't mkdir -p $1" >&2
 			exit 1
 		fi
-		chmod 755 "$1"
+		chmod 775 "$1"
 	fi
 }
 
@@ -430,6 +443,8 @@ do_save_setup()
 	if test "$XEN_RANGE"; then
 		echo "XEN_RANGE=$XEN_RANGE" >> $SETUP_FILE
 	fi
+	echo "IBS_FETCH=$IBS_FETCH" >> $SETUP_FILE
+	echo "IBS_OP=$IBS_OP" >> $SETUP_FILE
 }
 
 
@@ -757,7 +772,46 @@ do_options()
 				;;
 
 			# --setup options
-
+			--ibs-fetch)
+				error_if_empty $arg $val
+				if test ! -f $MOUNT/ibs_fetch/enable ; then
+					echo "IBS Fetch profiling unsupported on this kernel/hardware" >&2
+					exit 1
+				fi
+
+				if [ $val -gt 1048575 ] || [ $val -lt 25000 ]
+				then
+					if [ $val -ne 0 ]
+					then
+						echo "Error: IBS Fetch interval $val is invalid." >& 2
+						echo "Please specify value between 25000 and 1048575, or 0 to disable." >&2
+						exit 1
+					fi
+				fi
+
+				IBS_FETCH=$val
+				DO_SETUP=yes
+				;;
+			--ibs-op)
+				error_if_empty $arg $val
+				if test ! -f $MOUNT/ibs_uops/enable ; then
+					echo "IBS Op profiling unsupported on this kernel/hardware" >&2
+					exit 1
+				fi
+
+				if [ $val -gt 1048575 ] || [ $val -lt 25000 ]
+				then
+					if [ $val -ne 0 ]
+					then
+						echo "Error: IBS Op interval $val is invalid." >& 2
+						echo "Please specify value between 25000 and 1048575, or 0 to disable." >&2
+						exit 1
+					fi
+				fi
+
+				IBS_OP=$val
+				DO_SETUP=yes
+				;;
 			--session-dir)
 				# already processed
 				;;
@@ -794,10 +848,28 @@ do_options()
 				if test "$val" = "default"; then
 					val=$DEFAULT_EVENT
 				fi
+				if test `echo $val | grep IBS_FETCH`; then
+					echo "ERROR: $val is an IBS derived event."
+					echo " Please specify --ibs-fetch instead."
+					echo " See opcontrol --help for more information."
+					exit 1
+				fi
+
+				if test `echo $val | grep IBS_OP`; then
+					echo "ERROR: $val is an IBS derived event."
+					echo " Please specify --ibs-op instead."
+					echo " See opcontrol --help for more information."
+					exit 1
+				fi
 				set_event $NR_CHOSEN "$val"
 				NR_CHOSEN=`expr $NR_CHOSEN + 1`
 				DO_SETUP=yes
 				;;
+			# Clearing PMC events
+			--no-event)
+				NR_CHOSEN=0
+				DO_SETUP=yes
+				;;
 			-p|--separate)
 				OLD_IFS=$IFS
 				IFS=,
@@ -981,7 +1053,7 @@ do_kill_daemon()
 		fi
 
 		COUNT=`expr $COUNT + 1`
-		if test "$COUNT" -eq 15; then
+		if test "$COUNT" -eq 60; then
 			echo "Daemon stuck shutting down; killing !"
 			kill -9 `cat $LOCK_FILE`
 		fi
@@ -1193,6 +1265,28 @@ do_param_setup()
 		fi
 	fi
 
+	if test "$KERNEL_SUPPORT" = "yes" -a -f $MOUNT/ibs_fetch/enable ; then
+		if test "$IBS_FETCH" != "0"; then
+			set_param ibs_fetch/max_count $IBS_FETCH
+			set_param ibs_fetch/enable 1
+		else
+			set_param ibs_fetch/enable 0
+		fi
+	else
+		echo "IBS Fetch not supported - ignored" >&2
+	fi
+
+	if test "$KERNEL_SUPPORT" = "yes" -a -f $MOUNT/ibs_uops/enable ; then
+		if test "$IBS_OP" != "0"; then
+			set_param ibs_uops/max_count $IBS_OP
+			set_param ibs_uops/enable 1
+		else
+			set_param ibs_uops/enable 0
+		fi
+	else
+		echo "IBS OP not supported - ignored" >&2
+	fi
+
 	if test $NOTE_SIZE != 0; then
 		set_param notesize $NOTE_SIZE
 	fi
@@ -1207,8 +1301,9 @@ do_param_setup()
 		return
 	fi
 
-	# use the default setup if none set
-	if test "$NR_CHOSEN" = 0; then
+	# use the default setup if none set and both IBS_FETCH and IBS_OP
+	# are not enabled
+	if test "$NR_CHOSEN" = 0 -a ! "$IBS_FETCH" != "0" -a ! "$IBS_OP" != "0"; then
 		set_event 0 $DEFAULT_EVENT
 		NR_CHOSEN=1
 		HW_CTRS=`$OPHELP --check-events $DEFAULT_EVENT --callgraph=$CALLGRAPH`
@@ -1338,10 +1433,21 @@ do_start_daemon()
 		OPD_ARGS="$OPD_ARGS --image=$IMAGE_FILTER"
 	fi
 
+	if test "$IBS_FETCH" != 0; then
+		OPD_ARGS="$OPD_ARGS --ibs-fetch=$IBS_FETCH"
+	fi
+
+	if test "$IBS_OP" != 0; then
+		OPD_ARGS="$OPD_ARGS --ibs-op=$IBS_OP"
+	fi
+
 	if test -n "$VERBOSE"; then
 		OPD_ARGS="$OPD_ARGS --verbose=$VERBOSE"
 	fi
 
+	#OPD_ARGS="$OPD_ARGS --verbose=all"
+	OPD_ARGS="$OPD_ARGS"
+
 	vecho "executing oprofiled $OPD_ARGS"
 
 	$OPROFILED $OPD_ARGS
@@ -1361,6 +1467,22 @@ do_start_daemon()
 	echo "Daemon started."
 }
 
+#todo delete
+do_start_ibs_fetch()
+{
+	if test "$KERNEL_SUPPORT" = "yes"; then
+		echo 1 >$MOUNT/ibs_fetch/enable
+	fi
+}
+
+#todo delete
+do_start_ibs_op()
+{
+	if test "$KERNEL_SUPPORT" = "yes"; then
+		echo 1 >$MOUNT/ibs_uops/enable
+	fi
+}
+
 do_start()
 {
 
@@ -1554,9 +1676,10 @@ do_reset()
 do_deinit()
 {
 	# unmount /dev/oprofile if it is mounted
-	OPROF_FS=`grep /dev/oprofile /etc/mtab`
-	if test -n "$OPROF_FS"; then
-		umount /dev/oprofile
+	OPROF_FS_MOUNTED=`grep /dev/oprofile /etc/mtab`
+	OPROF_FS=`grep /dev/oprofile /etc/mtab|cut --delimiter=" " --fields=2`
+	if test -n "$OPROF_FS_MOUNTED"; then
+		umount $OPROF_FS
 	fi
 	# unload the oprofile module if it is around
 	OPROF_MOD=`lsmod | grep oprofile`
|
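The range check that the patch repeats in the --ibs-fetch and --ibs-op branches could be shared between them. A minimal sketch follows, assuming a helper name (ibs_count_ok) that is not part of opcontrol, and taking the limits from the help text in the patch (0 to disable, otherwise 25000 to 1048575):

```shell
# Hypothetical helper; the name ibs_count_ok does not exist in opcontrol.
# Succeeds when $1 is 0 (disable) or lies within 25000..1048575.
ibs_count_ok()
{
	case "$1" in
		''|*[!0-9]*) return 1 ;;	# reject empty or non-numeric input
	esac
	# 0 means "disable IBS" and is always accepted
	test "$1" -eq 0 && return 0
	# otherwise the count must fall inside the documented range
	test "$1" -ge 25000 && test "$1" -le 1048575
}
```

Each case branch might then call `ibs_count_ok "$val" || exit 1` after printing its own error message. Unlike the nested [ ] tests in the patch, this version filters out non-numeric input before it reaches test.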
From: Jason Y. <jas...@am...> - 2008-01-29 21:49:24
|
Hi,

This is the first attempt to extend Oprofile to support Instruction
Based Sampling (IBS) available on AMD Family 10h processors.

The specification of IBS is described in section 2.17.2 of "BIOS and
Kernel Developer's Guide (BKDG) For AMD Family 10h Processors". IBS
provides a wide range of precise information on the instruction fetch
phase and the execution phase. The document "Instruction-Based Sampling:
A New Performance Analysis Technique for AMD Family 10h Processors"
explains and demonstrates the uses of IBS in detail.

The patches are made against the head of CVS. They require a separate
kernel patch to work correctly on Family 10h processors with existing
kernels.

Design Outline
================

= Terms =

EBS: Event based sampling
IBS: Instruction based sampling

= opcontrol changes =

Three new options are added to the opcontrol script. "--ibs-fetch=#count"
and "--ibs-op=#count" enable and specify the max count for IBS fetch and
op respectively. "--ibs-fetch=0" and "--ibs-op=0" disable IBS.
"--no-event" is added to clear the current event selection in daemonrc.

A profile session can be taken with IBS and EBS enabled simultaneously.
They can also be used independently of each other.

= Driver interface changes =

Two directories, ibs_fetch and ibs_uops, are added to the oprofilefs,
allowing the control of MSRs through the oprofile.ko module. Both
directories contain the device files "enable" and "max_count". The file
"enable" enables and disables the functionality of the directory
containing it. The "max_count" file specifies the maximum count value of
the periodic op/fetch counter (bits 15:0 of MSR 0xC001_1030 and
0xC001_1033).

Directory "ibs_fetch" contains a "ran_enable" file in addition to the
files mentioned. It corresponds to bit 57 of MSR 0xC001_1030. When
enabled, bits 3:0 of the fetch counter are randomized when IBS fetch is
set to start the fetch counter.

= Daemon changes =

To differentiate IBS events from EBS events, and to accommodate the fact
that IBS events are not uniform in length when read from the buffer, two
escape codes "IBS_FETCH_SAMPLE" and "IBS_OP_SAMPLE" and their handlers
are added.

Each IBS sample encapsulates many pieces of data. For example, a single
IBS fetch sample contains information on instruction cache L2TLB miss,
instruction cache L1TLB miss, L1 TLB page size, instruction cache miss,
linear address, physical address, etc.

To use the current code in the daemon, one IBS event is expanded to a
number of event based sampling (EBS) events in the escape code handler.
The EBS events are then processed by existing daemon code and written to
sample files.

= Reporting tool changes =

The virtual address associated with an IBS fetch may lie in the middle
of an instruction. opreport and opannotate are modified to take this
into consideration when printing out a report.

= Known issues =

The escape code collision issue between Xenoprofile and the Cell
processor also affects the escape code values used by the IBS handler.
Further discussion is needed to settle this issue.

The current implementation of the reporting tool caches the entire
output before printing. It actually only needs to save one previous
line.

References
================

"BIOS and Kernel Developer's Guide (BKDG) For AMD Family 10h Processors"
http://www.amd.com/us-en/assets/content_type/white_papers_and_tech_docs/31116.pdf

Drongowski, Paul. "Instruction-Based Sampling: A New Performance
Analysis Technique for AMD Family 10h Processors". 2007.
http://developer.amd.com/assets/AMD_IBS_paper_EN.pdf
|
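The driver interface in the design outline (an "enable" and a "max_count" file in each of the ibs_fetch and ibs_uops directories) can be illustrated with a small sketch. It assumes that the files accept plain decimal text written with echo, which the kernel patch would need to confirm, and ibs_program is a hypothetical name that appears nowhere in the posted patches:

```shell
# Hypothetical helper, not part of opcontrol.
# $1 = oprofilefs mount point
# $2 = IBS directory, ibs_fetch or ibs_uops
# $3 = maximum count, where 0 disables sampling
ibs_program()
{
	ibs_dir="$1/$2"
	# A missing "enable" file is taken to mean the kernel or hardware
	# has no IBS support, mirroring the check in the opcontrol patch.
	if test ! -f "$ibs_dir/enable"; then
		echo "$2 not supported - ignored" >&2
		return 1
	fi
	if test "$3" != "0"; then
		# Set the period first, then switch sampling on.
		echo "$3" > "$ibs_dir/max_count" &&
		echo 1 > "$ibs_dir/enable"
	else
		echo 0 > "$ibs_dir/enable"
	fi
}
```

With oprofilefs mounted at /dev/oprofile, `ibs_program /dev/oprofile ibs_fetch 250000` would correspond to the two set_param calls that the patch adds to do_param_setup. Writing max_count before enable is a design choice of this sketch so that sampling never starts with a stale period.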
From: John L. <le...@mo...> - 2008-01-29 22:24:21
|
On Tue, Jan 29, 2008 at 03:48:12PM -0600, Jason Yeh wrote:

> This is the first attempt to extend Oprofile to support Instruction
> Based Sampling (IBS) available on AMD Family 10h processors.

Wow. Lot of work here! Unfortunately all your patches are line-wrapped,
making looking at them quite tricky...

A whole bunch of this new code doesn't meet the coding standard. I know
it's tedious but it really does need cleaning up before integration.

> Three new options are added to opcontrol script. "--ibs-fetch=#count"
> and "--ibs-op=#count" enable and specify the max count for IBS fetch and
> op respectively. "--ibs-fetch=0" and "--ibs-op=0" disable IBS.
> "--no-event" is added to clear the current event selection in daemonrc.

We already have --event=none, why doesn't this do it?

Why are we using this form instead of using --event somehow? I'd like to
see a clearer explanation of this. CPU-specific command lines make me
nervous.

> Two directories, ibs_fetch and ibs_uops are added to the oprofilefs
> allowing the control of MSRs through oprofile.ko module.

Can you put your changes to this up somewhere too, so we can see what
we're interfacing with?

> To differentiate IBS events from EBS events and to accommodate the fact
> that IBS events are not uniform in length when read from buffer. Two
> escape codes "IBS_FETCH_SAMPLE" and "IBS_OP_SAMPLE" and their handlers
> are added.

Why are fetch and op so different that they deserve different primary
escape codes? Wouldn't it be nicer to have a "MULTI" escape code, and
then a sub-code for the AMD-specific bits?

> To use the current code in the daemon, one IBS event is expanded to a
> number of event based sampling (EBS) events in the escape code handler.
> The EBS events are then processed by existing daemon code and written to
> sample files.

How are these EBS events then named? I'd like to see a spec as well as
the code (you'll need to document it anyway...)

Can we see example input and output to help us understand the basics (I
read the PDF some months ago, but I've forgotten most of it)

regards
john
|
From: Jason Y. <jas...@am...> - 2008-01-30 15:55:26
Attachments:
ibs_patch.tar.gz
|
John Levon wrote:
> A whole bunch of this new code doesn't meet the coding standard. I know
> it's tedious but it really does need cleaning up before integration.

Sorry about that. I attached a tar.gz of the patches. I will work on the
coding standard in the next version of the patch. Is "check_style.py"
still being used to verify the coding standard?

>
>> Three new options are added to opcontrol script. "--ibs-fetch=#count"
>> and "--ibs-op=#count" enable and specify the max count for IBS fetch and
>> op respectively. "--ibs-fetch=0" and "--ibs-op=0" disable IBS.
>> "--no-event" is added to clear the current event selection in daemonrc.
>
> We already have --event=none, why doesn't this do it?

I was not aware of the switch. I will remove the redundant switch.

>
> Why are we using this form instead of using --event somehow? I'd like to
> see a clearer explanation of this. CPU-specific command lines make me
> nervous.

The main reason is that IBS has no equivalent of the kernel and user
bits, and no unit mask. I felt that separate switches made it clearer
how to use IBS. If "--event" is to be used, which format of the switch
would be preferred:

1. Something similar to "--event=ibs_op:count" and
   "--event=ibs_fetch:count", with the parsing code modified to match.
2. Continue using the current format, with the kernel and user bits and
   the unit mask ignored for IBS events.

>
>> Two directories, ibs_fetch and ibs_uops are added to the oprofilefs
>> allowing the control of MSRs through oprofile.ko module.
>
> Can you put your changes to this up somewhere too, so we can see what
> we're interfacing with?

I also included the kernel patch in the .tar.gz file.

>
>> To differentiate IBS events from EBS events and to accommodate the fact
>> that IBS events are not uniform in length when read from buffer. Two
>> escape codes "IBS_FETCH_SAMPLE" and "IBS_OP_SAMPLE" and their handlers
>> are added.
>
> Why is fetch and op so different they deserve different primary escape
> codes? Wouldn't it be nicer to have a "MULTI" escape code, and then a
> sub-code for the AMD-specific bits?

This can be done. I will add it to the next version of the patch.

>
> How are these EBS events then named? I'd like to see a spec as well as
> the code (you'll need to document it anyway...)

The IBS events are named either IBS_FETCH_*** or IBS_OP_***. The events
are derived from MSRs and are not part of the spec. I will create a
separate document on how the IBS events are derived.

>
> Can we see example input and output to help us understand the basics (I
> read the PDF some months ago, but I've forgotten most of it)

When IBS fetch sampling is collected, any of the following derived fetch
events could be generated. The same applies to IBS op. Due to the number
of derived events, it would be impractical to try to view them all at
the same time. Here is an example of opreport output with event
IBS_FETCH_4K_PAGE:

Sahara64:~ # opreport --merge=all event:IBS_FETCH_4K_PAGE
CPU: AMD64 family10h, speed 800 MHz (estimated)
Counted IBS_FETCH_4K_PAGE events (IBS 4K page translation) with a unit mask of 0x00 (No unit mask) count 500000
IBS_FETCH_4K_P...|
  samples|      %|
------------------
      247 46.1682 no-vmlinux
      219 40.9346 Xvnc
       30  5.6075 libc-2.4.so
       14  2.6168 oprofiled
        5  0.9346 libcairo.so.2.2.3
        4  0.7477 libgdk-x11-2.0.so.0.800.11
        3  0.5607 bash
        2  0.3738 ld-2.4.so
        2  0.3738 libpthread-2.4.so
        2  0.3738 libglib-2.0.so.0.800.6
        2  0.3738 libdbus-1.so.2.0.0
        1  0.1869 metacity
        1  0.1869 libpango-1.0.so.0.1001.1
        1  0.1869 gvim
        1  0.1869 libXrender.so.1.2.2
        1  0.1869 libxpcom_core.so

The sample names are:

IBS_FETCH_SAMPLES
IBS_FETCH_KILLED
IBS_FETCH_ATTEMPTED
IBS_FETCH_COMPLETED
IBS_FETCH_ABORTED
IBS_FETCH_ITLB_HITS
IBS_FETCH_L1_ITLB_MISSES_L2_ITLB_HITS
IBS_FETCH_L1_ITLB_MISSES_L2_ITLB_MISSES
IBS_FETCH_ICACHE_MISSES
IBS_FETCH_ICACHE_HITS
IBS_FETCH_4K_PAGE
IBS_FETCH_2M_PAGE
IBS_FETCH_LATENCY

IBS_OP_ALL
IBS_OP_TAG_TO_RETIRE
IBS_OP_COMP_TO_RET
IBS_OP_BRANCH_RETIRED
IBS_OP_MISPREDICTED_BRANCH
IBS_OP_TAKEN_BRANCH
IBS_OP_MISPREDICTED_BRANCH_TAKEN
IBS_OP_RETURNS
IBS_OP_MISPREDICTED_RETURNS
IBS_OP_RESYNC
IBS_OP_ALL_LOAD_STORE
IBS_OP_LOAD
IBS_OP_STORE
IBS_OP_L1_DTLB_HITS
IBS_OP_L1_DTLB_MISS_L2_DTLB_HIT
IBS_OP_L1_L2_DTLB_MISS
IBS_OP_DATA_CACHE_MISS
IBS_OP_DATA_HITS
IBS_OP_MISALIGNED_DATA_ACC
IBS_OP_BANK_CONF_LOAD
IBS_OP_BANK_CONF_STORE
IBS_OP_FORWARD
IBS_OP_CANCELLED
IBS_OP_DCUC_MEM_ACC
IBS_OP_DCWC_MEM_ACC
IBS_OP_LOCKED
IBS_OP_MAB_HIT
IBS_OP_L1_DTLB_4K
IBS_OP_L1_DTLB_2M
IBS_OP_L1_DTLB_1G
IBS_OP_L2_DTLB_4K
IBS_OP_L2_DTLB_2M
IBS_OP_DC_LOAD_LAT
IBS_OP_NB_LOCAL_ONLY
IBS_OP_NB_REMOTE_ONLY
IBS_OP_NB_LOCAL_L3
IBS_OP_NB_LOCAL_CACHE
IBS_OP_NB_REMOTE_CACHE
IBS_OP_NB_LOCAL_DRAM
IBS_OP_NB_REMOTE_DRAM
IBS_OP_NB_LOCAL_OTHER
IBS_OP_NB_REMOTE_OTHER
IBS_OP_NB_CACHE_MODIFIED
IBS_OP_NB_CACHE_OWNED
IBS_OP_NB_LOCAL_CACHE_LAT
IBS_OP_NB_REMOTE_CACHE_LAT

Please let me know if you have more questions. Thanks.

Jason
|
From: John L. <le...@mo...> - 2008-01-30 16:00:27
|
On Wed, Jan 30, 2008 at 09:54:35AM -0600, Jason Yeh wrote:

> Sorry about that. I attached that tar.gz of the patches.

Is there any way you can send it inline but not mangled? It's harder for
me to review in a tar file. If not, that's fine.

> coding standard in the next version of the patch. Is "check_style.py" still
> being used to verify the coding standard?

Yep, along with the doc and some common sense :)

> >Why are we using this form instead of using --event somehow? I'd like to
> >see a clearer explanation of this. CPU-specific command lines make me
> >nervous.
>
> The main reason is that IBS has no equivalent of kernel and user bit, no
> unit mask. I felt that using separate switches made it more clear on using
> IBS. If "--event" is to be used, which format of the switch would be
> preferred:
> 1. Something similar to "--event=ibs_op:count" and
> "--event=ibs_fetch:count" and modify the parsing code.

We can do this. Make it so that the user/kernel/unit mask cause an error
if specified for these special events.

> When IBS fetch sampling is collected, any of the following derived fetch
> events could be generated. The same applies to IBS op. Due to the number of

So what about stuff like linear addresses you say you store?

regards
john
|
From: Jason Y. <jas...@am...> - 2008-01-30 17:15:11
|
John Levon wrote:
> On Wed, Jan 30, 2008 at 09:54:35AM -0600, Jason Yeh wrote:
>
>> Sorry about that. I attached that tar.gz of the patches.
>
> Is there any way you can send it inline but not mangled? It's harder for
> me to review in a tar file. If not, that's fine.

I do not know of a way to prevent it from happening. My client has line
wrap disabled and the mail is being edited as text. I will search around
to see if I can find a way to do that.

>>> Why are we using this form instead of using --event somehow? I'd like to
>>> see a clearer explanation of this. CPU-specific command lines make me
>>> nervous.
>> The main reason is that IBS has no equivalent of kernel and user bit, no
>> unit mask. I felt that using separate switches made it more clear on using
>> IBS. If "--event" is to be used, which format of the switch would be
>> preferred:
>> 1. Something similar to "--event=ibs_op:count" and
>> "--event=ibs_fetch:count" and modify the parsing code.
>
> We can do this. Make it so that the user/kernel/unit mask cause an error
> if specified for these special events.

Sounds good to me. I will make changes accordingly.

> So what about stuff like linear addresses you say you store?

Unfortunately, the linear address is currently not used.

Jason
|
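The "--event=ibs_op:count" form settled on in this exchange, with an error when user, kernel, or unit mask fields are supplied, could be parsed roughly as follows. This is only a sketch: parse_ibs_event is a hypothetical name, and the existing eventspec parsing in opcontrol is not shown in this thread, so the way the two would fit together is an assumption.

```shell
# Hypothetical parser for the proposed "ibs_fetch:count" and
# "ibs_op:count" event forms; not taken from opcontrol.
# Prints the count on success. Fails when the name is not an IBS
# event, the count is missing, or extra fields follow the count.
parse_ibs_event()
{
	ibs_name=${1%%:*}		# text before the first colon
	case "$ibs_name" in
		ibs_fetch|ibs_op) ;;
		*) return 1 ;;		# not an IBS pseudo-event
	esac
	ibs_rest=${1#"$ibs_name"}	# drop the event name
	ibs_rest=${ibs_rest#:}		# drop the separating colon
	case "$ibs_rest" in
		''|*[!0-9]*)
			# empty count, or trailing :unitmask:kernel:user fields
			echo "ERROR: $1: IBS events take only a count" >&2
			return 1
			;;
	esac
	echo "$ibs_rest"
}
```

For example, `parse_ibs_event ibs_op:250000` prints 250000, while `parse_ibs_event ibs_op:250000:0x00:1:1` fails because unit mask, kernel, and user fields are present.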
From: John S. <lin...@fr...> - 2008-02-01 13:43:39
|
John Levon wrote: > Is there any way you can send it inline but not mangled? It's harder > for me to review in a tar file. If not, that's fine. Here's my attempt at inlining the patches. (Patches 2, 3, and 6 have been left out because they are large.) |
From: Jason Y. <jas...@am...> - 2008-01-29 21:49:27
Attachments:
op_cvs_IBS_PATCH_2
|
This patch contains daemon/kernel module interface change and daemon code processing IBS events. --- opd_ibs.h | 320 +++++++++++++++++++++ opd_interface.h | 4 opd_trans.c | 844 +++++++++++++++++++++++++++++++++++++++++++++++++++++++- opd_trans.h | 3 4 files changed, 1157 insertions(+), 14 deletions(-) diff -uprN -X dontdiff oprofile-cvs-original/daemon/opd_interface.h oprofile-cvs-ibs/daemon/opd_interface.h --- oprofile-cvs-original/daemon/opd_interface.h 2008-01-28 16:05:28.000000000 -0600 +++ oprofile-cvs-ibs/daemon/opd_interface.h 2008-01-29 09:39:46.000000000 -0600 @@ -39,7 +39,9 @@ #define LAST_CODE 14 #else #define DOMAIN_SWITCH_CODE 11 -#define LAST_CODE 12 +#define IBS_FETCH_SAMPLE 13 +#define IBS_OP_SAMPLE 14 +#define LAST_CODE 15 #endif #endif /* OPD_INTERFACE_H */ diff -uprN -X dontdiff oprofile-cvs-original/daemon/opd_trans.h oprofile-cvs-ibs/daemon/opd_trans.h --- oprofile-cvs-original/daemon/opd_trans.h 2008-01-28 16:05:28.000000000 -0600 +++ oprofile-cvs-ibs/daemon/opd_trans.h 2008-01-29 09:39:46.000000000 -0600 @@ -18,6 +18,7 @@ #include "opd_cookie.h" #include "op_types.h" +#include "opd_ibs.h" #include <stdint.h> @@ -54,6 +55,8 @@ struct transient { pid_t tid; pid_t tgid; uint64_t embedded_offset; + struct ibs_fetch_sample * ibs_fetch; + struct ibs_op_sample * ibs_op; }; typedef void (*handler_t)(struct transient *); diff -uprN -X dontdiff oprofile-cvs-original/daemon/opd_trans.c oprofile-cvs-ibs/daemon/opd_trans.c --- oprofile-cvs-original/daemon/opd_trans.c 2008-01-28 16:05:28.000000000 -0600 +++ oprofile-cvs-ibs/daemon/opd_trans.c 2008-01-29 09:55:46.000000000 -0600 @@ -14,6 +14,10 @@ * Modified by Maynard Johnson <may...@us...> * These modifications are: * (C) Copyright IBM Corporation 2007 + * + * Modified by Jason Yeh and Paul Drongowski for AMD IBS + * These modifications are: + * Copyright (c) 2007 Advanced Micro Devices, Inc. 
*/ #include "opd_trans.h" @@ -23,6 +27,7 @@ #include "opd_stats.h" #include "opd_printf.h" #include "opd_interface.h" +#include "opd_ibs.h" #include <limits.h> #include <string.h> @@ -50,7 +55,7 @@ void clear_trans_current(struct transien uint64_t pop_buffer_value(struct transient * trans) { - uint64_t val; + uint64_t val = 0; if (!trans->remaining) { fprintf(stderr, "BUG: popping empty buffer !\n"); @@ -82,16 +87,629 @@ int enough_remaining(struct transient * } -static void opd_put_sample(struct transient * trans, unsigned long long pc) +// Function: opd_decode_ibs_fetch +// +// This function decodes IBS hardware-level event flags and fields. It +// effectively translates from the MSR encoding to the fields within +// an ibs_fetch_sample -- an abstraction of the hardware-level data. +// Event fields are either zero (false) or non-zero (true), except +// the fetch latency, which is a 16-bit cycle count, and the fetch page size +// field, which is a 2-bit unsigned integer. +// +static void opd_decode_ibs_fetch( + struct ibs_fetch_sample const * raw_sample, + struct decoded_ibs_fetch_sample * pIBSFetch) { - unsigned long long event; + unsigned int ibs_fetch_ctl_high; - if (!enough_remaining(trans, 1)) { - trans->remaining = 0; - return; + /* MSRC001_1030 IBS Fetch Control Register */ + ibs_fetch_ctl_high = raw_sample->ibs_fetch_ctl_high; + + // Bits 47:32 IbsFetchLat: instruction fetch latency + pIBSFetch->m_FetchLatency = (ibs_fetch_ctl_high & FETCH_MASK_LATENCY); + + // Bit 50 IbsFetchComp: instruction fetch complete. + pIBSFetch->m_FetchCompletion = (ibs_fetch_ctl_high & FETCH_MASK_COMPLETE)!=0; + + // Bit 51 IbsIcMiss: instruction cache miss. + pIBSFetch->m_InstCacheMiss = (ibs_fetch_ctl_high & FETCH_MASK_IC_MISS)!=0; + + // Bit 52 IbsPhyAddrValid: instruction fetch physical address valid. + pIBSFetch->m_PhysicalAddrValid = (ibs_fetch_ctl_high & FETCH_MASK_PHY_ADDR)!=0; + + // Bits 54:53 IbsL1TlbPgSz: instruction cache L1TLB page size. 
+ pIBSFetch->m_TLBPageSize = ((ibs_fetch_ctl_high >> 21) & 0x3); + + // Bit 55 IbsL1TlbMiss: instruction cache L1TLB miss. + pIBSFetch->m_L1TLBMiss = (ibs_fetch_ctl_high & FETCH_MASK_L1_MISS)!=0; + + // Bit 56 IbsL2TlbMiss: instruction cache L2TLB miss. + pIBSFetch->m_L2TLBMiss = (ibs_fetch_ctl_high & FETCH_MASK_L2_MISS)!=0; + + // A fetch is a killed fetch if all the masked bits are clear + pIBSFetch->m_Killed = (ibs_fetch_ctl_high & FETCH_MASK_KILLED)==0; + + pIBSFetch->m_InstCacheHit = (pIBSFetch->m_FetchCompletion && !pIBSFetch->m_InstCacheMiss); + + pIBSFetch->m_L1TLBHit = (!pIBSFetch->m_L1TLBMiss && pIBSFetch->m_PhysicalAddrValid); + + pIBSFetch->m_ITLB_L1M_L2H = (pIBSFetch->m_L1TLBMiss && !pIBSFetch->m_L2TLBMiss); + + pIBSFetch->m_ITLB_L1M_L2M = (pIBSFetch->m_L1TLBMiss && pIBSFetch->m_L2TLBMiss); +} + + +// +// Log the specified IBS derived event. +// +static void opd_log_ibs_event(unsigned int event, + struct transient * trans) + { + opd_stats[OPD_IBS_SAMPLE]++; + trans->event = event; + sfile_log_sample(trans); +} + +// +// Log the specified IBS cycle count. +// +static void opd_log_ibs_count(unsigned int event, + struct transient * trans, unsigned int count) +{ + opd_stats[OPD_IBS_SAMPLE]++; + trans->event = event; + sfile_log_sample_count(trans, count); +} + +// +// Aggregate the IBS derived event. Increase the +// derived event count by one. +// +#define AGG_IBS_EVENT(EV) \ + { \ + opd_log_ibs_event(EV, trans); \ + } + +// +// Aggregate the IBS latency/cycle counts. Increase the +// derived event count by the specified count value. +// +#define AGG_IBS_COUNT(EV, COUNT) \ + { \ + opd_log_ibs_count(EV, trans, COUNT); \ + } + + + +// +// Function: opd_log_ibs_fetch +// +// This function converts IBS fetch event flags and values into +// derived events. If the tagged (sampled) fetched caused a derived +// event, the derived event is tallied. 
+// +static void opd_log_ibs_fetch(struct transient * trans) +{ + struct decoded_ibs_fetch_sample ibsFetchRec ; + + if (!trans->ibs_fetch) { + verbprintf(vsamples, "DEBUG: no ibs_fetch: \n"); + return; + } + + // Decode the hardware-level IBS fetch information + opd_decode_ibs_fetch(trans->ibs_fetch, &ibsFetchRec); + + // IBS all fetch samples (kills + attempts) + AGG_IBS_EVENT(DE_IBS_FETCH_ALL) ; + + // IBS killed fetches ("case 0") -- All interesting event + // flags are clear + if (ibsFetchRec.m_Killed) + { + AGG_IBS_EVENT(DE_IBS_FETCH_KILLED) ; + // Take an early out with IBS killed fetches, effectively + // filtering killed fetches out of the other event counts + return ; + } + + // Any non-killed fetch is an attempted fetch + AGG_IBS_EVENT(DE_IBS_FETCH_ATTEMPTED) ; + + if (ibsFetchRec.m_FetchCompletion) + { + // IBS Fetch Completed + AGG_IBS_EVENT(DE_IBS_FETCH_COMPLETED) ; + } + else + { + // IBS Fetch Aborted + AGG_IBS_EVENT(DE_IBS_FETCH_ABORTED) ; + } + + // IBS L1 ITLB hit + if (ibsFetchRec.m_L1TLBHit) + { + AGG_IBS_EVENT(DE_IBS_L1_ITLB_HIT) ; + } + + // IBS L1 ITLB miss and L2 ITLB hit + if (ibsFetchRec.m_ITLB_L1M_L2H) + { + AGG_IBS_EVENT(DE_IBS_ITLB_L1M_L2H) ; + } + + // IBS L1 & L2 ITLB miss; complete ITLB miss + if (ibsFetchRec.m_ITLB_L1M_L2M) + { + AGG_IBS_EVENT(DE_IBS_ITLB_L1M_L2M) ; + } + + // IBS instruction cache miss + if (ibsFetchRec.m_InstCacheMiss) + { + AGG_IBS_EVENT(DE_IBS_IC_MISS) ; + } + + // IBS instruction cache hit + if (ibsFetchRec.m_InstCacheHit) + { + AGG_IBS_EVENT(DE_IBS_IC_HIT) ; + } + + // IBS page translations (only valid when physical address is valid) + if (ibsFetchRec.m_PhysicalAddrValid) + { + switch (ibsFetchRec.m_TLBPageSize) + { + case L1TLB4K: + AGG_IBS_EVENT(DE_IBS_FETCH_4K_PAGE) ; + break; + case L1TLB2M: + AGG_IBS_EVENT(DE_IBS_FETCH_2M_PAGE) ; + break; + default: + // DE_IBS_FETCH_1G_PAGE ; + // DE_IBS_FETCH_XX_PAGE ; + break; + } + } + + if (ibsFetchRec.m_FetchLatency) + { + AGG_IBS_COUNT(DE_IBS_FETCH_LATENCY, 
ibsFetchRec.m_FetchLatency) ; } +} + +// +// Function: opd_decode_ibs_op +// +// This function translates IBS op event data from its hardware-level +// representation to fields within an ibs_op_sample structure.It hides +// the MSR layout of IBS op data. +// +static void opd_decode_ibs_op(struct ibs_op_sample const * raw_sample, + struct decoded_ibs_op_sample * pIBSOp) +{ + register unsigned int ibs_op_data1_high = raw_sample->ibs_op_data1_high; + register unsigned int ibs_op_data2_low = raw_sample->ibs_op_data2_low; + register unsigned int ibs_op_data3_low = raw_sample->ibs_op_data3_low; + + + // + // MSRC001_1035 IBS OP Data Register (IbsOpData) + // + // 15:0 IbsCompToRetCtr: macro-op completion to retire count + pIBSOp->m_CompToRetireCycles = (raw_sample->ibs_op_data1_low & BR_MASK_RETIRE); + + // 31:16 IbsTagToRetCtr: macro-op tag to retire count. + pIBSOp->m_TagToRetireCycles = (raw_sample->ibs_op_data1_low >> 16) & BR_MASK_RETIRE; + + // 32 IbsOpBrnResync: resync macro-op. + pIBSOp->m_OpBranchResync = (ibs_op_data1_high & BR_MASK_BRN_RESYNC) != 0 ; + + // 33 IbsOpMispReturn: mispredicted return macro-op. + pIBSOp->m_OpMispredictedReturn = (ibs_op_data1_high & BR_MASK_MISP_RETURN) != 0 ; + + // 34 IbsOpReturn: return macro-op. + pIBSOp->m_OpReturn = (ibs_op_data1_high & BR_MASK_RETURN) != 0 ; + + // 35 IbsOpBrnTaken: taken branch macro-op. + pIBSOp->m_OpBranchTaken = (ibs_op_data1_high & BR_MASK_BRN_TAKEN) != 0 ; + + // 36 IbsOpBrnMisp: mispredicted branch macro-op. + pIBSOp->m_OpBranchMispredicted = (ibs_op_data1_high & BR_MASK_BRN_MISP) != 0 ; + + // 37 IbsOpBrnRet: branch macro-op retired. 
+ pIBSOp->m_OpBranchRetired = (ibs_op_data1_high & BR_MASK_BRN_RET) != 0 ; + + + // + // MSRC001_1036 IBS Op Data 2 Register (IbsOpData2) + // + // 5 NbIbsReqCacheHitSt: IBS L3 cache state + pIBSOp->m_NbIbsCacheHitSt = (ibs_op_data2_low & NB_MASK_L3_STATE) != 0 ; + + // 4 NbIbsReqDstProc: IBS request destination processor + pIBSOp->m_NbIbsReqDstProc = (ibs_op_data2_low & NB_MASK_REQ_DST_PROC) != 0 ; + + // 2:0 NbIbsReqSrc: Northbridge IBS request data source + pIBSOp->m_NbIbsReqSrc = (ibs_op_data2_low & NB_MASK_REQ_DATA_SRC) ; + + + // + // MSRC001_1037 IBS Op Data3 Register + // + // Bits 48:32 IbsDcMissLat + pIBSOp->m_IbsDcMissLat = raw_sample->ibs_op_data3_high & 0xFFFF; + + // 0 IbsLdOp: Load op + pIBSOp->m_IbsLdOp = (ibs_op_data3_low & DC_MASK_LOAD_OP) != 0 ; + + // 1 IbsStOp: Store op + pIBSOp->m_IbsStOp = (ibs_op_data3_low & DC_MASK_STORE_OP) != 0 ; + + // 2 IbsDcL1TlbMiss: Data cache L1TLB miss + pIBSOp->m_IbsDcL1tlbMiss = (ibs_op_data3_low & DC_MASK_L1_TLB_MISS) != 0 ; + + // 3 IbsDcL2tlbMiss: Data cache L2TLB miss + pIBSOp->m_IbsDcL2tlbMiss = (ibs_op_data3_low & DC_MASK_L2_TLB_MISS) != 0 ; + + // 4 IbsDcL1tlbHit2M: Data cache L1TLB hit in 2M page + pIBSOp->m_IbsDcL1tlbHit2M = (ibs_op_data3_low & DC_MASK_L1_HIT_2M) != 0 ; + + // 5 IbsDcL1tlbHit1G: Data cache L1TLB hit in 1G page + pIBSOp->m_IbsDcL1tlbHit1G = (ibs_op_data3_low & DC_MASK_L1_HIT_1G) != 0 ; + + // 6 IbsDcL2tlbHit2M: Data cache L2TLB hit in 2M page + pIBSOp->m_IbsDcL2tlbHit2M = (ibs_op_data3_low & DC_MASK_L2_HIT_2M) != 0 ; + + // 7 IbsDcMiss: Data cache miss + pIBSOp->m_IbsDcMiss = (ibs_op_data3_low & DC_MASK_DC_MISS) != 0 ; + + // 8 IbsDcMisAcc: Misaligned access + pIBSOp->m_IbsDcMisAcc = (ibs_op_data3_low & DC_MASK_MISALIGN_ACCESS) != 0 ; + + // 9 IbsDcLdBnkCon: Bank conflict on load operation + pIBSOp->m_IbsDcLdBnkCon = (ibs_op_data3_low & DC_MASK_LD_BANK_CONFLICT) != 0 ; + + // 10 IbsDcStBnkCon: Bank conflict on store operation + pIBSOp->m_IbsDcStBnkCon = (ibs_op_data3_low & 
DC_MASK_ST_BANK_CONFLICT) != 0 ; + + // 11 IbsDcStToLdFwd: Data forwarded from store to load operation + pIBSOp->m_IbsDcStToLdFwd = (ibs_op_data3_low & DC_MASK_ST_TO_LD_FOR) != 0 ; + + // 12 IbsDcDcStToLdCan: Data forwarding from store to load operation cancelled + pIBSOp->m_IbsDcStToLdCan = (ibs_op_data3_low & DC_MASK_ST_TO_LD_CANCEL) != 0 ; + + // 13 IbsDcDcUcMemAcc: UC memory access + pIBSOp->m_IbsDcUcMemAcc = (ibs_op_data3_low & DC_MASK_UC_MEM_ACCESS) != 0 ; + + // 14 IbsDcWcMemAcc: WC memory access + pIBSOp->m_IbsDcWcMemAcc = (ibs_op_data3_low & DC_MASK_WC_MEM_ACCESS) != 0 ; + + // 15 IbsDcLockedOp: Locked operation + pIBSOp->m_IbsDcLockedOp = (ibs_op_data3_low & DC_MASK_LOCKED_OP) != 0 ; + + // 16 IbsDcMabHit: MAB hit + pIBSOp->m_IbsDcMabHit = (ibs_op_data3_low & DC_MASK_MAB_HIT) != 0 ; + + // 17 IbsDcLinAddrValid: Data cache linear address valid + pIBSOp->m_IbsDcLinAddrValid = (ibs_op_data3_low & DC_MASK_LIN_ADDR_VALID) != 0 ; + + // 18 IbsDcPhyAddrValid: Data cache physical address valid + pIBSOp->m_IbsDcPhyAddrValid = (ibs_op_data3_low & DC_MASK_PHY_ADDR_VALID) != 0 ; +} + + +// +// Function: opd_log_ibs_op +// +// This function translates the IBS op event flags and values into +// IBS op derived events. If an op derived event occured, it's tallied. 
+// +static void opd_log_ibs_op(struct transient * trans) +{ + struct decoded_ibs_op_sample ibsOpRec; + unsigned int useL2TranslationSize = 0; + + if (!trans->ibs_op) { + verbprintf(vsamples, "DEBUG: NO trans->ibs_op: \n"); + return; + } + + opd_decode_ibs_op(trans->ibs_op, &ibsOpRec) ; + + // All IBS op samples + AGG_IBS_EVENT(DE_IBS_OP_ALL) ; + + // Tally retire cycle counts for all sampled macro-ops + if (ibsOpRec.m_TagToRetireCycles) + { + // IBS tag to retire cycles + AGG_IBS_COUNT(DE_IBS_OP_TAG_TO_RETIRE, ibsOpRec.m_TagToRetireCycles) ; + } + + if (ibsOpRec.m_CompToRetireCycles) + { + // IBS completion to retire cycles + AGG_IBS_COUNT(DE_IBS_OP_COMP_TO_RETIRE, ibsOpRec.m_CompToRetireCycles) ; + } + + // Test for an IBS branch macro-op + if (ibsOpRec.m_OpBranchRetired) + { + // IBS Branch retired op + AGG_IBS_EVENT(DE_IBS_BRANCH_RETIRED) ; + + // Test branch-specific event flags + if (ibsOpRec.m_OpBranchMispredicted) + { + // IBS mispredicted Branch op + AGG_IBS_EVENT(DE_IBS_BRANCH_MISP) ; + } + + if (ibsOpRec.m_OpBranchTaken) + { + // IBS taken Branch op + AGG_IBS_EVENT(DE_IBS_BRANCH_TAKEN) ; + } + + if (ibsOpRec.m_OpBranchTaken && ibsOpRec.m_OpBranchMispredicted) + { + // IBS mispredicted taken branch op + AGG_IBS_EVENT(DE_IBS_BRANCH_MISP_TAKEN) ; + } + + if (ibsOpRec.m_OpReturn) + { + // IBS return op + AGG_IBS_EVENT(DE_IBS_RETURN) ; + } + + if (ibsOpRec.m_OpReturn && ibsOpRec.m_OpBranchMispredicted) + { + // IBS mispredicted return op + AGG_IBS_EVENT(DE_IBS_RETURN_MISP) ; + } + } // Branch and return op sample + + // Test for a resync macro-op + if (ibsOpRec.m_OpBranchResync) + { + // IBS resync OP + AGG_IBS_EVENT(DE_IBS_RESYNC) ; + } + + if (!ibsOpRec.m_IbsLdOp && !ibsOpRec.m_IbsStOp) + { + // If no load or store operation, then take an early return + // No more derived events need to be tallied + return ; + } + + // Count the number of LS op samples + AGG_IBS_EVENT(DE_IBS_LS_ALL_OP) ; + + // Count and handle load ops + if (ibsOpRec.m_IbsLdOp) + { + // 
Tally an IBS load derived event + AGG_IBS_EVENT(DE_IBS_LS_LOAD_OP) ; + // If the load missed in DC, tally the DC load miss latency + if(ibsOpRec.m_IbsDcMiss) + { + // DC load miss latency is only reliable for load ops + AGG_IBS_COUNT(DE_IBS_LS_DC_LOAD_LAT, ibsOpRec.m_IbsDcMissLat) ; + } + // Data forwarding info are valid only for load ops + if(ibsOpRec.m_IbsDcStToLdFwd) + { + AGG_IBS_EVENT(DE_IBS_LS_STL_FORWARDED) ; + } + if(ibsOpRec.m_IbsDcStToLdCan) + { + AGG_IBS_EVENT(DE_IBS_LS_STL_CANCELLED) ; + } + // NB data is only guaranteed reliable for load operations + // that miss in L1 and L2 cache. NB data arrives too late + // to be reliable for store operations + if (ibsOpRec.m_IbsDcMiss && (ibsOpRec.m_NbIbsReqSrc != 0)) + { + // NB data is valid, so tally derived NB events + if( ibsOpRec.m_NbIbsReqDstProc ) + { + // Request was serviced by remote processor + AGG_IBS_EVENT(DE_IBS_NB_REMOTE) ; + AGG_IBS_COUNT(DE_IBS_NB_REMOTE_LATENCY, ibsOpRec.m_IbsDcMissLat) ; + switch( ibsOpRec.m_NbIbsReqSrc ) + { + case 0x2: + { + AGG_IBS_EVENT(DE_IBS_NB_REMOTE_CACHE) ; + if (ibsOpRec.m_NbIbsCacheHitSt) + { + AGG_IBS_EVENT(DE_IBS_NB_CACHE_STATE_O) ; + } else { + AGG_IBS_EVENT(DE_IBS_NB_CACHE_STATE_M) ; + } + break ; + } + case 0x3: + { + AGG_IBS_EVENT(DE_IBS_NB_REMOTE_DRAM) ; + break ; + } + case 0x7: + { + AGG_IBS_EVENT(DE_IBS_NB_REMOTE_OTHER) ; + break ; + } + default: + { + break ; + } + } + } else { + // Request was serviced by local processor + AGG_IBS_EVENT(DE_IBS_NB_LOCAL) ; + AGG_IBS_COUNT(DE_IBS_NB_LOCAL_LATENCY, ibsOpRec.m_IbsDcMissLat) ; + switch( ibsOpRec.m_NbIbsReqSrc ) + { + case 0x1: + { + AGG_IBS_EVENT(DE_IBS_NB_LOCAL_L3) ; + break ; + } + case 0x2: + { + AGG_IBS_EVENT(DE_IBS_NB_LOCAL_CACHE) ; + if (ibsOpRec.m_NbIbsCacheHitSt) + { + AGG_IBS_EVENT(DE_IBS_NB_CACHE_STATE_O) ; + } else { + AGG_IBS_EVENT(DE_IBS_NB_CACHE_STATE_M) ; + } + break ; + } + case 0x3: + { + AGG_IBS_EVENT(DE_IBS_NB_LOCAL_DRAM) ; + break ; + } + case 0x7: + { + 
AGG_IBS_EVENT(DE_IBS_NB_LOCAL_OTHER) ; + break ; + } + default: + { + break ; + } + } // Northbridge request source + } // Northbridge local/remote + } // Northbridge status + } // Load operations + + // Count and handle store operations + if (ibsOpRec.m_IbsStOp) + { + AGG_IBS_EVENT(DE_IBS_LS_STORE_OP) ; + } + + if (ibsOpRec.m_IbsDcMiss) + { + AGG_IBS_EVENT(DE_IBS_LS_DC_MISS) ; + } else { + AGG_IBS_EVENT(DE_IBS_LS_DC_HIT) ; + } + + if (ibsOpRec.m_IbsDcMisAcc) + { + AGG_IBS_EVENT(DE_IBS_LS_MISALIGNED) ; + } + + if (ibsOpRec.m_IbsDcLdBnkCon) + { + AGG_IBS_EVENT(DE_IBS_LS_BNK_CONF_LOAD) ; + } + + if (ibsOpRec.m_IbsDcStBnkCon) + { + AGG_IBS_EVENT(DE_IBS_LS_BNK_CONF_STORE) ; + } + + if (ibsOpRec.m_IbsDcUcMemAcc) + { + AGG_IBS_EVENT(DE_IBS_LS_UC_MEM_ACCESS) ; + } + + if (ibsOpRec.m_IbsDcWcMemAcc) + { + AGG_IBS_EVENT(DE_IBS_LS_WC_MEM_ACCESS) ; + } + + if (ibsOpRec.m_IbsDcLockedOp) + { + AGG_IBS_EVENT(DE_IBS_LS_LOCKED_OP) ; + } + + if (ibsOpRec.m_IbsDcMabHit) + { + AGG_IBS_EVENT(DE_IBS_LS_MAB_HIT) ; + } + + // IbsDcLinAddrValid is true when address translation was successful. + // Some macro-ops do not perform an address translation and use only + // a physical address. + if (ibsOpRec.m_IbsDcLinAddrValid) + { + if (! 
ibsOpRec.m_IbsDcL1tlbMiss) + { + // L1 DTLB hit -- This is the most frequent case + AGG_IBS_EVENT(DE_IBS_LS_DTLB_L1H) ; + } else if (ibsOpRec.m_IbsDcL2tlbMiss) + { + // L1 DTLB miss, L2 DTLB miss + AGG_IBS_EVENT(DE_IBS_LS_DTLB_L1M_L2M) ; + } else { + // L1 DTLB miss, L2 DTLB hit + AGG_IBS_EVENT(DE_IBS_LS_DTLB_L1M_L2H) ; + useL2TranslationSize = 1 ; + } + if (useL2TranslationSize) + { + // L2 DTLB page translation + if (ibsOpRec.m_IbsDcL2tlbHit2M) + { + // 2M L2 DTLB page translation + AGG_IBS_EVENT(DE_IBS_LS_L2_DTLB_2M) ; + } else { + // 4K L2 DTLB page translation + AGG_IBS_EVENT(DE_IBS_LS_L2_DTLB_4K) ; + } + } else { + // L1 DTLB page translation + if (ibsOpRec.m_IbsDcL1tlbHit2M) + { + // 2M L1 DTLB page translation + AGG_IBS_EVENT(DE_IBS_LS_L1_DTLB_2M) ; + } else if (ibsOpRec.m_IbsDcL1tlbHit1G) + { + // 1G L1 DTLB page translation + AGG_IBS_EVENT(DE_IBS_LS_L1_DTLB_1G) ; + } else { + // This is the most common case, unfortunately + AGG_IBS_EVENT(DE_IBS_LS_L1_DTLB_4K) ; + } + } + } // Page translation size +} + + +static void opd_put_sample(struct transient * trans, unsigned long long pc) +{ + unsigned long long event = 0; - event = pop_buffer_value(trans); + if (!trans->ibs_fetch && !trans->ibs_op) { + if (!enough_remaining(trans, 1)) { + verbprintf(vibs_debug, "not enough remaining\n"); + trans->remaining = 0; + return; + } + event = pop_buffer_value(trans); + } + + /* IBS can generate samples with no valid dcookie and + * in kernel address range. Map such samples to vmlinux + * only if the user either specifies a range, or vmlinux. 
+ */ + if ((trans->ibs_fetch || trans->ibs_op) + && trans->cookie == INVALID_COOKIE + && find_kernel_image(trans)) + { + trans->in_kernel = 1; + } if (trans->tracing != TRACING_ON) trans->event = event; @@ -116,6 +734,19 @@ static void opd_put_sample(struct transi if (!trans->current) goto out; + /* check if the current sample belongs to IBS */ + if (trans->ibs_fetch) { + if(!trans->anon) + trans->pc = trans->ibs_fetch->rip; + opd_log_ibs_fetch(trans); + goto out; + } else if (trans->ibs_op) { + if(!trans->anon) + trans->pc = trans->ibs_op->rip; + opd_log_ibs_op(trans); + goto out; + } + /* FIXME: this logic is perhaps too harsh? */ if (trans->current->ignored || (trans->last && trans->last->ignored)) goto out; @@ -123,6 +754,7 @@ static void opd_put_sample(struct transi /* log the sample or arc */ sfile_log_sample(trans); + out: /* switch to trace mode */ if (trans->tracing == TRACING_START) @@ -132,6 +764,11 @@ out: } +// +// Function: code_unknown +// +// Call this function when an unknown escape code is encountered. +// static void code_unknown(struct transient * trans __attribute__((unused))) { fprintf(stderr, "Unknown code !\n"); @@ -139,6 +776,19 @@ static void code_unknown(struct transien } +// +// Function: code_ctx_switch +// +// Handle a context switch escape code sequence. The event buffer entries +// for a context switch are: +// ESCAPE_CODE +// CTX_SWITCH_CODE +// Process ID (PID) +// Cookie +// ESCAPE_CODE +// CTX_TGID_CODE +// Task group ID (TGID) +// static void code_ctx_switch(struct transient * trans) { clear_trans_current(trans); @@ -160,12 +810,21 @@ static void code_ctx_switch(struct trans if (vmisc) { char const * app = find_cookie(trans->app_cookie); printf("CTX_SWITCH to tid %lu, tgid %lu, cookie %llx(%s)\n", - (unsigned long)trans->tid, (unsigned long)trans->tgid, - trans->app_cookie, app ? app : "none"); + (unsigned long)trans->tid, (unsigned long)trans->tgid, + trans->app_cookie, app ? 
app : "none"); } } +// +// Function: code_cpu_switch +// +// Handle a CPU switch escape code sequence. The event buffer entries for +// a CPU switch are: +// ESCAPE_CODE +// CPU_SWITCH_CODE +// CPU number +// static void code_cpu_switch(struct transient * trans) { clear_trans_current(trans); @@ -180,6 +839,15 @@ static void code_cpu_switch(struct trans } +// +// Function: code_cookie_switch +// +// Handle a cookie switch escape code sequence. The event buffer entries +// for a cookie switch are: +// ESCAPE_CODE +// COOKIE_SWITCH_CODE +// Cookie +// static void code_cookie_switch(struct transient * trans) { clear_trans_current(trans); @@ -193,12 +861,17 @@ static void code_cookie_switch(struct tr if (vmisc) { char const * name = verbose_cookie(trans->cookie); - verbprintf(vmisc, "COOKIE_SWITCH to cookie %s(%llx)\n", - name, trans->cookie); + verbprintf(vibs_debug, "COOKIE_SWITCH to cookie %s(%llx)\n", + name, trans->cookie); } } +// +// Function: code_kernel_enter +// +// Handle a kernel entry escape code sequence. +// static void code_kernel_enter(struct transient * trans) { verbprintf(vmisc, "KERNEL_ENTER_SWITCH to kernel\n"); @@ -230,6 +903,143 @@ static void code_module_loaded(struct tr } +// +// Function: code_ibs_fetch_sample +// +// Handle an IBS fetch sample escape code sequence. An IBS fetch sample +// is represented as an escape code sequence. (See the comment for the +// function code_ibs_op_sample() for the sequence of entries in the event +// buffer.) When this function is called, the ESCAPE_CODE and IBS_FETCH_CODE +// have already been removed from the event buffer. Thus, 7 more event buffer +// entries are needed in order to process a complete IBS fetch sample. 
+// +static void code_ibs_fetch_sample(struct transient * trans) +{ + if (!enough_remaining(trans, 7)) { + verbprintf(vibs_debug, "not enough remaining\n"); + trans->remaining = 0; + return; + } + + trans->ibs_fetch = malloc(sizeof(struct ibs_fetch_sample)); + if (!trans->ibs_fetch) { + verbprintf(vibs_debug, "DEBUG: IBS Out of Memory\n"); + abort(); + } + + trans->ibs_fetch->rip = pop_buffer_value(trans); + + trans->ibs_fetch->ibs_fetch_lin_addr_low = pop_buffer_value(trans); + trans->ibs_fetch->ibs_fetch_lin_addr_high = pop_buffer_value(trans); + + trans->ibs_fetch->ibs_fetch_ctl_low = pop_buffer_value(trans); + trans->ibs_fetch->ibs_fetch_ctl_high = pop_buffer_value(trans); + trans->ibs_fetch->ibs_fetch_phys_addr_low = pop_buffer_value(trans); + trans->ibs_fetch->ibs_fetch_phys_addr_high = pop_buffer_value(trans); + + verbprintf(vsamples, + "FETCH_X CP:%ld ID:%ld IP:%lx FL:%x LAT:%d P_HI:%x P_LO:%x L_HI:%x L_LO:%x\n", + trans->cpu, + (long)trans->tgid, + trans->ibs_fetch->rip, + (trans->ibs_fetch->ibs_fetch_ctl_high >> 16) & 0x3FF, + (trans->ibs_fetch->ibs_fetch_ctl_high) & 0xFFFF, + trans->ibs_fetch->ibs_fetch_phys_addr_high, + trans->ibs_fetch->ibs_fetch_phys_addr_low, + trans->ibs_fetch->ibs_fetch_lin_addr_high, + trans->ibs_fetch->ibs_fetch_lin_addr_low) ; + + opd_put_sample(trans, trans->ibs_fetch->rip); + + free(trans->ibs_fetch); + trans->ibs_fetch = NULL; +} + + +// +// Function: code_ibs_op_sample +// +// Handle an IBS op sample escape code sequence. 
An IBS op sample +// is represented as an escape code sequence: +// +// IBS fetch IBS op +// --------------- ---------------- +// ESCAPE_CODE ESCAPE_CODE +// IBS_FETCH_CODE IBS_OP_CODE +// Offset Offset +// IbsFetchLinAd low IbsOpRip low <-- Logical (virtual) RIP +// IbsFetchLinAd high IbsOpRip high <-- Logical (virtual) RIP +// IbsFetchCtl low IbsOpData low +// IbsFetchCtl high IbsOpData high +// IbsFetchPhysAd low IbsOpData2 low +// IbsFetchPhysAd high IbsOpData2 high +// IbsOpData3 low +// IbsOpData3 high +// IbsDcLinAd low +// IbsDcLinAd high +// IbsDcPhysAd low +// IbsDcPhysAd high +// +// When this function is called, the ESCAPE_CODE and IBS_OP_CODE have +// already been removed from the event buffer. Thus, 13 more event buffer +// entries are needed to process a complete IBS op sample. +// +// The IbsFetchLinAd and IbsOpRip are the linear (virtual) addresses +// that were generated by the IBS hardware. These addresses are mapped +// into the offset. +// +static void code_ibs_op_sample(struct transient * trans) +{ + verbprintf(vmodule, "IBS_OP_SAMPLE_CODE\n"); + + if (!enough_remaining(trans, 13)) { + verbprintf(vibs_debug, "not enough remaining\n"); + trans->remaining = 0; + return; + } + + trans->ibs_op = malloc(sizeof(struct ibs_op_sample)); + if (!trans->ibs_op) { + verbprintf(vibs_debug, "DEBUG: IBS Out of Memory\n"); + abort(); + } + + trans->ibs_op->rip = pop_buffer_value(trans); + + trans->ibs_op->ibs_op_lin_addr_low = pop_buffer_value(trans); + trans->ibs_op->ibs_op_lin_addr_high = pop_buffer_value(trans); + + trans->ibs_op->ibs_op_data1_low = pop_buffer_value(trans); + trans->ibs_op->ibs_op_data1_high = pop_buffer_value(trans); + trans->ibs_op->ibs_op_data2_low = pop_buffer_value(trans); + trans->ibs_op->ibs_op_data2_high = pop_buffer_value(trans); + trans->ibs_op->ibs_op_data3_low = pop_buffer_value(trans); + trans->ibs_op->ibs_op_data3_high = pop_buffer_value(trans); + trans->ibs_op->ibs_op_ldst_linaddr_low = pop_buffer_value(trans); + 
trans->ibs_op->ibs_op_ldst_linaddr_high = pop_buffer_value(trans); + trans->ibs_op->ibs_op_phys_addr_low = pop_buffer_value(trans); + trans->ibs_op->ibs_op_phys_addr_high = pop_buffer_value(trans); + + verbprintf(vsamples, + "IBS_OP_X CP:%ld ID:%d IP:%lx D1HI:%x D1LO:%x D2LO:%x D3HI:%x D3LO:%x LALO:%x PALO:%x\n", + trans->cpu, + trans->tgid, + trans->ibs_op->rip, + trans->ibs_op->ibs_op_data1_high, + trans->ibs_op->ibs_op_data1_low, + trans->ibs_op->ibs_op_data2_low, + trans->ibs_op->ibs_op_data3_high, + trans->ibs_op->ibs_op_data3_low, + trans->ibs_op->ibs_op_ldst_linaddr_low, + trans->ibs_op->ibs_op_phys_addr_low); + + opd_put_sample(trans, trans->ibs_op->rip); + + free(trans->ibs_op); + trans->ibs_op = NULL; +} + + /* * This also implicitly signals the end of the previous * trace, so we never explicitly set TRACING_OFF when @@ -274,8 +1084,13 @@ handler_t handlers[LAST_CODE + 1] = { #if defined(__powerpc__) &code_spu_profiling, &code_spu_ctx_switch, -#endif &code_unknown, +#else + &code_unknown, + &code_unknown, + &code_ibs_fetch_sample, + &code_ibs_op_sample, +#endif }; extern void (*special_processor)(struct transient *); @@ -299,7 +1114,9 @@ void opd_process_samples(char const * bu .cpu = -1, .tid = -1, .embedded_offset = UNUSED_EMBEDDED_OFFSET, - .tgid = -1 + .tgid = -1, + .ibs_fetch = NULL, + .ibs_op = NULL }; /* FIXME: was uint64_t but it can't compile on alpha where uint64_t @@ -338,3 +1155,4 @@ void opd_process_samples(char const * bu handlers[code](&trans); } } + diff -uprN -X dontdiff oprofile-cvs-original/daemon/opd_ibs.h oprofile-cvs-ibs/daemon/opd_ibs.h --- oprofile-cvs-original/daemon/opd_ibs.h 1969-12-31 18:00:00.000000000 -0600 +++ oprofile-cvs-ibs/daemon/opd_ibs.h 2008-01-29 09:39:46.000000000 -0600 @@ -0,0 +1,320 @@ +/* + * @file opd_ibs.h + * AMD Family10h Instruction Based Sampling (IBS) handling. 
+ *
+ * @remark Copyright 2007 OProfile authors
+ * @remark Read the file COPYING
+ *
+ * @author Jason Yeh
+ * @author Paul Drongowski
+ */
+
+#ifndef OPD_IBS_H
+#define OPD_IBS_H
+
+#include <stdint.h>
+
+//
+// IBS information is processed in two steps. The first step decodes
+// hardware-level IBS information and saves it in decoded form. The
+// second step translates the decoded IBS information into IBS derived
+// events. IBS information is tallied and is reported as derived events.
+//
+
+
+
+//
+// This struct represents the hardware-level IBS fetch information.
+// Each low/high pair of fields holds one model-specific register (MSR).
+// See the BIOS and Kernel Developer's Guide for AMD Family 10h Processors
+// for further details.
+//
+struct ibs_fetch_sample {
+	unsigned long int rip;
+	/* MSRC001_1030 IBS Fetch Control Register */
+	unsigned int ibs_fetch_ctl_low;
+	unsigned int ibs_fetch_ctl_high;
+	/* MSRC001_1031 IBS Fetch Linear Address Register */
+	unsigned int ibs_fetch_lin_addr_low;
+	unsigned int ibs_fetch_lin_addr_high;
+	/* MSRC001_1032 IBS Fetch Physical Address Register */
+	unsigned int ibs_fetch_phys_addr_low;
+	unsigned int ibs_fetch_phys_addr_high;
+	unsigned int dummy_event;
+};
+
+
+
+//
+// This struct is an abstraction of IBS fetch information. It hides
+// the hardware-level model-specific register (MSR) layout. The char
+// fields hold Boolean values.
+//
+struct decoded_ibs_fetch_sample {
+	unsigned short m_FetchLatency;
+	unsigned short m_TLBPageSize;
+	unsigned char m_PhysicalAddrValid;
+	unsigned char m_L2TLBMiss;
+	unsigned char m_L1TLBMiss;
+	unsigned char m_InstCacheMiss;
+	unsigned char m_InstCacheHit;
+	unsigned char m_FetchCompletion;
+	unsigned char m_L1TLBHit;
+	unsigned char m_ITLB_L1M_L2H;
+	unsigned char m_ITLB_L1M_L2M;
+	unsigned char m_Killed;
+};
+
+
+
+
+//
+// This struct represents the hardware-level IBS op information.
+//
+struct ibs_op_sample {
+	unsigned long int rip;
+	/* MSRC001_1034 IBS Op Logical Address Register */
+	unsigned int ibs_op_lin_addr_low;
+	unsigned int ibs_op_lin_addr_high;
+	/* MSRC001_1035 IBS Op Data Register */
+	unsigned int ibs_op_data1_low;
+	unsigned int ibs_op_data1_high;
+	/* MSRC001_1036 IBS Op Data 2 Register */
+	unsigned int ibs_op_data2_low;
+	unsigned int ibs_op_data2_high;
+	/* MSRC001_1037 IBS Op Data 3 Register */
+	unsigned int ibs_op_data3_low;
+	unsigned int ibs_op_data3_high;
+	unsigned int ibs_op_ldst_linaddr_low;
+	unsigned int ibs_op_ldst_linaddr_high;
+	unsigned int ibs_op_phys_addr_low;
+	unsigned int ibs_op_phys_addr_high;
+};
+
+
+
+//
+// This struct is an abstraction of the IBS op information. It hides
+// the hardware-level MSR layout. The char fields hold Boolean values,
+// except m_NbIbsReqSrc, which is a 3-bit field.
+//
+struct decoded_ibs_op_sample {
+	unsigned short m_TagToRetireCycles;
+	unsigned short m_CompToRetireCycles;
+	unsigned short m_IbsDcMissLat;
+	unsigned char m_NbIbsReqSrc;
+	unsigned char m_NbIbsCacheHitSt;
+	unsigned char m_NbIbsReqDstProc;
+	unsigned char m_OpBranchRetired;
+	unsigned char m_OpBranchMispredicted;
+	unsigned char m_OpBranchTaken;
+	unsigned char m_OpMispredictedReturn;
+	unsigned char m_OpBranchResync;
+	unsigned char m_OpReturn;
+	unsigned char m_IbsLdOp;
+	unsigned char m_IbsStOp;
+	unsigned char m_IbsDcLinAddrValid;
+	unsigned char m_IbsDcPhyAddrValid;
+	unsigned char m_IbsDcL1tlbMiss;
+	unsigned char m_IbsDcL2tlbMiss;
+	unsigned char m_IbsDcL1tlbHit2M;
+	unsigned char m_IbsDcL1tlbHit1G;
+	unsigned char m_IbsDcL2tlbHit2M;
+	unsigned char m_IbsDcMiss;
+	unsigned char m_IbsDcMisAcc;
+	unsigned char m_IbsDcLdBnkCon;
+	unsigned char m_IbsDcStBnkCon;
+	unsigned char m_IbsDcStToLdFwd;
+	unsigned char m_IbsDcStToLdCan;
+	unsigned char m_IbsDcUcMemAcc;
+	unsigned char m_IbsDcWcMemAcc;
+	unsigned char m_IbsDcLockedOp;
+	unsigned char m_IbsDcMabHit;
+};
+
+
+
+//
+// The following defines are
bit masks that are used to select +// IBS fetch event flags and values at the MSR level. +// +#define FETCH_MASK_LATENCY 0x0000FFFF +#define FETCH_MASK_COMPLETE 0x00040000 +#define FETCH_MASK_IC_MISS 0x00080000 +#define FETCH_MASK_PHY_ADDR 0x00100000 +#define FETCH_MASK_PG_SIZE 0x00600000 +#define FETCH_MASK_L1_MISS 0x00800000 +#define FETCH_MASK_L2_MISS 0x01000000 +#define FETCH_MASK_KILLED (FETCH_MASK_L1_MISS|FETCH_MASK_L2_MISS|FETCH_MASK_PHY_ADDR|FETCH_MASK_COMPLETE|FETCH_MASK_IC_MISS) + +enum IBSL1PAGESIZE +{ + L1TLB4K = 0, + L1TLB2M, + L1TLB1G, + L1TLB_Invalid +}; + + + +// +// The following defines are bit masks that are used to select +// IBS op event flags and values at the MSR level. +// + +// +// Masks for selecting raw IBS event bits/fields. +// +#define BR_MASK_RETIRE 0x0000FFFF +#define BR_MASK_BRN_RET 0x00000020 +#define BR_MASK_BRN_MISP 0x00000010 +#define BR_MASK_BRN_TAKEN 0x00000008 +#define BR_MASK_RETURN 0x00000004 +#define BR_MASK_MISP_RETURN 0x00000002 +#define BR_MASK_BRN_RESYNC 0x00000001 + +#define NB_MASK_L3_STATE 0x00000020 +#define NB_MASK_REQ_DST_PROC 0x00000010 +#define NB_MASK_REQ_DATA_SRC 0x00000007 + +#define DC_MASK_PHY_ADDR_VALID 0x00040000 +#define DC_MASK_LIN_ADDR_VALID 0x00020000 +#define DC_MASK_MAB_HIT 0x00010000 +#define DC_MASK_LOCKED_OP 0x00008000 +#define DC_MASK_WC_MEM_ACCESS 0x00004000 +#define DC_MASK_UC_MEM_ACCESS 0x00002000 +#define DC_MASK_ST_TO_LD_CANCEL 0x00001000 +#define DC_MASK_ST_TO_LD_FOR 0x00000800 +#define DC_MASK_ST_BANK_CONFLICT 0x00000400 +#define DC_MASK_LD_BANK_CONFLICT 0x00000200 +#define DC_MASK_MISALIGN_ACCESS 0x00000100 +#define DC_MASK_DC_MISS 0x00000080 +#define DC_MASK_L2_HIT_2M 0x00000040 +#define DC_MASK_L1_HIT_1G 0x00000020 +#define DC_MASK_L1_HIT_2M 0x00000010 +#define DC_MASK_L2_TLB_MISS 0x00000008 +#define DC_MASK_L1_TLB_MISS 0x00000004 +#define DC_MASK_STORE_OP 0x00000002 +#define DC_MASK_LOAD_OP 0x00000001 + + + +// +// IBS derived events are identified by event select values which are 
+// similar to the event select values that identify performance monitoring
+// counter (PMC) events. Event select values for IBS derived events begin
+// at 0xf000.
+//
+#define IBS_EVENT_BASE            0xf000
+#define OP_MAX_IBS_COUNTERS       600
+
+
+//
+// IBS derived events
+//
+// The definitions in this file *must* match definitions
+// of IBS derived events in gh-events.xml and in the
+// oprofile AMD Family 10h events file. More information
+// about IBS derived events is given in the Software Optimization
+// Guide for AMD Family 10h Processors.
+//
+
+//
+// The following defines associate a 16-bit select value with an IBS
+// derived fetch event.
+//
+#define DE_IBS_FETCH_ALL          0xF000
+#define DE_IBS_FETCH_KILLED       0xF001
+#define DE_IBS_FETCH_ATTEMPTED    0xF002
+#define DE_IBS_FETCH_COMPLETED    0xF003
+#define DE_IBS_FETCH_ABORTED      0xF004
+#define DE_IBS_L1_ITLB_HIT        0xF005
+#define DE_IBS_ITLB_L1M_L2H       0xF006
+#define DE_IBS_ITLB_L1M_L2M       0xF007
+#define DE_IBS_IC_MISS            0xF008
+#define DE_IBS_IC_HIT             0xF009
+#define DE_IBS_FETCH_4K_PAGE      0xF00A
+#define DE_IBS_FETCH_2M_PAGE      0xF00B
+#define DE_IBS_FETCH_1G_PAGE      0xF00C
+#define DE_IBS_FETCH_XX_PAGE      0xF00D
+#define DE_IBS_FETCH_LATENCY      0xF00E
+
+
+//
+// The following defines associate a 16-bit select value with an IBS
+// derived branch/return macro-op event.
+//
+#define DE_IBS_OP_ALL             0xF100
+#define DE_IBS_OP_TAG_TO_RETIRE   0xF101
+#define DE_IBS_OP_COMP_TO_RETIRE  0xF102
+#define DE_IBS_BRANCH_RETIRED     0xF103
+#define DE_IBS_BRANCH_MISP        0xF104
+#define DE_IBS_BRANCH_TAKEN       0xF105
+#define DE_IBS_BRANCH_MISP_TAKEN  0xF106
+#define DE_IBS_RETURN             0xF107
+#define DE_IBS_RETURN_MISP        0xF108
+#define DE_IBS_RESYNC             0xF109
+
+
+//
+// The following defines associate a 16-bit select value with an IBS
+// derived load/store event.
+// +#define DE_IBS_LS_ALL_OP 0xF200 +#define DE_IBS_LS_LOAD_OP 0xF201 +#define DE_IBS_LS_STORE_OP 0xF202 +#define DE_IBS_LS_DTLB_L1H 0xF203 +#define DE_IBS_LS_DTLB_L1M_L2H 0xF204 +#define DE_IBS_LS_DTLB_L1M_L2M 0xF205 +#define DE_IBS_LS_DC_MISS 0xF206 +#define DE_IBS_LS_DC_HIT 0xF207 +#define DE_IBS_LS_MISALIGNED 0xF208 +#define DE_IBS_LS_BNK_CONF_LOAD 0xF209 +#define DE_IBS_LS_BNK_CONF_STORE 0xF20A +#define DE_IBS_LS_STL_FORWARDED 0xF20B +#define DE_IBS_LS_STL_CANCELLED 0xF20C +#define DE_IBS_LS_UC_MEM_ACCESS 0xF20D +#define DE_IBS_LS_WC_MEM_ACCESS 0xF20E +#define DE_IBS_LS_LOCKED_OP 0xF20F +#define DE_IBS_LS_MAB_HIT 0xF210 +#define DE_IBS_LS_L1_DTLB_4K 0xF211 +#define DE_IBS_LS_L1_DTLB_2M 0xF212 +#define DE_IBS_LS_L1_DTLB_1G 0xF213 +#define DE_IBS_LS_L1_DTLB_RES 0xF214 +#define DE_IBS_LS_L2_DTLB_4K 0xF215 +#define DE_IBS_LS_L2_DTLB_2M 0xF216 +#define DE_IBS_LS_L2_DTLB_RES1 0xF217 +#define DE_IBS_LS_L2_DTLB_RES2 0xF218 +#define DE_IBS_LS_DC_LOAD_LAT 0xF219 + + +// +// The following defines associate a 16-bit select value with an IBS +// derived Northbridge (NB) event. +// +#define DE_IBS_NB_LOCAL 0xF240 +#define DE_IBS_NB_REMOTE 0xF241 +#define DE_IBS_NB_LOCAL_L3 0xF242 +#define DE_IBS_NB_LOCAL_CACHE 0xF243 +#define DE_IBS_NB_REMOTE_CACHE 0xF244 +#define DE_IBS_NB_LOCAL_DRAM 0xF245 +#define DE_IBS_NB_REMOTE_DRAM 0xF246 +#define DE_IBS_NB_LOCAL_OTHER 0xF247 +#define DE_IBS_NB_REMOTE_OTHER 0xF248 +#define DE_IBS_NB_CACHE_STATE_M 0xF249 +#define DE_IBS_NB_CACHE_STATE_O 0xF24A +#define DE_IBS_NB_LOCAL_LATENCY 0xF24B +#define DE_IBS_NB_REMOTE_LATENCY 0xF24C + + + +#define IBS_EVENT_TO_COUNTER(x) \ + (x - IBS_EVENT_BASE + OP_MAX_COUNTERS) + +#define COUNTER_TO_IBS_EVENT(x) \ + (x + IBS_EVENT_BASE - OP_MAX_COUNTERS) + +#define IS_IBS_FETCH(x) \ + (x < DE_IBS_OP_ALL) + +#endif |
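Editorial note on the macros that close opd_ibs.h: the header places IBS derived events at select values starting at 0xf000 and maps them onto counter slots that follow the regular performance counters. The sketch below illustrates that mapping under stated assumptions. OP_MAX_COUNTERS is defined elsewhere in OProfile and the value 8 here is only a placeholder; ibs_slot_for_event() is a hypothetical helper that is not part of the patch; and the macro arguments are parenthesized here, which the patch does not do.

```c
#include <stdio.h>

/* Assumption: placeholder value, the real one comes from OProfile headers. */
#define OP_MAX_COUNTERS      8

/* Values copied from the opd_ibs.h hunk in this patch. */
#define IBS_EVENT_BASE       0xf000
#define OP_MAX_IBS_COUNTERS  600
#define DE_IBS_FETCH_ALL     0xF000
#define DE_IBS_OP_ALL        0xF100
#define DE_IBS_LS_DC_MISS    0xF206

/* Same arithmetic as the patch, with the argument parenthesized. */
#define IBS_EVENT_TO_COUNTER(x) ((x) - IBS_EVENT_BASE + OP_MAX_COUNTERS)
#define COUNTER_TO_IBS_EVENT(x) ((x) + IBS_EVENT_BASE - OP_MAX_COUNTERS)
#define IS_IBS_FETCH(x)         ((x) < DE_IBS_OP_ALL)

/*
 * Hypothetical helper: return the counter slot for an IBS derived event,
 * or -1 if the select value lies outside the IBS range.
 */
int ibs_slot_for_event(unsigned int event)
{
	if (event < IBS_EVENT_BASE ||
	    event >= IBS_EVENT_BASE + OP_MAX_IBS_COUNTERS)
		return -1;
	return (int)IBS_EVENT_TO_COUNTER(event);
}

/* Print one line describing where an event would be tallied. */
void ibs_describe_event(unsigned int event)
{
	int slot = ibs_slot_for_event(event);

	if (slot < 0)
		printf("0x%X is not an IBS derived event\n", event);
	else
		printf("0x%X -> slot %d (%s)\n", event, slot,
		       IS_IBS_FETCH(event) ? "fetch" : "op");
}
```

Calling ibs_describe_event(DE_IBS_LS_DC_MISS) from a test program reports the slot for a load/store event. The highest select value in the header, 0xF24C, is offset 588 from the base, so it stays below OP_MAX_IBS_COUNTERS.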
From: Maynard J. <may...@us...> - 2008-07-23 14:28:24
|
Jason Yeh wrote:
> This patch contains daemon/kernel module interface change and daemon code processing IBS events.
>
Jason,
I know you plan on reworking this patch, so I'm waiting to do a full review. But a cursory glance at the changes to opd_trans.c makes me think that it might be best to separate out the IBS stuff into its own file, similar to what John suggested for my Cell SPU-specific changes. Please take a look at that technique (opd_spu.c) and see if it makes sense in this case, too.

Thanks.
-Maynard
> ---
>  opd_ibs.h       | 320 +++++++++++++++++++++
>  opd_interface.h | 4
>  opd_trans.c     | 844 +++++++++++++++++++++++++++++++++++++++++++++++++++++++-
>  opd_trans.h     | 3
>  4 files changed, 1157 insertions(+), 14 deletions(-)
>
>
> diff -uprN -X dontdiff oprofile-cvs-original/daemon/opd_interface.h oprofile-cvs-ibs/daemon/opd_interface.h
> --- oprofile-cvs-original/daemon/opd_interface.h 2008-01-28 16:05:28.000000000 -0600
> +++ oprofile-cvs-ibs/daemon/opd_interface.h 2008-01-29 09:39:46.000000000 -0600
> @@ -39,7 +39,9 @@
>  #define LAST_CODE 14
>  #else
>  #define DOMAIN_SWITCH_CODE 11
> -#define LAST_CODE 12
> +#define IBS_FETCH_SAMPLE 13
> +#define IBS_OP_SAMPLE 14
> +#define LAST_CODE 15
>  #endif
>
>  #endif /* OPD_INTERFACE_H */
> diff -uprN -X dontdiff oprofile-cvs-original/daemon/opd_trans.h oprofile-cvs-ibs/daemon/opd_trans.h
> --- oprofile-cvs-original/daemon/opd_trans.h 2008-01-28 16:05:28.000000000 -0600
> +++ oprofile-cvs-ibs/daemon/opd_trans.h 2008-01-29 09:39:46.000000000 -0600
> @@ -18,6 +18,7 @@
>
>  #include "opd_cookie.h"
>  #include "op_types.h"
> +#include "opd_ibs.h"
>
>  #include <stdint.h>
>
> @@ -54,6 +55,8 @@ struct transient {
>  	pid_t tid;
>  	pid_t tgid;
>  	uint64_t embedded_offset;
> +	struct ibs_fetch_sample * ibs_fetch;
> +	struct ibs_op_sample * ibs_op;
>  };
>
>  typedef void (*handler_t)(struct transient *);
> diff -uprN -X dontdiff oprofile-cvs-original/daemon/opd_trans.c oprofile-cvs-ibs/daemon/opd_trans.c
> ---
oprofile-cvs-original/daemon/opd_trans.c 2008-01-28 16:05:28.000000000 -0600 > +++ oprofile-cvs-ibs/daemon/opd_trans.c 2008-01-29 09:55:46.000000000 -0600 > @@ -14,6 +14,10 @@ > * Modified by Maynard Johnson <may...@us...> > * These modifications are: > * (C) Copyright IBM Corporation 2007 > + * > + * Modified by Jason Yeh and Paul Drongowski for AMD IBS > + * These modifications are: > + * Copyright (c) 2007 Advanced Micro Devices, Inc. > */ > > #include "opd_trans.h" > @@ -23,6 +27,7 @@ > #include "opd_stats.h" > #include "opd_printf.h" > #include "opd_interface.h" > +#include "opd_ibs.h" > > #include <limits.h> > #include <string.h> > @@ -50,7 +55,7 @@ void clear_trans_current(struct transien > > uint64_t pop_buffer_value(struct transient * trans) > { > - uint64_t val; > + uint64_t val = 0; > > if (!trans->remaining) { > fprintf(stderr, "BUG: popping empty buffer !\n"); > @@ -82,16 +87,629 @@ int enough_remaining(struct transient * > } > > > -static void opd_put_sample(struct transient * trans, unsigned long long pc) > +// Function: opd_decode_ibs_fetch > +// > +// This function decodes IBS hardware-level event flags and fields. It > +// effectively translates from the MSR encoding to the fields within > +// an ibs_fetch_sample -- an abstraction of the hardware-level data. > +// Event fields are either zero (false) or non-zero (true), except > +// the fetch latency, which is a 16-bit cycle count, and the fetch page size > +// field, which is a 2-bit unsigned integer. 
> +// > +static void opd_decode_ibs_fetch( > + struct ibs_fetch_sample const * raw_sample, > + struct decoded_ibs_fetch_sample * pIBSFetch) > { > - unsigned long long event; > + unsigned int ibs_fetch_ctl_high; > > - if (!enough_remaining(trans, 1)) { > - trans->remaining = 0; > - return; > + /* MSRC001_1030 IBS Fetch Control Register */ > + ibs_fetch_ctl_high = raw_sample->ibs_fetch_ctl_high; > + > + // Bits 47:32 IbsFetchLat: instruction fetch latency > + pIBSFetch->m_FetchLatency = (ibs_fetch_ctl_high & FETCH_MASK_LATENCY); > + > + // Bit 50 IbsFetchComp: instruction fetch complete. > + pIBSFetch->m_FetchCompletion = (ibs_fetch_ctl_high & FETCH_MASK_COMPLETE)!=0; > + > + // Bit 51 IbsIcMiss: instruction cache miss. > + pIBSFetch->m_InstCacheMiss = (ibs_fetch_ctl_high & FETCH_MASK_IC_MISS)!=0; > + > + // Bit 52 IbsPhyAddrValid: instruction fetch physical address valid. > + pIBSFetch->m_PhysicalAddrValid = (ibs_fetch_ctl_high & FETCH_MASK_PHY_ADDR)!=0; > + > + // Bits 54:53 IbsL1TlbPgSz: instruction cache L1TLB page size. > + pIBSFetch->m_TLBPageSize = ((ibs_fetch_ctl_high >> 21) & 0x3); > + > + // Bit 55 IbsL1TlbMiss: instruction cache L1TLB miss. > + pIBSFetch->m_L1TLBMiss = (ibs_fetch_ctl_high & FETCH_MASK_L1_MISS)!=0; > + > + // Bit 56 IbsL2TlbMiss: instruction cache L2TLB miss. > + pIBSFetch->m_L2TLBMiss = (ibs_fetch_ctl_high & FETCH_MASK_L2_MISS)!=0; > + > + // A fetch is a killed fetch if all the masked bits are clear > + pIBSFetch->m_Killed = (ibs_fetch_ctl_high & FETCH_MASK_KILLED)==0; > + > + pIBSFetch->m_InstCacheHit = (pIBSFetch->m_FetchCompletion && !pIBSFetch->m_InstCacheMiss); > + > + pIBSFetch->m_L1TLBHit = (!pIBSFetch->m_L1TLBMiss && pIBSFetch->m_PhysicalAddrValid); > + > + pIBSFetch->m_ITLB_L1M_L2H = (pIBSFetch->m_L1TLBMiss && !pIBSFetch->m_L2TLBMiss); > + > + pIBSFetch->m_ITLB_L1M_L2M = (pIBSFetch->m_L1TLBMiss && pIBSFetch->m_L2TLBMiss); > +} > + > + > +// > +// Log the specified IBS derived event. 
> +// > +static void opd_log_ibs_event(unsigned int event, > + struct transient * trans) > + { > + opd_stats[OPD_IBS_SAMPLE]++; > + trans->event = event; > + sfile_log_sample(trans); > +} > + > +// > +// Log the specified IBS cycle count. > +// > +static void opd_log_ibs_count(unsigned int event, > + struct transient * trans, unsigned int count) > +{ > + opd_stats[OPD_IBS_SAMPLE]++; > + trans->event = event; > + sfile_log_sample_count(trans, count); > +} > + > +// > +// Aggregate the IBS derived event. Increase the > +// derived event count by one. > +// > +#define AGG_IBS_EVENT(EV) \ > + { \ > + opd_log_ibs_event(EV, trans); \ > + } > + > +// > +// Aggregate the IBS latency/cycle counts. Increase the > +// derived event count by the specified count value. > +// > +#define AGG_IBS_COUNT(EV, COUNT) \ > + { \ > + opd_log_ibs_count(EV, trans, COUNT); \ > + } > + > + > + > +// > +// Function: opd_log_ibs_fetch > +// > +// This function converts IBS fetch event flags and values into > +// derived events. If the tagged (sampled) fetched caused a derived > +// event, the derived event is tallied. 
> +// > +static void opd_log_ibs_fetch(struct transient * trans) > +{ > + struct decoded_ibs_fetch_sample ibsFetchRec ; > + > + if (!trans->ibs_fetch) { > + verbprintf(vsamples, "DEBUG: no ibs_fetch: \n"); > + return; > + } > + > + // Decode the hardware-level IBS fetch information > + opd_decode_ibs_fetch(trans->ibs_fetch, &ibsFetchRec); > + > + // IBS all fetch samples (kills + attempts) > + AGG_IBS_EVENT(DE_IBS_FETCH_ALL) ; > + > + // IBS killed fetches ("case 0") -- All interesting event > + // flags are clear > + if (ibsFetchRec.m_Killed) > + { > + AGG_IBS_EVENT(DE_IBS_FETCH_KILLED) ; > + // Take an early out with IBS killed fetches, effectively > + // filtering killed fetches out of the other event counts > + return ; > + } > + > + // Any non-killed fetch is an attempted fetch > + AGG_IBS_EVENT(DE_IBS_FETCH_ATTEMPTED) ; > + > + if (ibsFetchRec.m_FetchCompletion) > + { > + // IBS Fetch Completed > + AGG_IBS_EVENT(DE_IBS_FETCH_COMPLETED) ; > + } > + else > + { > + // IBS Fetch Aborted > + AGG_IBS_EVENT(DE_IBS_FETCH_ABORTED) ; > + } > + > + // IBS L1 ITLB hit > + if (ibsFetchRec.m_L1TLBHit) > + { > + AGG_IBS_EVENT(DE_IBS_L1_ITLB_HIT) ; > + } > + > + // IBS L1 ITLB miss and L2 ITLB hit > + if (ibsFetchRec.m_ITLB_L1M_L2H) > + { > + AGG_IBS_EVENT(DE_IBS_ITLB_L1M_L2H) ; > + } > + > + // IBS L1 & L2 ITLB miss; complete ITLB miss > + if (ibsFetchRec.m_ITLB_L1M_L2M) > + { > + AGG_IBS_EVENT(DE_IBS_ITLB_L1M_L2M) ; > + } > + > + // IBS instruction cache miss > + if (ibsFetchRec.m_InstCacheMiss) > + { > + AGG_IBS_EVENT(DE_IBS_IC_MISS) ; > + } > + > + // IBS instruction cache hit > + if (ibsFetchRec.m_InstCacheHit) > + { > + AGG_IBS_EVENT(DE_IBS_IC_HIT) ; > + } > + > + // IBS page translations (only valid when physical address is valid) > + if (ibsFetchRec.m_PhysicalAddrValid) > + { > + switch (ibsFetchRec.m_TLBPageSize) > + { > + case L1TLB4K: > + AGG_IBS_EVENT(DE_IBS_FETCH_4K_PAGE) ; > + break; > + case L1TLB2M: > + AGG_IBS_EVENT(DE_IBS_FETCH_2M_PAGE) ; > + break; > + 
default: > + // DE_IBS_FETCH_1G_PAGE ; > + // DE_IBS_FETCH_XX_PAGE ; > + break; > + } > + } > + > + if (ibsFetchRec.m_FetchLatency) > + { > + AGG_IBS_COUNT(DE_IBS_FETCH_LATENCY, ibsFetchRec.m_FetchLatency) ; > } > +} > + > +// > +// Function: opd_decode_ibs_op > +// > +// This function translates IBS op event data from its hardware-level > +// representation to fields within a decoded_ibs_op_sample structure. It hides > +// the MSR layout of IBS op data. > +// > +static void opd_decode_ibs_op(struct ibs_op_sample const * raw_sample, > + struct decoded_ibs_op_sample * pIBSOp) > +{ > + register unsigned int ibs_op_data1_high = raw_sample->ibs_op_data1_high; > + register unsigned int ibs_op_data2_low = raw_sample->ibs_op_data2_low; > + register unsigned int ibs_op_data3_low = raw_sample->ibs_op_data3_low; > + > + > + // > + // MSRC001_1035 IBS OP Data Register (IbsOpData) > + // > + // 15:0 IbsCompToRetCtr: macro-op completion to retire count > + pIBSOp->m_CompToRetireCycles = (raw_sample->ibs_op_data1_low & BR_MASK_RETIRE); > + > + // 31:16 IbsTagToRetCtr: macro-op tag to retire count. > + pIBSOp->m_TagToRetireCycles = (raw_sample->ibs_op_data1_low >> 16) & BR_MASK_RETIRE; > + > + // 32 IbsOpBrnResync: resync macro-op. > + pIBSOp->m_OpBranchResync = (ibs_op_data1_high & BR_MASK_BRN_RESYNC) != 0 ; > + > + // 33 IbsOpMispReturn: mispredicted return macro-op. > + pIBSOp->m_OpMispredictedReturn = (ibs_op_data1_high & BR_MASK_MISP_RETURN) != 0 ; > + > + // 34 IbsOpReturn: return macro-op. > + pIBSOp->m_OpReturn = (ibs_op_data1_high & BR_MASK_RETURN) != 0 ; > + > + // 35 IbsOpBrnTaken: taken branch macro-op. > + pIBSOp->m_OpBranchTaken = (ibs_op_data1_high & BR_MASK_BRN_TAKEN) != 0 ; > + > + // 36 IbsOpBrnMisp: mispredicted branch macro-op. > + pIBSOp->m_OpBranchMispredicted = (ibs_op_data1_high & BR_MASK_BRN_MISP) != 0 ; > + > + // 37 IbsOpBrnRet: branch macro-op retired.
> + pIBSOp->m_OpBranchRetired = (ibs_op_data1_high & BR_MASK_BRN_RET) != 0 ; > + > + > + // > + // MSRC001_1036 IBS Op Data 2 Register (IbsOpData2) > + // > + // 5 NbIbsReqCacheHitSt: IBS L3 cache state > + pIBSOp->m_NbIbsCacheHitSt = (ibs_op_data2_low & NB_MASK_L3_STATE) != 0 ; > + > + // 4 NbIbsReqDstProc: IBS request destination processor > + pIBSOp->m_NbIbsReqDstProc = (ibs_op_data2_low & NB_MASK_REQ_DST_PROC) != 0 ; > + > + // 2:0 NbIbsReqSrc: Northbridge IBS request data source > + pIBSOp->m_NbIbsReqSrc = (ibs_op_data2_low & NB_MASK_REQ_DATA_SRC) ; > + > + > + // > + // MSRC001_1037 IBS Op Data3 Register > + // > + // Bits 47:32 IbsDcMissLat > + pIBSOp->m_IbsDcMissLat = raw_sample->ibs_op_data3_high & 0xFFFF; > + > + // 0 IbsLdOp: Load op > + pIBSOp->m_IbsLdOp = (ibs_op_data3_low & DC_MASK_LOAD_OP) != 0 ; > + > + // 1 IbsStOp: Store op > + pIBSOp->m_IbsStOp = (ibs_op_data3_low & DC_MASK_STORE_OP) != 0 ; > + > + // 2 IbsDcL1TlbMiss: Data cache L1TLB miss > + pIBSOp->m_IbsDcL1tlbMiss = (ibs_op_data3_low & DC_MASK_L1_TLB_MISS) != 0 ; > + > + // 3 IbsDcL2tlbMiss: Data cache L2TLB miss > + pIBSOp->m_IbsDcL2tlbMiss = (ibs_op_data3_low & DC_MASK_L2_TLB_MISS) != 0 ; > + > + // 4 IbsDcL1tlbHit2M: Data cache L1TLB hit in 2M page > + pIBSOp->m_IbsDcL1tlbHit2M = (ibs_op_data3_low & DC_MASK_L1_HIT_2M) != 0 ; > + > + // 5 IbsDcL1tlbHit1G: Data cache L1TLB hit in 1G page > + pIBSOp->m_IbsDcL1tlbHit1G = (ibs_op_data3_low & DC_MASK_L1_HIT_1G) != 0 ; > + > + // 6 IbsDcL2tlbHit2M: Data cache L2TLB hit in 2M page > + pIBSOp->m_IbsDcL2tlbHit2M = (ibs_op_data3_low & DC_MASK_L2_HIT_2M) != 0 ; > + > + // 7 IbsDcMiss: Data cache miss > + pIBSOp->m_IbsDcMiss = (ibs_op_data3_low & DC_MASK_DC_MISS) != 0 ; > + > + // 8 IbsDcMisAcc: Misaligned access > + pIBSOp->m_IbsDcMisAcc = (ibs_op_data3_low & DC_MASK_MISALIGN_ACCESS) != 0 ; > + > + // 9 IbsDcLdBnkCon: Bank conflict on load operation > + pIBSOp->m_IbsDcLdBnkCon = (ibs_op_data3_low & DC_MASK_LD_BANK_CONFLICT) != 0 ; > + > + // 10
IbsDcStBnkCon: Bank conflict on store operation > + pIBSOp->m_IbsDcStBnkCon = (ibs_op_data3_low & DC_MASK_ST_BANK_CONFLICT) != 0 ; > + > + // 11 IbsDcStToLdFwd: Data forwarded from store to load operation > + pIBSOp->m_IbsDcStToLdFwd = (ibs_op_data3_low & DC_MASK_ST_TO_LD_FOR) != 0 ; > + > + // 12 IbsDcStToLdCan: Data forwarding from store to load operation cancelled > + pIBSOp->m_IbsDcStToLdCan = (ibs_op_data3_low & DC_MASK_ST_TO_LD_CANCEL) != 0 ; > + > + // 13 IbsDcUcMemAcc: UC memory access > + pIBSOp->m_IbsDcUcMemAcc = (ibs_op_data3_low & DC_MASK_UC_MEM_ACCESS) != 0 ; > + > + // 14 IbsDcWcMemAcc: WC memory access > + pIBSOp->m_IbsDcWcMemAcc = (ibs_op_data3_low & DC_MASK_WC_MEM_ACCESS) != 0 ; > + > + // 15 IbsDcLockedOp: Locked operation > + pIBSOp->m_IbsDcLockedOp = (ibs_op_data3_low & DC_MASK_LOCKED_OP) != 0 ; > + > + // 16 IbsDcMabHit: MAB hit > + pIBSOp->m_IbsDcMabHit = (ibs_op_data3_low & DC_MASK_MAB_HIT) != 0 ; > + > + // 17 IbsDcLinAddrValid: Data cache linear address valid > + pIBSOp->m_IbsDcLinAddrValid = (ibs_op_data3_low & DC_MASK_LIN_ADDR_VALID) != 0 ; > + > + // 18 IbsDcPhyAddrValid: Data cache physical address valid > + pIBSOp->m_IbsDcPhyAddrValid = (ibs_op_data3_low & DC_MASK_PHY_ADDR_VALID) != 0 ; > +} > + > + > +// > +// Function: opd_log_ibs_op > +// > +// This function translates the IBS op event flags and values into > +// IBS op derived events. If an op derived event occurred, it is tallied.
> +// > +static void opd_log_ibs_op(struct transient * trans) > +{ > + struct decoded_ibs_op_sample ibsOpRec; > + unsigned int useL2TranslationSize = 0; > + > + if (!trans->ibs_op) { > + verbprintf(vsamples, "DEBUG: NO trans->ibs_op: \n"); > + return; > + } > + > + opd_decode_ibs_op(trans->ibs_op, &ibsOpRec) ; > + > + // All IBS op samples > + AGG_IBS_EVENT(DE_IBS_OP_ALL) ; > + > + // Tally retire cycle counts for all sampled macro-ops > + if (ibsOpRec.m_TagToRetireCycles) > + { > + // IBS tag to retire cycles > + AGG_IBS_COUNT(DE_IBS_OP_TAG_TO_RETIRE, ibsOpRec.m_TagToRetireCycles) ; > + } > + > + if (ibsOpRec.m_CompToRetireCycles) > + { > + // IBS completion to retire cycles > + AGG_IBS_COUNT(DE_IBS_OP_COMP_TO_RETIRE, ibsOpRec.m_CompToRetireCycles) ; > + } > + > + // Test for an IBS branch macro-op > + if (ibsOpRec.m_OpBranchRetired) > + { > + // IBS Branch retired op > + AGG_IBS_EVENT(DE_IBS_BRANCH_RETIRED) ; > + > + // Test branch-specific event flags > + if (ibsOpRec.m_OpBranchMispredicted) > + { > + // IBS mispredicted Branch op > + AGG_IBS_EVENT(DE_IBS_BRANCH_MISP) ; > + } > + > + if (ibsOpRec.m_OpBranchTaken) > + { > + // IBS taken Branch op > + AGG_IBS_EVENT(DE_IBS_BRANCH_TAKEN) ; > + } > + > + if (ibsOpRec.m_OpBranchTaken && ibsOpRec.m_OpBranchMispredicted) > + { > + // IBS mispredicted taken branch op > + AGG_IBS_EVENT(DE_IBS_BRANCH_MISP_TAKEN) ; > + } > + > + if (ibsOpRec.m_OpReturn) > + { > + // IBS return op > + AGG_IBS_EVENT(DE_IBS_RETURN) ; > + } > + > + if (ibsOpRec.m_OpReturn && ibsOpRec.m_OpBranchMispredicted) > + { > + // IBS mispredicted return op > + AGG_IBS_EVENT(DE_IBS_RETURN_MISP) ; > + } > + } // Branch and return op sample > + > + // Test for a resync macro-op > + if (ibsOpRec.m_OpBranchResync) > + { > + // IBS resync OP > + AGG_IBS_EVENT(DE_IBS_RESYNC) ; > + } > + > + if (!ibsOpRec.m_IbsLdOp && !ibsOpRec.m_IbsStOp) > + { > + // If no load or store operation, then take an early return > + // No more derived events need to be tallied > + 
return ; > + } > + > + // Count the number of LS op samples > + AGG_IBS_EVENT(DE_IBS_LS_ALL_OP) ; > + > + // Count and handle load ops > + if (ibsOpRec.m_IbsLdOp) > + { > + // Tally an IBS load derived event > + AGG_IBS_EVENT(DE_IBS_LS_LOAD_OP) ; > + // If the load missed in DC, tally the DC load miss latency > + if(ibsOpRec.m_IbsDcMiss) > + { > + // DC load miss latency is only reliable for load ops > + AGG_IBS_COUNT(DE_IBS_LS_DC_LOAD_LAT, ibsOpRec.m_IbsDcMissLat) ; > + } > + // Data forwarding info are valid only for load ops > + if(ibsOpRec.m_IbsDcStToLdFwd) > + { > + AGG_IBS_EVENT(DE_IBS_LS_STL_FORWARDED) ; > + } > + if(ibsOpRec.m_IbsDcStToLdCan) > + { > + AGG_IBS_EVENT(DE_IBS_LS_STL_CANCELLED) ; > + } > + // NB data is only guaranteed reliable for load operations > + // that miss in L1 and L2 cache. NB data arrives too late > + // to be reliable for store operations > + if (ibsOpRec.m_IbsDcMiss && (ibsOpRec.m_NbIbsReqSrc != 0)) > + { > + // NB data is valid, so tally derived NB events > + if( ibsOpRec.m_NbIbsReqDstProc ) > + { > + // Request was serviced by remote processor > + AGG_IBS_EVENT(DE_IBS_NB_REMOTE) ; > + AGG_IBS_COUNT(DE_IBS_NB_REMOTE_LATENCY, ibsOpRec.m_IbsDcMissLat) ; > + switch( ibsOpRec.m_NbIbsReqSrc ) > + { > + case 0x2: > + { > + AGG_IBS_EVENT(DE_IBS_NB_REMOTE_CACHE) ; > + if (ibsOpRec.m_NbIbsCacheHitSt) > + { > + AGG_IBS_EVENT(DE_IBS_NB_CACHE_STATE_O) ; > + } else { > + AGG_IBS_EVENT(DE_IBS_NB_CACHE_STATE_M) ; > + } > + break ; > + } > + case 0x3: > + { > + AGG_IBS_EVENT(DE_IBS_NB_REMOTE_DRAM) ; > + break ; > + } > + case 0x7: > + { > + AGG_IBS_EVENT(DE_IBS_NB_REMOTE_OTHER) ; > + break ; > + } > + default: > + { > + break ; > + } > + } > + } else { > + // Request was serviced by local processor > + AGG_IBS_EVENT(DE_IBS_NB_LOCAL) ; > + AGG_IBS_COUNT(DE_IBS_NB_LOCAL_LATENCY, ibsOpRec.m_IbsDcMissLat) ; > + switch( ibsOpRec.m_NbIbsReqSrc ) > + { > + case 0x1: > + { > + AGG_IBS_EVENT(DE_IBS_NB_LOCAL_L3) ; > + break ; > + } > + case 0x2: > + { > + 
AGG_IBS_EVENT(DE_IBS_NB_LOCAL_CACHE) ; > + if (ibsOpRec.m_NbIbsCacheHitSt) > + { > + AGG_IBS_EVENT(DE_IBS_NB_CACHE_STATE_O) ; > + } else { > + AGG_IBS_EVENT(DE_IBS_NB_CACHE_STATE_M) ; > + } > + break ; > + } > + case 0x3: > + { > + AGG_IBS_EVENT(DE_IBS_NB_LOCAL_DRAM) ; > + break ; > + } > + case 0x7: > + { > + AGG_IBS_EVENT(DE_IBS_NB_LOCAL_OTHER) ; > + break ; > + } > + default: > + { > + break ; > + } > + } // Northbridge request source > + } // Northbridge local/remote > + } // Northbridge status > + } // Load operations > + > + // Count and handle store operations > + if (ibsOpRec.m_IbsStOp) > + { > + AGG_IBS_EVENT(DE_IBS_LS_STORE_OP) ; > + } > + > + if (ibsOpRec.m_IbsDcMiss) > + { > + AGG_IBS_EVENT(DE_IBS_LS_DC_MISS) ; > + } else { > + AGG_IBS_EVENT(DE_IBS_LS_DC_HIT) ; > + } > + > + if (ibsOpRec.m_IbsDcMisAcc) > + { > + AGG_IBS_EVENT(DE_IBS_LS_MISALIGNED) ; > + } > + > + if (ibsOpRec.m_IbsDcLdBnkCon) > + { > + AGG_IBS_EVENT(DE_IBS_LS_BNK_CONF_LOAD) ; > + } > + > + if (ibsOpRec.m_IbsDcStBnkCon) > + { > + AGG_IBS_EVENT(DE_IBS_LS_BNK_CONF_STORE) ; > + } > + > + if (ibsOpRec.m_IbsDcUcMemAcc) > + { > + AGG_IBS_EVENT(DE_IBS_LS_UC_MEM_ACCESS) ; > + } > + > + if (ibsOpRec.m_IbsDcWcMemAcc) > + { > + AGG_IBS_EVENT(DE_IBS_LS_WC_MEM_ACCESS) ; > + } > + > + if (ibsOpRec.m_IbsDcLockedOp) > + { > + AGG_IBS_EVENT(DE_IBS_LS_LOCKED_OP) ; > + } > + > + if (ibsOpRec.m_IbsDcMabHit) > + { > + AGG_IBS_EVENT(DE_IBS_LS_MAB_HIT) ; > + } > + > + // IbsDcLinAddrValid is true when address translation was successful. > + // Some macro-ops do not perform an address translation and use only > + // a physical address. > + if (ibsOpRec.m_IbsDcLinAddrValid) > + { > + if (! 
ibsOpRec.m_IbsDcL1tlbMiss) > + { > + // L1 DTLB hit -- This is the most frequent case > + AGG_IBS_EVENT(DE_IBS_LS_DTLB_L1H) ; > + } else if (ibsOpRec.m_IbsDcL2tlbMiss) > + { > + // L1 DTLB miss, L2 DTLB miss > + AGG_IBS_EVENT(DE_IBS_LS_DTLB_L1M_L2M) ; > + } else { > + // L1 DTLB miss, L2 DTLB hit > + AGG_IBS_EVENT(DE_IBS_LS_DTLB_L1M_L2H) ; > + useL2TranslationSize = 1 ; > + } > + if (useL2TranslationSize) > + { > + // L2 DTLB page translation > + if (ibsOpRec.m_IbsDcL2tlbHit2M) > + { > + // 2M L2 DTLB page translation > + AGG_IBS_EVENT(DE_IBS_LS_L2_DTLB_2M) ; > + } else { > + // 4K L2 DTLB page translation > + AGG_IBS_EVENT(DE_IBS_LS_L2_DTLB_4K) ; > + } > + } else { > + // L1 DTLB page translation > + if (ibsOpRec.m_IbsDcL1tlbHit2M) > + { > + // 2M L1 DTLB page translation > + AGG_IBS_EVENT(DE_IBS_LS_L1_DTLB_2M) ; > + } else if (ibsOpRec.m_IbsDcL1tlbHit1G) > + { > + // 1G L1 DTLB page translation > + AGG_IBS_EVENT(DE_IBS_LS_L1_DTLB_1G) ; > + } else { > + // This is the most common case, unfortunately > + AGG_IBS_EVENT(DE_IBS_LS_L1_DTLB_4K) ; > + } > + } > + } // Page translation size > +} > + > + > +static void opd_put_sample(struct transient * trans, unsigned long long pc) > +{ > + unsigned long long event = 0; > > - event = pop_buffer_value(trans); > + if (!trans->ibs_fetch && !trans->ibs_op) { > + if (!enough_remaining(trans, 1)) { > + verbprintf(vibs_debug, "not enough remaining\n"); > + trans->remaining = 0; > + return; > + } > + event = pop_buffer_value(trans); > + } > + > + /* IBS can generate samples with no valid dcookie and > + * in kernel address range. Map such samples to vmlinux > + * only if the user either specifies a range, or vmlinux. 
> + */ > + if ((trans->ibs_fetch || trans->ibs_op) > + && trans->cookie == INVALID_COOKIE > + && find_kernel_image(trans)) > + { > + trans->in_kernel = 1; > + } > > if (trans->tracing != TRACING_ON) > trans->event = event; > @@ -116,6 +734,19 @@ static void opd_put_sample(struct transi > if (!trans->current) > goto out; > > + /* check if the current sample belongs to IBS */ > + if (trans->ibs_fetch) { > + if(!trans->anon) > + trans->pc = trans->ibs_fetch->rip; > + opd_log_ibs_fetch(trans); > + goto out; > + } else if (trans->ibs_op) { > + if(!trans->anon) > + trans->pc = trans->ibs_op->rip; > + opd_log_ibs_op(trans); > + goto out; > + } > + > /* FIXME: this logic is perhaps too harsh? */ > if (trans->current->ignored || (trans->last && trans->last->ignored)) > goto out; > @@ -123,6 +754,7 @@ static void opd_put_sample(struct transi > /* log the sample or arc */ > sfile_log_sample(trans); > > + > out: > /* switch to trace mode */ > if (trans->tracing == TRACING_START) > @@ -132,6 +764,11 @@ out: > } > > > +// > +// Function: code_unknown > +// > +// Call this function when an unknown escape code is encountered. > +// > static void code_unknown(struct transient * trans __attribute__((unused))) > { > fprintf(stderr, "Unknown code !\n"); > @@ -139,6 +776,19 @@ static void code_unknown(struct transien > } > > > +// > +// Function: code_ctx_switch > +// > +// Handle a context switch escape code sequence. The event buffer entries > +// for a context switch are: > +// ESCAPE_CODE > +// CTX_SWITCH_CODE > +// Process ID (PID) > +// Cookie > +// ESCAPE_CODE > +// CTX_TGID_CODE > +// Task group ID (TGID) > +// > static void code_ctx_switch(struct transient * trans) > { > clear_trans_current(trans); > @@ -160,12 +810,21 @@ static void code_ctx_switch(struct trans > if (vmisc) { > char const * app = find_cookie(trans->app_cookie); > printf("CTX_SWITCH to tid %lu, tgid %lu, cookie %llx(%s)\n", > - (unsigned long)trans->tid, (unsigned long)trans->tgid, > - trans->app_cookie, app ? 
app : "none"); > + (unsigned long)trans->tid, (unsigned long)trans->tgid, > + trans->app_cookie, app ? app : "none"); > } > } > > > +// > +// Function: code_cpu_switch > +// > +// Handle a CPU switch escape code sequence. The event buffer entries for > +// a CPU switch are: > +// ESCAPE_CODE > +// CPU_SWITCH_CODE > +// CPU number > +// > static void code_cpu_switch(struct transient * trans) > { > clear_trans_current(trans); > @@ -180,6 +839,15 @@ static void code_cpu_switch(struct trans > } > > > +// > +// Function: code_cookie_switch > +// > +// Handle a cookie switch escape code sequence. The event buffer entries > +// for a cookie switch are: > +// ESCAPE_CODE > +// COOKIE_SWITCH_CODE > +// Cookie > +// > static void code_cookie_switch(struct transient * trans) > { > clear_trans_current(trans); > @@ -193,12 +861,17 @@ static void code_cookie_switch(struct tr > > if (vmisc) { > char const * name = verbose_cookie(trans->cookie); > - verbprintf(vmisc, "COOKIE_SWITCH to cookie %s(%llx)\n", > - name, trans->cookie); > + verbprintf(vibs_debug, "COOKIE_SWITCH to cookie %s(%llx)\n", > + name, trans->cookie); > } > } > > > +// > +// Function: code_kernel_enter > +// > +// Handle a kernel entry escape code sequence. > +// > static void code_kernel_enter(struct transient * trans) > { > verbprintf(vmisc, "KERNEL_ENTER_SWITCH to kernel\n"); > @@ -230,6 +903,143 @@ static void code_module_loaded(struct tr > } > > > +// > +// Function: code_ibs_fetch_sample > +// > +// Handle an IBS fetch sample escape code sequence. An IBS fetch sample > +// is represented as an escape code sequence. (See the comment for the > +// function code_ibs_op_sample() for the sequence of entries in the event > +// buffer.) When this function is called, the ESCAPE_CODE and IBS_FETCH_CODE > +// have already been removed from the event buffer. Thus, 7 more event buffer > +// entries are needed in order to process a complete IBS fetch sample. 
> +// > +static void code_ibs_fetch_sample(struct transient * trans) > +{ > + if (!enough_remaining(trans, 7)) { > + verbprintf(vibs_debug, "not enough remaining\n"); > + trans->remaining = 0; > + return; > + } > + > + trans->ibs_fetch = malloc(sizeof(struct ibs_fetch_sample)); > + if (!trans->ibs_fetch) { > + verbprintf(vibs_debug, "DEBUG: IBS Out of Memory\n"); > + abort(); > + } > + > + trans->ibs_fetch->rip = pop_buffer_value(trans); > + > + trans->ibs_fetch->ibs_fetch_lin_addr_low = pop_buffer_value(trans); > + trans->ibs_fetch->ibs_fetch_lin_addr_high = pop_buffer_value(trans); > + > + trans->ibs_fetch->ibs_fetch_ctl_low = pop_buffer_value(trans); > + trans->ibs_fetch->ibs_fetch_ctl_high = pop_buffer_value(trans); > + trans->ibs_fetch->ibs_fetch_phys_addr_low = pop_buffer_value(trans); > + trans->ibs_fetch->ibs_fetch_phys_addr_high = pop_buffer_value(trans); > + > + verbprintf(vsamples, > + "FETCH_X CP:%ld ID:%ld IP:%lx FL:%x LAT:%d P_HI:%x P_LO:%x L_HI:%x L_LO:%x\n", > + trans->cpu, > + (long)trans->tgid, > + trans->ibs_fetch->rip, > + (trans->ibs_fetch->ibs_fetch_ctl_high >> 16) & 0x3FF, > + (trans->ibs_fetch->ibs_fetch_ctl_high) & 0xFFFF, > + trans->ibs_fetch->ibs_fetch_phys_addr_high, > + trans->ibs_fetch->ibs_fetch_phys_addr_low, > + trans->ibs_fetch->ibs_fetch_lin_addr_high, > + trans->ibs_fetch->ibs_fetch_lin_addr_low) ; > + > + opd_put_sample(trans, trans->ibs_fetch->rip); > + > + free(trans->ibs_fetch); > + trans->ibs_fetch = NULL; > +} > + > + > +// > +// Function: code_ibs_op_sample > +// > +// Handle an IBS op sample escape code sequence. 
An IBS op sample > +// is represented as an escape code sequence: > +// > +// IBS fetch IBS op > +// --------------- ---------------- > +// ESCAPE_CODE ESCAPE_CODE > +// IBS_FETCH_CODE IBS_OP_CODE > +// Offset Offset > +// IbsFetchLinAd low IbsOpRip low <-- Logical (virtual) RIP > +// IbsFetchLinAd high IbsOpRip high <-- Logical (virtual) RIP > +// IbsFetchCtl low IbsOpData low > +// IbsFetchCtl high IbsOpData high > +// IbsFetchPhysAd low IbsOpData2 low > +// IbsFetchPhysAd high IbsOpData2 high > +// IbsOpData3 low > +// IbsOpData3 high > +// IbsDcLinAd low > +// IbsDcLinAd high > +// IbsDcPhysAd low > +// IbsDcPhysAd high > +// > +// When this function is called, the ESCAPE_CODE and IBS_OP_CODE have > +// already been removed from the event buffer. Thus, 13 more event buffer > +// entries are needed to process a complete IBS op sample. > +// > +// The IbsFetchLinAd and IbsOpRip are the linear (virtual) addresses > +// that were generated by the IBS hardware. These addresses are mapped > +// into the offset. 
> +// > +static void code_ibs_op_sample(struct transient * trans) > +{ > + verbprintf(vmodule, "IBS_OP_SAMPLE_CODE\n"); > + > + if (!enough_remaining(trans, 13)) { > + verbprintf(vibs_debug, "not enough remaining\n"); > + trans->remaining = 0; > + return; > + } > + > + trans->ibs_op = malloc(sizeof(struct ibs_op_sample)); > + if (!trans->ibs_op) { > + verbprintf(vibs_debug, "DEBUG: IBS Out of Memory\n"); > + abort(); > + } > + > + trans->ibs_op->rip = pop_buffer_value(trans); > + > + trans->ibs_op->ibs_op_lin_addr_low = pop_buffer_value(trans); > + trans->ibs_op->ibs_op_lin_addr_high = pop_buffer_value(trans); > + > + trans->ibs_op->ibs_op_data1_low = pop_buffer_value(trans); > + trans->ibs_op->ibs_op_data1_high = pop_buffer_value(trans); > + trans->ibs_op->ibs_op_data2_low = pop_buffer_value(trans); > + trans->ibs_op->ibs_op_data2_high = pop_buffer_value(trans); > + trans->ibs_op->ibs_op_data3_low = pop_buffer_value(trans); > + trans->ibs_op->ibs_op_data3_high = pop_buffer_value(trans); > + trans->ibs_op->ibs_op_ldst_linaddr_low = pop_buffer_value(trans); > + trans->ibs_op->ibs_op_ldst_linaddr_high = pop_buffer_value(trans); > + trans->ibs_op->ibs_op_phys_addr_low = pop_buffer_value(trans); > + trans->ibs_op->ibs_op_phys_addr_high = pop_buffer_value(trans); > + > + verbprintf(vsamples, > + "IBS_OP_X CP:%ld ID:%d IP:%lx D1HI:%x D1LO:%x D2LO:%x D3HI:%x D3LO:%x LALO:%x PALO:%x\n", > + trans->cpu, > + trans->tgid, > + trans->ibs_op->rip, > + trans->ibs_op->ibs_op_data1_high, > + trans->ibs_op->ibs_op_data1_low, > + trans->ibs_op->ibs_op_data2_low, > + trans->ibs_op->ibs_op_data3_high, > + trans->ibs_op->ibs_op_data3_low, > + trans->ibs_op->ibs_op_ldst_linaddr_low, > + trans->ibs_op->ibs_op_phys_addr_low); > + > + opd_put_sample(trans, trans->ibs_op->rip); > + > + free(trans->ibs_op); > + trans->ibs_op = NULL; > +} > + > + > /* > * This also implicitly signals the end of the previous > * trace, so we never explicitly set TRACING_OFF when > @@ -274,8 +1084,13 @@ 
handler_t handlers[LAST_CODE + 1] = { > #if defined(__powerpc__) > &code_spu_profiling, > &code_spu_ctx_switch, > -#endif > &code_unknown, > +#else > + &code_unknown, > + &code_unknown, > + &code_ibs_fetch_sample, > + &code_ibs_op_sample, > +#endif > }; > > extern void (*special_processor)(struct transient *); > @@ -299,7 +1114,9 @@ void opd_process_samples(char const * bu > .cpu = -1, > .tid = -1, > .embedded_offset = UNUSED_EMBEDDED_OFFSET, > - .tgid = -1 > + .tgid = -1, > + .ibs_fetch = NULL, > + .ibs_op = NULL > }; > > /* FIXME: was uint64_t but it can't compile on alpha where uint64_t > @@ -338,3 +1155,4 @@ void opd_process_samples(char const * bu > handlers[code](&trans); > } > } > + > diff -uprN -X dontdiff oprofile-cvs-original/daemon/opd_ibs.h oprofile-cvs-ibs/daemon/opd_ibs.h > --- oprofile-cvs-original/daemon/opd_ibs.h 1969-12-31 18:00:00.000000000 -0600 > +++ oprofile-cvs-ibs/daemon/opd_ibs.h 2008-01-29 09:39:46.000000000 -0600 > @@ -0,0 +1,320 @@ > +/* > + * @file opd_ibs.h > + * AMD Family10h Instruction Based Sampling (IBS) handling. > + * > + * @remark Copyright 2007 OProfile authors > + * @remark Read the file COPYING > + * > + * @author Jason Yeh > + * @author Paul Drongowski > + */ > + > +#ifndef OPD_IBS_H > +#define OPD_IBS_H > + > +#include <stdint.h> > + > +// > +// IBS information is processed in two steps. The first step decodes > +// hardware-level IBS information and saves it in decoded form. The > +// second step translates the decoded IBS information into IBS derived > +// events. IBS information is tallied and is reported as derived events. > +// > + > + > + > +// > +// This struct represents the hardware-level IBS fetch information. > +// Each field corresponds to a model-specific register (MSR.) See the > +// BIOS and Kernel Developer's Guide for AMD Model Family 10h Processors > +// for further details. 
> +// > +struct ibs_fetch_sample { > + unsigned long int rip; > + /* MSRC001_1030 IBS Fetch Control Register */ > + unsigned int ibs_fetch_ctl_low; > + unsigned int ibs_fetch_ctl_high; > + /* MSRC001_1031 IBS Fetch Linear Address Register */ > + unsigned int ibs_fetch_lin_addr_low; > + unsigned int ibs_fetch_lin_addr_high; > + /* MSRC001_1032 IBS Fetch Physical Address Register */ > + unsigned int ibs_fetch_phys_addr_low; > + unsigned int ibs_fetch_phys_addr_high; > + unsigned int dummy_event; > +}; > + > + > + > +// > +// This struct is an abstraction of IBS fetch information. It hides > +// the hardware-level model-specific register (MSR) layout. The char > +// fields hold Boolean values. > +// > +struct decoded_ibs_fetch_sample { > + unsigned short m_FetchLatency; > + unsigned short m_TLBPageSize; > + unsigned char m_PhysicalAddrValid; > + unsigned char m_L2TLBMiss; > + unsigned char m_L1TLBMiss; > + unsigned char m_InstCacheMiss; > + unsigned char m_InstCacheHit; > + unsigned char m_FetchCompletion; > + unsigned char m_L1TLBHit; > + unsigned char m_ITLB_L1M_L2H; > + unsigned char m_ITLB_L1M_L2M; > + unsigned char m_Killed; > +}; > + > + > + > + > +// > +// This struct represents the hardware-level IBS op information. 
> +// > +struct ibs_op_sample { > + unsigned long int rip; > + /* MSRC001_1034 IBS Op Logical Address Register */ > + unsigned int ibs_op_lin_addr_low; > + unsigned int ibs_op_lin_addr_high; > + /* MSRC001_1035 IBS Op Data Register */ > + unsigned int ibs_op_data1_low; > + unsigned int ibs_op_data1_high; > + /* MSRC001_1036 IBS Op Data 2 Register */ > + unsigned int ibs_op_data2_low; > + unsigned int ibs_op_data2_high; > + /* MSRC001_1037 IBS Op Data 3 Register */ > + unsigned int ibs_op_data3_low; > + unsigned int ibs_op_data3_high; > + unsigned int ibs_op_ldst_linaddr_low; > + unsigned int ibs_op_ldst_linaddr_high; > + unsigned int ibs_op_phys_addr_low; > + unsigned int ibs_op_phys_addr_high; > +}; > + > + > + > +// > +// This struct is an abstraction of the IBS op information. It hides > +// the hardware-level MSR layout. The char fields hold Boolean values > +// except m_NbIbsReqSrc, which is a 3-bit field. > +// > +struct decoded_ibs_op_sample { > + unsigned short m_TagToRetireCycles; > + unsigned short m_CompToRetireCycles; > + unsigned short m_IbsDcMissLat; > + unsigned char m_NbIbsReqSrc; > + unsigned char m_NbIbsCacheHitSt; > + unsigned char m_NbIbsReqDstProc; > + unsigned char m_OpBranchRetired; > + unsigned char m_OpBranchMispredicted; > + unsigned char m_OpBranchTaken; > + unsigned char m_OpMispredictedReturn; > + unsigned char m_OpBranchResync; > + unsigned char m_OpReturn; > + unsigned char m_IbsLdOp; > + unsigned char m_IbsStOp; > + unsigned char m_IbsDcLinAddrValid; > + unsigned char m_IbsDcPhyAddrValid; > + unsigned char m_IbsDcL1tlbMiss; > + unsigned char m_IbsDcL2tlbMiss; > + unsigned char m_IbsDcL1tlbHit2M; > + unsigned char m_IbsDcL1tlbHit1G; > + unsigned char m_IbsDcL2tlbHit2M; > + unsigned char m_IbsDcMiss; > + unsigned char m_IbsDcMisAcc; > + unsigned char m_IbsDcLdBnkCon; > + unsigned char m_IbsDcStBnkCon; > + unsigned char m_IbsDcStToLdFwd; > + unsigned char m_IbsDcStToLdCan; > + unsigned char m_IbsDcUcMemAcc; > + unsigned char
m_IbsDcWcMemAcc; > + unsigned char m_IbsDcLockedOp; > + unsigned char m_IbsDcMabHit; > +}; > + > + > + > +// > +// The following defines are bit masks that are used to select > +// IBS fetch event flags and values at the MSR level. > +// > +#define FETCH_MASK_LATENCY 0x0000FFFF > +#define FETCH_MASK_COMPLETE 0x00040000 > +#define FETCH_MASK_IC_MISS 0x00080000 > +#define FETCH_MASK_PHY_ADDR 0x00100000 > +#define FETCH_MASK_PG_SIZE 0x00600000 > +#define FETCH_MASK_L1_MISS 0x00800000 > +#define FETCH_MASK_L2_MISS 0x01000000 > +#define FETCH_MASK_KILLED (FETCH_MASK_L1_MISS|FETCH_MASK_L2_MISS|FETCH_MASK_PHY_ADDR|FETCH_MASK_COMPLETE|FETCH_MASK_IC_MISS) > + > +enum IBSL1PAGESIZE > +{ > + L1TLB4K = 0, > + L1TLB2M, > + L1TLB1G, > + L1TLB_Invalid > +}; > + > + > + > +// > +// The following defines are bit masks that are used to select > +// IBS op event flags and values at the MSR level. > +// > + > +// > +// Masks for selecting raw IBS event bits/fields. > +// > +#define BR_MASK_RETIRE 0x0000FFFF > +#define BR_MASK_BRN_RET 0x00000020 > +#define BR_MASK_BRN_MISP 0x00000010 > +#define BR_MASK_BRN_TAKEN 0x00000008 > +#define BR_MASK_RETURN 0x00000004 > +#define BR_MASK_MISP_RETURN 0x00000002 > +#define BR_MASK_BRN_RESYNC 0x00000001 > + > +#define NB_MASK_L3_STATE 0x00000020 > +#define NB_MASK_REQ_DST_PROC 0x00000010 > +#define NB_MASK_REQ_DATA_SRC 0x00000007 > + > +#define DC_MASK_PHY_ADDR_VALID 0x00040000 > +#define DC_MASK_LIN_ADDR_VALID 0x00020000 > +#define DC_MASK_MAB_HIT 0x00010000 > +#define DC_MASK_LOCKED_OP 0x00008000 > +#define DC_MASK_WC_MEM_ACCESS 0x00004000 > +#define DC_MASK_UC_MEM_ACCESS 0x00002000 > +#define DC_MASK_ST_TO_LD_CANCEL 0x00001000 > +#define DC_MASK_ST_TO_LD_FOR 0x00000800 > +#define DC_MASK_ST_BANK_CONFLICT 0x00000400 > +#define DC_MASK_LD_BANK_CONFLICT 0x00000200 > +#define DC_MASK_MISALIGN_ACCESS 0x00000100 > +#define DC_MASK_DC_MISS 0x00000080 > +#define DC_MASK_L2_HIT_2M 0x00000040 > +#define DC_MASK_L1_HIT_1G 0x00000020 > +#define 
DC_MASK_L1_HIT_2M 0x00000010 > +#define DC_MASK_L2_TLB_MISS 0x00000008 > +#define DC_MASK_L1_TLB_MISS 0x00000004 > +#define DC_MASK_STORE_OP 0x00000002 > +#define DC_MASK_LOAD_OP 0x00000001 > + > + > + > +// > +// IBS derived events are identified by event select values which are > +// similar to the event select values that identify performance monitoring > +// counter (PMC) events. Event select values for IBS derived events begin > +// at 0xf000. > +// > +#define IBS_EVENT_BASE 0xf000 > +#define OP_MAX_IBS_COUNTERS 600 > + > + > +// > +// IBS derived events > +// > +// The definitions in this file *must* match definitions > +// of IBS derived events in gh-events.xml and in the > +// oprofile AMD Family 10h events file. More information > +// about IBS derived events is given in the Software Optimization > +// Guide for AMD Family 10h Processors. > +// > + > +// > +// The following defines associate a 16-bit select value with an IBS > +// derived fetch event. > +// > +#define DE_IBS_FETCH_ALL 0xF000 > +#define DE_IBS_FETCH_KILLED 0xF001 > +#define DE_IBS_FETCH_ATTEMPTED 0xF002 > +#define DE_IBS_FETCH_COMPLETED 0xF003 > +#define DE_IBS_FETCH_ABORTED 0xF004 > +#define DE_IBS_L1_ITLB_HIT 0xF005 > +#define DE_IBS_ITLB_L1M_L2H 0xF006 > +#define DE_IBS_ITLB_L1M_L2M 0xF007 > +#define DE_IBS_IC_MISS 0xF008 > +#define DE_IBS_IC_HIT 0xF009 > +#define DE_IBS_FETCH_4K_PAGE 0xF00A > +#define DE_IBS_FETCH_2M_PAGE 0xF00B > +#define DE_IBS_FETCH_1G_PAGE 0xF00C > +#define DE_IBS_FETCH_XX_PAGE 0xF00D > +#define DE_IBS_FETCH_LATENCY 0xF00E > + > + > +// > +// The following defines associate a 16-bit select value with an IBS > +// derived branch/return macro-op event.
> +// > +#define DE_IBS_OP_ALL 0xF100 > +#define DE_IBS_OP_TAG_TO_RETIRE 0xF101 > +#define DE_IBS_OP_COMP_TO_RETIRE 0xF102 > +#define DE_IBS_BRANCH_RETIRED 0xF103 > +#define DE_IBS_BRANCH_MISP 0xF104 > +#define DE_IBS_BRANCH_TAKEN 0xF105 > +#define DE_IBS_BRANCH_MISP_TAKEN 0xF106 > +#define DE_IBS_RETURN 0xF107 > +#define DE_IBS_RETURN_MISP 0xF108 > +#define DE_IBS_RESYNC 0xF109 > + > + > +// > +// The following defines associate a 16-bit select value with an IBS > +// derived load/store event. > +// > +#define DE_IBS_LS_ALL_OP 0xF200 > +#define DE_IBS_LS_LOAD_OP 0xF201 > +#define DE_IBS_LS_STORE_OP 0xF202 > +#define DE_IBS_LS_DTLB_L1H 0xF203 > +#define DE_IBS_LS_DTLB_L1M_L2H 0xF204 > +#define DE_IBS_LS_DTLB_L1M_L2M 0xF205 > +#define DE_IBS_LS_DC_MISS 0xF206 > +#define DE_IBS_LS_DC_HIT 0xF207 > +#define DE_IBS_LS_MISALIGNED 0xF208 > +#define DE_IBS_LS_BNK_CONF_LOAD 0xF209 > +#define DE_IBS_LS_BNK_CONF_STORE 0xF20A > +#define DE_IBS_LS_STL_FORWARDED 0xF20B > +#define DE_IBS_LS_STL_CANCELLED 0xF20C > +#define DE_IBS_LS_UC_MEM_ACCESS 0xF20D > +#define DE_IBS_LS_WC_MEM_ACCESS 0xF20E > +#define DE_IBS_LS_LOCKED_OP 0xF20F > +#define DE_IBS_LS_MAB_HIT 0xF210 > +#define DE_IBS_LS_L1_DTLB_4K 0xF211 > +#define DE_IBS_LS_L1_DTLB_2M 0xF212 > +#define DE_IBS_LS_L1_DTLB_1G 0xF213 > +#define DE_IBS_LS_L1_DTLB_RES 0xF214 > +#define DE_IBS_LS_L2_DTLB_4K 0xF215 > +#define DE_IBS_LS_L2_DTLB_2M 0xF216 > +#define DE_IBS_LS_L2_DTLB_RES1 0xF217 > +#define DE_IBS_LS_L2_DTLB_RES2 0xF218 > +#define DE_IBS_LS_DC_LOAD_LAT 0xF219 > + > + > +// > +// The following defines associate a 16-bit select value with an IBS > +// derived Northbridge (NB) event. 
> +// > +#define DE_IBS_NB_LOCAL 0xF240 > +#define DE_IBS_NB_REMOTE 0xF241 > +#define DE_IBS_NB_LOCAL_L3 0xF242 > +#define DE_IBS_NB_LOCAL_CACHE 0xF243 > +#define DE_IBS_NB_REMOTE_CACHE 0xF244 > +#define DE_IBS_NB_LOCAL_DRAM 0xF245 > +#define DE_IBS_NB_REMOTE_DRAM 0xF246 > +#define DE_IBS_NB_LOCAL_OTHER 0xF247 > +#define DE_IBS_NB_REMOTE_OTHER 0xF248 > +#define DE_IBS_NB_CACHE_STATE_M 0xF249 > +#define DE_IBS_NB_CACHE_STATE_O 0xF24A > +#define DE_IBS_NB_LOCAL_LATENCY 0xF24B > +#define DE_IBS_NB_REMOTE_LATENCY 0xF24C > + > + > + > +#define IBS_EVENT_TO_COUNTER(x) \ > + (x - IBS_EVENT_BASE + OP_MAX_COUNTERS) > + > +#define COUNTER_TO_IBS_EVENT(x) \ > + (x + IBS_EVENT_BASE - OP_MAX_COUNTERS) > + > +#define IS_IBS_FETCH(x) \ > + (x < DE_IBS_OP_ALL) > + > +#endif > > ------------------------------------------------------------------------ > > This body part will be downloaded on demand. > ------------------------------------------------------------------------ > > This body part will be downloaded on demand. |
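The quoted opd_ibs.h describes a two-step flow: raw MSR bits are first decoded into flags, and the flags are then tallied as derived events whose select values start at IBS_EVENT_BASE. The following is only a minimal sketch of that flow, not code from the patch. The mask and event values are copied from the header above; OP_MAX_COUNTERS is defined elsewhere in the OProfile tree, so the value 8 used here is an assumption, and the demo_* names are hypothetical. The macro arguments are parenthesized in this sketch, which the quoted macros do not do.

```c
#include <assert.h>

/* Values copied from the quoted opd_ibs.h. */
#define IBS_EVENT_BASE     0xf000
#define DE_IBS_FETCH_ALL   0xF000
#define DE_IBS_OP_ALL      0xF100
#define DE_IBS_LS_DC_MISS  0xF206
#define DC_MASK_DC_MISS    0x00000080
#define DC_MASK_STORE_OP   0x00000002
#define DC_MASK_LOAD_OP    0x00000001

/* ASSUMPTION: the real value lives elsewhere in OProfile; 8 is a placeholder. */
#define OP_MAX_COUNTERS 8

/* Same arithmetic as the quoted macros, with the argument parenthesized
 * so that an expression argument cannot change the result. */
#define IBS_EVENT_TO_COUNTER(x) ((x) - IBS_EVENT_BASE + OP_MAX_COUNTERS)
#define COUNTER_TO_IBS_EVENT(x) ((x) + IBS_EVENT_BASE - OP_MAX_COUNTERS)
#define IS_IBS_FETCH(x)         ((x) < DE_IBS_OP_ALL)

/* Hypothetical, cut-down stand-in for struct decoded_ibs_op_sample. */
struct demo_flags {
	unsigned char load_op;
	unsigned char store_op;
	unsigned char dc_miss;
};

/* Step 1: decode the low word of IbsOpData3 into Boolean flags. */
static struct demo_flags demo_decode(unsigned int data3_low)
{
	struct demo_flags f;

	f.load_op  = (data3_low & DC_MASK_LOAD_OP) != 0;
	f.store_op = (data3_low & DC_MASK_STORE_OP) != 0;
	f.dc_miss  = (data3_low & DC_MASK_DC_MISS) != 0;
	return f;
}

/* Step 2: return the counter slot of DE_IBS_LS_DC_MISS for a load or
 * store that missed the data cache, or -1 when that event does not apply. */
static int demo_dc_miss_counter(unsigned int data3_low)
{
	struct demo_flags f = demo_decode(data3_low);
	int slot;

	if (!f.load_op && !f.store_op)
		return -1;	/* not a load/store op */
	if (!f.dc_miss)
		return -1;	/* data cache hit */

	slot = IBS_EVENT_TO_COUNTER(DE_IBS_LS_DC_MISS);
	/* The two mapping macros are inverses of each other. */
	assert(COUNTER_TO_IBS_EVENT(slot) == DE_IBS_LS_DC_MISS);
	return slot;
}
```

Under the assumed OP_MAX_COUNTERS, IBS derived events occupy counter slots directly after the ordinary PMC counters, which is why the first fetch event maps to slot OP_MAX_COUNTERS.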
From: Jason Y. <jas...@am...> - 2008-07-23 14:49:10
|
Maynard Johnson wrote:
> Jason Yeh wrote:
>> This patch contains daemon/kernel module interface change and daemon code processing IBS events.
>>
> Jason, I know you plan on reworking this patch, so I'm waiting to do a
> full review. But a cursory glance at the changes to opd_trans.c makes
> me think that it might be best to separate out the IBS stuff into its
> own file, similar to how John suggested my Cell SPU-specific changes
> were done. Please take a look at that technique (opd_spu.c) and see if
> that makes sense to do in this case too.

I will take a look at that and see how it can benefit the IBS code. I am nearly ready to send out the updated patches for IBS; I just need to break up the patches and write the actual email. The patches will be required to get OProfile to work with Robert Richter and Barry Kassindorf's kernel patch.

Jason |
From: Jason Y. <jas...@am...> - 2008-01-29 21:49:45
Attachments:
op_cvs_IBS_PATCH_3
|
This patch contains the rest of daemon changes to data structures to store and write IBS events to sample file, and new stats counting number of IBS samples. --- Makefile.am | 3 - liblegacy/opd_proc.c | 2 liblegacy/opd_sample_files.c | 10 +++- opd_events.c | 46 ++++++++++++++----- opd_events.h | 7 +-- opd_mangling.c | 15 +++--- opd_mangling.h | 3 - opd_printf.h | 2 opd_sfile.c | 99 +++++++++++++++++++++++++++++++++++-------- opd_sfile.h | 7 ++- opd_stats.c | 2 opd_stats.h | 1 oprofiled |binary oprofiled.c | 11 ++++ oprofiled.h | 2 15 files changed, 163 insertions(+), 47 deletions(-) diff -uprN -X dontdiff oprofile-cvs-original/daemon/Makefile.am oprofile-cvs-ibs/daemon/Makefile.am --- oprofile-cvs-original/daemon/Makefile.am 2008-01-28 16:05:28.000000000 -0600 +++ oprofile-cvs-ibs/daemon/Makefile.am 2008-01-29 09:39:46.000000000 -0600 @@ -24,7 +24,8 @@ oprofiled_SOURCES = \ opd_perfmon.c \ opd_anon.h \ opd_anon.c \ - opd_spu.c + opd_spu.c \ + opd_ibs.h LIBS=@POPT_LIBS@ @LIBERTY_LIBS@ diff -uprN -X dontdiff oprofile-cvs-original/daemon/liblegacy/opd_proc.c oprofile-cvs-ibs/daemon/liblegacy/opd_proc.c --- oprofile-cvs-original/daemon/liblegacy/opd_proc.c 2008-01-28 16:05:28.000000000 -0600 +++ oprofile-cvs-ibs/daemon/liblegacy/opd_proc.c 2008-01-29 09:39:46.000000000 -0600 @@ -143,7 +143,7 @@ void opd_put_image_sample(struct opd_ima sfile = image->sfiles[cpu_number][counter]; } - err = odb_update_node(&sfile->sample_file, offset); + err = odb_update_node(&sfile->sample_file, offset, 1); if (err) { fprintf(stderr, "%s\n", strerror(err)); abort(); diff -uprN -X dontdiff oprofile-cvs-original/daemon/liblegacy/opd_sample_files.c oprofile-cvs-ibs/daemon/liblegacy/opd_sample_files.c --- oprofile-cvs-original/daemon/liblegacy/opd_sample_files.c 2008-01-28 16:05:28.000000000 -0600 +++ oprofile-cvs-ibs/daemon/liblegacy/opd_sample_files.c 2008-01-29 09:39:46.000000000 -0600 @@ -69,7 +69,7 @@ static char * opd_mangle_filename(struct { char * mangled; struct mangle_values 
values; - struct opd_event * event = find_counter_event(counter); + struct opd_event * event = find_counter_event(counter, 0); values.flags = 0; if (image->kernel) @@ -142,8 +142,12 @@ retry: goto out; } - fill_header(odb_get_data(&sfile->sample_file), counter, 0, 0, - image->kernel, 0, 0, 0, image->mtime); + fill_header(odb_get_data(&sfile->sample_file), counter, + 0, 0, + image->kernel, 0, + 0, 0, + image->mtime, 0); + out: free(mangled); diff -uprN -X dontdiff oprofile-cvs-original/daemon/opd_events.c oprofile-cvs-ibs/daemon/opd_events.c --- oprofile-cvs-original/daemon/opd_events.c 2008-01-28 16:05:28.000000000 -0600 +++ oprofile-cvs-ibs/daemon/opd_events.c 2008-01-29 09:39:46.000000000 -0600 @@ -13,6 +13,7 @@ #include "opd_events.h" #include "opd_printf.h" +#include "opd_ibs.h" #include "oprofiled.h" #include "op_string.h" @@ -22,13 +23,16 @@ #include "op_libiberty.h" #include "op_hw_config.h" #include "op_sample_file.h" +#include "op_events.h" #include <stdlib.h> #include <stdio.h> extern op_cpu cpu_type; +extern int ibs_fetch_count; +extern int ibs_op_count; -struct opd_event opd_events[OP_MAX_COUNTERS]; +struct opd_event opd_events[OP_MAX_COUNTERS + OP_MAX_IBS_COUNTERS]; static double cpu_speed; @@ -91,11 +95,6 @@ void opd_parse_events(char const * event return; } - if (!ev || !strlen(ev)) { - fprintf(stderr, "oprofiled: no events passed.\n"); - exit(EXIT_FAILURE); - } - verbprintf(vmisc, "Events: %s\n", ev); c = ev; @@ -125,13 +124,33 @@ void opd_parse_events(char const * event } -struct opd_event * find_counter_event(unsigned long counter) +struct opd_event * find_counter_event(unsigned long counter, int ibs) { size_t i; - - for (i = 0; i < op_nr_counters && opd_events[i].name; ++i) { - if (counter == opd_events[i].counter) - return &opd_events[i]; + struct op_event * ibs_lookup_event; + + /* If IBS is enabled, use events_list to fill appropriate opd_event */ + if (ibs) { + ibs_lookup_event = op_find_event(cpu_type, COUNTER_TO_IBS_EVENT(counter)); + + if 
(!ibs_lookup_event) { + abort(); + } + + opd_events[counter].name = op_xstrndup(ibs_lookup_event->name, strlen(ibs_lookup_event->name)); + opd_events[counter].value = COUNTER_TO_IBS_EVENT(counter); + opd_events[counter].counter = counter; + opd_events[counter].count = + IS_IBS_FETCH(COUNTER_TO_IBS_EVENT(counter))?ibs_fetch_count:ibs_op_count; + opd_events[counter].um = 0; + opd_events[counter].kernel = 1; + opd_events[counter].user = 1; + return &opd_events[counter]; + } else { + for (i = 0; i < op_nr_counters && opd_events[i].name; ++i) { + if (counter == opd_events[i].counter) + return &opd_events[i]; + } } fprintf(stderr, "Unknown event for counter %lu\n", counter); @@ -143,9 +162,10 @@ struct opd_event * find_counter_event(un void fill_header(struct opd_header * header, unsigned long counter, vma_t anon_start, vma_t cg_to_anon_start, int is_kernel, int cg_to_is_kernel, - int spu_samples, uint64_t embed_offset, time_t mtime) + int spu_samples, uint64_t embed_offset, time_t mtime, + int ibs) { - struct opd_event * event = find_counter_event(counter); + struct opd_event * event = find_counter_event(counter, ibs); memset(header, '\0', sizeof(struct opd_header)); header->version = OPD_VERSION; diff -uprN -X dontdiff oprofile-cvs-original/daemon/opd_events.h oprofile-cvs-ibs/daemon/opd_events.h --- oprofile-cvs-original/daemon/opd_events.h 2008-01-28 16:05:28.000000000 -0600 +++ oprofile-cvs-ibs/daemon/opd_events.h 2008-01-29 09:39:46.000000000 -0600 @@ -34,14 +34,15 @@ extern struct opd_event opd_events[]; void opd_parse_events(char const * events); /** Find the event for the given counter */ -struct opd_event * find_counter_event(unsigned long counter); +struct opd_event * find_counter_event(unsigned long counter, int ibs); struct opd_header; /** fill the sample file header with event info etc. 
*/ void fill_header(struct opd_header * header, unsigned long counter, - vma_t anon_start, vma_t anon_end, + vma_t anon_start, vma_t anon_end, int is_kernel, int cg_to_is_kernel, - int spu_samples, uint64_t embed_offset, time_t mtime); + int spu_samples, uint64_t embed_offset, time_t mtime, + int ibs); #endif /* OPD_EVENTS_H */ diff -uprN -X dontdiff oprofile-cvs-original/daemon/opd_mangling.c oprofile-cvs-ibs/daemon/opd_mangling.c --- oprofile-cvs-original/daemon/opd_mangling.c 2008-01-28 16:05:28.000000000 -0600 +++ oprofile-cvs-ibs/daemon/opd_mangling.c 2008-01-29 09:39:46.000000000 -0600 @@ -66,11 +66,12 @@ static char * mangle_anon(struct anon_ma static char * -mangle_filename(struct sfile * last, struct sfile const * sf, int counter, int cg) +mangle_filename(struct sfile * last, struct sfile const * sf, int counter, int cg, + int ibs) { - char * mangled; + char * mangled = NULL; struct mangle_values values; - struct opd_event * event = find_counter_event(counter); + struct opd_event * event = find_counter_event(counter, ibs); values.flags = 0; @@ -139,7 +140,8 @@ mangle_filename(struct sfile * last, str int opd_open_sample_file(odb_t * file, struct sfile * last, - struct sfile * sf, int counter, int cg) + struct sfile * sf, int counter, int cg, + int ibs) { char * mangled; char const * binary; @@ -147,7 +149,7 @@ int opd_open_sample_file(odb_t * file, s vma_t last_start = 0; int err; - mangled = mangle_filename(last, sf, counter, cg); + mangled = mangle_filename(last, sf, counter, cg, ibs); if (!mangled) return EINVAL; @@ -194,7 +196,8 @@ retry: sf->anon ? sf->anon->start : 0, last_start, !!sf->kernel, last ? !!last->kernel : 0, spu_profile, sf->embedded_offset, - binary ? op_get_mtime(binary) : 0); + (binary ? 
op_get_mtime(binary) : 0 ), + ibs); out: sfile_put(sf); diff -uprN -X dontdiff oprofile-cvs-original/daemon/opd_mangling.h oprofile-cvs-ibs/daemon/opd_mangling.h --- oprofile-cvs-original/daemon/opd_mangling.h 2008-01-28 16:05:28.000000000 -0600 +++ oprofile-cvs-ibs/daemon/opd_mangling.h 2008-01-29 09:39:46.000000000 -0600 @@ -28,6 +28,7 @@ struct sfile; * Returns 0 on success. */ int opd_open_sample_file(odb_t * file, struct sfile * last, - struct sfile * sf, int counter, int cg); + struct sfile * sf, int counter, int cg, + int ibs); #endif /* OPD_MANGLING_H */ diff -uprN -X dontdiff oprofile-cvs-original/daemon/opd_printf.h oprofile-cvs-ibs/daemon/opd_printf.h --- oprofile-cvs-original/daemon/opd_printf.h 2008-01-28 16:05:28.000000000 -0600 +++ oprofile-cvs-ibs/daemon/opd_printf.h 2008-01-29 09:39:46.000000000 -0600 @@ -22,6 +22,8 @@ extern int vsamples; extern int varcs; /// kernel module handling extern int vmodule; +/// ibs debugging +extern int vibs_debug; /// all others not fitting in above category, not voluminous. extern int vmisc; diff -uprN -X dontdiff oprofile-cvs-original/daemon/opd_sfile.c oprofile-cvs-ibs/daemon/opd_sfile.c --- oprofile-cvs-original/daemon/opd_sfile.c 2008-01-28 16:05:28.000000000 -0600 +++ oprofile-cvs-ibs/daemon/opd_sfile.c 2008-01-29 09:39:46.000000000 -0600 @@ -28,6 +28,9 @@ #define HASH_SIZE 2048 #define HASH_BITS (HASH_SIZE - 1) +extern int ibs_fetch_count; +extern int ibs_op_count; + /** All sfiles are hashed into these lists */ static struct list_head hashes[HASH_SIZE]; @@ -173,6 +176,7 @@ create_sfile(unsigned long hash, struct * meaningless (though not the app_cookie if separate_kernel) */ sf->cookie = trans->in_kernel ? 
INVALID_COOKIE : trans->cookie; + sf->app_cookie = INVALID_COOKIE; sf->tid = (pid_t)-1; sf->tgid = (pid_t)-1; @@ -180,7 +184,7 @@ create_sfile(unsigned long hash, struct sf->kernel = ki; sf->anon = trans->anon; - for (i = 0 ; i < op_nr_counters ; ++i) + for (i = 0 ; i < OP_MAX_COUNTERS + OP_MAX_IBS_COUNTERS; ++i) odb_init(&sf->files[i]); for (i = 0; i < CG_HASH_SIZE; ++i) @@ -275,7 +279,7 @@ static void sfile_dup(struct sfile * to, memcpy(to, from, sizeof (struct sfile)); - for (i = 0 ; i < op_nr_counters ; ++i) + for (i = 0 ; i < OP_MAX_COUNTERS + OP_MAX_IBS_COUNTERS; ++i) odb_init(&to->files[i]); for (i = 0; i < CG_HASH_SIZE; ++i) @@ -293,15 +297,22 @@ static odb_t * get_file(struct transient struct cg_entry * cg; struct list_head * pos; unsigned long hash; + unsigned long counter = trans->event; odb_t * file; - if (trans->event >= op_nr_counters) { - fprintf(stderr, "%s: Invalid counter %lu\n", __FUNCTION__, - trans->event); - abort(); + if ((ibs_fetch_count || ibs_op_count) && (trans->ibs_fetch || trans->ibs_op)) { + /* Translate IBS event value to counter */ + counter = IBS_EVENT_TO_COUNTER(counter); + } else { + /* Disable counter number checking for IBS */ + if (counter >= op_nr_counters) { + fprintf(stderr, "%s: Invalid counter %lu\n", __FUNCTION__, + counter); + abort(); + } } - file = &sf->files[trans->event]; + file = &sf->files[counter]; if (!is_cg) goto open; @@ -314,7 +325,7 @@ static odb_t * get_file(struct transient list_for_each(pos, &sf->cg_hash[hash]) { cg = list_entry(pos, struct cg_entry, hash); if (sfile_equal(last, &cg->to)) { - file = &cg->to.files[trans->event]; + file = &cg->to.files[counter]; goto open; } } @@ -322,15 +333,17 @@ static odb_t * get_file(struct transient cg = xmalloc(sizeof(struct cg_entry)); sfile_dup(&cg->to, last); list_add(&cg->hash, &sf->cg_hash[hash]); - file = &cg->to.files[trans->event]; + file = &cg->to.files[counter]; open: - if (!odb_open_count(file)) - opd_open_sample_file(file, last, sf, trans->event, is_cg); + 
if (!odb_open_count(file)) { + opd_open_sample_file(file, last, sf, counter, is_cg, trans->ibs_fetch || trans->ibs_op); + } /* Error is logged by opd_open_sample_file */ - if (!odb_open_count(file)) + if (!odb_open_count(file)) { return NULL; + } return file; } @@ -407,7 +420,7 @@ static void sfile_log_arc(struct transie key = to & (0xffffffff); key |= ((uint64_t)from) << 32; - err = odb_update_node(file, key); + err = odb_update_node(file, key, 1); if (err) { fprintf(stderr, "%s: %s\n", __FUNCTION__, strerror(err)); abort(); @@ -415,6 +428,18 @@ static void sfile_log_arc(struct transie } +/* + * Function: sfile_log_sample + * + * This function logs a single event sample. It gets the oprofile database + * file using the information in the transient struct trans. It converts + * the PC for kernel, anonymous and JIT samples from an absolute address + * to an offset. If a file is not found for the sample, the sample is tallied + * as a lost sample. Finally, odb_insert() is called to actually insert the + * sample into the oprofile database. This function tallies a single + * event sample, so the count value passed to odb_insert() is one. 
+ * + */ void sfile_log_sample(struct transient const * trans) { int err; @@ -437,7 +462,47 @@ void sfile_log_sample(struct transient c if (trans->current->anon) pc -= trans->current->anon->start; - + + if (vsamples) + verbose_sample(trans, pc); + + if (!file) { + opd_stats[OPD_LOST_SAMPLEFILE]++; + return; + } + + err = odb_update_node(file, (uint64_t)pc, 1); + if (err) { + fprintf(stderr, "%s: %s\n", __FUNCTION__, strerror(err)); + abort(); + } +} + + +void sfile_log_sample_count(struct transient const * trans, + unsigned long int count) +{ + int err; + vma_t pc = trans->pc; + odb_t * file; + + if (trans->tracing == TRACING_ON) { + /* can happen if kernel sample falls through the cracks, + * see opd_put_sample() */ + if (trans->last) + sfile_log_arc(trans); + return; + } + + file = get_file(trans, 0); + + /* absolute value -> offset */ + if (trans->current->kernel) + pc -= trans->current->kernel->start; + + if (trans->current->anon) + pc -= trans->current->anon->start; + if (vsamples) verbose_sample(trans, pc); @@ -446,7 +511,7 @@ void sfile_log_sample(struct transient c return; } - err = odb_update_node(file, (uint64_t)pc); + err = odb_update_node(file, (odb_key_t)pc, (odb_value_t)count); if (err) { fprintf(stderr, "%s: %s\n", __FUNCTION__, strerror(err)); abort(); @@ -459,7 +524,7 @@ static int close_sfile(struct sfile * sf size_t i; /* it's OK to close a non-open odb file */ - for (i = 0; i < op_nr_counters; ++i) + for (i = 0; i < OP_MAX_COUNTERS + OP_MAX_IBS_COUNTERS; ++i) odb_close(&sf->files[i]); return 0; @@ -478,7 +543,7 @@ static int sync_sfile(struct sfile * sf, { size_t i; - for (i = 0; i < op_nr_counters; ++i) + for (i = 0; i < OP_MAX_COUNTERS + OP_MAX_IBS_COUNTERS; ++i) odb_sync(&sf->files[i]); return 0; diff -uprN -X dontdiff oprofile-cvs-original/daemon/opd_sfile.h oprofile-cvs-ibs/daemon/opd_sfile.h --- oprofile-cvs-original/daemon/opd_sfile.h 2008-01-28 16:05:28.000000000 -0600 +++ oprofile-cvs-ibs/daemon/opd_sfile.h 2008-01-29 09:39:46.000000000 
-0600 @@ -18,6 +18,7 @@ #include "op_hw_config.h" #include "op_types.h" #include "op_list.h" +#include "opd_ibs.h" #include <sys/types.h> @@ -61,7 +62,7 @@ struct sfile { /** true if this file should be ignored in profiles */ int ignored; /** opened sample files */ - odb_t files[OP_MAX_COUNTERS]; + odb_t files[OP_MAX_COUNTERS + OP_MAX_IBS_COUNTERS]; /** hash table of opened cg sample files */ struct list_head cg_hash[CG_HASH_SIZE]; }; @@ -107,6 +108,10 @@ struct sfile * sfile_find(struct transie /** Log the sample in a previously located sfile. */ void sfile_log_sample(struct transient const * trans); +/** Log the event/cycle count in a previously located sfile */ +void sfile_log_sample_count(struct transient const * trans, + unsigned long int count); + /** initialise hashes */ void sfile_init(void); diff -uprN -X dontdiff oprofile-cvs-original/daemon/opd_stats.c oprofile-cvs-ibs/daemon/opd_stats.c --- oprofile-cvs-original/daemon/opd_stats.c 2008-01-28 16:05:28.000000000 -0600 +++ oprofile-cvs-ibs/daemon/opd_stats.c 2008-01-29 09:39:46.000000000 -0600 @@ -18,6 +18,7 @@ #include <stdlib.h> #include <stdio.h> + unsigned long opd_stats[OPD_MAX_STATS]; /** @@ -50,6 +51,7 @@ void opd_print_stats(void) opd_stats[OPD_LOST_SAMPLEFILE]); printf("Nr. samples lost due to no permanent mapping: %lu\n", opd_stats[OPD_LOST_NO_MAPPING]); + printf("Nr. IBS samples mapped: %lu\n", opd_stats[OPD_IBS_SAMPLE]); print_if("Nr. event lost due to buffer overflow: %u\n", "/dev/oprofile/stats", "event_lost_overflow", 1); print_if("Nr. samples lost due to no mapping: %u\n", diff -uprN -X dontdiff oprofile-cvs-original/daemon/opd_stats.h oprofile-cvs-ibs/daemon/opd_stats.h --- oprofile-cvs-original/daemon/opd_stats.h 2008-01-28 16:05:28.000000000 -0600 +++ oprofile-cvs-ibs/daemon/opd_stats.h 2008-01-29 09:39:46.000000000 -0600 @@ -23,6 +23,7 @@ enum { OPD_SAMPLES, /**< nr. samples */ OPD_LOST_NO_MAPPING, /**< nr samples lost due to no mapping */ OPD_DUMP_COUNT, /**< nr. 
of times buffer is read */ OPD_DANGLING_CODE, /**< nr. partial code notifications (buffer overflow */ + OPD_IBS_SAMPLE, /**< nr. of IBS samples mapped */ OPD_MAX_STATS /**< end of stats */ }; Binary files oprofile-cvs-original/daemon/oprofiled and oprofile-cvs-ibs/daemon/oprofiled differ diff -uprN -X dontdiff oprofile-cvs-original/daemon/oprofiled.c oprofile-cvs-ibs/daemon/oprofiled.c --- oprofile-cvs-original/daemon/oprofiled.c 2008-01-28 16:05:28.000000000 -0600 +++ oprofile-cvs-ibs/daemon/oprofiled.c 2008-01-29 09:39:46.000000000 -0600 @@ -30,6 +30,7 @@ #include "op_lockfile.h" #include "op_list.h" #include "op_fileio.h" +#include "op_events.h" #include <sys/types.h> #include <sys/resource.h> @@ -56,6 +57,7 @@ int vsamples; int varcs; int vmodule; int vmisc; +int vibs_debug; int separate_lib; int separate_kernel; int separate_thread; @@ -63,6 +65,8 @@ int separate_cpu; int no_vmlinux; char * vmlinux; char * kernel_range; +int ibs_fetch_count = 0; +int ibs_op_count = 0; char * session_dir; int no_xen; char * xenimage; @@ -93,6 +97,8 @@ static struct poptOption options[] = { { "events", 'e', POPT_ARG_STRING, &events, 0, "events list", "[events]" }, { "version", 'v', POPT_ARG_NONE, &showvers, 0, "show version", NULL, }, { "verbose", 'V', POPT_ARG_STRING, &verbose, 0, "be verbose in log file", "all,sfile,arcs,samples,module,misc", }, + { "ibs-fetch", 'i', POPT_ARG_INT, &ibs_fetch_count, 0, "AMD IBS Fetch mode", "[0 to Max]", }, + { "ibs-op", 'o', POPT_ARG_INT, &ibs_op_count, 0, "AMD IBS OP mode", "[0 to Max]", }, POPT_AUTOHELP { NULL, 0, 0, NULL, 0, NULL, NULL, }, }; @@ -337,6 +343,7 @@ static void opd_handle_verbose_option(ch varcs = 1; vmodule = 1; vmisc = 1; + vibs_debug = 1; } else if (!strcmp(name, "sfile")) { vsfile = 1; } else if (!strcmp(name, "arcs")) { @@ -347,6 +354,8 @@ static void opd_handle_verbose_option(ch vmodule = 1; } else if (!strcmp(name, "misc")) { vmisc = 1; + } else if (!strcmp(name, "ibs_debug")) { + vibs_debug = 1; } else { fprintf(stderr, 
"unknown verbose options\n"); exit(EXIT_FAILURE); @@ -410,7 +419,7 @@ static void opd_options(int argc, char c } } - if (events == NULL) { + if (events == NULL && !(ibs_fetch_count || ibs_op_count)) { fprintf(stderr, "oprofiled: no events specified.\n"); poptPrintHelp(optcon, stderr, 0); exit(EXIT_FAILURE); diff -uprN -X dontdiff oprofile-cvs-original/daemon/oprofiled.h oprofile-cvs-ibs/daemon/oprofiled.h --- oprofile-cvs-original/daemon/oprofiled.h 2008-01-28 16:05:28.000000000 -0600 +++ oprofile-cvs-ibs/daemon/oprofiled.h 2008-01-29 09:39:46.000000000 -0600 @@ -64,5 +64,7 @@ extern char * kernel_range; extern int no_xen; extern char * xenimage; extern char * xen_range; +extern int ibs_fetch_count; +extern int ibs_op_count; #endif /* OPROFILED_H */ |
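sfile_log_sample_count() passes an arbitrary count down to odb_update_node(), so a stored value can now jump past its maximum in one step instead of approaching it by increments of one. The following is a minimal sketch of an overflow guard that covers that case. saturating_add and sample_count_t are hypothetical names, and the 32-bit width is an assumption standing in for odb_value_t.

```c
#include <stdint.h>

/* Assumption: a 32-bit unsigned count, standing in for odb_value_t. */
typedef uint32_t sample_count_t;
#define SAMPLE_COUNT_MAX UINT32_MAX

/*
 * Add count to current without wrapping around. The comparison is done
 * before the addition, so it also catches sums that would wrap past
 * zero rather than land exactly on it.
 */
sample_count_t saturating_add(sample_count_t current, sample_count_t count)
{
	if (count > SAMPLE_COUNT_MAX - current)
		return SAMPLE_COUNT_MAX;  /* clamp instead of wrapping */
	return current + count;
}
```

Clamping is only one possible policy; the point of the sketch is that a test for "the sum is exactly zero" is sufficient for increments of one but not for larger counts.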
From: Jason Y. <jas...@am...> - 2008-01-29 21:50:21
Attachments:
op_cvs_IBS_PATCH_5
|
This patch includes changes to recognize IBS events and correction to Family 10h processor name. --- libop/op_cpu_type.c | 2 +- libop/op_cpu_type.h | 2 +- libop/op_events.c | 21 ++++++++++++++++----- utils/ophelp.c | 15 ++++++++++++++- 4 files changed, 32 insertions(+), 8 deletions(-) diff -uprN -X dontdiff oprofile-cvs-original/utils/ophelp.c oprofile-cvs-ibs/utils/ophelp.c --- oprofile-cvs-original/utils/ophelp.c 2008-01-28 16:05:29.000000000 -0600 +++ oprofile-cvs-ibs/utils/ophelp.c 2008-01-29 09:39:46.000000000 -0600 @@ -73,6 +73,19 @@ static void help_for_event(struct op_eve do_arch_specific_event_help(event); nr_counters = op_get_nr_counters(cpu_type); + /* + * Sanity check + */ + if(!event) + return; + + /* + * Check for IBS derived events, we do not want + * to list these events + */ + if( event->name != NULL && strncmp(event->name,"IBS",3) == 0) + return; + printf("%s", event->name); printf(": (counter: "); @@ -385,7 +398,7 @@ int main(int argc, char const * argv[]) printf("oprofile: available events for CPU type \"%s\"\n\n", pretty); switch (cpu_type) { case CPU_HAMMER: - case CPU_FAMILY10: + case CPU_FAMILY10H: break; case CPU_ATHLON: printf ("See AMD document x86 optimisation guide (22007.pdf), Appendix D\n\n"); diff -uprN -X dontdiff oprofile-cvs-original/libop/op_cpu_type.c oprofile-cvs-ibs/libop/op_cpu_type.c --- oprofile-cvs-original/libop/op_cpu_type.c 2008-01-28 16:05:27.000000000 -0600 +++ oprofile-cvs-ibs/libop/op_cpu_type.c 2008-01-29 09:39:46.000000000 -0600 @@ -67,7 +67,7 @@ static struct cpu_descr const cpu_descrs { "ppc64 POWER6", "ppc64/power6", CPU_PPC64_POWER6, 4 }, { "ppc64 970MP", "ppc64/970MP", CPU_PPC64_970MP, 8 }, { "ppc64 Cell Broadband Engine", "ppc64/cell-be", CPU_PPC64_CELL, 8 }, - { "AMD64 family10", "x86-64/family10", CPU_FAMILY10, 4 }, + { "AMD64 family10h", "x86-64/family10h", CPU_FAMILY10H, 4 }, { "ppc64 PA6T", "ppc64/pa6t", CPU_PPC64_PA6T, 6 }, { "ARM MPCore", "arm/mpcore", CPU_ARM_MPCORE, 2 }, { "ARM V6 PMU", "arm/armv6", 
CPU_ARM_V6, 3 }, diff -uprN -X dontdiff oprofile-cvs-original/libop/op_cpu_type.h oprofile-cvs-ibs/libop/op_cpu_type.h --- oprofile-cvs-original/libop/op_cpu_type.h 2008-01-28 16:05:26.000000000 -0600 +++ oprofile-cvs-ibs/libop/op_cpu_type.h 2008-01-29 09:39:46.000000000 -0600 @@ -65,7 +65,7 @@ typedef enum { CPU_PPC64_POWER6, /**< ppc64 POWER6 family */ CPU_PPC64_970MP, /**< ppc64 970MP */ CPU_PPC64_CELL, /**< ppc64 Cell Broadband Engine*/ - CPU_FAMILY10, /**< AMD family 10 */ + CPU_FAMILY10H, /**< AMD family 10 */ CPU_PPC64_PA6T, /**< ppc64 PA6T */ CPU_ARM_MPCORE, /**< ARM MPCore */ CPU_ARM_V6, /**< ARM V6 */ diff -uprN -X dontdiff oprofile-cvs-original/libop/op_events.c oprofile-cvs-ibs/libop/op_events.c --- oprofile-cvs-original/libop/op_events.c 2008-01-28 16:05:27.000000000 -0600 +++ oprofile-cvs-ibs/libop/op_events.c 2008-01-29 09:39:46.000000000 -0600 @@ -432,10 +432,20 @@ static void load_events(op_cpu cpu_type) char * um_file; char * dir; struct list_head * pos; + static op_cpu last_cpu_type = 0; - if (!list_empty(&events_list)) - return; - + if(last_cpu_type != cpu_type) + { + last_cpu_type = cpu_type; + + // Empty the list and reinitialize it. + op_free_events(); + } + else + { + if (!list_empty(&events_list)) + return; + } dir = getenv("OPROFILE_EVENTS_DIR"); if (dir == NULL) dir = OP_DATADIR; @@ -691,7 +701,8 @@ struct op_event * op_find_event(op_cpu c { struct op_event * event; - load_events(cpu_type); + if (list_empty(&events_list)) + load_events(cpu_type); event = find_event(nr); @@ -758,7 +769,7 @@ void op_default_event(op_cpu cpu_type, s case CPU_CORE_2: case CPU_ATHLON: case CPU_HAMMER: - case CPU_FAMILY10: + case CPU_FAMILY10H: descr->name = "CPU_CLK_UNHALTED"; break; |
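The load_events() hunk turns the event list into a cache keyed on the CPU type: when the type changes, the list is discarded before it is reloaded. A minimal sketch of that pattern is below. All names are hypothetical, and the per-CPU event counts are placeholders for parsing the real event files.

```c
#include <stddef.h>

/* Hypothetical CPU identifiers; CPU_KIND_NONE means "nothing cached yet". */
enum cpu_kind { CPU_KIND_NONE = 0, CPU_KIND_A, CPU_KIND_B };

static enum cpu_kind cached_cpu = CPU_KIND_NONE;
static size_t cached_nr_events;   /* 0 means the cache is empty */
static unsigned int nr_loads;     /* how many times a (re)load happened */

/* Return the number of events for cpu, reloading only when needed. */
size_t load_events_for(enum cpu_kind cpu)
{
	if (cpu != cached_cpu) {
		/* Key changed: drop the stale list (the op_free_events() role). */
		cached_nr_events = 0;
		cached_cpu = cpu;
	}

	if (cached_nr_events == 0) {
		/* Placeholder for reading the event description files. */
		cached_nr_events = (cpu == CPU_KIND_A) ? 3 : 5;
		++nr_loads;
	}

	return cached_nr_events;
}

unsigned int load_count(void)
{
	return nr_loads;
}
```

The key comparison has to run on every lookup for this to work. A caller that skips the loader whenever the list is non-empty would keep serving the events of the previous CPU type.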
From: Jason Y. <jas...@am...> - 2008-01-29 21:50:41
Attachments:
op_cvs_IBS_PATCH_7
|
This patch includes changes to opannotate. --- opannotate.cpp | 211 ++++++++++++++++++++++++++++++++++++++++++++++++++++++--- 1 file changed, 202 insertions(+), 9 deletions(-) diff -uprN -X dontdiff oprofile-cvs-original/pp/opannotate.cpp oprofile-cvs-ibs/pp/opannotate.cpp --- oprofile-cvs-original/pp/opannotate.cpp 2008-01-28 16:05:26.000000000 -0600 +++ oprofile-cvs-ibs/pp/opannotate.cpp 2008-01-29 09:39:57.000000000 -0600 @@ -176,6 +176,92 @@ string asm_line_annotation(symbol_entry } +/// NOTE: This function annotates a list<string> containing output from objdump. +/// It uses a list iterator, and a sample_container iterator which iterates +/// from the beginning to the end, and compare sample address +/// against the instruction address on the asm line. +/// +/// There are 2 cases of annotation: +/// 1. If sample address matches current line address, annotate the current line. +/// 2. If (previous line address < sample address < current line address), +/// then we annotate previous line. This case happens when sample address +/// is not aligned with the instruction address, which is seen when profile +/// using the instruction fetch mode of AMD Instruction-Based Sampling (IBS). 
+/// +string asm_list_annotation(symbol_entry const * last_symbol, + list<string>::iterator sit, + sample_container::samples_iterator & samp_it ) +{ + // do not use the bfd equivalent: + // - it does not skip space at begin + // - we do not need cross architecture compile so the native + // strtoull must work, assuming unsigned long long can contain a vma + // and on 32/64 bits box bfd_vma is 64 bits + // gcc 2.91.66 workaround + string value = *sit; + bfd_vma vma = strtoull(value.c_str(), NULL, 16); + string str; + + sample_entry const * sample = &samp_it->second; + + if (sample->vma == vma) { + str += count_str(sample->counts, samples->samples_count()); + + // For each event + for (size_t i = 1; i < nr_events; ++i) + str += " "; + + str += " :"; + *sit = str + *sit; + if(samp_it != samples->end()) + ++samp_it; + samples->delete_samples(last_symbol, vma); + } else if (sample->vma < vma) { + // vma of the current line is greater than vma of the sample + + // Get the string of previous line + list<string>::iterator sit_prev = sit; + --sit_prev; + string prev_line = *sit_prev; + + // sub_string is the part that contains the address + string sub_string = prev_line.substr(prev_line.find(":", 0)+1, prev_line.length()); + + bfd_vma vma_prev = strtoull(sub_string.c_str(), NULL, 16); + + // Need to check if vma_prev < sample->vma + if( vma_prev < sample->vma) + { + // Aggregate sample with previous line if it already has samples + sample_entry * prev_sample = (sample_entry *)samples->find_sample(last_symbol, vma_prev); + if (prev_sample) + prev_sample->counts += sample->counts; + + str += count_str(sample->counts, samples->samples_count()); + + // For each event + for (size_t i = 1; i < nr_events; ++i) + str += " "; + + str += " :"; + *sit_prev = str + sub_string; + *sit = annotation_fill + *sit; + if(samp_it != samples->end()) + ++samp_it; + samples->delete_samples(last_symbol, sample->vma); + }else{ + *sit = annotation_fill + *sit; + if(samp_it != samples->end()) + 
++samp_it; + } + } else { + *sit = annotation_fill + *sit; + } + + return str; +} + + string symbol_annotation(symbol_entry const * symbol) { if (!symbol) @@ -269,11 +355,123 @@ symbol_entry const * output_objdump_asm_ } +/// NOTE: This function is similar to the function "output_objdump_asm_line" above. +/// It operates on the list<string> instead of just one string. +symbol_entry const * annotate_objdump_str_list(symbol_entry const * last_symbol, + string const & app_name, + list<string>::iterator sit, + symbol_collection const & symbols, + bool & do_output, + sample_container::samples_iterator & samp_it) +{ + // output of objdump is a human readable form and can contain some + // ambiguity so this code is dirty. It is also optimized a little bit + // so it is difficult to simplify it without breaking something ... + + // line of interest are: "[:space:]*[:xdigit:]?[ :]", the last char of + // this regexp dis-ambiguate between a symbol line and an asm line. If + // source contain line of this form an ambiguity occur and we rely on + // the robustness of this code. + string str = *sit; + size_t pos = 0; + while (pos < str.length() && isspace(str[pos])) + ++pos; + + if (pos == str.length() || !isxdigit(str[pos])) { + if (do_output) { + *sit = annotation_fill + str; + return last_symbol; + } + } + + while (pos < str.length() && isxdigit(str[pos])) + ++pos; + + if (pos == str.length() || (!isspace(str[pos]) && str[pos] != ':')) { + if (do_output) { + *sit = annotation_fill + str; + return last_symbol; + } + } + + if (is_symbol_line(str, pos)) { + + last_symbol = find_symbol(app_name, str); + + // ! complexity: linear in number of symbol must use sorted + // by address vector and lower_bound ? + // Note this use a pointer comparison. 
It works because symbol + // pointers are unique + if (find(symbols.begin(), symbols.end(), last_symbol) + != symbols.end()) { + do_output = true; + } else { + do_output = false; + } + if(do_output){ + *sit += symbol_annotation(last_symbol); + + // Realign the sample iterator to + // the beginning of this symbol + samp_it = samples->begin(last_symbol); + } + } else { + // not a symbol, probably an asm line. + if(do_output){ + asm_list_annotation(last_symbol, sit, samp_it); + } + } + + return last_symbol; +} + + +void output_objdump_str_list( symbol_collection const & symbols, + string const & app_name, + list<string> & asm_lines) +{ + symbol_entry const * last_symbol = 0; + + // to filter output of symbols (filter based on command line options) + bool do_output = true; + + // We simultaneously walk the two structures (list and sample_container), + // which are sorted by address, and do address comparison. + list<string>::iterator sit = asm_lines.begin(); + list<string>::iterator send = asm_lines.end(); + sample_container::samples_iterator samp_it = samples->begin(); + + for(; sit != send; sit++) { + last_symbol = annotate_objdump_str_list(last_symbol, + app_name, + sit, + symbols, + do_output, + samp_it); + + if(!do_output) { + *sit = ""; + } + } + + // Printing objdump output to stdout + sit = asm_lines.begin(); + for(; sit != send; ++sit) { + string str = *sit; + if(str.length() != 0) + cout << str << '\n'; + } +} + + + + void do_one_output_objdump(symbol_collection const & symbols, string const & image_name, string const & app_name, bfd_vma start, bfd_vma end) { vector<string> args; + list<string> asm_lines; args.push_back("-d"); args.push_back("--no-show-raw-insn"); @@ -301,16 +499,14 @@ void do_one_output_objdump(symbol_collec return; } - // to filter output of symbols (filter based on command line options) - bool do_output = true; - - symbol_entry const * last_symbol = 0; + // Read each output line from objdump and store in a list. 
string str; while (reader.getline(str)) { - last_symbol = output_objdump_asm_line(last_symbol, app_name, - str, symbols, do_output); + asm_lines.push_back(str); } + output_objdump_str_list(symbols, app_name, asm_lines); + // objdump always returns SUCCESS so we must rely on the stderr state // of objdump. If objdump error message is cryptic our own error // message will be probably also cryptic @@ -693,9 +889,6 @@ int opannotate(options::spec const & spe nr_events = classes.v.size(); - samples.reset(new profile_container(true, true, - classes.extra_found_images)); - list<string> images; list<inverted_profile> iprofiles = invert_profiles(classes); |
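The annotation rule described in this patch attributes a sample to the instruction at the sample address or, when the address falls between two instructions, to the preceding one. A minimal sketch of that lookup over a sorted array of instruction addresses follows. owning_line is a hypothetical helper, written in C for brevity; it is not the list-walking code from the patch.

```c
#include <stddef.h>
#include <stdint.h>

/*
 * Given instruction start addresses sorted in ascending order, return the
 * index of the instruction that owns sample_vma: the last instruction whose
 * address is less than or equal to the sample address. Returns -1 when the
 * sample lies before the first instruction.
 */
long owning_line(const uint64_t *insn_vma, size_t n, uint64_t sample_vma)
{
	size_t lo = 0;
	size_t hi = n;

	/* Binary search for the first instruction address above the sample. */
	while (lo < hi) {
		size_t mid = lo + (hi - lo) / 2;

		if (insn_vma[mid] <= sample_vma)
			lo = mid + 1;
		else
			hi = mid;
	}

	/* The owner is the instruction just before that position, if any. */
	return (long)lo - 1;
}
```

An exact match and an unaligned address inside the same instruction resolve to the same index, which is the behaviour wanted for IBS fetch samples whose address does not fall on an instruction boundary.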
From: Jason Y. <jas...@am...> - 2008-01-29 21:52:12
Attachments:
op_cvs_IBS_PATCH_4
|
This patch includes further changes to the back-end data structures to support IBS events.

---
 libdb/db_insert.c           |    8 ++++----
 libdb/odb.h                 |    2 +-
 libpp/profile_container.cpp |    7 +++++++
 libpp/profile_container.h   |    2 ++
 libpp/sample_container.cpp  |   10 ++++++++++
 libpp/sample_container.h    |    3 +++
 libutil++/op_bfd.cpp        |    5 +++++
 libutil++/op_bfd.h          |    3 +++
 libutil/op_fileio.c         |   20 ++++++++++++++++++++
 libutil/op_fileio.h         |   13 +++++++++++++
 10 files changed, 68 insertions(+), 5 deletions(-)

diff -uprN -X dontdiff oprofile-cvs-original/libdb/db_insert.c oprofile-cvs-ibs/libdb/db_insert.c
--- oprofile-cvs-original/libdb/db_insert.c	2008-01-28 16:05:26.000000000 -0600
+++ oprofile-cvs-ibs/libdb/db_insert.c	2008-01-29 09:39:46.000000000 -0600
@@ -49,7 +49,7 @@ static inline int add_node(odb_data_t *
 	return 0;
 }
 
-int odb_update_node(odb_t * odb, odb_key_t key)
+int odb_update_node(odb_t * odb, odb_key_t key, odb_value_t value)
 {
 	odb_index_t index;
 	odb_node_t * node;
@@ -60,8 +60,8 @@ int odb_update_node(odb_t * odb, odb_key
 	while (index) {
 		node = &data->node_base[index];
 		if (node->key == key) {
-			if (node->value + 1 != 0) {
-				node->value += 1;
+			if (node->value + value != 0) {
+				node->value += value;
 			} else {
 				/* post profile tools must handle overflow */
 				/* FIXME: the tricky way will be just to add
@@ -92,7 +92,7 @@ int odb_update_node(odb_t * odb, odb_key
 		index = node->next;
 	}
 
-	return add_node(data, key, 1);
+	return add_node(data, key, value);
 }
 
 
diff -uprN -X dontdiff oprofile-cvs-original/libdb/odb.h oprofile-cvs-ibs/libdb/odb.h
--- oprofile-cvs-original/libdb/odb.h	2008-01-28 16:05:26.000000000 -0600
+++ oprofile-cvs-ibs/libdb/odb.h	2008-01-29 09:39:46.000000000 -0600
@@ -178,7 +178,7 @@ void odb_hash_free_stat(odb_hash_stat_t
  *
  * returns EXIT_SUCCESS on success, EXIT_FAILURE on failure
  */
-int odb_update_node(odb_t * odb, odb_key_t key);
+int odb_update_node(odb_t * odb, odb_key_t key, odb_value_t value);
 
 /** Add a new node w/o regarding if a node with the same key already exists
  *
diff -uprN -X dontdiff oprofile-cvs-original/libpp/profile_container.cpp oprofile-cvs-ibs/libpp/profile_container.cpp
--- oprofile-cvs-original/libpp/profile_container.cpp	2008-01-28 16:05:26.000000000 -0600
+++ oprofile-cvs-ibs/libpp/profile_container.cpp	2008-01-29 09:39:46.000000000 -0600
@@ -162,6 +162,12 @@ profile_container::add_samples(op_bfd co
 }
 
 
+void profile_container::delete_samples(symbol_entry const * symbol, bfd_vma vma)
+{
+	samples->erase(symbol, vma);
+}
+
+
 symbol_collection const
 profile_container::select_symbols(symbol_choice & choice) const
 {
@@ -332,3 +338,4 @@ symbol_container::symbols_t::iterator pr
 {
 	return symbols->end();
 }
+
diff -uprN -X dontdiff oprofile-cvs-original/libpp/profile_container.h oprofile-cvs-ibs/libpp/profile_container.h
--- oprofile-cvs-original/libpp/profile_container.h	2008-01-28 16:05:26.000000000 -0600
+++ oprofile-cvs-ibs/libpp/profile_container.h	2008-01-29 09:39:46.000000000 -0600
@@ -66,6 +66,8 @@ public:
 	void add(profile_t const & profile, op_bfd const & abfd,
 		 std::string const & app_name, size_t pclass);
 
+	void delete_samples(symbol_entry const * symbol, bfd_vma vma);
+
 	/// Find a symbol from its image_name, vma, return zero if no symbol
 	/// for this image at this vma
 	symbol_entry const * find_symbol(std::string const & image_name,
diff -uprN -X dontdiff oprofile-cvs-original/libpp/sample_container.cpp oprofile-cvs-ibs/libpp/sample_container.cpp
--- oprofile-cvs-original/libpp/sample_container.cpp	2008-01-28 16:05:26.000000000 -0600
+++ oprofile-cvs-ibs/libpp/sample_container.cpp	2008-01-29 09:39:46.000000000 -0600
@@ -108,6 +108,16 @@ sample_container::find_by_vma(symbol_ent
 }
 
 
+void
+sample_container::erase(symbol_entry const * symbol, bfd_vma vma)
+{
+	sample_index_t key(symbol, vma);
+	samples_iterator it = samples.find(key);
+	if (it != samples.end())
+		samples.erase(key);
+}
+
+
 count_array_t
 sample_container::accumulate_samples(debug_name_id filename,
 				     size_t linenr) const
diff -uprN -X dontdiff oprofile-cvs-original/libpp/sample_container.h oprofile-cvs-ibs/libpp/sample_container.h
--- oprofile-cvs-original/libpp/sample_container.h	2008-01-28 16:05:26.000000000 -0600
+++ oprofile-cvs-ibs/libpp/sample_container.h	2008-01-29 09:39:46.000000000 -0600
@@ -44,6 +44,9 @@ public:
 	/// samples into an existing one. Can only be done before any lookups
 	void insert(symbol_entry const * symbol, sample_entry const &);
 
+	/// erase the sample entry for the given image_name and vma if any
+	void erase(symbol_entry const * symbol, bfd_vma vma);
+
 	/// return nr of samples in the given filename
 	count_array_t accumulate_samples(debug_name_id filename_id) const;
 
diff -uprN -X dontdiff oprofile-cvs-original/libutil/op_fileio.c oprofile-cvs-ibs/libutil/op_fileio.c
--- oprofile-cvs-original/libutil/op_fileio.c	2008-01-28 16:05:29.000000000 -0600
+++ oprofile-cvs-ibs/libutil/op_fileio.c	2008-01-29 09:39:46.000000000 -0600
@@ -56,6 +56,26 @@ void op_close_file(FILE * fp)
 }
 
 
+void op_read_file(FILE * fp, void * buf, size_t size)
+{
+	size_t count;
+
+	count = fread(buf, size, 1, fp);
+
+	if (count != 1) {
+		if (feof(fp)) {
+			fprintf(stderr,
+				"oprofiled:op_read_file: read less than expected %lu bytes\n",
+				(unsigned long)size);
+		} else {
+			fprintf(stderr,
+				"oprofiled:op_read_file: error reading\n");
+		}
+		exit(EXIT_FAILURE);
+	}
+}
+
+
 void op_write_file(FILE * fp, void const * buf, size_t size)
 {
 	size_t written;
diff -uprN -X dontdiff oprofile-cvs-original/libutil/op_fileio.h oprofile-cvs-ibs/libutil/op_fileio.h
--- oprofile-cvs-original/libutil/op_fileio.h	2008-01-28 16:05:29.000000000 -0600
+++ oprofile-cvs-ibs/libutil/op_fileio.h	2008-01-29 09:39:46.000000000 -0600
@@ -41,6 +41,19 @@ FILE * op_try_open_file(char const * nam
 FILE * op_open_file(char const * name, char const * mode);
 
 /**
+ * op_read_file - read a file
+ * @param fp file pointer
+ * @param buf buffer
+ * @param size size in bytes to read
+ *
+ * Read from a file. It is considered an error
+ * if anything less than size bytes is read.
+ * Failure is fatal.
+ */
+void op_read_file(FILE * fp, void * buf, size_t size);
+
+
+/**
  * op_read_int_from_file - parse an ASCII value from a file into an integer
  * @param filename name of file to parse integer value from
  * @param fatal non-zero if any error must be fatal
diff -uprN -X dontdiff oprofile-cvs-original/libutil++/op_bfd.cpp oprofile-cvs-ibs/libutil++/op_bfd.cpp
--- oprofile-cvs-original/libutil++/op_bfd.cpp	2008-01-28 16:05:30.000000000 -0600
+++ oprofile-cvs-ibs/libutil++/op_bfd.cpp	2008-01-29 09:39:46.000000000 -0600
@@ -275,6 +275,11 @@ void op_bfd::add_symbols(op_bfd::symbols
 		<< dec << syms.size() << hex << endl;
 }
 
+unsigned long op_bfd::filepos(symbol_index_t sym_index) const
+{
+	return syms[sym_index].filepos();
+}
+
 
 bfd_vma op_bfd::offset_to_pc(bfd_vma offset) const
 {
diff -uprN -X dontdiff oprofile-cvs-original/libutil++/op_bfd.h oprofile-cvs-ibs/libutil++/op_bfd.h
--- oprofile-cvs-original/libutil++/op_bfd.h	2008-01-28 16:05:30.000000000 -0600
+++ oprofile-cvs-ibs/libutil++/op_bfd.h	2008-01-29 09:39:46.000000000 -0600
@@ -158,6 +158,9 @@ public:
 	/** return the relocated PC value for the given file offset */
 	bfd_vma offset_to_pc(bfd_vma offset) const;
 
+
+	unsigned long filepos(symbol_index_t sym_index) const;
+
 	/**
 	 * If passed 0, return the file position of the .text section.
 	 * Otherwise, return the filepos of a section with a matching
|
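A note on the odb_update_node() change in this patch: for an unsigned counter, the guard `node->value + value != 0` only detects a sum that wraps to exactly zero. That was enough while the increment was always 1, but with an arbitrary `value` a wrap to a small non-zero number would pass the check. The following is a minimal sketch of a saturating add that avoids this. It is an illustration only and not part of the patch; `sample_count_t` and `saturating_add` are hypothetical names, and the 32-bit unsigned width is an assumption about odb_value_t.

```c
#include <stdint.h>

/* Hypothetical stand-in for odb_value_t; assumed to be a 32-bit unsigned count. */
typedef uint32_t sample_count_t;

/*
 * Add 'value' to 'current', clamping at the largest representable count
 * instead of wrapping around. Illustration only, not part of the patch.
 */
static sample_count_t saturating_add(sample_count_t current, sample_count_t value)
{
	if (value > UINT32_MAX - current)
		return UINT32_MAX;	/* the sum would overflow: saturate */
	return current + value;
}
```

Whether to saturate like this or to leave overflow handling to the post-profile tools, as the existing comment in db_insert.c suggests, is a design decision for the maintainers.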
From: Jason Y. <jas...@am...> - 2008-01-29 21:52:15
Attachments:
op_cvs_IBS_PATCH_6
|
This patch includes new event names for IBS and renaming family10 to family10h. --- Makefile.am | 2 x86-64/family10/events | 148 ------------------- x86-64/family10/unit_masks | 312 ---------------------------------------- x86-64/family10h/events | 206 ++++++++++++++++++++++++++ x86-64/family10h/unit_masks | 342 ++++++++++++++++++++++++++++++++++++++++++++ x86-64/hammer/events | 142 ++++++++++-------- x86-64/hammer/unit_masks | 123 ++++++++------- 7 files changed, 699 insertions(+), 576 deletions(-) diff -uprN -X dontdiff oprofile-cvs-original/events/Makefile.am oprofile-cvs-ibs/events/Makefile.am --- oprofile-cvs-original/events/Makefile.am 2008-01-28 16:05:27.000000000 -0600 +++ oprofile-cvs-ibs/events/Makefile.am 2008-01-29 09:39:46.000000000 -0600 @@ -27,7 +27,7 @@ event_files = \ ppc64/cell-be/events ppc64/cell-be/unit_masks \ rtc/events rtc/unit_masks \ x86-64/hammer/events x86-64/hammer/unit_masks \ - x86-64/family10/events x86-64/family10/unit_masks \ + x86-64/family10h/events x86-64/family10h/unit_masks \ arm/xscale1/events arm/xscale1/unit_masks \ arm/xscale2/events arm/xscale2/unit_masks \ arm/armv6/events arm/armv6/unit_masks \ diff -uprN -X dontdiff oprofile-cvs-original/events/x86-64/family10/events oprofile-cvs-ibs/events/x86-64/family10/events --- oprofile-cvs-original/events/x86-64/family10/events 2008-01-28 16:05:27.000000000 -0600 +++ oprofile-cvs-ibs/events/x86-64/family10/events 1969-12-31 18:00:00.000000000 -0600 @@ -1,148 +0,0 @@ -# -# Family 10 events -# -# Copyright OProfile authors -# -# Copyright (c) Advanced Micro Devices, 2006, 2007 -# Contributed by Ray Bryant <ra...@am...>, -# Jason Yeh <jas...@am...> -# Suravee Suthikulpanit <sur...@am...> -# - -# default event -event:0x76 counters:0,1,2,3 um:zero minimum:3000 name:CPU_CLK_UNHALTED : Cycles outside of halt state - -# Floating point events -event:0x00 counters:0,1,2,3 um:fpu_ops minimum:500 name:DISPATCHED_FPU_OPS : Dispatched FPU ops -event:0x01 counters:0,1,2,3 um:zero minimum:500 
name:CYCLES_FPU_EMPTY : The number of cycles in which the PFU is empty -event:0x02 counters:0,1,2,3 um:zero minimum:500 name:DISPATCHED_FPU_OPS_FAST_FLAG : The number of FPU operations that use the fast flag interface -event:0x03 counters:0,1,2,3 um:sse_ops minimum:500 name:RETIRED_SSE_OPS : The number of SSE ops or uops retired -event:0x04 counters:0,1,2,3 um:move_ops minimum:500 name:RETIRED_MOVE_OPS : The number of move uops retired -event:0x05 counters:0,1,2,3 um:serial_ops minimum:500 name:RETIRED_SERIALIZING_OPS : The number of serializing uops retired. -event:0x06 counters:0,1,2,3 um:serial_ops_sched minimum:500 name:SERIAL_UOPS_IN_FP_SCHED : Number of cycles a serializing uop is in the FP scheduler - -# Load, Store, and TLB events -event:0x20 counters:0,1,2,3 um:segregload minimum:500 name:SEGMENT_REGISTER_LOADS : Segment register loads -event:0x21 counters:0,1,2,3 um:zero minimum:500 name:PIPELINE_RESTART_DUE_TO_SELF_MODIFYING_CODE : Micro-architectural re-sync caused by self modifying code -event:0x22 counters:0,1,2,3 um:zero minimum:500 name:PIPELINE_RESTART_DUE_TO_PROBE_HIT : Micro-architectural re-sync caused by snoop -event:0x23 counters:0,1,2,3 um:zero minimum:500 name:LS_BUFFER_2_FULL_CYCLES : Cycles LS Buffer 2 Full -event:0x24 counters:0,1,2,3 um:lock_ops minimum:500 name:LOCKED_OPS : Locked operations -event:0x26 counters:0,1,2,3 um:zero minimum:500 name:RETIRED_CLFLUSH : Retired CLFLUSH instructions -event:0x27 counters:0,1,2,3 um:zero minimum:500 name:RETIRED_CPUID : Retired CPUID instructions -event:0x2a counters:0,1,2,3 um:store_to_load minimum:500 name:CANCELLED_STORE_TO_LOAD : Counts the number of cancelled store to load forward operations -event:0x2b counters:0,1,2,3 um:zero minimum:500 name:SMIS_RECEIVED : Counts the number of SMI received - -# Data Cache event -event:0x40 counters:0,1,2,3 um:zero minimum:500 name:DATA_CACHE_ACCESSES : Data cache accesses -event:0x41 counters:0,1,2,3 um:zero minimum:500 name:DATA_CACHE_MISSES : Data cache 
misses -# Note: unit mask 0x01 counts same events as event select 0x43 -event:0x42 counters:0,1,2,3 um:moess minimum:500 name:DATA_CACHE_REFILLS_FROM_L2_OR_NORTHBRIDGE : Data cache refills from L2 or northbridge -event:0x43 counters:0,1,2,3 um:moesi minimum:500 name:DATA_CACHE_REFILLS_FROM_NORTHBRIDGE : Data cache refills from northbridge -event:0x44 counters:0,1,2,3 um:moesi_gh minimum:500 name:DATA_CACHE_LINES_EVICTED : Data cache lines evicted -event:0x45 counters:0,1,2,3 um:l1_dlb_miss_l2_hit minimum:500 name:L1_DTLB_MISS_AND_L2_DTLB_HIT : L1 DTLB misses and L2 DTLB hits -event:0x46 counters:0,1,2,3 um:l1_l2_dlb_miss minimum:500 name:L1_DTLB_AND_L2_DTLB_MISS : L1 and L2 DTLB misses -event:0x47 counters:0,1,2,3 um:zero minimum:500 name:MISALIGNED_ACCESSES : Misaligned Accesses -event:0x48 counters:0,1,2,3 um:zero minimum:500 name:MICRO_ARCH_LATE_CANCEL_ACCESS : Microarchitectural late cancel of an access -event:0x49 counters:0,1,2,3 um:zero minimum:500 name:MICRO_ARCH_EARLY_CANCEL_ACCESS : Microarchitectural early cancel of an access -event:0x4a counters:0,1,2,3 um:ecc minimum:500 name:1_BIT_ECC_ERRORS : Single-bit ECC errors recorded by scrubber -event:0x4b counters:0,1,2,3 um:prefetch minimum:500 name:PREFETCH_INSTRUCTIONS_DISPATCHED : The number of prefetch instructions dispatched by the decoder -event:0x4c counters:0,1,2,3 um:locked_instruction_dcache_miss minimum:500 name:LOCKED_INSTRUCTIONS_DCACHE_MISSES : The number of dta cache misses by locked instructions. 
-event:0x4d counters:0,1,2,3 um:l1_dtlb_hit minimum:500 name:L1_DTLB_HIT : L1 DTLB hit -event:0x52 counters:0,1,2,3 um:soft_prefetch minimum:500 name:INEFFECTIVE_SW_PREFETCHES : Number of software prefetches that did not fetch data outside of processor core -event:0x54 counters:0,1,2,3 um:zero minimum:500 name:GLOBAL_TLB_FLUSHES : The number of global TLB flushes - -# L2 Cache and System Interface events -event:0x65 counters:0,1,2,3 um:memreqtype minimum:500 name:MEMORY_REQUESTS : Memory Requests by Type -event:0x67 counters:0,1,2,3 um:dataprefetch minimum:500 name:DATA_PREFETCHES : Data Prefetcher -event:0x6c counters:0,1,2,3 um:systemreadresponse minimum:500 name:NORTHBRIDGE_READ_RESPONSES : Northbridge Read Responses by Coherency State -event:0x6d counters:0,1,2,3 um:quadword_transfer minimum:500 name:OCTWORD_WRITE_TRANSFERS : Octwords Written to System -event:0x7d counters:0,1,2,3 um:l2_internal minimum:500 name:REQUESTS_TO_L2 : Requests to L2 Cache -event:0x7e counters:0,1,2,3 um:l2_req_miss minimum:500 name:L2_CACHE_MISS : L2 Cache Misses -event:0x7f counters:0,1,2,3 um:l2_fill minimum:500 name:L2_CACHE_FILL_WRITEBACK : L2 Fill/Writeback - -# Instruction Cache events -event:0x80 counters:0,1,2,3 um:zero minimum:500 name:INSTRUCTION_CACHE_FETCHES : Instruction cache fetches (RevE) -event:0x81 counters:0,1,2,3 um:zero minimum:500 name:INSTRUCTION_CACHE_MISSES : Instruction cache misses -event:0x82 counters:0,1,2,3 um:zero minimum:500 name:INSTRUCTION_CACHE_REFILLS_FROM_L2 : Instruction Cache Refills from L2 -event:0x83 counters:0,1,2,3 um:zero minimum:500 name:INSTRUCTION_CACHE_REFILLS_FROM_SYSTEM : Instruction Cache Refills from System -event:0x84 counters:0,1,2,3 um:zero minimum:500 name:L1_ITLB_MISS_AND_L2_ITLB_HIT : L1 ITLB misses (and L2 ITLB hits) -event:0x85 counters:0,1,2,3 um:l1_l2_itlb_miss minimum:500 name:L1_ITLB_MISS_AND_L2_ITLB_MISS : L1 ITLB Miss, L2 ITLB Miss -event:0x86 counters:0,1,2,3 um:zero minimum:500 
name:PIPELINE_RESTART_DUE_TO_INSTRUCTION_STREAM_PROBE : Pipeline Restart Due to Instruction Stream Probe -event:0x87 counters:0,1,2,3 um:zero minimum:500 name:INSTRUCTION_FETCH_STALL : Instruction fetch stall -event:0x88 counters:0,1,2,3 um:zero minimum:500 name:RETURN_STACK_HITS : Return stack hit -event:0x89 counters:0,1,2,3 um:zero minimum:500 name:RETURN_STACK_OVERFLOWS : Return stack overflow -event:0x8b counters:0,1,2,3 um:zero minimum:500 name:INSTRUCTION_CACHE_VICTIMS : Number of instruction cachelines evicticed to L2 -event:0x8c counters:0,1,2,3 um:icache_invalidated minimum:500 name:INSTRUCTION_CHCHE_INVALIDATED : Instruction cache lines invalidated -event:0x99 counters:0,1,2,3 um:zero minimum:500 name:ITLB_RELOADS : The number of ITLB reloads requests -event:0x9a counters:0,1,2,3 um:zero minimum:500 name:ITLB_RELOADS_ABORTED : The number of ITLB reloads aborted - -# Execution Unit events -event:0xc0 counters:0,1,2,3 um:zero minimum:3000 name:RETIRED_INSTRUCTIONS : Retired instructions (includes exceptions, interrupts, re-syncs) -event:0xc1 counters:0,1,2,3 um:zero minimum:500 name:RETIRED_UOPS : Retired micro-ops -event:0xc2 counters:0,1,2,3 um:zero minimum:500 name:RETIRED_BRANCH_INSTRUCTIONS : Retired branches (conditional, unconditional, exceptions, interrupts) -event:0xc3 counters:0,1,2,3 um:zero minimum:500 name:RETIRED_MISPREDICTED_BRANCH_INSTRUCTIONS : Retired Mispredicted Branch Instructions -event:0xc4 counters:0,1,2,3 um:zero minimum:500 name:RETIRED_TAKEN_BRANCH_INSTRUCTIONS : Retired taken branch instructions -event:0xc5 counters:0,1,2,3 um:zero minimum:500 name:RETIRED_TAKEN_BRANCH_INSTRUCTIONS_MISPREDICTED : Retired taken branches mispredicted -event:0xc6 counters:0,1,2,3 um:zero minimum:500 name:RETIRED_FAR_CONTROL_TRANSFERS : Retired far control transfers -event:0xc7 counters:0,1,2,3 um:zero minimum:500 name:RETIRED_BRANCH_RESYNCS : Retired branches resyncs (only non-control transfer branches) -event:0xc8 counters:0,1,2,3 um:zero 
minimum:500 name:RETIRED_NEAR_RETURNS : Retired near returns -event:0xc9 counters:0,1,2,3 um:zero minimum:500 name:RETIRED_NEAR_RETURNS_MISPREDICTED : Retired near returns mispredicted -event:0xca counters:0,1,2,3 um:zero minimum:500 name:RETIRED_INDIRECT_BRANCHES_MISPREDICTED : Retired Indirect Branches Mispredicted -event:0xcb counters:0,1,2,3 um:fpu_instr minimum:500 name:RETIRED_MMX_FP_INSTRUCTIONS : Retired MMX/FP instructions -event:0xcc counters:0,1,2,3 um:fpu_fastpath minimum:500 name:RETIRED_FASTPATH_DOUBLE_OP_INSTRUCTIONS : Retired FastPath double-op instructions -event:0xcd counters:0,1,2,3 um:zero minimum:500 name:INTERRUPTS_MASKED_CYCLES : Cycles with interrupts masked (IF=0) -event:0xce counters:0,1,2,3 um:zero minimum:500 name:INTERRUPTS_MASKED_CYCLES_WITH_INTERRUPT_PENDING : Cycles with interrupts masked while interrupt pending -event:0xcf counters:0,1,2,3 um:zero minimum:10 name:INTERRUPTS_TAKEN : Number of taken hardware interrupts -event:0xd0 counters:0,1,2,3 um:zero minimum:500 name:DECODER_EMPTY : Nothing to dispatch (decoder empty) -event:0xd1 counters:0,1,2,3 um:zero minimum:500 name:DISPATCH_STALLS : Dispatch stalls -event:0xd2 counters:0,1,2,3 um:zero minimum:500 name:DISPATCH_STALL_FOR_BRANCH_ABORT : Dispatch stall from branch abort to retire -event:0xd3 counters:0,1,2,3 um:zero minimum:500 name:DISPATCH_STALL_FOR_SERIALIZATION : Dispatch stall for serialization -event:0xd4 counters:0,1,2,3 um:zero minimum:500 name:DISPATCH_STALL_FOR_SEGMENT_LOAD : Dispatch stall for segment load -event:0xd5 counters:0,1,2,3 um:zero minimum:500 name:DISPATCH_STALL_FOR_REORDER_BUFFER_FULL : Dispatch stall for reorder buffer full -event:0xd6 counters:0,1,2,3 um:zero minimum:500 name:DISPATCH_STALL_FOR_RESERVATION_STATION_FULL : Dispatch stall when reservation stations are full -event:0xd7 counters:0,1,2,3 um:zero minimum:500 name:DISPATCH_STALL_FOR_FPU_FULL : Dispatch stall when FPU is full -event:0xd8 counters:0,1,2,3 um:zero minimum:500 
name:DISPATCH_STALL_FOR_LS_FULL : Dispatch stall when LS is full -event:0xd9 counters:0,1,2,3 um:zero minimum:500 name:DISPATCH_STALL_WAITING_FOR_ALL_QUIET : Dispatch stall when waiting for all to be quiet -event:0xda counters:0,1,2,3 um:zero minimum:500 name:DISPATCH_STALL_FOR_FAR_TRANSFER_OR_RESYNC : Dispatch Stall for Far Transfer or Resync to Retire -event:0xdb counters:0,1,2,3 um:fpu_exceptions minimum:1 name:FPU_EXCEPTIONS : FPU exceptions -event:0xdc counters:0,1,2,3 um:zero minimum:1 name:DR0_BREAKPOINTS : The number of matches on the address in breakpoint register DR0 -event:0xdd counters:0,1,2,3 um:zero minimum:1 name:DR1_BREAKPOINTS : The number of matches on the address in breakpoint register DR1 -event:0xde counters:0,1,2,3 um:zero minimum:1 name:DR2_BREAKPOINTS : The number of matches on the address in breakpoint register DR2 -event:0xdf counters:0,1,2,3 um:zero minimum:1 name:DR3_BREAKPOINTS : The number of matches on the address in breakpoint register DR3 - -# Memory Controler events -event:0xe0 counters:0,1,2,3 um:page_access minimum:500 name:DRAM_ACCESSES : DRAM Accesses -event:0xe1 counters:0,1,2,3 um:mem_page_overflow minimum:500 name:MEMORY_CONTROLLER_PAGE_TABLE_OVERFLOWS : Memory controller page table overflows -event:0xe2 counters:0,1,2,3 um:slot_missed minimum:500 name:MEMORY_CONTROLLER_SLOT_MISSED : Memory controller DRAM command slots missed -event:0xe3 counters:0,1,2,3 um:turnaround minimum:500 name:MEMORY_CONTROLLER_TURNAROUNDS : Memory controller turnarounds -event:0xe4 counters:0,1,2,3 um:saturation minimum:500 name:MEMORY_CONTROLLER_BYPASS_COUNTER_SATURATION : Memory controller bypass saturation -event:0xe8 counters:0,1,2,3 um:thermal_status minimum:500 name:THERMAL_STATUS : Thermal status -event:0xe9 counters:0,1,2,3 um:cpiorequests minimum:500 name:CPU_IO_REQUESTS_TO_MEMORY_IO : CPU/IO Requests to Memory/IO (RevE) -event:0xea counters:0,1,2,3 um:cacheblock minimum:500 name:CACHE_BLOCK_COMMANDS : Cache Block Commands (RevE) 
-event:0xeb counters:0,1,2,3 um:sizecmds minimum:500 name:SIZED_COMMANDS : Sized Commands -event:0xec counters:0,1,2,3 um:probe minimum:500 name:PROBE_RESPONSES_AND_UPSTREAM_REQUESTS : Probe Responses and Upstream Requests -event:0xee counters:0,1,2,3 um:gart minimum:500 name:GART_EVENTS : GART Events -event:0x1f0 counters:0,1,2,3 um:mem_control_request minimum:500 name:MEMORY_CONTROLLER_REQUESTS : Sized Read/Write activity. - -# Crossbar events -event:0x1e0 counters:0,1,2,3 um:cpu_dram_req minimum:500 name:CPU_DRAM_REQUEST_TO_NODE : CPU to DRAM requests to target node -event:0x1e1 counters:0,1,2,3 um:io_dram_req minimum:500 name:IO_DRAM_REQUEST_TO_NODE : IO to DRAM requests to target node -event:0x1e2 counters:0,1,2,3 um:cpu_read_lat_0_3 minimum:500 name:CPU_READ_COMMAND_LATENCY_NODE_0_3 : Latency between the local node and remote node -event:0x1e3 counters:0,1,2,3 um:cpu_read_lat_0_3 minimum:500 name:CPU_READ_COMMAND_REQUEST_NODE_0_3 : Number of requests that a latency measurment is made for Event 0x1E2 -event:0x1e4 counters:0,1,2,3 um:cpu_read_lat_4_7 minimum:500 name:CPU_READ_COMMAND_LATENCY_NODE_4_7 : Latency between the local node and remote node -event:0x1e5 counters:0,1,2,3 um:cpu_read_lat_4_7 minimum:500 name:CPU_READ_COMMAND_REQUEST_NODE_4_7 : Number of requests that a latency measurment is made for Event 0x1E2 -event:0x1e6 counters:0,1,2,3 um:cpu_comm_lat minimum:500 name:CPU_COMMAND_LATENCY_TARGET : Determine latency between the local node and a remote node. 
-event:0x1e7 counters:0,1,2,3 um:cpu_comm_lat minimum:500 name:CPU_REQUEST_TARGET : Number of requests that a latency measurement is made for Event 0x1E6 - -# Link events -event:0xf6 counters:0,1,2,3 um:httransmit minimum:500 name:HYPERTRANSPORT_LINK0_TRANSMIT_BANDWIDTH : HyperTransport(tm) link 0 transmit bandwidth -event:0xf7 counters:0,1,2,3 um:httransmit minimum:500 name:HYPERTRANSPORT_LINK1_TRANSMIT_BANDWIDTH : HyperTransport(tm) link 1 transmit bandwidth -event:0xf8 counters:0,1,2,3 um:httransmit minimum:500 name:HYPERTRANSPORT_LINK2_TRANSMIT_BANDWIDTH : HyperTransport(tm) link 2 transmit bandwidth -event:0x1f9 counters:0,1,2,3 um:httransmit minimum:500 name:HYPERTRANSPORT_LINK3_TRANSMIT_BANDWIDTH : HyperTransport(tm) link 3 transmit bandwidth - -# L3 Cache events -event:0x4e0 counters:0,1,2,3 um:l3_cache minimum:500 name:READ_REQUEST_L3_CACHE : Tracks the red requests from each core to L3 cache -event:0x4e1 counters:0,1,2,3 um:l3_cache minimum:500 name:L3_CACHE_MISSES : Tracks the L3 cache misses from each core -event:0x4e2 counters:0,1,2,3 um:l3_fill minimum:500 name:L3_FILLS_CAUSED_BY_L2_EVICTIONS : Tracks the L3 fills caused by L2 evictions per core -event:0x4e3 counters:0,1,2,3 um:l3_evict minimum:500 name:L3_EVICTIONS : Tracks the state of the L3 line when it was evicted - diff -uprN -X dontdiff oprofile-cvs-original/events/x86-64/family10/unit_masks oprofile-cvs-ibs/events/x86-64/family10/unit_masks --- oprofile-cvs-original/events/x86-64/family10/unit_masks 2008-01-28 16:05:27.000000000 -0600 +++ oprofile-cvs-ibs/events/x86-64/family10/unit_masks 1969-12-31 18:00:00.000000000 -0600 @@ -1,312 +0,0 @@ -# -# Family 10 unit masks -# -# Copyright OProfile authors -# Copyright (c) Advanced Micro Devices, 2006. 
-# Contributed by Ray Bryant <ra...@am...> -# Jason Yeh <jas...@am...> -# Suravee Suthikulpanit <sur...@am...> -# -name:zero type:mandatory default:0x0 - 0x0 No unit mask -name:moesi type:bitmask default:0x1f - 0x10 (M)odified cache state - 0x08 (O)wner cache state - 0x04 (E)xclusive cache state - 0x02 (S)hared cache state - 0x01 (I)nvalid cache state - 0x1f All cache states -name:moess type:bitmask default:0x1e - 0x01 Refill from northbridge - 0x02 Shared-state line from L2 - 0x04 Exclusive-state line from L2 - 0x08 Owner-state line from L2 - 0x10 Modified-state line from L2 - 0x1e All cache states except refill from northbridge -name:fpu_ops type:bitmask default:0x3f - 0x01 Add pipe ops excluding load ops and SSE move ops - 0x02 Multiply pipe ops excluding load ops and SSE move ops - 0x04 Store pipe ops excluding load ops and SSE move ops - 0x08 Add pipe load ops and SSE move ops - 0x10 Multiply pipe load ops and SSE move ops - 0x20 Store pipe load ops and SSE move ops - 0x3F all ops -name:segregload type:bitmask default:0x7f - 0x01 ES register - 0x02 CS register - 0x04 SS register - 0x08 DS register - 0x10 FS register - 0x20 GS register - 0x40 HS register -name:fpu_instr type:bitmask default:0x07 - 0x01 x87 instructions - 0x02 MMX & 3DNow instructions - 0x04 SSE & SSE2 instructions -name:fpu_fastpath type:bitmask default:0x07 - 0x01 With low op in position 0 - 0x02 With low op in position 1 - 0x04 With low op in position 2 -name:fpu_exceptions type:bitmask default:0x0f - 0x01 x87 reclass microfaults - 0x02 SSE retype microfaults - 0x04 SSE reclass microfaults - 0x08 SSE and x87 microtraps -name:page_access type:bitmask default:0xff - 0x01 DCT0 Page hit - 0x02 DCT0 Page miss - 0x04 DCT0 Page conflict - 0x08 DCT1 Page hit - 0x10 DCT1 Page miss - 0x20 DCT1 Page Conflict - 0x40 Write request - 0x80 Read request -name:mem_page_overflow type:bitmask default:0x03 - 0x01 DCT0 Page Table Overflow - 0x02 DCT1 Page Table Overflow -name:turnaround type:bitmask default:0x3f 
- 0x01 DCT0 DIMM (chip select) turnaround - 0x02 DCT0 Read to write turnaround - 0x04 DCT0 Write to read turnaround - 0x08 DCT1 DIMM (chip select) turnaround - 0x10 DCT1 Read to write turnaround - 0x20 DCT1 Write to read turnaround -name:saturation type:bitmask default:0x0f - 0x01 Memory controller high priority bypass - 0x02 Memory controller medium priority bypass - 0x04 DCT0 DCQ bypass - 0x08 DCT1 DCQ bypass -name:slot_missed type:bitmask default:0x03 - 0x01 DCT0 Command slots missed - 0x02 DCT2 Command slots missed -name:sizecmds type:bitmask default:0x3f - 0x01 non-posted write byte (1-32 bytes) - 0x02 non-posted write dword (1-16 dwords) - 0x04 posted write byte (1-32 bytes) - 0x08 posted write dword (1-16 dwords) - 0x10 read byte (4 bytes) - 0x20 read dword (1-16 dwords) -name:probe type:bitmask default:0xff - 0x01 Probe miss - 0x02 Probe hit clean - 0x04 Probe hit dirty without memory cancel - 0x08 Probe hit dirty with memory cancel - 0x10 Upstream display refresh/ISOC reads - 0x20 Upstream non-display refresh reads - 0x40 Upstream ISOC writes - 0x80 Upstream non-ISOC writes -name:l2_internal type:bitmask default:0x3f - 0x01 IC fill - 0x02 DC fill - 0x04 TLB fill (page table walks) - 0x08 Tag snoop request - 0x10 Canceled request - 0x20 Hardware prefetch from data cache -name:l2_req_miss type:bitmask default:0x0f - 0x01 IC fill - 0x02 DC fill (includes possible replays) - 0x04 TLB page table walk - 0x08 Hardwareprefetch from data cache -name:l2_fill type:bitmask default:0x03 - 0x01 L2 fills (victims from L1 caches, TLB page table walks and data prefetches) - 0x02 L2 Writebacks to system -name:gart type:bitmask default:0xff - 0x01 GART aperture hit on access from CPU - 0x02 GART aperture hit on access from I/O - 0x04 GART miss - 0x08 GART/DEV Request hit table walk in progress - 0x10 DEV hit - 0x20 DEV miss - 0x40 DEV error - 0x80 GART/DEV multiple table walk in progress -name:cpiorequests type:bitmask default:0x08 - 0x01 IO to IO - 0x04 IO to Mem - 0x08 CPU 
to IO - 0x10 To remote node - 0x20 To local node - 0x40 From remote node - 0x80 From local node -name:cacheblock type:bitmask default:0x3d - 0x01 Victim Block (Writeback) - 0x04 Read Block (Dcache load miss refill) - 0x08 Read Block Shared (Icache refill) - 0x10 Read Block Modified (Dcache store miss refill) - 0x20 Change to Dirty (first store to clean block already in cache) -name:dataprefetch type:bitmask default:0x03 - 0x01 Cancelled prefetches - 0x02 Prefetch attempts -name:memreqtype type:bitmask default:0x83 - 0x01 Requests to non-cacheable (UC) memory - 0x02 Requests to write-combining (WC) memory or WC buffer flushes to WB memory - 0x80 Streaming store (SS) requests -name:systemreadresponse type:bitmask default:0x17 - 0x01 Exclusive - 0x02 Modified - 0x04 Shared - 0x10 Data Error -name:l1_dlb_miss_l2_hit type:bitmask default:0x03 - 0x01 L2 4K TLB hit - 0x02 L2 2M TLB hit -name:l1_l2_dlb_miss type:bitmask default:0x07 - 0x01 4K TLB reload - 0x02 2M TLB reload - 0x04 1G TLB reload -name:ecc type:bitmask default:0x0f - 0x01 Scrubber error - 0x02 Piggyback scrubber errors - 0x04 Load pipe error - 0x08 Store write pip error -name:prefetch type:bitmask default:0x07 - 0x01 Load (Prefetch, PrefetchT0/T1/T2) - 0x02 Store (PrefetchW) - 0x04 NTA (PrefetchNTA) -name:locked_instruction_dcache_miss type:bitmask default:0x02 - 0x02 Data cache misses by locked instructions -name:quadword_transfer type:bitmask default:0x01 - 0x01 Quadword write transfer -name:thermal_status type:bitmask default:0x7c - 0x04 Number of times the HTC trip point is crossed - 0x08 Number of clocks when STC trip point active - 0x10 Number of times the STC trip point is crossed - 0x20 Number of clocks HTC P-state is inactive - 0x40 Number of clocks HTC P-state is active -name:mem_control_request type:bitmask default:0x78 - 0x01 Write requests - 0x02 Read Requests including Prefetch - 0x04 Prefetch Request - 0x08 32 Bytes Sized Writes - 0x10 64 Bytes Sized Writes - 0x20 32 Bytes Sized Reads - 0x40 
64 Byte Sized Reads - 0x80 Read Requests while writes pending in DCQ -name:httransmit type:bitmask default:0xbf - 0x01 Command DWORD sent - 0x02 DWORD sent - 0x04 Buffer release DWORD sent - 0x08 Nop DW sent (idle) - 0x10 Address extension DWORD sent - 0x20 Per packet CRC sent - 0x80 SubLink Mask -name:lock_ops type:bitmask default:0x0f - 0x01 Number of locked instructions executed - 0x02 Cycles in speculative phase - 0x04 Cycles in non-speculative phase (including cache miss penalty) - 0x08 Cache miss penalty in cycles -name:sse_ops type:bitmask default:0x7f - 0x01 Single Precision add/subtract ops - 0x02 Single precision multiply ops - 0x04 Single precision divide/square root ops - 0x08 Double precision add/subtract ops - 0x10 Double precision multiply ops - 0x20 Double precision divide/square root ops - 0x40 OP type, 0=uops 1=FLOPS -name:move_ops type:bitmask default:0x0f - 0x01 Merging low quadword move uops - 0x02 Merging high quadword move uops - 0x04 All other merging move uops - 0x08 All other move uops -name:serial_ops type:bitmask default:0x0f - 0x01 SSE bottom-executing uops retired - 0x02 SSE bottom-serializing uops retired - 0x04 x87 bottom-executing uops retired - 0x08 x87 bottom-serializing uops retired -name:serial_ops_sched type:bitmask default:0x03 - 0x01 Number of cycles a bottom-execute uops in FP scheduler - 0x02 Number of cycles a bottom-serializing uops in FP scheduler -name:store_to_load type:bitmask default:0x07 - 0x01 Address mismatches (starting byte not the same) - 0x02 Store is smaller than load - 0x04 Misaligned -name:moesi_gh type:bitmask default:0x1f - 0x01 (I)nvalid cache state - 0x02 (S)hared cache state - 0x04 (E)xclusive cache state - 0x08 (O)wner cache state - 0x10 (M)odified cache state - 0x20 Cache line evict brought by PrefetchNTA - 0x40 Cache line evict not brought by PrefetchNTA - 0x1f All cache states except PrefetchNTA -name:l1_dtlb_hit type:bitmask default:0x07 - 0x01 L1 4K TLB hit - 0x02 L1 2M TLB hit - 0x04 L1 1G TLB 
hit -name:soft_prefetch type:bitmask default:0x09 - 0x01 Hit in L1 - 0x08 Hit in L2 -name:l1_l2_itlb_miss type:bitmask default:0x03 - 0x01 Instruction fetches to 4K pages - 0x02 Instruction fetches to 2M pages -name:cpu_dram_req type:bitmask default:0xff - 0x01 From local node to node 0 - 0x02 From local node to node 1 - 0x04 From local node to node 2 - 0x08 From local node to node 3 - 0x10 From local node to node 4 - 0x20 From local node to node 5 - 0x40 From local node to node 6 - 0x80 From local node to node 7 -name:io_dram_req type:bitmask default:0xff - 0x01 From local node to node 0 - 0x02 From local node to node 1 - 0x04 From local node to node 2 - 0x08 From local node to node 3 - 0x10 From local node to node 4 - 0x20 From local node to node 5 - 0x40 From local node to node 6 - 0x80 From local node to node 7 -name:cpu_read_lat_0_3 type:bitmask default:0xff - 0x01 Read block - 0x02 Read block shared - 0x04 Read block modified - 0x08 Change to dirty - 0x10 From local node to node 0 - 0x20 From local node to node 1 - 0x40 From local node to node 2 - 0x80 From local node to node 3 -name:cpu_read_lat_4_7 type:bitmask default:0xff - 0x01 Read block - 0x02 Read block shared - 0x04 Read block modified - 0x08 Change to dirty - 0x10 From local node to node 4 - 0x20 From local node to node 5 - 0x40 From local node to node 6 - 0x80 From local node to node 7 -name:cpu_comm_lat type:bitmask default:0xf7 - 0x01 Read sized - 0x02 Write sized - 0x04 Victim block - 0x08 Node group select. 0=Nodes 0-3. 
1=Nodes 4-7 - 0x10 From local node to node 0/4 - 0x20 From local node to node 1/5 - 0x40 From local node to node 2/6 - 0x80 From local node to node 3/7 -name:l3_cache type:bitmask default:0xf7 - 0x01 Read Block Exclusive (Data cache read) - 0x02 Read Block Shared (Instruciton cache read) - 0x04 Read Block Modify - 0x10 Core 0 Select - 0x20 Core 1 Select - 0x40 Core 2 Select - 0x80 Core 3 Select -name:l3_fill type:bitmask default:0xff - 0x01 Shared - 0x02 Exclusive - 0x04 Owned - 0x08 Modified - 0x10 Core 0 Select - 0x20 Core 1 Select - 0x40 Core 2 Select - 0x80 Core 3 Select -name:l3_evict type:bitmask default:0x0f - 0x01 Shared - 0x02 Exclusive - 0x04 Owned - 0x08 Modified -name:icache_invalidated type:bitmask default:0x0f - 0x01 Invalidating probe that did not hit any in-flight instructions - 0x02 Invalidating probe that hit one or more in-flight instructions - 0x04 SMC that did not hit any in-flight instructions - 0x08 SMC that hit one or more in-flight instructions - diff -uprN -X dontdiff oprofile-cvs-original/events/x86-64/family10h/events oprofile-cvs-ibs/events/x86-64/family10h/events --- oprofile-cvs-original/events/x86-64/family10h/events 1969-12-31 18:00:00.000000000 -0600 +++ oprofile-cvs-ibs/events/x86-64/family10h/events 2008-01-29 13:19:33.000000000 -0600 @@ -0,0 +1,206 @@ +# +# AMD Family 10 processor performance events +# +# Copyright OProfile authors +# Copyright (c) 2006-2008 Advanced Micro Devices +# Contributed by Ray Bryant <raybry at amd.com>, +# Jason Yeh <jason.yeh at amd.com> +# Suravee Suthikulpanit <suravee.suthikulpanit at amd.com> +# +# Sources: BIOS and Kernel Developer's Guide for AMD Family 10h Processors, +# Publication# 31116, Revision 3.00, 7 September 2007 +# +# Software Optimization Guide for AMD Family 10h Processors, +# Publication# 40546, Revision 3.04, September 2007 +# +# This file was last updated on 11 January 2008. 
+# +# Floating point events +event:0x00 counters:0,1,2,3 um:fpu_ops minimum:500 name:DISPATCHED_FPU_OPS : Dispatched FPU ops +event:0x01 counters:0,1,2,3 um:zero minimum:500 name:CYCLES_FPU_EMPTY : The number of cycles in which the FPU is empty +event:0x02 counters:0,1,2,3 um:zero minimum:500 name:DISPATCHED_FPU_OPS_FAST_FLAG : The number of FPU operations that use the fast flag interface +event:0x03 counters:0,1,2,3 um:sse_ops minimum:500 name:RETIRED_SSE_OPS : The number of SSE ops or uops retired +event:0x04 counters:0,1,2,3 um:move_ops minimum:500 name:RETIRED_MOVE_OPS : The number of move uops retired +event:0x05 counters:0,1,2,3 um:serial_ops minimum:500 name:RETIRED_SERIALIZING_OPS : The number of serializing uops retired. +event:0x06 counters:0,1,2,3 um:serial_ops_sched minimum:500 name:SERIAL_UOPS_IN_FP_SCHED : Number of cycles a serializing uop is in the FP scheduler +# Load, Store, and TLB events +event:0x20 counters:0,1,2,3 um:segregload minimum:500 name:SEGMENT_REGISTER_LOADS : Segment register loads +event:0x21 counters:0,1,2,3 um:zero minimum:500 name:PIPELINE_RESTART_DUE_TO_SELF_MODIFYING_CODE : Micro-architectural re-sync caused by self modifying code +event:0x22 counters:0,1,2,3 um:zero minimum:500 name:PIPELINE_RESTART_DUE_TO_PROBE_HIT : Micro-architectural re-sync caused by snoop +event:0x23 counters:0,1,2,3 um:zero minimum:500 name:LS_BUFFER_2_FULL_CYCLES : Cycles LS Buffer 2 Full +event:0x24 counters:0,1,2,3 um:lock_ops minimum:500 name:LOCKED_OPS : Locked operations +event:0x26 counters:0,1,2,3 um:zero minimum:500 name:RETIRED_CLFLUSH_INSTRUCTIONS : Retired CLFLUSH instructions +event:0x27 counters:0,1,2,3 um:zero minimum:500 name:RETIRED_CPUID_INSTRUCTIONS : Retired CPUID instructions +event:0x2a counters:0,1,2,3 um:store_to_load minimum:500 name:CANCELLED_STORE_TO_LOAD : Counts the number of cancelled store to load forward operations +event:0x2b counters:0,1,2,3 um:zero minimum:500 name:SMIS_RECEIVED : Counts the number of SMIs received by 
the processor +# Data Cache event +event:0x40 counters:0,1,2,3 um:zero minimum:500 name:DATA_CACHE_ACCESSES : Data cache accesses +event:0x41 counters:0,1,2,3 um:zero minimum:500 name:DATA_CACHE_MISSES : Data cache misses +# Note: unit mask 0x01 counts same events as event select 0x43 +event:0x42 counters:0,1,2,3 um:moess minimum:500 name:DATA_CACHE_REFILLS_FROM_L2_OR_NORTHBRIDGE : Data cache refills from L2 or Northbridge +event:0x43 counters:0,1,2,3 um:moesi minimum:500 name:DATA_CACHE_REFILLS_FROM_NORTHBRIDGE : Data cache refills from Northbridge +event:0x44 counters:0,1,2,3 um:moesi_gh minimum:500 name:DATA_CACHE_LINES_EVICTED : Data cache lines evicted +event:0x45 counters:0,1,2,3 um:l1_dtlb_miss_l2_hit minimum:500 name:L1_DTLB_MISS_AND_L2_DTLB_HIT : L1 DTLB miss and L2 DTLB hit +event:0x46 counters:0,1,2,3 um:l1_l2_dtlb_miss minimum:500 name:L1_DTLB_AND_L2_DTLB_MISS : L1 DTLB and L2 DTLB miss +event:0x47 counters:0,1,2,3 um:zero minimum:500 name:MISALIGNED_ACCESSES : Misaligned Accesses +event:0x48 counters:0,1,2,3 um:zero minimum:500 name:MICRO_ARCH_LATE_CANCEL_ACCESS : Microarchitectural late cancel of an access +event:0x49 counters:0,1,2,3 um:zero minimum:500 name:MICRO_ARCH_EARLY_CANCEL_ACCESS : Microarchitectural early cancel of an access +event:0x4a counters:0,1,2,3 um:ecc minimum:500 name:1_BIT_ECC_ERRORS : Single-bit ECC errors recorded by scrubber +event:0x4b counters:0,1,2,3 um:prefetch minimum:500 name:PREFETCH_INSTRUCTIONS_DISPATCHED : The number of prefetch instructions dispatched by the decoder +event:0x4c counters:0,1,2,3 um:locked_instruction_dcache_miss minimum:500 name:LOCKED_INSTRUCTIONS_DCACHE_MISSES : The number of data cache misses by locked instructions. 
+event:0x4d counters:0,1,2,3 um:l1_dtlb_hit minimum:500 name:L1_DTLB_HIT : L1 DTLB hit +event:0x52 counters:0,1,2,3 um:soft_prefetch minimum:500 name:INEFFECTIVE_SW_PREFETCHES : Number of software prefetches that did not fetch data outside of processor core +event:0x54 counters:0,1,2,3 um:zero minimum:500 name:GLOBAL_TLB_FLUSHES : The number of global TLB flushes +# L2 Cache and System Interface events +event:0x65 counters:0,1,2,3 um:memreqtype minimum:500 name:MEMORY_REQUESTS : Memory requests by type +event:0x67 counters:0,1,2,3 um:dataprefetch minimum:500 name:DATA_PREFETCHES : Data prefetcher +event:0x6c counters:0,1,2,3 um:systemreadresponse minimum:500 name:NORTHBRIDGE_READ_RESPONSES : Northbridge read responses by coherency state +event:0x6d counters:0,1,2,3 um:octword_transfer minimum:500 name:OCTWORD_WRITE_TRANSFERS : Octwords written to system +event:0x76 counters:0,1,2,3 um:zero minimum:3000 name:CPU_CLK_UNHALTED : Cycles outside of halt state +event:0x7d counters:0,1,2,3 um:l2_internal minimum:500 name:REQUESTS_TO_L2 : Requests to L2 Cache +event:0x7e counters:0,1,2,3 um:l2_req_miss minimum:500 name:L2_CACHE_MISS : L2 cache misses +event:0x7f counters:0,1,2,3 um:l2_fill minimum:500 name:L2_CACHE_FILL_WRITEBACK : L2 fill/writeback +# Instruction Cache events +event:0x80 counters:0,1,2,3 um:zero minimum:500 name:INSTRUCTION_CACHE_FETCHES : Instruction cache fetches (RevE) +event:0x81 counters:0,1,2,3 um:zero minimum:500 name:INSTRUCTION_CACHE_MISSES : Instruction cache misses +event:0x82 counters:0,1,2,3 um:zero minimum:500 name:INSTRUCTION_CACHE_REFILLS_FROM_L2 : Instruction cache refills from L2 +event:0x83 counters:0,1,2,3 um:zero minimum:500 name:INSTRUCTION_CACHE_REFILLS_FROM_SYSTEM : Instruction cache refills from system +event:0x84 counters:0,1,2,3 um:zero minimum:500 name:L1_ITLB_MISS_AND_L2_ITLB_HIT : L1 ITLB miss and L2 ITLB hit +event:0x85 counters:0,1,2,3 um:l1_l2_itlb_miss minimum:500 name:L1_ITLB_MISS_AND_L2_ITLB_MISS : L1 ITLB miss and L2 
ITLB miss +event:0x86 counters:0,1,2,3 um:zero minimum:500 name:PIPELINE_RESTART_DUE_TO_INSTRUCTION_STREAM_PROBE : Pipeline restart due to instruction stream probe +event:0x87 counters:0,1,2,3 um:zero minimum:500 name:INSTRUCTION_FETCH_STALL : Instruction fetch stall +event:0x88 counters:0,1,2,3 um:zero minimum:500 name:RETURN_STACK_HITS : Return stack hit +event:0x89 counters:0,1,2,3 um:zero minimum:500 name:RETURN_STACK_OVERFLOWS : Return stack overflow +event:0x8b counters:0,1,2,3 um:zero minimum:500 name:INSTRUCTION_CACHE_VICTIMS : Number of instruction cache lines evicted to the L2 cache +event:0x8c counters:0,1,2,3 um:icache_invalidated minimum:500 name:INSTRUCTION_CACHE_INVALIDATED : Instruction cache lines invalidated +event:0x99 counters:0,1,2,3 um:zero minimum:500 name:ITLB_RELOADS : The number of ITLB reload requests +event:0x9a counters:0,1,2,3 um:zero minimum:500 name:ITLB_RELOADS_ABORTED : The number of ITLB reloads aborted +# Execution Unit events +event:0xc0 counters:0,1,2,3 um:zero minimum:3000 name:RETIRED_INSTRUCTIONS : Retired instructions (includes exceptions, interrupts, re-syncs) +event:0xc1 counters:0,1,2,3 um:zero minimum:500 name:RETIRED_UOPS : Retired micro-ops +event:0xc2 counters:0,1,2,3 um:zero minimum:500 name:RETIRED_BRANCH_INSTRUCTIONS : Retired branches (conditional, unconditional, exceptions, interrupts) +event:0xc3 counters:0,1,2,3 um:zero minimum:500 name:RETIRED_MISPREDICTED_BRANCH_INSTRUCTIONS : Retired mispredicted branch instructions +event:0xc4 counters:0,1,2,3 um:zero minimum:500 name:RETIRED_TAKEN_BRANCH_INSTRUCTIONS : Retired taken branch instructions +event:0xc5 counters:0,1,2,3 um:zero minimum:500 name:RETIRED_TAKEN_BRANCH_INSTRUCTIONS_MISPREDICTED : Retired taken branches mispredicted +event:0xc6 counters:0,1,2,3 um:zero minimum:500 name:RETIRED_FAR_CONTROL_TRANSFERS : Retired far control transfers +event:0xc7 counters:0,1,2,3 um:zero minimum:500 name:RETIRED_BRANCH_RESYNCS : Retired branches resyncs (only 
non-control transfer branches) +event:0xc8 counters:0,1,2,3 um:zero minimum:500 name:RETIRED_NEAR_RETURNS : Retired near returns +event:0xc9 counters:0,1,2,3 um:zero minimum:500 name:RETIRED_NEAR_RETURNS_MISPREDICTED : Retired near returns mispredicted +event:0xca counters:0,1,2,3 um:zero minimum:500 name:RETIRED_INDIRECT_BRANCHES_MISPREDICTED : Retired indirect branches mispredicted +event:0xcb counters:0,1,2,3 um:fpu_instr minimum:500 name:RETIRED_MMX_FP_INSTRUCTIONS : Retired MMX/FP instructions +event:0xcc counters:0,1,2,3 um:fpu_fastpath minimum:500 name:RETIRED_FASTPATH_DOUBLE_OP_INSTRUCTIONS : Retired FastPath double-op instructions +event:0xcd counters:0,1,2,3 um:zero minimum:500 name:INTERRUPTS_MASKED_CYCLES : Cycles with interrupts masked (IF=0) +event:0xce counters:0,1,2,3 um:zero minimum:500 name:INTERRUPTS_MASKED_CYCLES_WITH_INTERRUPT_PENDING : Cycles with interrupts masked while interrupt pending +event:0xcf counters:0,1,2,3 um:zero minimum:10 name:INTERRUPTS_TAKEN : Number of taken hardware interrupts +event:0xd0 counters:0,1,2,3 um:zero minimum:500 name:DECODER_EMPTY : Nothing to dispatch (decoder empty) +event:0xd1 counters:0,1,2,3 um:zero minimum:500 name:DISPATCH_STALLS : Dispatch stalls +event:0xd2 counters:0,1,2,3 um:zero minimum:500 name:DISPATCH_STALL_FOR_BRANCH_ABORT : Dispatch stall from branch abort to retire +event:0xd3 counters:0,1,2,3 um:zero minimum:500 name:DISPATCH_STALL_FOR_SERIALIZATION : Dispatch stall for serialization +event:0xd4 counters:0,1,2,3 um:zero minimum:500 name:DISPATCH_STALL_FOR_SEGMENT_LOAD : Dispatch stall for segment load +event:0xd5 counters:0,1,2,3 um:zero minimum:500 name:DISPATCH_STALL_FOR_REORDER_BUFFER_FULL : Dispatch stall for reorder buffer full +event:0xd6 counters:0,1,2,3 um:zero minimum:500 name:DISPATCH_STALL_FOR_RESERVATION_STATION_FULL : Dispatch stall when reservation stations are full +event:0xd7 counters:0,1,2,3 um:zero minimum:500 name:DISPATCH_STALL_FOR_FPU_FULL : Dispatch stall when FPU is full 
+event:0xd8 counters:0,1,2,3 um:zero minimum:500 name:DISPATCH_STALL_FOR_LS_FULL : Dispatch stall when LS is full +event:0xd9 counters:0,1,2,3 um:zero minimum:500 name:DISPATCH_STALL_WAITING_FOR_ALL_QUIET : Dispatch stall when waiting for all to be quiet +event:0xda counters:0,1,2,3 um:zero minimum:500 name:DISPATCH_STALL_FOR_FAR_TRANSFER_OR_RESYNC : Dispatch Stall for Far Transfer or Resync to Retire +event:0xdb counters:0,1,2,3 um:fpu_exceptions minimum:1 name:FPU_EXCEPTIONS : FPU exceptions +event:0xdc counters:0,1,2,3 um:zero minimum:1 name:DR0_BREAKPOINTS : The number of matches on the address in breakpoint register DR0 +event:0xdd counters:0,1,2,3 um:zero minimum:1 name:DR1_BREAKPOINTS : The number of matches on the address in breakpoint register DR1 +event:0xde counters:0,1,2,3 um:zero minimum:1 name:DR2_BREAKPOINTS : The number of matches on the address in breakpoint register DR2 +event:0xdf counters:0,1,2,3 um:zero minimum:1 name:DR3_BREAKPOINTS : The number of matches on the address in breakpoint register DR3 +# Memory Controller events +event:0xe0 counters:0,1,2,3 um:page_access minimum:500 name:DRAM_ACCESSES : DRAM accesses +event:0xe1 counters:0,1,2,3 um:mem_page_overflow minimum:500 name:MEMORY_CONTROLLER_PAGE_TABLE_OVERFLOWS : Memory controller page table overflows +event:0xe2 counters:0,1,2,3 um:slot_missed minimum:500 name:MEMORY_CONTROLLER_SLOT_MISSED : Memory controller DRAM command slots missed +event:0xe3 counters:0,1,2,3 um:turnaround minimum:500 name:MEMORY_CONTROLLER_TURNAROUNDS : Memory controller turnarounds +event:0xe4 counters:0,1,2,3 um:saturation minimum:500 name:MEMORY_CONTROLLER_BYPASS_COUNTER_SATURATION : Memory controller bypass saturation +event:0xe8 counters:0,1,2,3 um:thermal_status minimum:500 name:THERMAL_STATUS : Thermal status +event:0xe9 counters:0,1,2,3 um:cpiorequests minimum:500 name:CPU_IO_REQUESTS_TO_MEMORY_IO : CPU/IO Requests to Memory/IO +event:0xea counters:0,1,2,3 um:cacheblock minimum:500 name:CACHE_BLOCK_COMMANDS 
: Cache block commands +event:0xeb counters:0,1,2,3 um:sizecmds minimum:500 name:SIZED_COMMANDS : Sized commands +event:0xec counters:0,1,2,3 um:probe minimum:500 name:PROBE_RESPONSES_AND_UPSTREAM_REQUESTS : Probe responses and upstream requests +event:0xee counters:0,1,2,3 um:gart minimum:500 name:GART_EVENTS : GART events +event:0x1f0 counters:0,1,2,3 um:mem_control_request minimum:500 name:MEMORY_CONTROLLER_REQUESTS : Sized read/write activity. +# Crossbar events +event:0x1e0 counters:0,1,2,3 um:cpu_dram_req minimum:500 name:CPU_DRAM_REQUEST_TO_NODE : CPU to DRAM requests to target node +event:0x1e1 counters:0,1,2,3 um:io_dram_req minimum:500 name:IO_DRAM_REQUEST_TO_NODE : IO to DRAM requests to target node +event:0x1e2 counters:0,1,2,3 um:cpu_read_lat_0_3 minimum:500 name:CPU_READ_COMMAND_LATENCY_NODE_0_3 : Latency between the local node and remote node +event:0x1e3 counters:0,1,2,3 um:cpu_read_lat_0_3 minimum:500 name:CPU_READ_COMMAND_REQUEST_NODE_0_3 : Number of requests that a latency measurement is made for Event 0x1E2 +event:0x1e4 counters:0,1,2,3 um:cpu_read_lat_4_7 minimum:500 name:CPU_READ_COMMAND_LATENCY_NODE_4_7 : Latency between the local node and remote node +event:0x1e5 counters:0,1,2,3 um:cpu_read_lat_4_7 minimum:500 name:CPU_READ_COMMAND_REQUEST_NODE_4_7 : Number of requests that a latency measurement is made for Event 0x1E2 +event:0x1e6 counters:0,1,2,3 um:cpu_comm_lat minimum:500 name:CPU_COMMAND_LATENCY_TARGET : Determine latency between the local node and a remote node. 
+event:0x1e7 counters:0,1,2,3 um:cpu_comm_lat minimum:500 name:CPU_REQUEST_TARGET : Number of requests that a latency measurement is made for Event 0x1E6 +# Link events +event:0xf6 counters:0,1,2,3 um:httransmit minimum:500 name:HYPERTRANSPORT_LINK0_TRANSMIT_BANDWIDTH : HyperTransport(tm) link 0 transmit bandwidth +event:0xf7 counters:0,1,2,3 um:httransmit minimum:500 name:HYPERTRANSPORT_LINK1_TRANSMIT_BANDWIDTH : HyperTransport(tm) link 1 transmit bandwidth +event:0xf8 counters:0,1,2,3 um:httransmit minimum:500 name:HYPERTRANSPORT_LINK2_TRANSMIT_BANDWIDTH : HyperTransport(tm) link 2 transmit bandwidth +event:0x1f9 counters:0,1,2,3 um:httransmit minimum:500 name:HYPERTRANSPORT_LINK3_TRANSMIT_BANDWIDTH : HyperTransport(tm) link 3 transmit bandwidth +# L3 Cache events +event:0x4e0 counters:0,1,2,3 um:l3_cache minimum:500 name:READ_REQUEST_L3_CACHE : Number of read requests from each core to L3 cache +event:0x4e1 counters:0,1,2,3 um:l3_cache minimum:500 name:L3_CACHE_MISSES : Number of L3 cache misses from each core +event:0x4e2 counters:0,1,2,3 um:l3_fill minimum:500 name:L3_FILLS_CAUSED_BY_L2_EVICTIONS : Number of L3 fills caused by L2 evictions per core +event:0x4e3 counters:0,1,2,3 um:l3_evict minimum:500 name:L3_EVICTIONS : Number of L3 cache line evictions by cache state +############################### +# IBS EVENTS +############################### +event:0xf000 counters:0,1,2,3 um:zero minimum:500 name:IBS_FETCH_SAMPLES : IBS fetch samples +event:0xf001 counters:0,1,2,3 um:zero minimum:500 name:IBS_FETCH_KILLED : IBS fetch killed +event:0xf002 counters:0,1,2,3 um:zero minimum:500 name:IBS_FETCH_ATTEMPTED : IBS fetch attempted +event:0xf003 counters:0,1,2,3 um:zero minimum:500 name:IBS_FETCH_COMPLETED : IBS fetch completed +event:0xf004 counters:0,1,2,3 um:zero minimum:500 name:IBS_FETCH_ABORTED : IBS fetch aborted +event:0xf005 counters:0,1,2,3 um:zero minimum:500 name:IBS_FETCH_ITLB_HITS : IBS ITLB hit +event:0xf006 counters:0,1,2,3 um:zero minimum:500 
name:IBS_FETCH_L1_ITLB_MISSES_L2_ITLB_HITS : IBS L1 ITLB misses (and L2 ITLB hits) +event:0xf007 counters:0,1,2,3 um:zero minimum:500 name:IBS_FETCH_L1_ITLB_MISSES_L2_ITLB_MISSES : IBS L1 L2 ITLB miss +event:0xf008 counters:0,1,2,3 um:zero minimum:500 name:IBS_FETCH_ICACHE_MISSES : IBS Instruction cache misses +event:0xf009 counters:0,1,2,3 um:zero minimum:500 name:IBS_FETCH_ICACHE_HITS : IBS Instruction cache hit +event:0xf00A counters:0,1,2,3 um:zero minimum:500 name:IBS_FETCH_4K_PAGE : IBS 4K page translation +event:0xf00B counters:0,1,2,3 um:zero minimum:500 name:IBS_FETCH_2M_PAGE : IBS 2M page translation +# +event:0xf00E counters:0,1,2,3 um:zero minimum:500 name:IBS_FETCH_LATENCY : IBS fetch latency +# +event:0xf100 counters:0,1,2,3 um:zero minimum:500 name:IBS_OP_ALL : IBS all op samples +event:0xf101 counters:0,1,2,3 um:zero minimum:500 name:IBS_OP_TAG_TO_RETIRE : IBS tag-to-retire cycles +event:0xf102 counters:0,1,2,3 um:zero minimum:500 name:IBS_OP_COMP_TO_RET : IBS completion-to-retire cycles +event:0xf103 counters:0,1,2,3 um:zero minimum:500 name:IBS_OP_BRANCH_RETIRED : IBS branch op +event:0xf104 counters:0,1,2,3 um:zero minimum:500 name:IBS_OP_MISPREDICTED_BRANCH : IBS mispredicted branch op +event:0xf105 counters:0,1,2,3 um:zero minimum:500 name:IBS_OP_TAKEN_BRANCH : IBS taken branch op +event:0xf106 counters:0,1,2,3 um:zero minimum:500 name:IBS_OP_MISPREDICTED_BRANCH_TAKEN : IBS mispredicted taken branch op +event:0xf107 counters:0,1,2,3 um:zero minimum:500 name:IBS_OP_RETURNS : IBS return op +event:0xf108 counters:0,1,2,3 um:zero minimum:500 name:IBS_OP_MISPREDICTED_RETURNS : IBS mispredicted return op +event:0xf109 counters:0,1,2,3 um:zero minimum:500 name:IBS_OP_RESYNC : IBS resync op +event:0xf200 counters:0,1,2,3 um:zero minimum:500 name:IBS_OP_ALL_LOAD_STORE : IBS all load store ops +event:0xf201 counters:0,1,2,3 um:zero minimum:500 name:IBS_OP_LOAD : IBS load ops +event:0xf202 counters:0,1,2,3 um:zero minimum:500 name:IBS_OP_STORE : IBS store 
ops +event:0xf203 counters:0,1,2,3 um:zero minimum:500 name:IBS_OP_L1_DTLB_HITS : IBS L1 DTLB hit +event:0xf204 counters:0,1,2,3 um:zero minimum:500 name:IBS_OP_L1_DTLB_MISS_L2_DTLB_HIT : IBS L1 DTLB misses L2 hits +event:0xf205 counters:0,1,2,3 um:zero minimum:500 name:IBS_OP_L1_L2_DTLB_MISS : IBS L1 and L2 DTLB misses +event:0xf206 counters:0,1,2,3 um:zero minimum:500 name:IBS_OP_DATA_CACHE_MISS : IBS data cache misses +event:0xf207 counters:0,1,2,3 um:zero minimum:500 name:IBS_OP_DATA_HITS : IBS data cache hits +event:0xf208 counters:0,1,2,3 um:zero minimum:500 name:IBS_OP_MISALIGNED_DATA_ACC : IBS misaligned data access +event:0xf209 counters:0,1,2,3 um:zero minimum:500 name:IBS_OP_BANK_CONF_LOAD : IBS bank conflict on load op +event:0xf20A counters:0,1,2,3 um:zero minimum:500 name:IBS_OP_BANK_CONF_STORE : IBS bank conflict on store op +event:0xf20B counters:0,1,2,3 um:zero minimum:500 name:IBS_OP_FORWARD : IBS store-to-load forwarded +event:0xf20C counters:0,1,2,3 um:zero minimum:500 name:IBS_OP_CANCELLED : IBS store-to-load cancelled +event:0xf20D counters:0,1,2,3 um:zero minimum:500 name:IBS_OP_DCUC_MEM_ACC : IBS UC memory access +event:0xf20E counters:0,1,2,3 um:zero minimum:500 name:IBS_OP_DCWC_MEM_ACC : IBS WC memory access +event:0xf20F counters:0,1,2,3 um:zero minimum:500 name:IBS_OP_LOCKED : IBS locked operation +event:0xf210 counters:0,1,2,3 um:zero minimum:500 name:IBS_OP_MAB_HIT : IBS MAB hit +event:0xf211 counters:0,1,2,3 um:zero minimum:500 name:IBS_OP_L1_DTLB_4K : IBS L1 DTLB 4K page +event:0xf212 counters:0,1,2,3 um:zero minimum:500 name:IBS_OP_L1_DTLB_2M : IBS L1 DTLB 2M page +event:0xf213 counters:0,1,2,3 um:zero minimum:500 name:IBS_OP_L1_DTLB_1G : IBS L1 DTLB 1G page +event:0xf215 counters:0,1,2,3 um:zero minimum:500 name:IBS_OP_L2_DTLB_4K : IBS L2 DTLB 4K page +event:0xf216 counters:0,1,2,3 um:zero minimum:500 name:IBS_OP_L2_DTLB_2M : IBS L2 DTLB 2M page +event:0xf219 counters:0,1,2,3 um:zero minimum:500 name:IBS_OP_DC_LOAD_LAT : IBS data 
cache miss load latency +event:0xf240 counters:0,1,2,3 um:zero minimum:500 name:IBS_OP_NB_LOCAL_ONLY : IBS northbridge local +event:0xf241 counters:0,1,2,3 um:zero minimum:500 name:IBS_OP_NB_REMOTE_ONLY : IBS northbridge remote +event:0xf242 counters:0,1,2,3 um:zero minimum:500 name:IBS_OP_NB_LOCAL_L3 : IBS northbridge local L3 +event:0xf243 counters:0,1,2,3 um:zero minimum:500 name:IBS_OP_NB_LOCAL_CACHE : IBS northbridge local core L1 or L2 cache +event:0xf244 counters:0,1,2,3 um:zero minimum:500 name:IBS_OP_NB_REMOTE_CACHE : IBS northbridge local core L1, L2, L3 cache +event:0xf245 counters:0,1,2,3 um:zero minimum:500 name:IBS_OP_NB_LOCAL_DRAM : IBS northbridge local DRAM +event:0xf246 counters:0,1,2,3 um:zero minimum:500 name:IBS_OP_NB_REMOTE_DRAM : IBS northbridge remote DRAM +event:0xf247 counters:0,1,2,3 um:zero minimum:500 name:IBS_OP_NB_LOCAL_OTHER : IBS northbridge local APIC MMIO Config PCI +event:0xf248 counters:0,1,2,3 um:zero minimum:500 name:IBS_OP_NB_REMOTE_OTHER : IBS northbridge remote APIC MMIO Config PCI +event:0xf249 counters:0,1,2,3 um:zero minimum:500 name:IBS_OP_NB_CACHE_MODIFIED : IBS northbridge cache modified state +event:0xf24A counters:0,1,2,3 um:zero minimum:500 name:IBS_OP_NB_CACHE_OWNED : IBS northbridge cache owned state +event:0xf24B counters:0,1,2,3 um:zero minimum:500 name:IBS_OP_NB_LOCAL_CACHE_LAT : IBS northbridge local cache latency +event:0xf24C counters:0,1,2,3 um:zero minimum:500 name:IBS_OP_NB_REMOTE_CACHE_LAT : IBS northbridge remote cache latency diff -uprN -X dontdiff oprofile-cvs-original/events/x86-64/family10h/unit_masks oprofile-cvs-ibs/events/x86-64/family10h/unit_masks --- oprofile-cvs-original/events/x86-64/family10h/unit_masks 1969-12-31 18:00:00.000000000 -0600 +++ oprofile-cvs-ibs/events/x86-64/family10h/unit_masks 2008-01-29 09:39:46.000000000 -0600 @@ -0,0 +1,342 @@ +# +# AMD Family 10 processor unit masks +# +# Copyright OProfile authors +# Copyright (c) 2006-2008 Advanced Micro Devices +# Contributed by Ray 
Bryant <raybry at amd.com> +# Jason Yeh <jason.yeh at amd.com> +# Suravee Suthikulpanit <suravee.suthikulpanit at amd.com> +# +# Sources: BIOS and Kernel Developer's Guide for AMD Family 10h Processors, +# Publication# 31116, Revision 3.00, September 7, 2007 +# +# Software Optimization Guide for AMD Family 10h Processors, +# Publication# 40546, Revision 3.04, September 2007 +# +# This file was last updated on 11 January 2008. +# +name:zero type:mandatory default:0x0 + 0x0 No unit mask +name:moesi type:bitmask default:0x1f + 0x01 (I)nvalid cache state + 0x02 (S)hared cache state + 0x04 (E)xclusive cache state + 0x08 (O)wner cache state + 0x10 (M)odified cache state + 0x1f All cache states +name:moess type:bitmask default:0x1e + 0x01 Refill from northbridge + 0x02 Shared-state line from L2 + 0x04 Exclusive-state line from L2 + 0x08 Owner-state line from L2 + 0x10 Modified-state line from L2 + 0x1e All cache states except refill from northbridge +name:fpu_ops type:bitmask default:0x3f + 0x01 Add pipe ops excluding load ops and SSE move ops + 0x02 Multiply pipe ops excluding load ops and SSE move ops + 0x04 Store pipe ops excluding load ops and SSE move ops + 0x08 Add pipe load ops and SSE move ops + 0x10 Multiply pipe load ops and SSE move ops + 0x20 Store pipe load ops and SSE move ops + 0x3f All ops +name:segregload type:bitmask default:0x7f + 0x01 ES register + 0x02 CS register + 0x04 SS register + 0x08 DS register + 0x10 FS register + 0x20 GS register + 0x40 HS register +name:fpu_instr type:bitmask default:0x07 + 0x01 x87 instructions + 0x02 MMX & 3DNow instructions + 0x04 SSE & SSE2 instructions +name:fpu_fastpath type:bitmask default:0x07 + 0x01 With low op in position 0 + 0x02 With low op in position 1 + 0x04 With low op in position 2 +name:fpu_exceptions type:bitmask default:0x0f + 0x01 x87 reclass microfaults + 0x02 SSE retype microfaults + 0x04 SSE reclass microfaults + 0x08 SSE and x87 microtraps +name:page_access type:bitmask default:0xff + 0x01 DCT0 Page 
hit + 0x02 DCT0 Page miss + 0x04 DCT0 Page conflict + 0x08 DCT1 Page hit + 0x10 DCT1 Page miss + 0x20 DCT1 Page Conflict + 0x40 Write request + 0x80 Read request +name:mem_page_overflow type:bitmask default:0x03 + 0x01 DCT0 Page Table Overflow + 0x02 DCT1 Page Table Overflow +name:turnaround type:bitmask default:0x3f + 0x01 DCT0 DIMM (chip select) turnaround + 0x02 DCT0 Read to write turnaround + 0x04 DCT0 Write to read turnaround + 0x08 DCT1 DIMM (chip select) turnaround + 0x10 DCT1 Read to write turnaround + 0x20 DCT1 Write to read turnaround +name:saturation type:bitmask default:0x0f + 0x01 Memory controller high priority bypass + 0x02 Memory controller medium priority bypass + 0x04 DCT0 DCQ bypass + 0x08 DCT1 DCQ bypass +name:slot_missed type:bitmask default:0x03 + 0x01 DCT0 Command slots missed + 0x02 DCT2 Command slots missed +name:sizecmds type:bitmask default:0x3f + 0x01 Non-posted write byte (1-32 bytes) + 0x02 Non-posted write DWORD (1-16 DWORDs) + 0x04 Posted write byte (1-32 bytes) + 0x08 Posted write DWORD (1-16 DWORDs) + 0x10 Read byte (4 bytes) + 0x20 Read DWORD (1-16 DWORDs) +name:probe type:bitmask default:0xff + 0x01 Probe miss + 0x02 Probe hit clean + 0x04 Probe hit dirty without memory cancel + 0x08 Probe hit dirty with memory cancel + 0x10 Upstream display refresh/ISOC reads + 0x20 Upstream non-display refresh reads + 0x40 Upstream ISOC writes + 0x80 Upstream non-ISOC writes +name:l2_internal type:bitmask default:0x3f + 0x01 IC fill + 0x02 DC fill + 0x04 TLB fill (page table walks) + 0x08 Tag snoop request + 0x10 Canceled request + 0x20 Hardware prefetch from data cache +name:l2_req_miss type:bitmask default:0x0f + 0x01 IC fill + 0x02 DC fill (includes possible replays) + 0x04 TLB page table walk + 0x08 Hardware prefetch from data cache +name:l2_fill type:bitmask default:0x03 + 0x01 L2 fills (victims from L1 caches, TLB page table walks and data prefetches) + 0x02 L2 writebacks to system +name:gart type:bitmask default:0xff + 0x01 GART aperture 
hit on access from CPU + 0x02 GART aperture hit on access from I/O + 0x04 GART miss + 0x08 GART/DEV request hit table walk in progress + 0x10 DEV hit + 0x20 DEV miss + 0x40 DEV error + 0x80 GART/DEV multiple table walk in progress +name:cpiorequests type:bitmask default:0xa2 + 0xa1 Requests Local I/O to Local I/O + 0xa2 Requests Local I/O to Local Memory + 0xa3 Requests Local I/O to Local (I/O or Mem) + 0xa4 Requests Local CPU to Local I/O + 0xa5 Requests Local (CPU or I/O) to Local I/O + 0xa8 Requests Local CPU to Local Memory + 0xaa Requests Local (CPU or I/O) to Local Memory + 0xac Requests Local CPU to Local (I/O or Mem) + 0xaf Requests Local (CPU or I/O) to Local (I/O or Mem) + 0x91 Requests Local I/O to Remote I/O + 0x92 Requests Local I/O to Remote Memory + 0x93 Requests Local I/O to Remote (I/O or Mem) + 0x94 Requests Local CPU to Remote I/O + 0x95 Requests Local (CPU or I/O) to Remote I/O + 0x98 Requests Local CPU to Remote Memory + 0x9a Requests Local (CPU or I/O) to Remote Memory + 0x9c Requests Local CPU to Remote (I/O or Mem) + 0x9f Requests Local (CPU or I/O) to Remote (I/O or Mem) + 0xb1 Requests Local I/O to Any I/O + 0xb2 Requests Local I/O to Any Memory + 0xb3 Requests Local I/O to Any (I/O or Mem) + 0xb4 Requests Local CPU to Any I/O + 0xb5 Requests Local (CPU or I/O) to Any I/O + 0xb8 Requests Local CPU to Any Memory + 0xba Requests Local (CPU or I/O) to Any Memory + 0xbc Requests Local CPU to Any (I/O or Mem) + 0xbf Requests Local (CPU or I/O) to Any (I/O or Mem) + 0x61 Requests Remote I/O to Local I/O + 0x64 Requests Remote CPU to Local I/O + 0x65 Requests Remote (CPU or I/O) to Local I/O +name:cacheblock type:bitmask default:0x3d + 0x01 Victim Block (Writeback) + 0x04 Read Block (Dcache load miss refill) + 0x08 Read Block Shared (Icache refill) + 0x10 Read Block Modified (Dcache store miss refill) + 0x20 Change to Dirty (first store to clean block already in cache) +name:dataprefetch type:bitmask default:0x03 + 0x01 Cancelled prefetches + 
0x02 Prefetch attempts +name:memreqtype type:bitmask default:0x83 + 0x01 Requests to non-cacheable (UC) memory + 0x02 Requests to write-combining (WC) memory or WC buffer flushes to WB memory + 0x80 Streaming store (SS) requests +name:systemreadresponse type:bitmask default:0x17 + 0x01 Exclusive + 0x02 Modified + 0x04 Shared + 0x10 Data Error +name:l1_dtlb_miss_l2_hit type:bitmask default:0x03 + 0x01 L2 4K TLB hit + 0x02 L2 2M TLB hit +name:l1_l2_dtlb_miss type:bitmask default:0x07 + 0x01 4K TLB reload + 0x02 2M TLB reload + 0x04 1G TLB reload +name:ecc type:bitmask default:0x0f + 0x01 Scrubber error + 0x02 Piggyback scrubber errors + 0x04 Load pipe error + 0x08 Store write pipe error +name:prefetch type:bitmask default:0x07 + 0x01 Load (Prefetch, PrefetchT0/T1/T2) + 0x02 Store (PrefetchW) + 0x04 NTA (PrefetchNTA) +name:locked_instruction_dcache_miss type:bitmask default:0x02 + 0x02 Data cache misses by locked instructions +name:octword_transfer type:bitmask default:0x01 + 0x01 Octword write transfer +name:thermal_status type:bitmask default:0x7c + 0x04 Number of times the HTC trip point is crossed + 0x08 Number of clocks when STC trip point active + 0x10 Number of times the STC trip point is crossed + 0x20 Number of clocks HTC P-state is inactive + 0x40 Number of clocks HTC P-state is active +name:mem_control_request type:bitmask default:0x78 + 0x01 Write requests + 0x02 Read Requests including Prefetch + 0x04 Prefetch Request + 0x08 32 Bytes Sized Writes + 0x10 64 Bytes Sized Writes + 0x20 32 Bytes Sized Reads + 0x40 64 Byte Sized Reads + 0x80 Read requests sent to the DCT while write requests are pending in the DCQ +name:httransmit type:bitmask default:0xbf + 0x01 Command DWORD sent + 0x02 Data DWORD sent + 0x04 Buffer release DWORD sent + 0x08 Nop DW sent (idle) + 0x10 Address extension DWORD sent + 0x20 Per packet CRC sent + 0x80 SubLink Mask +name:lock_ops type:bitmask default:0x0f + 0x01 Number of locked instructions executed + 0x02 Cycles in speculative 
phase + 0x04 Cycles in non-speculative phase (including cache miss penalty) + 0x08 Cache miss penalty in cycles +name:sse_ops type:bitmask default:0x7f + 0x01 Single Precision add/subtract ops + 0x02 Single precision multiply ops + 0x04 Single precision divide/square root ops + 0x08 Double precision add/subtract ops + 0x10 Double precision multiply ops + 0x20 Double precision divide/square root ops + 0x40 OP type: 0=uops 1=FLOPS +name:move_ops type:bitmask default:0x0f + 0x01 Merging low quadword move uops + 0x02 Merging high quadword move uops + 0x04 All other merging move uops + 0x08 All other move uops +name:serial_ops type:bitmask default:0x0f + 0x01 SSE bottom-executing uops retired + 0x02 SSE bottom-serializing uops retired + 0x04 x87 bottom-executing uops retired + 0x08 x87 bottom-serializing uops retired +name:serial_ops_sched type:bitmask default:0x03 + 0x01 Number of cycles a bottom-execute uops in FP scheduler + 0x02 Number of cycles a bottom-serializing uops in FP scheduler +name:store_to_load type:bitmask default:0x07 + 0x01 Address mismatches (starting byte not the same) + 0x02 Store is smaller than load + 0x04 Misaligned +name:moesi_gh type:bitmask default:0x1f + 0x01 (I)nvalid cache state + 0x02 (S)hared cache state + 0x04 (E)xclusive cache state + 0x08 (O)wner cache state + 0x10 (M)odified cache state + 0x20 Cache line evicted brought into the cache by PrefetchNTA + 0x40 Cache line evicted not brought into the cache by PrefetchNTA +name:l1_dtlb_hit type:bitmask default:0x07 + 0x01 L1 4K TLB hit + 0x02 L1 2M TLB hit + 0x04 L1 1G TLB hit +name:soft_prefetch type:bitmask default:0x09 + 0x01 Software prefetch hit in L1 + 0x08 Software prefetch hit in L2 +name:l1_l2_itlb_miss type:bitmask default:0x03 + 0x01 Instruction fetches to a 4K page + 0x02 Instruction fetches to a 2M page +name:cpu_dram_req type:bitmask default:0xff + 0x01 From local node to node 0 + 0x02 From local node to node 1 + 0x04 From local node to node 2 + 0x08 From local node to node 3 
+ 0x10 From local node to node 4 + 0x20 From local node to node 5 + 0x40 From local node to node 6 + 0x80 From local node to node 7 +name:io_dram_req type:bitmask default:0xff + 0x01 From local node to node 0 + 0x02 From local node to node 1 + 0x04 From local node to node 2 + 0x08 From local node to node 3 + 0x10 From local node to node 4 + 0x20 From local node to node 5 + 0x40 From local node to node 6 + 0x80 From local node to node 7 +name:cpu_read_lat_0_3 type:bitmask default:0xff + 0x01 Read block + 0x02 Read block shared + 0x04 Read block modified + 0x08 Change to Dirty + 0x10 From local node to node 0 + 0x20 From local node to node 1 + 0x40 From local node to node 2 + 0x80 From local node to node 3 +name:cpu_read_lat_4_7 type:bitmask default:0xff + 0x01 Read block + 0x02 Read block shared + 0x04 Read block modified + 0x08 Change to Dirty + 0x10 From local node to node 4 + 0x20 From local node to node 5 + 0x40 From local node to node 6 + 0x80 From local node to node 7 +name:cpu_comm_lat type:bitmask default:0xf7 + 0x01 Read sized + 0x02 Write sized + 0x04 Victim block + 0x08 Node group select: 0=Nodes 0-3, 1=Nodes 4-7 + 0x10 From local node to node 0/4 + 0x20 From local node to node 1/5 + 0x40 From local node to node 2/6 + 0... [truncated message content] |
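For readers unfamiliar with the layout of the events and unit_masks files in the patch above, the following is a minimal sketch of how one "event:" line can be read and how a "type:bitmask" unit mask is built by OR-ing individual bits. It is written in Python for brevity and is not code from OProfile; the function names parse_event_line and combine_unit_mask are hypothetical, and the sketch assumes the "key:value ... name:NAME : description" layout seen in the patch.

```python
def parse_event_line(line):
    """Parse one 'event:...' line in the layout used by the events file above."""
    # Everything after the first " : " is the free-text description.
    fields, _, description = line.partition(" : ")
    entry = {"description": description.strip()}
    for token in fields.split():
        key, _, value = token.partition(":")
        entry[key] = value
    entry["event"] = int(entry["event"], 16)          # e.g. "0x42" -> 66
    entry["counters"] = [int(c) for c in entry["counters"].split(",")]
    entry["minimum"] = int(entry["minimum"])          # smallest allowed count
    return entry


def combine_unit_mask(*bits):
    """OR individual bit values together, as a 'type:bitmask' entry permits."""
    mask = 0
    for bit in bits:
        mask |= bit
    return mask


# Sample line copied from the family10h events file in the patch.
line = ("event:0x42 counters:0,1,2,3 um:moess minimum:500 "
        "name:DATA_CACHE_REFILLS_FROM_L2_OR_NORTHBRIDGE : "
        "Data cache refills from L2 or Northbridge")
entry = parse_event_line(line)

# The four L2 state bits of 'moess' add up to its documented default, 0x1e.
mask = combine_unit_mask(0x02, 0x04, 0x08, 0x10)
print(hex(entry["event"]), entry["name"], entry["um"], hex(mask))
```

The "um:" field only names an entry in unit_masks, so a real tool would look that name up there to learn which bits are valid before combining them.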
From: Jason Y. <jas...@am...> - 2008-07-23 15:58:54
This patch contains the rest of daemon changes to data structures to store and write IBS events to sample file, and new stats counting number of IBS samples. Signed-off-by: Jason Yeh <jas...@am...> --- daemon/Makefile.am | 3 oprofile-ibs/daemon/liblegacy/opd_proc.c | 2 oprofile-ibs/daemon/liblegacy/opd_sample_files.c | 10 +- oprofile-ibs/daemon/opd_events.c | 46 +++++++--- oprofile-ibs/daemon/opd_events.h | 7 - oprofile-ibs/daemon/opd_mangling.c | 15 ++- oprofile-ibs/daemon/opd_mangling.h | 3 oprofile-ibs/daemon/opd_printf.h | 2 oprofile-ibs/daemon/opd_sfile.c | 98 +++++++++++++++++++---- oprofile-ibs/daemon/opd_sfile.h | 7 + oprofile-ibs/daemon/opd_stats.c | 2 oprofile-ibs/daemon/opd_stats.h | 1 oprofile-ibs/daemon/oprofiled.c | 11 ++ oprofile-ibs/daemon/oprofiled.h | 2 14 files changed, 162 insertions(+), 47 deletions(-) diff -uprN -X /home/jasonyeh/dontdiff oprofile-cvs/daemon/Makefile.am oprofile-ibs/daemon/Makefile.am --- oprofile-cvs/daemon/Makefile.am 2008-04-28 16:23:24.000000000 -0500 +++ oprofile-ibs/daemon/Makefile.am 2008-07-22 11:03:36.000000000 -0500 @@ -26,7 +26,8 @@ oprofiled_SOURCES = \ opd_perfmon.c \ opd_anon.h \ opd_anon.c \ - opd_spu.c + opd_spu.c \ + opd_ibs.h LIBS=@POPT_LIBS@ @LIBERTY_LIBS@ diff -uprN -X /home/jasonyeh/dontdiff oprofile-cvs/daemon/liblegacy/opd_proc.c oprofile-ibs/daemon/liblegacy/opd_proc.c --- oprofile-cvs/daemon/liblegacy/opd_proc.c 2005-08-17 14:15:41.000000000 -0500 +++ oprofile-ibs/daemon/liblegacy/opd_proc.c 2008-07-22 11:03:36.000000000 -0500 @@ -143,7 +143,7 @@ void opd_put_image_sample(struct opd_ima sfile = image->sfiles[cpu_number][counter]; } - err = odb_update_node(&sfile->sample_file, offset); + err = odb_update_node(&sfile->sample_file, offset, 1); if (err) { fprintf(stderr, "%s\n", strerror(err)); abort(); diff -uprN -X /home/jasonyeh/dontdiff oprofile-cvs/daemon/liblegacy/opd_sample_files.c oprofile-ibs/daemon/liblegacy/opd_sample_files.c --- oprofile-cvs/daemon/liblegacy/opd_sample_files.c 2007-05-10 
18:42:33.000000000 -0500 +++ oprofile-ibs/daemon/liblegacy/opd_sample_files.c 2008-07-22 11:11:47.000000000 -0500 @@ -69,7 +69,7 @@ static char * opd_mangle_filename(struct { char * mangled; struct mangle_values values; - struct opd_event * event = find_counter_event(counter); + struct opd_event * event = find_counter_event(counter, 0); values.flags = 0; if (image->kernel) @@ -142,8 +142,12 @@ retry: goto out; } - fill_header(odb_get_data(&sfile->sample_file), counter, 0, 0, - image->kernel, 0, 0, 0, image->mtime); + fill_header(odb_get_data(&sfile->sample_file), counter, + 0, 0, + image->kernel, 0, + 0, 0, + image->mtime, 0); + out: free(mangled); diff -uprN -X /home/jasonyeh/dontdiff oprofile-cvs/daemon/opd_events.c oprofile-ibs/daemon/opd_events.c --- oprofile-cvs/daemon/opd_events.c 2007-05-10 18:42:32.000000000 -0500 +++ oprofile-ibs/daemon/opd_events.c 2008-07-22 11:03:36.000000000 -0500 @@ -13,6 +13,7 @@ #include "opd_events.h" #include "opd_printf.h" +#include "opd_ibs.h" #include "oprofiled.h" #include "op_string.h" @@ -22,13 +23,16 @@ #include "op_libiberty.h" #include "op_hw_config.h" #include "op_sample_file.h" +#include "op_events.h" #include <stdlib.h> #include <stdio.h> extern op_cpu cpu_type; +extern int ibs_fetch_count; +extern int ibs_op_count; -struct opd_event opd_events[OP_MAX_COUNTERS]; +struct opd_event opd_events[OP_MAX_COUNTERS + OP_MAX_IBS_COUNTERS]; static double cpu_speed; @@ -91,11 +95,6 @@ void opd_parse_events(char const * event return; } - if (!ev || !strlen(ev)) { - fprintf(stderr, "oprofiled: no events passed.\n"); - exit(EXIT_FAILURE); - } - verbprintf(vmisc, "Events: %s\n", ev); c = ev; @@ -125,13 +124,33 @@ void opd_parse_events(char const * event } -struct opd_event * find_counter_event(unsigned long counter) +struct opd_event * find_counter_event(unsigned long counter, int ibs) { size_t i; - - for (i = 0; i < op_nr_counters && opd_events[i].name; ++i) { - if (counter == opd_events[i].counter) - return &opd_events[i]; + struct 
op_event * ibs_lookup_event; + + /* If IBS is enabled, use events_list to fill appropriate opd_event */ + if (ibs) { + ibs_lookup_event = op_find_event(cpu_type, COUNTER_TO_IBS_EVENT(counter)); + + if (!ibs_lookup_event) { + abort(); + } + + opd_events[counter].name = op_xstrndup(ibs_lookup_event->name, strlen(ibs_lookup_event->name)); + opd_events[counter].value = COUNTER_TO_IBS_EVENT(counter); + opd_events[counter].counter = counter; + opd_events[counter].count = + IS_IBS_FETCH(COUNTER_TO_IBS_EVENT(counter))?ibs_fetch_count:ibs_op_count; + opd_events[counter].um = 0; + opd_events[counter].kernel = 1; + opd_events[counter].user = 1; + return &opd_events[counter]; + } else { + for (i = 0; i < op_nr_counters && opd_events[i].name; ++i) { + if (counter == opd_events[i].counter) + return &opd_events[i]; + } } fprintf(stderr, "Unknown event for counter %lu\n", counter); @@ -143,9 +162,10 @@ struct opd_event * find_counter_event(un void fill_header(struct opd_header * header, unsigned long counter, vma_t anon_start, vma_t cg_to_anon_start, int is_kernel, int cg_to_is_kernel, - int spu_samples, uint64_t embed_offset, time_t mtime) + int spu_samples, uint64_t embed_offset, time_t mtime, + int ibs) { - struct opd_event * event = find_counter_event(counter); + struct opd_event * event = find_counter_event(counter, ibs); memset(header, '\0', sizeof(struct opd_header)); header->version = OPD_VERSION; diff -uprN -X /home/jasonyeh/dontdiff oprofile-cvs/daemon/opd_events.h oprofile-ibs/daemon/opd_events.h --- oprofile-cvs/daemon/opd_events.h 2007-05-10 18:42:32.000000000 -0500 +++ oprofile-ibs/daemon/opd_events.h 2008-07-22 11:03:36.000000000 -0500 @@ -34,14 +34,15 @@ extern struct opd_event opd_events[]; void opd_parse_events(char const * events); /** Find the event for the given counter */ -struct opd_event * find_counter_event(unsigned long counter); +struct opd_event * find_counter_event(unsigned long counter, int ibs); struct opd_header; /** fill the sample file header with 
event info etc. */ void fill_header(struct opd_header * header, unsigned long counter, - vma_t anon_start, vma_t anon_end, + vma_t anon_start, vma_t anon_end, int is_kernel, int cg_to_is_kernel, - int spu_samples, uint64_t embed_offset, time_t mtime); + int spu_samples, uint64_t embed_offset, time_t mtime, + int ibs); #endif /* OPD_EVENTS_H */ diff -uprN -X /home/jasonyeh/dontdiff oprofile-cvs/daemon/opd_mangling.c oprofile-ibs/daemon/opd_mangling.c --- oprofile-cvs/daemon/opd_mangling.c 2007-05-10 18:42:33.000000000 -0500 +++ oprofile-ibs/daemon/opd_mangling.c 2008-07-22 11:03:36.000000000 -0500 @@ -66,11 +66,12 @@ static char * mangle_anon(struct anon_ma static char * -mangle_filename(struct sfile * last, struct sfile const * sf, int counter, int cg) +mangle_filename(struct sfile * last, struct sfile const * sf, int counter, int cg, + int ibs) { - char * mangled; + char * mangled = NULL; struct mangle_values values; - struct opd_event * event = find_counter_event(counter); + struct opd_event * event = find_counter_event(counter, ibs); values.flags = 0; @@ -139,7 +140,8 @@ mangle_filename(struct sfile * last, str int opd_open_sample_file(odb_t * file, struct sfile * last, - struct sfile * sf, int counter, int cg) + struct sfile * sf, int counter, int cg, + int ibs) { char * mangled; char const * binary; @@ -147,7 +149,7 @@ int opd_open_sample_file(odb_t * file, s vma_t last_start = 0; int err; - mangled = mangle_filename(last, sf, counter, cg); + mangled = mangle_filename(last, sf, counter, cg, ibs); if (!mangled) return EINVAL; @@ -194,7 +196,8 @@ retry: sf->anon ? sf->anon->start : 0, last_start, !!sf->kernel, last ? !!last->kernel : 0, spu_profile, sf->embedded_offset, - binary ? op_get_mtime(binary) : 0); + (binary ? 
op_get_mtime(binary) : 0 ), + ibs); out: sfile_put(sf); diff -uprN -X /home/jasonyeh/dontdiff oprofile-cvs/daemon/opd_mangling.h oprofile-ibs/daemon/opd_mangling.h --- oprofile-cvs/daemon/opd_mangling.h 2004-05-29 11:29:40.000000000 -0500 +++ oprofile-ibs/daemon/opd_mangling.h 2008-07-22 11:03:36.000000000 -0500 @@ -28,6 +28,7 @@ struct sfile; * Returns 0 on success. */ int opd_open_sample_file(odb_t * file, struct sfile * last, - struct sfile * sf, int counter, int cg); + struct sfile * sf, int counter, int cg, + int ibs); #endif /* OPD_MANGLING_H */ diff -uprN -X /home/jasonyeh/dontdiff oprofile-cvs/daemon/opd_printf.h oprofile-ibs/daemon/opd_printf.h --- oprofile-cvs/daemon/opd_printf.h 2004-01-29 14:00:26.000000000 -0600 +++ oprofile-ibs/daemon/opd_printf.h 2008-07-22 11:03:36.000000000 -0500 @@ -22,6 +22,8 @@ extern int vsamples; extern int varcs; /// kernel module handling extern int vmodule; +/// ibs debuging +extern int vibs_debug; /// all others not fitting in above category, not voluminous. 
extern int vmisc; diff -uprN -X /home/jasonyeh/dontdiff oprofile-cvs/daemon/opd_sfile.c oprofile-ibs/daemon/opd_sfile.c --- oprofile-cvs/daemon/opd_sfile.c 2007-05-10 18:42:33.000000000 -0500 +++ oprofile-ibs/daemon/opd_sfile.c 2008-07-22 14:53:29.000000000 -0500 @@ -28,6 +28,9 @@ #define HASH_SIZE 2048 #define HASH_BITS (HASH_SIZE - 1) +extern int ibs_fetch_count; +extern int ibs_op_count; + /** All sfiles are hashed into these lists */ static struct list_head hashes[HASH_SIZE]; @@ -180,7 +183,7 @@ create_sfile(unsigned long hash, struct sf->kernel = ki; sf->anon = trans->anon; - for (i = 0 ; i < op_nr_counters ; ++i) + for (i = 0 ; i < OP_MAX_COUNTERS + OP_MAX_IBS_COUNTERS; ++i) odb_init(&sf->files[i]); for (i = 0; i < CG_HASH_SIZE; ++i) @@ -275,7 +278,7 @@ static void sfile_dup(struct sfile * to, memcpy(to, from, sizeof (struct sfile)); - for (i = 0 ; i < op_nr_counters ; ++i) + for (i = 0 ; i < OP_MAX_COUNTERS + OP_MAX_IBS_COUNTERS; ++i) odb_init(&to->files[i]); for (i = 0; i < CG_HASH_SIZE; ++i) @@ -293,15 +296,22 @@ static odb_t * get_file(struct transient struct cg_entry * cg; struct list_head * pos; unsigned long hash; + unsigned long counter = trans->event; odb_t * file; - if (trans->event >= op_nr_counters) { - fprintf(stderr, "%s: Invalid counter %lu\n", __FUNCTION__, - trans->event); - abort(); + if ((ibs_fetch_count || ibs_op_count) && (trans->ibs_fetch || trans->ibs_op)) { + /* Translate IBS event value to counter */ + counter = IBS_EVENT_TO_COUNTER(counter); + } else { + /* Disable counter number checking for IBS */ + if (counter >= op_nr_counters) { + fprintf(stderr, "%s: Invalid counter %lu\n", __FUNCTION__, + counter); + abort(); + } } - file = &sf->files[trans->event]; + file = &sf->files[counter]; if (!is_cg) goto open; @@ -314,7 +324,7 @@ static odb_t * get_file(struct transient list_for_each(pos, &sf->cg_hash[hash]) { cg = list_entry(pos, struct cg_entry, hash); if (sfile_equal(last, &cg->to)) { - file = &cg->to.files[trans->event]; + file = 
&cg->to.files[counter]; goto open; } } @@ -322,15 +332,17 @@ static odb_t * get_file(struct transient cg = xmalloc(sizeof(struct cg_entry)); sfile_dup(&cg->to, last); list_add(&cg->hash, &sf->cg_hash[hash]); - file = &cg->to.files[trans->event]; + file = &cg->to.files[counter]; open: - if (!odb_open_count(file)) - opd_open_sample_file(file, last, sf, trans->event, is_cg); + if (!odb_open_count(file)) { + opd_open_sample_file(file, last, sf, counter, is_cg, trans->ibs_fetch || trans->ibs_op); + } /* Error is logged by opd_open_sample_file */ - if (!odb_open_count(file)) + if (!odb_open_count(file)) { return NULL; + } return file; } @@ -407,7 +419,7 @@ static void sfile_log_arc(struct transie key = to & (0xffffffff); key |= ((uint64_t)from) << 32; - err = odb_update_node(file, key); + err = odb_update_node(file, key, 1); if (err) { fprintf(stderr, "%s: %s\n", __FUNCTION__, strerror(err)); abort(); @@ -415,6 +427,18 @@ static void sfile_log_arc(struct transie } +/* + * Function: sfile_log_sample + * + * This function logs a single event sample. It gets the oprofile database + * file using the information in the transient struct trans. It converts + * the PC for kernel, anonymous and JIT samples from an absolute address + * to an offset. If a file is not found for the sample, the sample is tallied + * as a lost sample. Finally, odb_insert() is called to actually insert the + * sample into the oprofile database. This function tallies a single + * event sample, so the count value passed to odb_insert() is one. 
+ * + */ void sfile_log_sample(struct transient const * trans) { int err; @@ -437,7 +461,47 @@ void sfile_log_sample(struct transient c if (trans->current->anon) pc -= trans->current->anon->start; - + + if (vsamples) + verbose_sample(trans, pc); + + if (!file) { + opd_stats[OPD_LOST_SAMPLEFILE]++; + return; + } + + err = odb_update_node(file, (uint64_t)pc, 1); + if (err) { + fprintf(stderr, "%s: %s\n", __FUNCTION__, strerror(err)); + abort(); + } +} + + +void sfile_log_sample_count(struct transient const * trans, + unsigned long int count) +{ + int err; + vma_t pc = trans->pc; + odb_t * file; + + if (trans->tracing == TRACING_ON) { + /* can happen if kernel sample falls through the cracks, + * see opd_put_sample() */ + if (trans->last) + sfile_log_arc(trans); + return; + } + + file = get_file(trans, 0); + + /* absolute value -> offset */ + if (trans->current->kernel) + pc -= trans->current->kernel->start; + + if (trans->current->anon) + pc -= trans->current->anon->start; + if (vsamples) verbose_sample(trans, pc); @@ -446,7 +510,7 @@ void sfile_log_sample(struct transient c return; } - err = odb_update_node(file, (uint64_t)pc); + err = odb_update_node(file, (odb_key_t)pc, (odb_value_t)count); if (err) { fprintf(stderr, "%s: %s\n", __FUNCTION__, strerror(err)); abort(); @@ -459,7 +523,7 @@ static int close_sfile(struct sfile * sf size_t i; /* it's OK to close a non-open odb file */ - for (i = 0; i < op_nr_counters; ++i) + for (i = 0; i < OP_MAX_COUNTERS + OP_MAX_IBS_COUNTERS; ++i) odb_close(&sf->files[i]); return 0; @@ -478,7 +542,7 @@ static int sync_sfile(struct sfile * sf, { size_t i; - for (i = 0; i < op_nr_counters; ++i) + for (i = 0; i < OP_MAX_COUNTERS + OP_MAX_IBS_COUNTERS; ++i) odb_sync(&sf->files[i]); return 0; diff -uprN -X /home/jasonyeh/dontdiff oprofile-cvs/daemon/opd_sfile.h oprofile-ibs/daemon/opd_sfile.h --- oprofile-cvs/daemon/opd_sfile.h 2007-05-10 18:42:33.000000000 -0500 +++ oprofile-ibs/daemon/opd_sfile.h 2008-07-22 11:03:36.000000000 -0500 @@ 
-18,6 +18,7 @@ #include "op_hw_config.h" #include "op_types.h" #include "op_list.h" +#include "opd_ibs.h" #include <sys/types.h> @@ -61,7 +62,7 @@ struct sfile { /** true if this file should be ignored in profiles */ int ignored; /** opened sample files */ - odb_t files[OP_MAX_COUNTERS]; + odb_t files[OP_MAX_COUNTERS + OP_MAX_IBS_COUNTERS]; /** hash table of opened cg sample files */ struct list_head cg_hash[CG_HASH_SIZE]; }; @@ -107,6 +108,10 @@ struct sfile * sfile_find(struct transie /** Log the sample in a previously located sfile. */ void sfile_log_sample(struct transient const * trans); +/** Log the event/cycle count in a previously located sfile */ +void sfile_log_sample_count(struct transient const * trans, + unsigned long int count); + /** initialise hashes */ void sfile_init(void); diff -uprN -X /home/jasonyeh/dontdiff oprofile-cvs/daemon/opd_stats.c oprofile-ibs/daemon/opd_stats.c --- oprofile-cvs/daemon/opd_stats.c 2008-06-23 14:52:29.000000000 -0500 +++ oprofile-ibs/daemon/opd_stats.c 2008-07-22 11:03:36.000000000 -0500 @@ -18,6 +18,7 @@ #include <stdlib.h> #include <stdio.h> + unsigned long opd_stats[OPD_MAX_STATS]; /** @@ -50,6 +51,7 @@ void opd_print_stats(void) opd_stats[OPD_LOST_SAMPLEFILE]); printf("Nr. samples lost due to no permanent mapping: %lu\n", opd_stats[OPD_LOST_NO_MAPPING]); + printf("Nr. IBS samples mapped: %lu\n", opd_stats[OPD_IBS_SAMPLE]); print_if("Nr. event lost due to buffer overflow: %u\n", "/dev/oprofile/stats", "event_lost_overflow", 1); print_if("Nr. samples lost due to no mapping: %u\n", diff -uprN -X /home/jasonyeh/dontdiff oprofile-cvs/daemon/opd_stats.h oprofile-ibs/daemon/opd_stats.h --- oprofile-cvs/daemon/opd_stats.h 2005-04-23 21:36:53.000000000 -0500 +++ oprofile-ibs/daemon/opd_stats.h 2008-07-22 11:03:36.000000000 -0500 @@ -23,6 +23,7 @@ enum { OPD_SAMPLES, /**< nr. samples */ OPD_LOST_NO_MAPPING, /**< nr samples lost due to no mapping */ OPD_DUMP_COUNT, /**< nr. 
of times buffer is read */ OPD_DANGLING_CODE, /**< nr. partial code notifications (buffer overflow */ + OPD_IBS_SAMPLE, /**< nr. of IBS samples mapped */ OPD_MAX_STATS /**< end of stats */ }; diff -uprN -X /home/jasonyeh/dontdiff oprofile-cvs/daemon/oprofiled.c oprofile-ibs/daemon/oprofiled.c --- oprofile-cvs/daemon/oprofiled.c 2008-04-28 16:23:23.000000000 -0500 +++ oprofile-ibs/daemon/oprofiled.c 2008-07-22 11:03:36.000000000 -0500 @@ -30,6 +30,7 @@ #include "op_lockfile.h" #include "op_list.h" #include "op_fileio.h" +#include "op_events.h" #include <sys/types.h> #include <sys/resource.h> @@ -57,6 +58,7 @@ int vsamples; int varcs; int vmodule; int vmisc; +int vibs_debug; int separate_lib; int separate_kernel; int separate_thread; @@ -64,6 +66,8 @@ int separate_cpu; int no_vmlinux; char * vmlinux; char * kernel_range; +int ibs_fetch_count = 0; +int ibs_op_count = 0; char * session_dir; int no_xen; char * xenimage; @@ -94,6 +98,8 @@ static struct poptOption options[] = { { "events", 'e', POPT_ARG_STRING, &events, 0, "events list", "[events]" }, { "version", 'v', POPT_ARG_NONE, &showvers, 0, "show version", NULL, }, { "verbose", 'V', POPT_ARG_STRING, &verbose, 0, "be verbose in log file", "all,sfile,arcs,samples,module,misc", }, + { "ibs-fetch", 'i', POPT_ARG_INT, &ibs_fetch_count, 0, "AMD IBS Fetch mode", "[0 to Max]", }, + { "ibs-op", 'o', POPT_ARG_INT, &ibs_op_count, 0, "AMD IBS OP mode", "[0 to Max]", }, POPT_AUTOHELP { NULL, 0, 0, NULL, 0, NULL, NULL, }, }; @@ -353,6 +359,7 @@ static void opd_handle_verbose_option(ch varcs = 1; vmodule = 1; vmisc = 1; + vibs_debug = 1; } else if (!strcmp(name, "sfile")) { vsfile = 1; } else if (!strcmp(name, "arcs")) { @@ -363,6 +370,8 @@ static void opd_handle_verbose_option(ch vmodule = 1; } else if (!strcmp(name, "misc")) { vmisc = 1; + } else if (!strcmp(name, "ibs_debug")) { + vibs_debug = 1; } else { fprintf(stderr, "unknown verbose options\n"); exit(EXIT_FAILURE); @@ -426,7 +435,7 @@ static void opd_options(int argc, 
char c } } - if (events == NULL) { + if (events == NULL && (ibs_fetch_count || ibs_op_count)) { fprintf(stderr, "oprofiled: no events specified.\n"); poptPrintHelp(optcon, stderr, 0); exit(EXIT_FAILURE); diff -uprN -X /home/jasonyeh/dontdiff oprofile-cvs/daemon/oprofiled.h oprofile-ibs/daemon/oprofiled.h --- oprofile-cvs/daemon/oprofiled.h 2008-04-28 16:23:23.000000000 -0500 +++ oprofile-ibs/daemon/oprofiled.h 2008-07-22 11:03:36.000000000 -0500 @@ -65,5 +65,7 @@ extern char * kernel_range; extern int no_xen; extern char * xenimage; extern char * xen_range; +extern int ibs_fetch; +extern int ibs_op; #endif /* OPROFILED_H */ |
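As a rough illustration of the count semantics described in the sfile_log_sample comment in the patch above: ordinary samples add one to the entry for a given offset, while a caller such as sfile_log_sample_count adds an arbitrary count. This is a sketch only. The fixed-size table is a made-up stand-in for the odb sample file, and toy_update_node, toy_lookup and toy_log_sample are hypothetical names, not the real libdb interface.

```c
#include <stddef.h>
#include <stdint.h>

#define TOY_SLOTS 64	/* hypothetical capacity, chosen only for the sketch */

struct toy_node {
	uint64_t key;		/* offset of the sample within its image */
	unsigned int value;	/* accumulated sample count */
	int used;
};

struct toy_db {
	struct toy_node nodes[TOY_SLOTS];
};

/* Add count to the entry for key, creating the entry if needed.
 * Returns 0 on success, -1 if the table is full. */
static int toy_update_node(struct toy_db * db, uint64_t key,
			   unsigned int count)
{
	size_t start = (size_t)(key % TOY_SLOTS);
	size_t i;

	for (i = 0; i < TOY_SLOTS; ++i) {
		struct toy_node * n = &db->nodes[(start + i) % TOY_SLOTS];
		if (!n->used) {
			n->used = 1;
			n->key = key;
			n->value = count;
			return 0;
		}
		if (n->key == key) {
			n->value += count;
			return 0;
		}
	}
	return -1;
}

/* Return the accumulated count for key, or 0 if it was never logged. */
static unsigned int toy_lookup(struct toy_db const * db, uint64_t key)
{
	size_t start = (size_t)(key % TOY_SLOTS);
	size_t i;

	for (i = 0; i < TOY_SLOTS; ++i) {
		struct toy_node const * n = &db->nodes[(start + i) % TOY_SLOTS];
		if (!n->used)
			return 0;
		if (n->key == key)
			return n->value;
	}
	return 0;
}

/* Convert an absolute pc to an offset, then tally count samples there.
 * A plain event sample would pass count == 1. */
static int toy_log_sample(struct toy_db * db, uint64_t pc,
			  uint64_t image_start, unsigned int count)
{
	return toy_update_node(db, pc - image_start, count);
}
```

Logging the same pc once with a count of 1 and once with a count of 5 leaves a single entry holding 6, which is the behaviour the extra argument to odb_update_node is meant to provide.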
From: Maynard J. <may...@us...> - 2008-07-29 18:57:10
Jason Yeh wrote: > This patch contains the rest of daemon changes to data structures to store and write > IBS events to sample file, and new stats counting number of IBS samples. > Jason, Sorry it took me so long to respond to this. I'll do my best to reply to the other patches in this set this week. A couple general comments about this patch: Several lines are longer than 80 chars. check_style.py (with -l option) will catch these. Also, watch for extra spaces at the end of lines. > > Signed-off-by: Jason Yeh <jas...@am...> > --- > daemon/Makefile.am | 3 > oprofile-ibs/daemon/liblegacy/opd_proc.c | 2 > oprofile-ibs/daemon/liblegacy/opd_sample_files.c | 10 +- > oprofile-ibs/daemon/opd_events.c | 46 +++++++--- > oprofile-ibs/daemon/opd_events.h | 7 - > oprofile-ibs/daemon/opd_mangling.c | 15 ++- > oprofile-ibs/daemon/opd_mangling.h | 3 > oprofile-ibs/daemon/opd_printf.h | 2 > oprofile-ibs/daemon/opd_sfile.c | 98 +++++++++++++++++++---- > oprofile-ibs/daemon/opd_sfile.h | 7 + > oprofile-ibs/daemon/opd_stats.c | 2 > oprofile-ibs/daemon/opd_stats.h | 1 > oprofile-ibs/daemon/oprofiled.c | 11 ++ > oprofile-ibs/daemon/oprofiled.h | 2 > 14 files changed, 162 insertions(+), 47 deletions(-) > > diff -uprN -X /home/jasonyeh/dontdiff oprofile-cvs/daemon/Makefile.am oprofile-ibs/daemon/Makefile.am > --- oprofile-cvs/daemon/Makefile.am 2008-04-28 16:23:24.000000000 -0500 > +++ oprofile-ibs/daemon/Makefile.am 2008-07-22 11:03:36.000000000 -0500 > @@ -26,7 +26,8 @@ oprofiled_SOURCES = \ > opd_perfmon.c \ > opd_anon.h \ > opd_anon.c \ > - opd_spu.c > + opd_spu.c \ > + opd_ibs.h > Technically, this change should be in patch 2/7. 
> > LIBS=@POPT_LIBS@ @LIBERTY_LIBS@ > > diff -uprN -X /home/jasonyeh/dontdiff oprofile-cvs/daemon/liblegacy/opd_proc.c oprofile-ibs/daemon/liblegacy/opd_proc.c > --- oprofile-cvs/daemon/liblegacy/opd_proc.c 2005-08-17 14:15:41.000000000 -0500 > +++ oprofile-ibs/daemon/liblegacy/opd_proc.c 2008-07-22 11:03:36.000000000 -0500 > @@ -143,7 +143,7 @@ void opd_put_image_sample(struct opd_ima > sfile = image->sfiles[cpu_number][counter]; > } > > - err = odb_update_node(&sfile->sample_file, offset); > + err = odb_update_node(&sfile->sample_file, offset, 1); > if (err) { > fprintf(stderr, "%s\n", strerror(err)); > abort(); > diff -uprN -X /home/jasonyeh/dontdiff oprofile-cvs/daemon/liblegacy/opd_sample_files.c oprofile-ibs/daemon/liblegacy/opd_sample_files.c > --- oprofile-cvs/daemon/liblegacy/opd_sample_files.c 2007-05-10 18:42:33.000000000 -0500 > +++ oprofile-ibs/daemon/liblegacy/opd_sample_files.c 2008-07-22 11:11:47.000000000 -0500 > @@ -69,7 +69,7 @@ static char * opd_mangle_filename(struct > { > char * mangled; > struct mangle_values values; > - struct opd_event * event = find_counter_event(counter); > + struct opd_event * event = find_counter_event(counter, 0); > > values.flags = 0; > if (image->kernel) > @@ -142,8 +142,12 @@ retry: > goto out; > } > > - fill_header(odb_get_data(&sfile->sample_file), counter, 0, 0, > - image->kernel, 0, 0, 0, image->mtime); > + fill_header(odb_get_data(&sfile->sample_file), counter, > + 0, 0, > + image->kernel, 0, > + 0, 0, > + image->mtime, 0); > This is a superfluous change. 
> + > > out: > free(mangled); > diff -uprN -X /home/jasonyeh/dontdiff oprofile-cvs/daemon/opd_events.c oprofile-ibs/daemon/opd_events.c > --- oprofile-cvs/daemon/opd_events.c 2007-05-10 18:42:32.000000000 -0500 > +++ oprofile-ibs/daemon/opd_events.c 2008-07-22 11:03:36.000000000 -0500 > @@ -13,6 +13,7 @@ > > #include "opd_events.h" > #include "opd_printf.h" > +#include "opd_ibs.h" > #include "oprofiled.h" > > #include "op_string.h" > @@ -22,13 +23,16 @@ > #include "op_libiberty.h" > #include "op_hw_config.h" > #include "op_sample_file.h" > +#include "op_events.h" > > #include <stdlib.h> > #include <stdio.h> > > extern op_cpu cpu_type; > +extern int ibs_fetch_count; > +extern int ibs_op_count; > > -struct opd_event opd_events[OP_MAX_COUNTERS]; > +struct opd_event opd_events[OP_MAX_COUNTERS + OP_MAX_IBS_COUNTERS]; > Is there any way to avoid having to *always* allocate space for IBS events? Also, I note that OP_MAX_IBS_COUNTERS is defined as '600' in patch 2, even though there aren't *really* 600 events. Think about some way to compress the array of events. > > static double cpu_speed; > > @@ -91,11 +95,6 @@ void opd_parse_events(char const * event > return; > } > > - if (!ev || !strlen(ev)) { > - fprintf(stderr, "oprofiled: no events passed.\n"); > - exit(EXIT_FAILURE); > - } > Why remove this ^^^ ? 
> - > verbprintf(vmisc, "Events: %s\n", ev); > > c = ev; > @@ -125,13 +124,33 @@ void opd_parse_events(char const * event > } > > > -struct opd_event * find_counter_event(unsigned long counter) > +struct opd_event * find_counter_event(unsigned long counter, int ibs) > { > size_t i; > - > - for (i = 0; i < op_nr_counters && opd_events[i].name; ++i) { > - if (counter == opd_events[i].counter) > - return &opd_events[i]; > + struct op_event * ibs_lookup_event; > + > + /* If IBS is enabled, use events_list to fill appropriate opd_event */ > + if (ibs) { > + ibs_lookup_event = op_find_event(cpu_type, COUNTER_TO_IBS_EVENT(counter)); > + > + if (!ibs_lookup_event) { > + abort(); > + } > + > + opd_events[counter].name = op_xstrndup(ibs_lookup_event->name, strlen(ibs_lookup_event->name)); > > + opd_events[counter].value = COUNTER_TO_IBS_EVENT(counter); > + opd_events[counter].counter = counter; > + opd_events[counter].count = > + IS_IBS_FETCH(COUNTER_TO_IBS_EVENT(counter))?ibs_fetch_count:ibs_op_count; > Please put spaces between each component of the above conditional expression. 
> + opd_events[counter].um = 0; > + opd_events[counter].kernel = 1; > + opd_events[counter].user = 1; > + return &opd_events[counter]; > + } else { > + for (i = 0; i < op_nr_counters && opd_events[i].name; ++i) { > + if (counter == opd_events[i].counter) > + return &opd_events[i]; > + } > } > > fprintf(stderr, "Unknown event for counter %lu\n", counter); > @@ -143,9 +162,10 @@ struct opd_event * find_counter_event(un > void fill_header(struct opd_header * header, unsigned long counter, > vma_t anon_start, vma_t cg_to_anon_start, > int is_kernel, int cg_to_is_kernel, > - int spu_samples, uint64_t embed_offset, time_t mtime) > + int spu_samples, uint64_t embed_offset, time_t mtime, > + int ibs) > alignment > { > - struct opd_event * event = find_counter_event(counter); > + struct opd_event * event = find_counter_event(counter, ibs); > > memset(header, '\0', sizeof(struct opd_header)); > header->version = OPD_VERSION; > diff -uprN -X /home/jasonyeh/dontdiff oprofile-cvs/daemon/opd_events.h oprofile-ibs/daemon/opd_events.h > --- oprofile-cvs/daemon/opd_events.h 2007-05-10 18:42:32.000000000 -0500 > +++ oprofile-ibs/daemon/opd_events.h 2008-07-22 11:03:36.000000000 -0500 > @@ -34,14 +34,15 @@ extern struct opd_event opd_events[]; > void opd_parse_events(char const * events); > > /** Find the event for the given counter */ > -struct opd_event * find_counter_event(unsigned long counter); > +struct opd_event * find_counter_event(unsigned long counter, int ibs); > > struct opd_header; > > /** fill the sample file header with event info etc. 
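For instance, something along these lines. This is an untested sketch: the macro bodies and the two count values are placeholders so that the fragment builds by itself, not the real definitions from opd_ibs.h, and ibs_sample_count is a hypothetical helper name.

```c
/* Placeholder definitions, assumed for illustration only. */
#define IBS_FETCH_LIMIT 4UL
#define COUNTER_TO_IBS_EVENT(c) (c)
#define IS_IBS_FETCH(ev) ((ev) < IBS_FETCH_LIMIT)

static int ibs_fetch_count = 250000;	/* placeholder value */
static int ibs_op_count = 100000;	/* placeholder value */

/* Pick the sampling count for an IBS counter, with spaces around
 * '?' and ':' and the expression broken to stay under 80 columns. */
static int ibs_sample_count(unsigned long counter)
{
	return IS_IBS_FETCH(COUNTER_TO_IBS_EVENT(counter))
		? ibs_fetch_count : ibs_op_count;
}
```

Pulling the expression into a small helper is optional; the same spacing applies if it stays inline in find_counter_event.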
*/ > void fill_header(struct opd_header * header, unsigned long counter, > - vma_t anon_start, vma_t anon_end, > + vma_t anon_start, vma_t anon_end, superfluous change -- extra space at the end of line > > int is_kernel, int cg_to_is_kernel, > - int spu_samples, uint64_t embed_offset, time_t mtime); > + int spu_samples, uint64_t embed_offset, time_t mtime, > + int ibs); > alignment > > #endif /* OPD_EVENTS_H */ > diff -uprN -X /home/jasonyeh/dontdiff oprofile-cvs/daemon/opd_mangling.c oprofile-ibs/daemon/opd_mangling.c > --- oprofile-cvs/daemon/opd_mangling.c 2007-05-10 18:42:33.000000000 -0500 > +++ oprofile-ibs/daemon/opd_mangling.c 2008-07-22 11:03:36.000000000 -0500 > @@ -66,11 +66,12 @@ static char * mangle_anon(struct anon_ma > > > static char * > -mangle_filename(struct sfile * last, struct sfile const * sf, int counter, int cg) > +mangle_filename(struct sfile * last, struct sfile const * sf, int counter, int cg, > + int ibs) > { > - char * mangled; > + char * mangled = NULL; > struct mangle_values values; > - struct opd_event * event = find_counter_event(counter); > + struct opd_event * event = find_counter_event(counter, ibs); > > values.flags = 0; > > @@ -139,7 +140,8 @@ mangle_filename(struct sfile * last, str > > > int opd_open_sample_file(odb_t * file, struct sfile * last, > - struct sfile * sf, int counter, int cg) > + struct sfile * sf, int counter, int cg, > + int ibs) > alignment > { > char * mangled; > char const * binary; > @@ -147,7 +149,7 @@ int opd_open_sample_file(odb_t * file, s > vma_t last_start = 0; > int err; > > - mangled = mangle_filename(last, sf, counter, cg); > + mangled = mangle_filename(last, sf, counter, cg, ibs); > > if (!mangled) > return EINVAL; > @@ -194,7 +196,8 @@ retry: > sf->anon ? sf->anon->start : 0, last_start, > !!sf->kernel, last ? !!last->kernel : 0, > spu_profile, sf->embedded_offset, > - binary ? op_get_mtime(binary) : 0); > + (binary ? 
op_get_mtime(binary) : 0 ), > + ibs); > alignment > > out: > sfile_put(sf); > diff -uprN -X /home/jasonyeh/dontdiff oprofile-cvs/daemon/opd_mangling.h oprofile-ibs/daemon/opd_mangling.h > --- oprofile-cvs/daemon/opd_mangling.h 2004-05-29 11:29:40.000000000 -0500 > +++ oprofile-ibs/daemon/opd_mangling.h 2008-07-22 11:03:36.000000000 -0500 > @@ -28,6 +28,7 @@ struct sfile; > * Returns 0 on success. > */ > int opd_open_sample_file(odb_t * file, struct sfile * last, > - struct sfile * sf, int counter, int cg); > + struct sfile * sf, int counter, int cg, > + int ibs); > alignment > > #endif /* OPD_MANGLING_H */ > diff -uprN -X /home/jasonyeh/dontdiff oprofile-cvs/daemon/opd_printf.h oprofile-ibs/daemon/opd_printf.h > --- oprofile-cvs/daemon/opd_printf.h 2004-01-29 14:00:26.000000000 -0600 > +++ oprofile-ibs/daemon/opd_printf.h 2008-07-22 11:03:36.000000000 -0500 > @@ -22,6 +22,8 @@ extern int vsamples; > extern int varcs; > /// kernel module handling > extern int vmodule; > +/// ibs debuging > +extern int vibs_debug; > /// all others not fitting in above category, not voluminous. 
> extern int vmisc; > > diff -uprN -X /home/jasonyeh/dontdiff oprofile-cvs/daemon/opd_sfile.c oprofile-ibs/daemon/opd_sfile.c > --- oprofile-cvs/daemon/opd_sfile.c 2007-05-10 18:42:33.000000000 -0500 > +++ oprofile-ibs/daemon/opd_sfile.c 2008-07-22 14:53:29.000000000 -0500 > @@ -28,6 +28,9 @@ > #define HASH_SIZE 2048 > #define HASH_BITS (HASH_SIZE - 1) > > +extern int ibs_fetch_count; > +extern int ibs_op_count; > + > /** All sfiles are hashed into these lists */ > static struct list_head hashes[HASH_SIZE]; > > @@ -180,7 +183,7 @@ create_sfile(unsigned long hash, struct > sf->kernel = ki; > sf->anon = trans->anon; > > - for (i = 0 ; i < op_nr_counters ; ++i) > + for (i = 0 ; i < OP_MAX_COUNTERS + OP_MAX_IBS_COUNTERS; ++i) > odb_init(&sf->files[i]); > > for (i = 0; i < CG_HASH_SIZE; ++i) > @@ -275,7 +278,7 @@ static void sfile_dup(struct sfile * to, > > memcpy(to, from, sizeof (struct sfile)); > > - for (i = 0 ; i < op_nr_counters ; ++i) > + for (i = 0 ; i < OP_MAX_COUNTERS + OP_MAX_IBS_COUNTERS; ++i) > odb_init(&to->files[i]); > > for (i = 0; i < CG_HASH_SIZE; ++i) > @@ -293,15 +296,22 @@ static odb_t * get_file(struct transient > struct cg_entry * cg; > struct list_head * pos; > unsigned long hash; > + unsigned long counter = trans->event; > odb_t * file; > > - if (trans->event >= op_nr_counters) { > - fprintf(stderr, "%s: Invalid counter %lu\n", __FUNCTION__, > - trans->event); > - abort(); > + if ((ibs_fetch_count || ibs_op_count) && (trans->ibs_fetch || trans->ibs_op)) { > + /* Translate IBS event value to counter */ > + counter = IBS_EVENT_TO_COUNTER(counter); > + } else { > + /* Disable counter number checking for IBS */ > + if (counter >= op_nr_counters) { > + fprintf(stderr, "%s: Invalid counter %lu\n", __FUNCTION__, > + counter); > + abort(); > + } > } > > - file = &sf->files[trans->event]; > + file = &sf->files[counter]; > > if (!is_cg) > goto open; > @@ -314,7 +324,7 @@ static odb_t * get_file(struct transient > list_for_each(pos, &sf->cg_hash[hash]) { > 
cg = list_entry(pos, struct cg_entry, hash); > if (sfile_equal(last, &cg->to)) { > - file = &cg->to.files[trans->event]; > + file = &cg->to.files[counter]; > goto open; > } > } > @@ -322,15 +332,17 @@ static odb_t * get_file(struct transient > cg = xmalloc(sizeof(struct cg_entry)); > sfile_dup(&cg->to, last); > list_add(&cg->hash, &sf->cg_hash[hash]); > - file = &cg->to.files[trans->event]; > + file = &cg->to.files[counter]; > > open: > - if (!odb_open_count(file)) > - opd_open_sample_file(file, last, sf, trans->event, is_cg); > + if (!odb_open_count(file)) { > + opd_open_sample_file(file, last, sf, counter, is_cg, trans->ibs_fetch || trans->ibs_op); > + } > > /* Error is logged by opd_open_sample_file */ > - if (!odb_open_count(file)) > + if (!odb_open_count(file)) { > return NULL; > + } > > return file; > } > @@ -407,7 +419,7 @@ static void sfile_log_arc(struct transie > key = to & (0xffffffff); > key |= ((uint64_t)from) << 32; > > - err = odb_update_node(file, key); > + err = odb_update_node(file, key, 1); > if (err) { > fprintf(stderr, "%s: %s\n", __FUNCTION__, strerror(err)); > abort(); > @@ -415,6 +427,18 @@ static void sfile_log_arc(struct transie > } > > > +/* > + * Function: sfile_log_sample > + * > + * This function logs a single event sample. It gets the oprofile database > + * file using the information in the transient struct trans. It converts > + * the PC for kernel, anonymous and JIT samples from an absolute address > + * to an offset. If a file is not found for the sample, the sample is tallied > + * as a lost sample. Finally, odb_insert() is called to actually insert the > + * sample into the oprofile database. This function tallies a single > + * event sample, so the count value passed to odb_insert() is one. 
> + * > + */ > void sfile_log_sample(struct transient const * trans) > { > int err; > @@ -437,7 +461,47 @@ void sfile_log_sample(struct transient c > > if (trans->current->anon) > pc -= trans->current->anon->start; > - > + > + if (vsamples) > + verbose_sample(trans, pc); > + > + if (!file) { > + opd_stats[OPD_LOST_SAMPLEFILE]++; > + return; > + } > + > + err = odb_update_node(file, (uint64_t)pc, 1); > + if (err) { > + fprintf(stderr, "%s: %s\n", __FUNCTION__, strerror(err)); > + abort(); > + } > +} > + > + > +void sfile_log_sample_count(struct transient const * trans, > + unsigned long int count) > Too much duplication of code between sfile_log_sample and sfile_log_sample_count. Why not just modify the original function to handle the extra argument? > +{ > + int err; > + vma_t pc = trans->pc; > + odb_t * file; > + > + if (trans->tracing == TRACING_ON) { > + /* can happen if kernel sample falls through the cracks, > + * see opd_put_sample() */ > + if (trans->last) > + sfile_log_arc(trans); > + return; > + } > + > + file = get_file(trans, 0); > + > + /* absolute value -> offset */ > + if (trans->current->kernel) > + pc -= trans->current->kernel->start; > + > + if (trans->current->anon) > + pc -= trans->current->anon->start; > + > if (vsamples) > verbose_sample(trans, pc); > > @@ -446,7 +510,7 @@ void sfile_log_sample(struct transient c > return; > } > > - err = odb_update_node(file, (uint64_t)pc); > + err = odb_update_node(file, (odb_key_t)pc, (odb_value_t)count); > if (err) { > fprintf(stderr, "%s: %s\n", __FUNCTION__, strerror(err)); > abort(); > @@ -459,7 +523,7 @@ static int close_sfile(struct sfile * sf > size_t i; > > /* it's OK to close a non-open odb file */ > - for (i = 0; i < op_nr_counters; ++i) > + for (i = 0; i < OP_MAX_COUNTERS + OP_MAX_IBS_COUNTERS; ++i) > odb_close(&sf->files[i]); > > return 0; > @@ -478,7 +542,7 @@ static int sync_sfile(struct sfile * sf, > { > size_t i; > > - for (i = 0; i < op_nr_counters; ++i) > + for (i = 0; i < OP_MAX_COUNTERS + 
OP_MAX_IBS_COUNTERS; ++i) > odb_sync(&sf->files[i]); > > return 0; > diff -uprN -X /home/jasonyeh/dontdiff oprofile-cvs/daemon/opd_sfile.h oprofile-ibs/daemon/opd_sfile.h > --- oprofile-cvs/daemon/opd_sfile.h 2007-05-10 18:42:33.000000000 -0500 > +++ oprofile-ibs/daemon/opd_sfile.h 2008-07-22 11:03:36.000000000 -0500 > @@ -18,6 +18,7 @@ > #include "op_hw_config.h" > #include "op_types.h" > #include "op_list.h" > +#include "opd_ibs.h" > > #include <sys/types.h> > > @@ -61,7 +62,7 @@ struct sfile { > /** true if this file should be ignored in profiles */ > int ignored; > /** opened sample files */ > - odb_t files[OP_MAX_COUNTERS]; > + odb_t files[OP_MAX_COUNTERS + OP_MAX_IBS_COUNTERS]; > /** hash table of opened cg sample files */ > struct list_head cg_hash[CG_HASH_SIZE]; > }; > @@ -107,6 +108,10 @@ struct sfile * sfile_find(struct transie > /** Log the sample in a previously located sfile. */ > void sfile_log_sample(struct transient const * trans); > > +/** Log the event/cycle count in a previously located sfile */ > +void sfile_log_sample_count(struct transient const * trans, > + unsigned long int count); > + > /** initialise hashes */ > void sfile_init(void); > > diff -uprN -X /home/jasonyeh/dontdiff oprofile-cvs/daemon/opd_stats.c oprofile-ibs/daemon/opd_stats.c > --- oprofile-cvs/daemon/opd_stats.c 2008-06-23 14:52:29.000000000 -0500 > +++ oprofile-ibs/daemon/opd_stats.c 2008-07-22 11:03:36.000000000 -0500 > @@ -18,6 +18,7 @@ > #include <stdlib.h> > #include <stdio.h> > > + > unsigned long opd_stats[OPD_MAX_STATS]; > > /** > @@ -50,6 +51,7 @@ void opd_print_stats(void) > opd_stats[OPD_LOST_SAMPLEFILE]); > printf("Nr. samples lost due to no permanent mapping: %lu\n", > opd_stats[OPD_LOST_NO_MAPPING]); > + printf("Nr. IBS samples mapped: %lu\n", opd_stats[OPD_IBS_SAMPLE]); > print_if("Nr. event lost due to buffer overflow: %u\n", > "/dev/oprofile/stats", "event_lost_overflow", 1); > print_if("Nr. 
samples lost due to no mapping: %u\n", > diff -uprN -X /home/jasonyeh/dontdiff oprofile-cvs/daemon/opd_stats.h oprofile-ibs/daemon/opd_stats.h > --- oprofile-cvs/daemon/opd_stats.h 2005-04-23 21:36:53.000000000 -0500 > +++ oprofile-ibs/daemon/opd_stats.h 2008-07-22 11:03:36.000000000 -0500 > @@ -23,6 +23,7 @@ enum { OPD_SAMPLES, /**< nr. samples */ > OPD_LOST_NO_MAPPING, /**< nr samples lost due to no mapping */ > OPD_DUMP_COUNT, /**< nr. of times buffer is read */ > OPD_DANGLING_CODE, /**< nr. partial code notifications (buffer overflow */ > + OPD_IBS_SAMPLE, /**< nr. of IBS samples mapped */ > OPD_MAX_STATS /**< end of stats */ > }; > > diff -uprN -X /home/jasonyeh/dontdiff oprofile-cvs/daemon/oprofiled.c oprofile-ibs/daemon/oprofiled.c > --- oprofile-cvs/daemon/oprofiled.c 2008-04-28 16:23:23.000000000 -0500 > +++ oprofile-ibs/daemon/oprofiled.c 2008-07-22 11:03:36.000000000 -0500 > @@ -30,6 +30,7 @@ > #include "op_lockfile.h" > #include "op_list.h" > #include "op_fileio.h" > +#include "op_events.h" > > #include <sys/types.h> > #include <sys/resource.h> > @@ -57,6 +58,7 @@ int vsamples; > int varcs; > int vmodule; > int vmisc; > +int vibs_debug; > int separate_lib; > int separate_kernel; > int separate_thread; > @@ -64,6 +66,8 @@ int separate_cpu; > int no_vmlinux; > char * vmlinux; > char * kernel_range; > +int ibs_fetch_count = 0; > +int ibs_op_count = 0; > char * session_dir; > int no_xen; > char * xenimage; > @@ -94,6 +98,8 @@ static struct poptOption options[] = { > { "events", 'e', POPT_ARG_STRING, &events, 0, "events list", "[events]" }, > { "version", 'v', POPT_ARG_NONE, &showvers, 0, "show version", NULL, }, > { "verbose", 'V', POPT_ARG_STRING, &verbose, 0, "be verbose in log file", "all,sfile,arcs,samples,module,misc", }, > + { "ibs-fetch", 'i', POPT_ARG_INT, &ibs_fetch_count, 0, "AMD IBS Fetch mode", "[0 to Max]", }, > + { "ibs-op", 'o', POPT_ARG_INT, &ibs_op_count, 0, "AMD IBS OP mode", "[0 to Max]", }, > As we've already discussed, the "ibs-*" 
options will be removed. > POPT_AUTOHELP > { NULL, 0, 0, NULL, 0, NULL, NULL, }, > }; > @@ -353,6 +359,7 @@ static void opd_handle_verbose_option(ch > varcs = 1; > vmodule = 1; > vmisc = 1; > + vibs_debug = 1; > } else if (!strcmp(name, "sfile")) { > vsfile = 1; > } else if (!strcmp(name, "arcs")) { > @@ -363,6 +370,8 @@ static void opd_handle_verbose_option(ch > vmodule = 1; > } else if (!strcmp(name, "misc")) { > vmisc = 1; > + } else if (!strcmp(name, "ibs_debug")) { > + vibs_debug = 1; > } else {OP_MAX_IBS_COUNTERS > fprintf(stderr, "unknown verbose options\n"); > exit(EXIT_FAILURE); > @@ -426,7 +435,7 @@ static void opd_options(int argc, char c > } > } > > - if (events == NULL) { > + if (events == NULL && (ibs_fetch_count || ibs_op_count)) { > fprintf(stderr, "oprofiled: no events specified.\n"); > poptPrintHelp(optcon, stderr, 0); > exit(EXIT_FAILURE); > diff -uprN -X /home/jasonyeh/dontdiff oprofile-cvs/daemon/oprofiled.h oprofile-ibs/daemon/oprofiled.h > --- oprofile-cvs/daemon/oprofiled.h 2008-04-28 16:23:23.000000000 -0500 > +++ oprofile-ibs/daemon/oprofiled.h 2008-07-22 11:03:36.000000000 -0500 > @@ -65,5 +65,7 @@ extern char * kernel_range; > extern int no_xen; > extern char * xenimage; > extern char * xen_range; > +extern int ibs_fetch; > +extern int ibs_op; > > #endif /* OPROFILED_H */ > > > > |
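A note on the review comment above about the duplication between sfile_log_sample and sfile_log_sample_count: one way to fold the two together is to keep a single implementation that takes the count, and make the original entry point a thin wrapper that passes 1. The sketch below only illustrates that shape. The types sample_db and sample_ctx and the counter lost_samples are hypothetical stand-ins, not the daemon's odb_t, struct transient or opd_stats.

```cpp
#include <cstdint>
#include <map>

// Hypothetical stand-ins for the daemon's types; the real odb_t and
// struct transient are considerably richer.
struct sample_db {
    std::map<std::uint64_t, std::uint64_t> counts;
};

struct sample_ctx {
    std::uint64_t pc;          // absolute sample address
    std::uint64_t image_start; // start of the kernel/anon mapping, 0 if none
    sample_db * file;          // nullptr models "no sample file found"
};

// Stand-in for opd_stats[OPD_LOST_SAMPLEFILE].
static unsigned long lost_samples = 0;

// Single implementation: the count is a parameter.
bool log_sample_count(sample_ctx const & ctx, std::uint64_t count)
{
    if (!ctx.file) {
        ++lost_samples;
        return false;
    }
    // absolute value -> offset, as in the patch
    std::uint64_t offset = ctx.pc - ctx.image_start;
    ctx.file->counts[offset] += count;
    return true;
}

// The original entry point becomes a thin wrapper with a count of one.
bool log_sample(sample_ctx const & ctx)
{
    return log_sample_count(ctx, 1);
}
```

With this layout, existing callers of the one-sample function stay unchanged and only the IBS path needs to pass an explicit count.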
From: Jason Y. <jas...@am...> - 2008-07-23 15:59:48
|
This patch includes more changes to back end data structure to support IBS events. Signed-off-by: Jason Yeh <jas...@am...> --- libdb/db_insert.c | 8 ++++---- oprofile-ibs/libdb/odb.h | 2 +- oprofile-ibs/libpp/profile_container.cpp | 7 +++++++ oprofile-ibs/libpp/profile_container.h | 2 ++ oprofile-ibs/libpp/sample_container.cpp | 10 ++++++++++ oprofile-ibs/libpp/sample_container.h | 3 +++ oprofile-ibs/libutil++/op_bfd.cpp | 5 +++++ oprofile-ibs/libutil++/op_bfd.h | 3 +++ oprofile-ibs/libutil/op_fileio.c | 20 ++++++++++++++++++++ oprofile-ibs/libutil/op_fileio.h | 13 +++++++++++++ 10 files changed, 68 insertions(+), 5 deletions(-) diff -uprN -X /home/jasonyeh/dontdiff oprofile-cvs/libdb/db_insert.c oprofile-ibs/libdb/db_insert.c --- oprofile-cvs/libdb/db_insert.c 2005-08-17 14:15:42.000000000 -0500 +++ oprofile-ibs/libdb/db_insert.c 2008-07-22 11:04:50.000000000 -0500 @@ -49,7 +49,7 @@ static inline int add_node(odb_data_t * return 0; } -int odb_update_node(odb_t * odb, odb_key_t key) +int odb_update_node(odb_t * odb, odb_key_t key, odb_value_t value) { odb_index_t index; odb_node_t * node; @@ -60,8 +60,8 @@ int odb_update_node(odb_t * odb, odb_key while (index) { node = &data->node_base[index]; if (node->key == key) { - if (node->value + 1 != 0) { - node->value += 1; + if (node->value + value != 0) { + node->value += value; } else { /* post profile tools must handle overflow */ /* FIXME: the tricky way will be just to add @@ -92,7 +92,7 @@ int odb_update_node(odb_t * odb, odb_key index = node->next; } - return add_node(data, key, 1); + return add_node(data, key, value); } diff -uprN -X /home/jasonyeh/dontdiff oprofile-cvs/libdb/odb.h oprofile-ibs/libdb/odb.h --- oprofile-cvs/libdb/odb.h 2005-08-17 14:15:42.000000000 -0500 +++ oprofile-ibs/libdb/odb.h 2008-07-22 11:04:50.000000000 -0500 @@ -178,7 +178,7 @@ void odb_hash_free_stat(odb_hash_stat_t * * returns EXIT_SUCCESS on success, EXIT_FAILURE on failure */ -int odb_update_node(odb_t * odb, odb_key_t key); +int 
odb_update_node(odb_t * odb, odb_key_t key, odb_value_t value); /** Add a new node w/o regarding if a node with the same key already exists * diff -uprN -X /home/jasonyeh/dontdiff oprofile-cvs/libpp/profile_container.cpp oprofile-ibs/libpp/profile_container.cpp --- oprofile-cvs/libpp/profile_container.cpp 2008-05-08 12:17:55.000000000 -0500 +++ oprofile-ibs/libpp/profile_container.cpp 2008-07-22 11:04:50.000000000 -0500 @@ -161,6 +161,12 @@ profile_container::add_samples(op_bfd co } +void profile_container::delete_samples(symbol_entry const * symbol, bfd_vma vma) +{ + samples->erase(symbol, vma); +} + + symbol_collection const profile_container::select_symbols(symbol_choice & choice) const { @@ -331,3 +337,4 @@ symbol_container::symbols_t::iterator pr { return symbols->end(); } + diff -uprN -X /home/jasonyeh/dontdiff oprofile-cvs/libpp/profile_container.h oprofile-ibs/libpp/profile_container.h --- oprofile-cvs/libpp/profile_container.h 2007-12-11 07:16:15.000000000 -0600 +++ oprofile-ibs/libpp/profile_container.h 2008-07-22 11:04:50.000000000 -0500 @@ -66,6 +66,8 @@ public: void add(profile_t const & profile, op_bfd const & abfd, std::string const & app_name, size_t pclass); + void delete_samples(symbol_entry const * symbol, bfd_vma vma); + /// Find a symbol from its image_name, vma, return zero if no symbol /// for this image at this vma symbol_entry const * find_symbol(std::string const & image_name, diff -uprN -X /home/jasonyeh/dontdiff oprofile-cvs/libpp/sample_container.cpp oprofile-ibs/libpp/sample_container.cpp --- oprofile-cvs/libpp/sample_container.cpp 2008-02-15 12:28:18.000000000 -0600 +++ oprofile-ibs/libpp/sample_container.cpp 2008-07-22 11:04:50.000000000 -0500 @@ -109,6 +109,16 @@ sample_container::find_by_vma(symbol_ent } +void +sample_container::erase(symbol_entry const * symbol, bfd_vma vma) +{ + sample_index_t key(symbol, vma); + samples_iterator it = samples.find(key); + if (it != samples.end()) + samples.erase(key); +} + + count_array_t 
sample_container::accumulate_samples(debug_name_id filename, size_t linenr) const diff -uprN -X /home/jasonyeh/dontdiff oprofile-cvs/libpp/sample_container.h oprofile-ibs/libpp/sample_container.h --- oprofile-cvs/libpp/sample_container.h 2003-08-10 19:59:18.000000000 -0500 +++ oprofile-ibs/libpp/sample_container.h 2008-07-22 11:04:50.000000000 -0500 @@ -44,6 +44,9 @@ public: /// samples into an existing one. Can only be done before any lookups void insert(symbol_entry const * symbol, sample_entry const &); + /// erase the sample entry for the given image_name and vma if any + void erase(symbol_entry const * symbol, bfd_vma vma); + /// return nr of samples in the given filename count_array_t accumulate_samples(debug_name_id filename_id) const; diff -uprN -X /home/jasonyeh/dontdiff oprofile-cvs/libutil/op_fileio.c oprofile-ibs/libutil/op_fileio.c --- oprofile-cvs/libutil/op_fileio.c 2005-04-22 14:29:21.000000000 -0500 +++ oprofile-ibs/libutil/op_fileio.c 2008-07-22 11:04:50.000000000 -0500 @@ -56,6 +56,26 @@ void op_close_file(FILE * fp) } +void op_read_file(FILE * fp, void * buf, size_t size) +{ + size_t count; + + count = fread(buf, size, 1, fp); + + if (count != 1) { + if (feof(fp)) { + fprintf(stderr, + "oprofiled:op_read_file: read less than expected %lu bytes\n", + (unsigned long)size); + } else { + fprintf(stderr, + "oprofiled:op_read_file: error reading\n"); + } + exit(EXIT_FAILURE); + } +} + + void op_write_file(FILE * fp, void const * buf, size_t size) { size_t written; diff -uprN -X /home/jasonyeh/dontdiff oprofile-cvs/libutil/op_fileio.h oprofile-ibs/libutil/op_fileio.h --- oprofile-cvs/libutil/op_fileio.h 2005-04-22 14:29:21.000000000 -0500 +++ oprofile-ibs/libutil/op_fileio.h 2008-07-22 11:04:50.000000000 -0500 @@ -41,6 +41,19 @@ FILE * op_try_open_file(char const * nam FILE * op_open_file(char const * name, char const * mode); /** + * op_read_file - read a file + * @param fp file pointer + * @param buf buffer + * @param size size in bytes to read + * + 
* Read from a file. It is considered an error + * if anything less than size bytes is read. + * Failure is fatal. + */ +void op_read_file(FILE * fp, void * buf, size_t size); + + +/** * op_read_int_from_file - parse an ASCII value from a file into an integer * @param filename name of file to parse integer value from * @param fatal non-zero if any error must be fatal diff -uprN -X /home/jasonyeh/dontdiff oprofile-cvs/libutil++/op_bfd.cpp oprofile-ibs/libutil++/op_bfd.cpp --- oprofile-cvs/libutil++/op_bfd.cpp 2008-07-03 12:12:06.000000000 -0500 +++ oprofile-ibs/libutil++/op_bfd.cpp 2008-07-22 11:04:50.000000000 -0500 @@ -290,6 +290,11 @@ void op_bfd::add_symbols(op_bfd::symbols << dec << syms.size() << hex << endl; } +unsigned long op_bfd::filepos(symbol_index_t sym_index) const +{ + return syms[sym_index].filepos(); +} + bfd_vma op_bfd::offset_to_pc(bfd_vma offset) const { diff -uprN -X /home/jasonyeh/dontdiff oprofile-cvs/libutil++/op_bfd.h oprofile-ibs/libutil++/op_bfd.h --- oprofile-cvs/libutil++/op_bfd.h 2008-07-03 12:12:06.000000000 -0500 +++ oprofile-ibs/libutil++/op_bfd.h 2008-07-22 11:04:50.000000000 -0500 @@ -163,6 +163,9 @@ public: /** return the relocated PC value for the given file offset */ bfd_vma offset_to_pc(bfd_vma offset) const; + + unsigned long filepos(symbol_index_t sym_index) const; + /** * If passed 0, return the file position of the .text section. * Otherwise, return the filepos of a section with a matching |
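A side note on the odb_update_node change in this patch: the test "node->value + value != 0" was exact while the increment was always 1, since a wrapping counter then has to land on zero. With an arbitrary increment the sum can wrap past zero without ever equalling it. A possible overflow-safe form is sketched below. It assumes the stored counter is an unsigned 32-bit integer, which should be checked against the odb_value_t typedef in libdb/odb.h; counter_t and checked_add are illustrative names and not part of libdb.

```cpp
#include <cstdint>
#include <limits>

// Assumption: the stored counter is an unsigned 32-bit integer. Check the
// odb_value_t typedef in libdb/odb.h before relying on this.
typedef std::uint32_t counter_t;

// Adds increment to current. Returns false, leaving current untouched,
// if the sum would not fit in counter_t.
bool checked_add(counter_t & current, counter_t increment)
{
    if (increment > std::numeric_limits<counter_t>::max() - current)
        return false;
    current += increment;
    return true;
}
```

Under that assumption, a counter at 0xFFFFFFFF incremented by 2 wraps to 1, which the "!= 0" test would accept; the comparison against the remaining headroom rejects it.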
From: Maynard J. <may...@us...> - 2008-07-30 01:27:13
|
Jason Yeh wrote: > This patch includes more changes to back end data structure to support IBS events. > These changes seem OK. But see the comment near the end about the suggested renaming of your new op_bfd::filepos function. -Maynard > Signed-off-by: Jason Yeh <jas...@am...> > --- > libdb/db_insert.c | 8 ++++---- > oprofile-ibs/libdb/odb.h | 2 +- > oprofile-ibs/libpp/profile_container.cpp | 7 +++++++ > oprofile-ibs/libpp/profile_container.h | 2 ++ > oprofile-ibs/libpp/sample_container.cpp | 10 ++++++++++ > oprofile-ibs/libpp/sample_container.h | 3 +++ > oprofile-ibs/libutil++/op_bfd.cpp | 5 +++++ > oprofile-ibs/libutil++/op_bfd.h | 3 +++ > oprofile-ibs/libutil/op_fileio.c | 20 ++++++++++++++++++++ > oprofile-ibs/libutil/op_fileio.h | 13 +++++++++++++ > 10 files changed, 68 insertions(+), 5 deletions(-) > > diff -uprN -X /home/jasonyeh/dontdiff oprofile-cvs/libdb/db_insert.c oprofile-ibs/libdb/db_insert.c > --- oprofile-cvs/libdb/db_insert.c 2005-08-17 14:15:42.000000000 -0500 > +++ oprofile-ibs/libdb/db_insert.c 2008-07-22 11:04:50.000000000 -0500 > @@ -49,7 +49,7 @@ static inline int add_node(odb_data_t * > return 0; > } > > -int odb_update_node(odb_t * odb, odb_key_t key) > +int odb_update_node(odb_t * odb, odb_key_t key, odb_value_t value) > { > odb_index_t index; > odb_node_t * node; > @@ -60,8 +60,8 @@ int odb_update_node(odb_t * odb, odb_key > while (index) { > node = &data->node_base[index]; > if (node->key == key) { > - if (node->value + 1 != 0) { > - node->value += 1; > + if (node->value + value != 0) { > + node->value += value; > } else { > /* post profile tools must handle overflow */ > /* FIXME: the tricky way will be just to add > @@ -92,7 +92,7 @@ int odb_update_node(odb_t * odb, odb_key > index = node->next; > } > > - return add_node(data, key, 1); > + return add_node(data, key, value); > } > > > diff -uprN -X /home/jasonyeh/dontdiff oprofile-cvs/libdb/odb.h oprofile-ibs/libdb/odb.h > --- oprofile-cvs/libdb/odb.h 2005-08-17 14:15:42.000000000 -0500 
> +++ oprofile-ibs/libdb/odb.h 2008-07-22 11:04:50.000000000 -0500 > @@ -178,7 +178,7 @@ void odb_hash_free_stat(odb_hash_stat_t > * > * returns EXIT_SUCCESS on success, EXIT_FAILURE on failure > */ > -int odb_update_node(odb_t * odb, odb_key_t key); > +int odb_update_node(odb_t * odb, odb_key_t key, odb_value_t value); > > /** Add a new node w/o regarding if a node with the same key already exists > * > diff -uprN -X /home/jasonyeh/dontdiff oprofile-cvs/libpp/profile_container.cpp oprofile-ibs/libpp/profile_container.cpp > --- oprofile-cvs/libpp/profile_container.cpp 2008-05-08 12:17:55.000000000 -0500 > +++ oprofile-ibs/libpp/profile_container.cpp 2008-07-22 11:04:50.000000000 -0500 > @@ -161,6 +161,12 @@ profile_container::add_samples(op_bfd co > } > > > +void profile_container::delete_samples(symbol_entry const * symbol, bfd_vma vma) > +{ > + samples->erase(symbol, vma); > +} > + > + > symbol_collection const > profile_container::select_symbols(symbol_choice & choice) const > { > @@ -331,3 +337,4 @@ symbol_container::symbols_t::iterator pr > { > return symbols->end(); > } > + > diff -uprN -X /home/jasonyeh/dontdiff oprofile-cvs/libpp/profile_container.h oprofile-ibs/libpp/profile_container.h > --- oprofile-cvs/libpp/profile_container.h 2007-12-11 07:16:15.000000000 -0600 > +++ oprofile-ibs/libpp/profile_container.h 2008-07-22 11:04:50.000000000 -0500 > @@ -66,6 +66,8 @@ public: > void add(profile_t const & profile, op_bfd const & abfd, > std::string const & app_name, size_t pclass); > > + void delete_samples(symbol_entry const * symbol, bfd_vma vma); > + > /// Find a symbol from its image_name, vma, return zero if no symbol > /// for this image at this vma > symbol_entry const * find_symbol(std::string const & image_name, > diff -uprN -X /home/jasonyeh/dontdiff oprofile-cvs/libpp/sample_container.cpp oprofile-ibs/libpp/sample_container.cpp > --- oprofile-cvs/libpp/sample_container.cpp 2008-02-15 12:28:18.000000000 -0600 > +++ 
oprofile-ibs/libpp/sample_container.cpp 2008-07-22 11:04:50.000000000 -0500 > @@ -109,6 +109,16 @@ sample_container::find_by_vma(symbol_ent > } > > > +void > +sample_container::erase(symbol_entry const * symbol, bfd_vma vma) > +{ > + sample_index_t key(symbol, vma); > + samples_iterator it = samples.find(key); > + if (it != samples.end()) > + samples.erase(key); > +} > + > + > count_array_t > sample_container::accumulate_samples(debug_name_id filename, > size_t linenr) const > diff -uprN -X /home/jasonyeh/dontdiff oprofile-cvs/libpp/sample_container.h oprofile-ibs/libpp/sample_container.h > --- oprofile-cvs/libpp/sample_container.h 2003-08-10 19:59:18.000000000 -0500 > +++ oprofile-ibs/libpp/sample_container.h 2008-07-22 11:04:50.000000000 -0500 > @@ -44,6 +44,9 @@ public: > /// samples into an existing one. Can only be done before any lookups > void insert(symbol_entry const * symbol, sample_entry const &); > > + /// erase the sample entry for the given image_name and vma if any > + void erase(symbol_entry const * symbol, bfd_vma vma); > + > /// return nr of samples in the given filename > count_array_t accumulate_samples(debug_name_id filename_id) const; > > diff -uprN -X /home/jasonyeh/dontdiff oprofile-cvs/libutil/op_fileio.c oprofile-ibs/libutil/op_fileio.c > --- oprofile-cvs/libutil/op_fileio.c 2005-04-22 14:29:21.000000000 -0500 > +++ oprofile-ibs/libutil/op_fileio.c 2008-07-22 11:04:50.000000000 -0500 > @@ -56,6 +56,26 @@ void op_close_file(FILE * fp) > } > > > +void op_read_file(FILE * fp, void * buf, size_t size) > +{ > + size_t count; > + > + count = fread(buf, size, 1, fp); > + > + if (count != 1) { > + if (feof(fp)) { > + fprintf(stderr, > + "oprofiled:op_read_file: read less than expected %lu bytes\n", > + (unsigned long)size); > + } else { > + fprintf(stderr, > + "oprofiled:op_read_file: error reading\n"); > + } > + exit(EXIT_FAILURE); > + } > +} > + > + > void op_write_file(FILE * fp, void const * buf, size_t size) > { > size_t written; > diff -uprN 
-X /home/jasonyeh/dontdiff oprofile-cvs/libutil/op_fileio.h oprofile-ibs/libutil/op_fileio.h > --- oprofile-cvs/libutil/op_fileio.h 2005-04-22 14:29:21.000000000 -0500 > +++ oprofile-ibs/libutil/op_fileio.h 2008-07-22 11:04:50.000000000 -0500 > @@ -41,6 +41,19 @@ FILE * op_try_open_file(char const * nam > FILE * op_open_file(char const * name, char const * mode); > > /** > + * op_read_file - read a file > + * @param fp file pointer > + * @param buf buffer > + * @param size size in bytes to read > + * > + * Read from a file. It is considered an error > + * if anything less than size bytes is read. > + * Failure is fatal. > + */ > +void op_read_file(FILE * fp, void * buf, size_t size); > + > + > +/** > * op_read_int_from_file - parse an ASCII value from a file into an integer > * @param filename name of file to parse integer value from > * @param fatal non-zero if any error must be fatal > > diff -uprN -X /home/jasonyeh/dontdiff oprofile-cvs/libutil++/op_bfd.cpp oprofile-ibs/libutil++/op_bfd.cpp > --- oprofile-cvs/libutil++/op_bfd.cpp 2008-07-03 12:12:06.000000000 -0500 > +++ oprofile-ibs/libutil++/op_bfd.cpp 2008-07-22 11:04:50.000000000 -0500 > @@ -290,6 +290,11 @@ void op_bfd::add_symbols(op_bfd::symbols > << dec << syms.size() << hex << endl; > } > > +unsigned long op_bfd::filepos(symbol_index_t sym_index) const > Suggest naming this "filepos_for_sym" to better describe its function. 
> +{ > + return syms[sym_index].filepos(); > +} > + > > bfd_vma op_bfd::offset_to_pc(bfd_vma offset) const > { > diff -uprN -X /home/jasonyeh/dontdiff oprofile-cvs/libutil++/op_bfd.h oprofile-ibs/libutil++/op_bfd.h > --- oprofile-cvs/libutil++/op_bfd.h 2008-07-03 12:12:06.000000000 -0500 > +++ oprofile-ibs/libutil++/op_bfd.h 2008-07-22 11:04:50.000000000 -0500 > @@ -163,6 +163,9 @@ public: > /** return the relocated PC value for the given file offset */ > bfd_vma offset_to_pc(bfd_vma offset) const; > > + > + unsigned long filepos(symbol_index_t sym_index) const; > + > /** > * If passed 0, return the file position of the .text section. > * Otherwise, return the filepos of a section with a matching > > > > > |
From: Jason Y. <jas...@am...> - 2008-07-23 16:02:03
|
This patch includes changes to recognize IBS events. Signed-off-by: Jason Yeh <jas...@am...> --- oprofile-ibs/libop/op_events.c | 19 +++++++++++++++---- utils/ophelp.c | 13 +++++++++++++ 2 files changed, 28 insertions(+), 4 deletions(-) diff -uprN -X /home/jasonyeh/dontdiff oprofile-cvs/utils/ophelp.c oprofile-ibs/utils/ophelp.c --- oprofile-cvs/utils/ophelp.c 2008-02-22 10:17:49.000000000 -0600 +++ oprofile-ibs/utils/ophelp.c 2008-07-22 14:57:26.000000000 -0500 @@ -73,6 +73,19 @@ static void help_for_event(struct op_eve do_arch_specific_event_help(event); nr_counters = op_get_nr_counters(cpu_type); + /* + * Sanity check + */ + if(!event) + return; + + /* + * Check for IBS derived events, we do not want + * to list these events + */ + if( event->name != NULL && strncmp(event->name,"IBS",3) == 0) + return; + printf("%s", event->name); printf(": (counter: "); diff -uprN -X /home/jasonyeh/dontdiff oprofile-cvs/libop/op_events.c oprofile-ibs/libop/op_events.c --- oprofile-cvs/libop/op_events.c 2008-02-22 10:17:48.000000000 -0600 +++ oprofile-ibs/libop/op_events.c 2008-07-22 14:57:02.000000000 -0500 @@ -432,10 +432,20 @@ static void load_events(op_cpu cpu_type) char * um_file; char * dir; struct list_head * pos; + static op_cpu last_cpu_type = 0; - if (!list_empty(&events_list)) - return; - + if(last_cpu_type != cpu_type) + { + last_cpu_type = cpu_type; + + // Empty the list and reinitialize it. + op_free_events(); + } + else + { + if (!list_empty(&events_list)) + return; + } dir = getenv("OPROFILE_EVENTS_DIR"); if (dir == NULL) dir = OP_DATADIR; @@ -691,7 +701,8 @@ struct op_event * op_find_event(op_cpu c { struct op_event * event; - load_events(cpu_type); + if (list_empty(&events_list)) + load_events(cpu_type); event = find_event(nr); |
From: Maynard J. <may...@us...> - 2008-07-30 01:37:27
|
Jason Yeh wrote: > This patch includes changes to recognize IBS events. > > Signed-off-by: Jason Yeh <jas...@am...> > --- > oprofile-ibs/libop/op_events.c | 19 +++++++++++++++---- > utils/ophelp.c | 13 +++++++++++++ > 2 files changed, 28 insertions(+), 4 deletions(-) > > diff -uprN -X /home/jasonyeh/dontdiff oprofile-cvs/utils/ophelp.c > oprofile-ibs/utils/ophelp.c > --- oprofile-cvs/utils/ophelp.c 2008-02-22 10:17:49.000000000 -0600 > +++ oprofile-ibs/utils/ophelp.c 2008-07-22 14:57:26.000000000 -0500 > @@ -73,6 +73,19 @@ static void help_for_event(struct op_eve > do_arch_specific_event_help(event); > nr_counters = op_get_nr_counters(cpu_type); > > + /* > + * Sanity check > + */ > + if(!event) > + return; > + > + /* > + * Check for IBS derived events, we do not want > + * to list these events > + */ > + if( event->name != NULL && strncmp(event->name,"IBS",3) == 0) > + return; > + > printf("%s", event->name); > > printf(": (counter: "); > diff -uprN -X /home/jasonyeh/dontdiff oprofile-cvs/libop/op_events.c > oprofile-ibs/libop/op_events.c > --- oprofile-cvs/libop/op_events.c 2008-02-22 10:17:48.000000000 -0600 > +++ oprofile-ibs/libop/op_events.c 2008-07-22 14:57:02.000000000 -0500 > @@ -432,10 +432,20 @@ static void load_events(op_cpu cpu_type) > char * um_file; > char * dir; > struct list_head * pos; > + static op_cpu last_cpu_type = 0; > > - if (!list_empty(&events_list)) > - return; > - > + if(last_cpu_type != cpu_type) > + { > + last_cpu_type = cpu_type; > + > + // Empty the list and reinitialize it. > + op_free_events(); > + } > + else > + { > + if (!list_empty(&events_list)) > + return; > + } > Why are these changes necessary? > dir = getenv("OPROFILE_EVENTS_DIR"); > if (dir == NULL) > dir = OP_DATADIR; > @@ -691,7 +701,8 @@ struct op_event * op_find_event(op_cpu c > { > struct op_event * event; > > - load_events(cpu_type); > + if (list_empty(&events_list)) > + load_events(cpu_type); > > event = find_event(nr); > > > > > > |
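On the question above about the load_events change: read on its own, the patch turns the event list into a cache that is rebuilt whenever the requested CPU type differs from the one loaded last. A minimal sketch of that behaviour follows. The names event_cache and read_events_for, and the event strings, are made up for illustration; the real code works on op_cpu and the list_head-based events_list. Note also that the op_find_event hunk in the same patch only calls load_events when the list is empty, so the reload path would not be reached through that caller once a list exists.

```cpp
#include <string>
#include <vector>

// Hypothetical stand-ins: the real code uses op_cpu, a list_head-based
// events_list, and parses event files from OPROFILE_EVENTS_DIR.
typedef int cpu_type_t;

struct event_cache {
    bool loaded = false;
    cpu_type_t cpu = 0;
    std::vector<std::string> events;
    unsigned loads = 0; // number of times the list was (re)built
};

// Placeholder loader; returns made-up names for illustration only.
static std::vector<std::string> read_events_for(cpu_type_t cpu)
{
    return { "event_a_cpu" + std::to_string(cpu),
             "event_b_cpu" + std::to_string(cpu) };
}

// Reuse the cached list for the same CPU type; rebuild it when the
// requested type differs from the one that was loaded last.
std::vector<std::string> const & load_events(event_cache & cache, cpu_type_t cpu)
{
    if (cache.loaded && cache.cpu == cpu)
        return cache.events;
    cache.events = read_events_for(cpu);
    cache.cpu = cpu;
    cache.loaded = true;
    ++cache.loads;
    return cache.events;
}
```

Tracking an explicit "loaded" flag avoids relying on a particular enum value meaning "nothing loaded yet", which the patch's "static op_cpu last_cpu_type = 0" implicitly does.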
From: Jason Y. <jas...@am...> - 2008-07-23 16:03:07
|
This patch includes changes to opannotate. Signed-off-by: Jason Yeh <jas...@am...> --- opannotate.cpp | 211 ++++++++++++++++++++++++++++++++++++++++++++++++++++++--- 1 file changed, 202 insertions(+), 9 deletions(-) diff -uprN -X /home/jasonyeh/dontdiff oprofile-cvs/pp/opannotate.cpp oprofile-ibs/pp/opannotate.cpp --- oprofile-cvs/pp/opannotate.cpp 2007-11-06 07:45:40.000000000 -0600 +++ oprofile-ibs/pp/opannotate.cpp 2008-07-22 11:05:22.000000000 -0500 @@ -176,6 +176,92 @@ string asm_line_annotation(symbol_entry } +/// NOTE: This function annotates a list<string> containing output from objdump. +/// It uses a list iterator, and a sample_container iterator which iterates +/// from the beginning to the end, and compare sample address +/// against the instruction address on the asm line. +/// +/// There are 2 cases of annotation: +/// 1. If sample address matches current line address, annotate the current line. +/// 2. If (previous line address < sample address < current line address), +/// then we annotate previous line. This case happens when sample address +/// is not aligned with the instruction address, which is seen when profile +/// using the instruction fetch mode of AMD Instruction-Based Sampling (IBS). 
+/// +string asm_list_annotation(symbol_entry const * last_symbol, + list<string>::iterator sit, + sample_container::samples_iterator & samp_it ) +{ + // do not use the bfd equivalent: + // - it does not skip space at begin + // - we does not need cross architecture compile so the native + // strtoull must work, assuming unsigned long long can contain a vma + // and on 32/64 bits box bfd_vma is 64 bits + // gcc 2.91.66 workaround + string value = *sit; + bfd_vma vma = strtoull(value.c_str(), NULL, 16); + string str; + + sample_entry const * sample = &samp_it->second; + + if (sample->vma == vma) { + str += count_str(sample->counts, samples->samples_count()); + + // For each events + for (size_t i = 1; i < nr_events; ++i) + str += " "; + + str += " :"; + *sit = str + *sit; + if(samp_it != samples->end()) + ++samp_it; + samples->delete_samples(last_symbol, vma); + } else if (sample->vma < vma) { + // vma of the current line is greater than vma of the sample + + // Get the string of previous line + list<string>::iterator sit_prev = sit; + --sit_prev; + string prev_line = *sit_prev; + + // sub_string is the part that contains adderss + string sub_string = prev_line.substr(prev_line.find(":", 0)+1, prev_line.length()); + + bfd_vma vma_prev = strtoull(sub_string.c_str(), NULL, 16); + + // Need to check if prev_vma < sample->vma + if( vma_prev < sample->vma) + { + // Aggregate sample with previous line if it already has samples + sample_entry * prev_sample = (sample_entry *)samples->find_sample(last_symbol, vma_prev); + if (prev_sample) + prev_sample->counts += sample->counts; + + str += count_str(sample->counts, samples->samples_count()); + + // For each events + for (size_t i = 1; i < nr_events; ++i) + str += " "; + + str += " :"; + *sit_prev = str + sub_string; + *sit = annotation_fill + *sit; + if(samp_it != samples->end()) + ++samp_it; + samples->delete_samples(last_symbol, sample->vma); + }else{ + *sit = annotation_fill + *sit; + if(samp_it != samples->end()) + 
				++samp_it;
+		}
+	} else {
+		*sit = annotation_fill + *sit;
+	}
+
+	return str;
+}
+
+
 string symbol_annotation(symbol_entry const * symbol)
 {
 	if (!symbol)
@@ -269,11 +355,123 @@ symbol_entry const * output_objdump_asm_
 }


+/// NOTE: This function is similar to the function "output_objdump_asm_line" above.
+/// It operates on the list<string> instead of just one string.
+symbol_entry const * annotate_objdump_str_list(symbol_entry const * last_symbol,
+			string const & app_name,
+			list<string>::iterator sit,
+			symbol_collection const & symbols,
+			bool & do_output,
+			sample_container::samples_iterator & samp_it)
+{
+	// output of objdump is a human readable form and can contain some
+	// ambiguity so this code is dirty. It is also optimized a little bit
+	// so it is difficult to simplify it without breaking something ...
+
+	// line of interest are: "[:space:]*[:xdigit:]?[ :]", the last char of
+	// this regexp dis-ambiguate between a symbol line and an asm line. If
+	// source contain line of this form an ambiguity occur and we rely on
+	// the robustness of this code.
+	string str = *sit;
+	size_t pos = 0;
+	while (pos < str.length() && isspace(str[pos]))
+		++pos;
+
+	if (pos == str.length() || !isxdigit(str[pos])) {
+		if (do_output) {
+			*sit = annotation_fill + str;
+			return last_symbol;
+		}
+	}
+
+	while (pos < str.length() && isxdigit(str[pos]))
+		++pos;
+
+	if (pos == str.length() || (!isspace(str[pos]) && str[pos] != ':')) {
+		if (do_output) {
+			*sit = annotation_fill + str;
+			return last_symbol;
+		}
+	}
+
+	if (is_symbol_line(str, pos)) {
+
+		last_symbol = find_symbol(app_name, str);
+
+		// ! complexity: linear in number of symbol must use sorted
+		// by address vector and lower_bound ?
+		// Note this use a pointer comparison. It work because symbols
+		// pointer are unique
+		if (find(symbols.begin(), symbols.end(), last_symbol)
+			!= symbols.end()) {
+			do_output = true;
+		} else {
+			do_output = false;
+		}
+		if(do_output){
+			*sit += symbol_annotation(last_symbol);
+
+			// Realign the sample iterator to
+			// the beginning of this symbols
+			samp_it = samples->begin(last_symbol);
+		}
+	} else {
+		// not a symbol, probably an asm line.
+		if(do_output){
+			asm_list_annotation(last_symbol, sit, samp_it);
+		}
+	}
+
+	return last_symbol;
+}
+
+
+void output_objdump_str_list( symbol_collection const & symbols,
+			string const & app_name,
+			list<string> & asm_lines)
+{
+	symbol_entry const * last_symbol = 0;
+
+	// to filter output of symbols (filter based on command line options)
+	bool do_output = true;
+
+	// We simultaneously walk the two structures (list and sample_container)
+	// which are sorted by address. and do address comparision.
+	list<string>::iterator sit = asm_lines.begin();
+	list<string>::iterator send = asm_lines.end();
+	sample_container::samples_iterator samp_it = samples->begin();
+
+	for(; sit != send; sit++) {
+		last_symbol = annotate_objdump_str_list(last_symbol,
+					app_name,
+					sit,
+					symbols,
+					do_output,
+					samp_it);
+
+		if(!do_output) {
+			*sit = "";
+		}
+	}
+
+	// Printing objdump output to stdout
+	sit = asm_lines.begin();
+	for(; sit != send; ++sit) {
+		string str = *sit;
+		if(str.length() != 0)
+			cout << str << '\n';
+	}
+}
+
+
+
+
 void do_one_output_objdump(symbol_collection const & symbols,
 			   string const & image_name, string const & app_name,
 			   bfd_vma start, bfd_vma end)
 {
 	vector<string> args;
+	list<string> asm_lines;

 	args.push_back("-d");
 	args.push_back("--no-show-raw-insn");
@@ -301,16 +499,14 @@ void do_one_output_objdump(symbol_collec
 		return;
 	}

-	// to filter output of symbols (filter based on command line options)
-	bool do_output = true;
-
-	symbol_entry const * last_symbol = 0;
+	// Read each output line from objdump and store in a list.
 	string str;
 	while (reader.getline(str)) {
-		last_symbol = output_objdump_asm_line(last_symbol, app_name,
-			str, symbols, do_output);
+		asm_lines.push_back(str);
 	}

+	output_objdump_str_list(symbols, app_name, asm_lines);
+
 	// objdump always returns SUCCESS so we must rely on the stderr state
 	// of objdump. If objdump error message is cryptic our own error
 	// message will be probably also cryptic
@@ -693,9 +889,6 @@ int opannotate(options::spec const & spe

 	nr_events = classes.v.size();

-	samples.reset(new profile_container(true, true,
-				classes.extra_found_images));
-
 	list<string> images;

 	list<inverted_profile> iprofiles = invert_profiles(classes);
|
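For readers following the patch: the line classification in annotate_objdump_str_list (skip leading whitespace, consume hex digits, then look at the next character) can be isolated into a small helper. The sketch below is an illustration only and is not part of the posted patch. parse_asm_line_address is a hypothetical name, and the sketch assumes objdump prints instruction lines as "<hex address>:" while symbol lines have a space after the address.

```cpp
#include <cctype>
#include <cstdlib>
#include <string>

// Hypothetical helper (not in oprofile): returns true and stores the
// address when `line` looks like an objdump instruction line such as
// "  401000:\tpush %rbp". Symbol lines ("0000000000401000 <main>:")
// and any other text return false and leave `vma` untouched.
bool parse_asm_line_address(std::string const & line, unsigned long long & vma)
{
	std::string::size_type pos = 0;

	// skip leading whitespace
	while (pos < line.length()
	       && std::isspace(static_cast<unsigned char>(line[pos])))
		++pos;

	// consume the run of hex digits
	std::string::size_type const start = pos;
	while (pos < line.length()
	       && std::isxdigit(static_cast<unsigned char>(line[pos])))
		++pos;

	// need at least one digit, immediately followed by ':'
	if (pos == start || pos == line.length() || line[pos] != ':')
		return false;

	vma = std::strtoull(line.substr(start, pos - start).c_str(), NULL, 16);
	return true;
}
```

As the patch comment already warns, a source line of the same shape (hex-looking text followed by a colon) would still be misclassified; the helper only makes the check explicit.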
From: Maynard J. <may...@us...> - 2008-07-30 17:10:08
|
Jason Yeh wrote:
> This patch includes changes to opannotate.
>
In your introductory note for the 7 patches, you mention that both
opreport and opannotate would need changes to allow for the fact that
IBS sample addresses can be in the middle of an instruction, but I don't
see any changes in your patch set to opreport. Did you find that no
changes were needed.

General style comments on this patch: Some alignment problems and
lines too long.

> Signed-off-by: Jason Yeh <jas...@am...>
> ---
> opannotate.cpp | 211 ++++++++++++++++++++++++++++++++++++++++++++++++++++++---
> 1 file changed, 202 insertions(+), 9 deletions(-)
>
> diff -uprN -X /home/jasonyeh/dontdiff oprofile-cvs/pp/opannotate.cpp oprofile-ibs/pp/opannotate.cpp
> --- oprofile-cvs/pp/opannotate.cpp 2007-11-06 07:45:40.000000000 -0600
> +++ oprofile-ibs/pp/opannotate.cpp 2008-07-22 11:05:22.000000000 -0500
> @@ -176,6 +176,92 @@ string asm_line_annotation(symbol_entry
>  }
>
>
> +/// NOTE: This function annotates a list<string> containing output from objdump.
> +/// It uses a list iterator, and a sample_container iterator which iterates
> +/// from the beginning to the end, and compare sample address
> +/// against the instruction address on the asm line.
> +///
> +/// There are 2 cases of annotation:
> +/// 1. If sample address matches current line address, annotate the current line.
> +/// 2. If (previous line address < sample address < current line address),
> +/// then we annotate previous line. This case happens when sample address
> +/// is not aligned with the instruction address, which is seen when profile
> +/// using the instruction fetch mode of AMD Instruction-Based Sampling (IBS).
> +///
> +string asm_list_annotation(symbol_entry const * last_symbol,
> +			list<string>::iterator sit,
>
A better name than "sit", please.
> +			sample_container::samples_iterator & samp_it )
> +{
> +	// do not use the bfd equivalent:
> +	// - it does not skip space at begin
> +	// - we does not need cross architecture compile so the native
> +	// strtoull must work, assuming unsigned long long can contain a vma
> +	// and on 32/64 bits box bfd_vma is 64 bits
> +	// gcc 2.91.66 workaround
> +	string value = *sit;
> +	bfd_vma vma = strtoull(value.c_str(), NULL, 16);
> +	string str;
> +
> +	sample_entry const * sample = &samp_it->second;
> +
> +	if (sample->vma == vma) {
> +		str += count_str(sample->counts, samples->samples_count());
> +
> +		// For each events
> +		for (size_t i = 1; i < nr_events; ++i)
> +			str += " ";
> +
> +		str += " :";
> +		*sit = str + *sit;
> +		if(samp_it != samples->end())
> +			++samp_it;
> +		samples->delete_samples(last_symbol, vma);
> +	} else if (sample->vma < vma) {
> +		// vma of the current line is greater than vma of the sample
> +
> +		// Get the string of previous line
> +		list<string>::iterator sit_prev = sit;
> +		--sit_prev;
> +		string prev_line = *sit_prev;
> +
> +		// sub_string is the part that contains adderss
> +		string sub_string = prev_line.substr(prev_line.find(":", 0)+1, prev_line.length());
> +
> +		bfd_vma vma_prev = strtoull(sub_string.c_str(), NULL, 16);
> +
> +		// Need to check if prev_vma < sample->vma
> +		if( vma_prev < sample->vma)
> +		{
> +			// Aggregate sample with previous line if it already has samples
> +			sample_entry * prev_sample = (sample_entry *)samples->find_sample(last_symbol, vma_prev);
> +			if (prev_sample)
> +				prev_sample->counts += sample->counts;
> +
> +			str += count_str(sample->counts, samples->samples_count());
> +
> +			// For each events
> +			for (size_t i = 1; i < nr_events; ++i)
> +				str += " ";
> +
> +			str += " :";
> +			*sit_prev = str + sub_string;
> +			*sit = annotation_fill + *sit;
> +			if(samp_it != samples->end())
> +				++samp_it;
> +			samples->delete_samples(last_symbol, sample->vma);
> +		}else{
> +			*sit = annotation_fill + *sit;
> +			if(samp_it != samples->end())
> +				++samp_it;
> +		}
> +	} else {
> +		*sit = annotation_fill + *sit;
> +	}
> +
> +	return str;
> +}
> +
> +
>  string symbol_annotation(symbol_entry const * symbol)
>  {
>  	if (!symbol)
> @@ -269,11 +355,123 @@ symbol_entry const * output_objdump_asm_
>  }
>
>
> +/// NOTE: This function is similar to the function "output_objdump_asm_line" above.
>
Your patch replaces the use of output_objdump_asm_line with this new
function, annotate_objdump_str_list. Dead code should be removed. I
think I understand why all of these changes are necessary to support
IBS, but a brief overview of the changes would be helpful. I don't
suppose there's any way to short cut the processing for when we're not
doing IBS. I'd like to see a comparison of run times for 'opannotate
-a' for a full system profile, before and after your changes.
> +/// It operates on the list<string> instead of just one string.
> +symbol_entry const * annotate_objdump_str_list(symbol_entry const * last_symbol,
> +			string const & app_name,
> +			list<string>::iterator sit,
>
A better name than "sit", please.
> +			symbol_collection const & symbols,
> +			bool & do_output,
> +			sample_container::samples_iterator & samp_it)
> +{
> +	// output of objdump is a human readable form and can contain some
> +	// ambiguity so this code is dirty. It is also optimized a little bit
> +	// so it is difficult to simplify it without breaking something ...
> +
> +	// line of interest are: "[:space:]*[:xdigit:]?[ :]", the last char of
> +	// this regexp dis-ambiguate between a symbol line and an asm line. If
> +	// source contain line of this form an ambiguity occur and we rely on
> +	// the robustness of this code.
> +	string str = *sit;
> +	size_t pos = 0;
> +	while (pos < str.length() && isspace(str[pos]))
> +		++pos;
> +
> +	if (pos == str.length() || !isxdigit(str[pos])) {
> +		if (do_output) {
> +			*sit = annotation_fill + str;
> +			return last_symbol;
> +		}
> +	}
> +
> +	while (pos < str.length() && isxdigit(str[pos]))
> +		++pos;
> +
> +	if (pos == str.length() || (!isspace(str[pos]) && str[pos] != ':')) {
> +		if (do_output) {
> +			*sit = annotation_fill + str;
>
alignment
> +			return last_symbol;
> +		}
> +	}
> +
> +	if (is_symbol_line(str, pos)) {
> +
> +		last_symbol = find_symbol(app_name, str);
> +
> +		// ! complexity: linear in number of symbol must use sorted
> +		// by address vector and lower_bound ?
> +		// Note this use a pointer comparison. It work because symbols
> +		// pointer are unique
> +		if (find(symbols.begin(), symbols.end(), last_symbol)
> +			!= symbols.end()) {
> +			do_output = true;
> +		} else {
> +			do_output = false;
> +		}
> +		if(do_output){
> +			*sit += symbol_annotation(last_symbol);
> +
> +			// Realign the sample iterator to
> +			// the beginning of this symbols
> +			samp_it = samples->begin(last_symbol);
> +		}
> +	} else {
> +		// not a symbol, probably an asm line.
> +		if(do_output){
> +			asm_list_annotation(last_symbol, sit, samp_it);
> +		}
> +	}
> +
> +	return last_symbol;
> +}
> +
> +
> +void output_objdump_str_list( symbol_collection const & symbols,
> +			string const & app_name,
> +			list<string> & asm_lines)
> +{
> +	symbol_entry const * last_symbol = 0;
> +
> +	// to filter output of symbols (filter based on command line options)
> +	bool do_output = true;
> +
> +	// We simultaneously walk the two structures (list and sample_container)
> +	// which are sorted by address. and do address comparision.
> +	list<string>::iterator sit = asm_lines.begin();
> +	list<string>::iterator send = asm_lines.end();
> +	sample_container::samples_iterator samp_it = samples->begin();
> +
> +	for(; sit != send; sit++) {
> +		last_symbol = annotate_objdump_str_list(last_symbol,
> +					app_name,
> +					sit,
> +					symbols,
> +					do_output,
> +					samp_it);
> +
> +		if(!do_output) {
> +			*sit = "";
> +		}
> +	}
> +
> +	// Printing objdump output to stdout
> +	sit = asm_lines.begin();
> +	for(; sit != send; ++sit) {
> +		string str = *sit;
> +		if(str.length() != 0)
> +			cout << str << '\n';
> +	}
> +}
> +
> +
> +
> +
>  void do_one_output_objdump(symbol_collection const & symbols,
>  			   string const & image_name, string const & app_name,
>  			   bfd_vma start, bfd_vma end)
>  {
>  	vector<string> args;
> +	list<string> asm_lines;
>
>  	args.push_back("-d");
>  	args.push_back("--no-show-raw-insn");
> @@ -301,16 +499,14 @@ void do_one_output_objdump(symbol_collec
>  		return;
>  	}
>
> -	// to filter output of symbols (filter based on command line options)
> -	bool do_output = true;
> -
> -	symbol_entry const * last_symbol = 0;
> +	// Read each output line from objdump and store in a list.
>  	string str;
>  	while (reader.getline(str)) {
> -		last_symbol = output_objdump_asm_line(last_symbol, app_name,
> -			str, symbols, do_output);
> +		asm_lines.push_back(str);
>  	}
>
> +	output_objdump_str_list(symbols, app_name, asm_lines);
> +
>  	// objdump always returns SUCCESS so we must rely on the stderr state
>  	// of objdump. If objdump error message is cryptic our own error
>  	// message will be probably also cryptic
> @@ -693,9 +889,6 @@ int opannotate(options::spec const & spe
>
>  	nr_events = classes.v.size();
>
> -	samples.reset(new profile_container(true, true,
> -				classes.extra_found_images));
>
I'm not understanding what this is about. If we don't add a new
profile_container to samples, we're going to blow up in other parts of
opannotate that use samples.
> -
>  	list<string> images;
>
>  	list<inverted_profile> iprofiles = invert_profiles(classes);
>
>
>
>
>
|
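The two annotation cases quoted in the review above (an exact address match, or a sample address falling between two instruction addresses) amount to attributing each sample to the instruction whose start address is the greatest one not exceeding the sample address. As a rough illustration only, the sketch below expresses that rule with a binary search over sorted instruction addresses rather than the patch's lockstep walk over list<string> and sample_container. attribute_samples and vma_t are hypothetical names, plain std::map and std::vector stand in for oprofile's containers, and nothing here is taken from the oprofile sources.

```cpp
#include <algorithm>
#include <cstdint>
#include <map>
#include <vector>

typedef std::uint64_t vma_t;	// stand-in for bfd_vma (assumption)

// Hypothetical helper: fold every sample onto the instruction that
// contains it. `insn_starts` must be sorted in ascending order.
std::map<vma_t, unsigned long>
attribute_samples(std::vector<vma_t> const & insn_starts,
		  std::map<vma_t, unsigned long> const & samples)
{
	std::map<vma_t, unsigned long> per_insn;
	std::map<vma_t, unsigned long>::const_iterator it;

	for (it = samples.begin(); it != samples.end(); ++it) {
		// first instruction starting strictly after the sample
		std::vector<vma_t>::const_iterator next =
			std::upper_bound(insn_starts.begin(),
					 insn_starts.end(), it->first);

		// sample lies below the first instruction: nothing to credit
		if (next == insn_starts.begin())
			continue;

		// case 1 (exact match) and case 2 (mid-instruction) both
		// resolve to the instruction just before `next`
		--next;
		per_insn[*next] += it->second;
	}
	return per_insn;
}
```

In this sketch a sample past the last instruction is folded onto that last instruction; real code would presumably bound the search by the end address of the enclosing symbol. Non-IBS profiles only ever take the exact-match path, so the result is unchanged for them.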
From: Jason Y. <jas...@am...> - 2008-07-30 18:18:45
|
Maynard Johnson wrote:
> Jason Yeh wrote:
>> This patch includes changes to opannotate.
>>
> In your introductory note for the 7 patches, you mention that both
> opreport and opannotate would need changes to allow for the fact that
> IBS sample addresses can be in the middle of an instruction, but I don't
> see any changes in your patch set to opreport. Did you find that no
> changes were needed.

Thanks for reviewing the code. There is no change for opreport.

>
> General style comments on this patch: Some alignment problems and
> lines too long.

I will make sure I ran the check_style.py before sending the next version
of the patches.

>> +/// NOTE: This function is similar to the function "output_objdump_asm_line" above.
>>
> Your patch replaces the use of output_objdump_asm_line with this new
> function, annotate_objdump_str_list. Dead code should be removed. I
> think I understand why all of these changes are necessary to support
> IBS, but a brief overview of the changes would be helpful. I don't
> suppose there's any way to short cut the processing for when we're not
> doing IBS. I'd like to see a comparison of run times for 'opannotate
> -a' for a full system profile, before and after your changes.

OK, I will do that.

>> @@ -693,9 +889,6 @@ int opannotate(options::spec const & spe
>>
>>  	nr_events = classes.v.size();
>>
>> -	samples.reset(new profile_container(true, true,
>> -				classes.extra_found_images));
>>
> I'm not understanding what this is about. If we don't add a new
> profile_container to samples, we're going to blow up in other parts of
> opannotate that use samples.

I can't remember the rationale behind it. It may have been a bug.

Jason
|
From: John L. <le...@mo...> - 2008-07-30 18:23:32
|
On Wed, Jul 30, 2008 at 01:19:25PM -0500, Jason Yeh wrote:
> > General style comments on this patch: Some alignment problems and
> > lines too long.
>
> I will make sure I ran the check_style.py before sending the next version
> of the patches.

Note that this is a guide and isn't perfect. In particular, you should
make sure its recommendations are also applied in places it didn't
catch, and generally make sure your changes use the same spacing as
existing code. Don't take check_style.py as gospel if what it's saying
seems stupid.

regards
john
|