From: Suravee S. <sur...@am...> - 2009-04-14 17:49:08
Change Notes:
============

Changes from revision 4:
- Made modifications based on feedback from Maynard.
  * Run check_style.py script
  * Fixed coding style to follow the guideline
  * Remove trans_IBS_OP_log_dcmissinfo
  * Rework some logic in opannotate
  * Remove libpp changes
- Tested with kernel-2.6.29.1 (rebuild the kernel after enabling IBS)
- "make distcheck" passed

Changes from revision 3:
- Tested with kernel-2.6.29.1 (rebuild the kernel after enabling IBS)
- "make distcheck" passed
- Modified to use OProfile Extended-feature Interface
- Fixed opannotate bug
- Add IBS data filtering

Introduction
============
These patches extend OProfile to support Instruction-Based Sampling
(IBS), available on AMD Family 10h processors. IBS is specified in
section 2.17.2 of the "BIOS and Kernel Developer's Guide (BKDG) For AMD
Family 10h Processors". IBS provides a wide range of precise
information on the instruction fetch phase and the execution phase. The
document "Instruction-Based Sampling: A New Performance Analysis
Technique for AMD Family 10h Processors" explains and demonstrates the
uses of IBS in detail.

The patches are made against the head of CVS. The required kernel
support is available in the kernel starting from patch-2.6.28-rc2.

Design Outline
==============

= Terms =
EBS: Event-based sampling
IBS: Instruction-based sampling

= opcontrol changes =
IBS profiling is enabled by specifying IBS events through the
"--event=" option, in the same way as event-based profiling:

  * opcontrol --event=IBS_FETCH_XXXX:<count>:<um>:<kernel>:<user>
  * opcontrol --event=IBS_OP_XXXX:<count>:<um>:<kernel>:<user>

IBS performance events are listed in the event/unitmask files.
opcontrol has been modified to handle these events, configure the
driver interface (/dev/oprofile/ibs_fetch/... and
/dev/oprofile/ibs_op/...) and start oprofiled with the appropriate
options based on the user's input.
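
To illustrate the "--event=" form described above, here is a minimal
sketch in C of how such a specification could be split into its five
fields. It is not taken from opcontrol (which is a shell script): the
struct ibs_event_spec type and the parse_ibs_event_spec() function are
hypothetical names, accepting the unit mask in decimal or 0x-prefixed
hex is an assumption, and IBS_FETCH_XXXX / IBS_OP_XXXX are the
placeholders used above rather than real event names.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical container for one parsed "--event=" argument. */
struct ibs_event_spec {
	char name[64];           /* e.g. IBS_FETCH_XXXX or IBS_OP_XXXX */
	unsigned long count;     /* <count> field */
	unsigned long unit_mask; /* <um> field */
	int kernel;              /* <kernel> field, 0 or 1 */
	int user;                /* <user> field, 0 or 1 */
	int is_fetch;            /* 1 for IBS fetch, 0 for IBS op */
};

/*
 * Parse "NAME:<count>:<um>:<kernel>:<user>".
 * Returns 0 on success and -1 if the string is malformed or the
 * name does not start with IBS_FETCH_ or IBS_OP_.
 */
int parse_ibs_event_spec(const char *spec, struct ibs_event_spec *out)
{
	char um[32];
	char *end = NULL;
	int consumed = 0;

	assert(out != NULL);
	if (!spec)
		return -1;

	/* %n records how many characters were consumed, so trailing
	 * garbage after the fifth field can be rejected. */
	if (sscanf(spec, "%63[^:]:%lu:%31[^:]:%d:%d%n",
		   out->name, &out->count, um,
		   &out->kernel, &out->user, &consumed) != 5)
		return -1;
	if (spec[consumed] != '\0')
		return -1;

	/* Assumption: unit mask may be decimal or 0x-prefixed hex. */
	out->unit_mask = strtoul(um, &end, 0);
	if (end == um || *end != '\0')
		return -1;

	if (out->count == 0)
		return -1;
	if ((out->kernel != 0 && out->kernel != 1)
	    || (out->user != 0 && out->user != 1))
		return -1;

	if (strncmp(out->name, "IBS_FETCH_", strlen("IBS_FETCH_")) == 0)
		out->is_fetch = 1;
	else if (strncmp(out->name, "IBS_OP_", strlen("IBS_OP_")) == 0)
		out->is_fetch = 0;
	else
		return -1;

	return 0;
}
```

A caller would pass each "--event=" value to parse_ibs_event_spec() and
use is_fetch to decide whether the ibs_fetch or the ibs_op directory
should be configured.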

= Driver interface changes =
Two directories, /dev/oprofile/ibs_fetch and /dev/oprofile/ibs_op, are
added to oprofilefs to allow control of the IBS MSRs through the
oprofile.ko module. Both directories contain the files "enable" and
"max_count". The "enable" file enables or disables the functionality of
the directory containing it. The "max_count" file specifies the maximum
count value of the periodic fetch/op counter (bits 15:0 of MSR
0xC001_1030 and 0xC001_1033, respectively).

The "ibs_fetch" directory additionally contains the "ran_enable" file.
It corresponds to bit 57 of MSR 0xC001_1030. When enabled, bits 3:0 of
the fetch counter are randomized when IBS fetch is set to start the
fetch counter.

The "ibs_op" directory additionally contains the "dispatched_op" file.
It corresponds to bit 19 of MSR 0xC001_1033. This bit selects the mode
of instruction tagging for IBS op (0: count clock cycles, 1: count
dispatched ops).

= Daemon changes =
IBS events must be distinguished from EBS events, and IBS samples are
not uniform in length when read from the buffer. To handle both, two
escape codes, "IBS_FETCH_SAMPLE" and "IBS_OP_SAMPLE", and their
handlers are added.

Each IBS sample encapsulates a large amount of data. For example, a
single IBS fetch sample contains information on instruction cache L2TLB
miss, instruction cache L1TLB miss, L1 TLB page size, instruction cache
miss, linear address, physical address, etc. However, only the
performance data specified by the user (through opcontrol) are logged.

The OProfile Extended-feature Interface is used to hook up the IBS
handler, which translates and logs IBS samples.

= Reporting tool changes =
The virtual address associated with an IBS fetch may lie in the middle
of an instruction. opannotate is modified to handle this case when
printing its report.
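
The bit positions given under "Driver interface changes" can be
sketched in C as follows. This only illustrates where the three
settings land in a 64-bit control value. The macro and function names
are hypothetical, the sketch is not the oprofile.ko code, and it
deliberately ignores the enable bits and any scaling that the hardware
or driver may apply to max_count.

```c
#include <assert.h>
#include <stdint.h>

/* Bits 15:0 of MSR 0xC001_1030 (fetch) and 0xC001_1033 (op). */
#define IBS_MAX_COUNT_MASK  UINT64_C(0xffff)
/* Bit 57 of MSR 0xC001_1030: the "ran_enable" file. */
#define IBS_FETCH_RAN_EN    (UINT64_C(1) << 57)
/* Bit 19 of MSR 0xC001_1033: the "dispatched_op" file. */
#define IBS_OP_DISPATCHED   (UINT64_C(1) << 19)

/* Compose the fetch-side bits described in the text. */
uint64_t ibs_fetch_ctl_value(uint64_t max_count, int ran_enable)
{
	uint64_t val;

	assert(max_count <= IBS_MAX_COUNT_MASK);
	val = max_count & IBS_MAX_COUNT_MASK;
	if (ran_enable)
		val |= IBS_FETCH_RAN_EN;
	return val;
}

/* Compose the op-side bits described in the text. */
uint64_t ibs_op_ctl_value(uint64_t max_count, int dispatched_op)
{
	uint64_t val;

	assert(max_count <= IBS_MAX_COUNT_MASK);
	val = max_count & IBS_MAX_COUNT_MASK;
	if (dispatched_op)
		val |= IBS_OP_DISPATCHED;	/* 1: count dispatched ops */
	return val;
}
```

For example, ibs_op_ctl_value(0xffff, 1) sets bits 15:0 and bit 19,
giving 0x8ffff.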

References
==========
"BIOS and Kernel Developer's Guide (BKDG) For AMD Family 10h Processors"
(http://www.amd.com/us-en/assets/content_type/white_papers_and_tech_docs/31116.pdf)

Drongowski, Paul. "Instruction-Based Sampling: A New Performance
Analysis Technique for AMD Family 10h Processors". 2007.
(http://developer.amd.com/assets/AMD_IBS_paper_EN.pdf)

From: Suravee S. <sur...@am...> - 2009-04-14 17:49:12
This is the ChangeLog.

 ChangeLog |   20 ++++++++++++++++++++
 1 file changed, 20 insertions(+)

Signed-off-by: Suravee Suthikulpanit <sur...@am...>
---
diff -paurN oprofile-base/ChangeLog oprofile-ibs-latest/ChangeLog
--- oprofile-base/ChangeLog	2009-04-10 16:11:19.000000000 -0500
+++ oprofile-ibs-latest/ChangeLog	2009-04-13 17:36:36.000000000 -0500
@@ -1,3 +1,23 @@
+2009-04-06 Suravee Suthikulpanit <sur...@am...>
+
+	* daemon/Makefile.am:
+	* daemon/opd_extended.c:
+	* daemon/opd_ibs.c: New File
+	* daemon/opd_ibs.h: New File
+	* daemon/opd_ibs_macro.h: New File
+	* daemon/opd_ibs_trans.c: New File
+	* daemon/opd_ibs_trans.h: New File
+	* daemon/opd_interface.h:
+	* daemon/opd_sfile.c:
+	* daemon/opd_sfile.h:
+	* daemon/opd_trans.c:
+	* libdb/db_insert.c:
+	* libdb/odb.h:
+	* events/x86-64/family10/events:
+	* events/x86-64/family10/unit_masks:
+	* pp/opannotate.cpp:
+	* utils/opcontrol: Add IBS support
+
 2009-04-08 William Cohen <wc...@re...>
 
 	* configure.in: Add check for basename declaration

From: Suravee S. <sur...@am...> - 2009-04-14 17:49:23
This patch contains IBS-related changes for daemon and libdb including:
* Implement IBS Sample processing (daemon/opd_ibs.[h,c])
* Interface with Oprofile Extended-feature Interface.
* Implement IBS Statistics
* A new API called odb_update_node_with_offset which extends
  odb_update_node to allow adding a specific offset to the current value.

 daemon/Makefile.am     |    7
 daemon/opd_extended.c  |    3
 daemon/opd_ibs.c       |  692 +++++++++++++++++++++++++++++++++++++++++++++++++
 daemon/opd_ibs.h       |  137 +++++++++
 daemon/opd_ibs_macro.h |  366 +++++++++++++++++++++++++
 daemon/opd_ibs_trans.c |  554 +++++++++++++++++++++++++++++++++++++++
 daemon/opd_ibs_trans.h |   31 ++
 daemon/opd_interface.h |    9
 daemon/opd_sfile.c     |   15 -
 daemon/opd_sfile.h     |    4
 daemon/opd_trans.c     |    9
 libdb/db_insert.c      |   13
 libdb/odb.h            |   16 +
 13 files changed, 1845 insertions(+), 11 deletions(-)

Signed-off-by: Suravee Suthikulpanit <sur...@am...>
---
diff -paurN oprofile-base/libdb/db_insert.c oprofile-ibs-latest/libdb/db_insert.c
--- oprofile-base/libdb/db_insert.c	2005-08-17 14:15:42.000000000 -0500
+++ oprofile-ibs-latest/libdb/db_insert.c	2009-04-01 21:30:00.000000000 -0500
@@ -51,6 +51,13 @@ static inline int add_node(odb_data_t *
 
 int odb_update_node(odb_t * odb, odb_key_t key)
 {
+	return odb_update_node_with_offset(odb, key, 1);
+}
+
+int odb_update_node_with_offset(odb_t * odb,
+				odb_key_t key,
+				unsigned long int offset)
+{
 	odb_index_t index;
 	odb_node_t * node;
 	odb_data_t * data;
@@ -60,8 +67,8 @@ int odb_update_node(odb_t * odb, odb_key
 	while (index) {
 		node = &data->node_base[index];
 		if (node->key == key) {
-			if (node->value + 1 != 0) {
-				node->value += 1;
+			if (node->value + offset != 0) {
+				node->value += offset;
 			} else {
 				/* post profile tools must handle overflow */
 				/* FIXME: the tricky way will be just to add
@@ -92,7 +99,7 @@ int odb_update_node(odb_t * odb, odb_key
 		index = node->next;
 	}
 
-	return add_node(data, key, 1);
+	return add_node(data, key, offset);
 }
 
 
diff -paurN oprofile-base/libdb/odb.h oprofile-ibs-latest/libdb/odb.h
--- oprofile-base/libdb/odb.h	2005-08-17 14:15:42.000000000 -0500
+++ oprofile-ibs-latest/libdb/odb.h	2009-04-01 21:30:00.000000000 -0500
@@ -180,6 +180,22 @@ void odb_hash_free_stat(odb_hash_stat_t
  */
 int odb_update_node(odb_t * odb, odb_key_t key);
 
+/**
+ * odb_update_node_with_offset
+ * @param odb the data base object to setup
+ * @param key the hash key
+ * @param offset the offset to be added
+ *
+ * update info at key by adding the specified offset to its associated value,
+ * if the key does not exist a new node is created and the value associated
+ * is set to offset.
+ *
+ * returns EXIT_SUCCESS on success, EXIT_FAILURE on failure
+ */
+int odb_update_node_with_offset(odb_t * odb,
+				odb_key_t key,
+				unsigned long int offset);
+
 /** Add a new node w/o regarding if a node with the same key already exists
  *
  * returns EXIT_SUCCESS on success, EXIT_FAILURE on failure
diff -paurN oprofile-base/daemon/Makefile.am oprofile-ibs-latest/daemon/Makefile.am
--- oprofile-base/daemon/Makefile.am	2009-04-01 15:57:36.000000000 -0500
+++ oprofile-ibs-latest/daemon/Makefile.am	2009-04-01 21:30:00.000000000 -0500
@@ -28,7 +28,12 @@ oprofiled_SOURCES = \
 	opd_anon.c \
 	opd_spu.c \
 	opd_extended.h \
-	opd_extended.c
+	opd_extended.c \
+	opd_ibs.h \
+	opd_ibs.c \
+	opd_ibs_macro.h \
+	opd_ibs_trans.h \
+	opd_ibs_trans.c
 
 LIBS=@POPT_LIBS@ @LIBERTY_LIBS@
 
diff -paurN oprofile-base/daemon/opd_interface.h oprofile-ibs-latest/daemon/opd_interface.h
--- oprofile-base/daemon/opd_interface.h	2008-01-11 11:49:24.000000000 -0600
+++ oprofile-ibs-latest/daemon/opd_interface.h	2009-04-01 21:30:00.000000000 -0500
@@ -35,11 +35,14 @@
 #if defined(__powerpc__)
 #define SPU_PROFILING_CODE 11
 #define SPU_CTX_SWITCH_CODE 12
-#define DOMAIN_SWITCH_CODE 13
-#define LAST_CODE 14
 #else
 #define DOMAIN_SWITCH_CODE 11
-#define LAST_CODE 12
+/* Code 12 is now considered an unknown escape code */
 #endif
+
+/* AMD's Instruction-Based Sampling (IBS) escape code */
+#define IBS_FETCH_SAMPLE 13
+#define IBS_OP_SAMPLE 14
+#define LAST_CODE 15
 
 #endif /* OPD_INTERFACE_H */
diff -paurN oprofile-base/daemon/opd_trans.c oprofile-ibs-latest/daemon/opd_trans.c
--- oprofile-base/daemon/opd_trans.c	2009-04-01 15:57:36.000000000 -0500
+++ oprofile-ibs-latest/daemon/opd_trans.c	2009-04-08 12:23:10.000000000 -0500
@@ -258,6 +258,9 @@ static void code_xen_enter(struct transi
 extern void code_spu_profiling(struct transient * trans);
 extern void code_spu_ctx_switch(struct transient * trans);
 
+extern void code_ibs_fetch_sample(struct transient * trans);
+extern void code_ibs_op_sample(struct transient * trans);
+
 handler_t handlers[LAST_CODE + 1] = {
 	&code_unknown,
 	&code_ctx_switch,
@@ -274,8 +277,12 @@ handler_t handlers[LAST_CODE + 1] = {
 #if defined(__powerpc__)
 	&code_spu_profiling,
 	&code_spu_ctx_switch,
-#endif
+#else
 	&code_unknown,
+	&code_unknown,
+#endif
+	&code_ibs_fetch_sample,
+	&code_ibs_op_sample,
 };
 
 extern void (*special_processor)(struct transient *);
diff -paurN oprofile-base/daemon/opd_extended.c oprofile-ibs-latest/daemon/opd_extended.c
--- oprofile-base/daemon/opd_extended.c	2009-04-01 15:57:36.000000000 -0500
+++ oprofile-ibs-latest/daemon/opd_extended.c	2009-04-01 21:33:52.000000000 -0500
@@ -19,12 +19,15 @@
  * if extended feature is enabled
  */
 static int opd_ext_feat_index;
+extern struct opd_ext_handlers ibs_handlers;
+
 /**
  * OProfile Extended Feature Table
  *
  * This table contains a list of extended features.
  */
 static struct opd_ext_feature ext_feature_table[] = {
+	{"ibs", &ibs_handlers },
 	{ NULL, NULL }
 };
 
diff -paurN oprofile-base/daemon/opd_sfile.h oprofile-ibs-latest/daemon/opd_sfile.h
--- oprofile-base/daemon/opd_sfile.h	2009-04-01 15:57:36.000000000 -0500
+++ oprofile-ibs-latest/daemon/opd_sfile.h	2009-04-01 21:33:52.000000000 -0500
@@ -109,6 +109,10 @@ struct sfile * sfile_find(struct transie
 /** Log the sample in a previously located sfile. */
 void sfile_log_sample(struct transient const * trans);
 
+/** Log the event/cycle count in a previously located sfile */
+void sfile_log_sample_count(struct transient const * trans,
+			    unsigned long int count);
+
 /** initialise hashes */
 void sfile_init(void);
 
diff -paurN oprofile-base/daemon/opd_sfile.c oprofile-ibs-latest/daemon/opd_sfile.c
--- oprofile-base/daemon/opd_sfile.c	2009-04-01 15:57:36.000000000 -0500
+++ oprofile-ibs-latest/daemon/opd_sfile.c	2009-04-13 18:30:32.000000000 -0500
@@ -127,7 +127,7 @@ trans_match(struct transient const * tra
 }
 
 
-static int
+int
 sfile_equal(struct sfile const * sf, struct sfile const * sf2)
 {
 	return do_match(sf, sf2->cookie, sf2->app_cookie, sf2->kernel,
@@ -275,7 +275,7 @@ lru:
 }
 
 
-static void sfile_dup(struct sfile * to, struct sfile * from)
+void sfile_dup(struct sfile * to, struct sfile * from)
 {
 	size_t i;
 
@@ -428,6 +428,13 @@ static void sfile_log_arc(struct transie
 
 void sfile_log_sample(struct transient const * trans)
 {
+	sfile_log_sample_count(trans, 1);
+}
+
+
+void sfile_log_sample_count(struct transient const * trans,
+			    unsigned long int count)
+{
 	int err;
 	vma_t pc = trans->pc;
 	odb_t * file;
@@ -457,7 +464,9 @@ void sfile_log_sample(struct transient c
 		return;
 	}
 
-	err = odb_update_node(file, (odb_key_t)pc);
+	err = odb_update_node_with_offset(file,
+					  (odb_key_t)pc,
+					  count);
 	if (err) {
 		fprintf(stderr, "%s: %s\n", __FUNCTION__, strerror(err));
 		abort();
diff -paurN oprofile-base/daemon/opd_ibs.h oprofile-ibs-latest/daemon/opd_ibs.h
--- oprofile-base/daemon/opd_ibs.h	1969-12-31 18:00:00.000000000 -0600
+++ oprofile-ibs-latest/daemon/opd_ibs.h	2009-04-13 18:26:37.000000000 -0500
@@ -0,0 +1,137 @@
+/**
+ * @file daemon/opd_ibs.h
+ * AMD Family10h Instruction Based Sampling (IBS) handling.
+ *
+ * @remark Copyright 2008 OProfile authors
+ * @remark Read the file COPYING
+ *
+ * @author Jason Yeh <jas...@am...>
+ * @author Paul Drongowski <pau...@am...>
+ * @author Suravee Suthikulpanit <sur...@am...>
+ * Copyright (c) 2008 Advanced Micro Devices, Inc.
+ */
+
+#ifndef OPD_IBS_H
+#define OPD_IBS_H
+
+#include <stdint.h>
+
+#include "opd_ibs_macro.h"
+
+struct transient;
+struct opd_event;
+
+/**
+ * IBS information is processed in two steps. The first step decodes
+ * hardware-level IBS information and saves it in decoded form. The
+ * second step translates the decoded IBS information into IBS derived
+ * events. IBS information is tallied and is reported as derived events.
+ */
+
+struct ibs_sample {
+	struct ibs_fetch_sample * fetch;
+	struct ibs_op_sample * op;
+};
+
+/**
+ * This struct represents the hardware-level IBS fetch information.
+ * Each field corresponds to a model-specific register (MSR.) See the
+ * BIOS and Kernel Developer's Guide for AMD Model Family 10h Processors
+ * for further details.
+ */
+struct ibs_fetch_sample {
+	unsigned long int rip;
+	/* MSRC001_1030 IBS Fetch Control Register */
+	unsigned int ibs_fetch_ctl_low;
+	unsigned int ibs_fetch_ctl_high;
+	/* MSRC001_1031 IBS Fetch Linear Address Register */
+	unsigned int ibs_fetch_lin_addr_low;
+	unsigned int ibs_fetch_lin_addr_high;
+	/* MSRC001_1032 IBS Fetch Physical Address Register */
+	unsigned int ibs_fetch_phys_addr_low;
+	unsigned int ibs_fetch_phys_addr_high;
+	unsigned int dummy_event;
+};
+
+
+
+/** This struct represents the hardware-level IBS op information. */
+struct ibs_op_sample {
+	unsigned long int rip;
+	/* MSRC001_1034 IBS Op Logical Address Register */
+	unsigned int ibs_op_lin_addr_low;
+	unsigned int ibs_op_lin_addr_high;
+	/* MSRC001_1035 IBS Op Data Register */
+	unsigned int ibs_op_data1_low;
+	unsigned int ibs_op_data1_high;
+	/* MSRC001_1036 IBS Op Data 2 Register */
+	unsigned int ibs_op_data2_low;
+	unsigned int ibs_op_data2_high;
+	/* MSRC001_1037 IBS Op Data 3 Register */
+	unsigned int ibs_op_data3_low;
+	unsigned int ibs_op_data3_high;
+	unsigned int ibs_op_ldst_linaddr_low;
+	unsigned int ibs_op_ldst_linaddr_high;
+	unsigned int ibs_op_phys_addr_low;
+	unsigned int ibs_op_phys_addr_high;
+};
+
+
+enum IBSL1PAGESIZE {
+	L1TLB4K = 0,
+	L1TLB2M,
+	L1TLB1G,
+	L1TLB_INVALID
+};
+
+
+/**
+ * Handle an IBS fetch sample escape code sequence. An IBS fetch sample
+ * is represented as an escape code sequence. (See the comment for the
+ * function code_ibs_op_sample() for the sequence of entries in the event
+ * buffer.) When this function is called, the ESCAPE_CODE and IBS_FETCH_CODE
+ * have already been removed from the event buffer. Thus, 7 more event buffer
+ * entries are needed in order to process a complete IBS fetch sample.
+ */
+extern void code_ibs_fetch_sample(struct transient * trans);
+
+/**
+ * Handle an IBS op sample escape code sequence. An IBS op sample
+ * is represented as an escape code sequence:
+ *
+ *    IBS fetch              IBS op
+ *    ---------------        ----------------
+ *    ESCAPE_CODE            ESCAPE_CODE
+ *    IBS_FETCH_CODE         IBS_OP_CODE
+ *    Offset                 Offset
+ *    IbsFetchLinAd low      IbsOpRip low        <-- Logical (virtual) RIP
+ *    IbsFetchLinAd high     IbsOpRip high       <-- Logical (virtual) RIP
+ *    IbsFetchCtl low        IbsOpData low
+ *    IbsFetchCtl high       IbsOpData high
+ *    IbsFetchPhysAd low     IbsOpData2 low
+ *    IbsFetchPhysAd high    IbsOpData2 high
+ *                           IbsOpData3 low
+ *                           IbsOpData3 high
+ *                           IbsDcLinAd low
+ *                           IbsDcLinAd high
+ *                           IbsDcPhysAd low
+ *                           IbsDcPhysAd high
+ *
+ * When this function is called, the ESCAPE_CODE and IBS_OP_CODE have
+ * already been removed from the event buffer. Thus, 13 more event buffer
+ * entries are needed to process a complete IBS op sample.
+ *
+ * The IbsFetchLinAd and IbsOpRip are the linear (virtual) addresses
+ * that were generated by the IBS hardware. These addresses are mapped
+ * into the offset.
+ */
+extern void code_ibs_op_sample(struct transient * trans);
+
+/** Log the specified IBS derived event. */
+extern void opd_log_ibs_event(unsigned int event, struct transient * trans);
+
+/** Log the specified IBS cycle count. */
+extern void opd_log_ibs_count(unsigned int event, struct transient * trans, unsigned int count);
+
+
+#endif /*OPD_IBS_H*/
diff -paurN oprofile-base/daemon/opd_ibs.c oprofile-ibs-latest/daemon/opd_ibs.c
--- oprofile-base/daemon/opd_ibs.c	1969-12-31 18:00:00.000000000 -0600
+++ oprofile-ibs-latest/daemon/opd_ibs.c	2009-04-13 18:26:05.000000000 -0500
@@ -0,0 +1,692 @@
+/**
+ * @file daemon/opd_ibs.c
+ * AMD Family10h Instruction Based Sampling (IBS) handling.
+ *
+ * @remark Copyright 2007 OProfile authors
+ * @remark Read the file COPYING
+ *
+ * @author Jason Yeh <jas...@am...>
+ * @author Paul Drongowski <pau...@am...>
+ * @author Suravee Suthikulpanit <sur...@am...>
+ * Copyright (c) 2008 Advanced Micro Devices, Inc.
+ */
+
+#include "op_hw_config.h"
+#include "op_events.h"
+#include "op_string.h"
+#include "op_libiberty.h"
+#include "opd_printf.h"
+#include "opd_trans.h"
+#include "opd_events.h"
+#include "opd_kernel.h"
+#include "opd_anon.h"
+#include "opd_sfile.h"
+#include "opd_interface.h"
+#include "opd_mangling.h"
+#include "opd_extended.h"
+#include "opd_ibs.h"
+#include "opd_ibs_trans.h"
+#include "opd_ibs_macro.h"
+
+#include <stdlib.h>
+#include <stdio.h>
+#include <errno.h>
+#include <string.h>
+
+extern op_cpu cpu_type;
+extern int no_event_ok;
+extern int sfile_equal(struct sfile const * sf, struct sfile const * sf2);
+extern void sfile_dup(struct sfile * to, struct sfile * from);
+
+/* IBS Select Arrays/Counters */
+static unsigned int ibs_selected_size;
+static unsigned int ibs_fetch_selected_flag;
+static unsigned int ibs_fetch_selected_size;
+static unsigned int ibs_op_selected_flag;
+static unsigned int ibs_op_selected_size;
+static unsigned int ibs_op_ls_selected_flag;
+static unsigned int ibs_op_ls_selected_size;
+static unsigned int ibs_op_nb_selected_flag;
+static unsigned int ibs_op_nb_selected_size;
+
+/* IBS Statistics */
+static unsigned long ibs_fetch_sample_stats;
+static unsigned long ibs_fetch_incomplete_stats;
+static unsigned long ibs_op_sample_stats;
+static unsigned long ibs_op_incomplete_stats;
+static unsigned long ibs_derived_event_stats;
+
+/*
+ * IBS Virtual Counter
+ */
+struct opd_event ibs_vc[OP_MAX_IBS_COUNTERS];
+
+/* IBS Virtual Counter Index(VCI) Map*/
+unsigned int ibs_vci_map[OP_MAX_IBS_COUNTERS];
+
+/**
+ * This function converts IBS fetch event flags and values into
+ * derived events. If the tagged (sampled) fetch caused a derived
+ * event, the derived event is tallied.
+ */
+static void opd_log_ibs_fetch(struct transient * trans)
+{
+	struct ibs_fetch_sample * trans_fetch = ((struct ibs_sample*)(trans->ext))->fetch;
+	if (!trans_fetch)
+		return;
+
+	trans_ibs_fetch(trans, ibs_fetch_selected_flag, ibs_fetch_selected_size);
+}
+
+
+/**
+ * This function translates the IBS op event flags and values into
+ * IBS op derived events. If an op derived event occurred, it's tallied.
+ */
+static void opd_log_ibs_op(struct transient * trans)
+{
+	struct ibs_op_sample * trans_op = ((struct ibs_sample*)(trans->ext))->op;
+	if (!trans_op)
+		return;
+
+	trans_ibs_op(trans, ibs_op_selected_flag, ibs_op_selected_size);
+	trans_ibs_op_ls(trans, ibs_op_ls_selected_flag, ibs_op_ls_selected_size);
+	trans_ibs_op_nb(trans, ibs_op_nb_selected_flag, ibs_op_nb_selected_size);
+}
+
+
+static void opd_put_ibs_sample(struct transient * trans)
+{
+	unsigned long long event = 0;
+	struct kernel_image * k_image = NULL;
+	struct ibs_fetch_sample * trans_fetch = ((struct ibs_sample*)(trans->ext))->fetch;
+
+	if (!enough_remaining(trans, 1)) {
+		trans->remaining = 0;
+		return;
+	}
+
+	/* IBS can generate samples with invalid dcookie and
+	 * in kernel address range. Map such samples to vmlinux
+	 * only if the user either specifies a range, or vmlinux.
+	 */
+	if (trans->cookie == INVALID_COOKIE
+	    && (k_image = find_kernel_image(trans)) != NULL
+	    && (k_image->start != 0 && k_image->end != 0)
+	    && trans->in_kernel == 0)
+		trans->in_kernel = 1;
+
+	if (trans->tracing != TRACING_ON)
+		trans->event = event;
+
+	/* sfile can change at each sample for kernel */
+	if (trans->in_kernel != 0)
+		clear_trans_current(trans);
+
+	if (!trans->in_kernel && trans->cookie == NO_COOKIE)
+		trans->anon = find_anon_mapping(trans);
+
+	/* get the current sfile if needed */
+	if (!trans->current)
+		trans->current = sfile_find(trans);
+
+	/*
+	 * can happen if kernel sample falls through the cracks, or if
+	 * it's a sample from an anon region we couldn't find
+	 */
+	if (!trans->current)
+		goto out;
+
+	if (trans_fetch)
+		opd_log_ibs_fetch(trans);
+	else
+		opd_log_ibs_op(trans);
+out:
+	/* switch to trace mode */
+	if (trans->tracing == TRACING_START)
+		trans->tracing = TRACING_ON;
+
+	update_trans_last(trans);
+}
+
+
+void code_ibs_fetch_sample(struct transient * trans)
+{
+	struct ibs_fetch_sample * trans_fetch = NULL;
+
+	if (!enough_remaining(trans, 7)) {
+		verbprintf(vext, "not enough remaining\n");
+		trans->remaining = 0;
+		ibs_fetch_incomplete_stats++;
+		return;
+	}
+
+	ibs_fetch_sample_stats++;
+
+	trans->ext = xmalloc(sizeof(struct ibs_sample));
+	((struct ibs_sample*)(trans->ext))->fetch = xmalloc(sizeof(struct ibs_fetch_sample));
+	trans_fetch = ((struct ibs_sample*)(trans->ext))->fetch;
+
+	trans_fetch->rip = pop_buffer_value(trans);
+
+	trans_fetch->ibs_fetch_lin_addr_low = pop_buffer_value(trans);
+	trans_fetch->ibs_fetch_lin_addr_high = pop_buffer_value(trans);
+
+	trans_fetch->ibs_fetch_ctl_low = pop_buffer_value(trans);
+	trans_fetch->ibs_fetch_ctl_high = pop_buffer_value(trans);
+	trans_fetch->ibs_fetch_phys_addr_low = pop_buffer_value(trans);
+	trans_fetch->ibs_fetch_phys_addr_high = pop_buffer_value(trans);
+
+	verbprintf(vsamples,
+		"FETCH_X CPU:%ld PID:%ld RIP:%lx CTL_H:%x LAT:%d P_HI:%x P_LO:%x L_HI:%x L_LO:%x\n",
+		trans->cpu,
+		(long)trans->tgid,
+		trans_fetch->rip,
+		(trans_fetch->ibs_fetch_ctl_high >> 16) & 0x3ff,
+		(trans_fetch->ibs_fetch_ctl_high) & 0xffff,
+		trans_fetch->ibs_fetch_phys_addr_high,
+		trans_fetch->ibs_fetch_phys_addr_low,
+		trans_fetch->ibs_fetch_lin_addr_high,
+		trans_fetch->ibs_fetch_lin_addr_low) ;
+
+	/* Overwrite the trans->pc with the more accurate trans_fetch->rip */
+	trans->pc = trans_fetch->rip;
+
+	opd_put_ibs_sample(trans);
+
+	free(trans_fetch);
+	free(trans->ext);
+	trans->ext = NULL;
+}
+
+
+void code_ibs_op_sample(struct transient * trans)
+{
+	struct ibs_op_sample * trans_op= NULL;
+
+	if (!enough_remaining(trans, 13)) {
+		verbprintf(vext, "not enough remaining\n");
+		trans->remaining = 0;
+		ibs_op_incomplete_stats++;
+		return;
+	}
+
+	ibs_op_sample_stats++;
+
+	trans->ext = xmalloc(sizeof(struct ibs_sample));
+	((struct ibs_sample*)(trans->ext))->op = xmalloc(sizeof(struct ibs_op_sample));
+	trans_op = ((struct ibs_sample*)(trans->ext))->op;
+
+	trans_op->rip = pop_buffer_value(trans);
+
+	trans_op->ibs_op_lin_addr_low = pop_buffer_value(trans);
+	trans_op->ibs_op_lin_addr_high = pop_buffer_value(trans);
+
+	trans_op->ibs_op_data1_low = pop_buffer_value(trans);
+	trans_op->ibs_op_data1_high = pop_buffer_value(trans);
+	trans_op->ibs_op_data2_low = pop_buffer_value(trans);
+	trans_op->ibs_op_data2_high = pop_buffer_value(trans);
+	trans_op->ibs_op_data3_low = pop_buffer_value(trans);
+	trans_op->ibs_op_data3_high = pop_buffer_value(trans);
+	trans_op->ibs_op_ldst_linaddr_low = pop_buffer_value(trans);
+	trans_op->ibs_op_ldst_linaddr_high = pop_buffer_value(trans);
+	trans_op->ibs_op_phys_addr_low = pop_buffer_value(trans);
+	trans_op->ibs_op_phys_addr_high = pop_buffer_value(trans);
+
+	verbprintf(vsamples,
+		"IBS_OP_X CPU:%ld PID:%d RIP:%lx D1HI:%x D1LO:%x D2LO:%x D3HI:%x D3LO:%x L_LO:%x P_LO:%x\n",
+		trans->cpu,
+		trans->tgid,
+		trans_op->rip,
+		trans_op->ibs_op_data1_high,
+		trans_op->ibs_op_data1_low,
+		trans_op->ibs_op_data2_low,
+		trans_op->ibs_op_data3_high,
+		trans_op->ibs_op_data3_low,
+		trans_op->ibs_op_ldst_linaddr_low,
+		trans_op->ibs_op_phys_addr_low);
+
+	/* Overwrite the trans->pc with the more accurate trans_op->rip */
+	trans->pc = trans_op->rip;
+
+	opd_put_ibs_sample(trans);
+
+	free(trans_op);
+	free(trans->ext);
+	trans->ext = NULL;
+}
+
+
+/** Convert IBS event to value used for data structure indexing */
+static unsigned long ibs_event_to_counter(unsigned long x)
+{
+	unsigned long ret = ~0UL;
+
+	if (IS_IBS_FETCH(x))
+		ret = (x - IBS_FETCH_BASE);
+	else if (IS_IBS_OP(x))
+		ret = (x - IBS_OP_BASE + IBS_FETCH_MAX);
+	else if (IS_IBS_OP_LS(x))
+		ret = (x - IBS_OP_LS_BASE + IBS_OP_MAX + IBS_FETCH_MAX);
+	else if (IS_IBS_OP_NB(x))
+		ret = (x - IBS_OP_NB_BASE + IBS_OP_LS_MAX + IBS_OP_MAX + IBS_FETCH_MAX);
+
+	return (ret != ~0UL) ? ret + OP_MAX_COUNTERS : ret;
+}
+
+
+void opd_log_ibs_event(unsigned int event,
+	struct transient * trans)
+{
+	ibs_derived_event_stats++;
+	trans->event = event;
+	sfile_log_sample_count(trans, 1);
+}
+
+
+void opd_log_ibs_count(unsigned int event,
+	struct transient * trans,
+	unsigned int count)
+{
+	ibs_derived_event_stats++;
+	trans->event = event;
+	sfile_log_sample_count(trans, count);
+}
+
+
+static unsigned long get_ibs_vci_key(unsigned int event)
+{
+	unsigned long key = ibs_event_to_counter(event);
+	if (key == ~0UL || key < OP_MAX_COUNTERS)
+		return ~0UL;
+
+	key = key - OP_MAX_COUNTERS;
+
+	return key;
+}
+
+
+static int ibs_parse_and_set_events(char * str)
+{
+	char * tmp, * ptr, * tok1, * tok2 = NULL;
+	int is_done = 0;
+	struct op_event * event = NULL;
+	op_cpu cpu_type = CPU_NO_GOOD;
+	unsigned long key;
+
+	if (!str)
+		return -1;
+
+	cpu_type = op_get_cpu_type();
+	op_events(cpu_type);
+
+	tmp = op_xstrndup(str, strlen(str));
+	ptr = tmp;
+
+	while (is_done != 1
+		&& (tok1 = strtok_r(ptr, ",", &tok2)) != NULL) {
+
+		if ((ptr = strstr(tok1, ":")) != NULL) {
+			*ptr = '\0';
+			is_done = 1;
+		}
+
+		// Resolve event number
+		event = find_event_by_name(tok1);
+		if (!event)
+			return -1;
+
+		// Grouping
+		if (IS_IBS_FETCH(event->val)) {
+			ibs_fetch_selected_flag |= 1 << IBS_FETCH_OFFSET(event->val);
+			ibs_fetch_selected_size++;
+		} else if (IS_IBS_OP(event->val)) {
+			ibs_op_selected_flag |= 1 << IBS_OP_OFFSET(event->val);
+			ibs_op_selected_size++;
+		} else if (IS_IBS_OP_LS(event->val)) {
+			ibs_op_ls_selected_flag |= 1 << IBS_OP_LS_OFFSET(event->val);
+			ibs_op_ls_selected_size++;
+		} else if (IS_IBS_OP_NB(event->val)) {
+			ibs_op_nb_selected_flag |= 1 << IBS_OP_NB_OFFSET(event->val);
+			ibs_op_nb_selected_size++;
+		} else {
+			return -1;
+		}
+
+		key = get_ibs_vci_key(event->val);
+		if (key == ~0UL)
+			return -1;
+
+		ibs_vci_map[key] = ibs_selected_size;
+
+		/* Initialize part of ibs_vc */
+		ibs_vc[ibs_selected_size].name = tok1;
+		ibs_vc[ibs_selected_size].value = event->val;
+		ibs_vc[ibs_selected_size].counter = ibs_selected_size + OP_MAX_COUNTERS;
+		ibs_vc[ibs_selected_size].kernel = 1;
+		ibs_vc[ibs_selected_size].user = 1;
+
+		ibs_selected_size++;
+
+		ptr = NULL;
+	}
+
+	return 0;
+}
+
+
+static int ibs_parse_counts(char * str, unsigned long int * count)
+{
+	char * tmp, * tok1, * tok2 = NULL, *end = NULL;
+	if (!str)
+		return -1;
+
+	tmp = op_xstrndup(str, strlen(str));
+	tok1 = strtok_r(tmp, ":", &tok2);
+	*count = strtoul(tok1, &end, 10);
+	if ((end && *end) || *count == 0
+	    || errno == EINVAL || errno == ERANGE) {
+		fprintf(stderr,"Invalid count (%s)\n", str);
+		return -1;
+	}
+
+	return 0;
+}
+
+
+static int ibs_parse_and_set_um_fetch(char const * str)
+{
+	if (!str)
+		return -1;
+	return 0;
+}
+
+
+
+static int ibs_parse_and_set_um_op(char const * str, unsigned long int * ibs_op_um)
+{
+	char * end = NULL;
+	if (!str)
+		return -1;
+
+	*ibs_op_um = strtoul(str, &end, 16);
+	if ((end && *end) || errno == EINVAL || errno == ERANGE) {
+		fprintf(stderr,"Invalid unitmask (%s)\n", str);
+		return -1;
+	}
+	return 0;
+}
+
+
+static int ibs_init(char const * argv)
+{
+	char * tmp, * ptr, * tok1, * tok2 = NULL;
+	unsigned int i = 0;
+	unsigned long int ibs_fetch_count = 0;
+	unsigned long int ibs_op_count = 0;
+	unsigned long int ibs_op_um = 0;
+
+	if (!argv)
+		return -1;
+
+	if (empty_line(argv) != 0)
+		return -1;
+
+	tmp = op_xstrndup(argv, strlen(argv));
+	ptr = (char *) skip_ws(tmp);
+
+	// "fetch:event1,event2,....:count:um|op:event1,event2,.....:count:um"
+	tok1 = strtok_r(ptr, "|", &tok2);
+
+	while (tok1 != NULL) {
+
+		if (!strncmp("fetch:", tok1, strlen("fetch:"))) {
+			// Get to event section
+			tok1 = tok1 + strlen("fetch:");
+			if (ibs_parse_and_set_events(tok1) == -1)
+				return -1;
+
+			// Get to count section
+			while (tok1) {
+				if (*tok1 == '\0')
+					return -1;
+				if (*tok1 != ':') {
+					tok1++;
+				} else {
+					tok1++;
+					break;
+				}
+			}
+
+			if (ibs_parse_counts(tok1, &ibs_fetch_count) == -1)
+				return -1;
+
+			// Get to um section
+			while (tok1) {
+				if (*tok1 == '\0')
+					return -1;
+				if (*tok1 != ':') {
+					tok1++;
+				} else {
+					tok1++;
+					break;
+				}
+			}
+
+			if (ibs_parse_and_set_um_fetch(tok1) == -1)
+				return -1;
+
+		} else if (!strncmp("op:", tok1, strlen("op:"))) {
+			// Get to event section
+			tok1 = tok1 + strlen("op:");
+			if (ibs_parse_and_set_events(tok1) == -1)
+				return -1;
+
+			// Get to count section
+			while (tok1) {
+				if (*tok1 == '\0')
+					return -1;
+				if (*tok1 != ':') {
+					tok1++;
+				} else {
+					tok1++;
+					break;
+				}
+			}
+
+			if (ibs_parse_counts(tok1, &ibs_op_count) == -1)
+				return -1;
+
+			// Get to um section
+			while (tok1) {
+				if (*tok1 == '\0')
+					return -1;
+				if (*tok1 != ':') {
+					tok1++;
+				} else {
+					tok1++;
+					break;
+				}
+			}
+
+			if (ibs_parse_and_set_um_op(tok1, &ibs_op_um))
+				return -1;
+
+		} else
+			return -1;
+
+		tok1 = strtok_r(NULL, "|", &tok2);
+	}
+
+	/* Initialize ibs_vc */
+	for (i = 0 ; i < ibs_selected_size ; i++)
+	{
+		if (IS_IBS_FETCH(ibs_vc[i].value)) {
+			ibs_vc[i].count = ibs_fetch_count;
+			ibs_vc[i].um = 0;
+		} else {
+			ibs_vc[i].count = ibs_op_count;
+			ibs_vc[i].um = ibs_op_um;
+		}
+	}
+
+	// Allow no event
+	no_event_ok = 1;
+	return 0;
+}
+
+
+static int ibs_print_stats()
+{
+	printf("Nr. IBS Fetch samples : %lu (%lu entries)\n", ibs_fetch_sample_stats, (ibs_fetch_sample_stats * 7));
+	printf("Nr. IBS Fetch incompletes : %lu\n", ibs_fetch_incomplete_stats);
+	printf("Nr. IBS Op samples : %lu (%lu entries)\n", ibs_op_sample_stats, (ibs_op_sample_stats * 13));
+	printf("Nr. IBS Op incompletes : %lu\n", ibs_op_incomplete_stats);
+	printf("Nr. IBS derived events : %lu\n", ibs_derived_event_stats);
+	return 0;
+}
+
+
+static int ibs_sfile_create(struct sfile * sf)
+{
+	unsigned int i;
+	sf->ext_files = xmalloc(ibs_selected_size * sizeof(odb_t));
+	for (i = 0 ; i < ibs_selected_size ; ++i)
+		odb_init(&sf->ext_files[i]);
+
+	return 0;
+}
+
+
+static int ibs_sfile_dup (struct sfile * to, struct sfile * from)
+{
+	unsigned int i;
+	if (from->ext_files != NULL) {
+		to->ext_files = xmalloc(ibs_selected_size * sizeof(odb_t));
+		for (i = 0 ; i < ibs_selected_size ; ++i)
+			odb_init(&to->ext_files[i]);
+	} else {
+		to->ext_files = NULL;
+	}
+	return 0;
+}
+
+static int ibs_sfile_close(struct sfile * sf)
+{
+	unsigned int i;
+	if (sf->ext_files != NULL) {
+		for (i = 0; i < ibs_selected_size ; ++i)
+			odb_close(&sf->ext_files[i]);
+
+		free(sf->ext_files);
+		sf->ext_files= NULL;
+	}
+	return 0;
+}
+
+static int ibs_sfile_sync(struct sfile * sf)
+{
+	unsigned int i;
+	if (sf->ext_files != NULL) {
+		for (i = 0; i < ibs_selected_size ; ++i)
+			odb_sync(&sf->ext_files[i]);
+	}
+	return 0;
+}
+
+static odb_t * ibs_sfile_get(struct transient const * trans, int is_cg)
+{
+	struct sfile * sf = trans->current;
+	struct sfile * last = trans->last;
+	struct cg_entry * cg;
+	struct list_head * pos;
+	unsigned long hash;
+	odb_t * file;
+	unsigned long counter, ibs_vci, key;
+
+	/* Note: "trans->event" for IBS is not the same as traditional
+	 * events. Here, it has the actual event (0xfxxx), while the
+	 * traditional event has the event index.
+ */ + key = get_ibs_vci_key(trans->event); + if (key == ~0UL) { + fprintf(stderr, "%s: Invalid IBS event %lu\n", __func__, trans->event); + abort(); + } + ibs_vci = ibs_vci_map[key]; + counter = ibs_vci + OP_MAX_COUNTERS; + + /* Create the IBS sfile if it does not already exist */ + if (sf->ext_files == NULL) + ibs_sfile_create(sf); + + file = &(sf->ext_files[ibs_vci]); + if (!is_cg) + goto open; + + hash = last->hashval & (CG_HASH_SIZE - 1); + + /* Need to look for the right 'to'. Since we're looking for + * 'last', we use its hash. + */ + list_for_each(pos, &sf->cg_hash[hash]) { + cg = list_entry(pos, struct cg_entry, hash); + if (sfile_equal(last, &cg->to)) { + file = &(cg->to.ext_files[ibs_vci]); + goto open; + } + } + + cg = xmalloc(sizeof(struct cg_entry)); + sfile_dup(&cg->to, last); + list_add(&cg->hash, &sf->cg_hash[hash]); + file = &(cg->to.ext_files[ibs_vci]); + +open: + if (!odb_open_count(file)) + opd_open_sample_file(file, last, sf, counter, is_cg); + + /* Error is logged by opd_open_sample_file */ + if (!odb_open_count(file)) + return NULL; + + return file; +} + + +/** Return the opd_event structure holding IBS derived event information + * for the given counter value. 
+ */ +static struct opd_event * ibs_sfile_find_counter_event(unsigned long counter) +{ + unsigned long ibs_vci; + + if (counter >= OP_MAX_COUNTERS + OP_MAX_IBS_COUNTERS + || counter < OP_MAX_COUNTERS) { + fprintf(stderr,"Error: find_ibs_counter_event : " + "invalid counter value %lu.\n", counter); + abort(); + } + + ibs_vci = counter - OP_MAX_COUNTERS; + return &ibs_vc[ibs_vci]; +} + + +struct opd_ext_sfile_handlers ibs_sfile_handlers = +{ + .create = &ibs_sfile_create, + .dup = &ibs_sfile_dup, + .close = &ibs_sfile_close, + .sync = &ibs_sfile_sync, + .get = &ibs_sfile_get, + .find_counter_event = &ibs_sfile_find_counter_event +}; + + +struct opd_ext_handlers ibs_handlers = +{ + .ext_init = &ibs_init, + .ext_print_stats = &ibs_print_stats, + .ext_sfile = &ibs_sfile_handlers +}; diff -paurN oprofile-base/daemon/opd_ibs_macro.h oprofile-ibs-latest/daemon/opd_ibs_macro.h --- oprofile-base/daemon/opd_ibs_macro.h 1969-12-31 18:00:00.000000000 -0600 +++ oprofile-ibs-latest/daemon/opd_ibs_macro.h 2009-04-01 21:30:00.000000000 -0500 @@ -0,0 +1,366 @@ +/** + * @file daemon/opd_ibs_macro.h + * AMD Family10h Instruction Based Sampling (IBS) related macro. + * + * @remark Copyright 2008 OProfile authors + * @remark Read the file COPYING + * + * @author Jason Yeh <jas...@am...> + * @author Paul Drongowski <pau...@am...> + * @author Suravee Suthikulpanit <sur...@am...> + * Copyright (c) 2008 Advanced Micro Devices, Inc. + */ + +#ifndef OPD_IBS_MACRO_H +#define OPD_IBS_MACRO_H + +/** + * The following defines are bit masks that are used to select + * IBS fetch event flags and values at the MSR level. 
+ */ +#define FETCH_MASK_LATENCY 0x0000ffff +#define FETCH_MASK_COMPLETE 0x00040000 +#define FETCH_MASK_IC_MISS 0x00080000 +#define FETCH_MASK_PHY_ADDR 0x00100000 +#define FETCH_MASK_PG_SIZE 0x00600000 +#define FETCH_MASK_L1_MISS 0x00800000 +#define FETCH_MASK_L2_MISS 0x01000000 +#define FETCH_MASK_KILLED \ + (FETCH_MASK_L1_MISS|FETCH_MASK_L2_MISS|FETCH_MASK_PHY_ADDR|\ + FETCH_MASK_COMPLETE|FETCH_MASK_IC_MISS) + + +/** + * The following defines are bit masks that are used to select + * IBS op event flags and values at the MSR level. + */ +#define BR_MASK_RETIRE 0x0000ffff +#define BR_MASK_BRN_RET 0x00000020 +#define BR_MASK_BRN_MISP 0x00000010 +#define BR_MASK_BRN_TAKEN 0x00000008 +#define BR_MASK_RETURN 0x00000004 +#define BR_MASK_MISP_RETURN 0x00000002 +#define BR_MASK_BRN_RESYNC 0x00000001 + +#define NB_MASK_L3_STATE 0x00000020 +#define NB_MASK_REQ_DST_PROC 0x00000010 +#define NB_MASK_REQ_DATA_SRC 0x00000007 + +#define DC_MASK_L2_HIT_1G 0x00080000 +#define DC_MASK_PHY_ADDR_VALID 0x00040000 +#define DC_MASK_LIN_ADDR_VALID 0x00020000 +#define DC_MASK_MAB_HIT 0x00010000 +#define DC_MASK_LOCKED_OP 0x00008000 +#define DC_MASK_WC_MEM_ACCESS 0x00004000 +#define DC_MASK_UC_MEM_ACCESS 0x00002000 +#define DC_MASK_ST_TO_LD_CANCEL 0x00001000 +#define DC_MASK_ST_TO_LD_FOR 0x00000800 +#define DC_MASK_ST_BANK_CONFLICT 0x00000400 +#define DC_MASK_LD_BANK_CONFLICT 0x00000200 +#define DC_MASK_MISALIGN_ACCESS 0x00000100 +#define DC_MASK_DC_MISS 0x00000080 +#define DC_MASK_L2_HIT_2M 0x00000040 +#define DC_MASK_L1_HIT_1G 0x00000020 +#define DC_MASK_L1_HIT_2M 0x00000010 +#define DC_MASK_L2_TLB_MISS 0x00000008 +#define DC_MASK_L1_TLB_MISS 0x00000004 +#define DC_MASK_STORE_OP 0x00000002 +#define DC_MASK_LOAD_OP 0x00000001 + + +/** + * IBS derived events: + * + * IBS derived events are identified by event select values which are + * similar to the event select values that identify performance monitoring + * counter (PMC) events. 
Event select values for IBS derived events begin + * at 0xf000. + * + * The definitions in this file *must* match definitions + * of IBS derived events in gh-events.xml and in the + * oprofile AMD Family 10h events file. More information + * about IBS derived events is given in the Software Optimization + * Guide for AMD Family 10h Processors. + */ + +/** + * The following defines associate a 16-bit select value with an IBS + * derived fetch event. + */ +#define DE_IBS_FETCH_ALL 0xf000 +#define DE_IBS_FETCH_KILLED 0xf001 +#define DE_IBS_FETCH_ATTEMPTED 0xf002 +#define DE_IBS_FETCH_COMPLETED 0xf003 +#define DE_IBS_FETCH_ABORTED 0xf004 +#define DE_IBS_L1_ITLB_HIT 0xf005 +#define DE_IBS_ITLB_L1M_L2H 0xf006 +#define DE_IBS_ITLB_L1M_L2M 0xf007 +#define DE_IBS_IC_MISS 0xf008 +#define DE_IBS_IC_HIT 0xf009 +#define DE_IBS_FETCH_4K_PAGE 0xf00a +#define DE_IBS_FETCH_2M_PAGE 0xf00b +#define DE_IBS_FETCH_1G_PAGE 0xf00c +#define DE_IBS_FETCH_XX_PAGE 0xf00d +#define DE_IBS_FETCH_LATENCY 0xf00e + +#define IBS_FETCH_BASE 0xf000 +#define IBS_FETCH_END 0xf00e +#define IBS_FETCH_MAX (IBS_FETCH_END - IBS_FETCH_BASE + 1) +#define IS_IBS_FETCH(x) (IBS_FETCH_BASE <= x && x <= IBS_FETCH_END) +#define IBS_FETCH_OFFSET(x) (x - IBS_FETCH_BASE) + +/** + * The following defines associate a 16-bit select value with an IBS + * derived branch/return macro-op event. 
+ */ +#define DE_IBS_OP_ALL 0xf100 +#define DE_IBS_OP_TAG_TO_RETIRE 0xf101 +#define DE_IBS_OP_COMP_TO_RETIRE 0xf102 +#define DE_IBS_BRANCH_RETIRED 0xf103 +#define DE_IBS_BRANCH_MISP 0xf104 +#define DE_IBS_BRANCH_TAKEN 0xf105 +#define DE_IBS_BRANCH_MISP_TAKEN 0xf106 +#define DE_IBS_RETURN 0xf107 +#define DE_IBS_RETURN_MISP 0xf108 +#define DE_IBS_RESYNC 0xf109 + +#define IBS_OP_BASE 0xf100 +#define IBS_OP_END 0xf109 +#define IBS_OP_MAX (IBS_OP_END - IBS_OP_BASE + 1) +#define IS_IBS_OP(x) (IBS_OP_BASE <= x && x <= IBS_OP_END) +#define IBS_OP_OFFSET(x) (x - IBS_OP_BASE) + +/** + * The following defines associate a 16-bit select value with an IBS + * derived load/store event. + */ +#define DE_IBS_LS_ALL_OP 0xf200 +#define DE_IBS_LS_LOAD_OP 0xf201 +#define DE_IBS_LS_STORE_OP 0xf202 +#define DE_IBS_LS_DTLB_L1H 0xf203 +#define DE_IBS_LS_DTLB_L1M_L2H 0xf204 +#define DE_IBS_LS_DTLB_L1M_L2M 0xf205 +#define DE_IBS_LS_DC_MISS 0xf206 +#define DE_IBS_LS_DC_HIT 0xf207 +#define DE_IBS_LS_MISALIGNED 0xf208 +#define DE_IBS_LS_BNK_CONF_LOAD 0xf209 +#define DE_IBS_LS_BNK_CONF_STORE 0xf20a +#define DE_IBS_LS_STL_FORWARDED 0xf20b +#define DE_IBS_LS_STL_CANCELLED 0xf20c +#define DE_IBS_LS_UC_MEM_ACCESS 0xf20d +#define DE_IBS_LS_WC_MEM_ACCESS 0xf20e +#define DE_IBS_LS_LOCKED_OP 0xf20f +#define DE_IBS_LS_MAB_HIT 0xf210 +#define DE_IBS_LS_L1_DTLB_4K 0xf211 +#define DE_IBS_LS_L1_DTLB_2M 0xf212 +#define DE_IBS_LS_L1_DTLB_1G 0xf213 +#define DE_IBS_LS_L1_DTLB_RES 0xf214 +#define DE_IBS_LS_L2_DTLB_4K 0xf215 +#define DE_IBS_LS_L2_DTLB_2M 0xf216 +#define DE_IBS_LS_L2_DTLB_1G 0xf217 +#define DE_IBS_LS_L2_DTLB_RES2 0xf218 +#define DE_IBS_LS_DC_LOAD_LAT 0xf219 + +#define IBS_OP_LS_BASE 0xf200 +#define IBS_OP_LS_END 0xf219 +#define IBS_OP_LS_MAX (IBS_OP_LS_END - IBS_OP_LS_BASE + 1) +#define IS_IBS_OP_LS(x) (IBS_OP_LS_BASE <= x && x <= IBS_OP_LS_END) +#define IBS_OP_LS_OFFSET(x) (x - IBS_OP_LS_BASE) + + +/** + * The following defines associate a 16-bit select value with an IBS + * derived Northbridge 
(NB) event. + */ +#define DE_IBS_NB_LOCAL 0xf240 +#define DE_IBS_NB_REMOTE 0xf241 +#define DE_IBS_NB_LOCAL_L3 0xf242 +#define DE_IBS_NB_LOCAL_CACHE 0xf243 +#define DE_IBS_NB_REMOTE_CACHE 0xf244 +#define DE_IBS_NB_LOCAL_DRAM 0xf245 +#define DE_IBS_NB_REMOTE_DRAM 0xf246 +#define DE_IBS_NB_LOCAL_OTHER 0xf247 +#define DE_IBS_NB_REMOTE_OTHER 0xf248 +#define DE_IBS_NB_CACHE_STATE_M 0xf249 +#define DE_IBS_NB_CACHE_STATE_O 0xf24a +#define DE_IBS_NB_LOCAL_LATENCY 0xf24b +#define DE_IBS_NB_REMOTE_LATENCY 0xf24c + +#define IBS_OP_NB_BASE 0xf240 +#define IBS_OP_NB_END 0xf24c +#define IBS_OP_NB_MAX (IBS_OP_NB_END - IBS_OP_NB_BASE + 1) +#define IS_IBS_OP_NB(x) (IBS_OP_NB_BASE <= x && x <= IBS_OP_NB_END) +#define IBS_OP_NB_OFFSET(x) (x - IBS_OP_NB_BASE) + + +#define OP_MAX_IBS_COUNTERS (IBS_FETCH_MAX + IBS_OP_MAX + IBS_OP_LS_MAX + IBS_OP_NB_MAX) + + +/** + * These macros decode IBS hardware-level event flags and fields. + * Translation results are either zero (false) or non-zero (true), except + * the fetch latency, which is a 16-bit cycle count, and the fetch page size + * field, which is a 2-bit unsigned integer. + */ + +/** Bits 47:32 IbsFetchLat: instruction fetch latency */ +#define IBS_FETCH_FETCH_LATENCY(x) ((unsigned short)(x->ibs_fetch_ctl_high & FETCH_MASK_LATENCY)) + +/** Bit 50 IbsFetchComp: instruction fetch complete. */ +#define IBS_FETCH_FETCH_COMPLETION(x) ((x->ibs_fetch_ctl_high & FETCH_MASK_COMPLETE) != 0) + +/** Bit 51 IbsIcMiss: instruction cache miss. */ +#define IBS_FETCH_INST_CACHE_MISS(x) ((x->ibs_fetch_ctl_high & FETCH_MASK_IC_MISS) != 0) + +/** Bit 52 IbsPhyAddrValid: instruction fetch physical address valid. */ +#define IBS_FETCH_PHYS_ADDR_VALID(x) ((x->ibs_fetch_ctl_high & FETCH_MASK_PHY_ADDR) != 0) + +/** Bits 54:53 IbsL1TlbPgSz: instruction cache L1TLB page size. */ +#define IBS_FETCH_TLB_PAGE_SIZE(x) ((unsigned short)((x->ibs_fetch_ctl_high >> 21) & 0x3)) + +/** Bit 55 IbsL1TlbMiss: instruction cache L1TLB miss. 
*/ +#define IBS_FETCH_M_L1_TLB_MISS(x) ((x->ibs_fetch_ctl_high & FETCH_MASK_L1_MISS) != 0) + +/** Bit 56 IbsL2TlbMiss: instruction cache L2TLB miss. */ +#define IBS_FETCH_L2_TLB_MISS(x) ((x->ibs_fetch_ctl_high & FETCH_MASK_L2_MISS) != 0) + +/** A fetch is a killed fetch if all the masked bits are clear */ +#define IBS_FETCH_KILLED(x) ((x->ibs_fetch_ctl_high & FETCH_MASK_KILLED) == 0) + +#define IBS_FETCH_INST_CACHE_HIT(x) (IBS_FETCH_FETCH_COMPLETION(x) && !IBS_FETCH_INST_CACHE_MISS(x)) + +#define IBS_FETCH_L1_TLB_HIT(x) (!IBS_FETCH_M_L1_TLB_MISS(x) && IBS_FETCH_PHYS_ADDR_VALID(x)) + +#define IBS_FETCH_ITLB_L1M_L2H(x) (IBS_FETCH_M_L1_TLB_MISS(x) && !IBS_FETCH_L2_TLB_MISS(x)) + +#define IBS_FETCH_ITLB_L1M_L2M(x) (IBS_FETCH_M_L1_TLB_MISS(x) && IBS_FETCH_L2_TLB_MISS(x)) + + +/** + * These macros translate IBS op event data from its hardware-level + * representation. They hide the MSR layout of IBS op data. + */ + +/** + * MSRC001_1035 IBS OP Data Register (IbsOpData) + * + * 15:0 IbsCompToRetCtr: macro-op completion to retire count + */ +#define IBS_OP_COM_TO_RETIRE_CYCLES(x) ((unsigned short)(x->ibs_op_data1_low & BR_MASK_RETIRE)) + +/** 31:16 tag_to_retire_cycles : macro-op tag to retire count. */ +#define IBS_OP_TAG_TO_RETIRE_CYCLES(x) ((unsigned short)((x->ibs_op_data1_low >> 16) & BR_MASK_RETIRE)) + +/** 32 op_branch_resync : resync macro-op. */ +#define IBS_OP_OP_BRANCH_RESYNC(x) ((x->ibs_op_data1_high & BR_MASK_BRN_RESYNC) != 0) + +/** 33 op_mispredict_return : mispredicted return macro-op. */ +#define IBS_OP_OP_MISPREDICT_RETURN(x) ((x->ibs_op_data1_high & BR_MASK_MISP_RETURN) != 0) + +/** 34 IbsOpReturn: return macro-op. */ +#define IBS_OP_OP_RETURN(x) ((x->ibs_op_data1_high & BR_MASK_RETURN) != 0) + +/** 35 IbsOpBrnTaken: taken branch macro-op. */ +#define IBS_OP_OP_BRANCH_TAKEN(x) ((x->ibs_op_data1_high & BR_MASK_BRN_TAKEN) != 0) + +/** 36 IbsOpBrnMisp: mispredicted branch macro-op. 
*/ +#define IBS_OP_OP_BRANCH_MISPREDICT(x) ((x->ibs_op_data1_high & BR_MASK_BRN_MISP) != 0) + +/** 37 IbsOpBrnRet: branch macro-op retired. */ +#define IBS_OP_OP_BRANCH_RETIRED(x) ((x->ibs_op_data1_high & BR_MASK_BRN_RET) != 0) + +/** + * MSRC001_1036 IBS Op Data 2 Register (IbsOpData2) + * + * 5 NbIbsReqCacheHitSt: IBS L3 cache state + */ +#define IBS_OP_NB_IBS_CACHE_HIT_ST(x) ((x->ibs_op_data2_low & NB_MASK_L3_STATE) != 0) + +/** 4 NbIbsReqDstProc: IBS request destination processor */ +#define IBS_OP_NB_IBS_REQ_DST_PROC(x) ((x->ibs_op_data2_low & NB_MASK_REQ_DST_PROC) != 0) + +/** 2:0 NbIbsReqSrc: Northbridge IBS request data source */ +#define IBS_OP_NB_IBS_REQ_SRC(x) ((unsigned char)(x->ibs_op_data2_low & NB_MASK_REQ_DATA_SRC)) + +/** + * MSRC001_1037 IBS Op Data3 Register + * + * Bits 47:32 IbsDcMissLat + */ +#define IBS_OP_DC_MISS_LATENCY(x) ((unsigned short)(x->ibs_op_data3_high & 0xffff)) + +/** 0 IbsLdOp: Load op */ +#define IBS_OP_IBS_LD_OP(x) ((x->ibs_op_data3_low & DC_MASK_LOAD_OP) != 0) + +/** 1 IbsStOp: Store op */ +#define IBS_OP_IBS_ST_OP(x) ((x->ibs_op_data3_low & DC_MASK_STORE_OP) != 0) + +/** 2 ibs_dc_l1_tlb_miss: Data cache L1TLB miss */ +#define IBS_OP_IBS_DC_L1_TLB_MISS(x) ((x->ibs_op_data3_low & DC_MASK_L1_TLB_MISS) != 0) + +/** 3 ibs_dc_l2_tlb_miss: Data cache L2TLB miss */ +#define IBS_OP_IBS_DC_L2_TLB_MISS(x) ((x->ibs_op_data3_low & DC_MASK_L2_TLB_MISS) != 0) + +/** 4 IbsDcL1tlbHit2M: Data cache L1TLB hit in 2M page */ +#define IBS_OP_IBS_DC_L1_TLB_HIT_2MB(x) ((x->ibs_op_data3_low & DC_MASK_L1_HIT_2M) != 0) + +/** 5 ibs_dc_l1_tlb_hit_1gb: Data cache L1TLB hit in 1G page */ +#define IBS_OP_IBS_DC_L1_TLB_HIT_1GB(x) ((x->ibs_op_data3_low & DC_MASK_L1_HIT_1G) != 0) + +/** 6 ibs_dc_l2_tlb_hit_2mb: Data cache L2TLB hit in 2M page */ +#define IBS_OP_IBS_DC_L2_TLB_HIT_2MB(x) ((x->ibs_op_data3_low & DC_MASK_L2_HIT_2M) != 0) + +/** 7 ibs_dc_miss: Data cache miss */ +#define IBS_OP_IBS_DC_MISS(x) ((x->ibs_op_data3_low & DC_MASK_DC_MISS) != 0) + +/** 
8 ibs_dc_miss_acc: Misaligned access */ +#define IBS_OP_IBS_DC_MISS_ACC(x) ((x->ibs_op_data3_low & DC_MASK_MISALIGN_ACCESS) != 0) + +/** 9 ibs_dc_ld_bnk_con: Bank conflict on load operation */ +#define IBS_OP_IBS_DC_LD_BNK_CON(x) ((x->ibs_op_data3_low & DC_MASK_LD_BANK_CONFLICT) != 0) + +/** 10 ibs_dc_st_bnk_con: Bank conflict on store operation */ +#define IBS_OP_IBS_DC_ST_BNK_CON(x) ((x->ibs_op_data3_low & DC_MASK_ST_BANK_CONFLICT) != 0) + +/** 11 ibs_dc_st_to_ld_fwd : Data forwarded from store to load operation */ +#define IBS_OP_IBS_DC_ST_TO_LD_FWD(x) ((x->ibs_op_data3_low & DC_MASK_ST_TO_LD_FOR) != 0) + +/** 12 ibs_dc_st_to_ld_can: Data forwarding from store to load operation cancelled */ +#define IBS_OP_IBS_DC_ST_TO_LD_CAN(x) ((x->ibs_op_data3_low & DC_MASK_ST_TO_LD_CANCEL) != 0) + +/** 13 ibs_dc_uc_mem_acc: UC memory access */ +#define IBS_OP_IBS_DC_UC_MEM_ACC(x) ((x->ibs_op_data3_low & DC_MASK_UC_MEM_ACCESS) != 0) + +/** 14 ibs_dc_wc_mem_acc : WC memory access */ +#define IBS_OP_IBS_DC_WC_MEM_ACC(x) ((x->ibs_op_data3_low & DC_MASK_WC_MEM_ACCESS) != 0) + +/** 15 ibs_locked_op: Locked operation */ +#define IBS_OP_IBS_LOCKED_OP(x) ((x->ibs_op_data3_low & DC_MASK_LOCKED_OP) != 0) + +/** 16 ibs_dc_mab_hit : MAB hit */ +#define IBS_OP_IBS_DC_MAB_HIT(x) ((x->ibs_op_data3_low & DC_MASK_MAB_HIT) != 0) + +/** 17 IbsDcLinAddrValid: Data cache linear address valid */ +#define IBS_OP_IBS_DC_LIN_ADDR_VALID(x) ((x->ibs_op_data3_low & DC_MASK_LIN_ADDR_VALID) != 0) + +/** 18 ibs_dc_phy_addr_valid: Data cache physical address valid */ +#define IBS_OP_IBS_DC_PHY_ADDR_VALID(x) ((x->ibs_op_data3_low & DC_MASK_PHY_ADDR_VALID) != 0) + +/** 19 ibs_dc_l2_tlb_hit_1gb: Data cache L2TLB hit in 1G page */ +#define IBS_OP_IBS_DC_L2_TLB_HIT_1GB(x) ((x->ibs_op_data3_low & DC_MASK_L2_HIT_1G) != 0) + + +/** + * Aggregate the IBS derived event. Increase the + * derived event count by one. 
+ */ +#define AGG_IBS_EVENT(EV) opd_log_ibs_event(EV, trans) + +/** + * Aggregate the IBS latency/cycle counts. Increase the + * derived event count by the specified count value. + */ +#define AGG_IBS_COUNT(EV, COUNT) opd_log_ibs_count(EV, trans, COUNT) + + +#endif /*OPD_IBS_MACRO_H*/ diff -paurN oprofile-base/daemon/opd_ibs_trans.h oprofile-ibs-latest/daemon/opd_ibs_trans.h --- oprofile-base/daemon/opd_ibs_trans.h 1969-12-31 18:00:00.000000000 -0600 +++ oprofile-ibs-latest/daemon/opd_ibs_trans.h 2009-04-13 18:27:27.000000000 -0500 @@ -0,0 +1,31 @@ +/** + * @file daemon/opd_ibs_trans.h + * AMD Family10h Instruction Based Sampling (IBS) translation. + * + * @remark Copyright 2008 OProfile authors + * @remark Read the file COPYING + * + * @author Jason Yeh <jas...@am...> + * @author Paul Drongowski <pau...@am...> + * @author Suravee Suthikulpanit <sur...@am...> + * Copyright (c) 2008 Advanced Micro Devices, Inc. + */ + +#ifndef OPD_IBS_TRANS_H +#define OPD_IBS_TRANS_H + +struct ibs_fetch_sample; +struct ibs_op_sample; +struct transient; + +struct ibs_translation_table { + unsigned int event; + void (*translator)(struct transient *); +}; + + +extern void trans_ibs_fetch (struct transient * trans, unsigned int selected_flag, unsigned int size); +extern void trans_ibs_op (struct transient * trans, unsigned int selected_flag, unsigned int size); +extern void trans_ibs_op_ls (struct transient * trans, unsigned int selected_flag, unsigned int size); +extern void trans_ibs_op_nb (struct transient * trans, unsigned int selected_flag, unsigned int size); +#endif // OPD_IBS_TRANS_H diff -paurN oprofile-base/daemon/opd_ibs_trans.c oprofile-ibs-latest/daemon/opd_ibs_trans.c --- oprofile-base/daemon/opd_ibs_trans.c 1969-12-31 18:00:00.000000000 -0600 +++ oprofile-ibs-latest/daemon/opd_ibs_trans.c 2009-04-13 18:28:13.000000000 -0500 @@ -0,0 +1,554 @@ +/** + * @file daemon/opd_ibs_trans.c + * AMD Family10h Instruction Based Sampling (IBS) translation. 
+ * + * @remark Copyright 2008 OProfile authors + * @remark Read the file COPYING + * + * @author Jason Yeh <jas...@am...> + * @author Paul Drongowski <pau...@am...> + * @author Suravee Suthikulpanit <sur...@am...> + * Copyright (c) 2008 Advanced Micro Devices, Inc. + */ + +#include "opd_ibs.h" +#include "opd_ibs_macro.h" +#include "opd_ibs_trans.h" +#include "opd_trans.h" +#include "opd_printf.h" + +#include <stdlib.h> +#include <stdio.h> + +#define MAX_EVENTS_PER_GROUP 32 + +/* + * --------------------- FETCH DERIVED FUNCTION + */ +void trans_ibs_fetch (struct transient * trans, unsigned int selected_flag, unsigned int size) +{ + struct ibs_fetch_sample * trans_fetch = ((struct ibs_sample*)(trans->ext))->fetch; + unsigned int i, j, mask = 1; + + for (i = IBS_FETCH_BASE, j =0 ; i <= IBS_FETCH_END && j < size ; i++, mask = mask << 1) { + + if ((selected_flag & mask) == 0) + continue; + + j++; + + switch (i) { + + case DE_IBS_FETCH_ALL: + /* IBS all fetch samples (kills + attempts) */ + AGG_IBS_EVENT(DE_IBS_FETCH_ALL); + break; + + case DE_IBS_FETCH_KILLED: + /* IBS killed fetches ("case 0") -- All interesting event + * flags are clear */ + if (IBS_FETCH_KILLED(trans_fetch)) + AGG_IBS_EVENT(DE_IBS_FETCH_KILLED); + break; + + case DE_IBS_FETCH_ATTEMPTED: + /* Any non-killed fetch is an attempted fetch */ + if (!IBS_FETCH_KILLED(trans_fetch)) + AGG_IBS_EVENT(DE_IBS_FETCH_ATTEMPTED); + break; + + case DE_IBS_FETCH_COMPLETED: + if (IBS_FETCH_FETCH_COMPLETION(trans_fetch)) + /* IBS Fetch Completed */ + AGG_IBS_EVENT(DE_IBS_FETCH_COMPLETED); + break; + + case DE_IBS_FETCH_ABORTED: + if (!IBS_FETCH_FETCH_COMPLETION(trans_fetch)) + /* IBS Fetch Aborted */ + AGG_IBS_EVENT(DE_IBS_FETCH_ABORTED); + break; + + case DE_IBS_L1_ITLB_HIT: + /* IBS L1 ITLB hit */ + if (IBS_FETCH_L1_TLB_HIT(trans_fetch)) + AGG_IBS_EVENT(DE_IBS_L1_ITLB_HIT); + break; + + case DE_IBS_ITLB_L1M_L2H: + /* IBS L1 ITLB miss and L2 ITLB hit */ + if (IBS_FETCH_ITLB_L1M_L2H(trans_fetch)) + AGG_IBS_EVENT(DE_IBS_ITLB_L1M_L2H); + break; + + case 
DE_IBS_ITLB_L1M_L2M: + /* IBS L1 & L2 ITLB miss; complete ITLB miss */ + if (IBS_FETCH_ITLB_L1M_L2M(trans_fetch)) + AGG_IBS_EVENT(DE_IBS_ITLB_L1M_L2M); + break; + + case DE_IBS_IC_MISS: + /* IBS instruction cache miss */ + if (IBS_FETCH_INST_CACHE_MISS(trans_fetch)) + AGG_IBS_EVENT(DE_IBS_IC_MISS); + break; + + case DE_IBS_IC_HIT: + /* IBS instruction cache hit */ + if (IBS_FETCH_INST_CACHE_HIT(trans_fetch)) + AGG_IBS_EVENT(DE_IBS_IC_HIT); + break; + + case DE_IBS_FETCH_4K_PAGE: + if (IBS_FETCH_PHYS_ADDR_VALID(trans_fetch) + && IBS_FETCH_TLB_PAGE_SIZE(trans_fetch) == L1TLB4K) + AGG_IBS_EVENT(DE_IBS_FETCH_4K_PAGE); + break; + + case DE_IBS_FETCH_2M_PAGE: + if (IBS_FETCH_PHYS_ADDR_VALID(trans_fetch) + && IBS_FETCH_TLB_PAGE_SIZE(trans_fetch) == L1TLB2M) + AGG_IBS_EVENT(DE_IBS_FETCH_2M_PAGE); + break; + + case DE_IBS_FETCH_1G_PAGE: + if (IBS_FETCH_PHYS_ADDR_VALID(trans_fetch) + && IBS_FETCH_TLB_PAGE_SIZE(trans_fetch) == L1TLB1G) + AGG_IBS_EVENT(DE_IBS_FETCH_1G_PAGE); + break; + + case DE_IBS_FETCH_XX_PAGE: + break; + + case DE_IBS_FETCH_LATENCY: + if (IBS_FETCH_FETCH_LATENCY(trans_fetch)) + AGG_IBS_COUNT(DE_IBS_FETCH_LATENCY, + IBS_FETCH_FETCH_LATENCY(trans_fetch)); + break; + default: + break; + } + } +} + +/* + * --------------------- OP DERIVED FUNCTION + */ +void trans_ibs_op (struct transient * trans, unsigned int selected_flag, unsigned int size) +{ + struct ibs_op_sample * trans_op = ((struct ibs_sample*)(trans->ext))->op; + unsigned int i, j, mask = 1; + + for (i = IBS_OP_BASE, j =0 ; i <= IBS_OP_END && j < size ; i++, mask = mask << 1) { + + if ((selected_flag & mask) == 0) + continue; + + j++; + + switch (i) { + + case DE_IBS_OP_ALL: + /* All IBS op samples */ + AGG_IBS_EVENT(DE_IBS_OP_ALL); + break; + + case DE_IBS_OP_TAG_TO_RETIRE: + /* Tally retire cycle counts for all sampled macro-ops + * IBS tag to retire cycles */ + if (IBS_OP_TAG_TO_RETIRE_CYCLES(trans_op)) + AGG_IBS_COUNT(DE_IBS_OP_TAG_TO_RETIRE, + IBS_OP_TAG_TO_RETIRE_CYCLES(trans_op)); + break; + + 
case DE_IBS_OP_COMP_TO_RETIRE: + /* IBS completion to retire cycles */ + if (IBS_OP_COM_TO_RETIRE_CYCLES(trans_op)) + AGG_IBS_COUNT(DE_IBS_OP_COMP_TO_RETIRE, + IBS_OP_COM_TO_RETIRE_CYCLES(trans_op)); + break; + + case DE_IBS_BRANCH_RETIRED: + if (IBS_OP_OP_BRANCH_RETIRED(trans_op)) + /* IBS Branch retired op */ + AGG_IBS_EVENT(DE_IBS_BRANCH_RETIRED) ; + break; + + case DE_IBS_BRANCH_MISP: + if (IBS_OP_OP_BRANCH_RETIRED(trans_op) + /* Test branch-specific event flags */ + /* IBS mispredicted Branch op */ + && IBS_OP_OP_BRANCH_MISPREDICT(trans_op)) + AGG_IBS_EVENT(DE_IBS_BRANCH_MISP) ; + break; + + case DE_IBS_BRANCH_TAKEN: + if (IBS_OP_OP_BRANCH_RETIRED(trans_op) + /* IBS taken Branch op */ + && IBS_OP_OP_BRANCH_TAKEN(trans_op)) + AGG_IBS_EVENT(DE_IBS_BRANCH_TAKEN); + break; + + case DE_IBS_BRANCH_MISP_TAKEN: + if (IBS_OP_OP_BRANCH_RETIRED(trans_op) + /* IBS mispredicted taken branch op */ + && IBS_OP_OP_BRANCH_TAKEN(trans_op) + && IBS_OP_OP_BRANCH_MISPREDICT(trans_op)) + AGG_IBS_EVENT(DE_IBS_BRANCH_MISP_TAKEN); + break; + + case DE_IBS_RETURN: + if (IBS_OP_OP_BRANCH_RETIRED(trans_op) + /* IBS return op */ + && IBS_OP_OP_RETURN(trans_op)) + AGG_IBS_EVENT(DE_IBS_RETURN); + break; + + case DE_IBS_RETURN_MISP: + if (IBS_OP_OP_BRANCH_RETIRED(trans_op) + /* IBS mispredicted return op */ + && IBS_OP_OP_RETURN(trans_op) + && IBS_OP_OP_BRANCH_MISPREDICT(trans_op)) + AGG_IBS_EVENT(DE_IBS_RETURN_MISP); + break; + + case DE_IBS_RESYNC: + /* Test for a resync macro-op */ + if (IBS_OP_OP_BRANCH_RESYNC(trans_op)) + AGG_IBS_EVENT(DE_IBS_RESYNC); + break; + default: + break; + } + } +} + + +/* + * --------------------- OP LS DERIVED FUNCTION + */ +void trans_ibs_op_ls (struct transient * trans, unsigned int selected_flag, unsigned int size) +{ + struct ibs_op_sample * trans_op = ((struct ibs_sample*)(trans->ext))->op; + unsigned int i, j, mask = 1; + + /* Preliminary check */ + if (!IBS_OP_IBS_LD_OP(trans_op) && !IBS_OP_IBS_ST_OP(trans_op)) + return; + + + for (i = IBS_OP_LS_BASE, 
j =0 ; i <= IBS_OP_LS_END && j < size ; i++, mask = mask << 1) { + + if ((selected_flag & mask) == 0) + continue; + + j++; + + switch (i) { + + case DE_IBS_LS_ALL_OP: + /* Count the number of LS op samples */ + AGG_IBS_EVENT(DE_IBS_LS_ALL_OP) ; + break; + + case DE_IBS_LS_LOAD_OP: + if (IBS_OP_IBS_LD_OP(trans_op)) + /* TALLy an IBS load derived event */ + AGG_IBS_EVENT(DE_IBS_LS_LOAD_OP) ; + break; + + case DE_IBS_LS_STORE_OP: + if (IBS_OP_IBS_ST_OP(trans_op)) + /* Count and handle store operations */ + AGG_IBS_EVENT(DE_IBS_LS_STORE_OP); + break; + + case DE_IBS_LS_DTLB_L1H: + if (IBS_OP_IBS_DC_LIN_ADDR_VALID(trans_op) + && !IBS_OP_IBS_DC_L1_TLB_MISS(trans_op)) + /* L1 DTLB hit -- This is the most frequent case */ + AGG_IBS_EVENT(DE_IBS_LS_DTLB_L1H); + break; + + case DE_IBS_LS_DTLB_L1M_L2H: + /* l2_translation_size = 1 */ + if (IBS_OP_IBS_DC_LIN_ADDR_VALID(trans_op) + && IBS_OP_IBS_DC_L1_TLB_MISS(trans_op) + && !IBS_OP_IBS_DC_L2_TLB_MISS(trans_op)) + /* L1 DTLB miss, L2 DTLB hit */ + AGG_IBS_EVENT(DE_IBS_LS_DTLB_L1M_L2H); + break; + + case DE_IBS_LS_DTLB_L1M_L2M: + if (IBS_OP_IBS_DC_LIN_ADDR_VALID(trans_op) + && IBS_OP_IBS_DC_L1_TLB_MISS(trans_op) + && IBS_OP_IBS_DC_L2_TLB_MISS(trans_op)) + /* L1 DTLB miss, L2 DTLB miss */ + AGG_IBS_EVENT(DE_IBS_LS_DTLB_L1M_L2M); + break; + + case DE_IBS_LS_DC_MISS: + if (IBS_OP_IBS_DC_MISS(trans_op)) + AGG_IBS_EVENT(DE_IBS_LS_DC_MISS); + break; + + case DE_IBS_LS_DC_HIT: + if (!IBS_OP_IBS_DC_MISS(trans_op)) + AGG_IBS_EVENT(DE_IBS_LS_DC_HIT); + break; + + case DE_IBS_LS_MISALIGNED: + if (IBS_OP_IBS_DC_MISS_ACC(trans_op)) + AGG_IBS_EVENT(DE_IBS_LS_MISALIGNED); + break; + + case DE_IBS_LS_BNK_CONF_LOAD: + if (IBS_OP_IBS_DC_LD_BNK_CON(trans_op)) + AGG_IBS_EVENT(DE_IBS_LS_BNK_CONF_LOAD); + break; + + case DE_IBS_LS_BNK_CONF_STORE: + if (IBS_OP_IBS_DC_ST_BNK_CON(trans_op)) + AGG_IBS_EVENT(DE_IBS_LS_BNK_CONF_STORE); + break; + + case DE_IBS_LS_STL_FORWARDED: + if (IBS_OP_IBS_LD_OP(trans_op) + /* Data forwarding info are valid only for 
load ops */ + && IBS_OP_IBS_DC_ST_TO_LD_FWD(trans_op)) + AGG_IBS_EVENT(DE_IBS_LS_STL_FORWARDED) ; + break; + + case DE_IBS_LS_STL_CANCELLED: + if (IBS_OP_IBS_LD_OP(trans_op)) + if (IBS_OP_IBS_DC_ST_TO_LD_CAN(trans_op)) + AGG_IBS_EVENT(DE_IBS_LS_STL_CANCELLED) ; + break; + + case DE_IBS_LS_UC_MEM_ACCESS: + if (IBS_OP_IBS_DC_UC_MEM_ACC(trans_op)) + AGG_IBS_EVENT(DE_IBS_LS_UC_MEM_ACCESS); + break; + + case DE_IBS_LS_WC_MEM_ACCESS: + if (IBS_OP_IBS_DC_WC_MEM_ACC(trans_op)) + AGG_IBS_EVENT(DE_IBS_LS_WC_MEM_ACCESS); + break; + + case DE_IBS_LS_LOCKED_OP: + if (IBS_OP_IBS_LOCKED_OP(trans_op)) + AGG_IBS_EVENT(DE_IBS_LS_LOCKED_OP); + break; + + case DE_IBS_LS_MAB_HIT: + if (IBS_OP_IBS_DC_MAB_HIT(trans_op)) + AGG_IBS_EVENT(DE_IBS_LS_MAB_HIT); + break; + + case DE_IBS_LS_L1_DTLB_4K: + /* l1_translation */ + if (IBS_OP_IBS_DC_LIN_ADDR_VALID(trans_op) + && !IBS_OP_IBS_DC_L1_TLB_MISS(trans_op) + + && !IBS_OP_IBS_DC_L1_TLB_HIT_2MB(trans_op) + && !IBS_OP_IBS_DC_L1_TLB_HIT_1GB(trans_op)) + /* This is the most common case, unfortunately */ + AGG_IBS_EVENT(DE_IBS_LS_L1_DTLB_4K) ; + break; + + case DE_IBS_LS_L1_DTLB_2M: + /* l1_translation */ + if (IBS_OP_IBS_DC_LIN_ADDR_VALID(trans_op) + && !IBS_OP_IBS_DC_L1_TLB_MISS(trans_op) + + && IBS_OP_IBS_DC_L1_TLB_HIT_2MB(trans_op)) + /* 2M L1 DTLB page translation */ + AGG_IBS_EVENT(DE_IBS_LS_L1_DTLB_2M); + break; + + case DE_IBS_LS_L1_DTLB_1G: + /* l1_translation */ + if (IBS_OP_IBS_DC_LIN_ADDR_VALID(trans_op) + && !IBS_OP_IBS_DC_L1_TLB_MISS(trans_op) + + && !IBS_OP_IBS_DC_L1_TLB_HIT_2MB(trans_op) + && IBS_OP_IBS_DC_L1_TLB_HIT_1GB(trans_op)) + /* 1G L1 DTLB page translation */ + AGG_IBS_EVENT(DE_IBS_LS_L1_DTLB_1G); + break; + + case DE_IBS_LS_L1_DTLB_RES: + break; + + case DE_IBS_LS_L2_DTLB_4K: + /* l2_translation_size = 1 */ + if (IBS_OP_IBS_DC_LIN_ADDR_VALID(trans_op) + && IBS_OP_IBS_DC_L1_TLB_MISS(trans_op) + && !IBS_OP_IBS_DC_L2_TLB_MISS(trans_op) + + /* L2 DTLB page translation */ + && !IBS_OP_IBS_DC_L2_TLB_HIT_2MB(trans_op) + && 
!IBS_OP_IBS_DC_L2_TLB_HIT_1GB(trans_op)) + /* 4K L2 DTLB page translation */ + AGG_IBS_EVENT(DE_IBS_LS_L2_DTLB_4K); + break; + + case DE_IBS_LS_L2_DTLB_2M: + /* l2_translation_size = 1 */ + if (IBS_OP_IBS_DC_LIN_ADDR_VALID(trans_op) + && IBS_OP_IBS_DC_L1_TLB_MISS(trans_op) + && !IBS_OP_IBS_DC_L2_TLB_MISS(trans_op) + + /* L2 DTLB page translation */ + && IBS_OP_IBS_DC_L2_TLB_HIT_2MB(trans_op) + && !IBS_OP_IBS_DC_L2_TLB_HIT_1GB(trans_op)) + /* 2M L2 DTLB page translation */ + AGG_IBS_EVENT(DE_IBS_LS_L2_DTLB_2M); + break; + + case DE_IBS_LS_L2_DTLB_1G: + /* l2_translation_size = 1 */ + if (IBS_OP_IBS_DC_LIN_ADDR_VALID(trans_op) + && IBS_OP_IBS_DC_L1_TLB_MISS(trans_op) + && !IBS_OP_IBS_DC_L2_TLB_MISS(trans_op) + + /* L2 DTLB page translation */ + && !IBS_OP_IBS_DC_L2_TLB_HIT_2MB(trans_op) + && IBS_OP_IBS_DC_L2_TLB_HIT_1GB(trans_op)) + /* 1G L2 DTLB page translation */ + AGG_IBS_EVENT(DE_IBS_LS_L2_DTLB_1G); + break; + + case DE_IBS_LS_L2_DTLB_RES2: + break; + + case DE_IBS_LS_DC_LOAD_LAT: + if (IBS_OP_IBS_LD_OP(trans_op) + /* If the load missed in DC, tally the DC load miss latency */ + && IBS_OP_IBS_DC_MISS(trans_op)) + /* DC load miss latency is only reliable for load ops */ + AGG_IBS_COUNT(DE_IBS_LS_DC_LOAD_LAT, + IBS_OP_DC_MISS_LATENCY(trans_op)) ; + break; + + default: + break; + } + } +} + +/* + * --------------------- OP NB DERIVED FUNCTION + * + * NB data is only guaranteed reliable for load operations + * that miss in L1 and L2 cache. 
NB data arrives too late + * to be reliable for store operations + */ +void trans_ibs_op_nb (struct transient * trans, unsigned int selected_flag, unsigned int size) +{ + struct ibs_op_sample * trans_op = ((struct ibs_sample*)(trans->ext))->op; + unsigned int i, j, mask = 1; + + /* Preliminary check */ + if (!IBS_OP_IBS_LD_OP(trans_op)) + return; + + if (!IBS_OP_IBS_DC_MISS(trans_op)) + return; + + if (IBS_OP_NB_IBS_REQ_SRC(trans_op) == 0) + return; + + for (i = IBS_OP_NB_BASE, j =0 ; i <= IBS_OP_NB_END && j < size ; i++, mask = mask << 1) { + + if ((selected_flag & m... [truncated message content] |
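For readers following the trans_ibs_* functions in the patch above, here is a minimal standalone sketch of the selection walk they share: bit n of selected_flag selects the derived event at group base + n, and the loop stops once "size" selected events have been handled. The mask and event values are copied from opd_ibs_macro.h as quoted in the patch. The names sketch_fetch_sample, sketch_tally and sketch_trans_fetch are hypothetical and exist only for this illustration; in particular the tally array stands in for opd_log_ibs_event(), whose definition is not shown in this message.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Constants copied from opd_ibs_macro.h as quoted in the patch. */
#define FETCH_MASK_COMPLETE    0x00040000
#define FETCH_MASK_IC_MISS     0x00080000
#define IBS_FETCH_BASE         0xf000
#define IBS_FETCH_END          0xf00e
#define IBS_FETCH_MAX          (IBS_FETCH_END - IBS_FETCH_BASE + 1)
#define DE_IBS_FETCH_ALL       0xf000
#define DE_IBS_FETCH_COMPLETED 0xf003
#define DE_IBS_IC_MISS         0xf008

/* Hypothetical stand-in for struct ibs_fetch_sample: only the upper
 * 32 bits of the IBS fetch control MSR are kept here. */
struct sketch_fetch_sample {
	uint32_t ibs_fetch_ctl_high;
};

/* Hypothetical replacement for opd_log_ibs_event(): one counter per
 * derived fetch event, indexed by (event - IBS_FETCH_BASE). */
struct sketch_tally {
	unsigned long count[IBS_FETCH_MAX];
};

/* Walk the fetch event group. Bit n of selected_flag selects event
 * IBS_FETCH_BASE + n; size is the number of selected events, so the
 * loop can stop as soon as all of them have been visited. */
static void sketch_trans_fetch(const struct sketch_fetch_sample *s,
			       unsigned int selected_flag, unsigned int size,
			       struct sketch_tally *t)
{
	unsigned int i, j = 0, mask = 1;

	assert(s != NULL && t != NULL);

	for (i = IBS_FETCH_BASE; i <= IBS_FETCH_END && j < size; i++, mask <<= 1) {
		if ((selected_flag & mask) == 0)
			continue;	/* event not requested by the user */

		j++;

		switch (i) {
		case DE_IBS_FETCH_ALL:
			/* Every fetch sample counts */
			t->count[i - IBS_FETCH_BASE]++;
			break;
		case DE_IBS_FETCH_COMPLETED:
			if (s->ibs_fetch_ctl_high & FETCH_MASK_COMPLETE)
				t->count[i - IBS_FETCH_BASE]++;
			break;
		case DE_IBS_IC_MISS:
			if (s->ibs_fetch_ctl_high & FETCH_MASK_IC_MISS)
				t->count[i - IBS_FETCH_BASE]++;
			break;
		default:
			/* Remaining derived events omitted in this sketch */
			break;
		}
	}
}
```

To try it, call sketch_trans_fetch() with selected_flag = 0x109 (bits 0, 3 and 8, i.e. DE_IBS_FETCH_ALL, DE_IBS_FETCH_COMPLETED and DE_IBS_IC_MISS) and size = 3 on a sample whose ibs_fetch_ctl_high is FETCH_MASK_COMPLETE. Passing the selection as a bitmask plus a count keeps the per-sample cost proportional to the number of events the user actually asked for.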
From: Suravee S. <sur...@am...> - 2009-04-14 17:49:36
|
This patch contains - IBS events and unitmask. - New family10h events (Pub 31116 Revision 3.20) events | 101 ++++++++++++++++++++++++++++++++++++++++++++++++++++++------- unit_masks | 46 ++++++++++++++++++--------- 2 files changed, 120 insertions(+), 27 deletions(-) Signed-off-by: Suravee Suthikulpanit <sur...@am...> --- diff -paurN oprofile-base/events/x86-64/family10/events oprofile-ibs-latest/events/x86-64/family10/events --- oprofile-base/events/x86-64/family10/events 2008-11-24 10:39:31.000000000 -0600 +++ oprofile-ibs-latest/events/x86-64/family10/events 2009-04-07 14:29:27.000000000 -0500 @@ -1,4 +1,3 @@ -# # AMD Family 10 processor performance events # # Copyright OProfile authors @@ -8,12 +7,17 @@ # Suravee Suthikulpanit <suravee.suthikulpanit at amd.com> # # Sources: BIOS and Kernel Developer's Guide for AMD Family 10h Processors, -# Publication# 31116, Revision 3.00, 7 September 2007 +# Publication# 31116, Revision 3.20, February 04, 2009 # # Software Optimization Guide for AMD Family 10h Processors, # Publication# 40546, Revision 3.04, September 2007 # -# This file was last updated on 11 January 2008. +# Revision: 1.1 +# +# ChangeLog: 06 April 2009. 
+# - Add IBS-derived events +# - Update from BKDG Rev 3.00 to Rev 3.20 +# - Add Events 165h, 1c0h, 1cfh, 1d3h-1d5h # # Floating point events event:0x00 counters:0,1,2,3 um:fpu_ops minimum:500 name:DISPATCHED_FPU_OPS : Dispatched FPU ops @@ -59,10 +63,11 @@ event:0x65 counters:0,1,2,3 um:memreqtyp event:0x67 counters:0,1,2,3 um:dataprefetch minimum:500 name:DATA_PREFETCHES : Data prefetcher event:0x6c counters:0,1,2,3 um:systemreadresponse minimum:500 name:NORTHBRIDGE_READ_RESPONSES : Northbridge read responses by coherency state event:0x6d counters:0,1,2,3 um:octword_transfer minimum:500 name:OCTWORD_WRITE_TRANSFERS : Octwords written to system -event:0x76 counters:0,1,2,3 um:zero minimum:3000 name:CPU_CLK_UNHALTED : Cycles outside of halt state +event:0x76 counters:0,1,2,3 um:zero minimum:50000 name:CPU_CLK_UNHALTED : Cycles outside of halt state event:0x7d counters:0,1,2,3 um:l2_internal minimum:500 name:REQUESTS_TO_L2 : Requests to L2 Cache event:0x7e counters:0,1,2,3 um:l2_req_miss minimum:500 name:L2_CACHE_MISS : L2 cache misses event:0x7f counters:0,1,2,3 um:l2_fill minimum:500 name:L2_CACHE_FILL_WRITEBACK : L2 fill/writeback +event:0x165 counters:0,1,2,3 um:page_size_mismatches minimum:500 name:PAGE_SIZE_MISMATCHES : Page Size Mismatches # Instruction Cache events event:0x80 counters:0,1,2,3 um:zero minimum:500 name:INSTRUCTION_CACHE_FETCHES : Instruction cache fetches (RevE) @@ -81,7 +86,7 @@ event:0x99 counters:0,1,2,3 um:zero mini event:0x9a counters:0,1,2,3 um:zero minimum:500 name:ITLB_RELOADS_ABORTED : The number of ITLB reloads aborted # Execution Unit events -event:0xc0 counters:0,1,2,3 um:zero minimum:3000 name:RETIRED_INSTRUCTIONS : Retired instructions (includes exceptions, interrupts, re-syncs) +event:0xc0 counters:0,1,2,3 um:zero minimum:50000 name:RETIRED_INSTRUCTIONS : Retired instructions (includes exceptions, interrupts, re-syncs) event:0xc1 counters:0,1,2,3 um:zero minimum:500 name:RETIRED_UOPS : Retired micro-ops event:0xc2 
counters:0,1,2,3 um:zero minimum:500 name:RETIRED_BRANCH_INSTRUCTIONS : Retired branches (conditional, unconditional, exceptions, interrupts) event:0xc3 counters:0,1,2,3 um:zero minimum:500 name:RETIRED_MISPREDICTED_BRANCH_INSTRUCTIONS : Retired mispredicted branch instructions @@ -96,7 +101,7 @@ event:0xcb counters:0,1,2,3 um:fpu_instr event:0xcc counters:0,1,2,3 um:fpu_fastpath minimum:500 name:RETIRED_FASTPATH_DOUBLE_OP_INSTRUCTIONS : Retired FastPath double-op instructions event:0xcd counters:0,1,2,3 um:zero minimum:500 name:INTERRUPTS_MASKED_CYCLES : Cycles with interrupts masked (IF=0) event:0xce counters:0,1,2,3 um:zero minimum:500 name:INTERRUPTS_MASKED_CYCLES_WITH_INTERRUPT_PENDING : Cycles with interrupts masked while interrupt pending -event:0xcf counters:0,1,2,3 um:zero minimum:10 name:INTERRUPTS_TAKEN : Number of taken hardware interrupts +event:0xcf counters:0,1,2,3 um:zero minimum:500 name:INTERRUPTS_TAKEN : Number of taken hardware interrupts event:0xd0 counters:0,1,2,3 um:zero minimum:500 name:DECODER_EMPTY : Nothing to dispatch (decoder empty) event:0xd1 counters:0,1,2,3 um:zero minimum:500 name:DISPATCH_STALLS : Dispatch stalls event:0xd2 counters:0,1,2,3 um:zero minimum:500 name:DISPATCH_STALL_FOR_BRANCH_ABORT : Dispatch stall from branch abort to retire @@ -108,11 +113,16 @@ event:0xd7 counters:0,1,2,3 um:zero mini event:0xd8 counters:0,1,2,3 um:zero minimum:500 name:DISPATCH_STALL_FOR_LS_FULL : Dispatch stall when LS is full event:0xd9 counters:0,1,2,3 um:zero minimum:500 name:DISPATCH_STALL_WAITING_FOR_ALL_QUIET : Dispatch stall when waiting for all to be quiet event:0xda counters:0,1,2,3 um:zero minimum:500 name:DISPATCH_STALL_FOR_FAR_TRANSFER_OR_RESYNC : Dispatch Stall for Far Transfer or Resync to Retire -event:0xdb counters:0,1,2,3 um:fpu_exceptions minimum:1 name:FPU_EXCEPTIONS : FPU exceptions -event:0xdc counters:0,1,2,3 um:zero minimum:1 name:DR0_BREAKPOINTS : The number of matches on the address in breakpoint register DR0 -event:0xdd 
counters:0,1,2,3 um:zero minimum:1 name:DR1_BREAKPOINTS : The number of matches on the address in breakpoint register DR1 -event:0xde counters:0,1,2,3 um:zero minimum:1 name:DR2_BREAKPOINTS : The number of matches on the address in breakpoint register DR2 -event:0xdf counters:0,1,2,3 um:zero minimum:1 name:DR3_BREAKPOINTS : The number of matches on the address in breakpoint register DR3 +event:0xdb counters:0,1,2,3 um:fpu_exceptions minimum:500 name:FPU_EXCEPTIONS : FPU exceptions +event:0xdc counters:0,1,2,3 um:zero minimum:500 name:DR0_BREAKPOINTS : The number of matches on the address in breakpoint register DR0 +event:0xdd counters:0,1,2,3 um:zero minimum:500 name:DR1_BREAKPOINTS : The number of matches on the address in breakpoint register DR1 +event:0xde counters:0,1,2,3 um:zero minimum:500 name:DR2_BREAKPOINTS : The number of matches on the address in breakpoint register DR2 +event:0xdf counters:0,1,2,3 um:zero minimum:500 name:DR3_BREAKPOINTS : The number of matches on the address in breakpoint register DR3 +event:0x1c0 counters:0,1,2,3 um:retired_x87_fp minimum:500 name:RETIRED_X87_FLOATING_POINT_OPERATIONS : Retired x87 Floating Point Operations (RevC and later) +event:0x1cf counters:0,1,2,3 um:zero minimum:50000 name:IBS_OPS_TAGGED : IBS Ops Tagged (RevC and later) +event:0x1d3 counters:0,1,2,3 um:zero minimum:500 name:LFENCE_INSTRUCTIONS_RETIRED : LFENCE Instructions Retired (RevC and later) +event:0x1d4 counters:0,1,2,3 um:zero minimum:500 name:SFENCE_INSTRUCTIONS_RETIRED : SFENCE Instructions Retired (RevC and later) +event:0x1d5 counters:0,1,2,3 um:zero minimum:500 name:MFENCE_INSTRUCTIONS_RETIRED : MFENCE Instructions Retired (RevC and later) # Memory Controler events event:0xe0 counters:0,1,2,3 um:page_access minimum:500 name:DRAM_ACCESSES : DRAM accesses @@ -149,3 +159,72 @@ event:0x4e0 counters:0,1,2,3 um:l3_cache event:0x4e1 counters:0,1,2,3 um:l3_cache minimum:500 name:L3_CACHE_MISSES : Number of L3 cache misses from each core event:0x4e2 
counters:0,1,2,3 um:l3_fill minimum:500 name:L3_FILLS_CAUSED_BY_L2_EVICTIONS : Number of L3 fills caused by L2 evictions per core event:0x4e3 counters:0,1,2,3 um:l3_evict minimum:500 name:L3_EVICTIONS : Number of L3 cache line evictions by cache state + +############################### +# IBS FETCH EVENTS +############################### +event:0xf000 ext:ibs_fetch um:zero minimum:50000 name:IBS_FETCH_ALL : All IBS fetch samples +event:0xf001 ext:ibs_fetch um:zero minimum:50000 name:IBS_FETCH_KILLED : IBS fetch killed +event:0xf002 ext:ibs_fetch um:zero minimum:50000 name:IBS_FETCH_ATTEMPTED : IBS fetch attempted +event:0xf003 ext:ibs_fetch um:zero minimum:50000 name:IBS_FETCH_COMPLETED : IBS fetch completed +event:0xf004 ext:ibs_fetch um:zero minimum:50000 name:IBS_FETCH_ABORTED : IBS fetch aborted +event:0xf005 ext:ibs_fetch um:zero minimum:50000 name:IBS_FETCH_ITLB_HITS : IBS ITLB hit +event:0xf006 ext:ibs_fetch um:zero minimum:50000 name:IBS_FETCH_L1_ITLB_MISSES_L2_ITLB_HITS : IBS L1 ITLB misses (and L2 ITLB hits) +event:0xf007 ext:ibs_fetch um:zero minimum:50000 name:IBS_FETCH_L1_ITLB_MISSES_L2_ITLB_MISSES : IBS L1 L2 ITLB miss +event:0xf008 ext:ibs_fetch um:zero minimum:50000 name:IBS_FETCH_ICACHE_MISSES : IBS Instruction cache misses +event:0xf009 ext:ibs_fetch um:zero minimum:50000 name:IBS_FETCH_ICACHE_HITS : IBS Instruction cache hit +event:0xf00A ext:ibs_fetch um:zero minimum:50000 name:IBS_FETCH_4K_PAGE : IBS 4K page translation +event:0xf00B ext:ibs_fetch um:zero minimum:50000 name:IBS_FETCH_2M_PAGE : IBS 2M page translation +# +event:0xf00E ext:ibs_fetch um:zero minimum:50000 name:IBS_FETCH_LATENCY : IBS fetch latency + +############################### +# IBS OP EVENTS +############################### +event:0xf100 ext:ibs_op um:ibs_op minimum:50000 name:IBS_OP_ALL : All IBS op samples +event:0xf101 ext:ibs_op um:ibs_op minimum:50000 name:IBS_OP_TAG_TO_RETIRE : IBS tag-to-retire cycles +event:0xf102 ext:ibs_op um:ibs_op minimum:50000 
name:IBS_OP_COMP_TO_RET : IBS completion-to-retire cycles +event:0xf103 ext:ibs_op um:ibs_op minimum:50000 name:IBS_OP_BRANCH_RETIRED : IBS branch op +event:0xf104 ext:ibs_op um:ibs_op minimum:50000 name:IBS_OP_MISPREDICTED_BRANCH : IBS mispredicted branch op +event:0xf105 ext:ibs_op um:ibs_op minimum:50000 name:IBS_OP_TAKEN_BRANCH : IBS taken branch op +event:0xf106 ext:ibs_op um:ibs_op minimum:50000 name:IBS_OP_MISPREDICTED_BRANCH_TAKEN : IBS mispredicted taken branch op +event:0xf107 ext:ibs_op um:ibs_op minimum:50000 name:IBS_OP_RETURNS : IBS return op +event:0xf108 ext:ibs_op um:ibs_op minimum:50000 name:IBS_OP_MISPREDICTED_RETURNS : IBS mispredicted return op +event:0xf109 ext:ibs_op um:ibs_op minimum:50000 name:IBS_OP_RESYNC : IBS resync op +event:0xf200 ext:ibs_op um:ibs_op minimum:50000 name:IBS_OP_ALL_LOAD_STORE : IBS all load store ops +event:0xf201 ext:ibs_op um:ibs_op minimum:50000 name:IBS_OP_LOAD : IBS load ops +event:0xf202 ext:ibs_op um:ibs_op minimum:50000 name:IBS_OP_STORE : IBS store ops +event:0xf203 ext:ibs_op um:ibs_op minimum:50000 name:IBS_OP_L1_DTLB_HITS : IBS L1 DTLB hit +event:0xf204 ext:ibs_op um:ibs_op minimum:50000 name:IBS_OP_L1_DTLB_MISS_L2_DTLB_HIT : IBS L1 DTLB misses L2 hits +event:0xf205 ext:ibs_op um:ibs_op minimum:50000 name:IBS_OP_L1_L2_DTLB_MISS : IBS L1 and L2 DTLB misses +event:0xf206 ext:ibs_op um:ibs_op minimum:50000 name:IBS_OP_DATA_CACHE_MISS : IBS data cache misses +event:0xf207 ext:ibs_op um:ibs_op minimum:50000 name:IBS_OP_DATA_HITS : IBS data cache hits +event:0xf208 ext:ibs_op um:ibs_op minimum:50000 name:IBS_OP_MISALIGNED_DATA_ACC : IBS misaligned data access +event:0xf209 ext:ibs_op um:ibs_op minimum:50000 name:IBS_OP_BANK_CONF_LOAD : IBS bank conflict on load op +event:0xf20A ext:ibs_op um:ibs_op minimum:50000 name:IBS_OP_BANK_CONF_STORE : IBS bank conflict on store op +event:0xf20B ext:ibs_op um:ibs_op minimum:50000 name:IBS_OP_FORWARD : IBS store-to-load forwarded +event:0xf20C ext:ibs_op um:ibs_op 
minimum:50000 name:IBS_OP_CANCELLED : IBS store-to-load cancelled +event:0xf20D ext:ibs_op um:ibs_op minimum:50000 name:IBS_OP_DCUC_MEM_ACC : IBS UC memory access +event:0xf20E ext:ibs_op um:ibs_op minimum:50000 name:IBS_OP_DCWC_MEM_ACC : IBS WC memory access +event:0xf20F ext:ibs_op um:ibs_op minimum:50000 name:IBS_OP_LOCKED : IBS locked operation +event:0xf210 ext:ibs_op um:ibs_op minimum:50000 name:IBS_OP_MAB_HIT : IBS MAB hit +event:0xf211 ext:ibs_op um:ibs_op minimum:50000 name:IBS_OP_L1_DTLB_4K : IBS L1 DTLB 4K page +event:0xf212 ext:ibs_op um:ibs_op minimum:50000 name:IBS_OP_L1_DTLB_2M : IBS L1 DTLB 2M page +event:0xf213 ext:ibs_op um:ibs_op minimum:50000 name:IBS_OP_L1_DTLB_1G : IBS L1 DTLB 1G page +event:0xf215 ext:ibs_op um:ibs_op minimum:50000 name:IBS_OP_L2_DTLB_4K : IBS L2 DTLB 4K page +event:0xf216 ext:ibs_op um:ibs_op minimum:50000 name:IBS_OP_L2_DTLB_2M : IBS L2 DTLB 2M page +event:0xf217 ext:ibs_op um:ibs_op minimum:50000 name:IBS_OP_L2_DTLB_1G : IBS L2 DTLB 1G page +event:0xf219 ext:ibs_op um:ibs_op minimum:50000 name:IBS_OP_DC_LOAD_LAT : IBS data cache miss load latency +event:0xf240 ext:ibs_op um:ibs_op minimum:50000 name:IBS_OP_NB_LOCAL_ONLY : IBS northbridge local +event:0xf241 ext:ibs_op um:ibs_op minimum:50000 name:IBS_OP_NB_REMOTE_ONLY : IBS northbridge remote +event:0xf242 ext:ibs_op um:ibs_op minimum:50000 name:IBS_OP_NB_LOCAL_L3 : IBS northbridge local L3 +event:0xf243 ext:ibs_op um:ibs_op minimum:50000 name:IBS_OP_NB_LOCAL_CACHE : IBS northbridge local core L1 or L2 cache +event:0xf244 ext:ibs_op um:ibs_op minimum:50000 name:IBS_OP_NB_REMOTE_CACHE : IBS northbridge local core L1, L2, L3 cache +event:0xf245 ext:ibs_op um:ibs_op minimum:50000 name:IBS_OP_NB_LOCAL_DRAM : IBS northbridge local DRAM +event:0xf246 ext:ibs_op um:ibs_op minimum:50000 name:IBS_OP_NB_REMOTE_DRAM : IBS northbridge remote DRAM +event:0xf247 ext:ibs_op um:ibs_op minimum:50000 name:IBS_OP_NB_LOCAL_OTHER : IBS northbridge local APIC MMIO Config PCI +event:0xf248 
ext:ibs_op um:ibs_op minimum:50000 name:IBS_OP_NB_REMOTE_OTHER : IBS northbridge remote APIC MMIO Config PCI +event:0xf249 ext:ibs_op um:ibs_op minimum:50000 name:IBS_OP_NB_CACHE_MODIFIED : IBS northbridge cache modified state +event:0xf24A ext:ibs_op um:ibs_op minimum:50000 name:IBS_OP_NB_CACHE_OWNED : IBS northbridge cache owned state +event:0xf24B ext:ibs_op um:ibs_op minimum:50000 name:IBS_OP_NB_LOCAL_CACHE_LAT : IBS northbridge local cache latency +event:0xf24C ext:ibs_op um:ibs_op minimum:50000 name:IBS_OP_NB_REMOTE_CACHE_LAT : IBS northbridge remote cache latency diff -paurN oprofile-base/events/x86-64/family10/unit_masks oprofile-ibs-latest/events/x86-64/family10/unit_masks --- oprofile-base/events/x86-64/family10/unit_masks 2008-05-20 19:31:07.000000000 -0500 +++ oprofile-ibs-latest/events/x86-64/family10/unit_masks 2009-04-10 15:45:16.000000000 -0500 @@ -3,17 +3,22 @@ # # Copyright OProfile authors # Copyright (c) 2006-2008 Advanced Micro Devices -# Contributed by Ray Bryant <raybry at amd.com> +# Contributed by Ray Bryant <raybry at amd.com>, # Jason Yeh <jason.yeh at amd.com> # Suravee Suthikulpanit <suravee.suthikulpanit at amd.com> # # Sources: BIOS and Kernel Developer's Guide for AMD Family 10h Processors, -# Publication# 31116, Revision 3.00, September 7, 2007 +# Publication# 31116, Revision 3.20, February 04, 2009 # # Software Optimization Guide for AMD Family 10h Processors, # Publication# 40546, Revision 3.04, September 2007 # -# This file was last updated on 11 January 2008. +# Revision: 1.1 +# +# ChangeLog: 06 April 2009. 
+# - Add IBS-derived events +# - Update from BKDG Rev 3.00 to Rev 3.20 +# - Add Events 165h, 1c0h, 1cfh, 1d3h-1d5h # name:zero type:mandatory default:0x0 0x0 No unit mask @@ -50,7 +55,7 @@ name:segregload type:bitmask default:0x7 name:fpu_instr type:bitmask default:0x07 0x01 x87 instructions 0x02 MMX & 3DNow instructions - 0x04 SSE & SSE2 instructions + 0x04 SSE instructions (SSE, SSE2, SSE3, and SSE4A) name:fpu_fastpath type:bitmask default:0x07 0x01 With low op in position 0 0x02 With low op in position 1 @@ -60,15 +65,13 @@ name:fpu_exceptions type:bitmask default 0x02 SSE retype microfaults 0x04 SSE reclass microfaults 0x08 SSE and x87 microtraps -name:page_access type:bitmask default:0xff +name:page_access type:bitmask default:0x3f 0x01 DCT0 Page hit 0x02 DCT0 Page miss 0x04 DCT0 Page conflict 0x08 DCT1 Page hit 0x10 DCT1 Page miss 0x20 DCT1 Page Conflict - 0x40 Write request - 0x80 Read request name:mem_page_overflow type:bitmask default:0x03 0x01 DCT0 Page Table Overflow 0x02 DCT1 Page Table Overflow @@ -163,7 +166,7 @@ name:cacheblock type:bitmask default:0x3 0x04 Read Block (Dcache load miss refill) 0x08 Read Block Shared (Icache refill) 0x10 Read Block Modified (Dcache store miss refill) - 0x20 Change to Dirty (first store to clean block already in cache) + 0x20 Change-to-Dirty (first store to clean block already in cache) name:dataprefetch type:bitmask default:0x03 0x01 Cancelled prefetches 0x02 Prefetch attempts @@ -171,14 +174,16 @@ name:memreqtype type:bitmask default:0x8 0x01 Requests to non-cacheable (UC) memory 0x02 Requests to write-combining (WC) memory or WC buffer flushes to WB memory 0x80 Streaming store (SS) requests -name:systemreadresponse type:bitmask default:0x17 +name:systemreadresponse type:bitmask default:0x1f 0x01 Exclusive 0x02 Modified 0x04 Shared + 0x08 Owned 0x10 Data Error -name:l1_dtlb_miss_l2_hit type:bitmask default:0x03 +name:l1_dtlb_miss_l2_hit type:bitmask default:0x07 0x01 L2 4K TLB hit 0x02 L2 2M TLB hit + 0x04 L2 1G TLB 
hit (RevC) name:l1_l2_dtlb_miss type:bitmask default:0x07 0x01 4K TLB reload 0x02 2M TLB reload @@ -216,7 +221,7 @@ name:httransmit type:bitmask default:0xb 0x02 Data DWORD sent 0x04 Buffer release DWORD sent 0x08 Nop DW sent (idle) - 0x10 Address extension DWORD sent + 0x10 Address DWORD sent 0x20 Per packet CRC sent 0x80 SubLink Mask name:lock_ops type:bitmask default:0x0f @@ -289,7 +294,7 @@ name:cpu_read_lat_0_3 type:bitmask defau 0x01 Read block 0x02 Read block shared 0x04 Read block modified - 0x08 Change to Dirty + 0x08 Change-to-Dirty 0x10 From local node to node 0 0x20 From local node to node 1 0x40 From local node to node 2 @@ -298,7 +303,7 @@ name:cpu_read_lat_4_7 type:bitmask defau 0x01 Read block 0x02 Read block shared 0x04 Read block modified - 0x08 Change to Dirty + 0x08 Change-to-Dirty 0x10 From local node to node 4 0x20 From local node to node 5 0x40 From local node to node 6 @@ -334,9 +339,18 @@ name:l3_evict type:bitmask default:0x0f 0x02 Exclusive 0x04 Owned 0x08 Modified -name:icache_invalidated type:bitmask default:0x0f +name:icache_invalidated type:bitmask default:0x03 0x01 Invalidating probe that did not hit any in-flight instructions 0x02 Invalidating probe that hit one or more in-flight instructions - 0x04 SMC that did not hit any in-flight instructions - 0x08 SMC that hit one or more in-flight instructions +name:page_size_mismatches type:bitmask default:0x07 + 0x01 Guest page size is larger than the host page size + 0x02 MTRR mismatch + 0x04 Host page size is larger than the guest page size +name:retired_x87_fp type:bitmask default:0x07 + 0x01 Add/subtract ops + 0x02 Multiply ops + 0x04 Divide ops +name:ibs_op type:bitmask default:0x01 + 0x00 Using IBS OP cycle count mode + 0x01 Using IBS OP dispatch count mode |
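For readers unfamiliar with the events file format, the following is a minimal sketch of how one line of the file above breaks down into fields. It is not OProfile's actual parser: parse_event_line and is_ext_event are hypothetical helpers, and storing the free-text description under a "desc" key is an assumption made for this illustration. The point it shows is that the new IBS events carry an "ext:" field where ordinary events carry "counters:".

```cpp
#include <map>
#include <sstream>
#include <string>

// Hypothetical helper (not part of OProfile): split one events-file line
// into key/value pairs. The free text after " : " is stored under "desc".
std::map<std::string, std::string> parse_event_line(const std::string& line)
{
    std::map<std::string, std::string> fields;
    std::string head = line;

    // Everything after the first " : " is the human-readable description.
    std::string::size_type sep = line.find(" : ");
    if (sep != std::string::npos) {
        fields["desc"] = line.substr(sep + 3);
        head = line.substr(0, sep);
    }

    // The remaining tokens are whitespace-separated "key:value" pairs.
    std::istringstream in(head);
    std::string tok;
    while (in >> tok) {
        std::string::size_type colon = tok.find(':');
        if (colon != std::string::npos)
            fields[tok.substr(0, colon)] = tok.substr(colon + 1);
    }
    return fields;
}

// Hypothetical helper: true when the event names an extended feature
// (an "ext:" field) instead of a set of hardware counters.
bool is_ext_event(const std::map<std::string, std::string>& fields)
{
    return fields.count("ext") != 0;
}
```

With this sketch, the IBS_FETCH_ALL line yields ext=ibs_fetch and no counters field, while the CPU_CLK_UNHALTED line yields counters=0,1,2,3 and no ext field.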
From: Suravee S. <sur...@am...> - 2009-04-14 17:49:46
|
This patch changes opannotate to handle the case where the virtual address associated with an IBS fetch sample lies in the middle of an instruction.

opannotate.cpp | 246 +++++++++++++++++++++++++++++++++++++++++---------------
1 file changed, 178 insertions(+), 68 deletions(-)

Signed-off-by: Suravee Suthikulpanit <sur...@am...>
---
diff -paurN oprofile-base/pp/opannotate.cpp oprofile-ibs-latest/pp/opannotate.cpp --- oprofile-base/pp/opannotate.cpp 2007-11-06 07:45:40.000000000 -0600 +++ oprofile-ibs-latest/pp/opannotate.cpp 2009-04-14 12:10:22.000000000 -0500 @@ -148,31 +148,107 @@ string count_str(count_array_t const & c } -string asm_line_annotation(symbol_entry const * last_symbol, - string const & value) +/// NOTE: This function annotates a list<string> containing output from objdump. +/// It uses a list iterator, and a sample_container iterator which iterates +/// from the beginning to the end, and compare sample address +/// against the instruction address on the asm line. +/// +/// There are 2 cases of annotation: +/// 1. If sample address matches current line address, annotate the current line. +/// 2. If (previous line address < sample address < current line address), +/// then we annotate previous line. This case happens when sample address +/// is not aligned with the instruction address, which is seen when profile +/// using the instruction fetch mode of AMD Instruction-Based Sampling (IBS).
+/// +int asm_list_annotation(symbol_entry const * last_symbol, + list<string>::iterator sit, + sample_container::samples_iterator & samp_it, + list<string> & asm_lines) { + int ret = 0; + + sample_entry const * sample = NULL; + + if (samp_it != samples->end()) + sample = &samp_it->second; + // do not use the bfd equivalent: // - it does not skip space at begin // - we does not need cross architecture compile so the native // strtoull must work, assuming unsigned long long can contain a vma // and on 32/64 bits box bfd_vma is 64 bits // gcc 2.91.66 workaround - bfd_vma vma = 0; - vma = strtoull(value.c_str(), NULL, 16); + bfd_vma vma = strtoull((*sit).c_str(), NULL, 16); - string str; + if (sample && sample->vma == vma) { + // Case 1 : Sample address match current line address. + string str = count_str(sample->counts, samples->samples_count()); - sample_entry const * sample = samples->find_sample(last_symbol, vma); - if (sample) { - str += count_str(sample->counts, samples->samples_count()); + // For each events for (size_t i = 1; i < nr_events; ++i) str += " "; - str += " :"; + + *sit = str + " :" + *sit; + if (samp_it != samples->end()) + ++samp_it; + + } else if (sample && sample->vma < vma) { + // Case 2 : vma of the current line is greater than vma of the sample + + // Get the string of previous assembly line + list<string>::iterator sit_prev = sit; + string prev_line, prev_vma_str; + string::size_type loc1, loc2; + while (sit_prev != asm_lines.begin()) { + --sit_prev; + prev_line = *sit_prev; + + loc1 = prev_line.find(":", 0); + if (loc1 != string::npos) { + loc2 = prev_line.find(":", loc1+1); + if (loc2 != string::npos) { + prev_vma_str = prev_line.substr(loc1+1, loc2); + break; + } + } + } + + bfd_vma vma_prev = strtoull(prev_vma_str.c_str(), NULL, 16); + + // Need to check if prev_vma < sample->vma + if (vma_prev != 0 && vma_prev < sample->vma) { + string str; + + // Get sample for previous line. 
+ sample_entry * prev_sample = (sample_entry *)samples-> + find_sample(last_symbol, vma_prev); + if (prev_sample) { + // Aggregate sample with previous line if it already has samples + prev_sample->counts += sample->counts; + str = count_str(prev_sample->counts, samples->samples_count()); + } else { + str = count_str(sample->counts, samples->samples_count()); + } + + // For each events + for (size_t i = 1; i < nr_events; ++i) + str += " "; + + *sit_prev = str + " :" + prev_line.substr(loc1+1); + if (samp_it != samples->end()) + ++samp_it; + ret = -1; + } else { + // Failed to annotate the previous line. Skip sample. + *sit = annotation_fill + *sit; + if (samp_it != samples->end()) + ++samp_it; + } } else { - str = annotation_fill; + *sit = annotation_fill + *sit; } - return str; + return ret; } @@ -206,66 +282,103 @@ bool is_symbol_line(string const & str, } -symbol_entry const * output_objdump_asm_line(symbol_entry const * last_symbol, - string const & app_name, string const & str, - symbol_collection const & symbols, - bool & do_output) +void annotate_objdump_str_list(string const & app_name, + symbol_collection const & symbols, + list<string> & asm_lines) { - // output of objdump is a human readable form and can contain some - // ambiguity so this code is dirty. It is also optimized a little bit - // so it is difficult to simplify it without breaking something ... - - // line of interest are: "[:space:]*[:xdigit:]?[ :]", the last char of - // this regexp dis-ambiguate between a symbol line and an asm line. If - // source contain line of this form an ambiguity occur and we rely on - // the robustness of this code. 
+ symbol_entry const * last_symbol = 0; + int ret = 0; - size_t pos = 0; - while (pos < str.length() && isspace(str[pos])) - ++pos; + // to filter output of symbols (filter based on command line options) + bool do_output = true; - if (pos == str.length() || !isxdigit(str[pos])) { - if (do_output) { - cout << annotation_fill << str << '\n'; - return last_symbol; + // We simultaneously walk the two structures (list and sample_container) + // which are sorted by address. and do address comparision. + list<string>::iterator sit = asm_lines.begin(); + list<string>::iterator send = asm_lines.end(); + sample_container::samples_iterator samp_it = samples->begin(); + + for (; sit != send; (!ret? sit++: sit)) { + // output of objdump is a human readable form and can contain some + // ambiguity so this code is dirty. It is also optimized a little bit + // so it is difficult to simplify it without breaking something ... + + // line of interest are: "[:space:]*[:xdigit:]?[ :]", the last char of + // this regexp dis-ambiguate between a symbol line and an asm line. If + // source contain line of this form an ambiguity occur and we rely on + // the robustness of this code. + string str = *sit; + size_t pos = 0; + while (pos < str.length() && isspace(str[pos])) + ++pos; + + if (pos == str.length() || !isxdigit(str[pos])) { + if (do_output) { + *sit = annotation_fill + str; + continue; + } } - } - while (pos < str.length() && isxdigit(str[pos])) - ++pos; + while (pos < str.length() && isxdigit(str[pos])) + ++pos; - if (pos == str.length() || (!isspace(str[pos]) && str[pos] != ':')) { - if (do_output) { - cout << annotation_fill << str << '\n'; - return last_symbol; + if (pos == str.length() || (!isspace(str[pos]) && str[pos] != ':')) { + if (do_output) { + *sit = annotation_fill + str; + continue; + } } - } - if (is_symbol_line(str, pos)) { - last_symbol = find_symbol(app_name, str); + if (is_symbol_line(str, pos)) { - // ! 
complexity: linear in number of symbol must use sorted - // by address vector and lower_bound ? - // Note this use a pointer comparison. It work because symbols - // pointer are unique - if (find(symbols.begin(), symbols.end(), last_symbol) - != symbols.end()) { - do_output = true; + last_symbol = find_symbol(app_name, str); + + // ! complexity: linear in number of symbol must use sorted + // by address vector and lower_bound ? + // Note this use a pointer comparison. It work because symbols + // pointer are unique + if (find(symbols.begin(), symbols.end(), last_symbol) + != symbols.end()) + do_output = true; + else + do_output = false; + + if (do_output) { + *sit += symbol_annotation(last_symbol); + + // Realign the sample iterator to + // the beginning of this symbols + samp_it = samples->begin(last_symbol); + } } else { - do_output = false; + // not a symbol, probably an asm line. + if (do_output) { + ret = asm_list_annotation(last_symbol, sit, samp_it, asm_lines); + } } - if (do_output) - cout << str << symbol_annotation(last_symbol) << '\n'; - - } else { - // not a symbol, probably an asm line. 
- if (do_output) - cout << asm_line_annotation(last_symbol, str) - << str << '\n'; + if (!do_output) + *sit = ""; } +} + - return last_symbol; +void output_objdump_str_list(symbol_collection const & symbols, + string const & app_name, + list<string> & asm_lines) +{ + + annotate_objdump_str_list(app_name, symbols, asm_lines); + + // Printing objdump output to stdout + list<string>::iterator sit = asm_lines.begin(); + list<string>::iterator send = asm_lines.end(); + sit = asm_lines.begin(); + for (; sit != send; ++sit) { + string str = *sit; + if (str.length() != 0) + cout << str << '\n'; + } } @@ -274,6 +387,7 @@ void do_one_output_objdump(symbol_collec bfd_vma start, bfd_vma end) { vector<string> args; + list<string> asm_lines; args.push_back("-d"); args.push_back("--no-show-raw-insn"); @@ -301,15 +415,12 @@ void do_one_output_objdump(symbol_collec return; } - // to filter output of symbols (filter based on command line options) - bool do_output = true; - - symbol_entry const * last_symbol = 0; + // Read each output line from objdump and store in a list. string str; - while (reader.getline(str)) { - last_symbol = output_objdump_asm_line(last_symbol, app_name, - str, symbols, do_output); - } + while (reader.getline(str)) + asm_lines.push_back(str); + + output_objdump_str_list(symbols, app_name, asm_lines); // objdump always returns SUCCESS so we must rely on the stderr state // of objdump. If objdump error message is cryptic our own error @@ -716,9 +827,8 @@ int opannotate(options::spec const & spe } if (!debug_info && !options::assembly) { - cerr << "no debug information available for any binary " - << "selected and --assembly not requested\n"; - exit(EXIT_FAILURE); + cerr << "opannotate (warning): no debug information available for binary " + << it->image << ", and --assembly not requested\n"; } annotate_source(images); |
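The two annotation cases described in the patch comment (exact address match, or a sample address falling between two instruction addresses) amount to attributing each sample to the instruction whose start address is the greatest one not above the sample address. The sketch below illustrates that rule only. It is not the patch's implementation: it uses std::upper_bound on a sorted address vector instead of the patch's paired iterator walk over the objdump lines, attribute_samples is a hypothetical name, and, unlike the patch, it also attributes samples past the last instruction to that instruction.

```cpp
#include <algorithm>
#include <cstdint>
#include <map>
#include <utility>
#include <vector>

// Hypothetical helper: attribute each (address, count) sample to the
// instruction containing it, i.e. the last instruction start address
// that is <= the sample address. insn_addrs must be sorted ascending.
// Samples below the first instruction address are dropped.
std::map<std::uint64_t, unsigned> attribute_samples(
    const std::vector<std::uint64_t>& insn_addrs,
    const std::vector<std::pair<std::uint64_t, unsigned> >& samples)
{
    std::map<std::uint64_t, unsigned> counts;
    for (std::size_t i = 0; i < samples.size(); ++i) {
        // First instruction address strictly greater than the sample.
        std::vector<std::uint64_t>::const_iterator it =
            std::upper_bound(insn_addrs.begin(), insn_addrs.end(),
                             samples[i].first);
        if (it == insn_addrs.begin())
            continue; // no instruction starts at or before this address
        --it;         // step back to the containing instruction
        counts[*it] += samples[i].second;
    }
    return counts;
}
```

A sample at 0x400005 with instructions starting at 0x400003 and 0x400008 is counted against 0x400003, which is the "annotate previous line" case; a sample exactly at 0x400003 is aggregated into the same entry.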
From: Maynard J. <may...@us...> - 2009-04-15 15:07:58
|
Suravee Suthikulpanit wrote:
> This patch contains changes for opannotate to handle the case when
> virtual address associated with IBS fetch samples may lie in the middle of an
> instruction.
>
> opannotate.cpp | 246 +++++++++++++++++++++++++++++++++++++++++---------------
> 1 file changed, 178 insertions(+), 68 deletions(-)
>
> Signed-off-by: Suravee Suthikulpanit <sur...@am...>

Just a couple minor comments . . . see below.

-Maynard

>
>
> ---
>
> diff -paurN oprofile-base/pp/opannotate.cpp oprofile-ibs-latest/pp/opannotate.cpp
> --- oprofile-base/pp/opannotate.cpp 2007-11-06 07:45:40.000000000 -0600
> +++ oprofile-ibs-latest/pp/opannotate.cpp 2009-04-14 12:10:22.000000000 -0500
> @@ -148,31 +148,107 @@ string count_str(count_array_t const & c
> }
>
>
> -string asm_line_annotation(symbol_entry const * last_symbol,
> - string const & value)
> +/// NOTE: This function annotates a list<string> containing output from objdump.
> +/// It uses a list iterator, and a sample_container iterator which iterates
> +/// from the beginning to the end, and compare sample address
> +/// against the instruction address on the asm line.
> +///
> +/// There are 2 cases of annotation:
> +/// 1. If sample address matches current line address, annotate the current line.
> +/// 2. If (previous line address < sample address < current line address),
> +/// then we annotate previous line. This case happens when sample address
> +/// is not aligned with the instruction address, which is seen when profile
> +/// using the instruction fetch mode of AMD Instruction-Based Sampling (IBS).
> +///
> +int asm_list_annotation(symbol_entry const * last_symbol,
> + list<string>::iterator sit,
> + sample_container::samples_iterator & samp_it,
> + list<string> & asm_lines)
> {
> + int ret = 0;
> +
> + sample_entry const * sample = NULL;
> +
> + if (samp_it != samples->end())
> + sample = &samp_it->second;
> +
> // do not use the bfd equivalent:
> // - it does not skip space at begin
> // - we does not need cross architecture compile so the native
> // strtoull must work, assuming unsigned long long can contain a vma
> // and on 32/64 bits box bfd_vma is 64 bits
> // gcc 2.91.66 workaround
> - bfd_vma vma = 0;
> - vma = strtoull(value.c_str(), NULL, 16);
> + bfd_vma vma = strtoull((*sit).c_str(), NULL, 16);
>
> - string str;
> + if (sample && sample->vma == vma) {
> + // Case 1 : Sample address match current line address.
> + string str = count_str(sample->counts, samples->samples_count());
>
> - sample_entry const * sample = samples->find_sample(last_symbol, vma);
> - if (sample) {
> - str += count_str(sample->counts, samples->samples_count());
> + // For each events
> for (size_t i = 1; i < nr_events; ++i)
> str += " ";
> - str += " :";
> +
> + *sit = str + " :" + *sit;
> + if (samp_it != samples->end())
> + ++samp_it;
> +
> + } else if (sample && sample->vma < vma) {
> + // Case 2 : vma of the current line is greater than vma of the sample
> +
> + // Get the string of previous assembly line
> + list<string>::iterator sit_prev = sit;
> + string prev_line, prev_vma_str;
> + string::size_type loc1, loc2;

gcc complains that loc1 may be used uninitialized, which logically won't happen, but gcc doesn't know that. It's initialized inside the while loop (conditionally) and then used again at line 238. So to make gcc happy, you should init it to string::npos.

> + while (sit_prev != asm_lines.begin()) {
> + --sit_prev;
> + prev_line = *sit_prev;
> +
> + loc1 = prev_line.find(":", 0);
> + if (loc1 != string::npos) {
> + loc2 = prev_line.find(":", loc1+1);
> + if (loc2 != string::npos) {
> + prev_vma_str = prev_line.substr(loc1+1, loc2);
> + break;
> + }
> + }
> + }
> +
> + bfd_vma vma_prev = strtoull(prev_vma_str.c_str(), NULL, 16);
> +
> + // Need to check if prev_vma < sample->vma
> + if (vma_prev != 0 && vma_prev < sample->vma) {
> + string str;
> +
> + // Get sample for previous line.
> + sample_entry * prev_sample = (sample_entry *)samples->
> + find_sample(last_symbol, vma_prev);
> + if (prev_sample) {
> + // Aggregate sample with previous line if it already has samples
> + prev_sample->counts += sample->counts;
> + str = count_str(prev_sample->counts, samples->samples_count());
> + } else {
> + str = count_str(sample->counts, samples->samples_count());
> + }
> +
> + // For each events
> + for (size_t i = 1; i < nr_events; ++i)
> + str += " ";
> +
> + *sit_prev = str + " :" + prev_line.substr(loc1+1);
> + if (samp_it != samples->end())
> + ++samp_it;
> + ret = -1;
> + } else {
> + // Failed to annotate the previous line. Skip sample.

Isn't this something the user might want to know about? Is it serious enough for a warning message? Or should we have a verbose printout for it?

> + *sit = annotation_fill + *sit;
> + if (samp_it != samples->end())
> + ++samp_it;
> + }
> } else {
> - str = annotation_fill;
> + *sit = annotation_fill + *sit;
> }
>
> - return str;
> + return ret;
> }
>
[snip]
|
From: Suravee S. <sur...@am...> - 2009-04-16 17:32:17
|
Maynard,

I will make the changes below and submit patch 4/5 on its own (calling it Revision 5.1), since there are no other changes elsewhere.

Suravee

Maynard Johnson wrote:
> Suravee Suthikulpanit wrote:
>> This patch contains changes for opannotate to handle the case when
>> virtual address associated with IBS fetch samples may lie in the middle of an
>> instruction.
>>
>> opannotate.cpp | 246 +++++++++++++++++++++++++++++++++++++++++---------------
>> 1 file changed, 178 insertions(+), 68 deletions(-)
>>
>> Signed-off-by: Suravee Suthikulpanit <sur...@am...>
> Just a couple minor comments . . . see below.
>
> -Maynard
>>
--- SNIP
>> +
>> + // Get the string of previous assembly line
>> + list<string>::iterator sit_prev = sit;
>> + string prev_line, prev_vma_str;
>> + string::size_type loc1, loc2;
> gcc complains that loc1 may be used uninitialized, which logically won't happen, but gcc doesn't know that. It's initialized inside the while loop (conditionally) and then used again at line 238. So to make gcc happy, you should init it to string::npos.

[Suravee] I will take care of this.

--- SNIP
>> +
>> + *sit_prev = str + " :" + prev_line.substr(loc1+1);
>> + if (samp_it != samples->end())
>> + ++samp_it;
>> + ret = -1;
>> + } else {
>> + // Failed to annotate the previous line. Skip sample.
>
> Isn't this something the user might want to know about? Is it serious enough for a warning message? Or should we have a verbose printout for it?
>
>> + *sit = annotation_fill + *sit;
>> + if (samp_it != samples->end())
>> + ++samp_it;
>> + }

[Suravee] This is minor and should be OK without a warning.
|
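As an illustration of the review point about loc1, here is a minimal sketch of the backward search for the previous line's address with both positions initialized to string::npos. It is not the revised patch: previous_vma is a hypothetical helper that works on a vector of lines rather than the patch's list iterators, and the line layout "<counts> :<address>:<instruction>" is an assumption taken from how the patch builds annotated lines. Since std::string::substr takes a position and a count, the sketch passes the field length (loc2 - loc1 - 1) rather than the end position.

```cpp
#include <cstddef>
#include <cstdlib>
#include <string>
#include <vector>

// Hypothetical helper: starting just before lines[idx], walk backwards to
// the nearest line containing two ':' separators and return the hex
// address between them. Returns 0 when no such line exists.
unsigned long long previous_vma(const std::vector<std::string>& lines,
                                std::size_t idx)
{
    // Initialized up front so no path can read an indeterminate value.
    std::string::size_type loc1 = std::string::npos;
    std::string::size_type loc2 = std::string::npos;
    std::string field;

    while (idx > 0) {
        --idx;
        loc1 = lines[idx].find(':');
        if (loc1 == std::string::npos)
            continue;
        loc2 = lines[idx].find(':', loc1 + 1);
        if (loc2 == std::string::npos)
            continue;
        // substr(pos, count): pass the length of the address field.
        field = lines[idx].substr(loc1 + 1, loc2 - loc1 - 1);
        break;
    }

    // strtoull skips leading whitespace and parses the hex digits.
    return field.empty() ? 0 : std::strtoull(field.c_str(), NULL, 16);
}
```

Lines without two colons (for example source or blank lines mixed into the objdump output) are skipped, so the search continues until an address-bearing line is found or the start of the listing is reached.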
From: Suravee S. <sur...@am...> - 2009-04-14 17:49:54
|
This patch contains changes to opcontrol. It adds the code needed for configuring IBS. opcontrol needs to configure oprofiled (with the appropriate --ext-feature flags) and the driver interfaces (/dev/oprofile/ibs_fetch/... and /dev/oprofile/ibs_op/...).

 opcontrol |  181 +++++++++++++++++++++++++++++++++++++++++++++++++++++++++++---
 1 file changed, 175 insertions(+), 6 deletions(-)

Signed-off-by: Suravee Suthikulpanit <sur...@am...>
---
diff -paurN oprofile-base/utils/opcontrol oprofile-ibs-latest/utils/opcontrol
--- oprofile-base/utils/opcontrol	2009-04-09 20:08:31.000000000 -0500
+++ oprofile-ibs-latest/utils/opcontrol	2009-04-13 18:51:08.000000000 -0500
@@ -118,7 +118,8 @@ do_help()
    -d/--dump                     flush the collected profiling data
    -t/--stop                     stop data collection
    -h/--shutdown                 stop data collection and kill daemon
-   -V/--verbose[=all,sfile,arcs,samples,module,misc] be verbose in the daemon log
+   -V/--verbose[=all,sfile,arcs,samples,module,misc,ext]
+                                 be verbose in the daemon log
    --reset                       clears out data from current session
    --save=name                   save data from current session to session_name
    --deinit                      unload the oprofile module and oprofilefs
@@ -296,6 +297,12 @@ do_init()
 	SEPARATE_THREAD=0
 	SEPARATE_CPU=0
 	CALLGRAPH=0
+	IBS_FETCH_EVENTS=""
+	IBS_FETCH_COUNT=0
+	IBS_FETCH_UNITMASK=0
+	IBS_OP_EVENTS=""
+	IBS_OP_COUNT=0
+	IBS_OP_UNITMASK=0
 
 	OPROFILED="$OPDIR/oprofiled"
 
@@ -561,13 +568,16 @@ verify_counters()
 		for f in `seq 0 $((NR_CHOSEN - 1))`; do
 			get_event $f
 			if test "$GOTEVENT" != ""; then
+				verify_ibs $GOTEVENT
 				OPHELP_ARGS="$OPHELP_ARGS $GOTEVENT"
 			fi
 		done
 
-		HW_CTRS=`$OPHELP --check-events $OPHELP_ARGS --callgraph=$CALLGRAPH`
-		if test "$?" != 0; then
-			exit 1
+		if test ! -z "$OPHELP_ARGS" ; then
+			HW_CTRS=`$OPHELP --check-events $OPHELP_ARGS --callgraph=$CALLGRAPH`
+			if test "$?" != 0; then
+				exit 1
+			fi
 		fi
 	fi
 }
@@ -983,7 +993,16 @@ do_kill_daemon()
 		fi
 
 		COUNT=`expr $COUNT + 1`
-		if test "$COUNT" -eq 15; then
+
+		# IBS can generate a large number of samples/events.
+		# Therefore, extend the delay before killing
+		if test "$IBS_FETCH_COUNT" != "0" \
+			-o "$IBS_OP_COUNT" != "0" ; then
+			DELAY_KILL=60
+		else
+			DELAY_KILL=15
+		fi
+		if test "$COUNT" -eq "$DELAY_KILL"; then
 			echo "Daemon stuck shutting down; killing !"
 			kill -9 `cat $LOCK_FILE`
 		fi
@@ -1299,6 +1318,17 @@ do_param_setup()
 		set_ctr_param $f count 0
 	done
 
+	# Check if driver has IBS support
+	if test -d $MOUNT/ibs_fetch; then
+		# Reset driver's IBS fetch setting
+		set_param ibs_fetch/enable 0
+	fi
+
+	if test -d $MOUNT/ibs_op ; then
+		# Reset driver's IBS op setting
+		set_param ibs_op/enable 0
+	fi
+
 	verify_counters
 
 	OPROFILED_EVENTS=
@@ -1342,6 +1372,13 @@ do_param_setup()
 			fi
 		fi
 
+
+		if [ "$CTR" = "ibs_fetch" -o "$CTR" = "ibs_op" ] ; then
+			# Handle IBS events setup
+			do_param_setup_ibs
+			continue
+		fi
+
 		if test "$EVENT" = "RTC_INTERRUPTS"; then
 			set_param rtc_value $COUNT
 			$SYSCTL -w dev.oprofile.rtc_value=$COUNT
@@ -1411,7 +1448,9 @@ do_start_daemon()
 		--separate-thread=$SEPARATE_THREAD \
 		--separate-cpu=$SEPARATE_CPU"
 
-	OPD_ARGS="$OPD_ARGS --events=$OPROFILED_EVENTS"
+	if ! test -z "$OPROFILED_EVENTS"; then
+		OPD_ARGS="$OPD_ARGS --events=$OPROFILED_EVENTS"
+	fi
 
 	if test "$VMLINUX" = "none"; then
 		OPD_ARGS="$OPD_ARGS --no-vmlinux"
@@ -1431,6 +1470,8 @@ do_start_daemon()
 		OPD_ARGS="$OPD_ARGS --verbose=$VERBOSE"
 	fi
 
+	help_start_daemon_with_ibs
+
 	vecho "executing oprofiled $OPD_ARGS"
 
 	$OPROFILED $OPD_ARGS
@@ -1787,6 +1828,134 @@ try_reset_sample_file()
 	fi
 }
 
+#
+# Begin IBS Specific Functions
+#
+verify_ibs()
+{
+	IBS_EVENT=`echo $1| awk -F: '{print $1}'`
+	IBS_COUNT=`echo $1 | awk -F: '{print $2}'`
+	IBS_MASK=`echo $1 | awk -F: '{print $3}'`
+
+	IBS_TYPE=`$OPHELP --check-events $1`
+	if test "$?" != "0" ; then
+		exit 1
+	fi
+
+	if [ "$IBS_TYPE" = "ibs_fetch " ] ; then
+		# Check IBS_COUNT consistency
+		if test "$IBS_FETCH_COUNT" = "0" ; then
+			IBS_FETCH_COUNT=$IBS_COUNT
+			IBS_FETCH_MASK=$IBS_MASK
+		elif test "$IBS_FETCH_COUNT" != "$IBS_COUNT" ; then
+			echo "All IBS Fetch must have the same count."
+			exit 1
+		fi
+
+		# Check IBS_MASK consistency
+		if test "$IBS_FETCH_MASK" != "$IBS_MASK" ; then
+			echo "All IBS Fetch must have the same unitmask."
+			exit 1
+		fi
+
+	elif [ "$IBS_TYPE" = "ibs_op " ] ; then
+		# Check IBS_COUNT consistency
+		if test "$IBS_OP_COUNT" = "0" ; then
+			IBS_OP_COUNT=$IBS_COUNT
+			IBS_OP_MASK=$IBS_MASK
+		elif test "$IBS_OP_COUNT" != "$IBS_COUNT" ; then
+			echo "All IBS Op must have the same count."
+			exit 1
+		fi
+
+		# Check IBS_MASK consistency
+		if test "$IBS_OP_MASK" != "$IBS_MASK" ; then
+			echo "All IBS Op must have the same unitmask."
+			exit 1
+		fi
+	fi
+
+	return
+}
+
+
+do_param_setup_ibs()
+{
+	if test "$KERNEL_SUPPORT" != "yes" ; then
+		echo "ERROR: No kernel support for IBS profiling."
+		exit 1
+	fi
+
+
+	# Check if driver has IBS support
+	if test ! -d $MOUNT/ibs_fetch -o ! -d $MOUNT/ibs_op ; then
+		echo "ERROR: No kernel support for IBS profiling."
+		exit 1
+	fi
+
+	if test `echo $EVENT | \
+		awk '{ print substr($0, 0, 10)}'` = "IBS_FETCH_" ; then
+		if test "$COUNT" != "0"; then
+			if [ "$IBS_FETCH_EVENTS" == "" ] ; then
+				IBS_FETCH_EVENTS="$EVENT"
+			else
+				IBS_FETCH_EVENTS="$IBS_FETCH_EVENTS,$EVENT"
+			fi
+			IBS_FETCH_COUNT=$COUNT
+			set_param ibs_fetch/max_count $COUNT
+			set_param ibs_fetch/rand_enable 1
+			set_param ibs_fetch/enable 1
+		else
+			set_param ibs_fetch/enable 0
+		fi
+
+	elif test `echo $EVENT | \
+		awk '{ print substr($0, 0, 7)}'` = "IBS_OP_" ; then
+		if test "$COUNT" != "0"; then
+			if [ "$IBS_OP_EVENTS" == "" ] ; then
+				IBS_OP_EVENTS="$EVENT"
+			else
+				IBS_OP_EVENTS="$IBS_OP_EVENTS,$EVENT"
+			fi
+			IBS_OP_COUNT=$COUNT
+			IBS_OP_UNITMASK=$UNIT_MASK
+
+			set_param ibs_op/max_count $COUNT
+			set_param ibs_op/enable 1
+
+			# NOTE: We default to use dispatched_op if available.
+			# Some of the older family10 system does not have
+			# dispatched_ops feature.
+			# dispatched op is enabled by bit 1 of the unitmask
+			if test -f $MOUNT/ibs_op/dispatched_ops ; then
+				IBS_OP_DISPATCHED_OP=$(( IBS_OP_UNITMASK & 0x1 ))
+				set_param ibs_op/dispatched_ops $IBS_OP_DISPATCHED_OP
+			fi
+		else
+			set_param ibs_op/enable 0
+		fi
+	fi
+}
+
+
+help_start_daemon_with_ibs()
+{
+	if test "$IBS_FETCH_COUNT" != "0" -o "$IBS_OP_COUNT" != "0" ; then
+		OPD_ARGS="${OPD_ARGS} --ext-feature=ibs:"
+		if test "$IBS_FETCH_COUNT" != "0"; then
+			OPD_ARGS="${OPD_ARGS}fetch:$IBS_FETCH_EVENTS:$IBS_FETCH_COUNT:$IBS_FETCH_UNITMASK|"
+		fi
+
+		if test "$IBS_OP_COUNT" != "0"; then
+			OPD_ARGS="${OPD_ARGS}op:$IBS_OP_EVENTS:$IBS_OP_COUNT:$IBS_OP_UNITMASK"
+		fi
+	fi
+}
+
+#
+# End IBS Specific Functions
+#
+
 # main
 
 # determine the location of opcontrol and related programs
|
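The argument assembled by help_start_daemon_with_ibs() has the shape ibs:fetch:<events>:<count>:<um>|op:<events>:<count>:<um>, where either group may be absent. As a rough illustration of that format, the sketch below splits such a string back into its groups. The names ibs_group, split_fields and parse_ibs_feature are hypothetical and made up for this example; the daemon's real parser lives in a different patch of this series and may well work differently.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical container for one group of the IBS argument.
struct ibs_group {
	std::string kind;      // "fetch" or "op"
	std::string events;    // comma-separated event names
	std::string count;     // sampling count, kept as text
	std::string unitmask;  // unit mask, kept as text
};

// Split s on sep; always returns at least one (possibly empty) field.
static std::vector<std::string> split_fields(const std::string & s, char sep)
{
	std::vector<std::string> out;
	std::string::size_type start = 0;
	for (;;) {
		std::string::size_type pos = s.find(sep, start);
		if (pos == std::string::npos) {
			out.push_back(s.substr(start));
			break;
		}
		out.push_back(s.substr(start, pos - start));
		start = pos + 1;
	}
	return out;
}

// Parse the text that follows "--ext-feature=" into its groups.
std::vector<ibs_group> parse_ibs_feature(const std::string & arg)
{
	std::vector<ibs_group> groups;
	const std::string prefix = "ibs:";
	if (arg.compare(0, prefix.size(), prefix) != 0)
		return groups;  // not an IBS feature string

	std::vector<std::string> parts =
		split_fields(arg.substr(prefix.size()), '|');
	assert(!parts.empty());

	for (std::vector<std::string>::size_type i = 0; i < parts.size(); ++i) {
		if (parts[i].empty())
			continue;  // trailing '|' when only fetch is configured
		std::vector<std::string> f = split_fields(parts[i], ':');
		if (f.size() != 4)
			continue;  // malformed group; this sketch just skips it
		ibs_group g = { f[0], f[1], f[2], f[3] };
		groups.push_back(g);
	}
	return groups;
}
```

Note that when only fetch events are selected the shell code leaves a trailing "|" on the argument, which is why the sketch skips empty groups.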
From: Maynard J. <may...@us...> - 2009-04-15 16:02:04
|
Suravee Suthikulpanit wrote:
> Change Notes:

OK, we're just about at the finish line. All patches except #4 are fine. I sent a separate note on patch 4.

Regarding the ~17% slow-down in opannotate that I mentioned in the previous review . . . I realize that there's currently no way to ascertain if the samples being processed are IBS samples, so we have no way of switching between the older (faster) annotate algorithm and the newer (slower) annotate algorithm. There is an unused u32 in the sample file header (see libop/op_sample_file.h). Perhaps a byte of this word could be used to store an "enum ext_feature" value. Then opannotate could check that value (see libpp/populate_for_spu:is_spu_profile() for an example) to switch between fast and slow algorithms. I see this as a future enhancement that we can discuss after getting this patch set committed.

So, once you take care of the minor issues I noted in my patch 4 review comments, I think we'll be ready for commit, unless other review comments come in.

Also, have you had a chance to write up some documentation on using the extended feature support?

Thanks.
-Maynard
|
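The idea of keeping an extended-feature identifier in one byte of the spare header word could look roughly like the sketch below. Everything in it is an assumption made for illustration: sample_header_sketch stands in for the real header in libop/op_sample_file.h, its single "reserved" member stands in for the unused u32, and the ext_feature_id values are invented, since the actual "enum ext_feature" does not appear in this thread.

```cpp
#include <cassert>
#include <stdint.h>

// Hypothetical feature identifiers; the real enum may differ.
enum ext_feature_id : unsigned char {
	EXT_FEATURE_NONE = 0,
	EXT_FEATURE_IBS  = 1
};

// Hypothetical stand-in for the sample file header. Only the spare
// 32-bit word matters for this sketch.
struct sample_header_sketch {
	uint32_t reserved;
};

// Read the feature identifier from the low byte of the spare word.
inline ext_feature_id get_ext_feature(const sample_header_sketch & h)
{
	return static_cast<ext_feature_id>(h.reserved & 0xffu);
}

// Store the feature identifier in the low byte, leaving the other
// three bytes of the spare word untouched.
inline void set_ext_feature(sample_header_sketch & h, ext_feature_id id)
{
	h.reserved = (h.reserved & ~0xffu) | static_cast<uint32_t>(id);
	assert(get_ext_feature(h) == id);
}

// A reporting tool could use this to pick the slower annotate path
// only for IBS profiles.
inline bool needs_slow_annotate(const sample_header_sketch & h)
{
	return get_ext_feature(h) == EXT_FEATURE_IBS;
}
```

Using only the low byte keeps the remaining 24 bits free for later use; sample files written before such a change would read back as EXT_FEATURE_NONE, provided the spare word was zeroed when they were created.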
From: Maynard J. <may...@us...> - 2009-04-17 18:57:07
|
Suravee,

I applied your IBS support patches. Congratulations! :-)

I integrated ChangeLog entries to each of your 4 code patches, so ended up committing 4 patches (instead of 5, since your first patch was just a standalone ChangeLog patch). Please do a fresh checkout from CVS and verify it all works.

Thanks.
-Maynard
|
From: Suravee S. <sur...@am...> - 2009-04-20 14:56:44
|
The new checkout looks good. I did some test runs and didn't see any problems.

Thank you for all your help and feedback. I will work on the documentation and send it out this week.

Suravee.
|