oprofile Log

Commit Date  
[a1f2b6] by Maynard Johnson

Make convertPerfData procedure more robust

The operf_read::convertPerfData function reads sample data
in perf_events format from either the temporary operf.data
file or a pipe, depending on whether or not operf is run
with the --lazy-conversion option. This patch makes the
reading/conversion process more robust so that if bad
data is found in the file or pipe, the process will display
helpful messages and end gracefully.
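The kind of checking described above can be sketched as follows. This is a minimal illustration, not the actual operf_read::convertPerfData code. It assumes the data is a sequence of records that each begin with `struct perf_event_header` from `<linux/perf_event.h>`, and the function name `walk_perf_records` is hypothetical. Instead of trusting a bad size field and reading past the data, it prints a message and returns -1.

```c
#include <linux/perf_event.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>

/* Walk a buffer of perf_events records. Returns the number of
 * well-formed records, or -1 (after printing a message) as soon as
 * corrupt data is found, instead of reading past the buffer. */
static int walk_perf_records(const unsigned char *buf, size_t len)
{
    size_t off = 0;
    int count = 0;

    while (off < len) {
        struct perf_event_header hdr;

        if (len - off < sizeof(hdr)) {
            fprintf(stderr, "Truncated record header at offset %zu\n", off);
            return -1;
        }
        /* Copy the header out; the buffer may not be suitably aligned. */
        memcpy(&hdr, buf + off, sizeof(hdr));

        /* A record cannot be smaller than its own header, nor extend
         * beyond the data that is actually available. */
        if ((size_t)hdr.size < sizeof(hdr) || (size_t)hdr.size > len - off) {
            fprintf(stderr, "Bad record size %u at offset %zu\n",
                    (unsigned int)hdr.size, off);
            return -1;
        }
        off += hdr.size;
        count++;
    }
    return count;
}
```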

This patch also makes some other minor cleanups, correcting
some misspellings, etc.

Signed-off-by: Maynard Johnson <maynardj@us.ibm.com>

2013-03-05 18:05:59
[79a183] by Maynard Johnson

The configure check to determine whether we should use libpfm or not
is intended only for the ppc64 architecture, but it was incorrectly
matching the ppc32 architecture, too. Not only that, but it was using
'uname', which is not a good idea in cross-compile situations.

Then, aside from that, we had several instances in the source code
of the following:
#if (defined(__powerpc__) || defined(__powerpc64__))
which incorrectly included the ppc32 architecture also, when it was
intended only for the ppc64 architecture.

This patch fixes both errors.
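To illustrate the distinction: GCC and Clang define `__powerpc__` on both 32-bit and 64-bit PowerPC targets and define `__powerpc64__` only on 64-bit ones, so a ppc64-only guard needs to test `__powerpc64__`. The helper `target_family` below is hypothetical and only shows the preprocessor logic, not the code the patch touched.

```c
/* Classify the target the compiler is building for. Because
 * __powerpc__ is also defined on ppc64, the 64-bit test must come
 * first; OR'ing the two macros would match ppc32 as well. */
static const char *target_family(void)
{
#if defined(__powerpc64__)
    return "ppc64";
#elif defined(__powerpc__)
    return "ppc32";
#else
    return "other";
#endif
}
```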

Signed-off-by: Maynard Johnson <maynardj@us.ibm.com>

2013-02-27 21:41:14
[dec498] by Tulio Magno Quites Machado Filho, pushed by Maynard Johnson

Update configure.ac to work with automake 1.13

GNU automake 1.13 has removed support for AM_CONFIG_HEADER and for the
two-parameter form of AM_INIT_AUTOMAKE.

For reference, see these URLs:

Signed-off-by: Tulio Magno Quites Machado Filho <tuliom@linux.vnet.ibm.com>

2013-02-22 21:30:18
[7e5e18] by Maynard Johnson

operf does not properly sample child threads for already-running app

Example: When passing the 'java' command directly to operf, samples are
collected for all of the threads created by the JVM. However, if the
Java app is already running when the user starts operf with either
'--pid' or '--system-wide' option, zero samples are collected on the
child threads of the JVM. Note: The user program that is JITed by the
JVM is executed by a child thread.

This patch addresses the problem by:
- Keeping a list of child processes
- Synthesizing PERF_RECORD_COMM events for the main JVM process and all
the child processes
- Calling perf_event_open for the main JVM process and all child processes

These changes entailed some fairly major restructuring of some functions
and data structures of the operf_record class.
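One way to find the threads of an already-running process on Linux is to list `/proc/<pid>/task`, which has one numeric entry per thread ID. The sketch below assumes that approach; the function `list_task_ids` is hypothetical, and the log does not say how operf_record actually builds its list. Passing a pid of 0 or less inspects the calling process, which is a convenience of this sketch.

```c
#include <dirent.h>
#include <stdio.h>
#include <stdlib.h>
#include <sys/types.h>

/* Collect the thread IDs of a running process from /proc/<pid>/task.
 * Each TID could then get its own perf_event_open() call and a
 * synthesized PERF_RECORD_COMM event. Returns the number of TIDs
 * stored (at most max), or -1 if the directory cannot be opened. */
static int list_task_ids(pid_t pid, pid_t *tids, int max)
{
    char path[64];
    DIR *dir;
    struct dirent *ent;
    int n = 0;

    if (pid <= 0)
        snprintf(path, sizeof(path), "/proc/self/task");
    else
        snprintf(path, sizeof(path), "/proc/%d/task", (int)pid);

    dir = opendir(path);
    if (!dir)
        return -1;

    while (n < max && (ent = readdir(dir)) != NULL) {
        char *end;
        long tid = strtol(ent->d_name, &end, 10);

        /* Skip "." and ".." and anything that is not a plain number. */
        if (end == ent->d_name || *end != '\0')
            continue;
        tids[n++] = (pid_t)tid;
    }
    closedir(dir);
    return n;
}
```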

Signed-off-by: Maynard Johnson <maynardj@us.ibm.com>

2013-02-21 17:09:10
[81ccb4] by Maynard Johnson

operf does not run opjitconv if --pid or --system-wide used

To stop operf when either the '--pid' or '--system-wide' option is used, the user
must press ctrl-C (or run 'kill -SIGINT <operf_pid>'). If the user has not passed
'--lazy-conversion', the operf.cpp:convert_sample_data function is run as a
child process that does not have a SIGINT handler set up for it at the time
it's reading sample data from the pipe (which is being written to by the
operf-record process). The end result is that the operf-read process is
interrupted and stopped by the unhandled ctrl-C before it gets a chance to
run opjitconv.

Note: Another (minor) side effect of this issue is that sample data may
be left unread in the pipe, whatever the type of app being profiled
(Java or not).

This patch addresses the problem by cleaning up operf signal handler
procedures, making it clear which handlers are used by parent and which
by children, and then making sure those handlers are set up at the correct
time. I also found an extraneous unused signal handler defined in
operf_utils.cpp that I removed.
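The pattern described above can be sketched as follows, assuming a child process that installs its own SIGINT handler with `sigaction` before it starts reading from the pipe. The names are hypothetical. The handler only sets a `sig_atomic_t` flag, which is one of the few things that is safe to do inside a signal handler.

```c
#define _POSIX_C_SOURCE 200809L
#include <signal.h>
#include <string.h>

/* Set asynchronously by the handler; polled by the read loop. */
static volatile sig_atomic_t stop_requested = 0;

static void handle_sigint(int sig)
{
    (void)sig;
    stop_requested = 1;
}

/* Called by the child before it reads sample data from the pipe, so
 * that ctrl-C sets a flag instead of killing the process. The read
 * loop can then drain the pipe and go on to later steps such as
 * running opjitconv. Returns 0 on success, -1 on failure. */
static int install_child_sigint_handler(void)
{
    struct sigaction sa;

    memset(&sa, 0, sizeof(sa));
    sa.sa_handler = handle_sigint;
    sigemptyset(&sa.sa_mask);
    sa.sa_flags = 0;   /* no SA_RESTART: a blocked read() fails with EINTR */
    return sigaction(SIGINT, &sa, NULL);
}
```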

Signed-off-by: Maynard Johnson <maynardj@us.ibm.com>

2013-02-11 23:14:34
[f8dd71] by Suravee Suthikulpanit

Add Support for AMD Generic Performance Events

AMD generic performance events are a small set of events that are generally available across several
AMD processor families. PERF already provides support for generic performance counters regardless
of the processor family. This allows operf to work as soon as PERF supports the performance
counters, without having to wait for the more complete family-specific events and unit_masks files
to be added to OProfile.
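On the perf side, generic events are requested with `PERF_TYPE_HARDWARE` and a `PERF_COUNT_HW_*` ID in `struct perf_event_attr` (from `<linux/perf_event.h>`), and the kernel translates them for the running CPU. The sketch below only shows that attribute setup; the helper name is hypothetical, and this is not necessarily how OProfile encodes its AMD generic events.

```c
#include <linux/perf_event.h>
#include <stdint.h>
#include <string.h>

/* Fill in a perf_event_attr for one of perf's generic hardware
 * events. The kernel maps PERF_TYPE_HARDWARE events to the right raw
 * encoding for the running CPU, so no family-specific table is needed. */
static void init_generic_hw_attr(struct perf_event_attr *attr,
                                 uint64_t generic_id, uint64_t period)
{
    memset(attr, 0, sizeof(*attr));
    attr->size = sizeof(*attr);
    attr->type = PERF_TYPE_HARDWARE;
    attr->config = generic_id;       /* e.g. PERF_COUNT_HW_CPU_CYCLES */
    attr->sample_period = period;    /* one sample every 'period' events */
    attr->sample_type = PERF_SAMPLE_IP | PERF_SAMPLE_TID;
    attr->disabled = 1;              /* enable explicitly once set up */
}
```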

Signed-off-by: Suravee Suthikulpanit <suravee.suthikulpanit@amd.com>

2013-02-07 16:49:30
[366ca2] by Suravee Suthikulpanit

Fix build issue with gcc-4.7.2 due to fgets

gcc complains about ignoring the return value of fgets.
Because we build with -Werror, that warning made the build fail.
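A minimal sketch of the usual fix, assuming the warning came from code that reads one line with `fgets`: check the return value instead of ignoring it. The helper `read_first_line` is hypothetical. Besides silencing the warning, the check avoids using a buffer that was never filled on EOF or a read error.

```c
#include <stdio.h>
#include <string.h>

/* Read one line from fp into buf and strip the trailing newline.
 * Returns 0 on success, -1 on EOF or read error (buf is then not
 * valid and must not be used). */
static int read_first_line(FILE *fp, char *buf, int size)
{
    if (fgets(buf, size, fp) == NULL)
        return -1;
    buf[strcspn(buf, "\n")] = '\0';
    return 0;
}
```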

Signed-off-by: Suravee Suthikulpanit <suravee.suthikulpanit@amd.com>

2013-02-08 16:23:06
[cd5c7d] by Maynard Johnson

operf: Fix 'Permission denied' error on early perf_events kernels

The new operf tool available with OProfile 0.9.8 uses the perf_event_open
syscall to obtain access to the performance monitor counters and registers.
This syscall is implemented by the Linux Kernel Performance Events Subsystem
(aka "perf_events"). This perf_events subsystem was introduced in kernel
version 2.6.31, and it underwent a lot of changes in the first several versions
thereafter. Apparently, the operf tool, as currently written, relies on
certain kernel functionality that is missing from the kernels shipped with
some Linux distributions that supported perf_events only in its very early
stages (e.g., SLES 11 SP1). When attempting to profile with operf
(e.g., 'operf ls'), it fails with the message:

Unexpected error running operf: Permission denied
Please use the opcontrol command instead of operf.

The fix for this problem is to pass '-1' for the cpu arg on the
perf_event_open syscall when running on an early perf_events kernel.
Passing '-1' for the cpu arg was a requirement (in most circumstances)
on early perf_events kernels. Later kernels removed this requirement
so perf_event_open could be called for each cpu, even for single-app
profiling by non-root users. This is the standard usage model employed
by operf, which allows us to mmap kernel data space for each cpu, thus
giving a lot more memory for the kernel to record sample data.
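The change comes down to the `cpu` argument of the syscall. In the sketch below, `perf_event_open` is invoked through `syscall()` because glibc provides no wrapper for it. The `early_kernel` flag is a stand-in, since the log does not describe how operf decides that it is running on an early perf_events kernel, and both helper names are hypothetical.

```c
#define _GNU_SOURCE
#include <linux/perf_event.h>
#include <sys/syscall.h>
#include <sys/types.h>
#include <unistd.h>

/* Thin wrapper: glibc does not export perf_event_open(). */
static long sys_perf_event_open(struct perf_event_attr *attr, pid_t pid,
                                int cpu, int group_fd, unsigned long flags)
{
    return syscall(__NR_perf_event_open, attr, pid, cpu, group_fd, flags);
}

/* Choose the cpu argument. With pid >= 0, cpu == -1 means "this
 * process on any CPU", which early perf_events kernels required.
 * Later kernels accept a specific cpu, so a tool can open one event
 * (and mmap one sample buffer) per CPU. */
static int perf_cpu_arg(int early_kernel, int cpu)
{
    return early_kernel ? -1 : cpu;
}
```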

Signed-off-by: Maynard Johnson <maynardj@us.ibm.com>

2013-02-05 16:12:04
[cd8be5] by Maynard Johnson

Fix opreport header info on unit mask when operf is run without a UM specified

When a user runs operf and profiles with an event that needs a unit mask value,
the default unit mask value will be used if no UM value is specified. When
opreport prints its header information, you get something like the following:

CPU: Intel Sandy Bridge microarchitecture, speed 2.401e+06 MHz (estimated)
Counted int_misc events (Instruction decoder events) with a unit mask of 0x00
(rat_stall_cycles Cycles Resource Allocation Table (RAT) external stall is
sent to Instruction Decode Queue (IDQ) for this thread.) count 2000000

Notice that the unit mask value '0x00' is shown, even though the code actually
selects the default unit mask value of 0x40 for the int_misc event.
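The intended behavior can be sketched as follows, assuming the event spec parser records whether a unit mask was given. The header should print the resolved value, not the unset field. Both function names are hypothetical.

```c
#include <stdio.h>

/* The unit mask actually programmed: the user's value if one was
 * given, otherwise the event's default (never a bare 0). */
static unsigned int effective_unit_mask(int um_specified,
                                        unsigned int user_um,
                                        unsigned int default_um)
{
    return um_specified ? user_um : default_um;
}

/* Format the report header with the resolved unit mask. Returns the
 * snprintf() result (the length of the full string). */
static int format_um_header(char *buf, size_t size, const char *event,
                            unsigned int um)
{
    return snprintf(buf, size,
                    "Counted %s events with a unit mask of 0x%02x",
                    event, um);
}
```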

This patch fixes this issue. It also partially addresses the issue with
named unit mask showing up as '0x00' in opreport, too (see oprofile bug
It's not a very good solution to the named unit mask issue, but it's better
than nothing until we can come up with a final (better) solution.

Signed-off-by: Maynard Johnson <maynardj@us.ibm.com>

2013-01-28 22:20:46
[646eeb] by Maynard Johnson

Fix 32-bit compilation error

Added a 'ULL' suffix to the constant in a u64 variable definition so
that the -m32 build would not fail.
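The log does not show the offending definition. A typical case is a constant that needs more than 32 bits, sketched below with hypothetical names: on a -m32 build `long` is 32 bits wide, so `1UL << 40` shifts past the width of its type, while `1ULL << 40` is evaluated as unsigned long long, which is at least 64 bits on every target.

```c
#include <stdint.h>

typedef uint64_t u64;   /* stand-in for the kernel-style u64 type */

/* With -m32, 1UL << 40 is undefined (gcc warns "left shift count >=
 * width of type"); the ULL suffix makes the constant 64-bit. */
static const u64 high_bit_mask = 1ULL << 40;

static u64 set_high_bit(u64 value)
{
    return value | high_bit_mask;
}
```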

Signed-off-by: Maynard Johnson <maynardj@us.ibm.com>

2013-01-24 21:09:29
[12d9e9] by Maynard Johnson

Unit mask bitmasks containing non-unique values should fail

When named unit mask support was added by Andi Kleen (Intel) in the 0.9.7
timeframe, it became possible for multiple unit masks to have the same
hex value for a given event. The (only) way to disambiguate these non-unique
unit mask values is to specify them by name. It was pointed out during patch
reviews that OR'ing together such unit masks (for bitmask types of unit
masks) was problematic. Andi asserted that bitmasks containing any of the
non-unique values simply would not be supported. Unfortunately, we dropped
the ball and did not document this restriction, and neither did we put any
checks into the code to prevent users from doing this.


On i386/sandybridge, the int_misc event has the following possible unit
mask values that can be put into a bitmask:

Unit masks (default 0x40)
0x40: rat_stall_cycles Cycles Resource Allocation Table (RAT) external
stall is sent to Instruction Decode Queue (IDQ) for this thread.
0x03: recovery_cycles Number of cycles waiting to be recover after Nuke due
to all other cases except JEClear. (extra: cmask=1)
0x03: recovery_stalls_count Edge applied to recovery_cycles, thus counts
occurrences. (extra: edge cmask=1)

The event specification of int_misc:2000000:43 will be incorrectly accepted
as valid even though it is clearly ambiguous. With this patch, such unit mask
bitmask specifications will be detected, and we will exit gracefully with an
informative error message.
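A sketch of such a check, using the int_misc unit masks listed above. The table layout and function names are hypothetical, not OProfile's data structures. A numeric bitmask is rejected when it contains all the bits of a value that more than one named unit mask shares.

```c
#include <stddef.h>

/* Hypothetical unit mask table entry: name plus hex value. */
struct unit_mask {
    const char *name;
    unsigned int value;
};

/* int_misc on i386/sandybridge, as listed in the ophelp output above. */
static const struct unit_mask int_misc_um[] = {
    { "rat_stall_cycles",      0x40 },
    { "recovery_cycles",       0x03 },
    { "recovery_stalls_count", 0x03 },
};

/* Return 1 if 'value' is shared by more than one named unit mask. */
static int um_is_non_unique(const struct unit_mask *um, size_t n,
                            unsigned int value)
{
    size_t hits = 0;

    for (size_t i = 0; i < n; i++)
        if (um[i].value == value)
            hits++;
    return hits > 1;
}

/* Return 1 if the numeric bitmask contains all bits of some
 * non-unique value, i.e. the intended named unit mask is ambiguous. */
static int um_bitmask_is_ambiguous(const struct unit_mask *um, size_t n,
                                   unsigned int bitmask)
{
    for (size_t i = 0; i < n; i++) {
        unsigned int v = um[i].value;

        if (v != 0 && (bitmask & v) == v && um_is_non_unique(um, n, v))
            return 1;
    }
    return 0;
}
```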

Signed-off-by: Maynard Johnson <maynardj@us.ibm.com>

2013-01-23 23:52:00
[745ede] by Daniel Hansel, pushed by Maynard Johnson

Change location to store intermediate JIT dump files

Since JIT support was added to OProfile, the intermediate JIT dump files
holding the sampling data collected for a Java process were stored in a
hard-coded, world-writable directory, /var/lib/oprofile/jitdump.
A session-specific directory setting (i.e., "--session-dir=...")
was not taken into account for these files either.

Now, during profiling, JIT dump files are stored under /tmp/.oprofile/jitdump.
When opjitconv has finished converting the JIT dump files (the result
is stored under the default location, e.g. /var/lib/oprofile or ./oprofile_data,
or under the specified session directory), the intermediate JIT dump file will be

Signed-off-by: Daniel Hansel <daniel.hansel@linux.vnet.ibm.com>

2013-01-23 16:24:37
[b2c445] by Carl Love, pushed by Maynard Johnson

operf, add throttling and multiplexing stats

This patch checks whether each event was throttled or multiplexed. Such
events are recorded by creating a file named after the event in the
'throttled' or 'multiplexed' subdirectory of the stats directory, respectively.

Functions are added to the post processing to print messages if multiplexing
and/or throttling occurred during the data collection.
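A sketch of the two pieces involved, under stated assumptions. Multiplexing can be detected when a counter is read with `PERF_FORMAT_TOTAL_TIME_ENABLED | PERF_FORMAT_TOTAL_TIME_RUNNING` and the running time is less than the enabled time; throttling, by contrast, is reported by the kernel as `PERF_RECORD_THROTTLE` records in the sample stream. The `<stats_dir>/<kind>/<event_name>` layout follows the description above, but the helper names and exact directory are assumptions.

```c
#include <stdint.h>
#include <stdio.h>

/* An event that was on the PMU for less time than it was enabled
 * had to share the counters, i.e. the kernel multiplexed it. */
static int event_was_multiplexed(uint64_t time_enabled, uint64_t time_running)
{
    return time_running < time_enabled;
}

/* Build "<stats_dir>/<kind>/<event_name>", where kind is "throttled"
 * or "multiplexed". Returns 0 on success, -1 if truncated. */
static int stats_marker_path(char *buf, size_t size, const char *stats_dir,
                             const char *kind, const char *event_name)
{
    int n = snprintf(buf, size, "%s/%s/%s", stats_dir, kind, event_name);

    return (n < 0 || (size_t)n >= size) ? -1 : 0;
}
```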

The patch has been tested on an Intel Core(TM)2 Duo CPU T9400 2.53GHz
The following are excerpts from the script used to do the testing.

The events tested are as follows:

Each of the tests below was run with each of the following frequencies
to test with and without event throttling.



$path/operf --events $event1:$freq:0:1:1 --system-wide

$path/operf -l --events $event1:$freq:0:1:1 --system-wide

$path/operf --events $event1:$freq:0:1:1 --events $event2:$freq:0:1:1 \
    --events $event4:$freq:0:1:1 --events $event5:$freq:0:1:1 \
    --events $event6:$freq:0:1:1 --events $event7:$freq:0:1:1 --system-wide

$path/operf -l --events $event1:$freq:0:1:1 --events $event2:$freq:0:1:1 \
    --events $event4:$freq:0:1:1 --events $event5:$freq:0:1:1 \
    --events $event6:$freq:0:1:1 --events $event7:$freq:0:1:1 --system-wide

$path/operf --events $event1:$freq:0:1:1 \
    dd bs=16 if=/dev/urandom of=/dev/null count=500000

$path/operf -l --events $event1:$freq:0:1:1 \
    dd bs=16 if=/dev/urandom of=/dev/null count=500000

$path/operf --events $event1:$freq:0:1:1 --events $event2:$freq:0:1:1 \
    --events $event4:$freq:0:1:1 --events $event5:$freq:0:1:1 \
    --events $event6:$freq:0:1:1 --events $event7:$freq:0:1:1 \
    dd bs=16 if=/dev/urandom of=/dev/null count=5000

$path/operf -l --events $event1:$freq:0:1:1 --events $event2:$freq:0:1:1 \
    --events $event4:$freq:0:1:1 --events $event5:$freq:0:1:1 \
    --events $event6:$freq:0:1:1 --events $event7:$freq:0:1:1 \
    dd bs=16 if=/dev/urandom of=/dev/null count

The tests described above were also performed on an IBM POWER7
3000.000000MHz revision : 2.1 with the following events.


And the two sampling frequencies:


Signed-off-by: Carl Love <cel@us.ibm.com>

2013-01-23 15:19:47
[d3a2c6] by Maynard Johnson

Fix compile warnings/errors with gcc 4.7.3

On some distros, the struct poptOption in /usr/include/popt.h
has the argInfo field defined as int, but on other distros,
that field is defined as unsigned int. In libopt++/popt_options.cpp,
the option_base::option_base constructor passes an unsigned int
popt_flags argument that's intended to be assigned to the
argInfo field. With gcc 4.7.1, the following warning (treated as an error)
occurs on systems where the argInfo field is defined as an int:

popt_options.cpp: In constructor `popt::option_base::option_base
(const char*, char, const char*, const char*, void*, unsigned int)':
popt_options.cpp:255:51: error: narrowing conversion of `popt_flags'
from `unsigned int' to `int' inside { } is ill-formed in C++11 [-Werror=narrowing]
cc1plus: all warnings being treated as errors

The fix for this problem is to cast the popt_flags to the appropriate
type using 'typeof(opt.argInfo)'.
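The cast can be illustrated in C with a hypothetical stand-in for `struct poptOption`. The real code is C++, where the unconverted value inside `{ }` triggers the narrowing error shown above; this sketch uses the GCC/Clang spelling `__typeof__` and only demonstrates the conversion itself.

```c
/* Hypothetical stand-in for struct poptOption: the real argInfo
 * field is 'int' on some distributions, 'unsigned int' on others. */
struct option_like {
    const char *longName;
    int argInfo;
};

/* Casting to the field's own type converts popt_flags to whatever
 * the installed header declares, so this compiles cleanly either way. */
static struct option_like make_option(const char *name, unsigned int popt_flags)
{
    struct option_like opt;

    opt.longName = name;
    opt.argInfo = (__typeof__(opt.argInfo))popt_flags;
    return opt;
}
```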

The second compile error (in pe_profiling/operf.cpp) is happening
because the variable 'value' is assigned, but not used after that.
This is dead code that should be removed.

Signed-off-by: Maynard Johnson <maynardj@us.ibm.com>

2013-01-21 16:38:14
[6b0ad6] by Maynard Johnson, pushed by Suravee Suthikulpanit

Fix default numerical unit mask when using numeric and modify error messages.

Signed-off-by: Suravee Suthikulpanit <suravee.suthikulpanit@amd.com>

2013-01-16 17:15:20
[b655cc] by Marcin Juszkiewicz, pushed by Maynard Johnson

Add rmb() definition for AArch64 architecture
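A sketch of what such a definition can look like. It assumes the `dmb ld` (load) barrier for AArch64; the fallback to `__atomic_thread_fence(__ATOMIC_ACQUIRE)` on other targets and the helper `read_ring_head` are this example's choices, not OProfile's code. The usage follows the perf_events rule that user space issues a read barrier after reading the ring buffer's data_head.

```c
#include <stdint.h>

/* Read barrier: earlier loads complete before the accesses that
 * follow. AArch64 uses a load-only data memory barrier; other
 * targets use a compiler-builtin acquire fence in this sketch. */
#if defined(__aarch64__)
#define rmb() __asm__ __volatile__("dmb ld" ::: "memory")
#else
#define rmb() __atomic_thread_fence(__ATOMIC_ACQUIRE)
#endif

/* Read data_head of a perf mmap ring buffer, then issue rmb() so the
 * sample data it covers is not read before the head value. */
static uint64_t read_ring_head(const volatile uint64_t *data_head)
{
    uint64_t head = *data_head;

    rmb();
    return head;
}
```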

Signed-off-by: Marcin Juszkiewicz <marcin.juszkiewicz@linaro.org>

2013-01-16 15:30:33
[df62a5] by William Cohen, pushed by Maynard Johnson

Fix ASSERT_SIDE_EFFECT problems found by coverity scan

Coverity pointed out that some asserts in opimport.cpp had
assignments in them. The assignments should be outside the asserts, and
the asserts should only evaluate values and be side-effect free.
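The rule is easy to see in a small example with hypothetical names: if the program is built with `-DNDEBUG`, everything inside `assert()` disappears, including any assignment or function call with side effects.

```c
#include <assert.h>

/* Hypothetical helper with a side effect: consumes one record and
 * returns how many remain, or -1 if none were left. */
static int next_record(int *remaining)
{
    if (*remaining <= 0)
        return -1;
    return --(*remaining);
}

static int consume_one(int *remaining)
{
    /* Wrong: assert((rc = next_record(remaining)) >= 0);
     * With -DNDEBUG the call and the assignment vanish. */
    int rc = next_record(remaining);   /* side effect always happens */

    assert(rc >= 0);                   /* assert only checks the value */
    return rc;
}
```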

Signed-off-by: William Cohen <wcohen@redhat.com>

2013-01-15 18:39:19
[af915b] by Maynard Johnson

Fix bug where some invalid unit mask values are accepted as valid

Under certain conditions for events that use a bitmask of unit mask
values, an invalid unit mask value specified by the user may be
incorrectly accepted as valid.

Example: on i386/sandybridge, the event specification l2_rqsts:200000:2
is accepted as valid, whereas there is no unit mask value of '2'.

See bug:
for more details.
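A sketch of a strict validation for bitmask-type unit masks, with hypothetical names and unit mask values (the real l2_rqsts values are not listed in the log). The request is accepted only if it is exactly an OR of whole defined values, so stray bits such as the '2' above are rejected.

```c
#include <stddef.h>

/* Return 1 if 'requested' is exactly the OR of some subset of the
 * defined unit mask values, 0 otherwise. */
static int um_bitmask_is_valid(const unsigned int *values, size_t n,
                               unsigned int requested)
{
    unsigned int covered = 0;

    for (size_t i = 0; i < n; i++) {
        unsigned int v = values[i];

        /* Only values fully contained in the request count. */
        if (v != 0 && (requested & v) == v)
            covered |= v;
    }
    return covered == requested;
}
```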

Signed-off-by: Maynard Johnson <maynardj@us.ibm.com>

2013-01-14 15:43:07
[360280] by Maynard Johnson

Allow ppc64 events to be specified with or without _GRP<n> suffix

All events for IBM PowerPC server processors (except CYCLES) have
a _GRP<n> suffix. This is because the legacy opcontrol profiler
can only profile events in the same group (i.e., having the same
_GRP<n> suffix). But operf has no such restriction because it
can multiplex events; thus, we should allow the user to pass
event names without the _GRP<n> suffix.
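A sketch of name matching that accepts either form, with a hypothetical helper and illustrative event names. When several `_GRP<n>` variants of the same event match, the caller still needs a rule for picking one, which this sketch does not decide.

```c
#include <ctype.h>
#include <string.h>

/* Return 1 if 'user' names 'table_name' exactly, or with a trailing
 * _GRP<n> suffix left off (e.g. PM_CYC for PM_CYC_GRP1). */
static int ppc64_event_name_matches(const char *user, const char *table_name)
{
    size_t len = strlen(user);
    const char *rest;

    if (strncmp(user, table_name, len) != 0)
        return 0;
    rest = table_name + len;
    if (*rest == '\0')
        return 1;                          /* exact match */
    if (strncmp(rest, "_GRP", 4) != 0)
        return 0;
    rest += 4;
    if (!isdigit((unsigned char)*rest))
        return 0;                          /* need at least one digit */
    while (isdigit((unsigned char)*rest))
        rest++;
    return *rest == '\0';
}
```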

Signed-off-by: Maynard Johnson <maynardj@us.ibm.com>

2013-01-11 19:29:57
[e4d8c3] by Andreas Krebbel, pushed by Maynard Johnson

Add support for IBM zEnterprise EC12 (zEC12)

This patch adds support for the latest release of the
IBM mainframe series - the IBM zEnterprise EC12 (zEC12).

The CPU measurement facility did not change, so only the new CPU type
needs to be recognized.

Signed-off-by: Andreas Krebbel <krebbel@linux.vnet.ibm.com>

2013-01-11 14:24:47
[735d9e] by Maynard Johnson

ophelp lists events: Fix doc URL for ppc64 arch

When ophelp is used to list available events, it displays
some help text before the event list to direct the user
where to find more info. For the ppc64 architecture, a
stale URL was listed. This patch fixes that URL.

Signed-off-by: Maynard Johnson <maynardj@us.ibm.com>

2013-01-10 20:24:26
[a5f5ac] by Maynard Johnson

Remove unnecessary/incorrect CYCLES_RND_SMPL event

This patch impacts only the ppc64 architecture.

The pseudo event CYCLES_RND_SMPL was added many years ago
based on a misunderstanding of how random sampling works
on IBM Power processors. The concept of random sampling is
used in conjunction with "marked events" so that when a
sample is taken, it can be attributed to the precise
instruction that caused the event. IBM Power processors have
many marked events -- e.g., PM_MRK_BR_TAKEN, PM_MRK_STALL_CMPLU,
PM_MRK_LD_MISS_L1. However, there is no marked cycles event;
it just would not make any sense to have one since there is
no instruction that can "cause" a cycle event.

This patch removes all traces of the bogus CYCLES_RND_SMPL event.

Signed-off-by: Maynard Johnson <maynardj@us.ibm.com>

2013-01-10 17:21:56
[55d11f] by Maynard Johnson

Fix unused variable compile error on non-x86 type architectures

Commit e1ed25f091af2128497f8d8f78e27e0330155094 that was made on
Jan 2 causes a compile error on non-x86 architectures. This
patch fixes that error.

Signed-off-by: Maynard Johnson <maynardj@us.ibm.com>

2013-01-09 18:19:33
[560d43] by Maynard Johnson

Fix incorrect statement about using '0' for default unit mask

In the section titled 'Specifying performance counter events',
it was stated that if no unit mask value is specified, a default
value of '0' will be used. This is not correct. It should state
that the default unit mask value for the given event will be used.

Signed-off-by: Maynard Johnson <maynardj@us.ibm.com>

2013-01-03 19:03:01
[e1ed25] by Maynard Johnson

Fix operf default unit mask handling

This patch addresses the problem reported to the oprofile-list with the
subject heading "other events than CPU_CLK_UNHALTED not working".

The operf tool mis-handles event specifications where the
unit mask is not specified, usually resulting in some bogus
config value that's passed to the perf_event_open call.
The end result is usually that opreport finds no samples.
In some cases, samples may be recorded, but they would
not be for the correct unit mask.

If this patch is not applied, the workaround for this bug is to specify
the event's default unit mask explicitly, e.g.:
operf -e LLC_MISSES:6000:0x41 <my-app>
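A sketch of the intended behavior, simplified to a hypothetical `name:count[:unit_mask]` form (the real operf syntax also has kernel/user fields and named unit masks, which are omitted here). When the unit mask field is absent, the event's default is used rather than 0. Parsing numeric masks with `strtoul` base 0 is an assumption of this sketch.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical parsed form of "name:count[:unit_mask]". */
struct event_spec {
    char name[64];
    unsigned long count;
    unsigned int unit_mask;
};

/* Returns 0 on success, -1 on a malformed spec. */
static int parse_event_spec(const char *spec, unsigned int default_um,
                            struct event_spec *out)
{
    const char *c1 = strchr(spec, ':');
    const char *um_start;
    char *end;

    if (!c1 || (size_t)(c1 - spec) >= sizeof(out->name))
        return -1;
    memcpy(out->name, spec, (size_t)(c1 - spec));
    out->name[c1 - spec] = '\0';

    out->count = strtoul(c1 + 1, &end, 10);
    if (end == c1 + 1)
        return -1;                       /* missing count */

    if (*end == '\0') {
        out->unit_mask = default_um;     /* no UM given: event default */
        return 0;
    }
    if (*end != ':')
        return -1;

    um_start = end + 1;
    out->unit_mask = (unsigned int)strtoul(um_start, &end, 0);
    if (end == um_start || *end != '\0')
        return -1;                       /* empty or trailing junk */
    return 0;
}
```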

Signed-off-by: Maynard Johnson <maynardj@us.ibm.com>

2013-01-02 17:32:00