oprofile Log

Commit Date  
[442b5d] by Maynard Johnson

Fix broken --with-kernel configure option

The --with-kernel configure option was improperly expecting to
find necessary kernel headers in the pointed-to kernel source
tree in some guaranteed locations, but this was a bad assumption.
Instead, the user should run 'make headers_install' for their
custom kernel and use the location of where the headers were
installed when running oprofile's 'configure --with-kernel'.
This patch fixes the configure script to give helpful messages
to the user about how to properly install the kernel headers.

A secondary fix is made to m4/kernelversion.m4 to remove the
'-D__KERNEL__' flag. This flag isn't necessary, and I found that
it causes problems with some kernel versions.

Signed-off-by: Maynard Johnson <maynardj@us.ibm.com>

2013-04-09 13:19:17 Tree
[18c4a6] by Maynard Johnson

Performance improvement for operf's perf_event-to-oprofile format conversion

This patch decreases the time needed for converting sample records from
perf_events format to oprofile sample file format by about 1/3. The
performance improvement is most notable when doing system-wide profiling
of a busy system and specifying '--lazy-conversion', where the conversion
process runs after a (long) profiling session has been ended.

Signed-off-by: Maynard Johnson <maynardj@us.ibm.com>

2013-03-19 17:57:08 Tree
[d2df41] by Youquan Song, pushed by Maynard Johnson

Add Ivybridge EP support

Running "opcontrol -l" on an Ivybridge EP processor does not detect the CPU. It reports:
oprofile: available events for CPU type "Intel Architectural Perfmon"
and does not show the perf event list.

After applying the patch, it shows the Ivybridge perf event list and the correct CPU:
oprofile: available events for CPU type "Intel Ivy Bridge microarchitecture"

Signed-off-by: Youquan Song <youquan.song@intel.com>
Acked-by: Andi Kleen <ak@linux.intel.com>

2013-03-18 19:02:50 Tree
[7cf28f] by Carl Love, pushed by Maynard Johnson

operf, remove support to report multiplexing.

The multiplexing reporting doesn't work correctly when profiling
multi-threaded apps or apps that do fork/exec. The detection of
multiplexing doesn't work when processes migrate between CPUs.
The event is enabled on all CPUs. The running time stops when the
event migrates to another CPU; however, the enabled time does not stop, as
the event is enabled on each CPU. The issue is that the running time across
CPUs doesn't add up to the enabled time because the running time is not
increasing while the process is being migrated. This results in the running
time being less than the enabled time. There is no way to detect whether
the running time is less than the enabled time due to migration
or due to multiplexing.

The support is being removed so that the operf tool is not incorrectly
flagging events for multiplexing.

Signed-off-by: Carl Love <cel@us.ibm.com>

2013-03-12 15:52:15 Tree
[71c5f8] by Ryo Onodera, pushed by Maynard Johnson

Add #include of stdint.h to opagent.h

Signed-off-by: Ryo Onodera <ryoqun@gmail.com>

2013-03-07 15:18:24 Tree
[04ee55] by Maynard Johnson

Fix seg fault due to incorrect array size initialization

The operf_sfile.cpp:create_sfile function creates and initializes the
array of 'struct operf_sfile' objects used for writing sample data to
oprofile formatted sample files. This function creates an array of
these objects, but was incorrectly creating an array of size 'OP_MAX_COUNTER'.
Since operf can multiplex events, we aren't limited to OP_MAX_COUNTER
events to profile simultaneously, so if the user specifies more than
OP_MAX_COUNTER events, the code that accesses this array was going
off the end and sometimes seg faulting.

This patch fixes the problem by defining OP_MAX_EVENTS to be '24' and
using that as the array size. Furthermore, if the user tries to specify
more than 24 events to profile, an error message is displayed:
Number of events specified is greater than allowed maximum of <n>
and operf aborts.
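
The kind of up-front limit check described above might look roughly like the following. This is only a sketch under stated assumptions: 'sample_file_slot' and 'make_slots' are hypothetical names used for illustration, not the actual operf_sfile.cpp code, and only the OP_MAX_EVENTS value of 24 and the error text come from this commit message.

```cpp
#include <cstddef>
#include <stdexcept>
#include <string>
#include <vector>

// Limit taken from the commit message; the rest of this sketch is hypothetical.
constexpr std::size_t OP_MAX_EVENTS = 24;

// Hypothetical, simplified placeholder for 'struct operf_sfile'.
struct sample_file_slot {
    int fd = -1;
};

// Reject an oversized event list up front instead of indexing past the array.
std::vector<sample_file_slot> make_slots(std::size_t num_events)
{
    if (num_events > OP_MAX_EVENTS) {
        throw std::invalid_argument(
            "Number of events specified is greater than allowed maximum of "
            + std::to_string(OP_MAX_EVENTS));
    }
    // Size the array by the event limit, not by the hardware counter limit.
    return std::vector<sample_file_slot>(OP_MAX_EVENTS);
}
```

Checking once before the array is created keeps every later index into it within bounds.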

Signed-off-by: Maynard Johnson <maynardj@us.ibm.com>

2013-03-07 15:04:46 Tree
[a1f2b6] by Maynard Johnson

Make convertPerfData procedure more robust

The operf_read::convertPerfData function reads sample data
in perf_events format from either the temporary operf.data
file or a pipe, depending on whether or not operf is run
with the --lazy-conversion option. This patch makes the
reading/conversion process more robust so that if bad
data is found in the file or pipe, the process will display
helpful messages and end gracefully.
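
As an illustration of this style of defensive reading, here is a minimal sketch that walks a buffer of length-prefixed records and stops cleanly on bad data. It is not the convertPerfData code: 'record_header' only mirrors the field layout of the kernel's perf_event_header (type, misc, size), and 'count_records' is a hypothetical helper working on an in-memory buffer rather than a file or pipe.

```cpp
#include <cstddef>
#include <cstdint>
#include <cstring>

// Same field layout as the kernel's perf_event_header; the name is hypothetical.
struct record_header {
    std::uint32_t type;
    std::uint16_t misc;
    std::uint16_t size;   // total record size in bytes, header included
};

// Count well-formed records in 'buf'. Sets 'ok' to false and stops at the
// first record whose header is truncated or whose size field is implausible.
std::size_t count_records(const unsigned char* buf, std::size_t len, bool& ok)
{
    std::size_t offset = 0;
    std::size_t count = 0;
    ok = true;
    while (offset < len) {
        record_header hdr;
        if (len - offset < sizeof hdr) {        // not enough bytes for a header
            ok = false;
            break;
        }
        std::memcpy(&hdr, buf + offset, sizeof hdr);
        if (hdr.size < sizeof hdr || hdr.size > len - offset) {
            ok = false;                         // size too small or runs past the data
            break;
        }
        offset += hdr.size;
        ++count;
    }
    return count;
}
```

A caller can report how many records were converted before the bad data was found and then exit normally.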

This patch also makes some other minor cleanups, correcting
some misspellings, etc.

Signed-off-by: Maynard Johnson <maynardj@us.ibm.com>

2013-03-05 18:05:59 Tree
[79a183] by Maynard Johnson

The configure check to determine whether we should use libpfm or not
is intended only for the ppc64 architecture, but was incorrectly
hitting on the ppc32 architecture, too. Not only that, but it was using
'uname', which is not a good idea in cross-compile situations.

Then, aside from that, we had several instances in the source code
of the following:
#if (defined(__powerpc__) || defined(__powerpc64__))
which incorrectly included the ppc32 architecture as well, when it was intended
for the ppc64 architecture only.

This patch fixes both errors.
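
A minimal sketch of a ppc64-only compile-time test, assuming GCC-style predefined macros, where __powerpc__ is defined for both 32-bit and 64-bit PowerPC targets and __powerpc64__ only for 64-bit ones. The macro name SKETCH_IS_PPC64 and the function 'is_ppc64_build' are hypothetical and are not taken from the oprofile sources.

```cpp
// Testing only __powerpc64__ excludes ppc32, unlike
// (defined(__powerpc__) || defined(__powerpc64__)).
#if defined(__powerpc64__)
#define SKETCH_IS_PPC64 1
#else
#define SKETCH_IS_PPC64 0
#endif

// True only when compiling for a 64-bit PowerPC target.
constexpr bool is_ppc64_build()
{
    return SKETCH_IS_PPC64 != 0;
}
```

Because the decision is made by the compiler's target macros rather than by 'uname', it also behaves correctly when cross-compiling.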

Signed-off-by: Maynard Johnson <maynardj@us.ibm.com>

2013-02-27 21:41:14 Tree
[dec498] by Tulio Magno Quites Machado Filho, pushed by Maynard Johnson

Update configure.ac to work with automake 1.13

GNU automake 1.13 has removed support for AM_CONFIG_HEADER and for the 2
parameter version of AM_INIT_AUTOMAKE.

For reference, see these URLs:

Signed-off-by: Tulio Magno Quites Machado Filho <tuliom@linux.vnet.ibm.com>

2013-02-22 21:30:18 Tree
[7e5e18] by Maynard Johnson

operf does not properly sample child threads for already-running app

Example: When passing the 'java' command directly to operf, samples are
collected for all of the threads created by the JVM. However, if the
Java app is already running when the user starts operf with either
'--pid' or '--system-wide' option, zero samples are collected on the
child threads of the JVM. Note: The user program that is JITed by the
JVM is executed by a child thread.

This patch addresses the problem by:
- Keeping a list of child processes
- Synthesizing PERF_RECORD_COMM events for the main JVM process and all
the child processes
- Calling perf_event_open for the main JVM process and all child processes

These changes entailed some fairly major restructuring of some functions
and data structures of the operf_record class.

Signed-off-by: Maynard Johnson <maynardj@us.ibm.com>

2013-02-21 17:09:10 Tree
[81ccb4] by Maynard Johnson

operf does not run opjitconv if --pid or --system-wide used

To stop operf when either '--pid' or '--system-wide' option is used, the user
must do a ctrl-C (or 'kill -SIGINT <operf_pid>'). If the user has not passed
'--lazy-conversion', the operf.cpp:convert_sample_data function is run as a
child process that does not have a SIGINT handler set up for it at the time
it's reading sample data from the pipe (which is being written to by the
operf-record process). The end result is that the operf-read process is
interrupted and stopped by the unhandled ctrl-C before it gets a chance to
run opjitconv.

Note: Another (minor) side effect of issue #1 above is that there may be
sample data left un-read in the pipe, and the type of app being profiled
(Java or not) is irrelevant.

This patch addresses the problem by cleaning up operf signal handler
procedures, making it clear which handlers are used by parent and which
by children, and then making sure those handlers are set up at the correct
time. I also found an extraneous unused signal handler defined in
operf_utils.cpp that I removed.
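
The general idea of installing a SIGINT handler before a child starts its read loop can be sketched as follows. This is an assumption-laden illustration using POSIX sigaction, not the actual operf handler code; 'stop_requested', 'on_sigint' and 'install_sigint_handler' are hypothetical names.

```cpp
#include <csignal>
#include <signal.h>

// Flag the read loop can poll; set asynchronously by the handler.
volatile std::sig_atomic_t stop_requested = 0;

// Handler does the minimum: record that a stop was requested.
extern "C" void on_sigint(int)
{
    stop_requested = 1;
}

// Install the handler; call this before entering the loop that reads the pipe,
// so that ctrl-C no longer terminates the process by default.
bool install_sigint_handler()
{
    struct sigaction sa = {};
    sa.sa_handler = on_sigint;
    sigemptyset(&sa.sa_mask);
    sa.sa_flags = 0;
    return sigaction(SIGINT, &sa, nullptr) == 0;
}
```

With the handler in place, the reading process can notice the flag, finish draining its input, and then go on to any post-processing step.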

Signed-off-by: Maynard Johnson <maynardj@us.ibm.com>

2013-02-11 23:14:34 Tree
[f8dd71] by Suravee Suthikulpanit

Add Support for AMD Generic Performance Events

AMD generic performance events are a small set of events which are generally available across several
AMD processor families. PERF already provides support for generic performance counters regardless
of the processor family. This will allow operf to work as soon as PERF is able to support the performance
counters, without having to wait for the more complete family-specific events and unit_masks files
to be added to OProfile.

Signed-off-by: Suravee Suthikulpanit <suravee.suthikulpanit@amd.com>

2013-02-07 16:49:30 Tree
[366ca2] by Suravee Suthikulpanit

Fix build issue with gcc-4.7.2 due to fgets

gcc complains about ignoring the return value of fgets.
Because we build with -Werror, the build failed.
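
A minimal sketch of the usual remedy, which is to test the return value of fgets instead of discarding it. The helper name 'read_line' and the 256-byte buffer are assumptions for illustration; the actual change in the oprofile sources may look different.

```cpp
#include <cstdio>
#include <string>

// Read up to one line from 'fp' into 'out'. Returns false on end-of-file or
// error instead of silently ignoring the result of fgets. Lines longer than
// the buffer are returned in pieces across successive calls.
bool read_line(std::FILE* fp, std::string& out)
{
    char buf[256];
    if (std::fgets(buf, sizeof buf, fp) == nullptr)
        return false;
    out = buf;
    return true;
}
```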

Signed-off-by: Suravee Suthikulpanit <suravee.suthikulpanit@amd.com>

2013-02-08 16:23:06 Tree
[cd5c7d] by Maynard Johnson

operf: Fix 'Permission denied' error on early perf_events kernels

The new operf tool available with OProfile 0.9.8 uses the perf_event_open
syscall to obtain access to the performance monitor counters and registers.
This syscall is implemented by the Linux Kernel Performance Events Subsystem
(aka "perf_events"). This perf_events subsystem was introduced in kernel
version 2.6.31, and it underwent a lot of changes in the first several versions
thereafter. Apparently, the operf tool, as currently written, relies on certain
kernel functionality that was introduced later than the kernels shipped with
some Linux distributions that supported perf_events in its very early stages
(e.g., SLES 11 SP1). When attempting to profile with operf
(e.g., 'operf ls'), it fails with the message:

Unexpected error running operf: Permission denied
Please use the opcontrol command instead of operf.

The fix for this problem is to pass '-1' for the cpu arg on the
perf_event_open syscall when running on an early perf_events kernel.
Passing '-1' for the cpu arg was a requirement (in most circumstances)
on early perf_events kernels. Later kernels removed this requirement
so perf_event_open could be called for each cpu, even for single-app
profiling by non-root users. This is the standard usage model employed
by operf, which allows us to mmap kernel data space for each cpu, thus
giving a lot more memory for the kernel to record sample data.

Signed-off-by: Maynard Johnson <maynardj@us.ibm.com>

2013-02-05 16:12:04 Tree
[cd8be5] by Maynard Johnson

Fix opreport header info on unit mask when operf is run without a UM specified

When a user runs operf and profiles with an event that needs a unit mask value,
the default unit mask value will be used if no UM value is specified. When
opreport prints its header information, you get something like the following:

CPU: Intel Sandy Bridge microarchitecture, speed 2.401e+06 MHz (estimated)
Counted int_misc events (Instruction decoder events) with a unit mask of 0x00
(rat_stall_cycles Cycles Resource Allocation Table (RAT) external stall is
sent to Instruction Decode Queue (IDQ) for this thread.) count 2000000

Notice that the unit mask value '0x00' is shown, even though the code actually
selects the default unit value of 0x40 for the int_misc event.

This patch fixes this issue. It also partially addresses the issue with
named unit mask showing up as '0x00' in opreport, too (see oprofile bug
It's not a very good solution to the named unit mask issue, but it's better
than nothing until we can come up with a final (better) solution.

Signed-off-by: Maynard Johnson <maynardj@us.ibm.com>

2013-01-28 22:20:46 Tree
[646eeb] by Maynard Johnson

Fix 32-bit compilation error

Added a 'ULL' suffix to a u64 variable definition so that -m32
build would not fail.

Signed-off-by: Maynard Johnson <maynardj@us.ibm.com>

2013-01-24 21:09:29 Tree
[12d9e9] by Maynard Johnson

Unit mask bitmasks containing non-unique values should fail

When named unit mask support was added by Andi Kleen (Intel) in the 0.9.7
timeframe, it became possible for multiple unit masks to have the same
hex value for a given event. The (only) way to disambiguate these non-unique
unit mask values is to specify them by name. It was pointed out during patch
reviews that OR'ing together such unit masks (for bitmask types of unit
masks) was problematic. Andi asserted that bitmasks containing any of the
non-unique values simply would not be supported. Unfortunately, we dropped
the ball and did not document this restriction, and neither did we put any
checks into the code to prevent users from doing this.


On i386/sandybridge, the int_misc event has the following possible unit
mask values that can be put into a bitmask:

Unit masks (default 0x40)
0x40: rat_stall_cycles Cycles Resource Allocation Table (RAT) external
stall is sent to Instruction Decode Queue (IDQ) for this thread.
0x03: recovery_cycles Number of cycles waiting to be recover after Nuke due
to all other cases except JEClear. (extra: cmask=1)
0x03: recovery_stalls_count Edge applied to recovery_cycles, thus counts
occurrences. (extra: edge cmask=1)

The event specification of int_misc:2000000:43 will be incorrectly accepted
as valid even though it is clearly ambiguous. With this patch, such unit mask
bitmask specifications will be detected, and we will exit gracefully with an
informative error message.
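
One possible way to detect such an ambiguous bitmask is sketched below. It is an illustration only, not the check added by this patch: 'unit_mask_entry' and 'bitmask_is_ambiguous' are hypothetical names, and the rule used (a requested bitmask is ambiguous if it fully contains a value shared by more than one named unit mask) is an assumption about how the restriction could be enforced.

```cpp
#include <map>
#include <string>
#include <vector>

// Hypothetical, simplified description of one unit mask of an event.
struct unit_mask_entry {
    unsigned value;
    std::string name;
};

// Returns true if 'requested' includes any unit mask value that is shared by
// more than one named entry, so the numeric form cannot say which was meant.
bool bitmask_is_ambiguous(const std::vector<unit_mask_entry>& entries,
                          unsigned requested)
{
    std::map<unsigned, int> occurrences;
    for (const auto& entry : entries)
        ++occurrences[entry.value];

    for (const auto& item : occurrences) {
        const unsigned value = item.first;
        const bool shared = item.second > 1;
        const bool selected = value != 0 && (requested & value) == value;
        if (shared && selected)
            return true;
    }
    return false;
}
```

With the three int_misc values listed above (0x40, 0x03, 0x03), a requested value of 0x43 is flagged because it contains the shared value 0x03, while 0x40 alone is not.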

Signed-off-by: Maynard Johnson <maynardj@us.ibm.com>

2013-01-23 23:52:00 Tree
[745ede] by Daniel Hansel, pushed by Maynard Johnson

Change location to store intermediate JIT dump files

Since JIT support was added to Oprofile the intermediate JIT dump files
holding the sampling data collected for a Java process were stored in a
hard coded directory /var/lib/oprofile/jitdump that was world-writable.
The setting of a session specific directory (i.e. "--session-dir=...")
was not used anyway.

Now during profiling JIT dump files are stored under /tmp/.oprofile/jitdump.
When opjitconv has finished the conversion of the JIT dump files (the result
is stored under the default location (e.g. /var/lib/oprofile, ./oprofile_data
or the specified session directory)) the intermediate JIT dump file will be

Signed-off-by: Daniel Hansel <daniel.hansel@linux.vnet.ibm.com>

2013-01-23 16:24:37 Tree
[b2c445] by Carl Love, pushed by Maynard Johnson

operf, add throttling and multiplexing stats

This patch checks to see if the event was throttled or multiplexed. The
events are recorded by creating a file with the name of the event in the
stats sub directory throttled or multiplexed respectively.

Functions are added to the post processing to print messages if multiplexing
and/or throttling occurred during the data collection.

The patch has been tested on an Intel Core(TM)2 Duo CPU T9400 2.53GHz.
The following are excerpts from the script used to do the testing.

The events tested are as follows:

Each of the tests below was run with each of the following frequencies
to test with and without event throttling.



$path/operf --events $event1:$freq:0:1:1 --system-wide

$path/operf -l --events $event1:$freq:0:1:1 --system-wide

$path/operf --events $event1:$freq:0:1:1 --events $event2:$freq:0:1:1 \
    --events $event4:$freq:0:1:1 --events $event5:$freq:0:1:1 \
    --events $event6:$freq:0:1:1 --events $event7:$freq:0:1:1 --system-wide

$path/operf -l --events $event1:$freq:0:1:1 --events $event2:$freq:0:1:1 \
    --events $event4:$freq:0:1:1 --events $event5:$freq:0:1:1 \
    --events $event6:$freq:0:1:1 --events $event7:$freq:0:1:1 --system-wide

$path/operf --events $event1:$freq:0:1:1 \
    dd bs=16 if=/dev/urandom of=/dev/null count=500000

$path/operf -l --events $event1:$freq:0:1:1 \
    dd bs=16 if=/dev/urandom of=/dev/null count=500000

$path/operf --events $event1:$freq:0:1:1 --events $event2:$freq:0:1:1 \
    --events $event4:$freq:0:1:1 --events $event5:$freq:0:1:1 \
    --events $event6:$freq:0:1:1 --events $event7:$freq:0:1:1 \
    dd bs=16 if=/dev/urandom of=/dev/null count=5000

$path/operf -l --events $event1:$freq:0:1:1 --events $event2:$freq:0:1:1 \
    --events $event4:$freq:0:1:1 --events $event5:$freq:0:1:1 \
    --events $event6:$freq:0:1:1 --events $event7:$freq:0:1:1 \
    dd bs=16 if=/dev/urandom of=/dev/null count

The tests described above were also performed on an IBM POWER7
3000.000000MHz revision : 2.1 with the following events.


And the two sampling frequencies:


Signed-off-by: Carl Love <cel@us.ibm.com>

2013-01-23 15:19:47 Tree
[d3a2c6] by Maynard Johnson

Fix compile warnings/errors with gcc 4.7.3

On some distros, the struct poptOption in /usr/include/popt.h
has the argInfo field defined as int, but on other distros,
that field is defined as unsigned int. In libopt++/popt_options.cpp,
the option_base::option_base constructor passes an unsigned int
popt_flags argument that's intended to be assigned to the
argInfo field. With gcc 4.7.1, the following warning(error) occurs
on systems where the argInfo field is defined as an int:

popt_options.cpp: In constructor `popt::option_base::option_base
(const char*, char, const char*, const char*, void*, unsigned int)':
popt_options.cpp:255:51: error: narrowing conversion of `popt_flags'
from `unsigned int' to `int' inside { } is ill-formed in C++11 [-Werror=narrowing]
cc1plus: all warnings being treated as errors

The fix for this problem is to cast the popt_flags to the appropriate
type using 'typeof(opt.argInfo)'.
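
The cast can be sketched as follows. This illustration substitutes standard C++11 decltype for the GNU 'typeof' used in the patch, and 'sketch_option' is a hypothetical stand-in for struct poptOption with only two of its fields; 'make_option' is likewise a made-up helper.

```cpp
// Hypothetical stand-in for struct poptOption. The argInfo field is 'int'
// here, but the cast below adapts if it is changed to 'unsigned int'.
struct sketch_option {
    const char* longName;
    int argInfo;
};

sketch_option make_option(const char* name, unsigned int popt_flags)
{
    // Without the cast, brace-initializing an 'int' member from an
    // 'unsigned int' variable is a narrowing conversion in C++11.
    sketch_option opt = {
        name,
        static_cast<decltype(sketch_option::argInfo)>(popt_flags)
    };
    return opt;
}
```

Casting to the declared type of the field, rather than to a fixed type, keeps the code correct on distros with either definition of argInfo.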

The second compile error (in pe_profiling/operf.cpp) is happening
because the variable 'value' is assigned, but not used after that.
This is dead code that should be removed.

Signed-off-by: Maynard Johnson <maynardj@us.ibm.com>

2013-01-21 16:38:14 Tree
[6b0ad6] by Maynard Johnson, pushed by Suravee Suthikulpanit

Fix default numerical unit mask when using numeric and modify error messages.

Signed-off-by: Suravee Suthikulpanit <suravee.suthikulpanit@amd.com>

2013-01-16 17:15:20 Tree
[b655cc] by Marcin Juszkiewicz, pushed by Maynard Johnson

Add rmb() definition for AArch64 architecture

Signed-off-by: Marcin Juszkiewicz <marcin.juszkiewicz@linaro.org>

2013-01-16 15:30:33 Tree
[df62a5] by William Cohen, pushed by Maynard Johnson

Fix ASSERT_SIDE_EFFECT problems found by coverity scan

Coverity pointed out that some asserts in opimport.cpp had
assignments in them. The assignments should be outside the asserts, and
the asserts should only evaluate values and be side-effect free.
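
A small sketch of the pattern, using hypothetical functions rather than the opimport.cpp code. The point is that assert() expands to nothing when NDEBUG is defined, so any assignment or call placed inside it disappears from release builds.

```cpp
#include <cassert>

// Hypothetical function with a side effect: it advances 'counter'.
int next_id(int& counter)
{
    return ++counter;
}

// Problematic form (shown only as a comment): with NDEBUG defined, the call
// and the assignment are compiled out together with the assert.
//   assert((id = next_id(counter)) > 0);

int allocate_checked(int& counter)
{
    const int id = next_id(counter);  // side effect happens unconditionally
    assert(id > 0);                   // the assert only evaluates a value
    return id;
}
```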

Signed-off-by: William Cohen <wcohen@redhat.com>

2013-01-15 18:39:19 Tree
[af915b] by Maynard Johnson

Fix bug where some invalid unit mask values are accepted as valid

Under certain conditions for events that use a bitmask of unit mask
values, an invalid unit mask value specified by the user may be
incorrectly accepted as valid.

i386/sandybridge: event specification l2_rqsts:200000:2
is accepted as valid, whereas there is no unit mask value of '2'.

See bug:
for more details.
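
To illustrate how such a value can slip through, here is a sketch of a stricter validity check. It is not the fix from this patch: 'bitmask_is_valid' is a hypothetical helper, and the allowed values used in the note below (0x01, 0x03, 0x04) are illustrative only, not the full l2_rqsts unit mask list. A naive test that merely checks whether the requested bits lie within the OR of all allowed values would accept 2, because bit 1 is set in 0x03.

```cpp
#include <vector>

// A requested bitmask is treated as valid only if it can be built entirely
// from allowed values that it fully contains.
bool bitmask_is_valid(const std::vector<unsigned>& allowed, unsigned requested)
{
    if (requested == 0) {
        // Zero is valid only if it is itself listed as an allowed value.
        for (unsigned value : allowed)
            if (value == 0)
                return true;
        return false;
    }
    unsigned covered = 0;
    for (unsigned value : allowed)
        if (value != 0 && (requested & value) == value)
            covered |= value;
    return covered == requested;
}
```

With allowed values 0x01, 0x03 and 0x04, a request of 2 is rejected because no allowed value fits inside it, while 3, 5 and 7 are accepted.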

Signed-off-by: Maynard Johnson <maynardj@us.ibm.com>

2013-01-14 15:43:07 Tree
[360280] by Maynard Johnson

Allow ppc64 events to be specified with or without _GRP<n> suffix

All events for IBM PowerPC server processors (except CYCLES) have
a _GRP<n> suffix. This is because the legacy opcontrol profiler
can only profile events in the same group (i.e., having the same
_GRP<n> suffix). But operf has no such restriction because it
can multiplex events; thus, we should allow the user to pass
event names without the _GRP<n> suffix.
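
One way to compare event names regardless of the suffix is to normalize them first, as in this sketch. The function 'strip_grp_suffix' is hypothetical and the event names in the note below are made up for illustration; how operf actually matches names is not described in this commit message.

```cpp
#include <cctype>
#include <cstddef>
#include <string>

// Remove a trailing "_GRP<n>" (n = one or more digits) from an event name.
// Names without such a suffix are returned unchanged.
std::string strip_grp_suffix(const std::string& event)
{
    const std::string marker = "_GRP";
    const std::size_t pos = event.rfind(marker);
    if (pos == std::string::npos)
        return event;

    const std::size_t digits_start = pos + marker.size();
    if (digits_start == event.size())
        return event;                 // "_GRP" with no number: leave untouched

    for (std::size_t i = digits_start; i < event.size(); ++i)
        if (!std::isdigit(static_cast<unsigned char>(event[i])))
            return event;             // something other than digits follows

    return event.substr(0, pos);
}
```

For example, a made-up name such as "PM_EXAMPLE_GRP12" normalizes to "PM_EXAMPLE", while "CYCLES" is left as it is.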

Signed-off-by: Maynard Johnson <maynardj@us.ibm.com>

2013-01-11 19:29:57 Tree