[74abfb] by Maynard Johnson

Fix Coverity issues identified against oprofile 0.9.8 release

Signed-off-by: Maynard Johnson <maynardj@us.ibm.com>

2013-05-15 18:14:43
[ecfbcc] by Maynard Johnson

Fix compile error that occurs with some versions of gcc

Signed-off-by: Maynard Johnson <maynardj@us.ibm.com>

2013-05-06 22:45:08
[c82b96] by Maynard Johnson

Use PMC5/PMC6 on ppc64 arch for run cycles/run instructions

The IBM Power processor architecture (ppc64) counts instructions
and cycles on PMC5 and PMC6 (respectively) when the run latch is
set (i.e., when not in idle state). On POWER6, these counters
were not capable of generating interrupts, so they could not be
used for profiling purposes; therefore, oprofile counted those
events (PM_RUN_INST_CMPL and PM_RUN_CYC) using other counters.
But with the newer POWER7 processor, PMC5 and PMC6 can generate
interrupts, so it makes sense to leverage those two counters
instead of using the other 4 (programmable) counters. Doing
so could, theoretically, allow us to count up to 6 events
simultaneously without the kernel having to do multiplexing.

This patch will force PM_RUN_INST_CMPL and PM_RUN_CYC to be
counted on PMC5 and PMC6 (respectively) when running on an
IBM POWER7 system.
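
As a rough illustration of the forced assignment described above, the sketch below maps the two run-latch events to their dedicated counters. The function name forced_counter_for_event and the convention of returning 0 for "no forced counter" are hypothetical and are not taken from the oprofile sources.

```c
#include <string.h>

/* Hypothetical helper: returns the PMC number an event is pinned to on
 * POWER7, or 0 if the event may use any of the programmable counters. */
int forced_counter_for_event(const char *event_name)
{
    if (strcmp(event_name, "PM_RUN_INST_CMPL") == 0)
        return 5;   /* run instructions are counted on PMC5 */
    if (strcmp(event_name, "PM_RUN_CYC") == 0)
        return 6;   /* run cycles are counted on PMC6 */
    return 0;       /* no forced counter for this event */
}
```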

Signed-off-by: Maynard Johnson <maynardj@us.ibm.com>

2013-04-26 19:05:28
[a39d41] by Maynard Johnson

Fix holes in operf system-wide profiling of forked processes

Using operf to do system-wide profiling of the specjbb benchmark
exposed some holes in how operf was processing the perf_events
data coming from the kernel. Some of the events we can get from
the kernel are:

The "COMM" event is to notify us of the start of an executable
application. The "FORK" event tells us when a process forks
another process. The "MMAP" event informs us when a shared library
(or executable anonymous memory, or the executable file itself, etc.)
has been mmap'ed into a process's address space. A "SAMPLE"
event occurs each time the kernel takes a sample for a process.

There is no guarantee in what order these events may arrive from
the kernel, and when a large system (say, 64 CPUs) is running
the specjbb benchmark full bore, with all processors pegged to
100%, you can get some very strange, out-of-order-looking
sequences of events. Things get even stranger when using Java7
versus Java6 since Java7 spawns many more threads.

The operf code had several issues where such out-of-order
events were not handled properly, so some major changes were
required in the code.

Signed-off-by: Maynard Johnson <maynardj@us.ibm.com>

2013-04-25 15:53:37
[60c572] by Maynard Johnson

Catch and handle error from op_jit_convert function

In opjitconv.c:process_jit_dumpfile, we were not detecting a failure
from op_jit_convert, which (when a failure actually does occur there)
results in a very mysterious failure message from copy_elffile().
This patch rectifies that situation.

Signed-off-by: Maynard Johnson <maynardj@us.ibm.com>

2013-04-19 21:34:03
[d57001] by Maynard Johnson

Fix opjitconv error message for bfd_set_arch_mach failure

When converting a JIT dump file to ELF, if we get a failure
calling the BFD function 'bfd_set_arch_mach', we were incorrectly
displaying the message:
bfd_set_format: No error

This corrects the error message to say "bfd_set_arch_mach: No error".

Signed-off-by: Maynard Johnson <maynardj@us.ibm.com>

2013-04-19 15:36:10
[3aa2fe] by Maynard Johnson

oprofile pp tools should print messages about lost samples

When operf completes running, it collects statistics about
lost samples, records them in the operf.log, and prints
a warning message if the number of lost samples exceeds
a pre-defined percentage (.01%) of the total number of
samples. However, when opreport or any of the other oprofile
post-processing tools are run, the statistics are not
readily available (only in the operf.log), so there is no
warning about lost samples. This patch persists those
statistics to files in the <session-dir>/samples/current/stats
dir, allowing the pp tools to access them later. These
stats files are also copied by oparchive, so even archived
profile data will have the statistics available.
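
The warning check described above can be sketched roughly as follows. This is a minimal illustration, not the operf implementation: the macro and function names are hypothetical, and only the 0.01% threshold comes from the text.

```c
#include <stdbool.h>
#include <stdint.h>

/* Threshold from the description above: 0.01% of the total samples.
 * The macro name itself is hypothetical. */
#define LOST_SAMPLES_THRESHOLD_PCT 0.01

/* Returns true when the lost samples exceed the threshold percentage
 * of the total number of samples. */
bool lost_samples_exceed_threshold(uint64_t lost, uint64_t total)
{
    if (total == 0)
        return lost > 0;   /* avoid dividing by zero */
    double pct = ((double)lost * 100.0) / (double)total;
    return pct > LOST_SAMPLES_THRESHOLD_PCT;
}
```

A post-processing tool could run the same check against the persisted statistics and print its warning only when the function returns true.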

Signed-off-by: Maynard Johnson <maynardj@us.ibm.com>

2013-04-19 12:26:32
[715616] by Maynard Johnson

Flesh out user manual doc on oparchive/opimport commands

There have been numerous questions over the years from users
who have a need to analyze their profile data offline, on
systems other than where the data was collected. Often,
these users are completely unaware of the oparchive and
opimport commands which are intended for use in such
situations. Part of the problem has been a lack of detailed
documentation about these commands. This patch adds
some detail to the user manual documentation for these commands
and also renames the oparchive section from
"6. Archiving measurements (oparchive)" to "6. Analyzing profile
data on another system (oparchive)".

This patch was motivated by questions raised in OProfile bug

Signed-off-by: Maynard Johnson <maynardj@us.ibm.com>

2013-04-17 13:50:13
[f54488] by Changbin Park, pushed by Maynard Johnson

Ensure parsed_filename's jit_dumpfile_exists variable is initialized before use

Signed-off-by: Changbin Park <changbin.park@lge.com>

2013-04-11 15:10:07
[442b5d] by Maynard Johnson

Fix broken --with-kernel configure option

The --with-kernel configure option was improperly expecting to
find necessary kernel headers in the pointed-to kernel source
tree in some guaranteed locations, but this was a bad assumption.
Instead, the user should run 'make headers_install' for their
custom kernel and use the location of where the headers were
installed when running oprofile's 'configure --with-kernel'.
This patch fixes the configure script to give helpful messages
to the user about how to properly install the kernel header

A secondary fix is made to m4/kernelversion.m4 to remove the
'-D__KERNEL__' flag. This flag isn't necessary, and I found that
it causes problems with some kernel versions.

Signed-off-by: Maynard Johnson <maynardj@us.ibm.com>

2013-04-09 13:19:17
[18c4a6] by Maynard Johnson

Performance improvement for operf's perf_event-to-oprofile format conversion

This patch decreases the time needed for converting sample records from
perf_events format to oprofile sample file format by about 1/3. The
performance improvement is most notable when doing system-wide profiling
of a busy system and specifying '--lazy-conversion', where the conversion
process runs after a (long) profiling session has been ended.

Signed-off-by: Maynard Johnson <maynardj@us.ibm.com>

2013-03-19 17:57:08
[d2df41] by Youquan Song, pushed by Maynard Johnson

Add Ivybridge EP support

Running "opcontrol -l" on an Ivybridge EP processor does not detect the CPU. It reports:
oprofile: available events for CPU type "Intel Architectural Perfmon"
and does not show the perf event list.

After applying the patch, it shows the Ivybridge perf event list and the correct CPU:
oprofile: available events for CPU type "Intel Ivy Bridge microarchitecture"

Signed-off-by: Youquan Song <youquan.song@intel.com>
Acked-by: Andi Kleen <ak@linux.intel.com>

2013-03-18 19:02:50
[7cf28f] by Carl Love, pushed by Maynard Johnson

operf, remove support to report multiplexing.

The multiplexing reporting doesn't work correctly when profiling
multi-threaded apps or apps that do fork/exec. The detection of
multiplexing doesn't work when processes migrate between CPUs.
The event is enabled on all CPUs. The running time stops when the
event migrates to another CPU; however, the enabled time does not stop, as it
is enabled on each CPU. The issue is that the running time across CPUs
doesn't add up to the enabled time because the running time is not
increasing while the process is being migrated. This results in the running
time being less than the enabled time. There is no way to detect whether the
running time is less than the enabled time due to migration
or due to multiplexing.
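
To make the ambiguity concrete, here is a minimal sketch of the kind of check that compares running time against enabled time. The struct and function names are hypothetical. The two fields stand for the enabled and running times discussed above.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical container for the two times discussed above. */
struct event_times {
    uint64_t time_enabled;   /* how long the event was enabled */
    uint64_t time_running;   /* how long the event was actually counting */
};

/* Naive check: flags an event whenever it ran for less time than it
 * was enabled. A migrating process produces the same gap, so a true
 * result cannot distinguish multiplexing from migration. */
bool looks_multiplexed(const struct event_times *t)
{
    return t->time_running < t->time_enabled;
}
```

Because the function sees only the two totals, it has no input that would let it tell the two causes apart, which is the reason given for removing the report.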

The support is being removed so that the operf tool is not incorrectly
flagging events for multiplexing.

Signed-off-by: Carl Love <cel@us.ibm.com>

2013-03-12 15:52:15
[71c5f8] by Ryo Onodera, pushed by Maynard Johnson

Add #include of stdint.h to opagent.h

Signed-off-by: Ryo Onodera <ryoqun@gmail.com>

2013-03-07 15:18:24
[04ee55] by Maynard Johnson

Fix seg fault due to incorrect array size initialization

The operf_sfile.cpp:create_sfile function creates and initializes the
array of 'struct operf_sfile' objects used for writing sample data to
oprofile formatted sample files. This function creates an array of
these objects, but was incorrectly creating an array of size 'OP_MAX_COUNTER'.
Since operf can multiplex events, we aren't limited to OP_MAX_COUNTER
events to profile simultaneously, so if the user specifies more than
OP_MAX_COUNTER events, the code that accesses this array was going
off the end and sometimes seg faulting.

This patch fixes the problem by defining OP_MAX_EVENTS to be '24' and
using that as the array size. Furthermore, if the user tries to specify
more than 24 events to profile, an error message is displayed:
Number of events specified is greater than allowed maximum of <n>
and operf aborts.
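
The limit check can be sketched as below. Only the value 24 and the wording of the error message come from the description above. The helper name check_event_count and its return convention are hypothetical.

```c
#include <stdio.h>

#define OP_MAX_EVENTS 24   /* value given in the description above */

/* Hypothetical helper: returns 0 if the number of requested events
 * fits in an array of OP_MAX_EVENTS entries, -1 otherwise. */
int check_event_count(size_t num_events)
{
    if (num_events > OP_MAX_EVENTS) {
        fprintf(stderr,
                "Number of events specified is greater than allowed maximum of %d\n",
                OP_MAX_EVENTS);
        return -1;   /* caller is expected to abort */
    }
    return 0;
}
```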

Signed-off-by: Maynard Johnson <maynardj@us.ibm.com>

2013-03-07 15:04:46
[a1f2b6] by Maynard Johnson

Make convertPerfData procedure more robust

The operf_read::convertPerfData function reads sample data
in perf_events format from either the temporary operf.data
file or a pipe, depending on whether or not operf is run
with the --lazy-conversion option. This patch makes the
reading/conversion process more robust so that if bad
data is found in the file or pipe, the process will display
helpful messages and end gracefully.
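
One common way to guard against bad data in a stream of length-prefixed records is to validate each record's size field before advancing. The sketch below illustrates that idea only and is not the actual convertPerfData code. The struct is a simplified stand-in modeled on the perf_events record header, and the function name is hypothetical.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Simplified stand-in for a perf_events record header. */
struct record_header {
    uint32_t type;
    uint16_t misc;
    uint16_t size;   /* total record size, header included */
};

/* Walks the buffer record by record. Returns the number of well-formed
 * records, or -1 as soon as a truncated header or an impossible size
 * field is found, so the caller can report the problem and stop. */
int count_valid_records(const unsigned char *buf, size_t len)
{
    size_t off = 0;
    int n = 0;

    while (off < len) {
        struct record_header hdr;

        if (len - off < sizeof(hdr))
            return -1;                       /* truncated header */
        memcpy(&hdr, buf + off, sizeof(hdr));
        if (hdr.size < sizeof(hdr) || hdr.size > len - off)
            return -1;                       /* size field cannot be right */
        off += hdr.size;
        n++;
    }
    return n;
}
```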

This patch also makes some other minor cleanups, correcting
some misspellings, etc.

Signed-off-by: Maynard Johnson <maynardj@us.ibm.com>

2013-03-05 18:05:59
[79a183] by Maynard Johnson

The configure check to determine whether we should use libpfm or not
is intended only for the ppc64 architecture, but was incorrectly
hitting on the ppc32 architecture, too. Not only that, but it was using
'uname', which is not a good idea in cross-compile situations.

Then, aside from that, we had several instances in the source code
of the following:
#if (defined(__powerpc__) || defined(__powerpc64__))
which incorrectly included the ppc32 architecture as well, when it was intended
for the PPC64 architecture only.
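
For illustration, one way to restrict a code path to 64-bit PowerPC is to test only the __powerpc64__ macro, which compilers such as gcc define for 64-bit PowerPC targets, whereas __powerpc__ is defined for 32-bit targets as well. This is a sketch under that assumption. The exact condition used by the patch is not shown here, and the function name is hypothetical.

```c
#include <stdbool.h>

/* Hypothetical helper: true only when compiled for 64-bit PowerPC. */
bool built_for_ppc64(void)
{
#if defined(__powerpc64__)
    return true;    /* ppc64 only; ppc32 does not define this macro */
#else
    return false;
#endif
}
```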

This patch fixes both errors.

Signed-off-by: Maynard Johnson <maynardj@us.ibm.com>

2013-02-27 21:41:14
[dec498] by Tulio Magno Quites Machado Filho, pushed by Maynard Johnson

Update configure.ac to work with automake 1.13

GNU automake 1.13 has removed support for AM_CONFIG_HEADER and for the 2
parameter version of AM_INIT_AUTOMAKE.

For reference, see these URLs:

Signed-off-by: Tulio Magno Quites Machado Filho <tuliom@linux.vnet.ibm.com>

2013-02-22 21:30:18
[7e5e18] by Maynard Johnson

operf does not properly sample child threads for already-running app

Example: When passing the 'java' command directly to operf, samples are
collected for all of the threads created by the JVM. However, if the
Java app is already running when the user starts operf with either
'--pid' or '--system-wide' option, zero samples are collected on the
child threads of the JVM. Note: The user program that is JITed by the
JVM is executed by a child thread.

This patch addresses the problem by:
- Keeping a list of child processes
- Synthesizing PERF_RECORD_COMM events for the main JVM process and all
the child processes
- Calling perf_event_open for the main JVM process and all child processes
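
How the list of children is gathered is not spelled out above. As an assumption for illustration only, the sketch below enumerates the threads of an already-running process by reading the Linux /proc/<pid>/task directory. The function name list_threads is hypothetical.

```c
#define _POSIX_C_SOURCE 200809L
#include <dirent.h>
#include <stdio.h>
#include <stdlib.h>
#include <sys/types.h>

/* Fills tids[] with up to max thread IDs belonging to pid.
 * Returns the number of IDs stored, or -1 if the task directory
 * cannot be opened (for example, when the process does not exist). */
int list_threads(pid_t pid, pid_t *tids, int max)
{
    char path[64];
    DIR *dir;
    struct dirent *ent;
    int n = 0;

    snprintf(path, sizeof(path), "/proc/%ld/task", (long)pid);
    dir = opendir(path);
    if (dir == NULL)
        return -1;

    while (n < max && (ent = readdir(dir)) != NULL) {
        char *end;
        long tid = strtol(ent->d_name, &end, 10);

        if (end == ent->d_name || *end != '\0')
            continue;               /* skips "." and ".." */
        tids[n++] = (pid_t)tid;
    }
    closedir(dir);
    return n;
}
```

Each ID returned this way could then be given its own synthesized COMM event and its own perf_event_open call, as the list above describes.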

These changes entailed some fairly major restructuring of some functions
and data structures of the operf_record class.

Signed-off-by: Maynard Johnson <maynardj@us.ibm.com>

2013-02-21 17:09:10
[81ccb4] by Maynard Johnson

operf does not run opjitconv if --pid or --system-wide used

To stop operf when either '--pid' or '--system-wide' option is used, the user
must do a ctrl-C (or 'kill -SIGINT <operf_pid>'). If the user has not passed
'--lazy-conversion', the operf.cpp:convert_sample_data function is run as a
child process that does not have a SIGINT handler set up for it at the time
it's reading sample data from the pipe (which is being written to by the
operf-record process). The end result is that the operf-read process is
interrupted and stopped by the unhandled ctrl-C before it gets a chance to
run opjitconv.

Note: Another (minor) side effect of issue #1 above is that there may be
sample data left un-read in the pipe, and the type of app being profiled
(Java or not) is irrelevant.

This patch addresses the problem by cleaning up operf signal handler
procedures, making it clear which handlers are used by parent and which
by children, and then making sure those handlers are set up at the correct
time. I also found an extraneous unused signal handler defined in
operf_utils.cpp that I removed.
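
The general technique of installing a SIGINT handler before a read loop starts can be sketched with the POSIX sigaction call. This is a generic illustration, not the operf handler code, and all of the names below are hypothetical.

```c
#define _POSIX_C_SOURCE 200809L
#include <signal.h>
#include <string.h>

/* Set by the handler; polled by the read loop. */
static volatile sig_atomic_t stop_requested = 0;

static void on_sigint(int signo)
{
    (void)signo;
    stop_requested = 1;   /* only record the request; do no real work here */
}

/* Call in the child before it starts reading from the pipe.
 * Returns 0 on success, -1 on failure (as sigaction does). */
int install_sigint_handler(void)
{
    struct sigaction sa;

    memset(&sa, 0, sizeof(sa));
    sa.sa_handler = on_sigint;
    sigemptyset(&sa.sa_mask);
    return sigaction(SIGINT, &sa, NULL);
}

/* Lets the read loop check whether it should finish up and exit. */
int stop_was_requested(void)
{
    return stop_requested != 0;
}
```

With a handler in place, ctrl-C no longer terminates the reading process outright, so it can drain the pipe and run its remaining steps before exiting.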

Signed-off-by: Maynard Johnson <maynardj@us.ibm.com>

2013-02-11 23:14:34
[f8dd71] by Suravee Suthikulpanit

Add Support for AMD Generic Performance Events

AMD generic performance events are a small set of events which are generally available across several
AMD processor families. PERF already provides support for generic performance counters regardless
of the processor family. This will allow operf to work as soon as PERF is able to support the performance
counters, without having to wait for the more complete family-specific events and unit_masks files
to be added to OProfile.

Signed-off-by: Suravee Suthikulpanit <suravee.suthikulpanit@amd.com>

2013-02-08 16:36:51
[366ca2] by Suravee Suthikulpanit

Fix build issue with gcc-4.7.2 due to fgets

gcc complains about ignoring the return value of fgets.
Since we build with -Werror, the build failed.
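
A typical way to satisfy this warning is to check the value returned by fgets. The sketch below shows the pattern in general terms. The helper name read_line is hypothetical and this is not the code changed by the patch.

```c
#include <stdio.h>
#include <string.h>

/* Reads one line from fp into buf and strips the trailing newline.
 * Returns 0 on success, -1 on end-of-file or read error. */
int read_line(FILE *fp, char *buf, size_t len)
{
    if (len == 0)
        return -1;
    if (fgets(buf, (int)len, fp) == NULL) {
        buf[0] = '\0';              /* leave a defined, empty string */
        return -1;
    }
    buf[strcspn(buf, "\n")] = '\0'; /* drop the newline, if present */
    return 0;
}
```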

Signed-off-by: Suravee Suthikulpanit <suravee.suthikulpanit@amd.com>

2013-02-08 16:28:53
[cd5c7d] by Maynard Johnson

operf: Fix 'Permission denied' error on early perf_events kernels

The new operf tool available with OProfile 0.9.8 uses the perf_event_open
syscall to obtain access to the performance monitor counters and registers.
This syscall is implemented by the Linux Kernel Performance Events Subsystem
(aka "perf_events"). This perf_events subsystem was introduced in kernel
version 2.6.31, and it underwent a lot of changes in the first several versions
thereafter. Apparently, the operf tool, as currently written and operating today,
relies on certain kernel functionality that was introduced later than some
kernels provided with some Linux distributions that supported perf_events in the
very early stages (e.g., SLES 11 SP1). When attempting to profile with operf
(e.g., 'operf ls'), it fails with the message:

Unexpected error running operf: Permission denied
Please use the opcontrol command instead of operf.

The fix for this problem is to pass '-1' for the cpu arg on the
perf_event_open syscall when running on an early perf_events kernel.
Passing '-1' for the cpu arg was a requirement (in most circumstances)
on early perf_events kernels. Later kernels removed this requirement
so perf_event_open could be called for each cpu, even for single-app
profiling by non-root users. This is the standard usage model employed
by operf, which allows us to mmap kernel data space for each cpu, thus
giving a lot more memory for the kernel to record sample data.
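
The choice of cpu argument can be summarized by the small sketch below. The helper name and the boolean flag are hypothetical. How operf actually decides that it is running on an early perf_events kernel is not described here.

```c
#include <stdbool.h>

/* Hypothetical helper: picks the value to pass as the cpu argument of
 * the perf_event_open syscall for a given CPU. */
int perf_cpu_arg(bool early_perf_events_kernel, int cpu)
{
    if (early_perf_events_kernel)
        return -1;   /* early kernels: follow the process on any CPU */
    return cpu;      /* later kernels: one perf_event_open call per CPU */
}
```

On an early kernel this collapses the per-CPU calls into a single call per profiled process, at the cost of the extra per-CPU buffer space mentioned above.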

Signed-off-by: Maynard Johnson <maynardj@us.ibm.com>

2013-02-05 16:12:04
[cd8be5] by Maynard Johnson

Fix opreport header info on unit mask when operf is run without a UM specified

When a user runs operf and profiles with an event that needs a unit mask value,
the default unit mask value will be used if no UM value is specified. When
opreport prints its header information, you get something like the following:

CPU: Intel Sandy Bridge microarchitecture, speed 2.401e+06 MHz (estimated)
Counted int_misc events (Instruction decoder events) with a unit mask of 0x00
(rat_stall_cycles Cycles Resource Allocation Table (RAT) external stall is
sent to Instruction Decode Queue (IDQ) for this thread.) count 2000000

Notice that the unit mask value '0x00' is shown, even though the code actually
selects the default unit value of 0x40 for the int_misc event.

This patch fixes this issue. It also partially addresses the issue with
named unit mask showing up as '0x00' in opreport, too (see oprofile bug
It's not a very good solution to the named unit mask issue, but it's better
than nothing until we can come up with a final (better) solution.

Signed-off-by: Maynard Johnson <maynardj@us.ibm.com>

2013-01-28 22:20:46
[646eeb] by Maynard Johnson

Fix 32-bit compilation error

Added a 'ULL' suffix to a u64 variable definition so that the -m32
build would not fail.

Signed-off-by: Maynard Johnson <maynardj@us.ibm.com>

2013-01-24 21:09:29
