oprofile Log

Commit Date  
[c884b7] by Maynard Johnson

Fix compile errors on Ubuntu 14.04

The gcc 4.8.2 on Ubuntu 14.04 complains about two different
types of issues that we've not seen older compilers complain about.
The complaints are warnings that are turned into errors, due to
our use of -Werror. The first type of error involves fprintf:

error: format not a string literal and no format arguments

I've found the following explanation for this change in gcc behavior:

If -Wformat is specified, also warn about uses of format
functions that represent possible security problems. At present,
this warns about calls to "printf" and "scanf" functions where
the format string is not a string literal and there are no format
arguments, as in "printf (foo);". This may be a security hole if
the format string came from untrusted input and contains %n.
(This is currently a subset of what -Wformat-nonliteral warns
about, but in future warnings may be added to -Wformat-security
that are not included in -Wformat-nonliteral.)

The second type of error is for not checking the return value of fgets.

This patch fixes these two issues and resolves the compilation problems.
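
As a minimal sketch of the two kinds of fix described above (not the actual oprofile changes), the helpers below pass an explicit "%s" format to fprintf and check the return value of fgets. The names write_message and read_first_line are hypothetical and exist only for illustration.

```cpp
#include <cstdio>
#include <cstring>
#include <string>

// Hypothetical helper: write an arbitrary message without treating it as a
// format string. Calling fprintf(out, msg) directly is the pattern that
// -Wformat-security flags.
int write_message(FILE *out, const char *msg)
{
    return fprintf(out, "%s", msg);
}

// Hypothetical helper: read one line and check the fgets return value.
// Returns false on EOF or error instead of using an unfilled buffer.
bool read_first_line(FILE *in, std::string &line)
{
    char buf[256];
    if (fgets(buf, sizeof(buf), in) == NULL)
        return false;
    buf[strcspn(buf, "\n")] = '\0';  // strip the trailing newline, if any
    line = buf;
    return true;
}
```

With this shape, a message that happens to contain "%n" is written out literally rather than interpreted, and a failed read is reported to the caller.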

Signed-off-by: Maynard Johnson <maynardj@us.ibm.com>

2014-03-05 15:04:30 Tree
[344bac] by Maynard Johnson

Remove unused variable 'tab' from previous commit

Signed-off-by: Maynard Johnson <maynardj@us.ibm.com>

2014-01-02 16:23:45 Tree
[51e615] by Maynard Johnson

opreport from 'operf --callgraph' profile shows incorrect recursive calls

When you collect a callgraph profile with operf, the opreport output
incorrectly implies recursive calls. For example, a simple memcpy
testcase that has the following true callchain:
main -> do_my_memcpy -> memcpy (libc)

appears as follows with 'opreport --callgraph' (focusing here just
on the do_my_memcpy callers and callees):

  4757 50.0000 memcpyt do_my_memcpy
  4757 50.0000 memcpyt main
4757 6.3185 memcpyt do_my_memcpy
  4757 49.9842 memcpyt do_my_memcpy
  4757 49.9842 memcpyt do_my_memcpy [self]
  3 0.0315 no-vmlinux /no-vmlinux

NOTE: Lines above the non-indented line show the callers of do_my_memcpy;
lines below the non-indented line show the callees of do_my_memcpy.
So it appears that do_my_memcpy calls itself, which it does not do.

If I use 'perf record' to get a callgraph profile, the 'perf report'
looks like the following:

6.88% memcpyt memcpyt [.] do_my_memcpy
--- do_my_memcpy

So here, too, it seems to me that do_my_memcpy calls do_my_memcpy.
When I reported this issue to perf/perf_events kernel developers,
I was basically told that this behavior was "by design".

This patch makes an effort to handle this issue by having operf drop
the first address in the callchain if and only if it is the same
address as the second address in the callchain.
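
The rule described above can be sketched as follows. This is an illustration only: a plain vector of addresses stands in for the perf_events callchain, and the function name drop_duplicate_head is hypothetical rather than taken from operf.

```cpp
#include <cstdint>
#include <vector>

// Hypothetical helper: drop the leading callchain entry only when it
// duplicates the entry that follows it; otherwise return the chain unchanged.
std::vector<uint64_t> drop_duplicate_head(const std::vector<uint64_t> &chain)
{
    if (chain.size() >= 2 && chain[0] == chain[1])
        return std::vector<uint64_t>(chain.begin() + 1, chain.end());
    return chain;
}
```

Because the entry is dropped only on an exact address match, a genuinely recursive call made from a different instruction address is left in place.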

Signed-off-by: Maynard Johnson <maynardj@us.ibm.com>

2014-01-02 14:35:17 Tree
[46a673] by Maynard Johnson

Fix sample attribution problem when using multiple events

A serious bug was found that affects operf profiling with
multiple events. Samples for an event may be incorrectly
attributed to another event. For example, profiling on
a Sandybridge laptop with CPU_CLK_UNHALTED and INST_RETIRED
events produces the following summary counts from opreport:

samples| %| samples| %|
32412 100.000 20104 100.000 foo

Using operf to produce separate profiles for these two events
results in these sample counts:

samples| %|
18962 100.000 foo

samples| %|
33464 100.000 foo

This patch fixes the problem.

Signed-off-by: Maynard Johnson <maynardj@us.ibm.com>

2013-12-20 15:10:46 Tree
[f76a07] by Maynard Johnson

Fix two makefiles to use -Werror, and fix resulting compiler errors

I discovered that the Makefile.am files in libperf_events and
libpe_utils did not set AM_CXXFLAGS = @OP_CXXFLAGS@, and thus,
the extra -W flags that are added to OP_CXXFLAGS (in configure.ac)
were not being used when building these two directories. Once
I corrected this problem with the makefiles and rebuilt the
source tree, the g++ compiler found a number of minor issues
and ended in error due to the -Werror flag. This patch contains
fixes for the two Makefile.am files, as well as fixes for the
compiler's warnings-turned-to-errors. These were all minor issues
that I'm fairly confident should not have caused any functional
problems. Both manual testing and oprofile testsuite pass
with this patch applied.

Signed-off-by: Maynard Johnson <maynardj@us.ibm.com>

2013-12-19 18:57:14 Tree
[87bf15] by Maynard Johnson

Fix operf/opreport kernel throttling detection

With oprofile 0.9.8, the operf command below correctly produces the following
output on an Intel Core 2 Duo/RHEL 6.4 system when using too high of a
sampling rate:

$ operf -e CPU_CLK_UNHALTED:20000 ./memcpyt 200000000
operf: Profiler started
Num iterations passed is 200000000
memcpyt starting with PID 2423
source_address: 7fff689b7003
dest_address: 7fff689b5007
200000000 interations of memcpy(d+7.s+3,65) requires 10.016 seconds
* * * * WARNING: Profiling rate was throttled back by the kernel * * * *
The number of samples actually recorded is less than expected, but is
probably still statistically valid. Decreasing the sampling rate is the
best option if you want to avoid throttling.

Profiling done.

The same operf command using current upstream oprofile (and 0.9.9) produces no
throttling message. But by comparing the number of samples with profile runs
using a lower sampling rate (i.e., count value >=100000 for CPU_CLK_UNHALTED),
I can see that the kernel must be throttling, because we're not collecting
enough samples for the given sampling rate.

Additionally, the opreport command should report when throttling has
occurred for the profile data being analyzed. This enhancement was made
post-0.9.8, but was broken at some point before 0.9.9 was released, so that
this informational message is also now missing from opreport.

This patch fixes both issues (and they were, indeed, separate bugs).

Signed-off-by: Maynard Johnson <maynardj@us.ibm.com>

2013-12-18 20:59:37 Tree
[a5f539] by Carl Love, pushed by Maynard Johnson

Add support for getting the Kernel symbols from /proc/kallsyms

This patch reads the /proc/kallsyms file to get the kernel symbols
if the user hasn't specified a vmlinux file.
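
For illustration, a line of /proc/kallsyms (address, type character, symbol name, and an optional bracketed module name) could be parsed roughly as below. The struct kernel_symbol and the function parse_kallsyms_line are hypothetical names, not the ones used in the patch.

```cpp
#include <cstdint>
#include <sstream>
#include <string>

// Hypothetical record for one kallsyms entry.
struct kernel_symbol {
    uint64_t address;
    char type;
    std::string name;
    std::string module;  // empty when no [module] field is present
};

// Parse one line such as "ffffffff81000000 T _text" or
// "ffffffffa0001000 t helper_fn\t[some_mod]". Returns false if the
// three mandatory fields cannot be read.
bool parse_kallsyms_line(const std::string &line, kernel_symbol &sym)
{
    std::istringstream in(line);
    if (!(in >> std::hex >> sym.address >> sym.type >> sym.name))
        return false;
    sym.module.clear();
    std::string mod;
    if (in >> mod && mod.size() > 2 &&
        mod[0] == '[' && mod[mod.size() - 1] == ']')
        sym.module = mod.substr(1, mod.size() - 2);
    return true;
}
```

A caller would typically read the file line by line with std::getline and skip any line for which the function returns false.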

Signed-off-by: Carl Love <carll@us.ibm.com>

2013-12-11 18:05:49 Tree
[65bbb3] by Maynard Johnson

Add more helpful info about dealing with lost samples

When operf detects that more than a certain percentage of
samples were lost, it displays a warning message when it
stops. This patch adds to that message a suggestion to
lower the sampling rate. This patch also updates the
operf man page with information on how to control the
sampling rate.

Signed-off-by: Maynard Johnson <maynardj@us.ibm.com>

2013-12-03 19:26:59 Tree
[810bb8] by Maynard Johnson

Fix spurious "backtraces skipped due to no file mapping" log entries

When using operf to do callgraph profiling, the following message may be
displayed:

WARNING: Lost samples detected! See .../oprofile_data/samples/operf.log for details.

And in the operf.log, you may see something like:

Nr. backtraces skipped due to no file mapping: 267

A bug in the code is causing most of these "no file mapping" counts.
This patch fixes that problem.

Signed-off-by: Maynard Johnson <maynardj@us.ibm.com>

2013-11-14 22:25:43 Tree
[25c0a6] by Maynard Johnson

Fix minor issues found with Eclipse CDT code analysis

The Kepler release of Eclipse/CDT includes a Code Analysis
feature that automatically runs when you open a file in the
CDT editor. Several warning messages are given for various
files, and this patch fixes those issues.

I have not opened *every* file in the oprofile source to
have it analyzed, so there may be other issues found in the
future. I tried analyzing the whole project, but the
function broke with some kind of stack overflow error.
I then tried analyzing a directory, and that seemed to
not work correctly -- identifying things that aren't really

Signed-off-by: Maynard Johnson <maynardj@us.ibm.com>

2013-11-08 17:32:11 Tree
[44d156] by Carl Love, pushed by Maynard Johnson

Ocount, print the unit mask, kernel and user modes if specified for the event

The unit mask, kernel and user mode can all be optionally specified by the user.
Currently, these values are not being printed with the event name and the
counts for the events. This patch will print this information only if
the user specifies one or more of these qualifiers with the event specifier.

Signed-off-by: Carl Love <carll@us.ibm.com>

2013-11-05 18:42:39 Tree
[ebde58] by Maynard Johnson

Converge operf and ocount utility functions

When the ocount tool was developed, a number of utility
functions were needed that were very similar to operf utility
functions, with just minor changes. The decision was made at
the time to copy these functions into ocount and change them
as needed. To avoid dual maintenance on very similar functions,
we should converge the two tools to use one common set of utility
functions. The main reason for not doing so in the first place
was to make it easier to review ocount patches and not have to
look at operf changes at the same time.

Signed-off-by: Maynard Johnson <maynardj@us.ibm.com>

2013-10-09 18:12:21 Tree
[961b96] by Maynard Johnson

Fix compile error on precise_ip field in early versions of perf_event.h

Early "perf_events" kernels did not yet include the "precise_ip" field
in the perf_event_attr struct (defined in perf_event.h). This patch
makes configure check for the existence of that field, and it defines
a new macro, HAVE_PERF_PRECISE_IP, which will be set to '1' if the
field exists or '0' otherwise. The operf_counter.cpp code that was
failing to compile will now use the precise_ip field conditionally,
based on the HAVE_PERF_PRECISE_IP macro.
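
A rough sketch of this kind of conditional use is shown below. It assumes configure places HAVE_PERF_PRECISE_IP in config.h; the struct demo_event_attr is a hypothetical stand-in for the kernel's perf_event_attr, and request_precise_ip is an illustrative name, not a function from operf_counter.cpp.

```cpp
// Assumption: configure would normally define this macro; default to 0 here
// so the sketch compiles on its own.
#ifndef HAVE_PERF_PRECISE_IP
#define HAVE_PERF_PRECISE_IP 0
#endif

// Hypothetical stand-in for perf_event_attr, reduced to what the sketch needs.
struct demo_event_attr {
    unsigned long long config;
#if HAVE_PERF_PRECISE_IP
    unsigned int precise_ip : 2;
#endif
};

// Record the requested precise_ip level if the field exists.
// Returns true when the request was stored, false when the field is absent.
bool request_precise_ip(demo_event_attr &attr, unsigned int level)
{
#if HAVE_PERF_PRECISE_IP
    attr.precise_ip = level & 3u;
    return true;
#else
    (void)attr;
    (void)level;
    return false;
#endif
}
```

Defining the macro to 0 or 1, rather than leaving it undefined, is what allows a plain #if test instead of #ifdef.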

Signed-off-by: Maynard Johnson <maynardj@us.ibm.com>
Acked-by: Andi Kleen <andi@firstfloor.org>

2013-07-22 15:27:42 Tree
[2dcd13] by William Cohen, pushed by Maynard Johnson

Avoid changing the number formatting for cout and cerr streams

Coverity static tool found a number of places where number formatting
was changed and might lead to some oddly formatted output for later
stream output. This patch ensures that the number formatting only
applies to that particular message.
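
Two common ways to keep formatting local to one message are sketched below; the patch may use either or something else entirely. The function names format_address and print_address are hypothetical.

```cpp
#include <iomanip>
#include <ios>
#include <ostream>
#include <sstream>
#include <string>

// Option 1: format into a private stream so std::hex and std::setfill never
// touch the state of a shared stream such as cout or cerr.
std::string format_address(unsigned long long addr)
{
    std::ostringstream tmp;
    tmp << "0x" << std::hex << std::setfill('0') << std::setw(8) << addr;
    return tmp.str();
}

// Option 2: change the shared stream, then restore the saved flags and fill
// character before returning.
void print_address(std::ostream &out, unsigned long long addr)
{
    std::ios_base::fmtflags saved_flags = out.flags();
    char saved_fill = out.fill();
    out << "0x" << std::hex << std::setfill('0') << std::setw(8) << addr;
    out.flags(saved_flags);
    out.fill(saved_fill);
}
```

Without the restore step in the second option, every later integer written to the same stream would also be printed in hexadecimal.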

Signed-off-by: William Cohen <wcohen@redhat.com>

2013-06-05 18:14:43 Tree
[6ee980] by Maynard Johnson

Fix Coverity errors found on May 20, 2013 git snapshot

Coverity identified the following errors on scans run from May 7 through
May 20, 2013:

Wrapper object use after free,Memory - illegal accesses,/agents/jvmpi/jvmpi_oprofile.cpp,compiled_method_load(JVMPI_Event *)
Unchecked return value,Error handling issues,/daemon/opd_mangling.c,opd_open_sample_file
Dereference after null check,Null pointer dereferences,/daemon/opd_sfile.c,sfile_hash
Uninitialized scalar field,Uninitialized members,/gui/oprof_start_config.cpp,config_setting::config_setting()
Division or modulo by zero,Integer handling issues,/libdb/db_stat.c,odb_hash_stat
Resource leak,Resource leaks,/libop/op_cpu_type.c,_auxv_fetch
Resource leak,Resource leaks,/libop/op_cpu_type.c,fetch_at_hw_platform
Negative array index read,Memory - illegal accesses,/libop/op_events.c,_is_um_valid_bitmask
Write to pointer after free,Memory - corruptions,/libop/op_events.c,read_events
Read from pointer after free,Memory - illegal accesses,/libop/op_events.c,_is_um_valid_bitmask
Dereference after null check,Null pointer dereferences,/libop/op_mangle.c,op_mangle_filename
Dereference after null check,Null pointer dereferences,/libop/op_mangle.c,op_mangle_filename
Time of check time of use,Security best practices violations,/libopagent/opagent.c,op_open_agent
Improper use of negative value,Integer handling issues,/libperf_events/operf_counter.cpp,operf_record::setup()
Double free,Memory - corruptions,/libperf_events/operf_counter.cpp,operf_record::setup()
Uninitialized pointer read,Memory - illegal accesses,/libperf_events/operf_counter.cpp,<unnamed>::_get_perf_event_from_file(mmap_info &)
Unchecked return value,Error handling issues,/libperf_events/operf_mangling.cpp,"operf_open_sample_file(odb_t *, operf_sfile *, operf_sfile *, int, int)"
Using invalid iterator,API usage errors,/libperf_events/operf_process_info.cpp,operf_process_info::try_disassociate_from_parent(char *)
Non-array delete for scalars,Memory - illegal accesses,/libregex/op_regex.cpp,"<unnamed>::op_regerror(int, const re_pattern_buffer &)"
Resource leak,Resource leaks,/libutil++/op_bfd.cpp,"op_bfd::op_bfd(const std::basic_string<char, std::char_traits<char>, std::allocator<char>>&, const string_filter &, const extra_images &, bool &)"
Explicit null dereferenced,Null pointer dereferences,/opjitconv/create_bfd.c,fill_symtab
Resource leak,Resource leaks,/opjitconv/opjitconv.c,_cleanup_jitdumps
Use of untrusted string value,Insecure data handling,/opjitconv/opjitconv.c,main
Resource leak,Resource leaks,/pe_profiling/operf.cpp,_get_cpu_for_perf_events_cap()
Dereference null return value,Null pointer dereferences,/pe_profiling/operf.cpp,_process_session_dir()
Incorrect deallocator used,API usage errors,/pe_profiling/operf.cpp,_process_events_list()


This patch fixes those errors.

Signed-off-by: Maynard Johnson <maynardj@us.ibm.com>

2013-05-28 13:19:25 Tree
[74abfb] by Maynard Johnson

Fix Coverity issues identified against oprofile 0.9.8 release

Signed-off-by: Maynard Johnson <maynardj@us.ibm.com>

2013-05-15 18:14:43 Tree
[c82b96] by Maynard Johnson

Use PMC5/PMC6 on ppc64 arch for run cycles/run instructions

The IBM Power processor architecture (ppc64) counts instructions
and cycles on PMC5 and PMC6 (respectively) when the run latch is
set (i.e., when not in idle state). On POWER6, these counters
were not capable of generating interrupts, so they could not be
used for profiling purposes; therefore, oprofile counted those
events (PM_RUN_INST_CMPL and PM_RUN_CYC) using other counters.
But with the newer POWER7 processor, PMC5 and PMC6 can generate
interrupts, so it makes sense to leverage those two counters
instead of using the other 4 (programmable) counters. Doing
so could, theoretically, allow us to count up to 6 events
simultaneously without the kernel having to do multiplexing.

This patch will force PM_RUN_INST_CMPL and PM_RUN_CYC to be
counted on PMC5 and PMC6 (respectively) when running on an
IBM POWER7 system.

Signed-off-by: Maynard Johnson <maynardj@us.ibm.com>

2013-04-26 19:05:28 Tree
[a39d41] by Maynard Johnson

Fix holes in operf system-wide profiling of forked processes

Using operf to do system-wide profiling of the specjbb benchmark
exposed some holes in how operf was processing the perf_events
data coming from the kernel. Some of the events we can get from
the kernel are:

The "COMM" event is to notify us of the start of an executable
application. The "FORK" event tells us when a process forks
another process. The "MMAP" event informs us when a shared library
(or executable anonymous memory, or the executable file itself, etc.)
has been mmap'ed into a process's address space. A "SAMPLE"
event occurs each time the kernel takes a sample for a process.

There is no guarantee in what order these events may arrive from
the kernel, and when a large system (say, 64 CPUs) is running
the specjbb benchmark full bore, with all processors pegged to
100%, you can get some very strange out-of-order looking
sequence of events. Things get even stranger when using Java7
versus Java6 since Java7 spawns many more threads.

The operf code had several issues where such out-of-order
events were not handled properly, so some major changes were
required in the code.

Signed-off-by: Maynard Johnson <maynardj@us.ibm.com>

2013-04-25 15:53:37 Tree
[3aa2fe] by Maynard Johnson

oprofile pp tools should print messages about lost samples

When operf completes running, it collects statistics about
lost samples, records them in the operf.log, and prints
a warning message if the number of lost samples exceeds
a pre-defined percentage (.01%) of the total number of
samples. However, when opreport or any of the other oprofile
post-processing tools are run, the statistics are not
readily available (only in the operf.log), so there is no
warning about lost samples. This patch persists those
statistics to files in the <session-dir>/samples/current/stats
dir, allowing the pp tools to access them later. These
stats files are also copied by oparchive, so even archived
profile data will have the statistics available.
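
The threshold test mentioned above amounts to a simple percentage comparison, sketched here. The names are hypothetical, and the sketch assumes the total is a plain sample count that the lost count is compared against; the real code may compute the total differently.

```cpp
#include <cstdint>

// Assumption: threshold expressed in percent, matching the ".01%" figure.
const double LOST_SAMPLE_WARN_PERCENT = 0.01;

// Hypothetical helper: true when the lost samples exceed the warning
// threshold as a percentage of the total.
bool should_warn_lost_samples(uint64_t lost, uint64_t total)
{
    if (total == 0)
        return false;  // nothing recorded, so there is no ratio to compute
    double pct = 100.0 * static_cast<double>(lost) / static_cast<double>(total);
    return pct > LOST_SAMPLE_WARN_PERCENT;
}
```

For example, 267 lost samples out of 1,000,000 is 0.0267% and would trigger the warning, while 50 out of 1,000,000 is 0.005% and would not.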

Signed-off-by: Maynard Johnson <maynardj@us.ibm.com>

2013-04-19 12:26:32 Tree
[18c4a6] by Maynard Johnson

Performance improvement for operf's perf_event-to-oprofile format conversion

This patch decreases the time needed for converting sample records from
perf_events format to oprofile sample file format by about 1/3. The
performance improvement is most notable when doing system-wide profiling
of a busy system and specifying '--lazy-conversion', where the conversion
process runs after a (long) profiling session has been ended.

Signed-off-by: Maynard Johnson <maynardj@us.ibm.com>

2013-03-19 17:57:08 Tree
[7cf28f] by Carl Love, pushed by Maynard Johnson

operf, remove support to report multiplexing.

The multiplexing reporting doesn't work correctly when profiling
multi-threaded apps or apps that do fork/exec. The detection of
multiplexing doesn't work when processes migrate between CPUs.
The event is enabled on all CPUs. The running time stops when the
event migrates to another CPU; however, the enabled time does not stop, as it
is enabled on each CPU. The issue is that the running time across CPUs
doesn't add up to the enabled time because the running time is not
increasing while the process is being migrated. This results in the running
time being less than the enabled time. There is no way to detect whether the
running time is less than the enabled time due to migration
or due to multiplexing.

The support is being removed so that the operf tool is not incorrectly
flagging events for multiplexing.

Signed-off-by: Carl Love <cel@us.ibm.com>

2013-03-12 15:52:15 Tree
[04ee55] by Maynard Johnson

Fix seg fault due to incorrect array size initialization

The operf_sfile.cpp:create_sfile function creates and initializes the
array of 'struct operf_sfile' objects used for writing sample data to
oprofile formatted sample files. This function creates an array of
these objects, but was incorrectly creating an array of size 'OP_MAX_COUNTER'.
Since operf can multiplex events, we aren't limited to OP_MAX_COUNTER
events to profile simultaneously, so if the user specifies more than
OP_MAX_COUNTER events, the code that accesses this array was going
off the end and sometimes seg faulting.

This patch fixes the problem by defining OP_MAX_EVENTS to be '24' and
using that as the array size. Furthermore, if the user tries to specify
more than 24 events to profile, an error message is displayed:
Number of events specified is greater than allowed maximum of <n>
and operf aborts.
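
The bounds check described above might look roughly like this. The constant value 24 and the message text come from the description; the function name check_event_count and the choice to return the message as a string are assumptions made for the sketch.

```cpp
#include <cstddef>
#include <sstream>
#include <string>

const std::size_t OP_MAX_EVENTS = 24;  // value given in the commit message

// Hypothetical check: returns an empty string when the event count fits the
// array, otherwise the error text the caller should print before aborting.
std::string check_event_count(std::size_t num_events)
{
    if (num_events <= OP_MAX_EVENTS)
        return std::string();
    std::ostringstream msg;
    msg << "Number of events specified is greater than allowed maximum of "
        << OP_MAX_EVENTS;
    return msg.str();
}
```

Validating the count before any array access is what prevents the out-of-bounds write, independent of how large the array is made.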

Signed-off-by: Maynard Johnson <maynardj@us.ibm.com>

2013-03-07 15:04:46 Tree
[a1f2b6] by Maynard Johnson

Make convertPerfData procedure more robust

The operf_read::convertPerfData function reads sample data
in perf_events format from either the temporary operf.data
file or a pipe, depending on whether or not operf is run
with the --lazy-conversion option. This patch makes the
reading/conversion process more robust so that if bad
data is found in the file or pipe, the process will display
helpful messages and end gracefully.

This patch also makes some other minor cleanups, correcting
some misspellings, etc.

Signed-off-by: Maynard Johnson <maynardj@us.ibm.com>

2013-03-05 18:05:59 Tree
[79a183] by Maynard Johnson

The configure check to determine whether we should use libpfm or not
is intended only for the ppc64 architecture, but was incorrectly
hitting on the ppc32 architecture, too. Not only that, but it was using
'uname', which is not a good idea in cross-compile situations.

Then, aside from that, we had several instances in the source code
of the following:
#if (defined(__powerpc__) || defined(__powerpc64__))
which incorrectly included the ppc32 architecture as well, when it was
intended only for the ppc64 architecture.
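
One way to restrict such a guard to 64-bit PowerPC is to test only __powerpc64__, since GCC defines __powerpc__ for both 32-bit and 64-bit targets. The sketch below shows that idea; the macro DEMO_TARGET_IS_PPC64 and the function target_is_ppc64 are hypothetical, and the commit does not say which condition the patch actually uses.

```cpp
// Sketch: evaluate to 1 only when compiling for 64-bit PowerPC.
#if defined(__powerpc64__)
#define DEMO_TARGET_IS_PPC64 1
#else
#define DEMO_TARGET_IS_PPC64 0
#endif

// Hypothetical helper exposing the compile-time result at run time.
bool target_is_ppc64()
{
    return DEMO_TARGET_IS_PPC64 != 0;
}
```

Because the test relies on compiler-defined macros rather than 'uname', it reflects the target being compiled for, which also keeps it correct when cross-compiling.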

This patch fixes both errors.

Signed-off-by: Maynard Johnson <maynardj@us.ibm.com>

2013-02-27 21:41:14 Tree
[7e5e18] by Maynard Johnson

operf does not properly sample child threads for already-running app

Example: When passing the 'java' command directly to operf, samples are
collected for all of the threads created by the JVM. However, if the
Java app is already running when the user starts operf with either
'--pid' or '--system-wide' option, zero samples are collected on the
child threads of the JVM. Note: The user program that is JITed by the
JVM is executed by a child thread.

This patch addresses the problem by:
- Keeping a list of child processes
- Synthesizing PERF_RECORD_COMM events for the main JVM process and all
the child processes
- Calling perf_event_open for the main JVM process and all child processes

These changes entailed some fairly major restructuring of some functions
and data structures of the operf_record class.

Signed-off-by: Maynard Johnson <maynardj@us.ibm.com>

2013-02-21 17:09:10 Tree