oprofile Log


Commit Date  
[51e615] by Maynard Johnson

opreport from 'operf --callgraph' profile shows incorrect recursive calls

When you collect a callgraph profile with operf, the opreport output
incorrectly implies recursive calls. For example, a simple memcpy
testcase that has the following true callchain:
main -> do_my_memcpy -> memcpy (libc)

appears as follows with 'opreport --callgraph' (focusing here just
on the do_my_memcpy callers and callees):

  4757     50.0000  memcpyt     do_my_memcpy
  4757     50.0000  memcpyt     main
4757      6.3185  memcpyt     do_my_memcpy
  4757     49.9842  memcpyt     do_my_memcpy
  4757     49.9842  memcpyt     do_my_memcpy [self]
  3         0.0315  no-vmlinux  /no-vmlinux

NOTE: Lines above the non-indented line show the callers of do_my_memcpy;
lines below the non-indented line show the callees of do_my_memcpy.
So it appears that do_my_memcpy calls itself, which it does not do.

If I use 'perf record' to get a callgraph profile, the 'perf report'
looks like the following:

     6.88%  memcpyt  memcpyt  [.] do_my_memcpy
            |
            --- do_my_memcpy
                main
                __libc_start_main

So here, too, it seems to me that do_my_memcpy calls do_my_memcpy.
When I reported this issue to perf/perf_events kernel developers,
I was basically told that this behavior was "by design".

This patch makes an effort to handle this issue by having operf drop
the first address in the callchain if and only if it is the same
address as the second address in the callchain.
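
The rule described above can be sketched roughly as follows. This is an illustrative Python sketch, not the operf C++ code; the function name and the list-of-addresses representation are assumptions made for the example.

```python
def drop_duplicate_leading_frame(callchain):
    """Return a copy of the callchain (a list of addresses, innermost first)
    with the first entry dropped if and only if it equals the second entry."""
    if len(callchain) >= 2 and callchain[0] == callchain[1]:
        return callchain[1:]
    # Chains that are too short, or whose first two entries differ,
    # are returned unchanged.
    return list(callchain)
```

Only the leading pair is compared, so a genuinely recursive function whose repeated frames appear deeper in the chain is left untouched.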

Signed-off-by: Maynard Johnson <maynardj@us.ibm.com>

2014-01-02 14:35:17
[746b5c] by Maynard Johnson

Update TODO list

Signed-off-by: Maynard Johnson <maynardj@us.ibm.com>

2013-12-20 15:26:28
[46a673] by Maynard Johnson

Fix sample attribution problem when using multiple events

A serious bug was found that affects operf profiling with
multiple events. Samples for an event may be incorrectly
attributed to another event. For example, profiling on
a Sandybridge laptop with CPU_CLK_UNHALTED and INST_RETIRED
events produces the following summary counts from opreport:

CPU_CLK_UNHALTED |INST_RETIRED     |
  samples|      %|  samples|      %|
------------------------------------
    32412 100.000     20104 100.000 foo

Using operf to produce separate profiles for these two events
results in these sample counts:

CPU_CLK_UNHALTED |
samples| %|
------------------
18962 100.000 foo

INST_RETIRED |
samples| %|
------------------
33464 100.000 foo

This patch fixes the problem.

Signed-off-by: Maynard Johnson <maynardj@us.ibm.com>

2013-12-20 15:10:46
[f76a07] by Maynard Johnson

Fix two makefiles to use -Werror, and fix resulting compiler errors

I discovered that the Makefile.am files in libperf_events and
libpe_utils did not set AM_CXXFLAGS = @OP_CXXFLAGS@, and thus,
the extra -W flags that are added to OP_CXXFLAGS (in configure.ac)
were not being used when building these two directories. Once
I corrected this problem with the makefiles and rebuilt the
source tree, the g++ compiler found a number of minor issues
and ended in error due to the -Werror flag. This patch contains
fixes for the two Makefile.am files, as well as fixes for the
compiler's warnings-turned-to-errors. These were all minor issues
that I'm fairly confident should not have caused any functional
problems. Both manual testing and oprofile testsuite pass
with this patch applied.

Signed-off-by: Maynard Johnson <maynardj@us.ibm.com>

2013-12-19 18:57:14
[87bf15] by Maynard Johnson

Fix operf/opreport kernel throttling detection

With oprofile 0.9.8, the operf command below correctly produces the following
output on an Intel Core 2 Duo/RHEL 6.4 system when using too high of a
sampling rate:

----------------------
$ operf -e CPU_CLK_UNHALTED:20000 ./memcpyt 200000000
operf: Profiler started
Num iterations passed is 200000000
memcpyt starting with PID 2423
source_address: 7fff689b7003
dest_address: 7fff689b5007
200000000 interations of memcpy(d+7.s+3,65) requires 10.016 seconds
* * * * WARNING: Profiling rate was throttled back by the kernel * * * *
The number of samples actually recorded is less than expected, but is
probably still statistically valid. Decreasing the sampling rate is the
best option if you want to avoid throttling.

Profiling done.
----------------------

The same operf command using current upstream oprofile (and 0.9.9) produces no
throttling message. But by comparing the number of samples with profile runs
using a lower sampling rate (i.e., count value >=100000 for CPU_CLK_UNHALTED),
I can see that the kernel must be throttling, because we're not collecting
enough samples for the given sampling rate.

Additionally, the opreport command should report when throttling has
occurred for the profile data being analyzed. This enhancement was made
post-0.9.8, but was broken at some point before 0.9.9 was released, so that
this informational message is also now missing from opreport.

This patch fixes both issues (and they were, indeed, separate bugs).

Signed-off-by: Maynard Johnson <maynardj@us.ibm.com>

2013-12-18 20:59:37
[4f5a0d] by Maynard Johnson

Allow all native events for IBM POWER8 in POWER7 compat mode

Certain older Linux distributions will support the new IBM POWER8
processor, but only in a limited mode, since much of the new
kernel code needed to fully support the POWER8 was not backported
to these older distros. This limited mode is referred to as
"POWER7 compat mode" since the kernel can support only the features
that were also available on that earlier IBM processor.

Changes I originally made to support POWER8 assumed that there
would not be full POWER8 performance monitor unit capabilities when
in POWER7 compat mode, and thus, the current oprofile code supports
only a limited subset of POWER8 events (i.e., events which were also
available on the POWER7). However, I've recently been made aware
that these older distros actually do have complete backports of the
POWER8 perf_events kernel subsystem code, making them fully aware of
all POWER8 events. This patch allows operf and ocount to use all
of the POWER8 events, regardless of what mode or distribution we
are running on.

Signed-off-by: Maynard Johnson <maynardj@us.ibm.com>

2013-12-17 22:04:33
[4f3e25] by Carl Love, pushed by Maynard Johnson

Fix kallsyms support for callgraph and debug-info opreport options

This patch is a fix for the recent patch to add support for obtaining
the kernel symbols from kallsyms if no vmlinux file was specified. The
opreport tool was seg faulting for the command "opreport -g -l". The
changes in files libutil++/op_bfd.cpp and libutil++/bfd_support.h fix
the segmentation fault by returning when the bfd file is either not
valid or a pseudo BFD. The pseudo BFD is used when the symbols
were obtained from kallsyms rather than from an actual vmlinux file.

A second issue was that the symbol name from /proc/kallsyms was not being
printed when "--callgraph" was specified with the operf and opreport tools.
The issue was due to calling the wrong bfd constructor when generating
the callgraph information. If the callee or caller image file is
kallsyms, then the kallsyms bfd constructor must be called to obtain the
symbol information. The changes to file libpp/callgraph_container.cpp
fix this issue.

Changed a comment to consistently refer to the kallsyms BFD file as a
pseudo BFD file rather than a fake BFD file.

Signed-off-by: Carl E. Love <carll@us.ibm.com>

2013-12-17 21:29:54
[a5f539] by Carl Love, pushed by Maynard Johnson

Add support for getting the Kernel symbols from /proc/kallsyms

This patch reads the /proc/kallsyms file to get the kernel symbols
if the user hasn't specified a vmlinux file.
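
For readers unfamiliar with the file, each line of /proc/kallsyms holds a hexadecimal address, a one-character symbol type, a symbol name, and, for module symbols, the module name in square brackets. A minimal parsing sketch in Python follows; it is not the oprofile implementation, and the function name and tuple layout are assumptions made for the example.

```python
def parse_kallsyms(lines):
    """Parse kallsyms-style lines into (address, type, name, module) tuples.

    `lines` is any iterable of text lines, e.g. an open file object for
    /proc/kallsyms. `module` is None for symbols built into the kernel.
    """
    symbols = []
    for line in lines:
        fields = line.split()
        if len(fields) < 3:
            continue  # skip blank or malformed lines
        address = int(fields[0], 16)
        sym_type, name = fields[1], fields[2]
        module = fields[3].strip("[]") if len(fields) > 3 else None
        symbols.append((address, sym_type, name, module))
    return symbols
```

A typical call would be `parse_kallsyms(open("/proc/kallsyms"))`; the sample lines used to exercise the function need not come from a real kernel.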

Signed-off-by: Carl Love <carll@us.ibm.com>

2013-12-11 18:05:49
[726b23] by Maynard Johnson

Add explanation of kernel/user bits in event specification

This patch adds a paragraph to the ocount and operf man pages
to explain the kernel/user bits in the event specification.
A few other minor cleanups were done also.

Signed-off-by: Maynard Johnson <maynardj@us.ibm.com>

2013-12-04 17:56:55
[65bbb3] by Maynard Johnson

Add more helpful info about dealing with lost samples

When operf detects that more than a certain percentage of
samples were lost, it displays a warning message when it
stops. This patch adds to that message a suggestion to
lower the sampling rate. This patch also updates the
operf man page with information on how to control the
sampling rate.

Signed-off-by: Maynard Johnson <maynardj@us.ibm.com>

2013-12-03 19:26:59
[810bb8] by Maynard Johnson

Fix spurious "backtraces skipped due to no file mapping" log entries

When using operf to do callgraph profiling, the following message may be
displayed:

WARNING: Lost samples detected! See .../oprofile_data/samples/operf.log for details.

And in the operf.log, you may see something like:

Nr. backtraces skipped due to no file mapping: 267

A bug in the code is causing most of these "no file mapping" counts.
This patch fixes that problem.

Signed-off-by: Maynard Johnson <maynardj@us.ibm.com>

2013-11-14 22:25:43
[25c0a6] by Maynard Johnson

Fix minor issues found with Eclipse CDT code analysis

The Kepler release of Eclipse/CDT includes a Code Analysis
feature that automatically runs when you open a file in the
CDT editor. Several warning messages are given for various
files, and this patch fixes those issues.

I have not opened *every* file in the oprofile source to
have it analyzed, so there may be other issues found in the
future. I tried analyzing the whole project, but the
function broke with some kind of stack overflow error.
I then tried analyzing a directory, and that seemed to
not work correctly -- identifying things that aren't really
problems.

Signed-off-by: Maynard Johnson <maynardj@us.ibm.com>

2013-11-08 17:32:11
[a339a0] by Maynard Johnson

ophelp schema is not included in installed files

A one-line change in doc/Makefile.am was needed in order for
'make install' to put ophelp.xsd in <installdir>/share/doc/oprofile.

Signed-off-by: Maynard Johnson <maynardj@us.ibm.com>

2013-11-07 14:24:05
[a856df] by Maynard Johnson

Add pseudo event for POWER7 to count rising edge events

This patch is specific to the IBM Power architecture.
The patch adds the capability to detect events where the
"_EDGE_COUNT" suffix has been appended to a real native event
name. The intent of such an event is to detect the rising edge
of the corresponding real native event. This "edge detection"
technique is useful for events that normally count the number
of cycles that a particular condition is true.
Since such "pseudo events" have not been formally defined in
processor documentation, libpfm does not know about them; thus,
we must convert them to their real native event equivalent in
order to get the base code. We then set the "edge detect" bit
(the LSB) in the event code.

This patch adds one new POWER7 event, PM_GCT_NOSLOT_CYC_EDGE_COUNT,
which uses the edge detection.
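
The conversion described above (strip the suffix, look up the real event's code, set the LSB) might look like the following Python sketch. It is not the oprofile code: the function name, the lookup table argument, and the event code used in any example are hypothetical placeholders.

```python
EDGE_SUFFIX = "_EDGE_COUNT"
EDGE_DETECT_BIT = 0x1  # the commit message identifies the LSB as the edge-detect bit

def resolve_event_code(event_name, base_codes):
    """Return the event code for `event_name`.

    `base_codes` maps real native event names to their codes (a stand-in
    for whatever libpfm would return). For a pseudo event ending in
    _EDGE_COUNT, the real event's code is looked up and the edge-detect
    bit is set on it.
    """
    if event_name.endswith(EDGE_SUFFIX):
        real_name = event_name[:-len(EDGE_SUFFIX)]
        return base_codes[real_name] | EDGE_DETECT_BIT
    return base_codes[event_name]
```

With a placeholder table such as `{"PM_GCT_NOSLOT_CYC": 0x1000}`, the pseudo event resolves to the placeholder code with bit 0 set, while the real event name resolves to the code unchanged.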

Signed-off-by: Maynard Johnson <maynardj@us.ibm.com>

2013-11-06 22:43:35
[d840b9] by Carl Love, pushed by Maynard Johnson

Duplicate event specs passed to ocount show up twice in output

Invoking 'ocount' and passing an events list that contains
duplicate event specifications results in redundant data
collection. See the example below:

$ocount -e CPU_CLK_UNHALTED,CPU_CLK_UNHALTED /bin/true
Events were actively counted for 1192874 nanoseconds.
Event counts (actual) for /bin/true:
        Event                   Count                    % time counted
        CPU_CLK_UNHALTED        2,374,832                100.00
        CPU_CLK_UNHALTED        2,374,832                100.00

The solution implemented with this patch is to store the input
event specs in a set; thus, exact duplicate event specs are
automatically ignored.
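
The idea can be shown in a few lines of Python. This is a sketch of the set-based approach only, not the ocount source; the function name is hypothetical, and the result is sorted merely to make the output deterministic.

```python
def unique_event_specs(events_arg):
    """Split a comma-separated event list and discard exact duplicates.

    Specs are compared as whole strings, so "CPU_CLK_UNHALTED" and
    "CPU_CLK_UNHALTED:100000" count as different specs.
    """
    return sorted(set(events_arg.split(",")))
```

Applied to the argument from the example above, `"CPU_CLK_UNHALTED,CPU_CLK_UNHALTED"`, the function yields a single-element list.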

Signed-off-by: Carl Love <cel@us.ibm.com>

2013-11-06 14:59:25
[44d156] by Carl Love, pushed by Maynard Johnson

Ocount, print the unit mask, kernel and user modes if specified for the event

The unit mask, kernel and user mode can all be optionally specified by the user.
Currently, these values are not being printed with the event name and the
counts for the events. This patch will print this information only if
the user specifies one or more of these qualifiers with the event specifier.

Signed-off-by: Carl Love <carll@us.ibm.com>

2013-11-05 18:42:39
[ef501a] by Maynard Johnson

Fix handling of default named unit masks longer than 11 chars

The handling of default unit masks that are names instead of hex
values is new with oprofile 0.9.9. I've discovered a bug in this
handling when the name exceeds 11 characters. For example, on
Sandybridge, the following ocount command fails:

[mpjohn@oc1757000783 test-stuff]$ ocount -e l1d_blocks ls
Cannot find unit mask bank_confli for l1d_blocks
Unable to find unit mask info for bank_confli for event l1d_blocks

This problem was due to the char array ('mask') being too small.
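
The truncation seen in the error message is what a bounded copy into a too-small C char array produces. The Python sketch below mimics that behaviour. The array size of 12 (11 characters plus the terminating NUL) is an inference from the 11-character limit, not something stated in the patch, and the function name and sample mask name are hypothetical.

```python
def copy_into_char_array(name, array_size):
    """Mimic a bounded copy of a C string into char[array_size].

    One byte is reserved for the terminating NUL, so at most
    array_size - 1 characters of `name` survive.
    """
    if array_size < 1:
        raise ValueError("array_size must leave room for the NUL terminator")
    return name[:array_size - 1]
```

With an assumed size of 12, a name such as `"a_long_unit_mask_name"` is cut to its first 11 characters, and the later lookup by the truncated name fails just as in the ocount output above.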

Signed-off-by: Maynard Johnson <maynardj@us.ibm.com>

2013-10-31 16:11:06
[e55a4a] by Maynard Johnson

Cleanup TODO list

I removed some obsolete stuff and added some new, but there
are likely still some TODOs in this file that are not valid
any longer.

Signed-off-by: Maynard Johnson <maynardj@us.ibm.com>

2013-10-24 19:49:03
[fb9529] by Maynard Johnson

Fix operf/ocount default unit mask selection

Many events (particularly in the x86* architectures)
require a unit mask value to specify the exact event
type. For such events, a default unit mask value
is assigned. When a user runs operf, ocount, or
opcontrol and specifies such an event but does not
specify a unit mask, the default unit mask should be
selected and used by the tool. A bug was discovered
with operf and ocount where the unit mask value in
this situation was being set to '0' instead of the
default unit mask value. This patch fixes the bug.
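
The intended selection logic amounts to distinguishing "no unit mask given" from "unit mask 0 given". A minimal Python sketch of that distinction follows; it is not the operf/ocount code, and the function name and the use of `None` as the "not specified" marker are assumptions made for the example.

```python
def select_unit_mask(user_mask, default_mask):
    """Pick the unit mask to program for an event.

    `user_mask` is None when the user gave no unit mask. An explicit 0
    from the user is a real value and must be honoured rather than
    replaced by the default.
    """
    if user_mask is None:
        return default_mask
    return user_mask
```

The bug described above corresponds to returning 0 in the `None` case instead of `default_mask`.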

Signed-off-by: Maynard Johnson <maynardj@us.ibm.com>

2013-10-15 19:58:16
[4b1497] by Andi Kleen, pushed by Maynard Johnson

Add support for Intel Silvermont processor

Just add the event list for Intel Silvermont based systems
(Avoton, BayTrail) and the usual changes for a new CPU.
No new code otherwise.

The model number list is incomplete at this point; more will
be added in the future.

I also finally removed the top level event list descriptions.
All the events are only described in the unit masks now
(Intel doesn't really have a top level event, and I had
to invent descriptions, which was error prone and
often wrong).

I also removed some outdated document number references.

Signed-off-by: Andi Kleen <ak@linux.intel.com>

2013-10-10 18:12:28
[a2811b] by Maynard Johnson

configure error message for missing libpfm is not informative enough

On the ppc64 architecture, the libpfm library is used to get perf_events
encodings for events, so the configure script checks for the availability
of that library when building for ppc64. If the library is missing, the
configure error message is:

checking for perfmon/pfmlib.h... no
configure: error: pfmlib.h not found; usually provided in papi devel package

However, some newer distros (like Fedora 19) are now delivering separate
packages for libpfm and papi, instead of bundling them together. The patch
provided herein changes the configure message to reflect that change in
packaging.

Signed-off-by: Maynard Johnson <maynardj@us.ibm.com>

2013-10-09 19:27:54
[ebde58] by Maynard Johnson

Converge operf and ocount utility functions

When the ocount tool was developed, a number of utility
functions were needed that were very similar to operf utility
functions, with just minor changes. The decision was made at
the time to copy these functions into ocount and change them
as needed. To avoid dual maintenance on very similar functions,
we should converge the two tools to use one common set of utility
functions. The main reason for not doing so in the first place
was to make it easier to review ocount patches and not have to
look at operf changes at the same time.

Signed-off-by: Maynard Johnson <maynardj@us.ibm.com>

2013-10-09 18:12:21
[3795ee] by Maynard Johnson

Add two new POWER8 events that are needed for stall analysis

Signed-off-by: Maynard Johnson <maynardj@us.ibm.com>

2013-09-25 16:15:30
[b91794] by Ting Liu, pushed by Maynard Johnson

Add freescale e6500 support

Signed-off-by: Zhenhua Luo <zhenhua.luo@freescale.com>
Signed-off-by: Ting Liu <b28495@freescale.com>

2013-09-05 12:45:52
[ca3f79] by Ting Liu, pushed by Maynard Johnson

Add freescale e500mc support

Signed-off-by: George Stephen <Stephen.George@freescale.com>
Signed-off-by: Zhenhua Luo <zhenhua.luo@freescale.com>
Signed-off-by: Ting Liu <b28495@freescale.com>

2013-09-05 12:43:55