oprofile Log


Commit Date  
[31389d] by Maynard Johnson

Fix PM_RUN_CYC and PM_RUN_INST_CMPL event codes broken by previous commit

Signed-off-by: Maynard Johnson <maynardj@us.ibm.com>

2014-02-07 16:27:46
[029735] by Maynard Johnson

Fix various event names and codes for IBM architected and POWER8 events

Signed-off-by: Maynard Johnson <maynardj@us.ibm.com>

2014-02-07 14:58:28
[7243fa] by Maynard Johnson

Make cpu type POWER8E equivalent to POWER8

Recent mainline kernel changes resulted in a cpu type of
"POWER8E" being displayed in /proc/cpuinfo for certain revisions
of the IBM POWER8 processor model. But for profiling and
counting of native events, we can ignore the differences between
POWER8 and POWER8E. This patch addresses that issue.
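A minimal sketch of the idea, assuming the cpu name has already been read from /proc/cpuinfo into a string. The function name normalize_power_cpu is hypothetical and is not taken from the oprofile sources:

```cpp
#include <string>

// Hypothetical helper (not the actual oprofile code): fold the "POWER8E"
// name into "POWER8" so that later event lookups see a single cpu type.
std::string normalize_power_cpu(const std::string& name)
{
    if (name == "POWER8E")
        return "POWER8";
    return name; // every other name is passed through unchanged
}
```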

Signed-off-by: Maynard Johnson <maynardj@us.ibm.com>

2014-02-04 14:27:10
[717d45] by Maynard Johnson

Fix up event codes for marked architected events

Fourteen events in the set of architected events had the wrong
event encoding. All 14 were "marked" events, used in random
sampling.

Signed-off-by: Maynard Johnson <maynardj@us.ibm.com>

2014-02-03 23:50:54
[fd05da] by Maynard Johnson

Remove 'extra' attribute from ophelp XML output; bump schema version

As discussed on the oprofile mailing list on Sep 24, 2013, there is
no value in keeping the 'extra' attribute in ophelp's XML output.
The previous commit added the 'name' field to the XML output, and
that is the valuable information that consumers of the XML output
should use when coding event specifications to pass to operf or
ocount.

This patch removes the 'extra' attribute and also bumps the schema
version (both in the ophelp.xsd and the XML instance documents).
The schema bump is needed mostly due to removing the 'extra' attribute;
but another reason for it is to draw attention to the new 'name'
attribute, which consumers really must use (when present) in order
to be sure they can properly specify the unitmask that the user
requests.

Signed-off-by: Maynard Johnson <maynardj@us.ibm.com>

2014-02-03 14:47:30
[c0b7ac] by Maynard Johnson

Make operf/ocount detect invalid timer mode from opcontrol

Certain architecture/processor models are not supported by the
legacy oprofile kernel driver, so the cpu type detected (when
opcontrol is run) is "timer". If a user runs opcontrol on such
a system and does not unload the oprofile kernel module (using
'opcontrol --deinit') prior to using operf or ocount, the operf
and ocount tools will fail; for example, running 'operf <cmd>'
(with no event specified) will fail with the following unhelpful
message:

Unable to find default event

This failure is due to how operf and ocount ascertain the cpu
type by calling libop/op_events.c:op_get_cpu_type(). That function
will look first in the oprofilefs (/dev/oprofile/cpu_type, in particular)
to try to determine the running cpu type. In the case described above,
the /dev/oprofile/cpu_type file contains 'timer', so operf and ocount
then try to find the default event for cpu type 'timer', but since timer
mode is not supported on either operf or ocount, there is no default event,
so the tool fails.

Additionally, the ophelp command, when run in the same situation (where
the /dev/oprofile/cpu_type file contains 'timer') will not display the
expected list of native events; instead it simply says:
Using timer interrupt.

This patch makes operf and ocount exit immediately with a helpful message
if cpu type 'timer' is detected. A helpful message is also added to ophelp
for the same circumstances.
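The early-exit check described above can be sketched as follows. This is an illustration only: the function name timer_mode_error and the message wording are assumptions, not the text or code actually added by the patch.

```cpp
#include <string>

// Hypothetical check (not the actual oprofile code): given the cpu type
// string that was detected, return an explanatory message when it is
// "timer", or an empty string when profiling can proceed.
std::string timer_mode_error(const std::string& cpu_type)
{
    if (cpu_type == "timer") {
        // Illustrative wording; the real message in oprofile may differ.
        return "cpu type 'timer' detected; timer mode is not supported by "
               "operf or ocount. Try running 'opcontrol --deinit' first.";
    }
    return std::string();
}
```

A caller would print the message and exit when the returned string is non-empty, rather than going on to look up a default event.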

Signed-off-by: Maynard Johnson <maynardj@us.ibm.com>

2014-01-31 15:05:18
[ed40d8] by William Cohen, pushed by Maynard Johnson

Print unit mask name where applicable in ophelp XML output

Some Intel architectures have named unit masks and it would be useful
to include the unit mask name in the XML output. This patch also
updates the ophelp.xsd schema file to include the optional unit
mask 'name' field.

Signed-off-by: William Cohen <wcohen@redhat.com>

2014-01-28 17:05:46
[be6d22] by Maynard Johnson

Fix issues detected by Coverity

Will Cohen ran Coverity against oprofile and reported some issues
on Nov 20, 2013. I submitted the current oprofile source to the
Coverity webpage, and a couple new issues were detected. This
patch addresses most of these issues. Some issues are either
false positives from Coverity's analysis or have been marked
as "Intentional" so as to have Coverity ignore them.

Signed-off-by: Maynard Johnson <maynardj@us.ibm.com>

2014-01-27 21:44:18
[d6ca28] by Maynard Johnson

Reduce overhead of operf waiting for profiled app to end

The original implementation was very inefficient in how operf
waited for the profiled app to end. We reduce much of the overhead
by doing a nanosleep for 100 ms, waking up, checking the status
of the app (via waitpid), and going back to sleep again if the app
is still running.
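The polling approach can be sketched roughly as below. This is not the operf implementation: the helper name wait_for_child is hypothetical, and this version checks the child's status before each sleep rather than after.

```cpp
#include <cerrno>
#include <ctime>
#include <sys/types.h>
#include <sys/wait.h>

// Hypothetical helper (not the actual operf code): poll a child process
// about every 100 ms until it ends. On success, stores the wait status in
// 'status' and returns true; returns false if waitpid fails.
bool wait_for_child(pid_t child, int& status)
{
    const struct timespec interval = {0, 100L * 1000L * 1000L}; // 100 ms
    for (;;) {
        pid_t rc = waitpid(child, &status, WNOHANG);
        if (rc == child)
            return true;                 // the child has ended
        if (rc < 0 && errno != EINTR)
            return false;                // waitpid reported an error
        nanosleep(&interval, nullptr);   // still running: sleep and retry
    }
}
```

Because WNOHANG makes waitpid return immediately, the process spends nearly all of its time asleep and wakes only about ten times per second.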

Signed-off-by: Maynard Johnson <maynardj@us.ibm.com>

2014-01-24 23:07:10
[65176c] by Maynard Johnson

Fix regression in IBM POWER8 running in POWER7 compat mode

A commit made on Dec 17, 2013 ("Allow all native events for IBM POWER8
in POWER7 compat mode") broke support for POWER8 in POWER7 compat mode.
Instead, oprofile attempts to treat it as a normal POWER7 processor,
which is not correct. A user reported the following error when
running operf with the default CYCLES event:

terminate called after throwing an instance of 'std::runtime_error'
what(): libpfm cannot find event code for CYCLES; cannot continue
Aborted

This patch fixes this problem.

Signed-off-by: Maynard Johnson <maynardj@us.ibm.com>

2014-01-21 20:43:02
[1c1636] by Maynard Johnson

Minor man page cleanups for the ocount command

Signed-off-by: Maynard Johnson <maynardj@us.ibm.com>

2014-01-17 22:52:45
[d0eb98] by Leonid Moiseichuk, pushed by Maynard Johnson

Add minimal (armv7-common) support for ARMv7 Krait

Only the common ARM PMU events are supported with this patch due
to lack of available documentation.

Other developers report that operf also works well.
Tested on device and using "make distcheck".
Signed-off-by: Leonid Moiseichuk <l.moiseichuk@samsung.com>

2014-01-10 15:13:28
[e3edba] by Maynard Johnson

Whitespace fix in configure.ac from previous commit

Signed-off-by: Maynard Johnson <maynardj@us.ibm.com>

2014-01-09 21:53:08
[a265c5] by Maynard Johnson

Enable oprofile for new ppc64le architecture

Signed-off-by: Maynard Johnson <maynardj@us.ibm.com>

2014-01-09 21:47:09
[88ed74] by Maynard Johnson

Fix "Unable to open cpu_type file for reading" for IBM POWER7+

Using operf to do profiling on an IBM POWER7+ may result in
the following error message:

Unable to open cpu_type file for reading

This patch fixes the problem. There is also a simple workaround of
running 'opcontrol --init'.

Signed-off-by: Maynard Johnson <maynardj@us.ibm.com>

2014-01-09 21:07:21
[286f23] by Maynard Johnson

Fix compile errors occurring with gcc 4.8.x

Signed-off-by: Maynard Johnson <maynardj@us.ibm.com>

2014-01-08 18:31:24
[bb56ad] by Maynard Johnson

Fix ocount man page and usage regarding counting modes

The ocount tool must be run with one and only one of the following
counting modes:
o system-wide
o process-list
o thread-list
o cpu-list
o command [args]

The ocount man page and usage printout were missing the logical OR
separator ('|') between cpu-list and command modes. This patch
fixes that.
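The "one and only one" rule for counting modes can be illustrated with a small sketch. The struct and function names here are hypothetical and do not come from the ocount sources:

```cpp
// Hypothetical representation of the five ocount counting modes; each flag
// is true when the user selected that mode.
struct CountingModes {
    bool system_wide;
    bool process_list;
    bool thread_list;
    bool cpu_list;
    bool command;
};

// Returns true only when exactly one counting mode was selected.
bool exactly_one_mode(const CountingModes& m)
{
    int selected = int(m.system_wide) + int(m.process_list) +
                   int(m.thread_list) + int(m.cpu_list) + int(m.command);
    return selected == 1;
}
```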

Signed-off-by: Maynard Johnson <maynardj@us.ibm.com>

2014-01-03 15:49:29
[344bac] by Maynard Johnson

Remove unused variable 'tab' from previous commit

Signed-off-by: Maynard Johnson <maynardj@us.ibm.com>

2014-01-02 16:23:45
[51e615] by Maynard Johnson

opreport from 'operf --callgraph' profile shows incorrect recursive calls

When you collect a callgraph profile with operf, the opreport output
incorrectly implies recursive calls. For example, a simple memcpy
testcase that has the following true callchain:
main -> do_my_memcpy -> memcpy (libc)

appears as follows with 'opreport --callgraph' (focusing here just
on the do_my_memcpy callers and callees):

  4757 50.0000 memcpyt do_my_memcpy
  4757 50.0000 memcpyt main
4757 6.3185 memcpyt do_my_memcpy
  4757 49.9842 memcpyt do_my_memcpy
  4757 49.9842 memcpyt do_my_memcpy [self]
  3 0.0315 no-vmlinux /no-vmlinux

NOTE: Lines above the non-indented line show the callers of do_my_memcpy;
lines below the non-indented line show the callees of do_my_memcpy.
So it appears that do_my_memcpy calls itself, which it does not do.

If I use 'perf record' to get a callgraph profile, the 'perf report'
looks like the following:

6.88% memcpyt memcpyt [.] do_my_memcpy
|
--- do_my_memcpy
main
__libc_start_main

So here, too, it seems to me that do_my_memcpy calls do_my_memcpy.
When I reported this issue to perf/perf_events kernel developers,
I was basically told that this behavior was "by design".

This patch makes an effort to handle this issue by having operf drop
the first address in the callchain if and only if it is the same
address as the second address in the callchain.
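The rule can be sketched as follows, assuming the callchain is available as a vector of addresses with the most recent frame first. The helper name drop_duplicate_head is hypothetical and is not the name used in operf:

```cpp
#include <cstdint>
#include <vector>

// Hypothetical helper (not the actual operf code): remove the first
// callchain entry if and only if it equals the second entry. Chains with
// fewer than two entries are returned unchanged.
std::vector<std::uint64_t> drop_duplicate_head(std::vector<std::uint64_t> chain)
{
    if (chain.size() >= 2 && chain[0] == chain[1])
        chain.erase(chain.begin());
    return chain;
}
```

Only the leading pair is compared, so a repeated address deeper in the chain (which could be genuine recursion) is left alone.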

Signed-off-by: Maynard Johnson <maynardj@us.ibm.com>

2014-01-02 14:35:17
[746b5c] by Maynard Johnson

Update TODO list

Signed-off-by: Maynard Johnson <maynardj@us.ibm.com>

2013-12-20 15:26:28
[46a673] by Maynard Johnson

Fix sample attribution problem when using multiple events

A serious bug was found that affects operf profiling with
multiple events. Samples for an event may be incorrectly
attributed to another event. For example, profiling on
a Sandybridge laptop with CPU_CLK_UNHALTED and INST_RETIRED
events produces the following summary counts from opreport:

CPU_CLK_UNHALTED |INST_RETIRED |
samples| %| samples| %|
------------------------------------
32412 100.000 20104 100.000 foo

Using operf to produce separate profiles for these two events
results in these sample counts:

CPU_CLK_UNHALTED |
samples| %|
------------------
18962 100.000 foo

INST_RETIRED |
samples| %|
------------------
33464 100.000 foo

This patch fixes the problem.

Signed-off-by: Maynard Johnson <maynardj@us.ibm.com>

2013-12-20 15:10:46
[f76a07] by Maynard Johnson

Fix two makefiles to use -Werror, and fix resulting compiler errors

I discovered that the Makefile.am files in libperf_events and
libpe_utils did not set AM_CXXFLAGS = @OP_CXXFLAGS@, and thus,
the extra -W flags that are added to OP_CXXFLAGS (in configure.ac)
were not being used when building these two directories. Once
I corrected this problem with the makefiles and rebuilt the
source tree, the g++ compiler found a number of minor issues
and ended in error due to the -Werror flag. This patch contains
fixes for the two Makefile.am files, as well as fixes for the
compiler's warnings-turned-to-errors. These were all minor issues
that I'm fairly confident should not have caused any functional
problems. Both manual testing and oprofile testsuite pass
with this patch applied.

Signed-off-by: Maynard Johnson <maynardj@us.ibm.com>

2013-12-19 18:57:14
[87bf15] by Maynard Johnson

Fix operf/opreport kernel throttling detection

With oprofile 0.9.8, the operf command below correctly produces the following
output on an Intel Core 2 Duo/RHEL 6.4 system when using too high of a
sampling rate:

----------------------
$ operf -e CPU_CLK_UNHALTED:20000 ./memcpyt 200000000
operf: Profiler started
Num iterations passed is 200000000
memcpyt starting with PID 2423
source_address: 7fff689b7003
dest_address: 7fff689b5007
200000000 interations of memcpy(d+7.s+3,65) requires 10.016 seconds
* * * * WARNING: Profiling rate was throttled back by the kernel * * * *
The number of samples actually recorded is less than expected, but is
probably still statistically valid. Decreasing the sampling rate is the
best option if you want to avoid throttling.

Profiling done.
----------------------

The same operf command using current upstream oprofile (and 0.9.9) produces no
throttling message. But by comparing the number of samples with profile runs
using a lower sampling rate (i.e., count value >=100000 for CPU_CLK_UNHALTED),
I can see that the kernel must be throttling, because we're not collecting
enough samples for the given sampling rate.

Additionally, the opreport command should report when throttling has
occurred for the profile data being analyzed. This enhancement was made
post-0.9.8, but was broken at some point before 0.9.9 was released, so that
this informational message is also now missing from opreport.

This patch fixes both issues (and they were, indeed, separate bugs).

Signed-off-by: Maynard Johnson <maynardj@us.ibm.com>

2013-12-18 20:59:37
[4f5a0d] by Maynard Johnson

Allow all native events for IBM POWER8 in POWER7 compat mode

Certain older Linux distributions will support the new IBM POWER8
processor, but only in a limited mode, since much of the new
kernel code needed to fully support the POWER8 was not backported
to these older distros. This limited mode is referred to as
"POWER7 compat mode" since the kernel can support only the features
that were also available on that earlier IBM processor.

Changes I originally made to support POWER8 assumed that there
would not be full POWER8 performance monitor unit capabilities when
in POWER7 compat mode, and thus, the current oprofile code supports
only a limited subset of POWER8 events (i.e., events which were also
available on the POWER7). However, I've recently been made aware
that these older distros actually do have complete backports of the
POWER8 perf_events kernel subsystem code, making them fully aware of
all POWER8 events. This patch allows operf and ocount to use all
of the POWER8 events, regardless of what mode or distribution we
are running on.

Signed-off-by: Maynard Johnson <maynardj@us.ibm.com>

2013-12-17 22:04:33
[4f3e25] by Carl Love, pushed by Maynard Johnson

Fix kallsyms support for callgraph and debug-info opreport options

This patch is a fix for the recent patch to add support for obtaining
the kernel symbols from kallsyms if no vmlinux file was specified. The
opreport tool was seg faulting for the command "opreport -g -l". The
changes in files libutil++/op_bfd.cpp and libutil++/bfd_support.h fix
the segmentation fault by returning when the bfd file is either not
valid or a pseudo BFD. The pseudo BFD is used when the symbols
were obtained from kallsyms rather than from an actual vmlinux file.

A second issue was that the symbol name from /proc/kallsyms was not being
printed when "--callgraph" was specified with the operf and opreport tools.
The issue was due to calling the wrong bfd constructor when generating
the callgraph information. If the callee or caller image file is
kallsyms, then the kallsyms bfd constructor must be called to obtain the
symbol information. The changes to file libpp/callgraph_container.cpp
fix this issue.

Changed a comment to consistently refer to the kallsyms BFD file as a
pseudo BFD file rather than a fake BFD file.

Signed-off-by: Carl E. Love <carll@us.ibm.com>

2013-12-17 21:29:54