oprofile Log


Commit Date  
[5f11dd] by Andi Kleen, pushed by Maynard Johnson

Update the Haswell events to the latest version

Some minor changes to the previous version, but it should be more
consistent with other tools now.

The event name descriptions have been dropped. They were never all that
useful anyways because the event is defined by the unit masks.
Now all events with more than one unit mask only have a description
in the unit masks.

As a new feature, any known errata for an event are referenced.

Signed-off-by: Andi Kleen <ak@linux.intel.com>

2014-07-17 21:23:38

[893c18] by Andi Kleen, pushed by Maynard Johnson

Improve error message for non-unique unit mask

For the case where the user does not specify a UM and the default UM
is a non-unique hex value, the error message printed is the following:

Default unit mask not supported for this event.
Please specify a unit mask by name, using the first word of the unit mask description.

For cases where the user wrongly specifies a non-unique hex value for a UM
when they should have specified it by name, the message will be like the
following example:

Unit mask (0x1) is non unique.
Please specify a unit mask by name, using the first word of the unit mask description.
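
The uniqueness test behind these messages can be sketched as follows.
This is only an illustration: the struct and function names are
hypothetical and are not taken from the oprofile sources.

```c
#include <stddef.h>

/* Hypothetical, simplified unit mask entry (not oprofile's own type). */
struct unit_mask {
    unsigned int value;   /* hex value, possibly shared by several masks */
    const char  *name;    /* first word of the unit mask description */
};

/* Count how many unit masks of an event carry the given hex value. */
size_t um_value_matches(const struct unit_mask *ums, size_t n,
                        unsigned int value)
{
    size_t hits = 0;

    for (size_t i = 0; i < n; i++)
        if (ums[i].value == value)
            hits++;
    return hits;
}
```

A result greater than 1 means the hex value is non-unique, so the unit
mask has to be selected by name instead.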

Signed-off-by: Andi Kleen <ak@linux.intel.com>

2014-07-17 17:55:42

[5ce12e] by Andi Kleen, pushed by Maynard Johnson

Fix some problems in the Broadwell events

Fix some problems in the previous commit of the Broadwell events.
Most flags were missing due to a bug in the generation script.
This patch also re-adds proper PEBS events.

Signed-off-by: Andi Kleen <ak@linux.intel.com>

2014-07-17 17:45:09

[6d6921] by Andi Kleen, pushed by Maynard Johnson

Add oprofile support for Broadwell microarchitecture

This patch adds the event list of the Intel Broadwell architecture.
Hopefully this can still make 1.0.

The patch is very straightforward: just add the model numbers and
type in the usual places and add the event list.

Passes make check

Some notes:
- Haswell included one Broadwell model number by mistake. I moved
that to Broadwell now.
- oprofile doesn't support umask sub events with different counter
constraints than other events. This affects a few events on Broadwell.
However it's not a problem when oprofile uses perf as a backend,
as perf will know how to schedule these events (once it gets the
Broadwell support). It won't work correctly with the old driver.
Most of these events are not too useful for sampling, so in practice
it's not a real problem.
- As usual PEBS events and events with offcore mask and uncore
events are missing.

Signed-off-by: Andi Kleen <ak@linux.intel.com>

2014-07-16 13:03:54

[9fba36] by Maynard Johnson

Fix 'Invalid argument' running 'opcontrol --start --callgraph=<n>' in Timer mode

When a processor type does not support oprofile event-based profiling,
the oprofile kernel driver writes "timer" to the /dev/oprofile/cpu_type
file. For some architectures, the architecture-specific oprofile
kernel driver does not set 'oprofile_operations.backtrace' when in timer
mode. When the opcontrol "--start" option is being executed,
one of the actions taken is to write the CALLGRAPH value from the daemonrc
file to /dev/oprofile/backtrace_depth. The function defined to respond to
writes to this file is drivers/oprofile/oprofile_files.c:depth_write().
In that function, if 'oprofile_operations.backtrace' is not set, it
returns -EINVAL, resulting in the following opcontrol error:
opcontrol: line 1172: echo: write error: Invalid argument

This patch detects when the system is in timer mode and handles this error
appropriately -- if simply writing '0' to backtrace_depth, the error is
ignored; otherwise, print a message that call graph is not supported on
this system in TIMER mode.
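
The decision described above could look roughly like this when written
in C. opcontrol itself is a shell script, so this is only a sketch of
the logic; the enum and function names are hypothetical.

```c
#include <string.h>

/* Hypothetical outcome codes; not taken from opcontrol. */
enum depth_result {
    DEPTH_OK,           /* write succeeded */
    DEPTH_IGNORED,      /* timer mode, depth 0: error is harmless */
    DEPTH_UNSUPPORTED,  /* timer mode, depth > 0: tell the user */
    DEPTH_ERROR         /* failure unrelated to timer mode */
};

/*
 * cpu_type:     contents of /dev/oprofile/cpu_type
 * depth:        CALLGRAPH value written to backtrace_depth
 * write_failed: nonzero if the write returned an error (e.g. EINVAL)
 */
enum depth_result check_depth_write(const char *cpu_type, int depth,
                                    int write_failed)
{
    if (!write_failed)
        return DEPTH_OK;
    if (strcmp(cpu_type, "timer") != 0)
        return DEPTH_ERROR;
    return depth == 0 ? DEPTH_IGNORED : DEPTH_UNSUPPORTED;
}
```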

Signed-off-by: Maynard Johnson <maynardj@us.ibm.com>

2014-07-14 12:56:39

[9c662b] by Maynard Johnson

Make sure hypervisor is excluded from ocount and operf

Since we have no interface support in the event specification to
allow the user to select or de-select counting events in hypervisor,
and also since the output of ocount and opreport do not support the
concept of hypervisor, we should exclude hypervisor from counting
and profiling. There's a bug in the current code such that the
user may or may not get hypervisor events included. This patch
explicitly excludes hypervisor.
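
With the Linux perf_events interface, hypervisor exclusion is requested
through the exclude_hv bit of struct perf_event_attr. The sketch below
assumes that this is the bit the patch sets; the helper name and the
use of PERF_TYPE_RAW are illustrative only.

```c
#include <linux/perf_event.h>
#include <string.h>

/* Hypothetical helper: prepare an attr that never counts in the hypervisor. */
void init_counting_attr(struct perf_event_attr *attr,
                        unsigned long long config)
{
    memset(attr, 0, sizeof(*attr));
    attr->size = sizeof(*attr);
    attr->type = PERF_TYPE_RAW;   /* assumption: raw hardware event code */
    attr->config = config;
    attr->exclude_hv = 1;         /* set unconditionally, no user option */
}
```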

Signed-off-by: Maynard Johnson <maynardj@us.ibm.com>

2014-07-07 14:03:35

[77a576] by Maynard Johnson

Add event spec examples to operf and ocount man pages

Signed-off-by: Maynard Johnson <maynardj@us.ibm.com>

2014-06-27 13:14:36

[8933ea] by Maynard Johnson

Fix spelling error in previous commit

Signed-off-by: Maynard Johnson <maynardj@us.ibm.com>

2014-06-26 17:28:57

[ac0257] by Maynard Johnson

Fix memset problem in previous commit; fails to compile on Fedora 20

Building oprofile on Fedora 20 pointed out a problem in memset call
from previous commit. The build error is:

In function 'void* memset(void*, int, size_t)',
inlined from 'int operf_record::_start_recoding_new_thread(pid_t)'
at operf_counter.cpp:769:28:
/usr/include/powerpc64le-linux-gnu/bits/string3.h:81:32: error: call
to '__warn_memset_zero_len' declared with attribute warning:
memset used with constant zero length parameter; this could be due
to transposed parameters [-Werror]
__warn_memset_zero_len ();

This patch fixes the problem.
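
The warning text points at the classic transposed-argument mistake. The
snippet below is a generic illustration of it, not the actual code at
operf_counter.cpp:769; the function name is hypothetical.

```c
#include <string.h>
#include <sys/types.h>

/* Hypothetical buffer clear showing the correct memset argument order. */
void clear_pids(pid_t *pids, size_t count)
{
    /*
     * Wrong: fill value and length swapped, so zero bytes are written:
     *     memset(pids, count * sizeof(*pids), 0);
     */
    memset(pids, 0, count * sizeof(*pids));   /* value first, then length */
}
```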

Signed-off-by: Maynard Johnson <maynardj@us.ibm.com>

2014-06-26 13:58:53

[eb7558] by Maynard Johnson

Improve sample collection in multi-threaded apps when using "--pid" option

See oprofile bug # 260 (https://sourceforge.net/p/oprofile/bugs/260/) for reference.

operf may fail to collect samples for new threads or forked
processes in a multi-threaded app under certain conditions:

- If operf is started on a multi-threaded app (i.e., one that uses
pthreads) with the "--pid" option
- If said app has already started at least one thread
- If the app either creates new threads or forks new processes
after operf has started

then no samples are collected for those new threads/processes.

This patch fixes that problem (mostly) and documents the limitations and
issues when using the "--pid" option. The limitations and issues are:

1. When using "--pid" to profile a multi-threaded application that also forks
new processes, samples for processes that are forked before profiling is
started may not be recorded (depending on timing of thread creation and
when operf is started)

2. The "--lazy-conversion" option is not recommended to be used in conjunction
with the --pid option for profiling multi-threaded processes. Depending on
the order of thread creation (or forking of new processes), you may not get
any samples for the new threads/processes.

Signed-off-by: Maynard Johnson <maynardj@us.ibm.com>

2014-06-23 19:40:35

[d3e895] by Maynard Johnson

Fix vsyscall sample collection on x86 architectures

On certain architectures running older kernels (x86* on kernel 3.0 and older, I think),
a static mapping is placed into every process's memory map to provide vsyscall
functionality. This mapping is labeled '[vsyscall]'. This mapping is a mechanism
for reducing latency of a handful of calls (e.g., gettimeofday) that would otherwise
require a more expensive system call.

When operf is used to collect a profile on an application that uses vsyscall functions,
the samples taken within the vsyscall mapping are lost. For example, profiling the
program below [1] would result in a "Lost samples detected" message from operf, and the
operf.log would show something like the following:

-- OProfile/operf Statistics --
Nr. non-backtrace samples: 71964
Nr. kernel samples: 203
Nr. user space samples: 71761
Nr. samples lost due to sample address not in expected range for domain: 0
Nr. lost kernel samples: 0
Nr. samples lost due to sample file open failure: 0
Nr. samples lost due to no permanent mapping: 63590
^-- BAD!!

For some reason (which I don't care to investigate since vsyscall is now replaced
by vDSO), the kernel's perf_events subsystem does not send a PERF_RECORD_MMAP
message for this mapping. This patch adds a function to synthesize such a
message so that samples taken in the vsyscall memory range can be correctly
attributed. Note that when using operf with "--pid" or "--system-wide", this new
function need not be called, because when either of those two options is used, operf
already manually generates PERF_RECORD_MMAP messages for all memory mappings for
the process(es) being profiled. However, a fix was needed in that area of operf
to recognize the '[vsyscall]' label and to create the corresponding PERF_RECORD_MMAP
for it.

[1]
------------------
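#include <stddef.h>
#include <sys/time.h>

int main(void)
{
    struct timeval time;
    int rc;

    /* Loop forever on a call that is serviced from the vsyscall page. */
    while (1)
        rc = gettimeofday(&time, NULL);

    return rc;
}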
-------------------
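
Recognizing the '[vsyscall]' label while walking /proc/<pid>/maps can be
sketched as below. The function name is hypothetical and this is not
operf's actual parser; it only shows how to extract the address range
that a synthesized PERF_RECORD_MMAP would need.

```c
#include <stdio.h>
#include <string.h>

/*
 * Parse one line of /proc/<pid>/maps. Returns 1 and fills start/end when
 * the line describes the '[vsyscall]' mapping, 0 otherwise.
 */
int parse_vsyscall_line(const char *line,
                        unsigned long long *start,
                        unsigned long long *end)
{
    if (strstr(line, "[vsyscall]") == NULL)
        return 0;
    /* Each maps line begins with "<start>-<end>" in hexadecimal. */
    return sscanf(line, "%llx-%llx", start, end) == 2;
}
```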

Signed-off-by: Maynard Johnson <maynardj@us.ibm.com>

2014-06-12 17:38:48

[98f57a] by William Cohen, pushed by Maynard Johnson

Remove unused functions causing errors in recent gcc

The Fedora rawhide compiler is now stricter and will treat the
warnings for unused functions as errors and stop the compile. This patch
removes two unused functions in the code.

Signed-off-by: William Cohen <wcohen@redhat.com>

2014-06-10 14:56:55

[53ad8c] by Maynard Johnson

Fix sample data pipe partial read handling

Unless the user specifies the "--lazy-conversion" option with operf,
there will be separate "record" and "convert" processes started
by operf. The record process reads the sample data from the kernel
(via mmap'ed memory) and sends that data to the convert process
over a pipe. The convert process takes the raw sample data
and converts it to oprofile sample files. In the procedure for
reading the pipe, the convert process does two reads: one read to
obtain the sample data event header (which will give the size of
the whole sample data event), and a second read to get the rest
of the event record. For both of these reads, there is code to handle
return values that imply something other than a successful read of the
record. If the value returned is > 0 but less than the expected
length, we go back and do the read *again* to get the rest of the
record. However, in the second read (to obtain the bulk of the event
record), we were doing a "goto" to the wrong place -- we were going
back to the location of the first read.
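
The intended two-phase read can be sketched as follows. This is a
minimal illustration, not the operf code: the function names are
hypothetical, and struct rec_header merely mirrors the layout of perf's
struct perf_event_header. The point is that a short read resumes inside
the part of the record being read, never back at the header.

```c
#include <errno.h>
#include <stddef.h>
#include <stdint.h>
#include <unistd.h>

/* Stand-in for struct perf_event_header (same field layout). */
struct rec_header {
    uint32_t type;
    uint16_t misc;
    uint16_t size;   /* total record size, header included */
};

/* Read exactly len bytes, resuming after short reads. Returns 0 on success. */
static int read_all(int fd, void *buf, size_t len)
{
    char *p = buf;

    while (len > 0) {
        ssize_t n = read(fd, p, len);

        if (n < 0 && errno == EINTR)
            continue;
        if (n <= 0)
            return -1;        /* error or unexpected end of pipe */
        p += n;               /* continue where this read stopped */
        len -= (size_t)n;
    }
    return 0;
}

/* Phase 1: header. Phase 2: the remaining size - sizeof(header) bytes. */
int read_record(int fd, struct rec_header *hdr, void *body, size_t body_cap)
{
    if (read_all(fd, hdr, sizeof(*hdr)) != 0)
        return -1;
    if (hdr->size < sizeof(*hdr) || hdr->size - sizeof(*hdr) > body_cap)
        return -1;
    return read_all(fd, body, hdr->size - sizeof(*hdr));
}
```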

This problem, in actual practice, is not likely to occur, since the
record process should normally be able to write an entire event
record to the pipe without interruption. But if it does occur, the
convert process will probably fail with a message like the following:

Event num -1 for id <x> is invalid. Sample data appears to be corrupted.

In fact, about a year ago, a user had reported such a problem. It was
intermittent, and was not reproducible except on that user's particular
system, which was a very large server system. He was doing a system-wide
profile of a Java benchmark. By lowering the sampling rate, the problem
went away. At the time, I was not able to find the problem, but I suspect
that the problem described above that's addressed by this patch is very
likely the cause of his problem, too. I've contacted this user, but so
far, he's not been able to reproduce the original problem in order to
effectively test this patch. Nevertheless, I'm confident that this fix
is a correct one and should be done.

Signed-off-by: Maynard Johnson <maynardj@us.ibm.com>

2014-06-09 14:47:13

[949ed6] by Carl Love, pushed by Maynard Johnson

opreport: header timestamps are different for kallsyms file

The time stamp for kallsyms can be different because it is not a real
file. Hence, when there are samples from the kallsyms file and the
following conditions are met you get an error about the time stamps
not matching.

- operf was run with the '--separate-thread' option
- operf was run either as root or as normal user where
/proc/sys/kernel/kptr_restrict is set to 0
- The application being profiled is a multi-threaded app that
executes both pthread_create and fork[2]

This patch fixes the issue by assigning the time stamp of zero when
the source file is kallsyms.

Signed-off-by: Carl Love <cel@us.ibm.com>

2014-06-09 14:15:59

[5646af] by Maynard Johnson

opreport XML: binary-level count field issues

See oprofile bug # 236 (https://sourceforge.net/p/oprofile/bugs/236/).

There are several issues relating to the use of the 'count' element
defined in opreport.xsd. For example, below is the current schema
definition for the 'binary' element. Note the usage of the 'count'
element:

<xs:element name="binary">
  <xs:complexType>
    <xs:sequence>
      <xs:element minOccurs="1" maxOccurs="1" ref="count"/>
      <xs:element minOccurs="0" maxOccurs="unbounded" ref="symbol"/>
      <!-- When the separate=lib option is used an binary
           can contain a list of library Modules. -->
      <xs:element minOccurs="0" maxOccurs="unbounded" ref="module"/>
    </xs:sequence>
    <xs:attribute name="name" type="xs:string" use="required"/>
  </xs:complexType>
</xs:element>

There have been questions from users whether the 'count' element
associated with the 'binary' element is supposed to represent a
total count across all modules for the executable or if it is only
the count for the executable itself (the answer is the latter).

Additionally, it's possible that there may be no samples at all
for the binary file -- i.e., all samples collected were for module
elements -- thus, the minOccurs attribute for the 'count' element
of 'binary' should be '0'.

Finally, using xmllint on an XML instance document created from
opreport on a profile run that specified "--separate-cpu" showed
that the instance document was invalid when compared against its
associated schema file (opreport.xsd). Reviewing the schema, I
realized that all usages of the 'count' element were wrong with
respect to the maxOccurs attribute. Instead of being set to '1', maxOccurs
should be 'unbounded' since we can have multiple 'count' elements
associated with any given higher level element (e.g., 'binary')
if there are multiple classes in the profile. Multiple classes
will exist for a profile for various reasons -- e.g., profiling with
'--separate-cpu', or multiple events.

This patch addresses these issues. The major version number of the
schema is not being changed -- only the minor number. This is because
instance documents that previously validated using the old schema
will still be valid with the new schema.

A testsuite patch is being developed to validate XML instance documents
for various scenarios.

Signed-off-by: Maynard Johnson <maynardj@us.ibm.com>

2014-05-29 15:10:41

[f72665] by Maynard Johnson

Update events for IBM POWER8 processor

The initial support for the IBM POWER8 processor was added to oprofile in
May 2013. Some events were held back as their descriptions may have exposed
information about the POWER8 architecture that IBM wanted to remain private
until the official announcement. Some other events were held back because they
had not yet been verified. The POWER8 has now been announced and all events
have been verified, so we can now publish all events.

Signed-off-by: Maynard Johnson <maynardj@us.ibm.com>

2014-05-14 18:50:12

[4a8af0] by Maynard Johnson

Minor error-handling fixes needed for ocount

The main fix in this patch is to properly handle the case
where perf_event_open fails after we have already started
the app for which the user requested to count events.
Prior to this fix, ocount would simply exit, leaving the
app to run to completion. This patch causes ocount
to kill the app if we encounter a perf_event_open error.
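
A cleanup path of this kind might look like the sketch below. The
function name is hypothetical, and the use of SIGKILL followed by
waitpid() is an assumption for illustration; the message above does not
say which signal ocount sends.

```c
#include <signal.h>
#include <sys/types.h>
#include <sys/wait.h>

/*
 * Hypothetical cleanup: called when counter setup fails after the app
 * has already been forked. Returns the app's wait status, or -1 on error.
 */
int kill_started_app(pid_t app_pid)
{
    int status;

    if (kill(app_pid, SIGKILL) != 0)
        return -1;
    if (waitpid(app_pid, &status, 0) != app_pid)   /* reap the child */
        return -1;
    return status;
}
```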

Signed-off-by: Maynard Johnson <maynardj@us.ibm.com>

2014-05-13 18:41:44

[592d7a] by Maynard Johnson

operf log may over-report "sample address not in expected range for domain"

When find_mapping_for_sample() is called, only the sample address is passed.
However, in certain situations, the address alone is not enough to find the
proper matching operf_mmap object. For example, hypervisor addresses can
overlap with userspace addresses where the vdso gets mapped in a 32-bit
process on the ppc64 architecture. If an operf_mmap object has already been
created with a particular memory address range and then a sample address
from VDSO in a 32-bit process needs to be processed, we may incorrectly match
that address with the hypervisor operf_mmap object. With this patch, we now
will pass a flag ('hypervisor_sample') and ensure that we only return an
operf_mmap object if the address is a match AND the flag value matches with
operf_mmap.is_hypervisor.
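
The matching rule can be sketched as below. The struct is a simplified
stand-in for operf_mmap, and the lookup is a plain linear scan chosen
for brevity; neither is taken from the operf sources.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Simplified stand-in for operf_mmap; only the fields needed here. */
struct mmap_entry {
    uint64_t start;
    uint64_t end;             /* exclusive */
    bool     is_hypervisor;
};

/* Return the entry that contains addr AND has the same hypervisor flag. */
const struct mmap_entry *
find_mapping(const struct mmap_entry *maps, size_t n,
             uint64_t addr, bool hypervisor_sample)
{
    for (size_t i = 0; i < n; i++) {
        if (addr >= maps[i].start && addr < maps[i].end &&
            maps[i].is_hypervisor == hypervisor_sample)
            return &maps[i];
    }
    return NULL;
}
```

With two overlapping ranges that differ only in the flag, a userspace
sample now resolves to the userspace entry rather than to whichever
entry happens to be found first.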

Signed-off-by: Maynard Johnson <maynardj@us.ibm.com>

2014-05-13 16:34:15

[d22dc1] by Maynard Johnson

Bug #266: exclude/include files option doesn't work for opannotate -a

The opannotate man page did not make it clear at all that the --exclude-file
and --include-file options apply *only* when specifying "--source"
annotation. This patch updates the man page to clarify the behavior.
It also adds a check in opannotate to disallow specifying either of these
options when "--assembly" is specified.

Signed-off-by: Maynard Johnson <maynardj@us.ibm.com>

2014-05-13 15:46:25

[53d86a] by Maynard Johnson

Change user guide to clarify callgraph is not supported for JIT samples

Signed-off-by: Maynard Johnson <maynardj@us.ibm.com>

2014-05-13 15:29:01

[369998] by Maynard Johnson

Send 'Unable to obtain appname' message to stdout

In operf_process_info::set_appname, we look at /proc/<PID>/exe
to see if we can obtain the process's appname. However, the
process may have already ended, in which case we print the message

"Unable to obtain appname from /proc/<PID>/exe"

This error message should *NOT* be sent to stderr, but instead,
should go to stdout so that it's not seen on the screen when doing:
operf --verbose=all <cmd> > out-file

Signed-off-by: Maynard Johnson <maynardj@us.ibm.com>

2014-05-13 14:58:15

[008e47] by Aaro Koskinen, pushed by Maynard Johnson

configure: fix test-for-synth check with GCC 4.9.0

With GCC 4.9.0, the oprofile 0.9.9 build fails on non-PPC platforms because
the "test-for-synth" configure check result is incorrect: There is a NULL
pointer dereference in the test program, so the compiler seems to optimize
the rest of the code away, and the test will always succeed regardless of
whether powerpc_elf64_vec/bfd_elf64_powerpc_vec are present or not.
Fix by allocating the referred struct statically.

While at it, also include stdio.h to avoid a compiler warning.

Signed-off-by: Aaro Koskinen <aaro.koskinen@iki.fi>

2014-05-12 13:28:46

[63b569] by Alan Modra, pushed by Maynard Johnson

Tidy powerpc64 bfd target check

Testing for a bfd_target vector might (will!) break. See
https://sourceware.org/ml/binutils/2014-04/msg00283.html

It's safer to ask BFD for the target name. I left the direct target
vector checks in configure tests, and updated them, even though the
target vector is no longer used in oprofile code, because a run-time
configure test for powerpc64 support in bfd:
#include <bfd.h>
int main(void)
{ return !bfd_find_target("elf64-powerpc", (void *)0); }
unfortunately isn't possible when cross-compiling.

The bfd_target vector tests could be omitted if we aren't bothered by
the small runtime overhead of a strncmp on targets other than
powerpc64.
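
The name-based check amounts to a prefix comparison on the string that
bfd_get_target returns. The sketch below takes that string as a plain
argument so it compiles without bfd.h; the function name is
hypothetical and the exact comparison in bfd_support.cpp may differ.

```c
#include <string.h>

/* target_name would come from bfd_get_target(abfd) in real code. */
int is_ppc64_elf_target(const char *target_name)
{
    static const char prefix[] = "elf64-powerpc";

    /* A prefix match also accepts variants such as "elf64-powerpcle". */
    return strncmp(target_name, prefix, sizeof(prefix) - 1) == 0;
}
```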

* libutil++/bfd_support.cpp (get_synth_symbols): Don't check for
ppc64 target vector, use bfd_get_target to return the target
name instead.
* m4/binutils.m4: Modernize bfd_get_synthetic_symtab checks to
use AC_LINK_IFELSE. Check for either powerpc_elf64_vec or
bfd_elf64_powerpc_vec.

Signed-off-by: Alan Modra <amodra@gmail.com>

2014-05-02 12:54:08

[69e1b1] by Maynard Johnson

Add 4 more edge detect events for use in CPI analysis

Signed-off-by: Maynard Johnson <maynardj@us.ibm.com>

2014-04-08 18:13:22

[b07c8a] by Maynard Johnson

Allow root to remove old jitdump files from /tmp/.oprofile/jitdump

Currently, the opjitconv program requires that the owner of an old
jitdump file and the user running operf must be the same in order
to allow deletion of said jitdump file. The root user should be
allowed to do this, too, which is what this patch does.
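
The relaxed ownership rule reduces to a small predicate. The function
below is a hypothetical illustration; opjitconv's real check may be
structured differently.

```c
#include <sys/types.h>

/* Hypothetical predicate: may 'caller' delete a jitdump file owned by
 * 'file_owner'? uid 0 is root. */
int may_delete_jitdump(uid_t file_owner, uid_t caller)
{
    return caller == 0 || caller == file_owner;
}
```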

Signed-off-by: Maynard Johnson <maynardj@us.ibm.com>

2014-04-08 15:20:45