On Tue, May 27, 2003 at 04:41:54PM -0500, Jimmy DeWitt wrote:
> Do you have any documentation on how oprofile works(internals)?
No (guilt twang ...). But coupled with an overview, the kernel code in
2.5 should be pretty understandable. I've commented some of the
> Is the oprofile kernel specific code marked? Which kernel files contain
> oprofile code?
arch/<arch>/oprofile/ (perf counter driving code)
> Any overview(s) before I start looking at code?
OK, basically we have a buffer per-cpu (cpu_buffer.c). That is where raw
EIP samples are put (by calling oprofile_add_sample). Additionally, we
add information there when we switch a task (storing the task structure
pointer) and when we switch from kernel mode <-> user mode.
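
As a rough userspace illustration of what ends up in one of these buffers (a toy model, not the kernel code): the record kinds, the CpuBuffer class and the idea of detecting the switches at sample time are all assumptions made for the sketch.

```python
from dataclasses import dataclass, field

# Hypothetical record kinds; the real in-kernel encoding is not described here.
SAMPLE = "sample"
TASK_SWITCH = "task_switch"
MODE_SWITCH = "mode_switch"


@dataclass
class CpuBuffer:
    """Toy model of one per-CPU buffer: raw EIPs plus switch markers."""
    cpu: int
    entries: list = field(default_factory=list)
    last_task: object = None
    last_is_kernel: object = None

    def add_sample(self, eip, task, is_kernel):
        # Note a kernel <-> user transition before the sample it applies to.
        if is_kernel != self.last_is_kernel:
            self.entries.append((MODE_SWITCH, is_kernel))
            self.last_is_kernel = is_kernel
        # Note a task switch (stand-in for storing the task structure pointer).
        if task != self.last_task:
            self.entries.append((TASK_SWITCH, task))
            self.last_task = task
        # The sample itself is just the raw EIP value.
        self.entries.append((SAMPLE, eip))


buf = CpuBuffer(cpu=0)
buf.add_sample(0x08048123, "taskA", False)
buf.add_sample(0x08048127, "taskA", False)
buf.add_sample(0xC0100000, "taskA", True)
```

The point is only that samples stay cheap (one value each) and the context needed to interpret them is recorded once, when it changes.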
On every needed event (see below), and after a timeout, we synchronise
all the cpu buffers into the event_buffer (buffer_sync.c,
event_buffer.c). This is what the daemon reads. To synch, we must
convert the EIP values into (dentry, offset) tuples. The dentry will be
the file mapped in for the task, and the offset is basically the EIP
value minus the start of the file mapping in the task's address space.
So for each buffer we keep track of the task at that point (looking at
the recorded task switches) and convert the EIPs into the tuple by
walking the task's vma list.
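
The conversion step can be sketched like this. It is a simplified model under stated assumptions: Vma and eip_to_tuple are hypothetical names, a string path stands in for the dentry, and the offset is computed exactly as described above (EIP minus mapping start), ignoring any other detail the real code handles.

```python
from typing import NamedTuple, Optional


class Vma(NamedTuple):
    """Toy stand-in for one mapping in a task's address space."""
    start: int              # first address of the mapping
    end: int                # one past the last address
    dentry: Optional[str]   # mapped file (a path string here); None if anonymous


def eip_to_tuple(vmas, eip):
    """Walk the task's mappings and return (dentry, offset), or None."""
    for vma in vmas:
        if vma.start <= eip < vma.end:
            if vma.dentry is None:
                return None  # anonymous memory: no file to attribute to
            return (vma.dentry, eip - vma.start)
    return None  # EIP is not inside any known mapping


task_vmas = [
    Vma(0x08048000, 0x08050000, "/bin/bash"),
    Vma(0x40000000, 0x40100000, "/lib/libc.so"),
]
hit = eip_to_tuple(task_vmas, 0x40000010)
```

Here hit is ("/lib/libc.so", 0x10): the sample no longer depends on where the library happened to be mapped in that task.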
Obviously, we can't record the path of the dentry into the event buffer.
So instead a dentry cookie is used. This is a value that userspace can
later use to lookup the actual path of the dentry. The dentries are held
in memory until profiling is over (fs/dcookies.c). The value is just a
unique ID for a particular dentry.
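
To illustrate the cookie idea only (this is not how fs/dcookies.c is written): the CookieTable class, its method names and the use of a simple counter for the ID are assumptions for the sketch, and a path string again stands in for the dentry.

```python
class CookieTable:
    """Toy model: hand out one stable ID per dentry, resolve it to a path later."""

    def __init__(self):
        self._cookie_for = {}   # dentry (path string here) -> cookie
        self._path_for = {}     # cookie -> path, kept until profiling stops
        self._next_cookie = 1   # assumption: a counter; any unique value would do

    def get_cookie(self, dentry_path):
        """Called on the kernel side while syncing samples."""
        cookie = self._cookie_for.get(dentry_path)
        if cookie is None:
            cookie = self._next_cookie
            self._next_cookie += 1
            self._cookie_for[dentry_path] = cookie
            self._path_for[cookie] = dentry_path
        return cookie

    def lookup(self, cookie):
        """Called later on behalf of userspace to recover the path."""
        return self._path_for[cookie]


table = CookieTable()
bash = table.get_cookie("/bin/bash")
libc = table.get_cookie("/lib/libc.so")
```

The event buffer then only has to carry the small fixed-size cookie, and the same dentry always yields the same cookie.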
Because we store task_struct *, and we look at the task's VMA list, we
have to make sure to process the buffers on certain events, such as the
task exiting (we can't let the task_struct * become invalid) and
unmappings of executable regions. This is done by a couple of small
hooks into the kernel core (profile_<blah>() hooks).
The reading of the CPU buffers is done concurrently with the writing of
EIP values by oprofile_add_sample(). A simple head/tail
producer/consumer thing is used to handle this.
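
A minimal sketch of a head/tail scheme of that kind, assuming a single producer and a single consumer. RingBuffer, put and drain are hypothetical names, and dropping samples when the buffer is full is an assumption of the sketch, not a statement about the kernel code.

```python
class RingBuffer:
    """Toy single-producer/single-consumer ring: only put() moves head,
    only drain() moves tail, so neither side overwrites the other's index."""

    def __init__(self, size):
        self.size = size
        self.slots = [None] * size
        self.head = 0   # next slot to write (producer side)
        self.tail = 0   # next slot to read (consumer side)
        self.lost = 0   # samples dropped because the buffer was full

    def put(self, value):
        nxt = (self.head + 1) % self.size
        if nxt == self.tail:
            # Full: one slot is always left empty to tell full from empty.
            self.lost += 1
            return False
        self.slots[self.head] = value
        self.head = nxt   # publish only after the slot has been written
        return True

    def drain(self):
        out = []
        head = self.head  # snapshot; later writes are picked up next time
        while self.tail != head:
            out.append(self.slots[self.tail])
            self.tail = (self.tail + 1) % self.size
        return out


rb = RingBuffer(4)
results = [rb.put(eip) for eip in (0x10, 0x20, 0x30, 0x40)]
drained = rb.drain()
```

With size 4 only three entries fit, so the fourth put is refused and counted in rb.lost, and drained holds the first three values in order.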
We also store the "main" mapping, for identifying applications
(/bin/bash) versus mapped libraries etc. (/lib/libc.so). Additionally we
store the current CPU in the event buffer for later per-cpu separation.
The oprofile daemon reads this buffer. Each time it sees a dcookie it
looks it up via the system call, and stores the result. So by going
through the buffer, it can write the sample offset to the appropriate
Hope that helps