From: Phillip E. <ez...@pe...> - 2002-03-04 21:53:45
Hi All,

I started to play with oprofile, and I must say that I am pretty impressed. I've worked on Alpha performance tools (such as Iprobe and DCPI) for about five years and ported a few to Alpha/Linux. I've also built a GUI for DCPI and tools to visualize performance data. In any event, oprofile is much like DCPI, and I am impressed by how much DCPI-like functionality you reproduced in an open-source tool. Let me tell you some of the things that I've learned that might help you in your project.

0) DCPI's dcpilabel function might be a helpful feature to add to oprofile. Here's a definition:

   "- dcpilabel - new command
    - It runs a specified program and labels all collected profiles with a
      user-defined label. This label can be passed to analysis tools to
      focus their attention on just the profiles with the specified label.
      This functionality can be used for example to compare two runs of a
      program within the same epoch, or to find the samples that fall within
      the kernel during the run of a program."

1) A per-CPU breakdown of performance statistics is very important. I don't have oprofile working on an SMP box yet, but this feature (which is lacking in DCPI) is asked for again and again on DCPI/Tru64. As Linux grows to solve bigger and bigger problems, this will become more important. (It could be the killer app for oprofile in kernel development.)

2) Could oprofile adopt something similar to DCPI's concept of epochs? Here's the explanation of epochs from the DCPI page:

   "All samples are organized into non-overlapping epochs, each of which
    contains samples for some time interval. A new epoch is started (and
    the previous epoch terminated) using the dcpiepoch command."

   I thought that oprofile had this feature and called them "sessions" (a much more sensible name, IMHO). Unfortunately, I can't figure out any way to specify a new session.

3) How does oprofile deal with modules on ramdisks? When I run my stock Red Hat 7.2, oprofile thinks that my SCSI drivers are in /lib/. "/proc/ksyms" says that my SCSI drivers are in /lib/, because that's where they exist on the ramdisk. "/lib" disappears when the system finishes booting, and oprofile can't find the "/lib/" files.

4) Can sample totals be stored at the beginning of the sample files? Performance is VERY slow when mapping and unmapping all of the images on my 128M machine. (Everything freezes while op_time is working.) I believe that this is because it is mmapping all of the files in the sample directory. This isn't necessary for op_time; it only needs to know totals. The overhead of tracking this should be pretty small, but the performance benefit when running op_time would be huge. (Especially for the really big sample files.)

5) Can the sample files be compressed? Your sample files compress enormously:

   -rw-r--r--    1 ezolt    csdpg    23296560 Mar  4 16:36 red-carpet
   -rw-r--r--    1 ezolt    csdpg       37487 Mar  4 16:37 red-carpet.gz

   By using zlib to read and write them, you could dramatically reduce the amount of I/O necessary to total all of the samples.

6) Please keep the columns consistent/spreadsheet-friendly. The column layout of the output from op_time and oprofpp differs. If one were to write a GUI that sits on top of oprofile, it would be easier to parse the output if it was identical in all cases (from image to function to source line to assembly line).

   op_time:
   /usr/lib/mozilla/components/libgklayout.so  46796   4.40626%
   /usr/lib/libgdk-1.2.so.0.9.1                61809   5.81987%
   /usr/lib/mozilla/components/libgfx_gtk.so   63646   5.99284%
   /lib/i686/libc-2.2.4.so                     90628   8.53344%
   /usr/src/linux-2.4.7-10/vmlinux            256315  24.1343%

   oprofpp:
   memmove[0x00088500]:      6.2965% (5650 samples)
   free[0x00080b40]:         6.7578% (6064 samples)
   __malloc[0x0007ff60]:     7.2448% (6501 samples)
   chunk_alloc[0x000801d0]: 10.4867% (9410 samples)
   chunk_free[0x00080c40]:  10.7151% (9615 samples)
   memcpy[0x00088c10]:      12.9261% (11599 samples)

7) Any plan on supporting the Pentium IV counters? Self-explanatory.

I only mention #4 and #5 because my system swaps madly when I run op_time. I would really like to see a full-featured profiling/performance tool for x86. I have some experience in the performance tool area, so if I can help, just drop me a line.

--Phil

Compaq: High Performance Server Systems
Quality & Performance Engineering
---------------------------------------------------------------------------
Phi...@co...                        Performance Tools/Analysis
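Point 4 above (keeping a running total where op_time can read it without touching the sample data) could look something like the following C sketch. The struct layout, field names, and magic value here are purely illustrative assumptions, not oprofile's actual on-disk sample format:

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical sample-file layout: a small header carries the running
   total, so a tool like op_time could read sizeof(struct sample_header)
   bytes per file instead of mmapping the whole sample buffer. */
struct sample_header {
    uint32_t magic;   /* identifies the (hypothetical) format */
    uint32_t total;   /* running total, updated on every sample */
};

struct sample_file {
    struct sample_header hdr;
    uint32_t counts[4096];    /* per-EIP-slot sample counters */
};

void sf_init(struct sample_file *sf)
{
    memset(sf, 0, sizeof *sf);
    sf->hdr.magic = 0x0b5e55ed;  /* arbitrary illustrative magic */
}

void sf_add_sample(struct sample_file *sf, uint32_t eip_slot)
{
    sf->counts[eip_slot % 4096]++;
    sf->hdr.total++;   /* the only extra cost: one increment per sample */
}
```

The overhead per sample is a single extra increment, which matches the "pretty small" cost estimated above.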
From: John L. <le...@mo...> - 2002-03-04 22:34:19
On Mon, Mar 04, 2002 at 04:52:22PM -0500, Phillip Ezolt wrote:
> Hi All,

Hi Phillip.

> 0) DCPI's dcpilabel function might be a helpful feature to add to oprofile.
>
> Here's a definition:
>
> "- dcpilabel - new command
>  - It runs a specified program and labels all collected profiles with a
>    user-defined label. This label can be passed to analysis tools to
>    focus their attention on just the profiles with the specified label.
>    This functionality can be used for example to compare two runs of a
>    program within the same epoch, or to find the samples that fall within
>    the kernel during the run of a program."

This sounds like it might indeed be a useful addition. We were planning to do comparisons with a slightly different (more automated) scheme.

> 1) A per cpu breakdown of performance statistics is very important.

This would actually be really trivial to do: just have the first word of the data read from the userspace daemon be the CPU number. Should go on the TODO list.

> I thought that oprofile had this feature and called them "sessions" (a
> much more sensible name, IMHO.) Unfortunately, I can't figure out any way
> to specify a new session.

Currently a new session is started when some parameters change, between individual runs of the daemon. Adding an epoch feature would be pretty easy I think, and another good idea.

> 3) How does oprofile deal with modules on ramdisks?
>
> When I run my stock redhat 7.2, oprofile thinks that my SCSI drivers are in
> /lib/.
>
> "/proc/ksyms" says that my SCSI drivers are in /lib/, because that's where
> they exist on the ramdisk.

We (quickly) ignore any such samples - there is no (automatic) way to find the binary. But I don't think there's any real requirement for us to have access to the binary whilst the daemon is running; currently we just map a sample file based on the size of the binary, when in fact we could be truncating the sample file to a larger size when we get an out-of-bounds sample (I think - Phil E.?) So it's another feature request really, and hasn't been done yet simply because it's not a common requirement.

> 4) Can sample totals be stored at the beginning of the sample files?

We've been planning something similar for a while; the post-profile tools are WAY too slow. In the future, we'd like to be able to freeze a profile state into a processed form, for quick analysis without all the thrashing of pages.

> I believe that this is because it is mmaping all of the files in the
> sample directory. This isn't necessary for op_time. It only needs
> to know totals.

Indeed.

> 5) Can the sample files be compressed?
>
> Your sample files compress enormously.
>
> -rw-r--r--    1 ezolt    csdpg    23296560 Mar  4 16:36 red-carpet
> -rw-r--r--    1 ezolt    csdpg       37487 Mar  4 16:37 red-carpet.gz

Try a stat :) Unless you're storing your sample files on vfat or something, they should be sparse files and take up very little actual disk space.

> By using zlib to read and write them, you could dramatically reduce
> the amount of I/O necessary to total all of the samples.

This is pretty difficult compared to the current method of just incrementing a counter in a file-backed mmap page anyway :)

> 6) Please keep the columns consistent/spread-sheet friendly.

Phil has recently fixed this (good catch Phil ;)

> 7) Any plan on supporting the Pentium IV counters?
> Self Explanatory.

Lack of test machines :/ It wouldn't be too hard ... maybe one day I will write some support for the basic mechanism "blind", as at least a starting point for an interested developer with a P4 machine. PEBS support is a bigger task.

> I would really like to see a full featured profiling/performance tool
> for X86. I have some experience in the performance tool area, so if I
> can help, just drop me a line.

Well, you've made some very good suggestions - we can always do with more of those! And we definitely take patches :)

regards
john

--
I am a complete moron for forgetting about endianness. May I be forever marked as such.
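The "try a stat" point is easy to demonstrate: a file full of holes reports a large apparent size (st_size, what ls -l shows) while allocating almost no blocks (st_blocks, what du reports). A minimal POSIX C sketch, assuming an ordinary sparse-capable filesystem; the path and hole size are arbitrary:

```c
#include <assert.h>
#include <fcntl.h>
#include <sys/stat.h>
#include <unistd.h>

/* Make a sparse file by seeking past a large hole and writing one byte,
   then report apparent size vs. bytes actually allocated on disk. */
int sparse_sizes(const char *path, off_t hole,
                 long long *apparent, long long *on_disk)
{
    int fd = open(path, O_CREAT | O_TRUNC | O_WRONLY, 0644);
    if (fd < 0)
        return -1;
    if (lseek(fd, hole, SEEK_SET) < 0 || write(fd, "x", 1) != 1) {
        close(fd);
        return -1;
    }
    struct stat st;
    fstat(fd, &st);
    close(fd);
    *apparent = (long long)st.st_size;          /* what ls -l shows */
    *on_disk  = (long long)st.st_blocks * 512;  /* what du reports  */
    return 0;
}
```

On vfat or a filesystem without hole support, the two numbers come out (roughly) equal, which is exactly the pathological case mentioned above.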
From: Philippe E. <ph...@cl...> - 2002-03-05 05:40:29
From: "John Levon" <le...@mo...>
Sent: Tuesday, March 05, 2002 1:33 AM

> On Mon, Mar 04, 2002 at 04:08:19PM -0800, Osiris Pedroso wrote:
>
> > How can I help ? I may need some help myself in getting it started (have
> > been away from Linux for the last 6 months or so).
> ...
> 2) fix the module/daemon to be able to use up to 18 counters instead of
> the current 4. This is mostly a simple change of OP_MAX_COUNTERS, but we
> need to see if taking 5 bits out of the sample entry for the counter
> number isn't causing performance degradation (I doubt it actually)

It's not a problem; I've tested it in the past with a simulated counter.

> 3) define and use a CPU_P4 type
>
> 4) test, fix and verify the APIC initialisation code

It should already be OK, except that the number of LVT vectors is 5 for the P4; the added vector is the LVT thermal monitor vector. To simplify the work you can start with a kernel where the APIC is up at startup, so you don't rely on the APIC bring-up code in oprofile. John, does this require at least 2.4.10?

...

> Get the Intel manual (vol. 3 in particular) if you haven't already, and
> flick through the perf ctr + APIC sections.

See also 24896604.pdf, "Intel Pentium 4 and Intel Xeon processor optimization", appendix B. Another useful source is Brink, which implements a P4, perf-ctr-based profiler:

http://www.eg.bucknell.edu/~bsprunt/emon/brink/brink.shtm

> Feel free to ask any questions you might have

Can I become rich, beautiful, clever and young by using oprofile ?

regards,
Phil
From: John L. <le...@mo...> - 2002-03-05 16:18:18
On Tue, Mar 05, 2002 at 06:28:23AM +0100, Philippe Elie wrote:
> it's not a problem, I've tested it in the past through simulated counter.

ok, good.

> is up at startup so you don't rely on the apic up code in
> oprofile. John is this require a 2.4.10 at least ?

Yep, with the APIC options obviously enough :)

> > Feel free to ask any questions you might have
>
> Can I become rich, beautiful, clever and young by using oprofile ?

I didn't say I would answer them :)

john

--
I am a complete moron for forgetting about endianness. May I be forever marked as such.
From: Osiris P. <ope...@sw...> - 2002-03-11 21:17:17
Ok, I have ordered the books from Intel (they are also available in PDF form).

I also installed RedHat 7.2 on my P4 machine. It comes with the 2.4.7-10 kernel.

What version of the kernel do I need?

Thanks,
Osiris

----- Original Message -----
From: "Philippe Elie" <ph...@cl...>
To: "John Levon" <le...@mo...>; "Osiris Pedroso" <ope...@sw...>
Cc: <opr...@li...>
Sent: Monday, March 04, 2002 9:28 PM
Subject: Re: Nice Work.
From: Philippe E. <ph...@cl...> - 2002-03-11 22:24:18
From: "Osiris Pedroso" <ope...@sw...>
Sent: Monday, March 11, 2002 10:20 PM

> Ok, I have ordered the books from Intel (they are also available in PDF
> form).
>
> I also installed RedHat 7.2 on my P4 machine. It comes with the 2.4.7-10
> kernel.
>
> What version of kernel do I need ?

A UP kernel >= 2.4.11 with CONFIG_X86_LOCAL_APIC on (perhaps the option is called CONFIG_X86_LOCAL_APICUP).

regards,
Phil
From: John L. <le...@mo...> - 2002-03-11 23:14:02
On Mon, Mar 11, 2002 at 01:20:33PM -0800, Osiris Pedroso wrote:
> I also installed RedHat 7.2 on my P4 machine.
> It comes with 2.4.7-10 kernel.

That version is fine - oprofile works on most versions.

regards
john

--
I am a complete moron for forgetting about endianness. May I be forever marked as such.
From: Philippe E. <ph...@cl...> - 2002-03-05 05:40:31
From: "John Levon" <le...@mo...>
Sent: Monday, March 04, 2002 11:31 PM

> On Mon, Mar 04, 2002 at 04:52:22PM -0500, Phillip Ezolt wrote:
>
> > Hi All,
>
> Hi Phillip.

Hi,

[DCPI label - Epoch]

> > I thought that oprofile had this feature and called them "sessions" (a
> > much more sensible name, IMHO.) Unfortunately, I can't figure out
> > any way to specify a new session.
>
> Currently a new session is started when some parameters change, between
> individual runs of the daemon.
>
> Adding in an epoch feature would be pretty easy I think, and another
> good idea.

We just need to add the notion of a named session. I don't see exactly the difference between labels and epochs. Are labels just a trick to avoid restarting a different epoch?

> > 3) How does oprofile deal with modules on ramdisks?
>
> We (quickly) ignore any such samples - there is no (automatic) way to
> find the binary. But I don't think there's any real requirement for us
> to have access to the binary whilst the daemon is running; currently we
> just map a sample file based on the size of the binary, when in fact we
> could be truncating the sample file to a larger size when we get an out
> of bounds sample (I think - Phil E. ?)

A) Growing and remapping the sample files would work. There is also a related problem in the post-profile code (the nr samples calculation).

> > 4) Can sample totals be stored at the beginning of the sample files?
>
> We've been planning something similar for a while, the post-profile
> tools are WAY too slow.
>
> In the future, we'd like to able to freeze a profile state into a
> processed form, for quick analysis without all the trashing of pages.
>
> > I believe that this is because it is mmaping all of the files in the
> > sample directory. This isn't necessary for op_time. It only needs
> > to know totals.

B) In my eyes that's a problem in the Linux kernel: when we mmap or read a sparse file, the holes are stored in the cache and so clobber all of memory with zeroed blocks. Try it with a "little session":

$ du -h /var/opd/samples
$ free
$ op_time
$ free

and look at the cached field...

> > 5) Can the sample files be compressed?
> >
> > Your sample files compress enormously.
> >
> > -rw-r--r--    1 ezolt    csdpg    23296560 Mar  4 16:36 red-carpet
> > -rw-r--r--    1 ezolt    csdpg       37487 Mar  4 16:37 red-carpet.gz
>
> try a stat :)
>
> Unless you're storing your sample files on vfat or something, they
> should be sparse files and take up very little actual disk space.

C) Even when comparing it to the du -h result, the compression ratio is impressive. That reminds me that we don't *greatly* discourage the use of a filesystem without sparse file support (vfat and network filesystems?) in the documentation.

> > By using zlib to read and write them, you could dramatically reduce
> > the amount of I/O necessary to total all of the samples.
>
> This is pretty difficult compared to the current method of just
> incrementing a counter in a file-backed mmap page anyway :)

D) I think Phillip means: using the compressed files for post-profile would give a great improvement and, at least from a memory-use point of view, it is a real improvement. I've thought in the past about compressing files when creating a new session, but I often use the post-profile tools on a running session...

John, we have discussed the sparse file format in the past. I think we must (see points A to D) eject it, even if this increases (moderately) the overhead of oprofile. The most promising data structure, in my eyes, to store samples is an in-memory B-tree built inside a growable mmapped file (one tree per image) with a small page size (probably fewer than 16 entries per page).

Phillip, you have worked with DCPI: what was roughly the overhead of DCPI?

> > 6) Please keep the columns consistent/spread-sheet friendly.

I'm amazed that nobody, myself included, has ever complained about the output format.

> > The column layout of the output from op_time and oprofpp differ. If one
> > were to write a GUI that sits on top of oprofile, it would be easier to
> > parse the output if it was identical in all cases.

Plain op_time (i.e. without the -l option) does not yet have a columned output.

I started an oprofpp-like GUI tool a few days ago, but I don't rely on the output of the post-profile tools; I build it on top of the containers used to store samples in the post-profile tools. For now I've planned only two views: one grouping the oprofpp -l/-L capabilities with a tree view of samples, and another to see hotspots in code graphically. I'm interested in ideas/help on what to implement in GUI post-profile tools. Because we are near a release period, I prefer not to open a branch in CVS and to wait a few days before committing the GUI.

> > 7) Any plan on supporting the Pentium IV counters?
> > Self Explanatory.
>
> lack of test machines :/
>
> It wouldn't be too hard ... maybe one day I will write some support for
> the basic mechanism "blind", as at least a starting point for an
> interested developer with a P4 machine.
>
> PEBS support is a bigger task.

We can perhaps take a more precise eip value from the PEBS. I hope this eip doesn't suffer from the irq latency problem. It would also be nice to port oprofile to other architectures.

regards,
Phil
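For reference, the "incrementing a counter in a file-backed mmap page" scheme that this thread debates replacing can be sketched in a few lines of POSIX C. The slot count, the path handling, and the O_TRUNC are simplifications for illustration, not the daemon's real code; note how ftruncate() extends the file with holes, which is exactly where the sparse-file behaviour discussed above comes from:

```c
#include <assert.h>
#include <fcntl.h>
#include <stdint.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <unistd.h>

#define NR_SLOTS 4096

/* Map a sample file as a flat array of counters.  Pages never written
   remain holes in the file, so the file is sparse on disk. */
uint32_t *map_sample_file(const char *path)
{
    int fd = open(path, O_CREAT | O_TRUNC | O_RDWR, 0644);
    if (fd < 0)
        return NULL;
    /* extending with ftruncate leaves unwritten regions as holes */
    if (ftruncate(fd, NR_SLOTS * sizeof(uint32_t)) < 0) {
        close(fd);
        return NULL;
    }
    uint32_t *counts = mmap(NULL, NR_SLOTS * sizeof(uint32_t),
                            PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
    close(fd);  /* the mapping keeps the file contents reachable */
    return counts == MAP_FAILED ? NULL : counts;
}

/* Recording a sample is just an increment into a file-backed page. */
void record_sample(uint32_t *counts, uint32_t eip_slot)
{
    counts[eip_slot % NR_SLOTS]++;
}
```

The appeal of this scheme is that the sample-recording fast path has no explicit I/O at all; the cost, as argued above, shows up later when post-profile tools fault in every zeroed page.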
From: <Phi...@co...> - 2002-03-05 16:04:28
Philippe,

> We just need to add the notion of named session. I don't see exactly
> the difference between labels and epochs. Is labels just a trick to avoid
> restart a different epochs ?

An epoch is a period of time, while a dcpi-label marks everything associated with the running of a particular process. Let me give some examples.

Epochs
------

When I start profiling, a new epoch is automatically opened. I start the program that I want to profile. When it is finished with initialization, I start a new epoch. I run my benchmark or whatever, and then start a new epoch. I then stop my program and start a new epoch.

The "latest" epoch contains samples for the currently running system. The "latest-1" epoch contains samples for the shut-down of the program I was interested in. The "latest-2" epoch contains samples for the running of the program I was interested in. The "latest-3" epoch contains samples for the initialization of the program I was interested in.

So, basically, epochs allow you to profile different periods of time.

Labels
------

This allows me to tag everything related to the program I am running as relevant to that label. For example, I would start "dcpilabel game_devel /usr/local/bin/quake". Later, the post-processing tools would accept the label and ONLY show me samples that were attributed to that label. That way, if I have two CPU hogs running simultaneously and they both call libc, the samples won't be merged. By specifying the label, I can see which samples in libc were caused by "quake" and which came from other applications on the system.

> Philipp you have work with DCPI: what was roughly the overhead
> of DCPI ?

DCPI definitely uses some sort of compression, and its overhead hovers at around 5% of the CPU. Not much at all for a profiling tool. Here's a rough comparison between the "uncompressed" and "compressed" DCPI sample files:

-rw-r--r--    1 ezolt    system    363332 Mar  5 10:45 vi
-rw-r--r--    1 ezolt    system      6208 Mar  4 23:40 vi_20000825012804fe49bb

> For now I've planned only two view, one grouping oprofpp -l/-L
> capabilities with a tree view of samples and one another to see
> graphically hotspot in code. I'm interested by idea/help on what
> to implement in GUI post-profile tools. Because we are near a
> release period I prefer to not open a branch in cvs and wait a few
> day before committing the GUI.

Our internal tools can start at the system level and drill all the way down to an assembly instruction using ctrees. (Open the system, open the image, open the function, open the source line.) Ideally, one would want some sort of "hints" at the assembly level to suggest what needs to be changed.

Much of the time people ask about differences between runs of an application. Unfortunately, I don't have a good way of showing differences between runs. (Think about it from a developer's point of view: I changed my code... it is 25% slower... what the heck is different?)

> It will be nice also to make port of oprofile to other architecture.

Yes. I was briefly looking into that. There are definitely architecture-specific aspects of profiling, but as long as you have a processor that can interrupt on a counter overflow, much of the code is the same. The details are a little different, but the abstraction remains the same. I think that IA64 would probably be the best place to take this. (Unfortunately, nobody has any hardware....)

I don't know how possible it is, but it would be good if OS events could use the same infrastructure as oprofile to give information about a program's execution. For example, I would like to be able to profile "page-faults" or system calls. I ran into a situation last year where a huge program was generating a lot of page faults. It was unclear WHERE they were coming from. To be able to trace the faults to a particular source line would be heaven. With a little kernel tinkering, you could profile ANY kernel system call. This would be very helpful with big codes that spend a lot of time in the kernel. Sometimes it is unclear which function is doing the "read" or "write" or whatever.

--Phil

Compaq: High Performance Server Systems
Quality & Performance Engineering
---------------------------------------------------------------------------
Phi...@co...                        Performance Tools/Analysis
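The epoch semantics described above (non-overlapping time intervals, each sample belonging to exactly one) amount to simple bucketing by timestamp. A hypothetical C sketch of just that lookup; the names and types are illustrative, not DCPI's or oprofile's:

```c
#include <assert.h>
#include <stddef.h>

/* Given ascending epoch start times, return the index of the epoch a
   sample's timestamp falls into.  Epoch i covers [starts[i], starts[i+1]);
   the last epoch is open-ended ("latest"). */
size_t epoch_of(const unsigned long *starts, size_t n_epochs,
                unsigned long stamp)
{
    size_t e = 0;
    while (e + 1 < n_epochs && stamp >= starts[e + 1])
        e++;
    return e;
}
```

Starting a new epoch is then nothing more than appending a boundary timestamp; no existing samples need to be touched, which is what makes the feature cheap to add.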
From: John L. <le...@mo...> - 2002-03-05 16:39:50
On Tue, Mar 05, 2002 at 11:03:00AM -0500, Phi...@co... wrote:
> > Philipp you have work with DCPI: what was roughly the overhead
> > of DCPI ?
>
> DCPI definately uses some sort of compression, and it's overhead
> hovers at around 5% of the CPU. Not much at all for a profiling tool.

Sounds in the same range as oprofile now (hard to say, though, since it's dependent on frequency).

> Much of the time people ask about differences between runs of an
> application. Unfortunately, I don't have a good way of showing
> differences between runs.
>
> (Think about it from a developers point of view. I changed my
> code... It is 25% slower... What the heck is different? )

Indeed, it's a definitely handy feature. There are two different ways to represent it: firstly via source annotation, and secondly a symbol-based approach that doesn't require -g. You just go through symbol by symbol, comparing them and showing the differences. If the developer has only changed, say, one function, then that would show up with the difference (of course there will be "line noise" and the like).

> There are definately architecture specific of profiling, but as long as you
> have a processor that can interrupt on a counter overflow, much of the code
> is the same.

Indeed, the approach is extensible to a number of processors. Originally my M.Sc. thesis deadline was really the bounding factor in implementing the code nicely. We've improved a lot recently due to various things, but there are undoubtedly quite a lot of x86-specific things still there.

> The details are a little different but the abstraction remains the
> same. I think that IA64 would probably the best place to take this.
> (Unfortunately, nobody has any hardware....)

The HP guys do: I don't know if they are actually planning a port. Bob?

> For example, I would like to be able to profile "page-faults" or
> system calls. I ran into a situation last year where a huge program
> was generating alot of page faults. It was unclear WHERE they were
> coming from. To be able to trace the faults to a particular source
> line would be heaven. With a little kernel tinkering, you could
> profile ANY kernel system call. This would be very helpful with big
> codes that spend alot of time in the kernel. Sometimes it is unclear
> which function is doing the "read" or "write" or whatever.

This is stepping into dprobes/LTT territory; I think we should investigate whether we can leverage them. This would be pretty far in oprofile's future anyway, I think. It also loses one of the distinct advantages of oprofile: no patches needed. So it would have to be an option at least ;)

regards
john

--
I am a complete moron for forgetting about endianness. May I be forever marked as such.
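The symbol-by-symbol comparison outlined above could start from something as small as this sketch. The table layout is a hypothetical stand-in for per-symbol sample counts, and a real diff tool would also have to walk both tables to catch added or removed symbols:

```c
#include <assert.h>
#include <string.h>

/* One row of a per-symbol sample table (no -g needed: symbol names and
   counts come straight from the symbol table plus the sample file). */
struct sym_count {
    const char *name;
    unsigned long samples;
};

/* Return samples(after) - samples(before) for one symbol, treating a
   symbol missing from either run as having zero samples there. */
long sym_delta(const struct sym_count *before, size_t n_before,
               const struct sym_count *after, size_t n_after,
               const char *name)
{
    unsigned long b = 0, a = 0;
    for (size_t i = 0; i < n_before; i++)
        if (strcmp(before[i].name, name) == 0)
            b = before[i].samples;
    for (size_t i = 0; i < n_after; i++)
        if (strcmp(after[i].name, name) == 0)
            a = after[i].samples;
    return (long)a - (long)b;
}
```

A single changed function would then stand out as the one symbol with a large delta, with the remaining symbols showing only sampling "line noise".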
From: Philippe E. <ph...@cl...> - 2002-03-06 02:53:27
From: "John Levon" <le...@mo...>
Sent: Tuesday, March 05, 2002 5:36 PM

> On Tue, Mar 05, 2002 at 11:03:00AM -0500, Phi...@co... wrote:
> > Much of the time people ask about differences between runs of an
> > application. Unfortunately, I don't have a good way of showing
> > differences between runs.
> >
> > (Think about it from a developers point of view. I changed my
> > code... It is 25% slower... What the heck is different? )
>
> Indeed, it's a definitely handy feature. There are two different ways to
> represent it, firstly via source annotation, and secondly a symbol-based
> approach that doesn't require -g.

The two are probably complementary; the symbol-based diff is easier to implement. I started something like the first in the past, based on annotated-source diffing, but it's not easy to implement.

> > There are definately architecture specific of profiling, but as long as you
> > have a processor that can interrupt on a counter overflow, much of the code
> > is the same.
>
> Indeed, the approach is extendible to a number of processors. Originally
> my M.Sc. thesis deadline was really the bounding factor in implementing
> the code nicely. We've improved a lot recently due to various things,
> but there are undoubtedly quite a lot of x86-specific things still
> there.

The TODO entry "replace our u32 etc. with <stdint.h>" is also meant as a review of oprofile's portability problems. oprofile is actually ... hum ... weird in the types used to record samples, e.g. post-profile uses sometimes size_t, sometimes u32, sometimes uint ...

> > The details are a little different but the abstraction remains the
> > same. I think that IA64 would probably the best place to take this.
> > (Unfortunately, nobody has any hardware....)

[on intercepting syscalls to profile page faults and other kernel events]

> This is stepping into dprobes/LTT territory, I think we should
> investigate if we can leverage them.
>
> This would be pretty far in oprofile's future anyway I think. It also
> loses one of the distinct advantages of oprofile: no patches needed. So
> it would have to be an option at least ;)

I agree, we must *strongly* avoid patching the kernel. It's unmaintainable. At least three other things come directly from your M.Sc. thesis:

- no need to instrument code
- independent of the compiled language used to create the binary
- low overhead

The first two are important in my eyes; the third also, but a little less than the others.

regards,
Philippe Elie
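The <stdint.h> cleanup mentioned above, combined with the "5 bits out of the sample entry" change discussed earlier for the P4's 18 counters, might look roughly like this. The exact bit split and names are illustrative assumptions, not oprofile's actual sample format:

```c
#include <assert.h>
#include <stdint.h>

/* A fixed-width sample entry: the low 27 bits hold the sample count,
   the top 5 bits hold the counter number (5 bits covers 0..31, enough
   for the P4's 18 counters).  uint32_t from <stdint.h> replaces the
   mix of u32/uint/size_t. */
typedef uint32_t sample_entry_t;

#define COUNTER_BITS 5u
#define COUNT_BITS   (32u - COUNTER_BITS)
#define COUNT_MASK   ((UINT32_C(1) << COUNT_BITS) - 1)

static inline sample_entry_t pack_entry(uint32_t count, uint32_t ctr)
{
    return (count & COUNT_MASK) | (ctr << COUNT_BITS);
}

static inline uint32_t entry_count(sample_entry_t e)
{
    return e & COUNT_MASK;
}

static inline uint32_t entry_counter(sample_entry_t e)
{
    return e >> COUNT_BITS;
}
```

Since packing and unpacking are a mask and a shift on a fixed-width type, the per-sample cost is negligible, which matches the "I doubt it actually" verdict on performance degradation earlier in the thread.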
From: John L. <le...@mo...> - 2002-03-06 06:44:41
On Wed, Mar 06, 2002 at 03:19:46AM +0100, Philippe Elie wrote:
> > Indeed, the approach is extendible to a number of processors. Originally
> > my M.Sc. thesis deadline was really the bounding factor in implementing
> > the code nicely. We've improved a lot recently due to various things,
> > but there are undoubtedly quite a lot of x86-specific things still
> > there.
>
> the TODO entry replace our u32 etc with <stdint.h> is also to make
> a review of oprofile on the portability problem. Oprofile actually is
> ... hum ... weird in the type used to record samples. eg post-profile
> use sometimes size_t sometimes u32 sometimes uint ...

Oh, these are piddling little things. We need to worry about things like the atomicity of certain checks in oprofile.c that work on x86 and don't elsewhere...

> I agree, we must *strongly* avoid to patch the kernel. It's un-maintainable.
> At least three other things comes directly from your M.Sc. thesis:
>
> - no need to instrument code
> - independant from the compiled language used to create binary.
> - low overhead
>
> the two first are important at my eyes, the third also but a little what
> less than the other.

All three come under the banner of "convenience". Developers' time is a precious commodity, and we don't want to waste it.

regards
john

--
I am a complete moron for forgetting about endianness. May I be forever marked as such.
From: Philippe E. <ph...@cl...> - 2002-03-06 02:53:23
From: <Phi...@co...>
To: "Philippe Elie" <ph...@cl...>

Philip,

[epochs vs label]

OK, thanks for the explanation. Epoch is likely to be implemented through sessions; label seems more problematic.

> > Philipp you have work with DCPI: what was roughly the overhead
> > of DCPI ?
>
> DCPI definately uses some sort of compression, and it's overhead
> hovers at around 5% of the CPU. Not much at all for a profiling tool.

Is that for profiling all running applications in the system?

> > For now I've planned only two view, one grouping oprofpp -l/-L
> > capabilities with a tree view of samples and one another to see
> > graphically hotspot in code. I'm interested by idea/help on what
> > to implement in GUI post-profile tools. Because we are near a
> > release period I prefer to not open a branch in cvs and wait a few
> > day before committing the GUI.
>
> Our internal tools can start at the system level, and drill all the
> way down to an assembly instruction using ctree's. (Open the system,
> open the image, open the function, open the source line.)

That's planned too; I think to just use a contextual menu on an item to select the right sub-view needed by the user. Your help would be appreciated in this area (the GUI is Qt-based).

Philippe Elie
From: <Phi...@co...> - 2002-03-06 22:03:17
> > Is it for profiling all running applications in the system ?

Yes. Kernel and user space.

> It's planned too, I think to just use a contextual menu on item to select
> the right sub-view needed by the user. Your help will be appreciate
> on this area (the gui is QT based)

No problem.

--Phil

Compaq: High Performance Server Systems
Quality & Performance Engineering
---------------------------------------------------------------------------
Phi...@co...                        Performance Tools/Analysis
From: John L. <le...@mo...> - 2002-03-05 16:28:33
|
On Tue, Mar 05, 2002 at 06:31:06AM +0100, Philippe Elie wrote:

> > to know totals.
>
> B) To my eyes that's a problem in the Linux kernel: when we mmap
> or read a sparse file, the holes are stored in the cache and so
> clobber all of memory with zeroed blocks. Try it with a "little
> session":

is this really the case ? Why aren't zero page copies faulted in on
demand ? I'm a bit dubious whether page size granularity is good enough
anyway.

> C) Even when comparing it to the du -h result, the compression ratio
> is impressive. That reminds me that we don't *greatly* discourage
> the use of a filesystem w/o sparse-file support (vfat and
> network fs ?) in the documentation.

yes, fix that :)

> D) I think Phillip means using the compressed files for post-profile

for post-profile, fine yes...

> to compress files when creating a new session, but I often
> use the post-profile tools on a running session...

not really an issue I think - just a "don't do that"

> John, we have discussed the sparse file format in the past.
> I think we must (see points A to D) eject it, even if this increases
> (moderately) the overhead of oprofile.
>
> The most promising data structure, to my eyes, for storing samples
> is an in-memory B-tree built inside a growable mmapped file
> (one tree per image) with a small page size (probably fewer than 16
> entries per page).

This sounds very much like what HP have. We need to get some serious
computer science going here to determine the best structure. So: I'm in
favour of such a change.

> plain op_time (ie w/o the -l option) does not yet have columned
> output. I have

ah ok. you see how little I've tested it ;)

> release period I prefer not to open a branch in cvs and will wait a
> few days before committing the GUI.

Hmm, Phil, getting these GUIs working nicely is a /big/ job ... I'd
really prefer a new module/branch for the GUI right now. Aren't there
more important things like op_diff we need first ? I'd really like to
get a 1.0 version out this year too, if possible.

> we can perhaps take a more precise eip value from the PEBS. I hope
> this eip doesn't suffer from the irq latency problem.

From the sound of it, it looks like it doesn't. But let's take one step
at a time: the basic facility is similar to the P6 one; PEBS will
probably require some more radical changes.

> It would be nice also to port oprofile to other architectures.

Just you wait !

john
--
I am a complete moron for forgetting about endianness. May I be forever
marked as such.
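The sparse-file scheme being debated above — the daemon keeps one mostly-hole histogram file per image and bumps a counter at the sampled offset, so untouched slots cost no disk blocks — can be sketched as follows. This is a hypothetical flat layout in Python for illustration only, not oprofile's actual sample-file format:

```python
import mmap
import os
import struct
import tempfile

SLOT = struct.calcsize("<I")  # one little-endian 32-bit counter per sample slot

def bump(path, slot_index, nslots):
    """Increment one counter in a file-backed histogram.

    The file's logical size covers every possible slot, but on a
    filesystem with sparse-file support only the pages actually touched
    get allocated - which is why such files look huge to ls/stat yet
    occupy few disk blocks.
    """
    with open(path, "r+b") as f:
        f.truncate(nslots * SLOT)  # extend logically; holes stay holes
        with mmap.mmap(f.fileno(), nslots * SLOT) as m:
            off = slot_index * SLOT
            (count,) = struct.unpack_from("<I", m, off)
            struct.pack_into("<I", m, off, count + 1)

def read_slot(path, slot_index):
    """Read one counter back; holes read as zero."""
    with open(path, "rb") as f:
        f.seek(slot_index * SLOT)
        data = f.read(SLOT)
    return struct.unpack("<I", data)[0] if len(data) == SLOT else 0
```

The cheap part is the daemon side: a bump is a single in-memory increment once the page is mapped. The cost Philippe objects to shows up on the reader side, which has to walk (and page in) the whole mostly-zero file.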
From: Philippe E. <ph...@cl...> - 2002-03-06 02:53:22
|
From: "John Levon" <le...@mo...>
Sent: Tuesday, March 05, 2002 5:25 PM

> On Tue, Mar 05, 2002 at 06:31:06AM +0100, Philippe Elie wrote:

[sparse files and the Linux kernel buffer cache]

> is this really the case ? Why aren't zero page copies faulted in on
> demand ?

Yes, they are, but they are *also* cached in the buffer cache, which
is, IMHO, wrong. Try what I suggested in my last mail and the kernel's
behaviour will become clear to you.

> I'm a bit dubious whether page size granularity is good enough anyway.

Even the pages touched by the daemon contain mainly zero counts (at
least 90% in the most common case, and more than 99% in many cases).

> > to compress files when creating a new session, but I often
> > use the post-profile tools on a running session...
>
> not really an issue I think - just a "don't do that"

ok, the doc says "oprofile is a continuous profiler"; must I add "but
never look at the results during profiling" ? :)

> > The most promising data structure, to my eyes, for storing samples
> > is an in-memory B-tree built inside a growable mmapped file
> > (one tree per image) with a small page size (probably fewer than 16
> > entries per page).
>
> This sounds very much like what HP have. We need to get some serious
> computer science going here to determine the best structure.

I prefer not to look at that work (it's copyrighted, if I remember
correctly). For the choice I've compared our needs against the most
common data structures (hash based, B-tree, 2-3 tree, SBB tree,
etc.). If someone has a better suggestion ...

> > release period I prefer not to open a branch in cvs and will wait a
> > few days before committing the GUI.
>
> Hmm, Phil, getting these GUIs working nicely is a /big/ job ...

Nope, it's not as much work as you think. A basic oprofpp -l/-L-like
view is already working nicely, and it's easy to enhance it step by
step.

> I'd really prefer a new module/branch for the GUI right now. Aren't
> there more important things like op_diff we need first ?

A GUI is needed, and it requires less work than diffing profiling
sessions. Other people can easily help on the GUI; helping on op_diff
seems to my eyes more problematic.

> I'd really like to get a 1.0 version out this year too, if possible.

ok.

> > It would be nice also to port oprofile to other architectures.
>
> Just you wait !

I'm waiting for hardware; I've tried to get an old Alpha workstation
but ...

Phil
From: John L. <le...@mo...> - 2002-03-06 06:47:59
|
On Wed, Mar 06, 2002 at 02:29:38AM +0100, Philippe Elie wrote:

> > > to compress files when creating a new session, but I often
> > > use the post-profile tools on a running session...
> >
> > not really an issue I think - just a "don't do that"
>
> ok, the doc says "oprofile is a continuous profiler"; must I add "but
> never look at the results during profiling" ? :)

No, I mean don't compress the current sample files :) We'll have an
"op_store" or whatever which stores the current session in a different
directory and compresses the files (epoch).

> > This sounds very much like what HP have. We need to get some serious
> > computer science going here to determine the best structure.
>
> I prefer not to look at that work (it's copyrighted, if I remember
> correctly)

Do you mean Judy ? I think Bob said they've replaced that now with
another scheme.

> > Hmm, Phil, getting these GUIs working nicely is a /big/ job ...
>
> Nope, it's not as much work as you think. A basic oprofpp -l/-L-like
> view is already working nicely, and it's easy to enhance it step by
> step.

ok :)

regards
john
--
I am a complete moron for forgetting about endianness. May I be forever
marked as such.
From: <Phi...@co...> - 2002-03-05 15:30:52
|
John,

> We (quickly) ignore any such samples - there is no (automatic) way to
> find the binary. But I don't think there's any real requirement for us
> to have access to the binary whilst the daemon is running; currently we
> just map a sample file based on the size of the binary, when in fact we
> could be truncating the sample file to a larger size when we get an out
> of bounds sample (I think - Phil E. ?)

The only reason I mention it is that "op_time -l" fails when it can't
find the modules that exist only on the ramdisk:

  "oprofpp: bfd_openr of /lib/ext3.o failed."

I guess this really isn't a profiling issue, but rather a
post-profiling tools issue. It would be nice to ignore un-openable
images, but the code just isn't written that way. (I saw no obvious way
to change it... when it can't open an image, it exits immediately.)

> In the future, we'd like to be able to freeze a profile state into a
> processed form, for quick analysis without all the thrashing of
> pages.

This can be done, but it really isn't necessary if you have a really
lightweight method of reading the samples.

> try a stat :)

[ezolt@scrffy tmp]$ stat red-carpet*
  File: "red-carpet"
  Size: 23296560   Blocks: 4824   IO Block: 4096   Regular File
Device: 802h/2050d   Inode: 216090   Links: 1
Access: (0644/-rw-r--r--)   Uid: ( 9336/ ezolt)   Gid: ( 1021/ csdpg)
Access: Mon Mar  4 16:39:36 2002
Modify: Mon Mar  4 16:36:58 2002
Change: Mon Mar  4 16:36:58 2002
  File: "red-carpet.gz"
  Size: 37487   Blocks: 80   IO Block: 4096   Regular File
Device: 802h/2050d   Inode: 216093   Links: 1
Access: (0644/-rw-r--r--)   Uid: ( 9336/ ezolt)   Gid: ( 1021/ csdpg)
Access: Mon Mar  4 16:37:06 2002
Modify: Mon Mar  4 16:37:06 2002
Change: Mon Mar  4 16:37:13 2002

> Unless you're storing your sample files on vfat or something, they
> should be sparse files and take up very little actual disk space.

This is on an ext3 file system. How does stat show a sparse file?
(Maybe I am missing it, but it looks like the uncompressed file really
IS bigger than the compressed one.)

> > By using zlib to read and write them, you could dramatically reduce
> > the amount of I/O necessary to total all of the samples.
>
> This is pretty difficult compared to the current method of just
> incrementing a counter in a file-backed mmap page anyway :)

There are probably better ways of doing it than zlib. I've used the
zlib library before, and it is pretty straightforward if files are
manipulated with file I/O instead of mmap. Things would need to be
redesigned a little bit.

> It wouldn't be too hard ... maybe one day I will write some support for
> the basic mechanism "blind", as at least a starting point for an
> interested developer with a P4 machine.

Hmpf. Unfortunately, I don't have one either. (Not yet, at least...)

> Well, you've made some very good suggestions, we can always do with
> more of those !
>
> And we definitely take patches :)

OK. I'm trawling through your source as we speak. I probably should be
using CVS head instead of the released kit. I'll probably just provide
suggestions/comments for the time being, at least until I can see clear
fixes or improvements.

--Phil

Compaq: High Performance Server Systems
        Quality & Performance Engineering
---------------------------------------------------------------------------
Phi...@co...                    Performance Tools/Analysis
From: John L. <le...@mo...> - 2002-03-05 16:44:50
|
On Tue, Mar 05, 2002 at 10:29:26AM -0500, Phi...@co... wrote:

> The only reason I mention it is that "op_time -l" fails when it can't
> find the modules that exist only on the ramdisk:
>
> "oprofpp: bfd_openr of /lib/ext3.o failed."

Yes, I've been meaning to fix that too (it's pretty stupid behaviour).

> tools issue. It would be nice to ignore un-openable images, but the
> code just isn't written that way. (I saw no obvious way to change
> it... when it can't open an image, it exits immediately.)

Phil, is it possible you can do this now (before 0.1) ?

> File: "red-carpet"
> Size: 23296560   Blocks: 4824   IO Block: 4096   Regular File

hmm, that's pretty heavily used ! Compare:

moz moz 108 /usr/bin/stat /var/opd/samples/\}lib\}libc-2.2.so#0
  File: "/var/opd/samples/}lib}libc-2.2.so#0"
  Size: 20289672   Blocks: 1664   Regular File

> (Maybe I am missing it, but it looks like the uncompressed file really
> IS bigger than the compressed one.)

I've no problem with bzip2ing sample files after they're stored in a
session. There is only so much space at the top of the todo list,
however :)

> There are probably better ways of doing it than zlib. I've used the
> zlib library before, and it is pretty straightforward if files are
> manipulated with file I/O instead of mmap. Things would need to be
> redesigned a little bit.

in fact oprofile used to have a zlib interface. I forget why ...

> I'll probably just provide suggestions/comments for the time being, at
> least until I can see clear fixes or improvements.

Thanks for your comments !

john
--
I am a complete moron for forgetting about endianness. May I be forever
marked as such.
From: Philippe E. <ph...@cl...> - 2002-03-06 02:53:25
|
From: "John Levon" <le...@mo...>
Sent: Tuesday, March 05, 2002 5:41 PM

> On Tue, Mar 05, 2002 at 10:29:26AM -0500, Phi...@co... wrote:

[op_time -l failure if some binary does not exist]

> Phil, is it possible you can do this now (before 0.1) ?

Yes, I'm looking at committing the patch, probably on the 6th.

[sample file size]

> in fact oprofile used to have a zlib interface. I forget why ...

I have a strong preference for dropping the sparse file format. It was
a good idea at the start of oprofile, but it has become somewhat
problematic.

regards,
Philippe Elie
From: John L. <le...@mo...> - 2002-05-01 20:21:30
|
On Mon, Mar 04, 2002 at 04:52:22PM -0500, Phillip Ezolt wrote:

> Let me tell you some of the things that I've learned, that
> might help you in your project.

... just clearing out my old mailboxes ...

> 1) A per cpu breakdown of performance statistics is very important.
>
> I don't have oprofile yet working on an SMP box, but this feature
> (which is lacking in DCPI), is asked for again and again on DCPI/Tru64.

Can you describe a little further how you see this working (both
implementation-wise and at the user interface level) ?

> 2) Could oprofile adopt something similar to DCPI's concept of Epochs?

See my other post.

regards
john
--
"Please let's not resume the argument with the usual whining about how this
feature will wipe out humanity or bring us to the promised land."
        - Charles Campbell on magic words in Subject: headers
From: <Phi...@co...> - 2002-05-01 20:26:33
|
John,

> Can you describe a little further how you see this working (both
> implementation-wise and at the user interface level) ?

I don't have strong ideas about the implementation, but there would
have to be a histogram for each CPU in the system. The analysis tools
must be able to quickly sum these when asked.

For the user interface, my ideas are a little more concrete. When the
user requests samples, he/she would be able to ask for a system-total
view of all of the samples, or for the samples that occurred only on a
particular CPU. (The default view would be the system-total view.)

This will give users the ability to see whether work is being balanced,
and whether CPU affinity is working as it is supposed to. (This becomes
VERY important with NUMA machines.)

--Phil

Compaq: High Performance Technical Computing/Visualization
---------------------------------------------------------------------------
Phi...@co...                    Performance/Development
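The two views Phil describes - per-CPU histograms with a summed system-total default - reduce to a small amount of bookkeeping on the analysis side. A minimal sketch, with per_cpu as a hypothetical mapping from CPU id to a list of per-slot sample counts (names invented here, not oprofile's):

```python
def system_total(per_cpu):
    """Collapse per-CPU histograms (cpu id -> list of per-slot counts)
    into the system-total view by summing slot-by-slot."""
    nslots = max((len(h) for h in per_cpu.values()), default=0)
    total = [0] * nslots
    for hist in per_cpu.values():
        for i, count in enumerate(hist):
            total[i] += count
    return total

def view(per_cpu, cpu=None):
    """Return one CPU's histogram, or the default system-total view.

    Comparing view(per_cpu, cpu=n) across CPUs is what exposes the
    load-imbalance and CPU-affinity problems mentioned above."""
    return per_cpu[cpu] if cpu is not None else system_total(per_cpu)
```

The storage cost is one extra histogram per CPU per image; the summation is cheap enough to do on demand, which matches the requirement that tools "quickly sum this when asked".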