From: David G. <da...@gi...> - 2005-03-23 06:36:26
Well, having thrown my oar in on the concerns I have with the perfctr interface, now let me do the same for the perfmon interface. I'm moving this discussion onto the lse-tech list so that it is archived and publicly visible and all those good things.

Stephane, I've been having a closer look at the perfmon spec document (the version dated Dec 21, 2004). Below are a number of points which concern me (this is by no means exhaustive):

General issues:

* The multiplexed syscall takes an argument giving the number of parameters, the size and type of which depend on the individual call. Better would be an overall size argument, or even both an argument size and a number of arguments. It's a little harder to process, but at least you can tell how much memory a call will touch without having to know about every individual operation.

* The method of requesting overflow sampling or notification for a PMD assumes there is a unique PMC associated with that PMD. This is insufficiently general, since it is not naturally true for ppc64 (event selection for the various counters is controlled by a combination of various fields in the registers MMCR0, MMCR1 and MMCRA).

* How widespread is the use of the term "event sets"? Is it perfmon specific, or more widely established? I find the term rather misleading, and would prefer something like "subcontext".

More specific issues:

PFM_CREATE_CONTEXT

Altering the calling process's memory map as a side effect is icky. It could also cause problems for (the few) programs which need to take fine-grained control of their memory maps (JVMs?). Much better for the process to map the sample buffer with an explicit mmap() on the context's fd. You could, however, return an offset at which to perform the mmap().

PFM_WRITE_PMCS

The documentation says that PFM_MAX_PMD_BITVECTOR can vary between PMU models, but its value for the current PMU model is not exported anywhere.
Varying by architecture doesn't make much sense, since PMU details vary only mildly more between architectures than they do between CPU models within one architecture.

PFM_START / PFM_START_SET

I see no reason for two separate entry points; PFM_START_SET with a NULL argument can just leave the default or currently active set running, acting as a PFM_START.

PFM_STOP

This could reasonably be folded into PFM_START_SET as well, by having a special set id meaning "no set". Obviously the name of the operation would want to be changed, too (PFM_CHANGE_RUNNING?).

PFM_LOAD_CONTEXT

I'm not sure I see the point of the load_set argument. What can be accomplished with it that can't be with appropriate use of PFM_START_SET?

PFM_UNLOAD_CONTEXT

This could also reasonably be folded into the above, say by using load_pid == 0 to request binding the context to no thread at all. Again a name change would be in order (PFM_CHANGE_THREAD?).

PFM_CREATE_EVTSET / PFM_DELETE_EVTSET / PFM_CHANGE_EVTSET

Is there really a need to incrementally update the event sets? Would a PFM_SETUP_EVTSETS, which acts like PFM_CREATE_EVTSET but replaces all existing event sets with the ones described, suffice? This approach would not only reduce the number of entry points, but could also simplify the kernel's parameter checking. For example, at the moment deleting an event set which is referenced by another set's set_id_next must presumably either fail, or alter those other event sets to no longer reference the deleted set.

PFM_GET_DEFAULT_PMCS

The usefulness of this operation is not obvious to me. How would you envisage it being used?

PFM_GET_FEATURES

For one thing, this doesn't belong in the multiplexor - it doesn't use the fd, and would be better exported via /proc or /sys. But, in any case, I don't think I've ever seen a subsystem version like this well used, so I'm not sure the operation is a good idea at all. How would you envisage it being used?
PFM_GET_CONFIG / PFM_SET_CONFIG

Again, these definitely don't belong on the multiplexor. They don't use the fd, and since they set the permission regime for the multiplexor itself, they logically belong outside it. Under Linux these definitely ought to be sysctls. Even if this is ever ported to other OSes, I still don't think they belong here: setting up the permission regime for all the other calls is, I think, a logically OS-specific operation and doesn't belong in the core API. As far as I can tell it's unlikely you would use these operations in the same programs that use the rest of perfmon.

--
David Gibson                   | I'll have my music baroque, and my code
david AT gibson.dropbear.id.au | minimalist, thank you.  NOT _the_ _other_
                               | _way_ _around_!
http://www.ozlabs.org/people/dgibson
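The overall-size-argument idea above (which Stephane later refines to a (size, count) pair) lets the kernel bound and type-check a multiplexed call before touching any user memory. A minimal sketch in C; the command numbers, structure layouts, and the 4 KB cap are all hypothetical illustrations, not the real perfmon ABI:

```c
#include <stddef.h>

/* Hypothetical per-command element sizes for a multiplexed syscall
 * taking (cmd, arg, size, count).  Illustrative only. */
struct pfarg_reg   { unsigned long reg_num; unsigned long reg_value; };
struct pfarg_start { unsigned long set_id; };

static const size_t cmd_argsz[] = {
    [0] = sizeof(struct pfarg_reg),   /* e.g. a PFM_WRITE_PMCS-like cmd */
    [1] = sizeof(struct pfarg_reg),   /* e.g. a PFM_WRITE_PMDS-like cmd */
    [2] = sizeof(struct pfarg_start), /* e.g. a PFM_START-like cmd */
};

/* With an explicit (size, count) pair the kernel can reject a caller
 * compiled against the wrong structure layout, and bound the total
 * copy_from_user() before dispatching to any command handler.
 * Returns the number of bytes to copy in, or -1 on a bad call. */
static int check_call(unsigned int cmd, size_t size, size_t count)
{
    if (cmd >= sizeof(cmd_argsz) / sizeof(cmd_argsz[0]))
        return -1;                 /* unknown command */
    if (size != cmd_argsz[cmd])
        return -1;                 /* wrong data type / ABI skew */
    if (count > 4096 / size)
        return -1;                 /* cap total memory touched */
    return (int)(size * count);
}
```

Note the check needs no knowledge of what each operation does, which is exactly the point: the memory footprint is knowable generically.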
From: Stephane E. <er...@hp...> - 2005-03-29 10:09:08
David,

Sorry for the late reply, but I was travelling all of last week. Thank you for your feedback. That is exactly what I am asking for. See my comments.

On Wed, 2005-03-23 at 17:35 +1100, David Gibson wrote:

> Stephane, I've been having a closer look at the perfmon spec document (the version dated Dec 21, 2004). Below are a number of points which concern me (this is by no means exhaustive):

I have since updated the document, and several of the points you mention below have been fixed/changed.

> General issues:
>
> * The multiplexed syscall takes an argument giving the number of parameters, the size and type of which depend on the individual call. Better would be an overall size argument, or even both an argument size and a number of arguments. It's a little harder to process, but at least you can tell how much memory a call will touch without having to know about every individual operation.

I would rather see (size, count), because it would help the kernel filter out calls using the wrong data type.

> * The method of requesting overflow sampling or notification for a PMD assumes there is a unique PMC associated with that PMD. This is insufficiently general, since it is not naturally true for ppc64 (event selection for the various counters is controlled by a combination of various fields in the registers MMCR0, MMCR1 and MMCRA).

That is a good point. It is true that today overflow notification is requested via the PMC and not the PMD. The implementation assumes (wrongly) that PMCx corresponds to PMDx. The flag is recorded in the PMD-related structure, hence it would seem more natural to pass the RANDOM/OVFL_NOTIFY flags via PFM_WRITE_PMDS.
I did it via PFM_WRITE_PMCS because I considered those flags part of the configuration of the counters, so they would go with the PMC. For PPC64, it looks like you are in a situation similar to the P4, where multiple config registers are used to control a counter. We could move the flags to PFM_WRITE_PMDS.

> * How widespread is the use of the term "event sets"? Is it perfmon specific, or more widely established? I find the term rather misleading, and would prefer something like "subcontext".

I have seen this term used by Phil's PAPI toolkit, where it means the same thing. An event set is a software abstraction which encapsulates the entire PMU state.

> More specific issues:
>
> PFM_CREATE_CONTEXT
>
> Altering the calling process's memory map as a side effect is icky. It could also cause problems for (the few) programs which need to take fine-grained control of their memory maps (JVMs?). Much better for the process to map the sample buffer with an explicit mmap() on the context's fd. You could, however, return an offset at which to perform the mmap().

That has been fixed in the new rev of the specification. Now the call returns how big the buffer actually is; the application must then explicitly invoke mmap() using that size. This simplifies the implementation and follows the programming model people are most used to.

> PFM_WRITE_PMCS
>
> The documentation says that PFM_MAX_PMD_BITVECTOR can vary between PMU models, but its value for the current PMU model is not exported anywhere. Varying by architecture doesn't make much sense, since PMU details vary only mildly more between architectures than they do between CPU models within one architecture.

PFM_MAX_PMD_BITVECTOR is exported in the perfmon.h header file. At this point, it is provided by each architecture. When the processor architecture is nice, the PMU framework is specified there and it makes the job of software easier. For instance, on Itanium, the architecture says you can have up to 256 PMC and 256 PMD registers. Having this kind of information is very useful to size data structures appropriately. You don't want to have to copy large data structures (think copy_user) if you don't have to. Are you advocating that this be a PMU model-specific size?
> PFM_START / PFM_START_SET
>
> I see no reason for two separate entry points; PFM_START_SET with a NULL argument can just leave the default or currently active set running, acting as a PFM_START.

These calls have been merged as PFM_START with an optional argument. The optional argument is there to maintain backward compatibility with the existing 2.6 interface.

> PFM_STOP
>
> This could reasonably be folded into PFM_START_SET as well, by having a special set id meaning "no set". Obviously the name of the operation would want to be changed, too (PFM_CHANGE_RUNNING?).

We could find a way to merge START/STOP, but then the argument would have to indicate which of the two operations to perform. Your trick is one possibility; another would be to add a flag to the data structure passed.

> PFM_LOAD_CONTEXT
>
> I'm not sure I see the point of the load_set argument. What can be accomplished with it that can't be with appropriate use of PFM_START_SET?

You don't want to merge START and LOAD. That is not because you attach to a thread/CPU that you want monitoring to start right away. But I think you have a good point. The interface guarantees that on PFM_LOAD_CONTEXT, monitoring is stopped; you need an explicit START to activate it. This is true even if you detached while monitoring was active. I need to check whether there is something else involved here.

> PFM_UNLOAD_CONTEXT
>
> This could also reasonably be folded into the above, say by using load_pid == 0 to request binding the context to no thread at all. Again a name change would be in order (PFM_CHANGE_THREAD?).

That looks reasonable to me.

> PFM_CREATE_EVTSET / PFM_DELETE_EVTSET / PFM_CHANGE_EVTSET
>
> Is there really a need to incrementally update the event sets? Would a PFM_SETUP_EVTSETS, which acts like PFM_CREATE_EVTSET but replaces all existing event sets with the ones described, suffice? This approach would not only reduce the number of entry points, but could also simplify the kernel's parameter checking.
> For example, at the moment deleting an event set which is referenced by another set's set_id_next must presumably either fail, or alter those other event sets to no longer reference the deleted set.

This has been trimmed down to two calls in the new rev: PFM_CREATE_EVTSETS and PFM_DELETE_EVTSETS. If the event set already exists, PFM_CREATE_EVTSETS updates it. This is useful for set0, which always exists.

As for delete/create, those operations can only happen when the context is detached. Checking the validity of the event set chain is deferred until PFM_LOAD_CONTEXT, because at that point it is no longer possible to modify the sets. If a set_id_next is invalid, PFM_LOAD_CONTEXT fails.

> PFM_GET_DEFAULT_PMCS
>
> The usefulness of this operation is not obvious to me. How would you envisage it being used?

This command is now obsolete. Check the new rev of the document for PFM_GETINFO_PMCS and PFM_GETINFO_PMDS. These calls return the default values for PMCs and PMDs as well as a bitmask of the reserved fields for each register. They also return the PMD/PMC to actual HW register mappings.

> PFM_GET_FEATURES
>
> For one thing, this doesn't belong in the multiplexor - it doesn't use the fd, and would be better exported via /proc or /sys. But, in any case, I don't think I've ever seen a subsystem version like this well used, so I'm not sure the operation is a good idea at all. How would you envisage it being used?

In the new rev, this call has been folded into PFM_GET_CONFIG.

> PFM_GET_CONFIG / PFM_SET_CONFIG
>
> Again, these definitely don't belong on the multiplexor. They don't use the fd, and since they set the permission regime for the multiplexor itself, they logically belong outside it. Under Linux these definitely ought to be sysctls. Even if this is ever ported to other OSes, I still don't think they belong here.
> Setting up the permission regime for all the other calls is, I think, a logically OS-specific operation and doesn't belong in the core API. As far as I can tell it's unlikely you would use these operations in the same programs that use the rest of perfmon.

As discussed earlier, on Linux these operations could just as well be implemented with sysctls. I will update the document to incorporate your feedback, which I found very useful.

In terms of porting, I am getting closer to being able to send you a skeleton header/C file with the required callbacks. Please let me know of any special PPC64 behavior. For instance, looking at the Opteron and Pentium 4:

- On counter overflow, does PPC freeze the entire PMU?
- The HW counters are not 64 bits; what are the values of the upper bits of a counter? Should they be all 1s or all 0s?
- How is a counter overflow detected? When the full 64 bits of the counter overflow, or when there is a carry out of bit n-1 for a width of n?
- Are there any PPC64 PMU registers which can only be used by one thread at a time (shared)? Think hyperthreading.
- Is there a way to stop monitoring without having to modify all used PMC registers?

Thanks.
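The revised programming model Stephane describes above (PFM_CREATE_CONTEXT reports the sample-buffer size; the application then maps it explicitly on the context fd) is the standard Unix mmap-on-fd pattern. A sketch of that pattern, using an ordinary temporary file as a stand-in for a perfmon context fd; map_sample_buffer and the 4 KB size are illustrative, not the real API:

```c
#include <stdlib.h>
#include <string.h>
#include <sys/mman.h>
#include <unistd.h>

/* The application performs the mmap() itself, using a size the kernel
 * reported, instead of the kernel altering the process's memory map
 * behind its back. */
static void *map_sample_buffer(int fd, size_t buf_size)
{
    return mmap(NULL, buf_size, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
}

static int demo(void)
{
    char path[] = "/tmp/pfm_demo_XXXXXX";   /* stand-in for a context fd */
    int fd = mkstemp(path);
    size_t buf_size = 4096;                 /* as if returned by the kernel */

    if (fd < 0 || ftruncate(fd, (off_t)buf_size) < 0)
        return -1;

    char *buf = map_sample_buffer(fd, buf_size);
    if (buf == MAP_FAILED)
        return -1;

    memcpy(buf, "sample", 7);               /* use the mapping */
    int ok = (buf[0] == 's');

    munmap(buf, buf_size);
    close(fd);
    unlink(path);
    return ok ? 0 : -1;
}
```

Programs that manage their own address space (the JVMs David mentions) keep full control, since nothing is mapped until they ask for it.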
From: David G. <da...@gi...> - 2005-03-30 06:05:33
On Tue, Mar 29, 2005 at 12:07:23PM +0200, Stephane Eranian wrote:

> Sorry for the late reply, but I was travelling all of last week. Thank you for your feedback. That is exactly what I am asking for. See my comments.
>
> On Wed, 2005-03-23 at 17:35 +1100, David Gibson wrote:
>
> > Stephane, I've been having a closer look at the perfmon spec document (the version dated Dec 21, 2004). Below are a number of points which concern me (this is by no means exhaustive):
>
> I have since updated the document, and several of the points you mention below have been fixed/changed.

Ok, where do I find the revised document? I'm afraid the link wasn't obvious to me on the perfmon site.

> > General issues:
> >
> > * The multiplexed syscall takes an argument giving the number of parameters, the size and type of which depend on the individual call. Better would be an overall size argument, or even both an argument size and a number of arguments. It's a little harder to process, but at least you can tell how much memory a call will touch without having to know about every individual operation.
>
> I would rather see (size, count), because it would help the kernel filter out calls using the wrong data type.

Fair enough.

> > * The method of requesting overflow sampling or notification for a PMD assumes there is a unique PMC associated with that PMD. This is insufficiently general, since it is not naturally true for ppc64 (event selection for the various counters is controlled by a combination of various fields in the registers MMCR0, MMCR1 and MMCRA).
>
> That is a good point. It is true that today overflow notification is requested via the PMC and not the PMD. The implementation assumes (wrongly) that PMCx corresponds to PMDx. The flag is recorded in the PMD-related structure, hence it would seem more natural to pass the RANDOM/OVFL_NOTIFY flags via PFM_WRITE_PMDS.
> I did it via PFM_WRITE_PMCS because I considered those flags part of the configuration of the counters, so they would go with the PMC. For PPC64, it looks like you are in a situation similar to the P4, where multiple config registers are used to control a counter. We could move the flags to PFM_WRITE_PMDS.

I think that would make more sense. Or maybe even a different mechanism entirely. How would you support performance monitor events that aren't counter overflows, for those CPUs that have such?

> > * How widespread is the use of the term "event sets"? Is it perfmon specific, or more widely established? I find the term rather misleading, and would prefer something like "subcontext".
>
> I have seen this term used by Phil's PAPI toolkit, where it means the same thing. An event set is a software abstraction which encapsulates the entire PMU state.

Oh well, I guess we're stuck with that one, then.

> > More specific issues:
> >
> > PFM_CREATE_CONTEXT
> >
> > Altering the calling process's memory map as a side effect is icky. It could also cause problems for (the few) programs which need to take fine-grained control of their memory maps (JVMs?). Much better for the process to map the sample buffer with an explicit mmap() on the context's fd. You could, however, return an offset at which to perform the mmap().
>
> That has been fixed in the new rev of the specification. Now the call returns how big the buffer actually is; the application must then explicitly invoke mmap() using that size. This simplifies the implementation and follows the programming model people are most used to.

Ok.

> > PFM_WRITE_PMCS
> >
> > The documentation says that PFM_MAX_PMD_BITVECTOR can vary between PMU models, but its value for the current PMU model is not exported anywhere.
> > Varying by architecture doesn't make much sense, since PMU details vary only mildly more between architectures than they do between CPU models within one architecture.
>
> PFM_MAX_PMD_BITVECTOR is exported in the perfmon.h header file. At this point, it is provided by each architecture. When the processor architecture is nice, the PMU framework is specified there and it makes the job of software easier. For instance, on Itanium, the architecture says you can have up to 256 PMC and 256 PMD registers. Having this kind of information is very useful to size data structures appropriately. You don't want to have to copy large data structures (think copy_user) if you don't have to. Are you advocating that this be a PMU model-specific size?

Having it per-architecture doesn't really make a lot of sense, since PM units vary only slightly less between CPUs of the same architecture than they do between CPUs of different architectures. The PM unit may well not be defined by the architecture specification (if such exists) at all, so I don't think you can count on there being a definitive limit on the number of PMDs in general.

The greatest number of PMDs on any PowerPC so far is 8, and I'm not aware of any plans for CPUs with more, but it wouldn't surprise me if it happened some day. Since this size can never be changed without breaking the ABI, we would have to leave room for expansion, and there's no real guidance as to how much.

So I think this should either be PM model dependent, or it should be truly global - per-architecture is a bad compromise. The latter, obviously, is much simpler to implement.

> > PFM_START / PFM_START_SET
> >
> > I see no reason for two separate entry points; PFM_START_SET with a NULL argument can just leave the default or currently active set running, acting as a PFM_START.
>
> These calls have been merged as PFM_START with an optional argument.
> The optional argument is there to maintain backward compatibility with the existing 2.6 interface.

Excellent.

> > PFM_STOP
> >
> > This could reasonably be folded into PFM_START_SET as well, by having a special set id meaning "no set". Obviously the name of the operation would want to be changed, too (PFM_CHANGE_RUNNING?).
>
> We could find a way to merge START/STOP, but then the argument would have to indicate which of the two operations to perform. Your trick is one possibility; another would be to add a flag to the data structure passed.

> > PFM_LOAD_CONTEXT
> >
> > I'm not sure I see the point of the load_set argument. What can be accomplished with it that can't be with appropriate use of PFM_START_SET?
>
> You don't want to merge START and LOAD. That is not because you attach to a thread/CPU that you want monitoring to start right away. But I think you have a good point. The interface guarantees that on PFM_LOAD_CONTEXT, monitoring is stopped; you need an explicit START to activate it. This is true even if you detached while monitoring was active. I need to check whether there is something else involved here.

Sorry, I don't fully follow what you're saying here (I can't parse the first sentence, in particular). My point is that it's not clear to me that there's anything useful you can accomplish with:

	CREATE <do stuff> LOAD START

that can't be done with

	CREATE+ATTACH <do stuff> START

> > PFM_UNLOAD_CONTEXT
> >
> > This could also reasonably be folded into the above, say by using load_pid == 0 to request binding the context to no thread at all. Again a name change would be in order (PFM_CHANGE_THREAD?).
>
> That looks reasonable to me.

> > PFM_CREATE_EVTSET / PFM_DELETE_EVTSET / PFM_CHANGE_EVTSET
> >
> > Is there really a need to incrementally update the event sets? Would a PFM_SETUP_EVTSETS, which acts like PFM_CREATE_EVTSET but replaces all existing event sets with the ones described, suffice?
> > This approach would not only reduce the number of entry points, but could also simplify the kernel's parameter checking. For example, at the moment deleting an event set which is referenced by another set's set_id_next must presumably either fail, or alter those other event sets to no longer reference the deleted set.
>
> This has been trimmed down to two calls in the new rev: PFM_CREATE_EVTSETS and PFM_DELETE_EVTSETS. If the event set already exists, PFM_CREATE_EVTSETS updates it. This is useful for set0, which always exists.
>
> As for delete/create, those operations can only happen when the context is detached. Checking the validity of the event set chain is deferred until PFM_LOAD_CONTEXT, because at that point it is no longer possible to modify the sets. If a set_id_next is invalid, PFM_LOAD_CONTEXT fails.

But again, is there a real reason to allow incremental updates? If there was a single operation which atomically changed all the event sets, it would mean one less entry point, *plus* we could do the error checking there (earlier error checking is always good). And we wouldn't even need user-allocated id numbers; the array position would suffice.

> > PFM_GET_DEFAULT_PMCS
> >
> > The usefulness of this operation is not obvious to me. How would you envisage it being used?
>
> This command is now obsolete. Check the new rev of the document for PFM_GETINFO_PMCS and PFM_GETINFO_PMDS. These calls return the default values for PMCs and PMDs as well as a bitmask of the reserved fields for each register. They also return the PMD/PMC to actual HW register mappings.

Oh good.

> > PFM_GET_FEATURES
> >
> > For one thing, this doesn't belong in the multiplexor - it doesn't use the fd, and would be better exported via /proc or /sys. But, in any case, I don't think I've ever seen a subsystem version like this well used, so I'm not sure the operation is a good idea at all.
> > How would you envisage it being used?
>
> In the new rev, this call has been folded into PFM_GET_CONFIG.

Ok.

> > PFM_GET_CONFIG / PFM_SET_CONFIG
> >
> > Again, these definitely don't belong on the multiplexor. They don't use the fd, and since they set the permission regime for the multiplexor itself, they logically belong outside it. Under Linux these definitely ought to be sysctls. Even if this is ever ported to other OSes, I still don't think they belong here: setting up the permission regime for all the other calls is, I think, a logically OS-specific operation and doesn't belong in the core API. As far as I can tell it's unlikely you would use these operations in the same programs that use the rest of perfmon.
>
> As discussed earlier, on Linux these operations could just as well be implemented with sysctls.

Yes, and I don't think having a cross-platform operation for this is worthwhile. This is a system administrator operation, not an operation for the users of perfmon, so I don't think having it platform specific is a problem at all.

> I will update the document to incorporate your feedback, which I found very useful.
>
> In terms of porting, I am getting closer to being able to send you a skeleton header/C file with the required callbacks. Please let me know of any special PPC64 behavior. For instance, looking at the Opteron and Pentium 4:
>
> - On counter overflow, does PPC freeze the entire PMU?

Optional, IIRC, depending on some control bits in MMCR0.

> - The HW counters are not 64 bits; what are the values of the upper bits of a counter? Should they be all 1s or all 0s?

The counter registers are 32 bits wide, but can only be effectively used as 31-bit counters (see below).

> - How is a counter overflow detected? When the full 64 bits of the counter overflow, or when there is a carry out of bit n-1 for a width of n?

The interrupts occur on (32-bit) counter negative, rather than overflow per se.
The only way to determine which counters have overflowed is to look at the sign bits. Furthermore, the sign bit must be cleared in order to clear the interrupt condition (hence only 31-bit counters, effectively).

Another issue which I ran into for perfctr is that interrupts can't be individually enabled or disabled for each counter. There is one control bit which determines whether PMC1 generates an interrupt on counter negative, and another control bit which determines whether the other PMCs cause an interrupt. Because the events for the counters are generally selected in groups, rather than individually, you need to be able to deal with overflow interrupts for a counter you don't otherwise care about.

Performance monitor interrupts can also be generated from the timebase. These occur on 0-1 transitions of bit 0, 8, 12 or 16 (selectable) of the (64-bit) timebase. The timebase frequency is not the same as the CPU core frequency, and depends on the system, not just the CPU (it can be externally clocked). The timebase is guaranteed to have a fixed frequency, even on systems with variable CPU frequency, so the ratio to CPU core frequency can also vary.

> - Are there any PPC64 PMU registers which can only be used by one thread at a time (shared)? Think hyperthreading.

Not as far as I'm aware.

> - Is there a way to stop monitoring without having to modify all used PMC registers?

Yes, there is a "freeze counters" (FC) bit in MMCR0 which will stop all the counters.

--
David Gibson                   | I'll have my music baroque, and my code
david AT gibson.dropbear.id.au | minimalist, thank you.  NOT _the_ _other_
                               | _way_ _around_!
http://www.ozlabs.org/people/dgibson
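The "counter negative" convention David describes implies the usual preload trick for sampling: to take an interrupt every `period` events, the counter is started at 0x80000000 - period, so the sign bit becomes set exactly on the period-th event. A small sketch of the arithmetic only; the function names are made up, and no real PMC/MMCR access is shown:

```c
#include <stdint.h>

/* PowerPC PMCs are 32 bits wide and raise a performance monitor
 * interrupt when the counter goes negative (sign bit set), not on
 * wraparound.  Preload so the sign bit sets after `period` events.
 * Since the sign bit must stay clear until then, the effective
 * maximum period is 2^31: a 31-bit counter. */
static uint32_t pmc_preload(uint32_t period)
{
    return 0x80000000u - period;
}

/* The sign bit doubles as the "this counter overflowed" flag; it must
 * be cleared again to clear the interrupt condition. */
static int pmc_overflowed(uint32_t pmc)
{
    return (pmc & 0x80000000u) != 0;
}
```

This also explains why the upper 33 bits of a 64-bit software counter have to be maintained by the kernel rather than by the hardware.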
From: Stephane E. <er...@hp...> - 2005-03-30 06:18:18
David,

The revised document can be found at:

	http://www.hpl.hp.com/techreports/2004/HPL-2004-200R1.html

This version also explains the PMC/PMD mappings onto actual registers. I will respond later today on the rest.

On Wed, Mar 30, 2005 at 04:05:06PM +1000, David Gibson wrote:

> On Tue, Mar 29, 2005 at 12:07:23PM +0200, Stephane Eranian wrote:
>
> > Sorry for the late reply, but I was travelling all of last week. Thank you for your feedback. That is exactly what I am asking for. See my comments.
> >
> > On Wed, 2005-03-23 at 17:35 +1100, David Gibson wrote:
> >
> > > Stephane, I've been having a closer look at the perfmon spec document (the version dated Dec 21, 2004). Below are a number of points which concern me (this is by no means exhaustive):
> >
> > I have since updated the document, and several of the points you mention below have been fixed/changed.
>
> Ok, where do I find the revised document? I'm afraid the link wasn't obvious to me on the perfmon site.
>
> > > General issues:
> > >
> > > * The multiplexed syscall takes an argument giving the number of parameters, the size and type of which depend on the individual call. Better would be an overall size argument, or even both an argument size and a number of arguments. It's a little harder to process, but at least you can tell how much memory a call will touch without having to know about every individual operation.
> >
> > I would rather see (size, count), because it would help the kernel filter out calls using the wrong data type.
>
> Fair enough.
>
> > > * The method of requesting overflow sampling or notification for a PMD assumes there is a unique PMC associated with that PMD. This is insufficiently general, since it is not naturally true for ppc64 (event selection for the various counters is controlled by a combination of various fields in the registers MMCR0, MMCR1 and MMCRA).
> >
> > That is a good point.
> > It is true that today overflow notification is requested via the PMC and not the PMD. The implementation assumes (wrongly) that PMCx corresponds to PMDx. The flag is recorded in the PMD-related structure, hence it would seem more natural to pass the RANDOM/OVFL_NOTIFY flags via PFM_WRITE_PMDS. I did it via PFM_WRITE_PMCS because I considered those flags part of the configuration of the counters, so they would go with the PMC. For PPC64, it looks like you are in a situation similar to the P4, where multiple config registers are used to control a counter. We could move the flags to PFM_WRITE_PMDS.
>
> I think that would make more sense. Or maybe even a different mechanism entirely. How would you support performance monitor events that aren't counter overflows, for those CPUs that have such?
>
> > > * How widespread is the use of the term "event sets"? Is it perfmon specific, or more widely established? I find the term rather misleading, and would prefer something like "subcontext".
> >
> > I have seen this term used by Phil's PAPI toolkit, where it means the same thing. An event set is a software abstraction which encapsulates the entire PMU state.
>
> Oh well, I guess we're stuck with that one, then.
>
> > > More specific issues:
> > >
> > > PFM_CREATE_CONTEXT
> > >
> > > Altering the calling process's memory map as a side effect is icky. It could also cause problems for (the few) programs which need to take fine-grained control of their memory maps (JVMs?). Much better for the process to map the sample buffer with an explicit mmap() on the context's fd. You could, however, return an offset at which to perform the mmap().
> >
> > That has been fixed in the new rev of the specification. Now the call returns how big the buffer actually is; the application must then explicitly invoke mmap() using that size.
> > This simplifies the implementation and follows the programming model people are most used to.
>
> Ok.
>
> > > PFM_WRITE_PMCS
> > >
> > > The documentation says that PFM_MAX_PMD_BITVECTOR can vary between PMU models, but its value for the current PMU model is not exported anywhere. Varying by architecture doesn't make much sense, since PMU details vary only mildly more between architectures than they do between CPU models within one architecture.
> >
> > PFM_MAX_PMD_BITVECTOR is exported in the perfmon.h header file. At this point, it is provided by each architecture. When the processor architecture is nice, the PMU framework is specified there and it makes the job of software easier. For instance, on Itanium, the architecture says you can have up to 256 PMC and 256 PMD registers. Having this kind of information is very useful to size data structures appropriately. You don't want to have to copy large data structures (think copy_user) if you don't have to. Are you advocating that this be a PMU model-specific size?
>
> Having it per-architecture doesn't really make a lot of sense, since PM units vary only slightly less between CPUs of the same architecture than they do between CPUs of different architectures. The PM unit may well not be defined by the architecture specification (if such exists) at all, so I don't think you can count on there being a definitive limit on the number of PMDs in general.
>
> The greatest number of PMDs on any PowerPC so far is 8, and I'm not aware of any plans for CPUs with more, but it wouldn't surprise me if it happened some day. Since this size can never be changed without breaking the ABI, we would have to leave room for expansion, and there's no real guidance as to how much.
>
> So I think this should either be PM model dependent, or it should be truly global - per-architecture is a bad compromise.
The latter, > obviously, is much simpler to implement. > > > > PFM_START / PFM_START_SET > > > > > > I see no reason for two separate entry points; PFM_START_SET > > > with NULL argument can just leave the default or currently actively > > > running set, as a PFM_START. > > > > > These calls have been merged as PFM_START with an optional argument. > > This optional argument is to maintain backward compatibility with > > the existing 2.6 interface. > > Excellent. > > > > PFM_STOP > > > > > > This could reasonably be folded into PFM_START_SET also, by > > > having a special set id meaning "no set". Obviously the name of the > > > operation would want to be changed, too (PFM_CHANGE_RUNNING?) > > > > > We could find a way to merge START/STOP but then in the argument it > > would have to indicate which of the two operations to perform. Your > > trick is one possibility. Another would be to add a flag to the data > > structure passed. > > > > > PFM_LOAD_CONTEXT > > > > > > I'm not sure I see the point of the load_set argument. What > > > can be accomplished with this that can't be with appropriate use of > > > PFM_START_SET? > > > > > You don't want to merge START and load. that is not because you attach > > to a thread/CPU that you want monitoring to start right away. > > But I think you have a good point. the interface guarantees that on > > PFM_LOAD_CONTEXT, monitoring is stopped. You need explicit START to > > activate. This is true even if you detached while monitoring was active. > > I need to check to see if there is something else involved here. > > Sorry, I don't fully follow what you're saying here (I can't parse the > first sentence, in particular). 
My point is it's not clear to me that > there's anything useful you can accomplish with: > CREATE <do stuff> LOAD START > that can't be done with > CREATE+ATTACH <do stuff> START > > > > PFM_UNLOAD_CONTEXT > > > > > > Could also reasonably be folded into the above, say using > > > load_pid == 0 to request binding the context to no thread at all. > > > Again a name change would be in order (PFM_CHANGE_THREAD?). > > > > > That looks reasonable to me. > > > > > PFM_CREATE_EVTSET / PFM_DELETE_EVTSET / PFM_CHANGE_EVSET > > > > > > Is there really a need to incrementally update the event sets? Would > > > a PFM_SETUP_EVTSETS which acts like PFM_CREATE_EVTSET, but replaces > > > all existing event sets with the ones described suffice? This > > > approach would not only reduce the number of entry points, but could > > > also simplify the kernel's parameter checking. For example at the > > > moment deleting an event set which is referenced by another set's > > > set_id_next must presumably either fail, or alter those other event > > > sets to no longer reference the deleted event set. > > > > > This has been trimmed down to two calls in the new rev: > > PFM_CREATE_EVTSETS and PFM_DELETE_EVTSETS. If the event set > > already exists, PFM_CREATE_EVTSETS updates it. This is useful > > for set0 which always exists. > > > > As for delete/create, those operations can only happen when > > the context is detached. Checking of the validity of the event > > set chain is deferred until PFM_LOAD_CONTEXT because at this > > point it is not possible to modify the sets. If the set_id_next > > is invalid, PFM_LOAD_CONTEXT fails. > > But again, is there a real reason to allow incremental updates? If > there was a single operation which atomically changed all the event > sets it would mean one less entry point, *plus* we could do the error > checking there (earlier error checking is always good). And we > wouldn't even need user allocated id numbers, the array position would > suffice. 
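A minimal sketch of the atomic "replace all event sets" idea discussed above: sets are identified by their array position, and every set_id_next link is validated up front instead of being deferred to PFM_LOAD_CONTEXT. The struct and function names here are hypothetical, not part of the real perfmon ABI.

```c
#include <assert.h>

/* Hypothetical descriptor: the set's id is simply its array index. */
struct evtset_desc {
    int set_id_next;            /* array index of the set to switch to */
};

/* Return 0 if every next-link points at a valid array position, -1
 * otherwise.  With a single atomic setup call, a bad chain can be
 * rejected here, before any existing kernel state is replaced. */
int pfm_setup_evtsets(const struct evtset_desc *sets, int nsets)
{
    int i;
    for (i = 0; i < nsets; i++)
        if (sets[i].set_id_next < 0 || sets[i].set_id_next >= nsets)
            return -1;
    /* ...atomically replace the context's current sets here... */
    return 0;
}
```

The point of the sketch is only the error-checking order: the whole chain is visible at setup time, so validation does not have to wait for load.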
> > > > PFM_GET_DEFAULT_PMCS > > > > > > The usefulness of this operation is not obvious to me. How would you > > > envisage it being used? > > > > > This command is now obsolete. Check the new rev of the document > > for PFM_GETINFO_PMCS and PFM_GETINFO_PMDS. These calls return > > the default values for PMCS and PMDS as well as bitmasks with the > > reserved fields for each register. They also return the actual > > PMD/PMC to actual HW register mappings. > > Oh good. > > > > PFM_GET_FEATURES > > > > > > For one thing this doesn't belong in the multiplexor - it doesn't use > > > the fd, and would be better exported via /proc or /sys. But, in any > > > case, I don't think I've ever seen a subsystem version like this well > > > used, so I'm not sure the operation is a good idea at all. How would > > > you envisage this being used? > > > > > In the new rev, this call has been folded into PFM_GET_CONFIG. > > Ok. > > > > PFM_GET_CONFIG / PFM_SET_CONFIG > > > > > > Again, these definitely don't belong on the multiplexor. They don't > > > use the fd, and since they set the permission regime for the > > > multiplexor itself, they logically belong outside. Under Linux these > > > definitely ought to be sysctls. If this is ever ported to other OSes, > > > I still don't think they belong here. Setting up the permission > > > regime for all the other calls is, I think, a logically OS-specific > > > operation and doesn't belong in the core API. As far as I can tell > > > it's unlikely you would use these operations in the same programs > > > using the rest of perfmon. > > > > > As discussed earlier, on Linux, these operations could as well be > > implemented with sysctls. > > Yes, and I don't think having a cross-platform operation for this is > worthwhile. This is a system administrator operation, not an > operation for the users of perfmon, so I don't think having it > platform specific is a problem at all. 
> > > I will update the document to incorporate your feedback which I found > > very useful. > > > > In terms of porting, I am getting closer to being able to send you > > a skeleton header/C file with the required callbacks. Please let me > > know of any special PPC64 behavior. For instance, looking > > at Opteron and Pentium 4: > > - on counter overflow, does PPC freeze the entire PMU? > > Optional, IIRC, depending on some control bits in MMCR0. > > > - HW counters are not 64-bits, what are the values of > > the upper bits for counters. Should they be all 1 or all 0? > > The counter registers are 32 bits wide, but can only be effectively > used as 31-bit counters (see below). > > > - How is a counter overflow detected? When the full 64 bits > > of the counter overflow or when there is a carry from bit > > n to n+1 for a width of n? > > The interrupts occur on (32 bit) counter negative, rather than > overflow, per se. The only way to determine which counters have > overflowed is to look at the sign bits. Furthermore, the sign bit > must be cleared in order to clear the interrupt condition (hence only > 31-bit counters, effectively). > > Another issue which I ran into for perfctr is that interrupts can't be > individually enabled or disabled for each counter. There is one > control bit which determines if PMC1 generates an interrupt on counter > negative, and another control bit which determines if other PMCs cause > an interrupt. > > Because the events for the counters are generally selected in groups, > rather than individually, you need to be able to deal with overflow > interrupts for a counter you don't otherwise care about. > > Performance monitor interrupts can also be generated from the > timebase. These occur on 0-1 transitions on bit 0, 8, 12 or 16 > (selectable) of the (64 bit) timebase. Timebase frequency is not the > same as CPU core frequency, and depends on the system, not just the > CPU (it can be externally clocked). 
The timebase is guaranteed to > have a fixed frequency, even on systems with variable CPU frequency, > so the ratio to CPU core frequency can also vary. > > > - Are there any PPC64 PMU registers which can only be used by > > one thread at a time (shared). Think hyperthreading. > > Not as far as I'm aware. > > > - Is there a way to stop monitoring without having to modify > > all used PMC registers. > > Yes, there is a "freeze counters" (FC) bit in MMCR0 which will stop > all the counters. > > -- > David Gibson | I'll have my music baroque, and my code > david AT gibson.dropbear.id.au | minimalist, thank you. NOT _the_ _other_ > | _way_ _around_! > http://www.ozlabs.org/people/dgibson -- -Stephane |
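The "counter negative" convention David describes above can be sketched with plain user-space arithmetic. The helper names are invented for illustration and are not part of perfctr or perfmon.

```c
#include <assert.h>
#include <stdint.h>

/* Sketch of the PowerPC counter-negative convention: counters are 32
 * bits wide but effectively 31-bit, because the interrupt condition is
 * the sign bit becoming set, and the sign bit must be cleared again to
 * clear the interrupt. */

/* Preset value so the counter goes negative after 'period' events
 * (period must be less than 2^31). */
uint32_t pmc_preset(uint32_t period)
{
    return 0x80000000u - period;
}

/* Which counters overflowed?  Only the sign bit says. */
int pmc_is_negative(uint32_t pmc)
{
    return (pmc & 0x80000000u) != 0;
}
```

A sampling tool would write `pmc_preset(period)` into the counter, take the interrupt when the sign bit sets, and rewrite the preset to rearm.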
From: Stephane E. <er...@hp...> - 2005-03-31 12:22:25
David, > > That is a good point. It is true that today overflow notification > > is requested via the PMC and not the PMD. The implementation assumes > > (wrongly) that PMCx corresponds to PMDx. The flag is recorded in > > the PMD related structure. Hence, it would seem more natural to > > pass the flags for RANDOM/OVFL_NOTIFY via PFM_WRITE_PMDS. I did it > > via PFM_WRITE_PMCS because I considered those flags as part of the > > configuration of the counters, hence they would go with PMC. For the > > PPC64, it looks like you are in a situation similar to P4, where > > multiple config registers are used to control a counter. We could > > move the flags to PFM_WRITE_PMDS. > > I think that would make more sense. Or maybe even a different > mechanism entirely. How would you support performance monitor events > that aren't counter overflows, for those CPUs that have such? > Now I see several problems if we make that move. The way tools typically work is that they say "I want to measure event X,Y,Z". Using a support library, they come up with the correct event to counter assignment (Event -> PMC). The PMU configuration registers (perfsel, PMC, ..) are written. The reason the perfmon flags were also set up when writing PMC is because they are part of the configuration as well. The counter register itself is not necessarily accessed, if the default value is good enough. Hence a PFM_WRITE_PMDS was not required. If we move to your approach, the call would become necessary. Now, it is true that no matter what a tool needs to know which PMDs (perfctr, PMD) are associated with the PMCs (perfsel, PMC) used for the measurement. At a minimum, this is needed for reading out the results. A portable tool cannot assume that PMCx corresponds to PMDx. In fact, on PPC and also P4, it seems you need multiple PMCs to set up a single counter. I also assume that the PMC selection determines the PMD. Certain events can be measured on any PMC register. 
No matter what, I think a tool would need to find out the association PMC -> PMD. This can be provided by a user level library. I think your proposal makes sense. The Itanium PMU model is very clean in that regard, making work fairly simple for tools. I guess I will have to revise this in my tools/libraries. Note that if we move all flags to PFM_WRITE_PMDS, that would also move the following other fields: eventid, smpl_pmds[], and reset_pmds[]. Another side effect of this is that the data structure passed to PFM_WRITE_PMDS will grow in size, thereby making the call less efficient (think copy_user). > > > PFM_WRITE_PMCS > > > > The documentation says that the PFM_MAX_PMD_BITVECTOR can vary between > > > PMU models. But what the value of this is for the current PMU model > > > is not exported anywhere. Varying by architecture doesn't make much > > > sense, since PMU model details vary only mildly more between > > > architectures than they do within CPU models of one architecture. > > > > > The PFM_MAX_PMD_BITVECTOR is exported in the perfmon.h header file. > > At this point, it is provided by each architecture. When the > > processor architecture is nice, then the PMU framework is specified > > there and it makes the job of software easier. For instance, on Itanium, > > the architecture says you can have up to 256 PMC and 256 PMD registers. > > Having this kind of information is very useful to size data structures > > appropriately. You don't want to have to copy large data structures > > (think copy_user) if you don't have to. Are you advocating that this > > be a PMU model specific size? > > Having it per-architecture doesn't really make a lot of sense, since > PM units vary only slightly less between CPUs of the same architecture > than they do between CPUs of different architectures. The PM unit may Well, I would cite Pentium III/Pentium M vs. P4/Xeon, that is quite a drastic change. Yet this is inside the same processor family. 
> well not be defined by the architecture specification (if such exists) > at all, so I don't think you can count on there being a definitive > limit on the number of PMDs in general. On Itanium there is an architected limit. That is quite nice. I don't think having a per PMU-model limit is manageable. It would be hard to manage all the variations for the data structures. How would you handle the X86 family that way: one size for PIII, one size for P4? Yet the kernel support files would probably be the same, in fact the same kernel boots on both. > > The greatest number of PMDs on any PowerPC so far is 8, and I'm not > aware of any plans for CPUs with more, but it wouldn't surprise me if > it happened some day. Since this size can never be changed, without > breaking the ABI, we would have to leave room for expansion, and > there's no real guidance as to how much. > > So I think this should either be PM model dependent, or it should be > truly global - per-architecture is a bad compromise. The latter, > obviously, is much simpler to implement. Another thing to take into account is that you may want to use virtual PMDs to access software resources. For instance, take perfctr: there is the TSC (timestamp) that could be mapped to a logical PMD. That way, it would be easy to specify it as part of the registers/resources to record in a sample (via the smpl_pmds[] mask). On Itanium, the debug registers are used by the PMU to restrict the code/data range to monitor. In the old interface I had a specific call to program the IBR/DBR. In the revised document, you will see that I have logical PMCs to do that. It simplifies the code and makes the interface more uniform, after all, in this situation the debug registers are really used to configure the PMU. You can imagine mapping some kernel resources to PMD, such as the amount of free memory or the PID of the current process. To make this work, it is really nice to have an upper bound for the physical PMU registers. 
Then you can add on top. That's what I did for Itanium. Another factor to consider here: the limit we use does not necessarily reflect the actual number of PMU registers. If I used 64 for the limit, that does not necessarily translate into 64 PMC registers. Hardware designers sometimes introduce holes in the namespace of registers because of wiring constraints (just a guess). Yet it would be quite costly for software to try and skip those holes. Hence the bitmasks may have holes in them. Picking a single value would be good. Let's say you pick 256 for max physical PMC and max physical PMD. Assuming there are no big holes in the namespace, I think that is a pretty safe limit. Then logical PMD/PMC could be added above that. Of course, if all you really have is a set of 4 PMCs, then you pay the copy cost for larger than needed data structures. But, the ABI would be preserved when the number of registers grows. > > > PFM_LOAD_CONTEXT > > > > I'm not sure I see the point of the load_set argument. What > > > can be accomplished with this that can't be with appropriate use of > > > PFM_START_SET? > > > > > You don't want to merge START and load. that is not because you attach > > to a thread/CPU that you want monitoring to start right away. > > But I think you have a good point. the interface guarantees that on > > PFM_LOAD_CONTEXT, monitoring is stopped. You need explicit START to > > activate. This is true even if you detached while monitoring was active. > > I need to check to see if there is something else involved here. > > Sorry, I don't fully follow what you're saying here (I can't parse the > first sentence, in particular). My point is it's not clear to me that > there's anything useful you can accomplish with: > CREATE <do stuff> LOAD START > that can't be done with > CREATE+ATTACH <do stuff> START > It depends on what you do in <stuff>. I assume that is where you program the PMU with PFM_WRITE_PMDS/PFM_WRITE_PMCS. 
I think you will see that the first model makes sense if you want to support batching: for(i=0; i < N ; i++) { c = CREATE PFM_WRITE_PMCS(c); PFM_WRITE_PMDS(c); } foreach(c) { ATTACH to target thread PFM_start(c); } As long as the context is not attached, no actual PMU hardware is touched. But you can still program the context (i.e., PMU software state). This becomes interesting for batching context setup. You can imagine a tool that monitors across fork/pthread_create. Creating and setting up a context can be quite costly. You can prepare the work and then dynamically on a fork/pthread_create event you simply have to attach and start and let go. We do have tools that follow across fork and provide per-thread measurement. > > > PFM_CREATE_EVTSET / PFM_DELETE_EVTSET / PFM_CHANGE_EVSET > > > > Is there really a need to incrementally update the event sets? Would > > > a PFM_SETUP_EVTSETS which acts like PFM_CREATE_EVTSET, but replaces > > > all existing event sets with the ones described suffice? This > > > approach would not only reduce the number of entry points, but could > > > also simplify the kernel's parameter checking. For example at the > > > moment deleting an event set which is referenced by another set's > > > set_id_next must presumably either fail, or alter those other event > > > sets to no longer reference the deleted event set. > > > > > This has been trimmed down to two calls in the new rev: > > PFM_CREATE_EVTSETS and PFM_DELETE_EVTSETS. If the event set > > already exists, PFM_CREATE_EVTSETS updates it. This is useful > > for set0 which always exists. > > > > As for delete/create, those operations can only happen when > > the context is detached. Checking of the validity of the event > > set chain is deferred until PFM_LOAD_CONTEXT because at this > > point it is not possible to modify the sets. If the set_id_next > > is invalid, PFM_LOAD_CONTEXT fails. > > But again, is there a real reason to allow incremental updates? 
If > there was a single operation which atomically changed all the event > sets it would mean one less entry point, *plus* we could do the error > checking there (earlier error checking is always good). And we > wouldn't even need user allocated id numbers, the array position would > suffice. > Keep in mind that PFM_CREATE_EVTSETS can be used to create multiple event sets at a time. This command can be called as many times as you want. If the set exists it is modified. You can create and delete event sets at will as long as the context is not attached to anything. The set number determines its position in the list of sets. That list determines the DEFAULT switch order. Letting the user pick the set number could be useful because it may correspond to some indexing scheme. The interface supports an override for the next set: the explicit next set. Why is this useful? This is interesting when the explicit link is pointing backwards. You can thus create sublists of sets. There is an example in the document. This is interesting because each sublist may be used to measure a certain metric. Again, you can prepare all the sublists in advance. Then you attach, point to a sublist and start (PFM_START). After a while, you stop, and restart on another sublist. This saves you the reprogramming of the sets, so you can alternate between sublists much faster. This is kind of an advanced feature, for most tools the basic ordering will be just fine. Checking the validity of the explicit next set when the set is created/modified would impose a programming order for the tool, i.e., you could not point to a set that does not already exist. At the time, I thought this could be better checked at PFM_LOAD time where you know you have the whole setup. > > > PFM_GET_CONFIG / PFM_SET_CONFIG > > > > Again, these definitely don't belong on the multiplexor. 
They don't > > > use the fd, and since they set the permission regime for the > > > multiplexor itself, they logically belong outside. Under Linux these > > > definitely ought to be sysctls. If this is ever ported to other OSes, > > > I still don't think they belong here. Setting up the permission > > > regime for all the other calls is, I think, a logically OS-specific > > > operation and doesn't belong in the core API. As far as I can tell > > > it's unlikely you would use these operations in the same programs > > > using the rest of perfmon. > > > > > As discussed earlier, on Linux, these operations could as well be > > implemented with sysctls. > > Yes, and I don't think having a cross-platform operation for this is > worthwhile. This is a system administrator operation, not an > operation for the users of perfmon, so I don't think having it > platform specific is a problem at all. > Fine with me. I'll switch to a pure sysctl approach then. > > In terms of porting, I am getting closer to being able to send you > > a skeleton header/C file with the required callbacks. Please let me > > know of any special PPC64 behavior. For instance, looking > > at Opteron and Pentium 4: > > - on counter overflow, does PPC freeze the entire PMU? > > Optional, IIRC, depending on some control bits in MMCR0. > Ok, at least there is something. > > - HW counters are not 64-bits, what are the values of > > the upper bits for counters. Should they be all 1 or all 0? > > The counter registers are 32 bits wide, but can only be effectively > used as 31-bit counters (see below). > > > - How is a counter overflow detected? When the full 64 bits > > of the counter overflow or when there is a carry from bit > > n to n+1 for a width of n? > > The interrupts occur on (32 bit) counter negative, rather than > overflow, per se. The only way to determine which counters have > overflowed is to look at the sign bits. 
Furthermore, the sign bit > must be cleared in order to clear the interrupt condition (hence only > 31-bit counters, effectively). Ok, that's fine. > > Another issue which I ran into for perfctr is that interrupts can't be > individually enabled or disabled for each counter. There is one > control bit which determines if PMC1 generates an interrupt on counter > negative, and another control bit which determines if other PMCs cause > an interrupt. I think that's fine also. For 64-bit software emulation you need to have overflow intr enabled for every counter anyway. > > Because the events for the counters are generally selected in groups, > rather than individually, you need to be able to deal with overflow > interrupts for a counter you don't otherwise care about. > > Performance monitor interrupts can also be generated from the > timebase. These occur on 0-1 transitions on bit 0, 8, 12 or 16 > (selectable) of the (64 bit) timebase. Timebase frequency is not the > same as CPU core frequency, and depends on the system, not just the > CPU (it can be externally clocked). The timebase is guaranteed to > have a fixed frequency, even on systems with variable CPU frequency, > so the ratio to CPU core frequency can also vary. > > > - Are there any PPC64 PMU registers which can only be used by > > one thread at a time (shared). Think hyperthreading. > > Not as far as I'm aware. That's good. > > > - Is there a way to stop monitoring without having to modify > > all used PMC registers. > > Yes, there is a "freeze counters" (FC) bit in MMCR0 which will stop > all the counters. > Ok. Thanks for your feedback. I am sure I'll come up with other PMU-specific questions. -- -Stephane |
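The 64-bit software emulation Stephane mentions above can be sketched as follows: each overflow interrupt folds 2^31 into a software-maintained high part and clears the hardware counter's sign bit (which also clears the interrupt condition). The struct and function names are invented for illustration.

```c
#include <assert.h>
#include <stdint.h>

/* A 64-bit virtual counter layered over an effectively 31-bit
 * hardware counter. */
struct soft_pmd {
    uint64_t high;              /* software-accumulated part */
    uint32_t hw;                /* the (simulated) hardware counter */
};

/* Overflow handler: account for the wrapped 2^31 events in software
 * and clear the sign bit, which also clears the interrupt. */
void soft_pmd_overflow(struct soft_pmd *p)
{
    p->high += 0x80000000u;
    p->hw &= 0x7fffffffu;
}

/* Full 64-bit virtual counter value, as a PFM_READ_PMDS-style call
 * would report it. */
uint64_t soft_pmd_read(const struct soft_pmd *p)
{
    return p->high + p->hw;
}
```

This is why overflow interrupts must stay enabled on every counter: missing one overflow loses 2^31 counts from the virtual value.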
From: David G. <da...@gi...> - 2005-04-01 03:02:30
On Thu, Mar 31, 2005 at 03:58:03AM -0800, Stephane Eranian wrote: > David, > > > > That is a good point. It is true that today overflow notification > > > is requested via the PMC and not the PMD. The implementation assumes > > > (wrongly) that PMCx corresponds to PMDx. The flag is recorded in > > > the PMD related structure. Hence, it would seem more natural to > > > pass the flags for RANDOM/OVFL_NOTIFY via PFM_WRITE_PMDS. I did it > > > via PFM_WRITE_PMCS because I considered those flags as part of the > > > configuration of the counters, hence they would go with PMC. For the > > > PPC64, it looks like you are in a situation similar to P4, where > > > multiple config registers are used to control a counter. We could > > > move the flags to PFM_WRITE_PMDS. > > > > I think that would make more sense. Or maybe even a different > > mechanism entirely. How would you support performance monitor events > > that aren't counter overflows, for those CPUs that have such? > > > > Now I see several problems if we make that move. The way tools > typically work is that they say "I want to measure event X,Y,Z". > Using a support library, they come up with the correct event to > counter assignment (Event -> PMC). The PMU configuration registers > (perfsel, PMC, ..) are written. The reason the perfmon flags were > also set up when writing PMC is because they are part of the > configuration as well. The counter register itself is not > necessarily accessed, if the default value is good enough. Hence a > PFM_WRITE_PMDS was not required. If we move to your approach, the > call would become necessary. Now, it is true that no matter what a > tool needs to know which PMDs (perfctr, PMD) are associated with the > PMCs (perfsel, PMC) used for the measurement. At a minimum, this is > needed for reading out the results. A portable tool cannot assume > that PMCx corresponds to PMDx. In fact, on PPC and also P4, it seems > you need multiple PMCs to set up a single counter. 
And conversely setting one PMC can affect multiple PMDs... > I also assume that > the PMC selection determines the PMD. Certain events can be measured > on any PMC register. No matter what, I think a tool would need to > find out the association PMC -> PMD. This can be provided by a user > level library. > > I think your proposal makes sense. The Itanium PMU model is very clean > in that regard, making work fairly simple for tools. I guess I will have > to revise this in my tools/libraries. > > Note that if we move all flags to PFM_WRITE_PMDS, that would also move > the following other fields: eventid, smpl_pmds[], and reset_pmds[]. Another > side effect of this is that the data structure passed to PFM_WRITE_PMDS will > grow in size thereby making the call less efficient (think copy_user). It might be worth splitting, say, PFM_CONFIGURE_PMDS, which would set the flags and reset values and so forth, away from PFM_WRITE_PMDS which would write the actual values to the PMDs. Presumably one would (usually) only need to call CONFIGURE_PMDS once, so that would remove the overhead of the larger structures from WRITE_PMDS calls used to update the values. > > > > PFM_WRITE_PMCS > > > > > > > > The documentation says that the PFM_MAX_PMD_BITVECTOR can vary between > > > > PMU models. But what the value of this is for the current PMU model > > > > is not exported anywhere. Varying by architecture doesn't make much > > > > sense, since PMU model details vary only mildly more between > > > > architectures than they do within CPU models of one architecture. > > > > > > > The PFM_MAX_PMD_BITVECTOR is exported in the perfmon.h header file. > > > At this point, it is provided by each architecture. When the > > > processor architecture is nice, then the PMU framework is specified > > > there and it makes the job of software easier. For instance, on Itanium, > > > the architecture says you can have up to 256 PMC and 256 PMD registers. 
> > > Having this kind of information is very useful to size data structures > > > appropriately. You don't want to have to copy large data structures > > > (think copy_user) if you don't have to. Are you advocating that this > > > be a PMU model specific size? > > > > Having it per-architecture doesn't really make a lot of sense, since > > PM units vary only slightly less between CPUs of the same architecture > > than they do between CPUs of different architectures. The PM unit may > > Well, I would cite Pentium III/Pentium M vs. P4/Xeon, that is quite a > drastic change. Yet this is inside the same processor family. Exactly my point. > > well not be defined by the architecture specification (if such exists) > > at all, so I don't think you can count on there being a definitive > > limit on the number of PMDs in general. > > On Itanium there is an architected limit. That is quite nice. I don't > think having a per PMU-model limit is manageable. It would be hard to > manage all the variations for the data structures. How would you handle > the X86 family that way: one size for PIII, one size for P4? Yet the > kernel support files would probably be the same, in fact the same > kernel boots on both. I tend to agree, so I think a single universal limit probably makes more sense. 256 seems like a fairly reasonable choice. Out of interest, how many PMDs do actual Itanium CPUs have? None of the CPUs that I'm familiar with have anything close to 256 PMDs. 8 is the largest number on ppc32 or ppc64 systems... or actually, 9, if you count the timebase, I guess. > > The greatest number of PMDs on any PowerPC so far is 8, and I'm not > > aware of any plans for CPUs with more, but it wouldn't surprise me if > > it happened some day. Since this size can never be changed, without > > breaking the ABI, we would have to leave room for expansion, and > > there's no real guidance as to how much. 
> > > > So I think this should either be PM model dependent, or it should be > > truly global - per-architecture is a bad compromise. The latter, > > obviously, is much simpler to implement. > > Another thing to take into account is that you may want to use > virtual PMDs to access software resources. For instance, take > perfctr: there is the TSC (timestamp) that could be mapped to a > logical PMD. That way, it would be easy to specify it as part of the > registers/resources to record in a sample (via the smpl_pmds[] mask). > On Itanium, the debug registers are used by the PMU to restrict the > code/data range to monitor. In the old interface I had a > specific call to program the IBR/DBR. In the revised document, you will > see that I have logical PMCs to do that. It simplifies the code and makes > the interface more uniform, after all, in this situation the debug registers > are really used to configure the PMU. You can imagine mapping some > kernel resources to PMD, such as the amount of free memory or the PID of the > current process. To make this work, it is really nice to have an upper > bound for the physical PMU registers. Then you can add on top. That's > what I did for Itanium. > > Another factor to consider here: the limit we use does not necessarily > reflect the actual number of PMU registers. If I used 64 for the limit, > that does not necessarily translate into 64 PMC registers. Hardware designers > sometimes introduce holes in the namespace of registers because of wiring > constraints (just a guess). Yet it would be quite costly for software to > try and skip those holes. Hence the bitmasks may have holes in them. Hrm... I doubt it would really be all that costly to pack the holes, especially when we have to check for software/virtualized PMDs in there as well. 
Of course, I am biased by ppc where we need to use switch statements to access the registers, even though they are contiguous (the special purpose register number is part of the instruction opcode, and can't be given indirectly). > Picking a single value would be good. Let's say you pick 256 for max > physical PMC and max physical PMD. Assuming there are no big holes > in the namespace, I think that is a pretty safe limit. Then logical > PMD/PMC could be added above that. Of course, if all you really have is > a set of 4 PMCs, then you pay the copy cost for larger than needed > data structures. But, the ABI would be preserved when the number > of registers grows. Indeed. > > > > PFM_LOAD_CONTEXT > > > > > > I'm not sure I see the point of the load_set argument. What > > > > can be accomplished with this that can't be with appropriate use of > > > > PFM_START_SET? > > > > > > > You don't want to merge START and load. that is not because you attach > > > to a thread/CPU that you want monitoring to start right away. > > > But I think you have a good point. the interface guarantees that on > > > PFM_LOAD_CONTEXT, monitoring is stopped. You need explicit START to > > > activate. This is true even if you detached while monitoring was active. > > > I need to check to see if there is something else involved here. > > > > Sorry, I don't fully follow what you're saying here (I can't parse the > > first sentence, in particular). My point is it's not clear to me that > > there's anything useful you can accomplish with: > > CREATE <do stuff> LOAD START > > that can't be done with > > CREATE+ATTACH <do stuff> START > > > It depends on what you do in <stuff>. I assume that is where you program > the PMU with PFM_WRITE_PMDS/PFM_WRITE_PMCS. 
I think you will see that the > first model makes sense if you want to support batching: > > for(i=0; i < N ; i++) { > c = CREATE > PFM_WRITE_PMCS(c); > PFM_WRITE_PMDS(c); > } > foreach(c) { > ATTACH to target thread > PFM_start(c); > } Sure, I can see the appeal of this. But is there a compelling reason we need to support this way of doing it, rather than: for (i=0; i<N; i++) { c = CREATE/ATTACH; WRITE_PMCS(c); WRITE_PMDS(c); PFM_START(c); <wait for monitored stuff to happen> PFM_STOP(c); } > As long as the context is not attach no acutal PMU hardware is touched. > But you can still program the context (i.e., PMU software state). This becomes > interesting for batching context setup. You can imagine a tool that monitors > across fork/pthread_create. Creating and setting up a context can be quite costly. > You can prepare the work and then dynamically on fork/pthread_create event > you simply have to attach and start and let go. We do have tools that do > follow across fork and provide per-thread measurement. Hmm.. ok. I think you've pretty much convinced me. > > > > PFM_CREATE_EVTSET / PFM_DELETE_EVTSET / PFM_CHANGE_EVSET > > > > > > > > Is there really a need to incrementally update the event sets? Would > > > > a PFM_SETUP_EVTSETS which acts like PFM_CREATE_EVTSET, but replaces > > > > all existing event sets with the ones described suffice. This > > > > approach would not only reduce the number of entry points, but could > > > > also simplify the kernel's parameter checking. For example at the > > > > moment deleting an event set which is reference by another sets > > > > set_id_next must presumably either fail, or alter those other event > > > > sets to no longer reference the deleted event set. > > > > > > > This has been trimmed down to two calls in the new rev: > > > PFM_CREATE_EVTSETS and PFM_DELETE_EVTSETS. If the event set > > > already exist, PFM_CREATE_EVTSETS updates it. This is useful > > > for set0 which always exists. 
> > > > > > As for delete/create, those operations can only happen when > > > the context is detached. Checking of the validity of the event > > > set chain is deferred until PFM_LOAD_CONTEXT because at this > > > point it is not possible to modify the sets. If there set_id_next > > > is invalid, PFM_LOAD_CONTEXT fails. > > > > But again, is there a real reason to allow incremental updates? If > > there was a single operation which atomically changed all the event > > sets it would mean one less entry point, *plus* we could do the error > > checking there (earlier error checking is always good). And we > > wouldn't even need user allocated id numbers, the array position would > > suffice. > > Keep in mind that PFM_CREATE_EVTSETS can be used to create multiple > event sets at a time. This command can be called as many times > as you want. If the set exists it is modified. You can create and > delete event sets at will as long as the context is not attached. > to anything. The set number determines its position in the list > of sets. That list determines the DEFAULT switch order. Letting > the user pick set number could be useful because it may correspond > to some indexing scheme. > > The interface supports an override for the > next set, this is the explicit next set. Why is this useful? This is > interesting when the explicit link is pointing backwards. You can > thus create sublists of sets. There is an example in the document. > This is interesting because each sublist may be used to measure > certain metric. Again, you can prepare all the sublists in advance. > Then you attach, point the a sublist and start (PFM_START). After > a while, you stop, and restart on another sublist. This saves you > the reprograming of the sets, you can therefore alternate between > sublists much faster. This is kind of an advanced feature, for most > tool the basic ordering will eb just fine. 
> Checking the validity of the explicit next set when the set > is created/modified would impose an programming order for the > tool, i.e., you could not point to a set that does not already exist. > At the time, I thought this could be better checked at PFM_LOAD time > where you know you have the whole setup. We seem to be talking at cross purposes here. I'm not suggesting any change to the features of event sets once they're established, or at what time they can be set up. I'm just suggesting that it might be simpler to require all the event sets to be created simultaneously, rather than allowing individual, or subsets of the event sets to be created and deleted at will. > > > > PFM_GET_CONFIG / PFM_SET_CONFIG > > > > > > > > Again, these definitely don't belong on the multiplexor. They don't > > > > use the fd, and since they set the permission regime for the > > > > mutliplexor itself, they logically belong outside. Under Linux these > > > > definitely ought to be sysctls. If this is ever ported to other OSes, > > > > I still don't think they belong here. Setting up the permission > > > > regime for all the other calls is, I think a logically OS-specific > > > > operation and doesn't belong in the core API. As far as I can tell > > > > it's unlikely you would use these operations in the same programs > > > > using the rest of perfmon. > > > > > > > As discussed earlier, on Linux, these operations could as well be > > > implemented with sysctls. > > > > Yes, and I don't think having a cross-platform operation for this is > > worthwhile. This is a system administrator operation, not an > > operation for the users of perfmon, so I don't think having it > > platform specific is a problem at all. > > > Fine with me. I'll switch to a pure sysctl approach then. > > > > In terms of porting, I am getting closer to being able to send you > > > a skeleton header/C file with the required callbacks. Please let me > > > know of any special PPC64 special behavior. 
For instance, looking > > > at Opteron and Pentium 4: > > > - on counter overflow, does PPC freeze the entire PMU > > > > Optional, IIRC, depending on some control bits in MMCR0. > > > Ok, at least there is something. > > > > - HW counters are not 64-bits, what are the values of > > > the upper bits for counters. Should they be all 1 or all 0. > > > > The counter registers are 32 bits wide, but can only be effectively > > used as 31-bit counters (see below). > > > > > - How is a counter overflow detected? When the full 64 bits > > > of the counter overflow or when there is a carry from bit > > > n to n+1 for a width on n. > > > > The interrupts occur on (32 bit) counter negative, rather than > > overflow, per se. The only way to determine which counters have > > overflowed is to look at the sign bits. Furthermore, the sign bit > > must be cleared in order to clear the interrupt condition (hence only > > 31-bit counters, effectively). > > Ok, that's fine. > > > > > Another issue which I ran into for perfctr is that interrupts can't be > > individually enabled or disabled for each counter. There is one > > control bit which determines if PMC1 generates an interrupt on counter > > negative, and another control bit which determines if other PMCs cause > > an interrupt. > > I think that's fine also. For 64-bit software emulation you need to have > overflow intr enabled for every counter anyway. > > > > > Because the events for the counters are generally selected in groups, > > rather than individually, you need to be able to deal with overflow > > interrupts for a counter you don't otherwise care about. > > > > Performance monitor interrupts can also be generated from the > > timebase. These occur on 0-1 transitions on bit 0, 8, 12 or 16 > > (selectable) of the (64 bit) timebase. Timebase frequency is not the > > same as CPU core frequency, and depends on the system, not just the > > CPU (it can be externally clocked). 
The timebase is guaranteed to > > have a fixed frequency, even on systems with variable CPU frequency, > > so the ratio to CPU core frequency can also vary. > > > > > - Are there any PPC64 PMU registers which can only be used by > > > one thread at a time (shared). Think hyperthreading. > > > > Not as far as I'm aware. > > That's good. > > > > > > - Is there a way to stop monitoring without having to modify > > > all used PMC registers. > > > > Yes, there is a "freeze counters" (FC) bit in MMCR0 which will stop > > all the counters. > > > > Ok. > > Thanks for your feedback. i am sure I'll come up with other PMU-specific > questions. > -- David Gibson | I'll have my music baroque, and my code david AT gibson.dropbear.id.au | minimalist, thank you. NOT _the_ _other_ | _way_ _around_! http://www.ozlabs.org/people/dgibson |
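The PPC counter-negative behaviour described above (interrupt when the sign bit sets, sign bit must be cleared to dismiss the interrupt, hence effectively 31-bit counters) can be sketched as follows. The helper names and the fold-and-clear scheme are illustrative only, not actual perfmon or kernel code:

```c
#include <stdint.h>

/* PPC PMCs are 32 bits wide, but the interrupt fires on "counter
 * negative" (sign bit set), so only 31 bits are effectively usable.
 * Hypothetical helpers for extending one PMC to a 64-bit software
 * counter. */

#define PMC_SIGN_BIT 0x80000000u

/* The only way to see which counter overflowed: check its sign bit. */
static int pmc_overflowed(uint32_t hw)
{
    return (hw & PMC_SIGN_BIT) != 0;
}

/* Accumulate the hardware value (which includes the 2^31 carried in
 * the sign bit) into the 64-bit software total, and return the value
 * to write back.  Writing back 0 clears the sign bit, which is what
 * dismisses the interrupt condition. */
static uint32_t pmc_fold(uint32_t hw, uint64_t *sw_total)
{
    *sw_total += hw;
    return 0;
}
```

Software 64-bit emulation then needs overflow interrupts enabled on every counter, which is why the coarse per-group interrupt enables are tolerable, as noted above.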
From: Stephane E. <er...@hp...> - 2005-04-04 12:14:30
David, > > Now I see several problems if we maek that move. The way tools > > typically work is that they say "I want to measure event X,Y,Z". > > Using a support library, they come up with the correct event to > > counter assignment (Event -> PMC). The PMU configuration registers > > (perfsel, PMC, ..) are written. The reason the perfmon flags where > > also setup when writing PMC is because, there a re part of the > > configuration as well. The counter register itself is not > > necessarily accessed, if default value is good enough. Hence a > > PFM_WRITE_PMDS was not required. If we move to your approach, the > > call woule become necessary. Now, it is true that no matter what a > > tool needs to know which PMD (perfctr, PMD) are associated with the > > PMC (perfsel, PMC) used for the measurement. At a minimum, this is > > needed for reading out the results. A portable tool cannot assume > > that PMCx corresponds to PMDx. In fact, on PPC and also P4, it seems > > you need multiple PMC to setup a single counter. > > And conversely setting one PMC can affect multiple PMDs... > That's true even on Itanium. Take the Branch Trace Buffer for instance. > > Note that if we move all flags to PFM_WRITE_PMDS, that would also move > > the following other fields: eventid, smpl_pmds[], and reset_pmds[]. Another > > side effect of this is that the dats structure passed to PFM_WRITE_PMDS will > > grow in size thereby making the call less efficient (think copy_user). > > It might be worth splitting, say, PFM_CONFIGURE_PMDS, which would set > the flags and reset values and so forth, away from PFM_WRITE_PMDS > which would write the actual values to the PMDs. Presumably one would > (usually) only need to call CONFIGURE_PMDS once, so that would remove > the overhead of the larger structures from WRITE_PMDS calls used to > update the values. > I have made the change now and the impact does not appear to be that big, at least on Itanium. 
> > I tend to agree, so I think a single universal limit probably makes > more sense. 256 seems like a fairly reasonable choice. > So what if we fix it to 256 for actual PMU hardware registers? Then anything above this is either other hardware registers or software state. To make things nicely aligned, we could add up to 64 bits on top of the 256. If you look at the new document, you'll see that this is how I do this for Itanium. For PMC, I have 0-256 reserved for actual PMC, 256-272 is for IBR and DBR (debug registers). For PMD, I have 0-256 for actual PMD registers. Would that fit the PowerPC model? It seems PowerPC, like the P4/Xeon, does not use indexed registers for the PMU. Hence a somewhat more complicated mapping must be found. > > Another factor to consider here. The limit we use does not necessarily > > reflect the actual number of PMU registers. If I used 64 for the limit, > > that does not necessarily translate into 64 PMC registers. Hardware designers > > sometimes introduce holes in the namespace of registers because of wiring > > constraint (just a guess). Yet it would be quite costly for software to > > try and skip those holes. Hence the bitmask may have holes in them. > > Hrm... I doubt it would really be all that costly to pack the wholes, > especially when we have to check for software/virtualized PMDs in > there as well. Of course, I am biased by ppc where we need to use > switch statements to access the registers, even though they are > contiguous (the special purpose register number is part of the > instruction opcode, and can't be given indirectly). > Oops, that's yet another difficulty... > Sure, I can see the appeal of this. But is there a compelling reason > we need to support this way of doing it, rather than: > > for (i=0; i<N; i++) { > c = CREATE/ATTACH; > WRITE_PMCS(c); > WRITE_PMDS(c); > PFM_START(c); > <wait for monitored stuff to happen> > PFM_STOP(c); > } > > > > As long as the context is not attach no acutal PMU hardware is touched. 
> > But you can still program the context (i.e., PMU software state). This becomes > > interesting for batching context setup. You can imagine a tool that monitors > > across fork/pthread_create. Creating and setting up a context can be quite costly. > > You can prepare the work and then dynamically on fork/pthread_create event > > you simply have to attach and start and let go. We do have tools that do > > follow across fork and provide per-thread measurement. > > Hmm.. ok. It think you've pretty much convinced me. > Excellent. > > This is interesting because each sublist may be used to measure > > certain metric. Again, you can prepare all the sublists in advance. > > Then you attach, point the a sublist and start (PFM_START). After > > a while, you stop, and restart on another sublist. This saves you > > the reprograming of the sets, you can therefore alternate between > > sublists much faster. This is kind of an advanced feature, for most > > tool the basic ordering will eb just fine. > > Checking the validity of the explicit next set when the set > > is created/modified would impose an programming order for the > > tool, i.e., you could not point to a set that does not already exist. > > At the time, I thought this could be better checked at PFM_LOAD time > > where you know you have the whole setup. > > We seem to be talking at cross purposes here. I'm not suggesting any > change to the features of event sets once they're established, or at > what time they can be set up. I'm just suggesting that it might be > simpler to require all the event sets to be created simultaneously, > rather than allowing individual, or subsets of the event sets to be > created and deleted at will. > But that's not the natural way the interface is designed. It does not mandate that you issue only a single PFM_WRITE_PMDS or PFM_WRITE_PMCS. So why should it be different for PFM_CREATE_EVTSETS. 
In fact, it may make it easier on applications which are highly modularized where each module contributes its part of the measurement without knowing about the others. This allows for incremental updates. -- -Stephane |
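The deferred checking discussed earlier in the thread, where a set may carry a set_id_next link to a set that does not exist yet, and the whole chain is only validated at PFM_LOAD_CONTEXT time, could look roughly like the sketch below. The structure and names are illustrative, not the actual perfmon types:

```c
#include <stddef.h>

/* Hypothetical sketch of deferred event-set chain validation:
 * dangling set_id_next links are tolerated while the context is
 * detached; the chain is only checked at PFM_LOAD_CONTEXT. */

#define PFM_MAX_SETS 8
#define PFM_NO_SET   (-1)

struct evt_set {
    int in_use;        /* has this set been created? */
    int set_id_next;   /* explicit next set, or PFM_NO_SET for default order */
};

/* Returns 0 if every in-use set points at an existing set, -1
 * otherwise.  This is the check a PFM_LOAD_CONTEXT-time validation
 * might perform; on failure the load would be refused. */
static int validate_set_chain(const struct evt_set *sets, int n)
{
    for (int i = 0; i < n; i++) {
        if (!sets[i].in_use || sets[i].set_id_next == PFM_NO_SET)
            continue;
        int next = sets[i].set_id_next;
        if (next < 0 || next >= n || !sets[next].in_use)
            return -1;
    }
    return 0;
}
```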
From: David G. <da...@gi...> - 2005-04-05 04:29:11
On Mon, Apr 04, 2005 at 04:50:30AM -0700, Stephane Eranian wrote: > David, > > > > Now I see several problems if we maek that move. The way tools > > > typically work is that they say "I want to measure event X,Y,Z". > > > Using a support library, they come up with the correct event to > > > counter assignment (Event -> PMC). The PMU configuration registers > > > (perfsel, PMC, ..) are written. The reason the perfmon flags where > > > also setup when writing PMC is because, there a re part of the > > > configuration as well. The counter register itself is not > > > necessarily accessed, if default value is good enough. Hence a > > > PFM_WRITE_PMDS was not required. If we move to your approach, the > > > call woule become necessary. Now, it is true that no matter what a > > > tool needs to know which PMD (perfctr, PMD) are associated with the > > > PMC (perfsel, PMC) used for the measurement. At a minimum, this is > > > needed for reading out the results. A portable tool cannot assume > > > that PMCx corresponds to PMDx. In fact, on PPC and also P4, it seems > > > you need multiple PMC to setup a single counter. > > > > And conversely setting one PMC can affect multiple PMDs... > > > That's true even on Itanium. Take the Branch Trace Buffer for instance. > > > > Note that if we move all flags to PFM_WRITE_PMDS, that would also move > > > the following other fields: eventid, smpl_pmds[], and reset_pmds[]. Another > > > side effect of this is that the dats structure passed to PFM_WRITE_PMDS will > > > grow in size thereby making the call less efficient (think copy_user). > > > > It might be worth splitting, say, PFM_CONFIGURE_PMDS, which would set > > the flags and reset values and so forth, away from PFM_WRITE_PMDS > > which would write the actual values to the PMDs. Presumably one would > > (usually) only need to call CONFIGURE_PMDS once, so that would remove > > the overhead of the larger structures from WRITE_PMDS calls used to > > update the values. 
> > > I have made the change now and the impact does not appear to big that big > at least on Itanium. Ok. > > I tend to agree, so I think a single universal limit probably makes > > more sense. 256 seems like a fairly reasonable choice. > > So what about we fix it to 256 for actual PMU hardware registers. Then > anything above this is either other hardware registers or software state. > To make things nicely aligned, we could add up to 64 bit on top of the 256. > If you look at the new document, you'll see that this is how I do this > for Itanium. For PMC, I have 0-256 reserved for actual PMC, 256-272 is for > IBR and DBR (debug registers). For PMD, I have 0-256 for actual PMD registers. > Would that fit the PowerPC model? It seem PowerPC like P4.Xeon does not > use indexed registers for PMU. Hence a somewhat more complicated mapping > must be found. Ok, so the restriction would be that only things below 256 could be triggered for the reset bitmaps and so forth, but there could be things numbered above that? > > > Another factor to consider here. The limit we use does not necessarily > > > reflect the actual number of PMU registers. If I used 64 for the limit, > > > that does not necessarily translate into 64 PMC registers. Hardware designers > > > sometimes introduce holes in the namespace of registers because of wiring > > > constraint (just a guess). Yet it would be quite costly for software to > > > try and skip those holes. Hence the bitmask may have holes in them. > > > > Hrm... I doubt it would really be all that costly to pack the wholes, > > especially when we have to check for software/virtualized PMDs in > > there as well. Of course, I am biased by ppc where we need to use > > switch statements to access the registers, even though they are > > contiguous (the special purpose register number is part of the > > instruction opcode, and can't be given indirectly). > > > Oops, that's yet another difficulty... It's not that big a deal. 
There aren't that many registers, so a switch isn't too bad. > > Sure, I can see the appeal of this. But is there a compelling reason > > we need to support this way of doing it, rather than: > > > > for (i=0; i<N; i++) { > > c = CREATE/ATTACH; > > WRITE_PMCS(c); > > WRITE_PMDS(c); > > PFM_START(c); > > <wait for monitored stuff to happen> > > PFM_STOP(c); > > } > > > > > > > As long as the context is not attach no acutal PMU hardware is touched. > > > But you can still program the context (i.e., PMU software state). This becomes > > > interesting for batching context setup. You can imagine a tool that monitors > > > across fork/pthread_create. Creating and setting up a context can be quite costly. > > > You can prepare the work and then dynamically on fork/pthread_create event > > > you simply have to attach and start and let go. We do have tools that do > > > follow across fork and provide per-thread measurement. > > > > Hmm.. ok. It think you've pretty much convinced me. > > > Excellent. > > > > This is interesting because each sublist may be used to measure > > > certain metric. Again, you can prepare all the sublists in advance. > > > Then you attach, point the a sublist and start (PFM_START). After > > > a while, you stop, and restart on another sublist. This saves you > > > the reprograming of the sets, you can therefore alternate between > > > sublists much faster. This is kind of an advanced feature, for most > > > tool the basic ordering will eb just fine. > > > Checking the validity of the explicit next set when the set > > > is created/modified would impose an programming order for the > > > tool, i.e., you could not point to a set that does not already exist. > > > At the time, I thought this could be better checked at PFM_LOAD time > > > where you know you have the whole setup. > > > > We seem to be talking at cross purposes here. 
I'm not suggesting any > > change to the features of event sets once they're established, or at > > what time they can be set up. I'm just suggesting that it might be > > simpler to require all the event sets to be created simultaneously, > > rather than allowing individual, or subsets of the event sets to be > > created and deleted at will. > > > But that's not the natural way the interface is designed. It does > not mandate that you issue only a single PFM_WRITE_PMDS or PFM_WRITE_PMCS. > So why should it be different for PFM_CREATE_EVTSETS. In fact, it may make > it easier on applications which are highly modularized where each module > contribute its part of the measurement without knowing about the others. > This allows for incremental updates. I guess. It's just that I can see compelling reasons why incremental WRITE_PMDS and WRITE_PMCS are useful, but the same is not true for CREATE_EVTSETS. -- David Gibson | I'll have my music baroque, and my code david AT gibson.dropbear.id.au | minimalist, thank you. NOT _the_ _other_ | _way_ _around_! http://www.ozlabs.org/people/dgibson |
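Since the ppc SPR number is encoded in the mfspr/mtspr opcode, the register access David mentions really does come down to a switch over compile-time constants. A sketch of the index-to-SPR mapping, using the POWER4-class SPR numbers as defined in the Linux kernel's asm/reg.h (the pfm_ name is made up); a real accessor would do mfspr(SPRN_PMCn) in each arm rather than return the number:

```c
/* ppc64 privileged performance monitor counter SPRs (POWER4-class
 * numbering, matching the Linux kernel's <asm/reg.h>). */
#define SPRN_PMC1  787
#define SPRN_PMC2  788
#define SPRN_PMC3  789
#define SPRN_PMC4  790
#define SPRN_PMC5  791
#define SPRN_PMC6  792
#define SPRN_PMC7  793
#define SPRN_PMC8  794

/* Map a counter index (1..8) to its SPR number.  In real code each
 * case would instead be "return mfspr(SPRN_PMCn);", because the SPR
 * number must be a compile-time constant in the instruction. */
static int pfm_pmc_spr(unsigned int n)
{
    switch (n) {
    case 1: return SPRN_PMC1;
    case 2: return SPRN_PMC2;
    case 3: return SPRN_PMC3;
    case 4: return SPRN_PMC4;
    case 5: return SPRN_PMC5;
    case 6: return SPRN_PMC6;
    case 7: return SPRN_PMC7;
    case 8: return SPRN_PMC8;
    default: return -1;
    }
}
```

With so few registers, the switch is cheap whether or not the logical namespace is packed, which is the point made above.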
From: Stephane E. <er...@hp...> - 2005-04-05 22:39:27
David, > > > I tend to agree, so I think a single universal limit probably makes > > > more sense. 256 seems like a fairly reasonable choice. > > > > So what about we fix it to 256 for actual PMU hardware registers. Then > > anything above this is either other hardware registers or software state. > > To make things nicely aligned, we could add up to 64 bit on top of the 256. > > If you look at the new document, you'll see that this is how I do this > > for Itanium. For PMC, I have 0-256 reserved for actual PMC, 256-272 is for > > IBR and DBR (debug registers). For PMD, I have 0-256 for actual PMD registers. > > Would that fit the PowerPC model? It seem PowerPC like P4.Xeon does not > > use indexed registers for PMU. Hence a somewhat more complicated mapping > > must be found. > > Ok, so the restriction would be that only things below 256 could be > triggered for the reset bitmaps and so forth, but there could be > things numbered above that? > I think we would be way on the safe side with: PFM_MAX_PMCS=320 (256+64) PFM_MAX_PMDS=320 (256+64) That would cover the IA-64 architected PMU and most others. All 320 PMDs would be treated equally. They could be used in the smpl_pmds bitmask but also in the reset_pmds bitmask, but there would be no guarantee it would have an effect. Suppose a logical PMD maps to the pid of the current task. You could include it in smpl_pmds but it would not make sense in reset_pmds. I don't know enough about PowerPC to figure out if it would be hard to come up with PMC/PMD mappings that would be simple. Do you have any ideas? > > > Hrm... I doubt it would really be all that costly to pack the wholes, > > > especially when we have to check for software/virtualized PMDs in > > > there as well. Of course, I am biased by ppc where we need to use > > > switch statements to access the registers, even though they are > > > contiguous (the special purpose register number is part of the > > > instruction opcode, and can't be given indirectly). 
> > > > > Oops, that's yet another difficulty... > > It's not that big a deal. There aren't that many registers, so a > switch isn't too bad. > Well, even on Itanium we do this for PMC because of actual PMC versus IBR/DBR (debug registers). Yes, it's a small switch-case. > > But that's not the natural way the interface is designed. It does > > not mandate that you issue only a single PFM_WRITE_PMDS or PFM_WRITE_PMCS. > > So why should it be different for PFM_CREATE_EVTSETS. In fact, it may make > > it easier on applications which are highly modularized where each module > > contribute its part of the measurement without knowing about the others. > > This allows for incremental updates. > > I guess. It's just that I can see compelling reasons why incremental > WRITE_PMDS and WRITE_PMCS are useful, but the same is not true for > CREATE_EVTSETS. > In practice you are probably right. Yet it feels strange to somehow restrict the uses of CREATE_EVTSETS. Related to event sets and the register virtual mapping that is done by perfctr: the same kind of mapping could be provided per event set. It would be possible to return the address at which a set is visible. That would be automatic remapping. Of course, that goes against the model which I recently changed, whereby the user must call mmap() explicitly to map the sampling buffer. But it would be hard to reuse the same call to map PMD registers of a set. There is only one file descriptor per context. So we need to find another trick to indicate which set to mmap. I think we could find a nice trick with the mmap offset. Offset=0 means sampling buffer, offset=1 means set0, offset=2 means set1 and so forth. Do you have any ideas on this? -- -Stephane |
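For reference, the PFM_MAX_PMCS/PFM_MAX_PMDS = 320 proposal above works out to five 64-bit words per register bitmask, with everything at index 256 and above being a logical/software register. A sketch with hypothetical helper names:

```c
#include <stdint.h>

/* Fixed-size register namespace as proposed: 0-255 for physical PMU
 * registers, 256-319 for logical/software registers.  Bitmasks such
 * as smpl_pmds[] or reset_pmds[] then need five 64-bit words. */

#define PFM_MAX_PMDS        320
#define PFM_PMD_BV_WORDS    ((PFM_MAX_PMDS + 63) / 64)
#define PFM_FIRST_SOFT_PMD  256   /* everything above is a logical PMD */

typedef uint64_t pfm_pmd_bv_t[PFM_PMD_BV_WORDS];

static void pfm_bv_set(pfm_pmd_bv_t bv, unsigned int pmd)
{
    bv[pmd / 64] |= 1ull << (pmd % 64);
}

static int pfm_bv_test(const pfm_pmd_bv_t bv, unsigned int pmd)
{
    return (int)((bv[pmd / 64] >> (pmd % 64)) & 1);
}

/* Logical PMDs (e.g. a pid pseudo-register) may appear in smpl_pmds
 * but have no guaranteed effect in reset_pmds, as noted above. */
static int pfm_pmd_is_soft(unsigned int pmd)
{
    return pmd >= PFM_FIRST_SOFT_PMD;
}
```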
From: David G. <da...@gi...> - 2005-04-06 02:14:51
On Tue, Apr 05, 2005 at 03:14:59PM -0700, Stephane Eranian wrote: > David, > > > > I tend to agree, so I think a single universal limit probably makes > > > > more sense. 256 seems like a fairly reasonable choice. > > > > > > So what about we fix it to 256 for actual PMU hardware registers. Then > > > anything above this is either other hardware registers or software state. > > > To make things nicely aligned, we could add up to 64 bit on top of the 256. > > > If you look at the new document, you'll see that this is how I do this > > > for Itanium. For PMC, I have 0-256 reserved for actual PMC, 256-272 is for > > > IBR and DBR (debug registers). For PMD, I have 0-256 for actual PMD registers. > > > Would that fit the PowerPC model? It seem PowerPC like P4.Xeon does not > > > use indexed registers for PMU. Hence a somewhat more complicated mapping > > > must be found. > > > > Ok, so the restriction would be that only things below 256 could be > > triggered for the reset bitmaps and so forth, but there could be > > things numbered above that? > > > I think we would be way of the safe side with: > PFM_MAX_PMCS=320 (256+64) > PFM_MAX_PMDS=320 (256+64) Ok. I suspect it will end up being massive overkill for nearly every CPU we ever deal with, but who cares, really. > That would cover IA-64 architected PMU and most others. > all 320 PMDS would be treated equal. They could be used in > the smpl_pmds bitmask but also in the reset_pmds bitmask but > tere would be no guarantee it would have an effect. Suppose > a logical PMD maps to the pid of the current task. You could > include it in smpl_pmds but it would not make sense in reset_pmds. > > I don't know enough about PowerPC to figure out if it would be ard > to come up with PMC/PMD mappings that would be simple. Do you have > any ideas? Well, there are so few, I don't think we need to be particularly clever. 
I suppose we could use the SPR numbers, minus some offset (perfctr does this in the latest versions), but I was thinking something simple, say: PMCs: 0: MMCR0 1: MMCR1 2: MMCR2 (ppc32 only) 3: MMCRA (ppc64 only) PMDs: 0: timebase 1: PMC1 2: PMC2 ... 8: PMC8 (The PowerPC documentation's use of PMC for "Performance Monitor Counter" makes this look a little confusing). Incidentally, how were you planning to implement the perfctr-like virtualized tsc and mmap() based sampling you were talking about? That could have an impact on how we do things here. > > > > Hrm... I doubt it would really be all that costly to pack the wholes, > > > > especially when we have to check for software/virtualized PMDs in > > > > there as well. Of course, I am biased by ppc where we need to use > > > > switch statements to access the registers, even though they are > > > > contiguous (the special purpose register number is part of the > > > > instruction opcode, and can't be given indirectly). > > > > > > > Oops, that's yet another difficulty... > > > > It's not that big a deal. There aren't that many registers, so a > > switch isn't too bad. > > Well, even on Itanium we do this for PMC because of actual PMC versus > IBR/DBR (debug registers). Yes, it's a small switch-case. > > > > But that's not the natural way the interface is designed. It does > > > not mandate that you issue only a single PFM_WRITE_PMDS or PFM_WRITE_PMCS. > > > So why should it be different for PFM_CREATE_EVTSETS. In fact, it may make > > > it easier on applications which are highly modularized where each module > > > contribute its part of the measurement without knowing about the others. > > > This allows for incremental updates. > > > > I guess. It's just that I can see compelling reasons why incremental > > WRITE_PMDS and WRITE_PMCS are useful, but the same is not true for > > CREATE_EVTSETS. > > > In practice you are proably right. Yet it feels strange to somehow restrict > the uses of CREATE_EVTSETS. 
> > > Related to event sets and the register virtual mapping that is done by perfctr. > The same kind of mapping could be provided per-event set. It would be possible > to return the address at which a set is visible. That would be automatic remapping. > Of course, that goes against the model which I recently change whereby the user > must call mmap() explicitely to remapping the sampling buffer. But it would > be hard to reuse the same call to map PMD registers of a set. There is only > one file descriptor per context. Sowe need to find another trick to indicate > which set to mmap. I think we could find a nice trick with the mmap offset. > Offset=0 means sampling buffer, offset=1 means set0, offset=2 means set1 and > so forth. Do you have any ideas on this? Yes, I've always thought using the offset as a selector for which information to access would be a reasonable idea. However, they'll need to be multiples of whole pages to work properly. I also think the offsets should be somewhere up high: because you also support read() on the fd, I think it would be counter-intuitive for an mmap() at 0 *not* to return the same data as read(). -- David Gibson | I'll have my music baroque, and my code david AT gibson.dropbear.id.au | minimalist, thank you. NOT _the_ _other_ | _way_ _around_! http://www.ozlabs.org/people/dgibson |
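The offset-as-selector idea, with David's two refinements above (offsets in whole pages, per-set region up high so that offset 0 keeps meaning the same data read() returns), can be sketched as a decode step on the kernel side. All constants and names here are made up for illustration:

```c
/* Hypothetical start of the per-set mapping region, in pages, placed
 * well above any plausible sampling-buffer size. */
#define PFM_SET_AREA_PGOFF  0x100000ul

/* Decode a page offset passed to mmap() on the context fd.
 * Returns -1 for the sampling buffer, the set number for a per-set
 * mapping (one page per set), or -2 for an invalid offset. */
static long pfm_decode_pgoff(unsigned long pgoff,
                             unsigned long smpl_buf_pages,
                             unsigned long nsets)
{
    if (pgoff < smpl_buf_pages)
        return -1;                                   /* sampling buffer */
    if (pgoff >= PFM_SET_AREA_PGOFF &&
        pgoff < PFM_SET_AREA_PGOFF + nsets)
        return (long)(pgoff - PFM_SET_AREA_PGOFF);   /* set number */
    return -2;                                       /* invalid */
}
```

Keeping the sampling buffer at offset 0 preserves the intuition that mmap() at 0 and read() expose the same data.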
From: Stephane E. <er...@hp...> - 2005-04-06 08:19:53
David, > > I think we would be way of the safe side with: > > PFM_MAX_PMCS=320 (256+64) > > PFM_MAX_PMDS=320 (256+64) > > Ok. I suspect it will end up being massive overkill for nearly every > CPU we ever deal with, but who cares, really. > Yes, given how hard it is to get more counters I don't think we'll ever reach 256. But for PMUs which use indexed registers we may have, let's say, 32 registers scattered across the entire 256-entry namespace. > Well, there are so few, I don't think we need to be particularly > clever. I suppose we could use the SPR numbers, minus some offset > (perfctr does this in the latest versions), but I was thinking > something simple, say: > > PMCs: > 0: MMCR0 > 1: MMCR1 > 2: MMCR2 (ppc32 only) > 3: MMCRA (ppc64 only) > > PMDs: > 0: timebase > 1: PMC1 > 2: PMC2 > ... > 8: PMC8 > > (The PowerPC documentation's use of PMC for "Performance Monitor > Counter" makes this look a little confusing). From this description, it looks like you have 8 counters but you don't have 8 configuration registers for them. Do they always go in pairs? > Incidentally, how were you planning to implement the perfctr-like > virtualized tsc and mmap() based sampling you were talking about? > That could have an impact on how we do things here. > See below for mmapping. > > > > > Hrm... I doubt it would really be all that costly to pack the wholes, > > > > Related to event sets and the register virtual mapping that is done by perfctr. > > The same kind of mapping could be provided per-event set. It would be possible > > to return the address at which a set is visible. That would be automatic remapping. > > Of course, that goes against the model which I recently change whereby the user > > must call mmap() explicitely to remapping the sampling buffer. But it would > > be hard to reuse the same call to map PMD registers of a set. There is only > > one file descriptor per context. Sowe need to find another trick to indicate > > which set to mmap. 
> > I think we could find a nice trick with the mmap offset. > > Offset=0 means sampling buffer, offset=1 means set0, offset=2 means set1 and > > so forth. Do you have any ideas on this?

> Yes, I've always thought using the offset as a selector for which > information to access would be a reasonable idea. However, they'll > need to be multiples of whole pages to work properly. I also think > the offsets should be somewhere up high: because you also support > read() on the fd, I think it would be counter-intuitive for an mmap() > at 0 *not* to return the same data as read().

What about: on creation, each set returns an opaque cookie which must be used as the offset to mmap. This way, the user does not have to deal with page size. We could add the cookie in the structures passed to PFM_CREATE_EVTSETS and PFM_GETINFO_EVTSETS. This would cover set0, which is always created by default. And yes, you need one page per set because each set can individually be destroyed. The cookie could correspond to a low or high offset depending on what is more convenient.

On the question of read() on fd vs. mmap, there is an important difference here. The read() has the side effect of actually consuming the notification message from the message queue. Making the queue visible via mmap avoids the copying done by read but we lose the side-effect. We would still need a system call to remove that message from the queue. Mmapping is read-only.

-- -Stephane |
From: David G. <da...@gi...> - 2005-04-07 03:43:38
|
On Wed, Apr 06, 2005 at 12:55:21AM -0700, Stephane Eranian wrote:
> David,
>
> > > I think we would be way on the safe side with: > > > PFM_MAX_PMCS=320 (256+64) > > > PFM_MAX_PMDS=320 (256+64)
> >
> > Ok. I suspect it will end up being massive overkill for nearly every > > CPU we ever deal with, but who cares, really.
>
> Yes, given how hard it is to get more counters I don't think we'll ever > reach 256. But for PMUs which use indexed registers we may have, let's say, > 32 registers scattered across the entire 256-entry namespace.

Very well.

> > Well, there are so few, I don't think we need to be particularly > > clever. I suppose we could use the SPR numbers, minus some offset > > (perfctr does this in the latest versions), but I was thinking > > something simple, say:
> >
> > PMCs:
> > 0: MMCR0
> > 1: MMCR1
> > 2: MMCR2 (ppc32 only)
> > 3: MMCRA (ppc64 only)
> >
> > PMDs:
> > 0: timebase
> > 1: PMC1
> > 2: PMC2
> > ...
> > 8: PMC8
> >
> > (The PowerPC documentation's use of PMC for "Performance Monitor > > Counter" makes this look a little confusing).
>
> From this description, it looks like you have 8 counters but you don't have > 8 configuration registers for them. Do they always go in pairs?

No. The individual event selection fields, such as they are, are spread across MMCR0 (32bit) and MMCR1 (64bit). The rest of MMCR0 and MMCRA have general control bits, plus the settings for the various muxes which affect the interpretation of the event selection fields. I still don't fully understand the event selection logic, I have to admit - it's pretty baroque.

> > Incidentally, how were you planning to implement the perfctr-like > > virtualized tsc and mmap() based sampling you were talking about? > > That could have an impact on how we do things here.
>
> See below for mmapping.

> > > > > > Hrm... I doubt it would really be all that costly to pack the holes,
> > >
> > > Related to event sets and the register virtual mapping that is done by perfctr.
> > > The same kind of mapping could be provided per-event set. It would be possible > > > to return the address at which a set is visible. That would be automatic remapping. > > > Of course, that goes against the model which I recently changed whereby the user > > > must call mmap() explicitly to remap the sampling buffer. But it would > > > be hard to reuse the same call to map PMD registers of a set. There is only > > > one file descriptor per context. So we need to find another trick to indicate > > > which set to mmap. I think we could find a nice trick with the mmap offset. > > > Offset=0 means sampling buffer, offset=1 means set0, offset=2 means set1 and > > > so forth. Do you have any ideas on this?
> >
> > Yes, I've always thought using the offset as a selector for which > > information to access would be a reasonable idea. However, they'll > > need to be multiples of whole pages to work properly. I also think > > the offsets should be somewhere up high: because you also support > > read() on the fd, I think it would be counter-intuitive for an mmap() > > at 0 *not* to return the same data as read().
>
> What about on creation each set returns an opaque cookie which must be used > as the offset to mmap. This way, the user does not have to deal with page size. > We could add the cookie in the structures passed to PFM_CREATE_EVTSETS and > PFM_GETINFO_EVTSETS. This would cover set0 which is always created by default. > And yes, you need one page per set because each set can individually be destroyed. > The cookie could correspond to low or high offset depending on what is > more convenient.

Well, given that it is the offset to pass to mmap, I wouldn't really call it "opaque". But returning this offset is a good idea, yes.

Oh... hang on. I didn't immediately realize the implication of this. Does this mean that there is a separate sample buffer for each event set?

> On the question of read() on fd vs. mmap, there is an important difference > here.
> The read() has the side effect of actually consuming the notification > message from the message queue. Making the queue visible via mmap avoids the > copying done by read but we lose the side-effect. We would still need a system > call to remove that message from the queue. Mmapping is read-only.

I thought about this before, allowing mmap() access to the notification buffer. It could be done, allowing the ring buffer to be mmap()ed; however, as you point out, you'd need some way of consuming the messages. It occurred to me eventually that the natural way to do that would be to make appropriate lseek()s remove the messages.

However, that's not actually what I was getting at. I don't know that it's necessary to support mmap()ing the ring buffer. All I was saying is that if it were possible to mmap() at offsets around the value of the file pointer (so, near 0), it would be very peculiar for it to give you something other than read() does. So I suggest that for this other information we put it at a high offset and basically never allow the file pointer to reach up to it.

Although, going back for a minute. Supposing we did allow the notification messages to be read and consumed via mmap() and lseek(), do we still need to provide special notification messages? Could we make the notification ring buffer be just the sample buffer? It would have a certain elegance to it.

One other question: how are you planning to implement the mmap() based sampling and tsc virtualization features from perfctr?

-- David Gibson | I'll have my music baroque, and my code david AT gibson.dropbear.id.au | minimalist, thank you. NOT _the_ _other_ | _way_ _around_! http://www.ozlabs.org/people/dgibson |
From: Stephane E. <er...@hp...> - 2005-04-07 23:36:48
|
David, > > > Well, there are so few, I don't think we need to be particularly > > > clever. I suppose we could use the SPR numbers, minus some offset > > > (perfctr does this in the latest versions), but I was thinking > > > something simple, say: > > > > > > PMCs: > > > 0: MMCR0 > > > 1: MMCR1 > > > 2: MMCR2 (ppc32 only) > > > 3: MMCRA (ppc64 only) > > > > > > PMDs: > > > 0: timebase > > > 1: PMC1 > > > 2: PMC2 > > > ... > > > 8: PMC8 > > > > > > (The PowerPC documentation's use of PMC for "Performance Monitor > > > Counter" makes this look a little confusing). > > > > > >From this description, look like you have 8 counters but you don't have > > 8 configuration registers for them. Do they always go in pairs? > > No. The individual event selection fields, such as they are are > spread across MMCR0 (32bit) and MMCR1 (64bit). The rest of MMCR0 and > MMCRA have general control bits, plus the settings for the various > muxes which affect the interpretation of the event selection fields. > I still don't fully understand the event selection logic, I have to > admit - it's pretty baroque. > Ok, I see how it is spread across those registers. > > What about on creation each set returns an opaque cookie which must be used > > as the offset to mmap. This way, the user does not have to deal with page size. > > We could add the cookie in the structures passed to PFM_CREATE_EVTSETS and > > PFM_GETINFO_EVTSETS. This would cover set0 which is always created by default. > > And yes, you need one page per set because each set can individually be destroyed. > > The cookie could correspond to low or high offset depending on what is > > more convenient. > > Well, given that it is the offset to pass to mmap, I wouldn't really > call if "opaque". But returning this offset is a good idea, yes. > > Oh... hang on. I didn't immediately realize the implication of this. > Does this mean that there is a separate sample buffer for each event > set? 
>
No, there is one sampling buffer per context and it is shared by all sets.

> > On the question of read() on fd vs. mmap, there is an important difference > > here. The read() has the side effect of actually consuming the notification > > message from the message queue. Making the queue visible via mmap avoids the > > copying done by read but we lose the side-effect. We would still need a system > > call to remove that message from the queue. Mmapping is read-only.
>
> I thought about this before, allowing mmap() access to the > notification buffer. It could be done, allowing the ring buffer to be > mmap()ed, however as you point out you'd need some way of consuming > the messages. It occurred to me eventually that the natural way to do > that would be to make appropriate lseek()s remove the messages.

Yes, that's an idea. I need to check and see how the callback looks for lseek. I hope we can filter/fail on baroque parameters, for the whence for instance. How to deal with movements that are not a multiple of the message size? If I recall, I fail the read(), so we could as well fail the lseek() to a position that is in the middle of a message. But now, how is using mmap+lseek more efficient than a plain read() of something like 56 bytes? You need a system call no matter what.

> However, that's not actually what I was getting at. I don't know that > it's necessary to support mmap()ing the ring buffer. All I was saying > is that if it were possible to mmap() at offsets around the value of > the file pointer (so, near 0), it would be very peculiar for it to > give you something other than read() does. So I suggest that for this > other information we put it at a high offset and basically never allow > the file pointer to reach up to it.

Yes, I think that would work.

> Although, going back for a minute. Supposing we did allow the > notification messages to be read and consumed via mmap() and lseek(), > do we still need to provide special notification messages.
> Could we > make the notification ring buffer be just the sample buffer? It would > have a certain elegance to it.

Well, the issue there is that perfmon allows you to handle sampling totally at the user level, i.e., no kernel level buffer. In that case you still want user level notification. Also, in the future, the message queue will carry more than just overflow notifications.

> One other question: how are you planning to implement the mmap() > based sampling and tsc virtualization features from perfctr?

I have started implementing the mmap() access for the virtualized 64-bit PMD registers. I am not too happy with the page consumption that this implies, at 320 PMDs x 8 bytes (minimum). Plus you need to indicate to the user which set is active. Only for that set is there the requirement to read the hardware registers. What is the common page size on PowerPC64, 8k/16k? It would be nice to fit this into a 4 kb page/set.

Also note that this mode is ONLY interesting in the case of a self-monitoring thread or for system wide. It does not help when a thread is monitoring another thread. Thus I was thinking that this special page allocation could be made an option. By default the 64-bit values would be allocated with the kernel set structure. If the user requests remapping then they would be allocated on a distinct page. All of this to try to spare memory. People are applying monitoring to large workloads with hundreds of threads to monitor.

I have not yet looked at the TSC virtualization. My idea was to use a "soft" PMD to implement this feature.

-- -Stephane |
From: David G. <da...@gi...> - 2005-04-08 01:26:24
|
On Thu, Apr 07, 2005 at 04:12:24PM -0700, Stephane Eranian wrote: > David, > > > > > Well, there are so few, I don't think we need to be particularly > > > > clever. I suppose we could use the SPR numbers, minus some offset > > > > (perfctr does this in the latest versions), but I was thinking > > > > something simple, say: > > > > > > > > PMCs: > > > > 0: MMCR0 > > > > 1: MMCR1 > > > > 2: MMCR2 (ppc32 only) > > > > 3: MMCRA (ppc64 only) > > > > > > > > PMDs: > > > > 0: timebase > > > > 1: PMC1 > > > > 2: PMC2 > > > > ... > > > > 8: PMC8 > > > > > > > > (The PowerPC documentation's use of PMC for "Performance Monitor > > > > Counter" makes this look a little confusing). > > > > > > > >From this description, look like you have 8 counters but you don't have > > > 8 configuration registers for them. Do they always go in pairs? > > > > No. The individual event selection fields, such as they are are > > spread across MMCR0 (32bit) and MMCR1 (64bit). The rest of MMCR0 and > > MMCRA have general control bits, plus the settings for the various > > muxes which affect the interpretation of the event selection fields. > > I still don't fully understand the event selection logic, I have to > > admit - it's pretty baroque. > > > Ok, I see how it is spread across those registers. > > > What about on creation each set returns an opaque cookie which must be used > > > as the offset to mmap. This way, the user does not have to deal with page size. > > > We could add the cookie in the structures passed to PFM_CREATE_EVTSETS and > > > PFM_GETINFO_EVTSETS. This would cover set0 which is always created by default. > > > And yes, you need one page per set because each set can individually be destroyed. > > > The cookie could correspond to low or high offset depending on what is > > > more convenient. > > > > Well, given that it is the offset to pass to mmap, I wouldn't really > > call if "opaque". But returning this offset is a good idea, yes. > > > > Oh... hang on. 
I didn't immediately realize the implication of this. > > Does this mean that there is a separate sample buffer for each event > > set? > > > No, there is one sampling buffer per context and it is shared by all sets. Ok. Why do we need a separate offset for each event set then? > > > On the question of read() on fd vs. mmap, there is an important difference > > > here. The read() has the side effect of actually consuming the notification > > > message from the message queue. Making the queue visible via mmap avoid the > > > copying done by read but we loose the side-effect. We would still need a system > > > call to remove that message from the queue. Mmaping is read-only. > > > > I thought about this before, allowing mmap() access to the > > notification buffer. I could be done, allowing the ring buffer to be > > mmap()ed, however as you point out you'd need some way of consuming > > the messages. It occurred to me eventually that the natural way to do > > that would be to make appropriate lseek()s remove the messages. > > Yes, that's an idea. I need to check and see how the callback looks > for lseek. I hope we can filter/fails on baroque paramters for the > whence for instance. I don't see why we couldn't handle the whence parameter fully - in fact that would even be useful, lseek(fd, 0, SEEK_END) would be guaranteed to consume all the buffer, for example; lseek(fd, message_size, SEEK_CUR) would consume exactly one message. > How to deal with movements that are not mltiple of the message > size. If we just treat this as a ring buffer with a byte offset, I don't see that that causes a problem. > If I recall I fail the read(), so we could as well fail the lseek() > to a position that is in the middle of a message. Erm... that's probably not a good idea. lseek() doesn't usually fail based on the value of the offset. > But now, how is > using mmap+lseek more efficient than a plain read() of something > like 56 bytes? You need a system no matter what. 
Yes, but mmap()+lseek() avoids the copy. Whether that's enough to make a significant difference or not, I don't know. I was thinking of it more in the context of a unified sample/notification buffer, as mentioned below, where we could be copying out rather more than 56 bytes per event. > > However, that's not actually what I was getting at. I don't know that > > it's necessary to support mmap()ing the ring buffer. All I was saying > > is that if it were possible to mmap() at offsets around the value of > > the file pointer (so, near 0), it would be very peculiar for it to > > give you something other than read() does. So I suggest that for this > > other information we put it at a high offset and basically never allow > > the file pointer to reach up to it. > > > Yes, I think that would work. > > > Although, going back for a minute. Supposing we did allow the > > notification messages to be read and consumed via mmap() and lseek(), > > do we still need to provide special notification messages. Could we > > make the notification ring buffer be just the sample buffer? It would > > have a certain elegance to it. > > > Well, the issue there is that perfmon allows you to handle sampling > totally at the user level, i.e., no kernel level buffer. Well, except there still is a kernel level buffer, in the form of the queue of notification events. Why not just make it always one buffer. > In that case > you still want user level notification. Also in the future, the message > queue will carry more than just overflow notifications. So? No reason other events couldn't go in the sample buffer too. Puts all the information in one place - even if these other notifications carry more information than will fit in a small message. Plus naturally maintains the correct order between overflows and other sorts of sampling events. > > One other question: how are you planning to implement the mmap() > > based sampling and tsc virtualization features from perfctr? 
> > > I have started implemented the mmap() access for the virtualized 64-bit > PMD registers. I am not too happy with the page consumption that this implies. > at 320 PMDS x 8 bytes (minimum). Yes, that is a bit of an issue. This is where Mikael's approach of renumbering the counters to reduce cacheline usage comes in. But that doesn't really fit into the perfmon model. It's worse than 8 bytes each if you need to worry about both sum and offset values, as perfctr does. Perfmon doesn't need to in the cases where it writes back to the counters, but that may not always be possible or practical. According to Mikael on at least some x86 machines, writing the counters is high latency and needs to be avoided as much as possible. And even without that, things like the timebase/tsc obviously can't be written to, so we would need a start/offset approach to virtualize them. > Plus you need to indicate to the user > which set is active. Only for that set there is the requirement to read the > hardware registers. I guess. I think for this sort of operation a separate window for each event set might make more sense. > What is the common page size on PowerPC64 8k/16k? 4k. > It would be nice to fit this into a 4 kb page/set. Also note that this mode > is ONLY interesting in the case of a self-monitoring thread or for system > wide. It does not when a thread is monitoring another thread. Not entirely true. Certainly self-monitoring is the really useful case here, but it would be possible to use this for low-latency monitoring of another process too, if for the application it's acceptable to put up with slightly out of date statistics. That's plausible if monitoring a steady-state process and the counter values can be normalized by tsc, for example. In this scenario we also need some synchronization between kernel and user to ensure that the user sampling process gets an atomic snapshot of all the counters. 
Currently perfctr handles that based on the tsc value, which I'm not convinced is entirely correct. I think a seqlock-like mechanism would be appropriate here.

> Thus I was > thinking that this special page allocation could be made an option. By > default the 64-bit values would be allocated with the kernel set structure. > If the user requests remapping then they would be allocated on a distinct > page. All of this to try to spare memory. People are applying monitoring > to large workloads with hundreds of threads to monitor.

Hrm... having an 'if' on every single access to the soft counter, to determine the correct location, doesn't really sound like a good idea.

> I have not yet looked at the TSC virtualization. My idea was to use > a "soft" PMD to implement this feature.

Sure. But since you can't write the tsc, remember that you'll need a start/sum mechanism to support user level sampling.

-- David Gibson | I'll have my music baroque, and my code david AT gibson.dropbear.id.au | minimalist, thank you. NOT _the_ _other_ | _way_ _around_! http://www.ozlabs.org/people/dgibson |
From: Stephane E. <er...@hp...> - 2005-04-13 22:37:02
|
David,

> > > Well, given that it is the offset to pass to mmap, I wouldn't really > > > call it "opaque". But returning this offset is a good idea, yes.
> > >
> > > Oh... hang on. I didn't immediately realize the implication of this. > > > Does this mean that there is a separate sample buffer for each event > > > set?
> >
> > No, there is one sampling buffer per context and it is shared by all sets.
>
> Ok. Why do we need a separate offset for each event set then?

Here is a simple example, imagine the following sequence (simplified):

fd = PFM_CREATE_CONTEXT()  -> creates context and set0
PFM_CREATE_EVTSETS(1)      -> creates event set1

Now you would like to access set0 and set1 via remapping. You need to be able to uniquely identify each set for mmap().

PFM_GETINFO_EVTSETS(0, &setinfo0)  -> get event set info for set0
off_set0 = setinfo0.set_mmap_offset;

PFM_GETINFO_EVTSETS(1, &setinfo1)  -> get event set info for set1
off_set1 = setinfo1.set_mmap_offset;

/* now map each set separately */
addr_set0 = mmap(NULL, sizeof(pfm_set_view_t), PROT_READ, MAP_PRIVATE, fd, off_set0);
addr_set1 = mmap(NULL, sizeof(pfm_set_view_t), PROT_READ, MAP_PRIVATE, fd, off_set1);

This is necessary because we don't want to force the user to create and delete sets as one block. The pfm_set_view_t structure is defined something like:

typedef struct {
    uint64_t set_runs;    /* how many times the set was activated */
    uint16_t set_id;      /* set identification */
    uint16_t set_status;  /* inactive/active */
    uint32_t reserved;
    uint64_t pmds[PFM_MAX_PMDS];
} pfm_set_view_t;

> > > > On the question of read() on fd vs. mmap, there is an important difference > > > > here. The read() has the side effect of actually consuming the notification > > > > message from the message queue. Making the queue visible via mmap avoids the > > > > copying done by read but we lose the side-effect. We would still need a system > > > > call to remove that message from the queue. Mmapping is read-only.
> > >
> > > I thought about this before, allowing mmap() access to the > > > notification buffer. It could be done, allowing the ring buffer to be > > > mmap()ed, however as you point out you'd need some way of consuming > > > the messages. It occurred to me eventually that the natural way to do > > > that would be to make appropriate lseek()s remove the messages.
> >
> > Yes, that's an idea. I need to check and see how the callback looks > > for lseek. I hope we can filter/fail on baroque parameters for the > > whence for instance.
>
> I don't see why we couldn't handle the whence parameter fully - in > fact that would even be useful, lseek(fd, 0, SEEK_END) would be > guaranteed to consume all the buffer, for example; lseek(fd, > message_size, SEEK_CUR) would consume exactly one message.

The callback gets the whence parameter. I think we could handle it.

> > How to deal with movements that are not a multiple of the message > > size.
>
> If we just treat this as a ring buffer with a byte offset, I don't see > that that causes a problem.

I went back and forth on this. If you look at the latest version of the document, I think I try to explain why it is not very efficient to treat the notification message queue as a byte stream: it forces the application to issue two reads to extract a message, one to get the type and a second to read the body. The queue is currently managed as a message queue. Read size must be a multiple of the fixed message structure. It is not possible to read partial messages. It simplifies the read routine a lot. The application only needs to issue one read to get a full message. Sure, there are a few bytes wasted, but so far the largest message is the overflow message and that is likely going to be the most frequent message. The cost of copying 56 bytes is probably fairly small compared to the overall cost of the system call.
> > If I recall, I fail the read(), so we could as well fail the lseek() > > to a position that is in the middle of a message.
>
> Erm... that's probably not a good idea. lseek() doesn't usually fail > based on the value of the offset.

That's true.

> > > Although, going back for a minute. Supposing we did allow the > > > notification messages to be read and consumed via mmap() and lseek(), > > > do we still need to provide special notification messages. Could we > > > make the notification ring buffer be just the sample buffer? It would > > > have a certain elegance to it.
> > >
> > Well, the issue there is that perfmon allows you to handle sampling > > totally at the user level, i.e., no kernel level buffer.
>
> Well, except there still is a kernel level buffer, in the form of the > queue of notification events. Why not just make it always one buffer.

There is another aspect of perfmon that comes into play here. The buffer format is not under the control of the perfmon core. Its size and existence are totally under the control of the application and of the sampling buffer format. Also, without a read, I wonder how the application would wait/poll for any new events. Perfmon does not use signals to notify of new events. The read/poll/select can be used. Should a signal be necessary, then it is set up normally by requesting ownership of the resource via some fcntl(). In the end you can receive a SIGIO on notification events, and then you know there is something in the notification message queue.

> > I have started implementing the mmap() access for the virtualized 64-bit > > PMD registers. I am not too happy with the page consumption that this implies, > > at 320 PMDs x 8 bytes (minimum).
>
> Yes, that is a bit of an issue. This is where Mikael's approach of > renumbering the counters to reduce cacheline usage comes in. But that > doesn't really fit into the perfmon model.

I guess it is a matter of finding a smart mapping to hide some of the holes.
On IA-64, so far we are lucky: the PMD and PMC are allocated sequentially. For X86-64 and IA-32 (P6/Pentium M) it is trivial to implement sequential mappings. For PowerPC it looks fine as well. The P4/Xeon looks more challenging.

> It's worse than 8 bytes each if you need to worry about both sum and > offset values, as perfctr does. Perfmon doesn't need to in the cases > where it writes back to the counters, but that may not always be > possible or practical. According to Mikael on at least some x86 > machines, writing the counters is high latency and needs to be avoided > as much as possible. And even without that, things like the > timebase/tsc obviously can't be written to, so we would need a > start/offset approach to virtualize them.

> > It would be nice to fit this into a 4 kb page/set. Also note that this mode > > is ONLY interesting in the case of a self-monitoring thread or for system > > wide. It does not help when a thread is monitoring another thread.
>
> Not entirely true. Certainly self-monitoring is the really useful > case here, but it would be possible to use this for low-latency > monitoring of another process too, if for the application it's > acceptable to put up with slightly out of date statistics. That's > plausible if monitoring a steady-state process and the counter values > can be normalized by tsc, for example.

I would add, it is useful for self-monitoring where it is possible to read the PMD from the user level directly. Moreover, it is only useful if we know that the counter is going to exceed its hardware width, i.e. a direct read is not enough. Other than that I agree with you for the non-self-monitoring case. I guess we would have to set the level of expectations about the "age" of the values.

> In this scenario we also need some synchronization between kernel and > user to ensure that the user sampling process gets an atomic snapshot > of all the counters.
> Currently perfctr handles that based on the tsc > value, which I'm not convinced is entirely correct. I think a seqlock > like mechanism would be appropriate here.

Yes, I saw that in the perfctr code. I am not sure this is very reliable.

BTW, I got rid of PFM_SET_CONFIG/PFM_GET_CONFIG as you suggested. I use a combination of /proc and /proc/sys. I will fix the number of PMDS and PMCS for all platforms. I moved the reg_smpl_pmds/reg_reset_pmds/flags to PFM_WRITE_PMDS from PFM_WRITE_PMCS. I will be changing the type of reg_value for pfarg_pmc_t to "unsigned long" from "uint64_t". I think for all PMUs I looked at, the PMCs are always as wide as an unsigned long. For PMDs, the type must remain uint64_t. Can you confirm this fact for PPC32 and PPC64?

-- -Stephane |
From: David G. <da...@gi...> - 2005-04-14 01:40:30
|
On Wed, Apr 13, 2005 at 03:12:19PM -0700, Stephane Eranian wrote: > David, > > > > > Well, given that it is the offset to pass to mmap, I wouldn't really > > > > call if "opaque". But returning this offset is a good idea, yes. > > > > > > > > Oh... hang on. I didn't immediately realize the implication of this. > > > > Does this mean that there is a separate sample buffer for each event > > > > set? > > > > > > > No, there is one sampling buffer per context and it is shared by all sets. > > > > Ok. Why do we need a separate offset for each event set then? > > > Here is a simple example, imagine the following sequence (simplified): > > fd = PFM_CREATE_CONTEXT() -> creates context and set0 > PFM_CREATE_EVTSETS(1) -> creates event set1 > > Now you would like to access set0 and set1 via remapping. > You need to be able to uniquely identify each set for mmap(). > > PFM_GETINFO_EVTSETS(0, &setinfo0) -> get event set info for set0 > off_set0 = setinfo0.set_mmap_offset; > > PFM_GETINFO_EVTSETS(1, &setinfo1) -> get event set info for set1 > off_set1 = setinfo0.set_mmap_offset; > > /* now map each set separately */ > addr_set0 = mmap(NULL, sizeof(pfm_set_view_t), PROT_READ, MAP_PRIVATE, fd, off_set0); > addr_set1 = mmap(NULL, sizeof(pfm_set_view_t), PROT_READ, MAP_PRIVATE, fd, off_set1); > > This is necessary because we don't want to force the user the block create > and delete sets. The pfm_set_view_t structure is defined something like: > typedef struct { > uint64_t set_runs; /* how many times the set was activated */ > uint16_t set_id; /* set identification */ > uint16_t set_status; /* inactive/active */ > uint32_t reserved; > uint64_t pmds[PFM_MAX_PMDS]; > } pfm_set_view_t; Ah, ok, I take it this is for the user-level sampling support. Remember the last version of the document I've read didn't have this support, so I was confused as to what per-set information you were mapping; last I had details the only mappable thing was the sample buffer. 
> > > > > On the question of read() on fd vs. mmap, there is an important difference
> > > > > here. The read() has the side effect of actually consuming the notification
> > > > > message from the message queue. Making the queue visible via mmap avoids the
> > > > > copying done by read but we lose the side-effect. We would still need a system
> > > > > call to remove that message from the queue. Mmaping is read-only.
> > > >
> > > > I thought about this before, allowing mmap() access to the
> > > > notification buffer. It could be done, allowing the ring buffer to be
> > > > mmap()ed, however as you point out you'd need some way of consuming
> > > > the messages. It occurred to me eventually that the natural way to do
> > > > that would be to make appropriate lseek()s remove the messages.
> > >
> > > Yes, that's an idea. I need to check and see how the callback looks
> > > for lseek. I hope we can filter/fail on baroque parameters for the
> > > whence, for instance.
> >
> > I don't see why we couldn't handle the whence parameter fully - in
> > fact that would even be useful: lseek(fd, 0, SEEK_END) would be
> > guaranteed to consume all the buffer, for example; lseek(fd,
> > message_size, SEEK_CUR) would consume exactly one message.
> >
> The callback gets the whence parameter. I think we could handle it.
>
> > > How do we deal with movements that are not multiples of the message
> > > size?
> >
> > If we just treat this as a ring buffer with a byte offset, I don't see
> > that that causes a problem.
> >
> I went back and forth on this. If you look at the latest version of the
> document, I think I try to explain why it is not very efficient to treat
> the notification message queue as a byte stream. Because it forces the
> application to issue two reads to extract a message: one to get the type
> and a second to read the body.
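David's "ring buffer with a byte offset" reading of the queue can be modeled like this (an illustrative sketch only: in the real interface the buffer would be the mmap()ed region, and queue_consume() would be an lseek(fd, n, SEEK_CUR) on the context fd):

```c
#include <stdint.h>
#include <stddef.h>

/* Simulated notification queue.  The kernel advances 'tail' as it posts
 * messages; the application advances 'head' by seeking. */
typedef struct {
    const uint8_t *buf;   /* mmap()ed ring in the real interface */
    size_t size;          /* ring size in bytes */
    size_t head;          /* read position, advanced on consume */
    size_t tail;          /* write position (kernel side) */
} msg_queue_t;

static size_t queue_pending(const msg_queue_t *q)
{
    return q->tail - q->head;
}

/* Peek at the next message without consuming it: just a load from the
 * mapped buffer, no system call needed. */
static const uint8_t *queue_peek(const msg_queue_t *q)
{
    return q->buf + (q->head % q->size);
}

/* Consume n bytes; models lseek(fd, n, SEEK_CUR).  Consuming everything,
 * queue_consume(q, queue_pending(q)), models lseek(fd, 0, SEEK_END). */
static void queue_consume(msg_queue_t *q, size_t n)
{
    q->head += n;
}
```

Because the offset is a plain byte count, seeks that are not message-aligned are well defined; whether they are useful is up to the application.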
Well, that's a consequence of merging the two buffers, thereby making the
messages of variable size, not a consequence of treating the notification
queue as a byte stream per se.

But if we allowed mmap() access to the notification buffer, then we don't
need two syscalls after all. Just read the data from memory, then one
lseek() to consume the message.

> The queue is currently managed as a message
> queue. Read size must be a multiple of the fixed message structure. It is
> not possible to read partial messages. It simplifies the read routine a
> lot. The application only needs to issue one read to get a full message.
> Sure, there are a few bytes wasted, but so far the largest message is the
> overflow message and that is likely going to be the most frequent message.
> The cost of copying 56 bytes is probably fairly small compared to the
> overall cost of the system call.
>
> > > If I recall, I fail the read(), so we could as well fail the lseek()
> > > to a position that is in the middle of a message.
> >
> > Erm... that's probably not a good idea. lseek() doesn't usually fail
> > based on the value of the offset.
> >
> That's true.
>
> > > > Although, going back for a minute. Supposing we did allow the
> > > > notification messages to be read and consumed via mmap() and lseek(),
> > > > do we still need to provide special notification messages? Could we
> > > > make the notification ring buffer be just the sample buffer? It would
> > > > have a certain elegance to it.
> > > >
> > > Well, the issue there is that perfmon allows you to handle sampling
> > > totally at the user level, i.e., no kernel level buffer.
> >
> > Well, except there still is a kernel level buffer, in the form of the
> > queue of notification events. Why not just make it always one buffer?
>
> There is another aspect of perfmon that comes into play here. The buffer
> format is not under the control of the perfmon core.
> Its size and existence are totally under the control of the application's
> buffer sampling format.

So..? Obviously the sample formats would have to be written such that the
single sample buffer has enough meta-information that it can be unambiguously
parsed, but that's no big deal.

> Also without a read, I wonder how the application would be waiting/polling
> for any new events. Perfmon does not use signals to notify of new events.
> The read/poll/select can be used. Should a signal be necessary, then it is
> set up normally by requesting ownership of the resource via some fcntl().
> In the end you can receive a SIGIO on notification events, then you know
> there is something in the notification message queue.

Well, the stream is still there, so the application can still use select().
It's just that after the select() or poll() returns it will get the data
from the mmap()ed region and then lseek() instead of doing a read(). Or
alternatively, if it suits the application's structure, it could do a small
(blocking) read() to get the header of the next block of sampling data, then
read any extra information from the mmap.

> > > I have started implementing the mmap() access for the virtualized
> > > 64-bit PMD registers. I am not too happy with the page consumption that
> > > this implies, at 320 PMDs x 8 bytes (minimum).
> >
> > Yes, that is a bit of an issue. This is where Mikael's approach of
> > renumbering the counters to reduce cacheline usage comes in. But that
> > doesn't really fit into the perfmon model.
> >
> I guess it is a matter of finding a smart mapping to hide some of the
> holes. On IA-64, so far we are lucky: the PMDs and PMCs are allocated
> sequentially. For X86-64 and IA-32 (P6/Pentium M) it is trivial to
> implement sequential mappings. For PowerPC it looks fine as well. The
> P4/Xeon looks more challenging.
>
> > It's worse than 8 bytes each if you need to worry about both sum and
> > offset values, as perfctr does.
> > Perfmon doesn't need to in the cases
> > where it writes back to the counters, but that may not always be
> > possible or practical. According to Mikael, on at least some x86
> > machines, writing the counters is high latency and needs to be avoided
> > as much as possible. And even without that, things like the
> > timebase/tsc obviously can't be written to, so we would need a
> > start/offset approach to virtualize them.
> >
> > > It would be nice to fit this into a 4 kb page/set. Also note that this
> > > mode is ONLY interesting in the case of a self-monitoring thread or for
> > > system wide. It does not help when a thread is monitoring another
> > > thread.
> >
> > Not entirely true. Certainly self-monitoring is the really useful
> > case here, but it would be possible to use this for low-latency
> > monitoring of another process too, if for the application it's
> > acceptable to put up with slightly out of date statistics. That's
> > plausible if monitoring a steady-state process and the counter values
> > can be normalized by tsc, for example.
> >
> I would add, it is useful for self-monitoring where it is possible
> to read the PMDs from the user level directly. Moreover, it is only
> useful if we know that the counter is going to exceed its hardware
> width, i.e. a direct read is not enough.

Or if we're using the start/sum approach to avoid unnecessary writes to the
PMDs. According to Mikael this is very important on some x86 models.

> Other than that I agree with you for the non-self-monitoring case.
> I guess we would have to set the level of expectations on the "age" of
> the values.

Yes, some sort of setting for sampling frequency would probably make sense.

> > In this scenario we also need some synchronization between kernel and
> > user to ensure that the user sampling process gets an atomic snapshot
> > of all the counters. Currently perfctr handles that based on the tsc
> > value, which I'm not convinced is entirely correct.
> > I think a seqlock-like mechanism would be appropriate here.
>
> Yes, I saw that in the perfctr code. I am not sure this is very reliable.
>
> BTW, I got rid of PFM_SET_CONFIG/PFM_GET_CONFIG as you suggested.
> I use a combination of /proc and /proc/sys.

Excellent. Is there somewhere I can grab these latest test versions?

> I will fix the number of PMDs and PMCs for all platforms. I moved
> the reg_smpl_pmds/reg_reset_pmds/flags to PFM_WRITE_PMDS from
> PFM_WRITE_PMCS.

Good.

> I will be changing the type of reg_value for pfarg_pmc_t
> to "unsigned long" from "uint64_t". I think for all the PMUs I looked at,
> the PMCs are always at most as wide as an unsigned long. For PMDs, the
> type must remain uint64_t. Can you confirm this fact for PPC32 and PPC64?

No, no, no! Don't do this! Yes, things would fit on ppc32 and ppc64, but
having the size here variable is a really bad idea. It means that 32-bit
applications on a 64-bit kernel (very common on ppc64 and x86_64) will have
a different notion of the structure to the kernel, so we would need to
implement an ugly translation layer to support them. And even that wouldn't
work properly for >32-bit registers being accessed from 32-bit apps. Using
fixed-width types everywhere will save us a lot of pain in the ABI later.
Speaking of which, if there are any unsigned longs in the interface at
present, get rid of them. In fact, get rid of any unsigned ints as well - I
believe they're reliably 32-bit on all current platforms, but there's no
guarantee that will always be the case.

-- 
David Gibson                   | I'll have my music baroque, and my code
david AT gibson.dropbear.id.au | minimalist, thank you. NOT _the_ _other_
                               | _way_ _around_!
http://www.ozlabs.org/people/dgibson
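The seqlock-like mechanism David suggests for getting an atomic snapshot of the mapped counters could work roughly as follows (a generic single-writer sketch, not perfmon or perfctr code; real kernel/user code would also need the memory barriers that are elided here):

```c
#include <stdint.h>

/* The writer (kernel) bumps 'seq' to an odd value before updating and back
 * to even afterwards; a reader retries until it sees the same even value
 * before and after copying. */
typedef struct {
    volatile uint32_t seq;
    uint64_t counters[4];
} counter_page_t;

static void writer_update(counter_page_t *p, int i, uint64_t v)
{
    p->seq++;               /* now odd: update in progress */
    p->counters[i] = v;     /* (write barrier needed here in real code) */
    p->seq++;               /* even again: update complete */
}

static void reader_snapshot(const counter_page_t *p, uint64_t out[4])
{
    uint32_t s;
    do {
        do {
            s = p->seq;
        } while (s & 1);    /* writer mid-update: wait */
        for (int i = 0; i < 4; i++)
            out[i] = p->counters[i];
    } while (p->seq != s);  /* retry if the writer intervened */
}
```

The reader never blocks the writer, which matters here since the "writer" is the context-switch path.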
From: Stephane E. <er...@hp...> - 2005-04-14 10:55:36
|
David,

> > I went back and forth on this. If you look at the latest version of the
> > document, I think I try to explain why it is not very efficient to treat
> > the notification message queue as a byte stream. Because it forces the
> > application to issue two reads to extract a message: one to get the type
> > and a second to read the body.
>
> Well, that's a consequence of merging the two buffers, thereby making
> the messages of variable size, not a consequence of treating the
> notification queue as a byte stream per se.
>
> But if we allowed mmap() access to the notification buffer, then we
> don't need two syscalls after all. Just read the data from memory,
> then one lseek() to consume the message.
>
It is true that if the message queue was mmapped, the two read() calls
would be replaced by a load for the type and possibly loads for the rest.

In the case where the application does all the sampling at the user level,
it would mmap the notification queue; that's a page right there. If the
application is using a buffer format with buffer remapping, then it would
issue an mmap for the notification queue and one for the buffer. OTOH, and
I think that is your point, you could say the first case is like the second
but where the buffer size is actually zero. In other words, the mmap offset
for the buffer and the message queue would be, in effect, the same. That
sounds appealing to me. Now my problem would be to support legacy IA-64
applications which would still use the read model, but I think this could
work. I will see how this can be made to work with the code I have.

> Obviously the sample formats would have to be written such that the
> single sample buffer has enough meta-information that it can be
> unambiguously parsed, but that's no big deal.
>
Yes.

> > Also without a read, I wonder how the application would be waiting/polling
> > for any new events. Perfmon does not use signals to notify of new events.
> > The read/poll/select can be used.
> > Should a signal be necessary, then it is
> > set up normally by requesting ownership of the resource via some fcntl().
> > In the end you can receive a SIGIO on notification events, then you know
> > there is something in the notification message queue.
> >
> Well, the stream is still there, so the application can still use
> select(). It's just that after the select() or poll() returns it will
> get the data from the mmap()ed region and then lseek() instead of
> doing a read(). Or alternatively, if it suits the application's
> structure, it could do a small (blocking) read() to get the header of
> the next block of sampling data, then read any extra information from
> the mmap.
>
Yes, I think you could still poll/select and then go to the mmapped area to
get the actual data.

> > I would add, it is useful for self-monitoring where it is possible
> > to read the PMDs from the user level directly. Moreover, it is only
> > useful if we know that the counter is going to exceed its hardware
> > width, i.e. a direct read is not enough.
>
> Or if we're using the start/sum approach to avoid unnecessary writes
> to the PMDs. According to Mikael this is very important on some x86
> models.
>
I am not sure I understand the start/sum approach you are talking about.
The x86 variants all have this problem that it is very expensive to read
and write the PMU registers. Writes are a killer on context switch out
because there is no other way to stop monitoring but to write the perfsel
registers. Maybe on P4 there is something else.

> > BTW, I got rid of PFM_SET_CONFIG/PFM_GET_CONFIG as you suggested.
> > I use a combination of /proc and /proc/sys.
>
> Excellent. Is there somewhere I can grab these latest test versions?
>
I will try to make the current sources available next week. It would be
nice if you could attempt the ppc32/ppc64 port. I am confident the core
perfmon would support this now.
> > I will be changing the type of reg_value for pfarg_pmc_t
> > to "unsigned long" from "uint64_t". I think for all the PMUs I looked at,
> > the PMCs are always at most as wide as an unsigned long. For PMDs, the
> > type must remain uint64_t. Can you confirm this fact for PPC32 and PPC64?
>
> No, no, no! Don't do this! Yes, things would fit on ppc32 and ppc64,
> but having the size here variable is a really bad idea. It means that
> 32-bit applications on a 64-bit kernel (very common on ppc64 and
> x86_64) will have a different notion of the structure to the kernel,
> so we would need to implement an ugly translation layer to support
> them. And even that wouldn't work properly for >32-bit registers being
> accessed from 32-bit apps. Using fixed-width types everywhere will
> save us a lot of pain in the ABI later. Speaking of which, if there
> are any unsigned longs in the interface at present, get rid of them.
> In fact, get rid of any unsigned ints as well - I believe they're
> reliably 32-bit on all current platforms, but there's no guarantee
> that will always be the case.
>
Well, I would like you to take a look at the data structures defined in the
document. I tried hard to make them fixed size, yet some of them use size_t.
The mmap offset would introduce an off_t. Both are likely defined as
"unsigned long". There is also a big problem with the sample entry structure
for the default format. Think about the instruction pointer: this one has to
be defined as unsigned long (or uintptr_t). This works for both ILP32 and
LP64 systems. However, we have a problem for LP64 kernels with ILP32
applications, such as a ppc32 monitoring tool trying to decode the mmapped
sampling buffer written by a ppc64 kernel. There is no automatic way to
tell a compiler: "I am an ILP32 application but I would like to use the
64-bit version of certain structures". At best you need to have some
#define to force the pre-processor to pick up the right data structure.
In the case of ILP32 running on LP64, it would need to pick up a sample
entry format where the instruction pointer is defined as uint64_t. But
having it forced to uint64_t would be overkill (space) when running on an
ILP32 kernel. If you think of the mmap offset, most likely there is
32<->64-bit emulation for an ILP32 application running on an LP64 kernel.
The offset can only be 32 bits in that mode.

-- 
-Stephane
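David's fixed-width-types argument can be illustrated with a sketch like this (the structure is hypothetical, not the real pfarg_pmc_t): because every field has an explicit width, sizeof and every field offset are identical for ILP32 and LP64 compilers, so a 64-bit kernel can accept the structure from a 32-bit application without any compat translation layer.

```c
#include <stdint.h>
#include <stddef.h>

/* Hypothetical register-argument structure built only from fixed-width
 * types.  Its layout does not depend on the ABI's notion of 'long'. */
typedef struct {
    uint16_t reg_num;    /* which PMC/PMD */
    uint16_t reg_set;    /* which event set */
    uint32_t reg_flags;
    uint64_t reg_value;  /* full 64 bits even for a 32-bit application */
} pmc_arg_t;
```

With "unsigned long" for reg_value, the same structure would be 12 or 16 bytes depending on the ABI, which is exactly the mismatch David warns about.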
From: David G. <da...@gi...> - 2005-04-15 05:12:44
|
On Thu, Apr 14, 2005 at 03:30:57AM -0700, Stephane Eranian wrote:
> David,
>
> > > I went back and forth on this. If you look at the latest version of the
> > > document, I think I try to explain why it is not very efficient to treat
> > > the notification message queue as a byte stream. Because it forces the
> > > application to issue two reads to extract a message: one to get the type
> > > and a second to read the body.
> >
> > Well, that's a consequence of merging the two buffers, thereby making
> > the messages of variable size, not a consequence of treating the
> > notification queue as a byte stream per se.
> >
> > But if we allowed mmap() access to the notification buffer, then we
> > don't need two syscalls after all. Just read the data from memory,
> > then one lseek() to consume the message.
> >
> It is true that if the message queue was mmapped, the
> two read() calls would be replaced by a load for the type and possibly
> loads for the rest.
>
> In the case where the application does all the sampling at the user
> level, it would mmap the notification queue; that's a page right there.
> If the application is using a buffer format with buffer remapping, then
> it would issue an mmap for the notification queue and one for the buffer.
> OTOH, and I think that is your point, you could say the first case is like
> the second but where the buffer size is actually zero. In other words,
> the mmap offset for the buffer and the message queue would be, in
> effect, the same. That sounds appealing to me. Now my problem would be to
> support legacy IA-64 applications which would still use the read model,
> but I think this could work. I will see how this can be made to work
> with the code I have.

Ok, I'm no longer entirely sure we're talking about the same thing here,
and I'm not sure which things you think are a good idea and which you
don't.
To clarify, there are two separate ideas here:

1) To allow the notification queue to be accessed with mmap(), and
   consumed with lseek(), as a lower-overhead way of getting the messages.
   I wouldn't envisage removing the read() mechanism - an application
   could use either approach to read the messages from the stream,
   depending which was suitable (or even a combination of both).

2) To merge the notification queue and sampling buffer into a single data
   stream. The only connection with idea 1 is that using both together
   would mitigate some of the potential performance problems with idea 2.

My impression is that you're convinced (1) is a good idea, but I'm not so
sure what your current position on (2) is. For (2), supporting legacy ia64
programs would, indeed, be a tricky problem.

> > Obviously the sample formats would have to be written such that the
> > single sample buffer has enough meta-information that it can be
> > unambiguously parsed, but that's no big deal.
> >
> Yes.
>
> > > Also without a read, I wonder how the application would be
> > > waiting/polling for any new events. Perfmon does not use signals to
> > > notify of new events. The read/poll/select can be used. Should a signal
> > > be necessary, then it is set up normally by requesting ownership of the
> > > resource via some fcntl(). In the end you can receive a SIGIO on
> > > notification events, then you know there is something in the
> > > notification message queue.
> >
> > Well, the stream is still there, so the application can still use
> > select(). It's just that after the select() or poll() returns it will
> > get the data from the mmap()ed region and then lseek() instead of
> > doing a read(). Or alternatively, if it suits the application's
> > structure, it could do a small (blocking) read() to get the header of
> > the next block of sampling data, then read any extra information from
> > the mmap.
>
> Yes, I think you could still poll/select and then go to the mmapped area
> to get the actual data.
> > > I would add, it is useful for self-monitoring where it is possible
> > > to read the PMDs from the user level directly. Moreover, it is only
> > > useful if we know that the counter is going to exceed its hardware
> > > width, i.e. a direct read is not enough.
> >
> > Or if we're using the start/sum approach to avoid unnecessary writes
> > to the PMDs. According to Mikael this is very important on some x86
> > models.
>
> I am not sure I understand the start/sum approach you are talking about.
> The x86 variants all have this problem that it is very expensive to
> read and write the PMU registers. Writes are a killer on context switch
> out because there is no other way to stop monitoring but to write
> the perfsel registers. Maybe on P4 there is something else.

The impression I've gotten from Mikael and the perfctr code is that
apparently on at least some CPUs, reads are substantially less expensive
than writes. Hence, the code is organized so as to avoid writes, replacing
them with reads. So, to maintain a virtualized counter for a particular
task/thread, instead of writing the virtualized value back to the hardware
counter whenever the task becomes active, it uses the following sequence:

When switching to a monitored task:
	start = hardware pmc value

When switching away from a monitored task:
	sum += (hardware pmc value - start)

When sampling a monitored task during its time slice:
	sum += (hardware pmc value - start)
	start = hardware pmc value

At any point when the monitored task is running, the current virtualized
counter value is (sum + (hardware value - start)). This achieves a fully
virtualized counter value without any writes to the hardware counter. The
perfctr mmap()ed window gives both the start and sum values for each
active counter, so a self-monitoring task can compute the full software
counter value itself.
Obviously this technique cannot be used for counters generating overflow
interrupts, since the hardware and software values must be congruent in
that case. For such counters (i-mode counters, in perfctr terminology)
perfctr falls back to writing the hardware counter value when switching to
the monitored task. The same technique can be used where counter writes
are not just expensive, but impossible, such as for the tsc/timebase.

> > > BTW, I got rid of PFM_SET_CONFIG/PFM_GET_CONFIG as you suggested.
> > > I use a combination of /proc and /proc/sys.
> >
> > Excellent. Is there somewhere I can grab these latest test versions?
> >
> I will try to make the current sources available next week. It would be
> nice if you could attempt the ppc32/ppc64 port. I am confident the core
> perfmon would support this now.

Excellent. I will be occupied at linux.conf.au most of next week, but I'll
see what I can do. With any luck I will make some progress in the week
after.

> > > I will be changing the type of reg_value for pfarg_pmc_t
> > > to "unsigned long" from "uint64_t". I think for all the PMUs I looked
> > > at, the PMCs are always at most as wide as an unsigned long. For PMDs,
> > > the type must remain uint64_t. Can you confirm this fact for PPC32 and
> > > PPC64?
> >
> > No, no, no! Don't do this! Yes, things would fit on ppc32 and ppc64,
> > but having the size here variable is a really bad idea. It means that
> > 32-bit applications on a 64-bit kernel (very common on ppc64 and
> > x86_64) will have a different notion of the structure to the kernel,
> > so we would need to implement an ugly translation layer to support
> > them. And even that wouldn't work properly for >32-bit registers being
> > accessed from 32-bit apps. Using fixed-width types everywhere will
> > save us a lot of pain in the ABI later. Speaking of which, if there
> > are any unsigned longs in the interface at present, get rid of them.
> > In fact, get rid of any unsigned ints as well - I believe they're
> > reliably 32-bit on all current platforms, but there's no guarantee
> > that will always be the case.
> >
> Well, I would like you to take a look at the data structures defined in
> the document. I tried hard to make them fixed size, yet some of them use
> size_t. The mmap offset would introduce an off_t. Both are likely defined
> as "unsigned long". There is also a big problem with the sample entry
> structure for the default format. Think about the instruction pointer:
> this one has to be defined as unsigned long (or uintptr_t). This works
> for both ILP32 and LP64 systems. However, we have a problem for LP64
> kernels with ILP32 applications, such as a ppc32 monitoring tool trying
> to decode the mmapped sampling buffer written by a ppc64 kernel. There is
> no automatic way to tell a compiler: "I am an ILP32 application but I
> would like to use the 64-bit version of certain structures". At best you
> need to have some #define to force the pre-processor to pick up the right
> data structure. In the case of ILP32 running on LP64, it would need to
> pick up a sample entry format where the instruction pointer is defined as
> uint64_t. But having it forced to uint64_t would be overkill (space) when
> running on an ILP32 kernel. If you think of the mmap offset, most likely
> there is 32<->64-bit emulation for an ILP32 application running on an
> LP64 kernel. The offset can only be 32 bits in that mode.

Ah, yes, all those things could cause problems. I'll take a closer look
when I get the chance.

-- 
David Gibson                   | I'll have my music baroque, and my code
david AT gibson.dropbear.id.au | minimalist, thank you. NOT _the_ _other_
                               | _way_ _around_!
http://www.ozlabs.org/people/dgibson
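The start/sum scheme David describes for perfctr can be sketched as follows (function names are illustrative; hardware counter reads are passed in as plain values, and the fallback for overflow-interrupt counters is not shown):

```c
#include <stdint.h>

/* Software-virtualized counter using perfctr's start/sum scheme: the
 * hardware counter is only ever read, never written. */
typedef struct {
    uint64_t sum;    /* counts accumulated over past runs of the task */
    uint64_t start;  /* hardware value when the task was switched in */
} vcounter_t;

/* Called when switching TO the monitored task. */
static void switch_in(vcounter_t *c, uint64_t hw)
{
    c->start = hw;
}

/* Called when switching AWAY from the monitored task. */
static void switch_out(vcounter_t *c, uint64_t hw)
{
    c->sum += hw - c->start;
}

/* Periodic resync while the task runs (e.g. when sampling at a tick). */
static void resync(vcounter_t *c, uint64_t hw)
{
    c->sum += hw - c->start;
    c->start = hw;
}

/* Current virtualized value while the task is running. */
static uint64_t vread(const vcounter_t *c, uint64_t hw)
{
    return c->sum + (hw - c->start);
}
```

Exporting both sum and start through the mmap()ed window is what lets a self-monitoring task combine them with its own direct hardware read to get the full virtualized value without a system call.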