My first inclination was to ask Vince and Phil if they could explain why PAPI uses pfm_find_event.  But Phil beat me to it, so I decided to look at the PAPI source code to see if I could figure it out myself.  After a quick check of the source, it looks like the calls to pfm_find_event may be unnecessary, so I will try to bypass them and see what happens.




From: Heike McCraw
Sent: Tuesday, April 22, 2014 2:01 PM
To: Philip Mucci
Cc: perfmon2-devel
Subject: Re: [Perfapi-devel] [perfmon2] FW: Proposed enhancement to libpfm4.




On Tue, Apr 22, 2014 at 1:18 PM, Philip Mucci wrote:

This may be a pretty easy change.


Sure, go for it.


Papi team, comments?

Apologies for brevity and errors as this was sent from my mobile device.

On Apr 22, 2014, at 11:34, Stephane Eranian wrote:



On Sat, Apr 19, 2014 at 1:38 AM, Gary Mohr wrote:



Papi normally uses the OS_NONE mode.  But I changed the call to pfm_get_os_event_encoding to use the OS_PERF_EVENT_EXT mode for the test I mentioned below.


If we could get to the encode call it might work.  But before papi calls the encode function, it calls pfm_find_event, passing the event string (which includes an extended mask).  The code in libpfm4 is hard-coded so that pfmlib_build_event_pattrs (which gets called under pfm_find_event) always uses OS_NONE.


Why does PAPI need to call pfm_find_event() before calling the encoding routine?

Just let the encoding routine figure out if the event exists.



So the problem is that event strings which contain an extended mask cannot be passed to pfm_find_event.  If you do, it always returns an error.


Yes, because this old routine assumes PFM_OS_NONE (pure hardware event lookup).


I ran both of the libpfm4 examples you show below and they both worked correctly.  But when I turned on libpfm4 debug, it showed that neither of them ever called pfm_find_event (I have inserted an extra debug message in my version so I can see if it gets called).


I do not know why papi uses pfm_find_event, but I do not think I am up to restructuring the way papi uses the libpfm4 API.


So it looks like I cannot use your extended masks unless pfm_find_event can accept event strings that contain them.


As a test, I changed pfm_find_event to force the osid index to PFM_OS_PERF_EVENT_EXT after the new event table is cleared.  With this change, when PAPI uses the cpu mask it seems to work (at least I do not get any errors and the cpu number comes back from the encode function in the arg structure).  I have not done the papi changes yet to use it but that should be possible.


So with it sort of working, I tried to list the native events with papi_native_avail.  The list did not show the cpu mask on any of the events (probably somewhere in the papi or libpfm4 list code I will need to force it to use the extended OS mode).


After getting it working part way, I am not convinced it is cleaner than what I had added to libpfm4.  Making this work with extended event masks will also require changes in both papi and libpfm4.  The change I made above to force pfm_find_event to always use extended masks may not be the right approach.  In addition, papi will not be able to use this approach to add additional masks in its effort to change the effective scope of other papi attributes (without additional support in libpfm4 to also know about new masks for those attributes).  The patch I gave you allows papi to extend the event masks without you having to know anything about what they are doing.


But you are in charge, so do you want to go with the patch I had provided or do you want me to make it work using extended event masks?


I am sorry this has been a pain but thanks a bunch for your help and understanding.




From: Stephane Eranian
Sent: Friday, April 18, 2014 3:06 PM

To: Gary Mohr
Cc: perfmon2-devel
Subject: Re: [perfmon2] FW: Proposed enhancement to libpfm4.



$ examples/showevtinfo -O perf_ext | less

$ perf_examples/task -e unhalted_core_cycles:cpu=2 ls

             1991845 unhalted_core_cycles:cpu=2 (0.00% scaling, ena=1273464, run=1273464)


The showevtinfo must show the [cpu] modifier for all events.


Some explanation about OS_PERF_EVENT vs. OS_PERF_EVENT_EXT vs. OS_NONE


This flag controls what modifiers are visible to users.

OS_NONE: raw modifiers supported by hw

OS_PERF_EVENT: basic perf_event modifiers, mostly priv levels (take over from hw)

OS_PERF_EVENT_EXT: OS_PERF_EVENT + extra modifiers for sampling period, freq, and so on


Also the last two prep a perf_event_attr struct.


I don't know what papi is doing with the encoding of events. It may not be using the

OS_PERF_EVENT_* modes.

I have layered the modifiers based on what is providing the support for them.

This makes the library portable to other OSes.



On Fri, Apr 18, 2014 at 10:39 PM, Gary Mohr wrote:



I modified the calls to pfm_get_os_event_encoding to use PFM_OS_PERF_EVENT_EXT but I am still seeing the same error.


The error is coming back from a call to pfm_find_event which only takes the event string (no OS argument).  Papi calls pfm_find_event before calling pfm_get_os_event_encoding and since the first one gets an error we never get far enough to try the encoding.  I added debug prints to both of these functions and this is the output I see:


pfmlib_common.c (pfm_find_event.791): ENTER: str: unhalted_core_cycles:cpu=2

pfmlib_common.c (pfmlib_build_event_pattrs.1050): 72 0 2 1 0 k

pfmlib_common.c (pfmlib_build_event_pattrs.1050): 72 1 2 1 1 u

pfmlib_common.c (pfmlib_build_event_pattrs.1050): 72 2 2 1 2 e

pfmlib_common.c (pfmlib_build_event_pattrs.1050): 72 3 2 1 3 i

pfmlib_common.c (pfmlib_build_event_pattrs.1050): 72 4 3 1 4 c

pfmlib_common.c (pfmlib_build_event_pattrs.1050): 72 5 2 1 5 t

pfmlib_common.c (pfmlib_parse_event_attr.899): cannot find attribute cpu


The call to the encoding function would have happened soon if pfm_find_event did not return the error.


It looks to me like the libpfm4 code in pfm_find_event sets up a pfmlib_event_desc_t table which is cleared to zeros.  Then it passes the table to pfmlib_parse_event, which passes it to pfmlib_build_event_pattrs.  Then pfmlib_build_event_pattrs picks an OS based on the osid in the table.  Since the table was cleared by pfm_find_event, the osid is zero, which is an OS of NONE.  I do not see any way the caller can control what OS gets used in the call to pfm_find_event.
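
To make the failure mode concrete, here is a toy model of what I think is happening.  This is NOT libpfm4 code: every name starting with toy_ is made up for illustration, and the modifier lists are simply the ones that appear in the debug output above plus "cpu".  It only shows how a lookup that zeroes its descriptor and takes no OS argument can never see a modifier that lives in the extended layer.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Toy stand-ins, NOT the libpfm4 definitions. */
enum toy_os { TOY_OS_NONE = 0, TOY_OS_PERF_EVENT = 1, TOY_OS_PERF_EVENT_EXT = 2 };

struct toy_event_desc {
    int osid;   /* which OS layer supplies the modifiers */
    /* other fields omitted in this sketch */
};

/* Assumed modifier lists: hardware ones are always visible in this toy,
 * "cpu" only when the extended layer is selected. */
static const char *const toy_hw_attrs[]  = { "k", "u", "e", "i", "c", "t" };
static const char *const toy_ext_attrs[] = { "cpu" };

/* Returns 1 if `name` is a visible modifier for this descriptor, else 0. */
static int toy_attr_visible(const struct toy_event_desc *d, const char *name)
{
    size_t i;

    assert(d != NULL && name != NULL);
    for (i = 0; i < sizeof(toy_hw_attrs) / sizeof(toy_hw_attrs[0]); i++)
        if (strcmp(toy_hw_attrs[i], name) == 0)
            return 1;
    if (d->osid == TOY_OS_PERF_EVENT_EXT)
        for (i = 0; i < sizeof(toy_ext_attrs) / sizeof(toy_ext_attrs[0]); i++)
            if (strcmp(toy_ext_attrs[i], name) == 0)
                return 1;
    return 0;
}

/* Models a lookup that zeroes its descriptor and has no OS argument. */
static int toy_find_attr_no_os_arg(const char *name)
{
    struct toy_event_desc d;

    memset(&d, 0, sizeof(d));   /* osid becomes 0, i.e. TOY_OS_NONE */
    return toy_attr_visible(&d, name);
}

/* Models a lookup where the caller selects the OS layer. */
static int toy_find_attr_with_os(const char *name, enum toy_os os)
{
    struct toy_event_desc d;

    memset(&d, 0, sizeof(d));
    d.osid = os;
    return toy_attr_visible(&d, name);
}
```

In this toy, toy_find_attr_no_os_arg("cpu") can only ever report "not found", while toy_find_attr_with_os("cpu", TOY_OS_PERF_EVENT_EXT) finds it.  That matches the "cannot find attribute cpu" message above, but the real code paths may differ in detail.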


So I am still not having any luck.  Any other ideas?






From: Stephane Eranian
Sent: Friday, April 18, 2014 12:17 PM

To: Gary Mohr
Cc: perfmon2-devel
Subject: Re: [perfmon2] FW: Proposed enhancement to libpfm4.




On Fri, Apr 18, 2014 at 8:12 PM, Gary Mohr wrote:

Hi Stephane,


I put your patch in but it still does not seem to recognize the cpu mask.  The papi and libpfm4 debug output where the error is reported looks like this:



API:papi.c:PAPI_event_name_to_code:1005:80833 Entry: in: 0x7fffaa5d669f, name: unhalted_core_cycles:cpu=2, out: 0x7fffaa5d3f54

SUBSTRATE:papi_internal.c:_papi_hwi_native_name_to_code:2099:80833 checking all 9 components

SUBSTRATE:components/perf_event/pe_libpfm4_events.c:_pe_libpfm4_ntv_name_to_code:796:80833 Converting unhalted_core_cycles:cpu=2

SUBSTRATE:components/perf_event/pe_libpfm4_events.c:find_existing_event:38:80833 Looking for unhalted_core_cycles:cpu=2 in 35 events

SUBSTRATE:components/perf_event/pe_libpfm4_events.c:find_existing_event:54:80833 unhalted_core_cycles:cpu=2 not allocated yet

SUBSTRATE:components/perf_event/pe_libpfm4_events.c:_pe_libpfm4_ntv_name_to_code:805:80833 Using pfm to look up event unhalted_core_cycles:cpu=2

SUBSTRATE:components/perf_event/pe_libpfm4_events.c:find_event:116:80833 Looking for unhalted_core_cycles:cpu=2

pfmlib_common.c (pfmlib_build_event_pattrs.1049): 72 0 2 1 0 k

pfmlib_common.c (pfmlib_build_event_pattrs.1049): 72 1 2 1 1 u

pfmlib_common.c (pfmlib_build_event_pattrs.1049): 72 2 2 1 2 e

pfmlib_common.c (pfmlib_build_event_pattrs.1049): 72 3 2 1 3 i

pfmlib_common.c (pfmlib_build_event_pattrs.1049): 72 4 3 1 4 c

pfmlib_common.c (pfmlib_build_event_pattrs.1049): 72 5 2 1 5 t

pfmlib_common.c (pfmlib_parse_event_attr.898): cannot find attribute cpu

SUBSTRATE:papi_internal.c:_papi_hwi_native_to_eventcode:378:80833 Looking for component 0 event 0

SUBSTRATE:components/perf_event_uncore/peu_libpfm4_events.c:_peu_libpfm4_ntv_name_to_code:828:80833 Converting unhalted_core_cycles:cpu=2

SUBSTRATE:components/perf_event_uncore/peu_libpfm4_events.c:find_existing_event:43:80833 Looking for unhalted_core_cycles:cpu=2 in 0 events

SUBSTRATE:components/perf_event_uncore/peu_libpfm4_events.c:find_existing_event:59:80833 unhalted_core_cycles:cpu=2 not allocated yet

SUBSTRATE:components/perf_event_uncore/peu_libpfm4_events.c:_peu_libpfm4_ntv_name_to_code:837:80833 Using pfm to look up event unhalted_core_cycles:cpu=2

SUBSTRATE:components/perf_event_uncore/peu_libpfm4_events.c:find_event:121:80833 Looking for unhalted_core_cycles:cpu=2

pfmlib_common.c (pfmlib_build_event_pattrs.1049): 72 0 2 1 0 k

pfmlib_common.c (pfmlib_build_event_pattrs.1049): 72 1 2 1 1 u

pfmlib_common.c (pfmlib_build_event_pattrs.1049): 72 2 2 1 2 e

pfmlib_common.c (pfmlib_build_event_pattrs.1049): 72 3 2 1 3 i

pfmlib_common.c (pfmlib_build_event_pattrs.1049): 72 4 3 1 4 c

pfmlib_common.c (pfmlib_build_event_pattrs.1049): 72 5 2 1 5 t

pfmlib_common.c (pfmlib_parse_event_attr.898): cannot find attribute cpu

SUBSTRATE:papi_internal.c:_papi_hwi_native_to_eventcode:378:80833 Looking for component 1 event 0x40000000

SUBSTRATE:papi_internal.c:_papi_hwi_add_native_event:185:80833 Creating Event 0x40000023 which is comp 1 internal 0x40000000

SUBSTRATE:papi_internal.c:_papi_hwi_native_to_eventcode:378:80833 Looking for component 4 event 0x40000023

SUBSTRATE:papi_internal.c:_papi_hwi_add_native_event:185:80833 Creating Event 0x40000024 which is comp 4 internal 0x40000023

SUBSTRATE:papi_internal.c:_papi_hwi_native_to_eventcode:378:80833 Looking for component 8 event 0x40000024

SUBSTRATE:papi_internal.c:_papi_hwi_add_native_event:185:80833 Creating Event 0x40000025 which is comp 8 internal 0x40000024

PAPI Error: Error Code -7,Event does not exist.

PAPI Error: Error Code -7,Event does not exist.


The patches were applied to a copy of libpfm4 pulled from your GIT 2 days ago (4/16).  But I also pulled your latest (this morning) and the only differences I see are some man page updates and a bunch of attributes that have been added to the haswell events.


I have not made the papi change to use the cpu number from the arg structure yet but it looks like papi gets an error back from pfm_get_event_info which happens before we ever try to call the encode function.  I also tried some uncore events with the cpu mask and got the same results.


Is there something I am missing?


Yes, you need to request encoding for PERF_EVENT_EXT:


                 ret = pfm_get_os_event_encoding(*argv, PFM_PLM0|PFM_PLM3, PFM_OS_PERF_EVENT_EXT, &arg);


That will enable the extra modifiers which are specific to perf_events.

Note that cpu= takes a cpu number and not a CPU mask.




From: Stephane Eranian
Sent: Friday, April 18, 2014 5:17 AM

To: Gary Mohr
Cc: perfmon2-devel
Subject: Re: [perfmon2] FW: Proposed enhancement to libpfm4.




Here is a better way of doing this. In the libpfm4 perf_event layer, there was already provision

to handle the cpu to program. With this, you can simply do: unhalted_core_cycles:cpu=2

The cpu index is returned by the encoding call, in arg.cpu, up to you to use it.


Please try the attached patch on top of libpfm4 git tree and let me know if it helps solve your





On Thu, Apr 17, 2014 at 8:57 PM, Gary Mohr wrote:

Hi Stephane,


Thanks for the reply.


I understand that uncore events are system wide and that the cpu number being passed is done to allow the kernel to identify which package should be used for an uncore event.  Just as a note, I am running on a Redhat kernel which includes backported uncore support.  In this kernel there are no /sys/devices/uncore_xxx/cpumask files, but uncore counters work just fine.  Apparently these files were added after the initial uncore logic was put in.


Papi provides a way to set a cpu number to be used for all events in an event set.  In order to use it a papi application must call PAPI_set_opt to attach a cpu to the event set.  When using this interface however, it is only possible to count events on one uncore package at a time.  This is because the attached cpu is used for all events in an event set and papi only allows one event set per component to be active at any given time.


If the cpu can be specified as a mask for uncore events, it allows the user to count uncore events on multiple packages at the same time.  I hope this explains why converting from the current way papi sets a cpu number (API call) to the new approach (event mask) is worthwhile.  The other advantage of using an event mask to specify the core number is that it automatically extends to all existing papi applications the ability to use uncore events without having to change a single line of code in any of them.  Papi also has a couple of other cases where moving an event attribute's scope from the event set to an event would be desirable.


So the question is really about the best way to extend the set of supported event masks to include some that contain information to be processed by papi.  It is true that this could be done without making changes to libpfm4 but it is my belief that doing so would make the papi code much more complicated and one of them (adding new event string delimiters) may even introduce problems in existing papi applications.


So it was my feeling that if libpfm4 could be enhanced to provide an additional interface, it would make this much easier to handle in papi.  One of my requirements for this change was that all existing calls to libpfm4 must continue to work exactly as they already do.  I did not want to have any adverse impact on any other possible users of libpfm4.  But I also feel that if there is something libpfm4 can do to help papi evolve into a better product, it is reasonable to modify libpfm4 (as long as it does not break any existing libpfm4 features).


I implemented the new interface I had in mind and changed the papi uncore component to use it.  When I got it working, it provided what I hoped it would.  An unchanged existing papi application (papi_command_line) was able to use uncore events and count events on multiple packages at the same time.  Furthermore the core component in papi still uses the old interface to libpfm4 and it also still works correctly.


This implementation makes fairly small changes in both libpfm4 and papi to make this work.  Since all existing interfaces in libpfm4 were preserved, I do not see any risk to libpfm4.


Phil’s response to your message below gave a pretty good overview of what the changes did.  I hope that my explanations above help you to understand why I chose to take this approach.


I am willing to go back and do it a different way if you are not happy with these changes.  But I still feel this is the best solution and would like you to understand my thought process and the changes before making that choice.  So if you have any more questions with the approach or the code, I will be happy to answer them.


Thanks again for your time.





From: Stephane Eranian
Sent: Thursday, April 17, 2014 8:00 AM
To: Gary Mohr
Cc: perfmon2-devel

Subject: Re: [perfmon2] FW: Proposed enhancement to libpfm4.





I am trying to understand the underpinning here. You are saying there is no way to pass a CPU to the PAPI call

to pin an uncore event to a particular socket.


First, uncore events are system-wide only events. This is why you need to pass a CPU number (as a substitute

for a socket number). Second, the kernel always exports a list of CPUs to monitor for each uncore PMU. It is

located in /sys/devices/uncore_xxx/cpumask.


I don't really like the libpfm4 changes you are proposing. They do not make sense to me because you are trying

to work around a limitation of PAPI by modifying libpfm4.


My understanding is that PAPI is not designed to handle system-wide events. System-wide events require a CPU

number. So why not extend PAPI to handle this instead so it would work with or without libpfm4? I understand it

would break existing tools, but then those tools are not ready to cope with CPU or socket-level measurements, maybe.



On Thu, Apr 10, 2014 at 5:40 PM, Gary Mohr wrote:

Also send this to the perfmon mailing list.


From: Gary Mohr
Sent: Wednesday, April 09, 2014 4:56 PM
To: Stephane Eranian
Cc: Vince Weaver; Philip Mucci; Heike McCraw; Michel Brown
Subject: Proposed enhancement to libpfm4.


Hi Stephane,


There has been quite a bit of discussion in the PAPI community lately regarding ways to make the PAPI uncore component useful to existing PAPI applications. 


A short description of the problem:


The kernel requires a cpu number to be provided on the open when setting up to count uncore events (used by kernel to pick the package/socket to count). 

PAPI currently provides a way to set a cpu number but it requires a call to PAPI_set_opt which existing papi applications that are currently used with core events almost never use.

This means that existing PAPI applications cannot use uncore events without coding changes.


Possible solution:


Change the uncore event string to include information to specify the core number that should be passed to the kernel for this event.

PAPI applications normally get the event to use from a user or config file, so they would have access to uncore events if the user just adds a little extra information to the event string.


Two approaches were considered:


1 -- The event name could be extended to include a package component.  This would result in the event names being replicated once for each package on the system.

2 -- A new event mask could be added to provide the number of the core which should be passed to the kernel for the event.


Since the SNBEP system already has 315 uncore events, replicating them for each package could lead to over 1200 different event names.  The current list output for uncore events on this system produces 6,000+ lines of output.  Replicating each event could drive that to about 24,000 lines of output.  This makes the first approach less than desirable.


A new mask for the uncore events could be added to identify which core number should be passed to the kernel.  But this information is needed by PAPI and does not end up in the attribute structure built by libpfm4 and passed to the kernel by PAPI.  This means that we would be introducing a mask that should be processed by PAPI and not libpfm4.  The new mask approach would have no effect on the number of events and little or no effect on the list output.  So it seemed to be the preferred approach.


In addition during these discussions, it was felt that a small number of other PAPI attributes could also be handled with PAPI specific event masks rather than through independent API calls (as is required today).  This encouraged looking for a general solution.


Two different approaches for adding a mask have been considered:


1 -- Modify PAPI to prescan the event strings to remove and process the new mask.

2 -- Enhance libpfm4 to allow event strings which contain masks it does not know about.


The first approach probably can be done but there is some concern that if PAPI prescreens and removes some of the event masks, it may remove masks that would have been meaningful to libpfm4.  This would be undesirable but could be avoided with careful PAPI mask names.
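
A minimal sketch of the prescan idea, assuming the PAPI-specific mask is spelled "cpu=N" and that masks are separated by ':' as in the existing event strings.  The helper name toy_strip_cpu_mask is hypothetical and is not part of PAPI or libpfm4; the mask name "cpu" is used only for illustration, and a real implementation would need a name that cannot collide with a mask libpfm4 understands.

```c
#include <assert.h>
#include <errno.h>
#include <limits.h>
#include <stdlib.h>
#include <string.h>

/*
 * Hypothetical helper: copy `event` into `out` without its "cpu=N" mask and
 * store N in *cpu (-1 if no such mask is present).  Returns 0 on success,
 * -1 if the cpu value is malformed or `out` is too small.
 */
static int toy_strip_cpu_mask(const char *event, char *out, size_t outlen, int *cpu)
{
    const char *p = event;
    size_t used = 0;
    int first = 1;              /* the first token is the event name */

    assert(event != NULL && out != NULL && cpu != NULL);
    *cpu = -1;
    if (outlen == 0)
        return -1;
    out[0] = '\0';

    while (*p != '\0') {
        size_t len = strcspn(p, ":");   /* length of this ':'-separated token */

        if (!first && len > 4 && strncmp(p, "cpu=", 4) == 0) {
            char *end;
            long v;

            errno = 0;
            v = strtol(p + 4, &end, 10);
            /* the number must fill the whole token and fit in an int */
            if (end != p + len || errno == ERANGE || v < 0 || v > INT_MAX)
                return -1;
            *cpu = (int)v;              /* consumed here, not copied to `out` */
        } else {
            size_t need = len + (first ? 0 : 1);

            if (used + need + 1 > outlen)
                return -1;
            if (!first)
                out[used++] = ':';
            memcpy(out + used, p, len);
            used += len;
            out[used] = '\0';
        }
        first = 0;
        p += len;
        if (*p == ':')
            p++;
    }
    return 0;
}
```

With this helper, "unhalted_core_cycles:u:cpu=3:k" would be handed on as "unhalted_core_cycles:u:k" with cpu set to 3.  Every other mask is passed through untouched, which is what keeps the prescan from swallowing masks it does not own.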


The idea behind the second approach is to add a feature to libpfm4 which would allow PAPI to pass an event string which contains some masks which libpfm4 may not understand.  When this is done, libpfm4 would be able to return a table to the caller which contains the masks libpfm4 did not recognize.  When using this new feature, libpfm4 would not consider an unknown mask as an error.  It would just return unprocessed masks to the caller and let the caller decide if those masks were valid.  This provides PAPI with an easy way to extend the set of event masks an application can use.  Of course when this new feature is not being used, libpfm4 would continue to behave exactly as it has in the past.


I spent some time adding this feature to libpfm4 and now have it working.  The end result is that I can now use papi_command_line to count uncore events without any changes to the application.


A high level summary of what I did to libpfm4:


I created two new libpfm4 functions which provide the same service as two existing functions but accept an additional calling argument.  The additional calling argument is a pointer to a table where libpfm4 can store any unprocessed masks.  The new functions are pfm_find_event_mask and pfm_get_os_event_encoding_mask.  The current function names also still exist and just call the new functions passing a NULL pointer for the unprocessed masks table.  Then the code in these new functions was changed to handle the case where it finds an unrecognized mask so it now behaves as described above.
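
The shape of that change can be sketched roughly as follows.  This is a simplified stand-alone illustration and not the code in the attached patch: the toy_ names, the fixed table sizes, and the list of "known" masks are all assumptions made for the example.  It only shows the pattern of a new entry point that takes a table for unprocessed masks, with the old entry point calling it with NULL so that its behaviour does not change.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define TOY_MAX_UNKNOWN 8
#define TOY_MASK_LEN    32

/* Hypothetical table for masks the parser did not recognize. */
struct toy_unknown_masks {
    size_t count;
    char   mask[TOY_MAX_UNKNOWN][TOY_MASK_LEN];
};

/* Masks this toy parser "knows"; a real library derives these per event. */
static const char *const toy_known[] = { "k", "u", "e", "i", "c", "t" };

/* A token is known if its name (the part before any '=') is in toy_known. */
static int toy_is_known(const char *tok, size_t len)
{
    const char *eq = memchr(tok, '=', len);
    size_t nlen = eq ? (size_t)(eq - tok) : len;
    size_t i;

    for (i = 0; i < sizeof(toy_known) / sizeof(toy_known[0]); i++)
        if (strlen(toy_known[i]) == nlen && strncmp(toy_known[i], tok, nlen) == 0)
            return 1;
    return 0;
}

/*
 * New-style entry point.  Returns 0 on success, -1 on error.
 * With unk == NULL an unknown mask is an error (the old behaviour);
 * otherwise unknown masks are copied into *unk for the caller to judge.
 */
static int toy_parse_event_mask(const char *event, struct toy_unknown_masks *unk)
{
    const char *p;

    assert(event != NULL);
    if (unk != NULL)
        unk->count = 0;

    p = event + strcspn(event, ":");    /* skip the event name */
    while (*p == ':') {
        size_t len;

        p++;
        len = strcspn(p, ":");
        if (!toy_is_known(p, len)) {
            if (unk == NULL)
                return -1;              /* legacy callers: unknown mask is an error */
            if (unk->count == TOY_MAX_UNKNOWN || len >= TOY_MASK_LEN)
                return -1;              /* table full or mask too long */
            memcpy(unk->mask[unk->count], p, len);
            unk->mask[unk->count][len] = '\0';
            unk->count++;
        }
        p += len;
    }
    return 0;
}

/* Old-style entry point keeps its signature and its strict behaviour. */
static int toy_parse_event(const char *event)
{
    return toy_parse_event_mask(event, NULL);
}
```

In this sketch a caller such as PAPI would pass a table, get "cpu=2" back in it, and decide for itself whether that mask is valid, while any caller still using the old entry point sees the same error as before.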


Attached you will find a patch file that contains the libpfm4 changes that I made (code is always more interesting than descriptions).


I am hoping to persuade you that this code is worth putting into libpfm4 but in either case, I am interested in your views on the topic.


There are still a few things in these patches that I think should be changed to make it more robust but if you are in agreement with this approach, I will gladly adjust it to meet expectations.


I hope I did not bore you too much with details but I thought some of the background to explain why something in this area is needed was important.






perfmon2-devel mailing list






Perfapi-devel mailing list

Heike McCraw
Innovative Computing Laboratory (ICL)
University of Tennessee, Knoxville (UTK)
phone:   +1 865 974 8057
fax:        +1 865 974 8296