Re: [perfmon2] libpfm4 progress
From: stephane e. <er...@go...> - 2009-06-19 17:24:32
Dan,

Here is another libpfm4-related question.

One thing that is quite bad with the current libpfm-3.x is that you pass events and get back PMCs and PMDs, but it is not obvious which register corresponds to which event. The mapping is not always 1-to-1. Take Intel Core: you can pass unhalted_core_cycles and instructions_retired, and they would go into a single PMC and two PMDs.

In general, tools do not really care about PMCs; they get them back and pass them to the kernel (perfmon does not allow reading back of the PMCs). But you need to know which data register to read so you can collect your data. There is an implicit rule inside libpfm which says that the PMDs for events are returned IN THE ORDER of the events. It also assumes a one-to-one mapping: 1 event = 1 data register. This is pretty basic and has worked okay until now.

With libpfm4, everything is represented as an event: not just the actual events but also AMD64 IBS and Intel LBR. Thus, there needs to be a more robust way of mapping events -> registers.

Ideally, you'd want libpfm to return the list of PMCs and PMDs that correspond to each event. For instance:

    struct { pfmlib_reg_t pmcs[]; pfmlib_reg_t pmds[]; };

In some cases, the PMCs of two events could be identical, e.g., Intel Core fixed counters. But the PMDs would usually be distinct.

However, the above proposal is overkill and consumes quite a bit of memory, unless all of this is dynamically allocated. The alternative I am currently experimenting with is to store, for each register returned, the index of the event it belongs to.
Remember that pfm_dispatch_events() is replaced with:

    pfm_assign_events(char **events_argv, pfmlib_assign_in_t *in, pfmlib_assign_out_t *out);

If you pass:

    events_argv[0] = unhalted_core_cycles
    events_argv[1] = instructions_retired

You will get back:

    out->pmds[0].reg_num = 17; out->pmds[0].reg_eventid = 0;
    out->pmds[1].reg_num = 16; out->pmds[1].reg_eventid = 1;

Thus, to find out which register(s) you need to read for unhalted_core_cycles, you have to scan out->pmds[] once, looking for reg_eventid == 0. I know it is not ideal. But this works for multi-PMD events or pseudo-events such as AMD IBS. For instance, IBSOPFETCH: 1 PMC + 3 PMDs.

Note that this scheme does not work too well for PMCs if they are shared by multiple events, such as the fixed counters on Intel Core. But my earlier point was that tools do not really care which PMCs correspond to which event.

There are some alternatives, such as returning a pair of bitmasks per event, one for the PMCs and the other for the PMDs.

I am open to suggestions on how to solve this better.
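
As a rough illustration of the lookup described above, the scan over out->pmds[] and the per-event bitmask alternative could look something like the sketch below. This is not libpfm code. The types sketch_reg_t and sketch_assign_out_t, the pmd_count field, the SKETCH_MAX_PMDS limit, and both helper functions are hypothetical stand-ins invented for the example; only the field names reg_num and reg_eventid come from the message.

```c
#include <stddef.h>
#include <stdint.h>

#define SKETCH_MAX_PMDS 32 /* assumed limit, for illustration only */

/* Hypothetical stand-in for one returned register. The field names
 * follow the message; the type itself is an assumption. */
typedef struct {
    unsigned int reg_num;     /* data register (PMD) to read */
    unsigned int reg_eventid; /* index into events_argv[] */
} sketch_reg_t;

/* Hypothetical stand-in for the output of the proposed call. */
typedef struct {
    sketch_reg_t pmds[SKETCH_MAX_PMDS];
    size_t pmd_count;         /* number of valid entries in pmds[] */
} sketch_assign_out_t;

/* Scan pmds[] once and collect the register numbers tagged with eventid.
 * Returns the number of matches; at most max of them are stored in regs. */
size_t find_pmds_for_event(const sketch_assign_out_t *out,
                           unsigned int eventid,
                           unsigned int *regs, size_t max)
{
    size_t n = 0;

    for (size_t i = 0; i < out->pmd_count; i++) {
        if (out->pmds[i].reg_eventid != eventid)
            continue;
        if (n < max)
            regs[n] = out->pmds[i].reg_num;
        n++;
    }
    return n;
}

/* Bitmask variant: bit r is set when PMD r belongs to eventid.
 * Register numbers of 64 and above do not fit and are skipped. */
uint64_t pmd_mask_for_event(const sketch_assign_out_t *out,
                            unsigned int eventid)
{
    uint64_t mask = 0;

    for (size_t i = 0; i < out->pmd_count; i++) {
        if (out->pmds[i].reg_eventid == eventid &&
            out->pmds[i].reg_num < 64)
            mask |= UINT64_C(1) << out->pmds[i].reg_num;
    }
    return mask;
}
```

With the two-event example from the message, find_pmds_for_event() called with eventid 0 would store register 17, and a multi-PMD pseudo-event would simply yield several matches for the same eventid. The bitmask form avoids the scan at read time, at the cost of a fixed upper bound on register numbers.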