 [perfmon2] Multiplexing counters
From: Leonardo Piga - 2012-04-27 03:52:14

Hello,

I am using libpfm 4.2 to measure about 15 performance counters on an
AMD Barcelona CPU. I am able to collect 5 performance counters without
multiplexing.

However, when multiplexing I am getting "negative" delta values using
the syst tool from perf_examples.

Here is an example:

Sample n
"core" : 0,
       "name" : "perf::PERF_COUNT_HW_CPU_CYCLES",
       "val"  : 131956093,
       "raw"  : 62161538,
       "ena"  : 1000618391,
       "run"  : 471368743,
       "ratio"  : 0.47,
       "delta"  : 131956093

Sample n+1
"core" : 0,
       "name" : "perf::PERF_COUNT_HW_CPU_CYCLES",
       "val"  : 118435137,
       "raw"  : 63447168,
       "ena"  : 2002214687,
       "run"  : 1072611170,
       "ratio"  : 0.54,
       "delta"  : 18446744073696030660

As you can see, Sample_n$val is greater than Sample_n+1$val, which is why
delta is so big (it is actually negative, wrapped around as an unsigned
64-bit value). (The raw value is growing, though.)
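
The numbers above can be reproduced with a small sketch. It assumes the
tool scales as val = raw * ena / run and computes delta in unsigned
64-bit arithmetic; the helper names scaled() and delta_u64() are
illustrative and are not taken from syst.c.

```python
MASK64 = (1 << 64) - 1  # models uint64_t wraparound


def scaled(raw, ena, run):
    """Estimate the full-interval count of a multiplexed counter (assumed formula)."""
    if run == 0:
        return 0
    return raw * ena // run


def delta_u64(curr, prev):
    """Difference of two counts as an unsigned 64-bit value."""
    return (curr - prev) & MASK64


# Values copied from Sample n and Sample n+1 above.
val_n = scaled(62161538, 1000618391, 471368743)
val_n1 = scaled(63447168, 2002214687, 1072611170)
delta = delta_u64(val_n1, val_n)

print(val_n, val_n1, delta)
# A wrapped delta reinterpreted as signed shows the real (negative) difference.
print(delta - (1 << 64))
```

Under these assumptions the scaled value drops between the two samples
even though raw grows, because the ena/run ratio changed.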

So, I think that I am not doing the measurements in the best way.

My questions are:

1) How does libpfm do the multiplexing?

2) What is the best way to multiplex 15 performance counters in a time
window of 1 second? Is there any example available with the library to
do this?

3) What is the purpose of groups (the -g option of the syst tool)? Can
they help with my issue?

--

Leonardo
 Re: [perfmon2] Multiplexing counters
From: Leonardo Piga - 2012-04-28 09:00:49

I figured out what the problem is.

I am not sure whether I should consider it a bug in syst.c or whether I
was just making a wrong assumption about its output.

Anyway, I wrote a document explaining the problem and possible solutions.

If it is actually a bug I can send my patch that samples the actual
scaled value related to the last sample instead of the value related
to the whole measurement set.

The document can be found here:

http://lampiao.lsc.ic.unicamp.br/~piga/misc/libpfmSystInformation.pdf
 Re: [perfmon2] Multiplexing counters
From: stephane eranian - 2012-04-28 17:12:32

Hi,

On Sat, Apr 28, 2012 at 11:00 AM, Leonardo Piga wrote:
> I figured out what the problem is.
>
> I am not sure whether I should consider it a bug in syst.c or whether I
> was just making a wrong assumption about its output.
>
> Anyway, I wrote a document explaining the problem and possible solutions.
>
Something is not quite clear to me based on your paper.
What do you mean by tr(i), tr(k)? Is that the value of time_enable, time_running
at sample t? If so, then I don't understand the sum.
time_enable, time_running represent total time since origin of measurement.
At time t, you get t, at time t+1, you get t+1. But maybe I am not
reading this right.
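
To illustrate the cumulative nature of these fields, here is a minimal
sketch. It assumes the event is read with read_format set to
PERF_FORMAT_TOTAL_TIME_ENABLED | PERF_FORMAT_TOTAL_TIME_RUNNING and
without PERF_FORMAT_GROUP, in which case perf_event_open(2) documents
the read buffer as three u64 fields: value, time_enabled, time_running.
The function decode_read() and the packed test buffers are illustrative
only.

```python
import struct


def decode_read(buf):
    """Decode one non-group perf read() buffer: value, time_enabled, time_running."""
    value, time_enabled, time_running = struct.unpack("=QQQ", buf[:24])
    return {"raw": value, "ena": time_enabled, "run": time_running}


# Simulated buffers built from the two samples earlier in this thread.
buf_n = struct.pack("=QQQ", 62161538, 1000618391, 471368743)
buf_n1 = struct.pack("=QQQ", 63447168, 2002214687, 1072611170)

s_n = decode_read(buf_n)
s_n1 = decode_read(buf_n1)

# All three fields count from the origin of the measurement, so they
# never decrease from one read to the next.
monotonic = all(s_n1[k] >= s_n[k] for k in ("raw", "ena", "run"))
print(s_n, s_n1, monotonic)
```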
 Re: [perfmon2] Multiplexing counters
From: Leonardo Piga - 2012-04-28 17:40:20

Hi,

tr(i) is the value that I would get for "run" if the counters were
restarted after printing the previous sample (i-1).

Likewise, v(i) is the value that I would get for the "raw" counter if
the counters were restarted after printing sample (i-1).

The same applies to te(i), but for the "ena" value.

Pv(i), Ptr(i), Pte(i) are the values actually reported by the tool for
these quantities, without resetting.

Thus, at the first sample(i=1) we have: Pv(1) = v(1); Ptr(1) = tr(1);
Pte(1)=te(1)

At the second sample (i=2), since the counters are not being reset,
the values are accumulated, thus we have:
Pv(2)=v(1)+v(2)
Ptr(2)=tr(1)+tr(2)
Pte(2)=te(1)+te(2)

Where tr(2) is the value that we would get in the field corresponding
to "run" if we had reset the counter after printing sample 1.

The same applies for v(2) and te(2).

At the third sample (i=3):

Pv(3)=v(1)+v(2)+v(3)
Ptr(3)=tr(1)+tr(2)+tr(3)
Pte(3)=te(1)+te(2)+te(3)

And so on for higher values of i.

We are interested in v(i), tr(i), and te(i). These numbers should be
used to scale and estimate the actual value of the counter at sample i,
rather than the cumulative Pv(i), Ptr(i), and Pte(i) values that the
tool currently uses.
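
A minimal sketch of this per-interval scaling, assuming the estimate
for sample i is v(i) * te(i) / tr(i), with each term recovered by
differencing consecutive cumulative readings. The function name
interval_estimates() is illustrative; this is not the actual patch.

```python
def interval_estimates(samples):
    """samples: cumulative (Pv, Pte, Ptr) tuples, one per sample, in order.

    Returns one scaled estimate per sample interval.
    """
    estimates = []
    prev_v = prev_te = prev_tr = 0
    for pv, pte, ptr in samples:
        v = pv - prev_v        # v(i)  = Pv(i)  - Pv(i-1)
        te = pte - prev_te     # te(i) = Pte(i) - Pte(i-1)
        tr = ptr - prev_tr     # tr(i) = Ptr(i) - Ptr(i-1)
        # If the event never ran in this interval, no estimate is possible.
        estimates.append(v * te // tr if tr else 0)
        prev_v, prev_te, prev_tr = pv, pte, ptr
    return estimates


# (raw, ena, run) from Sample n and Sample n+1 earlier in this thread.
cumulative = [
    (62161538, 1000618391, 471368743),
    (63447168, 2002214687, 1072611170),
]
est = interval_estimates(cumulative)
print(est)
```

As long as raw does not decrease, every estimate is non-negative, so
the wrapped "negative" delta cannot occur with this scheme.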

Is it clearer now?

Leonardo
 Re: [perfmon2] Multiplexing counters
From: stephane eranian - 2012-05-10 09:33:52

Hi,

Please send me your patch and I'll look at it next week.

Thanks.
 Re: [perfmon2] Multiplexing counters
From: Drongowski, Paul - 2012-04-30 13:29:38

Hello Leonardo --

Barcelona (Family 10h) has only four physical performance counters.
Thus, some other strategy such as multiplexing is required when
collecting 5 performance events during a run on Barcelona.

As an aside, if people are using Family 15h, 15h has six core
performance counters and four Northbridge performance counters.
Even though there are six core counters, event assignment is
restricted on Family 15h. Thus, you will not always be able to
measure six events in a single run without multiplexing. The
Family 15h BKDG has the details about event-to-counter assignment
and restrictions.
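
As a rough illustration of what this means for 15 events, the sketch
below splits the events into sets that fit the four counters and
computes the share of a 1-second window each set would get under ideal
round-robin rotation. It ignores event-to-counter constraints and the
actual kernel scheduling; the event names and plan_groups() are
placeholders.

```python
def plan_groups(events, num_counters):
    """Split events into consecutive sets of at most num_counters events."""
    return [events[i:i + num_counters]
            for i in range(0, len(events), num_counters)]


events = [f"EVENT_{i}" for i in range(15)]   # placeholder event names
groups = plan_groups(events, 4)              # four counters on Family 10h

window_ms = 1000
share_ms = window_ms / len(groups)           # ideal time slice per set

for n, g in enumerate(groups):
    print(f"set {n}: {len(g)} events, ~{share_ms:.0f} ms of each second")
```

With 15 events and four counters this gives four sets, so each event
would be counted for roughly a quarter of the window and scaled up
accordingly.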

-- pj