Re: [perfmon2] PEBS support update
From: Carole Wu <cw...@gm...> - 2009-11-01 13:58:23
Got it. Thanks very much for the explanation.

--Carole

On Sun, Nov 1, 2009 at 7:06 AM, stephane eranian <er...@go...> wrote:
> On Sun, Nov 1, 2009 at 4:37 AM, Carole Wu <cw...@gm...> wrote:
> > Hello,
> >
> >   pfmon --smpl-module=pebs-ll -e MEM_INST_RETIRED:LATENCY_ABOVE_THRESHOLD \
> >     --ld-lat-threshold=4 --long-smpl-periods=2000 --smpl-compact --with-header ...
> >
> > This should generate a trace of samples, where each sample represents the
> > 2000th long-latency (>4 cycles) memory operation. 2000 x (the number of
> > samples in the trace) should equal the number of LAST_LEVEL_CACHE_MISSES
> > collected with perfmon2's "counting" capability, e.g. pfmon -e
> > LAST_LEVEL_CACHE_MISSES. Is this right?
>
> No.
>
> First of all, this is not tracing but statistical sampling. You are
> going to lose some misses. There are shadowing effects. In order to
> complete a sample, you need the latency. Until it is collected, no other
> loads can be sampled even if they exceed the threshold, i.e., PEBS can
> only track one load at a time. When the PEBS buffer fills up, monitoring
> stops but execution continues, so you miss some more loads there.
>
> Note that this discrepancy is not specific to PEBS; you get the same
> behavior with AMD IBS or Itanium D-EAR. But the idea is that if you run
> for long enough, you will eventually get enough representative samples
> to approximate a trace.
>
> > I am seeing mismatching numbers for counts collected with PEBS and the
> > counting feature in perfmon2 for my Nehalem machine.
> > Your help is appreciated. Thanks,
> > Carole
> >
> > On Wed, Oct 28, 2009 at 4:36 AM, stephane eranian <er...@go...> wrote:
> >>
> >> Hi,
> >>
> >> I am happy to report that I have now uploaded all the code necessary to
> >> use PEBS on Intel Core, Atom, and Nehalem. That includes PEBS-LL on
> >> Nehalem, which is used to sample where cache misses occur.
> >>
> >> What you need:
> >> - latest libpfm sources from CVS
> >> - latest pfmon sources from CVS
> >> - perfmon2 2.6.30 from GIT
> >>
> >>   git clone git://git.kernel.org/pub/scm/linux/kernel/git/eranian/linux-2.6.git
> >>
> >>   Make sure you enable 'Unified PEBS'.
> >>
> >> This kernel includes a unified PEBS sampling format which supports
> >> Netburst, Core, Atom, and Nehalem. You must insert the module
> >> perfmon_pebs_smpl (or compile in the code).
> >>
> >> Next, to use PEBS, you can simply do:
> >>
> >>   pfmon --smpl-module=pebs --smpl-compact --with-header -einst_retired:any_p \
> >>     --long-smpl-period=2400000 ...
> >>
> >> Not all events support PEBS. In --smpl-compact mode, each line
> >> contains a PEBS sample.
> >>
> >> To collect cache misses on Nehalem, you can do:
> >>
> >>   pfmon --smpl-module=pebs-ll -e MEM_INST_RETIRED:LATENCY_ABOVE_THRESHOLD \
> >>     --ld-lat-threshold=4 --long-smpl-periods=2000 --smpl-compact --with-header ...
> >>
> >> You must use the MEM_INST_RETIRED:LATENCY_ABOVE_THRESHOLD event to
> >> activate this HW feature.
> >>
> >> Each line contains a PEBS record, including the cache miss
> >> information. The ld-lat parameter is the minimal threshold for the
> >> miss latency. Only misses >= threshold are captured. It must be at
> >> least 4; 4 cycles is the L1D hit latency. For each captured miss, you
> >> get an instruction addr, data addr, miss latency, and source of the
> >> data (where it came from; refer to the Intel documentation).
> >> It is important to understand that the instruction addr does NOT point
> >> to the load instruction but ALWAYS to the next dynamic instruction,
> >> i.e., the whole state is recorded at retirement of the load.
> >>
> >>
> >> On Mon, Oct 5, 2009 at 1:55 AM, Carole Wu <cw...@gm...> wrote:
> >> > Hello,
> >> >
> >> > I'd like to collect information about my workload, running on Nehalem,
> >> > using PEBS, so I use the following command.
> >> >
> >> >>> pfmon -e MEM_INST_RETIRED:LATENCY_ABOVE_THRESHOLD --ld-lat-threshold=1
> >> >>> --long-smpl-periods=2000 --short-smpl-periods=200 ./mcf_base inp.in
> >> >>> load latency threshold not yet supported
> >> >
> >> > However, the response seems to suggest that my machine does not
> >> > currently support PEBS? Is it true, or am I not setting parameters
> >> > correctly?
> >> >
> >> > Any help is greatly appreciated.
> >> >
> >> > Carole
> >> >
> >> > _______________________________________________
> >> > perfmon2-devel mailing list
> >> > per...@li...
> >> > https://lists.sourceforge.net/lists/listinfo/perfmon2-devel
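
The shadowing effect described in the thread (PEBS tracks one load at a time, so loads that arrive while a sample is being completed are missed) can be illustrated with a small toy simulation. This is only a sketch under stated assumptions, not a model of the real hardware or of pfmon: the function name `sample_with_shadowing` and the `shadow` parameter (how many qualifying loads are missed after each sample) are hypothetical, and the numbers are made up for illustration.

```python
def sample_with_shadowing(total_events, period, shadow):
    """Toy model of period-based sampling with a blind window.

    total_events: qualifying loads that really occurred
    period:       one sample is taken every `period` counted loads
    shadow:       hypothetical number of qualifying loads that go
                  uncounted after each sample (the "shadow")
    Returns the number of samples collected.
    """
    samples = 0
    counted = 0   # loads counted toward the next sample
    skip = 0      # loads still hidden in the current shadow
    for _ in range(total_events):
        if skip > 0:
            skip -= 1          # this load is never seen by the sampler
            continue
        counted += 1
        if counted == period:
            samples += 1       # the period-th counted load becomes a sample
            counted = 0
            skip = shadow      # start a new blind window
    return samples


if __name__ == "__main__":
    total = 1_000_000
    period = 2000
    for shadow in (0, 100):
        n = sample_with_shadowing(total, period, shadow)
        estimate = n * period
        print(f"shadow={shadow}: samples={n}, "
              f"period*samples={estimate}, true count={total}")
```

With no shadow, period x samples matches the true count up to rounding; with any shadow, the product underestimates it, which is the kind of mismatch against a plain counting run that the thread discusses.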