From: David G. <da...@gi...> - 2005-04-05 04:29:11
On Mon, Apr 04, 2005 at 04:50:30AM -0700, Stephane Eranian wrote:
> David,
>
> > > Now I see several problems if we make that move. The way tools
> > > typically work is that they say "I want to measure event X,Y,Z".
> > > Using a support library, they come up with the correct event to
> > > counter assignment (Event -> PMC). The PMU configuration registers
> > > (perfsel, PMC, ..) are written. The reason the perfmon flags were
> > > also set up when writing PMC is because they are part of the
> > > configuration as well. The counter register itself is not
> > > necessarily accessed, if default value is good enough. Hence a
> > > PFM_WRITE_PMDS was not required. If we move to your approach, the
> > > call would become necessary. Now, it is true that no matter what a
> > > tool needs to know which PMD (perfctr, PMD) are associated with the
> > > PMC (perfsel, PMC) used for the measurement. At a minimum, this is
> > > needed for reading out the results. A portable tool cannot assume
> > > that PMCx corresponds to PMDx. In fact, on PPC and also P4, it seems
> > > you need multiple PMC to set up a single counter.
> >
> > And conversely setting one PMC can affect multiple PMDs...
> >
> That's true even on Itanium. Take the Branch Trace Buffer for instance.
>
> > > Note that if we move all flags to PFM_WRITE_PMDS, that would also move
> > > the following other fields: eventid, smpl_pmds[], and reset_pmds[]. Another
> > > side effect of this is that the data structure passed to PFM_WRITE_PMDS will
> > > grow in size thereby making the call less efficient (think copy_user).
> >
> > It might be worth splitting, say, PFM_CONFIGURE_PMDS, which would set
> > the flags and reset values and so forth, away from PFM_WRITE_PMDS
> > which would write the actual values to the PMDs. Presumably one would
> > (usually) only need to call CONFIGURE_PMDS once, so that would remove
> > the overhead of the larger structures from WRITE_PMDS calls used to
> > update the values.
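
To make the suggested split concrete, here is a rough sketch in C. Everything in it is hypothetical: the struct names, the field layout, the `sketch_` functions and the in-memory `struct ctx_sketch` are made up for illustration and are not the actual perfmon ABI. It only shows the idea that a large, rarely passed "configure" record can carry the flags, reset value and bitmaps, while the frequently passed "write" record carries nothing but a register number and a value.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define SKETCH_MAX_PMDS     256                    /* limit discussed in this thread */
#define SKETCH_BITMAP_WORDS (SKETCH_MAX_PMDS / 64) /* 64 bits per bitmap word */

/* Hypothetical "configure" record: large, but normally passed once. */
struct pmd_config_sketch {
    uint16_t reg_num;
    uint32_t flags;
    uint64_t reset_value;
    uint64_t smpl_pmds[SKETCH_BITMAP_WORDS];
    uint64_t reset_pmds[SKETCH_BITMAP_WORDS];
};

/* Hypothetical "write" record: small, passed on every value update. */
struct pmd_value_sketch {
    uint16_t reg_num;
    uint64_t value;
};

/* Software state only; no PMU hardware is touched in this sketch. */
struct ctx_sketch {
    struct pmd_config_sketch cfg[SKETCH_MAX_PMDS];
    uint64_t value[SKETCH_MAX_PMDS];
};

/* Store the configuration for each requested PMD.
 * Returns 0 on success, -1 on the first out-of-range register
 * (entries before the bad one have already been stored). */
int sketch_configure_pmds(struct ctx_sketch *ctx,
                          const struct pmd_config_sketch *req, size_t n)
{
    assert(ctx != NULL && req != NULL);
    for (size_t i = 0; i < n; i++) {
        if (req[i].reg_num >= SKETCH_MAX_PMDS)
            return -1;
        ctx->cfg[req[i].reg_num] = req[i];
    }
    return 0;
}

/* Update only the values; the stored configuration is left alone.
 * Same return convention as sketch_configure_pmds(). */
int sketch_write_pmds(struct ctx_sketch *ctx,
                      const struct pmd_value_sketch *req, size_t n)
{
    assert(ctx != NULL && req != NULL);
    for (size_t i = 0; i < n; i++) {
        if (req[i].reg_num >= SKETCH_MAX_PMDS)
            return -1;
        ctx->value[req[i].reg_num] = req[i].value;
    }
    return 0;
}
```

A tool would call something like `sketch_configure_pmds()` once per context and then `sketch_write_pmds()` as often as it needs to update values, so only the small record is copied on the frequent path.
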
> >
> I have made the change now and the impact does not appear to be that big
> at least on Itanium.

Ok.

> > I tend to agree, so I think a single universal limit probably makes
> > more sense. 256 seems like a fairly reasonable choice.
>
> So what about we fix it to 256 for actual PMU hardware registers. Then
> anything above this is either other hardware registers or software state.
> To make things nicely aligned, we could add up to 64 bits on top of the 256.
> If you look at the new document, you'll see that this is how I do this
> for Itanium. For PMC, I have 0-256 reserved for actual PMC, 256-272 is for
> IBR and DBR (debug registers). For PMD, I have 0-256 for actual PMD registers.
> Would that fit the PowerPC model? It seems PowerPC, like P4/Xeon, does not
> use indexed registers for PMU. Hence a somewhat more complicated mapping
> must be found.

Ok, so the restriction would be that only things below 256 could be
triggered for the reset bitmaps and so forth, but there could be
things numbered above that?

> > > Another factor to consider here. The limit we use does not necessarily
> > > reflect the actual number of PMU registers. If I used 64 for the limit,
> > > that does not necessarily translate into 64 PMC registers. Hardware designers
> > > sometimes introduce holes in the namespace of registers because of wiring
> > > constraints (just a guess). Yet it would be quite costly for software to
> > > try and skip those holes. Hence the bitmask may have holes in it.
> >
> > Hrm... I doubt it would really be all that costly to pack the holes,
> > especially when we have to check for software/virtualized PMDs in
> > there as well. Of course, I am biased by ppc where we need to use
> > switch statements to access the registers, even though they are
> > contiguous (the special purpose register number is part of the
> > instruction opcode, and can't be given indirectly).
> >
> Oops, that's yet another difficulty...

It's not that big a deal.
There aren't that many registers, so a switch isn't too bad.

> > Sure, I can see the appeal of this. But is there a compelling reason
> > we need to support this way of doing it, rather than:
> >
> > for (i=0; i<N; i++) {
> >     c = CREATE/ATTACH;
> >     WRITE_PMCS(c);
> >     WRITE_PMDS(c);
> >     PFM_START(c);
> >     <wait for monitored stuff to happen>
> >     PFM_STOP(c);
> > }
> >
> >
> > > As long as the context is not attached, no actual PMU hardware is touched.
> > > But you can still program the context (i.e., PMU software state). This becomes
> > > interesting for batching context setup. You can imagine a tool that monitors
> > > across fork/pthread_create. Creating and setting up a context can be quite costly.
> > > You can prepare the work and then dynamically on a fork/pthread_create event
> > > you simply have to attach and start and let go. We do have tools that
> > > follow across fork and provide per-thread measurement.
> >
> > Hmm.. ok. I think you've pretty much convinced me.
> >
> Excellent.
>
> > > This is interesting because each sublist may be used to measure
> > > a certain metric. Again, you can prepare all the sublists in advance.
> > > Then you attach, point to a sublist and start (PFM_START). After
> > > a while, you stop, and restart on another sublist. This saves you
> > > the reprogramming of the sets, you can therefore alternate between
> > > sublists much faster. This is kind of an advanced feature, for most
> > > tools the basic ordering will be just fine.
> > > Checking the validity of the explicit next set when the set
> > > is created/modified would impose a programming order for the
> > > tool, i.e., you could not point to a set that does not already exist.
> > > At the time, I thought this could be better checked at PFM_LOAD time
> > > where you know you have the whole setup.
> >
> > We seem to be talking at cross purposes here.
> > I'm not suggesting any change to the features of event sets once
> > they're established, or at what time they can be set up. I'm just
> > suggesting that it might be simpler to require all the event sets
> > to be created simultaneously, rather than allowing individual event
> > sets, or subsets of the event sets, to be created and deleted at will.
> >
> But that's not the natural way the interface is designed. It does
> not mandate that you issue only a single PFM_WRITE_PMDS or PFM_WRITE_PMCS.
> So why should it be different for PFM_CREATE_EVTSETS? In fact, it may make
> it easier on applications which are highly modularized where each module
> contributes its part of the measurement without knowing about the others.
> This allows for incremental updates.

I guess. It's just that I can see compelling reasons why incremental
WRITE_PMDS and WRITE_PMCS are useful, but the same is not true for
CREATE_EVTSETS.

-- 
David Gibson                    | I'll have my music baroque, and my code
david AT gibson.dropbear.id.au  | minimalist, thank you.  NOT _the_ _other_
                                | _way_ _around_!
http://www.ozlabs.org/people/dgibson