From: Eric V V. h. <eri...@us...> - 2004-02-19 21:23:13
|
>> I just started working with oprofile (actually working on PowerPC
>> performance counter support for Power4 variants including GP/UL) and was
>> wondering if there was ever any discussion of extending oprofile to sample
>> hardware performance counters on context switches? (first step towards
>> recording performance counters per process more accurately).
>
>humm, oprofile is designed as a system-wide profiler, this is a bit
>against this design and afaics you can't use both interrupt on overflow
>and count event by task.
>

I'm fine with the system-wide aspect of oprofile, I just want to be able to accurately charge performance counters to every task instead of getting sampled results. We are using these results for validation of simulation and want to try and limit noise as much as possible - each task will have its own set of performance counters which will be swapped in/out with the rest of its context. Interrupts get their own buckets for performance counters.

I don't see how this conflicts with using overflow interrupts, but I've just started looking into stuff.

>> On a separate note, the architectures I'm targeting only use performance
>> counter groups (you select all 8 counters as a group instead of
>> individually).
>
>can all these counters generate an interrupt on overflow ?
>

I believe so, yes.

>How can you specify reset count individually ?

Again, I believe so.

>
>An url for documentation ?
>

Unfortunately, double secret proprietary - although Apple's documentation of the G5 performance tools might be a start. The intent would be to push the resulting code through our internal open-source clearance process, and then hopefully into the PPC64 Linux kernel patches. Unfortunately, I'm not yet aware of what exactly will be permitted to be open-sourced (ie. not all performance counters may be publicly available).

>> I was wondering if there was any sort of existing
>> methodology about how to configure events as groups instead of as
>> individual counters.
>
>Not for the moment, I think this must be solved in userspace, we planned
>to add some sort of event name aliasing, the goal is to provide consistent
>event names across different archs for some commonly used events (branch/L1
>miss etc). This can probably be extended to map an event name to more than
>one counter but how can the counters get their reset count on overflow ?
>
>event aliases is a 0.9 thing in our TODO file

Perhaps I misrepresented things in my first message: you can individually configure events, but only certain events can be used at the same time. This results in event groups which are made up of various "valid" combinations of events and particular counters. The stuff seems very complicated, so I'd like to keep determining valid combinations out of the kernel and solve it in userspace.

I would prefer to solve stuff in user space by having a tool which helps users select the correct group (or groups) and then have the ability to program the performance config registers at the toplevel of the filesystem.

Thanks for your help and comments.

-eric |
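A rough sketch of the per-task accounting idea from this message, written as a small user-space simulation rather than kernel code. Everything here is an assumption made for illustration: the `PerTaskAccounting` class, the `read_counters` callable, and the 32-bit counter width are not taken from oprofile or from any PowerPC documentation. On every switch, the delta since the previous switch is charged to the context that was running, and interrupts are treated as just another context with their own bucket.

```python
from collections import defaultdict

NUM_COUNTERS = 8    # the thread mentions groups of 8 counters
COUNTER_BITS = 32   # assumed counter width, used only to handle wrap-around


class PerTaskAccounting:
    """Charge hardware counter deltas to whichever context was running."""

    def __init__(self, read_counters):
        # read_counters is a hypothetical callable returning the current
        # raw values of all counters as a list of ints.
        self.read_counters = read_counters
        self.totals = defaultdict(lambda: [0] * NUM_COUNTERS)
        self.current = None
        self.snapshot = read_counters()

    def switch_to(self, context):
        """Call on every context switch and on interrupt entry/exit."""
        now = self.read_counters()
        if self.current is not None:
            bucket = self.totals[self.current]
            for i in range(NUM_COUNTERS):
                # Modular subtraction keeps the delta correct across one wrap.
                bucket[i] += (now[i] - self.snapshot[i]) % (1 << COUNTER_BITS)
        self.current = context
        self.snapshot = now
```

This ignores the interaction with overflow interrupts raised later in the thread; it only shows the bookkeeping.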
From: Eric V V. h. <eri...@us...> - 2004-02-19 23:01:13
|
>>
>> I don't see how this conflicts with using overflow interrupts, but I've
>> just started looking into stuff.

>possibly it'll work on powerpc, the problem comes from some architectures
>where we need to re-write the counter value in the interrupt handler and
>this write is not atomic so your read in the scheduler can read a
>semi-updated value, on those architectures we can't sleep in the interrupt
>handler so using a spinlock is not a solution.

I understand where this can cause problems now - I guess I'll try getting this working on PPC first and see how it maps (or fails to map) to other architectures.

> I found the power4 reference manual, is it a roughly correct base ?

Yeah, that should cover the architectures I'm interested in.

>> Perhaps I misrepresented things in my first message, you can individually
>> configure events, but only certain events can be used at the same time.
>> This results in event groups which are made up of various "valid"
>> combinations of events and particular counters. The stuff seems very
>> complicated, so I'd like to keep determining valid combinations out of
>> the kernel and solve it in userspace.

>ha ok, we have a sort of mechanism already in place, basically the events
>description file is in the form ($oprofile_dir/events/*) :
>EVENT_NAME_A:#event_nr counters: 0, 1, 2, 3 other field describing the event.
>EVENT_NAME_B:#event_nr counters: 0, 3

Yeah, that helps to some degree but there are dozens of groups with all sorts of crazy combinations, and I really didn't want to bloat the driver with the necessary information to cross-check valid combinations. I'm going to start with a user-space wrapper which handles validating the configuration and sets the appropriate configuration registers through some arch-specific files at the oprofilefs root. Once this is working I can look into something more

I'm a little confused about how unit-masks are used still, but I probably just need to go re-read the documentation.
> Do you plan to use userspace tools too (opreport etc.) ?

That was the plan, not sure if I'll need to augment any to accommodate the extensions I wanted. With a wrapper for opcontrol I should be able to use sampling as-is. There's just the question of whether I'd need a new interface for my context-specific perf counters - I figured I'd be able to track samples the same way, so things should "just work" - might need an extension for things like interrupt handlers, etc.

-eric |
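The user-space validation step described above could look roughly like the following. This is only a sketch under stated assumptions: the group table, the event names, and both helper functions are invented for illustration (the real group definitions are described in the thread as proprietary), and four counters per group are used instead of eight to keep it short. The position of an event in a group's tuple is taken to be its counter number.

```python
# Hypothetical group table: group id -> event carried by each counter.
# Names and contents are made up; the real tables are not public.
GROUPS = {
    0: ("CYCLES", "INSTR_COMPLETED", "L1_DMISS", "BR_MISPRED"),
    1: ("CYCLES", "INSTR_COMPLETED", "LD_REF", "ST_REF"),
    2: ("L1_DMISS", "LD_REF", "CYCLES", "TLB_MISS"),
}


def find_groups(requested, groups=GROUPS):
    """Return the ids of all groups that contain every requested event."""
    wanted = set(requested)
    return sorted(gid for gid, events in groups.items()
                  if wanted <= set(events))


def counter_map(group_id, requested, groups=GROUPS):
    """Map each requested event to the counter it occupies in the group."""
    events = groups[group_id]
    return {name: events.index(name) for name in requested}
```

A wrapper around opcontrol would call something like `find_groups()` first, reject the request if the list is empty, and only then write the chosen group to the arch-specific files.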
From: Philippe E. <ph...@wa...> - 2004-02-19 23:45:46
|
On Thu, 19 Feb 2004 at 17:55 +0000, Eric V Van hensbergen wrote:

> > Do you plan to use userspace tools too (opreport etc.) ?
>
> That was the plan, not sure if I'll need to augment any to accommodate
> the extensions I wanted. With a wrapper for opcontrol I should be able
> to use sampling as-is. There's just the question of whether I'd need a
> new interface for my context-specific perf counters - I figured I'd be
> able to track samples the same way, so things should "just work" - might
> need an extension for things like interrupt handlers, etc.

The problem comes from the sample filename scheme, we encode in it the event number and unit mask, the daemon differentiates the source of an event from the counter number, then builds the filename from what it sees, ie:

"application/name/event_name.#count.#unit-mask.#cpu.#tid.#tgid"

if the settings of two counters differ only by the invisible parameter passed directly to the driver it'll not work. Opreport too needs to be able to distinguish these distinct events. Extending our handling of the sample filename scheme can be difficult.

--
Phil |
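To make the collision concrete, here is a small sketch that builds a name following the pattern quoted above. The `CounterSetup` type, the `hidden_param` field, and the `sample_filename()` helper are hypothetical and are not oprofile code; the point is only that a parameter which never appears in the name cannot separate two sample files.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CounterSetup:
    event_name: str
    count: int             # reset count
    unit_mask: int
    hidden_param: int = 0  # hypothetical value passed straight to the driver


def sample_filename(app, setup, cpu, tid, tgid):
    """Build a name in the event_name.count.unit-mask.cpu.tid.tgid pattern."""
    # hidden_param is deliberately absent: the scheme has no field for it.
    return (f"{app}/{setup.event_name}.{setup.count}."
            f"{setup.unit_mask}.{cpu}.{tid}.{tgid}")
```

Two setups that differ only in `hidden_param` compare unequal, yet `sample_filename()` returns the same string for both, so their samples would end up in one file.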
From: Eric V V. h. <eri...@us...> - 2004-02-20 01:18:28
|
>The problem comes from the sample filename scheme, we encode in it
>the event number and unit mask, the daemon differentiates the source
>of an event from the counter number, then builds the filename from
>what it sees, ie:
>
>"application/name/event_name.#count.#unit-mask.#cpu.#tid.#tgid"
>
>if the settings of two counters differ only by the invisible
>parameter passed directly to the driver it'll not work.

I think I understand. I was going to have the wrapper set per-counter event information as well, but the settings of those event files/fields are for informational purposes only (they wouldn't trigger the configuration of the perf counter group - that would be done via the invisible parameter(s)).

-eric |
From: Philippe E. <ph...@wa...> - 2004-02-19 22:31:30
|
On Thu, 19 Feb 2004 at 16:17 +0000, Eric V Van hensbergen wrote:

> I'm fine with the system-wide aspect of oprofile, I just want to be able
> to accurately charge performance counters to every task instead of getting
> sampled results. We are using these results for validation of simulation
> and want to try and limit noise as much as possible - each task will have
> its own set of performance counters which will be swapped in/out with the
> rest of their context. Interrupts get their own buckets for performance
> counters.
>
> I don't see how this conflicts with using overflow interrupts, but I've
> just started looking into stuff.

possibly it'll work on powerpc, the problem comes from some architectures where we need to re-write the counter value in the interrupt handler and this write is not atomic so your read in the scheduler can read a semi-updated value, on those architectures we can't sleep in the interrupt handler so using a spinlock is not a solution.

> >An url for documentation ?
> >
>
> Unfortunately, double secret proprietary - although Apple's documentation
> of the G5 performance tools might be a start. The intent would be to push
> the resulting code through our internal open-source clearance process, and
> then hopefully into the PPC64 Linux kernel patches. Unfortunately, I'm not
> yet aware of what exactly will be permitted to be open-sourced (ie. not all
> performance counters may be publicly available).

I found the power4 reference manual, is it a roughly correct base ?

> Perhaps I misrepresented things in my first message, you can individually
> configure events, but only certain events can be used at the same time.
> This results in event groups which are made up of various "valid"
> combinations of events and particular counters. The stuff seems very
> complicated, so I'd like to keep determining valid combinations out of
> the kernel and solve it in userspace.
ha ok, we have a sort of mechanism already in place, basically the events description file is in the form ($oprofile_dir/events/*) :

EVENT_NAME_A:#event_nr counters: 0, 1, 2, 3 other field describing the event.
EVENT_NAME_B:#event_nr counters: 0, 3

then from this description we allocate the events to specific counter numbers (iow you don't choose the counter nr where the event is mapped). If allocation fails we flame the user. The event_nr is defined as you want, in some simple implementations it's just the event nr which must be set in one control register, on more complex implementations we use it as an index into a table etc. The unit mask also allows passing arbitrary data to the driver. Unit mask is restricted to 16 bits, I *think* event nr is restricted to 32 bits. We can probably fix unit mask size to 32 bits but not w/o changing the samples file format.

> I would prefer to solve stuff in user space by having a tool which helps
> users select the correct group (or groups) and then have the ability to
> program the performance config registers at the toplevel of the
> filesystem.

Yes, if the above mechanism is not sufficient you probably need to wrap our opcontrol script, add some sysctl specific to your implementation and pass additional parameters to the driver through these files, e.g. if your implementation allows burst mode counting it's unlikely you can fit it in our model.

Do you plan to use userspace tools too (opreport etc.) ?

regards,
Phil |
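The allocation step described above (each event lists the counters it may use, the tool picks the counter numbers, and an impossible request is rejected) can be sketched as a small backtracking search. This is not oprofile's allocator; `allocate_counters()` and its input shape are assumptions made for illustration, using the two example events from the message.

```python
def allocate_counters(allowed):
    """Assign each event a distinct counter from its allowed list.

    allowed: dict mapping event name -> list of counter numbers it may use.
    Returns a dict event name -> counter, or None if no assignment exists.
    """
    # Place the most constrained events first to cut down on backtracking.
    names = sorted(allowed, key=lambda name: len(allowed[name]))
    assignment = {}
    used = set()

    def place(i):
        if i == len(names):
            return True
        name = names[i]
        for ctr in sorted(allowed[name]):
            if ctr in used:
                continue
            used.add(ctr)
            assignment[name] = ctr
            if place(i + 1):
                return True
            # Undo the choice and try the next allowed counter.
            used.discard(ctr)
            del assignment[name]
        return False

    return dict(assignment) if place(0) else None
```

With the example above, `allocate_counters({"EVENT_NAME_A": [0, 1, 2, 3], "EVENT_NAME_B": [0, 3]})` places `EVENT_NAME_B` on counter 0 and `EVENT_NAME_A` on counter 1; a `None` result is the case where the user gets flamed.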