From: MONTGOMERY,BOB (HP-FtCollins,ex1) <bob...@hp...> - 2002-05-28 23:23:52
|
Well, we're definitely thinking about it and messing with it here. I have a hacked up version of the oprofile module working enough that we can work on our prospect tool (prospect.sourceforge.net). I have not looked at any of the 64-bit issues for the rest of the oprofile directory. My system call interception stuff seems to be working OK. Prospect samples with CPU_CYCLES at the moment, so time-based profiles are working.

This might be a good time to start a slightly higher level discussion about lower level things :-), with the hope that some other folks interested in IA64 might get involved.

Here are some of the issues:

The IA64 Performance Monitor stuff is quite a bit more involved than any of the current IA32 implementations.

- There are many counters and sub-counters
- CPU_CYCLES and IA64_INST_RETIRED are "architected" as part of the family, but the other 100s of events are processor dependent
- There are 4 counters but lots of rules (some events can only be counted on some of the counters, some events can't be counted at the same time as other events)
- There are lots of other features (instruction masks, address masks, precise address capture for some events), and lots of extra registers to set up to get them.

The IA64 Linux kernel has support for accessing the PMU in arch/ia64/perfmon.c. A traditional "steal the interrupt and go" implementation will collide with that kernel support, allowing two non-cooperating tools to interfere with each other's data collection.

PMU control is in the processor status register, so you don't just turn it on and off globally, you have to deal with setting it in all the tasks that you're profiling (with some help from the kernel, especially in SMP mode).

So I have it limping along. It currently requires 2.4.18 and only works on SMP kernels. And with that prototype in hand, we're starting to investigate an alternative proposed by Stephane Eranian, the maintainer and designer of the IA64 Linux perfmon support. He proposes that oprofile's module handle only the system call interception trace and the directory path table, and that the user-level daemons (like oprofile's daemon and prospect) get their PMU sampling through his kernel API for perfmon. This keeps all the rules, processor dependencies, and low-level PMU management in either the kernel or Stephane's user-level library, and allows the kernel to manage multiple users of the PMU (oprofile would request global control, effectively locking out other tools while it runs, but the kernel's perfmon stuff can manage the counters per process if desired.)

Thoughts?

Bob Montgomery, HP

> -----Original Message-----
> From: John Levon [mailto:le...@mo...]
> Subject: Re: IA64
>
> On Mon, May 27, 2002 at 01:49:32PM +0200, johan at appeal.se wrote:
>
> > Is anybody working on porting OProfile to IA64? If so,
> > how's it going?
>
> I can't say what people are planning, but as far as I know there's no
> active effort going on at the moment...
>
> regards
> john
|
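The counter rules Bob lists (4 counters, some events restricted to certain counters, some events mutually exclusive) amount to a small constraint problem. The following is only a rough sketch of that idea, not the perfmon or oprofile implementation: the constraint tables and the HYPOTHETICAL_EVENT_* names are invented for illustration, and only CPU_CYCLES and IA64_INST_RETIRED come from the message above.

```python
from typing import Dict, FrozenSet, List, Optional, Set

# Assumed constraint table: which of the 4 counters each event may use.
# The restrictions shown here are made up; real ones are processor dependent.
ALLOWED_COUNTERS: Dict[str, FrozenSet[int]] = {
    "CPU_CYCLES": frozenset({0, 1, 2, 3}),
    "IA64_INST_RETIRED": frozenset({0, 1, 2, 3}),
    "HYPOTHETICAL_EVENT_A": frozenset({0, 1}),
    "HYPOTHETICAL_EVENT_B": frozenset({0}),
    "HYPOTHETICAL_EVENT_C": frozenset({2, 3}),
}

# Assumed pairs of events that cannot be counted at the same time.
CONFLICTS: Set[FrozenSet[str]] = {
    frozenset({"HYPOTHETICAL_EVENT_A", "HYPOTHETICAL_EVENT_C"}),
}


def assign_counters(events: List[str]) -> Optional[Dict[str, int]]:
    """Map each requested event to a distinct counter, or return None."""
    # Reject the request outright if any two events are mutually exclusive.
    for i, first in enumerate(events):
        for second in events[i + 1:]:
            if frozenset({first, second}) in CONFLICTS:
                return None

    # Place the most constrained events first, backtracking on dead ends.
    order = sorted(events, key=lambda e: len(ALLOWED_COUNTERS[e]))
    assignment: Dict[str, int] = {}

    def place(index: int) -> bool:
        if index == len(order):
            return True
        event = order[index]
        for counter in sorted(ALLOWED_COUNTERS[event]):
            if counter not in assignment.values():
                assignment[event] = counter
                if place(index + 1):
                    return True
                del assignment[event]
        return False

    return assignment if place(0) else None
```

Keeping tables like these in one place (the kernel or a user-level library) is the point of the proposal above: each tool asks for events and does not need to know the per-processor rules.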
From: John L. <le...@mo...> - 2002-05-29 12:30:41
|
On Tue, May 28, 2002 at 04:23:28PM -0700, MONTGOMERY,BOB (HP-FtCollins,ex1) wrote:

> I have a hacked up version of the oprofile module working enough
> that we can work on our prospect tool (prospect.sourceforge.net).

Excellent !

> This might be a good time to start a slightly higher level
> discussion about lower level things :-), with the hope that
> some other folks interested in IA64 might get involved.

We also have at least one person interested in hacking on an Alpha port ...

> works on SMP kernels. And with that prototype in hand, we're
> starting to investigate an alternative proposed by Stephane
> Eranian, the maintainer and designer of the IA64 Linux perfmon
> support. He proposes that oprofile's module handle only the
> system call interception trace and the directory path table, and
> that the user-level daemons (like oprofile's daemon and prospect)
> get their PMU sampling through his kernel API for perfmon. This
> keeps all the rules, processor dependencies, and low-level PMU
> management in either the kernel or Stephane's user-level library,
> and allows the kernel to manage multiple users of the PMU
> (oprofile would request global control, effectively locking
> out other tools while it runs, but the kernel's perfmon stuff
> can manage the counters per process if desired.)

A priori, I prefer to use the available facilities. It would be great if we could avoid the living nightmare that ia32 has in this respect.

I'm leafing through the Intel manual now, I suppose I'd better read up on his perfmon interface too. I suppose the biggest issue is the efficiency of perfmon, when it comes to system-wide profiling: how are the buffers encoded etc.

regards
john

--
"Time is a great teacher, but unfortunately it kills all its pupils."
- Hector Louis Berlioz
|
From: Andi K. <ak...@su...> - 2002-05-29 12:40:34
|
On Wed, May 29, 2002 at 01:27:01PM +0100, John Levon wrote:

> > This might be a good time to start a slightly higher level
> > discussion about lower level things :-), with the hope that
> > some other folks interested in IA64 might get involved.
>
> We also have at least one person interested in hacking on an Alpha
> port ...

I started on an x86-64 port. The main issue is supporting both 32bit and 64bit syscalls cleanly.

-Andi
|
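One reason the 32bit/64bit syscall issue is awkward: the same syscall number means different things in the two ABIs, so interception code has to know which ABI the calling task used. The following is a minimal sketch of that lookup only, not code from any port. The numbers are the Linux i386 and x86-64 syscall numbers for these calls; the choice of calls, the table layout, and the function name are assumptions made for illustration.

```python
from typing import Dict, Optional

# One table per ABI. Number 2 is fork on i386 but open on x86-64,
# so a single shared table cannot be correct for both kinds of task.
SYSCALL_TABLES: Dict[str, Dict[int, str]] = {
    "i386": {2: "fork", 5: "open", 11: "execve", 90: "mmap"},
    "x86_64": {2: "open", 9: "mmap", 57: "fork", 59: "execve"},
}


def syscall_name(abi: str, number: int) -> Optional[str]:
    """Resolve a syscall number using the table for the caller's ABI.

    Returns None for numbers this sketch does not track.
    """
    return SYSCALL_TABLES[abi].get(number)
```

For example, `syscall_name("i386", 2)` gives `"fork"` while `syscall_name("x86_64", 2)` gives `"open"`.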
From: William C. <wc...@nc...> - 2002-05-30 16:22:23
|
John Levon wrote:
> On Tue, May 28, 2002 at 04:23:28PM -0700, MONTGOMERY,BOB (HP-FtCollins,ex1) wrote:
>
>>I have a hacked up version of the oprofile module working enough
>>that we can work on our prospect tool (prospect.sourceforge.net).
>>
>
> Excellent !
>
>>This might be a good time to start a slightly higher level
>>discussion about lower level things :-), with the hope that
>>some other folks interested in IA64 might get involved.
>>
>
> We also have at least one person interested in hacking on an Alpha
> port ...

Many processors now have performance monitoring hardware that would allow OProfile to run on them. There are versions of the arm (xscale), powerpc, and mips that have performance monitoring hardware. There should be some thought on how to modify OProfile to make it easier to add support for other processors, maybe subdirectories for the arch specific code. Also, cross-compilation is commonly used for these architectures, so some of the assumptions made in configure about the host and target processor architecture being the same are not going to work.

>>works on SMP kernels. And with that prototype in hand, we're
>>starting to investigate an alternative proposed by Stephane
>>Eranian, the maintainer and designer of the IA64 Linux perfmon
>>support. He proposes that oprofile's module handle only the
>>system call interception trace and the directory path table, and
>>that the user-level daemons (like oprofile's daemon and prospect)
>>get their PMU sampling through his kernel API for perfmon. This
>>keeps all the rules, processor dependencies, and low-level PMU
>>management in either the kernel or Stephane's user-level library,
>>and allows the kernel to manage multiple users of the PMU
>>(oprofile would request global control, effectively locking
>>out other tools while it runs, but the kernel's perfmon stuff
>>can manage the counters per process if desired.)
>>
>
> A priori, I prefer to use the available facilities. It would be great if
> we could avoid the living nightmare that ia32 has in this respect.
>
> I'm leafing through the Intel manual now, I suppose I'd better read up
> on his perfmon interface too. I suppose the biggest issue is the
> efficiency of perfmon, when it comes to system-wide profiling: how
> are the buffers encoded etc.
>
> regards
> john
|
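The two points in the message above (per-arch subdirectories, and not assuming the build machine matches the target) can be illustrated together. This is a sketch under stated assumptions, not OProfile's build logic: the directory names and the function are hypothetical. The idea is to choose the arch-specific code from the requested target triplet, never from the machine running the build.

```python
from typing import Dict

# Hypothetical layout: target CPU name -> arch-specific source directory.
ARCH_DIRS: Dict[str, str] = {
    "i386": "arch/ia32",
    "i686": "arch/ia32",
    "ia64": "arch/ia64",
    "alpha": "arch/alpha",
    "x86_64": "arch/x86_64",
}


def arch_dir_for_target(target_triplet: str) -> str:
    """Pick the arch directory from a triplet such as ia64-unknown-linux-gnu.

    Only the CPU field of the triplet is consulted, so cross-compiling
    on an ia32 box for ia64 still selects the ia64 code.
    """
    cpu = target_triplet.split("-", 1)[0]
    try:
        return ARCH_DIRS[cpu]
    except KeyError:
        raise ValueError(f"no arch-specific code for target CPU {cpu!r}") from None
```

An unknown CPU fails loudly instead of silently falling back to the build machine's architecture, which is the assumption the message warns about.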
From: John L. <le...@mo...> - 2002-05-30 17:42:03
|
On Thu, May 30, 2002 at 11:17:31AM -0400, William Cohen wrote:

> Many processors now have performance monitoring hardware that would
> allow OProfile to run on them. There are versions of the arm (xscale),
> powerpc, and mips that have performance monitoring hardware. There

Of course this is all irrelevant unless we can find people interested in porting it :)

> should be some thought on how to modify OProfile to make it easier to
> add support for other processors, maybe subdirectories for the arch

But yes, we need to consider how to re-arrange things. A directory structure is the easy bit ...

> specific code. Also cross-compilation is commonly used for these
> architectures, so some of the assumptions made in configure about the host
> and target processor architecture being the same are not going to work.

Yes, that'll definitely need work too.

regards
john

--
"Do you mean to tell me that "The Prince" is not the set textbook for CS1072 Professional Issues ? What on earth do you learn in that course ?"
- David Lester
|
From: William C. <wc...@nc...> - 2002-05-30 20:29:28
|
John Levon wrote:
> On Thu, May 30, 2002 at 11:17:31AM -0400, William Cohen wrote:
>
>>Many processors now have performance monitoring hardware that would
>>allow OProfile to run on them. There are versions of the arm (xscale),
>>powerpc, and mips that have performance monitoring hardware. There
>>
>
> Of course this is all irrelevant unless we can find people interested
> in porting it :)

If programmers weren't interested in the performance monitoring hardware, why would manufacturers put it in the processor? :) Many of the embedded processors are as complicated as the original Pentium processors (which had performance monitoring hardware), and having that hardware there can help track down performance problems. For low-end systems (consumer electronics) there is a lot of pressure to make the code more efficient: use slower (cheaper) parts to get the same performance with more efficient code, or use the same part but turn down the clock rate to get longer battery life.

>>should be some thought on how to modify OProfile to make it easier to
>>add support for other processors, maybe subdirectories for the arch
>>
>
> But yes, we need to consider how to re-arrange things. A directory
> structure is the easy bit ...
>
>>specific code. Also cross-compilation is commonly used for these
>>architectures, so some of the assumptions made in configure about the host
>>and target processor architecture being the same are not going to work.
>>
>
> Yes that'll definitely need work too.

Oh yes, there are a lot of things that will need to be fixed to support multiple architectures. I was just naming a few off the top of my head.

-Will
|