From: Niels C. <nc...@us...> - 2001-11-30 19:06:34

Sorry, forgot to copy lse-tech.  Paul Jackson bounced it back to me
for posting...

Regards,
Niels Christiansen

Yup looks like I got the only copy - here it is bounced back.
 - pj

On Fri, 30 Nov 2001, Niels Christiansen wrote:

> Paul Jackson wrote:
>
> >So we have need of topology for (at least):
> >    NUMA-aware tools and system services
> >    NUMA-aware scheduler and memory allocation refinements
> >    System management and performance instrumentation
> >    Plug-and-Play and hot swap
> >    Field service and diagnostics
> >    Robust configuration in changing configurations (SGI's ioconfig)
> >    Power Management
> >
> >In the particular case of "System management and performance
> >instrumentation", I suspect that SGI will be more focused on
> >its own solution here - Performance Co-Pilot:
>
> Yes, all of those need to know about topology and there may be
> others.  However, I have a hard time seeing products such as SGI's
> Co-Pilot or IBM's AIX Performance Toolbox (which used to be called
> PTX, by the way):
>
>   http://www-1.ibm.com/servers/aix/products/ibmsw/system_man/perftoolbox.html
>
> making it into Linux.  There are several initiatives being worked
> on to handle system management and performance monitoring.  What
> I do see a need for is a common base of instrumentation fitted
> into some hierarchical or network view of the system's topology.
>
> Whether the tool extracting the information is Co-Pilot, AIX
> Performance Toolbox, some open-source product, or all of the above
> is immaterial.  What is important is that the data they need is
> already available in an organized manner with streamlined access
> mechanisms (which is not a file system if I have to choose, but
> may end up being one).
>
> >As for using the same framework (kernel hooks, system calls,
> >daemon for presentation) for instrumentation such as driver
> >statistics and lock metering, there I am more cautious, and
> >tend to one of two extremes:
> >    1) each such instrumentation hook has its own apparatus, or
> >    2) a common high level framework of a fairly powerful nature.
> >
> >I see lock metering as an example of (1).
>
> In Solaris, at least when I last looked, performance instrumentation
> is organized so you can traverse a linked list of "objects" to
> explore what is available.  In AIX, we don't call the things objects
> but you can traverse a hierarchy of entities that carry performance
> metrics.  In Windows NT and W2K there are similar mechanisms.  All
> of these systems allow you to explore and select what you are
> interested in for, say, performance monitoring.  Once you have
> selected what you want to monitor, you typically save pointers or
> some other form of direct reference to the entities and forget about
> the topology.
>
> As you explore the topology, you may build the user's view of the
> system as it fits your application.  AIX Performance Toolbox, for
> example, allows the end user to build "instruments" to display sets
> of metrics that often represent specific groups of system components
> of local or remote systems.
>
> What I'm trying to say is that just as you need to know about the
> relationship between groups of processors, memory banks, and I/O
> drawers for NUMA optimisation, so may the person monitoring your
> NUMA system need to know about it.  And since the basic requirement
> of seeing the systems topology conveniently arranged in some way
> is shared, why reinvent the wheel?
>
> Lockmeter is used extensively by our team.  The first reason is
> that when you want to improve SMP performance, you need to fix all
> the locking problems.  The second reason is that there is no other
> tool.  Or rather, there was no other tool.  We now have an internal
> tool, which gives us the same information as Lockmeter but does it
> through a trace facility.  The same facility does Kernprof+ and a
> lot of other things.  The key is instrumentation.
>
> As things are now, that instrumentation has to be added through yet
> another kernel patch, which I loathe but can't get around.  Because
> I loathe the kernel patching and the crazy playing catch-up with
> kernel versions, I am hoping to round up enough support to get some
> decent instrumentation into Linux in an organized manner, which
> includes topology and mechanisms to access it all.
>
> I have put a list on the fridge.  Please sign up!
>
> Niels Christiansen
> IBM Linux Technology Center, Kernel Performance
>

I won't rest till it's the best ...
Manager, Linux Scalability
Paul Jackson <pj...@sg...> 1.650.933.1373