From: Michael H. <hb...@us...> - 2001-11-19 16:08:17
Paul,

Thanks for the quick response. Lots of info for me to muse over. I'll
spend some time looking over your suggestions and make another pass at
the DYNIX/ptx API mapping.

I have some concern over trying to do too much in the library code
versus in the kernel. I'll put a bit more thought on that. An example
of a potential problem: if qexec is supported by the library code by
querying the kernel for the current memory load on all of the nodes,
constructing a CpuMemSet, and setting it through another kernel call,
this adds multiple additional kernel/user state transitions and the
associated overhead. I need to make a pass at coding up a library
routine in support of this to better understand the implications and
feasibility.

I've responded to a few of your specific points below; more will
follow as I rework the DYNIX/ptx API mapping.

Michael Hohnbaum
hoh...@us...

>> Those who are allergic to long email messages should probably
>> bail now - sorry <grin>.

achoo!! At least your response was shorter than my posting. I think
this response continues that trend. Note that I will respond to other
items in subsequent emails.

===

On Fri, 16 Nov 2001, Paul Jackson responded:

> On Fri, 16 Nov 2001, Michael Hohnbaum wrote:
>
> |> Policies missing:
> |> * soft versus hard - DYNIX/ptx has the notion of treating placement
> |>   requests as either hints (soft) or demands (hard). CpuMemSet
> |>   provides only the hard option.
>
> At first I was confused by just what a "soft" policy meant, but
> thanks to your fine snippets of documentation below, I think
> I understand it now.
>
> It seems that when attaching processes to resources, a "hard"
> request will fail if it can't place the process on a node with
> all the requested resources, whereas a "soft" request will fall
> back to other nodes, if need be.
>
> If I understand this correctly, then CpuMemSets supports both,
> easily. No kernel support is required or relevant.
> Rather, when setting up a CpuMemSet, in the library code that is
> emulating the DYNIX/ptx API's on top of CpuMemSets, the library code
> can decide to succeed or fail, if the requested resources aren't
> available where the requester wants them, depending on whether the
> requester used the "hard" or "soft" option.
>
> This is not (so far as I can tell) something that requires kernel
> awareness each time a cpu is scheduled or a page allocated. Rather
> it seems to only affect the initial binding of resources to
> processes, and can easily and naturally be resolved in the library
> code.

Your understanding is mostly correct about soft/hard policy. The one
point missing is that it is not just an initial placement issue, but
comes into play during allocations throughout the life of a process.
An attach may succeed, but at a later point memory exhaustion (for
example) may occur on the node the process is attached to. For a hard
attachment, memory allocations would now fail. For a soft attachment,
memory allocation may succeed from another node.

This could be emulated by, in the soft policy case, providing all
nodes in the CpuMemSet, ordered such that the requested node(s) come
first. In the hard policy case, only the requested nodes would be put
in the CpuMemSet. This might have a similar effect - I need to think
about this some more.

> |> * first touch, followed by round robin. The default algorithm for
> |>   memory allocations for DYNIX/ptx is to allocate on the same quad
> |>   the process is running on, and if none available, to round robin
> |>   through the remaining quads. The CpuMemSet choices are either
> |>   round robin or always in the order of the memory lists.
>
> I am unclear just what ordering "first touch, followed by round robin"
> might be. I suspect that it is one of these two:
>
> 1) Try allocating on the node that is executing the allocation
>    request, and if that fails, try allocating on the next closest
>    nodes, in distance order.
> 2) Try allocating on the node that is executing the allocation
>    request, and if that fails, try allocating in a distributed
>    fashion, on the next node past the last one that satisfied
>    an allocation request, according to some list.
>
> If you mean (1), then that's too easy - just sort the memory lists
> in distance order from the faulting cpu.
>
> So probably you mean (2).

Yep, I mean (2).

> If so, you're right that the current CpuMemSet design doesn't
> have this combination of options. But it would be trivial to
> add, if you want me to. Just another memory allocation policy
> option, and a few more lines of code that combine the current
> DEFAULT and ROUND_ROBIN policies.
>
> Let's say:
>
>     #define CMS_FIRST_ROBIN 0x03 /* First touch, then round-robin */
>
>     * If a CpuMemSet has a policy of CMS_FIRST_ROBIN, the
>     * kernel first searches the first Memory Block on the memory
>     * list, then if that doesn't provide the required memory,
>     * the kernel searches the memory list beginning one past
>     * where the last search on that same Memory List of that
>     * same CpuMemSet concluded.
>
> (Surely someone has a better name than CMS_FIRST_ROBIN ;).
>
> Let me know if this is what you need, and I will add it.

This is it. I'm in favor of adding this policy.

> |> ===> Revert the process back to using the system-wide CpuMemSet
>
> A couple of times you refer to a system-wide default CpuMemSet.
>
> There is no such entity. The kernel has its own CpuMemSet,
> which is inherited by init, and subject to change, by all init
> creates. But any given process can know only:
>
>     the kernel's CpuMemSet
>     the CpuMemSet of any given process
>     the CpuMemSet of any given vm area
>
> There is no "system-wide" default. I couldn't quite tell
> if this will be a problem for supporting the DYNIX/ptx API
> or not. Hopefully not.

Terminology problem. These should be references to the kernel's
CpuMemSet. Check the context and see if this makes sense.
From: Paul J. <pj...@en...> - 2001-11-19 21:44:26
On Mon, 19 Nov 2001, Michael Hohnbaum wrote:

> Thanks for the quick response.

You're welcome.

> I have some concern over trying to do too much in the library
> code versus in the kernel. I'll put a bit more thought on that.

Ok - my thrust will continue to be to identify which essential
capabilities the kernel must provide, and to look for ways that the
"policy" stuff can be moved out of the kernel. When it is likely that
different heuristics, approaches or tuning will be desired by
different application loads or for different system architectures,
then it is best for this to be out of the kernel. This is the "Unix"
equivalent of the "user exits" that have helped make IBM mainframe
operating systems so flexible.

When we identify some desired operation, such as perhaps qexec(), that
seems too slow or lacks some essential semantic if done this way, then
I will seek out the essential additional mechanism(s) that can be
provided by the kernel to make the overall operation satisfactory
again. I will tend to resist the temptation to say "that's too slow;
just move it lock, stock and barrel into the kernel."

> > Let's say:
> >
> >     #define CMS_FIRST_ROBIN 0x03 /* First touch, then round-robin */
> >
> >     * If a CpuMemSet has a policy of CMS_FIRST_ROBIN, the
> >     * kernel first searches the first Memory Block on the memory
> >     * list, then if that doesn't provide the required memory,
> >     * the kernel searches the memory list beginning one past
> >     * where the last search on that same Memory List of that
> >     * same CpuMemSet concluded.
> >
> > (Surely someone has a better name than CMS_FIRST_ROBIN ;).
> >
> > Let me know if this is what you need, and I will add it.
>
> This is it. I'm in favor of adding this policy.

Ok - it is now added to my master copy, and will filter out to the LSE
documents on SourceForge next time I update. I called it:

    #define CMS_EARLY_BIRD 0x03 /* First touch (default), then round-robin */

> > ...
> > But any given process can know only:
> >
> >     the kernel's CpuMemSet
> >     the CpuMemSet of any given process
> >     the CpuMemSet of any given vm area
> >
> > There is no "system-wide" default. I couldn't quite tell
> > if this will be a problem for supporting the DYNIX/ptx API
> > or not. Hopefully not.
>
> Terminology problem. These should be references to the
> kernel's CpuMemSet. Check the context and see if this
> makes sense.

Ah - I would recommend that init (pid == 1) be used for any such
system-wide default, not the kernel. That is, one might configure a
system so that your average generic application, if not otherwise
specified, ran on one particular set of nodes, while reserving the
remaining nodes for dedicated purposes.

The kernel of such a system might be spread over (willing to take
memory from) most of the nodes on the system, so that kernel memory
allocated in a particular process context usually ended up close to
that process's nodes - whether it was a process running on dedicated
nodes or not.

In such a case, the init process would quite likely have its CpuMemSet
set to the nodes to be used by the average generic non-dedicated
process, so that any daemons it spawned went there, absent
intentionally going to some of the dedicated nodes.

I look forward to your additional comments.

I won't rest till it's the best ...
Manager, Linux Scalability
Paul Jackson <pj...@sg...> 1.650.933.1373