From: Michael H. <hb...@us...> - 2001-11-19 16:08:17
Paul,

Thanks for the quick response. Lots of info for me to muse over. I'll
spend some time looking over your suggestions and make another pass at
the DYNIX/ptx API mapping.

I have some concern over trying to do too much in the library code
versus in the kernel. I'll put a bit more thought on that. An example
of a potential problem: if qexec is supported by the library code by
querying the kernel for the current memory load on all of the nodes,
constructing a CpuMemSet, and setting it through another kernel call,
this adds multiple additional kernel/user state transitions and the
associated overhead. I need to make a pass at coding up a library
routine in support of this to better understand the implications and
feasibility.

I've responded to a few of your specific points below; more will
follow as I rework the DYNIX/ptx API mapping.

Michael Hohnbaum
hoh...@us...

>> Those who are allergic to long email messages should probably
>> bail now - sorry <grin>.

achoo!! At least your response was shorter than my posting. I think
this response continues that trend. Note that I will respond to other
items in subsequent emails.

===

On Fri, 16 Nov 2001, Paul Jackson responded:

> On Fri, 16 Nov 2001, Michael Hohnbaum wrote:
>
> |> Policies missing:
> |> * soft versus hard - DYNIX/ptx has the notion of treating placement
> |>   requests as either hints (soft) or demands (hard). CpuMemSet
> |>   provides only the hard option.
>
> At first I was confused by just what a "soft" policy meant, but
> thanks to your fine snippets of documentation below, I think
> I understand it now.
>
> It seems that when attaching processes to resources, a "hard"
> request will fail if it can't place the process on a node with
> all the requested resources, whereas a "soft" request will fall
> back to other nodes, if need be.
>
> If I understand this correctly, then CpuMemSets supports both,
> easily. No kernel support is required or relevant.
> Rather, when setting up a CpuMemSet, in the library code that is
> emulating the DYNIX/ptx API's on top of CpuMemSets, the library code
> can decide to succeed or fail, if the requested resources aren't
> available where the requester wants them, depending on whether the
> requester used the "hard" or "soft" option.
>
> This is not (so far as I can tell) something that requires kernel
> awareness each time a cpu is scheduled or a page allocated. Rather
> it seems to only affect the initial binding of resources to
> processes, and can easily and naturally be resolved in the library
> code.

Your understanding is mostly correct about soft/hard policy. The one
point missing is that it is not just an initial placement issue, but
comes into play during allocations throughout the life of a process.
An attach may succeed, but at a later point memory exhaustion (for
example) may occur on the node the process is attached to. For a hard
attachment, memory allocations would now fail. For a soft attachment,
memory allocation may succeed from another node.

This could be emulated by, in the soft policy case, providing all
nodes in the CpuMemSet, ordered such that the requested node(s) come
first. In the hard policy case, only the requested nodes would be put
in the CpuMemSet. This might have a similar effect - I need to think
about this some more.

> |> * first touch, followed by round robin. The default algorithm for
> |>   memory allocations for DYNIX/ptx is to allocate on the same quad
> |>   the process is running on, and if none available, to round robin
> |>   through the remaining quads. The CpuMemSet choices are either
> |>   round robin or always in the order of the memory lists.
>
> I am unclear just what ordering "first touch, followed by round robin"
> might be. I suspect that it is one of these two:
>
> 1) Try allocating on the node that is executing the allocation
>    request, and if that fails, try allocating on the next closest
>    nodes, in distance order.
> 2) Try allocating on the node that is executing the allocation
>    request, and if that fails, try allocating in a distributed
>    fashion, on the next node past the last one that satisfied
>    an allocation request, according to some list.
>
> If you mean (1), then that's too easy - just sort the memory lists
> in distance order from the faulting cpu.
>
> So probably you mean (2).

Yep, I mean (2).

> If so, you're right that the current CpuMemSet design doesn't
> have this combination of options. But it would be trivial to
> add, if you want me to. Just another memory allocation policy
> option, and a few more lines of code that combine the current
> DEFAULT and ROUND_ROBIN policies.
>
> Let's say:
>
>     #define CMS_FIRST_ROBIN 0x03 /* First touch, then round-robin */
>
>     * If a CpuMemSet has a policy of CMS_FIRST_ROBIN, the
>     * kernel first searches the first Memory Block on the memory
>     * list, then if that doesn't provide the required memory,
>     * the kernel searches the memory list beginning one past
>     * where the last search on that same Memory List of that
>     * same CpuMemSet concluded.
>
> (Surely someone has a better name than CMS_FIRST_ROBIN ;).
>
> Let me know if this is what you need, and I will add it.

This is it. I'm in favor of adding this policy.

> |> ===> Revert the process back to using the system-wide CpuMemSet
>
> A couple of times you refer to a system-wide default CpuMemSet.
>
> There is no such entity. The kernel has its own CpuMemSet,
> which is inherited by init, and subject to change, by all init
> creates. But any given process can know only:
>
>     the kernel's CpuMemSet
>     the CpuMemSet of any given process
>     the CpuMemSet of any given vm area
>
> There is no "system-wide" default. I couldn't quite tell
> if this will be a problem for supporting the DYNIX/ptx API
> or not. Hopefully not.

Terminology problem. These should be references to the kernel's
CpuMemSet. Check the context and see if this makes sense.
From: Paul J. <pj...@en...> - 2001-11-19 21:44:26
On Mon, 19 Nov 2001, Michael Hohnbaum wrote:

> Thanks for the quick response.

You're welcome.

> I have some concern over trying to do too much in the library
> code versus in the kernel. I'll put a bit more thought on that.

Ok - my thrust will continue to be to identify which essential
capabilities the kernel must provide, and to look for ways that the
"policy" stuff can be moved out of the kernel. When it is likely that
different heuristics, approaches or tuning will be desired by
different application loads or for different system architectures,
then it is best for this to be out of the kernel. This is the "Unix"
equivalent of the "user exits" that have helped make IBM mainframe
operating systems so flexible.

When we identify some desired operation, such as perhaps qexec(), that
seems too slow or lacks some essential semantic if done this way, then
I will seek out the essential additional mechanism(s) that can be
provided by the kernel to make the overall operation satisfactory
again. I will tend to resist the temptation to say "that's too slow;
just move it lock, stock and barrel into the kernel."

> > Let's say:
> >
> >     #define CMS_FIRST_ROBIN 0x03 /* First touch, then round-robin */
> >
> >     * If a CpuMemSet has a policy of CMS_FIRST_ROBIN, the
> >     * kernel first searches the first Memory Block on the memory
> >     * list, then if that doesn't provide the required memory,
> >     * the kernel searches the memory list beginning one past
> >     * where the last search on that same Memory List of that
> >     * same CpuMemSet concluded.
> >
> > (Surely someone has a better name than CMS_FIRST_ROBIN ;).
> >
> > Let me know if this is what you need, and I will add it.
>
> This is it. I'm in favor of adding this policy.

Ok - it is now added to my master copy, and will filter out to the LSE
documents on SourceForge next time I update. I called it:

    #define CMS_EARLY_BIRD 0x03 /* First touch (default), then round-robin */

> > ...
> > But any given process can know only:
> >
> >     the kernel's CpuMemSet
> >     the CpuMemSet of any given process
> >     the CpuMemSet of any given vm area
> >
> > There is no "system-wide" default. I couldn't quite tell
> > if this will be a problem for supporting the DYNIX/ptx API
> > or not. Hopefully not.
>
> Terminology problem. These should be references to the
> kernel's CpuMemSet. Check the context and see if this
> makes sense.

Ah - I would recommend that init (pid == 1) be used for any such
system-wide default, not the kernel. That is, one might configure a
system so that your average generic application, if not otherwise
specified, ran on one particular set of nodes, while reserving the
remaining nodes for dedicated purposes.

The kernel of such a system might be spread over (willing to take
memory from) most of the nodes on the system, so that kernel memory
allocated in a particular process context usually ended up close to
that process's nodes - whether it was a process running on dedicated
nodes or not.

In such a case, the init process would quite likely have its CpuMemSet
set to the nodes to be used by the average generic non-dedicated
process, so that any daemons it spawned went there, absent
intentionally going to some of the dedicated nodes.

I look forward to your additional comments.

I won't rest till it's the best ...
Manager, Linux Scalability
Paul Jackson <pj...@sg...> 1.650.933.1373