From: Michael H. <hoh...@us...> - 2002-02-12 22:32:13
On Mon, 11 Feb 2002 Paul Jackson wrote:

> 1) Two details of the API still concern me -- the placeholder
> numamap argument and the 64 bit limit on the number of cpus
> (without at least an API change such as you say Russ Wright is
> considering).  Neither placeholders nor such potential API changes
> stand the test of time, in my experience.

Martin Bligh had a couple of suggestions for how to deal with the
cpu_t.  I'll probably take his suggestion of using a cpu_t and
defining it as a long.  Then, when a real cpu_t is defined, the
transition should be trivial.

> How about instead (being controversial ... brainstorming):
> a] build the simple binding only on top of CpuMemSets,
> b] with the simple binding _only_ using application numbering,
>    not system numbering,
> c] drop the numamap placeholder argument, and
> d] accept forever afterward that simple binding only manages
>    up to 64 (or 32) cpus, memory blocks, and nodes.
>
> The last item, [d], doesn't keep you from running on a larger
> system -- it just keeps a given application from using the simple
> binding to manage a larger set of cpus.

Well, that is always one approach.  For now, I prefer to keep this
separate from CpuMemSets.

> 2) You observe that there is no node number mapping, so all node
> numbers are physical.  Do you think that there should be a node
> number mapping?

We thought about node mapping for a few minutes and then realized
that the point of the API is to provide a means for applications to
place their resources on specific nodes.  Mapping the nodes makes
this problematic, and we could see no benefit.

> 3) Typo:
>
> Under restrictmemblk(... memblk, ... numamap) you wrote:
>
>     If the memblk bitmask adds CPUs and the user is not root ...
>
> where I suspect you intended:
>
>     If the memblk bitmask adds blocks and the user is not root ...

Yes, a typo on my part.

> 4) You state that getcpu() does the same thing as cmsGetCpu().
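[An aside on the bitmask semantics behind (1) and the corrected rule in (3): with cpu_t defined as a long, a bitmask can only name 64 resources.  A minimal sketch follows; the helper names and the 64-bit width are illustrative assumptions, not the actual simple-binding API.]

```python
# Illustrative sketch, not the actual API: with cpu_t defined as a
# long (64 bits assumed here), a cpu or memblk bitmask can only
# name resources 0..63 -- the limit discussed in (1).

CPU_T_BITS = 64  # assumes a 64-bit long

def make_mask(*ids):
    """Build a bitmask of cpu/memblk ids; reject any id the
    fixed-width mask cannot represent."""
    mask = 0
    for i in ids:
        if not 0 <= i < CPU_T_BITS:
            raise ValueError("id %d exceeds the %d-bit mask" % (i, CPU_T_BITS))
        mask |= 1 << i
    return mask

def adds_blocks(old_mask, new_mask):
    """The corrected check from (3): does the new memblk bitmask
    add blocks not present in the old one?  If so, and the caller
    is not root, restrictmemblk() would refuse the request."""
    return bool(new_mask & ~old_mask)
```

[For example, adds_blocks(make_mask(0, 1), make_mask(0, 1, 2)) is true, so a non-root caller would be refused, while shrinking the mask never adds blocks and is always allowed.]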
> If CpuMemSets is _not_ present, then I have a slightly subtle
> concern here.  Your simple binding makes certain promises to its
> user that all CPUs within one node will have numbers in a range
> that does not overlap with the CPU numbers of any other node.
> However, I am not aware of any promise by the kernel, across all
> architectures, to honor this numbering of physical CPU numbers.
> It seems to me that you require some sort of abstract binding
> mechanism to ensure a favorable CPU numbering.

Actually, this numbering scheme is a requirement being placed on the
kernel to support NUMA.  It was put out last year with no loud
objections, and is explained in more detail in the rationale.

> 5) The shell script binding using 'runon' looks interesting, and I
> can imagine that most of the simple binding API can and should be
> exposed as runon options.  I can also imagine Python (and Perl)
> modules that implement the simple API, and (a heretical thought)
> coding the runon command in Python using this module.
>
> 6) You state that "Launch policies are not inherited" (with
> emphasis on the 'not').  At first reading, this confuses me.
> Are you saying, at least in part, that if:
>   1. Process P1 (operating under policy x1) sets a launch policy x2
>   2. Process P1 forks P2
>   3. Process P2 modifies its own policy (rebinds) to policy x3
>   4. Process P2 forks P3
> then P3 starts with a policy of x3 (its parent P2's policy, since
> P2 didn't setlaunch any other launch policy), _not_ with a policy
> of x2 (which would be the inherited launch policy, if such were
> inherited)?  If so -- good.  With the minor note that when I find
> myself writing something with "not" italicized, I usually end up
> later rewriting that sentence in a more direct manner.

Your understanding is correct, and it also spells out quite clearly
why launch policies are not inherited.

> 7) You state:
>
>     ... the binding in effect for the process or thread that
>     executes the first page fault on a given page of memory
>     determines the binding for that page ... see the rationale
>
> I don't see any "rationale".  And I'd think it would be more
> stable to have the creator of the memory region determine its
> memory policy, not the first faulter.

The "rationale" is a link.  Click on it and it will take you to a
separate document that has, amongst other things, a section titled
"Conflicting Memory Bindings".  It also has quite a bit of
discussion and examples of the processor/node numbering scheme.

Michael Hohnbaum
hoh...@us...