Thread: Re: [Lse-tech] Hair brained scheme for grouping CpuMemSets

SourceForge Headquarters 225 Broadway Suite 1600 San Diego, CA 92101 +1 (858) 454-5900

On Fri, 18 Jan 2002, Paul Jackson stated:

> The remaining open items in the CpuMemSets design:
>
>   http://sourceforge.net/docman/display_doc.php?docid=8463&group_id=8875
>
> are:
>
>  1) how do we group the users of CpuMemSets (the processes and vm areas),
>  2) how do we group the resources (cpus and memory blocks), and
>  3) in particular, how do we handle reconfiguring after removing a cpu?
>
> Proposed Solutions:
>
>  1) I think that any adequate solution to (1):
>
>      1a) must be hierarchical (groups and subgroups)

Why?

>      1b) must be inherited (across fork and vm area creation),

No, but it must be possible to request inheritance (e.g., through
the use of launch policies, or through specific system calls)

>      1c) requires the kernel to restrict change authorization
>        to group boundaries,

Huh?

>      1d) requires the kernel to support the automatic
>        inheritance across process and vm area creation, and

Not automatic inheritance, but optional inheritance.

>      1e) requires the kernel to support bulk operations to change
>        maps and sets for all members of a group atomically.

Or, alternatively, requires all members of a group to use the same maps.

> ..remaining post deleted..

I have previously promised (threatened) to post my views on process
grouping to this forum, fully intending to post last week but was
sidetracked.  Below is what I have put together.  It addresses directly
the first of the three open items, indirectly the second, and so
far, ignores the third.

Thoughts on grouping of processes for NUMA performance:

The goal is to ensure that 2 or more processes execute on the same
NUMA node, and if one process migrates to another node, all grouped
processes also migrate.  This grouping is referred to as process
association in the following discussion.

The reason - process association has shown significant performance
benefits on NUMA systems with dynamic, database-oriented workloads.

Issues:

* CpuMemSets does not provide the concept of a node.  It works on the
  association of a memory block with one or more CPUs.  The only way to
  force a set of processes to always reside on the same node is to only
  provide one node in their cpumemmap or cpumemset.  This is too restrictive.

* Currently there is no support for process migration across NUMA nodes.
  That infrastructure needs to be established.

* Assuming process migration, process association, and CpuMemSets all exist,
  how does the system handle associated processes in respect to their
  CpuMemSets?  Should all associated processes use the same set?  And
  if so which process controls it?  At the least, a mechanism needs to
  be in place to a) allow sharing of cpumemsets; b) force the sharing
  of cpumemsets; and c) establish and enforce ownership rights over
  cpumemsets.

  If, on the other hand, all associated processes do not use the same
  cpumemset, then at process migration time the set of associated
  processes must have their cpumemsets scanned, and the association
  may only be migrated to a node that is represented in all of the
  cpumemsets.

* How is the system to determine what constitutes a node for the purpose
  of process migration?  This is being addressed in the topology design,
  but how best to incorporate this concept into CpuMemSets?

Ideas:

* First, we need to establish the concept of a NUMA node.  While this
  has been intentionally kept out of CpuMemSets, I believe it necessary
  to provide process association.  Getting topology info from the topology
  subsystem is fine, but is it practical to have to scan through the
  topology structures everytime that a decision is needed regarding nodes
  and process association?  Thinking through the steps, one must go
  through the cpumemset, then the cpumemmap to obtain a system processor
  number.  From there the topology structures must be scanned to find
  the processor and the node it resides on.  This lookup needs to be
  done at least twice (once for current processor, once for target),
  and would be happening from the scheduling routines.  This seems like
  a good spot to provide a shortcut to obtain this information.  Can
  we come up with a way to keep node info in the cpumemmap?

* Process migration can be implemented completely independent of CpuMemSets.
  CpuMemSets, though, would provide policy concerning what resources process
  may be migrated to.

* Assuming we solve the previous two problems, then my preference would be
  to have associated processes share a cpumemmap/cpumemset with modification
  rights restricted (to whom?).

Summary:

Paul Jackson's "Hare-braned" solution, while intriguing, is perhaps
larger than what is needed in respect to process grouping.  I propose
a smaller, simpler solution that ties into CpuMemSets.  Does this meet
the needs of potential users?

Thread: Re: [Lse-tech] Hair brained scheme for grouping CpuMemSets

lse-tech