Earlier, I wrote to Dinakar:
> What are your invariants, and how can you assure yourself and us
> that your code preserves these invariants?
I repeat that question.
On my first reading of your example, I see the following.
It is sinking into my dense skull more than it had before that your
patch changes the meaning of the cpuset field 'cpus_allowed', to only
include the cpus not in isolated children. However there are other uses
of the 'cpus_allowed' field in the cpuset code that are not changed, and
comments and documentation describing this field that are not changed.
I suspect this is an incomplete change.
You don't actually state it that I noticed, but the main point of your
example seems to be that you support incrementally moving individual
cpus between cpusets, without the constraint that both cpusets be in the
same subset of the partition (the same isolation group). So you can
move a cpu in and out of an isolated group without tearing down the
group down first, only to rebuild it after.
To do this, you've added new semantics to some of the operations to
write the 'cpus' special file of a cpuset, if and only if that cpuset is
marked isolated, which involves changing some other masks. These new
semantics are something along the lines of "adding a cpu here implies
removing it from there. This presumably allows you to move cpus in or
out of or between an isolated cpuset, while preserving the essential
properties of a partition - that it is a disjoint covering.
> He removes cpus 4-5 from batch and adds them to cint
Could you spell out the exact steps the user would take, for this part
of your example? What does the user do, what does the kernel do in
response, and what state the cpusets end up in, after each action of the
So far, to be honest, I am finding your patch to be rather frustrating.
Perhaps the essential reason is this. The interface that cpusets
presents in the cpuset file system, mounted at /dev/cpuset, is not in my
intentions primarily a human interface. It is primarily a programmatic
As such, there is a high premium on clarity of design, consistency of
behaviour and absence of side affects. Each operation should do one
thing, clearly defined, changing only what is operated on, preserving
clearly spelled out invariants.
If it takes three steps instead of one to accomplish a typical task,
that's fine. The programs that layer on top of /dev/cpuset don't mind
doing three things to get one thing done. But such programs are a pain
in the backside to program correctly if the affects of each operation
are not clearly defined, not focused on the obvious object being
operated on, or not precisely consistent with an overriding model.
This patch seems to add side affects and the change the meanings of
things, doing so with the most minimum of mention in the description,
without clearly and consistently spelling out the new mental model, and
without uniformly changing all uses, comments and documentation to fit
the new model.
This cpuset facility is also a less commonly used kernel facility, and
changes to cpusets, outside of a few key hooks in the scheduler and
allocator, are not performance critical. This means that there is a
premium in keeping the kernel code minimal, leaving as many details as
practical to userland. This patch seems to increase the kernel text
size, for an ia64 SN2 build using gcc 3.2.3 of a 2.6.12-rc1-mm4 tree I
had at hand, _just_ for the cpuset.c changes, from 23071 bytes to 28999.
That's over a 25% per-cent increase in the kernel text size of the file
kernel/cpuset.o, just for this feature. That's too much, in my view.
I don't know yet if the ability to move cpus between isolated sched
domains without tearing them down and rebuilding them, is a critical
feature for you or not. You have not been clear on what are the
essential requirements of this feature. I don't even know for sure yet
that this is the one key feature in your view that separates your
proposal from the variations I explored.
But if this is for you the critical feature that your proposal has, and
mine lack, then I'd like to see if there is a way to do it without
implicit side affects, without messing with the semantics of what's
there now, and with significantly fewer bytes of kernel text space. And
I'd like to see if we can have uniform and precisely spelled out
semantics, in the code, comments and documentation, with any changes to
the current semantics made everywhere, uniformly.
I won't rest till it's the best ...
Programmer, Linux Scalability
Paul Jackson <pj@...> 1.650.933.1373, 1.925.600.0401