From: Hanna Linder <hannal@us...> - 2002-03-22 23:17:43
Minutes from 3/22 lse conference call
Martin Bligh - Hybrid user-kernel virtual address space
See the thread on linux-kernel for more information:
Normally kernel and user address spaces break at the 3GB boundary.
One problem is the need to shift page tables into high mem to alter them.
Martin would like to make a hybrid user-kernel address space that is
per task like user space and protected like kernel space. It need not be
very big, a few MB. This area could come from either the current user
space area, or from the current kernel area - initial thoughts are to take
it from the current user area, above the user stack.
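To make the proposed layout concrete, here is a rough sketch in C of where such an area might sit. It assumes the conventional i386 split at 0xC0000000 and 4KB pages. The 4MB size and every UK_* name are hypothetical; the minutes only say "a few MB" and do not name anything.

```c
#include <assert.h>
#include <stdint.h>

/* Conventional i386 split: user space below 3GB, kernel space above. */
#define UK_SPLIT_3GB   0xC0000000UL
#define UK_PAGE_SIZE   4096UL

/* Hypothetical size; the minutes only say "a few MB". */
#define UK_AREA_SIZE   (4UL * 1024 * 1024)

/* Hypothetical placement: carved from the top of the current user area. */
#define UK_AREA_END    UK_SPLIT_3GB
#define UK_AREA_START  (UK_AREA_END - UK_AREA_SIZE)

/* The user stack would then have to start below the hybrid area. */
#define UK_USER_STACK_TOP UK_AREA_START

/* True if a 32-bit virtual address falls inside the hybrid area. */
static int in_uk_area(uint32_t addr)
{
    return addr >= UK_AREA_START && addr < UK_AREA_END;
}

/* Number of pages the hybrid area would cover. */
static unsigned long uk_area_pages(void)
{
    return UK_AREA_SIZE / UK_PAGE_SIZE;
}
```

With these assumed numbers the area spans 0xBFC00000 up to (but not including) 0xC0000000 and covers 1024 pages.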
This area would have several possible uses:
1. A place to put the user pagetables.
2. An area where an efficient, scalable, per-task kmap could be created.
3. Possibly moving the task's kernel stack into this space
4. Possibly moving the task_struct into this space.
There are two current known problems:
1. If we take the memory from the current user space, the pagetables are
not really per-task; they are per-address-space.
2. Not all accesses to the task's page tables are done from within the
task's context (eg. swapout).
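The first problem can be illustrated with a toy model in C. This is only a sketch: the struct and function names are made up, and a small array stands in for the top-level page directory. It shows that threads sharing one address space also share whatever is mapped through it.

```c
#include <assert.h>
#include <stddef.h>

/* Toy address space: a tiny array stands in for the page directory. */
struct toy_mm {
    unsigned long pgd[4];
};

/* Toy task: threads of one process point at the same toy_mm. */
struct toy_task {
    int pid;
    struct toy_mm *mm;
};

/* Hypothetical slot in the directory that covers the hybrid area. */
#define TOY_UK_SLOT 3

/* Install a mapping for the hybrid area on behalf of one task. */
static void toy_map_uk_area(struct toy_task *t, unsigned long frame)
{
    assert(t != NULL && t->mm != NULL);
    t->mm->pgd[TOY_UK_SLOT] = frame;
}

/* Look up what a task sees in the hybrid area. */
static unsigned long toy_uk_frame(const struct toy_task *t)
{
    assert(t != NULL && t->mm != NULL);
    return t->mm->pgd[TOY_UK_SLOT];
}
```

If two tasks share a toy_mm, a mapping installed for one is seen by the other, so the area is per address space rather than per task. Giving each task its own directory avoids that, at the cost of one directory per thread.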
Current plans for these problems:
1. To create a true per-task mapping is thought to require a separate PGD
for each task, which would be somewhat inefficient for multi-threaded
processes. A per-address-space mapping is much simpler, and provides for
the user pagetable usage (though not the other usages). Stage 1 will be
to implement the per-address-space area. We could implement a per
address-space kmap, which would not be as good as a per-task kmap (in that
it would still require locking), but would be much more scalable than the
2. Initial plan is to use atomic kmap, as currently done by the
pte-highmem patch. Martin noted that kswapd doesn't use the current 3GB
user area, and this could be used as a huge kmap pool. Dave pointed out
that the address mapping is shared between all kernel threads, but thought
this could be easily fixed.
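For a sense of scale, here is some back-of-the-envelope arithmetic in C for the "huge kmap pool" idea. It assumes 4KB pages. The 1024-slot figure for the existing global kmap pool is an assumption about a non-PAE i386 configuration of the time, not something stated on the call, and all names are made up.

```c
#include <assert.h>

#define POOL_PAGE_SIZE     4096ULL
#define POOL_USER_AREA     (3ULL * 1024 * 1024 * 1024)  /* unused 3GB user area */
#define POOL_GLOBAL_SLOTS  1024ULL  /* assumed size of the global kmap pool */

/* Number of page-sized mapping slots that fit in an area. */
static unsigned long long pool_slots(unsigned long long area_bytes)
{
    return area_bytes / POOL_PAGE_SIZE;
}

/* Virtual address of a given slot, counted from the base of the area. */
static unsigned long long pool_slot_vaddr(unsigned long long base,
                                          unsigned long long slot)
{
    return base + slot * POOL_PAGE_SIZE;
}

/* How many times larger the 3GB pool is than the assumed global pool. */
static unsigned long long pool_ratio(void)
{
    return pool_slots(POOL_USER_AREA) / POOL_GLOBAL_SLOTS;
}
```

Under these assumptions the 3GB area holds 786432 slots, 768 times the assumed global pool.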
Going back to the usage list, items 3 and 4 are much more problematic.
Waitqueues are currently put on the process's kernel stack, but are globally
accessed. Task structures are used globally as well, but it may be
advantageous to do a secondary mapping into the user-kernel area in order
to locate the structure at a fixed virtual address (though this requires
the task_struct to be page aligned). Bill pointed out that some architectures
do virtual caching instead of physical caching, so this wouldn't help them much.
There was some discussion of other methods of deriving current that were
cleaner than the current kernel stack address trick.
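For readers unfamiliar with the kernel stack address trick: on i386 at the time, the task structure sat at the bottom of an aligned 8KB block shared with the kernel stack, so current could be found by masking the stack pointer. Below is a user-space sketch of the idea in C. The names are made up, and aligned_alloc stands in for the kernel's allocation of the block.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/* Size and alignment of the combined task + kernel stack block. */
#define TRICK_THREAD_SIZE 8192UL

/* Stand-in for the task structure at the bottom of the block. */
struct trick_task {
    int pid;
};

/* Mask a stack address down to the start of its aligned block. */
static struct trick_task *trick_current(const void *sp)
{
    uintptr_t base = (uintptr_t)sp & ~((uintptr_t)TRICK_THREAD_SIZE - 1);
    return (struct trick_task *)base;
}

/* Allocate an aligned block and place the task structure at its bottom. */
static void *trick_alloc_block(int pid)
{
    void *block = aligned_alloc(TRICK_THREAD_SIZE, TRICK_THREAD_SIZE);
    assert(block != NULL);
    ((struct trick_task *)block)->pid = pid;
    return block;
}
```

Any address inside the block masks back to the same task structure. The trick therefore depends on the stack and the task structure staying together, which is what makes items 3 and 4 awkward to separate.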
Martin mentioned that the NUMA kernel text replication would pose similar
problems to the per-task user-kernel address space, and there was some
ensuing discussion, including the fact that ia64 already does this, but
nobody present knew how this was done.
Bill Irwin - status of pagemap_lru_lock subdivision in the -rmap VM
The pagemap_lru_lock protects a variety of loosely defined
things, which has caused races and makes it hard to maintain. Basically the
locks seemed to be a major point of contention in large memory
Recently found a race that stopped us from running very long.
Looks like some other races are triggerable. With this patch the
system now runs for a while.
Rik said it might be fun to try two zones: a primary zone
and a fall back zone which is used only if page allocation bound
system. Might not be useful but would be fun to try.
Martin said the problem with breaking up into too many zones
is it might take too long to find pages.
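The trade-off under discussion can be sketched in user-space C. This is a toy model, not the -rmap code: the names are invented, and a C11 atomic_flag spinlock stands in for the kernel spinlock. Each zone carries its own LRU lock instead of sharing one global lock, but allocation may have to walk several zones before it finds a page.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>

#define TOY_MAX_ZONES 4

struct toy_zone {
    atomic_flag lru_lock;      /* per-zone lock instead of one global lock */
    unsigned long free_pages;
    unsigned long lru_pages;
};

static void toy_zone_lock(struct toy_zone *z)
{
    while (atomic_flag_test_and_set_explicit(&z->lru_lock, memory_order_acquire))
        ;  /* spin until the lock is released */
}

static void toy_zone_unlock(struct toy_zone *z)
{
    atomic_flag_clear_explicit(&z->lru_lock, memory_order_release);
}

static void toy_zones_init(struct toy_zone *zones, size_t n, unsigned long free_each)
{
    for (size_t i = 0; i < n; i++) {
        atomic_flag_clear(&zones[i].lru_lock);
        zones[i].free_pages = free_each;
        zones[i].lru_pages = 0;
    }
}

/* Take one page from the first zone that has any. Returns the zone index,
 * or -1 if all zones are empty; *scanned reports how many zones were visited. */
static int toy_alloc_page(struct toy_zone *zones, size_t n, size_t *scanned)
{
    *scanned = 0;
    for (size_t i = 0; i < n; i++) {
        int got = 0;
        (*scanned)++;
        toy_zone_lock(&zones[i]);
        if (zones[i].free_pages > 0) {
            zones[i].free_pages--;
            zones[i].lru_pages++;
            got = 1;
        }
        toy_zone_unlock(&zones[i]);
        if (got)
            return (int)i;
    }
    return -1;
}
```

Work on different zones no longer contends on a single lock, but the number of zones scanned grows as the earlier zones empty, which is the cost Martin was pointing at.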
Pat asked Bill if the reason he wanted her discontig mem patch
is to fake an SMP system into using multiple zones like a NUMA system. Bill
said yes, that is why.
Bill would like to see lockmeter data on Martin's NUMA machine
running the pagemap_lru_lock patches.
Hanna Linder - Get it right people
Hanna verified Arjan van de Ven's first name is pronounced:
ar-ian not ar-jan
minutes compiled by hannal@... with significant editing of his
section by mjbligh@...