Good point on the modules... I'll have to think about that a bit more....
but it does seem to be a problem.
Andi Kleen <ak@... on 03/29/2002 01:32:05 AM
Sent by: lse-tech-admin@...
To: James Washer/Beaverton/IBM@...
cc: mjbligh@..., "Luck, Tony" <tony.luck@...>, lse-tech
Subject: Re: [Lse-tech] RE: NUMA kernel text replication
[Mixed answer to James and Martin]
On Thu, Mar 28, 2002 at 09:34:32PM -0800, James Washer wrote:
> As I said ( in private mail to Martin ) earlier, I believe using the CS
> segment selector to offset the kernel's address on each node will give you
> the ability to duplicate the kernel text on each node... with very little
> overhead... No need to worry about kernel context on processes that
> migrate across nodes, etc...
On some x86 CPUs a non-zero code segment base adds a cycle or two to any
non-relative jump instruction. I don't know if it's the case on the x86
you're using, but it may be worth finding out (?)
> > On ia64 I don't have the task migration problem that you describe,
> > the mapping for this per-node area of memory doesn't appear in any
> > page table (in fact it isn't in any in-memory page table at all, it
> > exists in the TLB in a pair of "tr" registers, one for code references,
> > the other for data references ... these registers lock the translation so I
> > never have to worry about having a TLB miss and needing to reload
> Hey, no fair, you're using hardware support! ;-) Cool trick though.
> We started looking at the ia32 for similar support, and John Stultz pointed
> out the PGE (page global enable) flag, which seems to pin entries into the
> tlb cache in much the same way - has anyone actually played with this,
> and can confirm that it hard pins stuff into the TLB?
I don't think G pages will help you. They would only really pin entries
if you only ever had four 4MB pages (and then you're hardcoding the number of
TLB entries into your kernel; it's in no way specified in the architecture). But the
kernel uses many more 4MB pages for its direct mapping, and when these are
used your "pinned" TLBs could be thrown out. Not using 4MB pages for the
direct mapping would likely cost you more than what text replication would
gain you; it would add a lot more TLB thrashing to the kernel.
> I believe we have 4MB pages for the kernel part of the addr space, so we'd
> have to align kernel text to a 4MB section of memory, and pin an entry
> into the TLB cache where it looks like we only have 4 entries for large
> pages (a little scary) but it doesn't look too bad .....
One problem with that is that it doesn't work very well with modules
(even when you keep space free in the kernel 4MB mapping and allocate modules
linearly there). Not supporting modules efficiently means that you
cannot use distribution kernels directly.