From: Paul McKenney <Paul.McKenney@us...> - 2000-12-21 23:09:06
We have based the priority decision on SPL and process priority--both can
be stored in the APIC. Idle CPUs have the least-favored possible process
priority and in addition run at the lowest possible SPL. I can understand
reluctance to bury this in core Linux, given that it depends on special
hardware (if you can call something as widely available as the Intel APIC
"special"!), but it would be good to be able to make use of this feature on
hardware that supports it.
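For concreteness, a minimal sketch of the encoding on x86, assuming Linux's
apic_write()/APIC_TASKPRI interface -- the field layout below is purely
illustrative, not our actual implementation:

	#include <asm/apic.h>

	/*
	 * Illustrative only: fold the current SPL and process priority
	 * into the local APIC task-priority register (TPR).  With
	 * lowest-priority interrupt delivery, the CPU with the lowest
	 * arbitration priority receives the interrupt, so an idle CPU
	 * (SPL 0, least-favored process priority) writes 0 and thus
	 * attracts interrupts.
	 */
	static void update_irq_priority(unsigned int spl, unsigned int prio)
	{
		apic_write(APIC_TASKPRI, ((spl & 0x7) << 4) | (prio & 0xf));
	}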
That said, I do understand your earlier point about directing the interrupt
to the CPU that ran the process that initiated the event that provoked the
interrupt. There is an interesting tradeoff between cache affinity and
load balancing here! Careful benchmarking will clearly be needed to
evaluate potential solutions.
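As a rough sketch of the cache-affinity end of that tradeoff, one could pin
an IRQ to the CPU where the initiating process last ran, assuming the
/proc/irq/<n>/smp_affinity interface that 2.4 provides (the IRQ and CPU
numbers below are made up):

	#include <stdio.h>

	int main(void)
	{
		/* Hypothetical example: steer IRQ 19 to CPU 2, the CPU
		 * on which the initiating process last ran. */
		FILE *f = fopen("/proc/irq/19/smp_affinity", "w");

		if (!f) {
			perror("smp_affinity");
			return 1;
		}
		fprintf(f, "%x\n", 1 << 2);	/* bitmask: CPU 2 only */
		fclose(f);
		return 0;
	}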
Andi Kleen <ak@...> on 12/21/2000 02:51:39 PM
Sent by: lse-tech-admin@...
To: Tim Wright <timw@...>
cc: Andi Kleen <ak@...>, Tim Hockin <thockin@...>,
npollitt@..., lse-tech@..., slinx@...
Subject: Re: [Lse-tech] fwd: Process Pinning
On Thu, Dec 21, 2000 at 02:37:39PM -0800, Tim Wright wrote:
> Sorry, yes.
> Just like the traditional Unix interrupt priority levels. In DYNIX/ptx
> we even have the same number of levels, viz. SPL0 through SPL7 (aka splhi).
> Obviously, any spinlocks that can be used in interrupt context have to block
> interrupts at the local CPU to prevent a livelock when the interrupt
> happens to land on a CPU that already has the lock. We maintained the
> priority level hierarchy because it allows you greater flexibility in who
> to give the best latency to. Again, it might be argued that this introduces
> needless complexity, but it would be interesting to test and see.
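(For reference, the Linux counterpart of that rule is spin_lock_irqsave(),
which disables interrupts on the local CPU before taking a lock that an
interrupt handler may also take; a minimal sketch with a made-up counter:)

	#include <linux/spinlock.h>

	static spinlock_t demo_lock = SPIN_LOCK_UNLOCKED; /* 2.4-era initializer */
	static int demo_events;

	void note_event(void)	/* safe from process and interrupt context */
	{
		unsigned long flags;

		/* Local IRQs off: a handler can never spin on our lock. */
		spin_lock_irqsave(&demo_lock, flags);
		demo_events++;
		spin_unlock_irqrestore(&demo_lock, flags);
	}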
I doubt that there is much chance of getting generally visible spl levels into
Linux -- Torvalds et al. have a near-religious aversion to them.
I am also not sure I understand how the spl level is related to the load of
the CPU. Do you have a special SPL level for the idle thread?
I suspect you could just set the APIC in the idle thread in Linux to direct
interrupts to idle CPUs, but this would only be a win when taking the interrupt
is more costly than transferring the context changed by the interrupt from that
CPU to the final CPU that runs the consumer.
Is that true for all the NUMA/SMP boxes?
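A minimal sketch of that idle-thread idea, again assuming the local APIC
TPR -- the values and the hook into the idle loop are illustrative, not
existing 2.4 code:

	#include <linux/sched.h>
	#include <asm/apic.h>
	#include <asm/system.h>

	void idle_loop_sketch(void)
	{
		for (;;) {
			apic_write(APIC_TASKPRI, 0x00);	/* idle: attract IRQs */
			while (!current->need_resched)
				safe_halt();		/* sti; hlt until woken */
			apic_write(APIC_TASKPRI, 0x10);	/* made-up non-idle value */
			schedule();
		}
	}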
>> Also in the block IO space, we used NUMA-aware IO scheduling and a
>> multipath IO fabric (usually at least one path to each block device
>> from each node) and ensured that IO traffic was rarely sent across
>> a node boundary (avoiding the NUMA interconnect).
> What exactly was scheduled? It sounds like a similar problem to the
> dynamic NIC-IRQ binding.
In DYNIX/ptx, multi-path I/O drivers route I/O requests based on the NUMA
node affinity of the I/O buffers. The idea is to reduce NUMA traffic when
your controller is in one node and the I/O buffers are in another.
It differs from the NIC-IRQ binding issue in that the relatively slow
NUMA interconnect and large DMAs made node affinity preferable to other
schemes such as load balancing.
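A rough sketch of that routing decision (the data structures and helper are
invented for illustration, not DYNIX/ptx source):

	struct io_path {
		int node;	/* NUMA node the controller lives on */
		int busy;	/* crude availability flag */
	};

	/* Prefer a path whose controller is on the same node as the
	 * buffer; fall back to any free path, since a remote DMA still
	 * beats queueing behind a busy local controller. */
	static struct io_path *choose_path(struct io_path *paths, int npaths,
					   int buffer_node)
	{
		int i;

		for (i = 0; i < npaths; i++)
			if (paths[i].node == buffer_node && !paths[i].busy)
				return &paths[i];
		for (i = 0; i < npaths; i++)
			if (!paths[i].busy)
				return &paths[i];
		return &paths[0];	/* all busy: queue on the first path */
	}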