From: Gerrit H. <ge...@us...> - 2001-11-17 19:21:19
Andrew, generally I think this is a good patch. The problem you are
describing has been happening occasionally in some of our benchmarking.
In a few cases, we've upgraded a lock from spin_lock to spin_lock_irq
to avoid the problem.

We've also considered replacing the spin_lock body with the code from
spin_lock_irq and making spin_lock_irq a macro defined to spin_lock
(making all spinlocks ignore interrupts). However, this means that when
a device raises an interrupt, [on Intel] the IO APIC will select a CPU
to deliver that interrupt to. With interrupts disabled at the CPU, the
device interrupt will remain pending until the CPU releases the lock,
delaying the delivery of the interrupt (increasing the latency of
interrupt delivery).

One solution that we used in a past OS was to actually mark the CPU's
IO APIC as "unwilling to handle an interrupt" so that the IO APIC would
select a different CPU to handle the device interrupt. We have that
patch ported to 2.4.X (6, 9, 14?) and it turns out that it is going to
be very important for Foster [on the Pentium III and lower IO APICs,
the CPU is chosen in a "round robin" fashion; on Foster/Pentium 4, it
appears that all interrupts are delivered only to CPU 0 by the IO
APIC]. This would *seem* to be a better solution to the general problem
(it goes even further than your current proposed solution). However, in
some benchmarking, we haven't found as much of a performance increase
as expected, and it turns out that IRQ binding and CPU binding have a
greater effect, because of cache effects, than simply reducing the
latency. Still, in principle this seems like the better general
approach.

Are you at all interested in evaluating the patch yourself? We are
currently wondering if we've had some oversight in the patch that keeps
the results from being as good as we expected - the goal should be
reduced interrupt delivery latency, which should improve network and
disk IO throughput/responsiveness. If so, Dave Olien @ IBM can supply
the patch.

I may also consider including your current patch in the LSE roll-up
patch that we are working on, as I would expect it to have some
measurable benefit. I'll also see if our performance team has the
bandwidth to do some isolated testing on this. Would you be able to
recommend any particular benchmark you'd like to see this tested with?

gerrit

> If a CPU takes an interrupt while holding a contended spinlock,
> the cost of the interrupt is borne not only by the CPU which takes
> it, but also by all the CPUs which are waiting for the lock.
>
> So here's an x86 patch which disables cpu-local interrupts
> whenever that CPU is holding a spinlock.
>
> I don't have enough CPUs to measure its effects, but perhaps
> someone who has an 8-way may be interested in trying it with
> a lock- and interrupt-intensive workload, to see what happens?
>
>
> --- linux-2.4.15-pre5/include/asm-i386/spinlock.h   Mon Nov  5 21:01:12 2001
> +++ linux-akpm/include/asm-i386/spinlock.h          Fri Nov 16 18:58:49 2001
> @@ -25,6 +25,7 @@ extern int printk(const char * fmt, ...)
>
>  typedef struct {
>          volatile unsigned int lock;
> +        unsigned long flags;
>  #if SPINLOCK_DEBUG
>          unsigned magic;
>  #endif
> @@ -79,15 +80,18 @@ typedef struct {
>
>  static inline void spin_unlock(spinlock_t *lock)
>  {
> +        unsigned long flags;
>  #if SPINLOCK_DEBUG
>          if (lock->magic != SPINLOCK_MAGIC)
>                  BUG();
>          if (!spin_is_locked(lock))
>                  BUG();
>  #endif
> +        flags = lock->flags;
>          __asm__ __volatile__(
>                  spin_unlock_string
>          );
> +        local_irq_restore(flags);
>  }
>
>  #else
> @@ -99,6 +103,7 @@ static inline void spin_unlock(spinlock_
>
>  static inline void spin_unlock(spinlock_t *lock)
>  {
> +        unsigned long flags;
>          char oldval = 1;
>  #if SPINLOCK_DEBUG
>          if (lock->magic != SPINLOCK_MAGIC)
> @@ -106,9 +111,11 @@ static inline void spin_unlock(spinlock_
>          if (!spin_is_locked(lock))
>                  BUG();
>  #endif
> +        flags = lock->flags;
>          __asm__ __volatile__(
>                  spin_unlock_string
>          );
> +        local_irq_restore(flags);
>  }
>
>  #endif
> @@ -116,11 +123,15 @@ static inline void spin_unlock(spinlock_
>  static inline int spin_trylock(spinlock_t *lock)
>  {
>          char oldval;
> +        int ret;
>          __asm__ __volatile__(
>                  "xchgb %b0,%1"
>                  :"=q" (oldval), "=m" (lock->lock)
>                  :"0" (0) : "memory");
> -        return oldval > 0;
> +        ret = (oldval > 0);
> +        if (ret)
> +                local_irq_save(lock->flags);
> +        return ret;
>  }
>
>  static inline void spin_lock(spinlock_t *lock)
> @@ -136,6 +147,7 @@ printk("eip: %p\n", &&here);
>          __asm__ __volatile__(
>                  spin_lock_string
>                  :"=m" (lock->lock) : : "memory");
> +        local_irq_save(lock->flags);
>  }
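To make the "unwilling to handle an interrupt" idea Gerrit describes
concrete: with lowest-priority interrupt delivery, raising a CPU's
local APIC Task Priority Register (TPR) takes it out of the IO APIC's
arbitration, so device interrupts are steered to other CPUs rather
than sitting pending behind a held lock. What follows is a minimal,
hypothetical sketch, not the actual IBM/LSE patch. The apic_write()
and APIC_TASKPRI names follow the 2.4 i386 headers; the helper
functions, their hook points, and the 0xf0 priority value are
assumptions for illustration.

    /*
     * Hypothetical sketch (not the actual LSE patch): raise the local
     * APIC Task Priority Register while this CPU holds a spinlock, so
     * that lowest-priority delivery steers device interrupts to other
     * CPUs rather than leaving them pending here.
     */
    #include <asm/apicdef.h>
    #include <asm/apic.h>

    static inline void cpu_decline_interrupts(void)
    {
            /* A TPR above every device vector's priority class makes
             * this CPU lose lowest-priority arbitration. */
            apic_write(APIC_TASKPRI, 0xf0);
    }

    static inline void cpu_accept_interrupts(void)
    {
            apic_write(APIC_TASKPRI, 0);    /* accept all vectors again */
    }

Note this only helps in lowest-priority delivery mode; if, as Gerrit
observes on Foster, all interrupts go to CPU 0 regardless, a raised
TPR on the lock-holding CPU would change nothing.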
From: Andrew M. <ak...@zi...> - 2001-11-19 19:19:44
Gerrit Huizenga wrote:
>
> ...

Hi, Gerrit.

Interesting discussion. I mainly tossed that out to see what people
thought, and whether this scenario had been investigated. Sounds like
it has.

Regarding interrupt latency from this approach:

If CPUs are taking interrupts while holding spinlocks to a significant
extent, then disabling interrupts inside spinlocks will increase latency
as well as decrease lock contention. I have a gut feel that the benefit
from the latter will exceed the cost of the former?

Another approach here would be to increment a counter in the task_struct
somewhere when a lock is taken, peek at that counter in do_IRQ() and
generate statistics as to how often an interrupt is taken inside a locked
region.

> I'll also see if our performance team has the bandwidth to do some
> isolated testing on this.

I'd be interested, if it's not too much work. All I was able to do was
verify that the kernel still works - I don't expect it to make any
performance difference on a puny 2-way. The system-wide cost of an
interrupt-inside-lock is proportional to the number of CPUs.

> Would you be able to recommend any particular benchmark you'd
> like to see this tested with?

Something with lots of interrupts and lock contention :) I guess
netbench, SFS, and webbench, especially with TUX, would all be
candidates for suffering from this effect.
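A minimal sketch of the counting idea Andrew describes, assuming a new
task_struct field; the names locks_held and irq_in_lock are invented
for illustration, and this is not code from any posted patch:

    /* include/linux/sched.h: hypothetical new field in task_struct */
            int locks_held;         /* spinlocks currently held */

    /* include/asm-i386/spinlock.h: bump on acquire, drop on release */
    static inline void spin_lock(spinlock_t *lock)
    {
            __asm__ __volatile__(
                    spin_lock_string
                    :"=m" (lock->lock) : : "memory");
            current->locks_held++;
    }

    static inline void spin_unlock(spinlock_t *lock)
    {
            current->locks_held--;
            __asm__ __volatile__(
                    spin_unlock_string
            );
    }

    /* arch/i386/kernel/irq.c: sample the counter on interrupt entry */
    unsigned long irq_in_lock[NR_CPUS];     /* hypothetical statistics */

    asmlinkage unsigned int do_IRQ(struct pt_regs regs)
    {
            /* interrupted code was inside at least one locked region */
            if (current->locks_held > 0)
                    irq_in_lock[smp_processor_id()]++;
            /* ... existing do_IRQ() body ... */
    }

Dumping irq_in_lock[] (via /proc, say) at the end of a benchmark run
would then give the interrupt-inside-lock frequency directly.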
From: george a. <ge...@mv...> - 2001-11-19 19:29:38
Andrew Morton wrote:
>
> Gerrit Huizenga wrote:
> >
> > ...
>
> Hi, Gerrit.
>
> Interesting discussion. I mainly tossed that out to see what people
> thought, and whether this scenario had been investigated. Sounds like
> it has.
>
> Regarding interrupt latency from this approach:
>
> If CPUs are taking interrupts while holding spinlocks to a significant
> extent, then disabling interrupts inside spinlocks will increase latency
> as well as decrease lock contention. I have a gut feel that the benefit
> from the latter will exceed the cost of the former?
>
> Another approach here would be to increment a counter in the task_struct
> somewhere when a lock is taken, peek at that counter in do_IRQ() and
> generate statistics as to how often an interrupt is taken inside a locked
> region.

Uh, the preempt patch does this. Now all you need is the peek code.
(It is also incremented by the interrupt itself, so you need to look
at relative levels, but it is there.)

--
George   ge...@mv...
High-res-timers: http://sourceforge.net/projects/high-res-timers/
Real time sched: http://sourceforge.net/projects/rtsched/
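For completeness, a hypothetical sketch of the "peek code" george says
is still needed, assuming the 2.4 preempt patch's per-task
preempt_count field. Interrupt entry has already raised the count once
by the time the handler runs, so the check is against that base level
rather than zero - the "relative levels" caveat above. The statistics
counter and helper are invented for illustration:

    /* arch/i386/kernel/irq.c, with the 2.4 preempt patch applied.
     * spin_lock() raises current->preempt_count, but so does the
     * interrupt entry path itself, so compare against the interrupt's
     * own increment rather than against zero. */
    static unsigned long irqs_inside_locks;   /* hypothetical counter */

    static inline void note_irq_in_lock(void)
    {
            /* one increment belongs to this interrupt; anything above
             * it means the interrupted code held a spinlock */
            if (current->preempt_count > 1)
                    irqs_inside_locks++;
    }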