From: Nakajima, J. <jun...@in...> - 2004-03-25 15:16:00
We have found some performance regressions (e.g. SPECjbb) with the scheduler on a large IA-64 NUMA machine, and we are debugging it. On SMP machines, we haven't seen performance regressions.

Jun

> -----Original Message-----
> From: Andi Kleen [mailto:ak...@su...]
> Sent: Wednesday, March 24, 2004 8:56 PM
> To: Ingo Molnar
> Cc: pi...@cy...; lin...@vg...; ak...@os...; ke...@ko...;
>     ru...@ru...; Nakajima, Jun; ric...@us...; an...@sa...;
>     lse...@li...; mb...@ar...
> Subject: Re: [Lse-tech] [patch] sched-domain cleanups, sched-2.6.5-rc2-mm2-A3
>
> On Thu, 25 Mar 2004 09:28:09 +0100
> Ingo Molnar <mi...@el...> wrote:
>
> > i've reviewed the sched-domains balancing patches for upstream inclusion
> > and they look mostly fine.
>
> The main problem it has is that it performs quite badly on Opteron NUMA,
> e.g. in the OpenMP STREAM test (much worse than the normal scheduler).
>
> -Andi
From: Nakajima, J. <jun...@in...> - 2004-03-25 15:32:32
Andi,

Can you be more specific with "it doesn't load balance threads aggressively enough"? Or, what behavior of the base NUMA scheduler is missing in the sched-domain scheduler, especially for NUMA?

Jun

> -----Original Message-----
> From: Andi Kleen [mailto:ak...@su...]
> Sent: Thursday, March 25, 2004 3:47 AM
> To: Rick Lindsley
> Cc: Andi Kleen; Ingo Molnar; pi...@cy...; linux-ke...@vg...;
>     ak...@os...; ke...@ko...; ru...@ru...; Nakajima, Jun;
>     an...@sa...; lse-te...@li...; mb...@ar...
> Subject: Re: [Lse-tech] [patch] sched-domain cleanups, sched-2.6.5-rc2-mm2-A3
>
> On Thu, Mar 25, 2004 at 03:40:22AM -0800, Rick Lindsley wrote:
> > > The main problem it has is that it performs quite badly on Opteron NUMA,
> > > e.g. in the OpenMP STREAM test (much worse than the normal scheduler).
> >
> > Andi, I've got some schedstat code which may help us to understand why.
> > I'll need to port it to Ingo's changes, but if I drop you a patch in a
> > day or two can you try your test on sched-domain/non-sched-domain,
> > collecting the stats?
>
> The OpenMP failure is already pretty well understood - it doesn't load
> balance threads aggressively enough over CPUs after startup.
>
> -Andi
From: Andi K. <ak...@su...> - 2004-03-25 15:40:24
On Thu, Mar 25, 2004 at 07:31:37AM -0800, Nakajima, Jun wrote:
> Andi,
>
> Can you be more specific with "it doesn't load balance threads
> aggressively enough"? Or, what behavior of the base NUMA scheduler is
> missing in the sched-domain scheduler, especially for NUMA?

It doesn't do load balancing in wake_up_forked_process() and is relatively non-aggressive in balancing later. This leads to the multithreaded OpenMP STREAM running its children first on the same node as the original process and allocating memory there. When the balancing finally happens they run on a different node, but generate cross traffic to the old node instead of using the memory bandwidth of their local nodes.

The difference is very visible: even the 4-thread STREAM only sees the bandwidth of a single node. With a more aggressive scheduler you get 4 times as much. Admittedly it's a bit of a stupid benchmark, but it seems to be representative of a lot of HPC codes.

-Andi
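To make the failure mode Andi describes concrete, here is a minimal STREAM-style sketch in C (illustrative only -- not the actual STREAM benchmark; compile with e.g. gcc -fopenmp). Under the first-touch NUMA policy, each page is allocated on the node of the CPU that first writes it; if all threads are still on the parent's node during the init loop and are spread across nodes only afterwards, the bandwidth-bound loop pulls everything over the interconnect:

#include <stdio.h>
#include <stdlib.h>

#define N (1 << 24)	/* 128 MB per array */

int main(void)
{
	double *a = malloc(N * sizeof(double));
	double *b = malloc(N * sizeof(double));
	double sum = 0.0;
	long i;

	if (!a || !b)
		return 1;

	/* First touch: these writes decide which node each page lives on.
	 * If the threads have not yet been balanced across nodes, every
	 * page lands on the parent's node. */
	#pragma omp parallel for
	for (i = 0; i < N; i++) {
		a[i] = 1.0;
		b[i] = 2.0;
	}

	/* Bandwidth-bound phase: threads migrated to other nodes after
	 * the init loop now fetch all their data from the old node. */
	#pragma omp parallel for reduction(+:sum)
	for (i = 0; i < N; i++)
		sum += a[i] * b[i];

	printf("%f\n", sum);
	free(a);
	free(b);
	return 0;
}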
From: Ingo M. <mi...@el...> - 2004-03-25 19:09:15
* Andi Kleen <ak...@su...> wrote:

> It doesn't do load balancing in wake_up_forked_process() and is
> relatively non-aggressive in balancing later. This leads to the
> multithreaded OpenMP STREAM running its children first on the same node
> as the original process and allocating memory there. [...]

i believe the fix we want is to pre-balance the context at fork() time. I've implemented this (it is basically just a reuse of sched_balance_exec() in fork.c, plus the related namespace cleanups); could you give it a go:

  http://redhat.com/~mingo/scheduler-patches/sched-2.6.5-rc2-mm2-A5

another solution would be to add SD_BALANCE_FORK.

also, the best place to do fork() balancing is not at wake_up_forked_process() time, but prior to doing the MM copy. This patch does it there. At wakeup time we've already copied all the pagetables and created tons of dirty cachelines.

	Ingo
From: Andi K. <ak...@su...> - 2004-03-25 19:20:56
On Thu, 25 Mar 2004 20:09:45 +0100 Ingo Molnar <mi...@el...> wrote:

> also, the best place to do fork() balancing is not at
> wake_up_forked_process() time, but prior to doing the MM copy. This
> patch does it there. At wakeup time we've already copied all the
> pagetables and created tons of dirty cachelines.

That won't help for threaded programs that use clone(). OpenMP is such a case.

-Andi
From: Ingo M. <mi...@el...> - 2004-03-25 19:38:47
* Andi Kleen <ak...@su...> wrote:

> That won't help for threaded programs that use clone(). OpenMP is such
> a case.

yeah, agreed. Also, exec-balance, if applied to fork(), would migrate the parent, which is not what we want. We could perhaps migrate the parent to the target CPU, copy the context, then migrate the parent back to the original CPU ... but this sounds too complex.

	Ingo
From: Ingo M. <mi...@el...> - 2004-03-25 20:46:34
* Andi Kleen <ak...@su...> wrote:

> That won't help for threaded programs that use clone(). OpenMP is such
> a case.

this patch:

  redhat.com/~mingo/scheduler-patches/sched-2.6.5-rc2-mm3-A4

does balancing at wake_up_forked_process()-time.

but it's a hard issue. Especially after fork() we do have a fair amount of cache context, and migrating at this point can be bad for performance.

	Ingo
From: Andi K. <ak...@su...> - 2004-03-29 08:46:04
On Thu, Mar 25, 2004 at 09:30:32PM +0100, Ingo Molnar wrote:
>
> * Andi Kleen <ak...@su...> wrote:
>
> > That won't help for threaded programs that use clone(). OpenMP is such
> > a case.
>
> this patch:
>
>   redhat.com/~mingo/scheduler-patches/sched-2.6.5-rc2-mm3-A4
>
> does balancing at wake_up_forked_process()-time.
>
> but it's a hard issue. Especially after fork() we do have a fair amount
> of cache context, and migrating at this point can be bad for
> performance.

I ported it by hand to the -mm4 scheduler now and tested it. While it works marginally better than the standard -mm scheduler (you get 1.5x the bandwidth of one CPU instead of 1x), it's still much worse than the optimum of nearly 4x achieved by 2.4 or the standard scheduler.

-Andi
From: Rick L. <ric...@us...> - 2004-03-29 10:22:02
I've got a web page up now on my home machine which shows data from schedstats across the various flavors of 2.6.4 and 2.6.5-rc2 under load from kernbench, SPECjbb, and SPECdet.

  http://eaglet.rain.com/rick/linux/sched-domain/index.html

Two things stand out. One is that sched-domains tends to call load_balance() less frequently when it is idle and more frequently when it is busy (as compared to the "standard" scheduler). Another is that even though it moves fewer tasks on average, the sched-domains code shows about half of pull_task()'s work coming from active_load_balance() ... and that seems wrong. Could these be contributing to what you're seeing?

Rick
From: Andi K. <ak...@su...> - 2004-03-29 10:30:36
On Mon, 29 Mar 2004 02:20:58 -0800 Rick Lindsley <ric...@us...> wrote:

> I've got a web page up now on my home machine which shows data from
> schedstats across the various flavors of 2.6.4 and 2.6.5-rc2 under
> load from kernbench, SPECjbb, and SPECdet.
>
>   http://eaglet.rain.com/rick/linux/sched-domain/index.html
>
> Two things stand out. One is that sched-domains tends to call
> load_balance() less frequently when it is idle and more frequently when
> it is busy (as compared to the "standard" scheduler). Another is that
> even though it moves fewer tasks on average, the sched-domains code
> shows about half of pull_task()'s work coming from active_load_balance()
> ... and that seems wrong. Could these be contributing to what you're
> seeing?

Sounds quite possible, yes.

-Andi
From: Nick P. <nic...@ya...> - 2004-03-29 11:28:48
Rick Lindsley wrote:

> I've got a web page up now on my home machine which shows data from
> schedstats across the various flavors of 2.6.4 and 2.6.5-rc2 under
> load from kernbench, SPECjbb, and SPECdet.
>
>   http://eaglet.rain.com/rick/linux/sched-domain/index.html

I can't see it.

> Two things stand out. One is that sched-domains tends to call
> load_balance() less frequently when it is idle and more frequently when
> it is busy (as compared to the "standard" scheduler).

John Hawkes noticed problems here too. mm5 has a patch to improve this for NUMA node balancing. There is no change on non-NUMA though, if that is what you were testing - we might need to tune this a bit if it is hurting.

> Another is that even though it moves fewer tasks on average, the
> sched-domains code shows about half of pull_task()'s work coming from
> active_load_balance() ...

Yeah, this is wrong and shouldn't be happening. It would have been due to a bug in the imbalance calculation, which is now fixed.
From: Nick P. <nic...@ya...> - 2004-03-29 11:20:23
Andi Kleen wrote:

> On Thu, Mar 25, 2004 at 09:30:32PM +0100, Ingo Molnar wrote:
>
> > but it's a hard issue. Especially after fork() we do have a fair amount
> > of cache context, and migrating at this point can be bad for
> > performance.
>
> I ported it by hand to the -mm4 scheduler now and tested it. While it
> works marginally better than the standard -mm scheduler (you get 1.5x
> the bandwidth of one CPU instead of 1x), it's still much worse than the
> optimum of nearly 4x achieved by 2.4 or the standard scheduler.

OK, there must be some pretty simple reason why this is happening.

I guess being OpenMP it is probably a bit complicated for you to try your own scheduling in userspace using CPU affinities? Otherwise, could you trace what gets scheduled where, for both good and bad kernels? It should help us work out what is going on.

I wonder if using one CPU from each quad of the NUMAQ would give at all comparable behaviour...

If it isn't a big problem, could you test with -mm5 with the generic sched domain? STREAM doesn't take long, does it? I don't expect much difference, but the code is in flux while Ingo and I try to sort things out.
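For what it's worth, the userspace-affinity experiment Nick suggests might look something like this (a sketch only, assuming the simplest possible policy of pinning OpenMP thread i to CPU i; the real test would presumably pick one CPU per node):

#define _GNU_SOURCE
#include <sched.h>
#include <stdio.h>
#include <omp.h>

int main(void)
{
	#pragma omp parallel
	{
		cpu_set_t mask;
		int cpu = omp_get_thread_num();

		CPU_ZERO(&mask);
		CPU_SET(cpu, &mask);
		/* pid 0 == the calling thread: pin this worker to `cpu`,
		 * taking placement out of the kernel balancer's hands */
		if (sched_setaffinity(0, sizeof(mask), &mask) != 0)
			perror("sched_setaffinity");
	}

	/* ... run the STREAM loops here: with first-touch, each thread
	 * now allocates its pages on the node it will keep running on ... */
	return 0;
}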
From: Andi K. <ak...@su...> - 2004-03-29 11:24:32
On Mon, 29 Mar 2004 21:20:12 +1000 Nick Piggin <nic...@ya...> wrote:

> > I ported it by hand to the -mm4 scheduler now and tested it. While it
> > works marginally better than the standard -mm scheduler (you get 1.5x
> > the bandwidth of one CPU instead of 1x), it's still much worse than the
> > optimum of nearly 4x achieved by 2.4 or the standard scheduler.

Sorry, ignore this report - I just found out I booted the wrong kernel by mistake. Currently retesting, also with the proposed change to only use a single scheduling domain.

-Andi
From: Ingo M. <mi...@el...> - 2004-03-29 12:00:00
* Andi Kleen <ak...@su...> wrote:

> Sorry, ignore this report - I just found out I booted the wrong kernel
> by mistake. Currently retesting, also with the proposed change to only
> use a single scheduling domain.

here are the items that are in the works:

  redhat.com/~mingo/scheduler-patches/sched.patch

it's against 2.6.5-rc2-mm5. This patch also reduces the rate of active balancing a bit.

	Ingo
From: Andi K. <ak...@su...> - 2004-03-29 20:31:02
On Mon, 29 Mar 2004 13:46:35 +0200 Ingo Molnar <mi...@el...> wrote:

> here are the items that are in the works:
>
>   redhat.com/~mingo/scheduler-patches/sched.patch
>
> it's against 2.6.5-rc2-mm5. This patch also reduces the rate of active
> balancing a bit.

I applied only this patch and it did slightly better than the normal -mm*: 1.5-2x CPU bandwidth, but still well short of the 3.7x-4x that mainline and 2.4 reach.

-Andi
From: Nick P. <nic...@ya...> - 2004-03-29 23:52:19
Andi Kleen wrote:

> I applied only this patch and it did slightly better than the normal
> -mm*: 1.5-2x CPU bandwidth, but still well short of the 3.7x-4x that
> mainline and 2.4 reach.

So both -mm5 and Ingo's sched.patch are much worse than what 2.4 and 2.6 get?
From: Andi K. <ak...@su...> - 2004-03-30 06:34:58
On Tue, 30 Mar 2004 09:51:46 +1000 Nick Piggin <nic...@ya...> wrote:

> So both -mm5 and Ingo's sched.patch are much worse than what 2.4 and
> 2.6 get?

Yes (2.6 vanilla and 2.4-aa, that is; I haven't tested 2.4 vanilla).

Ingo's sched.patch makes it a bit better (from 1x CPU to 1.5-1.7x CPU), but still much worse than the max of 3.7x-4x CPU bandwidth.

-Andi
From: Ingo M. <mi...@el...> - 2004-03-30 06:39:59
* Andi Kleen <ak...@su...> wrote:

> > So both -mm5 and Ingo's sched.patch are much worse than what 2.4 and
> > 2.6 get?
>
> Yes (2.6 vanilla and 2.4-aa, that is; I haven't tested 2.4 vanilla).
>
> Ingo's sched.patch makes it a bit better (from 1x CPU to 1.5-1.7x CPU),
> but still much worse than the max of 3.7x-4x CPU bandwidth.

Andi, could you please try the patch below - this will test whether this has to do with the rate of balancing between NUMA nodes. The patch itself is not correct (it way overbalances on NUMA), but it tests the theory.

	Ingo

--- linux/include/linux/sched.h.orig
+++ linux/include/linux/sched.h
@@ -627,7 +627,7 @@ struct sched_domain {
 	.parent			= NULL,				\
 	.groups			= NULL,				\
 	.min_interval		= 8,				\
-	.max_interval		= 256*fls(num_online_cpus()),	\
+	.max_interval		= 8,				\
 	.busy_factor		= 8,				\
 	.imbalance_pct		= 125,				\
 	.cache_hot_time		= (10*1000000),			\
From: Andi K. <ak...@su...> - 2004-03-30 07:07:22
On Tue, 30 Mar 2004 08:40:15 +0200 Ingo Molnar <mi...@el...> wrote:

> Andi, could you please try the patch below - this will test whether this
> has to do with the rate of balancing between NUMA nodes. The patch
> itself is not correct (it way overbalances on NUMA), but it tests the
> theory.

This works much better, but the results vary wildly (my tests go from 2.8x to ~3.8x CPU bandwidth for 4 CPUs; the 2- and 3-CPU cases are OK). More consistent results would be better, though.

-Andi
From: Nick P. <nic...@ya...> - 2004-03-30 07:14:45
Andi Kleen wrote:

> > Andi, could you please try the patch below - this will test whether this
> > has to do with the rate of balancing between NUMA nodes. The patch
> > itself is not correct (it way overbalances on NUMA), but it tests the
> > theory.
>
> This works much better, but the results vary wildly (my tests go from
> 2.8x to ~3.8x CPU bandwidth for 4 CPUs; the 2- and 3-CPU cases are OK).
> More consistent results would be better, though.

Oh good, thanks Ingo. Andi, you probably want to lower your minimum balance time too then, and maybe try with an even lower maximum. Maybe reduce cache_hot_time a bit too.
From: Ingo M. <mi...@el...> - 2004-03-30 07:45:04
* Nick Piggin <nic...@ya...> wrote:

> Oh good, thanks Ingo. Andi, you probably want to lower your minimum
> balance time too then, and maybe try with an even lower maximum. Maybe
> reduce cache_hot_time a bit too.

i dont think we want to balance with that high a frequency on NUMA Opteron. These tuning values were for testing only.

i'm dusting off the balance-on-clone patch right now; that should be the correct solution. It is based on a find_idlest_cpu() function which searches for the least loaded CPU and checks whether we can do passive load-balancing to it. Ie. it's yet another balancing point in the scheduler, _not_ a balancing-logic change.

	Ingo
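A minimal sketch of the find_idlest_cpu() idea Ingo describes (illustrative only -- the real function walks struct sched_domain and per-runqueue data rather than a bare load array; imbalance_pct here plays the role of the sched-domain field of the same name):

#include <stdio.h>

static int find_idlest_cpu(const unsigned long *cpu_load, int nr_cpus,
			   int this_cpu, unsigned int imbalance_pct)
{
	unsigned long min_load = cpu_load[this_cpu];
	int cpu, idlest = this_cpu;

	/* search for the least loaded CPU */
	for (cpu = 0; cpu < nr_cpus; cpu++) {
		if (cpu_load[cpu] < min_load) {
			min_load = cpu_load[cpu];
			idlest = cpu;
		}
	}

	/* passive balancing: only pick the remote CPU if the imbalance
	 * is big enough to be worth losing cache/node locality over
	 * (e.g. 125 = target must be at least 25% less loaded) */
	if (min_load * imbalance_pct < cpu_load[this_cpu] * 100)
		return idlest;
	return this_cpu;
}

int main(void)
{
	unsigned long load[4] = { 2048, 0, 1024, 1024 };

	/* a newly cloned thread on CPU 0 would be placed on idle CPU 1 */
	printf("target: %d\n", find_idlest_cpu(load, 4, 0, 125));
	return 0;
}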
From: Nick P. <nic...@ya...> - 2004-03-30 07:58:16
Ingo Molnar wrote:

> i dont think we want to balance with that high a frequency on NUMA
> Opteron. These tuning values were for testing only.

I guess not. Andi says he wants it more like UMA balancing, though...

> i'm dusting off the balance-on-clone patch right now; that should be the
> correct solution. It is based on a find_idlest_cpu() function which
> searches for the least loaded CPU and checks whether we can do passive
> load-balancing to it. Ie. it's yet another balancing point in the
> scheduler, _not_ a balancing-logic change.

Yep, as I said to Martin, I also agree this is probably good if it is done carefully. I think we'll need to get a horde of thread-benchmarking people together before turning it on by default, of course. It seems Andi can now get equivalent results without it, so it isn't a pressing issue.
From: Ingo M. <mi...@el...> - 2004-03-30 07:14:57
* Andi Kleen <ak...@su...> wrote:

> > Andi, could you please try the patch below - this will test whether this
> > has to do with the rate of balancing between NUMA nodes. The patch
> > itself is not correct (it way overbalances on NUMA), but it tests the
> > theory.
>
> This works much better, but the results vary wildly (my tests go from
> 2.8x to ~3.8x CPU bandwidth for 4 CPUs; the 2- and 3-CPU cases are OK).
> More consistent results would be better, though.

ok, could you try min_interval, max_interval and busy_factor all with a value of 4, in sched.h's SD_NODE_INIT template? (again, only for testing purposes.)

	Ingo
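Assuming Ingo's earlier test patch (max_interval = 8) is still applied, the requested change would amount to something like this hand-written hunk (line offsets illustrative; values for testing only):

--- linux/include/linux/sched.h.orig
+++ linux/include/linux/sched.h
@@ -627,7 +627,7 @@ struct sched_domain {
 	.parent			= NULL,			\
 	.groups			= NULL,			\
-	.min_interval		= 8,			\
-	.max_interval		= 8,			\
-	.busy_factor		= 8,			\
+	.min_interval		= 4,			\
+	.max_interval		= 4,			\
+	.busy_factor		= 4,			\
 	.imbalance_pct		= 125,			\
 	.cache_hot_time		= (10*1000000),		\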
From: Nick P. <nic...@ya...> - 2004-03-30 07:19:03
Ingo Molnar wrote:

> ok, could you try min_interval, max_interval and busy_factor all with a
> value of 4, in sched.h's SD_NODE_INIT template? (again, only for testing
> purposes.)

(sorry, forget what I said then, I'll leave it to Ingo)
From: Andi K. <ak...@su...> - 2004-03-30 07:50:04
On Tue, 30 Mar 2004 09:15:19 +0200 Ingo Molnar <mi...@el...> wrote:

> ok, could you try min_interval, max_interval and busy_factor all with a
> value of 4, in sched.h's SD_NODE_INIT template? (again, only for testing
> purposes.)

I kept the old patch and made these changes. The results are much more consistent now: 3x+ CPU bandwidth. I still get variations of ~2 GB/s, but I had those with older kernels too.

-Andi