From: Paul M. <Pau...@us...> - 2001-10-09 05:45:23
> I am particularly interested in comments from people who understand
> the detailed operation of the SPARC membar instruction and the PARISC
> SYNC instruction. My belief is that the membar("#SYNC") and SYNC
> instructions are sufficient,
>
> SYNC is sufficient but way too strict. You don't explicitly say what
> you need to happen. If you need all previous stores to finish
> before all subsequent memory operations then:
>
>	membar #StoreStore | #StoreLoad
>
> is sufficient. If you need all previous memory operations to finish
> before all subsequent stores then:
>
>	membar #StoreStore | #LoadStore
>
> is what you want.

I need to segregate the stores executed by the CPU doing the membar.
All other CPUs must observe the preceding stores before the following
stores. Of course, this means that the loads on the observing CPUs must
be ordered somehow. I need data dependencies between the loads to be
sufficient to order the loads. For example, if a CPU executes the
following:

	a = new_value;
	wmbdd();
	p = &a;

then I need any other CPU executing:

	d = *p;

to see either the value that "p" pointed to before the "p = &a"
assignment, or "new_value", -never- the old value of "a".

Does this do the trick?

	membar #StoreStore

> Thoughts?
>
> I think if you need to perform IPIs and junk like that to make the
> memory barrier happen correctly, just throw your code away and use a
> spinlock instead.

The IPIs and related junk are I believe needed only on Alpha, which has
no single memory-barrier instruction that can do wmbdd()'s job. Given
that Alpha seems to be on its way out, this did not seem to me to be
too horrible.

					Thanx, Paul
From: Paul M. <Pau...@us...> - 2001-10-09 16:50:52
> From: "Paul McKenney" <Pau...@us...>
> Date: Mon, 8 Oct 2001 22:27:44 -0700
>
> All other CPUs must observe the preceding stores before the following
> stores.
> ...
> Does this do the trick?
>
>	membar #StoreStore
>
> Yes.

Cool! Thank you!!!

> The IPIs and related junk are I believe needed only on Alpha, which has
> no single memory-barrier instruction that can do wmbdd()'s job. Given
> that Alpha seems to be on its way out, this did not seem to me to be
> too horrible.
>
> I somehow doubt that you need an IPI to implement the equivalent of
> "membar #StoreStore" on Alpha. Richard?

If "membar #StoreStore" is sufficient, then there is no equivalent of
it on Alpha. Neither the "mb" nor the "wmb" instructions wait for
outstanding invalidations to complete, and therefore do -not- guarantee
that reading CPUs will see writes occurring in the order that the
writes occurred on the writing CPU, even if data dependencies force the
order of the reads (as the pointer-dereference example I gave does).
On Alpha, there -must- be an "mb" on the reading CPU if the reading CPU
is to observe the stores in order. The IPIs are just a way of causing
those "mb"s to happen without code like this:

	d = p->a->b;

having to be written as follows:

	q = p->a;
	rmb();
	d = q->b;

More thoughts?

					Thanx, Paul
From: Paul M. <Pau...@us...> - 2001-10-09 16:50:53
> On Mon, Oct 08, 2001 at 10:56:10PM -0700, David S. Miller wrote:
> > I somehow doubt that you need an IPI to implement the equivalent of
> > "membar #StoreStore" on Alpha. Richard?
>
> Lol. Of course not. Is someone under the impression that AXP
> designers were smoking crack?

The ones I have talked to showed no signs of having done so. However,
their architecture -does- make it quite challenging for anyone trying
to write lock-free common code, hence all the IPIs.

> "wmb" == "membar #StoreStore".
> "mb" == "membar #Sync".
>
> See the nice mb/rmb/wmb macros in <asm/system.h>.

OK, if "membar #StoreStore" really is equivalent to "wmb", then
"membar #StoreStore" definitely will -not- do the job required here.

Will "membar #SYNC" allow read-side "membar #ReadRead"s to be omitted,
or does "membar #SYNC" fail to detect when outstanding cache
invalidations complete?

					Thanx, Paul
From: Paul M. <Pau...@us...> - 2001-10-09 16:50:55
> On Mon, Oct 08, 2001 at 06:55:24PM -0700, Paul E. McKenney wrote:
> > This is a proposal to provide a wmb()-like primitive that enables
> > lock-free traversal of lists while elements are concurrently being
> > inserted into these lists.
>
> I've discussed this with you before and you continue to have
> completely missed the point.

It would not be the first point that I have completely missed, but
please read on. I have discussed this algorithm with Alpha architects,
who tell me that it is sound.

> Alpha requires that you issue read-after-read memory barriers on
> the reader side if you require ordering between reads. That is
> the extent of the weakness of the memory ordering.

I agree that Alpha requires "mb" instructions to be executed on the
reading CPUs if the reading CPUs are to observe some other CPU's writes
occurring in order. And I agree that the usual way that this is done
is to insert "mb" instructions between the reads on the read side.

However, if the reading CPU executes an "mb" instruction between the
time that the writing CPU executes the "wmb" and the time that the
writing CPU executes the second store, then the reading CPU is
guaranteed to see the writes in order. Here is how this happens:

	Initial values: a = 0, p = &a, b = 1.

	Writing CPU			Reading CPU

	1) b = 2;
	2) Execute "wmb" instruction
	3) Send a bunch of IPIs
					4) Receive IPI
					5) Execute "mb" instruction
					6) Indicate completion
	7) Detect completion
	8) p = &b

The combination of steps 2 and 5 guarantees that the reading CPU will
invalidate the cacheline containing the old value of "b" before it can
possibly reference the new value of "p". The CPU must read "p" before
"b", since it can't know where "p" points before reading it.

> Sparc64 is the same way.

I can believe that "membar #StoreStore" and friends operate in the same
way that the Alpha memory-ordering instructions do. However, some of
the code in Linux seems to rely on "membar #SYNC" waiting for
outstanding invalidations to complete. If this is the case, then
"membar #SYNC" could be used to segregate writes when the corresponding
reads are implicitly ordered by data dependencies, as they are during
pointer dereferences.

> This crap will never be applied. Your algorithms are simply broken
> if you do not ensure proper read ordering via the rmb() macro.

Please see the example above. I do believe that my algorithms reliably
force proper read ordering using IPIs, just in a different way. Please
note that I have discussed this algorithm with Alpha architects, who
believe that it is sound. But they (and I) may well be confused. If
so, could you please show me what I am missing?

					Thanx, Paul
From: Richard H. <rt...@tw...> - 2001-10-09 17:00:26
On Tue, Oct 09, 2001 at 08:45:15AM -0700, Paul McKenney wrote:
> Please see the example above. I do believe that my algorithms are
> reliably forcing proper read ordering using IPIs, just in a different
> way.

I wasn't suggesting that the IPI wouldn't work -- it will.
But it will be _extremely_ slow.

I am suggesting that the lock-free algorithms should add the
read barriers, and that failure to do so indicates that they
are incomplete. If nothing else, it documents where the real
dependencies are.


r~
From: Paul M. <pa...@sa...> - 2001-10-10 03:34:22
Richard Henderson writes:
> I am suggesting that the lock-free algorithms should add the
> read barriers, and that failure to do so indicates that they
> are incomplete. If nothing else, it documents where the real
> dependencies are.

Please, let's not go adding rmb's in places where there is already an
ordering forced by a data dependency - that will hurt performance
unnecessarily on x86, ppc, sparc, ia64, etc. It seems to me that there
are two viable alternatives:

1. Define an rmbdd() which is a no-op on all architectures except for
   alpha, where it is an rmb. Richard can then have the job of
   finding all the places where an rmbdd is needed, which sounds like
   one of the smellier labors of Hercules to me. :)

2. Use Paul McKenney's scheme.

I personally don't really mind which gets chosen. Scheme 1 will result
in intermittent hard-to-find bugs on alpha (since the vast majority of
kernel hackers will not understand where or why rmbdd's are required),
but if Richard prefers that to scheme 2, it's his call IMHO.

Regards,
Paul.
From: Richard H. <rt...@tw...> - 2001-10-10 17:02:12
On Wed, Oct 10, 2001 at 01:33:58PM +1000, Paul Mackerras wrote:
> 1. Define an rmbdd() which is a no-op on all architectures except for
>    alpha, where it is an rmb. Richard can then have the job of
>    finding all the places where an rmbdd is needed, which sounds like
>    one of the smellier labors of Hercules to me. :)

I don't think it's actually all that bad. There won't be all that many
places that require the rmbdd, and they'll pretty much exactly
correspond to the places in which you have to put wmb for all
architectures anyway.


r~
From: Andrea A. <an...@su...> - 2001-10-10 02:05:15
On Tue, Oct 09, 2001 at 08:45:15AM -0700, Paul McKenney wrote:
> Please see the example above. I do believe that my algorithms are
> reliably forcing proper read ordering using IPIs, just in a different
> way. Please note that I have discussed this algorithm with Alpha
> architects, who believe that it is sound.

The IPI way is certainly safe. The point here is that it is surprising
that alpha needs this IPI unlike all other architectures. So while the
IPI is certainly safe, we wouldn't expect it to be necessary on alpha
either.

Now my only worry is that when you worked on this years ago with the
alpha architects there were old chips, old caches and old machines (ev5
maybe?). So before changing any code, I would prefer to double check
with the current alpha architects that the read dependency really isn't
enough to enforce read ordering without the need of rmb also on the
bleeding-edge ev6/ev67/etc. cores. So at worst we'd need to redefine
wmb() as wmbdd() (and friends) only for EV5+SMP compiles of the kernel,
but we wouldn't be affected with any recent hardware compiling for
EV6/EV67. Jay, Peter, comments?

Andrea
From: Ivan K. <in...@ju...> - 2001-10-10 13:25:52
On Wed, Oct 10, 2001 at 04:05:02AM +0200, Andrea Arcangeli wrote:
> So before changing any code, I would prefer to double check with the
> current alpha architects that the read dependency really isn't enough
> to enforce read ordering without the need of rmb also on the bleeding
> edge ev6/ev67/etc. cores. So at worst we'd need to redefine wmb() as
> wmbdd() (and friends) only for EV5+SMP compiles of the kernel, but we
> wouldn't be affected with any recent hardware compiling for EV6/EV67.
> Jay, Peter, comments?

The 21264 Compiler Writer's Guide [appendix C] explicitly says that the
second load cannot issue, if its address depends on the result of a
previous load, until that result is available. I refuse to believe that
this isn't true for older alphas, especially because they are strictly
in-order machines, unlike ev6.

I suspect some confusion here - probably that architect meant loads
from independent addresses. Of course, in this case mb() is required
to assure ordering.

Ivan.
From: Andrea A. <an...@su...> - 2001-10-10 13:41:59
On Wed, Oct 10, 2001 at 05:24:31PM +0400, Ivan Kokshaysky wrote:
> 21264 Compiler Writer's Guide [appendix C] explicitly says that the
> second load cannot issue if its address depends on a result of a
> previous load until that result is available. I refuse to believe
> that it isn't

Fine, btw I also recall reading something along those lines, and not
even in the 21264 manual but in the alpha reference manual that would
apply to all the chips, but I didn't find it with a short lookup.
Thanks for checking!

> true for older alphas, especially because they are strictly in-order
> machines, unlike ev6.

Yes, it sounds strange. However, according to Paul this would not be
the cpu but a cache coherency issue. rmb() would enforce the cache
coherency etc., so maybe the issue is related to old SMP motherboards,
not even to the cpus ... dunno. But as said, it sounded very strange
that new chips and new boards would also have such a weird reordering
problem.

> I suspect some confusion here - probably that architect meant loads
> to independent addresses. Of course, in this case mb() is required
> to assure ordering.
>
> Ivan.

Andrea
From: Paul M. <Pau...@us...> - 2001-10-09 18:15:22
> On Tue, Oct 09, 2001 at 07:03:37PM +1000, Rusty Russell wrote:
> > I don't *like* making Alpha's wmb() stronger, but it is the
> > only solution which doesn't touch common code.
>
> It's not a "solution" at all. It's so heavy weight you'd be
> much better off with locks. Just use the damned rmb_me_harder.

There are a number of cases where updates are extremely rare. FD
management and module unloading are but two examples. In such cases,
the overhead of the IPIs in the extremely rare updates is overwhelmed
by the reduction in overhead in the very common accesses. And getting
rid of rmb() or rmb_me_harder() makes the read-side code less complex.

					Thanx, Paul
From: Paul M. <Pau...@us...> - 2001-10-09 18:15:24
> On Tue, Oct 09, 2001 at 08:45:15AM -0700, Paul McKenney wrote:
> > Please see the example above. I do believe that my algorithms are
> > reliably forcing proper read ordering using IPIs, just in a different
> > way.
>
> I wasn't suggesting that the IPI wouldn't work -- it will.
> But it will be _extremely_ slow.

Ah! Please accept my apologies for belaboring the obvious in my
previous emails.

> I am suggesting that the lock-free algorithms should add the
> read barriers, and that failure to do so indicates that they
> are incomplete. If nothing else, it documents where the real
> dependencies are.

Such read barriers are not needed on any architecture where data
dependencies imply an rmb(). Examples include i386, PPC, and IA64.
On these architectures, read-side rmb()s add both overhead and
complexity.

On completeness: it seems to me that in cases where updates are rare,
the IPIs fill in the gap, and with good performance benefits. What am
I missing here?

					Thanx, Paul
From: Paul M. <Pau...@us...> - 2001-10-10 01:21:54
> > The IPIs and related junk are I believe needed only on Alpha, which has
> > no single memory-barrier instruction that can do wmbdd()'s job. Given
> > that Alpha seems to be on its way out, this did not seem to me to be
> > too horrible.
>
> I somehow doubt that you need an IPI to implement the equivalent of
> "membar #StoreStore" on Alpha. Richard?

I received my copy of the SPARC Architecture Manual (Weaver and
Germond) today.

It turns out that there is -no- equivalent of "membar #StoreStore"
on Alpha, if I am correctly interpreting this manual. From section
D.4.4, on page 260:

	A memory order is legal in RMO if and only if:

	(1) X <d Y & L(X) -> X <m Y

	[... two other irrelevant cases omitted ...]

	Rule (1) states that the RMO model will maintain dependence
	when the preceding transaction is a load. Preceding stores
	may be delayed in the implementation, so their order may
	not be preserved globally.

In the example dereferencing a pointer, we first load the pointer,
then load the value it points to. The second load is dependent on the
first, and the first is a load. Thus, rule (1) holds, and there is no
need for a read-side memory barrier between the two loads. This is
consistent with the book's definition of "completion" and the
description of the membar instruction.

In contrast, on Alpha, unless there is an explicit rmb(), data
dependence between a pair of loads in no way forces the two loads to
be ordered. http://lse.sourceforge.net/locking/wmbdd.html shows how
Alpha can get the new value of the pointer, but the old value of the
data it points to. Alpha thus needs the rmb() between the two loads,
even though there is a data dependency.

Am I misinterpreting the SPARC manual?

					Thanx, Paul
From: Andrea A. <an...@su...> - 2001-10-10 01:44:08
On Tue, Oct 09, 2001 at 06:19:49PM -0700, Paul McKenney wrote:
> > > The IPIs and related junk are I believe needed only on Alpha, which
> > > has no single memory-barrier instruction that can do wmbdd()'s job.
> > > Given that Alpha seems to be on its way out, this did not seem to
> > > me to be too horrible.
> >
> > I somehow doubt that you need an IPI to implement the equivalent of
> > "membar #StoreStore" on Alpha. Richard?
>
> I received my copy of the SPARC Architecture Manual (Weaver and
> Germond) today.
>
> It turns out that there is -no- equivalent of "membar #StoreStore"
> on Alpha, if I am correctly interpreting this manual.

The equivalent of "membar #StoreStore" on alpha is the "wmb" asm
instruction, in linux common code called wmb().

> From section D.4.4, on page 260:
>
>	A memory order is legal in RMO if and only if:
>
>	(1) X <d Y & L(X) -> X <m Y
>
>	[... two other irrelevant cases omitted ...]
>
>	Rule (1) states that the RMO model will maintain dependence
>	when the preceding transaction is a load. Preceding stores
>	may be delayed in the implementation, so their order may
>	not be preserved globally.
>
> In the example dereferencing a pointer, we first load the pointer,
> then load the value it points to. The second load is dependent on
> the first, and the first is a load. Thus, rule (1) holds, and there
> is no need for a read-side memory barrier between the two loads.
> This is consistent with the book's definition of "completion" and
> the description of the membar instruction.
>
> In contrast, on Alpha, unless there is an explicit rmb(), data
> dependence between a pair of loads in no way forces the two loads
> to be ordered. http://lse.sourceforge.net/locking/wmbdd.html
> shows how Alpha can get the new value of the pointer, but the
> old value of the data it points to. Alpha thus needs the rmb()
> between the two loads, even though there is a data dependency.

You remember I was surprised when you told me alpha needs the rmb
despite the data dependency :). I thought it wasn't needed (and in
turn I thought we didn't need the wmbdd). I cannot in fact see this
requirement in any alpha specification. Are you sure the issue isn't
specific to old cpus or old cache coherency protocols that we can
safely ignore today? I think in SMP systems we care only about ev6,
ev67 and future chips.

Also, if this can really be reproduced, it shouldn't be too difficult
to demonstrate it with a malicious application that stresses the race
in a loop; maybe somebody (Ivan?) could be interested in writing such
an application to test.

The IPI just for the rmb within two reads that depend on each other is
just too ugly... But yes, adding rmb() in the reader side looks even
uglier and nobody should really need it.

> Am I misinterpreting the SPARC manual?
>
>					Thanx, Paul

Andrea
From: Rusty R. <ru...@ru...> - 2001-10-10 01:44:37
In message <200...@tw...> you write:
> On Tue, Oct 09, 2001 at 07:03:37PM +1000, Rusty Russell wrote:
> > I don't *like* making Alpha's wmb() stronger, but it is the
> > only solution which doesn't touch common code.
>
> It's not a "solution" at all. It's so heavy weight you'd be
> much better off with locks. Just use the damned rmb_me_harder.

Wow! I'm glad you're volunteering to audit all the kernel code to fix
this Alpha-specific bug by inserting rmb_me_harder() in all the
critical locations! Don't miss any!

I look forward to seeing your patch,
Rusty.
--
Premature optmztion is rt of all evl. --DK
From: Paul M. <Pau...@us...> - 2001-10-10 21:51:44
> On Wed, Oct 10, 2001 at 01:33:58PM +1000, Paul Mackerras wrote:
> > 1. Define an rmbdd() which is a no-op on all architectures except for
> >    alpha, where it is an rmb. Richard can then have the job of
> >    finding all the places where an rmbdd is needed, which sounds like
> >    one of the smellier labors of Hercules to me. :)
>
> I don't think it's actually all that bad. There won't be all
> that many places that require the rmbdd, and they'll pretty
> much exactly correspond to the places in which you have to put
> wmb for all architectures anyway.

Just to make sure I understand... This rmbdd() would use IPIs to get
all the CPUs' caches synchronized, right? Or do you have some other
trick up your sleeve? ;-)

					Thanx, Paul
From: Richard H. <rt...@tw...> - 2001-10-10 22:22:55
On Wed, Oct 10, 2001 at 02:47:05PM -0700, Paul McKenney wrote:
> Just to make sure I understand... This rmbdd() would use IPIs to
> get all the CPUs' caches synchronized, right?

No, it would expand to rmb on Alpha, and to nothing elsewhere.


r~
From: Richard H. <rt...@tw...> - 2001-10-10 22:27:20
On Wed, Oct 10, 2001 at 02:47:05PM -0700, Paul McKenney wrote:
> > I don't think it's actually all that bad. There won't be all
> > that many places that require the rmbdd, and they'll pretty
> > much exactly correspond to the places in which you have to put
> > wmb for all architectures anyway.
>
> Just to make sure I understand... This rmbdd() would use IPIs to
> get all the CPUs' caches synchronized, right?

Err, I see your confusion now.

"Correspond" meaning "for every wmb needed on the writer side, there
is likely an rmb needed on the reader side in a similar place".


r~
From: Paul M. <Pau...@us...> - 2001-10-11 05:57:56
< On Wed, Oct 10, 2001 at 02:47:05PM -0700, Paul McKenney wrote:
< > > I don't think it's actually all that bad. There won't be all
< > > that many places that require the rmbdd, and they'll pretty
< > > much exactly correspond to the places in which you have to put
< > > wmb for all architectures anyway.
< >
< > Just to make sure I understand... This rmbdd() would use IPIs to
< > get all the CPUs' caches synchronized, right?
<
< Err, I see your confusion now.
<
< "Correspond" meaning "for every wmb needed on the writer side,
< there is likely an rmb needed on the reader side in a similar
< place".

Fair enough! Here are two patches. The wmbdd patch has been modified
to use the lighter-weight SPARC instruction, as suggested by Dave
Miller. The rmbdd patch defines an rmbdd() primitive that is defined
to be rmb() on Alpha and a nop on other architectures. I believe this
rmbdd() primitive is what Richard is looking for.

Please pass on any comments or criticisms. I am particularly
interested in comments from people with PA-RISC and MIPS expertise, as
I am not 100% sure that I have interpreted the PA-RISC architecture
manual correctly, and I do not yet have a MIPS manual. I do not
believe that these architectures need the Alpha treatment, but then
again, I didn't think that Alpha needed the Alpha treatment when I
first encountered it -- and I am quite clearly not the only one! ;-)

					Thanx, Paul

PS.
An updated explanation of why this is needed may be found at
http://lse.sourceforge.net/locking/wmbdd.html

diff -urN -X /home/mckenney/dontdiff linux-2.4.10/include/asm-alpha/system.h linux-2.4.10.rmbdd/include/asm-alpha/system.h
--- linux-2.4.10/include/asm-alpha/system.h	Sun Aug 12 10:38:47 2001
+++ linux-2.4.10.rmbdd/include/asm-alpha/system.h	Wed Oct 10 16:49:11 2001
@@ -148,16 +148,21 @@
 #define rmb() \
 __asm__ __volatile__("mb": : :"memory")
 
+#define rmbdd() \
+__asm__ __volatile__("mb": : :"memory")
+
 #define wmb() \
 __asm__ __volatile__("wmb": : :"memory")
 
 #ifdef CONFIG_SMP
 #define smp_mb() mb()
 #define smp_rmb() rmb()
+#define smp_rmbdd() rmbdd()
 #define smp_wmb() wmb()
 #else
 #define smp_mb() barrier()
 #define smp_rmb() barrier()
+#define smp_rmbdd() barrier()
 #define smp_wmb() barrier()
 #endif
diff -urN -X /home/mckenney/dontdiff linux-2.4.10/include/asm-arm/system.h linux-2.4.10.rmbdd/include/asm-arm/system.h
--- linux-2.4.10/include/asm-arm/system.h	Mon Nov 27 17:07:59 2000
+++ linux-2.4.10.rmbdd/include/asm-arm/system.h	Wed Oct 10 18:18:12 2001
@@ -38,6 +38,7 @@
 #define mb() __asm__ __volatile__ ("" : : : "memory")
 #define rmb() mb()
+#define rmbdd() do { } while(0)
 #define wmb() mb()
 #define nop() __asm__ __volatile__("mov\tr0,r0\t@ nop\n\t");
@@ -67,12 +68,14 @@
 #define smp_mb() mb()
 #define smp_rmb() rmb()
+#define smp_rmbdd() rmbdd()
 #define smp_wmb() wmb()
 #else
 #define smp_mb() barrier()
 #define smp_rmb() barrier()
+#define smp_rmbdd() do { } while(0)
 #define smp_wmb() barrier()
 
 #define cli() __cli()
diff -urN -X /home/mckenney/dontdiff linux-2.4.10/include/asm-cris/system.h linux-2.4.10.rmbdd/include/asm-cris/system.h
--- linux-2.4.10/include/asm-cris/system.h	Tue May 1 16:05:00 2001
+++ linux-2.4.10.rmbdd/include/asm-cris/system.h	Wed Oct 10 18:19:04 2001
@@ -143,15 +143,18 @@
 #define mb() __asm__ __volatile__ ("" : : : "memory")
 #define rmb() mb()
+#define rmbdd() do { } while(0)
 #define wmb() mb()
 
 #ifdef CONFIG_SMP
 #define smp_mb() mb()
 #define smp_rmb() rmb()
+#define smp_rmbdd() rmbdd()
 #define smp_wmb() wmb()
 #else
 #define smp_mb() barrier()
 #define smp_rmb() barrier()
+#define smp_rmbdd() do { } while(0)
 #define smp_wmb() barrier()
 #endif
diff -urN -X /home/mckenney/dontdiff linux-2.4.10/include/asm-i386/system.h linux-2.4.10.rmbdd/include/asm-i386/system.h
--- linux-2.4.10/include/asm-i386/system.h	Sun Sep 23 10:31:01 2001
+++ linux-2.4.10.rmbdd/include/asm-i386/system.h	Wed Oct 10 17:00:57 2001
@@ -284,15 +284,18 @@
  */
 #define mb() __asm__ __volatile__ ("lock; addl $0,0(%%esp)": : :"memory")
 #define rmb() mb()
+#define rmbdd() do { } while(0)
 #define wmb() __asm__ __volatile__ ("": : :"memory")
 
 #ifdef CONFIG_SMP
 #define smp_mb() mb()
 #define smp_rmb() rmb()
+#define smp_rmbdd() rmbdd()
 #define smp_wmb() wmb()
 #else
 #define smp_mb() barrier()
 #define smp_rmb() barrier()
+#define smp_rmbdd() do { } while(0)
 #define smp_wmb() barrier()
 #endif
diff -urN -X /home/mckenney/dontdiff linux-2.4.10/include/asm-ia64/system.h linux-2.4.10.rmbdd/include/asm-ia64/system.h
--- linux-2.4.10/include/asm-ia64/system.h	Tue Jul 31 10:30:09 2001
+++ linux-2.4.10.rmbdd/include/asm-ia64/system.h	Wed Oct 10 17:01:09 2001
@@ -85,6 +85,9 @@
  *		stores and that all following stores will be
  *		visible only after all previous stores.
  *   rmb():	Like wmb(), but for reads.
+ *   rmbdd():	Like rmb(), but only for pairs of loads where
+ *		the second load depends on the value loaded
+ *		by the first.
  *   mb():	wmb()/rmb() combo, i.e., all previous memory
  *		accesses are visible before all subsequent
  *		accesses and vice versa.  This is also known as
@@ -98,15 +101,18 @@
  */
 #define mb() __asm__ __volatile__ ("mf" ::: "memory")
 #define rmb() mb()
+#define rmbdd() do { } while(0)
 #define wmb() mb()
 
 #ifdef CONFIG_SMP
 # define smp_mb() mb()
 # define smp_rmb() rmb()
+# define smp_rmbdd() rmbdd()
 # define smp_wmb() wmb()
 #else
 # define smp_mb() barrier()
 # define smp_rmb() barrier()
+# define smp_rmbdd() do { } while(0)
 # define smp_wmb() barrier()
 #endif
diff -urN -X /home/mckenney/dontdiff linux-2.4.10/include/asm-m68k/system.h linux-2.4.10.rmbdd/include/asm-m68k/system.h
--- linux-2.4.10/include/asm-m68k/system.h	Mon Jun 11 19:15:27 2001
+++ linux-2.4.10.rmbdd/include/asm-m68k/system.h	Wed Oct 10 17:01:15 2001
@@ -80,12 +80,14 @@
 #define nop() do { asm volatile ("nop"); barrier(); } while (0)
 #define mb() barrier()
 #define rmb() barrier()
+#define rmbdd() do { } while(0)
 #define wmb() barrier()
 #define set_mb(var, value) do { xchg(&var, value); } while (0)
 #define set_wmb(var, value) do { var = value; wmb(); } while (0)
 
 #define smp_mb() barrier()
 #define smp_rmb() barrier()
+#define smp_rmbdd() do { } while(0)
 #define smp_wmb() barrier()
diff -urN -X /home/mckenney/dontdiff linux-2.4.10/include/asm-mips/system.h linux-2.4.10.rmbdd/include/asm-mips/system.h
--- linux-2.4.10/include/asm-mips/system.h	Sun Sep 9 10:43:01 2001
+++ linux-2.4.10.rmbdd/include/asm-mips/system.h	Wed Oct 10 17:01:26 2001
@@ -150,6 +150,7 @@
 #include <asm/wbflush.h>
 
 #define rmb() do { } while(0)
+#define rmbdd() do { } while(0)
 #define wmb() wbflush()
 #define mb() wbflush()
@@ -166,6 +167,7 @@
 		: /* no input */ \
 		: "memory")
 #define rmb() mb()
+#define rmbdd() do { } while(0)
 #define wmb() mb()
 
 #endif /* CONFIG_CPU_HAS_WB */
@@ -173,10 +175,12 @@
 #ifdef CONFIG_SMP
 #define smp_mb() mb()
 #define smp_rmb() rmb()
+#define smp_rmbdd() rmbdd()
 #define smp_wmb() wmb()
 #else
 #define smp_mb() barrier()
 #define smp_rmb() barrier()
+#define smp_rmbdd() do { } while(0)
 #define smp_wmb() barrier()
 #endif
diff -urN -X /home/mckenney/dontdiff linux-2.4.10/include/asm-mips64/system.h linux-2.4.10.rmbdd/include/asm-mips64/system.h
--- linux-2.4.10/include/asm-mips64/system.h	Wed Jul 4 11:50:39 2001
+++ linux-2.4.10.rmbdd/include/asm-mips64/system.h	Wed Oct 10 17:01:41 2001
@@ -147,15 +147,18 @@
 		: /* no input */ \
 		: "memory")
 #define rmb() mb()
+#define rmbdd() do { } while(0)
 #define wmb() mb()
 
 #ifdef CONFIG_SMP
 #define smp_mb() mb()
 #define smp_rmb() rmb()
+#define smp_rmbdd() rmbdd()
 #define smp_wmb() wmb()
 #else
 #define smp_mb() barrier()
 #define smp_rmb() barrier()
+#define smp_rmbdd() do { } while(0)
 #define smp_wmb() barrier()
 #endif
diff -urN -X /home/mckenney/dontdiff linux-2.4.10/include/asm-parisc/system.h linux-2.4.10.rmbdd/include/asm-parisc/system.h
--- linux-2.4.10/include/asm-parisc/system.h	Wed Dec 6 11:46:39 2000
+++ linux-2.4.10.rmbdd/include/asm-parisc/system.h	Wed Oct 10 17:04:07 2001
@@ -50,6 +50,7 @@
 #ifdef CONFIG_SMP
 #define smp_mb() mb()
 #define smp_rmb() rmb()
+#define smp_rmbdd() do { } while(0)
 #define smp_wmb() wmb()
 #else
 /* This is simply the barrier() macro from linux/kernel.h but when serial.c
@@ -58,6 +59,7 @@
  */
 #define smp_mb() __asm__ __volatile__("":::"memory");
 #define smp_rmb() __asm__ __volatile__("":::"memory");
+#define smp_rmbdd() do { } while(0)
 #define smp_wmb() __asm__ __volatile__("":::"memory");
 #endif
@@ -122,6 +124,7 @@
 #define mb() __asm__ __volatile__ ("sync" : : :"memory")
 #define wmb() mb()
+#define rmbdd() do { } while(0)
 
 extern unsigned long __xchg(unsigned long, unsigned long *, int);
diff -urN -X /home/mckenney/dontdiff linux-2.4.10/include/asm-ppc/system.h linux-2.4.10.rmbdd/include/asm-ppc/system.h
--- linux-2.4.10/include/asm-ppc/system.h	Tue Aug 28 06:58:33 2001
+++ linux-2.4.10.rmbdd/include/asm-ppc/system.h	Wed Oct 10 18:19:43 2001
@@ -24,6 +24,8 @@
  *
  * mb() prevents loads and stores being reordered across this point.
  * rmb() prevents loads being reordered across this point.
+ * rmbdd() prevents data-dependant loads being reordered across this point
+ *	(nop on PPC).
  * wmb() prevents stores being reordered across this point.
  *
  * We can use the eieio instruction for wmb, but since it doesn't
@@ -32,6 +34,7 @@
  */
 #define mb() __asm__ __volatile__ ("sync" : : : "memory")
 #define rmb() __asm__ __volatile__ ("sync" : : : "memory")
+#define rmbdd() do { } while(0)
 #define wmb() __asm__ __volatile__ ("eieio" : : : "memory")
 
 #define set_mb(var, value) do { var = value; mb(); } while (0)
@@ -40,10 +43,12 @@
 #ifdef CONFIG_SMP
 #define smp_mb() mb()
 #define smp_rmb() rmb()
+#define smp_rmbdd() rmbdd()
 #define smp_wmb() wmb()
 #else
 #define smp_mb() __asm__ __volatile__("": : :"memory")
 #define smp_rmb() __asm__ __volatile__("": : :"memory")
+#define smp_rmbdd() do { } while(0)
 #define smp_wmb() __asm__ __volatile__("": : :"memory")
 #endif /* CONFIG_SMP */
diff -urN -X /home/mckenney/dontdiff linux-2.4.10/include/asm-s390/system.h linux-2.4.10.rmbdd/include/asm-s390/system.h
--- linux-2.4.10/include/asm-s390/system.h	Wed Jul 25 14:12:02 2001
+++ linux-2.4.10.rmbdd/include/asm-s390/system.h	Wed Oct 10 18:20:31 2001
@@ -117,9 +117,11 @@
 # define SYNC_OTHER_CORES(x) eieio()
 #define mb() eieio()
 #define rmb() eieio()
+#define rmbdd() do { } while(0)
 #define wmb() eieio()
 #define smp_mb() mb()
 #define smp_rmb() rmb()
+#define smp_rmbdd() rmbdd()
 #define smp_wmb() wmb()
 #define smp_mb__before_clear_bit() smp_mb()
 #define smp_mb__after_clear_bit() smp_mb()
diff -urN -X /home/mckenney/dontdiff linux-2.4.10/include/asm-s390x/system.h linux-2.4.10.rmbdd/include/asm-s390x/system.h
--- linux-2.4.10/include/asm-s390x/system.h	Wed Jul 25 14:12:03 2001
+++ linux-2.4.10.rmbdd/include/asm-s390x/system.h	Wed Oct 10 17:04:45 2001
@@ -130,9 +130,11 @@
 # define SYNC_OTHER_CORES(x) eieio()
 #define mb() eieio()
 #define rmb() eieio()
+#define rmbdd() do { } while(0)
 #define wmb() eieio()
 #define smp_mb() mb()
 #define smp_rmb() rmb()
+#define smp_rmbdd() rmbdd()
 #define smp_wmb() wmb()
 #define smp_mb__before_clear_bit() smp_mb()
 #define smp_mb__after_clear_bit() smp_mb()
diff -urN -X /home/mckenney/dontdiff linux-2.4.10/include/asm-sh/system.h linux-2.4.10.rmbdd/include/asm-sh/system.h
--- linux-2.4.10/include/asm-sh/system.h	Sat Sep 8 12:29:09 2001
+++ linux-2.4.10.rmbdd/include/asm-sh/system.h	Wed Oct 10 17:05:07 2001
@@ -88,15 +88,18 @@
 #define mb() __asm__ __volatile__ ("": : :"memory")
 #define rmb() mb()
+#define rmbdd() do { } while(0)
 #define wmb() __asm__ __volatile__ ("": : :"memory")
 
 #ifdef CONFIG_SMP
 #define smp_mb() mb()
 #define smp_rmb() rmb()
+#define smp_rmbdd() rmbdd()
 #define smp_wmb() wmb()
 #else
 #define smp_mb() barrier()
 #define smp_rmb() barrier()
+#define smp_rmbdd() do { } while(0)
 #define smp_wmb() barrier()
 #endif
diff -urN -X /home/mckenney/dontdiff linux-2.4.10/include/asm-sparc/system.h linux-2.4.10.rmbdd/include/asm-sparc/system.h
--- linux-2.4.10/include/asm-sparc/system.h	Tue Oct 3 09:24:41 2000
+++ linux-2.4.10.rmbdd/include/asm-sparc/system.h	Wed Oct 10 16:59:44 2001
@@ -277,11 +277,13 @@
 /* XXX Change this if we ever use a PSO mode kernel. */
 #define mb() __asm__ __volatile__ ("" : : : "memory")
 #define rmb() mb()
+#define rmbdd() do { } while(0)
 #define wmb() mb()
 #define set_mb(__var, __value) do { __var = __value; mb(); } while(0)
 #define set_wmb(__var, __value) set_mb(__var, __value)
 #define smp_mb() __asm__ __volatile__("":::"memory");
 #define smp_rmb() __asm__ __volatile__("":::"memory");
+#define smp_rmbdd() do { } while(0)
 #define smp_wmb() __asm__ __volatile__("":::"memory");
 
 #define nop() __asm__ __volatile__ ("nop");
diff -urN -X /home/mckenney/dontdiff linux-2.4.10/include/asm-sparc64/system.h linux-2.4.10.rmbdd/include/asm-sparc64/system.h
--- linux-2.4.10/include/asm-sparc64/system.h	Fri Sep 7 11:01:20 2001
+++ linux-2.4.10.rmbdd/include/asm-sparc64/system.h	Wed Oct 10 17:00:12 2001
@@ -99,6 +99,7 @@
 #define mb() \
 	membar("#LoadLoad | #LoadStore | #StoreStore | #StoreLoad");
 #define rmb() membar("#LoadLoad")
+#define rmbdd() do { } while(0)
 #define wmb() membar("#StoreStore")
 #define set_mb(__var, __value) \
 	do { __var = __value; membar("#StoreLoad | #StoreStore"); } while(0)
@@ -108,10 +109,12 @@
 #ifdef CONFIG_SMP
 #define smp_mb() mb()
 #define smp_rmb() rmb()
+#define smp_rmbdd() rmbdd()
 #define smp_wmb() wmb()
 #else
 #define smp_mb() __asm__ __volatile__("":::"memory");
 #define smp_rmb() __asm__ __volatile__("":::"memory");
+#define smp_rmbdd() do { } while(0)
 #define smp_wmb() __asm__ __volatile__("":::"memory");
 #endif
diff -urN -X /home/mckenney/dontdiff linux-2.4.10/arch/alpha/kernel/smp.c linux-2.4.10.wmbdd/arch/alpha/kernel/smp.c
--- linux-2.4.10/arch/alpha/kernel/smp.c	Thu Sep 13 15:21:32 2001
+++ linux-2.4.10.wmbdd/arch/alpha/kernel/smp.c	Mon Oct 8 18:31:18 2001
@@ -63,8 +63,20 @@
 	IPI_RESCHEDULE,
 	IPI_CALL_FUNC,
 	IPI_CPU_STOP,
+	IPI_MB,
 };
 
+/* Global and per-CPU state for global MB shootdown. */
+static struct {
+	spinlock_t mutex;
+	unsigned long need_mb;	/* bitmask of CPUs that need to do "mb". 
*/ + long curgen; /* Each "generation" is a group of requests */ + long maxgen; /* that is handled by one set of "mb"s. */ +} mb_global_data __cacheline_aligned = { SPIN_LOCK_UNLOCKED, 0, 1, 0 }; +static struct { + long mygen ____cacheline_aligned; +} mb_data[NR_CPUS] __cacheline_aligned; + spinlock_t kernel_flag = SPIN_LOCK_UNLOCKED; /* Set to a secondary's cpuid when it comes online. */ @@ -772,6 +784,41 @@ goto again; } +/* + * Execute an "mb" instruction in response to an IPI_MB. Also directly + * called by smp_global_mb(). If this is the last CPU to respond to + * an smp_global_mb(), then check to see if an additional generation of + * requests needs to be satisfied. + */ + +void +handle_mb_ipi(void) +{ + int this_cpu = smp_processor_id(); + unsigned long this_cpu_mask = 1UL << this_cpu; + unsigned long flags; + unsigned long to_whom = cpu_present_mask ^ this_cpu_mask; + + /* Avoid lock contention when extra IPIs arrive (due to race) and + when waiting for global mb shootdown. */ + if ((mb_global_data.need_mb & this_cpu_mask) == 0) { + return; + } + spin_lock_irqsave(&mb_global_data.mutex, flags); /* implied mb */ + if ((mb_global_data.need_mb & this_cpu_mask) == 0) { + spin_unlock_irqrestore(&mb_global_data.mutex, flags); + return; + } + mb_global_data.need_mb &= ~this_cpu_mask; + if (mb_global_data.need_mb == 0) { + if (++mb_global_data.curgen - mb_global_data.maxgen <= 0) { + mb_global_data.need_mb = to_whom; + send_ipi_message(to_whom, IPI_MB); + } + } + spin_unlock_irqrestore(&mb_global_data.mutex, flags); /* implied mb */ +} + void handle_ipi(struct pt_regs *regs) { @@ -825,6 +872,9 @@ else if (which == IPI_CPU_STOP) { halt(); } + else if (which == IPI_MB) { + handle_mb_ipi(); + } else { printk(KERN_CRIT "Unknown IPI on CPU %d: %lu\n", this_cpu, which); @@ -860,6 +910,58 @@ printk(KERN_WARNING "smp_send_stop: Not on boot cpu. 
\n"); #endif send_ipi_message(to_whom, IPI_CPU_STOP); +} + +/* + * Execute an "mb" instruction, then force all other CPUs to execute "mb" + * instructions. Does not block. Once this function returns, the caller + * is guaranteed that all of its memory writes preceding the call to + * smp_global_mb() will be seen by all CPUs as preceding all memory + * writes following the call to smp_global_mb(). + * + * For example, if CPU 0 does: + * a.data = 1; + * smp_global_mb(); + * p = &a; + * and CPU 1 does: + * d = p->data; + * where a.data is initially garbage and p initially points to another + * structure with the "data" field being zero, then CPU 1 will be + * guaranteed to have "d" set to either 0 or 1, never garbage. + * + * Note that the Alpha "wmb" instruction is -not- sufficient!!! If CPU 0 + * were replace the smp_global_mb() with a wmb(), then CPU 1 could end + * up with garbage in "d"! + * + * This function sends IPIs to all other CPUs, then spins waiting for + * them to receive the IPI and execute an "mb" instruction. While + * spinning, this function -must- respond to other CPUs executing + * smp_global_mb() concurrently, otherwise, deadlock would result. 
+ */ + +void +smp_global_mb(void) +{ + int this_cpu = smp_processor_id(); + unsigned long this_cpu_mask = 1UL << this_cpu; + unsigned long flags; + unsigned long to_whom = cpu_present_mask ^ this_cpu_mask; + + spin_lock_irqsave(&mb_global_data.mutex, flags); /* implied mb */ + if (mb_global_data.curgen - mb_global_data.maxgen <= 0) { + mb_global_data.maxgen = mb_global_data.curgen + 1; + } else { + mb_global_data.maxgen = mb_global_data.curgen; + mb_global_data.need_mb = to_whom; + send_ipi_message(to_whom, IPI_MB); + } + mb_data[this_cpu].mygen = mb_global_data.maxgen; + spin_unlock_irqrestore(&mb_global_data.mutex, flags); + while (mb_data[this_cpu].mygen - mb_global_data.curgen >= 0) { + handle_mb_ipi(); + barrier(); + } + } /* diff -urN -X /home/mckenney/dontdiff linux-2.4.10/include/asm-alpha/system.h linux-2.4.10.wmbdd/include/asm-alpha/system.h --- linux-2.4.10/include/asm-alpha/system.h Sun Aug 12 10:38:47 2001 +++ linux-2.4.10.wmbdd/include/asm-alpha/system.h Mon Oct 8 18:31:18 2001 @@ -151,14 +151,21 @@ #define wmb() \ __asm__ __volatile__("wmb": : :"memory") +#define mbdd() smp_mbdd() +#define wmbdd() smp_wmbdd() + #ifdef CONFIG_SMP #define smp_mb() mb() #define smp_rmb() rmb() #define smp_wmb() wmb() +#define smp_mbdd() smp_global_mb() +#define smp_wmbdd() smp_mbdd() #else #define smp_mb() barrier() #define smp_rmb() barrier() #define smp_wmb() barrier() +#define smp_mbdd() barrier() +#define smp_wmbdd() barrier() #endif #define set_mb(var, value) \ diff -urN -X /home/mckenney/dontdiff linux-2.4.10/include/asm-arm/system.h linux-2.4.10.wmbdd/include/asm-arm/system.h --- linux-2.4.10/include/asm-arm/system.h Mon Nov 27 17:07:59 2000 +++ linux-2.4.10.wmbdd/include/asm-arm/system.h Mon Oct 8 18:31:18 2001 @@ -39,6 +39,8 @@ #define mb() __asm__ __volatile__ ("" : : : "memory") #define rmb() mb() #define wmb() mb() +#define mbdd() mb() +#define wmbdd() wmb() #define nop() __asm__ __volatile__("mov\tr0,r0\t@ nop\n\t"); #define prepare_to_switch() do { } 
while(0) @@ -68,12 +70,16 @@ #define smp_mb() mb() #define smp_rmb() rmb() #define smp_wmb() wmb() +#define smp_mbdd() rmbdd() +#define smp_wmbdd() wmbdd() #else #define smp_mb() barrier() #define smp_rmb() barrier() #define smp_wmb() barrier() +#define smp_mbdd() barrier() +#define smp_wmbdd() barrier() #define cli() __cli() #define sti() __sti() diff -urN -X /home/mckenney/dontdiff linux-2.4.10/include/asm-cris/system.h linux-2.4.10.wmbdd/include/asm-cris/system.h --- linux-2.4.10/include/asm-cris/system.h Tue May 1 16:05:00 2001 +++ linux-2.4.10.wmbdd/include/asm-cris/system.h Mon Oct 8 18:31:18 2001 @@ -144,15 +144,21 @@ #define mb() __asm__ __volatile__ ("" : : : "memory") #define rmb() mb() #define wmb() mb() +#define mbdd() mb() +#define wmbdd() wmb() #ifdef CONFIG_SMP #define smp_mb() mb() #define smp_rmb() rmb() #define smp_wmb() wmb() +#define smp_mbdd() mbdd() +#define smp_wmbdd() wmbdd() #else #define smp_mb() barrier() #define smp_rmb() barrier() #define smp_wmb() barrier() +#define smp_mbdd() barrier() +#define smp_wmbdd() barrier() #endif #define iret() diff -urN -X /home/mckenney/dontdiff linux-2.4.10/include/asm-i386/system.h linux-2.4.10.wmbdd/include/asm-i386/system.h --- linux-2.4.10/include/asm-i386/system.h Sun Sep 23 10:31:01 2001 +++ linux-2.4.10.wmbdd/include/asm-i386/system.h Mon Oct 8 18:31:18 2001 @@ -285,15 +285,21 @@ #define mb() __asm__ __volatile__ ("lock; addl $0,0(%%esp)": : :"memory") #define rmb() mb() #define wmb() __asm__ __volatile__ ("": : :"memory") +#define mbdd() mb() +#define wmbdd() wmb() #ifdef CONFIG_SMP #define smp_mb() mb() #define smp_rmb() rmb() #define smp_wmb() wmb() +#define smp_mbdd() mbdd() +#define smp_wmbdd() wmbdd() #else #define smp_mb() barrier() #define smp_rmb() barrier() #define smp_wmb() barrier() +#define smp_mbdd() barrier() +#define smp_wmbdd() barrier() #endif #define set_mb(var, value) do { xchg(&var, value); } while (0) diff -urN -X /home/mckenney/dontdiff linux-2.4.10/include/asm-ia64/system.h 
linux-2.4.10.wmbdd/include/asm-ia64/system.h --- linux-2.4.10/include/asm-ia64/system.h Tue Jul 31 10:30:09 2001 +++ linux-2.4.10.wmbdd/include/asm-ia64/system.h Mon Oct 8 18:31:18 2001 @@ -84,11 +84,36 @@ * like regions are visible before any subsequent * stores and that all following stores will be * visible only after all previous stores. - * rmb(): Like wmb(), but for reads. + * In common code, any reads that depend on this + * ordering must be separated by an mb() or rmb(). + * rmb(): Guarantees that all preceding loads to memory- + * like regions are executed before any subsequent + * loads. * mb(): wmb()/rmb() combo, i.e., all previous memory * accesses are visible before all subsequent * accesses and vice versa. This is also known as - * a "fence." + * a "fence." Again, in common code, any reads that + * depend on the order of writes must themselves be + * separated by an mb() or rmb(). + * wmbdd(): Guarantees that all preceding stores to memory- + * like regions are visible before any subsequent + * stores and that all following stores will be + * visible only after all previous stores. + * In common code, any reads that depend on this + * ordering either must be separated by an mb() + * or rmb(), or the later reads must depend on + * data loaded by the earlier reads. For an example + * of the latter, consider "p->next". The read of + * the "next" field depends on the read of the + * pointer "p". + * mbdd(): wmb()/rmb() combo, i.e., all previous memory + * accesses are visible before all subsequent + * accesses and vice versa. This is also known as + * a "fence." Again, in common code, any reads that + * depend on the order of writes must themselves be + * separated by an mb() or rmb(), or there must be + * a data dependency that forces the second to + * wait until the first completes. * * Note: "mb()" and its variants cannot be used as a fence to order * accesses to memory mapped I/O registers. 
For that, mf.a needs to @@ -99,15 +124,21 @@ #define mb() __asm__ __volatile__ ("mf" ::: "memory") #define rmb() mb() #define wmb() mb() +#define rmbdd() mb() +#define wmbdd() mb() #ifdef CONFIG_SMP # define smp_mb() mb() # define smp_rmb() rmb() # define smp_wmb() wmb() +# define smp_mbdd() mbdd() +# define smp_wmbdd() wmbdd() #else # define smp_mb() barrier() # define smp_rmb() barrier() # define smp_wmb() barrier() +# define smp_mbdd() barrier() +# define smp_wmbdd() barrier() #endif /* diff -urN -X /home/mckenney/dontdiff linux-2.4.10/include/asm-m68k/system.h linux-2.4.10.wmbdd/include/asm-m68k/system.h --- linux-2.4.10/include/asm-m68k/system.h Mon Jun 11 19:15:27 2001 +++ linux-2.4.10.wmbdd/include/asm-m68k/system.h Mon Oct 8 18:31:18 2001 @@ -81,12 +81,16 @@ #define mb() barrier() #define rmb() barrier() #define wmb() barrier() +#define rmbdd() barrier() +#define wmbdd() barrier() #define set_mb(var, value) do { xchg(&var, value); } while (0) #define set_wmb(var, value) do { var = value; wmb(); } while (0) #define smp_mb() barrier() #define smp_rmb() barrier() #define smp_wmb() barrier() +#define smp_mbdd() barrier() +#define smp_wmbdd() barrier() #define xchg(ptr,x) ((__typeof__(*(ptr)))__xchg((unsigned long)(x),(ptr),sizeof(*(ptr)))) diff -urN -X /home/mckenney/dontdiff linux-2.4.10/include/asm-mips/system.h linux-2.4.10.wmbdd/include/asm-mips/system.h --- linux-2.4.10/include/asm-mips/system.h Sun Sep 9 10:43:01 2001 +++ linux-2.4.10.wmbdd/include/asm-mips/system.h Mon Oct 8 18:31:18 2001 @@ -152,6 +152,8 @@ #define rmb() do { } while(0) #define wmb() wbflush() #define mb() wbflush() +#define wmbdd() wbflush() +#define mbdd() wbflush() #else /* CONFIG_CPU_HAS_WB */ @@ -167,6 +169,8 @@ : "memory") #define rmb() mb() #define wmb() mb() +#define wmbdd() mb() +#define mbdd() mb() #endif /* CONFIG_CPU_HAS_WB */ @@ -174,10 +178,14 @@ #define smp_mb() mb() #define smp_rmb() rmb() #define smp_wmb() wmb() +#define smp_mbdd() mbdd() +#define smp_wmbdd() wmbdd() 
#else #define smp_mb() barrier() #define smp_rmb() barrier() #define smp_wmb() barrier() +#define smp_mbdd() barrier() +#define smp_wmbdd() barrier() #endif #define set_mb(var, value) \ diff -urN -X /home/mckenney/dontdiff linux-2.4.10/include/asm-mips64/system.h linux-2.4.10.wmbdd/include/asm-mips64/system.h --- linux-2.4.10/include/asm-mips64/system.h Wed Jul 4 11:50:39 2001 +++ linux-2.4.10.wmbdd/include/asm-mips64/system.h Mon Oct 8 18:31:18 2001 @@ -148,15 +148,21 @@ : "memory") #define rmb() mb() #define wmb() mb() +#define rmbdd() mb() +#define wmbdd() mb() #ifdef CONFIG_SMP #define smp_mb() mb() #define smp_rmb() rmb() #define smp_wmb() wmb() +#define smp_mbdd() mbdd() +#define smp_wmbdd() wmbdd() #else #define smp_mb() barrier() #define smp_rmb() barrier() #define smp_wmb() barrier() +#define smp_mbdd() barrier() +#define smp_wmbdd() barrier() #endif #define set_mb(var, value) \ diff -urN -X /home/mckenney/dontdiff linux-2.4.10/include/asm-parisc/system.h linux-2.4.10.wmbdd/include/asm-parisc/system.h --- linux-2.4.10/include/asm-parisc/system.h Wed Dec 6 11:46:39 2000 +++ linux-2.4.10.wmbdd/include/asm-parisc/system.h Mon Oct 8 18:31:18 2001 @@ -51,6 +51,8 @@ #define smp_mb() mb() #define smp_rmb() rmb() #define smp_wmb() wmb() +#define smp_mbdd() rmb() +#define smp_wmbdd() wmb() #else /* This is simply the barrier() macro from linux/kernel.h but when serial.c * uses tqueue.h uses smp_mb() defined using barrier(), linux/kernel.h @@ -59,6 +61,8 @@ #define smp_mb() __asm__ __volatile__("":::"memory"); #define smp_rmb() __asm__ __volatile__("":::"memory"); #define smp_wmb() __asm__ __volatile__("":::"memory"); +#define smp_mbdd() __asm__ __volatile__("":::"memory"); +#define smp_wmbdd() __asm__ __volatile__("":::"memory"); #endif /* interrupt control */ @@ -122,6 +126,8 @@ #define mb() __asm__ __volatile__ ("sync" : : :"memory") #define wmb() mb() +#define mbdd() mb() +#define wmbdd() mb() extern unsigned long __xchg(unsigned long, unsigned long *, int); 
diff -urN -X /home/mckenney/dontdiff linux-2.4.10/include/asm-ppc/system.h linux-2.4.10.wmbdd/include/asm-ppc/system.h --- linux-2.4.10/include/asm-ppc/system.h Tue Aug 28 06:58:33 2001 +++ linux-2.4.10.wmbdd/include/asm-ppc/system.h Mon Oct 8 18:31:18 2001 @@ -33,6 +33,8 @@ #define mb() __asm__ __volatile__ ("sync" : : : "memory") #define rmb() __asm__ __volatile__ ("sync" : : : "memory") #define wmb() __asm__ __volatile__ ("eieio" : : : "memory") +#define mbdd() mb() +#define wmbdd() wmb() #define set_mb(var, value) do { var = value; mb(); } while (0) #define set_wmb(var, value) do { var = value; wmb(); } while (0) @@ -41,10 +43,14 @@ #define smp_mb() mb() #define smp_rmb() rmb() #define smp_wmb() wmb() +#define smp_mbdd() mb() +#define smp_wmbdd() wmb() #else #define smp_mb() __asm__ __volatile__("": : :"memory") #define smp_rmb() __asm__ __volatile__("": : :"memory") #define smp_wmb() __asm__ __volatile__("": : :"memory") +#define smp_mbdd() __asm__ __volatile__("": : :"memory") +#define smp_wmbdd() __asm__ __volatile__("": : :"memory") #endif /* CONFIG_SMP */ #ifdef __KERNEL__ diff -urN -X /home/mckenney/dontdiff linux-2.4.10/include/asm-s390/system.h linux-2.4.10.wmbdd/include/asm-s390/system.h --- linux-2.4.10/include/asm-s390/system.h Wed Jul 25 14:12:02 2001 +++ linux-2.4.10.wmbdd/include/asm-s390/system.h Mon Oct 8 18:31:18 2001 @@ -118,9 +118,13 @@ #define mb() eieio() #define rmb() eieio() #define wmb() eieio() +#define mbdd() mb() +#define wmbdd() wmb() #define smp_mb() mb() #define smp_rmb() rmb() #define smp_wmb() wmb() +#define smp_mbdd() mb() +#define smp_wmbdd() wmb() #define smp_mb__before_clear_bit() smp_mb() #define smp_mb__after_clear_bit() smp_mb() diff -urN -X /home/mckenney/dontdiff linux-2.4.10/include/asm-s390x/system.h linux-2.4.10.wmbdd/include/asm-s390x/system.h --- linux-2.4.10/include/asm-s390x/system.h Wed Jul 25 14:12:03 2001 +++ linux-2.4.10.wmbdd/include/asm-s390x/system.h Mon Oct 8 18:31:19 2001 @@ -131,9 +131,13 @@ #define mb() 
eieio() #define rmb() eieio() #define wmb() eieio() +#define mbdd() mb() +#define wmbdd() wmb() #define smp_mb() mb() #define smp_rmb() rmb() #define smp_wmb() wmb() +#define smp_mbdd() mb() +#define smp_wmbdd() wmb() #define smp_mb__before_clear_bit() smp_mb() #define smp_mb__after_clear_bit() smp_mb() diff -urN -X /home/mckenney/dontdiff linux-2.4.10/include/asm-sh/system.h linux-2.4.10.wmbdd/include/asm-sh/system.h --- linux-2.4.10/include/asm-sh/system.h Sat Sep 8 12:29:09 2001 +++ linux-2.4.10.wmbdd/include/asm-sh/system.h Mon Oct 8 18:31:19 2001 @@ -89,15 +89,21 @@ #define mb() __asm__ __volatile__ ("": : :"memory") #define rmb() mb() #define wmb() __asm__ __volatile__ ("": : :"memory") +#define mbdd() mb() +#define wmbdd() wmb() #ifdef CONFIG_SMP #define smp_mb() mb() #define smp_rmb() rmb() #define smp_wmb() wmb() +#define smp_mbdd() mb() +#define smp_wmbdd() wmb() #else #define smp_mb() barrier() #define smp_rmb() barrier() #define smp_wmb() barrier() +#define smp_mbdd() barrier() +#define smp_wmbdd() barrier() #endif #define set_mb(var, value) do { xchg(&var, value); } while (0) diff -urN -X /home/mckenney/dontdiff linux-2.4.10/include/asm-sparc/system.h linux-2.4.10.wmbdd/include/asm-sparc/system.h --- linux-2.4.10/include/asm-sparc/system.h Tue Oct 3 09:24:41 2000 +++ linux-2.4.10.wmbdd/include/asm-sparc/system.h Mon Oct 8 18:31:19 2001 @@ -278,11 +278,15 @@ #define mb() __asm__ __volatile__ ("" : : : "memory") #define rmb() mb() #define wmb() mb() +#define mbdd() mb() +#define wmbdd() wmb() #define set_mb(__var, __value) do { __var = __value; mb(); } while(0) #define set_wmb(__var, __value) set_mb(__var, __value) #define smp_mb() __asm__ __volatile__("":::"memory"); #define smp_rmb() __asm__ __volatile__("":::"memory"); #define smp_wmb() __asm__ __volatile__("":::"memory"); +#define smp_mbdd() __asm__ __volatile__("":::"memory"); +#define smp_wmbdd() __asm__ __volatile__("":::"memory"); #define nop() __asm__ __volatile__ ("nop"); diff -urN -X 
/home/mckenney/dontdiff linux-2.4.10/include/asm-sparc64/system.h linux-2.4.10.wmbdd/include/asm-sparc64/system.h --- linux-2.4.10/include/asm-sparc64/system.h Fri Sep 7 11:01:20 2001 +++ linux-2.4.10.wmbdd/include/asm-sparc64/system.h Wed Oct 10 16:43:21 2001 @@ -100,6 +100,8 @@ membar("#LoadLoad | #LoadStore | #StoreStore | #StoreLoad"); #define rmb() membar("#LoadLoad") #define wmb() membar("#StoreStore") +#define mbdd() mb() +#define wmbdd() wmb() #define set_mb(__var, __value) \ do { __var = __value; membar("#StoreLoad | #StoreStore"); } while(0) #define set_wmb(__var, __value) \ @@ -109,10 +111,14 @@ #define smp_mb() mb() #define smp_rmb() rmb() #define smp_wmb() wmb() +#define smp_mbdd() mbdd() +#define smp_wmbdd() wmbdd() #else #define smp_mb() __asm__ __volatile__("":::"memory"); #define smp_rmb() __asm__ __volatile__("":::"memory"); #define smp_wmb() __asm__ __volatile__("":::"memory"); +#define smp_mbdd() __asm__ __volatile__("":::"memory"); +#define smp_wmbdd() __asm__ __volatile__("":::"memory"); #endif #define flushi(addr) __asm__ __volatile__ ("flush %0" : : "r" (addr) : "memory") |
From: Rusty R. <ru...@ru...> - 2001-10-12 04:19:10
|
On Wed, 10 Oct 2001 18:56:26 -0700 (PDT)
"Paul E. McKenney" <mck...@en...> wrote:
> Here are two patches.  The wmbdd patch has been modified to use
> the lighter-weight SPARC instruction, as suggested by Dave Miller.
> The rmbdd patch defines an rmbdd() primitive that is defined to be
> rmb() on Alpha and a nop on other architectures.  I believe this
> rmbdd() primitive is what Richard is looking for.

Surely we don't need both?  If rmbdd exists, any code needing wmbdd
is terminally broken?

Rusty.
|
From: Paul E. M. <pmc...@us...> - 2001-10-13 14:48:57
|
>> Here are two patches.  The wmbdd patch has been modified to use
>> the lighter-weight SPARC instruction, as suggested by Dave Miller.
>> The rmbdd patch defines an rmbdd() primitive that is defined to be
>> rmb() on Alpha and a nop on other architectures.  I believe this
>> rmbdd() primitive is what Richard is looking for.
>
> Surely we don't need both?  If rmbdd exists, any code needing wmbdd
> is terminally broken?

One or the other.  And at this point, it looks like rmbdd() (or
read_cache_depends()) is the mechanism of choice, given wmbdd()'s
performance on Alpha.

					Thanx, Paul
|
From: David S. M. <da...@re...> - 2001-10-09 05:56:15
|
From: "Paul McKenney" <Pau...@us...>
Date: Mon, 8 Oct 2001 22:27:44 -0700

   All other CPUs must observe the preceding stores before the following
   stores.
   ...
   Does this do the trick?

	   membar #StoreStore

Yes.

   The IPIs and related junk are I believe needed only on Alpha, which has
   no single memory-barrier instruction that can do wmbdd()'s job.  Given
   that Alpha seems to be on its way out, this did not seem to me to be
   too horrible.

I somehow doubt that you need an IPI to implement the equivalent of
"membar #StoreStore" on Alpha.  Richard?

Franks a lot,
David S. Miller
da...@re...
|
From: Richard H. <rt...@re...> - 2001-10-09 06:43:55
|
On Mon, Oct 08, 2001 at 10:56:10PM -0700, David S. Miller wrote:
> I somehow doubt that you need an IPI to implement the equivalent of
> "membar #StoreStore" on Alpha.  Richard?

Lol.  Of course not.  Is someone under the impression that AXP
designers were smoking crack?

"wmb" == "membar #StoreStore".
"mb" == "membar #Sync".

See the nice mb/rmb/wmb macros in <asm/system.h>.

r~
|