From: Paul M. <Pau...@us...> - 2001-10-09 05:45:23
> I am particularly interested in comments from people who understand
> the detailed operation of the SPARC membar instruction and the PARISC
> SYNC instruction. My belief is that the membar("#SYNC") and SYNC
> instructions are sufficient,
>
> SYNC is sufficient but way too strict. You don't explicitly say what
> you need to happen. If you need all previous stores to finish
> before all subsequent memory operations then:
>
>	membar #StoreStore | #StoreLoad
>
> is sufficient. If you need all previous memory operations to finish
> before all subsequent stores then:
>
>	membar #StoreStore | #LoadStore
>
> is what you want.

I need to segregate the stores executed by the CPU doing the membar.
All other CPUs must observe the preceding stores before the following
stores. Of course, this means that the loads on the observing CPUs must
be ordered somehow. I need data dependencies between the loads to be
sufficient to order the loads. For example, if a CPU executes the
following:

	a = new_value;
	wmbdd();
	p = &a;

then I need any other CPU executing:

	d = *p;

to see either the value that "p" pointed to before the "p = &a"
assignment, or "new_value", -never- the old value of "a".

Does this do the trick?

	membar #StoreStore

> Thoughts?
>
> I think if you need to perform IPIs and junk like that to make the
> memory barrier happen correctly, just throw your code away and use a
> spinlock instead.

The IPIs and related junk are I believe needed only on Alpha, which has
no single memory-barrier instruction that can do wmbdd()'s job. Given
that Alpha seems to be on its way out, this did not seem to me to be
too horrible.

					Thanx, Paul
From: Paul M. <Pau...@us...> - 2001-10-09 16:50:52
> From: "Paul McKenney" <Pau...@us...>
> Date: Mon, 8 Oct 2001 22:27:44 -0700
>
> All other CPUs must observe the preceding stores before the following
> stores.
> ...
> Does this do the trick?
>
>	membar #StoreStore
>
> Yes.

Cool! Thank you!!!

> The IPIs and related junk are I believe needed only on Alpha, which has
> no single memory-barrier instruction that can do wmbdd()'s job. Given
> that Alpha seems to be on its way out, this did not seem to me to be
> too horrible.
>
> I somehow doubt that you need an IPI to implement the equivalent of
> "membar #StoreStore" on Alpha. Richard?

If "membar #StoreStore" is sufficient, then there is no equivalent of
it on Alpha. Neither the "mb" nor the "wmb" instructions wait for
outstanding invalidations to complete, and therefore do -not- guarantee
that reading CPUs will see writes occurring in the order that the
writes occurred on the writing CPU, even if data dependencies force the
order of the reads (as the pointer-dereference example I gave does).
On Alpha, there -must- be an "mb" on the reading CPU if the reading CPU
is to observe the stores in order. The IPIs are just a way of causing
those "mb"s to happen without code like this:

	d = p->a->b;

having to be written as follows:

	q = p->a;
	rmb();
	d = q->b;

More thoughts?

					Thanx, Paul
From: Paul M. <Pau...@us...> - 2001-10-09 16:50:53
> On Mon, Oct 08, 2001 at 10:56:10PM -0700, David S. Miller wrote:
> > I somehow doubt that you need an IPI to implement the equivalent of
> > "membar #StoreStore" on Alpha. Richard?
>
> Lol. Of course not. Is someone under the impression that AXP
> designers were smoking crack?

The ones I have talked to showed no signs of having done so. However,
their architecture -does- make it quite challenging for anyone trying
to write lock-free common code, hence all the IPIs.

> "wmb" == "membar #StoreStore".
> "mb" == "membar #Sync".
>
> See the nice mb/rmb/wmb macros in <asm/system.h>.

OK, if "membar #StoreStore" really is equivalent to "wmb", then
"membar #StoreStore" definitely will -not- do the job required here.

Will "membar #SYNC" allow read-side "membar #ReadRead"s to be omitted,
or does "membar #SYNC" fail to detect when outstanding cache
invalidations complete?

					Thanx, Paul
From: Paul M. <Pau...@us...> - 2001-10-09 16:50:55
> On Mon, Oct 08, 2001 at 06:55:24PM -0700, Paul E. McKenney wrote:
> > This is a proposal to provide a wmb()-like primitive that enables
> > lock-free traversal of lists while elements are concurrently being
> > inserted into these lists.
>
> I've discussed this with you before and you continue to have
> completely missed the point.

It would not be the first point that I have completely missed, but
please read on. I have discussed this algorithm with Alpha architects,
who tell me that it is sound.

> Alpha requires that you issue read-after-read memory barriers on
> the reader side if you require ordering between reads. That is
> the extent of the weakness of the memory ordering.

I agree that Alpha requires "mb" instructions to be executed on the
reading CPUs if the reading CPUs are to observe some other CPU's writes
occurring in order. And I agree that the usual way that this is done
is to insert "mb" instructions between the reads on the read side.

However, if the reading CPU executes an "mb" instruction between the
time that the writing CPU executes the "wmb" and the time that the
writing CPU executes the second store, then the reading CPU is
guaranteed to see the writes in order. Here is how this happens:

	Initial values: a = 0, p = &a, b = 1.

	Writing CPU			Reading CPU

	1) b = 2;
	2) Execute "wmb" instruction
	3) Send a bunch of IPIs
					4) Receive IPI
					5) Execute "mb" instruction
					6) Indicate completion
	7) Detect completion
	8) p = &b

The combination of steps 2 and 5 guarantees that the reading CPU will
invalidate the cacheline containing the old value of "b" before it can
possibly reference the new value of "p". The CPU must read "p" before
"b", since it can't know where "p" points before reading it.

> Sparc64 is the same way.

I can believe that "membar #StoreStore" and friends operate in the same
way that the Alpha memory-ordering instructions do. However, some of
the code in Linux seems to rely on "membar #SYNC" waiting for
outstanding invalidations to complete. If this is the case, then
"membar #SYNC" could be used to segregate writes when the corresponding
reads are implicitly ordered by data dependencies, as they are during
pointer dereferences.

> This crap will never be applied. Your algorithms are simply broken
> if you do not ensure proper read ordering via the rmb() macro.

Please see the example above. I do believe that my algorithms reliably
force proper read ordering using IPIs, just in a different way. Please
note that I have discussed this algorithm with Alpha architects, who
believe that it is sound. But they (and I) may well be confused. If
so, could you please show me what I am missing?

					Thanx, Paul
From: Richard H. <rt...@tw...> - 2001-10-09 17:00:26
On Tue, Oct 09, 2001 at 08:45:15AM -0700, Paul McKenney wrote:
> Please see the example above. I do believe that my algorithms are
> reliably forcing proper read ordering using IPIs, just in a different
> way.

I wasn't suggesting that the IPI wouldn't work -- it will.
But it will be _extremely_ slow.

I am suggesting that the lock-free algorithms should add the
read barriers, and that failure to do so indicates that they
are incomplete. If nothing else, it documents where the real
dependencies are.


r~
From: Paul M. <pa...@sa...> - 2001-10-10 03:34:22
Richard Henderson writes:
> I am suggesting that the lock-free algorithms should add the
> read barriers, and that failure to do so indicates that they
> are incomplete. If nothing else, it documents where the real
> dependencies are.

Please, let's not go adding rmb's in places where there is already an
ordering forced by a data dependency - that will hurt performance
unnecessarily on x86, ppc, sparc, ia64, etc. It seems to me that there
are two viable alternatives:

1. Define an rmbdd() which is a no-op on all architectures except for
   alpha, where it is an rmb. Richard can then have the job of
   finding all the places where an rmbdd is needed, which sounds like
   one of the smellier labors of Hercules to me. :)

2. Use Paul McKenney's scheme.

I personally don't really mind which gets chosen. Scheme 1 will result
in intermittent hard-to-find bugs on alpha (since the vast majority of
kernel hackers will not understand where or why rmbdd's are required),
but if Richard prefers that to scheme 2, it's his call IMHO.

Regards,
Paul.
From: Richard H. <rt...@tw...> - 2001-10-10 17:02:12
On Wed, Oct 10, 2001 at 01:33:58PM +1000, Paul Mackerras wrote:
> 1. Define an rmbdd() which is a no-op on all architectures except for
>    alpha, where it is an rmb. Richard can then have the job of
>    finding all the places where an rmbdd is needed, which sounds like
>    one of the smellier labors of Hercules to me. :)

I don't think it's actually all that bad. There won't be all that many
places that require the rmbdd, and they'll pretty much exactly
correspond to the places in which you have to put wmb for all
architectures anyway.


r~
From: Andrea A. <an...@su...> - 2001-10-10 02:05:15
On Tue, Oct 09, 2001 at 08:45:15AM -0700, Paul McKenney wrote:
> Please see the example above. I do believe that my algorithms are
> reliably forcing proper read ordering using IPIs, just in a different
> way. Please note that I have discussed this algorithm with Alpha
> architects, who believe that it is sound.

The IPI way is certainly safe. The point here is that it is surprising
that alpha needs this IPI unlike all other architectures. So while the
IPI is certainly safe, we wouldn't expect it to be necessary on alpha
either.

Now my only worry is that when you worked on this years ago with the
alpha architects there were old chips, old caches and old machines (ev5
maybe?). So before changing any code, I would prefer to double check
with the current alpha architects that the read dependency really isn't
enough to enforce read ordering without the need of rmb also on the
bleeding-edge ev6/ev67/etc. cores. So at worst we'd need to redefine
wmb() as wmbdd() (and friends) only for EV5+SMP compiles of the kernel,
but we wouldn't be affected with any recent hardware compiling for
EV6/EV67. Jay, Peter, comments?

Andrea
From: Ivan K. <in...@ju...> - 2001-10-10 13:25:52
On Wed, Oct 10, 2001 at 04:05:02AM +0200, Andrea Arcangeli wrote:
> So before changing any code, I would prefer to double check with the
> current alpha architects that the read dependency really isn't enough
> to enforce read ordering without the need of rmb also on the bleeding
> edge ev6/ev67/etc. cores. So at worst we'd need to redefine wmb() as
> wmbdd() (and friends) only for EV5+SMP compiles of the kernel, but we
> wouldn't be affected with any recent hardware compiling for EV6/EV67.
> Jay, Peter, comments?

The 21264 Compiler Writer's Guide [appendix C] explicitly says that the
second load cannot issue, if its address depends on the result of a
previous load, until that result is available. I refuse to believe that
this isn't true for older alphas, especially because they are strictly
in-order machines, unlike ev6.

I suspect some confusion here - probably that architect meant loads
from independent addresses. Of course, in this case mb() is required
to assure ordering.

Ivan.
From: Andrea A. <an...@su...> - 2001-10-10 13:41:59
On Wed, Oct 10, 2001 at 05:24:31PM +0400, Ivan Kokshaysky wrote:
> 21264 Compiler Writer's Guide [appendix C] explicitly says that the
> second load cannot issue if its address depends on a result of a
> previous load until that result is available. I refuse to believe
> that it isn't

Fine, btw I also recall reading something along those lines, and not
even in the 21264 manual but in the alpha reference manual that would
apply to all the chips, but I didn't find it with a short lookup.
Thanks for checking!

> true for older alphas, especially because they are strictly in-order
> machines, unlike ev6.

Yes, it sounds strange. However, according to Paul this would not be
the cpu but a cache coherency issue. rmb() would enforce the cache
coherency etc., so maybe the issue is related to old SMP motherboards,
not even to the cpus ... dunno. But as said, it sounded very strange
that new chips and new boards would also have such a weird reordering
problem.

> I suspect some confusion here - probably that architect meant loads
> to independent addresses. Of course, in this case mb() is required
> to assure ordering.
>
> Ivan.

Andrea
From: Paul M. <Pau...@us...> - 2001-10-09 18:15:22
> On Tue, Oct 09, 2001 at 07:03:37PM +1000, Rusty Russell wrote:
> > I don't *like* making Alpha's wmb() stronger, but it is the
> > only solution which doesn't touch common code.
>
> It's not a "solution" at all. It's so heavy weight you'd be
> much better off with locks. Just use the damned rmb_me_harder.

There are a number of cases where updates are extremely rare. FD
management and module unloading are but two examples. In such cases,
the overhead of the IPIs in the extremely rare updates is overwhelmed
by the reduction in overhead in the very common accesses. And getting
rid of rmb() or rmb_me_harder() makes the read-side code less complex.

					Thanx, Paul
From: Paul M. <Pau...@us...> - 2001-10-09 18:15:24
> On Tue, Oct 09, 2001 at 08:45:15AM -0700, Paul McKenney wrote:
> > Please see the example above. I do believe that my algorithms are
> > reliably forcing proper read ordering using IPIs, just in a different
> > way.
>
> I wasn't suggesting that the IPI wouldn't work -- it will.
> But it will be _extremely_ slow.

Ah! Please accept my apologies for belaboring the obvious in my
previous emails.

> I am suggesting that the lock-free algorithms should add the
> read barriers, and that failure to do so indicates that they
> are incomplete. If nothing else, it documents where the real
> dependencies are.

Such read barriers are not needed on any architecture where data
dependencies imply an rmb(). Examples include i386, PPC, and IA64.
On these architectures, read-side rmb()s add both overhead and
complexity.

On completeness: it seems to me that in cases where updates are rare,
the IPIs fill in the gap, and with good performance benefits. What am
I missing here?

					Thanx, Paul
From: Paul M. <Pau...@us...> - 2001-10-10 01:21:54
> > The IPIs and related junk are I believe needed only on Alpha, which has
> > no single memory-barrier instruction that can do wmbdd()'s job. Given
> > that Alpha seems to be on its way out, this did not seem to me to be
> > too horrible.
>
> I somehow doubt that you need an IPI to implement the equivalent of
> "membar #StoreStore" on Alpha. Richard?

I received my copy of the SPARC Architecture Manual (Weaver and
Germond) today.

It turns out that there is -no- equivalent of "membar #StoreStore"
on Alpha, if I am correctly interpreting this manual. From section
D.4.4, on page 260:

	A memory order is legal in RMO if and only if:

	(1) X <d Y & L(X) -> X <m Y

	[... two other irrelevant cases omitted ...]

	Rule (1) states that the RMO model will maintain dependence
	when the preceding transaction is a load. Preceding stores
	may be delayed in the implementation, so their order may
	not be preserved globally.

In the example dereferencing a pointer, we first load the pointer,
then load the value it points to. The second load is dependent on the
first, and the first is a load. Thus, rule (1) holds, and there is no
need for a read-side memory barrier between the two loads. This is
consistent with the book's definition of "completion" and the
description of the membar instruction.

In contrast, on Alpha, unless there is an explicit rmb(), data
dependence between a pair of loads in no way forces the two loads to
be ordered. http://lse.sourceforge.net/locking/wmbdd.html shows how
Alpha can get the new value of the pointer, but the old value of the
data it points to. Alpha thus needs the rmb() between the two loads,
even though there is a data dependency.

Am I misinterpreting the SPARC manual?

					Thanx, Paul
From: Andrea A. <an...@su...> - 2001-10-10 01:44:08
On Tue, Oct 09, 2001 at 06:19:49PM -0700, Paul McKenney wrote:
> > > The IPIs and related junk are I believe needed only on Alpha, which
> > > has no single memory-barrier instruction that can do wmbdd()'s job.
> > > Given that Alpha seems to be on its way out, this did not seem to
> > > me to be too horrible.
> >
> > I somehow doubt that you need an IPI to implement the equivalent of
> > "membar #StoreStore" on Alpha. Richard?
>
> I received my copy of the SPARC Architecture Manual (Weaver and
> Germond) today.
>
> It turns out that there is -no- equivalent of "membar #StoreStore"
> on Alpha, if I am correctly interpreting this manual.

The equivalent of "membar #StoreStore" on alpha is the "wmb" asm
instruction, in linux common code called wmb().

> From section D.4.4, on page 260:
>
>	A memory order is legal in RMO if and only if:
>
>	(1) X <d Y & L(X) -> X <m Y
>
>	[... two other irrelevant cases omitted ...]
>
>	Rule (1) states that the RMO model will maintain dependence
>	when the preceding transaction is a load. Preceding stores
>	may be delayed in the implementation, so their order may
>	not be preserved globally.
>
> In the example dereferencing a pointer, we first load the pointer,
> then load the value it points to. The second load is dependent on
> the first, and the first is a load. Thus, rule (1) holds, and there
> is no need for a read-side memory barrier between the two loads.
> This is consistent with the book's definition of "completion" and
> the description of the membar instruction.
>
> In contrast, on Alpha, unless there is an explicit rmb(), data
> dependence between a pair of loads in no way forces the two loads
> to be ordered. http://lse.sourceforge.net/locking/wmbdd.html
> shows how Alpha can get the new value of the pointer, but the
> old value of the data it points to. Alpha thus needs the rmb()
> between the two loads, even though there is a data dependency.

You remember I was surprised when you told me alpha needs the rmb
despite the data dependency :). I thought it wasn't needed (and in
turn I thought we didn't need the wmbdd). I cannot in fact see this
requirement in any alpha specification. Are you sure the issue isn't
specific to old cpus or old cache coherency protocols that we can
safely ignore today? I think in SMP systems we care only about ev6,
ev67 and future chips.

Also, if this can really be reproduced, it shouldn't be too difficult
to demonstrate it with a malicious application that stresses the race
in a loop; maybe somebody (Ivan?) could be interested in writing such
an application to test.

The IPI just for the rmb within two reads that depend on each other is
just too ugly... But yes, adding rmb() in the reader side looks even
uglier and nobody should really need it.

> Am I misinterpreting the SPARC manual?
>
>					Thanx, Paul

Andrea
From: Rusty R. <ru...@ru...> - 2001-10-10 01:44:37
In message <200...@tw...> you write:
> On Tue, Oct 09, 2001 at 07:03:37PM +1000, Rusty Russell wrote:
> > I don't *like* making Alpha's wmb() stronger, but it is the
> > only solution which doesn't touch common code.
>
> It's not a "solution" at all. It's so heavy weight you'd be
> much better off with locks. Just use the damned rmb_me_harder.

Wow! I'm glad you're volunteering to audit all the kernel code to fix
this Alpha-specific bug by inserting rmb_me_harder() in all the
critical locations! Don't miss any!

I look forward to seeing your patch,
Rusty.
--
Premature optmztion is rt of all evl. --DK
From: Paul M. <Pau...@us...> - 2001-10-10 21:51:44
> On Wed, Oct 10, 2001 at 01:33:58PM +1000, Paul Mackerras wrote:
> > 1. Define an rmbdd() which is a no-op on all architectures except for
> >    alpha, where it is an rmb. Richard can then have the job of
> >    finding all the places where an rmbdd is needed, which sounds like
> >    one of the smellier labors of Hercules to me. :)
>
> I don't think it's actually all that bad. There won't be all
> that many places that require the rmbdd, and they'll pretty
> much exactly correspond to the places in which you have to put
> wmb for all architectures anyway.

Just to make sure I understand... This rmbdd() would use IPIs to get
all the CPUs' caches synchronized, right? Or do you have some other
trick up your sleeve? ;-)

					Thanx, Paul
From: Richard H. <rt...@tw...> - 2001-10-10 22:22:55
On Wed, Oct 10, 2001 at 02:47:05PM -0700, Paul McKenney wrote:
> Just to make sure I understand... This rmbdd() would use IPIs to
> get all the CPUs' caches synchronized, right?

No, it would expand to rmb on Alpha, and to nothing elsewhere.


r~
From: Richard H. <rt...@tw...> - 2001-10-10 22:27:20
On Wed, Oct 10, 2001 at 02:47:05PM -0700, Paul McKenney wrote:
> > I don't think it's actually all that bad. There won't be all
> > that many places that require the rmbdd, and they'll pretty
> > much exactly correspond to the places in which you have to put
> > wmb for all architectures anyway.
>
> Just to make sure I understand... This rmbdd() would use IPIs to
> get all the CPUs' caches synchronized, right?

Err, I see your confusion now.

"Correspond" meaning "for every wmb needed on the writer side, there
is likely an rmb needed on the reader side in a similar place".


r~
From: Paul M. <Pau...@us...> - 2001-10-11 05:57:56
< On Wed, Oct 10, 2001 at 02:47:05PM -0700, Paul McKenney wrote:
< > > I don't think it's actually all that bad. There won't be all
< > > that many places that require the rmbdd, and they'll pretty
< > > much exactly correspond to the places in which you have to put
< > > wmb for all architectures anyway.
< >
< > Just to make sure I understand... This rmbdd() would use IPIs to
< > get all the CPUs' caches synchronized, right?
<
< Err, I see your confusion now.
<
< "Correspond" meaning "for every wmb needed on the writer side,
< there is likely an rmb needed on the reader side in a similar
< place".

Fair enough! Here are two patches. The wmbdd patch has been modified
to use the lighter-weight SPARC instruction, as suggested by Dave
Miller. The rmbdd patch defines an rmbdd() primitive that is defined
to be rmb() on Alpha and a nop on other architectures. I believe this
rmbdd() primitive is what Richard is looking for.

Please pass on any comments or criticisms. I am particularly
interested in comments from people with PA-RISC and MIPS expertise, as
I am not 100% sure that I have interpreted the PA-RISC architecture
manual correctly, and I do not yet have a MIPS manual. I do not
believe that these architectures need the Alpha treatment, but then
again, I didn't think that Alpha needed the Alpha treatment when I
first encountered it -- and I am quite clearly not the only one! ;-)

					Thanx, Paul

PS.
An updated explanation of why this is needed may be found at
http://lse.sourceforge.net/locking/wmbdd.html

diff -urN -X /home/mckenney/dontdiff linux-2.4.10/include/asm-alpha/system.h linux-2.4.10.rmbdd/include/asm-alpha/system.h
--- linux-2.4.10/include/asm-alpha/system.h	Sun Aug 12 10:38:47 2001
+++ linux-2.4.10.rmbdd/include/asm-alpha/system.h	Wed Oct 10 16:49:11 2001
@@ -148,16 +148,21 @@
 #define rmb() \
 __asm__ __volatile__("mb": : :"memory")
 
+#define rmbdd() \
+__asm__ __volatile__("mb": : :"memory")
+
 #define wmb() \
 __asm__ __volatile__("wmb": : :"memory")
 
 #ifdef CONFIG_SMP
 #define smp_mb() mb()
 #define smp_rmb() rmb()
+#define smp_rmbdd() rmbdd()
 #define smp_wmb() wmb()
 #else
 #define smp_mb() barrier()
 #define smp_rmb() barrier()
+#define smp_rmbdd() barrier()
 #define smp_wmb() barrier()
 #endif
diff -urN -X /home/mckenney/dontdiff linux-2.4.10/include/asm-arm/system.h linux-2.4.10.rmbdd/include/asm-arm/system.h
--- linux-2.4.10/include/asm-arm/system.h	Mon Nov 27 17:07:59 2000
+++ linux-2.4.10.rmbdd/include/asm-arm/system.h	Wed Oct 10 18:18:12 2001
@@ -38,6 +38,7 @@
 #define mb() __asm__ __volatile__ ("" : : : "memory")
 #define rmb() mb()
+#define rmbdd() do { } while(0)
 #define wmb() mb()
 #define nop() __asm__ __volatile__("mov\tr0,r0\t@ nop\n\t");
@@ -67,12 +68,14 @@
 #define smp_mb() mb()
 #define smp_rmb() rmb()
+#define smp_rmbdd() rmbdd()
 #define smp_wmb() wmb()
 #else
 #define smp_mb() barrier()
 #define smp_rmb() barrier()
+#define smp_rmbdd() do { } while(0)
 #define smp_wmb() barrier()
 
 #define cli() __cli()
diff -urN -X /home/mckenney/dontdiff linux-2.4.10/include/asm-cris/system.h linux-2.4.10.rmbdd/include/asm-cris/system.h
--- linux-2.4.10/include/asm-cris/system.h	Tue May 1 16:05:00 2001
+++ linux-2.4.10.rmbdd/include/asm-cris/system.h	Wed Oct 10 18:19:04 2001
@@ -143,15 +143,18 @@
 #define mb() __asm__ __volatile__ ("" : : : "memory")
 #define rmb() mb()
+#define rmbdd() do { } while(0)
 #define wmb() mb()
 
 #ifdef CONFIG_SMP
 #define smp_mb() mb()
 #define smp_rmb() rmb()
+#define smp_rmbdd() rmbdd()
 #define smp_wmb() wmb()
 #else
 #define smp_mb() barrier()
 #define smp_rmb() barrier()
+#define smp_rmbdd() do { } while(0)
 #define smp_wmb() barrier()
 #endif
diff -urN -X /home/mckenney/dontdiff linux-2.4.10/include/asm-i386/system.h linux-2.4.10.rmbdd/include/asm-i386/system.h
--- linux-2.4.10/include/asm-i386/system.h	Sun Sep 23 10:31:01 2001
+++ linux-2.4.10.rmbdd/include/asm-i386/system.h	Wed Oct 10 17:00:57 2001
@@ -284,15 +284,18 @@
  */
 #define mb() __asm__ __volatile__ ("lock; addl $0,0(%%esp)": : :"memory")
 #define rmb() mb()
+#define rmbdd() do { } while(0)
 #define wmb() __asm__ __volatile__ ("": : :"memory")
 
 #ifdef CONFIG_SMP
 #define smp_mb() mb()
 #define smp_rmb() rmb()
+#define smp_rmbdd() rmbdd()
 #define smp_wmb() wmb()
 #else
 #define smp_mb() barrier()
 #define smp_rmb() barrier()
+#define smp_rmbdd() do { } while(0)
 #define smp_wmb() barrier()
 #endif
diff -urN -X /home/mckenney/dontdiff linux-2.4.10/include/asm-ia64/system.h linux-2.4.10.rmbdd/include/asm-ia64/system.h
--- linux-2.4.10/include/asm-ia64/system.h	Tue Jul 31 10:30:09 2001
+++ linux-2.4.10.rmbdd/include/asm-ia64/system.h	Wed Oct 10 17:01:09 2001
@@ -85,6 +85,9 @@
  *		stores and that all following stores will be
  *		visible only after all previous stores.
  *   rmb():	Like wmb(), but for reads.
+ *   rmbdd():	Like rmb(), but only for pairs of loads where
+ *		the second load depends on the value loaded
+ *		by the first.
  *   mb():	wmb()/rmb() combo, i.e., all previous memory
  *		accesses are visible before all subsequent
  *		accesses and vice versa.  This is also known as
@@ -98,15 +101,18 @@
  */
 #define mb() __asm__ __volatile__ ("mf" ::: "memory")
 #define rmb() mb()
+#define rmbdd() do { } while(0)
 #define wmb() mb()
 
 #ifdef CONFIG_SMP
 # define smp_mb() mb()
 # define smp_rmb() rmb()
+# define smp_rmbdd() rmbdd()
 # define smp_wmb() wmb()
 #else
 # define smp_mb() barrier()
 # define smp_rmb() barrier()
+# define smp_rmbdd() do { } while(0)
 # define smp_wmb() barrier()
 #endif
diff -urN -X /home/mckenney/dontdiff linux-2.4.10/include/asm-m68k/system.h linux-2.4.10.rmbdd/include/asm-m68k/system.h
--- linux-2.4.10/include/asm-m68k/system.h	Mon Jun 11 19:15:27 2001
+++ linux-2.4.10.rmbdd/include/asm-m68k/system.h	Wed Oct 10 17:01:15 2001
@@ -80,12 +80,14 @@
 #define nop() do { asm volatile ("nop"); barrier(); } while (0)
 #define mb() barrier()
 #define rmb() barrier()
+#define rmbdd() do { } while(0)
 #define wmb() barrier()
 #define set_mb(var, value) do { xchg(&var, value); } while (0)
 #define set_wmb(var, value) do { var = value; wmb(); } while (0)
 
 #define smp_mb() barrier()
 #define smp_rmb() barrier()
+#define smp_rmbdd() do { } while(0)
 #define smp_wmb() barrier()
diff -urN -X /home/mckenney/dontdiff linux-2.4.10/include/asm-mips/system.h linux-2.4.10.rmbdd/include/asm-mips/system.h
--- linux-2.4.10/include/asm-mips/system.h	Sun Sep 9 10:43:01 2001
+++ linux-2.4.10.rmbdd/include/asm-mips/system.h	Wed Oct 10 17:01:26 2001
@@ -150,6 +150,7 @@
 #include <asm/wbflush.h>
 
 #define rmb() do { } while(0)
+#define rmbdd() do { } while(0)
 #define wmb() wbflush()
 #define mb() wbflush()
@@ -166,6 +167,7 @@
 		: /* no input */ \
 		: "memory")
 #define rmb() mb()
+#define rmbdd() do { } while(0)
 #define wmb() mb()
 
 #endif /* CONFIG_CPU_HAS_WB */
@@ -173,10 +175,12 @@
 #ifdef CONFIG_SMP
 #define smp_mb() mb()
 #define smp_rmb() rmb()
+#define smp_rmbdd() rmbdd()
 #define smp_wmb() wmb()
 #else
 #define smp_mb() barrier()
 #define smp_rmb() barrier()
+#define smp_rmbdd() do { } while(0)
 #define smp_wmb() barrier()
 #endif
diff -urN -X /home/mckenney/dontdiff linux-2.4.10/include/asm-mips64/system.h linux-2.4.10.rmbdd/include/asm-mips64/system.h
--- linux-2.4.10/include/asm-mips64/system.h	Wed Jul 4 11:50:39 2001
+++ linux-2.4.10.rmbdd/include/asm-mips64/system.h	Wed Oct 10 17:01:41 2001
@@ -147,15 +147,18 @@
 		: /* no input */ \
 		: "memory")
 #define rmb() mb()
+#define rmbdd() do { } while(0)
 #define wmb() mb()
 
 #ifdef CONFIG_SMP
 #define smp_mb() mb()
 #define smp_rmb() rmb()
+#define smp_rmbdd() rmbdd()
 #define smp_wmb() wmb()
 #else
 #define smp_mb() barrier()
 #define smp_rmb() barrier()
+#define smp_rmbdd() do { } while(0)
 #define smp_wmb() barrier()
 #endif
diff -urN -X /home/mckenney/dontdiff linux-2.4.10/include/asm-parisc/system.h linux-2.4.10.rmbdd/include/asm-parisc/system.h
--- linux-2.4.10/include/asm-parisc/system.h	Wed Dec 6 11:46:39 2000
+++ linux-2.4.10.rmbdd/include/asm-parisc/system.h	Wed Oct 10 17:04:07 2001
@@ -50,6 +50,7 @@
 #ifdef CONFIG_SMP
 #define smp_mb() mb()
 #define smp_rmb() rmb()
+#define smp_rmbdd() do { } while(0)
 #define smp_wmb() wmb()
 #else
 /* This is simply the barrier() macro from linux/kernel.h but when serial.c
@@ -58,6 +59,7 @@
  */
 #define smp_mb() __asm__ __volatile__("":::"memory");
 #define smp_rmb() __asm__ __volatile__("":::"memory");
+#define smp_rmbdd() do { } while(0)
 #define smp_wmb() __asm__ __volatile__("":::"memory");
 #endif
@@ -122,6 +124,7 @@
 #define mb() __asm__ __volatile__ ("sync" : : :"memory")
 #define wmb() mb()
+#define rmbdd() do { } while(0)
 
 extern unsigned long __xchg(unsigned long, unsigned long *, int);
diff -urN -X /home/mckenney/dontdiff linux-2.4.10/include/asm-ppc/system.h linux-2.4.10.rmbdd/include/asm-ppc/system.h
--- linux-2.4.10/include/asm-ppc/system.h	Tue Aug 28 06:58:33 2001
+++ linux-2.4.10.rmbdd/include/asm-ppc/system.h	Wed Oct 10 18:19:43 2001
@@ -24,6 +24,8 @@
  *
  * mb() prevents loads and stores being reordered across this point.
  * rmb() prevents loads being reordered across this point.
+ * rmbdd() prevents data-dependant loads being reordered across this point
+ *	(nop on PPC).
  * wmb() prevents stores being reordered across this point.
  *
  * We can use the eieio instruction for wmb, but since it doesn't
@@ -32,6 +34,7 @@
  */
 #define mb() __asm__ __volatile__ ("sync" : : : "memory")
 #define rmb() __asm__ __volatile__ ("sync" : : : "memory")
+#define rmbdd() do { } while(0)
 #define wmb() __asm__ __volatile__ ("eieio" : : : "memory")
 
 #define set_mb(var, value) do { var = value; mb(); } while (0)
@@ -40,10 +43,12 @@
 #ifdef CONFIG_SMP
 #define smp_mb() mb()
 #define smp_rmb() rmb()
+#define smp_rmbdd() rmbdd()
 #define smp_wmb() wmb()
 #else
 #define smp_mb() __asm__ __volatile__("": : :"memory")
 #define smp_rmb() __asm__ __volatile__("": : :"memory")
+#define smp_rmbdd() do { } while(0)
 #define smp_wmb() __asm__ __volatile__("": : :"memory")
 #endif /* CONFIG_SMP */
diff -urN -X /home/mckenney/dontdiff linux-2.4.10/include/asm-s390/system.h linux-2.4.10.rmbdd/include/asm-s390/system.h
--- linux-2.4.10/include/asm-s390/system.h	Wed Jul 25 14:12:02 2001
+++ linux-2.4.10.rmbdd/include/asm-s390/system.h	Wed Oct 10 18:20:31 2001
@@ -117,9 +117,11 @@
 # define SYNC_OTHER_CORES(x) eieio()
 #define mb() eieio()
 #define rmb() eieio()
+#define rmbdd() do { } while(0)
 #define wmb() eieio()
 #define smp_mb() mb()
 #define smp_rmb() rmb()
+#define smp_rmbdd() rmbdd()
 #define smp_wmb() wmb()
 #define smp_mb__before_clear_bit() smp_mb()
 #define smp_mb__after_clear_bit() smp_mb()
diff -urN -X /home/mckenney/dontdiff linux-2.4.10/include/asm-s390x/system.h linux-2.4.10.rmbdd/include/asm-s390x/system.h
--- linux-2.4.10/include/asm-s390x/system.h	Wed Jul 25 14:12:03 2001
+++ linux-2.4.10.rmbdd/include/asm-s390x/system.h	Wed Oct 10 17:04:45 2001
@@ -130,9 +130,11 @@
 # define SYNC_OTHER_CORES(x) eieio()
 #define mb() eieio()
 #define rmb() eieio()
+#define rmbdd() do { } while(0)
 #define wmb() eieio()
 #define smp_mb() mb()
 #define smp_rmb() rmb()
+#define smp_rmbdd() rmbdd()
 #define smp_wmb() wmb()
 #define smp_mb__before_clear_bit() smp_mb()
 #define smp_mb__after_clear_bit() smp_mb()
diff -urN -X /home/mckenney/dontdiff linux-2.4.10/include/asm-sh/system.h linux-2.4.10.rmbdd/include/asm-sh/system.h
--- linux-2.4.10/include/asm-sh/system.h	Sat Sep 8 12:29:09 2001
+++ linux-2.4.10.rmbdd/include/asm-sh/system.h	Wed Oct 10 17:05:07 2001
@@ -88,15 +88,18 @@
 #define mb() __asm__ __volatile__ ("": : :"memory")
 #define rmb() mb()
+#define rmbdd() do { } while(0)
 #define wmb() __asm__ __volatile__ ("": : :"memory")
 
 #ifdef CONFIG_SMP
 #define smp_mb() mb()
 #define smp_rmb() rmb()
+#define smp_rmbdd() rmbdd()
 #define smp_wmb() wmb()
 #else
 #define smp_mb() barrier()
 #define smp_rmb() barrier()
+#define smp_rmbdd() do { } while(0)
 #define smp_wmb() barrier()
 #endif
diff -urN -X /home/mckenney/dontdiff linux-2.4.10/include/asm-sparc/system.h linux-2.4.10.rmbdd/include/asm-sparc/system.h
--- linux-2.4.10/include/asm-sparc/system.h	Tue Oct 3 09:24:41 2000
+++ linux-2.4.10.rmbdd/include/asm-sparc/system.h	Wed Oct 10 16:59:44 2001
@@ -277,11 +277,13 @@
 /* XXX Change this if we ever use a PSO mode kernel. */
 #define mb() __asm__ __volatile__ ("" : : : "memory")
 #define rmb() mb()
+#define rmbdd() do { } while(0)
 #define wmb() mb()
 #define set_mb(__var, __value) do { __var = __value; mb(); } while(0)
 #define set_wmb(__var, __value) set_mb(__var, __value)
 #define smp_mb() __asm__ __volatile__("":::"memory");
 #define smp_rmb() __asm__ __volatile__("":::"memory");
+#define smp_rmbdd() do { } while(0)
 #define smp_wmb() __asm__ __volatile__("":::"memory");
 
 #define nop() __asm__ __volatile__ ("nop");
diff -urN -X /home/mckenney/dontdiff linux-2.4.10/include/asm-sparc64/system.h linux-2.4.10.rmbdd/include/asm-sparc64/system.h
--- linux-2.4.10/include/asm-sparc64/system.h	Fri Sep 7 11:01:20 2001
+++ linux-2.4.10.rmbdd/include/asm-sparc64/system.h	Wed Oct 10 17:00:12 2001
@@ -99,6 +99,7 @@
 #define mb() \
 	membar("#LoadLoad | #LoadStore | #StoreStore | #StoreLoad");
 #define rmb() membar("#LoadLoad")
+#define rmbdd() do { } while(0)
 #define wmb() membar("#StoreStore")
 #define set_mb(__var, __value) \
 	do { __var = __value; membar("#StoreLoad | #StoreStore"); } while(0)
@@ -108,10 +109,12 @@
 #ifdef CONFIG_SMP
 #define smp_mb() mb()
 #define smp_rmb() rmb()
+#define smp_rmbdd() rmbdd()
 #define smp_wmb() wmb()
 #else
 #define smp_mb() __asm__ __volatile__("":::"memory");
 #define smp_rmb() __asm__ __volatile__("":::"memory");
+#define smp_rmbdd() do { } while(0)
 #define smp_wmb() __asm__ __volatile__("":::"memory");
 #endif
diff -urN -X /home/mckenney/dontdiff linux-2.4.10/arch/alpha/kernel/smp.c linux-2.4.10.wmbdd/arch/alpha/kernel/smp.c
--- linux-2.4.10/arch/alpha/kernel/smp.c	Thu Sep 13 15:21:32 2001
+++ linux-2.4.10.wmbdd/arch/alpha/kernel/smp.c	Mon Oct 8 18:31:18 2001
@@ -63,8 +63,20 @@
 	IPI_RESCHEDULE,
 	IPI_CALL_FUNC,
 	IPI_CPU_STOP,
+	IPI_MB,
 };
 
+/* Global and per-CPU state for global MB shootdown. */
+static struct {
+	spinlock_t mutex;
+	unsigned long need_mb;	/* bitmask of CPUs that need to do "mb". 
*/ + long curgen; /* Each "generation" is a group of requests */ + long maxgen; /* that is handled by one set of "mb"s. */ +} mb_global_data __cacheline_aligned = { SPIN_LOCK_UNLOCKED, 0, 1, 0 }; +static struct { + long mygen ____cacheline_aligned; +} mb_data[NR_CPUS] __cacheline_aligned; + spinlock_t kernel_flag = SPIN_LOCK_UNLOCKED; /* Set to a secondary's cpuid when it comes online. */ @@ -772,6 +784,41 @@ goto again; } +/* + * Execute an "mb" instruction in response to an IPI_MB. Also directly + * called by smp_global_mb(). If this is the last CPU to respond to + * an smp_global_mb(), then check to see if an additional generation of + * requests needs to be satisfied. + */ + +void +handle_mb_ipi(void) +{ + int this_cpu = smp_processor_id(); + unsigned long this_cpu_mask = 1UL << this_cpu; + unsigned long flags; + unsigned long to_whom = cpu_present_mask ^ this_cpu_mask; + + /* Avoid lock contention when extra IPIs arrive (due to race) and + when waiting for global mb shootdown. */ + if ((mb_global_data.need_mb & this_cpu_mask) == 0) { + return; + } + spin_lock_irqsave(&mb_global_data.mutex, flags); /* implied mb */ + if ((mb_global_data.need_mb & this_cpu_mask) == 0) { + spin_unlock_irqrestore(&mb_global_data.mutex, flags); + return; + } + mb_global_data.need_mb &= ~this_cpu_mask; + if (mb_global_data.need_mb == 0) { + if (++mb_global_data.curgen - mb_global_data.maxgen <= 0) { + mb_global_data.need_mb = to_whom; + send_ipi_message(to_whom, IPI_MB); + } + } + spin_unlock_irqrestore(&mb_global_data.mutex, flags); /* implied mb */ +} + void handle_ipi(struct pt_regs *regs) { @@ -825,6 +872,9 @@ else if (which == IPI_CPU_STOP) { halt(); } + else if (which == IPI_MB) { + handle_mb_ipi(); + } else { printk(KERN_CRIT "Unknown IPI on CPU %d: %lu\n", this_cpu, which); @@ -860,6 +910,58 @@ printk(KERN_WARNING "smp_send_stop: Not on boot cpu. 
\n"); #endif send_ipi_message(to_whom, IPI_CPU_STOP); +} + +/* + * Execute an "mb" instruction, then force all other CPUs to execute "mb" + * instructions. Does not block. Once this function returns, the caller + * is guaranteed that all of its memory writes preceding the call to + * smp_global_mb() will be seen by all CPUs as preceding all memory + * writes following the call to smp_global_mb(). + * + * For example, if CPU 0 does: + * a.data = 1; + * smp_global_mb(); + * p = &a; + * and CPU 1 does: + * d = p->data; + * where a.data is initially garbage and p initially points to another + * structure with the "data" field being zero, then CPU 1 will be + * guaranteed to have "d" set to either 0 or 1, never garbage. + * + * Note that the Alpha "wmb" instruction is -not- sufficient!!! If CPU 0 + * were replace the smp_global_mb() with a wmb(), then CPU 1 could end + * up with garbage in "d"! + * + * This function sends IPIs to all other CPUs, then spins waiting for + * them to receive the IPI and execute an "mb" instruction. While + * spinning, this function -must- respond to other CPUs executing + * smp_global_mb() concurrently, otherwise, deadlock would result. 
+ */ + +void +smp_global_mb(void) +{ + int this_cpu = smp_processor_id(); + unsigned long this_cpu_mask = 1UL << this_cpu; + unsigned long flags; + unsigned long to_whom = cpu_present_mask ^ this_cpu_mask; + + spin_lock_irqsave(&mb_global_data.mutex, flags); /* implied mb */ + if (mb_global_data.curgen - mb_global_data.maxgen <= 0) { + mb_global_data.maxgen = mb_global_data.curgen + 1; + } else { + mb_global_data.maxgen = mb_global_data.curgen; + mb_global_data.need_mb = to_whom; + send_ipi_message(to_whom, IPI_MB); + } + mb_data[this_cpu].mygen = mb_global_data.maxgen; + spin_unlock_irqrestore(&mb_global_data.mutex, flags); + while (mb_data[this_cpu].mygen - mb_global_data.curgen >= 0) { + handle_mb_ipi(); + barrier(); + } + } /* diff -urN -X /home/mckenney/dontdiff linux-2.4.10/include/asm-alpha/system.h linux-2.4.10.wmbdd/include/asm-alpha/system.h --- linux-2.4.10/include/asm-alpha/system.h Sun Aug 12 10:38:47 2001 +++ linux-2.4.10.wmbdd/include/asm-alpha/system.h Mon Oct 8 18:31:18 2001 @@ -151,14 +151,21 @@ #define wmb() \ __asm__ __volatile__("wmb": : :"memory") +#define mbdd() smp_mbdd() +#define wmbdd() smp_wmbdd() + #ifdef CONFIG_SMP #define smp_mb() mb() #define smp_rmb() rmb() #define smp_wmb() wmb() +#define smp_mbdd() smp_global_mb() +#define smp_wmbdd() smp_mbdd() #else #define smp_mb() barrier() #define smp_rmb() barrier() #define smp_wmb() barrier() +#define smp_mbdd() barrier() +#define smp_wmbdd() barrier() #endif #define set_mb(var, value) \ diff -urN -X /home/mckenney/dontdiff linux-2.4.10/include/asm-arm/system.h linux-2.4.10.wmbdd/include/asm-arm/system.h --- linux-2.4.10/include/asm-arm/system.h Mon Nov 27 17:07:59 2000 +++ linux-2.4.10.wmbdd/include/asm-arm/system.h Mon Oct 8 18:31:18 2001 @@ -39,6 +39,8 @@ #define mb() __asm__ __volatile__ ("" : : : "memory") #define rmb() mb() #define wmb() mb() +#define mbdd() mb() +#define wmbdd() wmb() #define nop() __asm__ __volatile__("mov\tr0,r0\t@ nop\n\t"); #define prepare_to_switch() do { } 
while(0) @@ -68,12 +70,16 @@ #define smp_mb() mb() #define smp_rmb() rmb() #define smp_wmb() wmb() +#define smp_mbdd() rmbdd() +#define smp_wmbdd() wmbdd() #else #define smp_mb() barrier() #define smp_rmb() barrier() #define smp_wmb() barrier() +#define smp_mbdd() barrier() +#define smp_wmbdd() barrier() #define cli() __cli() #define sti() __sti() diff -urN -X /home/mckenney/dontdiff linux-2.4.10/include/asm-cris/system.h linux-2.4.10.wmbdd/include/asm-cris/system.h --- linux-2.4.10/include/asm-cris/system.h Tue May 1 16:05:00 2001 +++ linux-2.4.10.wmbdd/include/asm-cris/system.h Mon Oct 8 18:31:18 2001 @@ -144,15 +144,21 @@ #define mb() __asm__ __volatile__ ("" : : : "memory") #define rmb() mb() #define wmb() mb() +#define mbdd() mb() +#define wmbdd() wmb() #ifdef CONFIG_SMP #define smp_mb() mb() #define smp_rmb() rmb() #define smp_wmb() wmb() +#define smp_mbdd() mbdd() +#define smp_wmbdd() wmbdd() #else #define smp_mb() barrier() #define smp_rmb() barrier() #define smp_wmb() barrier() +#define smp_mbdd() barrier() +#define smp_wmbdd() barrier() #endif #define iret() diff -urN -X /home/mckenney/dontdiff linux-2.4.10/include/asm-i386/system.h linux-2.4.10.wmbdd/include/asm-i386/system.h --- linux-2.4.10/include/asm-i386/system.h Sun Sep 23 10:31:01 2001 +++ linux-2.4.10.wmbdd/include/asm-i386/system.h Mon Oct 8 18:31:18 2001 @@ -285,15 +285,21 @@ #define mb() __asm__ __volatile__ ("lock; addl $0,0(%%esp)": : :"memory") #define rmb() mb() #define wmb() __asm__ __volatile__ ("": : :"memory") +#define mbdd() mb() +#define wmbdd() wmb() #ifdef CONFIG_SMP #define smp_mb() mb() #define smp_rmb() rmb() #define smp_wmb() wmb() +#define smp_mbdd() mbdd() +#define smp_wmbdd() wmbdd() #else #define smp_mb() barrier() #define smp_rmb() barrier() #define smp_wmb() barrier() +#define smp_mbdd() barrier() +#define smp_wmbdd() barrier() #endif #define set_mb(var, value) do { xchg(&var, value); } while (0) diff -urN -X /home/mckenney/dontdiff linux-2.4.10/include/asm-ia64/system.h 
linux-2.4.10.wmbdd/include/asm-ia64/system.h --- linux-2.4.10/include/asm-ia64/system.h Tue Jul 31 10:30:09 2001 +++ linux-2.4.10.wmbdd/include/asm-ia64/system.h Mon Oct 8 18:31:18 2001 @@ -84,11 +84,36 @@ * like regions are visible before any subsequent * stores and that all following stores will be * visible only after all previous stores. - * rmb(): Like wmb(), but for reads. + * In common code, any reads that depend on this + * ordering must be separated by an mb() or rmb(). + * rmb(): Guarantees that all preceding loads to memory- + * like regions are executed before any subsequent + * loads. * mb(): wmb()/rmb() combo, i.e., all previous memory * accesses are visible before all subsequent * accesses and vice versa. This is also known as - * a "fence." + * a "fence." Again, in common code, any reads that + * depend on the order of writes must themselves be + * separated by an mb() or rmb(). + * wmbdd(): Guarantees that all preceding stores to memory- + * like regions are visible before any subsequent + * stores and that all following stores will be + * visible only after all previous stores. + * In common code, any reads that depend on this + * ordering either must be separated by an mb() + * or rmb(), or the later reads must depend on + * data loaded by the earlier reads. For an example + * of the latter, consider "p->next". The read of + * the "next" field depends on the read of the + * pointer "p". + * mbdd(): wmb()/rmb() combo, i.e., all previous memory + * accesses are visible before all subsequent + * accesses and vice versa. This is also known as + * a "fence." Again, in common code, any reads that + * depend on the order of writes must themselves be + * separated by an mb() or rmb(), or there must be + * a data dependency that forces the second to + * wait until the first completes. * * Note: "mb()" and its variants cannot be used as a fence to order * accesses to memory mapped I/O registers. 
For that, mf.a needs to @@ -99,15 +124,21 @@ #define mb() __asm__ __volatile__ ("mf" ::: "memory") #define rmb() mb() #define wmb() mb() +#define rmbdd() mb() +#define wmbdd() mb() #ifdef CONFIG_SMP # define smp_mb() mb() # define smp_rmb() rmb() # define smp_wmb() wmb() +# define smp_mbdd() mbdd() +# define smp_wmbdd() wmbdd() #else # define smp_mb() barrier() # define smp_rmb() barrier() # define smp_wmb() barrier() +# define smp_mbdd() barrier() +# define smp_wmbdd() barrier() #endif /* diff -urN -X /home/mckenney/dontdiff linux-2.4.10/include/asm-m68k/system.h linux-2.4.10.wmbdd/include/asm-m68k/system.h --- linux-2.4.10/include/asm-m68k/system.h Mon Jun 11 19:15:27 2001 +++ linux-2.4.10.wmbdd/include/asm-m68k/system.h Mon Oct 8 18:31:18 2001 @@ -81,12 +81,16 @@ #define mb() barrier() #define rmb() barrier() #define wmb() barrier() +#define rmbdd() barrier() +#define wmbdd() barrier() #define set_mb(var, value) do { xchg(&var, value); } while (0) #define set_wmb(var, value) do { var = value; wmb(); } while (0) #define smp_mb() barrier() #define smp_rmb() barrier() #define smp_wmb() barrier() +#define smp_mbdd() barrier() +#define smp_wmbdd() barrier() #define xchg(ptr,x) ((__typeof__(*(ptr)))__xchg((unsigned long)(x),(ptr),sizeof(*(ptr)))) diff -urN -X /home/mckenney/dontdiff linux-2.4.10/include/asm-mips/system.h linux-2.4.10.wmbdd/include/asm-mips/system.h --- linux-2.4.10/include/asm-mips/system.h Sun Sep 9 10:43:01 2001 +++ linux-2.4.10.wmbdd/include/asm-mips/system.h Mon Oct 8 18:31:18 2001 @@ -152,6 +152,8 @@ #define rmb() do { } while(0) #define wmb() wbflush() #define mb() wbflush() +#define wmbdd() wbflush() +#define mbdd() wbflush() #else /* CONFIG_CPU_HAS_WB */ @@ -167,6 +169,8 @@ : "memory") #define rmb() mb() #define wmb() mb() +#define wmbdd() mb() +#define mbdd() mb() #endif /* CONFIG_CPU_HAS_WB */ @@ -174,10 +178,14 @@ #define smp_mb() mb() #define smp_rmb() rmb() #define smp_wmb() wmb() +#define smp_mbdd() mbdd() +#define smp_wmbdd() wmbdd() 
#else #define smp_mb() barrier() #define smp_rmb() barrier() #define smp_wmb() barrier() +#define smp_mbdd() barrier() +#define smp_wmbdd() barrier() #endif #define set_mb(var, value) \ diff -urN -X /home/mckenney/dontdiff linux-2.4.10/include/asm-mips64/system.h linux-2.4.10.wmbdd/include/asm-mips64/system.h --- linux-2.4.10/include/asm-mips64/system.h Wed Jul 4 11:50:39 2001 +++ linux-2.4.10.wmbdd/include/asm-mips64/system.h Mon Oct 8 18:31:18 2001 @@ -148,15 +148,21 @@ : "memory") #define rmb() mb() #define wmb() mb() +#define rmbdd() mb() +#define wmbdd() mb() #ifdef CONFIG_SMP #define smp_mb() mb() #define smp_rmb() rmb() #define smp_wmb() wmb() +#define smp_mbdd() mbdd() +#define smp_wmbdd() wmbdd() #else #define smp_mb() barrier() #define smp_rmb() barrier() #define smp_wmb() barrier() +#define smp_mbdd() barrier() +#define smp_wmbdd() barrier() #endif #define set_mb(var, value) \ diff -urN -X /home/mckenney/dontdiff linux-2.4.10/include/asm-parisc/system.h linux-2.4.10.wmbdd/include/asm-parisc/system.h --- linux-2.4.10/include/asm-parisc/system.h Wed Dec 6 11:46:39 2000 +++ linux-2.4.10.wmbdd/include/asm-parisc/system.h Mon Oct 8 18:31:18 2001 @@ -51,6 +51,8 @@ #define smp_mb() mb() #define smp_rmb() rmb() #define smp_wmb() wmb() +#define smp_mbdd() rmb() +#define smp_wmbdd() wmb() #else /* This is simply the barrier() macro from linux/kernel.h but when serial.c * uses tqueue.h uses smp_mb() defined using barrier(), linux/kernel.h @@ -59,6 +61,8 @@ #define smp_mb() __asm__ __volatile__("":::"memory"); #define smp_rmb() __asm__ __volatile__("":::"memory"); #define smp_wmb() __asm__ __volatile__("":::"memory"); +#define smp_mbdd() __asm__ __volatile__("":::"memory"); +#define smp_wmbdd() __asm__ __volatile__("":::"memory"); #endif /* interrupt control */ @@ -122,6 +126,8 @@ #define mb() __asm__ __volatile__ ("sync" : : :"memory") #define wmb() mb() +#define mbdd() mb() +#define wmbdd() mb() extern unsigned long __xchg(unsigned long, unsigned long *, int); 
diff -urN -X /home/mckenney/dontdiff linux-2.4.10/include/asm-ppc/system.h linux-2.4.10.wmbdd/include/asm-ppc/system.h --- linux-2.4.10/include/asm-ppc/system.h Tue Aug 28 06:58:33 2001 +++ linux-2.4.10.wmbdd/include/asm-ppc/system.h Mon Oct 8 18:31:18 2001 @@ -33,6 +33,8 @@ #define mb() __asm__ __volatile__ ("sync" : : : "memory") #define rmb() __asm__ __volatile__ ("sync" : : : "memory") #define wmb() __asm__ __volatile__ ("eieio" : : : "memory") +#define mbdd() mb() +#define wmbdd() wmb() #define set_mb(var, value) do { var = value; mb(); } while (0) #define set_wmb(var, value) do { var = value; wmb(); } while (0) @@ -41,10 +43,14 @@ #define smp_mb() mb() #define smp_rmb() rmb() #define smp_wmb() wmb() +#define smp_mbdd() mb() +#define smp_wmbdd() wmb() #else #define smp_mb() __asm__ __volatile__("": : :"memory") #define smp_rmb() __asm__ __volatile__("": : :"memory") #define smp_wmb() __asm__ __volatile__("": : :"memory") +#define smp_mbdd() __asm__ __volatile__("": : :"memory") +#define smp_wmbdd() __asm__ __volatile__("": : :"memory") #endif /* CONFIG_SMP */ #ifdef __KERNEL__ diff -urN -X /home/mckenney/dontdiff linux-2.4.10/include/asm-s390/system.h linux-2.4.10.wmbdd/include/asm-s390/system.h --- linux-2.4.10/include/asm-s390/system.h Wed Jul 25 14:12:02 2001 +++ linux-2.4.10.wmbdd/include/asm-s390/system.h Mon Oct 8 18:31:18 2001 @@ -118,9 +118,13 @@ #define mb() eieio() #define rmb() eieio() #define wmb() eieio() +#define mbdd() mb() +#define wmbdd() wmb() #define smp_mb() mb() #define smp_rmb() rmb() #define smp_wmb() wmb() +#define smp_mbdd() mb() +#define smp_wmbdd() wmb() #define smp_mb__before_clear_bit() smp_mb() #define smp_mb__after_clear_bit() smp_mb() diff -urN -X /home/mckenney/dontdiff linux-2.4.10/include/asm-s390x/system.h linux-2.4.10.wmbdd/include/asm-s390x/system.h --- linux-2.4.10/include/asm-s390x/system.h Wed Jul 25 14:12:03 2001 +++ linux-2.4.10.wmbdd/include/asm-s390x/system.h Mon Oct 8 18:31:19 2001 @@ -131,9 +131,13 @@ #define mb() 
eieio() #define rmb() eieio() #define wmb() eieio() +#define mbdd() mb() +#define wmbdd() wmb() #define smp_mb() mb() #define smp_rmb() rmb() #define smp_wmb() wmb() +#define smp_mbdd() mb() +#define smp_wmbdd() wmb() #define smp_mb__before_clear_bit() smp_mb() #define smp_mb__after_clear_bit() smp_mb() diff -urN -X /home/mckenney/dontdiff linux-2.4.10/include/asm-sh/system.h linux-2.4.10.wmbdd/include/asm-sh/system.h --- linux-2.4.10/include/asm-sh/system.h Sat Sep 8 12:29:09 2001 +++ linux-2.4.10.wmbdd/include/asm-sh/system.h Mon Oct 8 18:31:19 2001 @@ -89,15 +89,21 @@ #define mb() __asm__ __volatile__ ("": : :"memory") #define rmb() mb() #define wmb() __asm__ __volatile__ ("": : :"memory") +#define mbdd() mb() +#define wmbdd() wmb() #ifdef CONFIG_SMP #define smp_mb() mb() #define smp_rmb() rmb() #define smp_wmb() wmb() +#define smp_mbdd() mb() +#define smp_wmbdd() wmb() #else #define smp_mb() barrier() #define smp_rmb() barrier() #define smp_wmb() barrier() +#define smp_mbdd() barrier() +#define smp_wmbdd() barrier() #endif #define set_mb(var, value) do { xchg(&var, value); } while (0) diff -urN -X /home/mckenney/dontdiff linux-2.4.10/include/asm-sparc/system.h linux-2.4.10.wmbdd/include/asm-sparc/system.h --- linux-2.4.10/include/asm-sparc/system.h Tue Oct 3 09:24:41 2000 +++ linux-2.4.10.wmbdd/include/asm-sparc/system.h Mon Oct 8 18:31:19 2001 @@ -278,11 +278,15 @@ #define mb() __asm__ __volatile__ ("" : : : "memory") #define rmb() mb() #define wmb() mb() +#define mbdd() mb() +#define wmbdd() wmb() #define set_mb(__var, __value) do { __var = __value; mb(); } while(0) #define set_wmb(__var, __value) set_mb(__var, __value) #define smp_mb() __asm__ __volatile__("":::"memory"); #define smp_rmb() __asm__ __volatile__("":::"memory"); #define smp_wmb() __asm__ __volatile__("":::"memory"); +#define smp_mbdd() __asm__ __volatile__("":::"memory"); +#define smp_wmbdd() __asm__ __volatile__("":::"memory"); #define nop() __asm__ __volatile__ ("nop"); diff -urN -X 
/home/mckenney/dontdiff linux-2.4.10/include/asm-sparc64/system.h linux-2.4.10.wmbdd/include/asm-sparc64/system.h --- linux-2.4.10/include/asm-sparc64/system.h Fri Sep 7 11:01:20 2001 +++ linux-2.4.10.wmbdd/include/asm-sparc64/system.h Wed Oct 10 16:43:21 2001 @@ -100,6 +100,8 @@ membar("#LoadLoad | #LoadStore | #StoreStore | #StoreLoad"); #define rmb() membar("#LoadLoad") #define wmb() membar("#StoreStore") +#define mbdd() mb() +#define wmbdd() wmb() #define set_mb(__var, __value) \ do { __var = __value; membar("#StoreLoad | #StoreStore"); } while(0) #define set_wmb(__var, __value) \ @@ -109,10 +111,14 @@ #define smp_mb() mb() #define smp_rmb() rmb() #define smp_wmb() wmb() +#define smp_mbdd() mbdd() +#define smp_wmbdd() wmbdd() #else #define smp_mb() __asm__ __volatile__("":::"memory"); #define smp_rmb() __asm__ __volatile__("":::"memory"); #define smp_wmb() __asm__ __volatile__("":::"memory"); +#define smp_mbdd() __asm__ __volatile__("":::"memory"); +#define smp_wmbdd() __asm__ __volatile__("":::"memory"); #endif #define flushi(addr) __asm__ __volatile__ ("flush %0" : : "r" (addr) : "memory") |
From: Rusty R. <ru...@ru...> - 2001-10-12 04:19:10
|
On Wed, 10 Oct 2001 18:56:26 -0700 (PDT)
"Paul E. McKenney" <mck...@en...> wrote:
> Here are two patches.  The wmbdd patch has been modified to use
> the lighter-weight SPARC instruction, as suggested by Dave Miller.
> The rmbdd patch defines an rmbdd() primitive that is defined to be
> rmb() on Alpha and a nop on other architectures.  I believe this
> rmbdd() primitive is what Richard is looking for.

Surely we don't need both?  If rmbdd exists, any code needing wmbdd
is terminally broken?

Rusty.
|
From: Paul E. M. <pmc...@us...> - 2001-10-13 14:48:57
|
>> Here are two patches.  The wmbdd patch has been modified to use
>> the lighter-weight SPARC instruction, as suggested by Dave Miller.
>> The rmbdd patch defines an rmbdd() primitive that is defined to be
>> rmb() on Alpha and a nop on other architectures.  I believe this
>> rmbdd() primitive is what Richard is looking for.
>
> Surely we don't need both?  If rmbdd exists, any code needing wmbdd
> is terminally broken?

One or the other.  And at this point, it looks like rmbdd() (or
read_cache_depends()) is the mechanism of choice, given wmbdd()'s
performance on Alpha.

					Thanx, Paul
|
From: David S. M. <da...@re...> - 2001-10-09 05:56:15
|
From: "Paul McKenney" <Pau...@us...>
Date: Mon, 8 Oct 2001 22:27:44 -0700

   All other CPUs must observe the preceding stores before the following
   stores.
   ...
   Does this do the trick?

	   membar #StoreStore

Yes.

   The IPIs and related junk are I believe needed only on Alpha, which has
   no single memory-barrier instruction that can do wmbdd()'s job.  Given
   that Alpha seems to be on its way out, this did not seem to me to be
   too horrible.

I somehow doubt that you need an IPI to implement the equivalent of
"membar #StoreStore" on Alpha.  Richard?

Franks a lot,
David S. Miller
da...@re...
|
From: Richard H. <rt...@re...> - 2001-10-09 06:43:55
|
On Mon, Oct 08, 2001 at 10:56:10PM -0700, David S. Miller wrote:
> I somehow doubt that you need an IPI to implement the equivalent of
> "membar #StoreStore" on Alpha.  Richard?

Lol.  Of course not.  Is someone under the impression that AXP
designers were smoking crack?

"wmb" == "membar #StoreStore".
"mb" == "membar #Sync".

See the nice mb/rmb/wmb macros in <asm/system.h>.

r~
|