From: jason <huz...@gm...> - 2013-10-25 07:49:06
Hi Ying,
On Friday, October 25, 2013, Ying Xue wrote:
> On 10/25/2013 11:20 AM, jason wrote:
> > Hi Ying,
> > As I understand it, a memory barrier is used to make other CPUs
> > observe the same order of two memory operations (one before the
> > barrier and one after it) as the local CPU. So if one decides to
> > use a memory barrier, there must be at least two memory operations
> > involved. In your patch, the operation on 'blocked' is a single
> > store to one memory location, with no subsequent memory operation
> > that must take effect in a strong order (i.e., become visible AFTER
> > the 'blocked' flag is set) on all other CPUs. So I agree with Jon
> > here: no need for atomics, memory barriers or locks.
> >
>
> Please carefully read my mail before you post your comments.
>
> I have emphasized that there is no required data dependency in our
> case. But I listed two exceptions: one is that some older processors
> have weak or even no cache coherence. For details you can look
> through the memory barrier implementations of all the arches.
Sorry for my carelessness...
Yes, I got your point, and you are absolutely correct. Many
architectures do need a memory barrier to force a cache update, but in
most cases you do not need to issue one explicitly. Many kinds of
implicit memory barrier appear in the surrounding code, such as an
rtnl_lock()/rtnl_unlock() pair. If you can find one, your explicit
memory barrier can be omitted, improving performance.
> Another case is compiler optimization factor.
>
>
Yes, a memory barrier can suppress compiler optimization on the reader
side. But as you know, the volatile keyword is enough for that job.
>
> //Ying
>
> > On Friday, October 25, 2013, Ying Xue wrote:
> >
> > On 10/21/2013 05:14 AM, Jon Maloy wrote:
> > > On 09/30/2013 10:38 AM, Ying Xue wrote:
> > >> The 'blocked' flag in struct tipc_bearer is currently protected by
> > >> a big spinlock aggregated into tipc_bearer. We want to reduce all
> > >> dependencies to this lock, as part of our effort to make the
> locking
> > >> policy simpler and more manageable.
> > >>
> > >> Furthermore, the functions tipc_disc_recv_msg() and
> disc_send_msg()
> > >> access the flag without taking this lock at all, implying a risk
> that
> > >> they may occasionally obtain a wrong value and drop
> sending/receiving
> > >> of discovery messages. Since discovery messages are timer driven
> and
> > >> best-effort, the latter problem is not fatal, but it is clearly
> > avoidable.
> > >>
> > >> By converting the type of the flag from int to atomic_t we
> eliminate
> > >> both the spinlock dependency and the current race condition.
> > >>
> > >> We also make two small code changes:
> > >>
> > >> 1) The functions tipc_continue() and bearer_blocked() now become
> > >> one-liners doing a single access to the flag. So we just remove
> > >> those redundant functions.
> > >>
> > >> 2) The initialization of the flag is moved from the enable_media()
> > >> functions in ib_media.c and eth_media.c respectively, to the
> > generic
> > >> tipc_enable_bearer() function, so that it is done in one place
> > only.
> > >
> > > Maybe it is just me, but I still fail to see the benefit (or harm)
> > of this change.
> > >
> > > As I understand it, the time line for the current implementation,
> > and the
> > > one you are suggesting, look like this:
> > >
> > > Without atomc_t:
> > > ------------------------
> > > 1: Device driver detects loss of carrier or similar.
> > > 2: It issues a call (work_queue?) to all registered users.
> > > 3: Until the call has reached the users, they (including TIPC)
> > will continue
> > > to send packets, although the driver can only drop them.
> > > 4: The field bearer->blocked is set as one single write on both
> > 32-bit and
> > > 64-bit CPUs, since it is one word, and a small constant
> > value (1 or 0) that is written.
> > > 5: Other cores happening to work on the same cache line at that
> moment
> > > will notice nothing; they will continue to send the packet
> > they were working
> > > on, seeing the previous value of the field.
> > > 6: The cache line is flushed and reloaded as soon as next packet
> > is sent or received
> > > on these cores. (I don't think a single 64-byte cache line
> > will remain between
> > > two transmissions. There are way too many data fields involved.)
> > > 7: The next time the cache line is reloaded it has the correct
> > value of
> > > 'blocked', since the cache coherency HW support ensures that
> > > the cache is updated. Next packet is not sent.
> > >
> > >
> > > With atomic_t:
> > > ------------------
> > > 1: Device driver detects loss of carrier or similar.
> > > 2: It issues a call (work_queue?) to all registered users.
> > > 3: Until the call has reached the users, they (including TIPC)
> > will continue
> > > to send packets, although the driver can only drop them.
> > > 4: The field bearer->blocked is set as a single, atomic write,
> > which it would be
> > > anyway, just as in the previous case.
> > > 5: Other cores happening to read on the same cache line at that
> moment
> > > will now see the cache line invalidated, so they will have to
> > reread it with the
> > > new, correct value. The 'volatile' keyword ensures that. As I
> > understand it,
> > > GCC even puts memory barriers around volatile accesses, but I
> > > fail to see the relevance here; we don't make any other changes
> > > to the bearer struct anyway, so ordering is not an issue.
> > > 6: The cache line now has the correct read value. Packet is not
> sent.
> > >
> > > What have we won? That TIPC may have time to send one more
> > > packet per core than it would otherwise, but nothing can stop
> > > TIPC and others from sending a small number of packets during the
> > > interval between carrier loss detection and until the 'blocked'
> > flag is
> > > updated. The difference between the cache line being invalidated
> > immediately
> > > or at the next reload is microscopic in this context.
> > >
> > > Admittedly, the harm done is not big either. The memory
> > > barriers put in by GCC around the volatile read do not cost much.
> > > But why make such a change if we cannot motivate it?
> > >
> > > First of all, thank you for such detailed comments. In fact the
> > things you
> > raise below often confuse me too. So I spent some time trying to
> > understand the memory cache systems of modern processors as well as
> > the effects of the various Linux memory barrier primitives on cache
> > coherence. That's why I am late in responding to you. But it's
> > really so complex that I am afraid I don't entirely understand it
> > either, although I spent several days researching. Anyway, I am
> > still going
> > to answer your questions :)
> >
> > Nowadays most modern processors have cache coherence protocols
> > ensuring that all CPUs maintain a consistent view of data. The most
> > famous coherence protocol is called MESI. For example, the blocked
> > variable is modified by core 0 and only written back to its local
> > cache; the other cores snoop the change made by core 0 over the
> > internal high-speed bus and then invalidate the local cache line
> > where the blocked variable resides. When the other cores read
> > blocked, they get the up-to-date value from core 0's cache line.
> > The whole process is guaranteed by MESI. However, why do we still
> > need memory barriers if the processor HW has cache coherence
> > capability? Because these CPUs cannot ensure that the order of
> > memory accesses observed on core 0 is the same as that observed on
> > the other cores, even though they can guarantee that the data
> > itself is consistent. But in our case, as there is no required data
> > dependency, memory barriers seem useless for us.
> >
> > Unfortunately life isn't so easy! There are still some old CPUs
> > lacking cache coherence mechanisms, meaning that manually flushing
> > the cache becomes mandatory for them. In this situation, without
> > memory barriers, the rate of reading stale data might rise
> > significantly. Even if many packets are involved (i.e., many memory
> > areas are accessed through the CPU's local cache), in some extreme
> > worst case that doesn't mean the cache line holding the stale
> > blocked value is evicted within a short time. Therefore, it becomes
> > unpredictable when the stale blocked value would be updated, which
> > may be unacceptable.
> >
> > If we search for barrier.h files in the kernel source tree, we get
> > the results below:
> >
> > */net-next-2.6$ find ./ -name barrier.h
> > ./arch/metag/include/asm/barrier.h
> > ./arch/sparc/include/asm/barrier.h
> > ./arch/h8300/include/asm/barrier.h
> > ./arch/x86/um/asm/barrier.h
> > ./arch/x86/include/asm/barrier.h
> > ./arch/unicore32/include/asm/barrier.h
> > ./arch/xtensa/include/asm/barrier.h
> > ./arch/frv/include/asm/barrier.h
> > ./arch/m68k/include/asm/barrier.h
> > ./arch/hexagon/include/asm/barrier.h
> > ./arch/mips/include/asm/barrier.h
> > ./arch/arm64/include/asm/barrier.h
> > ./arch/parisc/include/asm/barrier.h
> > ./arch/s390/include/asm/barrier.h
> > ./arch/ia64/include/asm/barrier.h
> > ./arch/m32r/include/asm/barrier.h
> > ./arch/powerpc/include/asm/barrier.h
> > ./arch/alpha/include/asm/barrier.h
> > ./arch/arm/include/asm/barrier.h
> > ./arch/avr32/include/asm/barrier.h
> > ./arch/sh/include/asm/barrier.h
> > ./arch/microblaze/include/asm/barrier.h
> > ./arch/mn10300/include/asm/barrier.h
> > ./arch/tile/include/asm/barrier.h
> > ./arch/arc/include/asm/barrier.h
> > ./arch/score/include/asm/barrier.h
> > ./arch/blackfin/include/asm/barrier.h
> > ./arch/cris/include/asm/barrier.h
> > ./include/asm-generic/barrier.h
> >
> > Different architectures have different implementations. Even
> > within the same arch, different HW versions also have different
> > implementations. If we take a closer look at them, we find that
> > many implementations forcibly flush/sync the cache in their memory
> > barrier operations.
> >
> > Besides the effect on CPU caches, memory barriers can also
> > suppress compiler optimization. To illustrate, let me state my
> > understanding of the volatile keyword; please imagine the scenario
> > below:
> >
> > CPU 0                                   CPU 1
> > -----                                   -----
> > T0: read blocked (blocked is 0 now)
> >                                         T1: blocked = 1
> > T2: read blocked (blocked is still 0)
> >
> > As the compiler optimizes accesses to blocked, without the
> > volatile keyword decorating the variable, at T0 CPU 0 reads
> > blocked from main memory/cache and loads it into one of its
> > registers. At T1, blocked is set to 1 by CPU 1. But as CPU 0 is
> > unaware of the change, it takes the value from its register
> > instead of main memory or the cache when it reads blocked again at
> > T2. We don't know when the value in the register will be
> > overwritten by another value, which means we don't know when CPU 0
> > will read the true data. This may also be unacceptable for us.
> >
> > Therefore, without memory barriers I believe everything works fine
> > most of the time. But to be more cautious, I still prefer to apply
> > some constraints to ensure blocked is always in sync.
> >
> > Of course, my understanding above may be incomplete or incorrect.
> > If so, please correct me.
> >
> >
> > Regards,
> > Ying
> >
> >
> > > Regards
> > > ///jon
> > >
> > >
> > >>
> > >> Signed-off-by: Ying Xue <yin...@wi...>
> > >> ---
> > >> net/tipc/bcast.c | 4 ++--
> > >> net/tipc/bearer.c | 33 ++++-----------------------------
> > >> net/tipc/bearer.h | 5 +----
> > >> net/tipc/discover.c | 5 +++--
> > >> net/tipc/eth_media.c | 9 ++++-----
> > >> net/tipc/ib_media.c | 9 ++++-----
> > >> net/tipc/link.c | 12 ++++++------
> > >> 7 files changed, 24 insertions(+), 53 deletions(-)
> > >>
> > >> diff --git a/net/tipc/bcast.c b/net/tipc/bcast.c
> > >> index 716de1a..c78f30c 100644
> > >> --- a/net/tipc/bcast.c
> > >> +++ b/net/tipc/bcast.c
> > >> @@ -615,8 +615,8 @@ static int tipc_bcbearer_send(struct sk_buff
> > *buf, struct tipc_bearer *unused1,
> > >> if (!p)
> > >> break; /* No more bearers to try */
> > >>
> > >> - if (tipc_bearer_blocked(p)) {
> > >> - if (!s || tipc_bearer_blocked(s))
> > >> + if (atomic_read(&p->blocked)) {
> > >> + if (!s || atomic_read(&s->blocked))
> > >> continue; /* Can't use either bearer
> */
> > >> b = s;
> > >> }
> > >> diff --git a/net/tipc/bearer.c b/net/tipc/bearer.c
> > >> index 3f9707a..17f9824 100644
> > >> --- a/net/tipc/bearer.c
> > >> +++ b/net/tipc/bearer.c
> > >> @@ -275,31 +275,6 @@ void tipc_bearer_remove_dest(struct
> > tipc_bearer *b_ptr, u32 dest)
> > >> tipc_disc_remove_dest(b_ptr->link_req);
> > >> }
> > >>
> > >> -/*
> > >> - * Interrupt enabling new requests after bearer blocking:
> > >> - * See bearer_send().
> > >> - */
> > >> -void tipc_continue(struct tipc_bearer *b)
> > >> -{
> > >> - spin_lock_bh(&b->lock);
> > >> - b->blocked = 0;
> > >> - spin_unlock_bh(&b->lock);
> > >> -}
> > >> -
> > >> -/*
> > >> - * tipc_bearer_blocked - determines if bearer is currently
> blocked
> > >> - */
> > >> -int tipc_bearer_blocked(struct tipc_bearer *b)
> > >> -{
> > >> - int res;
> > >> -
> > >> - spin_lock_bh(&b->lock);
> > >> - res = b->blocked;
> > >> - spin_unlock_bh(&b->lock);
> > >> -
> > >> - return res;
> > >> -}
> > >> -
> > >> /**
> > >> * tipc_enable_bearer - enable bearer with the given name
> > >> */
> > >> @@ -387,6 +362,8 @@ restart:
> > >>
> > >> b_ptr = &tipc_bearers[bearer_id];
> > >> strcpy(b_ptr->name, name);
> > >> + atomic_set(&b_ptr->blocked, 0);
> > >> + smp_mb();
> > >> res = m_ptr->enable_media(b_ptr);
> > >> if (res) {
> > >> pr_warn("Bearer <%s> rejected, enable failure
> (%d)\n",
> > >> @@ -429,8 +406,8 @@ int tipc_block_bearer(struct tipc_bearer
> *b_ptr)
> > >>
> > >> read_lock_bh(&tipc_net_lock);
> > >> pr_info("Blocking bearer <%s>\n", b_ptr->name);
> > >> + atomic_add_unless(&b_ptr->blocked, 1, 1);
> > >> spin_lock_bh(&b_ptr->lock);
> > >> - b_ptr->blocked = 1;
> > >> list_for_each_entry_safe(l_ptr, temp_l_ptr, &b_ptr->links,
> > link_list) {
> > >> struct tipc_node *n_ptr = l_ptr->owner;
> > >>
> > >> @@ -455,8 +432,8 @@ static void bearer_disable(struct tipc_bearer
> > *b_ptr)
> > >> struct tipc_link_req *temp_req;
> > >>
> >
> > _______________________________________________
> > tipc-discussion mailing list
> > tip...@li...
> > https://lists.sourceforge.net/lists/listinfo/tipc-discussion
> >
> >
> >
> > --
> > Yours,
> > Jason
>
>
--
Yours,
Jason