From: Jon M. <jon...@er...> - 2004-05-19 13:50:52
|
I think I have succeeded in making the port/transport level lock handling somewhat easier to understand, without changing the basic locking policy.

New:

1: Callbacks to socket.c (dispatcher/wakeup dispatcher) are now called with, and return with, the port lock held.

2: Symmetry: A port or subscription is now always locked/unlocked in the same function.

3: No more dual-purpose functions in the reference manager; ref_lock_acquire/ref_unlock_discard etc. are now gone. The port lock is still located in the corresponding reference entry, though, and is accessed via easily comprehensible inline functions: port_lock()/port_unlock() etc.

4: Better comments about what the locks protect, and what the lock functions do.

I will continue with similar work on the node locks in a few days, but I would welcome some feedback first.

/Jon
|
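[Editor's note: the single-purpose lock helpers in point 3 can be modelled in user-space C roughly as below. The table layout, the pthread-based locking, and all names (ref_entry, ref_table_init, ...) are illustrative assumptions modelled on the mail - a sketch of the idea that the lock lives in the reference entry and that port_lock()/port_unlock() do nothing but lock and unlock - not TIPC's actual kernel code.]

```c
#include <assert.h>
#include <errno.h>
#include <pthread.h>

/* Sketch of a reference table where each entry holds both the object
 * pointer and the lock that protects it. Entry layout and helper
 * names are assumptions modelled on the mail, not real TIPC code. */
struct ref_entry {
	pthread_mutex_t lock;
	void *object;            /* the port this reference resolves to */
};

#define MAX_REFS 16
static struct ref_entry ref_table[MAX_REFS];

static void ref_table_init(void)
{
	for (int i = 0; i < MAX_REFS; i++)
		pthread_mutex_init(&ref_table[i].lock, NULL);
}

/* Single-purpose helpers: they only lock/unlock, with no combined
 * "acquire and discard" semantics like the removed ref_lock_acquire()
 * and ref_unlock_discard() had. */
static inline void port_lock(unsigned int ref)
{
	pthread_mutex_lock(&ref_table[ref % MAX_REFS].lock);
}

static inline void port_unlock(unsigned int ref)
{
	pthread_mutex_unlock(&ref_table[ref % MAX_REFS].lock);
}
```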
From: Mark H. <ma...@os...> - 2004-05-19 14:59:11
|
On Tue, 2004-05-18 at 17:19, Jon Maloy wrote:
> I think I have succeeded in making the port/transport level lock
> handling somewhat easier to understand, without changing the
> basic locking policy.
>
> New:
>
> 1: Callbacks to socket.c (dispatcher/wakeup dispatcher) are now
> called with and return with the port-lock on.
>
> 2: Symmetry: A port or subscription is now always
> locked/unlocked in the same function.

These are a good idea. It makes it a lot easier to follow when you can see the scope of a lock without having to look through a bunch of called functions to find out whether a lock was released in one of them.

> 3: No dual-purpose functions to the reference manager;
> ref_lock_acquire/ref_unlock_discard etc are now gone.
> The port lock is still located in the corresponding reference
> entry, though, and is accessed via easily comprehensible
> inline functions; port_lock()/port_unlock() etc.
>
> 4: Better comments about what the locks protect, and what
> the lock functions do.
>
> I will continue with a similar work on the node locks in a few days,
> but I would welcome some feedback first.

I've loaded it on to a couple of my systems and will try it out today.

Mark.
--
Mark Haverkamp <ma...@os...>
|
From: Daniel M. <da...@os...> - 2004-05-19 23:14:35
|
On Tue, 2004-05-18 at 17:19, Jon Maloy wrote:
> I think I have succeeded in making the port/transport level lock
> handling somewhat easier to understand, without changing the
> basic locking policy.

I was testing multicast after Mark had updated 2 of the machines to use the latest tipc cvs, and multicast sendto()s were not being seen by a process on one of the nodes running the latest version. I changed the tipc version on the node whose messages were not being seen to Monday's checked-in version. (2 nodes running Monday's version and 1 running the latest.) With that, the multicast sends are correctly seen by all processes on all nodes. Something is not quite right. Any ideas?

Daniel
|
From: Jon M. <jon...@er...> - 2004-05-19 23:54:16
|
The last thing I did before the port_lock changes was to add a device notifier in eth_media, and a "tipc_block_bearer()" function to handle this. I tested this with an "ifconfig eth1 down/up" during traffic (I run dual links most of the time), and this worked fine, but when I thereafter removed the module I got a crash in bcast.c/blink_remove(), a NULL pointer access.

I corrected this (I believed) and tested it, but it seems like I have still introduced some problem here.

Maybe Guo or Ling can say more about this? Pay special attention to the flag "blocked" in the bearer; if this gets stuck with the wrong value, the traffic will never restart.

/Jon

Daniel McNeil wrote:
> On Tue, 2004-05-18 at 17:19, Jon Maloy wrote:
>> I think I have succeeded in making the port/transport level lock
>> handling somewhat easier to understand, without changing the
>> basic locking policy.
>
> I was testing multicast and Mark had updated 2 of the machines to use
> the latest tipc cvs and multicast sendto()s were not being seen by a
> process on one of the nodes running the latest version. I changed the
> tipc version on the node whose messages were not being seen to
> Monday's checked in version. (2 nodes running monday's version and 1
> running latest) the multicast sends are correctly seen by all
> processes on all nodes. Something is not quite right. Any ideas?
>
> Daniel
|
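[Editor's note: the failure mode Jon warns about - the bearer's "blocked" flag getting stuck - can be sketched with a tiny user-space model. All struct and function names below are hypothetical stand-ins for illustration, not the actual eth_media/bearer code.]

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical model of a bearer carrying the "blocked" flag discussed
 * above; struct and function names are illustrative assumptions, not
 * TIPC's actual code. */
struct bearer_model {
	bool blocked;             /* set on device-down, cleared on device-up */
};

/* Analogue of tipc_block_bearer(): refuse traffic on this bearer. */
static void block_bearer(struct bearer_model *b)
{
	b->blocked = true;
}

/* The matching unblock step. If the path that clears the flag is never
 * reached (e.g. after a crash during module removal), bearer_send()
 * fails forever: the "stuck flag" failure mode described in the mail. */
static void unblock_bearer(struct bearer_model *b)
{
	b->blocked = false;
}

static int bearer_send(struct bearer_model *b)
{
	return b->blocked ? -1 : 0;   /* -1: traffic refused, 0: sent */
}
```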
From: Mark H. <ma...@os...> - 2004-05-20 16:13:48
|
On Wed, 2004-05-19 at 16:54, Jon Maloy wrote:
> The last thing I did before the port_lock changes was to add
> a device notifier in eth_media, and a "tipc_block_bearer()"
> function to handle this. I tested this with an "ifconfig eth1
> down/up" (I run dual links most of the time) during traffic,
> and this worked fine, but when I thereafter removed the module
> I got a crash in bcast.c/blink_remove(), a NULL pointer access.
>
> I corrected this (I believed) and tested it, but it seems like I
> have still introduced some problem here.
>
> Maybe Guo or Ling can say more about this? Pay special
> attention to the flag "blocked" in the bearer, if this gets stuck
> with the wrong value the traffic will never restart.

Daniel and I sprinkled a few printks around and found an error in tipc_forward_buf2nameseq. The main problem was that the result from bcast_port_recv is the message data size, but a non-zero result was being treated as an error. Also, the result was only being checked for local delivery, and prev_destnode was being reset to zero inside the loop, defeating its purpose. Included is a patch that works for us.

One other thing: since tipc_forward_buf2nameseq returns the message data size, tipc_multicast returns the same.
cvs diff -u sendbcast.c
Index: sendbcast.c
===================================================================
RCS file: /cvsroot/tipc/source/unstable/net/tipc/sendbcast.c,v
retrieving revision 1.15
diff -u -r1.15 sendbcast.c
--- sendbcast.c	6 May 2004 15:35:31 -0000	1.15
+++ sendbcast.c	20 May 2004 16:08:03 -0000
@@ -167,18 +167,20 @@
 {
 	struct port *this = (struct port *) ref_deref(ref);
 	uint res = 0;
+	int dsz;
 	struct tipc_msg *m;
 	struct mc_identity *mid = NULL;
 	struct list_head *pos;
 	struct sk_buff *copybuf;
 	tipc_net_addr_t prev_destnode;
 
+	dsz = msg_data_sz(buf_msg(buf));
 	m = &this->publ.phdr;
 	if (importance <= 3)
 		msg_set_importance(m, importance);
 
+	prev_destnode = 0;
 	list_for_each(pos, mc_head) {
-		prev_destnode = 0;
 		mid = list_entry(pos, struct mc_identity, list);
 		if (mid != NULL && (prev_destnode != mid->node)) {
 			prev_destnode = mid->node;
@@ -188,9 +190,9 @@
 			res = tipc_send_buf_fast(copybuf, mid->node);
 		} else {
 			res = bcast_port_recv(copybuf);
-			if (res != 0)
-				break;
 		}
+		if (res != dsz)
+			break;
 	}
 }
 buf_safe_discard(buf);
--
Mark Haverkamp <ma...@os...>
|
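[Editor's note: the corrected loop logic in the patch can be restated as a small self-contained C model. Success is signalled by a result equal to the message data size, the comparison applies to both the local and remote branches, and prev_destnode is initialised once before the loop so consecutive members on the same node get a single copy. All names and the counting stubs below are illustrative, not the real sendbcast.c code.]

```c
#include <assert.h>
#include <stddef.h>

/* Stub transports: on success both return the data size, as the thread
 * notes bcast_port_recv() does. `deliveries` counts the copies handed
 * out, so the one-copy-per-node behaviour can be observed. */
static int deliveries;
static int send_remote(unsigned int node, int dsz) { (void)node; deliveries++; return dsz; }
static int recv_local(int dsz)                     { deliveries++; return dsz; }

enum { LOCAL_NODE = 1 };           /* arbitrary: node 1 is "us" */

struct member { unsigned int node; };

static int multicast_model(const struct member *m, size_t n, int dsz)
{
	unsigned int prev_destnode = 0;   /* initialised ONCE, before the loop */
	int res = dsz;

	for (size_t i = 0; i < n; i++) {
		if (m[i].node == prev_destnode)
			continue;                 /* consecutive members, same node: one copy */
		prev_destnode = m[i].node;
		res = (m[i].node == LOCAL_NODE) ? recv_local(dsz)
		                                : send_remote(m[i].node, dsz);
		if (res != dsz)                   /* success == data size, not zero */
			break;
	}
	return res;    /* like tipc_multicast(): the data size on success */
}
```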
From: Jon M. <jon...@er...> - 2004-05-20 17:01:14
|
Sounds good. Does this mean that the problem from yesterday is solved? If so, it puzzles me that this problem showed up now, after my seemingly unrelated changes, and not earlier.

/jon

Mark Haverkamp wrote:
> On Wed, 2004-05-19 at 16:54, Jon Maloy wrote:
>> The last thing I did before the port_lock changes was to add
>> a device notifier in eth_media, and a "tipc_block_bearer()"
>> function to handle this. I tested this with an "ifconfig eth1
>> down/up" (I run dual links most of the time) during traffic,
>> and this worked fine, but when I thereafter removed the module
>> I got a crash in bcast.c/blink_remove(), a NULL pointer access.
>>
>> I corrected this (I believed) and tested it, but it seems like I
>> have still introduced some problem here.
>>
>> Maybe Guo or Ling can say more about this? Pay special
>> attention to the flag "blocked" in the bearer, if this gets stuck
>> with the wrong value the traffic will never restart.
>
> Daniel and I sprinkled a few printks around and found an error in
> tipc_forward_buf2nameseq. The main problem was that the result from
> bcast_port_recv is the message data size but was being checked for
> non-zero to be an error. Also the result was only being checked for
> the local delivery, and the prev_destnode was being reset to zero
> inside the loop defeating its purpose. Included is a patch that works
> for us. One other thing, since tipc_forward_buf2nameseq returns the
> message data size, that means that tipc_multicast returns the same.
> [patch snipped]
|
From: Daniel M. <da...@os...> - 2004-05-20 17:07:01
|
On Thu, 2004-05-20 at 10:00, Jon Maloy wrote:
> Sounds good. Does this mean that the problem from yesterday
> is solved? If so, it puzzles me that this problem showed
> up now, after my seemingly unrelated changes, and not
> earlier.
>
> /jon

This fixes the multicast problem. Mark is still seeing a problem when running 64 processes.

Daniel
|
From: Jon M. <jon...@er...> - 2004-05-20 17:30:05
|
This is what I meant. I cannot see the relation between my changes and the multicast problem. Anyway, as long as it works I am happy.

/jon

Daniel McNeil wrote:
> On Thu, 2004-05-20 at 10:00, Jon Maloy wrote:
>> Sounds good. Does this mean that the problem from yesterday
>> is solved? If so, it puzzles me that this problem showed
>> up now, after my seemingly unrelated changes, and not
>> earlier.
>>
>> /jon
>
> This fixes the multicast problem. Mark is still seeing a problem when
> running 64 processes.
>
> Daniel
|