From: Jon M. <jon...@er...> - 2004-04-28 20:01:38
/jon

Mark Haverkamp wrote:
> On Tue, 2004-04-27 at 16:58, Jon Maloy wrote:
>> Hi all,
>> I just uploaded a new file release to SourceForge.
>>
>> Has anybody tested the management interface all the way (Mark?)
>
> I've been trying it out. I'm still having trouble in some
> circumstances. Sometimes, while the tipc benchmark program is
> running, and I try to get information from the management interface,
> I'll get stuck waiting in poll. If I kill the server side while
> stuck, the client side will freeze and requires a reboot. If I kill
> the program waiting on the management interface first and then kill
> the server side program, I have no problems. I haven't gotten too far
> into figuring out what is going on with this one yet, but it sounds
> like a lock problem, especially since it only seems to happen with
> SMP.

This may be one of the problems I solved in tipc-1.3.10: If you kill a
node while there are pending connections to it, the auto-disconnect at
the other end will hit the node lock a second time in the same thread,
in "nodesub_unsubscribe()". I fixed this by making it unnecessary to
unlink the node-subscriber objects within the "handle_node_down()"
upcall; all such subscriptions are one-shot anyway. The solution is not
totally satisfactory, because the "abort_self" call, and consequently
the dispatcher upcall, is called with the node lock set. I found no
better solution right now, and it works.

Do you still have this problem with tipc-1.3.10?

>> Has anybody tried the SOCK_STREAM interface yet?
>>
>> What is the status of the multicast code? (There were some
>> discussions about who should receive the messages last week; has
>> this been followed up with corrections?) The reliable broadcast
>> code?
>>
>> Are there any known bugs/problems that have not been corrected yet?
>
> I have enclosed a couple of small patches that were found while
> debugging access to the management interface while running the tipc
> benchmark program. It looks like a lock is not dropped sometimes in
> nametbl_withdraw. Also, we ran into a situation where a port is
> dropped from the congested list but is still referenced in the chain.
> This was causing us to panic if the referenced port was deleted, and
> then the referencing port was deleted while still congested.

Good. I am pretty sure I have seen that problem, but only once. Let's
hope it is solved now.

> cvs diff -u name_table.c
> Index: name_table.c
> ===================================================================
> RCS file: /cvsroot/tipc/source/unstable/net/tipc/name_table.c,v
> retrieving revision 1.13
> diff -u -r1.13 name_table.c
> --- name_table.c	27 Apr 2004 22:44:43 -0000	1.13
> +++ name_table.c	28 Apr 2004 17:50:36 -0000
> @@ -907,6 +907,7 @@
>  		kfree(publ);
>  		return 1;
>  	}
> +	write_unlock_bh(&nametbl_lock);
>  	return 0;
>  }
>
> cvs diff -u link.c
> Index: link.c
> ===================================================================
> RCS file: /cvsroot/tipc/source/unstable/net/tipc/link.c,v
> retrieving revision 1.18
> diff -u -r1.18 link.c
> --- link.c	27 Apr 2004 22:35:25 -0000	1.18
> +++ link.c	28 Apr 2004 17:51:02 -0000
> @@ -474,6 +474,14 @@
>  		port = next;
>  	}
>  	this->first_waiting_port = port;
> +
> +	/*
> +	 * Make sure that this port isn't pointing at
> +	 * any port just removed from congestion
> +	 */
> +	if (port) {
> +		port->prev_waiting = 0;
> +	}
>  exit:
>  	spin_unlock_bh(&port_lock);
>  }
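
For readers who have not hit the pattern above, here is a minimal
userspace sketch of the self-deadlock and of the one-shot detach that
avoids it. It is illustrative only, not the TIPC source: node_lock,
struct subscription, node_down() and the two handlers are made-up
names, and a pthread mutex stands in for the kernel node lock.

/* deadlock_sketch.c -- illustrative only, not TIPC source.
 * Build: cc deadlock_sketch.c -o deadlock_sketch -lpthread */
#include <pthread.h>
#include <stdio.h>
#include <stdlib.h>

static pthread_mutex_t node_lock = PTHREAD_MUTEX_INITIALIZER;

struct subscription {
	struct subscription *next;
	void (*handle_node_down)(struct subscription *);
};

static struct subscription *subscribers;

/* The broken pattern: an upcall that takes node_lock again in order
 * to unlink itself.  node_down() already holds the lock, so a
 * non-recursive mutex (or a kernel spinlock on SMP) never returns. */
static void broken_handler(struct subscription *s)
{
	pthread_mutex_lock(&node_lock);	/* second acquisition: hangs */
	/* ...unlink s from subscribers... */
	pthread_mutex_unlock(&node_lock);
	free(s);
}

/* The fixed pattern: subscriptions are one-shot, so the handler never
 * needs the lock; the whole list was already detached by node_down(). */
static void fixed_handler(struct subscription *s)
{
	printf("subscription %p fired once\n", (void *)s);
	free(s);
}

static void node_down(void)
{
	struct subscription *s, *next;

	pthread_mutex_lock(&node_lock);
	s = subscribers;
	subscribers = NULL;	/* detach: each entry fires exactly once */
	for (; s; s = next) {
		next = s->next;
		s->handle_node_down(s);	/* upcall still runs with lock held */
	}
	pthread_mutex_unlock(&node_lock);
}

int main(int argc, char **argv)
{
	struct subscription *s = calloc(1, sizeof(*s));

	s->handle_node_down = argc > 1 ? broken_handler : fixed_handler;
	s->next = subscribers;
	subscribers = s;
	node_down();
	return 0;
}

Run with any argument to install broken_handler and watch it hang on
the second lock acquisition; without arguments the detached list is
walked and every subscription fires exactly once, with no re-lock
needed inside the upcall.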
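
The link.c patch can be sketched the same way. Again illustrative
only: struct port is reduced to its chain fields, and
wakeup_waiting_ports() is a made-up stand-in for the real wakeup
routine.

/* congestion_sketch.c -- illustrative only, not TIPC source. */
#include <stdio.h>
#include <stdlib.h>

struct port {
	int id;
	struct port *prev_waiting;
	struct port *next_waiting;
};

static struct port *first_waiting_port;

/* Wake (here: free) the first n ports on the congestion chain. */
static void wakeup_waiting_ports(int n)
{
	struct port *port = first_waiting_port;

	while (port && n--) {
		struct port *next = port->next_waiting;
		free(port);		/* woken port leaves the chain */
		port = next;
	}
	first_waiting_port = port;

	/* The fix from the patch: without this, the new head still
	 * points at a freed port, and deleting the head later would
	 * write through prev_waiting into freed memory. */
	if (port)
		port->prev_waiting = NULL;
}

int main(void)
{
	/* Build a three-port chain: 0 <-> 1 <-> 2. */
	for (int i = 2; i >= 0; i--) {
		struct port *p = calloc(1, sizeof(*p));
		p->id = i;
		p->next_waiting = first_waiting_port;
		if (first_waiting_port)
			first_waiting_port->prev_waiting = p;
		first_waiting_port = p;
	}
	wakeup_waiting_ports(2);	/* ports 0 and 1 leave congestion */
	printf("new head: port %d, prev_waiting=%p\n",
	       first_waiting_port->id,
	       (void *)first_waiting_port->prev_waiting);
	return 0;
}

Without the final assignment, the surviving head keeps a prev_waiting
pointer into freed memory, and unlinking that port on a later delete
writes through it, which is the panic Mark describes above.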