From: Jon M. <jon...@er...> - 2004-05-21 22:53:34
Guo,

It was an obvious NULL-pointer error (in blinkset_remove, not blink_remove as I wrote). I fixed this, and everything works fine; it does not seem related to the problem Daniel found. I don't have a dump, but here is the change I made.

< * $Id: bcast.c,v 1.23 2004/05/19 17:47:16 jonmaloy Exp $
< * $Id: bcast.c,v 1.23 2004/05/19 17:47:16 jonmaloy Exp $
---
> * $Id: bcast.c,v 1.22 2004/05/07 22:17:26 jonmaloy Exp $
> * $Id: bcast.c,v 1.22 2004/05/07 22:17:26 jonmaloy Exp $
46,48d45
< * Revision 1.23 2004/05/19 17:47:16 jonmaloy
< * Fixed NULL-pointer bug in blinkset_remove
< *
751,752d747
<         if (linkset[i] == NULL)
<                 continue;

Thanks
/jon

Guo, Min wrote:

The patch is OK for me! You can apply it yourself. For the blink_remove bug, can you reproduce the bug and send us the backtrace log? Another thing: YinHu, who is now an intern at Intel, will contribute to TIPC validation in the future; I hope he can get help from you all!

Thanks
Guo Min

tip...@li... wrote:

On Wed, 2004-05-19 at 16:54, Jon Maloy wrote:

The last thing I did before the port_lock changes was to add a device notifier in eth_media, and a "tipc_block_bearer()" function to handle this. I tested this with an "ifconfig eth1 down/up" (I run dual links most of the time) during traffic, and this worked fine, but when I thereafter removed the module I got a crash in bcast.c/blink_remove() (a NULL pointer access). I corrected this (I believed) and tested it, but it seems I have still introduced some problem here. Maybe Guo or Ling can say more about this? Pay special attention to the flag "blocked" in the bearer: if this gets stuck with the wrong value, the traffic will never restart.

Daniel and I sprinkled a few printks around and found an error in tipc_forward_buf2nameseq. The main problem was that bcast_port_recv returns the message data size, but any non-zero result was being treated as an error.
Also the result was only being checked for the local delivery, and prev_destnode was being reset to zero inside the loop, defeating its purpose. Included is a patch that works for us.

One other thing: since tipc_forward_buf2nameseq returns the message data size, tipc_multicast returns the same.

cvs diff -u sendbcast.c
Index: sendbcast.c
===================================================================
RCS file: /cvsroot/tipc/source/unstable/net/tipc/sendbcast.c,v
retrieving revision 1.15
diff -u -r1.15 sendbcast.c
--- sendbcast.c 6 May 2004 15:35:31 -0000 1.15
+++ sendbcast.c 20 May 2004 16:08:03 -0000
@@ -167,18 +167,20 @@
 {
        struct port *this = (struct port *) ref_deref(ref);
        uint res = 0;
+       int dsz;
        struct tipc_msg *m;
        struct mc_identity *mid = NULL;
        struct list_head *pos;
        struct sk_buff *copybuf;
        tipc_net_addr_t prev_destnode;
+       dsz = msg_data_sz(buf_msg(buf));
        m = &this->publ.phdr;
        if (importance <= 3)
                msg_set_importance(m, importance);
+       prev_destnode = 0;
        list_for_each(pos, mc_head) {
-               prev_destnode = 0;
                mid = list_entry(pos, struct mc_identity, list);
                if (mid != NULL && (prev_destnode != mid->node)) {
                        prev_destnode = mid->node;
@@ -188,9 +190,9 @@
                                res = tipc_send_buf_fast(copybuf, mid->node);
                        } else {
                                res = bcast_port_recv(copybuf);
-                               if (res != 0)
-                                       break;
                        }
+                       if (res != dsz)
+                               break;
                }
        }
        buf_safe_discard(buf) ;
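
For readers following the thread, the two fixes in the patch above can be illustrated outside the kernel with a minimal, stand-alone sketch. Everything in it is hypothetical and only mirrors the pattern: struct dest, send_stub, forward_to_all and the failing node number 99 are made up for illustration and are not TIPC code. It assumes the destination list is sorted so that equal nodes are adjacent, and that node 0 means "no node yet".

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for one multicast destination (not the TIPC struct). */
struct dest {
    unsigned int node;              /* destination node address */
};

/* Counts calls to send_stub so a caller can see how many sends happened. */
static int sends_done;

/* Hypothetical send primitive. Like the functions discussed in the thread,
 * it returns the number of data bytes accepted rather than 0 on success.
 * It returns 0 for node 99 to simulate a failure. */
static int send_stub(unsigned int node, int dsz)
{
    sends_done++;
    return (node == 99) ? 0 : dsz;
}

/* Send dsz bytes once to each distinct node in dests[0..n-1].
 * Returns dsz if every send succeeded, otherwise the failing send's result. */
static int forward_to_all(const struct dest *dests, size_t n, int dsz)
{
    unsigned int prev_node = 0;     /* initialised once, outside the loop */
    int res = dsz;
    size_t i;

    assert(dests != NULL || n == 0);

    for (i = 0; i < n; i++) {
        if (dests[i].node == prev_node)
            continue;               /* this node was already served */
        prev_node = dests[i].node;
        res = send_stub(dests[i].node, dsz);
        if (res != dsz)             /* success means "all bytes sent", not "zero" */
            break;
    }
    return res;
}
```

If prev_node were reset inside the loop, the duplicate check could never match and every entry would be sent to, which is the behaviour the patch removes. Checking the result after every send, rather than only on one branch, is the other half of the change.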