From: Mark H. <ma...@os...> - 2004-06-14 21:54:38
I started testing multicast of large messages, ones that need to be fragmented. I noticed that they weren't being delivered. I put some debug prints in the driver and found that the messages were being sent out OK. The messages were also being received and reassembled. The problem is with net_route_msg. It ends up calling net_route_named_msg, which just throws the message away. I'm not sure if this is the right place to put this, but I added code to net_route_msg (see attached patch) that helps.

--- net.c	9 Jun 2004 23:14:47 -0000	1.12
+++ net.c	14 Jun 2004 21:42:29 -0000
@@ -124,6 +124,7 @@
 #include "reg.h"
 #include "msg.h"
 #include "port.h"
+#include "bcast.h"
 
 /*
  * The TIPC locking policy is designed to ensure a very fine locking
@@ -321,7 +322,9 @@
 	if (msg_isdata(msg)) {
 		if (msg_destport(msg))
 			port_recv_msg(buf);
-		else
+		else if (msg_mcast(msg))
+			bcast_port_recv(buf);
+		else
 			net_route_named_msg(buf);
 		return;
 	}

With this change I generally get the messages on my other nodes. The bad part is that if I send 100 or so messages quickly, the machine panics with a NULL pointer dereference (see attached trace).

Unable to handle kernel NULL pointer dereference at virtual address 00000050
 printing eip:
f8e29009
*pde = 00000000
Oops: 0000 [#1]
SMP
Modules linked in: tipc
CPU:    0
EIP:    0060:[<f8e29009>]    Not tainted
EFLAGS: 00010206   (2.6.7-rc2)
EIP is at link_recv_fragment+0xe9/0x760 [tipc]
eax: 00000044   ebx: 00000001   ecx: 00000000   edx: 5940057c
esi: d9736c7e   edi: d9736c7e   ebp: c050be30   esp: c050bdc8
ds: 007b   es: 007b   ss: 0068
Process swapper (pid: 0, threadinfo=c050a000 task=c046a1c0)
Stack: d9f026bc dd2f1506 00000000 c050be04 00000000 c050be04 d958c01c 00000000
       00000044 000088ca dd2f04e8 f7bc28cc dcfbc804 dd2f0530 00000198 d9f02430
       d9736c7e 00000000 f2ab8e70 dd2f04e4 00000206 00000246 f5f481b0 00000001
Call Trace:
 [<c010614f>] show_stack+0x7f/0xa0
 [<c01062fe>] show_registers+0x15e/0x1c0
 [<c01064aa>] die+0x9a/0x160
 [<c0118946>] do_page_fault+0x2e6/0x5b9
 [<c0105dcd>] error_code+0x2d/0x38
 [<f8e2642e>] tipc_recv_msg+0x57e/0x8d0 [tipc]
 [<f8e45402>] recv_msg+0x42/0x70 [tipc]
 [<c037dca2>] netif_receive_skb+0x172/0x1b0
 [<c037dd64>] process_backlog+0x84/0x120
 [<c037de80>] net_rx_action+0x80/0x120
 [<c0125ff8>] __do_softirq+0xb8/0xc0
 [<c0126035>] do_softirq+0x35/0x40
 [<c0107d45>] do_IRQ+0x175/0x230
 [<c0105cd0>] common_interrupt+0x18/0x20
 [<c0103106>] cpu_idle+0x46/0x50
 [<c050c984>] start_kernel+0x184/0x1d0
 [<c01001e0>] 0xc01001e0

Code: 8b 58 0c 89 d7 8b 4e 10 0f c9 8b 43 08 c1 e9 10 81 e2 ff ff
 <0>Kernel panic: Fatal exception in interrupt
In interrupt handler - not syncing

One other thing. buf_safe_discard exits its while loop on the first busy buffer. Is it intended to not go through the whole list?

Mark.

-- 
Mark Haverkamp <ma...@os...>