Re: [Tipc-discussion] Re: hang while deleting ports

SourceForge Headquarters 225 Broadway Suite 1600 San Diego, CA 92101 +1 (858) 422-6466

Mark,
I think you can easily verify your hypothesis by replacing 
spin_lock_bh() with a spin_trylock_bh() in tipc_recv_msg(), and
then just discard the packet and continue the loop if it fails.
(If this happens the packet will be retransmitted anyway.)
It is worth a try.

/Jon

Mark Haverkamp wrote:

On Thu, 2004-05-06 at 16:37, Jon Maloy wrote:

I think your analysis is correct, but I don't know where this omission

happens. The problem I am trying to solve may be related,  

- when I run parallel links between two nodes comunication on

one link sometimes seem to stop under heavy load.

With a little luck we are looking for the same bug.

/Jon

Jon,

I was thinking about this today and it occurred to me that spin_lock_bh

doesn't prevent interrupts from happening.  If true, we can get in a

deadlock situation when a CPU has the node lock, an ethernet interrupt

happens causing tipc_recv_msg to get called.  One of the first things

that tipc_recv_msg does is try to get the node lock.  This seems to be a

possible explanation for the spin hang on the node lock.  Does this make

sense to you?  

Mark.

Mark Haverkamp wrote:

Jon,

Daniel and I have been seeing a tipc hang for 3 or 4 weeks when we kill

a running application in a certain order.

While running the tipc benchmark program we can get tipc to hang the

computer by killing the client while it has the 32 processes running.

Although, to get the hang, I have to have tried to run some management

port accesses which are stalled due to congestion.  After doing some

tracing, I have narrowed it down to an exiting process spinning while

trying to get the node lock.  Our assumption is that some other process

hasn't released the lock by accident, although its not obvious where.  I

have included the stack dump from the sysrq P console command.

SysRq : Show Regs

Pid: 2001, comm:       client_tipc_tp

EIP: 0060:[<f8a913d9>] CPU: 0

EIP is at .text.lock.link+0xd7/0x3ce [tipc]

EFLAGS: 00000286    Not tainted  (2.6.6-rc3)

EAX: f7c8ef6c EBX: 00000000 ECX: 01001011 EDX: 00000013

ESI: f7c8eee0 EDI: f359a000 EBP: f359bcf8 DS: 007b ES: 007b

CR0: 8005003b CR2: 080e2ce8 CR3: 0053d000 CR4: 000006d0

Call Trace:

[<c0126a38>] __do_softirq+0xb8/0xc0

[<f8a9818b>] net_route_msg+0x48b/0x4ad [tipc]

[<c015b3a1>] __pte_chain_free+0x81/0x90

[<f8a99e6e>] port_send_proto_msg+0x1ae/0x2d0 [tipc]

[<f8a9af73>] port_abort_peer+0x83/0x90 [tipc]

[<f8a999a1>] tipc_deleteport+0x181/0x2a0 [tipc]

[<f8aa7ae2>] release+0x72/0x130 [tipc]

[<c0378ff9>] sock_release+0x99/0xf0

[<c0379a16>] sock_close+0x36/0x50

[<c016740d>] __fput+0x12d/0x140

[<c0165857>] filp_close+0x57/0x90

[<c0123adc>] put_files_struct+0x7c/0xf0

[<c0124b1c>] do_exit+0x26c/0x600

[<c012cc05>] __dequeue_signal+0xf5/0x1b0

[<c0125057>] do_group_exit+0x107/0x190

[<c012cced>] dequeue_signal+0x2d/0x90

[<c012f14c>] get_signal_to_deliver+0x28c/0x590

[<c0105286>] do_signal+0xb6/0xf0

[<c037a736>] sys_send+0x36/0x40

[<c037af8e>] sys_socketcall+0x12e/0x240

[<c010531b>] do_notify_resume+0x5b/0x5d

[<c010554a>] work_notifysig+0x13/0x15

You can see that the process is trying to exit. I have traced the EIP to

the spin_lock_bh(&node->lock) in link_lock_select from a disassembly of

link.o.

Any ideas on this?

Mark.

-------------------------------------------------------

This SF.Net email is sponsored by Sleepycat Software

Learn developer strategies Cisco, Motorola, Ericsson & Lucent use to
deliver

higher performing products faster, at low TCO.

http://www.sleepycat.com/telcomwpreg.php?From=osdnemail3
<http://www.sleepycat.com/telcomwpreg.php?From=osdnemail3> 

_______________________________________________

TIPC-discussion mailing list

TIP...@li...
<mailto:TIP...@li...> 

https://lists.sourceforge.net/lists/listinfo/tipc-discussion
<https://lists.sourceforge.net/lists/listinfo/tipc-discussion> 

Re: [Tipc-discussion] Re: hang while deleting ports

Cluster wide IPC providing datagram, connection, and bus messaging

Re: [Tipc-discussion] Re: hang while deleting ports