From: Jon M. <jon...@er...> - 2004-05-19 14:45:51
Hi Hu,

It does fragment/defragment messages, but only up to a limit, depending on the socket type you are using. If you use message-oriented sockets, i.e. SOCK_SEQPACKET, SOCK_RDM or SOCK_DGRAM, it will only fragment up to MAX_MSG_SIZE, which is indeed 66000 bytes. If you use SOCK_STREAM, it will fragment up to a size of 100 Mbyte per sent "chunk". I must admit that I have still not tested this feature beyond 66k (this is on my TODO list), so you may well run into bugs here. Any feedback on this is welcome.

Regards /Jon

Yin, Hu wrote:
> Hi Jon,
>
> I'm an intern at the Intel China Software Lab, currently doing TIPC validation work here.
>
> Today I ran into a problem. When I wrote a small program to send/receive messages of different sizes between two programs, I got an error (errno 22, "Invalid argument") whenever the message size exceeded 66000 bytes.
>
> Doesn't TIPC fragment and defragment messages?
>
> Best regards,
> Nick Yin
> Intel China Software Lab
>
> The content of this email message solely contains my own personal views, and not those of my employer.
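A minimal user-space sketch of the size probe discussed in this thread, assuming the AF_TIPC socket API as it appears in later mainline kernels (<linux/tipc.h>); the service type/instance values are made up for illustration, and a receiver bound to that name is assumed to exist. Per the thread, the send above 66000 bytes should fail with EINVAL on a message-oriented socket.

```c
#include <stdio.h>
#include <string.h>
#include <stdlib.h>
#include <errno.h>
#include <sys/socket.h>
#include <linux/tipc.h>

int main(void)
{
	int sd = socket(AF_TIPC, SOCK_RDM, 0);   /* connectionless, reliable */
	struct sockaddr_tipc dst;
	size_t sizes[] = { 1000, 66000, 66001 }; /* last one should fail */
	char *buf = calloc(1, 70000);
	size_t i;

	if (sd < 0) { perror("socket"); return 1; }

	memset(&dst, 0, sizeof(dst));
	dst.family = AF_TIPC;
	dst.addrtype = TIPC_ADDR_NAME;
	dst.addr.name.name.type = 18888;         /* example service type */
	dst.addr.name.name.instance = 17;        /* example instance */

	for (i = 0; i < sizeof(sizes) / sizeof(sizes[0]); i++) {
		ssize_t rc = sendto(sd, buf, sizes[i], 0,
				    (struct sockaddr *)&dst, sizeof(dst));
		printf("send %zu bytes: rc=%zd errno=%d\n",
		       sizes[i], rc, rc < 0 ? errno : 0);
	}
	free(buf);
	return 0;
}
```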
From: Mark H. <ma...@os...> - 2004-05-19 14:29:23
On Wed, 2004-05-19 at 04:04, Yin, Hu wrote:
> Hi Jon,
>
> I'm an intern at the Intel China Software Lab, currently doing TIPC validation work here.
>
> Today I ran into a problem. When I wrote a small program to send/receive messages of different sizes between two programs, I got an error (errno 22, "Invalid argument") whenever the message size exceeded 66000 bytes.
>
> Doesn't TIPC fragment and defragment messages?

tipc.h has:

#define TIPC_MAX_MSG_SIZE 66000

Messages are fragmented, but at a smaller size (the max_packet size in the bearer structure), up to a maximum message size. I believe 66000 is the largest size of a single message handled by TIPC.

Mark.
--
Mark Haverkamp <ma...@os...>
From: Jon M. <jon...@er...> - 2004-05-19 13:50:52
I think I have succeeded in making the port/transport-level lock handling somewhat easier to understand, without changing the basic locking policy. New:

1: Callbacks to socket.c (dispatcher/wakeup dispatcher) are now called with, and return with, the port lock held.
2: Symmetry: a port or subscription is now always locked/unlocked in the same function.
3: No more dual-purpose functions in the reference manager; ref_lock_acquire()/ref_unlock_discard() etc. are gone. The port lock is still located in the corresponding reference entry, though, and is accessed via easily comprehensible inline functions: port_lock()/port_unlock() etc.
4: Better comments about what the locks protect and what the lock functions do.

I will continue with similar work on the node locks in a few days, but I would welcome some feedback first.

/Jon
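A hedged sketch of what port_lock()/port_unlock() helpers of the kind described above might look like: the spinlock lives in the reference-table entry rather than in the port itself, and dereferencing only succeeds while the reference is still valid. All structure and field names below are illustrative, not the actual TIPC source.

```c
#include <linux/spinlock.h>
#include <linux/types.h>

struct port {
	u32 ref;                        /* this port's reference number */
	/* ... */
};

struct ref_entry {
	spinlock_t lock;                /* protects the referenced object */
	void *object;                   /* the port, or NULL once discarded */
	u32 reference;                  /* index bits + generation bits */
};

struct ref_entry *ref_entry_of(u32 ref);   /* hypothetical table lookup */

/* Lock the entry and return the port only if the reference is still valid. */
static inline struct port *port_lock(u32 ref)
{
	struct ref_entry *e = ref_entry_of(ref);

	spin_lock_bh(&e->lock);
	if (e->object && e->reference == ref)
		return e->object;
	spin_unlock_bh(&e->lock);       /* stale reference: nothing to return */
	return NULL;
}

static inline void port_unlock(struct port *p)
{
	spin_unlock_bh(&ref_entry_of(p->ref)->lock);
}
```

The point of pairing lock and dereference in one helper is the symmetry mentioned above: whoever obtained the port pointer is also the one who releases the lock, in the same function.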
From: Yin, H. <hu...@in...> - 2004-05-19 11:04:23
Hi Jon,

I'm an intern at the Intel China Software Lab, currently doing TIPC validation work here.

Today I ran into a problem. When I wrote a small program to send/receive messages of different sizes between two programs, I got an error (errno 22, "Invalid argument") whenever the message size exceeded 66000 bytes.

Doesn't TIPC fragment and defragment messages?

Best regards,
Nick Yin
Intel China Software Lab

The content of this email message solely contains my own personal views, and not those of my employer.
From: Jon M. <jon...@er...> - 2004-05-18 17:39:56
Hi Mark, I haven't tried with 64 processes yet, but I will try to reproduce and trouble-shoot this problem when I have time. Right now I am spending some time on making the lock handling at port/socket level more symmetric and easier to follow. It will have some performance implications, but it will not have any major impact, I think. I must admit that I am not familiar with Bugzilla. Is this the base for the bug reporting/tracking system we already have for each project, or is it something else ? The bug report system has only been used sporadically, as you may have noticed, but I have no problems with starting to use it more systematically; - I think indeed we will have to if TIPC makes it into the kernel. /Jon Mark Haverkamp wrote: >I was running a modified tipc benchmark program that used 64 processes >instead of 32. I also had my kernel compiled with page alloc debug >turned on (memory allocations are unmapped when free to catch bad access >as soon as possible). It was part way through the 16K size when the >panic happened. It was on the server side. > >Any thoughts on using the sourceforge bugzilla to keep track of current >bugs? > >Mark. > > >[root@cl019 root]# >Unable to handle kernel paging request at virtual address f1067fe4 > printing eip: >f8a8617e >*pde = 00585067 >*pte = 31067163 >Oops: 0000 [#1] >SMP DEBUG_PAGEALLOC >CPU: 1 >EIP: 0060:[<f8a8617e>] Not tainted >EFLAGS: 00010206 (2.6.6-rc3) >EIP is at tipc_recv_msg+0x15e/0x880 [tipc] >eax: 00000000 ebx: f1067f50 ecx: f104a7f8 edx: f105df50 >esi: efb89e18 edi: f4d8fbf8 ebp: f01cda70 esp: f01cda18 >ds: 007b es: 007b ss: 0068 >Process server_tipc (pid: 1439, threadinfo=f01cc000 task=f01f3a60) >Stack: f632fbf8 00000000 00000086 c1620ce0 00000000 00000086 f01cda3c c01338cb > 00000000 00002ae8 0000abe6 efb89e18 effcbf50 f104a7f8 00000001 00000000 > f01cda90 00000286 c1620ce0 f8aaf400 effcbf50 c054f970 f01cda80 f8aa4319 >Call Trace: > [<c01338cb>] kernel_text_address+0x3b/0x50 > [<f8aa4319>] recv_msg+0x39/0x50 [tipc] > [<c0375e92>] netif_receive_skb+0x172/0x1b0 > [<c0375f54>] process_backlog+0x84/0x120 > [<c0376070>] net_rx_action+0x80/0x120 > [<c0124c38>] __do_softirq+0xb8/0xc0 > [<c0124c75>] do_softirq+0x35/0x40 > [<c0107cf5>] do_IRQ+0x175/0x230 > [<c0375e92>] netif_receive_skb+0x172/0x1b0 > [<c0105ce0>] common_interrupt+0x18/0x20 > [<c0221e39>] __copy_user_zeroing_intel+0x19/0xb0 > [<c0221fc2>] __copy_from_user_ll+0x72/0x80 > [<c0222088>] copy_from_user+0x58/0x80 > [<f8a8530f>] link_send_sections_long+0x30f/0xad0 [tipc] > [<f8a824de>] link_schedule_port+0xfe/0x1b0 [tipc] > [<f8a84d39>] link_send_sections_fast+0x559/0x820 [tipc] > [<c0124c38>] __do_softirq+0xb8/0xc0 > [<f8a96312>] tipc_send+0x92/0x9d0 [tipc] > [<c03f186f>] schedule+0x37f/0x7a0 > [<c0370bd5>] kfree_skbmem+0x25/0x30 > [<f8aa1946>] recv_msg+0x2b6/0x560 [tipc] > [<f8aa12f0>] send_packet+0x90/0x180 [tipc] > [<c011b140>] default_wake_function+0x0/0x20 > [<c036cbde>] sock_sendmsg+0x8e/0xb0 > [<c01193f8>] kernel_map_pages+0x28/0x64 > [<c036c9ba>] sockfd_lookup+0x1a/0x80 > [<c036e101>] sys_sendto+0xe1/0x100 > [<c0128fd2>] del_timer_sync+0x42/0x140 > [<c036d4a9>] sock_poll+0x29/0x30 > [<c01193f8>] kernel_map_pages+0x28/0x64 > [<c036e156>] sys_send+0x36/0x40 > [<c036e9ae>] sys_socketcall+0x12e/0x240 > [<c0105373>] syscall_call+0x7/0xb > >Code: 8b 83 94 00 00 00 48 0f 85 f9 00 00 00 8d 73 44 8b 46 0c 8b > <0>Kernel panic: Fatal exception in interrupt >In interrupt handler - not syncing > > > |
From: Mark H. <ma...@os...> - 2004-05-18 14:23:45
I was running a modified tipc benchmark program that used 64 processes instead of 32. I also had my kernel compiled with page alloc debug turned on (memory allocations are unmapped when free to catch bad access as soon as possible). It was part way through the 16K size when the panic happened. It was on the server side. Any thoughts on using the sourceforge bugzilla to keep track of current bugs? Mark. [root@cl019 root]# Unable to handle kernel paging request at virtual address f1067fe4 printing eip: f8a8617e *pde = 00585067 *pte = 31067163 Oops: 0000 [#1] SMP DEBUG_PAGEALLOC CPU: 1 EIP: 0060:[<f8a8617e>] Not tainted EFLAGS: 00010206 (2.6.6-rc3) EIP is at tipc_recv_msg+0x15e/0x880 [tipc] eax: 00000000 ebx: f1067f50 ecx: f104a7f8 edx: f105df50 esi: efb89e18 edi: f4d8fbf8 ebp: f01cda70 esp: f01cda18 ds: 007b es: 007b ss: 0068 Process server_tipc (pid: 1439, threadinfo=f01cc000 task=f01f3a60) Stack: f632fbf8 00000000 00000086 c1620ce0 00000000 00000086 f01cda3c c01338cb 00000000 00002ae8 0000abe6 efb89e18 effcbf50 f104a7f8 00000001 00000000 f01cda90 00000286 c1620ce0 f8aaf400 effcbf50 c054f970 f01cda80 f8aa4319 Call Trace: [<c01338cb>] kernel_text_address+0x3b/0x50 [<f8aa4319>] recv_msg+0x39/0x50 [tipc] [<c0375e92>] netif_receive_skb+0x172/0x1b0 [<c0375f54>] process_backlog+0x84/0x120 [<c0376070>] net_rx_action+0x80/0x120 [<c0124c38>] __do_softirq+0xb8/0xc0 [<c0124c75>] do_softirq+0x35/0x40 [<c0107cf5>] do_IRQ+0x175/0x230 [<c0375e92>] netif_receive_skb+0x172/0x1b0 [<c0105ce0>] common_interrupt+0x18/0x20 [<c0221e39>] __copy_user_zeroing_intel+0x19/0xb0 [<c0221fc2>] __copy_from_user_ll+0x72/0x80 [<c0222088>] copy_from_user+0x58/0x80 [<f8a8530f>] link_send_sections_long+0x30f/0xad0 [tipc] [<f8a824de>] link_schedule_port+0xfe/0x1b0 [tipc] [<f8a84d39>] link_send_sections_fast+0x559/0x820 [tipc] [<c0124c38>] __do_softirq+0xb8/0xc0 [<f8a96312>] tipc_send+0x92/0x9d0 [tipc] [<c03f186f>] schedule+0x37f/0x7a0 [<c0370bd5>] kfree_skbmem+0x25/0x30 [<f8aa1946>] recv_msg+0x2b6/0x560 [tipc] [<f8aa12f0>] send_packet+0x90/0x180 [tipc] [<c011b140>] default_wake_function+0x0/0x20 [<c036cbde>] sock_sendmsg+0x8e/0xb0 [<c01193f8>] kernel_map_pages+0x28/0x64 [<c036c9ba>] sockfd_lookup+0x1a/0x80 [<c036e101>] sys_sendto+0xe1/0x100 [<c0128fd2>] del_timer_sync+0x42/0x140 [<c036d4a9>] sock_poll+0x29/0x30 [<c01193f8>] kernel_map_pages+0x28/0x64 [<c036e156>] sys_send+0x36/0x40 [<c036e9ae>] sys_socketcall+0x12e/0x240 [<c0105373>] syscall_call+0x7/0xb Code: 8b 83 94 00 00 00 48 0f 85 f9 00 00 00 8d 73 44 8b 46 0c 8b <0>Kernel panic: Fatal exception in interrupt In interrupt handler - not syncing -- Mark Haverkamp <ma...@os...> |
From: Jon M. <jon...@er...> - 2004-05-17 19:53:17
With the following:

1: Notification handler towards net_device. Links will now go down immediately when the carrier on an interface is lost, and they act correctly on ifconfig commands.
2: Changed the lock handling in port.c according to the description in the previous mail. The only remaining overlap between node locks and port locks is in the link_schedule/wakeup functions. I will consider what more I can do with this and with other parts of the lock handling.
3: Added a verbal description (in net.c) of the whole locking policy.

/jon
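A hedged sketch of a netdevice notification handler of the kind mentioned in point 1 above, so bearers/links can be disabled as soon as an interface goes away. It uses the classic 2.6-era notifier signature where the callback receives the struct net_device pointer directly; the handler name and the tipc_* helpers it would call are illustrative only. In mainline kernels NETDEV_DOWN/UP are administrative (ifconfig) events, while carrier transitions are normally reported via NETDEV_CHANGE together with netif_carrier_ok(), which is relevant to the question raised later in this thread.

```c
#include <linux/netdevice.h>
#include <linux/notifier.h>

static int tipc_netdev_event(struct notifier_block *nb,
			     unsigned long evt, void *ptr)
{
	struct net_device *dev = ptr;

	switch (evt) {
	case NETDEV_GOING_DOWN:
	case NETDEV_DOWN:
		/* interface administratively down: hypothetical helper
		 * that blocks/resets all links using this bearer */
		/* tipc_disable_bearer_for(dev); */
		break;
	case NETDEV_UP:
		/* tipc_enable_bearer_for(dev); */
		break;
	case NETDEV_CHANGE:
		/* carrier state may have changed */
		/* if (!netif_carrier_ok(dev)) tipc_reset_links_for(dev); */
		break;
	}
	return NOTIFY_DONE;
}

static struct notifier_block tipc_netdev_nb = {
	.notifier_call = tipc_netdev_event,
};

/* from module init / exit:
 *   register_netdevice_notifier(&tipc_netdev_nb);
 *   unregister_netdevice_notifier(&tipc_netdev_nb);
 */
```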
From: Jon M. <jon...@er...> - 2004-05-13 19:46:45
Just to avoid misunderstandings: I am not working with the changes I suggested. I am struggling with trying to understand the usage of netdevice notifications, one of the issues Stephen H.'s called a showstopper in his review. Btw, does anybody know if NETDEV_DOWN/UP means that the carrier disappeared/reappeared (I hope so), or does it simply mean that somebody did "ifconfig ethX down" ? (I think so). Is there any description of how this is supposed to be used ? /Jon Jon Maloy wrote: Hi Mark, The general principle I had in mind when designing this was to not keep a node-lock and the port-lock at the same time, but in some cases this looks unavoidable, such as in the wakeup-function you corrected yesterday. In other cases I have just not been aware of the potential problem, even when it is easy to fix. The current problem is in fact on of the latter type; as it is completely unecessary to call port_send_protocol_msg() with the port lock on. What we need here is a small redesign to resolve the whole problem: Everywhere port_send_protocol_msg() is called (in port_timeout(), port_recv_protocol_msg() and a few other places) we should 1) build the message (we rename port_send_protocol_msg() to e.g. port_build_protocol_msg() and let it return an sk_buff* ) 2) Update the port's sequence number, when applicable. 3) Release the port lock. 4) Send the buffer through net_route_msg() There is one call where this is difficult, in port_abort_self((), since this itself is called with the lock on, but on the other hand this only sends a message to the own port, not to any link, so no node lock will be taken. This should resolve the deadlock problem between port and node locks once and for all, I hope. The rest is about proper documenation. Regards /Jon Mark Haverkamp wrote: On Wed, 2004-05-12 at 13:25, Mark Haverkamp wrote: On Wed, 2004-05-12 at 12:09, Jon Maloy wrote: Yeah, that's a classical one. I also think your solution is ok; the pending ports will be awakened at next message reception, so no harm done. Thanks /jon I found another one. I got another hang yesterday after the current deadlock fix. I re-added my spinlock debug code and found out that we're getting a deadlock between the node lock and the tipc_port lock. It looks like the port timeout handler is running on one CPU and a recv_msg is running on the other. I suppose as a workaround, we could make all the spin lock access in wakeup be conditional, but that will probably just make the problem just show up somewhere else. There should probably be an analysis of code paths and determine how the locks interact with each other. I have noticed that there is at least one place where three locks are required. This can cause problems like we've seen when different code paths need multiple locks unless there is some sort of back off method to insure no deadlocks. For instance if a function needs both the node lock and tipc_port lock, something like this could be done: - - - - - - again: spinlock(node_lock); spintrylock(tipc_port_lock) if failed to acquire then release node_lock and goto again. - - - - - - - - Unfortunately, if one of the locks was acquired before the function was entered, releasing that lock may allow some state to change that the caller was assuming wouldn't change and that could cause problems too. So, I'm not sure that this method could work in the link_wakeup_ports function. 
The node lock is held on entry and if the tipc_port lock can't be acquired, you probably can't just release the node lock and wait for it again since that could invalidate the tipc_recv_msg functions assumptions about what the node lock was protecting. Anyway, here is the raw trace from the debug: &node->lock lock timeout Call Trace: [<f8a8543a>] link_send+0xda/0x2a0 [tipc] [<f8a92c8e>] net_route_msg+0x41e/0x43d [tipc] [<f8a94962>] port_send_proto_msg+0x1a2/0x2a0 [tipc] [<f8aa1b10>] k_signal+0x60/0x170 [tipc] [<f8a956be>] port_timeout+0xbe/0x200 [tipc] [<f8aa1ccc>] timeout_receiver+0xac/0x110 [tipc] [<f8aa1a9f>] receive_signal+0x14f/0x160 [tipc] [<c011ab51>] scheduler_tick+0x111/0x690 [<c0124ec3>] tasklet_action+0x73/0xc0 [<c0124bc8>] __do_softirq+0xb8/0xc0 [<c0124c05>] do_softirq+0x35/0x40 [<c011526d>] smp_apic_timer_interrupt+0xdd/0x150 [<c0105d62>] apic_timer_interrupt+0x1a/0x20 [<c01119c0>] delay_pit+0x20/0x30 [<c0221694>] __delay+0x14/0x20 [<f8a8685f>] link_send_sections_long+0x50f/0xb30 [tipc] [<c01152d1>] smp_apic_timer_interrupt+0x141/0x150 [<f8a8366f>] link_schedule_port+0x13f/0x1e0 [tipc] [<f8a86095>] link_send_sections_fast+0x5b5/0x870 [tipc] [<f8a97b92>] tipc_send+0x92/0x9d0 [tipc] [<c03f14cf>] schedule+0x37f/0x7a0 [<f8aa3386>] recv_msg+0x2b6/0x560 [tipc] [<f8aa2d30>] send_packet+0x90/0x180 [tipc] [<c011b0d0>] default_wake_function+0x0/0x20 [<c036c83e>] sock_sendmsg+0x8e/0xb0 [<f8a8784e>] tipc_recv_msg+0x47e/0x8c0 [tipc] [<c01435ba>] buffered_rmqueue+0xfa/0x220 [<c036c61a>] sockfd_lookup+0x1a/0x80 [<c036dd61>] sys_sendto+0xe1/0x100 [<c0128f62>] del_timer_sync+0x42/0x140 [<c036d109>] sock_poll+0x29/0x30 [<c017884b>] do_pollfd+0x5b/0xa0 [<c036ddb6>] sys_send+0x36/0x40 [<c036e60e>] sys_socketcall+0x12e/0x240 [<c0105373>] syscall_call+0x7/0xb port->publ lock timeout Call Trace: [<f8a8383a>] link_wakeup_ports+0x12a/0x1d0 [tipc] [<f8a87bce>] tipc_recv_msg+0x7fe/0x8c0 [tipc] [<c014949d>] __kmalloc+0x19d/0x250 [<f8aa5d59>] recv_msg+0x39/0x50 [tipc] [<c0375af2>] netif_receive_skb+0x172/0x1b0 [<c0375bb4>] process_backlog+0x84/0x120 [<c0375cd0>] net_rx_action+0x80/0x120 [<c0124bc8>] __do_softirq+0xb8/0xc0 [<c0124c05>] do_softirq+0x35/0x40 [<c0107ce5>] do_IRQ+0x175/0x230 [<c0105ea5>] nmi_stack_correct+0x1e/0x2e [<c0105ce0>] common_interrupt+0x18/0x20 |
From: Jon M. <jon...@er...> - 2004-05-13 18:10:12
See below. /jon Daniel McNeil wrote: On Thu, 2004-05-13 at 08:10, Mark Haverkamp wrote: On Wed, 2004-05-12 at 13:25, Mark Haverkamp wrote: On Wed, 2004-05-12 at 12:09, Jon Maloy wrote: Yeah, that's a classical one. I also think your solution is ok; the pending ports will be awakened at next message reception, so no harm done. Thanks /jon I found another one. I got another hang yesterday after the current deadlock fix. I re-added my spinlock debug code and found out that we're getting a deadlock between the node lock and the tipc_port lock. It looks like the port timeout handler is running on one CPU and a recv_msg is running on the other. I suppose as a workaround, we could make all the spin lock access in wakeup be conditional, but that will probably just make the problem just show up somewhere else. There should probably be an analysis of code paths and determine how the locks interact with each other. I agree. The locking hierarchy should be documented. Even if this is just comments a source file, it needs to be done. It should also include exactly what the lock is protecting. Agree completely. I wil try to do my part. I have noticed that there is at least one place where three locks are required. This can cause problems like we've seen when different code paths need multiple locks unless there is some sort of back off method to insure no deadlocks. We also need to analyze if the multiple locks are actually giving us any performance and/or parallelism benefits. If we have to take multiple locks in a common path, that might be causing worse performance and more deadlock potential. We could experiment on that. The one-big-lock approach that we had eariler was heavily critisized, and caused a lot of problems, but so far measurements suggest it was mor efficient when running the benchmark between two nodes. When we start to run real systems I am pretty convinced that the finer granularity will pay off, as we will have thousands of ports and dozens of nodes being accessed in parallel. In the previous version we had seven locks/unlocks per message transmitted: Send side: 1x big_lock/unlock when message was created and sent, 2x buffer locks when buffer was allocated and released. Receive side: Ditto + 1x grab/release of big_lock when message is read to user space. I don't think it is possible to make it much better. In the current version we have ten: Send side: 1x net_lock (read) + 1x node lock + 2 x buffer lock. (No tipc_port lock!) Receive side: 1 net_read_lock + 1x node lock + 1x tipc_port_lock + 1x tipc_port_lock when message is read (in "advance_queue()" + 2x buffer lock). I don't think an increase from seven to ten lock accesses adds any significant overhead to the data-phase path, while the advantages are obvious. The real problem is not performance, but potential deadlock situations in other parts of the code. Those can only be solved by as far as possible avoiding to keep the different locks simultaneously, as I suggested in my previous mail. There may be one two cases more I have missed, but if we find any, this is the solution we should look for. Regards /Jon Daniel |
From: Jon M. <jon...@er...> - 2004-05-13 17:27:47
Hi Mark, The general principle I had in mind when designing this was to not keep a node-lock and the port-lock at the same time, but in some cases this looks unavoidable, such as in the wakeup-function you corrected yesterday. In other cases I have just not been aware of the potential problem, even when it is easy to fix. The current problem is in fact on of the latter type; as it is completely unecessary to call port_send_protocol_msg() with the port lock on. What we need here is a small redesign to resolve the whole problem: Everywhere port_send_protocol_msg() is called (in port_timeout(), port_recv_protocol_msg() and a few other places) we should 1) build the message (we rename port_send_protocol_msg() to e.g. port_build_protocol_msg() and let it return an sk_buff* ) 2) Update the port's sequence number, when applicable. 3) Release the port lock. 4) Send the buffer through net_route_msg() There is one call where this is difficult, in port_abort_self((), since this itself is called with the lock on, but on the other hand this only sends a message to the own port, not to any link, so no node lock will be taken. This should resolve the deadlock problem between port and node locks once and for all, I hope. The rest is about proper documenation. Regards /Jon Mark Haverkamp wrote: On Wed, 2004-05-12 at 13:25, Mark Haverkamp wrote: On Wed, 2004-05-12 at 12:09, Jon Maloy wrote: Yeah, that's a classical one. I also think your solution is ok; the pending ports will be awakened at next message reception, so no harm done. Thanks /jon I found another one. I got another hang yesterday after the current deadlock fix. I re-added my spinlock debug code and found out that we're getting a deadlock between the node lock and the tipc_port lock. It looks like the port timeout handler is running on one CPU and a recv_msg is running on the other. I suppose as a workaround, we could make all the spin lock access in wakeup be conditional, but that will probably just make the problem just show up somewhere else. There should probably be an analysis of code paths and determine how the locks interact with each other. I have noticed that there is at least one place where three locks are required. This can cause problems like we've seen when different code paths need multiple locks unless there is some sort of back off method to insure no deadlocks. For instance if a function needs both the node lock and tipc_port lock, something like this could be done: - - - - - - again: spinlock(node_lock); spintrylock(tipc_port_lock) if failed to acquire then release node_lock and goto again. - - - - - - - - Unfortunately, if one of the locks was acquired before the function was entered, releasing that lock may allow some state to change that the caller was assuming wouldn't change and that could cause problems too. So, I'm not sure that this method could work in the link_wakeup_ports function. The node lock is held on entry and if the tipc_port lock can't be acquired, you probably can't just release the node lock and wait for it again since that could invalidate the tipc_recv_msg functions assumptions about what the node lock was protecting. 
Anyway, here is the raw trace from the debug: &node->lock lock timeout Call Trace: [<f8a8543a>] link_send+0xda/0x2a0 [tipc] [<f8a92c8e>] net_route_msg+0x41e/0x43d [tipc] [<f8a94962>] port_send_proto_msg+0x1a2/0x2a0 [tipc] [<f8aa1b10>] k_signal+0x60/0x170 [tipc] [<f8a956be>] port_timeout+0xbe/0x200 [tipc] [<f8aa1ccc>] timeout_receiver+0xac/0x110 [tipc] [<f8aa1a9f>] receive_signal+0x14f/0x160 [tipc] [<c011ab51>] scheduler_tick+0x111/0x690 [<c0124ec3>] tasklet_action+0x73/0xc0 [<c0124bc8>] __do_softirq+0xb8/0xc0 [<c0124c05>] do_softirq+0x35/0x40 [<c011526d>] smp_apic_timer_interrupt+0xdd/0x150 [<c0105d62>] apic_timer_interrupt+0x1a/0x20 [<c01119c0>] delay_pit+0x20/0x30 [<c0221694>] __delay+0x14/0x20 [<f8a8685f>] link_send_sections_long+0x50f/0xb30 [tipc] [<c01152d1>] smp_apic_timer_interrupt+0x141/0x150 [<f8a8366f>] link_schedule_port+0x13f/0x1e0 [tipc] [<f8a86095>] link_send_sections_fast+0x5b5/0x870 [tipc] [<f8a97b92>] tipc_send+0x92/0x9d0 [tipc] [<c03f14cf>] schedule+0x37f/0x7a0 [<f8aa3386>] recv_msg+0x2b6/0x560 [tipc] [<f8aa2d30>] send_packet+0x90/0x180 [tipc] [<c011b0d0>] default_wake_function+0x0/0x20 [<c036c83e>] sock_sendmsg+0x8e/0xb0 [<f8a8784e>] tipc_recv_msg+0x47e/0x8c0 [tipc] [<c01435ba>] buffered_rmqueue+0xfa/0x220 [<c036c61a>] sockfd_lookup+0x1a/0x80 [<c036dd61>] sys_sendto+0xe1/0x100 [<c0128f62>] del_timer_sync+0x42/0x140 [<c036d109>] sock_poll+0x29/0x30 [<c017884b>] do_pollfd+0x5b/0xa0 [<c036ddb6>] sys_send+0x36/0x40 [<c036e60e>] sys_socketcall+0x12e/0x240 [<c0105373>] syscall_call+0x7/0xb port->publ lock timeout Call Trace: [<f8a8383a>] link_wakeup_ports+0x12a/0x1d0 [tipc] [<f8a87bce>] tipc_recv_msg+0x7fe/0x8c0 [tipc] [<c014949d>] __kmalloc+0x19d/0x250 [<f8aa5d59>] recv_msg+0x39/0x50 [tipc] [<c0375af2>] netif_receive_skb+0x172/0x1b0 [<c0375bb4>] process_backlog+0x84/0x120 [<c0375cd0>] net_rx_action+0x80/0x120 [<c0124bc8>] __do_softirq+0xb8/0xc0 [<c0124c05>] do_softirq+0x35/0x40 [<c0107ce5>] do_IRQ+0x175/0x230 [<c0105ea5>] nmi_stack_correct+0x1e/0x2e [<c0105ce0>] common_interrupt+0x18/0x20 |
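A hedged sketch of the restructuring Jon proposes in the message above: build the protocol message while the port lock is held, update the port state, drop the lock, and only then route the buffer, so a node lock is never taken with a port lock still held. The function names follow the thread; the bodies, the declarations and the message-type argument are illustrative, not the actual TIPC code.

```c
#include <linux/skbuff.h>
#include <linux/types.h>

struct port;                                        /* as in the thread */
struct port *port_lock(u32 ref);                    /* lock + deref by ref */
void port_unlock(struct port *p);
struct sk_buff *port_build_proto_msg(struct port *p, u32 type); /* renamed */
void net_route_msg(struct sk_buff *buf);

static void port_timeout(unsigned long ref)
{
	struct port *p = port_lock(ref);
	struct sk_buff *buf;

	if (!p)
		return;

	/* 1) build the message while the port lock is held */
	buf = port_build_proto_msg(p, 0 /* e.g. a connection probe */);

	/* 2) update per-port state (sequence number etc.) under the lock */

	/* 3) release the port lock ... */
	port_unlock(p);

	/* 4) ... and only then route the buffer, which may take node locks */
	if (buf)
		net_route_msg(buf);
}
```

The design choice is simply to keep the port lock's critical section limited to port state, so the port-lock/node-lock ordering question never arises on this path.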
From: Daniel M. <da...@os...> - 2004-05-13 16:12:44
On Thu, 2004-05-13 at 08:10, Mark Haverkamp wrote: > On Wed, 2004-05-12 at 13:25, Mark Haverkamp wrote: > > On Wed, 2004-05-12 at 12:09, Jon Maloy wrote: > > > Yeah, that's a classical one. I also think your solution is ok; the > > > pending ports will be awakened at next message reception, so > > > no harm done. > > > > > > Thanks /jon > > > > > I found another one. I got another hang yesterday after the current > deadlock fix. I re-added my spinlock debug code and found out that > we're getting a deadlock between the node lock and the tipc_port lock. > It looks like the port timeout handler is running on one CPU and a > recv_msg is running on the other. I suppose as a workaround, we could > make all the spin lock access in wakeup be conditional, but that will > probably just make the problem just show up somewhere else. There > should probably be an analysis of code paths and determine how the locks > interact with each other. I agree. The locking hierarchy should be documented. Even if this is just comments a source file, it needs to be done. It should also include exactly what the lock is protecting. > I have noticed that there is at least one > place where three locks are required. This can cause problems like > we've seen when different code paths need multiple locks unless there is > some sort of back off method to insure no deadlocks. > We also need to analyze if the multiple locks are actually giving us any performance and/or parallelism benefits. If we have to take multiple locks in a common path, that might be causing worse performance and more deadlock potential. Daniel |
From: Mark H. <ma...@os...> - 2004-05-13 15:11:09
On Wed, 2004-05-12 at 13:25, Mark Haverkamp wrote: > On Wed, 2004-05-12 at 12:09, Jon Maloy wrote: > > Yeah, that's a classical one. I also think your solution is ok; the > > pending ports will be awakened at next message reception, so > > no harm done. > > > > Thanks /jon > > I found another one. I got another hang yesterday after the current deadlock fix. I re-added my spinlock debug code and found out that we're getting a deadlock between the node lock and the tipc_port lock. It looks like the port timeout handler is running on one CPU and a recv_msg is running on the other. I suppose as a workaround, we could make all the spin lock access in wakeup be conditional, but that will probably just make the problem just show up somewhere else. There should probably be an analysis of code paths and determine how the locks interact with each other. I have noticed that there is at least one place where three locks are required. This can cause problems like we've seen when different code paths need multiple locks unless there is some sort of back off method to insure no deadlocks. For instance if a function needs both the node lock and tipc_port lock, something like this could be done: - - - - - - again: spinlock(node_lock); spintrylock(tipc_port_lock) if failed to acquire then release node_lock and goto again. - - - - - - - - Unfortunately, if one of the locks was acquired before the function was entered, releasing that lock may allow some state to change that the caller was assuming wouldn't change and that could cause problems too. So, I'm not sure that this method could work in the link_wakeup_ports function. The node lock is held on entry and if the tipc_port lock can't be acquired, you probably can't just release the node lock and wait for it again since that could invalidate the tipc_recv_msg functions assumptions about what the node lock was protecting. 
Anyway, here is the raw trace from the debug: &node->lock lock timeout Call Trace: [<f8a8543a>] link_send+0xda/0x2a0 [tipc] [<f8a92c8e>] net_route_msg+0x41e/0x43d [tipc] [<f8a94962>] port_send_proto_msg+0x1a2/0x2a0 [tipc] [<f8aa1b10>] k_signal+0x60/0x170 [tipc] [<f8a956be>] port_timeout+0xbe/0x200 [tipc] [<f8aa1ccc>] timeout_receiver+0xac/0x110 [tipc] [<f8aa1a9f>] receive_signal+0x14f/0x160 [tipc] [<c011ab51>] scheduler_tick+0x111/0x690 [<c0124ec3>] tasklet_action+0x73/0xc0 [<c0124bc8>] __do_softirq+0xb8/0xc0 [<c0124c05>] do_softirq+0x35/0x40 [<c011526d>] smp_apic_timer_interrupt+0xdd/0x150 [<c0105d62>] apic_timer_interrupt+0x1a/0x20 [<c01119c0>] delay_pit+0x20/0x30 [<c0221694>] __delay+0x14/0x20 [<f8a8685f>] link_send_sections_long+0x50f/0xb30 [tipc] [<c01152d1>] smp_apic_timer_interrupt+0x141/0x150 [<f8a8366f>] link_schedule_port+0x13f/0x1e0 [tipc] [<f8a86095>] link_send_sections_fast+0x5b5/0x870 [tipc] [<f8a97b92>] tipc_send+0x92/0x9d0 [tipc] [<c03f14cf>] schedule+0x37f/0x7a0 [<f8aa3386>] recv_msg+0x2b6/0x560 [tipc] [<f8aa2d30>] send_packet+0x90/0x180 [tipc] [<c011b0d0>] default_wake_function+0x0/0x20 [<c036c83e>] sock_sendmsg+0x8e/0xb0 [<f8a8784e>] tipc_recv_msg+0x47e/0x8c0 [tipc] [<c01435ba>] buffered_rmqueue+0xfa/0x220 [<c036c61a>] sockfd_lookup+0x1a/0x80 [<c036dd61>] sys_sendto+0xe1/0x100 [<c0128f62>] del_timer_sync+0x42/0x140 [<c036d109>] sock_poll+0x29/0x30 [<c017884b>] do_pollfd+0x5b/0xa0 [<c036ddb6>] sys_send+0x36/0x40 [<c036e60e>] sys_socketcall+0x12e/0x240 [<c0105373>] syscall_call+0x7/0xb port->publ lock timeout Call Trace: [<f8a8383a>] link_wakeup_ports+0x12a/0x1d0 [tipc] [<f8a87bce>] tipc_recv_msg+0x7fe/0x8c0 [tipc] [<c014949d>] __kmalloc+0x19d/0x250 [<f8aa5d59>] recv_msg+0x39/0x50 [tipc] [<c0375af2>] netif_receive_skb+0x172/0x1b0 [<c0375bb4>] process_backlog+0x84/0x120 [<c0375cd0>] net_rx_action+0x80/0x120 [<c0124bc8>] __do_softirq+0xb8/0xc0 [<c0124c05>] do_softirq+0x35/0x40 [<c0107ce5>] do_IRQ+0x175/0x230 [<c0105ea5>] nmi_stack_correct+0x1e/0x2e [<c0105ce0>] common_interrupt+0x18/0x20 -- Mark Haverkamp <ma...@os...> |
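A minimal sketch of the back-off pattern Mark outlines above: take the node lock, try the port lock, and if that fails drop the node lock and start over, so two CPUs acquiring the locks in opposite orders cannot deadlock. The lock names mirror the thread; the surrounding function is illustrative.

```c
#include <linux/spinlock.h>

static void with_node_and_port_locks(spinlock_t *node_lock,
				     spinlock_t *port_lock)
{
again:
	spin_lock_bh(node_lock);
	if (!spin_trylock_bh(port_lock)) {
		/* someone holds the port lock and may be waiting for the
		 * node lock; back off completely and retry */
		spin_unlock_bh(node_lock);
		cpu_relax();
		goto again;
	}

	/* ... critical section needing both locks ... */

	spin_unlock_bh(port_lock);
	spin_unlock_bh(node_lock);
}
```

As Mark notes, this only works when the caller does not already hold one of the locks on entry; releasing a lock that protects the caller's assumptions just moves the problem, which is why it does not fit link_wakeup_ports() as-is.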
From: Jon M. <jon...@er...> - 2004-05-12 21:11:44
PS: There were more comments further down in my response, just in case you maye have missed them. /jon Mark Haverkamp wrote: On Wed, 2004-05-12 at 12:09, Jon Maloy wrote: Yeah, that's a classical one. I also think your solution is ok; the pending ports will be awakened at next message reception, so no harm done. Thanks /jon OK, here is what I plan to check in. Take a look. I fixed a few other things that seemed wrong. See comments by each diff. Mark. p.s. here is another problem I've been seeing: If I do management port access during congestion, the I/O never completes even when the congestion is over. ---------------------------------------- cvs diff -u media.c reg.c link.c -------------------------------------------------- I think that this should be an unlock since it is locked a few lines above. Index: media.c =================================================================== RCS file: /cvsroot/tipc/source/unstable/net/tipc/media.c,v retrieving revision 1.13 diff -u -r1.13 media.c --- media.c 6 May 2004 15:35:31 -0000 1.13 +++ media.c 12 May 2004 20:10:30 -0000 @@ -221,7 +221,7 @@ bearer_schedule_unlocked(this, link); res = 0; } - spin_lock_bh(&this->publ.lock); + spin_unlock_bh(&this->publ.lock); return res; } --------------------------------------------------- Daniel found this one. ref_lock_deref can grab an invalid object pointer if the lock is released too soon. This waits until the ref number is changed so it won't match in ref_lock_deref after it gets the lock. Index: reg.c =================================================================== RCS file: /cvsroot/tipc/source/unstable/net/tipc/reg.c,v retrieving revision 1.8 diff -u -r1.8 reg.c --- reg.c 5 May 2004 18:39:37 -0000 1.8 +++ reg.c 12 May 2004 20:10:31 -0000 @@ -167,7 +167,6 @@ assert(entry->object != 0); assert(entry->data.reference == ref_nb); entry->object = 0; - spin_unlock_bh(&entry->lock); if (ref_table.first_free == 0) ref_table.first_free = index; else @@ -175,6 +174,7 @@ ref_table.entries[ref_table.last_free].data.next_plus_upper |= index; ref_table.last_free = index; entry->data.next_plus_upper = (ref_nb & ~index_mask) + index_mask + 1; + spin_unlock_bh(&entry->lock); write_unlock_bh(&ref_lock); } ------------------------------------------------ link_schedule_port took the global port_lock/port lock in a different order than everywhere else. Added the spin_trylock_bh in link_wakeup_ports. Added an unlock in link_recv_fragment on an error exit. Fixed a couple places where cheking the this pointer looked wrong. 
Index: link.c =================================================================== RCS file: /cvsroot/tipc/source/unstable/net/tipc/link.c,v retrieving revision 1.24 diff -u -r1.24 link.c --- link.c 7 May 2004 23:16:03 -0000 1.24 +++ link.c 12 May 2004 20:10:32 -0000 @@ -440,10 +440,14 @@ int link_schedule_port(struct link *this, uint origport,uint sz) { - struct port *port = port_lock_deref(origport); - if (!port) - return TIPC_CONGESTION; + struct port *port; + spin_lock_bh(&port_lock); + port = port_lock_deref(origport); + if (!port) { + spin_unlock_bh(&port_lock); + return TIPC_CONGESTION; + } if (!port->wakeup) goto exit; if (!list_empty(&port->wait_list)) @@ -453,8 +457,8 @@ port->waiting_pkts = 1 + sz/link_max_pkt(this); list_add_tail(&port->wait_list, &this->waiting_ports); exit: - spin_unlock_bh(&port_lock); spin_unlock_bh(port->publ.lock); + spin_unlock_bh(&port_lock); return TIPC_CONGESTION; } @@ -467,7 +471,8 @@ win = 100000; if (win <= 0) return; - spin_lock_bh(&port_lock); + if (!spin_trylock_bh(&port_lock)) + return; if (link_congested(this)) goto exit; list_for_each_entry_safe(port, tp, &this->waiting_ports, wait_list) { @@ -2365,6 +2370,7 @@ if (msg_size(imsg) > TIPC_MAX_MSG_SIZE + LONG_H_SIZE){ msg_print(CONS,fragm,"<REC<Oversized: "); buf_discard(fbuf); + spin_unlock_bh(&this->owner->lock); return; } buf = buf_acquire(msg_size(imsg)); @@ -2623,7 +2629,7 @@ struct link *this; read_lock_bh(&net_lock); this = link_find_by_name(name); - if (!this){ + if (this){ this->exp_msg_count = LINK_BLOCKED; spin_unlock_bh(&this->owner->lock); res = TIPC_OK; @@ -2640,7 +2646,7 @@ struct link *this; read_lock_bh(&net_lock); this = link_find_by_name(name); - if (!this){ + if (this){ this->exp_msg_count = 0; spin_unlock_bh(&this->owner->lock); res = TIPC_OK; |
From: Jon M. <jon...@er...> - 2004-05-12 21:09:49
Hi, All your corrections look ok. The (!this) bug was a little embarassing... /jon Mark Haverkamp wrote: On Wed, 2004-05-12 at 12:09, Jon Maloy wrote: Yeah, that's a classical one. I also think your solution is ok; the pending ports will be awakened at next message reception, so no harm done. Thanks /jon OK, here is what I plan to check in. Take a look. I fixed a few other things that seemed wrong. See comments by each diff. Mark. p.s. here is another problem I've been seeing: If I do management port access during congestion, the I/O never completes even when the congestion is over. This may be related to message priorities. When the internal manager tries to respond to a message during congestion he can not be blocked, as we can do with sockets/processes, so I set it's priority to TIPC_NON_REJECTABLE. Is it possible that tipc_send2name() etc still returns TIPC_CONGESTION in such cases ? This should not happen, but if the importance priority by some reason does not make it into the message, or is not handled properly along the data path, who knows? Hint: Check the return values of tipc_send2XX() in mng_respond(), that may give a clue. ---------------------------------------- cvs diff -u media.c reg.c link.c -------------------------------------------------- I think that this should be an unlock since it is locked a few lines above. Index: media.c =================================================================== RCS file: /cvsroot/tipc/source/unstable/net/tipc/media.c,v retrieving revision 1.13 diff -u -r1.13 media.c --- media.c 6 May 2004 15:35:31 -0000 1.13 +++ media.c 12 May 2004 20:10:30 -0000 @@ -221,7 +221,7 @@ bearer_schedule_unlocked(this, link); res = 0; } - spin_lock_bh(&this->publ.lock); + spin_unlock_bh(&this->publ.lock); return res; } Definitely a bug. --------------------------------------------------- Daniel found this one. ref_lock_deref can grab an invalid object pointer if the lock is released too soon. This waits until the ref number is changed so it won't match in ref_lock_deref after it gets the lock. Index: reg.c =================================================================== RCS file: /cvsroot/tipc/source/unstable/net/tipc/reg.c,v retrieving revision 1.8 diff -u -r1.8 reg.c --- reg.c 5 May 2004 18:39:37 -0000 1.8 +++ reg.c 12 May 2004 20:10:31 -0000 @@ -167,7 +167,6 @@ assert(entry->object != 0); assert(entry->data.reference == ref_nb); entry->object = 0; - spin_unlock_bh(&entry->lock); if (ref_table.first_free == 0) ref_table.first_free = index; else @@ -175,6 +174,7 @@ ref_table.entries[ref_table.last_free].data.next_plus_upper |= index; ref_table.last_free = index; entry->data.next_plus_upper = (ref_nb & ~index_mask) + index_mask + 1; + spin_unlock_bh(&entry->lock); write_unlock_bh(&ref_lock); } Also a bug. ------------------------------------------------ link_schedule_port took the global port_lock/port lock in a different order than everywhere else. Added the spin_trylock_bh in link_wakeup_ports. Added an unlock in link_recv_fragment on an error exit. Fixed a couple places where cheking the this pointer looked wrong. 
Index: link.c =================================================================== RCS file: /cvsroot/tipc/source/unstable/net/tipc/link.c,v retrieving revision 1.24 diff -u -r1.24 link.c --- link.c 7 May 2004 23:16:03 -0000 1.24 +++ link.c 12 May 2004 20:10:32 -0000 @@ -440,10 +440,14 @@ int link_schedule_port(struct link *this, uint origport,uint sz) { - struct port *port = port_lock_deref(origport); - if (!port) - return TIPC_CONGESTION; + struct port *port; + spin_lock_bh(&port_lock); + port = port_lock_deref(origport); + if (!port) { + spin_unlock_bh(&port_lock); + return TIPC_CONGESTION; + } if (!port->wakeup) goto exit; if (!list_empty(&port->wait_list)) @@ -453,8 +457,8 @@ port->waiting_pkts = 1 + sz/link_max_pkt(this); list_add_tail(&port->wait_list, &this->waiting_ports); exit: - spin_unlock_bh(&port_lock); spin_unlock_bh(port->publ.lock); + spin_unlock_bh(&port_lock); return TIPC_CONGESTION; } @@ -467,7 +471,8 @@ win = 100000; if (win <= 0) return; - spin_lock_bh(&port_lock); + if (!spin_trylock_bh(&port_lock)) + return; if (link_congested(this)) goto exit; list_for_each_entry_safe(port, tp, &this->waiting_ports, wait_list) { @@ -2365,6 +2370,7 @@ if (msg_size(imsg) > TIPC_MAX_MSG_SIZE + LONG_H_SIZE){ msg_print(CONS,fragm,"<REC<Oversized: "); buf_discard(fbuf); + spin_unlock_bh(&this->owner->lock); return; } buf = buf_acquire(msg_size(imsg)); @@ -2623,7 +2629,7 @@ struct link *this; read_lock_bh(&net_lock); this = link_find_by_name(name); - if (!this){ + if (this){ this->exp_msg_count = LINK_BLOCKED; spin_unlock_bh(&this->owner->lock); res = TIPC_OK; @@ -2640,7 +2646,7 @@ struct link *this; read_lock_bh(&net_lock); this = link_find_by_name(name); - if (!this){ + if (this){ this->exp_msg_count = 0; spin_unlock_bh(&this->owner->lock); res = TIPC_OK; |
From: Mark H. <ma...@os...> - 2004-05-12 20:26:01
On Wed, 2004-05-12 at 12:09, Jon Maloy wrote: > Yeah, that's a classical one. I also think your solution is ok; the > pending ports will be awakened at next message reception, so > no harm done. > > Thanks /jon > > OK, here is what I plan to check in. Take a look. I fixed a few other things that seemed wrong. See comments by each diff. Mark. p.s. here is another problem I've been seeing: If I do management port access during congestion, the I/O never completes even when the congestion is over. ---------------------------------------- cvs diff -u media.c reg.c link.c -------------------------------------------------- I think that this should be an unlock since it is locked a few lines above. Index: media.c =================================================================== RCS file: /cvsroot/tipc/source/unstable/net/tipc/media.c,v retrieving revision 1.13 diff -u -r1.13 media.c --- media.c 6 May 2004 15:35:31 -0000 1.13 +++ media.c 12 May 2004 20:10:30 -0000 @@ -221,7 +221,7 @@ bearer_schedule_unlocked(this, link); res = 0; } - spin_lock_bh(&this->publ.lock); + spin_unlock_bh(&this->publ.lock); return res; } --------------------------------------------------- Daniel found this one. ref_lock_deref can grab an invalid object pointer if the lock is released too soon. This waits until the ref number is changed so it won't match in ref_lock_deref after it gets the lock. Index: reg.c =================================================================== RCS file: /cvsroot/tipc/source/unstable/net/tipc/reg.c,v retrieving revision 1.8 diff -u -r1.8 reg.c --- reg.c 5 May 2004 18:39:37 -0000 1.8 +++ reg.c 12 May 2004 20:10:31 -0000 @@ -167,7 +167,6 @@ assert(entry->object != 0); assert(entry->data.reference == ref_nb); entry->object = 0; - spin_unlock_bh(&entry->lock); if (ref_table.first_free == 0) ref_table.first_free = index; else @@ -175,6 +174,7 @@ ref_table.entries[ref_table.last_free].data.next_plus_upper |= index; ref_table.last_free = index; entry->data.next_plus_upper = (ref_nb & ~index_mask) + index_mask + 1; + spin_unlock_bh(&entry->lock); write_unlock_bh(&ref_lock); } ------------------------------------------------ link_schedule_port took the global port_lock/port lock in a different order than everywhere else. Added the spin_trylock_bh in link_wakeup_ports. Added an unlock in link_recv_fragment on an error exit. Fixed a couple places where cheking the this pointer looked wrong. 
Index: link.c =================================================================== RCS file: /cvsroot/tipc/source/unstable/net/tipc/link.c,v retrieving revision 1.24 diff -u -r1.24 link.c --- link.c 7 May 2004 23:16:03 -0000 1.24 +++ link.c 12 May 2004 20:10:32 -0000 @@ -440,10 +440,14 @@ int link_schedule_port(struct link *this, uint origport,uint sz) { - struct port *port = port_lock_deref(origport); - if (!port) - return TIPC_CONGESTION; + struct port *port; + spin_lock_bh(&port_lock); + port = port_lock_deref(origport); + if (!port) { + spin_unlock_bh(&port_lock); + return TIPC_CONGESTION; + } if (!port->wakeup) goto exit; if (!list_empty(&port->wait_list)) @@ -453,8 +457,8 @@ port->waiting_pkts = 1 + sz/link_max_pkt(this); list_add_tail(&port->wait_list, &this->waiting_ports); exit: - spin_unlock_bh(&port_lock); spin_unlock_bh(port->publ.lock); + spin_unlock_bh(&port_lock); return TIPC_CONGESTION; } @@ -467,7 +471,8 @@ win = 100000; if (win <= 0) return; - spin_lock_bh(&port_lock); + if (!spin_trylock_bh(&port_lock)) + return; if (link_congested(this)) goto exit; list_for_each_entry_safe(port, tp, &this->waiting_ports, wait_list) { @@ -2365,6 +2370,7 @@ if (msg_size(imsg) > TIPC_MAX_MSG_SIZE + LONG_H_SIZE){ msg_print(CONS,fragm,"<REC<Oversized: "); buf_discard(fbuf); + spin_unlock_bh(&this->owner->lock); return; } buf = buf_acquire(msg_size(imsg)); @@ -2623,7 +2629,7 @@ struct link *this; read_lock_bh(&net_lock); this = link_find_by_name(name); - if (!this){ + if (this){ this->exp_msg_count = LINK_BLOCKED; spin_unlock_bh(&this->owner->lock); res = TIPC_OK; @@ -2640,7 +2646,7 @@ struct link *this; read_lock_bh(&net_lock); this = link_find_by_name(name); - if (!this){ + if (this){ this->exp_msg_count = 0; spin_unlock_bh(&this->owner->lock); res = TIPC_OK; -- Mark Haverkamp <ma...@os...> |
From: Jon M. <jon...@er...> - 2004-05-12 19:10:00
Yeah, that's a classical one. I also think your solution is ok; the pending ports will be awakened at next message reception, so no harm done. Thanks /jon Mark Haverkamp wrote: >Here is what looks to be happening with the spin lock deadlock. I >replaced all the spin_lock_bh calls with a wrapper that tries to get the >lock for a while then prints out a debug message if it can't get the >lock. > >As an experiment, I changed the spin_lock_bh in link_wakeup_ports to a >trylock and exited if it couldn't get the lock. I am now not able to >get the deadlock. > > >CPU 0: >release -- > tipc_delete_port (get port lock) -- > port_abort_peer -- > port_send_proto_msg -- > net_route_msg -- > link_send (get node lock) -- (hung spinning) > >CPU 1: >common_interrupt -- > do_softirq -- > net_rx_action -- > netif_receive_skb -- > recv_msg (tipc eth) -- > tipc_recv_msg (get node lock) -- > link_wakeup_ports (get port lock) -- (hung spinning) > >Stack dumps: > >port lock timeout >Call Trace: > [<f8a837ab>] link_wakeup_ports+0x9b/0x230 [tipc] > [<f8a87c2e>] tipc_recv_msg+0x7fe/0x8c0 [tipc] > [<c014949d>] __kmalloc+0x19d/0x250 > [<f8aa5db9>] recv_msg+0x39/0x50 [tipc] > [<c0375af2>] netif_receive_skb+0x172/0x1b0 > [<c0375bb4>] process_backlog+0x84/0x120 > [<c0375cd0>] net_rx_action+0x80/0x120 > [<c0124bc8>] __do_softirq+0xb8/0xc0 > [<c0124c05>] do_softirq+0x35/0x40 > [<c0107ce5>] do_IRQ+0x175/0x230 > [<c0105ce0>] common_interrupt+0x18/0x20 > [<c0221c91>] copy_from_user+0x1/0x80 > [<f8a866bf>] link_send_sections_long+0x30f/0xb30 [tipc] > [<c0221694>] __delay+0x14/0x20 > [<f8a8366f>] link_schedule_port+0x13f/0x1e0 [tipc] > [<f8a860f5>] link_send_sections_fast+0x5b5/0x870 [tipc] > [<c011b12a>] __wake_up_common+0x3a/0x60 > [<f8a97bf2>] tipc_send+0x92/0x9d0 [tipc] > [<c011d736>] __mmdrop+0x36/0x50 > [<c03f15b7>] schedule+0x467/0x7a0 > [<f8aa33e6>] recv_msg+0x2b6/0x560 [tipc] > [<f8aa2d90>] send_packet+0x90/0x180 [tipc] > [<c011b0d0>] default_wake_function+0x0/0x20 > [<c036c83e>] sock_sendmsg+0x8e/0xb0 > [<f8aa5db9>] recv_msg+0x39/0x50 [tipc] > [<c01435ba>] buffered_rmqueue+0xfa/0x220 > [<c036c61a>] sockfd_lookup+0x1a/0x80 > [<c036dd61>] sys_sendto+0xe1/0x100 > [<c0128f62>] del_timer_sync+0x42/0x140 > [<c036d109>] sock_poll+0x29/0x30 > [<c017884b>] do_pollfd+0x5b/0xa0 > [<c036ddb6>] sys_send+0x36/0x40 > [<c036e60e>] sys_socketcall+0x12e/0x240 > [<c0105373>] syscall_call+0x7/0xb > >&node->lock lock timeout >Call Trace: > [<f8a8549a>] link_send+0xda/0x2a0 [tipc] > [<f8a92cee>] net_route_msg+0x41e/0x43d [tipc] > [<f8a949c2>] port_send_proto_msg+0x1a2/0x2a0 [tipc] > [<f8a95983>] port_abort_peer+0x83/0x90 [tipc] > [<f8a9458f>] tipc_deleteport+0x19f/0x280 [tipc] > [<f8aa25b2>] release+0x72/0x130 [tipc] > [<c036c76b>] sock_release+0x7b/0xc0 > [<c036d176>] sock_close+0x36/0x50 > [<c016315a>] __fput+0x10a/0x120 > [<c0161597>] filp_close+0x57/0x90 > [<c0121dbc>] put_files_struct+0x7c/0xf0 > [<c0122d5a>] do_exit+0x23a/0x5a0 > [<c012aa35>] __dequeue_signal+0xf5/0x1b0 > [<c0123240>] do_group_exit+0xe0/0x150 > [<c012ab1d>] dequeue_signal+0x2d/0x90 > [<c012cbef>] get_signal_to_deliver+0x26f/0x510 > [<c0105136>] do_signal+0xb6/0xf0 > [<c036ddb6>] sys_send+0x36/0x40 > [<c036e60e>] sys_socketcall+0x12e/0x240 > [<c01051cb>] do_notify_resume+0x5b/0x5d > [<c01053be>] work_notifysig+0x13/0x15 > > > |
From: Mark H. <ma...@os...> - 2004-05-12 18:24:38
Here is what looks to be happening with the spin lock deadlock. I replaced all the spin_lock_bh calls with a wrapper that tries to get the lock for a while then prints out a debug message if it can't get the lock. As an experiment, I changed the spin_lock_bh in link_wakeup_ports to a trylock and exited if it couldn't get the lock. I am now not able to get the deadlock. CPU 0: release -- tipc_delete_port (get port lock) -- port_abort_peer -- port_send_proto_msg -- net_route_msg -- link_send (get node lock) -- (hung spinning) CPU 1: common_interrupt -- do_softirq -- net_rx_action -- netif_receive_skb -- recv_msg (tipc eth) -- tipc_recv_msg (get node lock) -- link_wakeup_ports (get port lock) -- (hung spinning) Stack dumps: port lock timeout Call Trace: [<f8a837ab>] link_wakeup_ports+0x9b/0x230 [tipc] [<f8a87c2e>] tipc_recv_msg+0x7fe/0x8c0 [tipc] [<c014949d>] __kmalloc+0x19d/0x250 [<f8aa5db9>] recv_msg+0x39/0x50 [tipc] [<c0375af2>] netif_receive_skb+0x172/0x1b0 [<c0375bb4>] process_backlog+0x84/0x120 [<c0375cd0>] net_rx_action+0x80/0x120 [<c0124bc8>] __do_softirq+0xb8/0xc0 [<c0124c05>] do_softirq+0x35/0x40 [<c0107ce5>] do_IRQ+0x175/0x230 [<c0105ce0>] common_interrupt+0x18/0x20 [<c0221c91>] copy_from_user+0x1/0x80 [<f8a866bf>] link_send_sections_long+0x30f/0xb30 [tipc] [<c0221694>] __delay+0x14/0x20 [<f8a8366f>] link_schedule_port+0x13f/0x1e0 [tipc] [<f8a860f5>] link_send_sections_fast+0x5b5/0x870 [tipc] [<c011b12a>] __wake_up_common+0x3a/0x60 [<f8a97bf2>] tipc_send+0x92/0x9d0 [tipc] [<c011d736>] __mmdrop+0x36/0x50 [<c03f15b7>] schedule+0x467/0x7a0 [<f8aa33e6>] recv_msg+0x2b6/0x560 [tipc] [<f8aa2d90>] send_packet+0x90/0x180 [tipc] [<c011b0d0>] default_wake_function+0x0/0x20 [<c036c83e>] sock_sendmsg+0x8e/0xb0 [<f8aa5db9>] recv_msg+0x39/0x50 [tipc] [<c01435ba>] buffered_rmqueue+0xfa/0x220 [<c036c61a>] sockfd_lookup+0x1a/0x80 [<c036dd61>] sys_sendto+0xe1/0x100 [<c0128f62>] del_timer_sync+0x42/0x140 [<c036d109>] sock_poll+0x29/0x30 [<c017884b>] do_pollfd+0x5b/0xa0 [<c036ddb6>] sys_send+0x36/0x40 [<c036e60e>] sys_socketcall+0x12e/0x240 [<c0105373>] syscall_call+0x7/0xb &node->lock lock timeout Call Trace: [<f8a8549a>] link_send+0xda/0x2a0 [tipc] [<f8a92cee>] net_route_msg+0x41e/0x43d [tipc] [<f8a949c2>] port_send_proto_msg+0x1a2/0x2a0 [tipc] [<f8a95983>] port_abort_peer+0x83/0x90 [tipc] [<f8a9458f>] tipc_deleteport+0x19f/0x280 [tipc] [<f8aa25b2>] release+0x72/0x130 [tipc] [<c036c76b>] sock_release+0x7b/0xc0 [<c036d176>] sock_close+0x36/0x50 [<c016315a>] __fput+0x10a/0x120 [<c0161597>] filp_close+0x57/0x90 [<c0121dbc>] put_files_struct+0x7c/0xf0 [<c0122d5a>] do_exit+0x23a/0x5a0 [<c012aa35>] __dequeue_signal+0xf5/0x1b0 [<c0123240>] do_group_exit+0xe0/0x150 [<c012ab1d>] dequeue_signal+0x2d/0x90 [<c012cbef>] get_signal_to_deliver+0x26f/0x510 [<c0105136>] do_signal+0xb6/0xf0 [<c036ddb6>] sys_send+0x36/0x40 [<c036e60e>] sys_socketcall+0x12e/0x240 [<c01051cb>] do_notify_resume+0x5b/0x5d [<c01053be>] work_notifysig+0x13/0x15 -- Mark Haverkamp <ma...@os...> |
From: Jon M. <jon...@er...> - 2004-05-12 14:55:10
Well, we do have bottom halves (timer and signal tasklets) that may interfere with message sending/reception if executed, even on UPs. I am sure there are cases where the _bh() version is not really needed, but I wanted to keep it safe when I developed it. Something we can look into later, maybe.

/jon

Mark Haverkamp wrote:
> On Wed, 2004-05-12 at 07:15, Jon Maloy wrote:
> > Mark,
> > I think you can easily verify your hypothesis by replacing spin_lock_bh() with a spin_trylock_bh() in tipc_recv_msg(), and then just discard the packet and continue the loop if it fails. (If this happens the packet will be retransmitted anyway.) It is worth a try.
>
> I'll give it a try. Out of curiosity, why are you using the *_bh versions of spinlock?
From: Mark H. <ma...@os...> - 2004-05-12 14:21:12
On Wed, 2004-05-12 at 07:15, Jon Maloy wrote: > Mark, > I think you can easily verify your hypothesis by replacing > spin_lock_bh() with a spin_trylock_bh() in tipc_recv_msg(), and > then just discard the packet and continue the loop if it fails. > (If this happens the packet will be retransmitted anyway.) > It is worth a try. I'll give it a try. Out of curiosity, why are you using the *_bh versions of spinlock? -- Mark Haverkamp <ma...@os...> |
From: Mark H. <ma...@os...> - 2004-05-12 14:18:49
On Tue, 2004-05-11 at 16:22, Jon Maloy wrote: > I am not sure. In your first analysis of the problem you > found that it was a dying process that was causing the > processor to hang while trying to grab the node lock, which > was then presumably held by the tipc_rev_msg interrupt, > or a tasklet. > That is the opposite scenario of what you describe here. I see the hang both ways. > > Also, this was no different in the old implementation: > we shared the same lock between user level/tasklet code > and interrupts without any problems. If you are right, > why don't we have deadlocks all the time then ? That is a good point, but it does seem a possibility. > > I think I will re-read Rusty Lynch "unreliable guide" > once more... That is Rusty Russell. -- Mark Haverkamp <ma...@os...> |
From: Jon M. <jon...@er...> - 2004-05-12 14:15:58
Mark, I think you can easily verify your hypothesis by replacing spin_lock_bh() with a spin_trylock_bh() in tipc_recv_msg(), and then just discard the packet and continue the loop if it fails. (If this happens the packet will be retransmitted anyway.) It is worth a try. /Jon Mark Haverkamp wrote: On Thu, 2004-05-06 at 16:37, Jon Maloy wrote: I think your analysis is correct, but I don't know where this omission happens. The problem I am trying to solve may be related, - when I run parallel links between two nodes comunication on one link sometimes seem to stop under heavy load. With a little luck we are looking for the same bug. /Jon Jon, I was thinking about this today and it occurred to me that spin_lock_bh doesn't prevent interrupts from happening. If true, we can get in a deadlock situation when a CPU has the node lock, an ethernet interrupt happens causing tipc_recv_msg to get called. One of the first things that tipc_recv_msg does is try to get the node lock. This seems to be a possible explanation for the spin hang on the node lock. Does this make sense to you? Mark. Mark Haverkamp wrote: Jon, Daniel and I have been seeing a tipc hang for 3 or 4 weeks when we kill a running application in a certain order. While running the tipc benchmark program we can get tipc to hang the computer by killing the client while it has the 32 processes running. Although, to get the hang, I have to have tried to run some management port accesses which are stalled due to congestion. After doing some tracing, I have narrowed it down to an exiting process spinning while trying to get the node lock. Our assumption is that some other process hasn't released the lock by accident, although its not obvious where. I have included the stack dump from the sysrq P console command. SysRq : Show Regs Pid: 2001, comm: client_tipc_tp EIP: 0060:[<f8a913d9>] CPU: 0 EIP is at .text.lock.link+0xd7/0x3ce [tipc] EFLAGS: 00000286 Not tainted (2.6.6-rc3) EAX: f7c8ef6c EBX: 00000000 ECX: 01001011 EDX: 00000013 ESI: f7c8eee0 EDI: f359a000 EBP: f359bcf8 DS: 007b ES: 007b CR0: 8005003b CR2: 080e2ce8 CR3: 0053d000 CR4: 000006d0 Call Trace: [<c0126a38>] __do_softirq+0xb8/0xc0 [<f8a9818b>] net_route_msg+0x48b/0x4ad [tipc] [<c015b3a1>] __pte_chain_free+0x81/0x90 [<f8a99e6e>] port_send_proto_msg+0x1ae/0x2d0 [tipc] [<f8a9af73>] port_abort_peer+0x83/0x90 [tipc] [<f8a999a1>] tipc_deleteport+0x181/0x2a0 [tipc] [<f8aa7ae2>] release+0x72/0x130 [tipc] [<c0378ff9>] sock_release+0x99/0xf0 [<c0379a16>] sock_close+0x36/0x50 [<c016740d>] __fput+0x12d/0x140 [<c0165857>] filp_close+0x57/0x90 [<c0123adc>] put_files_struct+0x7c/0xf0 [<c0124b1c>] do_exit+0x26c/0x600 [<c012cc05>] __dequeue_signal+0xf5/0x1b0 [<c0125057>] do_group_exit+0x107/0x190 [<c012cced>] dequeue_signal+0x2d/0x90 [<c012f14c>] get_signal_to_deliver+0x28c/0x590 [<c0105286>] do_signal+0xb6/0xf0 [<c037a736>] sys_send+0x36/0x40 [<c037af8e>] sys_socketcall+0x12e/0x240 [<c010531b>] do_notify_resume+0x5b/0x5d [<c010554a>] work_notifysig+0x13/0x15 You can see that the process is trying to exit. I have traced the EIP to the spin_lock_bh(&node->lock) in link_lock_select from a disassembly of link.o. Any ideas on this? Mark. ------------------------------------------------------- This SF.Net email is sponsored by Sleepycat Software Learn developer strategies Cisco, Motorola, Ericsson & Lucent use to deliver higher performing products faster, at low TCO. 
|
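A minimal sketch of the trylock approach suggested above, using invented names (demo_node, demo_recv_msg); only spin_trylock_bh(), spin_unlock_bh() and kfree_skb() are real kernel interfaces, and this is not the actual tipc_recv_msg():

    #include <linux/spinlock.h>
    #include <linux/skbuff.h>

    struct demo_node {
            spinlock_t lock;    /* per-node lock, also taken from process context */
    };

    static void demo_recv_msg(struct demo_node *node, struct sk_buff *buf)
    {
            /* Try the node lock instead of spinning on it in the receive
             * path.  If another context on this CPU already holds it,
             * drop the packet and return; the link layer is assumed to
             * retransmit it later.
             */
            if (!spin_trylock_bh(&node->lock)) {
                    kfree_skb(buf);
                    return;
            }

            /* ... deliver the message while holding node->lock ... */

            spin_unlock_bh(&node->lock);
    }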
From: Jon M. <jon...@er...> - 2004-05-11 23:22:49
|
I am not sure. In your first analysis of the problem you found that it
was a dying process that was causing the processor to hang while trying
to grab the node lock, which was then presumably held by the
tipc_recv_msg interrupt, or a tasklet.
That is the opposite scenario of what you describe here.

Also, this was no different in the old implementation: we shared the
same lock between user level/tasklet code and interrupts without any
problems. If you are right, why don't we have deadlocks all the time
then ?

I think I will re-read Rusty Lynch "unreliable guide" once more...

Regards /Jon

Mark Haverkamp wrote:

On Thu, 2004-05-06 at 16:37, Jon Maloy wrote:

I think your analysis is correct, but I don't know where this omission
happens. The problem I am trying to solve may be related, - when I run
parallel links between two nodes communication on one link sometimes
seems to stop under heavy load.

With a little luck we are looking for the same bug.

/Jon

Jon,

I was thinking about this today and it occurred to me that spin_lock_bh
doesn't prevent interrupts from happening. If true, we can get in a
deadlock situation when a CPU has the node lock, an ethernet interrupt
happens causing tipc_recv_msg to get called. One of the first things
that tipc_recv_msg does is try to get the node lock. This seems to be a
possible explanation for the spin hang on the node lock.

Does this make sense to you?

Mark.

Mark Haverkamp wrote:

Jon,

Daniel and I have been seeing a tipc hang for 3 or 4 weeks when we kill
a running application in a certain order.

While running the tipc benchmark program we can get tipc to hang the
computer by killing the client while it has the 32 processes running.
Although, to get the hang, I have to have tried to run some management
port accesses which are stalled due to congestion. After doing some
tracing, I have narrowed it down to an exiting process spinning while
trying to get the node lock. Our assumption is that some other process
hasn't released the lock by accident, although it's not obvious where.
I have included the stack dump from the sysrq P console command.

SysRq : Show Regs

Pid: 2001, comm: client_tipc_tp
EIP: 0060:[<f8a913d9>] CPU: 0
EIP is at .text.lock.link+0xd7/0x3ce [tipc]
 EFLAGS: 00000286   Not tainted  (2.6.6-rc3)
EAX: f7c8ef6c EBX: 00000000 ECX: 01001011 EDX: 00000013
ESI: f7c8eee0 EDI: f359a000 EBP: f359bcf8 DS: 007b ES: 007b
CR0: 8005003b CR2: 080e2ce8 CR3: 0053d000 CR4: 000006d0
Call Trace:
 [<c0126a38>] __do_softirq+0xb8/0xc0
 [<f8a9818b>] net_route_msg+0x48b/0x4ad [tipc]
 [<c015b3a1>] __pte_chain_free+0x81/0x90
 [<f8a99e6e>] port_send_proto_msg+0x1ae/0x2d0 [tipc]
 [<f8a9af73>] port_abort_peer+0x83/0x90 [tipc]
 [<f8a999a1>] tipc_deleteport+0x181/0x2a0 [tipc]
 [<f8aa7ae2>] release+0x72/0x130 [tipc]
 [<c0378ff9>] sock_release+0x99/0xf0
 [<c0379a16>] sock_close+0x36/0x50
 [<c016740d>] __fput+0x12d/0x140
 [<c0165857>] filp_close+0x57/0x90
 [<c0123adc>] put_files_struct+0x7c/0xf0
 [<c0124b1c>] do_exit+0x26c/0x600
 [<c012cc05>] __dequeue_signal+0xf5/0x1b0
 [<c0125057>] do_group_exit+0x107/0x190
 [<c012cced>] dequeue_signal+0x2d/0x90
 [<c012f14c>] get_signal_to_deliver+0x28c/0x590
 [<c0105286>] do_signal+0xb6/0xf0
 [<c037a736>] sys_send+0x36/0x40
 [<c037af8e>] sys_socketcall+0x12e/0x240
 [<c010531b>] do_notify_resume+0x5b/0x5d
 [<c010554a>] work_notifysig+0x13/0x15

You can see that the process is trying to exit. I have traced the EIP
to the spin_lock_bh(&node->lock) in link_lock_select from a disassembly
of link.o.

Any ideas on this?

Mark.
|
From: Jon M. <jon...@er...> - 2004-05-11 23:07:39
|
I realize the problem, but I have difficulties with seeing how we can
do otherwise if we want lock granularity per node instance.
I tried to make it clear via the function names, but it can certainly
be improved by better comments on this. (I already have such comments
at some places.)
I think improved comments are the best we can do, unless somebody has a
better idea.

Regards /Jon

Daniel McNeil wrote:

On Thu, 2004-04-29 at 08:15, Jon Maloy wrote:

Do you suggest that we delay the LKML-announcement until we have done
these changes ? To replace all the linked lists is intrusive, and it
may take weeks to re-stabilize the code. Otherwise, see my comments
below.

Regards /jon

Jon,

One other thing about TIPC that Mark and I have noticed that might
cause mainline acceptance problems is the:

ref_lock_acquire(void *object, spinlock_t **lock)
ref_lock_deref()
namesub_lock_deref()
ref_unlock_discard(s->publ.s.ref);
port_lock_deref()
spin_unlock_bh(this->publ.lock);
ref_unlock_discard(ref);

2 big concerns:

1. This hides the locking since a pointer is set to the "real" lock.
   The usage above just does not seem clear on what is being locked.

2. The lock is locked in one function and returned with the spinlock
   held and then unlocked in a different function.

Maybe more comments on functions saying which locks are assumed held
and which locks are unlocked that came in locked might clear up some
of the confusion.

Thanks,
Daniel
|
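A sketch of the kind of lock-ownership comments discussed here, on an invented port type (demo_port, demo_port_lock, demo_port_unlock); it is not the real TIPC reference code, only an illustration of stating explicitly that a function returns with a lock held and naming the function that releases it:

    #include <linux/spinlock.h>

    struct demo_port {
            spinlock_t lock;
            int in_use;
    };

    /*
     * demo_port_lock - lock a port before using it
     * @port: the port to lock
     *
     * Returns 1 with port->lock HELD on success; the caller must
     * release it with demo_port_unlock().  Returns 0 with no lock
     * held if the port is not in use.
     */
    static int demo_port_lock(struct demo_port *port)
    {
            spin_lock_bh(&port->lock);
            if (!port->in_use) {
                    spin_unlock_bh(&port->lock);
                    return 0;
            }
            return 1;       /* locked; see demo_port_unlock() */
    }

    /*
     * demo_port_unlock - release a port for which demo_port_lock()
     * returned 1
     */
    static void demo_port_unlock(struct demo_port *port)
    {
            spin_unlock_bh(&port->lock);
    }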
From: Mark H. <ma...@os...> - 2004-05-11 22:24:14
|
On Thu, 2004-05-06 at 16:37, Jon Maloy wrote:
> I think your analysis is correct, but I don't know where this omission
> happens. The problem I am trying to solve may be related,
> - when I run parallel links between two nodes communication on
> one link sometimes seems to stop under heavy load.
>
> With a little luck we are looking for the same bug.
>
> /Jon
>

Jon,

I was thinking about this today and it occurred to me that spin_lock_bh
doesn't prevent interrupts from happening. If true, we can get in a
deadlock situation when a CPU has the node lock, an ethernet interrupt
happens causing tipc_recv_msg to get called. One of the first things
that tipc_recv_msg does is try to get the node lock. This seems to be a
possible explanation for the spin hang on the node lock.

Does this make sense to you?

Mark.

>
> Mark Haverkamp wrote:
>
> >Jon,
> >
> >Daniel and I have been seeing a tipc hang for 3 or 4 weeks when we kill
> >a running application in a certain order.
> >
> >While running the tipc benchmark program we can get tipc to hang the
> >computer by killing the client while it has the 32 processes running.
> >Although, to get the hang, I have to have tried to run some management
> >port accesses which are stalled due to congestion.  After doing some
> >tracing, I have narrowed it down to an exiting process spinning while
> >trying to get the node lock.  Our assumption is that some other process
> >hasn't released the lock by accident, although it's not obvious where.  I
> >have included the stack dump from the sysrq P console command.
> >
> >SysRq : Show Regs
> >
> >Pid: 2001, comm: client_tipc_tp
> >EIP: 0060:[<f8a913d9>] CPU: 0
> >EIP is at .text.lock.link+0xd7/0x3ce [tipc]
> > EFLAGS: 00000286   Not tainted  (2.6.6-rc3)
> >EAX: f7c8ef6c EBX: 00000000 ECX: 01001011 EDX: 00000013
> >ESI: f7c8eee0 EDI: f359a000 EBP: f359bcf8 DS: 007b ES: 007b
> >CR0: 8005003b CR2: 080e2ce8 CR3: 0053d000 CR4: 000006d0
> >Call Trace:
> > [<c0126a38>] __do_softirq+0xb8/0xc0
> > [<f8a9818b>] net_route_msg+0x48b/0x4ad [tipc]
> > [<c015b3a1>] __pte_chain_free+0x81/0x90
> > [<f8a99e6e>] port_send_proto_msg+0x1ae/0x2d0 [tipc]
> > [<f8a9af73>] port_abort_peer+0x83/0x90 [tipc]
> > [<f8a999a1>] tipc_deleteport+0x181/0x2a0 [tipc]
> > [<f8aa7ae2>] release+0x72/0x130 [tipc]
> > [<c0378ff9>] sock_release+0x99/0xf0
> > [<c0379a16>] sock_close+0x36/0x50
> > [<c016740d>] __fput+0x12d/0x140
> > [<c0165857>] filp_close+0x57/0x90
> > [<c0123adc>] put_files_struct+0x7c/0xf0
> > [<c0124b1c>] do_exit+0x26c/0x600
> > [<c012cc05>] __dequeue_signal+0xf5/0x1b0
> > [<c0125057>] do_group_exit+0x107/0x190
> > [<c012cced>] dequeue_signal+0x2d/0x90
> > [<c012f14c>] get_signal_to_deliver+0x28c/0x590
> > [<c0105286>] do_signal+0xb6/0xf0
> > [<c037a736>] sys_send+0x36/0x40
> > [<c037af8e>] sys_socketcall+0x12e/0x240
> > [<c010531b>] do_notify_resume+0x5b/0x5d
> > [<c010554a>] work_notifysig+0x13/0x15
> >
> >You can see that the process is trying to exit.  I have traced the EIP to
> >the spin_lock_bh(&node->lock) in link_lock_select from a disassembly of
> >link.o.
> >
> >Any ideas on this?
> >
> >Mark.
> >

-- 
Mark Haverkamp <ma...@os...>
|
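A minimal sketch of the distinction Mark describes, with invented names (demo_lock, demo_irq_handler) and the 2.6-era interrupt handler signature; it is not TIPC code. spin_lock_bh() only disables softirqs on the local CPU, so a hard interrupt can still arrive and deadlock on the same lock; spin_lock_irqsave() in process context (or a trylock in the handler, as suggested elsewhere in the thread) avoids that:

    #include <linux/spinlock.h>
    #include <linux/interrupt.h>

    static spinlock_t demo_lock = SPIN_LOCK_UNLOCKED;  /* 2.6.6-era initializer */
    static int demo_counter;

    /* Process/tasklet context: spin_lock_bh() keeps softirqs off this
     * CPU, so a tasklet cannot run and re-take demo_lock here.  It does
     * NOT disable hard interrupts.
     */
    void demo_update_from_process(void)
    {
            spin_lock_bh(&demo_lock);
            demo_counter++;
            spin_unlock_bh(&demo_lock);
    }

    /* Hard-interrupt context: if this handler arrives on the CPU that is
     * already inside demo_update_from_process(), the CPU spins on a lock
     * it already holds -- the scenario described above.
     */
    irqreturn_t demo_irq_handler(int irq, void *dev_id, struct pt_regs *regs)
    {
            spin_lock(&demo_lock);      /* deadlocks if this CPU holds the lock */
            demo_counter++;
            spin_unlock(&demo_lock);
            return IRQ_HANDLED;
    }

    /* One conventional fix: use the irq-safe variant in process context,
     * so the interrupt cannot be taken while the lock is held on this CPU.
     */
    void demo_update_from_process_irqsafe(void)
    {
            unsigned long flags;

            spin_lock_irqsave(&demo_lock, flags);
            demo_counter++;
            spin_unlock_irqrestore(&demo_lock, flags);
    }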
From: Daniel M. <da...@os...> - 2004-05-11 21:59:45
|
On Thu, 2004-04-29 at 08:15, Jon Maloy wrote:
> Do you suggest that we delay the LKML-announcement until we have done
> these changes ? To replace all the linked lists is intrusive, and it
> may take weeks to re-stabilize the code. Otherwise, see my comments
> below.
>
> Regards /jon
>

Jon,

One other thing about TIPC that Mark and I have noticed that might
cause mainline acceptance problems is the:

ref_lock_acquire(void *object, spinlock_t **lock)
ref_lock_deref()
namesub_lock_deref()
ref_unlock_discard(s->publ.s.ref);
port_lock_deref()
spin_unlock_bh(this->publ.lock);
ref_unlock_discard(ref);

2 big concerns:

1. This hides the locking since a pointer is set to the "real" lock.
   The usage above just does not seem clear on what is being locked.

2. The lock is locked in one function and returned with the spinlock
   held and then unlocked in a different function.

Maybe more comments on functions saying which locks are assumed held
and which locks are unlocked that came in locked might clear up some
of the confusion.

Thanks,
Daniel
|
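A rough illustration of the pattern behind concerns 1 and 2, using invented names (demo_object, demo_lock_acquire, demo_read); it is not the real ref_lock_acquire() code, just a sketch of why handing the lock back through a spinlock_t ** makes the locking hard to follow at the call site:

    #include <linux/spinlock.h>

    struct demo_object {
            spinlock_t lock;
            int value;
    };

    /* The callee picks the lock and passes a pointer to it back to the
     * caller through **lock.  Nothing at the call site says which lock
     * was taken, and the matching unlock may sit in a different
     * function entirely.
     */
    static void demo_lock_acquire(struct demo_object *obj, spinlock_t **lock)
    {
            spin_lock_bh(&obj->lock);
            *lock = &obj->lock;
    }

    static int demo_read(struct demo_object *obj)
    {
            spinlock_t *lock;
            int val;

            demo_lock_acquire(obj, &lock);  /* what is locked here? */
            val = obj->value;
            spin_unlock_bh(lock);           /* ...and what is released here? */
            return val;
    }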