From: Partha <par...@gm...> - 2019-10-18 13:17:45
Hi Rune,

Your system's memory seems to be fragmented, and you need to perform a forced reclaim.
Can you check the buddy allocator for higher-order allocations?

    cat /proc/buddyinfo

BTW, I fixed this in:
57d5f64d83ab tipc: allocate user memory with GFP_KERNEL flag

And it was Reported-by: Rune Torgersen <ru...@in...>

It's in upstream v4.10-rc3-167-g57d5f64d83ab

regards
Partha

On 2019-10-17 22:08, Rune Torgersen wrote:
> Looks like I can kind of make it happen on one system now.
> Stopping some programs (no pattern in which ones) makes it work, and starting some back up again makes it fail.
>
> The TIPC nametable has 231 entries when failing and 183 entries when succeeding (however, on a different system the nametable has 251 entries and it is not failing).
>
> How do I look for memory used by TIPC in the kernel?
>
> -----Original Message-----
> From: Rune Torgersen <ru...@in...>
> Sent: Thursday, October 17, 2019 14:53
>
> I will have to look for leaks next time I can make it happen.
> I was trying stuff and shut down a different program that was unrelated (but had some TIPC sockets open on a different address (104)), and as soon as I did, the sends started working again.
>
> It is possible that one of those unrelated sockets has something stuck (as one of them was only ever used to send RDM messages but nothing ever reads it).
>
> Any suggestions as to what to start looking at (netstat, tipc, tipc_config or kernel params) to try to track it down?
>
> The problem with testing a patch (or using Ubuntu 18 LTS) is that we cannot reliably make it happen.
>
> -----Original Message-----
> From: Jon Maloy <jon...@er...>
> Sent: Thursday, October 17, 2019 14:35
>
> Hi Rune,
>
> Do you see any signs of a general memory leak ("free") on your node?
>
> Anyway, there can be no doubt that this happens because the big buffer pool is running empty.
>
> We fixed that in commit 4c94cc2d3d57 ("tipc: fall back to smaller MTU if allocation of local send skb fails"), which was delivered to Linux 4.16.
>
> Do you have any opportunity to apply that patch and try it?
>
> BR
> ///jon
>
>> -----Original Message-----
>> From: Rune Torgersen <ru...@in...>
>> Sent: 17-Oct-19 12:38
>> To: 'tip...@li...' <tipc-
>> dis...@li...>
>> Subject: [tipc-discussion] Error allocating memeory error when sending RDM
>> message
>>
>> Hi.
>>
>> I am running into an issue when sending SOCK_RDM or SOCK_DGRAM
>> messages. On a system that has been up for a time (120+ days in this case), I
>> cannot send any RDM/DGRAM type TIPC messages that are larger than about
>> 16000 bytes (16033+ fails, 15100 and smaller still works).
>> Any larger message fails with error code 12: "Cannot allocate memory".
>>
>> The really odd thing is that it only happens on some connections and not others
>> on the same system (for example, sending to tipc node 103:1003 gets no error,
>> while sending to 103:3 gets an error).
>> When it gets into this state, it seems to happen forever on the same
>> destination address, and not on others, until the system is rebooted (restarting the
>> server-side application makes no difference).
>> The sends are done on the same node as the receiver is on.
>>
>> Kernel is Ubuntu 16.04 LTS 4.4.0-150 in this case, also seen on 161.
>>
>> Nametable for 103:
>> 103   2      2      <1.1.1:2328193343>   2328193344   cluster
>> 103   3      3      <1.1.2:3153441800>   3153441801   cluster
>> 103   5      5      <1.1.4:269294867>    269294868    cluster
>> 103   1002   1002   <1.1.1:490133365>    490133366    cluster
>> 103   1003   1003   <1.1.2:2552019732>   2552019733   cluster
>> 103   1005   1005   <1.1.4:625110186>    625110187    cluster
>>
>> _______________________________________________
>> tipc-discussion mailing list
>> tip...@li...
>> https://lists.sourceforge.net/lists/listinfo/tipc-discussion
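
For the `cat /proc/buddyinfo` check suggested in the reply above, a minimal Python sketch of how the output might be summarised follows. Each count column in that file is the number of free blocks of 2^order contiguous pages, starting at order 0. The function names and the `MIN_ORDER = 3` threshold are illustrative assumptions, not something stated in the thread: the threshold supposes 4 KiB pages and that a send just over 16 KB, plus buffer overhead, needs a 32 KiB contiguous block.

```python
from pathlib import Path

# Assumption (not from the thread): a send just over 16 KB plus skb overhead
# needs one physically contiguous order-3 block (8 pages of 4 KiB = 32 KiB).
MIN_ORDER = 3


def parse_buddyinfo(text):
    """Parse /proc/buddyinfo text into (node, zone, counts) tuples.

    counts[i] is the number of free blocks of 2**i contiguous pages.
    """
    zones = []
    for line in text.splitlines():
        fields = line.split()
        # Expected shape: Node <n>, zone <name> <order 0> <order 1> ...
        if len(fields) < 5 or fields[0] != "Node" or fields[2] != "zone":
            continue
        node = int(fields[1].rstrip(","))
        counts = [int(value) for value in fields[4:]]
        zones.append((node, fields[3], counts))
    return zones


def free_blocks_at_or_above(counts, min_order):
    """Count free blocks whose order is at least min_order."""
    return sum(counts[min_order:])


def main(path="/proc/buddyinfo"):
    source = Path(path)
    if not source.exists():  # not on Linux, or /proc is not mounted
        print(f"{path} not found")
        return
    for node, zone, counts in parse_buddyinfo(source.read_text()):
        available = free_blocks_at_or_above(counts, MIN_ORDER)
        print(f"node {node} zone {zone}: {available} free blocks of order >= {MIN_ORDER}")


if __name__ == "__main__":
    main()
```

A zone that reports zero (or nearly zero) free blocks from order 3 upward while the lower orders are plentiful would be consistent with the fragmentation described above, though it does not by itself prove that TIPC is the allocation that fails.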