From: SourceForge.net <no...@so...> - 2005-03-17 07:22:43
Bugs item #1107050, was opened at 2005-01-21 17:53
Message generated for change (Comment added) made by wysochanski
You can respond by visiting:
https://sourceforge.net/tracker/?func=detail&atid=387023&aid=1107050&group_id=26396

Category: iscsi-drvr
Group: None
Status: Open
Resolution: None
Priority: 9
Submitted By: Dave Wysochanski (wysochanski)
Assigned to: Manish Kumar Bhojasia (bhojas)
Summary: memory deadlock with 3.6.2 with large file on 1GB ram machin

Initial Comment:
Looks like it's a lot easier to hit the memory deadlock situation than I ever imagined. The command below is a simple 'dd' that creates a large file. It involves no disruptions whatsoever, yet it deadlocks the machine.

Steps to reproduce:
1) Boot a Linux host with 1GB RAM (or with "mem=1G").
2) Create a 10GB LUN.
3) Start iscsi, and create 1 partition via fdisk that covers the whole LUN.
4) Create an ext3 filesystem on the LUN.
5) Run the following command:
dd if=/dev/zero of=/iscsi_mnt/large_luns/10-1/file bs=1024 count=50000000

This same test seems to pass with different memory sizes (e.g. mem=512M, 4G, etc.). This may mean the problem is just less likely in those configurations, or it may truly have something to do with the 1GB memory size.

----------------------------------------------------------------------

>Comment By: Dave Wysochanski (wysochanski)
Date: 2005-03-17 02:22

Message:
Logged In: YES
user_id=752546

This appears to be isolated to the one kernel series that we originally found this on. I have no evidence that this particular issue can be reproduced on other kernels (e.g. vanilla 2.4.x).

----------------------------------------------------------------------

Comment By: Nobody/Anonymous (nobody)
Date: 2005-01-28 15:38

Message:
Logged In: NO

In 2.4, do the sockets have send/receive timeouts? You could set those to see if you are hitting the initial skbuff memory allocation problem. The LLDs (low-level drivers) themselves allocate memory too, so you could be hitting one of those allocations.
Or do you know whether the session timeouts have expired? In the 2.6 driver, when the session drops, the socket layer kmallocs memory with GFP_KERNEL.

----------------------------------------------------------------------

Comment By: Dave Wysochanski (wysochanski)
Date: 2005-01-28 10:34

Message:
Logged In: YES
user_id=752546

I've seen this most often on RHEL3 kernels. In particular, the hang seems really easy to reproduce on RHEL3 U4. Other distros/kernels probably have this problem as well.

In some cases, I've had to put the 'dd' command in a loop to see the hang, e.g. (running bash):
while true; do dd if=/dev/zero of=/iscsi_mnt/large_luns/10-1/file bs=1024 count=50000000; done

In all cases, though, the hang seems to occur somewhere within 1-30 minutes (usually it only takes a few minutes).

We know a little more about this now. It's not clear who's getting blocked, but it looks like both kswapd and bdflushd get blocked waiting for a bounce buffer. The iscsi-tx thread looks blocked as well, waiting for memory. I'm not sure what gets blocked first. Maybe iscsi-tx blocks first, and then bdflush/kswapd try to free memory, but the pages they pick are dirty page cache buffers that must be written out to the iscsi device.

Sometimes my systems have frozen for 30 seconds or more, with sysrq-m indicating there are no free pages, and then somehow the system comes out of the deadlock state. I'm not really sure what's going on at this point. I know there's a fundamental problem spanning the VM, networking, and IO layers, but I'd like to understand why we don't hit it in most cases, and why I'm seeing it now with no disruptions at all.

----------------------------------------------------------------------

Comment By: Dave Wysochanski (wysochanski)
Date: 2005-01-21 19:18

Message:
Logged In: YES
user_id=752546

Sorry, typo in the original 'dd' command (one too many zeros). The intent was to create one 5GB file on the 10GB LUN.
The correct command should be:
dd if=/dev/zero of=/iscsi_mnt/large_luns/10-1/file bs=1024 count=5000000

----------------------------------------------------------------------
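As a quick cross-check of the "one too many zeros" correction, the number of bytes each dd command writes (bs * count) can be worked out with shell arithmetic. This is a minimal sketch, assuming a POSIX shell with 64-bit $(( )) arithmetic and treating the "10GB LUN" as 10 GiB (the report does not say whether GB means 10^9 or 2^30 bytes). The variable names are illustrative, and filesystem overhead is ignored.

```shell
#!/bin/sh
# Compare the bytes written by each dd command against the LUN size.
# Assumption: "10GB" is taken as 10 GiB (10 * 2^30 bytes) for illustration.
bs=1024
lun_bytes=$((10 * 1024 * 1024 * 1024))

for count in 50000000 5000000; do
    # dd writes bs * count bytes in total.
    bytes=$((bs * count))
    # Whole MiB, truncated, for a human-readable size.
    mib=$((bytes / 1024 / 1024))
    # Ignores ext3 metadata/journal overhead; a rough fit check only.
    if [ "$bytes" -le "$lun_bytes" ]; then
        fits=yes
    else
        fits=no
    fi
    echo "count=$count bytes=$bytes MiB=$mib fits_in_10GiB_LUN=$fits"
done
```

With count=50000000 the command would try to write 51,200,000,000 bytes (about 47.7 GiB), far more than the LUN holds. With count=5000000 it writes 5,120,000,000 bytes (about 5.1 GB, or 4.77 GiB), which matches the stated intent of a 5GB file.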