From: Pasi <pa...@ik...> - 2008-01-29 10:03:48
On Mon, Jan 28, 2008 at 09:43:40AM -0500, Ross S. W. Walker wrote:
> > Fixes:
> >
> > 1) We have taken the VM files off of iSCSI and moved them to
> > NFS, and that seems to have solved the problem. We are still
> > using iSCSI to access the Windows data, but have not had an
> > issue once the VM files were not iSCSI mounted.
> >
> > 2) We moved from open-iscsi to a QLogic QLA4050 as the
> > initiator. This solved the problem even with the VM files on
> > the iSCSI target.
> >
> > 3) At the one site that had the problem with disk-to-disk
> > backups, which also had the almost-idle failure, we stopped
> > doing disk-to-disk backups and moved the VM files to NFS,
> > and have not seen the problem since.
> >
> > I wish I knew what was causing the problem when I was
> > accessing the VMware virtual disk files via iSCSI with a
> > software initiator, but that seems to be the problem combination.
>
> Sorry for the extremely late reply.
>
> The problems you are describing sound a lot like a result of the
> Broadcom TOE engine that ships with a lot of Dells these days.
>
> I found that disabling the TOE feature on these adapters or
> switching to the Intel PRO/1000 adapters fixes this.
>
> Also I recommend disabling Jumbo Frames and tx/rx flow control
> as these can also cause issues where traffic stalls unexpectedly
> due to compatibility problems between switch and NIC implementations.
>
> Jumbo Frames are really only needed in 10GbE and tx/rx
> flow control isn't really necessary with today's modern TCP/IP
> stacks (but may also be advisable with 10GbE and underpowered
> CPUs).

Could you explain the "flow control isn't really necessary" point in
more depth? There seem to be a lot of different views on this subject.

In my understanding, you need Ethernet flow control with iSCSI to get
good performance from a bigger IP-SAN.
Delays caused by flow control are much shorter than delays caused by
TCP retransmits (and the TCP performance drop that retransmits cause).
Flow control is there to prevent packet loss (= TCP retransmits) when,
for example, multiple initiators are writing to the same switch/target
port and the traffic flows obviously need to be throttled down.

EqualLogic recommends flow control over jumbo frames. In fact, they say
a good flow control implementation (e.g. Cisco 3750) is key to getting
good performance from a bigger IP-SAN.

Ethernet flow control is also a requirement for the FCoE (Fibre Channel
over Ethernet) protocol.

-- 
Pasi
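To put rough numbers on the argument that flow-control delays are shorter than retransmit delays, here is a back-of-the-envelope sketch in Python. It assumes IEEE 802.3x PAUSE frames (a 16-bit pause_time counted in quanta of 512 bit times) and compares the longest stall one PAUSE frame can request with the Linux default minimum TCP retransmission timeout (200 ms) and the 1 s RTO floor recommended by RFC 6298. The 1 MiB port buffer and the two-initiator scenario are hypothetical values chosen for illustration, not measurements from any switch mentioned in the thread.

```python
# Rough comparison of an IEEE 802.3x PAUSE stall with a TCP retransmission timeout.
# SWITCH_BUFFER_BYTES is an assumed, illustrative value, not taken from the thread.

PAUSE_QUANTUM_BITS = 512       # one pause_time unit = 512 bit times (802.3 Annex 31B)
MAX_PAUSE_QUANTA = 0xFFFF      # pause_time is a 16-bit field

LINUX_TCP_RTO_MIN_S = 0.200    # Linux default minimum RTO (TCP_RTO_MIN)
RFC6298_RTO_MIN_S = 1.0        # RTO floor recommended by RFC 6298

SWITCH_BUFFER_BYTES = 1 << 20  # assumed: 1 MiB of buffer behind the target port


def max_pause_seconds(link_bps: float) -> float:
    """Longest stall a single PAUSE frame can request on a link of this speed."""
    return MAX_PAUSE_QUANTA * PAUSE_QUANTUM_BITS / link_bps


def buffer_fill_seconds(buffer_bytes: int, senders: int, link_bps: float) -> float:
    """Time until a port buffer overflows when `senders` hosts each transmit at
    line rate into one egress port of the same speed, with no flow control."""
    if senders < 2:
        return float("inf")  # a single sender cannot oversubscribe the port
    # The egress port drains at link_bps, so only the excess accumulates.
    excess_bps = (senders - 1) * link_bps
    return buffer_bytes * 8 / excess_bps


if __name__ == "__main__":
    for name, bps in (("1GbE", 1e9), ("10GbE", 10e9)):
        print(f"{name}: one PAUSE frame stalls at most "
              f"{max_pause_seconds(bps) * 1e3:.2f} ms; "
              f"Linux min RTO is {LINUX_TCP_RTO_MIN_S * 1e3:.0f} ms, "
              f"RFC 6298 floor is {RFC6298_RTO_MIN_S * 1e3:.0f} ms")
    fill_ms = buffer_fill_seconds(SWITCH_BUFFER_BYTES, senders=2, link_bps=1e9) * 1e3
    print(f"Two 1GbE initiators into one port overflow the assumed buffer "
          f"in {fill_ms:.2f} ms without flow control")
```

A single PAUSE frame tops out around 33.6 ms at 1GbE and 3.4 ms at 10GbE, well under a 200 ms timeout. A switch can keep re-sending PAUSE frames, so a real stall may last longer, and fast retransmit can recover some losses without waiting for the RTO; the sketch only illustrates why a pause is typically cheaper than a timeout-driven retransmit.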