From: Juhani R. <jr...@ik...> - 2006-03-07 06:16:37
On Mon, 6 Mar 2006, David Coulson wrote:

> I've been beta testing VMWare ESX 3.0 with IET, and the guys from VMWare
> told me this:
>
> By the way, looking at IET ver. 0.4.13 they do not seem to handle
> RESERVE/RELEASE correctly (it always OK them no matter what), so I would
> not recommend it for shared VMFS (it is guaranteed to corrupt your VMFS
> sooner or later).
>
> Can someone explain if this will be fixed in IET soon?

I made a patch for it, but it reveals big problems with IET's reset
handling: most of the time it results in a BUG() when the cluster does a
failover using target RESET. You can check the archives for the patch if
you're interested (I'm not sure it still applies cleanly). I've been
planning to get back to it, but I haven't had time to dive deep into the
IET code and check what's wrong with the reset handling (something is
definitely wrong, as you can see by searching the mailing list for BUG()).

The good news is that even without the patch a cluster works correctly as
long as the cluster nodes have some kind of connection to each other. The
one time real corruption can happen is when, for some reason, both nodes
are alive but can't see each other. That is the moment when the inactive
node tries to become active and sends a RESERVE, which succeeds, so it
starts to use the disk. The other node also continues to use the disk,
because it also holds a RESERVE that it thinks is valid. One way to make
sure this doesn't happen is to run a heartbeat over every connection and
make sure the machines can actually reach each other through those
connections.

With my patch the inactive node doesn't succeed with RESERVE, so it sends
a target RESET, and most of the time this triggers a BUG() in IET.

> David

Juhani
--
Juhani Rautiainen jr...@ik...
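
To make the two behaviours discussed in this message concrete, here is a toy
Python model of how a logical unit might answer RESERVE and RELEASE. It is
only a sketch and not IET code: the class Lun, the enforce flag and the
initiator names are made up for illustration. The status values are the
standard SCSI codes, and the RELEASE and reset behaviour follows my reading
of the SCSI-2 rules.

```python
GOOD = 0x00                  # SCSI status GOOD
RESERVATION_CONFLICT = 0x18  # SCSI status RESERVATION CONFLICT


class Lun:
    """Toy model of one logical unit (hypothetical, not IET code)."""

    def __init__(self, enforce=True):
        self.enforce = enforce  # False mimics the "always OK" behaviour
        self.holder = None      # initiator that currently holds the reservation

    def reserve(self, initiator):
        if not self.enforce:
            return GOOD  # reported behaviour: every RESERVE succeeds
        if self.holder not in (None, initiator):
            return RESERVATION_CONFLICT
        self.holder = initiator
        return GOOD

    def release(self, initiator):
        # RELEASE from a non-holder changes nothing but still returns GOOD.
        if self.holder == initiator:
            self.holder = None
        return GOOD

    def reset(self):
        # A target reset drops the reservation.
        self.holder = None


if __name__ == "__main__":
    unpatched = Lun(enforce=False)
    print(unpatched.reserve("node-a"), unpatched.reserve("node-b"))

    patched = Lun(enforce=True)
    print(patched.reserve("node-a"), patched.reserve("node-b"))
    patched.reset()
    print(patched.reserve("node-b"))
```

With enforce=False both nodes get GOOD and both believe they own the disk,
which is the split-brain case above. With enforce=True the second node gets
RESERVATION CONFLICT and only succeeds after a reset, which is the path that
exercises the reset handling.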