From: Vladislav B. <vs...@vl...> - 2012-07-17 21:08:38
Hi,

It seems your config has some weird circular dependency: the backend devices
never complete the requests they receive until the SCST configuration is done,
while the SCST configuration can't finish because it is waiting for the
devices to complete those requests. Logs from the beginning of the failover
could shed some light on this.

Vlad

Pascal BERTON, on 07/14/2012 08:26 AM wrote:
> Hi all!
>
> I'm currently facing weird problems with SCST, and after days of various
> experiments and observations trying to isolate the problem as precisely
> as possible, I conclude that I now need a hand. Could somebody help me a
> bit with this?
>
> Basically, we're running a 2-node single-primary DRBD/Pacemaker cluster
> (kernel version 2.6.32-71.7.1, based on the Openfiler 2.99 distro)
> hosting 4 DRBD resources, each presented to 4 VMware hosts (ESXi 4.1)
> using two SCST (vdisk_fileio) and iSCSI-SCST targets (version 2.0.0.1 at
> first, now 2.2.0, but the problem persists) per resource. Resources are
> spread over the 2 nodes, 3 active TB per node overall. The DRBD
> replication link is a dual 10GbE link bonded in LACP (mode 4). Volumes
> are hardware RAID5 sets made up of 9 15krpm 146 or 300 GB SAS drives
> (I mean, disk I/O performance doesn't seem to be the cause).
>
> Basically, the issue is: the cluster starts its resources, the 4 DRBD
> primaries come up, then the 4 pairs of virtual IPs, then the SCST
> services, and things run fine. Until you try to migrate resources back
> and forth. When you do that, it works once, twice, sometimes even 3
> times, but then you can see DRBD promoted correctly, then the IPs come
> up, but the SCST resource remains stuck down, running into a timeout
> after the configured 60 s. At that moment everything fails back to its
> former place, as it should. If you try again, same story. In the end you
> obtain a cluster where the resource is stuck on one node, unable to fail
> over either manually or, more embarrassing, following a node crash
> (which we inevitably faced recently, thanks Mr. Murphy).
>
> After digging through the various logs, what I see is:
>
> - DRBD does its job 100% correctly.
>
> - Pacemaker seems to do its job with the resources it has, in the state
> they are in (I mean, the errors it reports look like normal errors in
> the global failing context).
>
> - SCST starts its job, but hangs in the device-handling section. (BTW,
> my RA uses the sysfs interface and is based on Patrick Zwahlen's
> implementation, which I customized a bit, mostly to add friendlier
> tracing and also to invert the order of activation: iSCSI target first,
> then the backend device, instead of the reverse, although I now doubt it
> has a real impact.) Basically, all the iSCSI target setup runs fine, but
> then:
>
> o Either it hangs on backend device creation,
>
> o Or it hangs on LUN 0 assignment.
>
> From that point on it hangs until the configured start timeout, and then
> everything fails back. However, the backend device that refused to be
> created correctly has in fact been created and remains, possibly along
> with the target directory and sometimes even the LUN 0 directory inside
> it. From that point on it turns into a good mess; in fact, the problems
> start here. After that, any migration attempt is doomed to failure! If I
> reboot the node, it will accept a couple of migrations again, and then
> fail again in the same manner.
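
For reference, the sysfs sequence such an RA typically performs, in the
inverted order Pascal describes (target first, then backend device), looks
roughly like the sketch below. This is a minimal illustration against the
SCST 2.x sysfs interface; the device name, IQN, and backing path are made-up
examples, and all error handling is omitted:

```shell
SCST_ROOT=/sys/kernel/scst_tgt
TGT=iqn.2012-07.local.example:vol1   # hypothetical target IQN
DEV=vol1                             # hypothetical vdisk_fileio device name

# 1. Create the iSCSI target (this part reportedly always succeeds)
echo "add_target $TGT" > $SCST_ROOT/targets/iscsi/mgmt

# 2. Create the fileio backend device on top of the DRBD primary
#    (first reported hang point)
echo "add_device $DEV filename=/dev/drbd0" \
    > $SCST_ROOT/handlers/vdisk_fileio/mgmt

# 3. Assign the device as LUN 0 of the target
#    (second reported hang point)
echo "add $DEV 0" > $SCST_ROOT/targets/iscsi/$TGT/luns/mgmt

# 4. Enable the target, then the iSCSI driver as a whole
echo 1 > $SCST_ROOT/targets/iscsi/$TGT/enabled
echo 1 > $SCST_ROOT/targets/iscsi/enabled
```

If the hang indeed comes from a circular dependency, steps 2 and 3 are
exactly where it would surface: the write to the `mgmt` file blocks until
the kernel side can complete the operation against the backend device.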
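
The start ordering Pascal describes (DRBD promote, then virtual IP, then
SCST) would typically be expressed with Pacemaker constraints along these
lines. This is only a hedged sketch in crmsh syntax; all resource names,
the RA name, and parameters are invented for illustration and are not taken
from his actual configuration:

```
# Hypothetical crm configure fragment for one of the 4 resource stacks
primitive p_drbd_r0 ocf:linbit:drbd params drbd_resource=r0
ms ms_drbd_r0 p_drbd_r0 meta master-max=1 clone-max=2 notify=true
primitive p_ip_r0 ocf:heartbeat:IPaddr2 params ip=10.0.0.10
primitive p_scst_r0 ocf:custom:scst op start timeout=60s
colocation c_r0 inf: p_scst_r0 p_ip_r0 ms_drbd_r0:Master
order o_r0 inf: ms_drbd_r0:promote p_ip_r0:start p_scst_r0:start
```

With a 60 s start timeout on the SCST primitive, a blocked sysfs write in
the RA produces exactly the observed behavior: the start operation times
out and Pacemaker fails the whole stack back to the other node.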