From: Pascal B. <pas...@fr...> - 2012-07-18 01:48:27
Vlad,

Here are my logs from both the source and destination nodes following a manual failover attempt launched at 03:04:46. Hope it will tell you something... In the meantime, I've investigated a couple of things:

- I reduced the QueuedCommands parameter on my targets to 16 (not possible from my vSphere 4.1 initiators, which would require v5, so I did it from my RA agent), and it feels better: I see more abort activity if I set it back to 32. But the "FLAG SUSPENDED set, skipping" persists, even if I lower it further, down to 8.

- I noticed that in Patrick's sample implementation, he provides block device files (i.e. /dev/drbd0) to the vdisk_fileio handler, and after checking, I found out that I had done the same thing. Today I have "reshaped" 3 of my 4 targets, so I now have 3 regular files as backend devices (XFS filesystem) but still 1 /dev DRBD backend. Should I consider using device files in a fileio context a mistake? Patrick's samples are public, and somehow I thought that if it had turned out to be a problem, somebody would have complained? Don't know... Anyhow, using regular files can do no harm, right?

All in all, neither of these 2 changes made things better; failover is still manual AND painful! Apart from that, I'm starting to have doubts about my RA agent, especially after your reply; it's starting to smell like a timing issue. Is there an RA agent out there I could trust fairly blindly? I've read somebody saying one was included in the SCST bits, but I must have searched in the wrong place, since there is obviously nothing of the sort in my scst or iscsi-scst 2.2.0 directories...

Ah, I also had the surprise of discovering that the target's eui is an ASCII equivalent of the backend device file name, limited to 8 chars (quite logical for an eui)... This gave me some fun when changing my /dev/drbdX block devices to /drbd0/vol_drbd0, /drbd1/vol_drbd1. I scratched my head a little while... Oh well... :)

Thanks for your help!

Regards,

Pascal.

-----Original Message-----
From: Vladislav Bolkhovitin [mailto:vs...@vl...]
Sent: Tuesday, July 17, 2012 23:08
To: Pascal BERTON
Cc: scs...@li...
Subject: Re: [Scst-devel] SCST backend device activation problems : scst_translate_lun: FLAG SUSPENDED set, skipping

Hi,

It seems your config has some weird circular dependency, like: the backend devices never start working (never completing received requests) until the SCST config is done, while the SCST config can't finish because it is waiting for the devices to complete. Logs from the beginning of the failover can shed some light on this.

Vlad

Pascal BERTON, on 07/14/2012 08:26 AM wrote:
> Hi all!
>
> I'm currently facing weird problems with SCST, and after days of
> various experiments and observations, trying to isolate the problem as
> precisely as possible, I conclude that I now need a hand. Could
> somebody help me a bit with that?
>
> Basically, we're running a 2-node single-primary DRBD/Pacemaker
> cluster (kernel version 2.6.32-71.7.1, based on the Openfiler 2.99
> distro) hosting 4 DRBD resources, each presented to 4 VMware hosts
> (ESXi 4.1) using two SCST (vdisk_fileio) and iSCSI-SCST targets
> (version 2.0.0.1 at first, now 2.2.0, but the problem persists) per
> resource. Resources are spread over the 2 nodes, 3 active TB per node
> overall. The DRBD replication link is a dual 10GbE link bonded in LACP
> (mode 4). Volumes are hardware RAID5 made up of 9 15krpm 146 or 300GB
> SAS drives (I mean, disk IO performance doesn't seem to be the cause).
>
> Basically, the issue is: the cluster starts resources, the 4 DRBD
> primaries go up, then the 4 pairs of virtual IPs, then the SCST
> services, and things run fine. Until you try to migrate resources back
> and forth. When you do that, it works once, twice, sometimes even 3
> times, but then you can see DRBD promoted correctly, then the IPs wake
> up, but the SCST resource remains stuck down, running into a timeout
> after the configured 60s. At that moment, everything fails back to its
> former place, as it should. If you try again, same story.
> In the end, you obtain a cluster where the resource is stuck on a node,
> unable to fail over either manually or, more embarrassingly, following
> a node crash (which we inevitably faced recently, thanks Mr. Murphy).
>
> After digging through the various logs, what I see is:
>
> - DRBD does its job 100% correctly.
>
> - Pacemaker seems to do its job, with the resources it has, in the
> state they are in. (I mean, the errors it mentions look like normal
> errors in the global failing context.)
>
> - SCST starts its job, but hangs in the device handling section.
> (BTW, my RA agent uses the sysfs interface and is based on Patrick
> Zwahlen's implementation, which I customized a bit, mostly to add more
> friendly tracing and also to invert the order of activation: iSCSI
> target first, then the backend device, instead of the reverse,
> although I now doubt it has a real impact.) Basically, all the iSCSI
> target setup stuff runs fine, but then:
>
> o Either it hangs on backend device creation,
>
> o Or it hangs on LUN 0 assignment.
>
> From that point on, it hangs until the configured start timeout, and
> then everybody goes back home. However, the backend device that refused
> to get created correctly has been created and remains, possibly along
> with the target directory and even sometimes the LUN 0 directory in it.
> From that point on, it turns into a good mess; in fact the problems
> start here. After that, any migration attempt is doomed to failure! If
> I reboot the node, it will accept a couple of migrations again, and
> then fail again in the same manner.
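[Editor's note: for readers following the sysfs steps discussed in this thread, here is a minimal sketch of the start sequence an RA agent like the one described might perform. This is not Patrick Zwahlen's actual agent; the sysfs paths follow the SCST 2.x sysfs layout, and the device name, file path, and IQN are made-up examples.]

```shell
#!/bin/sh
# Hedged sketch of an SCST "start" sequence over sysfs (SCST 2.x layout).
# Names below (vol_drbd0, /drbd0/vol_drbd0, the IQN) are hypothetical.
set -e
SCST=/sys/kernel/scst_tgt
DEV=vol_drbd0
IQN=iqn.2012-07.example:drbd0

# 1. Create the fileio backend device (a regular file, not /dev/drbdX).
echo "add_device $DEV filename=/drbd0/vol_drbd0; nv_cache=1" \
    > "$SCST/handlers/vdisk_fileio/mgmt"

# 2. Assign the device as LUN 0 of the iSCSI target.
echo "add $DEV 0" > "$SCST/targets/iscsi/$IQN/luns/mgmt"

# 3. Optionally cap the per-target queue depth, as experimented with above.
echo 16 > "$SCST/targets/iscsi/$IQN/QueuedCommands"

# 4. Enable the target, then the iSCSI-SCST driver itself.
echo 1 > "$SCST/targets/iscsi/$IQN/enabled"
echo 1 > "$SCST/targets/iscsi/enabled"
```

A matching "stop" action would undo these steps in reverse order (disable the target, remove the LUN, delete the device); if the DRBD resource under the backing file is not yet Primary when step 1 runs, the device creation can block, which is consistent with the circular-dependency hypothesis in Vlad's reply.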