From: Vladislav B. <vs...@vl...> - 2016-08-25 23:43:07
Consus wrote on 08/25/2016 01:42 AM:
> On 20:30 Tue 23 Feb, Vladislav Bolkhovitin wrote:
>> Hi,
>>
>> Recently I reviewed SCST ALUA code and figured out that some commands, e.g. TEST UNIT
>> READY, were allowed in ALUA states, namely, "standby" and "offline" states, where they
>> by SPC(-4) rules are not supposed to be allowed. This is not a big deal and formally
>> not a violation, because according to the SPC "The device server may support other
>> commands while in the [those] state[s]", but potentially might lead to bad initiators
>> confusions leading to initiators start sending data access commands in those states,
>> then got deadly confused by seeing them refused (they are not allowed in those states).
>> It could happen, because SUCCESS reply on TEST UNIT READY by definition of this command
>> means that this device is fully ready, including READ/WRITE commands.
>>
>> So, if you are using SCST with "standby" or "offline" ALUA states, we would greatly
>> appreciate, if you try the latest trunk and check that it continues working with all
>> your initiators.
>>
>> Thanks,
>> Vlad
>
> I can confirm Bart's doubts: your patch really breaks failover process
> for ESXI (6.x at least). I use SCST A/P setup with ESXI and my failover
> process looks like this:
>
> node0 (Active)
> node1 (Passive)
> nodex (ESXI)
>
> node1 Passive -> Transitioning
> node0 Active -> Transitioning
> node0 Transitioning -> Standby
> node1 Transitioning -> Active
>
> This process breaks with an I/O error when nodex receives 'TP in Standby
> state' sense. I tried merging node0's 'Active -> Transitioning ->
> Standby' into 'Active -> Standby', no effect. Also your patch
>
> Delay SCSI commands in TRANSITIONING ALUA state to ease initiators load
>
> breaks ESXI failover for yet unknown reason.

Are you using Transitioning ALUA state? Are you sure that your initiator (ESXI) supports it?
For instance, Linux at the moment does not support it: it does not handle the ASYMMETRIC STATE TRANSITION sense and treats it as a regular failure. It sounds like ESXI similarly does not support it, so you should simply avoid this state.

As you can see in the README, the recommended way to change ALUA states is to keep commands blocked until the transition to either the Standby or Active state is done. If you unblock them in between, you must first make sure that your initiator will be fine with that.

Thanks,
Vlad