Re: [Scst-devel] Sharing persistent reservation data between nodes running SCST (High Availability)


On Wed, 04/23/2014 10:50 AM, Richard Sharpe <rea...@gm...> wrote:
> On Wed, Apr 23, 2014 at 7:29 AM, Errol Neal <en...@bu...> wrote:
> > I think this topic has been brought up before, but I'm wondering if there has been any thought
> > or work, or if there is even the desire to enhance SCST in this regard. Or is this a premium feature that
> > Fusion-IO has developed for a soon-to-be-released commercial product?
> 
> Interesting that you should ask that.
> 
> One question I have about the semantics of this is:
> 
> 1. Suppose you have an implementation of distributed persistent reservations.
> 
> 2. When a PERSISTENT_RESERVE_OUT arrives with an action that will
> affect IOs (perhaps even the registration of a key does that), we
> clearly have to hold all IOs that arrive after that point on any node
> until the new persistent reservation state has been communicated to
> all nodes in the, ahhh, cluster.
> 
> 3. However, what about requests being processed on other nodes at the
> time? Do we have to ensure they complete before the new state is
> propagated? Writes are usually written to NVRAM and acknowledged
> quickly, as long as there is space in NVRAM, but what about READs that had
> to go all the way to storage and could not be served out of cache?
> 
> 

Richard,

I would think that some grace period would be given for all in-flight IOs to complete whenever an action would affect them, but I see your point. A lot of factors are involved.
That said, for my application, I can't really see this being a problem unless I'm missing something (which is very possible, as Errol Neal <<--- Novice).

1. A maximum of two SCST nodes
2. For a given LUN or service, only one node may have its target ports in a non-standby state
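Richard's steps 2 and 3, together with the grace period suggested above, describe a fence around each PR state change: hold newly arriving IOs, give in-flight IOs a bounded time to finish, then commit the new state. Here is a minimal single-node sketch in Python of that idea. The `PrFence` class and its methods are hypothetical, not SCST's API, and pushing the state to the peer node is only a placeholder inside `commit()`.

```python
import threading


class PrFence:
    """Hypothetical per-LUN fence: holds new IOs while a PR state change is
    pending and drains in-flight IOs (within a grace period) before commit."""

    def __init__(self):
        self._cond = threading.Condition()
        self._in_flight = 0
        self._change_pending = False
        self.pr_generation = 0  # bumped on every committed PR change

    def begin_io(self):
        # Step 2: IOs arriving while a change is pending are held here.
        with self._cond:
            self._cond.wait_for(lambda: not self._change_pending)
            self._in_flight += 1

    def end_io(self):
        with self._cond:
            self._in_flight -= 1
            self._cond.notify_all()

    def apply_pr_change(self, commit, grace_period):
        """Step 3: wait up to grace_period seconds for in-flight IOs to
        finish, then call commit(). Returns False without committing if the
        IOs did not drain in time."""
        with self._cond:
            # Serialize PR changes: only one may be pending at a time.
            self._cond.wait_for(lambda: not self._change_pending)
            self._change_pending = True
            try:
                if not self._cond.wait_for(lambda: self._in_flight == 0,
                                           timeout=grace_period):
                    return False
                commit()  # placeholder: update local PR table, push to peer
                self.pr_generation += 1
                return True
            finally:
                # Release any IOs that were held in begin_io().
                self._change_pending = False
                self._cond.notify_all()
```

A False return leaves the hard part open (fail the PR OUT command, abort the stragglers, or keep waiting). That is the same open question as Richard's slow READs, and this sketch does not settle it.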

In my configuration, writes and reads will only hit a single target port. Since the initiator doesn't know that the device server is actually two distinct systems, I'm concerned about section 5.8.2.4.4 of SPC-3 rev. 23:

       "When being accessed through a target port in the standby target
        port asymmetric access state, the device server shall support
        those of the following commands that it supports while in the
        active/optimized target port asymmetric access state:
        ...
        l) PERSISTENT RESERVE IN;
        m) PERSISTENT RESERVE OUT;"
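In other words, the standby port can't just reject PR commands: it has to service them, and it has to give the same answers as the active port. A rough sketch of the standby-side check follows. The `supported_when_active` set is a hypothetical input. Only the two opcodes from the excerpt are listed: PERSISTENT RESERVE IN is 5Eh and PERSISTENT RESERVE OUT is 5Fh. The rest of the SPC-3 list, elided above, is left out on purpose.

```python
PERSISTENT_RESERVE_IN = 0x5E
PERSISTENT_RESERVE_OUT = 0x5F

# Only the two commands quoted above; the rest of the SPC-3 list is omitted.
STANDBY_ALLOWED = {PERSISTENT_RESERVE_IN, PERSISTENT_RESERVE_OUT}


def allowed_in_standby(opcode, supported_when_active):
    """True if a standby port must process this command: it is on the
    standby list AND the device server supports it when active/optimized."""
    return opcode in STANDBY_ALLOWED and opcode in supported_when_active
```

Since both nodes' ports answer these commands, they have to read from the same PR state (or a synchronously replicated copy). Otherwise an initiator querying the standby path could see different keys than it sees on the active path.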

For example, take MS clustering in the most basic configuration (e.g. no MPIO). I'd have two logical paths, and the reservation would be made over one of them (I'm assuming the preferred path). If a fault occurs on the device server behind the active/preferred target port, magic happens internally: the initiator probably loses the path to the faulty device server, and the remaining node's target port gets upgraded from standby to active/non-optimized, or something like that.
Does a reservation error occur on the MS clustering side?
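Before testing against a real cluster, a toy model can show what the surviving node would report. Each node keeps its own PR table, and the only variable is whether node A's updates are copied to node B. The `Node` class and `failover_view` function are hypothetical. The model also ignores I_T nexus details and says nothing about how MS clustering itself reacts, so the real test is still needed.

```python
class Node:
    """Toy device-server node with its own PR table (hypothetical)."""

    def __init__(self, name):
        self.name = name
        self.keys = set()   # registered reservation keys
        self.holder = None  # key of the current reservation holder, if any
        self.peer = None    # set to enable replication to the other node

    def register(self, key):
        self.keys.add(key)
        self._replicate()

    def reserve(self, key):
        if key not in self.keys:
            raise ValueError("key must be registered before reserving")
        self.holder = key
        self._replicate()

    def _replicate(self):
        # Synchronous copy to the peer; a real system must also handle
        # the peer being unreachable.
        if self.peer is not None:
            self.peer.keys = set(self.keys)
            self.peer.holder = self.holder

    def read_keys(self):
        return sorted(self.keys)


def failover_view(replicate):
    """Register and reserve via node A, then report what node B sees."""
    a, b = Node("A"), Node("B")
    if replicate:
        a.peer = b
    a.register(0x1111)
    a.register(0x2222)
    a.reserve(0x1111)
    # Node A fails; the initiator now reaches only node B's port.
    return b.read_keys(), b.holder
```

Without replication, node B reports no registrations and no reservation after the path switch. That mismatch is what the initiator side would likely trip over.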

Obviously I should be testing this, but I was still wondering what anyone thinks about these same [potential] challenges and possible solutions.