From: matthew p. <pat...@ya...> - 2015-05-01 12:56:03
I can't speak to this race condition, but why in the world would SCST be written in such a way that a LUN removal deliberately puts the whole system into a (temporary) inoperable state? Acquire a lock on the structure, flip a bit in the 'available/present LUN' mask, or otherwise signal the worker thread for that LUN that you need it to shut down and clean up. Commercial arrays have hard limits like 4096 total LUNs, so a 4K- or even 8K-bit mask each for PRESENT/DEFINED and another for SUSPENDED is hardly a memory hog.

Admittedly my background is running Tomcat applications, but we do message processing, and we have a main dispatcher thread that sends messages to threads that have registered themselves as responsible for handling a set of devices. Couldn't this signaling be accomplished with proprietary SCSI commands? There shouldn't be more than (real CPU cores / 2) - 1 kernel threads dedicated as storage engines (hyperthreads shouldn't count), with LUN tasking defined either simply, or based on configuration directives, or even on some "long-term" trend observation. Each engine is signaled with its assignment at initialization and periodically as system-wide changes occur. In the dynamic case, each update would include an ASSIGNED and an ACTIVE/SUSPEND mask. This is basically what Akamai uses in their self-regulating, self-healing load balancers.

If each SCSI Target thread (really more of a protocol handler and listener) has an inbox and an outbox (say, a 256-entry ring buffer per assigned LUN), then a Target acquires a lock on the LUN inbox's 'tail' structure, tacks on messages, updates the tail pointer, and releases. Storage engine threads (children of the Target) meanwhile process messages from the head. ACKs and response codes get written back to the Target's outbox and are recomposed into iSCSI, SAS, or FC packets back to the Initiator. To me this would end the problem of SCST-wide stalls because of misbehaving LUNs.
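The bitmask idea from the start of the post could look something like the following. This is only a user-space sketch of the concept (the names `lun_table`, `lun_remove`, etc. are my own, not SCST's): each mask is 4096 bits = 512 bytes, and removing a LUN only flips bits under a short-lived lock instead of suspending the whole target.

```c
#include <pthread.h>
#include <stdint.h>

#define MAX_LUNS 4096
#define MASK_WORDS (MAX_LUNS / 64)

/* Hypothetical LUN state table: one bit per LUN in each mask. */
struct lun_table {
    pthread_mutex_t lock;
    uint64_t present[MASK_WORDS];    /* LUN is defined/exported   */
    uint64_t suspended[MASK_WORDS];  /* LUN is temporarily offline */
};

static void lun_set(uint64_t *mask, unsigned lun)
{
    mask[lun / 64] |= (1ULL << (lun % 64));
}

static void lun_clear(uint64_t *mask, unsigned lun)
{
    mask[lun / 64] &= ~(1ULL << (lun % 64));
}

static int lun_test(const uint64_t *mask, unsigned lun)
{
    return (int)((mask[lun / 64] >> (lun % 64)) & 1);
}

/* Remove a LUN: mark it suspended and no longer present, under the
 * table lock.  Only commands addressed to this LUN see the change;
 * everything else keeps running.  Signaling the per-LUN worker to
 * drain and exit would follow here. */
static void lun_remove(struct lun_table *t, unsigned lun)
{
    pthread_mutex_lock(&t->lock);
    lun_set(t->suspended, lun);
    lun_clear(t->present, lun);
    pthread_mutex_unlock(&t->lock);
}
```

Two 4096-bit masks cost 1 KiB total, which backs up the "hardly a memory hog" point.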
If the Target threads are unable to append to a LUN's ring buffer, they issue the queue-full/backoff message, or even RESET and FAIL, as the protocol dictates.
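The per-LUN inbox described above could be sketched roughly as follows. This is my own illustrative code, not SCST's: a single Target thread produces into the tail under a lock, a single storage-engine thread consumes from the head, and a full ring is reported back to the caller so it can return QUEUE FULL to the initiator. (A real cross-thread implementation would also need memory barriers or atomics on `head`/`tail`; that is elided here.)

```c
#include <pthread.h>
#include <stdint.h>

#define RING_SLOTS 256   /* per-LUN inbox depth, as suggested in the post */

/* Hypothetical fixed-size message; a real target would carry a full
 * command descriptor here. */
struct msg {
    uint64_t tag;
    uint8_t  cdb[16];
};

/* Single-producer (Target thread) / single-consumer (storage engine)
 * ring.  Only the tail is locked, per the post: the Target locks the
 * tail to append; the engine consumes from the head. */
struct lun_ring {
    pthread_mutex_t tail_lock;
    unsigned head, tail;          /* free-running; wrap via modulo */
    struct msg slots[RING_SLOTS];
};

/* Returns 0 on success, -1 if the ring is full: the caller should then
 * report QUEUE FULL / TASK SET FULL back to the initiator. */
static int ring_push(struct lun_ring *r, const struct msg *m)
{
    int rc = -1;
    pthread_mutex_lock(&r->tail_lock);
    if (r->tail - r->head < RING_SLOTS) {   /* unsigned-wrap safe */
        r->slots[r->tail % RING_SLOTS] = *m;
        r->tail++;
        rc = 0;
    }
    pthread_mutex_unlock(&r->tail_lock);
    return rc;
}

/* Consumer side: returns 0 and fills *m, or -1 if the ring is empty. */
static int ring_pop(struct lun_ring *r, struct msg *m)
{
    if (r->head == r->tail)
        return -1;
    *m = r->slots[r->head % RING_SLOTS];
    r->head++;
    return 0;
}
```

The key property is that a stalled consumer only ever fills its own LUN's ring: `ring_push` fails for that one LUN, the protocol-level backoff fires, and every other LUN keeps flowing.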