[quattor-discuss] Rock solid test deployments and back-outs


(I sent this email on Wednesday, but it has been awaiting moderator approval ever since because it just slipped over 40 KB in size.  I'm resending it now in plain text in the hope that it will be smaller ...)

Hi,

I'd like to discuss how we get Aquilon and Quattor to a point where we have rock solid dry-runs and back-outs.  By which I mean:

   * Ability to do a test deploy of a sandbox: your changes are dispatched to all hosts in the target domain, not as a new live profile but as a test profile.  Each host uses the test profile to report back, in a precise, succinct and consistent fashion, the complete list of on-host changes that would have been made and the commands that would have been executed to make them.
   * A consistent, machine-executable log kept on-host of all changes made in a single deploy event, including a machine-executable list of all commands required for a successful rollback, that can also be displayed in a human-friendly format.
   * Ability to undo the deployment of a sandbox, which would include launching all of the rollback commands recorded in the aforementioned log.

I think the individual steps required to achieve such a lofty goal are numerous but not overly complex and definitely within our technical abilities.

Here are some of my thoughts on how we might achieve this, working from the bottom-up:

NCM components today are a mix of arbitrary Perl and CAF objects, like CAF::Process and CAF::FileWriter (any others?), which are mainly used for the actual execution of an individual change step.  We could invent two new objects: CAF::Evaluate, in which we can enclose arbitrary code that makes a change that cannot be expressed by any other CAF routine, and CAF::Executor, which is used to wrap a collection of these other CAF objects.  CAF::Executor also takes a human-friendly description of the change being made, as well as the steps required to undo the change.  CAF::Process will additionally need to capture the command needed to undo the change so that it can pass it on to CAF::Executor.  CAF::FileWriter can automatically ensure that CAF::Executor has an undo capability by backing up the file before it makes any changes.  If there is no way to undo a change, there should be a way to flag this up to the framework.

Here is an example visual representation of the objects in play for a component that wants to change a file and then send a HUP signal to a process:

CAF::Executor
    Change #1 => {
        execute => {
            object => CAF::FileWriter(<config_file>, <backup_file>)
            desc   => "Add <new_item> to <config_file>"
        }
        undo => {
            object => same CAF::FileWriter (restore <backup_file>)
            desc   => "Restore <backup_file> to <config_file>"
        }
    }
    Change #2 => {
        execute => {
            object => CAF::Process(["pkill", "-HUP", "<process>"])
            desc   => "Send HUP signal to <process>"
        }
        undo => {
            # same as execute
        }
    }
    Change #3 => {
        execute => {
            object => CAF::Evaluate(sub {<arbitrary code>})
            desc   => "Some custom description"
        }
        undo => {
            object => CAF::Evaluate(sub {<arbitrary code>})
            desc   => "Some custom reverse description"
            reverse => 1   # When undoing, changes marked with this flag are executed
                           # in reverse order
        }
    }
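To make the shape of that structure a little more concrete, here is a rough, language-neutral sketch in Python rather than Perl.  It is not CAF code: the names Step, Change and Executor, and the methods describe() and irreversible(), are all made up for illustration, and the only ideas taken from the proposal are that each change pairs an execute step with an optional undo step and that a missing undo can be flagged.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Optional


@dataclass
class Step:
    """One action plus the human-friendly text describing it (hypothetical)."""
    desc: str
    action: Callable[[], None]


@dataclass
class Change:
    """An execute step paired with an optional undo step (hypothetical)."""
    execute: Step
    undo: Optional[Step] = None  # None marks a change that cannot be undone

    @property
    def reversible(self) -> bool:
        return self.undo is not None


@dataclass
class Executor:
    """Ordered collection of changes; a stand-in for the proposed CAF::Executor."""
    changes: List[Change] = field(default_factory=list)

    def add(self, change: Change) -> None:
        self.changes.append(change)

    def describe(self) -> List[str]:
        """Execute descriptions in the order the changes were added."""
        return [c.execute.desc for c in self.changes]

    def irreversible(self) -> List[str]:
        """Descriptions of changes with no undo step, to flag to the framework."""
        return [c.execute.desc for c in self.changes if not c.reversible]
```

A component would add its changes in order, and the framework could then ask for describe() to report the plan, or irreversible() to find out which changes cannot be backed out.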

When run in NoAction mode, the CAF::Executor object can be used to provide a much more consistent description of all tasks that the component would have undertaken.  When run in live mode, the CAF::Executor object executes the changes in order, recording to a file on disk the exact steps taken and the exact steps required to undo the change.  A new tool can be provided for interrogating the contents of this file.
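As a minimal sketch of the two modes, assuming each change is reduced to a description, a command to execute and an optional command to undo it: the function name run_changes, the record fields and the JSON-lines file format are my own assumptions, not an existing Quattor format.  The sketch is in Python purely for illustration.

```python
import json
import subprocess
import time
from typing import Dict, List


def run_changes(changes: List[Dict], log_path: str, noaction: bool = True,
                runner=subprocess.run) -> List[str]:
    """Report the changes; in live mode also run them and log how to undo them."""
    report = []
    for number, change in enumerate(changes, start=1):
        report.append(f"{number}. {change['desc']}: {' '.join(change['execute'])}")
        if noaction:
            continue  # dry run: describe only, touch nothing
        runner(change["execute"], check=True)
        record = {
            "time": time.time(),
            "desc": change["desc"],
            "execute": change["execute"],
            "undo": change.get("undo"),  # None means no rollback is known
        }
        # One JSON object per line, appended only after the step succeeded.
        with open(log_path, "a", encoding="utf-8") as log:
            log.write(json.dumps(record) + "\n")
    return report
```

Both modes return the same report, which is what would make the dry-run output consistent with what a live run actually does.  The log is only written in live mode, so it records exactly the steps that were taken.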

Then we come to the problem of how to handle a test deployment from the perspective of an individual host.  The ncm-cdispd process needs to accept a new type of profile: a test profile.  It stores a test profile in a different location from the live profile; a test profile never becomes a live profile, and it can be deleted once the process has finished.  On receipt of a test profile, ncm-cdispd launches ncm-ncd -configure -noaction -all.  The data created by the new CAF::Executor objects can then be pushed back to the broker.  I don't believe we have any kind of push-back mechanism today, so we would have to think carefully about this aspect.  Ultimately the broker (or some other server we designate) needs to receive data from all affected hosts stating what would have changed and in what order.  This is a lot of data, and could make a server pretty busy receiving it ...
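The life cycle of a test profile on a host could look roughly like the sketch below, written in Python for illustration only.  The function handle_test_profile and its run callback are hypothetical; in particular, how ncm-ncd would be pointed at the test profile is left open here, so the callback simply receives the command quoted above and the path of the stored profile.

```python
import os
import shutil
import tempfile
from typing import Callable, List

# The dry-run command as quoted in the text above.
NOACTION_CMD = ["ncm-ncd", "-configure", "-noaction", "-all"]


def handle_test_profile(profile_text: str,
                        run: Callable[[List[str], str], str]) -> str:
    """Store a test profile away from the live one, dry-run it, then delete it."""
    # A fresh temporary directory stands in for the separate storage location.
    test_dir = tempfile.mkdtemp(prefix="test-profile-")
    path = os.path.join(test_dir, "profile")
    try:
        with open(path, "w", encoding="utf-8") as fh:
            fh.write(profile_text)
        # `run` is a hypothetical hook; its return value is the report that
        # would be pushed back to the broker.
        return run(NOACTION_CMD, path)
    finally:
        # The test profile is removed whether or not the dry run succeeded.
        shutil.rmtree(test_dir, ignore_errors=True)
```

Because the profile only ever lives in its own directory and is removed in the finally clause, it cannot be picked up as a live profile by accident.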

The ncm-cdispd process will also send back the same data on a live run, so there is a central record of what was changed by a live deployment and how to undo that.

The ncm-cdispd process must also be able to receive a request from the broker to undo changes up to a particular date, or to undo a number of changes.  The data written by CAF::Executor is used to undo those changes, and the host profile that existed prior to those changes then becomes live.  The component code does not decide on the undo steps; they are determined by the data written when the change was made.  This means that you can safely ship a new version of a component without breaking the ability to back it out.  If a component needs to fix undo logic that was previously incorrect, we could provide an API for it to do this in the installer when the component is updated.
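Selecting what to undo could be sketched as follows, again in Python for illustration.  I am assuming that each logged record carries a timestamp, a description and the undo command captured when the change was made, that "up to a particular date" means undoing everything made after that date, and that the most recent change is undone first.  The function name select_undo and the record fields are hypothetical.

```python
from typing import Dict, List, Optional


def select_undo(records: List[Dict], since: Optional[float] = None,
                count: Optional[int] = None) -> List[List[str]]:
    """Return the recorded undo commands to run, most recent change first."""
    newest_first = sorted(records, key=lambda r: r["time"], reverse=True)
    if since is not None:
        # Keep only changes made after the requested point in time.
        newest_first = [r for r in newest_first if r["time"] > since]
    if count is not None:
        # Keep only the N most recent changes.
        newest_first = newest_first[:count]
    commands = []
    for record in newest_first:
        if record["undo"] is None:
            # Refuse to produce a partial rollback plan.
            raise ValueError(f"no recorded undo for: {record['desc']}")
        commands.append(record["undo"])
    return commands
```

Note that the function only reads the recorded data and never calls into component code, which is the property that lets a newer component version be backed out safely.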

I think in principle this is how it could work.  There's some tidying up of details to be done, and a bit of polish to be added, but what do people think?  Has anyone else put work into something like this before?  Do others see the need for this too?  Is my suggestion reasonably sane or totally insane?

Thanks,
Mark.
