#119 Shipping alternative to -stop-after-tool-error

Status: open

Owner: nobody

Labels: Evaluator (39)

Priority: 5

Updated: 2005-12-15

Created: 2005-12-15

Creator: Kenneth C. Schalk

Private: No

Some people would prefer to have the state of a failed
tool run made available outside the volatile tree for
post-mortem debugging. In other words, rather than
having to examine the state of a failed tool execution
directly in the volatile tree, some users would prefer
to have an option similar to shipping which would allow
them to get a copy of the state of the tool that failed.

The desire for this has to do with:

1. Some tools which write log files with additional
information which may be needed to understand what went
wrong

2. Cases which require examining intermediate files
(the results of previous tools which are inputs to the
failing tool) in order to understand what went wrong

3. Migrating a non-Vesta build flow into Vesta and
giving users an equivalent level of visibility into the
state of the build at the point of a failure

There seem to be several things a user might want:

- The state of the tool when it started (i.e. ./root
given to _run_tool, or what would be found in the
volatile directory with -stop-before-tool).

- The state when the tool exits (i.e. exactly what
would be found in the volatile directory with
-stop-after-tool).

- Just the modifications made by the tool (i.e. the
"root" sub-value within the result of _run_tool). The
tricky part of this one is deletions, which are
represented by the value FALSE in the result of
_run_tool. I think the best way to handle this is to
write a file with a listing of the files deleted by the
tool.

- The command line and environment variables used for
the tool. These should probably also be written to
separate files.
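
As a rough illustration of the last two items, here is a minimal Python sketch of how the changes, the deletion listing, the command line and the environment could be laid out on disk. It is not the evaluator's code: the nested-dict model of the "root" sub-value (dict for a directory, bytes for file contents, False standing in for FALSE), the function name ship_tool_changes and the output file names deleted_files.txt, command_line.txt and environment.txt are all assumptions made for the example.

```python
import os


def ship_tool_changes(root, dest, command, env):
    """Write a failed tool's changes under dest and return the deleted paths.

    root    -- hypothetical model of the "root" sub-value of a _run_tool
               result: dict = directory, bytes = file contents,
               False = the tool deleted this name
    dest    -- directory to ship into (created if missing)
    command -- list of command-line words
    env     -- dict of environment variables
    """
    deleted = []

    def walk(node, parts):
        for name, value in sorted(node.items()):
            here = parts + (name,)
            if value is False:
                # A deletion cannot be shipped as a file, so record it.
                deleted.append("/".join(here))
            elif isinstance(value, dict):
                os.makedirs(os.path.join(dest, "root", *here), exist_ok=True)
                walk(value, here)
            else:
                with open(os.path.join(dest, "root", *here), "wb") as f:
                    f.write(value)

    os.makedirs(os.path.join(dest, "root"), exist_ok=True)
    walk(root, ())

    # Hypothetical file names; one item per line in each file.
    with open(os.path.join(dest, "deleted_files.txt"), "w") as f:
        for path in deleted:
            f.write(path + "\n")
    with open(os.path.join(dest, "command_line.txt"), "w") as f:
        for word in command:
            f.write(word + "\n")
    with open(os.path.join(dest, "environment.txt"), "w") as f:
        for key in sorted(env):
            f.write(f"{key}={env[key]}\n")
    return deleted
```

Keeping the deletions in a separate listing means the shipped "root" tree contains only real files, so it can be compared against the tool's starting state with ordinary tools.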

Discussion

Irina - 2005-12-28

Those changes are implemented in
/vesta/vestasys.org/vesta/eval/73.ShipFailedToolState

Kenneth C. Schalk - 2006-02-16

After a few iterations and some testing, we've come to the
conclusion that this feature is more complicated than we
first expected.

A major issue currently is that it's shipping the files as
vadmin rather than as the user that invoked the evaluator.
Normally the evaluator drops its vadmin privileges (given
by a setuid bit on the executable) before shipping, but
shipping a failed tool's state happens in the middle of the
build, possibly while other threads are still working.
Fixing this requires making the
evaluator no longer be setuid vadmin. (I've wanted to do
this anyway, and will open a separate tracker entry for it.)

The pre-shipping cleaning feature is potentially problematic
with this feature. In some cases you would want it, but in
others you might prefer to have it create a new directory
for each new tool failure.

Dealing with multiple threads is also a problem, as there
could be more than one tool failing in parallel. The current
version in the branch creates directories with the thread
labels printed before the tool command lines, but those are
essentially random. We could use the _run_tool PK instead
(as that would stay the same across multiple evaluator
runs), but it would be a little ugly. However, that would
not necessarily be unique for an entire evaluator run, so
we would probably need to uniquify the directory names. One
possible solution would be making the choice of whether to
ship the tool's failed state interactive. (This could
introduce similar problems to the bug titled "Hang with
-stop-before/after-tool", so we want to fix that one first.)
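
To make the "uniquify" idea concrete, here is a small Python sketch of one possible naming scheme. It is only an illustration, not what the branch does: it assumes the PK is available as a printable string, and the class name FailureDirNamer and the ".2", ".3" suffix convention are hypothetical. A lock guards the counter because several tool failures may be reported from different threads at once.

```python
import threading
from collections import Counter


class FailureDirNamer:
    """Hand out directory names based on a _run_tool PK string.

    The first failure for a given PK gets the bare PK; later failures
    with the same PK in the same evaluator run get ".2", ".3", ...
    """

    def __init__(self):
        self._lock = threading.Lock()
        self._seen = Counter()

    def name_for(self, pk):
        # Serialize updates so two threads never receive the same name.
        with self._lock:
            self._seen[pk] += 1
            count = self._seen[pk]
        return pk if count == 1 else f"{pk}.{count}"
```

With this scheme the first failure for a given PK keeps the same directory name across evaluator runs, and only repeated failures within a single run pick up a suffix.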

Shipping the entire tool state (including the entire
filesystem before/after) may be prohibitive, which is why
there's a "-ship-failed-tool-state-from" option in the
branch. However, in some cases the user isn't so much
interested in extracting a single piece from the tool as
suppressing the shipping of some large and uninteresting
pieces. With normal shipping the assumption is that the
result will be in the cache so the user can just re-run the
evaluator to ship more sub-pieces of the result, but with
the state of a failed tool that's not the case. This
presents a thorny user-interface issue.
