#119 Shipping alternative to -stop-after-tool-error

open
nobody
Evaluator (39)
5
2005-12-15
2005-12-15
No

Some people would prefer to have the state of a failed
tool run made available outside the volatile tree for
post-mortem debugging. Rather than examining the failed
tool's state directly in the volatile tree, they would
like an option, similar to shipping, that gives them a
copy of that state.

The desire for this has to do with:

1. Some tools write log files with additional
information that may be needed to understand what went
wrong

2. Cases which require examining intermediate files
(the results of previous tools which are inputs to the
failing tool) in order to understand what went wrong

3. Migrating a non-Vesta build flow into Vesta and
giving users an equivalent level of visibility into the
state of the build at the point of a failure

There seem to be several things a user might want:

- The state of the tool when it started (i.e. ./root
given to _run_tool, or what would be found in the
volatile directory with -stop-before-tool).

- The state when the tool exits (i.e. exactly what
would be found in the volatile directory with
-stop-after-tool).

- Just the modifications made by the tool (i.e. the
"root" sub-value within the result of _run_tool). The
tricky part of this one is deletions, which are
represented by the value FALSE in the result of
_run_tool. I think the best way to handle this is to
write a file listing the files deleted by the tool.

- The command line and environment variables used for
the tool. These should probably also be written to
separate files.
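
The deleted-files listing could be produced by walking the
"root" sub-value and collecting every path whose value is
FALSE. The following is a minimal sketch only: it assumes
the binding has already been converted to a nested Python
dict in which a deletion appears as False. The function name
deleted_paths and the example data are hypothetical and are
not part of the evaluator.

```python
def deleted_paths(root, prefix=""):
    """Return the sorted paths marked as deleted in a nested dict.

    Assumption: `root` models the "root" sub-value of a _run_tool
    result, with sub-directories as dicts and deletions as False.
    """
    found = []
    for name, value in root.items():
        path = f"{prefix}/{name}" if prefix else name
        if value is False:
            # FALSE marks a file or directory the tool deleted.
            found.append(path)
        elif isinstance(value, dict):
            # Recurse into a sub-directory.
            found.extend(deleted_paths(value, path))
    return sorted(found)


# Hypothetical example data, not taken from a real build.
example_root = {
    "tmp": {"scratch.o": False, "out.log": "log text"},
    "core": False,
}

if __name__ == "__main__":
    # One path per line, as it might be written to the listing file.
    print("\n".join(deleted_paths(example_root)))
```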

Discussion

  • Irina

    Irina - 2005-12-28

    Those changes are implemented in
    /vesta/vestasys.org/vesta/eval/73.ShipFailedToolState

     
  • Kenneth C. Schalk

    After a few iterations and some testing, we've come to the
    conclusion that this feature is more complicated than we
    first expected.

    A major issue currently is that it's shipping the files as
    vadmin rather than as the user that invoked the evaluator.
    Normally the evaluator drops its vadmin privileges (given
    by a setuid bit on the executable) before shipping, but this
    happens in the middle of the build, possibly while other
    threads are still working. Fixing this requires making the
    evaluator no longer be setuid vadmin. (I've wanted to do
    this anyway, and will open a separate tracker entry for it.)

    The pre-shipping cleaning feature potentially conflicts
    with this one. In some cases you would want it, but in
    others you might prefer to have it create a new directory
    for each new tool failure.

    Dealing with multiple threads is also a problem, as there
    could be more than one tool failing in parallel. The current
    version in the branch creates directories with the thread
    labels printed before the tool command lines, but those are
    essentially random. We could use the _run_tool PK instead
    (as that would stay the same across multiple evaluator
    runs), but it would be a little ugly. However, that would
    not necessarily be unique for an entire evaluator run, so
    we would probably need to uniquify the directory names. One
    possible solution would be making the choice of whether to
    ship the tool's failed state interactive. (This could
    introduce similar problems to the bug titled "Hang with
    -stop-before/after-tool", so we want to fix that one first.)
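
A minimal sketch of the uniquifying idea, assuming the
_run_tool PK is available as a hex string and that failures
may be reported from several threads at once. The class name
FailedToolDirNamer and the ".N" suffix scheme are
hypothetical; this is not what the branch does.

```python
import threading


class FailedToolDirNamer:
    """Hand out one directory name per tool failure.

    The first failure for a PK gets the bare PK, so the name is
    stable across evaluator runs. Later failures with the same PK
    in the same run get a numeric suffix.
    """

    def __init__(self):
        self._lock = threading.Lock()
        self._counts = {}

    def name_for(self, pk_hex):
        # The lock keeps the count consistent when tools fail in parallel.
        with self._lock:
            count = self._counts.get(pk_hex, 0)
            self._counts[pk_hex] = count + 1
        return pk_hex if count == 0 else f"{pk_hex}.{count}"
```

Because the suffix contains a ".", which cannot occur in a hex
string, a suffixed name cannot collide with the bare name of
another PK.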

    Shipping the entire tool state (including the entire
    filesystem before/after) may be prohibitive, which is why
    there's a "-ship-failed-tool-state-from" option in the
    branch. However, in some cases the user isn't so much
    interested in extracting a single piece from the tool as
    suppressing the shipping of some large and uninteresting
    pieces. With normal shipping the assumption is that the
    result will be in the cache so the user can just re-run the
    evaluator to ship more sub-pieces of the result, but with
    the state of a failed tool that's not the case. This
    presents a thorny user-interface issue.
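
One way to picture the two needs side by side is a path
filter that supports both "ship only this piece" and "ship
everything except these pieces". This is a sketch under
stated assumptions: the ship_from parameter is only a guess
at what "-ship-failed-tool-state-from" selects, and the
exclude patterns are a hypothetical addition that does not
exist in the branch.

```python
from fnmatch import fnmatchcase


def select_paths(paths, ship_from="", exclude=()):
    """Return the paths that would be shipped, in input order.

    ship_from: keep only this path and everything beneath it
               (assumed meaning; an empty string keeps everything).
    exclude:   shell-style patterns for pieces to suppress
               (hypothetical option).
    """
    base = ship_from.rstrip("/")
    selected = []
    for path in paths:
        if base and path != base and not path.startswith(base + "/"):
            continue  # outside the requested piece
        if any(fnmatchcase(path, pattern) for pattern in exclude):
            continue  # large or uninteresting piece
        selected.append(path)
    return selected
```

For example, select_paths(paths, ship_from="root/.WD",
exclude=["*.trace"]) would keep the working directory but
drop trace files from it.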

     
