#107 Reproducibility violation with fatal unused code

Status: open
Owner: nobody
Group: Evaluator (37)
Priority: 5
Created: 2005-10-12
Updated: 2005-10-12
Creator: Anonymous
Private: No

Say foo() calls bar(), but foo()'s result does not depend
upon bar()'s result. Then foo() has no dependence on the
function body of bar(), so if bar() is changed, foo() can
still get a cache hit. However, a fatal bug may be
introduced into bar(), and foo()'s cache hit would hide
that fatal bug. A developer could thus introduce a bug
but still build successfully. The bug would only surface
once the cache is flushed (likely after the developer
released a seemingly functional build).

Fundamentally, the issue is that the fatality of a function
is not considered to be part of its result. And the cache
does not record dependencies upon a function's fatality.
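The scenario above can be made concrete with a minimal sketch in Python (not Vesta SDL; all names here are invented for illustration). A result cache keyed only on recorded dependencies will happily serve a stale hit even after the called-but-unused function has been fatally broken:

```python
# Hypothetical sketch of a dependency-keyed result cache hiding a fatal bug.

cache = {}  # maps (function name, recorded dependency values) -> result

def cached_call(name, fn, deps):
    """Return a cached result keyed only on the dependencies recorded as
    affecting the result -- not on everything the call actually did."""
    key = (name, deps)
    if key in cache:
        return cache[key]
    result = fn()
    cache[key] = result
    return result

def bar_v1():
    return "side effect only"

def bar_v2():
    raise RuntimeError("fatal bug introduced into bar()")

def foo(bar):
    bar()          # called, but the result below does not depend on it,
    return 1       # so no dependency on bar()'s body is recorded

# First build: bar_v1 works; result 1 is cached under foo's (empty) deps.
r1 = cached_call("foo", lambda: foo(bar_v1), deps=())
# Second build: bar_v2 is fatally broken, but the cache key is unchanged,
# so the hit masks the error and the "build" still succeeds.
r2 = cached_call("foo", lambda: foo(bar_v2), deps=())
assert r1 == r2 == 1

# Only after flushing the cache does the fatal bug finally surface.
cache.clear()
try:
    cached_call("foo", lambda: foo(bar_v2), deps=())
    bug_hidden = True
except RuntimeError:
    bug_hidden = False
assert bug_hidden is False
```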

Ken has prepared a reproducer, included inline below.

It might at first appear sufficient to record dependencies
upon all function bodies that are called, regardless of
any dependency upon their results, but, as Ken pointed
out, this is not sufficient. The secondary inputs of foo()
can affect the code flow, and therefore whether bar() is
called. A cache entry can be created for a foo() call
which does not call bar() and has no knowledge that
bar() has any potential to be called. There is no easy
way to record the dependence on the secondary input
which affected the code flow.
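To illustrate why recording every called body still fails, here is a hedged Python sketch (hypothetical names). A secondary input steers whether bar() is reached at all, so a cache entry created on the bar()-free path records nothing about bar():

```python
# Sketch: a cache entry created on a code path that never calls bar()
# cannot know that bar() might be called on another path.

cache = {}

def foo(primary, secondary, bar):
    # The cache key records only the primary input (the recorded dep);
    # the secondary input is not part of the key.
    key = ("foo", primary)
    if key in cache:
        return cache[key]
    if secondary:          # secondary input steers the code flow...
        bar()              # ...and decides whether bar() is reached
    result = primary * 2
    cache[key] = result
    return result

def broken_bar():
    raise RuntimeError("fatal")

# Entry created on the path that never calls bar().
assert foo(21, False, broken_bar) == 42
# A later call with secondary=True hits the same cache entry and never
# executes (or even sees) the broken bar().
assert foo(21, True, broken_bar) == 42
```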

One solution might be to defer the fatality of a function:
return a "fatal" indication as the function result, and
propagate this through the evaluation. Only if the build
result depends on the failing function's return value
would the build fail. Unfortunately, this approach is
nearly impossible to implement. The evaluator's
determination of what the build depends on is
conservative, meaning it might record more
dependencies than are necessary, or a coarser-grained
dependency than necessary. This leaves open the
possibility that a call of foo() which does call bar() is
"conservatively" detected as being dependent on
bar(), and therefore can be fatal based on bar(), when it
is not truly dependent. In this case, a cache entry could
exist for a foo() call which does not call bar() and hits for
this falsely-dependent-and-therefore-fatal build. So, the
evaluator's dependence analysis cannot be used to
determine the fatality of a bug. (At least that's the
conclusion I got to.)
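The deferred-fatality mechanics themselves are simple to sketch (again in Python, with invented names; the hard part described above is the dependence analysis, not this plumbing). A failing call yields a fatal value that propagates through operations that depend on it:

```python
# Sketch of deferred fatality: failures become values that propagate,
# and only results that depend on them carry the fatality.

class Fatal:
    """Error value carrying the original failure message."""
    def __init__(self, msg):
        self.msg = msg

def call(fn, *args):
    try:
        return fn(*args)
    except Exception as e:
        return Fatal(str(e))

def add(a, b):
    # A dependent operation propagates the fatal indication.
    if isinstance(a, Fatal):
        return a
    if isinstance(b, Fatal):
        return b
    return a + b

def bad():
    raise RuntimeError("fatal bug in bar()")

# Result depends on the failing call: the fatality propagates.
assert isinstance(add(call(bad), 1), Fatal)
# Result ignores the failing call: the build succeeds anyway.
unused = call(bad)
assert add(1, 2) == 3
```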

It would seem the only feasible solution is to remove
fatality from the build process altogether. This means
not just failing _run_tool() invocations, but any SDL
errors for a function as well. This would not exactly be
a backward compatible change, but it would only
change the behavior of previously-failing builds. This is
probably not a real problem. SDL could check the
return values for failure (ERR?), but special-casing
failure is not required. Even if not explicitly checked,
the error would be reported to stderr. If the SDL is not
written to be resilient to errors, this error would likely
lead to many others downstream. As with many
compilers/interpreters, only the first bug is guaranteed
to be meaningful, but it would be a potential benefit to
see multiple bugs from a single evaluation. (Or the
evaluator could hide these.) The additional stderr output
would be clutter, but existing error reports have a clutter
issue already. It would be good for the evaluator to flag
the error and disable subsequent generation of cache
entries. This would enable regeneration of the error
output. Also, a flag to make errors fatal would be fine
too -- wait, this gets us right back where we started --
yes, but now there is a non-fatal mode of operation that
is guaranteed to produce a reproducible result.

So, maybe all we need is to add a "-non-fatal" flag that
will, on a previously-fatal error, disable caching and
return out of the enclosing function with an ERR return
value. Builds without this flag would have the
reproducibility issue, but builds with this flag would not,
and no current behavior would be affected. A project
must consider a build "good" independent of the build's
fatality without -non-fatal, as it is not possible to prevent
these "hidden" errors from sneaking in. Any rebuilds of
successful builds, as well as integration builds (which
merge successful builds), should use -non-fatal. General
building for development might or might not use this
flag, but code should not be released that has visible
errors. Looking at errors from things like integration
builds is still useful. Unfortunately, it would be difficult
to distinguish hidden errors from those released by
someone who did not follow proper procedures, so there
could still be some misdirected blame. And this very
complex issue is exposed to the user through the
-non-fatal flag. But it seems to be our best option.
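The proposed -non-fatal behavior can be sketched as follows (hedged Python analogy; the flag and names are hypothetical, not the evaluator's actual implementation). On a previously-fatal error, the call returns ERR and no cache entry is created, so the error output can be regenerated on a rebuild:

```python
# Sketch of the proposed -non-fatal mode: errors return ERR, disable
# caching for the enclosing call, and do not abort the evaluation.

ERR = object()
cache = {}

def evaluate(name, fn, non_fatal=False):
    if name in cache:
        return cache[name]
    try:
        result = fn()
    except Exception:
        if non_fatal:
            return ERR        # no cache entry is created for this call
        raise                 # default behavior: immediately fatal
    cache[name] = result
    return result

# Without the flag the error is fatal, as today...
try:
    evaluate("step", lambda: 1 // 0)
    was_fatal = False
except ZeroDivisionError:
    was_fatal = True
assert was_fatal

# ...with it, evaluation continues with ERR and caches nothing,
# so rerunning reproduces the error rather than hiding it.
assert evaluate("step", lambda: 1 // 0, non_fatal=True) is ERR
assert "step" not in cache
```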

Note: The -k option to "vesta" controls fatality of
_run_tool() calls.

Ken's reproducer:

{
    /**nocache**/
    good()
    {
        _ = _print("good function called");
        return 1;
    };

    call_but_dont_use(b)
    {
        // Note: the function has to be passed in a complex data
        // structure like a binding so it doesn't become part of the
        // PK of this function.
        _ = b/f();

        // Return something which doesn't depend on the function so
        // we don't record any dependency on it.
        return 1;
    };

    return <call_but_dont_use([f=good]),  // This populates the cache.
            call_but_dont_use([f="This isn't even a function!"])>;
            // This would fail, but it is instead a cache hit!!!
}

Discussion

  • Tim Mann

    Tim Mann - 2005-10-13

    This issue is not a surprise to me, and I find it hard to
    get excited about it. Well, I admit it's a reproducibility
    issue: there's a chance that you did a build once that
    should have failed but didn't, and later when you try to
    recreate it, it does fail.

    This is bad, but as a consolation, it should be easy to fix
    the failing build -- the error should appear when evaluating
    the bad function, and since the build result does not depend
    on that function in any way, the fix is simply to delete the
    call to it. I guess you can create scenarios in which the
    call is hard to remove because of some elaborate code
    structure, and you also can't delete the function body
    because it's called elsewhere. Sigh.

    Another small consolation is that it shouldn't happen much
    since it should be fairly uncommon to call a function and
    discard its result. Shouldn't it?

    I didn't find the idea for making it possible for failures
    to return a noncacheable ERR and continue to be very
    convincing. This is actually an idea we had in the initial
    Vesta SDL design (ERR might not even exist if we'd never had
    the idea), but it didn't work out well and we discarded it.
    I suppose it's possible we just didn't understand how to
    make it work, but it still seems kind of questionable. One
    specific problem is that if you hit an early failure but
    there is still a ton more work that could be done on the
    evaluation, you'll do all that work instead of stopping and
    reporting the failure back to the user early. I think the
    latter is usually what people would prefer.

  • Kenneth C. Schalk

    Steve Hoover wrote:
    > One solution might be to defer the fatality of a function.
    > Return a "fatal" indication as the function result, and
    > propagate this through the evaluation. Only if the build
    > result depends on the failing function's return value
    > would the build fail. Unfortunately, this approach is
    > nearly impossible to implement.

    I don't think this is "nearly impossible to implement",
    though it has some potential problems.

    Once upon a time, run-time errors during SDL evaluation were
    not immediately fatal. Instead, an operation with an error
    resulted in the special error value ERR. It would not be
    hard to go back to that way of doing things, as making such
    problems fatal was done by making the constructor for the
    ErrorVC class throw an exception. (See line 1250 in
    /vesta/vestasys.org/vesta/eval/72/src/Val.C.)

    I'm not sure I recall the exact motivations for this change
    (it was made more than 4 years ago, before Vesta was
    released as free software and before I became the primary
    maintainer of it), but I think they were:

    - Failing immediately rather than propagating ERR values up
    the call chain makes it easier to debug errors, because you
    can see precisely where the error occurs. (When the ERR
    value was instead generated, it might propagate up through
    several levels of the SDL call stack losing the exact point
    where the error occurred.)

    - Bridges would sometimes handle or generate ERR values,
    which added a bunch of additional complexity to them. (For
    an example of this see line 21 of
    /vesta/vestasys.org/bridges/mtex/1/build.ves.)

    - Tool failures were already immediately fatal (which also
    simplified bridges and debugging), and it seemed
    confusing to have two different policies on how failures
    would affect the continuation of an evaluation.

    I don't think it would be too hard to deal with these issues
    and still allow execution to proceed past errors. Aside
    from the idea Steve (the submitter) proposed of having a
    single fatal/non-fatal switch for error handling (and we
    could use the existing "-k" flag for that), here are some
    other ideas:

    - We could have the error value record and carry with it a
    stack trace of where it came from. This would make it
    possible to determine the cause of an error even if it makes
    its way all the way up to the final result of a build. To
    really make this work we would have to have all the
    operators and primitive functions handle ERR as an input
    value and add more information to this stored error context,
    rather than simply recording a new error context and
    discarding the old one.

    - In the "non-fatal" case we could have a failed _run_tool
    return ERR rather than its normal result.

    Of course much of this might become moot if the evaluator
    had a built-in debugger:

    https://sourceforge.net/tracker/index.php?func=detail&aid=1223244&group_id=34164&atid=410430

    This whole problem seems to me to represent a fundamental
    tension between the desired behavior while a user is making
    changes and when they wish to repeat an existing build.
    While actively working on changes, errors should be reported
    immediately and precisely and impede forward progress. When
    repeating a previous build, errors should only be an issue
    if the result actually depends upon them.

    Tim Mann wrote:
    > This is bad, but as a consolation, it should be easy to
    > fix the failing build -- the error should appear when
    > evaluating the bad function, and since the build result
    > does not depend on that function in any way, the fix is
    > simply to delete the call to it.

    While you're right that it shouldn't be too hard for someone
    knowledgeable in SDL to fix any individual instance of this,
    I don't think we can just brush this off for several
    reasons:

    1. In our experience, the vast majority of users aren't
    very knowledgeable about SDL. This means that most users
    would be confounded by running into such a mysterious
    failure. I think it may also make it more likely for this
    sort of problem to occur in the first place: a novice SDL
    writer is more likely to write code which calls a function
    but doesn't use its result.

    2. Suppose that we did just accept that any such broken but
    previously successful builds would have to be patched up
    when a user revisited it. It seems likely to me that for
    some builds this could happen repeatedly, which could really
    waste the user's time. How would a user find a previously
    patched-up version of the build they're interested in?

    3. This could go unnoticed for a long time. The longer the
    cache lasts (and these days a catastrophic loss of the cache
    is much less common than it used to be), the more such
    broken but previously successful builds could be created.
    It could even take place at multiple sites simultaneously,
    if they all first evaluated the necessary successful build
    to cause a cache hit in the right place.

    4. Records referencing specific builds (such as issue
    tracking databases) could be left with pointers to broken
    but previously successful builds. In general it would be
    impractical to find and fix such references.

    I can imagine some paranoid users wanting to qualify builds
    by using an empty cache or after discovering such a problem
    wanting to alter the immutable SDL file which holds the
    broken function to fix up everything at once.

    I have to agree with Steve that it's worth doing something
    about this.

    Tim Mann wrote:
    > Another small consolation is that it shouldn't happen much
    > since it should be fairly uncommon to call a function and
    > discard its result. Shouldn't it?

    How would you know how often it happens? The only builds
    you can be sure don't have this problem are ones built
    against an empty cache. That doesn't happen very often in
    practice.

  • Tim Mann

    Tim Mann - 2005-11-16

    Thinking about this some more, I was way too dismissive
    about it. Sorry. I'm sure we did think about this problem
    early on, and it's why ERR got into the language in the
    first place. We had some problems working the idea out
    correctly and eventually dropped it, but I'm thinking now
    that that was a mistake.

    It would make sense, I think, to expand the -k flag so that
    "fatal" errors return a noncacheable ERR instead. Maybe
    that's what we should have done in the first place, instead
    of switching from always returning ERR on these errors to
    always failing immediately.

    There were a few points that we (or at least I) found
    confusing when we had the feature of errors returning ERR
    and continuing.

    One basic one is that we made ERR a constant in the language
    that you could write explicitly. This created a lot of
    confusion between ERR caused by an error and ERR written by
    a programmer intentionally as an out of band value like NULL.

    It was especially confusing since you could write things
    like "foo == ERR". If foo is an ERR value propagated from
    an error, should foo == ERR return TRUE or continue to
    propagate the ERR? Similarly, should _isbool(foo) return
    FALSE or ERR?
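    The two candidate semantics Tim describes can be contrasted in a
    small Python sketch (ERR here is just a sentinel object; the names
    are invented for illustration, not from the evaluator):

```python
# Under "strict propagation", any operation touching an ERR yields ERR;
# under "inspectable" semantics, a comparison can see through it.

ERR = object()

def eq_strict(a, b):
    if a is ERR or b is ERR:
        return ERR            # the comparison itself propagates the error
    return a == b

def eq_inspect(a, b):
    if a is ERR or b is ERR:
        return a is b         # the comparison answers honestly
    return a == b

foo = ERR                     # an ERR propagated from some failure
assert eq_strict(foo, ERR) is ERR      # "continue to propagate the ERR"
assert eq_inspect(foo, ERR) is True    # "return TRUE"
assert eq_inspect(foo, 1) is False
```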

    It would be nice to get rid of the ERR constant to avoid
    this confusion, but it's been in the language for years and
    so there could be lots of code that uses it. One idea is to
    say that the literal ERR is simply a way to intentionally
    generate a fatal error. If you evaluate an ERR in the "stop
    on errors" mode (no -k flag), the evaluation would stop
    right there. This is an attractive notion but may break
    existing code. (I wonder if we ever had a version of the
    system that did this? I'm not sure.) Anyway, in the
    following I'm going to ignore non-fatal ERR written by the
    programmer; I'll use ERR only to mean a real error that
    would be fatal in the current evaluator.

    In general, specifying when an ERR input to a function
    should propagate and when it should be discarded is a tricky
    issue. In order to fix the problem you're concerned with, I
    think ERR must be discarded whenever the result would have
    no dependency on the ERR and kept whenever it would. It
    seems like this makes the definition
    implementation-dependent -- it depends on how fine-grained
    and smart the dependency analysis is. I'm not sure, though;
    maybe there's a clear way to specify it. Maybe what we had
    in the language spec before we made errors always fatal
    actually was right, or close, despite being confusingly worded.

    Hmm, I see that wording is still in SRC-1997-005c. I guess
    we didn't take it out until the book draft became the
    maintained version.

    Note that it has to be possible to carry a fatal ERR around
    in a binding, since you can have an SDL program that
    evaluates several things, puts them into a binding, then
    selects one of them and ignores the rest, so that the result
    has no dependency on those. Is the same true for lists?

    I think when an ERR is discarded the result can be made
    cacheable again. In fact, you need this so that evaluations
    of the sort you're concerned about will get cached when you
    reproduce them. Knowing when ERR is fully gone is not
    entirely trivial, though, since an ERR can be nested deep
    inside a binding. A function has to have a result with no
    ERRs in it to be cacheable.
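    The cacheability test this implies is a recursive walk of the result
    value, since an ERR can be buried deep inside a binding. A hedged
    Python sketch (a dict standing in for an SDL binding, a list for an
    SDL list):

```python
# A result is cacheable only when no ERR is nested anywhere inside it.

ERR = object()

def contains_err(value):
    if value is ERR:
        return True
    if isinstance(value, dict):            # stand-in for an SDL binding
        return any(contains_err(v) for v in value.values())
    if isinstance(value, (list, tuple)):   # stand-in for an SDL list
        return any(contains_err(v) for v in value)
    return False

assert contains_err({"a": {"b": [1, ERR]}})     # deeply nested: not cacheable
assert not contains_err({"a": {"b": [1, 2]}})   # ERR-free: cacheable again
```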

    Note that we don't have to worry separately about possibly
    caching a value with a dependency on ERR, if the principle I
    gave above is correct -- any value that depends on an ERR
    has to contain an ERR.

    Just some thoughts.

