Bugs item #1032001, was opened at 2004-09-21 12:00
Message generated for change (Comment added) made by xorian
You can respond by visiting:
https://sourceforge.net/tracker/?func=detail&atid=410427&aid=1032001&group_id=34164
Please note that this message will contain a full copy of the comment thread,
including the initial issue submission, for this request,
not just the latest update.
Category: Evaluator
Group: None
Status: Open
Resolution: None
>Priority: 7
Private: No
Submitted By: Kenneth C. Schalk (xorian)
Assigned to: Nobody/Anonymous (nobody)
Summary: Weeding hole => missing shortids
Initial Comment:
There's a bug which can cause derived shortids to be
deleted by the weeder when they should be kept. This
affects shortids created by SDL operators (e.g. +) or
primitive functions (e.g. _sub) and derived files
created by tools where there's a time delay between the
modification of the file and the completion of the tool
(i.e. long-running tools).
The currently proposed solution is to have the
evaluator "touch" (update the modification time of)
derived shortids after creating a cache entry for them.
See earlier discussion of this bug on vesta-devel,
including more details on the proposed solution:
https://sourceforge.net/mailarchive/forum.php?thread_id=4895159&forum_id=7858
----------------------------------------------------------------------
>Comment By: Kenneth C. Schalk (xorian)
Date: 2007-12-03 18:48
Message:
Logged In: YES
user_id=304837
Originator: YES
This has unfortunately been occurring with gradually
increasing frequency at Intel. These recent incidents are
due to the cause we postulated back in 2004: a long-running
tool that modifies some files early in its run, before the
weeder starts, and then completes during weeding.
Scott Venier and I spent a little while going over the
details of how best to fix this late last week. Here's what
we discussed:
- For the _run_tool primitive, all we should need to do is
update the mtime of all the shortids of files
created/modified by the tool after the cache entry has been
added but before the volatile directory for the tool is
deleted.
- To ensure that the new mtime of the files is the
repository server's current time, we want to add a new call
to the repository's network interface which will take a set
of shortids and update the mtime on all of them to the
current time.
- When adding cache entries for other kinds of functions, we
need to determine whether the new cache entry will be the
first reference to any derived files (i.e. files created by
other SDL primitives like + or _sub). (This is different
from my earlier proposal of adding cache entries for each
such operation.) To accomplish this, each time the
evaluator creates a new file (with the function
"CreateDerived"), the file's shortid will be added to a set
of shortids not yet protected by any cache entry. When
adding a cache entry, we simply find the intersection of the
derived files referenced by the new cache entry and this
shortid set. After adding the cache entry, the evaluator
will use the new repository call to update the mtime of all
such files. We can then remove them from the set of
not-yet-protected shortids.
Note that finding this intersection efficiently will require
re-working the data structures we use to represent sets of
shortids (currently simple lists are used), as there could
be a large number in both the set referenced by a new cache
entry (we've seen cache entries with over 200,000 referenced
derived files) and the set of shortids not yet protected by
any cache entry (e.g. if many files are created during an
evaluation for inputs to tools but never referenced as a
result of any function).
- Note that the _run_tool primitive can reference files
created with "CreateDerived" when it captures standard
output/error in a file. So the code which handles caching
for _run_tool will need to remove those shortids from the
set of shortids not yet protected by any cache entry and
make sure that their mtime gets updated.
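
As a rough illustration of the bookkeeping described in the
list above, here is a minimal sketch of a hash-based set of
not-yet-protected shortids. It is not the evaluator's actual
code: the ShortId typedef, the class name and the method
names are all hypothetical, and a hash set is only one
possible replacement for the simple lists used today. The
lookup cost is proportional to the number of shortids
referenced by the new cache entry, regardless of how many
unprotected shortids are pending.

```cpp
#include <cstddef>
#include <cstdint>
#include <unordered_set>
#include <vector>

// Hypothetical stand-in for the evaluator's shortid type.
typedef std::uint32_t ShortId;

// Sketch of the set of shortids created by the evaluator that no
// cache entry protects yet. All names here are illustrative.
class UnprotectedShortIds {
public:
  // Called whenever the evaluator creates a new derived file.
  void noteCreated(ShortId sid) { pending_.insert(sid); }

  // Intersection of the shortids referenced by a new cache entry
  // with the pending set. One hash lookup per referenced shortid;
  // a shortid repeated in refs is reported once per occurrence.
  std::vector<ShortId>
  firstReferences(const std::vector<ShortId>& refs) const {
    std::vector<ShortId> out;
    for (ShortId sid : refs) {
      if (pending_.count(sid) != 0) out.push_back(sid);
    }
    return out;
  }

  // Called after the cache entry has been added and the mtimes
  // have been updated: these shortids are now protected.
  void markProtected(const std::vector<ShortId>& sids) {
    for (ShortId sid : sids) pending_.erase(sid);
  }

  std::size_t size() const { return pending_.size(); }

private:
  std::unordered_set<ShortId> pending_;
};
```

The intended order of use would be: call firstReferences()
with the new entry's derived files, add the cache entry,
update the mtime of the returned shortids, and only then call
markProtected().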
Unfortunately, this still leaves a small hole. If the
evaluator were to add a cache entry and then exit before
making the call to the repository to update the mtime of the
shortids to which the new cache entry is the first reference
(e.g. by a user hitting control-C at just the right moment),
weeding could still delete them incorrectly, resulting in
later failures. To eliminate this possibility, the cache
daemon would need to make the call to the repository after
it does all the other work of adding the entry. However,
doing this complicates things further in two ways:
1. The evaluator would have to pass an additional argument
to the AddEntry call with the set of shortids which have not
been protected by an earlier cache entry.
2. It would introduce direct communication between the cache
daemon and the repository. (This probably isn't much of an
issue, but up until this point the cache has never directly
communicated with the repository.)
My preference would be to completely close the hole by
having the cache make the call to the repository, but it's a
larger change and we'll need to consider whether it's worth
the added complexity and time to implement to close the last
little bit of the hole.
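
To make the cache-side variant concrete, here is a small
sketch of the ordering it implies. Everything in it is an
assumption for illustration: CacheSketch, addEntry and the
TouchFn callback are invented names, not the real AddEntry
signature or the proposed repository call. The only point is
that the mtime update happens inside the same call that adds
the entry, after the rest of the work, so the evaluator
exiting afterwards cannot leave the shortids untouched.

```cpp
#include <cstdint>
#include <functional>
#include <vector>

// Hypothetical stand-in for the shortid type.
typedef std::uint32_t ShortId;

// Hypothetical stand-in for the proposed repository call that
// sets the mtime of a set of shortids to the server's current time.
typedef std::function<void(const std::vector<ShortId>&)> TouchFn;

// Illustrative model of the cache daemon's side of AddEntry.
struct CacheSketch {
  TouchFn touch;    // how this sketch reaches the repository
  int entries = 0;  // stand-in for the real cache state

  // firstRefs is the extra argument from item 1: the shortids not
  // protected by any earlier cache entry.
  int addEntry(const std::vector<ShortId>& firstRefs) {
    int index = entries++;  // stand-in for the other work of adding
    // Update mtimes last, but still within the same call.
    if (!firstRefs.empty()) touch(firstRefs);
    return index;
  }
};
```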
----------------------------------------------------------------------
Comment By: Kenneth C. Schalk (xorian)
Date: 2007-09-07 10:09
Message:
Logged In: YES
user_id=304837
Originator: YES
The link in the original posting seems to have gone stale.
The e-mail thread it referenced took place in June of 2004
and had the subject "A new weeding bug". This link
currently works to find it:
https://sourceforge.net/mailarchive/message.php?msg_name=20040610050922.GA25491%40xorian.net
----------------------------------------------------------------------