[ww-users] cleaning out old directories
From: Lee W. <wor...@gm...> - 2011-03-29 03:24:17
I have some open questions about WorkingWiki's features, where I would
appreciate your perspective as people who use WW, or even run your own
WW sites. Please feel free to ignore this if you're busy or not interested.

For quite a while, we've had a need to clean out old data that piles up
and isn't in use - not in the wiki pages themselves, but in the working
directories that we use behind the scenes to compute the output of
latex, R, etc.

Most obvious are the preview sessions - any time you preview a page that
includes WW data while editing, it makes a copy of the data in the back
end to keep it separate from the unedited page's data. When you save,
it gets rid of the copied files by merging with the saved ones, but if
you abandon the changes without saving (a completely reasonable thing to
do) the copy is left sitting there, and needs to be cleaned out sometime
later. These can be quite large - we've seen project directories that
take up 4GB or even more. That cleanout has been on my to-do list, and
now it's getting done.

More controversially, there are also old projects that eventually need
to disappear. For instance, if someone creates a project and then
changes its name, the old working directory just sits there abandoned.
Or if a page once had some WW files on it and now it doesn't, the
project directory is abandoned. The same goes for inline latex: if I add
something like $$\alpha + \beta$$ to a page and later change it to
something else, the project that was created to process the original
latex code is left behind, and it shouldn't be kept forever. So at some point project
directories need to be cleared away or the disk will eventually fill up
with files that no one wants. I'll probably do this by erasing things
that haven't been touched in over 3 months or something.

Generally, this should be harmless even if I erase files that someone is
using, because they can be remade from the source files - it will just
mean waiting a minute or two (maybe more...) for them to be made.
Unfortunately, in the worst case it could mean erasing a directory full
of output files that can't be easily recreated. I could implement a "Do
Not Erase" feature to mark particular projects that are sensitive and
should never be erased.
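
As a rough illustration of the age-based rule combined with a "Do Not
Erase" mark, here is a minimal Python sketch. It is not WorkingWiki or
ProjectEngine code: the marker file name DO_NOT_ERASE, the 90-day reading
of "over 3 months", and the function names are all assumptions made for
the example. It only lists candidates and deletes nothing.

```python
import os
import time
from pathlib import Path
from typing import List, Optional

MAX_AGE_SECONDS = 90 * 24 * 60 * 60  # "over 3 months", assumed to mean 90 days
KEEP_MARKER = "DO_NOT_ERASE"         # hypothetical marker file name


def newest_mtime(project_dir: Path) -> float:
    """Most recent modification time of the directory or anything inside it."""
    newest = project_dir.stat().st_mtime
    for root, dirs, files in os.walk(project_dir):
        for name in dirs + files:
            newest = max(newest, os.lstat(os.path.join(root, name)).st_mtime)
    return newest


def stale_projects(base_dir: Path, now: Optional[float] = None) -> List[Path]:
    """Return project directories that are untouched for too long and unmarked."""
    now = time.time() if now is None else now
    stale = []
    for entry in sorted(base_dir.iterdir()):
        if not entry.is_dir():
            continue
        if (entry / KEEP_MARKER).exists():
            continue  # explicitly protected, never a candidate
        if now - newest_mtime(entry) > MAX_AGE_SECONDS:
            stale.append(entry)
    return stale
```

A marker file inside the directory is only one option; its advantage in
this sketch is that the protection travels with the directory and needs
no separate list to stay in sync.
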
My first question: Is there a better way to protect project files that
should be permanent? Does anyone have strong feelings about all this?

Finally, I think background jobs should generally be left alone for as
long as it takes for people to decide whether to erase them. But there
is a slight danger: suppose I create a project and run a background job,
then erase or rename the project. The background job becomes orphaned,
and it won't show up in any listings. So I should probably do something
to erase things like that.

One way to address both of these things is to check whether each project
is actually connected to current pages in one of the wikis - that would
clear up whether it's orphaned or not. But the directory cleaning is
done in a separate back-end component ("ProjectEngine") when requested
by the front end ("WorkingWiki"), and I'm trying to avoid two-way
communication where ProjectEngine has to ask questions of WorkingWiki
while it's completing a request, so I'm looking for an alternative...
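
To make the orphan check concrete, here is a small Python sketch of one
possible shape, not a decided design: the front end includes the names of
the projects it still knows about in the cleanup request, so the back end
can compare them against what is on disk without calling back. The
request field "live_projects" and the function name are hypothetical.

```python
from typing import Dict, Iterable, List


def find_orphans(request: Dict, projects_on_disk: Iterable[str]) -> List[str]:
    """Return on-disk project names that the request does not list as live.

    If the request carries no "live_projects" field at all, nothing is
    reported as orphaned, so a malformed request cannot flag everything.
    """
    if "live_projects" not in request:
        return []
    live = set(request["live_projects"])
    return sorted(name for name in projects_on_disk if name not in live)
```

For example, find_orphans({"live_projects": ["a", "b"]}, ["a", "b", "c"])
returns ["c"]. The same comparison could be applied to background jobs,
keyed by the project each job belongs to.
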
Lee