[ww-users] cleaning out old directories
From: Lee W. <wor...@gm...> - 2011-03-29 03:24:17
I have some open questions about WorkingWiki's features, and I would appreciate your perspective as people who use WW or even run your own WW sites. Please feel free to ignore this if you're busy or not interested.

For quite a while, we've needed to clean out old data that piles up and isn't in use: not in the wiki pages themselves, but in the working directories that we use behind the scenes to compute the output of latex, R, etc.

The most obvious case is preview sessions. Any time you preview a page that includes WW data while editing, WW makes a copy of the data in the back end to keep it separate from the unedited page's data. When you save, it gets rid of the copied files by merging them with the saved ones, but if you abandon the changes without saving (a completely reasonable thing to do), the copy is left sitting there and needs to be cleaned out sometime later. These can be quite large: we've seen project directories that take up 4GB or even more. That cleanout has been on my to-do list, and now it's getting done.

More controversially, there are also old projects that eventually need to disappear. For instance, if someone creates a project and then changes its name, the old working directory just sits there abandoned. Or if a page once had some WW files on it and now it doesn't, the project directory is abandoned. Likewise, if I use the inline latex features to add something like $$\alpha + \beta$$ to a page, a project is created to process that latex code; if I then change it to something else, that project needs to not be kept forever.

So at some point project directories need to be cleared away, or the disk will eventually fill up with files that no one wants. I'll probably do this by erasing things that haven't been touched in over 3 months or so. Generally, this should be harmless even if I erase files that someone is using, because they can be remade from the source files; it will just mean waiting a minute or two (maybe more...) for them to be made.
Unfortunately, in the worst case it could mean erasing a directory full of output files that can't be easily recreated. I could implement a "Do Not Erase" feature to mark particular projects that are sensitive and should never be erased. My first question: Is there a better way to protect project files that should be permanent? Does anyone have strong feelings about all this?

Finally, I think background jobs should generally be left alone for as long as it takes for people to decide whether to erase them. But there is a slight danger: suppose I create a project and run a background job, then erase or rename the project. The background job becomes orphaned, and it won't show up in any listings. So I should probably do something to erase things like that.

One way to address both of these things is to check whether each project is actually connected to current pages in one of the wikis; that would settle whether it's orphaned or not. But the directory cleaning is done in a separate back-end component ("ProjectEngine") when requested by the front end ("WorkingWiki"), and I'm trying to avoid two-way communication where ProjectEngine has to ask questions of WorkingWiki while it's completing a request, so I'm looking for an alternative...

Lee
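
For concreteness, here is a rough Python sketch of the age-based cleanup combined with a "Do Not Erase" marker, as discussed above. It is only an illustration under stated assumptions, not how ProjectEngine actually works: the marker file name `.do-not-erase`, the function names, the 90-day cutoff standing in for "over 3 months", and the layout of one directory per project under a single base directory are all hypothetical.

```python
import os
import shutil
import time
from pathlib import Path
from typing import List, Optional

MAX_AGE_SECONDS = 90 * 24 * 60 * 60  # assumption: "over 3 months" taken as 90 days
KEEP_MARKER = ".do-not-erase"        # hypothetical marker file name


def newest_mtime(directory: Path) -> float:
    """Latest modification time of the directory or anything inside it."""
    newest = directory.stat().st_mtime
    for root, dirs, files in os.walk(directory):
        for name in dirs + files:
            try:
                newest = max(newest, os.lstat(os.path.join(root, name)).st_mtime)
            except OSError:
                pass  # entry vanished while we were scanning; ignore it
    return newest


def find_stale_projects(base: Path,
                        now: Optional[float] = None,
                        max_age: float = MAX_AGE_SECONDS) -> List[Path]:
    """List project directories that are unmarked and untouched for max_age."""
    now = time.time() if now is None else now
    stale = []
    for entry in sorted(base.iterdir()):
        if not entry.is_dir():
            continue
        if (entry / KEEP_MARKER).exists():
            continue  # marked "Do Not Erase": never a candidate
        if now - newest_mtime(entry) > max_age:
            stale.append(entry)
    return stale


def clean_stale_projects(base: Path, dry_run: bool = True) -> List[Path]:
    """Erase stale project directories; with dry_run=True, only report them."""
    stale = find_stale_projects(base)
    if not dry_run:
        for directory in stale:
            shutil.rmtree(directory)
    return stale
```

Calling `clean_stale_projects(Path("/path/to/projects"))` with the default `dry_run=True` only lists what would be erased, which seems a sensible default given the worst case of losing output that can't be recreated. Note that this sketch says nothing about orphaned background jobs or about whether a project is still connected to a wiki page; it only looks at modification times and the marker file.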