Thread: [ww-users] cleaning out old directories
From: Lee W. <wor...@gm...> - 2011-03-29 03:24:17
|
I have some open questions about WorkingWiki's features, where I would appreciate your perspective as people who use WW, or even run your own WW sites. Please feel free to ignore this if you're busy or not interested.

For quite a while, we've had a need to clean out old data that piles up and isn't in use - not in the wiki pages themselves, but in the working directories that we use behind the scenes to compute the output of latex, R, etc.

Most obvious are the preview sessions - any time you preview a page that includes WW data while editing, it makes a copy of the data in the back end to keep it separate from the unedited page's data. When you save, it gets rid of the copied files by merging with the saved ones, but if you abandon the changes without saving (a completely reasonable thing to do) the copy is left sitting there, and needs to be cleaned out sometime later. These can be quite large - we've seen project directories that take up 4GB or even more. That cleanout has been on my to-do list, and now it's getting done.

More controversially, there are also old projects that eventually need to disappear. For instance, if someone creates a project and then changes its name, the old working directory just sits there abandoned. Or if a page once had some WW files on it and now it doesn't, the project directory is abandoned. But also if I use the inline latex features to add something like $$\alpha + \beta$$ to a page, then change it to something else, a project is created to process that latex code, and it needs to not be kept forever. So at some point project directories need to be cleared away or the disk will eventually fill up with files that no one wants. I'll probably do this by erasing things that haven't been touched in over 3 months or something.

Generally, this should be harmless even if I erase files that someone is using, because they can be remade from the source files - it will just mean waiting a minute or two (maybe more...) for them to be made. Unfortunately, in the worst case it could mean erasing a directory full of output files that can't be easily recreated. I could implement a "Do Not Erase" feature to mark particular projects that are sensitive and should never be erased.

My first question: Is there a better way to protect project files that should be permanent? Does anyone have strong feelings about all this?

Finally, I think background jobs should generally be left alone for as long as it takes for people to decide whether to erase them. But there is a slight danger: suppose I create a project and run a background job, then erase or rename the project. The background job becomes orphaned, and it won't show up in any listings. So I should probably do something to erase things like that.

One way to address both of these things is to check whether each project is actually connected to current pages in one of the wikis - that would clear up whether it's orphaned or not. But the directory cleaning is done in a separate back-end component ("ProjectEngine") when requested by the front end ("WorkingWiki"), and I'm trying to avoid two-way communication where ProjectEngine has to ask questions of WorkingWiki while it's completing a request, so I'm looking for an alternative...

Lee |
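For readers who want to picture the age-based cleanup with a "Do Not Erase" marker described in the message above, here is a minimal sketch. It is not ProjectEngine's code: the marker file name `.do-not-erase`, the 90-day cutoff standing in for "over 3 months", the function names, and the layout of one subdirectory per project under a base directory are all assumptions made for illustration.

```python
import os
import shutil
import time

MAX_AGE_SECONDS = 90 * 24 * 60 * 60  # "over 3 months", approximated as 90 days (assumption)
KEEP_MARKER = ".do-not-erase"        # hypothetical marker file name


def newest_mtime(directory):
    """Return the most recent modification time of anything under directory."""
    newest = os.path.getmtime(directory)
    for root, dirs, files in os.walk(directory):
        for name in dirs + files:
            try:
                newest = max(newest, os.lstat(os.path.join(root, name)).st_mtime)
            except OSError:
                pass  # entry vanished while we were scanning
    return newest


def stale_directories(base, now=None, max_age=MAX_AGE_SECONDS):
    """List project directories under base that are old and not protected."""
    now = time.time() if now is None else now
    stale = []
    for entry in sorted(os.listdir(base)):
        path = os.path.join(base, entry)
        if not os.path.isdir(path):
            continue
        if os.path.exists(os.path.join(path, KEEP_MARKER)):
            continue  # project is marked "Do Not Erase"
        if now - newest_mtime(path) > max_age:
            stale.append(path)
    return stale


def clean(base, dry_run=True):
    """Remove stale directories; with dry_run=True only report them."""
    stale = stale_directories(base)
    if not dry_run:
        for path in stale:
            shutil.rmtree(path)
    return stale
```

Calling `clean(base)` with the default `dry_run=True` only lists what would be removed, which seems a sensible first step given the worry about output files that can't be easily recreated. Note that this sketch does nothing about orphaned background jobs or about whether a project is still connected to a wiki page.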
From: Tallulah A. <tal...@gm...> - 2011-03-29 14:12:09
|
Perhaps there could be a button to delete old/moved projects, which would mark them for cleanout. This could potentially be combined with a delete-wiki-page function, particularly for when pages are moved or renamed, that would erase the page and clear out all background jobs and projects associated with it.

I think a Do-Not-Erase flag is sufficient for preserving slow projects.

Tallulah |
From: Peter L. R. <pl...@st...> - 2011-03-29 17:42:34
|
I always feel nervous when there are large, costly files that WW might still decide to go ahead and re-make without my intending to. Perhaps what we need is some easy mechanism to change the status of such files -- change them to 'source files' for instance. This would be similar to a "DNE" flag, but would also flag "do not re-make"; so re-making would require first manual deletion. Then any files not flagged as such could be deleted without problem.

--peter |
From: Lee W. <wor...@gm...> - 2011-04-01 23:04:45
|
Hi Peter!

So the recommended treatment for big costly files is to isolate their make rules to be run only in background jobs:
http://lalashan.mcmaster.ca/theobio/projects/index.php/Background_Jobs#Background_jobs_design_pattern

If you do that, they won't be remade unless you do it explicitly by creating a background job.

Or you can just make sure to use <project-file ... make=false/> and write your make rules so that nothing that makes automatically has a dependency on any of your costly targets.

If we did have a "do not remake" flag I'd hesitate to assume that anything not tagged could be erased, because it's common to put off housekeeping tasks like making sure all the files are tagged when you add new ones.

A way to change things to source files might be useful - or something like Jonathan has suggested (if I remember right), where project files that are being archived in the wiki get written back out to the working directory, so they're preserved even if the whole directory gets removed. But those kinds of ideas are confusing and troubling to me and I don't know what to do with them.

Thanks-
lw |
From: Jonathan D. <du...@mc...> - 2011-04-05 03:56:25
|
On Fri, Apr 1, 2011 at 7:04 PM, Lee Worden <wor...@gm...> wrote:

> Hi Peter!

> So the recommended treatment for big costly files is to isolate their
> make rules to be run only in background jobs:
> http://lalashan.mcmaster.ca/theobio/projects/index.php/Background_Jobs#Background_jobs_design_pattern

> If you do that, they won't be remade unless you do it explicitly by
> creating a background job.

Another recommended move is to save them on the wiki as archived project files.

> Or you can just make sure to use <project-file ... make=false/> and
> write your make rules so that nothing that makes automatically has a
> dependency on any of your costly targets.

This is tricky, because you typically want to make things from your costly targets, and you would like your make rules to reflect that accurately.

> If we did have a "do not remake" flag I'd hesitate to assume that
> anything not tagged could be erased, because it's common to put off
> housekeeping tasks like making sure all the files are tagged when you
> add new ones.

> A way to change things to source files might be useful - or something
> like Jonathan has suggested (if I remember right), where project files
> that are being archived in the wiki get written back out to the working
> directory, so they're preserved even if the whole directory gets
> removed. But those kind of ideas are confusing and troubling to me and
> I don't know what to do with them.

Here's something I've always wanted to try. You could have a valuable target be an archived project file living in the Media: space. Then another project could use that same Media: page as a source file. Seems like it should work well. I was thinking it should be done across projects, but it seems like it would work to allow you to control the flow within a project as well.

JD |
From: Lee W. <wor...@gm...> - 2011-04-05 04:40:52
|
On 04/04/11 20:56, Jonathan Dushoff wrote:
> On Fri, Apr 1, 2011 at 7:04 PM, Lee Worden <wor...@gm...> wrote:
>> Hi Peter!
>
>> So the recommended treatment for big costly files is to isolate their
>> make rules to be run only in background jobs:
>> http://lalashan.mcmaster.ca/theobio/projects/index.php/Background_Jobs#Background_jobs_design_pattern
>
>> If you do that, they won't be remade unless you do it explicitly by
>> creating a background job.
>
> Another recommended move is to save them on the wiki as archived project files.
>
>> Or you can just make sure to use <project-file ... make=false/> and
>> write your make rules so that nothing that makes automatically has a
>> dependency on any of your costly targets.
>
> This is tricky, because you typically want to make things from your
> costly targets, and you would like your make rules to reflect that
> accurately.

True.

>> If we did have a "do not remake" flag I'd hesitate to assume that
>> anything not tagged could be erased, because it's common to put off
>> housekeeping tasks like making sure all the files are tagged when you
>> add new ones.
>
>> A way to change things to source files might be useful - or something
>> like Jonathan has suggested (if I remember right), where project files
>> that are being archived in the wiki get written back out to the working
>> directory, so they're preserved even if the whole directory gets
>> removed. But those kind of ideas are confusing and troubling to me and
>> I don't know what to do with them.
>
> Here's something I've always wanted to try. You could have a valuable
> target be an archived project file living in the Media: space. Then
> another project could use that same Media: page as a source file.
> Seems like it should work well. I was thinking it should be done
> across projects, but it seems like it would work to allow you to
> control the flow within a project as well.
>
> JD

Oh, man. You could do that. I should make sure that when it archives to that page it flags it properly, because any time a source file is changed, pages need to be expired from the parser cache so things will be remade. I think it would work actually. (Though it wouldn't matter if it's in the same project, because it's coming from the working directory and syncing it back to the directory would be a null operation and wouldn't necessitate any further makes, so actually it doesn't need to be flagged in that case. But it does if it's a source file in a different project.) If you do try it I'd like to watch! :) |
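The flagging rule in the message above (expire cached pages only for other projects that use the archived page as a source file) can be written down as a small sketch. The function name, its arguments, and the idea that the consuming projects are already known as a list of names are hypothetical; this only illustrates the reasoning and is not how WorkingWiki tracks source files.

```python
def projects_to_expire(archiving_project, consumers):
    """Return the projects whose pages should be expired from the parser cache
    after archiving_project archives a target to a shared Media: page.

    consumers: names of projects that list that page as a source file.
    """
    # Same project: the file came from its own working directory, so syncing
    # it back is a null operation and needs no further makes or flagging.
    return sorted(p for p in set(consumers) if p != archiving_project)
```

For example, if a hypothetical project `sim` archives a target that `sim`, `plots`, and `paper` all use as a source file, only `plots` and `paper` would need their pages expired.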
From: Peter L. R. <pl...@st...> - 2011-04-05 17:23:18
|
> >> A way to change things to source files might be useful - or something
> >> like Jonathan has suggested (if I remember right), where project files
> >> that are being archived in the wiki get written back out to the working
> >> directory, so they're preserved even if the whole directory gets
> >> removed. But those kind of ideas are confusing and troubling to me and
> >> I don't know what to do with them.
> >
> > Here's something I've always wanted to try. You could have a valuable
> > target be an archived project file living in the Media: space. Then
> > another project could use that same Media: page as a source file.
> > Seems like it should work well. I was thinking it should be done
> > across projects, but it seems like it would work to allow you to
> > control the flow within a project as well.
> >
> > JD
>
> Oh, man. You could do that. I should make sure that when it archives
> to that page it flags it properly, because any time a source file is
> changed, pages need to be expired from the parser cache so things will
> be remade. I think it would work actually. (Though it wouldn't matter
> if it's in the same project, because it's coming from the working
> directory and syncing it back to the directory would be a null operation
> and wouldn't necessitate any further makes, so actually it doesn't need
> to be flagged in that case. But it does if it's a source file in a
> different project.) If you do try it I'd like to watch! :)

Yes, this is just what I was thinking; perhaps we can assume that any file costly enough that it should not be erased will be stored as an archived project file?

--peter |