Hi all. I am of course the original ezrsnapshots author.
I just wanted to say I haven't disappeared entirely.
I did in fact make some needed improvements to the ezrsnapshots script: significantly improved process control and signal catching, plus some bug fixes. I found that the tee logging, the way it was done, could cause serious problems. I also found that if the script was not run as the process group leader, process control and cleanup did not work correctly, so it now tries to restart itself as process group leader if needed. There might have been problems with the locking behavior too; I can't remember now.
I haven't honestly looked at the code here yet to see if it even uses any of my original framework, or just the core algorithms or what.
I'm actually glad to see someone "stealing" my work (well, it was GPL, right?) because I have no time to work on this.
Anyway, I'd be glad to share my update and maybe you can diff it and see the change log and see what if any fixes are still relevant to the ezrsync.
I hope ezrsync goes well... now I just need to find where I can attach a file.
Last edit: Dagurasu 2012-11-19
I started out with your code, but refactored, recoded and restyled so much that you will find some resemblance, but diff-ing won't be useful.
I did get rid of that locking and teeing, but if you would ever have some time to look at the code, I'm sure you could point out potential pitfalls. I would value your expertise, but understand if you have little/no time.
I am subscribed to that initial thread in rsnapshots where you shared your files. You could set up an independent ezrsnapshots project page, it's not much work at all, and then you can easily upload your files.
Cheers, Peter
Last edit: PePa 2012-11-19
I think your improvements look great on first glance and the code direly needed some cleanup and refactoring. Glad you liked it enough to bother!
I didn't notice removing locking. I do think locking is probably important but maybe you've dealt with the issue somehow or you just meant you redid it. I did have a locking bug in some version so maybe that's related to your removing it.
Teeing was a mess and I was being silly about fixing it. Your solution is clearly simple and right. I was probably too busy worrying about the backup logic.
I probably won't make my own page because I think you're doing a far better job than I would. I'll give it a try when I can.
Last edit: Dagurasu 2012-11-19
Actually now I remember why I stuck to teeing/tailing.. because while your Echo is nice and simple, it doesn't log stray standard error output from external commands.
Given what it eventually took to make it work well, maybe it's worth losing that. Then again, I did eventually get it to work well.
A nice compromise would probably be to redirect standard error to the log file and just not show it on screen. That keeps the simplicity and still lets you go see the errors when something goes wrong. It's a one line addition. As you probably know, it's possible to redirect standard error somewhere even while your Echo function sends output to the standard error device, without interfering. You just need to copy (via exec) the standard file descriptor to a new descriptor before redirecting it, and use the new one in the Echo function.
I had 3 lines commented out around the tail-bit, the last one was '# exec &>>"$logfile"'. I think I can change that to 'exec 2>>"$logfile"', do you think that is correct?
I think I decided to no longer use exec when I read that its use (in general) gave people problems, I think when calling from within another script, but that doesn't really apply here anyway.
There can be various i/o and process control issues when calling the whole program from inside a script. I investigated that in detail. Now I can't recall if exec itself made problems. So long as our program runs as its own child shell (that is, it's not "sourced", bash lingo for included), I think that wasn't a problem. Still, my latest version worked all that out, so maybe I'll make a page after all, at least to stick it there for posterity.
The problem with
exec 2>>"$logfile"
is that I think you are echoing to 2 in your Echo function. So first you need:
exec 4<&2
which copies 2 to 4 before 2 is redirected. Then:
exec 2>>"$logfile"
I'd make sure none of that is run from inside a subshell. A function is ok, so long as the function is called without back ticks or parentheses around it.
Then later you do your echoes as:
echo blah blah blah 1>&4
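To make the sequence above concrete, here is a minimal sketch. The temp file standing in for the logfile and this particular Echo function are assumptions for illustration, not the actual ezrsync code.

```bash
# Sketch only: logfile and Echo are stand-ins, not the actual ezrsync code.
logfile=$(mktemp)

exec 4<&2             # fd 4 is now a copy of the original stderr (e.g. the terminal)
exec 2>>"$logfile"    # fd 2 now is the logfile; fd 4 is unaffected

Echo() {
    # Verbose messages go to the saved copy of the original stderr.
    echo "$@" >&4
}

Echo "verbose message: still reaches the screen"
ls /nonexistent-path-for-demo || true   # stray error output lands in the logfile

exec 2>&4 4>&-        # put stderr back and close the spare descriptor
```

After this runs, the logfile should hold only the error from ls, while the verbose message went to the original stderr.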
As for calling things from a script, I also made detection for that and restarted the whole program in that case as a process group leader. That was some serious trickery though especially all the while keeping output, process control, and return values going out through the master process. I think my biggest reason for that was for process control though, not because of any problems with redirections. Basically you cannot easily kill a group of processes if they aren't a process group. I did test it all called in several ways in the end.
What version of ezrsnapshots did you start from? 0.3.3, I think?
Edit:
Oh and maybe you took inspiration from me in echoing to standard error, but actually I didn't really. My standard error eventually redirected out to standard out (through the tail), which is what people expect if they want to grep output or something.
I think if you copy standard output too/instead, you can probably echo to that copy and circumvent the normal problem with echoing on standard out interfering with functions. I'd test it to make sure.
Bash can be a pain. Some things about it are quite useful though and speed is not a big problem in a program that does all file handling.
Edit again:
Of course any function called in a subshell (like a pipeline) probably doesn't get standard error output even with these workarounds. That's probably enough rambling for now.
Last edit: Dagurasu 2012-11-20
Kudos for figuring out all this process-stuff and redirections.
I wanted the 'verbose' messages to go to the screen, but found I couldn't use stdout (because I was using it in functions to return values), so I directed them to stderr. And of course, any rogue error output would also land on the screen. Then the 'loglevel' stuff needed to go to the logfile. So this works.
Now we're wanting to get any rogue (stderr) output to the logfile instead. So, at the very start, I do: exec 4<&2
Then when the logfile is ready, I do: exec 2>>"$logfile"
And in Echo I do echo >&4 (I can leave out the '1', right?)
(Any reason I can't use '3' instead of '4'?)
Edit:
Let me see if I understand correctly: exec 4<&2 makes all stderr go to &4. We want &4 to go to the logfile when it's ready, so exec 4>>"$logfile" (or are my semantics off?). In Echo I want to do echo >&3, but the output should go to stdout, so there must be a exec 1<&3 (or would that interfere with the functions? is it exec 3<&1 ??) I don't have a good grasp on this redirection stuff...
I started with the source of timedsyncs-0.2.1, but I incorporated some of your modifications and improvements until 0.3.3.
Last edit: PePa 2012-11-20
I get why you used standard error, I did the same thing for the same reason, but it eventually came back around to standard out via the log file and tails. It's not a big deal though.
you can use 3.. I use 3 to copy 1 though. that's the only reason
wow.. starting with 0.2.1... yeah.. there have been some important and subtle modifications since then.
By the way, I love your multiple config files. Exactly what I imagined but never did.
Last edit: Dagurasu 2012-11-20
Quoting PePa's edit above:
Let me see if I understand correctly: exec 4<&2 makes all stderr go to &4. We want &4 to go to the logfile when it's ready, so exec 4>>"$logfile" (or are my semantics off?). In Echo I want to do echo >&3, but the output should go to stdout, so there must be a exec 1<&3 (or would that interfere with the functions? is it exec 3<&1 ??) I don't have a good grasp on this redirection stuff...
Careful how you think of this. The numbers are descriptors. You don't actually redirect; you are just redefining the descriptor.
So
exec >>"$logfile"
changes 1 from whatever device it was to the logfile. Now 1 IS the logfile.
exec 3<&1
makes file descriptor 3 point to whatever 1 is pointing at when the exec is done.
The leftward < does not make 3 an input stream here: when duplicating a descriptor with an explicit number, 3<&1 and 3>&1 do the same thing, and 3>&1 is the more usual spelling for output.
Changing 1 later with
exec >>"$logfile"
will not change 3. 3 does not become an alias for 1.
So yeah, I think at the main level, not in a subshell (a function is ok), do:
exec 3<&1
exec >>"$logfile" 2>&1
This last line is not obvious, but it follows the same logic: redirections are processed left to right, so by the time 2>&1 is evaluated, 1 already is the logfile and 2 becomes a copy of that. The 2>&1 is optional; putting it after the file redirection is the usual idiom.
Then you can echo stuff like:
echo "some message" 1>&3
or likewise using 2 of course.
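The left-to-right ordering of redirections is easy to check with a tiny script. This is just a sketch; Noisy is a made-up function that writes one line to each stream.

```bash
# Sketch only: Noisy is a made-up function that writes to both streams.
Noisy() { echo "out"; echo "err" >&2; }

both=$(mktemp)
only_out=$(mktemp)

# 1 becomes the file first, then 2 becomes a copy of the new 1: both lines are logged.
Noisy >"$both" 2>&1

# 2 becomes a copy of the old 1 (here the $(...) capture), then 1 becomes the file.
captured=$(Noisy 2>&1 >"$only_out")

echo "file with both streams:"; cat "$both"
echo "file with stdout only:";  cat "$only_out"
echo "captured by the substitution: $captured"
```

With the order swapped, the error line follows the old destination of 1 instead of the file, which is why the file redirection has to come first in the idiom.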
As for interfering.. I'm not certain. I haven't tried reusing the standard output device through a higher file descriptor for this purpose. I THINK it should not interfere with function output but should bypass and go straight to screen, but I'd write a tiny test script to test that.
Update:
Normally, if you're capturing standard output as a function's return value, you're calling the function in a subshell, and 1 is pointed at a pipe that the calling command reads from. Using the saved copy of 1 instead bypasses that and goes to the screen device. One thing I think I did in later versions was make less use of subshells and thus less use of echoing return values.
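Here is the kind of tiny test script meant above, as a sketch. GetValue is a hypothetical function that "returns" a value on standard output while also printing a message through a saved copy of the original stdout.

```bash
# Sketch only: GetValue is a made-up function that "returns" a value on stdout.
exec 3>&1     # save the real stdout before any capturing happens

GetValue() {
    echo "progress: computing the value" >&3   # goes to the saved stdout, not the capture
    echo "42"                                  # this line is the return value
}

result=$(GetValue)    # runs in a subshell; fd 1 is the capture, fd 3 is inherited
echo "result is: $result"

exec 3>&-             # close the spare descriptor when done
```

The progress line shows up on the screen and the captured value stays clean, because the subshell inherits descriptor 3 unchanged.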
Last edit: Dagurasu 2012-11-20
Also, if you do want to redirect "rogue" output in a subshell, you can do that in two ways:
1) directly redirect each command to the log file
2) put exec at the top of the function.
Subshells inherit all open file descriptors, including the ones above 2 (though descriptors above 9 should be used with care, since the shell may use them internally).
So you can still
exec >&3
or something like that, or just go straight to the logfile again.
Realizing when you're using a subshell is of course critical to any advanced bash scripting, for multiple reasons.
Again though, my primary solution was just to reduce use of subshells and thus also of echoing results back. There's only 2 or three places where I still do that and 1 or two of those use no external commands anyway.
Last edit: Dagurasu 2012-11-20
Thanks for your clarifications, I will implement & test when I find the time.
Question though, I didn't find any evidence in the Bash documentation that functions are in effect subshells, other than the way they deal with variable scope. Could you provide a pointer here??
Functions are not subshells, but they can be called in subshells and often are, and almost always are if they use standard out to return data.
For example, if you assign the "return value" of MyFunction by
a=$(MyFunction)
or likewise using back ticks, then you used it in a subshell.
There's no good way to get data from standard out unless the standard out is directed from within a subshell to the input of another command (expansion onto its command line as above, into standard input by using a pipe, or into a temporary file descriptor by using process substitution or redirects), all standard tools in the chest for bash scripting. So usually if you're echoing returns, you're using a subshell. To avoid using subshells you need functions without return values, or you need to return via a global variable like $RET_VAL.
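A short sketch of the two return styles side by side. The function names here are made up; only $RET_VAL comes from the post above.

```bash
# Sketch only: the function names are hypothetical.

# Style 1: echo the result; the caller needs command substitution (a subshell).
JoinPathEcho() { echo "$1/$2"; }
a=$(JoinPathEcho /backups daily.0)

# Style 2: set a global; the function runs in the current shell.
JoinPathVar() { RET_VAL="$1/$2"; }
JoinPathVar /backups daily.0
b=$RET_VAL

# Side effects show the difference: a counter bumped inside $(...) is lost.
calls=0
Bump() { calls=$((calls + 1)); echo "$calls"; }
x=$(Bump)          # calls is incremented only inside the subshell
Bump >/dev/null    # calls is incremented in the current shell

echo "a=$a b=$b x=$x calls=$calls"
```

Both styles produce the same path, but only the second lets the function change state (counters, flags, cleanup lists) that the main script can see afterwards.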
Last edit: Dagurasu 2012-12-14
You are right, I didn't get rid of locking at all. :-)
I'll have to say, I would never on my own have initiated such a huge (in my eyes) undertaking, but I was not satisfied with rsnapshot, and I really liked what you had been doing, in bash, and especially the idea of just specifying all the desired intervals and letting the script sort it out through invocations by cron. I liked exploring/discovering an amenable coding style in bash (I'm sure that's visible...).
So I guess you inspired me to spend a little time, thought time at least, which is easier than programming time.
I do see many things in the refactoring that break and/or don't implement fixes (can't always remember which) to subtle but important points in some of my originally intended logic. I can't possibly predict all the differences, but I can point out some as I see them.
A big one I notice is that it looks like your
Trash_backup() can delete multiple non-forced backups in one go.
while((count>=intervalretain[$2])); do
That's dangerous. Why? A single configuration file typo can result in destroying all or many of someone's backups. Things like this are much more critical bugs in a backup app, IMO, than in other apps; I wouldn't even call it a bug in another app.
Part of the transaction commit/rollback in ezrsnapshots is ensuring that only one non-forced, non-persistent backup is deleted per new sync. If the new sync fails and the transaction is not committed, then even that deletion gets reverted.
That is the only logic that can guarantee that the number of backups is never reduced and it's the primary reason I had rollback as opposed to just cleanup. There was a brief (milliseconds) state consistency bug in the rollback logic in one of my versions but I think it's fixed up so a kill -9 at any time still guarantees this 1 to 1 behavior.
I always meant to make an OPTION to prune backups in the case of intentional reduction of the retain value, but that should be a deliberate maintenance option IMO.
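As a rough illustration of the "one delete per sync" rule, here is a sketch. TrashOne and the directory layout are assumptions made up for this example, and the rollback part of the transaction is deliberately left out.

```bash
# Sketch only: TrashOne and the directory layout are assumptions for illustration.
# Removes at most ONE (the oldest) entry of directory $1 once the number of
# entries has reached the retain value $2. No rollback is attempted here.
TrashOne() {
    dir=$1
    retain=$2
    count=$(ls "$dir" | wc -l | tr -d ' ')
    if [ "$count" -gt 0 ] && [ "$count" -ge "$retain" ]; then   # "if", not "while"
        oldest=$(ls "$dir" | sort | head -n 1)
        rm -rf -- "$dir/$oldest"
    fi
}

# Demo: five fake daily backups, then a config typo (retain 1 instead of 10).
root=$(mktemp -d)
for d in 2012-11-15 2012-11-16 2012-11-17 2012-11-18 2012-11-19; do
    mkdir "$root/daily.$d"
done

TrashOne "$root" 1
remaining=$(ls "$root" | wc -l | tr -d ' ')
echo "backups left after one sync: $remaining"
```

With the typo, a single run costs one backup. A loop that keeps deleting while the count is at or above the retain value would presumably have removed all five in the same run.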
There are more I think.. but that's enough for this post.
Last edit: PePa 2012-12-04
Another point.. I'll make separate posts for different issues:
It looks like you use a bottom up sync and promote (recycle) logic. I abandoned this logic long ago for many reasons, partly code simplicity and the simplicity of doing transactional commits. Obviously this took some code changes, and it's one of the reasons I split the logic parts into simplified small separate functions (which are probably much better in 3.x versions than 2.x), but that's a matter of taste.
I wrote a bunch of thoughts on the issue in my readme. I will add them here.
Sorry, you'll probably have to read it slowly and each sentence twice to get it, but well, it's complicated. I did start an implementation of a fully transactional one sync-one delete sync and promote method, but I never finished it (maybe you did and I'm just not seeing the whole picture yet). In the end, I became very comfortable with instant promotion (defined below) which is mostly why I didn't finish a proper sync and promote method.
Also, recycling is not something I left out; rather, it is guaranteed to never be needed in my present logic. The deleted sync will never be needed in another level, nor is it in any way unnecessarily wasted, nor does any backup ever need to get renamed. It's all part of the one sync, one delete philosophy and is closely related to correctly selecting which level to sync into in the first place, in a top down logic, while noting that having a monthly backup today is just as good as having a daily backup today.
Scheduling Philosophy (Instant vs Delayed promotions):
The primary advantage of ezrsnapshots is of course moving the timing
definitions from the crontab and anacron config files into the snapshot
config file and removing the multiple call types with potential timing
conflicts. Removing the rotation step is not strictly a requirement for that
but fits well with it as using static snapshot names with the snapshot date
and time in the name facilitates easy determination of snapshot due dates
and removes unnecessary complication at no loss of functionality. However
removing the promotion step is even less critical but has several effects.
The first, unpublished version of this program still used a promotion scheme
where all levels were checked for needed promotions. For example, if the
newest monthly backup was more than a month older than the oldest weekly
backup, then the oldest weekly backup would be promoted to monthly. Instead,
now, if the newest monthly backup is more than a month older than now, the
next sync produces a monthly backup. No separate daily backup is produced
for that day. Effectively, the daily backup is promoted at birth to its
final interval level depending on the slowest level which is out of date.
This has a clear advantage that one can see early on which backups, by date,
will end up kept for a long time and which will not.
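
The "promoted at birth" selection described above can be sketched as follows. PickLevel and its "name:period:newest" argument format are invented for this example and are not taken from the ezrsnapshots source; a month is simplified to 30 days.

```bash
# Sketch only: PickLevel and its "name:period:newest" argument format are made up.
# Arguments: now (epoch seconds), then one spec per level, slowest level first.
PickLevel() {
    now=$1
    shift
    for spec in "$@"; do
        name=${spec%%:*}
        rest=${spec#*:}
        period=${rest%%:*}
        newest=${rest#*:}
        if [ $((now - newest)) -ge "$period" ]; then
            echo "$name"     # slowest level that is out of date wins
            return 0
        fi
    done
    return 1                 # no level is due
}

day=86400
now=$((1000 * day))

# The newest monthly is 31 days old, so the next sync is born as a monthly.
first=$(PickLevel "$now" "monthly:$((30 * day)):$((now - 31 * day))" \
                         "weekly:$((7 * day)):$((now - 3 * day))" \
                         "daily:$day:$((now - day))")

# Monthly is fresh, weekly is 8 days old: the next sync is a weekly.
second=$(PickLevel "$now" "monthly:$((30 * day)):$((now - 10 * day))" \
                          "weekly:$((7 * day)):$((now - 8 * day))" \
                          "daily:$day:$((now - day))")

# Nothing is due yet.
status=0
PickLevel "$now" "daily:$day:$((now - 3600))" >/dev/null || status=$?

echo "$first $second $status"
```

Checking the slowest level first is what makes a separate promotion step unnecessary in this scheme: each snapshot gets its final level when it is created.
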
There are pros and cons to using delayed promotions or instant promotions.
Instant promotion means that a backup becomes, for instance, monthly on
birth and, while using up one monthly snapshot count, requires time to
mature before it is truly useful as a monthly backup. On the other hand a
daily snapshot will get displaced and so there will be room for another
daily snapshot and the dailys will span farther in time than for the delayed
promotion scheme. For example, if you configure 30 daily backups and 4
weekly backups you will get 34 backups but maybe not as you expect. You will
have a daily backup every day except, maybe day 1, 8, 15, and 29, where you
have weekly backups instead. You will still have 30 daily backups though, so
they will extend to day 34. Your weekly backups will essentially serve no
purpose since they are not given a chance to mature beyond the daily
schedule. You would probably want 26 daily backups and 8 weekly backups.
This behavior might be unexpected especially for anyone accustomed to
rsnapshot.
In either case the total number of backups is the same, and essentially
equivalent functionality can be produced with different choices of the
retained snapshot values.
A strong motivation for using instant promotion is code simplicity and
reliability. ezrsnapshots aims to make the life cycle of backups fully
determined/predictable even if the code is interrupted at any random place. This
also ties in with the goal of maintaining strict conservation of snapshots. For
every deletion there must be a sync performed in a concurrent transaction pair
which can be completed or rolled back depending on certain commit conditions. In
this way the number of backups is strictly conserved so configuration changes
cannot auto delete backups accidentally, and we never change the destiny of a
backup by deleting one that should have been promoted. Using delayed promotions
complicates this process since between sync and deletion there will be several
rename operations which should also stay in a consistent state.
In the delayed promotion scheme, the real monthly interval is limited to the
time resolution of the next fastest interval which it promotes from, so a
week for instance. In reality "monthly" backups must be every 4 or 5
weeks. On the other hand, in the instant promotion scheme, a new monthly
backup can be made on the first run of ezrsnapshots where the monthly backup
is exactly one month old or more. This could be good or bad depending on how
you look at it. On the first of the month you may end up with a monthly and
weekly backup separated by only 2 days (or even 1 on March 1 on leap year).
But the instant promotion method guarantees maximum interval spacing.
Also with the traditional promotion method, reduction in the number of
backups kept can produce gaps in backups. If 8 of the oldest weekly backups
are removed, that removes two months worth of backups that would have fed
the monthly backups, producing a gap in the monthly backups unless done very
carefully. With the instant promotion method this is never a concern.
Presently ezrsnapshots only allows manual reduction of backups when the
configuration is changed. Manual or not, the instant promotion method is
immune to this effect.
Even if a delayed promotion scheme were added to ezrsnapshots, there would
still be differences compared to a simple implementation of Mike Rubel's
original cron job rotation/promotion method. Fine tuning of behavior is
certainly easier in ezrsnapshots (because it is far more self aware than a
cron job that has no knowledge even about the frequency of the backup cycle
for which it is responsible) and so problems like the one in the last
paragraph can be avoided with a little thought. Also with a monthly cron job
forcing a promotion every month, you will get monthly backups having
sometimes 4 weeks sometimes 5 weeks in spacing, but on average monthly.
Implementing a promotion scheme in ezrsnapshots would most naively involve
due date checking so that a 4 week gap in backups would never trigger a
promotion and a 5 week gap always would. So "monthly" backups would always
be in 5 week spacings. You could certainly define a 4weekly interval if you
require 4 week spacing. In fact, with the right coding it's possible to
reproduce anything crontab can do but there's probably no reason to add such
complication to guarantee average spacing anyway. This is all presently
hypothetical since ezrsnapshots does not presently have a delayed promotion
method.
The intuitiveness of snapshot counting probably favors a delayed promotion
method slightly. Programming simplicity, interval guarantees and behavior
continuity across configuration changes favor instant promotion, especially
in a system without cron jobs. Several minor variations are possible for
either method. See comments at the end of the code for a pseudo-code summary
of delayed promotion.
Last edit: Dagurasu 2012-11-20
Firstly, your rigour in making sure nothing is lost seems to be your main focus, probably rightly so for a backup program. :-)
I ended up doing bottom up to stay closer to the rsnapshot paradigm. Like you said, ezrsnapshots' behaviour is bound to surprise people. I wanted to get all the lowest interval slots filled before starting on the next interval up.
I don't use rsnapshot's promotion logic either, so I don't get higher intervals with the resolution of the interval below it, but I check to see whether an interval is 'due'. So the resolution of any interval only depends on the resolution of the cron job. I'd recommend setting that resolution well below the shortest interval, otherwise a 'crowding out' effect could happen (and will happen once the cron resolution gets above the shortest interval)! Maybe there should be a check of the used interval in the cron job?? Perhaps an exported variable that is used in the cron job that can be compared in the script? (Of course, guarantees are hard to come by here...)
The recycling logic I implemented mainly in order to not waste the time & effort of making a backup in case it could still be used somehow. Only when I know that a backup that has been made will never be useful in the foreseeable future, it is deleted.
Basically, I went for 'expected behaviour' combined with staying close to the desired interval lengths and retain numbers.
Minor point, but if you have hourly intervals and your cron is set for hourly, then after one hour there's a race to see if you need a new backup or not, because it's been EXACTLY one hour.
I don't see any buffer time in the code, but I certainly could have missed it. I use a 1 min slop in the comparison just in case the machine is really hung up swapping badly or whatever. In practice "now" is calculated very early in the script (in mine and probably also yours) and 1 second is probably even enough buffer usually. In fact, with 1s date precision, putting a >= instead of just > would catch it most of the time. With it as just >, as you have it now, I'd be worried you'd lose the race pretty often.
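A sketch of a due check with slop, to show the race. IsDue and its argument order are assumptions for illustration; the 60 second default mirrors the 1 min slop mentioned above.

```bash
# Sketch only: IsDue and the 60 second default are assumptions for illustration.
# Usage: IsDue now newest interval [slop]   (all in seconds since the epoch)
IsDue() {
    now=$1
    newest=$2
    interval=$3
    slop=${4:-60}
    # ">=" plus a little slop, so a run that fires at (or just before) the
    # exact interval still counts as due.
    [ $((now - newest + slop)) -ge "$interval" ]
}

# Hourly interval, previous snapshot at t=0, cron fires one second "early".
if IsDue 3599 0 3600; then echo "with 60s slop: due"; fi
if ! IsDue 3599 0 3600 0; then echo "with no slop: not due, wait for next run"; fi
if IsDue 3600 0 3600 0; then echo "exactly one interval, >= : due"; fi
```

Without the slop, an hourly cron job that lands a second early skips the snapshot and the next chance is a full hour later.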
I noticed your buffer time. I'd recommend using a cron interval that is smaller than the shortest interval, the shorter the better for more 'accurate' results.
But I changed my > to >=; it is indeed better.
I just meant diff against my older one.. but....
I think your improvements look great on first glance and the code direly needed some cleanup and refactoring. Glad you liked it enough to bother!
I didn't notice removing locking. I do think locking is probably important but maybe you've dealt with the issue somehow or you just meant you redid it. I did have a locking bug in some version so maybe that's related to your removing it.
Teeing was a mess and I was being silly about fixing it. Your solution is clearly simple and right. I was probably too busy worrying about the backup logic.
I probably won't make my own page because I think you're doing a far better job than I would. I'll give it a try when I can.
Last edit: Dagurasu 2012-11-19
Actually now I remember why I stuck to teeing/tailing.. because while your Echo is nice and simple, it doesn't log stray standard error output from external commands.
Given what it eventually took to make it work well maybe it's worth losing that, Then again I did eventually get it to work well.
A nice compromise would probably be to redirect std error to the log file and just not show it on screen. It gets the simplicity and still allows to go see the errors when something goes wrong. It's a one line addition. As you probably know it's possible redirect std error somewhere even while your Echo function sends output to the standard error device without interfering. You just need to copy (via exec) the standard filedescriptor to a new descriptor before redirecting it and use the new one in the Echo function.
I had 3 lines commented out around the tail-bit, the last one was '# exec &>>"$logfile"'. I think I can change that to 'exec 2>>"$logfile"', do you think that is correct?
I think I decided to no longer use exec when I read that its use (in general) gave people problems, I think when calling from within another script, but that doesn't really apply here anyway.
There can be various i/o and process control issues when calling the whole program from inside a script. I investigated that in detail. Now I can't recall if exec itself made problems. So long as our program is a subshell (it's not "sourced".. bash lingo for included) I think that wasn't a problem. Still my latest version worked all that out.. so maybe I'll make a page after all at least to stick it there for posterity.
The problem with
exec 2>>$logfile is I think you are echoing to 2 in your Echo function. So first you need:
exec 4<&2
Which copies 2 to 4 before it's redirected:
then
exec 2>>"$logfile"
I'd make sure none of that is run from inside a subshell. Function is ok, so long as the function is called without back ticks or parenthasese around it.
then later you do your echos as
echo blah blah blah 1>&4
As for calling things from a script, I also made detection for that and restarted the whole program in that case as a process group leader. That was some serious trickery though especially all the while keeping output, process control, and return values going out through the master process. I think my biggest reason for that was for process control though, not because of any problems with redirections. Basically you cannot easily kill a group of processes if they aren't a process group. I did test it all called in several ways in the end.
What version of ezrsnapshots did you start from 0.3.3 I think?
Edit:
Oh and maybe you took inspiration from me in echoing to standard error, but actually I didn't really. My standard error eventually redirected out to standard out (through the tail), which is what people expect if they want to grep output or something.
I think if you copy standard output too/instead, you can probably echo to that copy and circumvent the normal problem with echoing on standard out interfering with functions. I'd test it to make sure.
Bash can be a pain. Some things about it are quite useful though and speed is not a big problem in a program that does all file handling.
Edit again:
Of course any function called in a subshell (like a pipeline) probably doesn't get standard error output even with these workarounds. That's probably enough rambling for now.
Last edit: Dagurasu 2012-11-20
Kudos for figuring out all this process-stuff and redirections.
I wanted the 'verbose' messages to go to the screen, but found I couldn't use stdout (because I was using it in functions to return values), so I directed them to stderr. And of course, any rogue error output would also land on the screen. Then the 'loglevel' stuff needed to go to the logfile. So this works.
Now we're wanting to get any rogue (stderr) output to the logfile instead. So, at the very start, I do: exec 4<&2
Then when the logfile is ready, I do: exec 2>>"$logfile"
And in Echo I do echo >&4 (I can leave out the '1', right?)
(Any reason I can't use '3' instead of '4'?)
Edit:
Let me see if I understand correctly: exec 4<&2 makes all stderr go to &4. We want &4 to go to the logfile when it's ready, so exec 4>>"$logfile" (or are my semantics off?). In Echo I want to do echo >&3, but the output should go to stdout, so there must be an exec 1<&3 (or would that interfere with the functions? Is it exec 3<&1?). I don't have a good grasp on this redirection stuff...
I started with the source of timedsyncs-0.2.1, but I incorporated some of your modifications and improvements until 0.3.3.
Last edit: PePa 2012-11-20
I get why you used standard error, I did the same thing for the same reason, but it eventually came back around to standard out via the log file and tails. It's not a big deal though.
You can use 3. I use 3 to copy 1 though; that's the only reason.
wow.. starting with 0.2.1... yeah.. there have been some important and subtle modifications since then.
By the way, I love your multiple config files. Exactly what I imagined but never did.
Last edit: Dagurasu 2012-11-20
I enormously appreciate your feedback, I'm learning and processing and will reply to the various points you have raised when I find more time.
I just came across your reply from the 13th of June to my initial question whether there were any new versions after timedsyncs-0.2.1..!
Careful how you think of this. The numbers are descriptors. You don't actually redirect, you are just redefining the descriptor.
So exec >> "$logfile"
changes 1 from whatever device it was to the logfile. Now 1 IS the logfile.
exec 3<&1
makes the file descriptor 3 point to whatever 1 is pointing at when the exec is done.
As I recall the leftward < does not matter here: when duplicating a descriptor, bash treats 3<&1 and 3>&1 the same, although 3>&1 is the usual spelling for an output stream.
Changing 1 later with
exec >> "$logfile"
will not change 3. 3 does not become an alias for 1.
So yeah, I think at the main level, not in a subshell (a function is OK),
do:
exec 3<&1
exec >> "$logfile" 2>&1
This last line is not obvious using the above logic. Redirections are processed left to right, so by the time 2>&1 is evaluated, 1 already is the logfile and 2 ends up mapped all the way through to the logfile. This is an idiom.
then you can echo stuff like :
echo "some message" 1>&3
or likewise using 2 of course.
As for interfering.. I'm not certain. I haven't tried reusing the standard output device through a higher file descriptor for this purpose. I THINK it should not interfere with function output but should bypass and go straight to screen, but I'd write a tiny test script to test that.
Update:
Normally if you're asking for the standard output as a function return, that means you're calling the function in a subshell and 1 is pointed at a pipe that the calling command reads. Using the saved value of 1 instead bypasses that pipe and goes to the screen device. One thing I think I did in later versions was make less use of subshells and thus less use of echoing return values.
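A minimal sketch of the idiom discussed above, assuming bash. The names logfile, Echo and get_answer are made up for illustration and are not taken from either script. It also keeps a copy of the original stderr on fd 4 (the descriptor mentioned earlier in the thread) so that both streams can be restored at the end.

```bash
#!/usr/bin/env bash
# Hypothetical names throughout: logfile, Echo, get_answer.
logfile=$(mktemp)

exec 3>&1 4>&2             # fd 3 and fd 4 now hold copies of the original stdout and stderr
exec >>"$logfile" 2>&1     # fd 1 becomes the logfile, then fd 2 is pointed at the same file

Echo() {
    echo "$*" >&3          # verbose message: saved stdout (the screen), not the logfile
}

get_answer() {
    Echo "computing..."    # bypasses the $(...) capture because it is written to fd 3
    ls /nonexistent-path   # "rogue" stderr output lands in the logfile
    echo 42                # the value the caller captures
}

answer=$(get_answer)
Echo "answer is $answer"

exec 1>&3 2>&4 3>&- 4>&-   # restore the original stdout and stderr, close the copies
```

In this sketch the message written through fd 3 does not end up in the captured value, which is the behaviour guessed at above; it is still worth repeating the test in the real script, since any command there that redirects fd 3 itself would change the picture.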
Last edit: Dagurasu 2012-11-20
Also if you do want to redirect "rogue" output in a subshell, you can do that in two ways:
1) directly redirect each command to the log file
2) put exec at the top of the function.
Subshells inherit all open file descriptors, including those above 2 (but descriptors above 9 should be used with care, since the shell may use them internally).
So you can still
exec >&3
or something like that, or just straight to the logfile again.
Realizing when you're using a subshell is of course critical to any advanced bash scripting, for multiple reasons.
Again though, my primary solution was just to reduce use of subshells and thus also of echoing results back. There are only two or three places where I still do that, and one or two of those use no external commands anyway.
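As a rough illustration of option 2, here is a sketch of a function that redirects its own stray stderr when it is run inside a command substitution. The names logfile and list_missing are hypothetical.

```bash
#!/usr/bin/env bash
# Hypothetical names: logfile, list_missing.
logfile=$(mktemp)

list_missing() {
    exec 2>>"$logfile"     # affects only the current (sub)shell from here on
    ls /nonexistent-path   # error text goes to the logfile, not the screen
    echo "done"            # still captured by the caller through stdout
}

# Called inside $(...), the function runs in a subshell, so the exec above
# does not change stderr of the calling shell.
result=$(list_missing)
echo "result: $result"
```

One caveat with this approach: if list_missing were called directly instead of inside $(...), the exec would run in the calling shell and its stderr would stay redirected after the function returns.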
Last edit: Dagurasu 2012-11-20
Thanks for your clarifications, I will implement & test when I find the time.
Question though: I didn't find any evidence in the Bash documentation that functions are in effect subshells, other than the way they deal with variable scope. Could you provide a pointer here?
Functions are not subshells, but they can be called in subshells and often are, and almost always if they use standard out to return data.
For example if you assign to the "return value" of MyFunction by
a=$(MyFunction)
or likewise using back ticks, then you used it in a subshell.
There's no good way to get data from the standard out unless the standard out is directed from within a subshell to the input of another command (expansion onto its command line as above, or into standard input by using a pipe, or into a temporary file descriptor by using process substitution or redirects), all standard tools in the chest for bash scripting. So usually if you're echoing returns, you're using a subshell. To avoid using subshells you need functions without return values, or you need to return via a global variable like $RET_VAL.
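A small sketch of the difference, assuming bash. The function names and the counter variable are invented for the example; only $RET_VAL comes from the discussion above.

```bash
#!/usr/bin/env bash
counter=0

# Returns its result by echoing it, so callers normally wrap it in $(...)
next_id_echo() {
    counter=$((counter + 1))
    echo "$counter"
}

# Returns its result through a global variable, so no subshell is needed
next_id_global() {
    counter=$((counter + 1))
    RET_VAL=$counter
}

a=$(next_id_echo)          # runs in a subshell: the increment is lost in the parent
after_subshell=$counter

next_id_global             # runs in the current shell: the increment sticks
b=$RET_VAL

echo "a=$a after_subshell=$after_subshell b=$b counter=$counter"
```

The echoed version hands back a value, but any state it changes disappears with the subshell, which is one of the reasons to keep an eye on where subshells are created.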
Last edit: Dagurasu 2012-12-14
You are right, I didn't get rid of locking at all. :-)
I'll have to say, I would never on my own have initiated such a huge (in my eyes) undertaking, but I was not satisfied with rsnapshot, and I really liked what you had been doing, in bash, and especially the idea of just specifying all the desired intervals and letting the script sort it out through invocations by cron. I liked exploring/discovering an amenable coding style in bash (I'm sure that's visible...).
I started to attach my latest version and removed it. It seems pepa65 has it all going in the right directions anyway.
Last edit: Dagurasu 2012-11-19
So I guess you inspired me to spend a little time, thought time at least, which is easier than programming time.
I do see many things in the refactoring that break and/or don't implement fixes (can't always remember which) to subtle but important points in some of my originally intended logic. I can't possibly predict all the differences, but I can point out some as I see them.
A big one I notice is that it looks like your
Trash_backup() can delete multiple non-forced backups in one go.
That's dangerous. Why? A single configuration file typo can result in destroying all or many of someone's backups. Things like this are much more critical bugs in a backup app IMO than in other apps, and I wouldn't even call it a bug in another app.
Part of the transaction commit/rollback in ezrsnapshots is ensuring that only one non-forced non-persistent backup is deleted per new sync. If the new sync fails and the transaction is not committed, then even that deletion gets reverted.
That is the only logic that can guarantee that the number of backups is never reduced and it's the primary reason I had rollback as opposed to just cleanup. There was a brief (milliseconds) state consistency bug in the rollback logic in one of my versions but I think it's fixed up so a kill -9 at any time still guarantees this 1 to 1 behavior.
I always meant to make an OPTION to prune backups in the case of intentional reduction of the retain value, but that should be a deliberate maintenance option IMO.
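To make the one sync, one delete idea concrete, here is a simplified sketch assuming bash. It is not the ezrsnapshots implementation: sync_with_one_delete, do_sync, SYNC_SHOULD_FAIL and the .trash. naming are all hypothetical, and do_sync is only a stand-in for the real rsync step.

```bash
#!/usr/bin/env bash
# Hypothetical stand-in for the real rsync step: creates the snapshot
# directory and fails on demand so the rollback path can be exercised.
do_sync() {
    mkdir -p "$1" && [ "${SYNC_SHOULD_FAIL:-0}" -eq 0 ]
}

# sync_with_one_delete ROOT VICTIM NEW
# Sets exactly one old snapshot aside, syncs, then commits or rolls back.
sync_with_one_delete() {
    local root=$1 victim=$2 new=$3
    local trash="$root/.trash.$victim"

    mv "$root/$victim" "$trash" || return 1      # set aside, do not delete yet

    if do_sync "$root/$new"; then
        rm -rf -- "$trash"                       # commit: the one deletion is final
    else
        rm -rf -- "$root/$new"                   # drop the partial snapshot
        mv "$trash" "$root/$victim"              # rollback: old snapshot is back
        return 1
    fi
}

root=$(mktemp -d)
mkdir "$root/daily.old" "$root/daily.mid"

SYNC_SHOULD_FAIL=1
sync_with_one_delete "$root" daily.old daily.new
echo "after failed sync: $(ls "$root" | tr '\n' ' ')"

SYNC_SHOULD_FAIL=0
sync_with_one_delete "$root" daily.old daily.new
echo "after good sync:   $(ls "$root" | tr '\n' ' ')"
```

In both runs the number of snapshots stays at two. The sketch does not cover the kill -9 case: a real script would also have to look for a leftover set-aside directory at startup and finish the rollback then.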
There are more I think.. but that's enough for this post.
Last edit: PePa 2012-12-04
Another point.. I'll make separate posts for different issues:
It looks like you use a bottom-up sync and promote (recycle) logic. I abandoned this logic long ago for many reasons, including code simplicity and the simplicity of doing transactional commits. Obviously this took some code changes, and it's one of the reasons I split the logic into small, simple, separate functions (which are probably much better in the 3.x versions than in 2.x), but that's a matter of taste.
I wrote a bunch of thoughts on the issue in my readme. I will add them here.
Sorry, you'll probably have to read it slowly and each sentence twice to get it, but well, it's complicated. I did start an implementation of a fully transactional one sync-one delete sync and promote method, but I never finished it (maybe you did and I'm just not seeing the whole picture yet). In the end, I became very comfortable with instant promotion (defined below) which is mostly why I didn't finish a proper sync and promote method.
Also recycling is not something I left out; rather, it is guaranteed to never be needed in my present logic. The deleted sync will never be needed in another level, nor wasted unnecessarily in any way, nor does any backup ever need to get renamed. It's all part of the one sync, one delete philosophy and is closely related to correctly selecting which level to sync into in the first place, in a top-down logic, while noting that having a monthly backup today is just as good as having a daily backup today.
Scheduling Philosophy (Instant vs Delayed promotions):
The primary advantage of ezrsnapshots is of course moving the timing
definitions from the crontab and anacron config files into the snapshot
config file and removing the multiple call types with potential timing
conflicts. Removing the rotation step is not strictly a requirement for that
but fits well with it as using static snapshot names with the snapshot date
and time in the name facilitates easy determination of snapshot due dates
and removes unnecessary complication at no loss of functionality. However
removing the promotion step is even less critical but has several effects.
The first, unpublished version of this program still used a promotion scheme
where all levels were checked for needed promotions. For example, if the
newest monthly backup was more than a month older than the oldest weekly
backup, then the oldest weekly backup would be promoted to monthly. Instead,
now, if the newest monthly backup is more than a month older than now, the
next sync produces a monthly backup. No separate daily backup is produced
for that day. Effectively, the daily backup is promoted at birth to its
final interval level depending on the slowest level which is out of date.
This has a clear advantage that one can see early on which backups, by date,
will end up kept for a long time and which will not.
There are pros and cons to using delayed promotions or instant promotions.
Instant promotion means that a backup becomes, for instance, monthly on
birth and, while using up one monthly snapshot count, requires time to
mature before it is truly useful as a monthly backup. On the other hand a
daily snapshot will get displaced and so there will be room for another
daily snapshot and the dailys will span farther in time than for the delayed
promotion scheme. For example, if you configure 30 daily backups and 4
weekly backups you will get 34 backups but maybe not as you expect. You will
have a daily backup every day except, maybe day 1, 8, 15, and 29, where you
have weekly backups instead. You will still have 30 daily backups though, so
they will extend to day 34. Your weekly backups will essentially serve no
purpose since they are not given a chance to mature beyond the daily
schedule. You would probably want 26 daily backups and 8 weekly backups.
This behavior might be unexpected especially for anyone accustomed to
rsnapshot.
In either case the total number of backups is the same, and essentially
equivalent functionality can be produced with different choices of the
retained snapshot values.
A strong motivation for using instant promotion is code simplicity and
reliability. ezrsnapshots aims to make the life cycle of backups fully
determined/predictable even if the code is interrupted at any random place. This
also ties in with the goal of maintaining strict conservation of snapshots. For
every deletion there must be a sync performed in a concurrent transaction pair
which can be completed or rolled back depending on certain commit conditions. In
this way the number of backups is strictly conserved so configuration changes
cannot auto delete backups accidentally, and we never change the destiny of a
backup by deleting one that should have been promoted. Using delayed promotions
complicates this process since between sync and deletion there will be several
rename operations which should also stay in a consistent state.
In the delayed promotion scheme, the real monthly interval is limited to the
time resolution of the next fastest interval which it promotes from, so a
week for instance. In reality "monthly" backups must be every 4 or 5
weeks. On the other hand, in the instant promotion scheme, a new monthly
backup can be made on the first run of ezrsnapshots where the monthly backup
is exactly one month old or more. This could be good or bad depending on how
you look at it. On the first of the month you may end up with a monthly and
weekly backup separated by only 2 days (or even 1 on March 1 on leap year).
But the instant promotion method guarantees maximum interval spacing.
Also with the traditional promotion method, reduction in the number of
backups kept can produce gaps in backups. If 8 of the oldest weekly backups
are removed, that removes two months worth of backups that would have fed
the monthly backups, producing a gap in the monthly backups unless done very
carefully. With the instant promotion method this is never a concern.
Presently ezrsnapshots only allows manual reduction of backups when the
configuration is changed. Manual or not, the instant promotion method is
immune to this effect.
Even if a delayed promotion scheme was added to ezrsnapshots, there would
still be differences compared to a simple implementation of Mike Rubel's
original cron job rotation/promotion method. Fine tuning of behavior is
certainly easier in ezrsnapshots (because it is far more self aware than a
cron job that has no knowledge even about the frequency of the backup cycle
for which it is responsible) and so problems like the one in the last
paragraph can be avoided with a little thought. Also with a monthly cron job
forcing a promotion every month, you will get monthly backups having
sometimes 4 weeks sometimes 5 weeks in spacing, but on average monthly.
Implementing a promotion scheme in ezrsnapshots would most naively involve
due date checking so that a 4 week gap in backups would never trigger a
promotion and a 5 week gap always would. So "monthly" backups would always
be in 5 week spacings. You could certainly define a 4weekly interval if you
require 4 week spacing. In fact, with the right coding it's possible to
reproduce anything crontab can do but there's probably no reason to add such
complication to guarantee average spacing anyway. This is all presently
hypothetical since ezrsnapshots does not presently have a delayed promotion
method.
The intuitiveness of snapshot counting probably favors a delayed promotion
method slightly. Programming simplicity, interval guarantees and behavior
continuity across configuration changes favor instant promotion, especially
in a system without cron jobs. Several minor variations are possible for
either method. See comments at the end of the code for a pseudo-code summary
of delayed promotion.
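The instant promotion rule from the readme ("promoted at birth to its final interval level depending on the slowest level which is out of date") can be sketched roughly as follows, assuming bash. The level table, the 30 day month and the pick_level helper are hypothetical, and how a new snapshot updates the due dates of the faster levels is not spelled out in the readme, so the sketch leaves the newest array to the caller.

```bash
#!/usr/bin/env bash
# Hypothetical configuration, slowest level first. A month is taken as 30 days
# purely for illustration.
level_names=(monthly weekly daily)
level_secs=(2592000 604800 86400)

# newest[i]: epoch time of the newest snapshot in level i (0 means none yet).
newest=(0 0 0)

# pick_level NOW: print the slowest level that is out of date at time NOW,
# or return 1 if no level is due.
pick_level() {
    local now=$1 i
    for i in "${!level_names[@]}"; do
        if (( now - newest[i] >= level_secs[i] )); then
            echo "${level_names[i]}"
            return 0
        fi
    done
    return 1
}

now=3000000
newest=(0 0 0);                   echo "no snapshots yet: $(pick_level "$now")"
newest=(2900000 2000000 2950000); echo "weekly is stale:  $(pick_level "$now")"
newest=(2900000 2900000 2900000); echo "daily is stale:   $(pick_level "$now")"
```

Because the levels are walked from slowest to fastest, a run on which both the monthly and the daily level are out of date produces only a monthly snapshot, matching the "no separate daily backup is produced for that day" behaviour described above.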
Last edit: Dagurasu 2012-11-20
And I realize your new code might not be targeted at sticking to my vision or philosophies exactly, just sharing my thoughts though.
Firstly, your rigour in making sure nothing is lost seems to be your main focus, probably rightly so for a backup program. :-)
I ended up doing bottom up to stay closer to the rsnapshot paradigm. Like you said, ezrsnapshots' behaviour is bound to surprise people. I wanted to get all the lowest interval slots filled before starting on the next interval up.
I don't use rsnapshot's promotion logic either, so I don't get higher intervals with the resolution of the interval below it, but I check to see whether an interval is 'due'. So the resolution of any interval only depends on the resolution of the cron job. I'd recommend setting that resolution well below the shortest interval, otherwise a 'crowding out' effect could happen (and will happen once the cron resolution gets above the shortest interval)! Maybe there should be a check of the used interval in the cron job?? Perhaps an exported variable that is used in the cron job that can be compared in the script? (Of course, guarantees are hard to come by here...)
The recycling logic I implemented mainly in order to not waste the time & effort of making a backup in case it could still be used somehow. Only when I know that a backup that has been made will never be useful in the foreseeable future, it is deleted.
Basically, I went for 'expected behaviour' combined with staying close to the desired interval lengths and retain numbers.
Date race:
Minor point, but if you have hourly intervals and your cron is set for hourly, then after one hour there's a race to see if you need a new backup or not, because it's been EXACTLY one hour.
I don't see any buffer time in the code, but I certainly could have missed it. I use a 1 min slop in the comparison just in case the machine is really hung up swapping badly or whatever. In practice "now" is calculated very early in the script (in mine and probably also yours) and 1 second is probably even enough buffer usually. In fact, with 1 s date precision, using >= instead of just > would catch it most of the time. With just >, as you have it now, I'd be worried you'd lose the race pretty often.
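A sketch of such a due check with slop, assuming bash and timestamps in epoch seconds. The helper name is_due and its argument order are made up for the example and are not taken from either script.

```bash
#!/usr/bin/env bash
# is_due LAST NOW INTERVAL [SLOP]   (all in seconds; SLOP defaults to 60)
# Succeeds when a new snapshot is due. Hypothetical helper, not taken from
# either script.
is_due() {
    local last=$1 now=$2 interval=$3 slop=${4:-60}
    (( now - last + slop >= interval ))
}

last=$(date +%s)
now=$(( last + 3600 ))          # cron fires exactly one hour later

is_due "$last" "$now" 3600 0 && echo "due with >= and no slop"
is_due "$last" $(( now - 5 )) 3600 && echo "due 5 s early thanks to the slop"
is_due "$last" $(( now - 5 )) 3600 0 || echo "not due 5 s early without slop"
```

With a strict > and no slop, the first call would fail when the elapsed time is exactly the interval, which is the race described above.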
I noticed your buffer time. I'd recommend using a cron interval that is smaller than the shortest interval, the shorter the better for more 'accurate' results.
But I changed my > to >=; it is indeed better.