[Dar-libdar_api] Re: last archive slice
From: Denis C. <dar...@fr...> - 2004-07-25 19:08:09
-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1
Johnathan Burchill wrote:
| On July 24, 2004 11:22 pm, Denis Corbin wrote:
|
|[...]
|>It is a user mistake to store slices of different archives under the
|>same basename in the same directory.
|>
|>There is no point in keeping the last slices of an old archive; they only
|>waste your disk space: if you lose the first slice, you won't be able to
|>restore anything from your old archive. Otherwise, if you definitely want
|>to keep several archives with the same name, it is not a very good idea
|>to mix their respective slices together in the same directory... ;-) no?
|
|
| That is exactly the problem I was trying to address. The slices _are_
| absolutely useless, and just waste disk space, so why not have libdar
| remove them?
For me, the problem with adding an additional slice-removal feature to
libdar is that it increases the risk of erasing precious data. So I would
prefer that users take responsibility for managing their own data.
But...
But I propose that if slice overwriting is detected and slices other
than the first are present in the same directory, dar will ask the user
whether these other old slices must be removed first or not (in any
case, there will be no option to deactivate that question). Nothing will
be done to ask the user to provide missing slices of the old archive that
could be located in other directories, filesystems, floppies, hard disks, ...
To me, a situation where overwriting occurs remains a user error.
Allowing the deletion of old slices in the current directory fixes the
not-so-intuitive behavior of libdar, which leaves older slices in the
given directory after the new archive has been created. For the rest,
it is the user's responsibility to throw away the old CD-Rs and
to erase the old slices located on separate floppies, etc.
| [...]
|>
|>| I see the following solutions:
|>|
|>| 1) libdar deletes all slices on a disk if they have the same storage
|>| directory and basename as the new one being created. It might do this
|>| before creating the archive, or after creating the archive. Perhaps
|>| doing this after will be more efficient, since you only delete slices
|>| that haven't been overwritten, if there are any.
|>
|>I do not agree. libdar deletes nothing; the 'rm' command does (or the
|>unlink() system call).
I still think that when the user sees a warning saying that a file is
about to be overwritten, he must understand that the warning is neither
decoration nor a legal disclaimer. If the user ignores the warning and
continues with dar, he must accept the consequences of his action and
take responsibility for it.
Look at tar, for example: tar never warns about overwriting an archive (by
default, at least), and nobody complains when an archive is lost this way.
|
|
| Libdar does delete files on disk if the appropriate conditions are met
| during a restoration, but I quibble :).
:-)
|
| But libdar does overwrite archive slices! Is this not a form of deletion?
Yes, and that's already too much, as explained above. Do the same with
tar and you will not get any warning at all, will you?
| Consider the echo command:
|
| echo "hello" > file1
| echo "bye" > file1
|
| What is file1?
|
| cat file1
| bye
|
| It is not "byelo", yet this is analogous to what dar does when
| creating a new archive with slices, while replacing one that already
| exists. As you intend this to be the behaviour, I suggest that it is not
| the obvious or intuitive one, and at least should be documented. Apologies
| in advance if it is already in the manpage.
You don't have to apologize. The man page does not describe everything a
user can do, clever things or less clever ones; it only documents what
dar does.
To take your example, using tar in place of dar:
tar -cf file1 <some files>
tar -cf file1 <some other files>
So, what is in file1?
tar -tf file1
<some other files>
No, it is not "<some files><some other files>", although that would be
more "intuitive". Maybe you think this is normal? Then it is normal too
that when you overwrite 7 files, the other remaining files stay unchanged
and are not deleted, isn't it? ;-) Just to quibble, too. ;-)
|
| I conceptualize disk archives as objects, whether they are made up of
| slices or not. Making a new backup in a directory that contains an archive
| of the same name should replace the archive as a whole, not on a
| slice-by-slice basis. I do not see anything "clever" or assuming about
| that for libdar.
Because there is no conceptualization of disk archives as "objects": an
archive split into slices is not a single entity, since the user has
total freedom to put some *files* on floppies, others on disk, and why
not send the remaining ones by email. They remain files.
I would even say that you may be missing a middle slice while dar,
knowing nothing about it, keeps working properly and restores a subset
of files. Where, then, is the disk archive as a conceptualized object? ;-)
(still quibbling).
|
| An archive with one slice, made with a slice size of 0, would be completely
| replaced by the new archive. By your logic you should overwrite the
| initial portion of the old archive with the new one, not completely
| replace it.
Nope, because when dar opens a file in O_TRUNC mode (which implies the
user has agreed to lose the data in that file), you lose the data in
that file only, not in the other files.
Nope again, because when you overwrite files in a directory you do not
replace all the files of that directory. Directories (which contain
slices) do not behave like plain files (which contain data). This is a
Unix feature ;-)
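To make the O_TRUNC point concrete, here is a minimal POSIX/C++ sketch (not libdar code; `write_slice()`, `read_file()` and the file names are made up for the demo). Reopening slice 1 with O_TRUNC empties that one file, while slices 2 and 3 of the old archive are left exactly as they were.

```cpp
#include <fcntl.h>
#include <unistd.h>

#include <fstream>
#include <iterator>
#include <string>

// Hypothetical helper: (re)creates one slice file. O_TRUNC discards the old
// content of this file only; no other file in the directory is affected.
bool write_slice(const std::string &path, const std::string &data)
{
    int fd = ::open(path.c_str(), O_WRONLY | O_CREAT | O_TRUNC, 0644);
    if (fd < 0)
        return false;
    ssize_t written = ::write(fd, data.data(), data.size());
    bool ok = written == static_cast<ssize_t>(data.size());
    return ::close(fd) == 0 && ok;  // close() is always called
}

// Reads a whole file back (empty string if it cannot be opened).
std::string read_file(const std::string &path)
{
    std::ifstream in(path, std::ios::binary);
    return std::string(std::istreambuf_iterator<char>(in),
                       std::istreambuf_iterator<char>());
}
```

Writing an old three-slice archive and then a new one-slice archive under the same basename leaves `<basename>.2.dar` and `<basename>.3.dar` untouched: the directory-level equivalent of "byelo".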
|
| Same thing for when someone saves a file in a document processor. You don't
| expect to have to first remove the one on disk before saving what is in
| memory to the same filename.
True, but when you overwrite a file you lose all its contents as soon as
you have opened it, so it makes little difference whether you delete it
first and then create a new one with the same name. That is why such
applications usually save the new content in a temporary file and then
swap the filenames. Do you want dar to do the same? :-)
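For reference, the "temporary file, then swap the names" technique mentioned above can be sketched in C++17 as follows. `save_atomically()` and the `.tmp` suffix are hypothetical choices for illustration. On POSIX, renaming within one filesystem replaces the target name atomically; a more robust version would also fsync the data before renaming.

```cpp
#include <filesystem>
#include <fstream>
#include <string>
#include <system_error>

namespace fs = std::filesystem;

// Hypothetical helper: replaces `target` with `content` by writing a sibling
// temporary file first and renaming it over the target, so the old file
// stays intact until the new one is completely written.
bool save_atomically(const fs::path &target, const std::string &content)
{
    fs::path tmp = target;
    tmp += ".tmp";  // hypothetical naming scheme for the temporary file
    std::ofstream out(tmp, std::ios::binary | std::ios::trunc);
    out.write(content.data(), static_cast<std::streamsize>(content.size()));
    out.close();  // close() sets failbit if flushing fails
    std::error_code ec;
    if (!out) {
        fs::remove(tmp, ec);  // best effort: drop the incomplete temporary file
        return false;
    }
    fs::rename(tmp, target, ec);  // swap the names
    return !ec;
}
```

Note that this works per file. Applied to a sliced archive, it would still only replace the slices that are rewritten, which is the point of the question above.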
|
| In fact, in the API, libdar::archive is a class for which you can
| instantiate archive objects. The objects aren't individual slices, they
| are entire archives. Overwriting an old archive object with a new archive
| object with the same name, same directory, should completely replace the
| old archive object.
Implementation is something hidden from the user. You can have many
different implementations of the same behavior; if dar were implemented
differently, no user would see the difference, and your argument would
no longer hold.
|[...]
|
|
| I understand your point now.
|
| I am not suggesting though that libdar just stop reporting that there are
| extra useless slices, I am suggesting that it do something to stop that
| situation from happening. I will make a weaker request then, that at least
| libdar warn the user of this situation during the creation process.
|
| For instance, would it not make sense for libdar to report during the
| creation process that there are extra useless slices, report which ones
| they are, and ask the user whether they should be removed, or warn the
| user that they should be removed manually? You eliminate the assumption by
| asking the question.
From my point of view, that would be fine. But, you know, some users
have already told me that dar is too talkative, in particular about the
initial warning saying that all questions will abort the program.
|
| After all, you do ask if your user intends to get into the "endless loop"
| by backing up a directory that contains the archive, with no exclusion
| file filters. You don't let it happen without the question.
Yes, because the user may take a long time to understand what is going
on. Some users may just think dar is broken and only wastes disk space
and memory. This is only to avoid wasting support time on that common
situation. B-)=)
|
|
|>I don't like MS-Office-like programs that presume to be more clever
|>than their users. They either become very restrictive and blindly forbid
|>operations needed by users cleverer than the program developers, or
|>automatically do stupid things or ask the user stupid questions that even
|>less clever users find boring.
|
|
| I wholeheartedly agree! :)
:)
|[...]
|
|>From my point of view, this is here again a user mistake. Once you
|>overwrite the first slice of the old archive, all the other slices of
|>this archive are useless. Why not first remove all the slices of the
|>old archive you plan to overwrite?
|
|
| I agree, if the user removes the old archive before creating the new one,
| this situation will never arise in KDar.
|
| What I do not understand is why, given that you admit the remaining slices
| become useless once the first one is overwritten, libdar does not remove
| those useless extra slices automatically.
In fact, the problem is a different one. If dar used a simpler method to
find the last slice of an archive, looking incrementally, slice by slice,
for a terminating slice, the remaining old slices would not cause a warning.
If there were no internal number to guard against mixing slices of
different archives, there would be no warning either, and dar would simply
report an incoherent data structure in the archive.
But it is not that simple: dar looks for the highest slice of the current
archive that is available and opens it to check whether it is a
terminating slice. It also checks whether that slice is part of the same
archive and, if not, reports the error to the user, asking for help,
because it is unable (and does not want) to figure out what to do.
If dar were simpler and just reported an incoherent data structure due to
the mix of archive slices, do you think users would complain any less?
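The lookup strategy described above can be modeled roughly as follows. This is a simplified sketch, not libdar's code: `SliceHeader`, `LastSliceStatus` and `check_last_slice()` are hypothetical, and the real slice header layout differs. The model keeps only the two facts the check relies on: an identifier shared by all slices of one archive, and a flag set on the terminating slice.

```cpp
#include <map>
#include <string>

// Hypothetical, simplified view of a slice header.
struct SliceHeader {
    std::string internal_name;  // shared by all slices of the same archive
    bool terminal;              // true only in the last slice of an archive
};

enum class LastSliceStatus { ok, first_slice_missing, mixed_archives, last_slice_missing };

// Jumps straight to the highest-numbered slice present (slice number -> header),
// then checks that it belongs to the same archive as slice 1 and that it is
// flagged as the terminating slice.
LastSliceStatus check_last_slice(const std::map<unsigned long, SliceHeader> &slices)
{
    auto first = slices.find(1);
    if (first == slices.end())
        return LastSliceStatus::first_slice_missing;
    const SliceHeader &last = slices.rbegin()->second;  // highest slice number
    if (last.internal_name != first->second.internal_name)
        return LastSliceStatus::mixed_archives;  // leftover slices of another archive: ask the user
    if (!last.terminal)
        return LastSliceStatus::last_slice_missing;
    return LastSliceStatus::ok;
}
```

A new one-slice archive next to slices 2 and 3 of an older archive yields `mixed_archives`, the case that ends up as a question to the user. An incremental walk from slice 1 would have stopped at the new terminating slice and never seen the leftovers.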
|
| Can you imagine some situation where, for a more clever user, those extra
| slices are not useless?
I don't claim to be cleverer than dar's users. Maybe some of them could
find a use for them that I cannot even imagine.
|
| Perhaps this is a stupid question! :)
Perhaps not! :)
|
|
|>| Perhaps libdar should write the "N" to the termination type only if
|>| the user says it's okay to continue, and write an "I" to the
|>| termination type if the user cancels. That way we know that the "last"
|>| archive slice is actually invalid, i.e. the creation process was
|>| aborted, or failed in some way.
I think this is too complicated and not very useful. It assumes that
the user has to agree at each slice creation, and that you can change a
byte afterward (so you could no longer dump an archive to a pipe).
|>
|>Perhaps KDar could propose a "remove old archive slices" option? ;-)
|>An option that can simply rely on the rm command or the unlink system call...
|>
|
|
| Agreed. Except I still think if you allow the user to overwrite an archive,
| the entire archive should be replaced, not just individual slices.
yes, I agree.
|
|
|>| For the status indicator, the best solution would be for libdar to
|>| have a "currentSlice" method, which could be called at any time to
|>| determine the slice number of the current one being written or read.
|>
|>A new callback function? ;-)
|
|
| There's no need for a callback function. The method just returns the slice
| number, a libdar::infinint. I envision usage as such:
|
| // (start a new creation thread; createArchiveThread is assumed to
| //  store the archive it creates in newArchive)
| libdar::archive *newArchive = NULL;
| createArchiveThread *createThread =
|     new createArchiveThread( &newArchive, ... );
| createThread->start();
|
| // (occasionally check the current slice number)
| while ( createThread->running() )
| {
|     if ( newArchive != NULL )
|     {
|         // currentSlice() is the proposed method; libdar does not provide it
|         libdar::infinint currentSlice = newArchive->currentSlice();
|         // report the current slice to the user
|         updateStatusBar( currentSlice );
|     }
|     // sleep and check again
|     sleep( DURATION );
| }
Well, this could be the subject of another discussion: "What do we mean
when we speak about a thread-safe library?"
For me, it is a library that provides a set of functions and/or data
structures that can be used in independent threads. This does not mean
that a given piece of data can be used transparently by several threads
at the same time. Think about an integer variable: in parallel
programming you need a critical section around any modification of this
variable, so that only one thread modifies it at a given time. Now
replace the integer variable by an "archive" object: while it is
executing an action (through a method such as archive creation), it is
being run by a first thread, so it should be in a critical section, and
no other thread should modify this same object; in particular, another
thread should not call a method of that same object.
Instead, if you want to know the current slice, you can use the --execute
option, which can provide this information at each slice creation, and
pass it from one thread to another through an auxiliary variable (which
must itself be accessed through a critical section).
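A minimal C++ sketch of the auxiliary-variable approach described above. `SliceProgress`, `create_archive_stub()` and `demo()` are hypothetical names. The stub stands in for the creation thread and assumes there is a hook that fires at each new slice (as dar's --execute command does) from which the program can publish the slice number. Only the mutex-protected variable is shared; the GUI thread never calls a method of the archive object.

```cpp
#include <cstdint>
#include <functional>
#include <mutex>
#include <thread>

// Hypothetical auxiliary variable shared between the creation thread and the
// GUI thread. Every access goes through the mutex (the critical section).
class SliceProgress {
public:
    void set(std::uint64_t slice)
    {
        std::lock_guard<std::mutex> lock(mutex_);
        slice_ = slice;
    }
    std::uint64_t get() const
    {
        std::lock_guard<std::mutex> lock(mutex_);
        return slice_;
    }

private:
    mutable std::mutex mutex_;
    std::uint64_t slice_ = 0;
};

// Stand-in for the archive-creation thread: at the point where each new slice
// starts (where dar would run the --execute command), publish its number.
void create_archive_stub(SliceProgress &progress, std::uint64_t slices)
{
    for (std::uint64_t n = 1; n <= slices; ++n)
        progress.set(n);  // real code would write slice n here
}

// Example driver: the worker publishes slice numbers; this thread only reads
// the shared variable, never the object the worker is operating on.
std::uint64_t demo(std::uint64_t slices)
{
    SliceProgress progress;
    std::thread worker(create_archive_stub, std::ref(progress), slices);
    // A GUI would call progress.get() periodically here to update its status bar.
    worker.join();
    return progress.get();  // equals `slices` once the worker has finished
}
```

A `std::atomic<std::uint64_t>` would do for a single counter; the mutex is used here because it maps directly onto the critical section described above and extends to more than one field.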
|
|
|>| Should I file a bug report for any of this?
|>
|>I don't think so; do you still?
|>
|
|
| I do see now that the issue is whether the user or the library should be
| removing slices. Fair enough, your philosophy of design is keep it simple
| and unassuming. Just because I cannot imagine any situation in which the
| user will ever need those "useless" extra slices, doesn't mean that
| someone else won't.
Exactly. I'm happy to be understood ! :-)
|
| I will implement a "do not overwrite entire archive" option, that defaults
| to being off.
Well, maybe you will not have to..., as explained at the beginning of my
reply.
|
| [...]
|
|>Cheers,
|>Denis.
|
|
| Cheers,
| JB
|
Cheers,
Denis.
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.2.4 (GNU/Linux)
Comment: Using GnuPG with Mozilla - http://enigmail.mozdev.org
iD8DBQFBBAWTpC5CI8gYGlIRAkpVAKC/0+Y2/C5P6mG/2A+LctZFsgL2/gCeLg9Q
YP9HqzQG6Xhn+eU6jBqz44g=
=kHZe
-----END PGP SIGNATURE-----