Re: [Dar-libdar_api] Re: last archive slice
From: Johnathan B. <jk...@sh...> - 2004-07-25 20:13:08
On July 25, 2004 01:10 pm, Denis Corbin wrote:
> Johnathan Burchill wrote:
> | On July 24, 2004 11:22 pm, Denis Corbin wrote:
> |
> |[...]
> |
> |>It is a user mistake to hide slices of different archives under the
> |>same basename in the same directory.
> |>
> |>There is no use in keeping the last slices of an old archive; it only
> |>wastes your disk space: if you miss the first slice, you won't be able
> |>to restore anything from your old archive. Otherwise, if you definitely
> |>want to keep several archives with the same name, it is not a very good
> |>idea to mix their respective slices together in the same
> |>directory... ;-) no ?
> |
> | That is exactly the problem I was trying to address. The slices _are_
> | absolutely useless, and just waste disk space, so why not have libdar
> | remove them?
>
> For me, the problem with adding an additional slice-removal feature to
> libdar is that there is more risk of erasing precious data. So I would
> prefer that users take responsibility for managing their own
> data. But...
Certainly.
>
> But, I propose that if slice overwriting is detected and slices other
> than the first are present in the same directory, dar will ask the user
> whether the other old slices must be removed first or not (in every
> case, with no option to deactivate that question). Nothing will be done
> to ask the user to provide missing slices of the old archive that could
> be located in other directories, filesystems, floppies, hard disks, ...
>
> The situation where overwriting occurs remains, for me, a user error.
> Allowing the deletion of old slices in the current directory fixes the
> not so intuitive behavior of libdar when older slices remain in the
> given directory after the new archive has been created. For the rest,
> it is the user's responsibility to throw away the old CD-R and to
> erase the old slices located on separate floppies, etc.
Ah, I had not considered this issue. I understand now that with regular,
self-contained files, you can be sure you have deleted the entire
contents when unlinking them. With slices, they could be all over the
place, some on disk, some on CD, etc. You can't treat an archive as a
self-contained object then, except in the case where it has only one
slice.
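
As a rough illustration of the check proposed above (spotting old slices that a new archive with the same basename did not overwrite), here is a minimal C++ sketch. It assumes dar's usual slice naming, `<basename>.<number>.dar`. The helpers `parse_slice_number` and `stale_slices` are made up for this example and are not libdar functions; the list of file names is assumed to come from a directory listing done elsewhere.

```cpp
#include <algorithm>
#include <cassert>
#include <cctype>
#include <optional>
#include <string>
#include <vector>

// Extracts N from "<basename>.N.dar"; returns nothing for any other name.
// (Hypothetical helper, not a libdar function.)
std::optional<unsigned long> parse_slice_number(const std::string& name,
                                                const std::string& basename)
{
    const std::string prefix = basename + ".";
    const std::string suffix = ".dar";
    if (name.size() <= prefix.size() + suffix.size()) return std::nullopt;
    if (name.compare(0, prefix.size(), prefix) != 0) return std::nullopt;
    if (name.compare(name.size() - suffix.size(), suffix.size(), suffix) != 0)
        return std::nullopt;
    const std::string digits =
        name.substr(prefix.size(), name.size() - prefix.size() - suffix.size());
    // Keep the sketch simple: at most 9 digits, so std::stoul cannot overflow.
    if (digits.size() > 9) return std::nullopt;
    const bool all_digits = std::all_of(digits.begin(), digits.end(),
        [](unsigned char c) { return std::isdigit(c) != 0; });
    if (!all_digits) return std::nullopt;
    return std::stoul(digits);
}

// Given the file names found in the archive directory, returns the slice
// numbers of `basename` that are higher than the last slice just written,
// i.e. leftovers from an older archive.
std::vector<unsigned long> stale_slices(const std::vector<std::string>& names,
                                        const std::string& basename,
                                        unsigned long last_new_slice)
{
    assert(!basename.empty());
    std::vector<unsigned long> stale;
    for (const std::string& name : names) {
        const auto number = parse_slice_number(name, basename);
        if (number && *number > last_new_slice) stale.push_back(*number);
    }
    std::sort(stale.begin(), stale.end());
    return stale;
}
```

A front end could show the returned numbers to the user and ask whether to unlink those files. Slices kept on other media are out of reach of such a check, which is exactly the limitation discussed above.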
>
> | [...]
> |
> |>| I see the following solutions:
> |>|
> |>| 1) libdar deletes all slices on a disk if they have the same storage
> |>| directory and basename as the new one being created. It might do
> |>| this before creating the archive, or after creating the archive.
> |>| Perhaps doing this after will be more efficient, since you only
> |>| delete slices that haven't been overwritten, if there are any.
> |>
> |>I do not agree. libdar deletes nothing. 'rm' command does. ('rm' or
> |>unlink() system call)
>
> I keep thinking that when the user sees a warning saying that a file is
> about to be overwritten, he must understand that the warning is neither
> decoration nor a legal disclaimer. If the user ignores the warning and
> continues with dar, he must own that action and take responsibility
> for it.
>
> See tar for example: tar never warns against archive overwriting (by
> default at least) and nobody complains when an archive is lost this
> way.
>
> | Libdar does delete files on disk if the appropriate conditions are met
> | during a restoration, but I quibble :).
> |
> :-)
> | But libdar does overwrite archive slices! Is this not a form of
> | deletion?
>
> yes, and that's already too much, as explained above. Do the same with
> tar: you will not get any warning at all, will you ?
>
> | Consider the echo command:
> |
> | echo "hello" > file1
> | echo "bye" > file1
> |
> | What is file1?
> |
> | cat file1
> | bye
> |
> | It is not, "byelo", yet this is the analogous behavior that dar does
> | when creating a new archive with slices, while replacing one that
> | already exists. As you intend this to be the behaviour, I suggest that
> | it is not the obvious or intuitive one, and at least should be
> | documented. Apologies in advance if it is already in the manpage.
>
> you don't have to apologize; there is nothing in the man page about all
> the things a user can do, clever or less clever. There is only
> documentation about what dar does.
>
> to take your example, using tar in place of dar,
>
> tar -cf file1 <some files>
> tar -cf file1 <some other files>
>
> So, what is in file1 ?
>
> cat file1
> <some other files>
>
> No, it is not "<some files><some other files>", as would be more
> intuitive. Maybe you think this is normal ? Then it is normal too
> that when you overwrite 7 files, the other remaining files stay
> unchanged and are not deleted, isn't it ? ;-) Just to quibble, too. ;-)
My argument was based on the idea that an archive is a self-contained
object. Indeed, this does not seem to hold in the real world. :)
>
> | I conceptualize disk archives as objects, whether they are made up of
> | slices or not. Making a new backup in a directory that contains an
> | archive of the same name should replace the archive as a whole, not on
> | a slice-by-slice basis. I do not see anything "clever" or assuming
> | about that for libdar.
>
> because there is no conceptualization of disk archives as "objects":
> an archive is not a single entity once it is split into slices, for the
> user has total freedom to put some *files* on floppy, some others on
> disk and, why not, send the remaining ones by email. They remain
> files.
>
> I would even say that you may be missing a middle slice while dar does
> not know it is missing and keeps working properly, restoring a subset
> of files. Where then is the disk archive as a conceptualized object ? ;-)
> (still quibbling).
Is not a file an object? I can delete a character from the middle and it
remains a valid file. An archive with missing slices may fail some CRC
checks, but it is still an archive with (some accessible) data.
Actually, I am not convinced that archives should not be considered
objects. For example, libdar does not give the user the option of
specifying different backup directories for each slice. The archive is in
one directory. Libdar does not care whether that one directory is spread
over different times, all the slices are still in that directory. When the
user is asked to provide a missing slice, it has to be placed in that
directory. As far as the user is concerned, an archive may not be a
self-contained object, what with some of it sent over email, some on
floppy, some on disk. But whenever libdar interacts with a given archive,
all of the slices have to be available from the same source.
I must consider this further before coming to a conclusion.
>
> | An archive with one slice, made with a slice size of 0, would be
> | completely replaced by the new archive. By your logic you should
> | overwrite the initial portion of the old archive with the new one, not
> | completely replace it.
>
> nope, because when dar opens a file in O_TRUNC mode (which implies the
> user has accepted to lose the data in the file) you lose the data in
> that file only (not in the other files).
> nope too, because when you overwrite in a directory you do not replace
> all the files of that directory. Directories (which contain slices) do
> not act like plain files (which contain data). This is a Unix feature ;-)
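
To make the O_TRUNC point concrete, here is a small C++ sketch. It uses `std::ofstream` with `std::ios::trunc` as a stand-in for opening with O_TRUNC; the helper names `write_truncating` and `read_all` are invented for this illustration, and nothing here is dar or libdar code.

```cpp
#include <cassert>
#include <filesystem>
#include <fstream>
#include <iterator>
#include <string>

namespace fs = std::filesystem;

// Writes `content` to `path`, discarding whatever the file held before.
// std::ios::trunc plays the role of O_TRUNC here.
void write_truncating(const fs::path& path, const std::string& content)
{
    assert(!path.empty());
    std::ofstream out(path, std::ios::binary | std::ios::trunc);
    out << content;
}   // the stream is flushed and closed when `out` goes out of scope

// Reads a whole file back into a string.
std::string read_all(const fs::path& path)
{
    std::ifstream in(path, std::ios::binary);
    return std::string(std::istreambuf_iterator<char>(in),
                       std::istreambuf_iterator<char>());
}
```

Writing `a.1.dar` and `a.2.dar`, then calling `write_truncating` on `a.1.dar` again with shorter content, leaves only the new content in `a.1.dar` (no "byelo" effect inside the file) while `a.2.dar` keeps its old bytes: truncation is per file, not per directory.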
>
> | Same thing for when someone saves a file in a document processor. You
> | don't expect to have to first remove the one on disk before saving
> | what is in memory to the same filename.
>
> true, but when you overwrite a file you lose all its contents just
> after you have opened it. So there is not much difference from deleting
> it first before creating a new one with the same name. So such
> applications usually save the new content in a temporary file, then
> swap the filenames. Do you want dar to do the same ? :-)
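
The "save to a temporary file, then swap the filenames" approach mentioned here can be sketched in C++17 as follows. This is only an illustration of the general technique: the function name `save_via_temp` and the `.tmp` suffix are arbitrary choices for this example, and the sketch relies on `std::filesystem::rename` replacing an existing target, as POSIX `rename` does.

```cpp
#include <cassert>
#include <filesystem>
#include <fstream>
#include <string>
#include <system_error>

namespace fs = std::filesystem;

// Saves `content` under `target` without truncating the old file first:
// the data goes to a temporary sibling, which is then renamed over `target`.
// Returns false (and leaves `target` untouched) if writing or renaming fails.
bool save_via_temp(const fs::path& target, const std::string& content)
{
    assert(!target.empty());
    fs::path tmp = target;
    tmp += ".tmp";   // arbitrary temporary name chosen for this sketch

    bool written = false;
    {
        std::ofstream out(tmp, std::ios::binary | std::ios::trunc);
        out << content;
        out.flush();
        written = static_cast<bool>(out);
    }   // the temporary file is closed here

    std::error_code ec;
    if (!written) {
        fs::remove(tmp, ec);
        return false;
    }
    fs::rename(tmp, target, ec);   // replaces an existing `target`
    if (ec) {
        std::error_code ignored;
        fs::remove(tmp, ignored);
        return false;
    }
    return true;
}
```

With this scheme the old content stays readable until the new content is completely written, which is the difference from opening the target itself in truncating mode.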
>
> | In fact, in the API, libdar::archive is a class for which you can
> | instantiate archive objects. The objects aren't individual slices,
> | they are entire archives. Overwriting an old archive object with a new
> | archive object with the same name, same directory, should completely
> | replace the old archive object.
>
> implementation is one thing, and it is hidden from the user. You can
> have many different implementations of the same behavior; if dar were
> implemented differently, no user would see the difference and your
> argument would no longer hold.
>
> |[...]
> |
> |
> | I understand your point now.
> |
> | I am not suggesting though that libdar just stop reporting that there
> | are extra useless slices, I am suggesting that it do something to stop
> | that situation from happening. I will make a weaker request then, that
> | at least libdar warn the user of this situation during the creation
> | process.
> |
> | For instance, would it not make sense for libdar to report during the
> | creation process that there are extra useless slices, report which
> | ones they are, and ask the user whether they should be removed, or
> | warn the user that they should be removed manually? You eliminate the
> | assumption by asking the question.
>
> That seems right from my point of view. But, you know, some users have
> already told me that dar is too talkative, in particular about the
> initial warning saying that any question will abort the program.
A clever user. ;)
>
> | After all, you do ask if your user intends to get into the "endless
> | loop" by backing up a directory that contains the archive, with no
> | exclusion file filters. You don't let it happen without the question.
>
> yes, because the user may take a long time to understand what is
> the matter. Some users may just think dar is broken and only wastes
> disk space and memory. This is only to avoid wasting support time
> on that common situation. B-)=)
>
> |>I don't like MS-office like programs that pretend to be more clever
> |>than their users. They either get very restrictive and blindly forbid
> |>operations needed by users more clever than the program's developers,
> |>or automatically do stupid things or ask the user stupid questions
> |>that even less clever users find boring.
> |
> | I wholeheartedly agree! :)
> |
> :)
> |[...]
> |
> |>From my point of view, this is here again a user mistake. Once you
> |>overwrite the first slice of the old archive, all the other slices of
> |>this archive are useless. Why not first remove all the slices of
> |>the old archive you have planned to overwrite ?
> |
> | I agree, if the user removes the old archive before creating the new
> | one, this situation will never arise in KDar.
> |
> | What I do not understand is why, given that you admit the remaining
> | slices become useless once the first one is overwritten, libdar does
> | not remove those useless extra slices automatically.
>
> In fact, the problem is different. If dar used a simpler way to find
> the last slice of an archive, looking incrementally slice by slice for
> a terminating slice, the remaining old slices would not cause a
> warning. If there were no internal number to check against mixing
> slices of different archives, there would not be a warning either, and
> dar would simply report an incoherent data structure in the archive.
>
> But it is not so simple: it looks in the current archive for the
> highest slice available and opens it to see whether it is a terminating
> slice. It also checks whether it is part of the same archive and, if
> not, reports the error to the user, asking for help because it is
> unable (and does not want) to figure out what to do.
It makes sense to me now.
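
The lookup described above (take the highest-numbered slice available, check that it belongs to the same archive, then check whether it is a terminating slice) can be modelled roughly as below. This is a simplified sketch, not dar's code: `SliceHeader`, its `label` field (standing in for the "internal number" mentioned above) and `check_last_slice` are invented for this illustration, and the real slice header layout is different.

```cpp
#include <cassert>
#include <cstdint>
#include <map>

// Hypothetical, simplified view of what a slice header carries.
struct SliceHeader {
    std::uint64_t label;   // stand-in for the per-archive internal number
    bool terminal;         // true if the slice is marked as the last one
};

enum class LastSliceStatus { Found, NotTerminal, ForeignSlice, NoSlices };

// `slices` maps slice number -> header for every slice file available;
// `expected_label` is the label of the archive being read (for example,
// as read from its first slice).
LastSliceStatus check_last_slice(const std::map<unsigned long, SliceHeader>& slices,
                                 std::uint64_t expected_label)
{
    if (slices.empty()) return LastSliceStatus::NoSlices;
    assert(slices.begin()->first >= 1);   // slice numbering starts at 1

    // std::map is ordered by key, so rbegin() is the highest slice number.
    const SliceHeader& highest = slices.rbegin()->second;
    if (highest.label != expected_label) return LastSliceStatus::ForeignSlice;
    return highest.terminal ? LastSliceStatus::Found
                            : LastSliceStatus::NotTerminal;
}
```

With a new two-slice archive sitting next to leftover slices 3 and 4 of an older archive, the highest slice carries the old label, so the model returns `ForeignSlice`, which corresponds to the situation where dar stops and asks the user for help.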
>
> If it were simpler and reported an incoherent data structure due to
> the mix of archive slices, do you think users would still complain ?
>
> | Can you imagine some situation where, for a more clever user, those
> | extra slices are not useless?
>
> I don't pretend to be more clever than dar's users. Maybe some of them
> could find something to do that I cannot even imagine.
>
> | Perhaps this is a stupid question! :)
>
> perhaps not ! :)
>
> |>| Perhaps libdar should write the "N" to the termination type only if
> |>| the user says it's okay to continue, and write an "I" to the
> |>| termination type if the user cancels. That way we know that the
> |>| "last" archive slice is actually invalid, i.e. the creation process
> |>| was aborted, or failed in some way.
>
> I think this is too complicated, and not very useful. This assumes that
> the user has to agree at each slice creation, and that you can change
> a byte afterward (thus you can no longer dump an archive to a pipe).
Got it.
>
> |>perhaps, KDAR could propose a "remove old archive slice" option ? ;-)
> |>option, that can simply rely on the rm command or unlink system
> |> call...
> |
> | Agreed. Except I still think if you allow the user to overwrite an
> | archive, the entire archive should be replaced, not just individual
> | slices.
>
> yes, I agree.
>
> |>| For the status indicator, the best solution would be for libdar to
> |>| have a "currentSlice" method, which could be called at any time to
> |>| determine the slice number of the current one being written or read.
> |>
> |>a new callback function ? ;-)
> |
> | There's no need for a callback function. The method just returns the
> | slice number, a libdar::infinint. I envision usage as such:
> |
> | //(start a new creation thread):
> |
> | libdar::archive *newArchive;
> | createArchiveThread *createThread =
> |     new createArchiveThread( newArchive, ...);
> | createThread->start();
> |
> | //(occasionally check the current slice number):
> | while ( createThread->running() )
> | {
> |     libdar::infinint currentSlice = newArchive->currentSlice();
> |     //report the current slice to the user
> |     updateStatusBar( currentSlice );
> |     //Sleep and check again
> |     sleep( DURATION );
> | }
>
> well, this could be the subject of another discussion: "What do we
> mean when speaking about a thread-safe library ?".
>
> For me, it is when a given library offers a set of functions and/or
> data structures that can be used in independent threads. This does not
> mean that a given piece of data can be used transparently by several
> threads at the same time. In particular, think about an integer
> variable: in parallel programming you need to put a critical section
> around any modification of this variable, so that only one thread
> modifies it at a given time. Now, replace the integer variable by an
> "archive" object: when it is executing an action (through a method like
> archive creation), it is being run by a first thread, thus it should be
> in a critical section, thus no other thread should modify this same
> object; in particular, another thread should not call a method of that
> same object.
>
> Instead, if you want to know the current slice, you just have the
> --execute option, which can return this information at each slice
> creation, and you can pass this information from one thread to another
> thanks to an auxiliary variable (which may be accessed through a
> critical section).
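
The "auxiliary variable accessed through a critical section" idea can be sketched in C++ with a mutex-protected counter. Everything here is an assumption made for illustration: `SliceProgress` and `fake_create_archive` are invented names, the worker is a stand-in that only counts, and how the slice number actually reaches the worker (for instance from a command run through --execute) is left out.

```cpp
#include <cassert>
#include <mutex>

// Auxiliary variable shared between the thread that creates the archive
// and the thread that updates the status bar. Every access goes through
// the mutex, so the archive object itself is never touched by two threads.
class SliceProgress {
public:
    void set(unsigned long slice)
    {
        std::lock_guard<std::mutex> lock(mutex_);
        current_ = slice;
    }

    unsigned long get() const
    {
        std::lock_guard<std::mutex> lock(mutex_);
        return current_;
    }

private:
    mutable std::mutex mutex_;
    unsigned long current_ = 0;   // 0 means "no slice written yet"
};

// Stand-in for the worker: a real one would write slice n, then publish n.
void fake_create_archive(SliceProgress& progress, unsigned long slice_count)
{
    assert(slice_count > 0);
    for (unsigned long n = 1; n <= slice_count; ++n) {
        // ... slice n would be written here ...
        progress.set(n);
    }
}
```

In a GUI such as KDar, the worker would run in its own thread (for example a `std::thread` given a reference to the `SliceProgress`), and the polling loop would call `progress.get()` instead of calling a method on the archive object that is busy in the other thread.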
>
> |>| Should I file a bug report for any of this?
> |>
> |>I don't think so, do you still ?
> |
> | I do see now that the issue is whether the user or the library should
> | be removing slices. Fair enough, your philosophy of design is keep it
> | simple and unassuming. Just because I cannot imagine any situation in
> | which the user will ever need those "useless" extra slices, doesn't
> | mean that someone else won't.
>
> Exactly. I'm happy to be understood ! :-)
>
> | I will implement a "do not overwrite entire archive" option, that
> | defaults to being off.
>
> well, maybe you will not have to... as explained at the beginning of my
> reply here.
>
Okay, I will wait and see how things go in the API. Thanks for the
discussion!
JB
> | [...]
> |
> |>Cheers,
> |>Denis.
> |
> | Cheers,
> | JB
>
> Cheers,
> Denis.
--
Johnathan K. Burchill, Ph.D.
Department of Physics and Astronomy
University of Calgary
2500 University Drive N.W.
Calgary, AB T2N 1N4
Canada
(403) 217-4286
jk...@sh...