
#156 Auto-detect min-digits

none
closed
None
abandoned
1
2017-12-31
2012-09-06
Jim Avera
No

It would be very convenient and user-friendly if dar automatically recognized slice files whose numbers have leading zeroes, without requiring the --min-digits option (which is currently needed whenever the slice numbers are zero-padded). It is especially stressful to use --min-digits if the -A, -@ and output archives use different conventions.

The docs say that dar can't do this because "dar should try opening the file "basename.1.dar", but if it fails, it should try opening the file "basename.01.dar", then "basename.001.dar", ... up to infinity."

For your enjoyment, here are some arguments why that is not a valid reason:

  1. Assume the user creates slices at a maximum rate of 1 slice/second, and the entire run takes one year to complete. Then the maximum number of possible slices is 31536000. So dar would only have to check 8 file names (1, 01, ..., 00000001) before concluding that the first slice is not present.

  2. But theoretically a user could create slices much faster, so consider the limit imposed by the total mass of the earth (on the order of 10^24 kg). If each slice is stored on a medium weighing 1 gram, the maximum number of slices, consuming all matter on the planet, is 10^27, so checking up to 27 leading zeroes should be sufficient.

  3. Okay, but what about when dar is used by space travellers thousands of years in the future? Storage technology may require only a single electron mass for each bit, so a great many slices could be stored (never mind that the maximum size of each slice would likely increase at the same rate!). The total mass of the universe is about 10^44 kg, and an electron weighs about 10^-30 kg, so the total number of electrons in the observable universe is 10^74. Checking 75 file names (zero up to 74 leading zeroes) is still feasible.
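As a quick check of the arithmetic in the three points above, here is a small Perl sketch that counts the decimal digits of each bound (the helper name digits_needed is only illustrative). Slice 1, zero-padded to that width, carries one leading zero fewer than the width, and a reader probing zero, one, two, ... leading zeroes tries at most "width" names. Math::BigInt is used because 10^27 and 10^74 do not fit in a native integer.

```perl
use strict;
use warnings;
use Math::BigInt;

# Illustrative helper: decimal width needed to write every slice number
# from 1 up to $max_slices. Slice 1 padded to this width has (width - 1)
# leading zeroes.
sub digits_needed {
    my ($max_slices) = @_;
    return length(Math::BigInt->new($max_slices)->bstr);
}

# The three bounds from the points above.
my @cases = (
    ['1 slice/second for one year',                365 * 24 * 3600],
    ['1 gram per slice, mass of the earth',        Math::BigInt->new(10)->bpow(27)],
    ['1 electron per slice, mass of the universe', Math::BigInt->new(10)->bpow(74)],
);

for my $case (@cases) {
    my ($label, $max) = @$case;
    my $width = digits_needed($max);
    printf "%-44s %2d digits, up to %2d leading zeroes\n", $label, $width, $width - 1;
}
```

It reports widths of 8, 28 and 75 digits, that is, up to 7, 27 and 74 leading zeroes for slice 1.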

Thank you for considering this enhancement.

P.S. dar is a great tool, I use it routinely.

Discussion

  • Denis Corbin

    Denis Corbin - 2012-09-07

    Hello Jim,

    Nice explanations, really, I like them :)

    I just can't figure out how the dar of the future (which has dropped this stressful --min-digits option) would determine the number of leading zeroes to use at backup time. For example, when creating the first slice, at which point not a single byte has yet been inspected for backup? Should it do a dry run of the backup first and then, assuming the computed archive size will not change much as the data changes, do a second pass for real one year later (yes, this is a huge archive)? [Compression and encryption may be involved, so simply adding up file sizes will not give the total archive size; it depends on the data and how well it compresses.]

    If so, the --min-digits option cannot be removed, at least at archive creation time. Then, if our user of the future plans to back up his whole olfactory/interactive/tactile "movie" library, which as you can imagine would need at least 10^74 leading zeros for slices of a convenient size, what would remain on his (remote) storage after a few days of processing if a magnetic storm interrupted dar? A small set of slices with 10^74 leading zeros, containing part of our space traveller's movie library. Unfortunately, our traveller's computer was damaged by the same magnetic storm, so he would like to recover what has been saved so far, remotely over the intergalactic network. How would dar guess the correct number of leading zeros to use in order to open the existing slices of this interrupted archive?

    Any solution to this problem must not rely on an arbitrary number (like "use x digits by default, it is large enough for today's needs").

    Kind Regards,
    Denis.

     
  • Jim Avera

    Jim Avera - 2012-09-07

    I agree that --min-digits is necessary when creating archives. No trial runs needed -- the user gets what they ask for.

    It is the other direction (reading existing slice files) that I think dar could easily handle automatically.
    dar would not need to guess the number of leading zeroes;
    instead, it would try opening the slice without any leading zeroes, and if that failed, try with one leading zero, etc.
    up to a small maximum (even in the extreme argument the max is 74 zeroes, not 10^74).

    Here is some Perl code which illustrates the algorithm:

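    use strict;
    use warnings;

    my $MAX_ZEROES = 74;  # log10 of the number of electrons in the universe
    my $verbose    = 1;

    # Return [$filehandle, $path] for the first existing file among
    # basename.N.dar, basename.0N.dar, basename.00N.dar, ..., or undef.
    sub find_slice {
        my ($basename, $slice_number) = @_;

        for my $zeroes (0 .. $MAX_ZEROES) {
            # Pad the slice number itself (not the loop counter) with $zeroes zeroes.
            my $path = sprintf('%s.%s%d.dar', $basename, '0' x $zeroes, $slice_number);
            if (open(my $filehandle, '<', $path)) {
                print "Found slice $path\n" if $verbose;
                return [$filehandle, $path];
            }
        }
        print "Slice $slice_number of archive $basename not found.\n" if $verbose;
        return undef;
    }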
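The probing loop above still stops at a fixed maximum number of zeroes. Purely as a hypothetical sketch (this is not dar's behaviour and not something either poster proposed), the same lookup can avoid any cap by listing the directory that holds the slices and accepting any run of leading zeroes. It assumes all slices of the archive sit in one readable directory, and the function name find_slice_by_listing is invented for illustration.

```perl
use strict;
use warnings;
use File::Basename qw(basename dirname);

# Hypothetical helper, not part of dar: locate slice $slice_number of the
# archive $base (a path without the ".N.dar" suffix) by scanning its
# directory, so any number of leading zeroes is accepted without a cap.
sub find_slice_by_listing {
    my ($base, $slice_number) = @_;
    my $dir  = dirname($base);
    my $name = basename($base);

    opendir(my $dh, $dir) or return undef;
    # Match "name.N.dar", "name.0N.dar", "name.00N.dar", ...
    my @matches = grep { /^\Q$name\E\.0*\Q$slice_number\E\.dar\z/ } readdir($dh);
    closedir($dh);

    return undef unless @matches;
    # If several paddings exist at once, prefer the shortest name.
    my ($best) = sort { length($a) <=> length($b) or $a cmp $b } @matches;
    return "$dir/$best";
}
```

If both basename.1.dar and basename.01.dar exist, this sketch silently picks the shorter name; a real tool would probably need to report the ambiguity instead.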

     
  • Jim Avera

    Jim Avera - 2012-09-07

    P.S. --min-digits should be restricted to no more than the maximum number of zeroes which dar checks for (with the proposed enhancement), i.e. 8 or 27 or 74 depending on your cosmological viewpoint.

    That would prevent users from creating slices which dar could not recognize later.

     
  • Denis Corbin

    Denis Corbin - 2012-09-07

    Well, my cosmological viewpoint is "infinity". I mean that any arbitrarily high value is still an arbitrary value. Sooner or later it will become inadequate (see the history of computing: the 640 KB limit, then the 1 MB, 512 MB, 2 GB and 4 GB boundaries, and those still to come; each time the limit seemed large enough for long enough).

    In the cases you mentioned, 8, 27 or 74 digits are still arbitrary values. Why choose 27? Why choose 74? Sooner or later someone would ask, and I would have no answer to give. On the other hand, as I see no point in building a poorly designed or limited feature, I prefer to keep the current situation, which offers more possibilities than the change you propose, at the cost of "stressing" the user as you reported.

    Lastly, there are currently more useful features to work on (which I'm sure you will enjoy, maybe next year if I succeed in this challenge). But OK, when the time comes, with an algorithm that matches my "cosmological viewpoint", yes, maybe I will find a way to improve this feature. In the meantime, this ticket will stay open to remind me of it.

    Best Regards,
    Denis.

     
  • Jim Avera

    Jim Avera - 2012-09-09

    It's not arbitrary. The theoretical maximum number of zeroes is 74 or less
    (because no future technology could create more than 10^74 slices).

    Of course, it is your project and you should work on whatever is most satisfying for you.

    Best wishes
    -Jim

     
  • Denis Corbin

    Denis Corbin - 2014-03-09

    So they said of 512 KB 25 years ago. :)

     
  • Denis Corbin

    Denis Corbin - 2014-03-09
    • Priority: 4 --> 1
    • km stone :): --> none
     
  • Denis Corbin

    Denis Corbin - 2017-12-31
    • status: open --> closed
    • Progression: requested --> abandoned
     
