[Dibs-discussion] Congrats & some problems
From: Christian S. <cs...@ic...> - 2004-09-18 22:17:29
Hello Emin and others,
First of all, I want to say that I really like your approach to backup
and, from what I can tell, your design and implementation seem to be of
very high quality. (The latter opinion is based on some casual source
code surfing.)
Secondly, I want to report on some problems that some friends of mine
and I ran into while experimenting with DIBS.
So, let me start chronologically. The documentation is quite good, but
there were a few things that weren't really clear on first reading and
some others which still aren't clear to me. :-)
It wasn't immediately clear to me how to set the --talk/--listen
options to add_peer. I think it might be a good idea to simply say that
--listen is essentially always the same, i.e. the communication mode
(active/passive/mail) of my local box, whereas --talk always refers to the
mode of my peer. (Maybe the options could even be renamed to
--local-mode/--remote-mode or something.) Actually, it is not clear to me
why the distinction between active and passive is needed at all. Can't
DIBS figure this out on its own and operate on a best-effort basis?
...checking the list archive... Hmm, so I see that Emin is essentially
open to any ideas in this direction. Good. :-) Maybe I'll come up with
something.
The next thing, which is still not clear to me from the docs, is how
DIBS decides to split files into pieces and to which peers they are
sent. First of all, there is the "redundantPieces" user option, of which
the documentation says:
    This variables specifies how many redundant pieces of a file will be
    created. If a file is chopped into k pieces (see kbPerFile), this
    many extra pieces will be added using a Reed-Solomon code. For
    example, if a file is chopped into 5 pieces and redundantPieces is
    2, then 7 pieces will be sent such that the original file can be
    recovered from any 5 of those 7 pieces.
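To make the "any 5 of those 7 pieces" property concrete, here is a minimal Python sketch of a systematic Reed-Solomon-style erasure code. It is only an illustration under stated assumptions, not DIBS's own code: it works over the prime field GF(257) using Lagrange interpolation instead of the GF(2^8) arithmetic that real implementations usually use, and the names `encode` and `decode` are hypothetical.

```python
P = 257  # prime modulus: every byte value 0..255 is an element of GF(257)


def _interpolate(points, x):
    """Evaluate, at x, the unique polynomial of degree < len(points)
    passing through the given (xi, yi) points, working mod P."""
    total = 0
    for i, (xi, yi) in enumerate(points):
        num, den = 1, 1
        for j, (xj, _) in enumerate(points):
            if j != i:
                num = num * (x - xj) % P
                den = den * (xi - xj) % P
        # Division mod P via Fermat's little theorem: den^(P-2) == den^-1.
        total = (total + yi * num * pow(den, P - 2, P)) % P
    return total


def encode(data: bytes, k: int, r: int):
    """Hypothetical helper: split data into k data pieces plus r redundant
    pieces. Piece x holds, for every byte position, the value at x of the
    polynomial that passes through the k data pieces at x = 0..k-1."""
    if k + r > P:
        raise ValueError("at most P pieces are supported")
    size = -(-len(data) // k)  # ceiling division: bytes per piece
    padded = data + bytes(size * k - len(data))  # zero-pad to k * size
    pieces = {i: list(padded[i * size:(i + 1) * size]) for i in range(k)}
    for x in range(k, k + r):
        pieces[x] = [
            _interpolate([(i, pieces[i][pos]) for i in range(k)], x)
            for pos in range(size)
        ]
    return pieces


def decode(available: dict, k: int, length: int) -> bytes:
    """Hypothetical helper: rebuild the first `length` original bytes from
    any k surviving pieces (a dict mapping piece index -> values)."""
    if len(available) < k:
        raise ValueError("need at least k pieces to recover the file")
    chosen = sorted(available.items())[:k]
    size = len(chosen[0][1])
    out = bytearray()
    for i in range(k):  # re-evaluate the polynomial at each data index
        for pos in range(size):
            out.append(_interpolate([(x, vals[pos]) for x, vals in chosen], i))
    return bytes(out[:length])


# Usage: 5 data pieces + 2 redundant pieces, as in the documentation's example.
msg = b"DIBS backs up files to peers"
pieces = encode(msg, k=5, r=2)
survivors = {x: v for x, v in pieces.items() if x not in (1, 3)}  # lose two
assert len(pieces) == 7
assert decode(survivors, k=5, length=len(msg)) == msg
```

Losing any two of the seven pieces is harmless because a polynomial of degree below 5 is fully determined by any 5 of its values.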
Is a file only chopped into pieces if it is above the kbPerFile limit?
(I'm sure the answer is no, but it's not clear from the documentation.)
Also, what is the default value of redundantPieces? It isn't specified.
Now, what happened to me was that I wanted to grow my network of backup
peers slowly. First I hooked up with my friend Peter Froehlich and we
agreed to provide 1GB to each other. When I designated 40MB of data to
be backed up, Peter received 120MB of data. That makes me assume that the
default for redundantPieces is 2, and that DIBS put three copies of my
data on Peter's box (1 original + 2 redundant copies). Of course, that
does not make much sense, especially if it keeps Peter's box busy for
more than half a day verifying all the signatures on my data (more about
this hypothesis later). There seems to be a UI issue here that needs
to be resolved.
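The 40MB to 120MB jump can be checked with the usual erasure-code overhead formula: a file cut into k data pieces plus r redundant pieces occupies (k + r)/k times its original size. A minimal Python sketch, assuming (as the email hypothesizes, not as DIBS's docs state) that with a single peer k = 1 and redundantPieces = 2; the function name is hypothetical:

```python
def stored_size_mb(data_mb: float, k: int, r: int) -> float:
    """Total space used on peers when each file is cut into k data pieces
    and r redundant pieces are added (a (k + r)-piece erasure code)."""
    if k < 1 or r < 0:
        raise ValueError("need k >= 1 and r >= 0")
    return data_mb * (k + r) / k


# One peer: with k = 1 every "piece" is a full copy, so r = 2 means 3 copies.
print(stored_size_mb(40, k=1, r=2))  # 120.0
```

With k = 1, Reed-Solomon coding degenerates into plain replication, which is why the observed 120MB fits the "1 original + 2 redundant copies" guess.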
This experience also makes me believe that DIBS tries to distribute each
file over all available peers. So, if I cooperate with 10 peers, then my
files are split into 10 pieces, including the 2 redundant pieces. Is that
right?
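Under the hypothesis in this paragraph (each file spread over all n peers, with redundantPieces of the n pieces being redundant, so k = n - redundantPieces data pieces), the per-peer load can be sketched as follows. This is an assumption about DIBS's behavior, not its documented strategy, and `per_peer_load` is a hypothetical name:

```python
def per_peer_load(data_mb: float, peers: int, redundant: int):
    """Return (k, MB stored per peer, total MB stored) if every file is
    split into one piece per peer, `redundant` of them redundant."""
    k = peers - redundant  # data pieces needed to rebuild each file
    if k < 1:
        raise ValueError("need more peers than redundant pieces")
    piece_mb = data_mb / k  # every piece is 1/k of the original data
    return k, piece_mb, piece_mb * peers


k, each, total = per_peer_load(40, peers=10, redundant=2)
print(k, each, total)  # 8 5.0 50.0
```

So with 10 peers the 40MB from the example above would cost 50MB in total instead of 120MB, and any 2 peers could drop out without data loss.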
Now, what happens if some of the peers provide more space than
others? Let's say 5 of my peers provide 2GB instead of 1GB. Are some
of my files then split into 5 pieces? How does DIBS decide which files
to choose for the 5-way split? (Note that with an expected failure rate
of, say, 30%, the 5-way split seems like the better backup option.)
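The remark about the 5-way split can be checked with a binomial model. A minimal Python sketch, assuming peers fail independently with probability 0.3, that redundantPieces stays 2 (so a 5-way split needs any 3 pieces and a 10-way split needs any 8), and that a file is recoverable exactly when at least k of its n pieces survive; `recovery_probability` is a hypothetical name:

```python
from math import comb


def recovery_probability(n: int, k: int, p_fail: float) -> float:
    """Probability that at least k of n independently stored pieces
    survive when each peer fails with probability p_fail."""
    p_ok = 1.0 - p_fail
    return sum(comb(n, i) * p_ok**i * p_fail**(n - i) for i in range(k, n + 1))


five_way = recovery_probability(n=5, k=3, p_fail=0.3)
ten_way = recovery_probability(n=10, k=8, p_fail=0.3)
print(f"5-way: {five_way:.3f}, 10-way: {ten_way:.3f}")  # 5-way: 0.837, 10-way: 0.383
```

Under these assumptions the 5-way split is far more robust because it tolerates losing 2 of 5 peers (40%) rather than 2 of 10 (20%), at the cost of a higher storage overhead (5/3 versus 10/8).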
[Side track: Given the above scenario, maybe the best option is to
split files into 7.5 pieces on average. This would achieve the most
homogeneous distribution, it seems. And just to complicate things
further, what if some of the shared amounts of space change later on?]
What is DIBS's backup distribution strategy, and is there an easy way
to describe it so that surprises like the one above don't happen?
As hinted at before, there is one other major issue that we (or actually
Peter) experienced: once the remote peer's data has been received, DIBS
seems to consume a lot of computing resources. My guess is that this is
due to the gpg subprocesses used to verify the authenticity (and
integrity) of the just-received data. Is that correct? And is there
possibly a way to speed this up? (Even not verifying at all seems to be
a better option! FYI, this was on a Mac PowerBook. More details
provided if needed.)
Despite all the problems mentioned above, let me reiterate that I really
enjoy DIBS and that it's this kind of software that makes open source
great! It's just that UI issues or, in this case especially, performance
issues like the last one can make or break the adoption of great ideas
like DIBS. :-)
Thanks for all your work,
Chris
PS: I crossposted this to the backup mailing list so that all my backup
peers are in the loop too.
--
Chris Stork <> Support eff.org! <> http://www.ics.uci.edu/~cstork/
OpenPGP fingerprint: B08B 602C C806 C492 D069 021E 41F3 8C8D 50F9 CA2F