Re: [Dibs-discussion] Congrats & some problems
From: Christian S. <dib...@cs...> - 2004-09-25 21:30:33
On Sat, Sep 25, 2004 at 04:48:04PM -0400, Emin Martinian wrote:
> Christian Stork <dib...@cs...> writes:
> 
> >> > In the above example of 10 peers with redundancy 2 and kbPerFile = 1MB
> >> > the result would be that all files smaller than 1MB are "redundanized"
> >> > (better word?) to three equal-sized copies and these three copies/pieces
> >> > are randomly distributed on different hosts. Furthermore, each of these
> >> > copies suffices to reconstruct the file. Is this correct?
> >> 
> >> Yes.
> >
> > What's the benefit of using Reed-Solomon then? Three regular copies
> > would do, wouldn't they?
> 
> In this case, there is no benefit to using Reed-Solomon (RS) codes,
> and the piece files are actually copies of each other. The benefit
> of RS codes comes in when you split files.

Good, then I wasn't confused. ;-)

> For example, imagine that your kbPerFile limit is 1MB and you want to
> back up a 10MB file such that you are robust to 2 peer failures. When
> set to produce 2 redundant copies, DIBS will split the file into 10
> pieces, produce 2 redundant pieces and spread these among different
> peers (assuming you have enough room on enough peers). The total
> amount of data backed up is 12 MB (plus a little overhead for
> encryption).

Again, checking my understanding: In order to be resilient to 2 peer
failures, the 12 pieces need to be spread among at least 12 peers.
(Say we only have 10 peers; then two of them will store 2 pieces each,
and if both of those two peers fail, we only have 8 pieces to recover
from, which is insufficient, since reconstruction needs any 10 of the
12 pieces.)

> If you had used straight copying instead of RS codes and you wanted to
> be robust against two peer failures, you would need to produce 3
> copies for a total of 30 MB and split these 3 copies amongst the 12
> peers. To summarize, the benefit of RS codes comes into play once
> files are split into more than 1 piece.

Thanks for the explanation. Got your point.
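To make the arithmetic concrete, here is a small sketch (not DIBS code, just
the storage accounting from the example above) comparing an RS-style
(k + r, k) erasure code against plain replication; the function names are my
own:

```python
# Storage cost of tolerating r peer failures for a file split into k
# pieces: an erasure code stores k data pieces plus r parity pieces
# (any k of the k + r suffice to rebuild), while plain copying needs
# r + 1 full copies, one per peer.

def rs_storage(file_mb, k, r):
    """Total MB stored with a (k + r, k) erasure code."""
    piece_mb = file_mb / k
    return piece_mb * (k + r)

def replication_storage(file_mb, r):
    """Total MB stored with r + 1 full copies."""
    return file_mb * (r + 1)

# The 10 MB example from the thread: k = 10 pieces, robust to r = 2 failures.
print(rs_storage(10, 10, 2))        # 12.0 MB with RS-style coding
print(replication_storage(10, 2))   # 30.0 MB with three full copies
```

For an unsplit file (k = 1), both formulas give the same total, which is
exactly the "no benefit" case discussed above.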
For me the quintessence is that there is a non-trivial
configuration/UI issue here. Your reasoning above entails that a file
of size (X * kbPerFile) KB needs at least X + R peers in order to be
stored with redundancy against R peer failures. Given that only a
finite number of backup peers is available, a sufficiently large file
will not be backed up with the requested redundancy R.

<brainstorm>
DIBS could take the number N of current backup peers and encode every
file (independent of its size) as N parts, R of which are redundant.
Each peer would store one part. If a part is larger than kbPerFile KB,
it is subdivided so that its pieces stay below the limit.
</brainstorm>

-- 
Chris Stork   <>  Support eff.org!  <>   http://www.ics.uci.edu/~cstork/
OpenPGP fingerprint: B08B 602C C806 C492 D069 021E 41F3 8C8D 50F9 CA2F
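P.S. The <brainstorm> above worked out numerically (my own illustration, not
DIBS code; the function name and the assumption of an (N, N - R) erasure code
are hypothetical):

```python
# With N backup peers, encode every file as N parts, R of them
# redundant, so each peer stores exactly one part; any N - R surviving
# parts suffice to rebuild. Parts larger than kbPerFile are subdivided.

def plan_parts(file_kb, n_peers, r, kb_per_file):
    """Return (part_kb, pieces_per_part) for an (N, N - R) encoding."""
    data_parts = n_peers - r                 # parts carrying file data
    part_kb = file_kb / data_parts           # size of each encoded part
    pieces = -(-part_kb // kb_per_file)      # ceiling division
    return part_kb, int(pieces)

# e.g. a 10240 KB (10 MB) file, 10 peers, redundancy 2, kbPerFile = 1024:
print(plan_parts(10240, 10, 2, 1024))        # (1280.0, 2)
```

So the file always fits the available peers, at the cost of re-encoding
whenever N changes.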