[parchive-devel] Re: Recovery Set ID

Hello Michael,

On Thursday, September 12, 2002, 7:04:13 PM, you wrote:

MN> Peter,
MN>          I agree that using file indices is a valid design choice
MN> and it does prevent any possibility of two files having the same
MN> identifier.  I did consider using it before going with the file
MN> ID hash. 

MN>          I chose not to use file indices because I wanted the
MN> packets to be constructed deterministically and independently.

MN> By deterministically, I mean that every client should produce the
MN> same packets with the same inputs.

Well, determinism is obviously achievable by both methods.

MN> By independently, I mean without looking at any other packets.  I
MN> wanted packets to be calculated independently because it makes it
MN> easier to replace other packets.

And clearly neither method achieves independence.

MN> For example, if we decided to replace the main packet design in
MN> PAR 2.1, I wanted that to go easily.  By minimizing the reliance
MN> on other packets, that kind of change is more easily accomplished. 

MN>          With file indices, you have to have a mapping from files
MN> to indices.  To do that without clashing relies on interdependence
MN> - the opposite of independence.

But all packets are completely interdependent in your current spec
anyway (purely because they all contain the recovery_set_id).

The interdependence between all packets in the current spec occurs as
follows:

1) The very last part of a packet that is computed is the packet_hash.

2) The packet_hash for each packet cannot be computed until the
recovery_set_id is known.

3) The recovery_set_id cannot be computed until the file_id_hash of
every file has been computed and an ordering determined for the
resulting file_id_hash values.

Once the file_id_hash values have been ordered at (3), the computed
recovery_set_id can be written into the header of every packet.

At this point the file_index of every file is known (since it is
determined by the ordering that has been chosen for the file_id_hash
values). Therefore when writing the recovery_set_id into every packet
header, you could also write the file_index into the headers of those
packets that relate directly to specific files.
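
To make the ordering argument concrete, here is a minimal sketch in
Python. It is an illustration only: the inputs to file_id_hash, the use
of MD5, and the helper names are assumptions for the example, not the
exact definitions in the spec. It shows that once the file_id_hash
values are sorted to compute the recovery_set_id, each file's position
in that ordering (its file_index) is available at no extra cost.

```python
import hashlib

def file_id_hash(name: bytes, length: int, head: bytes) -> bytes:
    # Hypothetical inputs; the real spec defines its own fields.
    return hashlib.md5(name + length.to_bytes(8, "little") + head).digest()

def recovery_set(files):
    # Step 3: hash every file, then fix an ordering of the hashes.
    ids = sorted(file_id_hash(*f) for f in files)
    # The set ID depends on all the hashes in their sorted order.
    set_id = hashlib.md5(b"".join(ids)).digest()
    # The same ordering assigns each file an index for free.
    index_of = {h: i for i, h in enumerate(ids)}
    return set_id, index_of

files = [(b"a.bin", 10, b"aaaa"), (b"b.bin", 20, b"bbbb"), (b"c.bin", 30, b"cccc")]
set_id, index_of = recovery_set(files)
```

Because the hashes are sorted before use, the result does not depend on
the order in which a client happens to list the files, so either
identifier can be produced deterministically.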

MN> It also makes it difficult to use the File Description packet for
MN> other uses, since its use is intertwined with the main packet.

With both methods every File Description Packet is intertwined with
the Main Packet because the Main Packet is used to determine whether
or not an individual file is a recoverable or non-recoverable file and
what base values will be used for each data slice.

With both methods there is nothing to prevent you making changes to
the format of the Main Packet (in later versions) providing that in
both cases the recoverable_file_count is included and in your case the
sorted list of file_id_hash values is also included.

Whichever method is used, it will be possible to have File Description
Packets that are used for other things simply by using a file_index or
file_id_hash that does not match entries in the recoverable or
non-recoverable file sets.

MN>          I was willing to accept a 1 in 2^128 chance of a clash in
MN> order to gain more independence.  It also made the Recovery Set ID
MN> easy to understand and calculate.

As detailed above, you have not gained any independence between
packets, and the use of a file_index instead of the file_id_hash to
group packets that relate to a single file does not require any
change to the way the recovery_set_id is calculated.

MN>          I can accept that you would (and did) make another
MN> decision if the design was yours.  However, I don't plan on
MN> changing this design unless you can produce a higher cost than a
MN> few bytes and preventing a 1 in 340 million million million
MN> million million million chance. 
MN>          Mike
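
For reference, the figure quoted above can be checked with a few lines
of Python. This is only a sketch: it assumes the 128-bit hash values
are uniformly distributed, and clash_bound is a hypothetical helper
name. The 1 in 2^128 figure applies to a single pair of files; for a
set of n files, the chance of any clash is at most the number of pairs
divided by 2^128.

```python
from fractions import Fraction

SPACE = 2 ** 128  # number of possible 128-bit hash values

def clash_bound(n_files: int) -> Fraction:
    # Union bound: at most one chance in SPACE per pair of files.
    pairs = n_files * (n_files - 1) // 2
    return Fraction(pairs, SPACE)

print(SPACE)                      # roughly 3.4e38, i.e. 340 million^6
print(float(clash_bound(1000)))   # upper bound for a 1000-file set
```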

Well that is entirely up to you, but I would be interested to hear
what others think.

This is obviously not a critical part of the design, so if my
suggestion is not taken on board, then the spec will still work.

-- 
Best regards,
 Peter