From: Peter B C. <pe...@sh...> - 2002-09-13 09:10:03
Hello Michael,

On Thursday, September 12, 2002, 7:04:13 PM, you wrote:

MN> Peter,

MN> I agree that using file indices is a valid design choice
MN> and it does prevent any possibility of two files having the same
MN> identifier. I did consider using it before going with the file
MN> ID hash.

MN> I chose not to use file indices because I wanted the
MN> packets to be constructed deterministically and independently.
MN> By deterministically, I mean that every client should produce the
MN> same packets with the same inputs.

Well, determinism is obviously achievable by both methods.

MN> By independently, I mean without looking at any other packets. I
MN> wanted packets to be calculated independently because it makes it
MN> easier to replace other packets.

And clearly neither method achieves independence.

MN> For example, if we decided to replace the main packet design in
MN> PAR 2.1, I wanted that to go easily. By minimizing the reliance
MN> on other packets, that kind of change is more easily accomplished.

MN> With file indices, you have to have a mapping from files
MN> to indices. To do that without clashing relies on interdependence
MN> - the opposite of independence.

But all packets are completely interdependent in your current spec
anyway (purely because they all contain the recovery_set_id).

The interdependence between all packets in the current spec occurs as
follows:

1) The very last part of a packet that is computed is the packet_hash.

2) The packet_hash for each packet cannot be computed until the
   recovery_set_id is known.

3) The recovery_set_id cannot be computed until the file_id_hash of
   every file has been computed and an ordering determined for the
   resulting file_id_hash values.

Once the file_id_hash values have been ordered at (3), the computed
recovery_set_id can be written into the header of every packet. At
this point the file_index of every file is also known, since it is
determined by the ordering that has been chosen for the file_id_hash
values.
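To make the ordering argument concrete, here is a minimal sketch in Python. It is only an illustration under stated assumptions: MD5 is used purely because it is a 128-bit hash, the inputs fed into file_id_hash are hypothetical, and the helper build_recovery_set is a name invented for this example rather than anything in the spec.

```python
import hashlib

def file_id_hash(name: bytes, length: int, head: bytes) -> bytes:
    """Hypothetical 128-bit file identifier (inputs are an assumption)."""
    h = hashlib.md5()
    h.update(length.to_bytes(8, "little"))
    h.update(head)
    h.update(name)
    return h.digest()

def build_recovery_set(files):
    """files: iterable of (name, length, head) tuples.

    Returns (recovery_set_id, file_index), where file_index maps each
    file_id_hash to its position in the sorted list of hashes.
    """
    # Step 3: every file_id_hash must exist before anything else.
    ids = sorted(file_id_hash(n, l, h) for n, l, h in files)
    # The recovery_set_id depends on the complete, ordered list.
    recovery_set_id = hashlib.md5(b"".join(ids)).digest()
    # The same ordering yields a file_index for every file at no extra cost.
    file_index = {fid: i for i, fid in enumerate(ids)}
    return recovery_set_id, file_index
```

In this sketch the index mapping falls out of the same sort that the recovery_set_id already requires, and the result does not depend on the order in which the files are supplied.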
Therefore, when writing the recovery_set_id into every packet header,
you could also write the file_index into the headers of those packets
that relate directly to specific files.

MN> It also makes it difficult to use the File Description packet for
MN> other uses, since its use is intertwined with the main packet.

With both methods, every File Description Packet is intertwined with
the Main Packet, because the Main Packet is used to determine whether
an individual file is recoverable or non-recoverable and what base
values will be used for each data slice.

With both methods, there is nothing to prevent you making changes to
the format of the Main Packet (in later versions), provided that in
both cases the recoverable_file_count is included and, in your case,
the sorted list of file_id_hash values is also included.

Whichever method is used, it will be possible to have File Description
Packets that are used for other things, simply by using a file_index or
file_id_hash that does not match any entry in the recoverable or
non-recoverable file sets.

MN> I was willing to accept a 1 in 2^128 chance of a clash in
MN> order to gain more independence. It also made the Recovery Set ID
MN> easy to understand and calculate.

As detailed above, you have not gained any independence between
packets, and using a file_index instead of the file_id_hash to group
the packets that relate to a single file does not require any change
to the way the recovery_set_id is calculated.

MN> I can accept that you would (and did) make another
MN> decision if the design was yours. However, I don't plan on
MN> changing this design unless you can produce a higher cost than a
MN> few bytes and preventing a 1 in 340 million million million
MN> million million million chance.

MN> Mike

Well, that is entirely up to you, but I would be interested to hear
what others think.
This is obviously not a critical part of the design, so if my
suggestion is not taken on board, the spec will still work.

-- 
Best regards,
Peter