Currently, when generating checksum fields for items with certain checksum types: 0, Link, or Remove, the checksum value is a random integer, to avoid all the load going to a single node when load splitting is in effect (such as post_exchange_split). It would be better if the load splitting were predictable as well as distributed, so that files with the same name would go to the same node. First idea: a good way of accomplishing that would be to change those cases to use the 'n' algorithm... i.e. take an MD5 checksum of the file name as the value.
That way the computation is reproducible. For 'R' (Remove), that should work well. For Link, it would be even better to base the checksum on the link content, so that the checksum changes whenever the link does.
Even the 'n' algorithm, as is, has a problem: if a file is partitioned, all of the parts will go to the same node (as they have the same checksum). It is probably important to include the partition information in the checksum calculation. Doing so would also make the value suitable for use by sr_winnow.
So, perhaps ideally, we create an 'N' algorithm, which concatenates the name and the 'parts' header, and uses the checksum of that string.
Implemented on the C side.
Used SHA512, which is way over the top, but it avoids discussions...
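A Python sketch of the 'N' idea, pending the real Python support (the function name, the separator-free concatenation, and the example 'parts' header strings are assumptions; only the name+parts+SHA512 recipe comes from the notes above):

```python
import hashlib


def n_checksum(name: str, parts: str) -> str:
    # 'N' algorithm sketch: hash the concatenation of the file name
    # and the 'parts' header, so different partitions of the same
    # file get different checksums (and hence can land on different
    # nodes). SHA512 matches the C-side choice.
    return hashlib.sha512((name + parts).encode('utf-8')).hexdigest()


# Two blocks of the same partitioned file now hash differently,
# while recomputing for the same block stays reproducible:
block0 = n_checksum("bigfile.dat", "p,1048576,4,0,0")
block1 = n_checksum("bigfile.dat", "p,1048576,4,0,1")
```

This also gives sr_winnow a stable key per (file, partition) pair rather than a random integer.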
Python support is now issue 27 on github.com/MetPX/Sarracenia.