IMO, retrieving the original MD5 calculated by
's3cmd put' and attached to the file as metadata
defeats the purpose of the S3 MD5 hash scheme.
The point is to be able to verify that the cloud
copy of the file is the same as the original, and
so only the hash calculated by Amazon is relevant.
The metadata MD5 value certainly is useful for the
'sync' command and for validating restored files.
Here is a patch that restores the original 's3cmd
ls --list-md5' behavior. It's straightforward to
calculate the Amazon S3 MD5 value as described in
my recent post covering the algorithm.
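For reference, here is a minimal sketch of that
calculation as I understand it: MD5 each part,
concatenate the binary digests, MD5 the
concatenation, and append '-' plus the part
count. The function name is illustrative, and the
part size is a parameter here because s3cmd's
multipart chunk size is configurable.

```python
import hashlib

def s3_multipart_md5(data, part_size):
    """Compute the S3-style multipart MD5 ("ETag") of a byte string.

    Each part_size slice is hashed with MD5; the binary digests are
    concatenated and hashed again with MD5, and '-<part count>' is
    appended, e.g. '9f85d3...80bb-6' for a six-part upload.
    """
    digests = []
    for offset in range(0, len(data), part_size):
        digests.append(hashlib.md5(data[offset:offset + part_size]).digest())
    return hashlib.md5(b"".join(digests)).hexdigest() + "-%d" % len(digests)
```

Note that a file uploaded in a single PUT gets a
plain MD5 ETag with no '-N' suffix, which is why
the two values in the 'info' output below differ.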
I've also enhanced the 'info' subcommand to show
both values:

$ s3cmd info s3://offsite-backup-wc/OFFSITE_20130707170754
s3://offsite-backup-wc/OFFSITE_20130707170754 (object):
File size: 5675190650
Last mod: Fri, 02 Aug 2013 17:05:24 GMT
MIME type: application/octet-stream; charset=binary
AZ3MD5 sum: 9f85d36e45625b0d50a46f8628dd80bb-6
MD5 sum: 18ed4d9c695c72329001d34939d74a88
ACL: amazon_s3: FULL_CONTROL
One interesting discovery is that S3 prevents
copying, moving, or even editing the metadata for
files larger than 5 GB (5368709120 bytes).
If a multipart file is smaller than 5 GB it can
be copied, and the AZ3MD5 sum will be
re-calculated for a single chunk and will match
the normal MD5 calculation. I suspect that Amazon
may migrate files larger than 5 GB to different
storage arrays even though customers are
prohibited from similar operations.
In that case the file may be divided into some
standard chunk size determined by Amazon and the
AZ3MD5 value recalculated accordingly. If anyone
knows what that size might be please advise.
Knowing the value in advance allows one to
calculate what the migration AZ3MD5 value would
be before the original local file is deleted.
Sometime in the next week I'll post a proper
'az3md5' script that performs the S3 hash
calculation for a specified segment size.