On Twitter, there was a conversation a few days ago that started:

@fbeausoleil: How do you ensure end-to-end integrity of the data you store on #S3 ? I calculate a checksum, upload, download then compare. Thoughts?

That didn't sound right to me, and indeed @rafaelrosafu pointed out shortly thereafter:

@rafaelrosafu: @fbeausoleil and docs.aws.amazon.com/AmazonS3/lates… under Content-MD5 header

But in fact, s3cmd wasn't issuing the Content-MD5 header, even when it could know what to put into it. So S3 couldn't explicitly tell us when an object was corrupted on upload (which should be a rarity anyhow). Who needs end-to-end checking anyway?
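For reference, the value that goes in that header is the base64-encoded binary MD5 digest of the body, not the hex string. Here's a minimal sketch in plain Python of how it's computed (illustrative only, not the actual s3cmd code; content_md5 is just a name I made up):

    import base64
    import hashlib

    def content_md5(path, chunk_size=8192):
        """Base64-encoded binary MD5 of a local file: the Content-MD5 value."""
        md5 = hashlib.md5()
        with open(path, "rb") as f:
            for chunk in iter(lambda: f.read(chunk_size), b""):
                md5.update(chunk)
        return base64.b64encode(md5.digest()).decode("ascii")

    # headers["Content-MD5"] = content_md5("backup.tar.gz")
    # S3 recomputes the MD5 on receipt and rejects the PUT with
    # 400 BadDigest if it doesn't match what we claimed.

S3 recomputes the digest on receipt and rejects the PUT with a 400 BadDigest if it doesn't match, which is what gives us the end-to-end check on upload.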

So, a few patches later, we can. I'd appreciate some more eyes on this patch series before I pull it into master, but it feels about right.

https://github.com/mdomsch/s3cmd/commits/feature/content-md5

Matt Domsch (5):
      add Content-MD5 header to PUT objects (not multipart)
      add Content-MD5 header for each multipart chunk
      Don't double-calculate MD5s on multipart chunks
      add Content-MD5 on put, not just sync
      handle errors during multipart uploads


If I got this right, when S3 returns a 400 BadDigest, we retry (as we would for any other retryable error).
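In sketch form, the retry I'm describing looks roughly like this; put_object and the error-code plumbing are hypothetical stand-ins for however the response actually surfaces, not the real s3cmd internals:

    def put_with_retry(put_object, max_retries=5):
        """Resend the object when S3 reports 400 BadDigest, up to a limit."""
        for attempt in range(1, max_retries + 1):
            status, error_code = put_object()
            if status == 200:
                return
            if not (status == 400 and error_code == "BadDigest"):
                raise RuntimeError("non-retryable error: %s %s"
                                   % (status, error_code))
            # otherwise the body was corrupted in transit; loop and resend
        raise RuntimeError("still getting BadDigest after %d attempts"
                           % max_retries)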

I'll also note that we aren't capturing a 503 SlowDown anywhere else (multipart now does in this series) for use by future operations; it's only caught and used during retries of this one operation on this one object, not thereafter. Maybe that's OK. But I'm tempted to catch SlowDown at the send_request() level rather than higher, and retry there with exponential backoff rather than the linear backoff currently being used. Otherwise we have to scatter retry and backoff logic all over the place.
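Something like this is what I have in mind: catch the throttle-type errors in one place and back off exponentially, capped at some sane maximum. The names and signature here are illustrative, not the actual send_request() in s3cmd:

    import time

    RETRYABLE = ("SlowDown", "InternalError", "RequestTimeout")

    def send_request(do_request, max_retries=5, base_delay=1.0, max_delay=30.0):
        """Retry throttle-type failures in one place with exponential backoff."""
        for attempt in range(max_retries):
            status, error_code, response = do_request()
            if status < 400 or error_code not in RETRYABLE:
                # success, or an error the caller should handle itself
                return response
            # back off 1s, 2s, 4s, 8s, ... capped at max_delay
            time.sleep(min(max_delay, base_delay * (2 ** attempt)))
        raise RuntimeError("request still failing after %d retries" % max_retries)

This still doesn't carry the SlowDown signal over to future operations, but it keeps the retry and backoff logic in one place instead of duplicating it in every caller.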

I'm wondering if this isn't the source of the various "failed retry" errors that are routinely posted to the bug list.

Thanks,
Matt