For the S3 side of the comparison, Amazon stores an MD5SUM value for each file and returns it in the directory listing as the ETag.

For the local side of the comparison, if you are using code from the GitHub master branch (1.5.0-alpha3+), you can specify --cache-file=(some local file). MD5SUM values are stored there the first time a file is synced to S3, so the file doesn't have to be read again on future runs just to compare. If the file's inode metadata changes (modified date, size), the MD5SUM is recalculated on the next sync. This greatly speeds up the process, since only the cache file has to be read rather than every local file, while still preserving the MD5SUM comparison. You can also skip the MD5SUM comparison entirely with --no-check-md5, if you know already-uploaded files haven't changed.
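To illustrate the caching idea described above, here is a minimal Python sketch. It is not s3cmd's actual implementation or cache file format; the class name Md5Cache, the JSON layout, and the `hashed` counter are all assumptions made for illustration. A file is only re-read when its modification time or size differs from what the cache recorded.

```python
import hashlib
import json
import os


def file_md5(path, block_size=1 << 20):
    """Read the whole file in blocks and return its MD5 as a hex string."""
    h = hashlib.md5()
    with open(path, "rb") as f:
        for block in iter(lambda: f.read(block_size), b""):
            h.update(block)
    return h.hexdigest()


class Md5Cache:
    """Hypothetical MD5 cache keyed on path, validated by mtime and size."""

    def __init__(self, cache_path):
        self.cache_path = cache_path
        self.entries = {}
        self.hashed = 0  # number of files actually read and hashed
        if os.path.exists(cache_path):
            with open(cache_path) as f:
                self.entries = json.load(f)

    def md5(self, path):
        st = os.stat(path)
        entry = self.entries.get(path)
        # Cache hit: the metadata is unchanged, so skip reading the file.
        if (entry is not None
                and entry["mtime_ns"] == st.st_mtime_ns
                and entry["size"] == st.st_size):
            return entry["md5"]
        # Cache miss or stale entry: hash the file and remember the result.
        digest = file_md5(path)
        self.hashed += 1
        self.entries[path] = {
            "mtime_ns": st.st_mtime_ns,
            "size": st.st_size,
            "md5": digest,
        }
        return digest

    def save(self):
        with open(self.cache_path, "w") as f:
            json.dump(self.entries, f)
```

On a second run over an unchanged tree, only the cache file and each file's stat() data are read, which is where the speedup comes from.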

Older versions of s3cmd don't have the --cache-file option.

There are some very new patches pending for the master branch that correctly handle the S3-side MD5SUM value even for files uploaded as multipart (for most s3cmd invocations, files > 5MB).  On older versions, no MD5SUM comparison is done for such multipart files.
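Multipart files are a special case because the ETag S3 returns for them is not a plain MD5SUM of the file contents. As a hedged sketch (this reflects commonly observed S3 behaviour, not a documented guarantee, and not necessarily what the pending patches do): the ETag looks like the MD5 of the concatenated binary MD5 digests of each part, followed by a dash and the part count. The function names below are hypothetical, and chunk_size must match the part size used for the original upload.

```python
import hashlib


def multipart_etag(data, chunk_size):
    """Compute the ETag commonly seen on S3 multipart uploads.

    Assumes every part except the last was exactly chunk_size bytes.
    """
    # MD5 each part separately, keeping the raw 16-byte digests.
    digests = [
        hashlib.md5(data[i:i + chunk_size]).digest()
        for i in range(0, len(data), chunk_size)
    ]
    # MD5 the concatenated digests, then append "-<number of parts>".
    combined = hashlib.md5(b"".join(digests)).hexdigest()
    return "%s-%d" % (combined, len(digests))


def looks_multipart(etag):
    """A plain MD5 ETag is 32 hex characters; multipart ones carry a '-N' suffix."""
    return "-" in etag.strip('"')
```

A sync tool could use looks_multipart() to decide whether an ETag can be compared directly against a local MD5SUM or needs the per-part computation instead.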


On Sun, Aug 18, 2013 at 2:30 PM, Phill Coxon wrote:
Hi there.

I've been using s3cmd for years but have only just started using s3cmd
--sync to keep a 5Gb folder with 19,000 images synced with S3.

Would someone clarify how the sync option works?

On each run does it calculate a md5 of every local file and then pull
the corresponding md5 from the S3 metadata to compare?

Or does it store the local md5s somewhere and only recalculates when a
local file changes?

The reason I ask is that running the sync can take a long time (40+
minutes) before any changes are listed and start to sync.  The hosting
company my client is with charges horrendously for international
bandwidth (we're based in New Zealand) so I want to make sure that doing
a sync only uses a small amount of bandwidth (pulling md5s etc) to make
the comparison before uploading any changes.


S3tools-general mailing list