
#7 [PATCH] Check that PUT of files are correct

Enhancement_request
closed-fixed
Michal Ludvig
s3cmd (118)
5
2008-04-28
2007-09-22
Kim Minh Kaplan
No

Amazon S3 uses the Content-MD5 and ETag HTTP headers to provide for consistency checking. The attached patch makes sure that any upload error is reported.
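
For readers unfamiliar with the two headers, here is a minimal Python sketch of how the values relate. It is not the attached patch. The helper names `content_md5_header` and `etag_matches` are made up for illustration, and it assumes a plain single-request PUT, where the ETag is the quoted hex MD5 of the body.

```python
import base64
import hashlib


def content_md5_header(data: bytes) -> str:
    """Return the base64-encoded MD5 digest, the form the Content-MD5 header carries."""
    return base64.b64encode(hashlib.md5(data).digest()).decode("ascii")


def etag_matches(data: bytes, etag: str) -> bool:
    """Compare the hex MD5 of `data` with an ETag value (assumed: simple PUT, quoted hex MD5)."""
    return hashlib.md5(data).hexdigest() == etag.strip('"').lower()


body = b"hello"
header_value = content_md5_header(body)          # sent with the request
local_etag = '"%s"' % hashlib.md5(body).hexdigest()  # what a matching response would carry
upload_ok = etag_matches(body, local_etag)
```

Sending Content-MD5 lets the server reject a corrupted body, while checking the ETag lets the client confirm what was stored.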

Please include this in your next release.

PS: I do not know what kind of error I should raise in case of inconsistency so I just raise a descriptive string.

Discussion

  • Make sure uploads are correct.

     
    Attachments
  • Michal Ludvig
    2007-09-23

    • assigned_to: nobody --> ludvigm
     
  • Michal Ludvig
    2007-09-23

    Logged In: YES
    user_id=344740
    Originator: NO

    Hi,

    did it ever happen to you that the stored file had a different checksum from the one on your disk? Or is it just a safety check to be sure?

    Michal

     
  • Logged In: YES
    user_id=24992
    Originator: YES

    I have never had any corruption on Amazon S3 yet. It is a sanity check. TCP's 16-bit checksum does not reliably protect against transmission corruption (I have experienced this).

     
  • Michal Ludvig
    2007-09-26

    • status: open --> open-accepted
     
  • Michal Ludvig
    2007-09-26

    Logged In: YES
    user_id=344740
    Originator: NO

    I like the idea, but will need a better patch:

    - this behaviour should be optional (i.e. need a new command line switch for it) because it's not that important in case of HTTPS and some users may opt for a faster operation without precomputing MD5.

    - the response for PUT-Object contains ETag with MD5 of the uploaded file as Amazon stored it. IMO it's better to compute the MD5 in the upload loop so the file is read from the disk only once. After the whole file is uploaded we'll compare our MD5 with ETag from response and see if they match. If not, re-upload.

    - with the 'sync' command we do some MD5 checksums as well. It may be worth caching them and reusing them for upload.
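
The second point above (hash while uploading, then compare with the ETag and re-upload on mismatch) could look roughly like the following Python sketch. This is not s3cmd's code. `upload_with_md5`, `upload_with_retry`, and `FakeConnection` are hypothetical names, and the `send_chunk` / `finish` callbacks stand in for whatever the real HTTP connection provides.

```python
import hashlib
import io


def upload_with_md5(stream, send_chunk, finish, chunk_size=4096):
    """Read `stream` once, sending each chunk and updating the MD5 as it goes.

    `send_chunk(bytes)` and `finish() -> ETag string` are hypothetical
    callbacks. Returns True when the returned ETag matches the local MD5.
    """
    md5 = hashlib.md5()
    while True:
        chunk = stream.read(chunk_size)
        if not chunk:
            break
        md5.update(chunk)   # hash the same bytes that are sent
        send_chunk(chunk)
    etag = finish()
    return etag.strip('"').lower() == md5.hexdigest()


def upload_with_retry(open_stream, send_chunk, finish, retries=3):
    """Re-upload from the start until the ETag matches or retries run out."""
    for _ in range(retries):
        with open_stream() as stream:
            if upload_with_md5(stream, send_chunk, finish):
                return True
    return False


class FakeConnection:
    """In-memory stand-in for the server side, for illustration only."""

    def __init__(self):
        self._md5 = hashlib.md5()

    def send(self, chunk):
        self._md5.update(chunk)

    def finish(self):
        etag = '"%s"' % self._md5.hexdigest()
        self._md5 = hashlib.md5()  # reset for the next upload
        return etag


conn = FakeConnection()
ok = upload_with_retry(lambda: io.BytesIO(b"example data"), conn.send, conn.finish)
```

The file is read only once per attempt, so the check costs no extra disk pass. A real implementation would also need to decide how many retries are acceptable and how to report a persistent mismatch.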

    Are you keen to rework the patch or should I do it?

    Michal

     
  • Logged In: YES
    user_id=24992
    Originator: YES

    Second try... There is now -5, --no-early-md5.

    Unless it is specified, the MD5 is computed. Note that it also affects the sync command: the MD5 computed during sync is reused.
    File Added: s3cmd-md5.patch

     
  • Revised consistency checks Content-MD5 and ETag

     
    Attachments
  • Logged In: YES
    user_id=24992
    Originator: YES

    Oops, I forgot a couple of lines. Here is the correct patch.
    File Added: s3cmd-md5-2.patch

     
  • Michal Ludvig
    2008-04-28

    Logged In: YES
    user_id=344740
    Originator: NO

    Sorry it took "a bit longer". s3cmd in SVN computes the md5 sum in the upload loop (in S3.send_file()) which makes it more efficient.

     
  • Michal Ludvig
    2008-04-28

    • status: open-accepted --> closed-fixed