#7 [PATCH] Check that files uploaded with PUT are stored correctly

Enhancement_request
closed-fixed
s3cmd (119)
5
2008-04-28
2007-09-22
No

Amazon S3 supports consistency checking through the Content-MD5 request header and the ETag response header. The attached patch makes sure that any upload error is reported.
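This is a minimal Python sketch of both checks. The Content-MD5 header carries the Base64-encoded 128-bit MD5 digest of the request body (RFC 1864), which lets the server reject a body that arrived corrupted. For a simple single-part PUT, the ETag in the response is usually the hex MD5 wrapped in double quotes. That ETag behaviour is an assumption: it does not hold for multipart or some server-side-encrypted uploads. The function names are hypothetical and are not taken from the patch.

```python
import base64
import hashlib


def content_md5(data: bytes) -> str:
    """Return the Base64-encoded MD5 digest, the format of the Content-MD5 header."""
    return base64.b64encode(hashlib.md5(data).digest()).decode("ascii")


def etag_matches(data: bytes, etag: str) -> bool:
    """Compare the local MD5 with the ETag returned for a single-part PUT.

    Assumption: the ETag is the hex MD5 of the stored object, in double quotes.
    """
    return hashlib.md5(data).hexdigest() == etag.strip('"')
```

A client would send `content_md5(body)` as the Content-MD5 header and then check the ETag of the response with `etag_matches(body, etag)`.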

Please include this in your next release.

PS: I do not know what kind of error I should raise in case of inconsistency, so I just raise a descriptive string.

Discussion

  • Michal Ludvig

    Michal Ludvig - 2007-09-23
    • assigned_to: nobody --> ludvigm
     
  • Michal Ludvig

    Michal Ludvig - 2007-09-23

    Logged In: YES
    user_id=344740
    Originator: NO

    Hi,

    Has it ever happened to you that the stored file had a different checksum from the one on your disk? Or is it just a safety check to be sure?

    Michal

     
  • Kim Minh Kaplan

    Kim Minh Kaplan - 2007-09-23

    Logged In: YES
    user_id=24992
    Originator: YES

    I have never had any corruption on Amazon S3 yet. It is a sanity check. TCP's checksum does not reliably protect against transmission corruption (I have experienced this).

     
  • Michal Ludvig

    Michal Ludvig - 2007-09-26
    • status: open --> open-accepted
     
  • Michal Ludvig

    Michal Ludvig - 2007-09-26

    Logged In: YES
    user_id=344740
    Originator: NO

    I like the idea, but will need a better patch:

    - this behaviour should be optional (i.e. it needs a new command-line switch) because it's not that important in the case of HTTPS, and some users may opt for faster operation without precomputing the MD5.

    - the response for PUT-Object contains ETag with MD5 of the uploaded file as Amazon stored it. IMO it's better to compute the MD5 in the upload loop so the file is read from the disk only once. After the whole file is uploaded we'll compare our MD5 with ETag from response and see if they match. If not, re-upload.

    - with the 'sync' command we compute MD5 checksums as well. It may be worth caching them and reusing them for the upload.
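    This is a minimal sketch of the upload-loop approach described above. Each chunk is hashed as it is sent, so the file is read only once. The digest is then compared with the ETag, and the file is re-uploaded on a mismatch. The names `send_with_md5`, `upload_verified` and `put_once`, and the chunk size, are hypothetical. They are not s3cmd's actual `S3.send_file()` code. The ETag comparison assumes a single-part upload whose ETag is the quoted hex MD5.

```python
import hashlib
import io
from typing import BinaryIO, Callable, Tuple

CHUNK_SIZE = 64 * 1024  # illustrative chunk size, not s3cmd's actual value


def send_with_md5(stream: BinaryIO, send_chunk: Callable[[bytes], None]) -> str:
    """Send the stream chunk by chunk, updating the MD5 with each chunk sent.

    The data is read exactly once; the hex digest is returned for comparison
    with the ETag from the PUT response.
    """
    md5 = hashlib.md5()
    while True:
        chunk = stream.read(CHUNK_SIZE)
        if not chunk:
            break
        md5.update(chunk)
        send_chunk(chunk)
    return md5.hexdigest()


def upload_verified(open_stream: Callable[[], BinaryIO],
                    put_once: Callable[[BinaryIO], Tuple[str, str]],
                    max_attempts: int = 3) -> str:
    """Repeat the upload until the local MD5 matches the returned ETag.

    `put_once` (hypothetical) performs one PUT and returns (local_md5, etag).
    """
    for _ in range(max_attempts):
        with open_stream() as stream:
            local_md5, etag = put_once(stream)
        if local_md5 == etag.strip('"'):
            return local_md5
    raise IOError("MD5 mismatch after %d attempts" % max_attempts)
```

    Passing `open_stream` as a factory lets each retry start from the beginning of the file, without keeping the whole upload in memory.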

    Are you keen to rework the patch or should I do it?

    Michal

     
  • Kim Minh Kaplan

    Kim Minh Kaplan - 2007-09-29

    Logged In: YES
    user_id=24992
    Originator: YES

    Second try... There is now a -5, --no-early-md5 option.

    Unless it is specified, the MD5 is computed. Note that it also affects the sync command: the MD5 computed during sync is reused for the upload.
    File Added: s3cmd-md5.patch

     
  • Kim Minh Kaplan

    Kim Minh Kaplan - 2007-09-30

    Revised consistency checks (Content-MD5 and ETag)

     
  • Kim Minh Kaplan

    Kim Minh Kaplan - 2007-09-30

    Logged In: YES
    user_id=24992
    Originator: YES

    Oops, I forgot a couple of lines. Here is the correct patch.
    File Added: s3cmd-md5-2.patch

     
  • Michal Ludvig

    Michal Ludvig - 2008-04-28

    Logged In: YES
    user_id=344740
    Originator: NO

    Sorry it took "a bit longer". s3cmd in SVN computes the md5 sum in the upload loop (in S3.send_file()) which makes it more efficient.

     
  • Michal Ludvig

    Michal Ludvig - 2008-04-28
    • status: open-accepted --> closed-fixed
     
