From: Jeremy W. <jer...@gm...> - 2015-03-23 20:59:50
> Q: What checksums does Amazon S3 employ to detect data corruption?
>
> Amazon S3 uses a combination of Content-MD5 checksums and cyclic
> redundancy checks (CRCs) to detect data corruption. Amazon S3 performs
> these checksums on data at rest and repairs any corruption using
> redundant data. In addition, the service calculates checksums on all
> network traffic to detect corruption of data packets when storing or
> retrieving data.

-- http://aws.amazon.com/s3/faqs/

FWIW, I've never had a corrupt file on S3, and we move a hundred thousand
files a day.

Jeremy Wadsack

On Mon, Mar 23, 2015 at 1:44 PM, Russell Gadd <ru...@gm...> wrote:

> So if I want to verify that Amazon holds a valid copy of my files, I
> could get s3cmd to list those smaller than 15 MB (assuming an unchanged
> default chunk size) with --list-md5, but for larger files I'd need to
> download them and calculate the MD5 to compare with the original. I
> think I'm likely to leave this to the end of the project as I've got
> some local backup anyway.
>
> Many thanks for the information.
>
> Russell
>
> On 23 March 2015 at 17:14, Matt Domsch <ma...@do...> wrote:
>
>> Amazon calculates its own MD5 and puts it in the <ETag> field of a
>> BUCKET_LIST response. s3cmd does not send a Content-MD5 header, which
>> AWS could use to validate that the received object matches the sent
>> object; however, the new AWS signature v4 method calculates the sha256
>> of the content and sends that, which is effectively the same.
>>
>> However, for multipart-sent files, and for server-side-encrypted files
>> (either with a KMS key or an Amazon-managed key), the resulting ETag
>> doesn't match the original content's MD5.
>>
>> For this reason, s3cmd sends an x-amz-meta-s3cmd-attribs header which
>> includes the original object's MD5. This metadata is stored in S3 and
>> is returned by an object HEAD or GET call.
>>
>> On Mon, Mar 23, 2015 at 4:50 AM, Russell Gadd <ru...@gm...> wrote:
>>
>>> Thanks for the comments, Will.
>>> I had a look at Duplicity and, as you say, it looks like a decent
>>> backup tool, but it isn't what I'm looking for. I will have a better
>>> look at Python, but at present my inclination is to stick to
>>> bash/sed/awk.
>>>
>>> I wonder if Matt could answer the question regarding the MD5s
>>> returned by a list operation, i.e. does Amazon calculate them from
>>> its own copy of the file, or does it expect to be given the MD5 by
>>> the upload software? I seem to remember reading somewhere that Amazon
>>> only uses the MD5 for verification of the transfer, so that if a file
>>> is uploaded in multiple parts it only calculates its own MD5 for each
>>> part. Maybe that's outdated information.
>>>
>>> I tried to verify this by uploading a 35 MB file using the S3 console
>>> (so s3cmd wouldn't know anything about it) and comparing how long it
>>> took to download vs how long to list with the --list-md5 option
>>> (doing the list operation first). The download took about 15 seconds
>>> on my system but the MD5 listing was almost instant, so Amazon had
>>> the MD5. However, I don't think the upload was multipart, because it
>>> restarted 3 times, sometimes getting over half way, before it managed
>>> the full upload; each time it restarted from the beginning. So I'm
>>> still none the wiser.
>>>
>>> I do believe in verifying backups, which is why I'm keen on an MD5
>>> check based on the actual file at S3. I haven't seen any cloud
>>> service offer to compute hashes on their stored data - I think one
>>> that did would have an extra selling point. As far as I'm concerned
>>> I'd be happy to pay a fee for such a service; they wouldn't have to
>>> charge much to make it viable. Of course you'd have to make sure
>>> their client software didn't cheat by doing the hash on your own PC,
>>> and you'd want to use independent software locally to verify their
>>> hash.
>>>
>>> Regards
>>> Russell
>>>
>>> On 21 March 2015 at 21:10, Will McCown <wi...@ro...> wrote:
>>>
>>>> On 3/21/2015 11:51 AM, Russell Gadd wrote:
>>>> > My questions are:
>>>> >
>>>> > 1. Where does Amazon get its MD5 from? Is it calculated locally on
>>>> >    my PC and sent in some headers? If Amazon calculates it at
>>>> >    their end from the file it has on its servers then the
>>>> >    verification is OK, but otherwise how do I know their copy of
>>>> >    the file is valid?
>>>>
>>>> I believe that Amazon calculates it on their end, or at least I hope
>>>> so, as I use it as an integrity check for my own backups. If you
>>>> learn otherwise, please let us know.
>>>>
>>>> > 2. How easy is it to find out how to use Amazon's AWS CLI in
>>>> >    Linux? I have tried out s3cmd and it seems easy to use, but at
>>>> >    first glance the AWS CLI looks pretty complex.
>>>> > 3. I plan to use Bash and a little sed/awk in Linux. I've already
>>>> >    written some code to create and manipulate this index as a
>>>> >    trial. I don't particularly like Bash as such, but it does the
>>>> >    job. Alternatively I could perhaps use this project to learn
>>>> >    another language such as Python, but I'm not particularly keen
>>>> >    to do that unless it confers particular advantages. Any
>>>> >    opinions would be welcome (leaning perhaps to a C-like language
>>>> >    if possible).
>>>>
>>>> I would certainly borrow heavily from s3cmd as an example. I've
>>>> looked at the CLI as well and find it pretty complex (but I'm not
>>>> really a programmer). You might also want to check out the package
>>>> called "duplicity". I've been using it with S3 as the back end for a
>>>> while and it seems to work pretty well (though it works in the
>>>> classical full/incremental backup mode, which isn't quite what you
>>>> are thinking of). Duplicity is written in Python and will be another
>>>> example of an implementation of an S3 back end.
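[Will's hope above - that the ETag can serve as an integrity check -
holds only when the ETag is a plain single-part, non-KMS MD5, per Matt's
earlier note. A small guard before trusting a listed ETag as an MD5 (a
sketch; S3 returns ETags wrapped in double quotes, and multipart ETags
carry a "-<parts>" suffix):]

```python
import string

def etag_is_plain_md5(etag):
    # A usable ETag is a bare 32-hex-digit MD5 once the surrounding
    # quotes are stripped; anything else (e.g. "…-9" multipart suffixes)
    # cannot be compared against a local MD5.
    etag = etag.strip('"')
    return len(etag) == 32 and all(c in string.hexdigits for c in etag)
```

[Objects that fail this test would fall into Russell's "download and
hash locally" bucket.]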
>>>> I used to write lots of complicated bash/sed/awk scripts to do
>>>> stuff, but these days I think Perl or Python is a much better choice
>>>> for such things. Both languages have tremendous open-source library
>>>> bases to draw upon, so you can do a lot with very little actual
>>>> coding.
>>>>
>>>> --
>>>> Will McCown, Rolling Hills Estates, CA
>>>> wi...@ro...
>>>>
>>>> ------------------------------------------------------------------------------
>>>> Dive into the World of Parallel Programming. The Go Parallel
>>>> Website, sponsored by Intel and developed in partnership with
>>>> Slashdot Media, is your hub for all things parallel software
>>>> development, from weekly thought leadership blogs to news, videos,
>>>> case studies, tutorials and more. Take a look and join the
>>>> conversation now. http://goparallel.sourceforge.net/
>>>> _______________________________________________
>>>> S3tools-general mailing list
>>>> S3t...@li...
>>>> https://lists.sourceforge.net/lists/listinfo/s3tools-general