From: Forrest A. <fo...@gm...> - 2014-02-19 23:07:20
|
I had an interesting exchange with Amazon today, where I put in support tickets asking:

* How can we determine how much space our buckets (or folders) are using?
* Is there a way to compare a checksum of a local file against its S3 copy to verify integrity?

The answer to both was "no". I find the first one odd -- why wouldn't the web UI show at least a running total of the space being consumed? Hmmm...

They referred me to "s3cmd du" and another app that might help. The problem is that we have literally thousands of objects under a particular bucket. I'm running a test of "$ s3cmd --verbose du s3://ourbucket" now to see how long it takes and what it reports. I'm guessing it will take a very long time, which means it isn't something we can run on a regular basis.

Can you comment on these two issues as they apply to s3cmd, or suggest how I can better accomplish them? I realize AWS exposes an ETag, but as I understand it, that can vary depending on how the content was uploaded.

Thanks. |
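[Editor's note: the ETag variation mentioned above comes from multipart uploads. For a single-part upload the ETag is simply the MD5 of the object, but for multipart uploads it is widely reported (though not officially guaranteed by AWS) to be the MD5 of the concatenated per-part MD5 digests, followed by a dash and the part count. A minimal Python sketch of both calculations, under that assumption:]

```python
import hashlib

def plain_etag(data: bytes) -> str:
    """ETag of a single-part upload: just the MD5 of the object."""
    return hashlib.md5(data).hexdigest()

def multipart_etag(data: bytes, part_size: int) -> str:
    """Commonly reported multipart ETag: MD5 of the concatenated
    raw MD5 digests of each part, suffixed with "-<part count>"."""
    digests = [hashlib.md5(data[i:i + part_size]).digest()
               for i in range(0, len(data), part_size)]
    return hashlib.md5(b"".join(digests)).hexdigest() + "-" + str(len(digests))
```

[This is why a plain `md5sum` of a local file only matches the S3 ETag when the object was uploaded in one part; a multipart ETag contains a dash and cannot be reproduced without knowing the part size used at upload time.]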
From: Matt D. <ma...@do...> - 2014-02-20 01:47:29
|
On Wed, Feb 19, 2014 at 5:07 PM, Forrest Aldrich <fo...@gm...> wrote:
> I had an interesting exchange with Amazon today, where I put in support
> tickets asking:
>
> * How can we determine how much space our buckets (or folders) are using?

S3 will gladly tell you through its usage reports, which are calculated at least daily, maybe hourly. If that's good enough for AWS to use as the basis for your monthly bill, it's probably sufficient for you.

> * Is there a way to do a checksum comparison with local-file vs s3-file
> to determine integrity?

If you use a recent version of s3cmd, particularly with --cache-file=<filename>, you _could_ do so. At upload time it stores the md5sum of the file in the cache file and also sticks the md5sum into the file's metadata stored on S3. There isn't a trivial s3cmd command to do the checking automatically, though. |
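[Editor's note: to compare against the md5sum that s3cmd records, you first need the md5 of the local file, computed in a streaming fashion so large files don't need to fit in memory. A minimal sketch; the function name and chunk size are illustrative:]

```python
import hashlib

def file_md5(path: str, chunk_size: int = 1 << 20) -> str:
    """Compute the hex MD5 of a file, reading it in 1 MiB chunks."""
    h = hashlib.md5()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()
```

[The result can then be compared by hand against the MD5 that `s3cmd info s3://bucket/key` reports for the remote object, bearing in mind the multipart-ETag caveat discussed earlier in the thread.]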
From: Matt D. <ma...@do...> - 2014-02-20 03:00:21
|
As a follow-up: verifying that a file in S3 is not corrupted would best be done from an AMI running in the same S3 region as your files, so as not to incur download bandwidth charges or the slow pipe out of AWS.

On Wed, Feb 19, 2014 at 7:47 PM, Matt Domsch <ma...@do...> wrote:
> On Wed, Feb 19, 2014 at 5:07 PM, Forrest Aldrich <fo...@gm...> wrote:
>
>> I had an interesting exchange with Amazon today, where I put in support
>> tickets asking:
>>
>> * How can we determine how much space our buckets (or folders) are using?
>
> S3 will gladly tell you through their usage reports, which are calculated
> daily at least, maybe hourly. If it's good enough for AWS to use as the
> measure to charge you each month, that's probably sufficient.
>
>> * Is there a way to do a checksum comparison with local-file vs s3-file
>> to determine integrity?
>
> If you use a recent version of s3cmd, particularly with
> --cache-file=<filename>, you _could_ do so. It stores the md5sum of the
> file into the cache-file, at upload time, and sticks the md5sum into the
> file's metadata stored on S3. There isn't a trivial s3cmd to automatically
> do the checking though. |