From: WagnerOne <wa...@wa...> - 2014-03-13 00:54:36
|
While this feature is fantastic, I can't find much detail on it in general. Is there a way to disable it? During initial uploads at least, our DirectConnect link seems to copy the files themselves faster than s3cmd can tell S3 to "remote copy" objects. Would that simply be a matter of using the s3cmd switch --no-check-md5? Would this also be likely to reduce the RAM required to enumerate the source files? Thanks, Mike |
From: Matt D. <ma...@do...> - 2014-03-13 02:13:07
|
There's a bug I see (and I created) in current upstream master, where --no-check-md5 will still do the file I/O on local files to get the md5sums for them, exactly to decide if it can do remote copying. That's annoying. This bug also means --no-check-md5 won't, as you might expect, disable remote copying. As no one has asked to be able to disable remote copying, I never coded for it. I'll think about this a bit. There's probably a cleaner way to solve both problems. |
From: Matt D. <ma...@do...> - 2014-03-13 23:32:27
|
Try the upstream master branch now with --no-check-md5. This should disable all md5 calculations, and thus also hardlinking and remote copying. Thanks, Matt |
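To illustrate why disabling md5 calculations also disables remote copying, here is a minimal Python sketch. It is not s3cmd's actual code: `plan_sync` and `md5_of` are hypothetical names, and file contents are passed in memory for simplicity. The assumption is that duplicate detection works by grouping files on their md5 digest, uploading one file per group, and asking S3 to copy the rest server-side.

```python
import hashlib
from collections import defaultdict


def md5_of(data: bytes) -> str:
    """Hex md5 digest of a byte string."""
    return hashlib.md5(data).hexdigest()


def plan_sync(local_files: dict, check_md5: bool = True):
    """Split local files into uploads and remote copies (hypothetical helper).

    local_files maps a relative path to its content (bytes).
    Returns (uploads, copies), where uploads is a list of paths and
    copies is a list of (source_path, destination_path) pairs.
    """
    if not check_md5:
        # Without checksums, duplicates cannot be detected,
        # so every file is uploaded and nothing is remote-copied.
        return sorted(local_files), []

    # Group paths by content digest.
    by_md5 = defaultdict(list)
    for path in sorted(local_files):
        by_md5[md5_of(local_files[path])].append(path)

    uploads, copies = [], []
    for paths in by_md5.values():
        first, duplicates = paths[0], paths[1:]
        uploads.append(first)  # upload one representative per digest
        copies.extend((first, dup) for dup in duplicates)  # copy the rest remotely
    return sorted(uploads), sorted(copies)


files = {"a.txt": b"same", "b.txt": b"same", "c.txt": b"other"}
print(plan_sync(files))
print(plan_sync(files, check_md5=False))
```

With checksums enabled, `b.txt` is planned as a remote copy of `a.txt`; with them disabled, all three files are simply uploaded, which is the behavior Mike asked for.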
From: WagnerOne <wa...@wa...> - 2014-04-03 18:15:24
|
Hi Matt,

Sorry for the delay in testing and responding. I appreciate the effort you have put into this to date.

When I run an s3cmd sync like so:

    s3cmd sync --no-preserve --no-check-md5 --verbose --progress /blah/ s3://blah/

I see this in the output, which seems to indicate that it is still generating md5 values:

    INFO: Compiling list of local files...
    INFO: Running stat() and reading/calculating MD5 values on 21726 files, this may take some time...
    INFO: [1000/21726]
    INFO: [2000/21726]
    INFO: [3000/21726]
    INFO: [4000/21726]
    INFO: [5000/21726]
    INFO: [6000/21726]
    INFO: [7000/21726]
    INFO: [8000/21726]
    INFO: [9000/21726]
    INFO: [10000/21726]
    INFO: [11000/21726]
    INFO: [12000/21726]
    INFO: [13000/21726]
    INFO: [14000/21726]

I am not a GitHub expert, but I believe I pulled down the correct s3cmd version. I did "Download Zip" from here: https://github.com/s3tools/s3cmd/tree/master

I unzipped it and ran:

    python setup.py install

The s3cmd --version output is:

    s3cmd version 1.5.0-beta1

I grabbed the latest commit diff from the GitHub s3cmd master branch and compared it to the corresponding file from the downloaded zip, and it matches, so unless I'm mistaken, I think I am using the version with this patch incorporated.

Mike |
From: WagnerOne <wa...@wa...> - 2014-04-03 18:30:22
|
Ah, I believe it is merely a matter of the wording of s3cmd's feedback. When I removed --no-check-md5, the delay during the INFO gathering phase was significantly longer (indicating to me that it was indeed calculating md5 values then). So, with --no-check-md5 it is just doing the stat() work. BTW, I really appreciate the "INFO: " addition! Mike |
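The difference Mike observed can be sketched as follows. This is an illustrative Python example, not s3cmd's implementation: `scan_local` is a hypothetical function, and the assumption is that the local scan always calls stat() on each file but only opens and reads the file when checksums are wanted. The stat-only pass never reads file contents, which would explain why it finishes much faster even though the progress message still mentions MD5.

```python
import hashlib
import os


def scan_local(root: str, check_md5: bool = True) -> dict:
    """Enumerate files under root (hypothetical helper).

    Returns {relative_path: {"size": int, "mtime": float, "md5": str or None}}.
    When check_md5 is False, only stat() is called and no file is opened.
    """
    result = {}
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            full = os.path.join(dirpath, name)
            st = os.stat(full)  # cheap: metadata only
            entry = {"size": st.st_size, "mtime": st.st_mtime, "md5": None}
            if check_md5:
                # Expensive: reads the entire file in 1 MiB chunks.
                digest = hashlib.md5()
                with open(full, "rb") as fh:
                    for chunk in iter(lambda: fh.read(1 << 20), b""):
                        digest.update(chunk)
                entry["md5"] = digest.hexdigest()
            result[os.path.relpath(full, root)] = entry
    return result
```

Calling `scan_local(path, check_md5=False)` on a large tree and comparing its run time against the default gives a rough idea of how much of the enumeration delay is checksum I/O.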
From: WagnerOne <wa...@wa...> - 2014-04-03 19:14:46
|
Hi,

I believe I may have uncovered a bug regarding the use of an IAM role vs. a user access key/secret key combination.

The instance I'm using s3cmd on has an IAM role allowing it write access to an S3 bucket. At some point after that IAM role was assigned, it was deprecated in favor of using a user account's keys. I noticed s3cmd was still using the IAM role and went to --configure it again.

When running --configure, s3cmd grabbed what I am guessing are the automatically generated IAM role keys as the defaults. I overwrote those with the given user's account keys. When it came time in the --configure process to test the config:

    Test access with supplied credentials? [Y/n] y
    Please wait, attempting to list all buckets...
    ERROR: Test failed: 400 (InvalidToken): The provided token is malformed or otherwise invalid.

I saved the config anyway and was able to do S3 operations with the account keys.

The access_token in my old config file (for use with the IAM role) and in the new config file (with the account keys) is the same. So it would appear that access_token is being generated from the IAM role even if I put in the user keys at s3cmd config time.

Mike |
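One way to check for the situation Mike describes is to inspect the saved config for an `access_token` left over next to manually entered keys. The Python sketch below is only an illustration: `effective_credentials` is a hypothetical helper and not part of s3cmd. It assumes the config file is INI-style with a `[default]` section holding `access_key`, `secret_key`, and `access_token`, and it shows one possible policy, namely dropping the token whenever static keys are supplied, since a session token is only valid together with the temporary keys it was issued with.

```python
import configparser


def effective_credentials(config_text: str, prefer_static_keys: bool = True) -> dict:
    """Return the credentials that would be used (hypothetical policy).

    config_text is the content of an INI-style config with a [default] section.
    If prefer_static_keys is True and both keys are set, any access_token
    is discarded so that it is not sent alongside unrelated keys.
    """
    parser = configparser.ConfigParser(interpolation=None)
    parser.read_string(config_text)
    section = parser["default"]
    creds = {
        "access_key": section.get("access_key", fallback=""),
        "secret_key": section.get("secret_key", fallback=""),
        "access_token": section.get("access_token", fallback=""),
    }
    if prefer_static_keys and creds["access_key"] and creds["secret_key"]:
        # Assumption: keys entered by the user are long-term keys,
        # so a token from the instance's IAM role does not belong with them.
        creds["access_token"] = ""
    return creds


# Placeholder values, not real credentials.
sample = """
[default]
access_key = EXAMPLEKEY
secret_key = examplesecret
access_token = token-from-iam-role
"""
print(effective_credentials(sample))
```

In the sample, the token is dropped because both keys are present; passing `prefer_static_keys=False` leaves the config values untouched, which mirrors the behavior reported above.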