From: Matt D. <Mat...@de...> - 2012-07-15 03:45:37
|
In hopes of getting my pull requests merged, as well as the pending backlog of pull requests, I went ahead and created a tree that has (nearly) all the pull requests in it now. Of the 22 open requests, the only one I haven't pulled is firstclown's Pickle encryption request, because it wasn't clear from the comments that it's in good enough shape to consider pulling.

https://github.com/mdomsch/s3cmd/tree/merge
(aka the 'merge' branch on my tree at git://github.com/mdomsch/s3cmd.git)

I had to make a few minor manual cleanups, but I think they were straightforward. The shortlog from s3tools/s3cmd.git master to the top of my 'merge' branch is appended below.

I did a few operations to be sure it still ran, and it seems to work as expected.

Thanks,
Matt

--
Matt Domsch
Technology Strategist
Dell | Office of the CTO

Brendan O'Connor (1):
      Bugfix: use extra headers -- allows Requester Pays for buckets you don't own. Patch from http://arxiv.org/help/bulk_data_s3

Charlie Schluting (2):
      s3cmd du can gobble gigs of RAM on a bucket with millions of objects. Re-worked 'du' to traverse the structure and store only the sum at each level, allowing python to free memory at each folder. Went from 4GB consumption (and being killed) on a test bucket with ~50M objects in thousands of directories, to a max of 80M usage.
      that's not necessary any more.

Eric Connell (1):
      added the ability to upload files by reading from stdin

George Hickman (2):
      Don't include python files in the manifest.
      Add a makefile for releasing

James Brown (1):
      Respect the $TZ environment variable to show local times if requested

Jason Godfrey (1):
      Quick fix for cfcreate and cf modify not working. These commands would generate a KeyError for 'Origin'. (Bug 3536626)

Joe Fiorini (1):
      Added command to set bucket access policy

Josep del Rio (4):
      When using default index, invalidate path
      Corrected the new flags for CF index invalidation
      Removed debug messages, make sure default index is defined
      Fixed exception for root document

Karl Matthias (1):
      Add support for putting arbitrary additional headers in s3cfg file.

Karsten Sperling (13):
      Added example /etc/magic file with better support for web file types
      Add support for setting Content-Encoding header where it can be guessed, in particular Content-Encoding: gzip is handled.
      Flush stdout after logging messages, in case buffered pipes are involved
      Merge branch 'master' of git://github.com/s3tools/s3cmd
      Added example /etc/magic file with better support for web file types
      Add support for setting Content-Encoding header where it can be guessed, in particular Content-Encoding: gzip is handled.
      Flush stdout after logging messages, in case buffered pipes are involved
      Fix up code that was merged incorrectly
      Merge branch 'mime-guessing'
      Add -M short option for --guess-mime-type
      Fix ValueError when magic module is not present
      Add support for http://pypi.python.org/pypi/filemagic version of python-magic
      Merge branch 'mime-guessing'

Matt Domsch (48):
      Apply excludes/includes at local os.walk() time
      add --delete-after option for sync
      add more --delete-after to sync variations
      Merge remote-tracking branch 'origin/master' into merge
      Merge branch 'delete-after' into merge
      add Config.delete_after
      Merge branch 'delete-after' into merge
      fix os.walk() exclusions for new upstream code
      Merge branch 'master' into merge
      add --delay-updates option
      finish merge
      Handle hardlinks and duplicate files
      hardlink/copy fix
      remote_copy() doesn't need to know dst_list anymore
      handle remote->local transfers with local hardlink/copy if possible
      sync: add --add-destination, parallelize uploads to multiple destinations
      add local tree MD5 caching
      fix getting uid
      Apply excludes/includes at local os.walk() time
      add --delete-after option for sync
      add more --delete-after to sync variations
      add Config.delete_after
      fix os.walk() exclusions for new upstream code
      add --delay-updates option
      Handle hardlinks and duplicate files
      hardlink/copy fix
      remote_copy() doesn't need to know dst_list anymore
      handle remote->local transfers with local hardlink/copy if possible
      sync: add --add-destination, parallelize uploads to multiple destinations
      add local tree MD5 caching
      fix getting uid
      HashCache: add missing break during purge
      merge manpage conflict]
      Merge branch 'cache-local-md5' into hardlink-fixes
      sync: refactor parent/child and single process code
      Merge branch 'econnell/master' into econnell-merge
      merge relistan/master
      Merge remote-tracking branch 'brendano/master' into brendano-merge
      Merge remote-tracking branch 'ksperling/master' into ksperling-merge
      Merge remote-tracking branch 'michaeltyson/master' into michaeltyson-merge
      Merge remote-tracking branch 'joseprio/master' into joseprio-merge
      Merge remote-tracking branch 'ghickman/packaging' into ghickman-merge
      Merge remote-tracking branch 'mattswe/patch-1' into mattswe-merge
      Merge remote-tracking branch 'joefiorini/access-policy' into joefiorini-merge
      Merge remote-tracking branch 'Roguelazer/tz_support' into Roguelazer-merge
      Merge remote-tracking branch 'godfreja/fix_cf_keyerror' into godfreja-merge
      Merge remote-tracking branch 'manos/master' into manos-merge
      Merge remote-tracking branch 'ghickman/manifest-cleanup' into merge

Matt Sweeney (1):
      Add command line options for AWS access key and secret key

Michael Tyson (1):
      Added support for --acl-grant/--acl-revoke to 'sync' command

Tom Wilkie (1):
      Fix inline if so that it works on python2.4

econnell (1):
      Merge pull request #1 from tomwilkie/master
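Charlie Schluting's 'du' change is described above only by its commit summary. As a rough sketch of the idea, not the actual patch, assume a hypothetical `list_level(prefix)` callback that returns the sub-prefixes and the object sizes directly under one prefix (roughly what a single level of an S3 listing with a '/' delimiter provides). The traversal then keeps one running integer per level instead of holding every object in memory:

```python
from typing import Callable, Dict, List, Tuple

# Hypothetical callback: for a prefix, return (sub-prefixes, sizes of the
# objects stored directly under that prefix), much like one level of an S3
# listing made with a '/' delimiter.
ListLevel = Callable[[str], Tuple[List[str], List[int]]]


def du(prefix: str, list_level: ListLevel, totals: Dict[str, int]) -> int:
    """Return the total size under prefix, recording one integer per level."""
    subprefixes, sizes = list_level(prefix)
    total = sum(sizes)
    del sizes  # per-object data for this level is no longer needed
    for sub in subprefixes:
        total += du(sub, list_level, totals)
    totals[prefix] = total
    return total


# Tiny in-memory stand-in for a bucket, for illustration only.
FAKE = {
    "": (["a/", "b/"], [10]),
    "a/": (["a/x/"], [1, 2]),
    "a/x/": ([], [5]),
    "b/": ([], [100]),
}

level_totals: Dict[str, int] = {}
print(du("", lambda p: FAKE[p], level_totals))  # 118
print(level_totals["a/"])                       # 8
```

Memory then scales with the number of prefixes (thousands of directories) rather than the number of objects (~50M in the commit message's test bucket).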
From: Matt D. <Mat...@de...> - 2012-07-17 18:04:03
|
On Sat, Jul 14, 2012 at 10:45:30PM -0500, Domsch, Matt wrote:
> In hopes of getting my pull requests merged, as well as the pending
> backlog of pull requests, I went ahead and created a tree that has
> (nearly) all the pull requests in it now. Of the 22 open requests,
> the only one I haven't pulled is firstclown's Pickle encryption
> request, because it wasn't clear from the comments that it's in shape
> enough to consider pulling.
>
> https://github.com/mdomsch/s3cmd/tree/merge
> (aka the 'merge' branch on my tree at
> git://github.com/mdomsch/s3cmd.git)
>
> I had to make a few minor manual cleanups, but I think they were
> straightforward. Shortlog appended from s3tools/s3cmd.git master to
> top of my 'merge' branch.
>
> I did a few operations to be sure it still ran, and it seems to work
> as expected.

I've had to add two patches to my 'merge' branch to account for two bugs I've run into.

1) fix mime detection for Fedora python-mime. This accounts for newer Python throwing a TypeError instead of an AttributeError when invoking magic.Magic(flags=...) when flags isn't a valid argument.

2) fix multipart uploads. This bug was introduced by the merge: the value of self.chunk_size was getting set a little too late after merging in the 'read from stdin' patch.

Thanks,
Matt
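The first fix is about telling the different python-magic flavours apart. A minimal sketch of that kind of fallback, assuming the three bindings in circulation at the time (PyPI python-magic, PyPI filemagic, and the bindings shipped with file/libmagic); the ordering and function names are illustrative, not s3cmd's actual code. Each attempt that raises either TypeError or AttributeError is treated as "not this flavour":

```python
def make_mime_detector(magic_module):
    """Return a callable mapping a file path to a MIME string, or None.

    Both TypeError (e.g. an unexpected keyword argument) and AttributeError
    (a missing class or constant) mean "this flavour is not installed".
    """

    def python_magic():
        # PyPI "python-magic": Magic(mime=True).from_file(path)
        return magic_module.Magic(mime=True).from_file

    def filemagic():
        # PyPI "filemagic": Magic(flags=...).id_filename(path)
        return magic_module.Magic(flags=magic_module.MAGIC_MIME_TYPE).id_filename

    def libmagic_bindings():
        # Bindings shipped with file/libmagic: open(flags), load(), file(path)
        cookie = magic_module.open(magic_module.MAGIC_MIME)
        cookie.load()
        return cookie.file

    for attempt in (python_magic, filemagic, libmagic_bindings):
        try:
            return attempt()
        except (TypeError, AttributeError):
            continue
    return None


class FileMagicStub:
    """Stand-in for the filemagic module, for illustration only."""
    MAGIC_MIME_TYPE = 16

    class Magic:
        def __init__(self, flags=0):  # Magic(mime=True) raises TypeError
            self.flags = flags

        def id_filename(self, path):
            return "text/plain"


detector = make_mime_detector(FileMagicStub)
print(detector("notes.txt"))  # text/plain
```

Catching only AttributeError would let the TypeError from the first attempt escape before the second flavour is ever tried.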
From: John S. <Joh...@sy...> - 2012-07-20 12:21:41
Attachments:
signature.asc
|
Matt,

When I first attempted to use your merged s3cmd, I found that specifying

    --exclude='*' --include='*.gpg'

gave a different result. Previously, it would include all the gpg files in the tree; now it includes only the gpg files at the top level. You noted that you were now applying excludes/includes at local os.walk() time--I wonder if this might be the cause of the changed behavior.

I have reverted to the previous s3cmd, but I will install your new one and re-test if you need more details or have a workaround.

By the way, it would be easier to discuss your new s3cmd if it had a different version number than the s3cmd it is derived from.

John Sauter (Joh...@sy...)
From: Matt D. <Mat...@de...> - 2012-07-20 14:27:13
|
On Fri, Jul 20, 2012 at 07:06:33AM -0500, John Sauter wrote:
> Matt,
>
> When I first attempted to use your merged s3cmd, I found that specifying
>
> --exclude='*' --include='*.gpg'
>
> gave a different result. Previously, it would include all the gpg files
> in the tree; now it includes only the gpg files at the top level. You
> noted that you were now applying excludes/includes at local os.walk()
> time--I wonder if this might be the cause of the changed behavior.
>
> I have reverted to the previous s3cmd, but I will install your new one
> and re-test if you need more details or have a workaround.

You are correct. With the way excludes/includes are processed now, this won't work, because the * matches all directories during the walk. You'll have to somehow explicitly include the directories that lead to each of the *.gpg files. I need to think about this some more for an example.

There's also a fairly large failure introduced by the merge: sync_local2remote looks awful, with unnecessarily duplicated code and errors from undefined references. I'll try to fix that up too.

> By the way, it would be easier to discuss your new s3cmd if it had a
> different version number than the s3cmd it is derived from.

Good thought.

Thanks,
Matt
From: Matt D. <Mat...@de...> - 2012-07-31 18:30:59
|
On Fri, Jul 20, 2012 at 09:27:04AM -0500, Domsch, Matt wrote:
> On Fri, Jul 20, 2012 at 07:06:33AM -0500, John Sauter wrote:
> > Matt,
> >
> > When I first attempted to use your merged s3cmd, I found that specifying
> >
> > --exclude='*' --include='*.gpg'
> >
> > gave a different result. Previously, it would include all the gpg files
> > in the tree; now it includes only the gpg files at the top level. You
> > noted that you were now applying excludes/includes at local os.walk()
> > time--I wonder if this might be the cause of the changed behavior.
> >
> > I have reverted to the previous s3cmd, but I will install your new one
> > and re-test if you need more details or have a workaround.
>
> You are correct, the way excludes/includes are processed now this
> won't work because the * matches all directories during the walk.
> You'll have to somehow explicitly include the directories that lead to
> each of the *.gpg files. I need to think about this some more for an
> example.

I have found that by using:

    --exclude='*' --include='small/' --include='*.gpg'

I can cause files such as small/somefile.gpg to be included even though the rest of the directories alongside small/ are excluded. How deep is your directory tree? Could you put these patterns in a file and pull them in with --include-from and --exclude-from?

The problem I was solving was avoiding the need to recurse through a ton of subdirectories (e.g. NetApp .snapshot*) that were meaningless to my sync, yet os.walk() would be forced to descend through them. The time for my sync was greatly reduced by implementing excludes at os.walk() time. The cost of this capability is needing to be more explicit about what's excluded and included, knowing that the patterns apply first during os.walk(), not only afterwards to the set of files discovered by os.walk().

>
> There's also a fairly large failure introduced by the merge such that
> sync_local2remote looks awful, unnecessarily duplicated code and
> errors with undefined references. I'll try to fix that up too.

Fixed these in my merge tree. I also pulled in the newer pull request by Christopher Noyes to include the encoding type in the content-type string.

> > By the way, it would be easier to discuss your new s3cmd if it had a
> > different version number than the s3cmd it is derived from.

Done.

    -version = "1.1.0-beta3"
    +version = "1.1.0-beta3-mdomsch-merge-20120730"

Still hoping Michal Ludvig will pull all these changes into his tree.

Happy to take feedback on the functionality of my 'merge' branch.

Thanks,
Matt
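To make the walk-time behaviour concrete, here is a small Python sketch. It approximates the matching with `fnmatch` rather than reproducing s3cmd's code, and it assumes (as the `small/` example suggests) that directories are tested with a trailing '/'. Excluded directories are removed from `dirnames` so `os.walk()` never descends into them. With `--exclude='*' --include='*.gpg'` every directory is pruned. Adding an explicit `small/` include keeps that one directory in the walk:

```python
import fnmatch
import os
import tempfile


def is_excluded(relpath, excludes, includes):
    """True if relpath matches an exclude glob and no include glob.

    Directories are passed with a trailing '/', so 'small/' can name one.
    fnmatch's '*' also matches '/', so '*' matches every path.
    """
    if not any(fnmatch.fnmatchcase(relpath, pat) for pat in excludes):
        return False
    return not any(fnmatch.fnmatchcase(relpath, pat) for pat in includes)


def walk_with_pruning(root, excludes, includes):
    """Yield relative file paths, applying the patterns during os.walk()."""
    for dirpath, dirnames, filenames in os.walk(root):
        rel_dir = os.path.relpath(dirpath, root)
        prefix = "" if rel_dir == "." else rel_dir.replace(os.sep, "/") + "/"
        # Editing dirnames in place stops os.walk() from descending into the
        # removed directories; this is where the time savings come from.
        dirnames[:] = [d for d in dirnames
                       if not is_excluded(prefix + d + "/", excludes, includes)]
        for name in filenames:
            if not is_excluded(prefix + name, excludes, includes):
                yield prefix + name


def demo(includes):
    """Build a small throwaway tree and list it with --exclude='*'."""
    files = ("top.gpg", "small/somefile.gpg", "big/other.gpg", "big/notes.txt")
    with tempfile.TemporaryDirectory() as root:
        for rel in files:
            path = os.path.join(root, *rel.split("/"))
            os.makedirs(os.path.dirname(path), exist_ok=True)
            open(path, "w").close()
        return sorted(walk_with_pruning(root, ["*"], includes))


print(demo(["*.gpg"]))            # ['top.gpg']
print(demo(["small/", "*.gpg"]))  # ['small/somefile.gpg', 'top.gpg']
```

The first call reproduces the top-level-only result reported earlier in the thread. The second shows why the workaround needs one include per directory on the path to each file.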
From: John S. <Joh...@sy...> - 2012-08-01 05:05:07
Attachments:
signature.asc
|
On Tue, 2012-07-31 at 13:30 -0500, Matt Domsch wrote:
> I have found that by using:
> --exclude='*' --include='small/' --include='*.gpg'
>
> I can cause the files in small/somefile.gpg to be included even though
> the rest of the directories alongside small/ are excluded. How deep
> is your directory tree? Could you put these patterns in a file and
> pull them in with --include-from and --exclude-from ?
>
> The problem I was solving was avoiding the need to recurse through a
> ton of subdirectories (e.g. netapp .snapshot*) that were meaningless
> to my sync, yet the os.walk() would be forced to descend through them.
> The time for my sync was greatly reduced by implementing excludes at
> os.walk() time. The cost of such capability is needing to be more
> explicit about what's excluded and included knowing it applies first
> during os.walk(), not only afterwards to the set discovered by
> os.walk().

Matt,

One of the directories I use s3cmd to back up is both wide and deep: it contains all of the sound effects and microphone notes for the stage plays I have worked on since 1998. Adding another line to a file whenever I add a directory to it would be a pain.

As an alternative, how about using a different keyword (or keywords) to do the os.walk excludes? That would restore the meaning of the old keywords and still let you gain your performance improvements.

John Sauter (Joh...@sy...)