From: Michal L. <mi...@lo...> - 2013-02-19 01:58:10
Surprise surprise! After more than a year of neglecting s3cmd and letting it rot unmaintained, I thought I should spend some time bringing it back to life.

Here you go: s3cmd 1.5.0-alpha1 is now available:
http://sourceforge.net/projects/s3tools/files/s3cmd/1.5.0-alpha1/s3cmd-1.5.0-alpha1.tar.gz

Big thanks to Matt Domsch for integrating many pull requests over the year into his Fedora version of s3cmd. This s3cmd 1.5.0-alpha1 is very much based on his outstanding work.

And of course many thanks, and a big sorry, to all the contributors who submitted new features and bug fixes and were left without a response, wondering if their work was ever going to be merged back into the official s3cmd codebase. Sorry guys, this indeed wasn't a prime example of project maintainership :(

So what has changed since 1.1.0-beta3, released a year ago?

s3cmd 1.5.0-alpha1
* Server-side copy for hardlinks/softlinks to improve performance (Matt Domsch)
* New [signurl] command (Craig Ringer)
* Improved symlink-loop detection (Michal Ludvig)
* Add --delete-after option for sync (Matt Domsch)
* Handle empty return bodies when processing S3 errors (Kelly McLaughlin)
* Upload from STDIN (Eric Connell)
* Updated bucket locations (Stefhen Hovland)
* Support custom HTTP headers (Brendan O'Connor, Karl Matthias)
* Improved MIME support (Karsten Sperling, Christopher Noyes)
* Added support for --acl-grant/--acl-revoke to 'sync' command (Michael Tyson)
* CloudFront: support default index and default root invalidation (Josep del Rio)
* Command-line options for access/secret keys (Matt Sweeney)
* Support [setpolicy] for setting bucket policies (Joe Fiorini)
* Respect the $TZ environment variable (James Brown)
* Reduce memory consumption for [s3cmd du] (Charlie Schluting)
* Rate-limit progress updates (Steven Noonan)
* Download from S3 to a temp file first (Sumit Kumar)
* Reuse a single connection when doing a bucket list (Kelly McLaughlin)
* Delete empty files if object_get() failed (Oren Held)

That's a lot of changes all around the codebase. Although s3cmd passes the test suite and works in the common situations, I'm afraid we may have missed some corner cases. Please report any problems you experience when running s3cmd 1.5.0-alpha1, to help make the final release as stable as possible.

Download from here:
http://sourceforge.net/projects/s3tools/files/s3cmd/1.5.0-alpha1/s3cmd-1.5.0-alpha1.tar.gz

The most up-to-date source code is available on GitHub:
https://github.com/s3tools/s3cmd

Also check out our website for further updates:
http://s3tools.org/s3cmd

Now download, test and enjoy! :)

Michal & contributors
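To give a flavour of two of the new features, a minimal sketch; the bucket and object names below are made up, and the exact alpha-release syntax may differ:

    # Generate a time-limited signed URL; expiry is an absolute epoch
    # timestamp or a +seconds offset.
    $ s3cmd signurl s3://example-bucket/report.pdf +3600

    # Upload from STDIN: '-' as the source reads the object body from
    # standard input.
    $ tar czf - /var/www | s3cmd put - s3://example-bucket/www-backup.tar.gz
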
From: Matt D. <Mat...@de...> - 2013-02-19 18:31:27
On Mon, Feb 18, 2013 at 07:35:11PM -0600, Michal Ludvig wrote:
> So what has changed since 1.1.0-beta3, released a year ago?
>
> s3cmd 1.5.0-alpha1
> * Server-side copy for hardlinks/softlinks to improve performance (Matt Domsch)

Thank you Michal for cleaning up the mess I left behind in the
hardlink code. I'm glad you could see your way through it to make it
work as expected, rather than as written. :-)

--
Matt Domsch
Technology Strategist
Dell | Office of the CTO

From: Simone C. <si...@co...> - 2013-02-19 22:18:11
On Tue, Feb 19, 2013 at 6:56 PM, Matt Domsch <Mat...@de...> wrote:
> On Mon, Feb 18, 2013 at 07:35:11PM -0600, Michal Ludvig wrote:
>> So what has changed since 1.1.0-beta3, released a year ago?
>>
>> s3cmd 1.5.0-alpha1
>> * Server-side copy for hardlinks/softlinks to improve performance (Matt Domsch)
>
> Thank you Michal for cleaning up the mess I left behind in the
> hardlink code. I'm glad you could see your way through it to make it
> work as expected, rather than as written. :-)

+1

I really love this piece of software! Thanks for the update...

--
-S

From: John S. <Joh...@sy...> - 2013-02-19 18:53:02
In July-August of 2012 there was a discussion of the new meaning of the
--exclude qualifier. I was concerned that the new meaning meant that I
could no longer conveniently use s3cmd to back up the *.gpg files in a
wide and deep directory tree. I suggested that the new function of
--exclude should be moved to a new qualifier for compatibility.

The URL for the archive of the discussion is
<http://sourceforge.net/mailarchive/message.php?msg_id=29617785>.

Has my concern been addressed?

John Sauter (Joh...@sy...)

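The invocation at issue looks roughly like the sketch below; the paths and bucket name are illustrative, not taken from the earlier discussion:

    # Old semantics: walk the whole tree, exclude everything, then
    # re-include *.gpg -- in effect, "upload only the *.gpg files".
    # New rsync-style semantics: --exclude='*' matches every entry
    # during the walk, so nothing is transferred.
    $ s3cmd sync --exclude='*' --include='*.gpg' /home/backup/ s3://example-bucket/backup/
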
From: Matt D. <Mat...@de...> - 2013-02-19 19:33:57
On Tue, Feb 19, 2013 at 12:27:22PM -0600, John Sauter wrote:
> In July-August of 2012 there was a discussion of the new meaning of the
> --exclude qualifier. I was concerned that the new meaning meant that I
> could no longer conveniently use s3cmd to back up the *.gpg files in a
> wide and deep directory tree. I suggested that the new function of
> --exclude should be moved to a new qualifier for compatibility.
>
> The URL for the archive of the discussion is
> <http://sourceforge.net/mailarchive/message.php?msg_id=29617785>.
>
> Has my concern been addressed?
> John Sauter (Joh...@sy...)

It has not.

The equivalent rsync command:

$ rsync -a --exclude='*' --include='*.gpg' src dst

copies nothing, as 'src' matches '*'. So the new behavior of s3cmd
sync matches that of rsync, which I believe is what most users would
expect.

I believe what you are looking for is the rsync equivalent of:

$ pushd src
$ find . -name \*.gpg > ../files-from.txt
$ popd
$ rsync --files-from=files-from.txt src dst

which copies only the files specified in files-from.txt, which was
created prior to the rsync. Correct?

If so, I think this could be added into
S3/FileLists.py:_get_filelist_local() in a fairly straightforward
manner. But it would still require you to change your script. Before
heading down this path, I want to be sure my understanding is
correct. Please advise.

Thanks,
Matt
--
Matt Domsch
Technology Strategist
Dell | Office of the CTO

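As an aside, rsync filter rules are order-sensitive (the first matching rule wins), so the classic rsync recipe for "only *.gpg" puts the includes first and keeps directories traversable; this is illustrative only, not part of the proposal above:

    # Include directories so the walk can recurse, include *.gpg,
    # then exclude everything else; first matching rule wins.
    $ rsync -a --include='*/' --include='*.gpg' --exclude='*' src/ dst/
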
From: Matt D. <Mat...@de...> - 2013-02-19 22:16:59
On Tue, Feb 19, 2013 at 01:33:35PM -0600, Domsch, Matt wrote:
> On Tue, Feb 19, 2013 at 12:27:22PM -0600, John Sauter wrote:
> > In July-August of 2012 there was a discussion of the new meaning of the
> > --exclude qualifier. I was concerned that the new meaning meant that I
> > could no longer conveniently use s3cmd to back up the *.gpg files in a
> > wide and deep directory tree. I suggested that the new function of
> > --exclude should be moved to a new qualifier for compatibility.
> >
> > The URL for the archive of the discussion is
> > <http://sourceforge.net/mailarchive/message.php?msg_id=29617785>.
> >
> > Has my concern been addressed?
> > John Sauter (Joh...@sy...)
>
> It has not.
>
> The equivalent rsync command:
>
> $ rsync -a --exclude='*' --include='*.gpg' src dst
>
> copies nothing, as 'src' matches '*'. So the new behavior of s3cmd
> sync matches that of rsync, which I believe is what most users would
> expect.
>
> I believe what you are looking for is the rsync equivalent of:
>
> $ pushd src
> $ find . -name \*.gpg > ../files-from.txt
> $ popd
> $ rsync --files-from=files-from.txt src dst
>
> which copies only the files specified in files-from.txt, which was
> created prior to the rsync. Correct?
>
> If so, I think this could be added into
> S3/FileLists.py:_get_filelist_local() in a fairly straightforward
> manner. But it would still require you to change your script. Before
> heading down this path, I want to be sure my understanding is
> correct. Please advise.

Something like the patch below, found at:
https://github.com/mdomsch/s3cmd/tree/files-from/

--
Matt Domsch
Technology Strategist
Dell | Office of the CTO

From 3ce5e98914497274defe459a57ea617b9368db65 Mon Sep 17 00:00:00 2001
From: Matt Domsch <Mat...@de...>
Date: Tue, 19 Feb 2013 16:08:15 -0600
Subject: [PATCH] add --files-from=FILE to allow transfer of select files only

This solves the change of behavior introduced by processing
excludes/includes during os.walk(), where previously:

    s3cmd sync --exclude='*' --include='*.gpg'

would walk the whole tree and transfer only the files named *.gpg.
Since the change to os.walk(), the exclude '*' matches everything, and
nothing is transferred.

This patch introduces --files-from=FILE to match rsync behaviour,
where the list of files to transfer (local to remote) is taken not
from an os.walk(), but from the explicit list in FILE.

The equivalent for remote to local, and remote to remote, is not yet
implemented.
---
 S3/Config.py    |  1 +
 S3/FileLists.py | 41 +++++++++++++++++++++++++++++++++++++----
 s3cmd           |  3 +++
 3 files changed, 41 insertions(+), 4 deletions(-)

diff --git a/S3/Config.py b/S3/Config.py
index c8770ca..aac6b09 100644
--- a/S3/Config.py
+++ b/S3/Config.py
@@ -92,6 +92,7 @@ class Config(object):
     website_error = ""
     website_endpoint = "http://%(bucket)s.s3-website-%(location)s.amazonaws.com/"
     additional_destinations = []
+    files_from = []
     cache_file = ""
     add_headers = ""

diff --git a/S3/FileLists.py b/S3/FileLists.py
index 2bf7ed9..fae9004 100644
--- a/S3/FileLists.py
+++ b/S3/FileLists.py
@@ -140,6 +140,35 @@ def handle_exclude_include_walk(root, dirs, files):
         else:
             debug(u"PASS: %r" % (file))

+
+def _get_filelist_from_file(cfg, local_path):
+    def _append(d, key, value):
+        if key not in d:
+            d[key] = [value]
+        else:
+            d[key].append(value)
+
+    filelist = {}
+    for fname in cfg.files_from:
+        f = open(fname, 'r')
+        for line in f:
+            line = line.strip()
+            line = os.path.normpath(os.path.join(local_path, line))
+            dirname = os.path.dirname(line)
+            basename = os.path.basename(line)
+            _append(filelist, dirname, basename)
+        f.close()
+
+    # reformat to match os.walk()
+    result = []
+    keys = filelist.keys()
+    keys.sort()
+    for key in keys:
+        values = filelist[key]
+        values.sort()
+        result.append((key, [], values))
+    return result
+
 def fetch_local_list(args, recursive = None):
     def _get_filelist_local(loc_list, local_uri, cache):
         info(u"Compiling list of local files...")
@@ -156,11 +185,15 @@ def fetch_local_list(args, recursive = None):
         if local_uri.isdir():
             local_base = deunicodise(local_uri.basename())
             local_path = deunicodise(local_uri.path())
-            if cfg.follow_symlinks:
-                filelist = _fswalk_follow_symlinks(local_path)
+            if len(cfg.files_from):
+                filelist = _get_filelist_from_file(cfg, local_path)
+                single_file = False
             else:
-                filelist = _fswalk_no_symlinks(local_path)
-            single_file = False
+                if cfg.follow_symlinks:
+                    filelist = _fswalk_follow_symlinks(local_path)
+                else:
+                    filelist = _fswalk_no_symlinks(local_path)
+                single_file = False
         else:
             local_base = ""
             local_path = deunicodise(local_uri.dirname())
diff --git a/s3cmd b/s3cmd
index 1aa31ae..c1a1a28 100755
--- a/s3cmd
+++ b/s3cmd
@@ -1738,6 +1738,7 @@ def main():
     optparser.add_option( "--rinclude", dest="rinclude", action="append", metavar="REGEXP", help="Same as --include but uses REGEXP (regular expression) instead of GLOB")
     optparser.add_option( "--rinclude-from", dest="rinclude_from", action="append", metavar="FILE", help="Read --rinclude REGEXPs from FILE")

+    optparser.add_option( "--files-from", dest="files_from", action="append", metavar="FILE", help="Read list of source-file names from FILE")
     optparser.add_option( "--bucket-location", dest="bucket_location", help="Datacentre to create bucket in. As of now the datacenters are: US (default), EU, ap-northeast-1, ap-southeast-1, sa-east-1, us-west-1 and us-west-2")
     optparser.add_option( "--reduced-redundancy", "--rr", dest="reduced_redundancy", action="store_true", help="Store object with 'Reduced redundancy'. Lower per-GB price. [put, cp, mv]")

@@ -1910,6 +1911,8 @@ def main():

     if options.additional_destinations:
         cfg.additional_destinations = options.additional_destinations
+    if options.files_from:
+        cfg.files_from = options.files_from

     ## Set output and filesystem encoding for printing out filenames.
     sys.stdout = codecs.getwriter(cfg.encoding)(sys.stdout, "replace")
--
1.8.1.2

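With this patch applied, a workflow along the following lines should reproduce the *.gpg backup described earlier; the bucket and paths are hypothetical:

    # Build the file list first, relative to the source directory...
    $ cd /home/backup && find . -name '*.gpg' > /tmp/files-from.txt

    # ...then hand it to s3cmd sync; only the listed files are
    # considered for transfer.
    $ s3cmd sync --files-from=/tmp/files-from.txt /home/backup/ s3://example-bucket/backup/
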
From: John S. <Joh...@sy...> - 2013-02-20 01:07:47
On Tue, 2013-02-19 at 16:16 -0600, Matt Domsch wrote:
> On Tue, Feb 19, 2013 at 01:33:35PM -0600, Domsch, Matt wrote:
> > I believe what you are looking for is the rsync equivalent of:
> >
> > $ pushd src
> > $ find . -name \*.gpg > ../files-from.txt
> > $ popd
> > $ rsync --files-from=files-from.txt src dst
> >
> > which copies only the files specified in files-from.txt, which was
> > created prior to the rsync. Correct?
> >
> > If so, I think this could be added into
> > S3/FileLists.py:_get_filelist_local() in a fairly straightforward
> > manner. But it would still require you to change your script.
>
> Something like the patch below, found at:
> https://github.com/mdomsch/s3cmd/tree/files-from/
>
> [patch snipped -- quoted in full in the previous message]

This sounds like it will solve my problem. Please let me know when you
have a version of s3cmd with --files-from implemented for sync, and I'll
give it a try.

John Sauter (Joh...@sy...)

From: Michael L. <ml...@lo...> - 2013-02-20 06:17:55
On 20/02/13 14:07, John Sauter wrote:
> On Tue, 2013-02-19 at 16:16 -0600, Matt Domsch wrote:
>> On Tue, Feb 19, 2013 at 01:33:35PM -0600, Domsch, Matt wrote:
>>> I believe what you are looking for is the rsync equivalent of:
>>>
>>> $ pushd src
>>> $ find . -name \*.gpg > ../files-from.txt
>>> $ popd
>>> $ rsync --files-from=files-from.txt src dst
>>>
>>> which copies only the files specified in files-from.txt, which was
>>> created prior to the rsync. Correct?
>>>
>>> Something like the patch below, found at:
>>> https://github.com/mdomsch/s3cmd/tree/files-from/
>>>
> This sounds like it will solve my problem. Please let me know when you
> have a version of s3cmd with --files-from implemented for sync, and I'll
> give it a try.

Hi Matt

Can we have --files-from enabled for stdin? I think it would be a common
scenario:

    find . -name \*.gpg | s3cmd --files-from=- put s3://...

Thanks!

Michal

From: Matt D. <Mat...@de...> - 2013-02-20 21:10:59
On Tue, Feb 19, 2013 at 11:49:23PM -0600, Michael Ludvig wrote:
> Can we have --files-from enabled for stdin? I think it would be a common
> scenario:
>
> find . -name \*.gpg | s3cmd --files-from=- put s3://...

Done. I put this in my files-from branch, which is represented in pull
request #116. Patch below.

Note, this works with sync local2remote, and put. When used with sync
remote2local, at present it causes the local (destination) side to think
the only files present are those specified in files-from, which is not
the behavior we want. We want it to operate on the source side, not the
destination side. However, where we get the file lists (local or remote),
we don't know if it's source or destination. Since we want to know that,
we should pass src or dest information into fetch_{remote,local}_list()
as an argument, and then make the corresponding change to add processing
to fetch_remote_list(). Unless there's a better way I'm not seeing...

Also note, this is scary to do with any --delete options enabled, as the
source list will be exactly the list passed.

Thanks,
Matt
--
Matt Domsch
Technology Strategist
Dell | Office of the CTO

From b76c5b38457caf93e1c3c16fa5dcbbb1dd9b1709 Mon Sep 17 00:00:00 2001
From: Matt Domsch <Mat...@de...>
Date: Wed, 20 Feb 2013 14:11:37 -0600
Subject: [PATCH] accept --files-from=- to read from stdin

This allows shell syntax:

    find . -name \*.gpg | s3cmd sync --files-from=- src dst

to take the list of files to transfer from stdin. Be careful, as using
this with a --delete option will cause files on the remote side not
listed on stdin to be deleted too.
---
 S3/FileLists.py | 14 ++++++++++++--
 s3cmd           |  2 +-
 2 files changed, 13 insertions(+), 3 deletions(-)

diff --git a/S3/FileLists.py b/S3/FileLists.py
index fae9004..4b84b08 100644
--- a/S3/FileLists.py
+++ b/S3/FileLists.py
@@ -14,6 +14,7 @@ from HashCache import HashCache
 from logging import debug, info, warning, error

 import os
+import sys
 import glob
 import copy

@@ -150,14 +151,23 @@ def _get_filelist_from_file(cfg, local_path):

     filelist = {}
     for fname in cfg.files_from:
-        f = open(fname, 'r')
+        if fname == u'-':
+            f = sys.stdin
+        else:
+            try:
+                f = open(fname, 'r')
+            except IOError, e:
+                warning(u"--files-from input file %s could not be opened for reading (%s), skipping." % (fname, e.strerror))
+                continue
+
         for line in f:
             line = line.strip()
             line = os.path.normpath(os.path.join(local_path, line))
             dirname = os.path.dirname(line)
             basename = os.path.basename(line)
             _append(filelist, dirname, basename)
-        f.close()
+        if f != sys.stdin:
+            f.close()

     # reformat to match os.walk()
     result = []
diff --git a/s3cmd b/s3cmd
index c1a1a28..0804db8 100755
--- a/s3cmd
+++ b/s3cmd
@@ -1738,7 +1738,7 @@ def main():
     optparser.add_option( "--rinclude", dest="rinclude", action="append", metavar="REGEXP", help="Same as --include but uses REGEXP (regular expression) instead of GLOB")
     optparser.add_option( "--rinclude-from", dest="rinclude_from", action="append", metavar="FILE", help="Read --rinclude REGEXPs from FILE")

-    optparser.add_option( "--files-from", dest="files_from", action="append", metavar="FILE", help="Read list of source-file names from FILE")
+    optparser.add_option( "--files-from", dest="files_from", action="append", metavar="FILE", help="Read list of source-file names from FILE. Use - to read from stdin.")
     optparser.add_option( "--bucket-location", dest="bucket_location", help="Datacentre to create bucket in. As of now the datacenters are: US (default), EU, ap-northeast-1, ap-southeast-1, sa-east-1, us-west-1 and us-west-2")
     optparser.add_option( "--reduced-redundancy", "--rr", dest="reduced_redundancy", action="store_true", help="Store object with 'Reduced redundancy'. Lower per-GB price. [put, cp, mv]")
--
1.8.1.2
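To make the --delete caveat concrete, a sketch with illustrative bucket and path names:

    # Upload only the *.gpg files named on stdin.
    $ find . -name '*.gpg' | s3cmd sync --files-from=- ./ s3://example-bucket/backup/

    # Dangerous: with --delete-removed, any remote object NOT named on
    # stdin looks "removed" and would be deleted from the bucket.
    $ find . -name '*.gpg' | s3cmd sync --files-from=- --delete-removed ./ s3://example-bucket/backup/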