Yes, --exclude will remove the whole directory (and its child files and subdirectories) from the run. At least, it does so for the local os.walk(); it can't prune that way when fetching the object list from S3.
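The local-side pruning boils down to the standard os.walk() idiom; here's a minimal sketch of the idea (not s3cmd's actual code, and the function name is made up for illustration):

```python
import os

def walk_with_exclude(root, excluded_names):
    """Yield file paths under root, skipping excluded directories.

    Assigning to dirnames[:] in place stops os.walk() from ever
    descending into those directories, so the excluded subtree
    never enters the file list at all.
    """
    for dirpath, dirnames, filenames in os.walk(root):
        dirnames[:] = [d for d in dirnames if d not in excluded_names]
        for name in filenames:
            yield os.path.join(dirpath, name)
```

That's why an excluded directory costs essentially nothing during local discovery: it is never walked, not walked-then-filtered.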


On Mon, Mar 10, 2014 at 6:07 PM, WagnerOne <wagner@wagnerone.com> wrote:
I've identified the subdir in my content with the huge file count that I'll need to transfer systematically.

Will --exclude allow me to sync everything but said directory, so I can then work within that subdir separately, or will I hit the same memory-related problems?

In other words, if I --exclude something, is it excluded entirely during the source discovery stage?

Thanks,
Mike


On Mar 6, 2014, at 7:55 PM, Matt Domsch <matt@domsch.com> wrote:

Thanks for the kudos.  Unfortunately, memory consumption is based on the number of objects in the trees being synchronized.  On a 32-bit system, it tends to hit a python MemoryError syncing trees that are ~1M files in size.  You are hitting a kernel OOM well before that though.  You have several options available:
1) run on a 64-bit VM with 8+GB RAM (64-bit python is a huge memory hog compared to 32-bit python; you need roughly 2x the RAM on 64-bit python to hold the same number of objects as on 32-bit python).
2) split your sync into multiple subtrees (as you have surmised)


There are no significant efforts under way to figure out a better way to handle this in s3cmd itself, given how python operates.  One option would be to add in a sqlite on-disk or in-memory database for transient use in storing and comparing the local and remote file lists, but that's a fairly heavy undertaking and not one anyone has chosen to develop.
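For what it's worth, the sqlite idea could look something like the sketch below. The schema and function names here are hypothetical, not anything that exists in s3cmd; the point is that the local/remote comparison happens inside sqlite rather than in Python dicts, so memory use stays flat regardless of tree size when an on-disk path is used:

```python
import sqlite3

def build_db(path=":memory:"):
    """Create the (hypothetical) transient database. Passing a real
    file path keeps the file lists on disk instead of in RAM."""
    db = sqlite3.connect(path)
    db.execute("CREATE TABLE local  (key TEXT PRIMARY KEY, size INTEGER, md5 TEXT)")
    db.execute("CREATE TABLE remote (key TEXT PRIMARY KEY, size INTEGER, md5 TEXT)")
    return db

def files_to_upload(db):
    """Keys present locally but missing or changed remotely, found
    with a LEFT JOIN instead of an in-memory comparison."""
    rows = db.execute(
        "SELECT l.key FROM local l LEFT JOIN remote r ON l.key = r.key "
        "WHERE r.key IS NULL OR l.md5 != r.md5 ORDER BY l.key")
    return [key for (key,) in rows]
```

Wiring this into s3cmd's existing sync logic is the heavy part; the database itself is simple.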

Thanks,
Matt




On Thu, Mar 6, 2014 at 4:18 PM, WagnerOne <wagner@wagnerone.com> wrote:
Hi,

I was recently charged with moving a lot of data (TBs) into s3 and discovered the great tool that is s3cmd. It's working well and I like the familiar rsync-like interactions.

I'm attempting to use s3cmd to copy a directory with tons of small files amounting to about 700GB to s3.

During my tests with ~1GB transfers, things went well. When I got to this larger test set, s3cmd worked for upwards of 40 minutes (gathering md5 data, I assume) on the local data before the kernel killed the process due to excessive RAM consumption.

I was using an EC2 t1.micro with a NAS NFS-mounted to it, to transfer data from said NAS to s3. The t1.micro had only 500MB of RAM, so I bumped it to an m3.medium, which has 4GB of RAM.

When I reattempted the failed copy on the m3.medium, s3cmd ran about 3x longer before being terminated as above.

I was hoping for a painless, big single sync job, but it's looking like I might have to write a wrapper to iterate over the big directories I need to copy to get them to a more manageable size for s3cmd. 
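A wrapper along those lines can be quite small. This is just a sketch of the approach, not working s3cmd integration; the bucket URL and per-subdir destination naming are assumptions, and top-level plain files in the root would still need a separate pass:

```python
import os
import subprocess

def sync_in_chunks(local_root, s3_root, run=subprocess.check_call):
    """Run one `s3cmd sync` per top-level subdirectory of local_root,
    so each invocation only builds a file list for that subtree."""
    for entry in sorted(os.listdir(local_root)):
        path = os.path.join(local_root, entry)
        if not os.path.isdir(path):
            continue  # plain files at the top level need a separate pass
        run(["s3cmd", "sync", path + "/", "%s/%s/" % (s3_root, entry)])
```

Invoked as, e.g., `sync_in_chunks("/mnt/nas/content", "s3://somewhere/content")`, each subtree gets its own bounded file-list build.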

I'm guessing I've hit a limitation of the implementation as it stands currently, but wondered if anyone has suggestions in terms of s3cmd itself.

Thanks and thanks for a great tool!

Mike

# s3cmd --version
s3cmd version 1.5.0-beta1

# time s3cmd sync --verbose --progress content s3://somewhere
INFO: Compiling list of local files... Killed

real    214m53.181s
user    8m34.448s
sys     4m5.803s

# tail /var/log/messages
xxxx Out of memory: Kill process 1680 (s3cmd) score 948 or sacrifice child
xxxx Killed process 1680 (s3cmd) total-vm:3942604kB, anon-rss:3755584kB, file-rss:0kB


-- 
wagner@wagnerone.com
"Linux supports the notion of a command line for the same reason that only children read books with only pictures in them."



_______________________________________________
S3tools-general mailing list
S3tools-general@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/s3tools-general



-- 
wagner@wagnerone.com
"Always consider the possibility your assumptions are wrong."-Wheel of Time



