Thanks for the responses, folks. I appreciate your feedback!

Mike

On Mar 6, 2014, at 7:55 PM, Matt Domsch <matt@domsch.com> wrote:

Thanks for the kudos.  Unfortunately, memory consumption scales with the number of objects in the trees being synchronized.  On a 32-bit system, s3cmd tends to hit a Python MemoryError syncing trees of ~1M files; you are hitting a kernel OOM well before that, though.  You have several options available:
1) run on a 64-bit VM with 8+GB RAM (64-bit Python is a huge memory hog compared to 32-bit Python; you need roughly 2x the RAM on 64-bit Python to hold the same number of objects as on 32-bit Python).
2) split your sync into multiple subtrees, as you have surmised (a rough sketch follows).
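
To illustrate option 2, a wrapper along these lines (untested; "content" and "s3://somewhere" are just the paths from your transcript) would sync each top-level subdirectory as its own job, so each run only holds one subtree's file list in memory:

import os
import subprocess

SRC = "content"            # local root from your transcript
DEST = "s3://somewhere"    # destination bucket/prefix

# One s3cmd run per top-level subdirectory keeps each in-memory
# file list limited to a single subtree.
for name in sorted(os.listdir(SRC)):
    path = os.path.join(SRC, name)
    if os.path.isdir(path):
        subprocess.check_call(
            ["s3cmd", "sync", path + "/", "%s/%s/" % (DEST, name)])

Loose files sitting directly under the top level would still need one extra pass, and every run pays its own remote-listing cost, but no single run should blow out RAM.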


There are no significant efforts under way to handle this better in s3cmd itself, given how Python manages memory.  One option would be to add a sqlite database, on-disk or in-memory, for transient use in storing and comparing the local and remote file lists (a sketch of the idea follows), but that's a fairly heavy undertaking and not one anyone has chosen to develop.
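
Purely to sketch what that could look like (this is not s3cmd code; the table layout and names are invented), the transient database would hold both listings and let SQL do the comparison, so neither list has to live in a Python dict all at once:

import sqlite3

# Hypothetical layout: one row per file, tagged 'local' or 'remote'.
db = sqlite3.connect("filelists.db")  # on-disk; ":memory:" trades RAM for speed
db.execute("CREATE TABLE IF NOT EXISTS files "
           "(side TEXT, path TEXT, md5 TEXT, size INTEGER)")

def add(side, path, md5, size):
    db.execute("INSERT INTO files VALUES (?, ?, ?, ?)",
               (side, path, md5, size))

# ... populate with add('local', ...) / add('remote', ...) while walking ...

# Files present locally but missing or changed remotely; iterating the
# cursor streams results instead of materializing the whole set.
query = """
    SELECT l.path FROM files AS l
    LEFT JOIN files AS r ON r.side = 'remote' AND r.path = l.path
    WHERE l.side = 'local' AND (r.path IS NULL OR r.md5 != l.md5)
"""
for (path,) in db.execute(query):
    print(path)  # candidate for upload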

Thanks,
Matt




On Thu, Mar 6, 2014 at 4:18 PM, WagnerOne <wagner@wagnerone.com> wrote:
Hi,

I was recently charged with moving a lot of data (TBs) into s3 and discovered the great tool that is s3cmd. It's working well and I like the familiar rsync-like interactions.

I'm attempting to use s3cmd to copy a directory with tons of small files amounting to about 700GB to s3.

During my tests with ~1GB transfers, things went well. When I got to this larger test set, s3cmd worked for upwards of 40 minutes (gathering md5 data I assume) on the local data before the kernel killed the process due to excessive RAM consumption. 

I was using an EC2 t1.micro with a NAS NFS-mounted to it to transfer data from said NAS to s3. The t1.micro had only 500MB of RAM, so I bumped it to an m3.medium, which has 4 GB of RAM.

When I retried the copy on the m3.medium, s3cmd ran about 3x longer before being killed as above.

I was hoping for a single, painless sync job, but it's looking like I might have to write a wrapper that iterates over the big directories I need to copy, breaking the work into more manageable chunks for s3cmd.

I'm guessing I've hit a limitation of the implementation as it currently stands, but wondered if anyone has suggestions for s3cmd itself.

Thanks and thanks for a great tool!

Mike

# s3cmd --version
s3cmd version 1.5.0-beta1

# time s3cmd sync --verbose --progress content s3://somewhere
INFO: Compiling list of local files... Killed

real    214m53.181s
user    8m34.448s
sys     4m5.803s

# tail /var/log/messages
xxxx Out of memory: Kill process 1680 (s3cmd) score 948 or sacrifice child
xxxx Killed process 1680 (s3cmd) total-vm:3942604kB, anon-rss:3755584kB, file-rss:0kB


-- 
wagner@wagnerone.com
"Linux supports the notion of a command line for the same reason that only children read books with only pictures in them."




-- 
wagner@wagnerone.com
"An inglorious peace is better than a dishonorable war."- Mark Twain