I was recently charged with moving a lot of data (TBs) into s3 and discovered the great tool that is s3cmd. It's working well and I like the familiar rsync-like interactions.

I'm attempting to use s3cmd to copy a directory with tons of small files amounting to about 700GB to s3.

During my tests with ~1GB transfers, things went well. When I got to this larger test set, s3cmd worked for upwards of 40 minutes (gathering MD5 data, I assume) on the local data before the kernel killed the process for excessive RAM consumption.

I was using an EC2 t1.micro with a NAS NFS-mounted to it, copying data from that NAS to s3. The t1.micro had only 500MB of RAM, so I bumped it to an m3.medium, which has 4GB.

When I retried the failed copy on the m3.medium, s3cmd ran about 3x longer before being killed as above.

I was hoping for one painless, big sync job, but it's looking like I'll have to write a wrapper that iterates over the big directories I need to copy, breaking the work into chunks of a size s3cmd can manage.
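For what it's worth, here's a minimal sketch of the kind of wrapper I have in mind: one s3cmd invocation per top-level subdirectory, so s3cmd only ever has to build a file list for one subtree at a time. The `content` path and `s3://somewhere` bucket are just the placeholders from my transcript below; this version only echoes the commands so you can sanity-check them (drop the `echo` to run for real).

```shell
# Sync each top-level subdirectory of $1 to $2 as a separate s3cmd job,
# keeping the in-memory file list small. Echoes the commands it would run.
sync_by_subdir() {
    src=$1
    dst=$2
    for dir in "$src"/*/; do
        [ -d "$dir" ] || continue          # no subdirectories: nothing to do
        name=$(basename "$dir")
        echo s3cmd sync --verbose "$dir" "$dst/$name/"
    done
}
```

Usage would be something like `sync_by_subdir content s3://somewhere`. Obviously this doesn't help if one single subdirectory is itself too big, in which case the loop would need to recurse another level down.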

I'm guessing I've hit a limitation of the current implementation, but wondered whether anyone has suggestions in terms of s3cmd itself.

Thanks and thanks for a great tool!


# s3cmd --version
s3cmd version 1.5.0-beta1

# time s3cmd sync --verbose --progress content s3://somewhere
INFO: Compiling list of local files... Killed

real    214m53.181s
user    8m34.448s
sys     4m5.803s

# tail /var/log/messages
xxxx Out of memory: Kill process 1680 (s3cmd) score 948 or sacrifice child
xxxx Killed process 1680 (s3cmd) total-vm:3942604kB, anon-rss:3755584kB, file-rss:0kB

"Linux supports the notion of a command line for the same reason that only children read books with only pictures in them."