I was recently charged with moving a lot of data (TBs) into s3 and discovered the great tool that is s3cmd. It's working well and I like the familiar rsync-like interactions.
I'm attempting to use s3cmd to copy a directory with tons of small files amounting to about 700GB to s3.
During my tests with ~1GB transfers, things went well. When I moved to this larger test set, s3cmd worked on the local data for upwards of 40 minutes (gathering MD5 data, I assume) before the kernel killed the process due to excessive RAM consumption.
I was using an EC2 t1.micro with a NAS mounted over NFS, transferring data from that NAS to S3. The t1.micro had only 500MB of RAM, so I bumped it to an m3.medium, which has 4GB of RAM.
When I attempted this failed copy with the m3.medium, s3cmd ran about 3x longer before being terminated as above.
I was hoping for a painless, single big sync job, but it's looking like I might have to write a wrapper that iterates over the big directories I need to copy, to get them down to a more manageable size for s3cmd.
I'm guessing I've hit a limitation of the implementation as it stands currently, but wondered if anyone has suggestions in terms of s3cmd itself.
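For what it's worth, the wrapper I have in mind would be something like the sketch below: sync each top-level subdirectory as its own s3cmd job, so the local file list stays small. Paths and the bucket name here are placeholders, not my real setup.

```shell
#!/bin/sh
# Sketch of a per-subdirectory wrapper around s3cmd sync.
# SRC and BUCKET are hypothetical placeholders.
SRC="${SRC:-/mnt/nas/data}"
BUCKET="${BUCKET:-s3://my-bucket/data}"

for dir in "$SRC"/*/; do
    [ -d "$dir" ] || continue            # skip if SRC has no subdirectories
    name=$(basename "$dir")
    echo "Syncing $name ..."
    # one smaller sync job per subtree instead of one giant one
    s3cmd sync "$dir" "$BUCKET/$name/" || echo "sync failed for $name" >&2
done
```

This obviously loses the single-pass view of the whole tree, but it keeps each s3cmd invocation's file list (and memory footprint) bounded by the size of one subdirectory.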
Thanks and thanks for a great tool!
# s3cmd --version
s3cmd version 1.5.0-beta1
INFO: Compiling list of local files... Killed
# tail /var/log/messages
xxxx Out of memory: Kill process 1680 (s3cmd) score 948 or sacrifice child
xxxx Killed process 1680 (s3cmd) total-vm:3942604kB, anon-rss:3755584kB, file-rss:0kB
"Linux supports the notion of a command line for the same reason that only children read books with only pictures in them."