From: Matt D. <ma...@do...> - 2014-03-07 01:55:16

Thanks for the kudos.

Unfortunately, memory consumption is based on the number of objects in the
trees being synchronized. On a 32-bit system, it tends to hit a Python
MemoryError syncing trees that are ~1M files in size. You are hitting a
kernel OOM well before that, though.

You have several options available:

1) Run on a 64-bit VM with 8+GB of RAM (64-bit Python is a huge memory hog
   compared to 32-bit Python; you need roughly 2x the RAM on 64-bit Python
   to hold the same number of objects as on 32-bit Python).

2) Split your sync into multiple subtrees (as you have surmised).

There are no significant efforts under way to find a better way to handle
this in s3cmd itself, given how Python operates. One option would be to add
a SQLite on-disk or in-memory database for transient use in storing and
comparing the local and remote file lists, but that's a fairly heavy
undertaking and not one anyone has chosen to develop.

Thanks,

Matt

On Thu, Mar 6, 2014 at 4:18 PM, WagnerOne <wa...@wa...> wrote:
> Hi,
>
> I was recently charged with moving a lot of data (TBs) into S3 and
> discovered the great tool that is s3cmd. It's working well and I like
> the familiar rsync-like interactions.
>
> I'm attempting to use s3cmd to copy a directory with tons of small files
> amounting to about 700GB to S3.
>
> During my tests with ~1GB transfers, things went well. When I got to
> this larger test set, s3cmd worked for upwards of 40 minutes (gathering
> md5 data, I assume) on the local data before the kernel killed the
> process due to excessive RAM consumption.
>
> I was using an EC2 t1.micro with a NAS NFS-mounted to it to transfer
> data from said NAS to S3. The t1.micro had only 500MB of RAM, so I
> bumped it to an m3.medium, which has 4GB of RAM.
>
> When I retried the failed copy on the m3.medium, s3cmd ran about 3x
> longer before being terminated as above.
>
> I was hoping for a painless, single big sync job, but it's looking like
> I might have to write a wrapper to iterate over the big directories I
> need to copy, to get them down to a more manageable size for s3cmd.
>
> I'm guessing I've hit a limitation of the implementation as it stands
> currently, but wondered if anyone has suggestions in terms of s3cmd
> itself.
>
> Thanks, and thanks for a great tool!
>
> Mike
>
> # s3cmd --version
> s3cmd version 1.5.0-beta1
>
> # time s3cmd sync --verbose --progress content s3://somewhere
> INFO: Compiling list of local files... Killed
>
> real    214m53.181s
> user    8m34.448s
> sys     4m5.803s
>
> # tail /var/log/messages
> xxxx Out of memory: Kill process 1680 (s3cmd) score 948 or sacrifice child
> xxxx Killed process 1680 (s3cmd) total-vm:3942604kB, anon-rss:3755584kB,
> file-rss:0kB
>
> --
> wa...@wa...
> "Linux supports the notion of a command line for the same reason that
> only children read books with only pictures in them."
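
A minimal sketch of the "split the sync into multiple subtrees" workaround
discussed above might look like the following. The NFS mount point, the
bucket prefix, and the one-sync-per-top-level-subdirectory policy are
illustrative assumptions, not anything s3cmd provides on its own:

  #!/usr/bin/env python
  # Run one "s3cmd sync" per top-level subdirectory so each invocation only
  # has to build a file list for a fraction of the tree.
  import os
  import subprocess
  import sys

  LOCAL_ROOT = "/mnt/nas/content"         # placeholder: NFS mount of the NAS
  BUCKET_URL = "s3://somewhere/content/"  # placeholder: destination prefix

  for entry in sorted(os.listdir(LOCAL_ROOT)):
      path = os.path.join(LOCAL_ROOT, entry)
      if not os.path.isdir(path):
          continue  # loose files at the top level would need a separate pass
      cmd = ["s3cmd", "sync", "--verbose", path + "/", BUCKET_URL + entry + "/"]
      print("Running: " + " ".join(cmd))
      rc = subprocess.call(cmd)
      if rc != 0:
          sys.exit("s3cmd exited with status %d on %s" % (rc, path))

Each run then only has to hold the file list for one subtree in memory, at
the cost of some extra per-invocation overhead.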
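
The SQLite idea Matt mentions, keeping the transient local and remote file
lists in an on-disk (or in-memory) database and comparing them with SQL
rather than holding every object in Python data structures, could be
sketched roughly as below. The table layout and helper names are assumptions
for illustration; this is not how s3cmd is actually implemented:

  import sqlite3

  # Transient database holding the two file lists; ":memory:" would also work.
  conn = sqlite3.connect("filelists.db")
  conn.execute(
      "CREATE TABLE IF NOT EXISTS local (path TEXT PRIMARY KEY, size INTEGER, md5 TEXT)")
  conn.execute(
      "CREATE TABLE IF NOT EXISTS remote (path TEXT PRIMARY KEY, size INTEGER, md5 TEXT)")

  def add_local(path, size, md5):
      conn.execute("INSERT OR REPLACE INTO local VALUES (?, ?, ?)", (path, size, md5))

  def add_remote(path, size, md5):
      conn.execute("INSERT OR REPLACE INTO remote VALUES (?, ?, ?)", (path, size, md5))

  def files_to_upload():
      # Local paths that are missing remotely or differ in size/checksum.
      return conn.execute(
          "SELECT l.path FROM local AS l LEFT JOIN remote AS r ON l.path = r.path"
          " WHERE r.path IS NULL OR l.size != r.size OR l.md5 != r.md5").fetchall()

The comparison work would then happen inside SQLite instead of in Python
dictionaries, which is the "fairly heavy undertaking" Matt refers to.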