From: Matt D. <ma...@do...> - 2014-03-07 01:55:16

Thanks for the kudos.

Unfortunately, memory consumption is based on the number of objects in the
trees being synchronized. On a 32-bit system, it tends to hit a Python
MemoryError syncing trees that are ~1M files in size. You are hitting a
kernel OOM well before that, though.

You have several options available:

1) Run on a 64-bit VM with 8+GB of RAM (64-bit Python is a huge memory hog
   compared to 32-bit Python; you need roughly 2x the RAM on 64-bit Python
   to hold the same number of objects as on 32-bit Python).

2) Split your sync into multiple subtrees (as you have surmised).

There are no significant efforts under way to find a better way to handle
this in s3cmd itself, given how Python operates. One option would be to add
a SQLite on-disk or in-memory database for transient use in storing and
comparing the local and remote file lists, but that's a fairly heavy
undertaking and not one anyone has chosen to develop.

Thanks,

Matt

On Thu, Mar 6, 2014 at 4:18 PM, WagnerOne <wa...@wa...> wrote:
> Hi,
>
> I was recently charged with moving a lot of data (TBs) into S3 and
> discovered the great tool that is s3cmd. It's working well and I like
> the familiar rsync-like interactions.
>
> I'm attempting to use s3cmd to copy a directory with tons of small files
> amounting to about 700GB to S3.
>
> During my tests with ~1GB transfers, things went well. When I got to
> this larger test set, s3cmd worked for upwards of 40 minutes (gathering
> md5 data, I assume) on the local data before the kernel killed the
> process due to excessive RAM consumption.
>
> I was using an EC2 t1.micro with a NAS NFS-mounted to it to transfer
> data from said NAS to S3. The t1.micro had only 500MB of RAM, so I
> bumped it to an m3.medium, which has 4GB of RAM.
>
> When I retried the failed copy on the m3.medium, s3cmd ran about 3x
> longer before being terminated as above.
>
> I was hoping for a painless, single big sync job, but it's looking like
> I might have to write a wrapper to iterate over the big directories I
> need to copy, to get them down to a more manageable size for s3cmd.
>
> I'm guessing I've hit a limitation of the implementation as it stands
> currently, but wondered if anyone has suggestions in terms of s3cmd
> itself.
>
> Thanks, and thanks for a great tool!
>
> Mike
>
> # s3cmd --version
> s3cmd version 1.5.0-beta1
>
> # time s3cmd sync --verbose --progress content s3://somewhere
> INFO: Compiling list of local files... Killed
>
> real    214m53.181s
> user    8m34.448s
> sys     4m5.803s
>
> # tail /var/log/messages
> xxxx Out of memory: Kill process 1680 (s3cmd) score 948 or sacrifice child
> xxxx Killed process 1680 (s3cmd) total-vm:3942604kB, anon-rss:3755584kB,
> file-rss:0kB
>
> --
> wa...@wa...
> "Linux supports the notion of a command line for the same reason that
> only children read books with only pictures in them."
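
A minimal sketch of the "split the sync into multiple subtrees" workaround
discussed above might look like the following. The NFS mount point, the
bucket prefix, and the one-sync-per-top-level-subdirectory policy are
illustrative assumptions, not anything s3cmd provides on its own:

  #!/usr/bin/env python
  # Run one "s3cmd sync" per top-level subdirectory so each invocation only
  # has to build a file list for a fraction of the tree.
  import os
  import subprocess
  import sys

  LOCAL_ROOT = "/mnt/nas/content"         # placeholder: NFS mount of the NAS
  BUCKET_URL = "s3://somewhere/content/"  # placeholder: destination prefix

  for entry in sorted(os.listdir(LOCAL_ROOT)):
      path = os.path.join(LOCAL_ROOT, entry)
      if not os.path.isdir(path):
          continue  # loose files at the top level would need a separate pass
      cmd = ["s3cmd", "sync", "--verbose", path + "/", BUCKET_URL + entry + "/"]
      print("Running: " + " ".join(cmd))
      rc = subprocess.call(cmd)
      if rc != 0:
          sys.exit("s3cmd exited with status %d on %s" % (rc, path))

Each run then only has to hold the file list for one subtree in memory, at
the cost of some extra per-invocation overhead.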
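
The SQLite idea Matt mentions, keeping the transient local and remote file
lists in an on-disk (or in-memory) database and comparing them with SQL
rather than holding every object in Python data structures, could be
sketched roughly as below. The table layout and helper names are assumptions
for illustration; this is not how s3cmd is actually implemented:

  import sqlite3

  # Transient database holding the two file lists; ":memory:" would also work.
  conn = sqlite3.connect("filelists.db")
  conn.execute(
      "CREATE TABLE IF NOT EXISTS local (path TEXT PRIMARY KEY, size INTEGER, md5 TEXT)")
  conn.execute(
      "CREATE TABLE IF NOT EXISTS remote (path TEXT PRIMARY KEY, size INTEGER, md5 TEXT)")

  def add_local(path, size, md5):
      conn.execute("INSERT OR REPLACE INTO local VALUES (?, ?, ?)", (path, size, md5))

  def add_remote(path, size, md5):
      conn.execute("INSERT OR REPLACE INTO remote VALUES (?, ?, ?)", (path, size, md5))

  def files_to_upload():
      # Local paths that are missing remotely or differ in size/checksum.
      return conn.execute(
          "SELECT l.path FROM local AS l LEFT JOIN remote AS r ON l.path = r.path"
          " WHERE r.path IS NULL OR l.size != r.size OR l.md5 != r.md5").fetchall()

The comparison work would then happen inside SQLite instead of in Python
dictionaries, which is the "fairly heavy undertaking" Matt refers to.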