From: Michał B. <mic...@ge...> - 2011-09-30 08:31:27
Hi,

We have run some tests creating big (hundreds of gigabytes) TrueCrypt (http://www.truecrypt.org/) volumes stored in MooseFS. Such a volume is seen as one file, and inside it you can keep as many small files as you want. TrueCrypt volumes are easily "rsyncable", so a minor change inside the volume causes only a small change in one part of the container file (http://www.rsync.net/resources/howto/windows_truecrypt.html). In MooseFS that change still causes the whole affected chunk to be replaced, but the replaced chunk gets deleted in the background, so you do not lose space even though you are storing lots of small files. It is a very good solution for read-only files, but it would need further performance tests if the small files are modified very often. (A minimal sketch of such a sync is included after this message.)

Kind regards
Michal Borychowski

-----Original Message-----
From: Patrick Feliciano [mailto:fus...@gm...]
Sent: Friday, September 30, 2011 9:45 AM
To: moo...@li...
Subject: Re: [Moosefs-users] Small file sizes revisited - 12x space used

On 09/28/2011 07:26 PM, Kristofer Pettijohn wrote:
> GFS2 in Google was redesigned for smaller files. A multi-master design is needed, but that is a huge overhaul and a lot of work to complete.
>
> Ask and beg for it; you might see it some day.
>
Those are interesting points: MooseFS has an architecture like GoogleFS, and Google now has GFS2, a.k.a. Colossus. Colossus is designed for smaller files and has a distributed master design. Maybe that is what MooseFS 2 will work to emulate as well.

> On Sep 28, 2011, at 8:55 PM, Ken wrote:
>
>> Distributed filesystems are always designed for huge amounts of space, so some waste is expected. For example, Haystack at Facebook and GFS at Google never reclaim the space of deleted files; they just mark them with a deleted flag.
>>
It isn't true that all distributed file systems are designed for huge files. Lustre, for instance, uses the block size of the underlying file system. I disagree that the concept of distributed file systems is synonymous with large files, and that doesn't strike me as a valid reason to dismiss the idea of variable block sizes at compile time.

>> Putting very many small files into a MooseFS filesystem causes a memory bottleneck on the master server.
>> IMHO, space saving will never be a main target in these systems.
>>
My servers can support 148GB of RAM, which is enough for hundreds of millions of files. That would give our site years of growth, so I'm not as worried about that as I am about the fact that we only have 10TB of unused space on the web farm that I want to use with MooseFS. With 64KB blocks we will run out of that space well before we reach a hundred million files; with 3 copies of the data we'd be out already with just the 50 million files we currently have. (The arithmetic is spelled out after this message.)

>> If we must handle many small files, such as photos, we should bundle them into one or more big files and use the URL to locate the content, e.g. '/prefix/bundle_filename/offset/length/check_sum.jpg'.
>>
That is an interesting idea, and I'm not against it if you can tell me what tools will do that while still letting me present it as a standard POSIX filesystem path. It seems to me, though, that a smaller block size for this awesome filesystem is still the better fix. (A rough sketch of the bundling idea follows at the end of this message.)
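A minimal sketch of the TrueCrypt-volume sync described at the top of this message, assuming the container sits on a local disk and the MooseFS mount is at /mnt/mfs (both paths are made up for illustration). rsync's --inplace and --no-whole-file options make it rewrite only the changed regions of the destination file instead of recreating the whole multi-hundred-gigabyte container:

import subprocess

# Sketch only: push a TrueCrypt container into a MooseFS mount so that
# only the changed regions of the file are rewritten. Paths are
# hypothetical; --no-whole-file is needed because rsync defaults to
# whole-file copies when both source and destination are local.
subprocess.check_call([
    "rsync", "-av", "--progress",
    "--inplace",        # update the destination file in place
    "--no-whole-file",  # force the delta algorithm even for local copies
    "/backup/volume.tc",           # local TrueCrypt container
    "/mnt/mfs/volumes/volume.tc",  # copy inside the MooseFS mount
])

Even then, as the message notes, each modified region still causes MooseFS to replace the whole chunk that contains it; the old chunk copy is reclaimed in the background.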
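For reference, the space figures quoted in Patrick's message work out as follows under the assumptions he states (64 KB minimum allocation per small file, 3 copies, 50 million files). This is plain arithmetic, not a MooseFS measurement:

# Back-of-the-envelope space estimate for many small files, using the
# figures from the message above: each file occupies at least one 64 KiB
# block and is stored with goal (replication) = 3.
files = 50_000_000          # current file count on the web farm
min_alloc = 64 * 1024       # 64 KiB minimum allocation per small file
copies = 3                  # goal = 3

used = files * min_alloc * copies
print(f"{used / 1000**4:.1f} TB")   # ~9.8 TB -- essentially all of the ~10 TB available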
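The bundling scheme Ken describes can be sketched roughly as below. This is only an illustration of the URL layout he proposes, not an existing tool: the bundle path, the read_from_bundle helper and the MD5 check are all hypothetical, and a real implementation (Haystack-style) would add an index and an HTTP front end rather than exposing a POSIX path.

import hashlib
import os

# Hypothetical sketch: many small photos packed into one big bundle file
# (which itself lives on MooseFS), addressed by a URL of the form
#   /prefix/bundle_filename/offset/length/check_sum.jpg
def read_from_bundle(url_path, bundle_root="/mnt/mfs/bundles"):
    parts = url_path.strip("/").split("/")
    bundle, offset, length = parts[-4], int(parts[-3]), int(parts[-2])
    expected_md5 = os.path.splitext(parts[-1])[0]   # checksum is the file name

    with open(os.path.join(bundle_root, bundle), "rb") as f:
        f.seek(offset)
        data = f.read(length)

    if hashlib.md5(data).hexdigest() != expected_md5:
        raise IOError("checksum mismatch for %s" % url_path)
    return data

# Example (made-up values): one 20480-byte JPEG at offset 1048576
# inside bundle_000123.
# photo = read_from_bundle("/photos/bundle_000123/1048576/20480/<md5>.jpg")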