From: Allen, B. S <bs...@la...> - 2011-09-30 15:06:31
Curious whether your underlying filesystem is ZFS (or similar): you could enable compression. I'd guess that chunks padded out to 64k but holding only 4k of data would compress back down to near 4k. I haven't tested this, but it would be an interesting workaround. Of course you're adding CPU load to your chunk servers by doing this. I'll test this theory at some point, since I plan on using compression behind MooseFS anyway.

Ben

On Sep 30, 2011, at 1:44 AM, Patrick Feliciano wrote:

> On 09/28/2011 07:26 PM, Kristofer Pettijohn wrote:
>> Google's GFS2 was redesigned for smaller files. A multi-master design
>> is needed, but that is a huge overhaul and a lot of work to complete.
>>
>> Ask and beg for it; you might see it some day.
>>
> Those are interesting points: MooseFS has an architecture like
> GoogleFS, and Google now has GFS2, a.k.a. Colossus. Colossus is
> designed for smaller files and has a distributed master design. Maybe
> that is what MooseFS 2 will work to emulate as well.
>
>> On Sep 28, 2011, at 8:55 PM, Ken wrote:
>>
>>> Distributed filesystems are always designed for huge amounts of
>>> space, so waste is common. For example, Haystack at Facebook and
>>> GFS at Google never reclaim the space of deleted files; they just
>>> mark them with a deleted flag.
>>>
> It isn't true that all distributed file systems are designed for huge
> files. Lustre, for instance, uses the block size of the underlying
> file system. I disagree that the concept of distributed file systems
> is synonymous with large files, and that doesn't strike me as a valid
> reason to dismiss the idea of variable block sizes at compile time.
>
>>> Putting many small files into a MooseFS filesystem causes a memory
>>> bottleneck on the master server. IMHO, space saving will never be
>>> the main target in these systems.
>>>
> My servers can support 148GB of RAM, which is enough for hundreds of
> millions of files. That would give our site years of growth, so I'm
> not as worried about that as I am about the fact that we only have
> 10TB of unused space on the web farm that I want to use with MooseFS.
> With 64KB blocks we will run out of that space well before we reach a
> hundred million files. With 3 copies of the data we'd be out already
> with just the 50 million files we currently have.
>
>>> If we must handle many small files, such as photos, we should
>>> bundle them into one or more big files and use a URL to locate the
>>> content, like '/prefix/bundle_filename/offset/length/check_sum.jpg'.
>
> That is an interesting idea, and I'm not against it if you can tell
> me what tools will do that and still let me present it as a standard
> POSIX filesystem path. It seems to me, though, that a smaller block
> size for this awesome filesystem is still the better fix.
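
A quick way to sanity-check Ben's compression guess without touching ZFS at all: the sketch below is only an illustration, with zlib standing in for whatever compressor the chunkserver's filesystem would actually use (lzjb or gzip on ZFS of that era). It pads 4 KiB of incompressible data out to a 64 KiB block and shows that the zero padding compresses away almost entirely.

    import os
    import zlib

    BLOCK = 64 * 1024                  # 64 KiB block size discussed in the thread
    payload = os.urandom(4 * 1024)     # 4 KiB of effectively incompressible data
    padded = payload + b"\x00" * (BLOCK - len(payload))  # zero-padded to a full block

    compressed = zlib.compress(padded, 6)
    print("padded:     %6d bytes" % len(padded))      # 65536
    print("compressed: %6d bytes" % len(compressed))  # close to the 4 KiB payload

The real saving on ZFS would also depend on the dataset's recordsize and the algorithm chosen, so treat the numbers as indicative only.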
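
Patrick's capacity concern can be checked with the same back-of-the-envelope math. This is a minimal sketch, assuming each small file occupies one full 64 KiB block per copy and a goal (copy count) of 3, both taken from the thread:

    files = 50 * 10**6        # files currently on the web farm
    copies = 3                # MooseFS goal mentioned in the thread
    block = 64 * 1024         # assumed minimum on-disk allocation per copy

    total_bytes = files * copies * block
    print("%.1f TB of the 10 TB free" % (total_bytes / 1e12))   # ~9.8 TB

That lands at roughly 9.8 TB, which matches the claim that three copies of the existing 50 million small files would already exhaust the 10 TB of unused space.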
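
Ken's bundling suggestion is essentially the Haystack approach: pack many small files into one large file and address them by offset and length. The sketch below only illustrates that addressing scheme; the bundle file name, MD5 checksum, and URL layout are assumptions modelled on the '/prefix/bundle_filename/offset/length/check_sum.jpg' example, not an existing tool, and it does not answer Patrick's requirement of a standard POSIX path.

    import hashlib
    import os

    def append_to_bundle(bundle_path, data):
        """Append one small file to the bundle; return (offset, length, checksum)."""
        with open(bundle_path, "ab") as bundle:
            bundle.seek(0, os.SEEK_END)
            offset = bundle.tell()
            bundle.write(data)
        return offset, len(data), hashlib.md5(data).hexdigest()

    def locate_url(prefix, bundle_name, offset, length, checksum):
        # Mirrors the '/prefix/bundle_filename/offset/length/check_sum.jpg' scheme.
        return "/%s/%s/%d/%d/%s.jpg" % (prefix, bundle_name, offset, length, checksum)

    def read_from_bundle(bundle_path, offset, length):
        with open(bundle_path, "rb") as bundle:
            bundle.seek(offset)
            return bundle.read(length)

    # Example: store one photo and fetch it back by offset/length.
    photo = b"...jpeg bytes..."
    off, ln, ck = append_to_bundle("photos.bundle", photo)
    print(locate_url("photos", "photos.bundle", off, ln, ck))
    assert read_from_bundle("photos.bundle", off, ln) == photo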