From: Ken <ken...@gm...> - 2011-09-29 01:55:35
Distributed filesystems are always designed for huge amounts of space, and some waste is common. For example, Haystack at Facebook and GFS at Google do not reclaim the space of deleted files immediately; they mark the files with a deleted flag. Putting many small files into MooseFS makes the master server's memory a bottleneck. IMHO, saving space will never be the main goal of these systems.

If we must handle many small files, such as photos, we should bundle them into one or more big files and use a URL to locate the content, like '/prefix/bundle_filename/offset/length/check_sum.jpg'.

Best Regards
-Ken

On Thu, Sep 29, 2011 at 4:55 AM, Patrick Feliciano <fus...@gm...> wrote:
>
> I'd like to start with how very impressed I am with the MooseFS features
> and architecture. I even prepared a presentation to sell the benefits
> of MooseFS for our web services to management. It is the only thing
> I've found that is easy to manage, easily extendible, with good
> documentation, has automated replication, fault tolerance, self healing,
> and POSIX (a requirement of our design). Only one problem, many of
> our files are approx. 4KB. So average space used on MooseFS for that
> class of files is in excess of 12 times the expected.
>
> Now before you reply with the same response I've read in the FAQ and
> seen in the mailing list archives; I understand that MooseFS was written
> for large files and that is what it is used for by Gemius. And I've
> seen that others point to other systems that can handle small files.
>
> However none of those systems pointed to have the same feature set as
> MooseFS. Even if they have extendibility and fault tolerance, none I've
> seen also present a POSIX file system like we need.
>
> Also I agree that the block size should not be a configurable of the
> compiled FS. There are too many pieces to manage to be worried that you
> set the right block size configurable on each chunk server and add extra
> code to deal with variable block sizes in the master etc. Ugh. Mess, I
> totally agree.
>
> But how about at compile time as an option to ./configure? How about I
> pick block size then and compile a complete set of master, metalogger,
> chunk, and client apps and/or RPMs that all have the hardcoded block
> size I pick then. I would think this change would be much easier to
> implement. I imagine that a constant would need to be changed somewhere.
>
> This would be very good for the spread and reputation of MooseFS,
> enabling its wider use and adoption as a general purpose DFS, adaptable
> to suit individual application needs. Also we'd be able to add our
> website with millions of users to the "Using MooseFS" list. :)
>
> So unless someone can point me to something else that REALLY has all of
> MooseFS's features, including POSIX... Well then, I think it is simply
> cruel to limit such an amazing tool and exclude those of us who could
> make such wonderful use of it.
>
> Of course, I have the source code and I can try to figure it out myself,
> but it would be much easier going with your cooperation and guidance. I
> would be willing to do the implementation myself and contribute it back.
>
> Please truly consider this, and if not, please consider at least
> pointing me to the right places in the source code I should look to
> implement the changes myself.
>
> Thank you very much,
>
> Patrick Feliciano
> Systems Administrator
> Livemocha, Inc.
>
> _______________________________________________
> moosefs-users mailing list
> moo...@li...
> https://lists.sourceforge.net/lists/listinfo/moosefs-users
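
For illustration only, here is a minimal Python sketch of the bundling idea from Ken's reply: small files are appended to one large bundle file, and each one is addressed by a path of the form '/prefix/bundle_filename/offset/length/check_sum.jpg'. The function names, the '/photos' prefix, and the choice of SHA-256 as the checksum are assumptions made for this sketch; none of them come from MooseFS or from any system named in this thread.

```python
import hashlib
import os


def append_to_bundle(bundle_path, data, prefix="/photos", ext="jpg"):
    """Append one small file to the bundle and return its locator path.

    Hypothetical helper: the locator layout follows the pattern from the
    email, '/prefix/bundle_filename/offset/length/check_sum.jpg'.
    """
    checksum = hashlib.sha256(data).hexdigest()
    with open(bundle_path, "ab") as bundle:
        # Make the position explicit: new data always goes at the end.
        bundle.seek(0, os.SEEK_END)
        offset = bundle.tell()
        bundle.write(data)
    name = os.path.basename(bundle_path)
    return f"{prefix}/{name}/{offset}/{len(data)}/{checksum}.{ext}"


def read_from_bundle(bundle_dir, locator):
    """Resolve a locator path and return the stored bytes.

    The last four path segments are bundle name, offset, length, and
    'checksum.ext'; anything before them is treated as the prefix.
    """
    bundle_name, offset, length, check_file = locator.strip("/").split("/")[-4:]
    checksum = check_file.rsplit(".", 1)[0]
    with open(os.path.join(bundle_dir, bundle_name), "rb") as bundle:
        bundle.seek(int(offset))
        data = bundle.read(int(length))
    # Reject short reads and content that does not match the checksum.
    if len(data) != int(length) or hashlib.sha256(data).hexdigest() != checksum:
        raise ValueError("bundle entry is truncated or corrupt")
    return data
```

With this layout the filesystem only has to track the bundle file, not every photo, so the per-file metadata cost on the master is paid once per bundle. The sketch leaves out deletion, concurrent writers, and bundle rotation, all of which a real deployment would have to handle.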