From: Patrick F. <fus...@gm...> - 2011-09-28 20:56:06
I'd like to start with how very impressed I am with the MooseFS features
and architecture. I even prepared a presentation to sell the benefits of
MooseFS for our web services to management. It is the only thing I've found
that is easy to manage and easily extensible, with good documentation,
automated replication, fault tolerance, self healing, and POSIX compliance
(a requirement of our design). Only one problem: many of our files are
approximately 4KB, so the average space used on MooseFS for that class of
files is more than 12 times what we expect.

Now, before you reply with the same response I've read in the FAQ and seen
in the mailing list archives: I understand that MooseFS was written for
large files, and that is what Gemius uses it for. I've also seen others
point to different systems that can handle small files. However, none of
the systems pointed to have the same feature set as MooseFS. Even if they
offer extensibility and fault tolerance, none I've seen also present a
POSIX file system like we need.

I also agree that the block size should not be a runtime configurable of
the compiled FS. There are too many pieces to manage to be worried about
setting the right block size on each chunk server, and it would add extra
code to deal with variable block sizes in the master, etc. Ugh. A mess, I
totally agree.

But how about a compile-time option to ./configure? I could pick a block
size then and compile a complete set of master, metalogger, chunkserver,
and client apps and/or RPMs that all share the hardcoded block size I
chose. I would think this change would be much easier to implement; I
imagine a constant would need to be changed somewhere.

This would be very good for the spread and reputation of MooseFS, enabling
its wider use and adoption as a general-purpose DFS, adaptable to suit
individual application needs. We'd also be able to add our website, with
millions of users, to the "Using MooseFS" list. :)

So unless someone can point me to something else that REALLY has all of
MooseFS's features, including POSIX... well then, I think it is simply
cruel to limit such an amazing tool and exclude those of us who could make
such wonderful use of it.

Of course, I have the source code and I can try to figure it out myself,
but it would be much easier going with your cooperation and guidance. I
would be willing to do the implementation myself and contribute it back.

Please truly consider this, and if not, please consider at least pointing
me to the right places in the source code I should look to implement the
changes myself.

Thank you very much,

Patrick Feliciano
Systems Administrator
Livemocha, Inc.
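The space amplification Patrick describes can be estimated with a quick
back-of-the-envelope sketch. This is a hypothetical model, not taken from
the MooseFS source: it simply assumes each small file consumes at least one
full 64 KiB block per replica, which is consistent with the ~12x blow-up he
reports.

```python
# Rough estimate of space amplification for small files stored in
# fixed-size blocks. Assumption (hypothetical, not from MooseFS code):
# every file occupies at least one whole block per copy.

BLOCK_SIZE = 64 * 1024  # 64 KiB block size, as discussed in the thread

def stored_size(file_size: int, block_size: int = BLOCK_SIZE) -> int:
    """Bytes actually consumed on disk for one copy of a file."""
    blocks = max(1, -(-file_size // block_size))  # ceiling division
    return blocks * block_size

file_size = 4 * 1024  # the ~4 KB files Patrick mentions
amplification = stored_size(file_size) / file_size
print(amplification)  # 16.0 — one 64 KiB block holding 4 KiB of data
```

A 16x figure per copy is in the same ballpark as the "in excess of 12
times" reported above; the exact ratio depends on the real file-size
distribution and any per-chunk header overhead.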
From: Ken <ken...@gm...> - 2011-09-29 01:55:35
Distributed filesystems are always designed for huge space; waste often
exists. For example, Haystack at Facebook and GFS at Google never reclaim
the space of deleted files; they just flag them as deleted.

Putting many small files into MooseFS makes the master server's memory the
bottleneck. IMHO, saving space will never be the main target of these
systems.

If we must handle many small files, such as photos, we should bundle them
into one or more big files and use a URL to locate the content, like
'/prefix/bundle_filename/offset/length/check_sum.jpg'.

Best Regards
-Ken
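Ken's bundling scheme can be sketched in a few lines. This is a
hypothetical illustration, not his actual tool (which was unreleased at the
time of this thread); the path layout `/prefix/bundle/offset/length/
checksum` follows his example, and the `Bundle` class name is invented
here.

```python
import hashlib
import os

class Bundle:
    """Append small files into one big file; address them by offset/length."""

    def __init__(self, path: str):
        self.path = path

    def append(self, data: bytes) -> str:
        """Append a blob and return a locator URL in Ken's suggested style."""
        with open(self.path, "ab") as f:
            offset = f.tell()   # append position == blob's offset
            f.write(data)
        checksum = hashlib.md5(data).hexdigest()
        name = os.path.basename(self.path)
        return f"/prefix/{name}/{offset}/{len(data)}/{checksum}.jpg"

    def read(self, offset: int, length: int) -> bytes:
        """Fetch a blob back out of the bundle with one ranged read."""
        with open(self.path, "rb") as f:
            f.seek(offset)
            return f.read(length)
```

A front-end web server would parse the offset and length out of the URL
and issue a single ranged read against the bundle file. The trade-off is
that per-file POSIX semantics (rename, unlink, permissions) are lost,
which is exactly Patrick's objection later in the thread.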
From: Kristofer P. <kri...@cy...> - 2011-09-29 02:26:09
GFS2 at Google was redesigned for smaller files. A multi-master design is
what's needed here, but that is a huge overhaul and a lot of work to
complete.

Ask and beg for it; you might see it some day.
From: Patrick F. <fus...@gm...> - 2011-09-30 07:44:57
On 09/28/2011 07:26 PM, Kristofer Pettijohn wrote:
> GFS2 in Google was redesigned for smaller files. Multi-master design is
> needed, but that is a huge overhaul and a lot of work to complete.
>
> Ask and beg for it; you might see it some day.

Those are interesting points: MooseFS has an architecture like GoogleFS,
and now Google has GFS2, a.k.a. Colossus. Colossus is designed for smaller
files and has a distributed master design. Maybe that is what MooseFS 2
will work to emulate as well.

On Sep 28, 2011, at 8:55 PM, Ken wrote:
> Distribute filesystem always design for huge space. Waste often exist. eg:
> Haystack in facebook, GFS in google never recycling space of delete
> files, they mark flag for deleted status.

It isn't true that all distributed file systems are designed for huge
files. Lustre, for instance, uses the block size of the underlying file
system. I disagree that the concept of distributed file systems is
synonymous with large files, and that doesn't strike me as a valid reason
to dismiss the idea of variable block sizes at compile time.

> Much small size files put into moose filesystem cause master server
> memory bottleneck.
> IMHO, space saving will never be main target in these systems.

My servers can support 148GB of RAM, which is enough for hundreds of
millions of files. That would give our site years of growth, so I'm not as
worried about that as I am about the fact that we have only 10TB of unused
space on the web farm that I want to use with MooseFS. With 64KB blocks we
will run out of that space well before we reach a hundred million files;
with 3 copies of the data we'd be out already with just the 50 million
files we currently have.

> If we must handle much small files, just like photo files, should
> bundle them into a big file(s). And use URL locate content, like
> '/prefix/bundle_filename/offset/length/check_sum.jpg'.

That is an interesting idea, and I'm not against it if you can tell me
what tools will do that and still allow me to present it as a standard
POSIX filesystem path. It seems to me, though, that a smaller block size
for this awesome filesystem is still the better fix.
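Patrick's capacity claim checks out arithmetically. The sketch below uses
the same hypothetical one-64-KiB-block-per-copy assumption as earlier; with
it, the 50 million existing files at goal 3 consume roughly 98% of the
10 TB, i.e. effectively all of it before any growth.

```python
FILES = 50_000_000        # current file count from the thread
BLOCK = 64 * 1024         # 64 KiB minimum per file copy (assumed model)
COPIES = 3                # replication goal
CAPACITY = 10 * 10**12    # the 10 TB of unused web-farm space

used = FILES * BLOCK * COPIES
print(used)               # 9830400000000 — about 9.8 TB of the 10 TB
print(used / CAPACITY)    # ~0.98: effectively full before any growth
```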
From: Michał B. <mic...@ge...> - 2011-09-30 08:31:27
Hi,

We ran some tests with big (hundreds of gigabytes) TrueCrypt
(http://www.truecrypt.org/) volumes stored on MooseFS. Such a volume is
seen as one file, and underneath you can have as many small files as you
want. TrueCrypt volumes are easily "rsyncable", so one minor change causes
only a small change in one part of the file
(http://www.rsync.net/resources/howto/windows_truecrypt.html). In MooseFS
it does cause replacement of the whole chunk, but the replaced chunk gets
deleted in the background. This way you do not lose space when you have
lots of small files. It is a very good solution for read-only files, but
it would need further performance testing if small files are modified very
often.

Kind regards
Michal Borychowski
From: Allen, B. S <bs...@la...> - 2011-09-30 15:06:31
Curious: if your underlying filesystem were ZFS (or similar), you could
enable compression. I'd guess that chunks padded out to 64k, i.e. holding
only 4k of data, would compress to near 4k. I haven't tested this, but it
would be an interesting workaround. Of course, you're adding CPU load to
your chunk servers by doing this.

I'll test this theory at some point, since I plan on using compression
behind MooseFS anyway.

Ben
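Ben's intuition is easy to sanity-check in isolation. In this sketch, zlib
stands in for ZFS's compression (lz4/gzip), and a 64 KiB zero-padded buffer
is a simplified, assumed model of a padded chunk that ignores any real
on-disk chunk header format.

```python
import os
import zlib

BLOCK = 64 * 1024
payload = os.urandom(4 * 1024)   # 4 KiB of incompressible "photo-like" data
chunk = payload + b"\x00" * (BLOCK - len(payload))  # zero-pad to 64 KiB

compressed = zlib.compress(chunk)
print(len(chunk))        # 65536
print(len(compressed))   # a little over 4096: the zero run nearly vanishes
```

On ZFS this would translate to enabling the `compression` property on the
dataset holding the chunk directories, so on-disk usage tracks the real
payload rather than the padded block, at the CPU cost Ben notes.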
From: Michał B. <mic...@ge...> - 2011-09-30 16:50:34
We ran some tests with underlying ext3 and FAT32 systems, but we didn't
try mounting the volume from multiple clients; I'm not sure TrueCrypt is
ready for that. It could probably be mounted once (by a "main" client),
and other machines could then write through that mount. For data which
doesn't need to be encrypted, TrueCrypt probably won't be the best option
(though an extra layer of encryption shouldn't hurt). Do you know of any
other cross-platform "volume" filesystem without encryption, but
preferably with compression?

Regards
Michal
From: Ken <ken...@gm...> - 2011-10-01 01:04:45
On Fri, Sep 30, 2011 at 3:44 PM, Patrick Feliciano <fus...@gm...> wrote:
> It isn't true that all distributed file systems are designed for huge
> files. Lustre for instance uses the block size of the underlying file
> system. I disagree that the concept of distributed file systems is
> synonymous with large files. That doesn't strike me as a valid reason
> to dismiss the idea of variable block sizes at compile time.

What you said is true. We plan to use MooseFS for a photo store growing at
1 terabyte per week, so I sincerely hope MooseFS will support small files;
photos are small files.

> My servers can support 148GB of RAM which is enough for hundreds of
> millions of files. With 3 copies of the data we'd be out already with
> just the 50 million files we currently have.

Let's count a bit. On master server failover, mfsmetarestore has to read
the metadata log to rebuild the filesystem. Read speed generally reaches
about 100MB per second, so 148GB of metadata should take about
148*1024/100 = 1515 seconds to recover. That means restoring from a
failure takes more than 25 minutes. These problems are not easy to solve.
In the MooseFS source, filesystem.c and chunk.c are already very difficult
to understand; this feature could make them worse.

> If we must handle much small files, just like photo files, should
> bundle them into a big file(s). And use URL locate content, like
> '/prefix/bundle_filename/offset/length/check_sum.jpg'.

We plan to open-source these tools.

-Ken
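Ken's restore-time estimate, spelled out. This uses his stated assumptions
(148GB of metadata to replay at a sustained 100MB/s read rate); real
mfsmetarestore throughput would also depend on CPU and replay logic, not
just sequential reads.

```python
METADATA_GB = 148        # RAM-sized metadata image from the thread
READ_MB_PER_S = 100      # assumed sustained read throughput

seconds = METADATA_GB * 1024 / READ_MB_PER_S
print(seconds)           # 1515.52
print(seconds / 60)      # ~25.3 minutes of downtime on failover
```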
From: Kristofer P. <kri...@cy...> - 2011-10-03 19:21:58
>> Distribute filesystem always design for huge space. Waste often exist. eg:
>> Haystack in facebook, GFS in google never recycling space of delete
>> files, they mark flag for deleted status.
>
> It isn't true that all distributed file systems are designed for huge
> files. Lustre for instance uses the block size of the underlying file
> system. I disagree that the concept of distributed file systems is
> synonymous with large files. That doesn't strike me as a valid reason
> to dismiss the idea of variable block sizes at compile time.

Just for clarification, he said huge space, not huge files. :)