From: Quenten G. <QG...@on...> - 2012-04-21 04:19:13
Hi All,

I've been thinking about this myself quite a bit over the last few months. MooseFS is very similar to Apache HDFS, which uses 128 MB chunks instead of 64 MB and a single metadata server with metaloggers etc. I mention this because I've been investigating how the likes of Google and Yahoo have been setting up storage and compute clusters, and what I've found is very interesting. For example, Yahoo uses 4 hard disks, 2 quad-core CPUs and 2 x 1 GbE per node, with up to 3,500 nodes per cluster, which I think is a very interesting way of truly distributing their workload. Why two quad-core CPUs? Because they also run MapReduce (which is basically a distributed compute platform; think of the "SETI" or "Folding@home" projects).

So what I've basically found is that "less is more". MFS/HDFS is always "limited" to the write speed of a single disk per process, which may sound slow to some, but at scale it makes for a pretty impressive distributed platform if you think about it: you're limited to around 50-60 MB/s of writes per disk, and reads should scale with your replica level (give or take a bit).
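To put some very rough numbers on that (back-of-the-envelope only, assuming ~55 MB/s per disk, 4 disks per node, 2 x 1 GbE per node and a replica level of 3; I haven't benchmarked this):

    per node:   4 disks x ~55 MB/s     = ~220 MB/s  (2 x 1 GbE is ~240 MB/s of wire speed, so the NICs roughly keep up)
    30 nodes:   30 x ~220 MB/s         = ~6,600 MB/s of raw disk bandwidth
    writes:     ~6,600 MB/s / 3 copies = ~2,200 MB/s of aggregate client writes
    reads:      any of the 3 copies can serve a chunk, so aggregate reads scale with the replica level

So any single stream is stuck at one disk's speed, but the aggregate numbers get big quickly.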
So I've settled on an idea: why not commoditise the storage nodes further and build them as cheaply as possible without sacrificing too much reliability? E.g. still use ECC memory, or maybe we can build enough safeguards into MFS to not even "require" ECC memory in the storage nodes? I think separating storage from compute has some significant benefits, as does combining the two, so that choice is always left up to the individual; for the sake of this example I'm separating storage from compute.

Using the new Rack/Zone method you could build cheaper storage nodes with single power supplies, and by using 2 x 1 GbE instead of 10 GbE or InfiniBand you can save yourself some money without sacrificing reliability or performance. So my idea was to follow Yahoo's example and build 30 nodes with single power supplies, around 4 or 8 GB of RAM and 4 hard disks per node. For example, if you have 20 nodes, 3 x replication and A & B power in your site, you only need to put 10 nodes in Zone 1 and 10 in Zone 2 and set a replica level of 3, and you'll always have access to your data. As long as your metadata servers have dual power supplies and ECC memory you should be fine.
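Just to sketch how I imagine that looking (untested, and assuming the new rack/zone placement ends up being driven by something like the existing mfstopology.cfg, where each subnet maps to a rack id; the subnets here are made up):

    # /etc/mfstopology.cfg
    192.168.1.0/24   1    # 10 chunkservers on A-side power -> zone 1
    192.168.2.0/24   2    # 10 chunkservers on B-side power -> zone 2

    # three copies of every chunk:
    mfssetgoal -r 3 /mnt/mfs

With two zones and a goal of 3, each chunk should end up with two copies in one zone and one in the other, so either zone can go dark and you still have at least one copy of everything.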
Using this method we may be able to use something like a low-power Mini-ITX board with ECC memory and an integrated CPU, ideally with built-in software KVM/monitoring access similar to Supermicro's motherboards.

So what do you all think of this? I always welcome any input =)

Regards,
Quenten Grasso

-----Original Message-----
From: Atom Powers [mailto:ap...@di...]
Sent: Saturday, 21 April 2012 1:58 AM
To: moo...@li...
Subject: Re: [Moosefs-users] Hardware choice

On 04/20/2012 04:09 AM, Chris Picton wrote:
> I was looking at supermicro chassis and found the following chassis
> types which seem to offer highest density:
>
> 2u: 12x 3.5" http://www.supermicro.com/products/chassis/2U/?chs=827
>
> Does anyone have feedback on supermicro/these chassis?

I use a lot of SuperMicro kit here. It performs well, is very reliable, and at the right price. (I buy from http://www.siliconmechanics.com/) I have three of the above chassis and a couple of older 8-bay systems in my cluster.

Because Moose is so good at dealing with system failure but slow to re-balance chunks, I would recommend several "smaller" capacity servers over a few very large ones. Even at 10 TB per server it takes a very long time to re-balance when I add or remove a system from the cluster; I would avoid going over about 10 TB per server. Less is more in this case.

--
-- Perfection is just a word I use occasionally with mustard. --Atom Powers--
Director of IT
DigiPen Institute of Technology
+1 (425) 895-4443