From: Thomas S H. <tha...@gm...> - 2011-01-18 17:06:36
|
I am architecting a potential 3 PB MooseFS install... 3 petabytes, and maybe even 4 PB (assuming we will move to 4 TB disks when they come out).

My question is this: can MooseFS handle that kind of load? Are there any additional considerations I will need to take into account as I approach such a high volume?

As it stands I will have over 100 chunkservers attached to one master. I am going to change out the mfsmaster metadata store with fusionio (http://www.fusionio.com/) drives to maintain the disk speed that metadata operations will need.

This deployment will also require well over 100 million chunks.

So my question again is: what, if any, special considerations should I take as I roll this out?

-Thomas S Hatch
|
From: Michal B. <mic...@ge...> - 2011-01-18 17:39:17
|
WOW!!!

And what about FusionIO in all the chunkservers? ;) Since you are talking about FusionIO in the mfsmaster, are you also going to use it in the metaloggers? I am just wondering whether it is necessary. Metadata is cached in RAM in the mfsmaster, but there are also the changelogs; if the system is busy (and it surely will be), there will be lots of operations logged to those files and transmitted to the metaloggers.

Please tell us how many files you plan to store.

Regards
Michal
|
From: Thomas S H. <tha...@gm...> - 2011-01-18 17:56:15
|
FusionIO in all the chunkservers? That's a little too rich for my blood :)

One of the problems we are seeing is that we need failover to work faster. The bottleneck for mfsmetarestore looks like it is I/O, and the mfsmetarestore step is what takes up most of the time in the failover process. That's why we want the fusionio drives.

As for the number of files, we have about 13 million right now, but we have only imported a small percentage of the total number of files we are contracted to get (we are building a music "cloud", and we are getting all the music in the world; a lot is still coming). By the end of the year we should have about 100-150 million files.

Right now the idea is to have two types of storage: the moose for large-scale storage where write speed is not a major issue, and a "cluster" of SSD PCI cards to handle high-speed storage needs like databases, plus the master and metalogger, to speed up restores and to make sure the changelogs can be written fast enough when activity is so high.

Paint a better picture?

-Tom
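To make the bottleneck concrete, this is roughly the restore step we are timing on a metalogger being promoted (a sketch only; the file names are what our 1.6 metalogger writes by default and the data directory is assumed to be /var/lib/mfs - not our exact script):

    cd /var/lib/mfs
    # rebuild a usable metadata.mfs from the last downloaded backup plus the changelogs
    time mfsmetarestore -m metadata_ml.mfs.back -o metadata.mfs changelog_ml.*.mfs
    # then start the master on this host and repoint the mfsmaster DNS name / IP at it
    mfsmaster start
|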
From: Michal B. <mic...@ge...> - 2011-01-21 07:20:47
|
Hi Thomas!

Sorry for the late reply, but the last two days were very busy and full of meetings.

We discussed your installation among our devs and admins and, to be honest, we cannot add much to the tips Jose and Reinis have already given. For a start you should have about 64 GB of RAM available in the master server (48 GB would probably also be fine). Performance is not affected so much by the amount of space in the cluster as by the number of files (objects) in the system and the number of current operations (recorded in the changelogs).

As we already discussed, your files don't change often (if ever), so if you equip the client machines with plenty of RAM, quite a lot of files will get cached in RAM on them (see http://www.moosefs.org/news-reader/items/moose-file-system-v-1617-released.html).

And what server would you use to serve the files? Lighttpd / Apache or something else? We had some plans to implement a direct Apache module for MooseFS - it should also speed up the communication.

Regards
Michal
|
From: Thomas S H. <tha...@gm...> - 2011-01-21 15:04:14
|
Thanks Michal! I really appreciate your efforts for us; I hope I can give back more to MooseFS in the future.

We are planning on giving the master 128 GB of RAM, minimum, and then we are going to toss a fusionio drive into it and set it up as swap space <just in case>.

Right now we are serving files via Apache, and the web front end is entirely in Django using WSGI as the Python interface. We are looking into moving it to nginx, as most of our front ends use nginx, but an Apache module would encourage us to stick with Apache!

One of the big epiphanies came yesterday when we realized that we can use fusionio drives as swap space, because they perform at almost the speed of RAM. We are probably not looking at this for a while, but I am serious about giving you guys a fusionio drive in the future if it will help MooseFS development.

We did have one question: what are the chances that the master server's exports could be dynamically reloaded? This, and the client caching, are things we are really counting on!

We have been crunching the numbers and it looks like we will land somewhere around 3.2 petabytes when we are done with the build-out. We will keep you posted, and we deeply appreciate your efforts, not only on MooseFS but also on our behalf!

-Tom
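For reference, roughly what I mean by the swap setup (a sketch; /dev/fioa1 is just an example partition name for the card, and the swappiness value is our current guess, not a recommendation):

    mkswap /dev/fioa1
    swapon -p 10 /dev/fioa1      # give the fast card priority over any disk-backed swap
    sysctl -w vm.swappiness=10   # still treat swap as a safety net, not as extra RAM
|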
From: Michal B. <mic...@ge...> - 2011-01-22 08:07:19
|
Hi Thomas!

> As I understand it, when changes are made to the /etc/mfsexports.cfg file you need to restart the mfsmaster daemon. We would like to be able to modify this file and then have the new exports applied without having to restart the mfsmaster daemon.

[MB] You should run "mfsmaster reload" or "killall -HUP mfsmaster".

> Let me know if there is a way to do this already. I was going to see if it would just take a HUP signal once our QA environment is rebuilt, but I don't think it hurts to ask.

[MB] For the moment, after a HUP signal only "mfsexports.cfg" is reloaded - but that is just what you need.

Regards
Michal
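[MB] In other words (assuming the default config path from your message):

    vi /etc/mfsexports.cfg    # edit the exports
    mfsmaster reload          # or: killall -HUP mfsmaster
    # optionally check syslog afterwards to confirm the exports were reloaded cleanly
|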
From: jose m. <let...@us...> - 2011-01-18 18:24:35
|
On Tue, 2011-01-18 at 10:06 -0700, Thomas S Hatch wrote:
> So my question again is, what, if any, special considerations should I take as I roll this out?

* Metadata is cached in memory, so the network is, in my opinion, the determining factor: 10 GbE NICs and stackable switches linked by optical fibre, commodity hardware for the chunkservers and metaloggers, and a separate low-cost 100M/1GbE network for administration and other jobs. Run the mfsmaster in bonding mode 5/6 for redundancy, and in mfsmount tune the mfscacheto options.
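For the bonding, something along these lines (a sketch for mode 6 / balance-alb; the interface names, address and modprobe-style setup are only examples - adapt to your distribution's network scripts):

    modprobe bonding mode=6 miimon=100
    ifconfig bond0 192.168.10.10 netmask 255.255.255.0 mtu 9000 up
    ifenslave bond0 eth0 eth1
|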
From: jose m. <let...@us...> - 2011-01-18 20:52:48
|
On Tue, 2011-01-18 at 11:58 -0700, Thomas S Hatch wrote:
> Thanks jose, yes, we are on 10G networks, it sounds like the primary
> cap to worry about might be the ram usage on the mfsmaster.

* First cluster: standard openSUSE 11.3 kernel, x86_64, 2 processors, Intel(R) Xeon(R) CPU E5504 @ 2.00GHz.

top - 20:56:26 up 18 days, 10 min, 1 user, load average: 0.00, 0.00, 0.00
Tasks: 141 total, 1 running, 140 sleeping, 0 stopped, 0 zombie
Cpu0 : 0.3%us, 0.3%sy, 0.0%ni, 99.0%id, 0.0%wa, 0.3%hi, 0.0%si, 0.0%st
Cpu1 : 0.0%us, 0.0%sy, 0.0%ni,100.0%id, 0.0%wa, 0.0%hi, 0.0%si, 0.0%st
Cpu2 : 0.0%us, 0.0%sy, 0.0%ni,100.0%id, 0.0%wa, 0.0%hi, 0.0%si, 0.0%st
Cpu3 : 0.0%us, 0.3%sy, 0.0%ni, 99.7%id, 0.0%wa, 0.0%hi, 0.0%si, 0.0%st
Cpu4 : 0.0%us, 0.0%sy, 0.0%ni,100.0%id, 0.0%wa, 0.0%hi, 0.0%si, 0.0%st
Cpu5 : 0.0%us, 0.0%sy, 0.0%ni,100.0%id, 0.0%wa, 0.0%hi, 0.0%si, 0.0%st
Cpu6 : 0.0%us, 0.0%sy, 0.0%ni,100.0%id, 0.0%wa, 0.0%hi, 0.0%si, 0.0%st
Cpu7 : 0.0%us, 0.0%sy, 0.0%ni,100.0%id, 0.0%wa, 0.0%hi, 0.0%si, 0.0%st

Mem: 16 GiB total, 22.8% of memory used by the mfsmaster process.
~20,000,000 chunks, ~6,000,000 files, little files, goal 3.
MTU 9000, master NICs 2x1GbE, bonding mode 5.
20 chunkservers, 2x1GbE each, bonding mode 6.

A little sysctl tuning:

# increase TCP max buffer size
net.core.rmem_max = 16777216
net.core.wmem_max = 16777216
# increase Linux autotuning TCP buffer limits
# min, default, and max number of bytes to use
net.ipv4.tcp_rmem = 4096 87380 16777216
net.ipv4.tcp_wmem = 4096 65536 16777216
# don't cache ssthresh from previous connection
net.ipv4.tcp_no_metrics_save = 1
net.ipv4.tcp_moderate_rcvbuf = 1
# recommended to increase this for 1000 BT or higher
net.core.netdev_max_backlog = 2500
# for 10 GigE, use this
# net.core.netdev_max_backlog = 30000
# cubic on my system
net.ipv4.tcp_congestion_control = cubic
# tried swappiness 0
#
vm.swappiness=2
vm.vfs_cache_pressure = 10000

* Second cluster: 83 chunkservers with 1x10GbE each, 2x10GbE + 1x1GbE in the mfsmaster.
openSUSE 11.3 recompiled, NUMA, high memory.
64 GiB RAM, ~40% of memory used by the mfsmaster process.
~100,000,000 chunks, ~30,000,000 little files, goal 3.

* For test purposes:
http://control.seycob.es:9425
http://control.seycob.es:9426

* Currently upgrading to 1.2.19.
* Problems with CPU consumption on mfs-1.6.19: solved, fine in 1.6.20.
* Problems with files at goal 0 (valid copies 1?): 4 days, not solved.
* Problems with files at goal 1, valid copies 1; applied goal 2 to the whole filesystem and it is not reflected: 4 days, not solved, in the test cluster:
http://control.seycob.es:9426/mfs.cgi

* Poor English, sorry.
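If it helps, the settings above can be kept in a file and loaded with sysctl (sketch; the file name is only a convention, and on this generation of openSUSE appending to /etc/sysctl.conf may be the simpler way to have them re-applied at boot):

    sysctl -p /etc/sysctl.d/90-mfs-tuning.conf   # apply the settings now
    sysctl net.core.rmem_max vm.swappiness       # spot-check that the values took effect
|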
From: Thomas S H. <tha...@gm...> - 2011-01-18 21:02:05
|
Don't worry about the English, jose, this data is great; you have a few kernel tuning changes I had not thought of. Just like my environment, there is hardly any load on the CPU. Your swappiness is lower than what I have set mine to (10), but 2 may yet be a good way to go.

Thanks jose! This is good stuff!

-Tom
|
From: jose m. <let...@us...> - 2011-01-18 21:27:35
|
On Tue, 2011-01-18 at 21:52 +0100, jose maria wrote:
> * in upgrade to 1.2.19

* Pardon, 1.6.19.
|
From: Reinis R. <r...@ro...> - 2011-01-18 19:47:07
|
> This deployment will also require well over 100 million chunks.

For that number, one thing you should prepare is enough RAM for the master server (and, depending on the chunkserver count and file goal, on the storage nodes as well).

In our MFS setup we also plan on having 100+ million chunks, but at our current 40 million files/chunks the master eats about 16 GB and a chunkserver (one of six in total, with goal 3) about 6 GB - so we will probably end up with a 40-50 GB requirement on the master (and 10+ GB on the chunkservers) at 100 million.

Since the master is a single point and not distributed in any way (except, of course, that you can create separate filesystems), it can get tricky (in terms of obtaining the right hardware for the job) at some point if the file count doubles (to 200 million), for example.
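As a rough back-of-the-envelope check (assuming master memory stays roughly linear in the number of objects): 16 GB / 40 million objects is about 400 bytes per file+chunk, so 100 million objects would need on the order of 40 GB and 150 million about 60 GB, before leaving any headroom for the OS and further growth.

rr
|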
From: Thomas S H. <tha...@gm...> - 2011-01-18 19:51:19
|
Thanks Reinis, that is good data to have. I think I am going to make upping our total mfsmaster RAM a higher priority.
|
From: Michal B. <mic...@ge...> - 2011-01-21 07:30:32
|
[MB] Hi Jose!

[MB] We reviewed your tuning suggestions and here are some of our thoughts.

[...]
> little sysctl tuning
> # increase TCP max buffer size
> net.core.rmem_max = 16777216
> net.core.wmem_max = 16777216
> # increase Linux autotuning TCP buffer limits
> # min, default, and max number of bytes to use
> net.ipv4.tcp_rmem = 4096 87380 16777216
> net.ipv4.tcp_wmem = 4096 65536 16777216

[MB] As far as we know this is already tuned in newer kernels (ca 2.6.20+).

> # don't cache ssthresh from previous connection
> net.ipv4.tcp_no_metrics_save = 1

[MB] We are not sure about this one. Doesn't it have the opposite effect, slowing down new TCP connections at the beginning? We found that this setting is intended more for benchmarking and tests.

> net.ipv4.tcp_moderate_rcvbuf = 1

[MB] As far as we know this is on by default.

> # recommended to increase this for 1000 BT or higher
> net.core.netdev_max_backlog = 2500
> # for 10 GigE, use this
> # net.core.netdev_max_backlog = 30000

[MB] This would be useful only if you see connection timeouts. In our environment, at 50k connection attempts per second, we do not observe any timeouts.

> # cubic on my system
> net.ipv4.tcp_congestion_control = cubic

[MB] Similarly, on new systems this is the default (e.g. Ubuntu 10.x).

> # probe swappiness 0
> # vm.swappiness=2
> vm.vfs_cache_pressure = 10000

[MB] If the server has lots of RAM this probably won't have much effect.

We found some more information here:
http://www.linuxweblog.com/blogs/sandip/20080331/tuning-tcp-sysctlconf

And one simple thing to speed up the chunkservers is to specify the 'noatime' flag in fstab. By default, every time a file is accessed its inode information is updated to reflect the last access time, which incurs a write to the filesystem metadata. With the noatime flag set there are no unnecessary writes while reading.

Regards
Michal
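[MB] For example, an fstab entry for a chunkserver data disk could look like this (the device, mount point and filesystem type are placeholders - the noatime option is the point here):

    /dev/sdb1  /mnt/mfschunks1  xfs  defaults,noatime  0 2
|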
From: jose m. <let...@us...> - 2011-01-21 15:36:00
|
On Fri, 2011-01-21 at 08:30 +0100, Michal Borychowski wrote:
> [MB] Hi Jose!
>
> [MB] We reviewed your tuning suggestions and here are some of our thoughts.
>
> [...]
> little sysctl tuning

* The values applied to those options are mere examples and depend on the characteristics of the network cards, the switches, etc.; they can be increased or decreased according to the observed performance and the hardware. In my experience with the two main clusters, I have preferred not to chase the transfer limits: pushing them was causing constant, random loss of chunkserver connections. Adjusting the reconnection and connection start-up timings has made the connections completely stable, admittedly at the cost of somewhat lower transfer performance, but in my case that was not the priority.
|