From: Fabien G. <fab...@gm...> - 2011-08-13 20:54:05
|
Hi,

On Sat, Aug 13, 2011 at 10:37 PM, Stas Oskin <sta...@gm...> wrote:
>> Damaged disk is serving by one of your chunkservers, You should remove this
>> disk from the chunkserver's mfshdd config file. MFSMaster is not in charge
>> to remove disks.
>
> This is clear, but I replaced the disk with a new one - will chunkmaster
> see this after restart?

If the disk is mounted and the mountpoint is declared in mfshdd.cfg: yes.

>> MFS starts replication but the replication process will be done slowly (It's
>> configurable). When your disk marked damaged you should see many files with
>> valid copies less than their goal (You can see them on CGIServer), in
>> replication process these files will come back to their goal slowly.
>
> So if I don't see any files below the goal, this means the replication
> completed successfully?

Yes.

Fabien
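The replacement procedure discussed in this thread only touches the chunkserver's mfshdd.cfg. As a minimal sketch — assuming the usual layout of one storage mount point per line, and with made-up example paths — the edit looks like this:

# /etc/mfshdd.cfg on the affected chunkserver - one storage directory per line
# (paths are illustrative, not from this thread)
/mnt/mfschunks1
# the dead drive was /mnt/mfschunks2; either remount the replacement disk at the
# same path, or change this line to the new mount point; the directory must be
# writable by the user mfschunkserver runs as
/mnt/mfschunks2

After restarting mfschunkserver (or reloading it, where the init script supports a reload), it rescans the listed directories and reports the new, empty disk to the master, which starts using it — the behaviour Fabien confirms above. Some 1.6.x versions also accept a leading '*' on a line to mark a still-working disk for removal so its chunks are replicated away first; for a disk that has already died, removing or repointing the line is enough.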
|
From: Stas O. <sta...@gm...> - 2011-08-13 20:37:47
|
Hi.

On Sat, Aug 13, 2011 at 8:43 AM, Mostafa Rokooie <mos...@gm...> wrote:
>> So how to remove the old, "Damaged" disk? Or mfsmaster will just replace it
>> with the new one?
>
> Damaged disk is serving by one of your chunkservers, You should remove this
> disk from the chunkserver's mfshdd config file. MFSMaster is not in charge
> to remove disks.

This is clear, but I replaced the disk with a new one - will chunkmaster see this after restart?

>> How soon the replication happens - the moment the disk declared as
>> damaged? If I don't have any replicates stats below the goal, this means the
>> replication is done?
>
> MFS starts replication but the replication process will be done slowly
> (It's configurable). When your disk marked damaged you should see many files
> with valid copies less than their goal (You can see them on CGIServer), in
> replication process these files will come back to their goal slowly.

So if I don't see any files below the goal, this means the replication completed successfully?

Regards.
|
From: Mostafa R. <mos...@gm...> - 2011-08-13 05:43:46
|
> So how to remove the old, "Damaged" disk? Or mfsmaster will just replace it
> with the new one?

Damaged disk is serving by one of your chunkservers, You should remove this disk from the chunkserver's mfshdd config file. MFSMaster is not in charge to remove disks.

> How soon the replication happens - the moment the disk declared as damaged?
> If I don't have any replicates stats below the goal, this means the
> replication is done?

MFS starts replication but the replication process will be done slowly (It's configurable). When your disk marked damaged you should see many files with valid copies less than their goal (You can see them on CGIServer), in replication process these files will come back to their goal slowly.

--Mostafa Rokooie
|
From: Elliot F. <efi...@gm...> - 2011-08-13 03:10:29
|
On Wed, Aug 10, 2011 at 9:11 PM, Robert Sandilands <rsa...@ne...> wrote:
> mfsmaster runs on the one chunkserver. The second chunkserver is a dedicated
> chunkserver. The third chunkserver also runs mfsmetalogger. The second
> chunkserver only has 2.5 million of the 96 million chunks so it is not
> contributing much yet.
>
> On the master:
>
> The metadata is written on a SATA RAID1 volume. The chunks are stored on a
> storage array that is connected via SAS. The only activity on the SATA
> volume is the OS, metadata and local syslog logging. There is a second SAS
> array that is used to stage files for deduplication. Part of the
> deduplication process also moves it to the MooseFS volume. The server is a
> dual quad-core 2 GHz Xeon and the average load is generally less than 5. The
> deduplication uses a local mfsmount but is the only user of the mount.

Although it seems this box should be able to handle the load with no problem, the obvious next step in stabilizing your cluster is to move the mfsmaster onto a box dedicated to the mfsmaster process.

It also seems this would be a golden opportunity for the developers to take a look at your box and see why you are getting the client disconnects. If they could figure it out and tweak the code for your box, it would make their own cluster that much more stable.

Elliot
|
From: Stas O. <sta...@gm...> - 2011-08-12 15:40:26
|
Hi.

> Absolutely. But there is no metadata to clean, since this is a new empty
> disk : mfsmaster will just add it to its pool.

So how to remove the old, "Damaged" disk? Or mfsmaster will just replace it with the new one?

> A damaged disk is no more used by MooseFS, so the chunks hosted on it are
> considered as lost (and will be replicated somewhere else, if the goal asks
> for it).

How soon the replication happens - the moment the disk declared as damaged? If I don't have any replicates stats below the goal, this means the replication is done?

Thanks again.
|
From: Fabien G. <fab...@gm...> - 2011-08-11 13:53:16
|
Hello,

On Thu, Aug 11, 2011 at 8:39 AM, Stas Oskin <sta...@gm...> wrote:
> One of our disks has died, and we had to replace it.
>
> Now I see in MFS CGI the status "damaged" for the disk. Once we bring the
> new disk online, and initialize the chunkserver directory on it, will MFS
> recognize it, clear out any metadata for this disk and start working with it
> normally?

Absolutely. But there is no metadata to clean, since this is a new empty disk: mfsmaster will just add it to its pool.

> Also, about the data placed on disk - does MFS starts replicating it in
> order to keep the goal value, once the disk becomes in damaged state, or/and
> becomes unavailable?

A damaged disk is no more used by MooseFS, so the chunks hosted on it are considered as lost (and will be replicated somewhere else, if the goal asks for it).

Fabien
|
From: Stas O. <sta...@gm...> - 2011-08-11 06:40:34
|
Hi. Our syslogs are full of "testing chunk" messages. Can we reduce the amount of logging to warnings and errors only? Regards. |
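If your MooseFS version offers no knob for this, one workaround — purely an illustration, and it assumes the machines run rsyslog rather than plain syslogd — is to drop the chunk-test chatter at the syslog layer while keeping everything else:

# /etc/rsyslog.conf (or a snippet under /etc/rsyslog.d/), placed before the catch-all rules
:msg, contains, "testing chunk" ~

The '~' action discards matching messages (newer rsyslog versions spell it 'stop'); warnings and errors from the MooseFS daemons still reach the normal log files. The trade-off is that the successful chunk-test records disappear too, so watch the CGI interface for disk errors instead.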
|
From: Stas O. <sta...@gm...> - 2011-08-11 06:39:41
|
Hi. One of our disks has died, and we had to replace it. Now I see in MFS CGI the status "damaged" for the disk. Once we bring the new disk online, and initialize the chunkserver directory on it, will MFS recognize it, clear out any metadata for this disk and start working with it normally? Also, about the data placed on disk - does MFS starts replicating it in order to keep the goal value, once the disk becomes in damaged state, or/and becomes unavailable? Regards. |
|
From: Robert S. <rsa...@ne...> - 2011-08-11 03:11:57
|
These logs were from a machine that is only running mfsmount and Apache. Load is generally 10+ with I/O wait in the 40-90% range. It has 4 cores and 8 GB of RAM. It is in a DNS round-robin pool with 4 other similar machines. MooseFS is mounted in fstab using the following command:

mfsmount /srv/mfs fuse mfsmaster=mfsmaster,mfsioretries=300,mfsattrcacheto=60,mfsdirentrycacheto=60,mfsentrycacheto=30,_netdev 0 0

Apache has sendfile disabled. The total amount of data transferred through the 5 mfsmounts is slightly more than 1 TB per day. It sounds impressive but it really is only around 13 MB/s. It is extremely rare for the same file to be downloaded twice in a day. Caching folders and their attributes is potentially useful. Caching files is not.

mfsmaster runs on the one chunkserver. The second chunkserver is a dedicated chunkserver. The third chunkserver also runs mfsmetalogger. The second chunkserver only has 2.5 million of the 96 million chunks so it is not contributing much yet.

On the master:

The metadata is written on a SATA RAID1 volume. The chunks are stored on a storage array that is connected via SAS. The only activity on the SATA volume is the OS, metadata and local syslog logging. There is a second SAS array that is used to stage files for deduplication. Part of the deduplication process also moves it to the MooseFS volume. The server is a dual quad-core 2 GHz Xeon and the average load is generally less than 5. The deduplication uses a local mfsmount but is the only user of the mount.

Here are the matching logs from the master:

Aug 10 22:03:30 mfsmaster mfsmaster[xxxxx]: connection with client(ip:xxx.xxx.xxx.65) has been closed by peer
Aug 10 22:03:30 mfsmaster mfsmaster[xxxxx]: connection with client(ip:xxx.xxx.xxx.102) has been closed by peer
Aug 10 22:03:30 mfsmaster mfsmaster[xxxxx]: connection with client(ip:xxx.xxx.xxx.14) has been closed by peer
Aug 10 22:03:30 mfsmaster mfsmaster[xxxxx]: connection with client(ip:xxx.xxx.xxx.14) has been closed by peer
Aug 10 22:03:30 mfsmaster mfsmaster[xxxxx]: connection with client(ip:xxx.xxx.xxx.102) has been closed by peer
Aug 10 22:03:30 mfsmaster mfsmaster[xxxxx]: connection with client(ip:xxx.xxx.xxx.65) has been closed by peer
Aug 10 22:03:39 mfsmaster mfsmaster[xxxxx]: connection with client(ip:xxx.xxx.xxx.14) has been closed by peer
Aug 10 22:03:41 mfsmaster mfsmaster[xxxxx]: connection with client(ip:xxx.xxx.xxx.102) has been closed by peer
Aug 10 22:03:41 mfsmaster mfsmaster[xxxxx]: connection with client(ip:xxx.xxx.xxx.65) has been closed by peer
Aug 10 22:03:41 mfsmaster mfsmaster[xxxxx]: connection with client(ip:xxx.xxx.xxx.14) has been closed by peer
Aug 10 22:03:41 mfsmaster mfsmaster[xxxxx]: connection with client(ip:xxx.xxx.xxx.102) has been closed by peer
Aug 10 22:03:41 mfsmaster mfsmaster[xxxxx]: connection with client(ip:xxx.xxx.xxx.65) has been closed by peer
Aug 10 22:03:41 mfsmaster mfsmaster[xxxxx]: connection with client(ip:xxx.xxx.xxx.14) has been closed by peer
Aug 10 22:03:41 mfsmaster mfsmaster[xxxxx]: connection with client(ip:xxx.xxx.xxx.102) has been closed by peer
Aug 10 22:03:41 mfsmaster mfsmaster[xxxxx]: connection with client(ip:xxx.xxx.xxx.65) has been closed by peer

Robert

On 8/10/11 11:56 AM, Elliot Finley wrote: > On Tue, Aug 9, 2011 at 6:46 PM, Robert Sandilands<rsa...@ne...> wrote: >> Increasing the swap space fixed the fork() issue. It seems that you have to >> ensure that memory available is always double the memory needed by >> mfsmaster. None of the swap space was used over the last 24 hours. >> >> This did solve the extreme comb-like behavior of mfsmaster. It still does >> not resolve its sensitivity to load on the server. I am still seeing >> timeouts on the chunkservers and mounts on the hour due to the high CPU and >> I/O load when the meta data is dumped to disk. It did however decrease >> significantly. >> >> An example from the logs: >> >> Aug 9 04:03:38 http-lb-1 mfsmount[13288]: master: tcp recv error: ETIMEDOUT >> (Operation timed out) (1) >> Aug 9 04:03:39 http-lb-1 mfsmount[13288]: master: register error (read >> header: ETIMEDOUT (Operation timed out)) >> Aug 9 04:03:41 http-lb-1 mfsmount[13288]: registered to master > Are you using this server as a combination mfsmaster/chunkserver/mfsclient? > > If so, is the metadata being written to a spindle(s) that are separate > from what the chunkserver is using? > > How is this box laid out? > > Elliot
|
From: Elliot F. <efi...@gm...> - 2011-08-10 15:56:44
|
On Tue, Aug 9, 2011 at 6:46 PM, Robert Sandilands <rsa...@ne...> wrote: > Increasing the swap space fixed the fork() issue. It seems that you have to > ensure that memory available is always double the memory needed by > mfsmaster. None of the swap space was used over the last 24 hours. > > This did solve the extreme comb-like behavior of mfsmaster. It still does > not resolve its sensitivity to load on the server. I am still seeing > timeouts on the chunkservers and mounts on the hour due to the high CPU and > I/O load when the meta data is dumped to disk. It did however decrease > significantly. > > An example from the logs: > > Aug 9 04:03:38 http-lb-1 mfsmount[13288]: master: tcp recv error: ETIMEDOUT > (Operation timed out) (1) > Aug 9 04:03:39 http-lb-1 mfsmount[13288]: master: register error (read > header: ETIMEDOUT (Operation timed out)) > Aug 9 04:03:41 http-lb-1 mfsmount[13288]: registered to master Are you using this server as a combination mfsmaster/chunkserver/mfsclient? If so, is the metadata being written to a spindle(s) that are separate from what the chunkserver is using? How is this box laid out? Elliot |
|
From: Robert S. <rsa...@ne...> - 2011-08-10 13:12:04
|
Hi Laurent, Due to the use of ktune a lot of values are already tweaked. For example file-max. I don't have iptables loaded as I measured at some stage that conntrack was -really- slow with large numbers of connections. I am not seeing gc_threshold related log messages but I can't see any reason not to tweak that. Robert On 8/10/11 2:20 AM, Laurent Wandrebeck wrote: > On Tue, 09 Aug 2011 20:46:45 -0400 > Robert Sandilands<rsa...@ne...> wrote: > >> Increasing the swap space fixed the fork() issue. It seems that you have >> to ensure that memory available is always double the memory needed by >> mfsmaster. None of the swap space was used over the last 24 hours. >> >> This did solve the extreme comb-like behavior of mfsmaster. It still >> does not resolve its sensitivity to load on the server. I am still >> seeing timeouts on the chunkservers and mounts on the hour due to the >> high CPU and I/O load when the meta data is dumped to disk. It did >> however decrease significantly. >> >> An example from the logs: >> >> Aug 9 04:03:38 http-lb-1 mfsmount[13288]: master: tcp recv error: >> ETIMEDOUT (Operation timed out) (1) >> Aug 9 04:03:39 http-lb-1 mfsmount[13288]: master: register error (read >> header: ETIMEDOUT (Operation timed out)) >> Aug 9 04:03:41 http-lb-1 mfsmount[13288]: registered to master > Hi, > what if you apply these tweaks to ip stack on master/CS/metaloggers ? > # to avoid problems with heavily loaded servers > echo 16000> /proc/sys/fs/file-max > echo 100000> /proc/sys/net/ipv4/ip_conntrack_max > > # to avoid Neighbour table overflow > echo "512"> /proc/sys/net/ipv4/neigh/default/gc_thresh1 > echo "2048"> /proc/sys/net/ipv4/neigh/default/gc_thresh2 > echo "4048"> /proc/sys/net/ipv4/neigh/default/gc_thresh3 > > No need to restart anything, these can be applied on the fly without > disturbing services. > HTH, > > > ------------------------------------------------------------------------------ > uberSVN's rich system and user administration capabilities and model > configuration take the hassle out of deploying and managing Subversion and > the tools developers use with it. Learn more about uberSVN and get a free > download at: http://p.sf.net/sfu/wandisco-dev2dev > > > _______________________________________________ > moosefs-users mailing list > moo...@li... > https://lists.sourceforge.net/lists/listinfo/moosefs-users |
|
From: Laurent W. <lw...@hy...> - 2011-08-10 11:43:40
|
On Tue, 09 Aug 2011 20:46:45 -0400 Robert Sandilands <rsa...@ne...> wrote:
> Increasing the swap space fixed the fork() issue. It seems that you have
> to ensure that memory available is always double the memory needed by
> mfsmaster. None of the swap space was used over the last 24 hours.
>
> This did solve the extreme comb-like behavior of mfsmaster. It still
> does not resolve its sensitivity to load on the server. I am still
> seeing timeouts on the chunkservers and mounts on the hour due to the
> high CPU and I/O load when the meta data is dumped to disk. It did
> however decrease significantly.
>
> An example from the logs:
>
> Aug 9 04:03:38 http-lb-1 mfsmount[13288]: master: tcp recv error:
> ETIMEDOUT (Operation timed out) (1)
> Aug 9 04:03:39 http-lb-1 mfsmount[13288]: master: register error (read
> header: ETIMEDOUT (Operation timed out))
> Aug 9 04:03:41 http-lb-1 mfsmount[13288]: registered to master

Hi,

what if you apply these tweaks to ip stack on master/CS/metaloggers?

# to avoid problems with heavily loaded servers
echo 16000 > /proc/sys/fs/file-max
echo 100000 > /proc/sys/net/ipv4/ip_conntrack_max

# to avoid Neighbour table overflow
echo "512" > /proc/sys/net/ipv4/neigh/default/gc_thresh1
echo "2048" > /proc/sys/net/ipv4/neigh/default/gc_thresh2
echo "4048" > /proc/sys/net/ipv4/neigh/default/gc_thresh3

No need to restart anything, these can be applied on the fly without disturbing services.

HTH,
--
Laurent Wandrebeck
HYGEOS, Earth Observation Department / Observation de la Terre
Euratechnologies
165 Avenue de Bretagne
59000 Lille, France
tel: +33 3 20 08 24 98
http://www.hygeos.com
GPG fingerprint/Empreinte GPG: F5CA 37A4 6D03 A90C 7A1D 2A62 54E6 EF2C D17C F64C
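If these tweaks help, the same values can be made permanent in /etc/sysctl.conf so they survive a reboot — a sketch using the sysctl names that correspond to the /proc paths above (the conntrack key in particular differs between kernel versions, so check which one your kernel actually exposes):

fs.file-max = 16000
net.ipv4.ip_conntrack_max = 100000
net.ipv4.neigh.default.gc_thresh1 = 512
net.ipv4.neigh.default.gc_thresh2 = 2048
net.ipv4.neigh.default.gc_thresh3 = 4048

Running sysctl -p then applies them immediately, with the same no-restart behaviour Laurent describes.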
|
From: Robert S. <rsa...@ne...> - 2011-08-10 00:46:59
|
Increasing the swap space fixed the fork() issue. It seems that you have to ensure that memory available is always double the memory needed by mfsmaster. None of the swap space was used over the last 24 hours.

This did solve the extreme comb-like behavior of mfsmaster. It still does not resolve its sensitivity to load on the server. I am still seeing timeouts on the chunkservers and mounts on the hour due to the high CPU and I/O load when the meta data is dumped to disk. It did however decrease significantly.

An example from the logs:

Aug 9 04:03:38 http-lb-1 mfsmount[13288]: master: tcp recv error: ETIMEDOUT (Operation timed out) (1)
Aug 9 04:03:39 http-lb-1 mfsmount[13288]: master: register error (read header: ETIMEDOUT (Operation timed out))
Aug 9 04:03:41 http-lb-1 mfsmount[13288]: registered to master

Robert

On 8/9/11 12:39 PM, Elliot Finley wrote:
> On Tue, Aug 9, 2011 at 5:50 AM, Robert Sandilands<rsa...@ne...> wrote:
>> On 8/9/11 12:56 AM, Elliot Finley wrote:
>>> Just out of curiosity, what OS are you using?
>>>
>>> Elliot
>> Linux (Centos 5.6 64-bit).
>>
>> Robert
>>
> If/when you get the fork working, please let us (the list) know what it took.
>
> Elliot
|
From: Elliot F. <efi...@gm...> - 2011-08-09 16:39:58
|
On Tue, Aug 9, 2011 at 5:50 AM, Robert Sandilands <rsa...@ne...> wrote: > On 8/9/11 12:56 AM, Elliot Finley wrote: >> >> Just out of curiosity, what OS are you using? >> >> Elliot > > Linux (Centos 5.6 64-bit). > > Robert > If/when you get the fork working, please let us (the list) know what it took. Elliot |
|
From: Robert S. <rsa...@ne...> - 2011-08-09 12:11:51
|
On 8/9/11 12:56 AM, Elliot Finley wrote:
> On Mon, Aug 8, 2011 at 5:24 PM, Robert Sandilands<rsa...@ne...> wrote:
>> When I run a strace() on mfsmaster on the hour I get the following:
>>
>> rename("changelog.1.mfs", "changelog.2.mfs") = 0
>> rename("changelog.0.mfs", "changelog.1.mfs") = 0
>> clone(child_stack=0, flags=CLONE_CHILD_CLEARTID|CLONE_CHILD_SETTID|SIGCHLD,
>> child_tidptr=0x2b571b910b80) = -1 ENOMEM (Cannot allocate memory)
>> rename("metadata.mfs.back", "metadata.mfs.back.tmp") = 0
>> open("metadata.mfs.back", O_WRONLY|O_CREAT|O_TRUNC, 0666) = 11
>>
>> This indicates fork() is failing with an out of memory error. The system has
>> 11 GB cached and 300 MB free. It only has 6 GB of swap. This indicates that
>> clone() ( also known as fork() ) may be trying to test whether the whole
>> process will fit into memory when cloned. Which implies that the memory
>> requirement is actually double than what is commonly believed.
> Just out of curiosity, what OS are you using?
>
> Elliot
Linux (Centos 5.6 64-bit).
Robert
|
|
From: Elliot F. <efi...@gm...> - 2011-08-09 04:56:20
|
On Mon, Aug 8, 2011 at 5:24 PM, Robert Sandilands <rsa...@ne...> wrote:
> When I run a strace() on mfsmaster on the hour I get the following:
>
> rename("changelog.1.mfs", "changelog.2.mfs") = 0
> rename("changelog.0.mfs", "changelog.1.mfs") = 0
> clone(child_stack=0, flags=CLONE_CHILD_CLEARTID|CLONE_CHILD_SETTID|SIGCHLD,
> child_tidptr=0x2b571b910b80) = -1 ENOMEM (Cannot allocate memory)
> rename("metadata.mfs.back", "metadata.mfs.back.tmp") = 0
> open("metadata.mfs.back", O_WRONLY|O_CREAT|O_TRUNC, 0666) = 11
>
> This indicates fork() is failing with an out of memory error. The system has
> 11 GB cached and 300 MB free. It only has 6 GB of swap. This indicates that
> clone() ( also known as fork() ) may be trying to test whether the whole
> process will fit into memory when cloned. Which implies that the memory
> requirement is actually double than what is commonly believed.
Just out of curiosity, what OS are you using?
Elliot
|
|
From: Robert S. <rsa...@ne...> - 2011-08-08 23:25:10
|
When I run a strace() on mfsmaster on the hour I get the following:
rename("changelog.1.mfs", "changelog.2.mfs") = 0
rename("changelog.0.mfs", "changelog.1.mfs") = 0
clone(child_stack=0,
flags=CLONE_CHILD_CLEARTID|CLONE_CHILD_SETTID|SIGCHLD,
child_tidptr=0x2b571b910b80) = -1 ENOMEM (Cannot allocate memory)
rename("metadata.mfs.back", "metadata.mfs.back.tmp") = 0
open("metadata.mfs.back", O_WRONLY|O_CREAT|O_TRUNC, 0666) = 11
This indicates fork() is failing with an out of memory error. The system
has 11 GB cached and 300 MB free. It only has 6 GB of swap. This
indicates that clone() ( also known as fork() ) may be trying to test
whether the whole process will fit into memory when cloned. Which
implies that the memory requirement is actually double than what is
commonly believed.
I can probably increase swap to make it happy, but that has its own set
of issues and is unlikely to solve much as it will be a similar
situation if mfsmaster starts swapping. Although in theory mfsmaster
should not start swapping as a very low percentage of the forked process
will actually be different than the original one. I am testing my theory ;-)
Robert
On 8/8/11 6:36 PM, Robert Sandilands wrote:
> Or I can log into the system on the hour and see if two processes
> named mfsmaster exists. In my case it does not which may indicate that
> fork() is failing.
>
> Running strace on the single instance of mfsmaster also indicates it
> is busy writing to a file and I can see the the following files:
>
> -rw-r----- 1 daemon daemon 11G Aug 8 18:02 metadata.mfs.back
> -rw-r----- 1 daemon daemon 11G Aug 8 17:02 metadata.mfs.back.tmp
>
> metadata.mfs.back.tmp was deleted several seconds later.
>
> iostat -x also indicates 100% utilization on the volume where the
> meta-data is stored with a very high number of writes.
>
> This leaves me with:
>
> 1. Get a faster disk for doing the metadata backups on (SSD?)
> 2. Figure out why fork() is failing
>
> mfsmaster is the only process using more than 5 GB of RAM on the
> machine (32.6 GB). mfschunkserver uses 4.8 GB. No processes seems to
> be locking any significant amount of memory. The number of processes
> created per second < 1. The machine has 64 GB of RAM.
>
> Robert
>
> On 8/8/11 3:46 PM, Elliot Finley wrote:
>> On Mon, Aug 8, 2011 at 1:33 PM, Elliot
>> Finley<efi...@gm...> wrote:
>>> Attached is a patch for filesystem.c that will indicate in your log
>>> file whether or not the fork was successful. I'd be curious to see
>>> the results.
>> Sorry, that last patch has a small problem, attached is the correct one.
>>
>> Elliot
>
>
|
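One note on the ENOMEM theory above: fork()/clone() on Linux is copy-on-write, so the child never really needs a second copy of the whole mfsmaster image, but whether the call is allowed is decided up front by the kernel's overcommit policy. With /proc/sys/vm/overcommit_memory at 0 (heuristic) or 2 (strict accounting), cloning a process whose anonymous memory is close to free RAM + swap can be refused outright — which matches the "needs double the memory" behaviour seen here. Two mitigations to consider (general Linux advice to verify on your own kernel, not something taken from the MooseFS documentation):

# check the current policy: 0 = heuristic, 1 = always allow, 2 = strict accounting
cat /proc/sys/vm/overcommit_memory
# let the copy-on-write fork of a large process succeed
echo 1 > /proc/sys/vm/overcommit_memory

or, as Robert reports elsewhere in the thread, add enough swap that the accounting check passes; the swap only provides headroom for the calculation and normally stays unused.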
|
From: Robert S. <rsa...@ne...> - 2011-08-08 22:37:24
|
Or I can log into the system on the hour and see if two processes named mfsmaster exist. In my case it does not, which may indicate that fork() is failing.

Running strace on the single instance of mfsmaster also indicates it is busy writing to a file and I can see the following files:

-rw-r----- 1 daemon daemon 11G Aug 8 18:02 metadata.mfs.back
-rw-r----- 1 daemon daemon 11G Aug 8 17:02 metadata.mfs.back.tmp

metadata.mfs.back.tmp was deleted several seconds later.

iostat -x also indicates 100% utilization on the volume where the meta-data is stored with a very high number of writes.

This leaves me with:

1. Get a faster disk for doing the metadata backups on (SSD?)
2. Figure out why fork() is failing

mfsmaster is the only process using more than 5 GB of RAM on the machine (32.6 GB). mfschunkserver uses 4.8 GB. No processes seem to be locking any significant amount of memory. The number of processes created per second < 1. The machine has 64 GB of RAM.

Robert

On 8/8/11 3:46 PM, Elliot Finley wrote:
> On Mon, Aug 8, 2011 at 1:33 PM, Elliot Finley<efi...@gm...> wrote:
>> Attached is a patch for filesystem.c that will indicate in your log
>> file whether or not the fork was successful. I'd be curious to see
>> the results.
> Sorry, that last patch has a small problem, attached is the correct one.
>
> Elliot
|
From: Fyodor U. <uf...@uf...> - 2011-08-08 21:53:10
|
On 08/08/2011 08:31 PM, Elliot Finley wrote:
> +1000
>
> Stability over everything when it comes to a production storage cluster.
>
> On Mon, Aug 8, 2011 at 12:38 AM, Michal Borychowski
> <mic...@ge...> wrote:
>> Hi!
>>
>> Maybe not stagnation but thorough beta testing... There was one version
>> which we decided not to publish because it was buggy. And now we test the
>> next version. We like to be sure that what we publish is really stable.
>>
>>
>> Kind regards
>> -Michal
Maybe. But there is absolutely no information to understand what to do.
Install the existing version? And, maybe, upgrade to new after week or two?
Wait for the next version? How long? What is new is planned in the new
version? Maybe removing the single point of failure called mfsmaster?
Please, slightly open the veil of secrecy.
WBR,
Fyodor.
|
|
From: Elliot F. <efi...@gm...> - 2011-08-08 19:46:11
|
On Mon, Aug 8, 2011 at 1:33 PM, Elliot Finley <efi...@gm...> wrote: > Attached is a patch for filesystem.c that will indicate in your log > file whether or not the fork was successful. I'd be curious to see > the results. Sorry, that last patch has a small problem, attached is the correct one. Elliot |
|
From: Elliot F. <efi...@gm...> - 2011-08-08 19:33:43
|
On Mon, Aug 8, 2011 at 6:52 AM, Robert Sandilands <rsa...@ne...> wrote:
> Hi Michal,
>
> With a 2 GHz Xeon I am seeing scaling problems when you approach 94
> million files. I had another crash this weekend and had to increase
> timeouts yet again. At this stage the master is unresponsive for at
> least 5 minutes every hour. The graphs in the CGI look like a comb with
> 0 activity on the hour every hour for about 5 minutes. That is except
> for CPU usage on the master which spikes to 100% for the same period. We
> did see an increase in performance and stability when we moved some
> tasks from the master server to other machines but at this stage we
> can't move more tasks off the master without buying more hardware.
> During the time of 0 activity we see read and write timeouts and the
> filesystem is completely unresponsive to users.
>
> I am convinced that part of the scalability issue is related to the fact
> that everything is single threaded and that any single task that can
> take a long time has the potential to cause problems affecting
> scalability and stability.

Robert,

Metadata access is single threaded, but at the top of every hour when the metadata is stored, the mfsmaster process is essentially dual-threaded (or more accurately dual-processed). The process forks (or at least tries to) and the metadata is stored in a background process allowing the main process to continue to serve requests.

If you only have a single core on your master, then obviously both processes will have to use it and thus it will spike every hour when the metadata is stored, but it should still continue to serve requests. If the 'fork' doesn't happen for any reason then the mfsmaster will stop serving requests and store the metadata, thus pausing all clients regardless of how many cores you have. And finally, if you have multiple cores and the fork works, you *should* be able to store the metadata and continue to serve client requests without a noticeable delay.

Attached is a patch for filesystem.c that will indicate in your log file whether or not the fork was successful. I'd be curious to see the results.

Elliot
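For readers following along, here is a minimal, self-contained C sketch of the fork-and-dump pattern Elliot describes. It is not the MooseFS code — the function names and file handling are simplified stand-ins — but it shows why a successful fork keeps the main process responsive while a failed fork stalls every client:

#include <stdio.h>
#include <unistd.h>
#include <sys/types.h>
#include <sys/wait.h>

/* hypothetical stand-in for serializing the in-memory metadata */
static int dump_metadata(const char *path)
{
    char tmp[4096];
    FILE *f;

    snprintf(tmp, sizeof(tmp), "%s.tmp", path);
    f = fopen(tmp, "w");
    if (f == NULL)
        return -1;
    fprintf(f, "metadata snapshot\n");   /* the real dump is gigabytes of tree state */
    if (fclose(f) != 0)
        return -1;
    return rename(tmp, path);            /* atomically replace the previous dump */
}

static void hourly_store(const char *path)
{
    pid_t pid = fork();

    if (pid == 0) {
        /* child: shares the parent's pages copy-on-write; all the slow
           disk I/O happens here, so the parent keeps serving clients */
        _exit(dump_metadata(path) == 0 ? 0 : 1);
    }
    if (pid < 0) {
        /* fork failed (e.g. ENOMEM): dump in the foreground instead,
           which blocks every client for the duration of the write */
        perror("fork");
        dump_metadata(path);
        return;
    }
    /* parent: return immediately and go back to the poll loop; the child
       is reaped later, e.g. with waitpid(-1, NULL, WNOHANG) */
}

int main(void)
{
    hourly_store("metadata.mfs.back");
    while (waitpid(-1, NULL, 0) > 0)     /* toy main(): just wait for the child */
        ;
    return 0;
}

The design point is simply that the expensive write happens in a copy-on-write clone of the address space, so the main process never pauses unless the clone itself cannot be created.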
|
From: Elliot F. <efi...@gm...> - 2011-08-08 17:31:47
|
+1000 Stability over everything when it comes to a production storage cluster. On Mon, Aug 8, 2011 at 12:38 AM, Michal Borychowski <mic...@ge...> wrote: > Hi! > > Maybe not stagnation but thorough beta testing... There was one version > which we decided not to publish because it was buggy. And now we test the > next version. We like to be sure that what we publish is really stable. > > > Kind regards > -Michal > > -----Original Message----- > From: Steve [mailto:st...@bo...] > Sent: Friday, August 05, 2011 3:12 PM > To: moo...@li...; Fyodor Ustinov > Subject: Re: [Moosefs-users] Stagnation? > > > Therefore hopefully we are waiting for a major revision ? > > > > > > > > > > -------Original Message------- > > > > From: Fyodor Ustinov > > Date: 05/08/2011 12:40:14 > > To: moo...@li... > > Subject: [Moosefs-users] Stagnation? > > > > Hi! > > > > In 2010 it was released 7 versions. > > In 2011 - only one, 7 month ago. > > No changes in public git. > > > > WBR, > > Fyodor. > > > > ---------------------------------------------------------------------------- > - > > > BlackBerry® DevCon Americas, Oct. 18-20, San Francisco, CA > > The must-attend event for mobile developers. Connect with experts. > > Get tools for creating Super Apps. See the latest technologies. > > Sessions, hands-on labs, demos & much more. Register early & save! > > http://p.sf.net/sfu/rim-blackberry-1 > > _______________________________________________ > > moosefs-users mailing list > > moo...@li... > > https://lists.sourceforge.net/lists/listinfo/moosefs-users > > > > ---------------------------------------------------------------------------- > -- > BlackBerry® DevCon Americas, Oct. 18-20, San Francisco, CA > The must-attend event for mobile developers. Connect with experts. > Get tools for creating Super Apps. See the latest technologies. > Sessions, hands-on labs, demos & much more. Register early & save! > http://p.sf.net/sfu/rim-blackberry-1 > _______________________________________________ > moosefs-users mailing list > moo...@li... > https://lists.sourceforge.net/lists/listinfo/moosefs-users > > > ------------------------------------------------------------------------------ > BlackBerry® DevCon Americas, Oct. 18-20, San Francisco, CA > The must-attend event for mobile developers. Connect with experts. > Get tools for creating Super Apps. See the latest technologies. > Sessions, hands-on labs, demos & much more. Register early & save! > http://p.sf.net/sfu/rim-blackberry-1 > _______________________________________________ > moosefs-users mailing list > moo...@li... > https://lists.sourceforge.net/lists/listinfo/moosefs-users > |
|
From: Robert S. <rsa...@ne...> - 2011-08-08 12:53:15
|
Hi Michal,

The paper is an interesting read. I think that the growth of technology is however making it impractical. Both Intel and AMD are working on 20+ core CPU's and there are some non-x86 64-core systems available today. These systems focus on slower cores but many of them. For systems to be able to scale in the future they need to be able to effectively use the hardware that will be available then. Unfortunately (according to the paper) the paradigm that is winning on both the hardware and software fronts are threads. Does threads have problems? Yes, but it may be slightly less problematic than pointers ;-)

With a 2 GHz Xeon I am seeing scaling problems when you approach 94 million files. I had another crash this weekend and had to increase timeouts yet again. At this stage the master is unresponsive for at least 5 minutes every hour. The graphs in the CGI look like a comb with 0 activity on the hour every hour for about 5 minutes. That is except for CPU usage on the master which spikes to 100% for the same period. We did see an increase in performance and stability when we moved some tasks from the master server to other machines but at this stage we can't move more tasks off the master without buying more hardware. During the time of 0 activity we see read and write timeouts and the filesystem is completely unresponsive to users.

I am convinced that part of the scalability issue is related to the fact that everything is single threaded and that any single task that can take a long time has the potential to cause problems affecting scalability and stability.

We still have another approximately 16 TB to move to MooseFS so I do expect us to easily pass the 100 million file mark. As we are deduplicating the files as we move them it is hard to predict how much space/files it will be when we are done. We are also adding more than 4 million files and 2 TB per month (before deduplication).

Robert

On 8/8/11 2:52 AM, Michal Borychowski wrote: > Hi Robert > > I wrote shortly about multithreading in mfsmaster here: > http://sourceforge.net/mailarchive/message.php?msg_id=26680860 > > So no, it is not on our roadmap. > > And yes, performance of MooseFS is dependent on the performance of > mfsmaster. CPU load depends on amount of operations in the filesystem. In > our environment the master server consumes about 30% of CPU (ca. 1500 > operations per second). HDD doesn't have to be huge, but still it should be > quick for dumps of metadata and continuous saving of changelogs. > > Rough estimate how much RAM you need is here: > http://www.moosefs.org/moosefs-faq.html#sort > > And to be honest metalogger machines should be as good as the master itself > because in case of emergency metalogger should be switched to the role of > the master. > > > Kind regards > -Michal > > > -----Original Message----- > From: Robert Sandilands [mailto:rsa...@ne...] > Sent: Thursday, August 04, 2011 2:42 AM > To: moo...@li... > Subject: [Moosefs-users] mfsmaster performance and hardware > > We have been spending a lot of time trying to get MooseFS stable and > optimized. > > Something I have noticed is that mfsmaster seems to be a bottleneck in > our setup. What I also noticed is that mfsmaster is single threaded. > From reading the source code it seems to use a very interesting polling > loop to handle all communications and actions. > > So a question: Is there anything on the roadmap to make mfsmaster > multithreaded? > > It also seems that the performance of MooseFS is very dependent on the > performance of mfsmaster. If the machine running mfsmaster is slow or is > busy then it can slow everything down significantly or even cause > instability in the file system. > > This also implies that if you want to buy a dedicated machine for > mfsmaster that you have to buy the fastest possible CPU and as much RAM > as you need. Local disk space and multiple CPUs and cores are not > important. Is this correct? What would the recommendation be for an > optimal machine to run mfsmaster? > > Robert > > > ---------------------------------------------------------------------------- > -- > BlackBerry® DevCon Americas, Oct. 18-20, San Francisco, CA > The must-attend event for mobile developers. Connect with experts. > Get tools for creating Super Apps. See the latest technologies. > Sessions, hands-on labs, demos& much more. Register early& save! > http://p.sf.net/sfu/rim-blackberry-1 > _______________________________________________ > moosefs-users mailing list > moo...@li... > https://lists.sourceforge.net/lists/listinfo/moosefs-users >
|
From: Michal B. <mic...@ge...> - 2011-08-08 06:53:20
|
Hi Robert I wrote shortly about multithreading in mfsmaster here: http://sourceforge.net/mailarchive/message.php?msg_id=26680860 So no, it is not on our roadmap. And yes, performance of MooseFS is dependent on the performance of mfsmaster. CPU load depends on amount of operations in the filesystem. In our environment the master server consumes about 30% of CPU (ca. 1500 operations per second). HDD doesn't have to be huge, but still it should be quick for dumps of metadata and continuous saving of changelogs. Rough estimate how much RAM you need is here: http://www.moosefs.org/moosefs-faq.html#sort And to be honest metalogger machines should be as good as the master itself because in case of emergency metalogger should be switched to the role of the master. Kind regards -Michal -----Original Message----- From: Robert Sandilands [mailto:rsa...@ne...] Sent: Thursday, August 04, 2011 2:42 AM To: moo...@li... Subject: [Moosefs-users] mfsmaster performance and hardware We have been spending a lot of time trying to get MooseFS stable and optimized. Something I have noticed is that mfsmaster seems to be a bottleneck in our setup. What I also noticed is that mfsmaster is single threaded. From reading the source code it seems to use a very interesting polling loop to handle all communications and actions. So a question: Is there anything on the roadmap to make mfsmaster multithreaded? It also seems that the performance of MooseFS is very dependent on the performance of mfsmaster. If the machine running mfsmaster is slow or is busy then it can slow everything down significantly or even cause instability in the file system. This also implies that if you want to buy a dedicated machine for mfsmaster that you have to buy the fastest possible CPU and as much RAM as you need. Local disk space and multiple CPUs and cores are not important. Is this correct? What would the recommendation be for an optimal machine to run mfsmaster? Robert ---------------------------------------------------------------------------- -- BlackBerry® DevCon Americas, Oct. 18-20, San Francisco, CA The must-attend event for mobile developers. Connect with experts. Get tools for creating Super Apps. See the latest technologies. Sessions, hands-on labs, demos & much more. Register early & save! http://p.sf.net/sfu/rim-blackberry-1 _______________________________________________ moosefs-users mailing list moo...@li... https://lists.sourceforge.net/lists/listinfo/moosefs-users |
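A rough cross-check against the numbers reported elsewhere in this thread: Robert's mfsmaster sits at about 32.6 GB of RAM for roughly 94 million files (and about 96 million chunks), which works out to roughly 350 bytes of master memory per file once its chunk records are counted. Under that environment-specific ratio, a back-of-the-envelope sizing is simply files-in-millions × ~0.35 GB, so the 100+ million files Robert is heading towards implies well over 35 GB of RAM for mfsmaster alone — and, per Michal's point above, the same headroom on any metalogger that might be promoted to master.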
|
From: Michal B. <mic...@ge...> - 2011-08-08 06:38:58
|
Hi! Maybe not stagnation but thorough beta testing... There was one version which we decided not to publish because it was buggy. And now we test the next version. We like to be sure that what we publish is really stable. Kind regards -Michal -----Original Message----- From: Steve [mailto:st...@bo...] Sent: Friday, August 05, 2011 3:12 PM To: moo...@li...; Fyodor Ustinov Subject: Re: [Moosefs-users] Stagnation? Therefore hopefully we are waiting for a major revision ? -------Original Message------- From: Fyodor Ustinov Date: 05/08/2011 12:40:14 To: moo...@li... Subject: [Moosefs-users] Stagnation? Hi! In 2010 it was released 7 versions. In 2011 - only one, 7 month ago. No changes in public git. WBR, Fyodor. ---------------------------------------------------------------------------- - BlackBerry® DevCon Americas, Oct. 18-20, San Francisco, CA The must-attend event for mobile developers. Connect with experts. Get tools for creating Super Apps. See the latest technologies. Sessions, hands-on labs, demos & much more. Register early & save! http://p.sf.net/sfu/rim-blackberry-1 _______________________________________________ moosefs-users mailing list moo...@li... https://lists.sourceforge.net/lists/listinfo/moosefs-users ---------------------------------------------------------------------------- -- BlackBerry® DevCon Americas, Oct. 18-20, San Francisco, CA The must-attend event for mobile developers. Connect with experts. Get tools for creating Super Apps. See the latest technologies. Sessions, hands-on labs, demos & much more. Register early & save! http://p.sf.net/sfu/rim-blackberry-1 _______________________________________________ moosefs-users mailing list moo...@li... https://lists.sourceforge.net/lists/listinfo/moosefs-users |