From: Ken <ken...@gm...> - 2011-10-18 02:19:04
I'm not an expert on the MooseFS source code, but some details may help. During a write, the client communicates with the master and chunk servers like this:

    client -> master:        inode, chunk index*      # CUTOMA_FUSE_WRITE_CHUNK
    master -> client:        length, chunk id, chunk server location
    client <-> chunk server: ...
    client -> master:        chunk id, inode, length  # CUTOMA_FUSE_WRITE_CHUNK_END
    master -> client:        OK

It's a long transaction, and that is how the master learns that the file size is growing.

* chunk index: calculated from the write offset.

-Ken

On Fri, Oct 14, 2011 at 6:14 PM, A.S. nagle <a.s...@ho...> wrote:
> Dear all,
> This is nagle, a newbie to MFS. I was wondering about the implementation of the append operation in MFS. Consider this:
> 1) A client wants to create a very small file, so the master allocates 64KB for it.
> 2) The client keeps writing, many times over.
> 3) The file size reaches 64KB, but the client still wants to write much more; I call this an append operation.
> 4) In this situation the client may find that the space is used up, so it asks the master for more space. What happens then?
> Can anyone give me a hint? Thank you very much.
> Best regards,
> Nagle
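The "chunk index" above is just the write offset divided by the chunk size, so appending past the end of the current chunk simply means the next CUTOMA_FUSE_WRITE_CHUNK request carries a higher index and the master allocates a new chunk. A minimal sketch, assuming the standard 64 MiB MooseFS chunk size (the constant and function names here are illustrative, not taken from the source):

    #include <stdint.h>

    #define CHUNK_SIZE (64ULL * 1024 * 1024)   /* 64 MiB per chunk (assumed) */

    /* Map a file offset to the chunk index and the offset inside that chunk. */
    static void offset_to_chunk(uint64_t offset, uint32_t *chunk_index,
                                uint32_t *offset_in_chunk)
    {
        *chunk_index     = (uint32_t)(offset / CHUNK_SIZE);
        *offset_in_chunk = (uint32_t)(offset % CHUNK_SIZE);
    }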
From: Ken <ken...@gm...> - 2011-10-18 02:03:30
It's weird. An strace log may help to resolve it; command:

    strace -f -o logfile vhd-util create -n vhd.img -s 2048

-Ken

On Wed, Oct 12, 2011 at 3:42 PM, chen guihua <cg...@gm...> wrote:
> Hello,
>
> my configuration is as follows:
> version: 1.6.20
> master: 172.10.18.74, filesystem: ext3
> chunkserver: 172.10.18.75, filesystem: ext3, data storage dir: /data
> client: 172.10.18.61, filesystem: ext3, mount dir: /usr/local/eucalyptus
>
> I deployed a distributed system and mounted the remote dir to the local /usr/local/eucalyptus successfully, using the command "/usr/local/mfs/bin/mfsmount /usr/local/eucalyptus -H 172.10.18.74".
>
> I then found that creating a VHD-format file at /usr/local/eucalyptus fails (the file size is 0 according to "ls -lh"), using the command "vhd-util create -n vhd.img -s 2048". However, if I umount /usr/local/eucalyptus and create the VHD-format file in the same dir, it succeeds.
>
> I don't know whether MooseFS supports creating/writing/reading VHD files, and if not, how do I solve my problem of creating a VHD-format file on an MFS filesystem?
> Thanks.
From: Robert S. <rsa...@ne...> - 2011-10-17 17:43:21
Around 42 GB. mfschunkserver also seems to use about 6 GB of RAM for 32 million chunks.

Robert

On 10/17/11 10:19 AM, Elliot Finley wrote:
> 2011/10/17 Robert Sandilands <rsa...@ne...>:
>> The new master has 72 GB of RAM and it currently has 125 million files.
> Just out of curiosity (and to plan my mfsmaster upgrade), how much RAM does the mfsmaster process use for 125 million files?
>
> Thanks,
> Elliot
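As a rough rule of thumb from the numbers in this thread (ballpark figures reported by one site, not official requirements): 42 GB / 125 million files is about 340 bytes of master RAM per file, and 6 GB / 32 million chunks is about 190 bytes of chunkserver RAM per chunk. A small sizing sketch using those ratios, with made-up target counts:

    #include <stdio.h>

    int main(void)
    {
        /* Ratios derived from the figures reported above. */
        const double master_bytes_per_file = 42e9 / 125e6;   /* ~336 B/file  */
        const double cs_bytes_per_chunk    = 6e9  / 32e6;    /* ~188 B/chunk */

        const double planned_files  = 200e6;   /* hypothetical target */
        const double planned_chunks = 50e6;    /* hypothetical target */

        printf("master RAM estimate:      %.1f GB\n",
               planned_files * master_bytes_per_file / 1e9);
        printf("chunkserver RAM estimate: %.1f GB\n",
               planned_chunks * cs_bytes_per_chunk / 1e9);
        return 0;
    }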
From: Davies L. <dav...@gm...> - 2011-10-17 16:56:33
I have created one in Go: https://github.com/davies/go-mfsclient/blob/master/mfsserver.go

2011/10/17 Michał Borychowski <mic...@ge...>:
> Hi!
>
> We thought about something like this - probably we'd create a "direct" web server (not a plugin for Apache, not a plugin for nginx). This would be the quickest. But I cannot tell when.
>
> Kind regards
> Michał Borychowski
> MooseFS Support Manager, Gemius S.A.
>
> From: yezhou(Joe) [mailto:fan...@gm...]
> Sent: Sunday, October 16, 2011 9:01 AM
> To: moo...@li...
> Subject: [Moosefs-users] does moosefs have a plugin for nginx?
>
> Hi,
>
> Lots of network filesystems have plugins for nginx to gain better performance. When will moosefs consider having one?
>
> FUSE is really slow for lots of simultaneous reads or writes.
>
> Thanks!

--
- Davies
From: <leo...@ar...> - 2011-10-17 15:03:01
Greetings!

Recently I've started performance benchmarking of MooseFS.

Chunkservers:
  CPU: Intel(R) Xeon(R) CPU X3430 @ 2.40GHz
  RAM: 12G
  HDD: storage used for MFS chunks is a RAID0 of two WD 7200rpm Caviar Black disks
  OS: OpenSuSE 11.3
  Number of chunkservers: 5

Master:
  CPU: Intel(R) Xeon(R) CPU X3430 @ 2.40GHz
  OS: OpenSuSE 11.3
  RAM: 12G

Client:
  One of the chunkservers.

LAN: 1 Gbit/s.

Problem:
I've experimented a lot with bonnie, fio, iozone and others while testing other storage systems. We have people working with source code here, so we need good random I/O for small files with moderate block sizes from 8k to 128k. Comparative testing of other storage solutions involving ZFS, different hardware RAIDs, etc. showed that simply tarring and untarring the Linux kernel sources is a good indicator of how well a storage system handles that kind of work, and it always correlates with the more advanced fio and iozone tests.

Simply untarring the Linux kernel sources takes about 7 seconds on the chunkservers' storage, bypassing MooseFS, but untarring it onto the mounted MooseFS takes more than 230 seconds. Goal is set to 1. CPU load is fine on all the servers, RAM is sufficient, the network is not overloaded, and I can untar the same file in about 7 seconds to our NFS-mounted NAS. I even turned on file caching on the client.

This is all very strange... maybe FUSE is the bottleneck?
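One way to narrow this down (a sketch, not an official MooseFS tool; the path and counts are made up): kernel-source untars are dominated by per-file metadata round trips rather than raw bandwidth, so timing a plain create/write/close loop against both the local disk and the MFS mount shows whether the 7 s vs 230 s gap comes from per-file latency.

    #include <fcntl.h>
    #include <stdio.h>
    #include <string.h>
    #include <time.h>
    #include <unistd.h>

    int main(int argc, char **argv)
    {
        const char *dir = argc > 1 ? argv[1] : "/mnt/mfs/bench";  /* must already exist */
        const int N = 1000;                    /* number of small files */
        char path[4096], buf[8192];
        struct timespec t0, t1;

        memset(buf, 'x', sizeof(buf));
        clock_gettime(CLOCK_MONOTONIC, &t0);

        for (int i = 0; i < N; i++) {
            snprintf(path, sizeof(path), "%s/f%05d", dir, i);
            int fd = open(path, O_WRONLY | O_CREAT | O_TRUNC, 0644);
            if (fd < 0) { perror("open"); return 1; }
            if (write(fd, buf, sizeof(buf)) < 0) perror("write");  /* one 8 KiB write */
            close(fd);
        }

        clock_gettime(CLOCK_MONOTONIC, &t1);
        double s = (t1.tv_sec - t0.tv_sec) + (t1.tv_nsec - t0.tv_nsec) / 1e9;
        printf("%.1f files/s, %.3f ms per create+write+close\n",
               N / s, s * 1000.0 / N);
        return 0;
    }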
From: Elliot F. <efi...@gm...> - 2011-10-17 14:20:03
2011/10/17 Robert Sandilands <rsa...@ne...>:
> The new master has 72 GB of RAM and it currently has 125 million files.

Just out of curiosity (and to plan my mfsmaster upgrade), how much RAM does the mfsmaster process use for 125 million files?

Thanks,
Elliot
From: Robert S. <rsa...@ne...> - 2011-10-17 13:34:30
Hi Michal,

The machine never used the swap. I verified that over several days using sar. It just needed the swap to be able to fork successfully.

It seems that Linux will fail to fork if there is not enough space to keep a complete copy of the forking application, even though it won't actually use the memory; the kernel refuses to over-subscribe it. I verified this by looking at the fork() code in the kernel. There is a check that verifies that the current amount of used memory plus the size of the forking application is less than the memory available. If that is not the case, the fork() call fails.

I agree that using swap should be a last desperate measure and that no production system should depend on swap to operate.

Even on a much faster dedicated master with significantly more RAM we still see timeouts. They just seem to be limited to around 1 minute every hour rather than 5 minutes every hour. The new master has 72 GB of RAM and it currently has 125 million files. This has improved stability and has allowed me to focus on other bottlenecks in mfsmount and mfschunkserver.

Robert

On 10/17/11 9:00 AM, Michał Borychowski wrote:
> Hi!
>
> Again, it is not that simple to say that you need double the memory needed by mfsmaster. Fork doesn't copy the whole memory occupied by the process. Memory used by both processes is in "copy on write" state and you only need space for the "differences". We estimate that a master which performs lots of operations needs 30-40% extra on top of the memory normally used by the process.
>
> And in the long run increasing swap is not good. When the master starts to use it too much during saves, the whole system may hang. Probably that's why you have these timeouts. To be honest you should increase physical RAM and not the swap. (We had 16GB RAM and it started to be not enough when the master needed 13GB - we had to add more RAM then.)
>
> Kind regards
> Michał Borychowski
> MooseFS Support Manager, Gemius S.A.
>
> From: Robert Sandilands [mailto:rsa...@ne...]
> Sent: Wednesday, August 10, 2011 3:12 PM
> To: moo...@li...
> Subject: Re: [Moosefs-users] mfsmaster performance and hardware
>
> Hi Laurent,
>
> Due to the use of ktune a lot of values are already tweaked, for example file-max. I don't have iptables loaded, as I measured at some stage that conntrack was -really- slow with large numbers of connections.
>
> I am not seeing gc_threshold related log messages, but I can't see any reason not to tweak that.
>
> Robert
>
> On 8/10/11 2:20 AM, Laurent Wandrebeck wrote:
>> On Tue, 09 Aug 2011 20:46:45 -0400 Robert Sandilands <rsa...@ne...> wrote:
>>> Increasing the swap space fixed the fork() issue. It seems that you have to ensure that memory available is always double the memory needed by mfsmaster. None of the swap space was used over the last 24 hours.
>>>
>>> This did solve the extreme comb-like behavior of mfsmaster. It still does not resolve its sensitivity to load on the server. I am still seeing timeouts on the chunkservers and mounts on the hour due to the high CPU and I/O load when the metadata is dumped to disk. It did however decrease significantly.
>>>
>>> An example from the logs:
>>>
>>> Aug 9 04:03:38 http-lb-1 mfsmount[13288]: master: tcp recv error: ETIMEDOUT (Operation timed out) (1)
>>> Aug 9 04:03:39 http-lb-1 mfsmount[13288]: master: register error (read header: ETIMEDOUT (Operation timed out))
>>> Aug 9 04:03:41 http-lb-1 mfsmount[13288]: registered to master
>>
>> Hi,
>> what if you apply these tweaks to the IP stack on master/CS/metaloggers?
>>
>> # to avoid problems with heavily loaded servers
>> echo 16000 > /proc/sys/fs/file-max
>> echo 100000 > /proc/sys/net/ipv4/ip_conntrack_max
>>
>> # to avoid Neighbour table overflow
>> echo "512" > /proc/sys/net/ipv4/neigh/default/gc_thresh1
>> echo "2048" > /proc/sys/net/ipv4/neigh/default/gc_thresh2
>> echo "4048" > /proc/sys/net/ipv4/neigh/default/gc_thresh3
>>
>> No need to restart anything; these can be applied on the fly without disturbing services.
>> HTH,
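The pattern under discussion is the hourly fork-and-dump: the parent keeps serving while the child writes the metadata, and the fork can be refused by the kernel's memory accounting (depending on the overcommit policy in /proc/sys/vm/overcommit_memory) even though copy-on-write means the child would touch very little new memory. A minimal sketch of that pattern with a fallback, purely as an illustration and not the actual mfsmaster code:

    #include <errno.h>
    #include <stdio.h>
    #include <unistd.h>

    /* Hypothetical stand-in for the routine that writes metadata.mfs. */
    static void dump_metadata(void) { /* ... */ }

    static void dump_in_background(void)
    {
        pid_t pid = fork();
        if (pid == 0) {                  /* child: do the slow disk write */
            dump_metadata();
            _exit(0);
        }
        if (pid < 0) {                   /* fork refused (often ENOMEM/EAGAIN) */
            fprintf(stderr, "fork failed (errno %d); dumping in foreground\n", errno);
            dump_metadata();             /* blocks the service, but data stays safe */
        }
        /* parent continues serving; the child is reaped elsewhere (SIGCHLD) */
    }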
From: Michał B. <mic...@ge...> - 2011-10-17 13:33:51
Hi!

We thought about something like this - probably we'd create a "direct" web server (not a plugin for Apache, not a plugin for nginx). This would be the quickest. But I cannot tell when.

Kind regards
Michał Borychowski
MooseFS Support Manager, Gemius S.A.

From: yezhou(Joe) [mailto:fan...@gm...]
Sent: Sunday, October 16, 2011 9:01 AM
To: moo...@li...
Subject: [Moosefs-users] does moosefs have a plugin for nginx?

Hi,

Lots of network filesystems have plugins for nginx to gain better performance. When will moosefs consider having one?

FUSE is really slow for lots of simultaneous reads or writes.

Thanks!
From: Michał B. <mic...@ge...> - 2011-10-17 13:20:21
Hi Christian!

Please have a look at this entry: http://www.moosefs.org/news-reader/items/moose-file-system-v-1617-released.html - it should explain how the cache works in MooseFS.

Kind regards
Michał Borychowski
MooseFS Support Manager, Gemius S.A.

-----Original Message-----
From: Christian Seipel [mailto:chr...@se...]
Sent: Thursday, October 13, 2011 7:46 AM
To: moo...@li...
Subject: [Moosefs-users] Questions about data caching

Hi,

in the man pages I read that the mfsmount option mfscachefiles controls preserving file data in cache. My first question is: which file data cache is meant, the kernel's page cache or some other MooseFS-specific cache?

In MooseFS version 1.6.17 automatic data cache management was introduced. My second question is whether this refers to the same cache as described above (kernel or MooseFS cache)?

Regards
Christian S.
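For readers without access to that link: the cache a FUSE-based client like mfsmount can control per open is the kernel page cache, via the keep_cache flag returned from the open handler. A generic libfuse sketch of that mechanism (an illustration of the FUSE interface, not the mfsmount source; the helper function is hypothetical):

    #define FUSE_USE_VERSION 26
    #include <fuse.h>

    /* Hypothetical helper: nonzero if the file is known to be unchanged since
       the data currently sitting in the kernel page cache was read. */
    static int file_unchanged_since_last_open(const char *path)
    {
        (void)path;
        return 1;
    }

    static int myfs_open(const char *path, struct fuse_file_info *fi)
    {
        /* keep_cache = 1 lets the kernel reuse cached pages from a previous
           open of this file; 0 makes it discard them on open. */
        fi->keep_cache = file_unchanged_since_last_open(path);
        return 0;
    }

    static struct fuse_operations myfs_ops = {
        .open = myfs_open,
    };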
From: Michał B. <mic...@ge...> - 2011-10-17 13:03:13
Hi!

Again, it is not that simple to say that you need double the memory needed by mfsmaster. Fork doesn't copy the whole memory occupied by the process. Memory used by both processes is in "copy on write" state and you only need space for the "differences". We estimate that a master which performs lots of operations needs 30-40% extra on top of the memory normally used by the process.

And in the long run increasing swap is not good. When the master starts to use it too much during saves, the whole system may hang. Probably that's why you have these timeouts. To be honest you should increase physical RAM and not the swap. (We had 16GB RAM and it started to be not enough when the master needed 13GB - we had to add more RAM then.)

Kind regards
Michał Borychowski
MooseFS Support Manager, Gemius S.A.

From: Robert Sandilands [mailto:rsa...@ne...]
Sent: Wednesday, August 10, 2011 3:12 PM
To: moo...@li...
Subject: Re: [Moosefs-users] mfsmaster performance and hardware

Hi Laurent,

Due to the use of ktune a lot of values are already tweaked, for example file-max. I don't have iptables loaded, as I measured at some stage that conntrack was -really- slow with large numbers of connections.

I am not seeing gc_threshold related log messages, but I can't see any reason not to tweak that.

Robert

On 8/10/11 2:20 AM, Laurent Wandrebeck wrote:
> On Tue, 09 Aug 2011 20:46:45 -0400 Robert Sandilands <rsa...@ne...> wrote:
>> Increasing the swap space fixed the fork() issue. It seems that you have to ensure that memory available is always double the memory needed by mfsmaster. None of the swap space was used over the last 24 hours.
>>
>> This did solve the extreme comb-like behavior of mfsmaster. It still does not resolve its sensitivity to load on the server. I am still seeing timeouts on the chunkservers and mounts on the hour due to the high CPU and I/O load when the metadata is dumped to disk. It did however decrease significantly.
>>
>> An example from the logs:
>>
>> Aug 9 04:03:38 http-lb-1 mfsmount[13288]: master: tcp recv error: ETIMEDOUT (Operation timed out) (1)
>> Aug 9 04:03:39 http-lb-1 mfsmount[13288]: master: register error (read header: ETIMEDOUT (Operation timed out))
>> Aug 9 04:03:41 http-lb-1 mfsmount[13288]: registered to master
>
> Hi,
> what if you apply these tweaks to the IP stack on master/CS/metaloggers?
>
> # to avoid problems with heavily loaded servers
> echo 16000 > /proc/sys/fs/file-max
> echo 100000 > /proc/sys/net/ipv4/ip_conntrack_max
>
> # to avoid Neighbour table overflow
> echo "512" > /proc/sys/net/ipv4/neigh/default/gc_thresh1
> echo "2048" > /proc/sys/net/ipv4/neigh/default/gc_thresh2
> echo "4048" > /proc/sys/net/ipv4/neigh/default/gc_thresh3
>
> No need to restart anything; these can be applied on the fly without disturbing services.
> HTH,
From: yezhou(Joe) <fan...@gm...> - 2011-10-16 07:00:51
Hi,

Lots of network filesystems have plugins for nginx to gain better performance. When will moosefs consider having one?

FUSE is really slow for lots of simultaneous reads or writes.

Thanks!
From: A.S. n. <a.s...@ho...> - 2011-10-14 10:14:41
Dear all,

This is nagle, a newbie to MFS. I was wondering about the implementation of the append operation in MFS. Consider this:

1) A client wants to create a very small file, so the master allocates 64KB for it.
2) The client keeps writing, many times over.
3) The file size reaches 64KB, but the client still wants to write much more; I call this an append operation.
4) In this situation the client may find that the space is used up, so it asks the master for more space. What happens then?

Can anyone give me a hint? Thank you very much.

Best regards,
Nagle
From: Christian S. <chr...@se...> - 2011-10-13 05:46:25
Hi,

in the man pages I read that the mfsmount option mfscachefiles controls preserving file data in cache. My first question is: which file data cache is meant, the kernel's page cache or some other MooseFS-specific cache?

In MooseFS version 1.6.17 automatic data cache management was introduced. My second question is whether this refers to the same cache as described above (kernel or MooseFS cache)?

Regards
Christian S.
From: chen g. <cg...@gm...> - 2011-10-12 07:42:27
Hello,

my configuration is as follows:

version: 1.6.20
master: 172.10.18.74, filesystem: ext3
chunkserver: 172.10.18.75, filesystem: ext3, data storage dir: /data
client: 172.10.18.61, filesystem: ext3, mount dir: /usr/local/eucalyptus

I deployed a distributed system and mounted the remote dir to the local /usr/local/eucalyptus successfully, using the command "/usr/local/mfs/bin/mfsmount /usr/local/eucalyptus -H 172.10.18.74".

I then found that creating a VHD-format file at /usr/local/eucalyptus fails (the file size is 0 according to "ls -lh"), using the command "vhd-util create -n vhd.img -s 2048". However, if I umount /usr/local/eucalyptus and create the VHD-format file in the same dir, it succeeds.

I don't know whether MooseFS supports creating/writing/reading VHD files, and if not, how do I solve my problem of creating a VHD-format file on an MFS filesystem?

Thanks.
From: Robert S. <rsa...@ne...> - 2011-10-06 11:16:20
Hi Michal,

I understand open is complex, but as a later email of mine shows, the limit seems to be that mfsmount uses a single socket to mfsmaster, and this socket is a significant bottleneck. Another bottleneck seems to be in mfschunkserver, where more than around 10 simultaneous file accesses cause a significant deterioration of performance.

Any advice on mitigating or fixing these two bottlenecks would be highly appreciated.

Robert

On 10/6/11 3:25 AM, Michał Borychowski wrote:
> Hi Robert!
>
> This is normal behaviour - "open" does several things on the master, among others some lookups and the open itself. We tried to introduce folder and attribute caching here but it didn't work as expected. We could mark some files as "immutable" and keep the cache in mfsmount, but these files would have to be read only, which is rather not acceptable.
>
> If you have any ideas how to speed things up, go ahead :)
>
> Kind regards
> Michał Borychowski
> MooseFS Support Manager, Gemius S.A.
>
> From: Robert Sandilands [mailto:rsa...@ne...]
> Sent: Wednesday, August 31, 2011 2:54 AM
> To: moo...@li...
> Subject: Re: [Moosefs-users] mfsmaster performance and hardware
>
> Further on this subject.
>
> I wrote a dedicated http server to serve the files instead of using Apache. It allowed me to gain a few extra percent of performance and decreased the memory usage of the web servers. The web server also gave me some interesting timings:
>
> File open average               405.3732 ms
> File read average               238.7784 ms
> File close average              286.8376 ms
> File size average                 0.0026 ms
> Net read average                  2.536  ms
> Net write average                 2.2148 ms
> Log to access log average         0.2526 ms
> Log to error log average          0.2234 ms
>
> Average time to process a file  936.2186 ms
> Total files processed          1,503,610
>
> What I really find scary is that opening a file takes nearly half a second, and closing a file a quarter of a second. The time spent in open() and close() is nearly 3 times the time spent reading the data. The server always reads in multiples of 64 kB unless less data is available, and it uses posix_fadvise() to try and do some read-ahead. This is the average over 5 machines running mfsmount and my custom web server for about 18 hours.
>
> On a machine that serves only a low number of clients the times for open and close are negligible. open() and close() seem to scale very badly with an increase in clients using mfsmount.
>
> From looking at the code for mfsmount it seems like all communication to the master happens over a single TCP socket, with a global handle and mutex to protect it. This may be the bottleneck? If there are multiple open()'s at the same time they may end up waiting for the mutex to get an opportunity to communicate with the master. The same handle and mutex are also used to read replies, which may not help the situation either.
>
> What prevents multiple sockets to the master?
>
> It also seems to indicate that the only way to get the open() average down is to introduce more web servers, and that a single web server can only serve a very low number of clients. Is that a correct assumption?
>
> Robert
>
> On 8/26/11 3:25 AM, Davies Liu wrote:
>> Hi Robert,
>>
>> Another hint to make mfsmaster more responsive is to put metadata.mfs on a separate disk from the change logs, such as a SAS array; you would have to modify the source code of mfsmaster to do this.
>>
>> PS: what is the average size of your files? MooseFS (like GFS) is designed for large files (100M+); it cannot serve huge numbers of small files well. Haystack from Facebook may be the better choice. We (douban.com) use MooseFS to serve 200+T (1M files) of offline data and beansdb [1] to serve 500 million small online files, and it performs very well.
>>
>> [1]: http://code.google.com/p/beansdb/
>>
>> Davies
>>
>> On Fri, Aug 26, 2011 at 9:08 AM, Robert Sandilands <rsa...@ne...> wrote:
>>> Hi Elliot,
>>>
>>> There is nothing in the code to change the priority.
>>>
>>> Taking virtually all other load off the chunk and master servers seems to have improved this significantly. I still see timeouts from mfsmount, but not enough to be problematic.
>>>
>>> To try and optimize the performance I am experimenting with accessing the data using different APIs and block sizes, but this has been inconclusive. I have tried the effect of posix_fadvise(), sendfile() and different sized buffers for read(). I still want to try mmap(). sendfile() did seem to be slightly slower than read().
>>>
>>> Robert
>>>
>>> On 8/24/11 11:05 AM, Elliot Finley wrote:
>>>> On Tue, Aug 9, 2011 at 6:46 PM, Robert Sandilands <rsa...@ne...> wrote:
>>>>> Increasing the swap space fixed the fork() issue. It seems that you have to ensure that memory available is always double the memory needed by mfsmaster. None of the swap space was used over the last 24 hours.
>>>>>
>>>>> This did solve the extreme comb-like behavior of mfsmaster. It still does not resolve its sensitivity to load on the server. I am still seeing timeouts on the chunkservers and mounts on the hour due to the high CPU and I/O load when the metadata is dumped to disk. It did however decrease significantly.
>>>> Here is another thought on this...
>>>>
>>>> The process is niced to -19 (very high priority) so that it has good performance. It forks once per hour to write out the metadata. I haven't checked the code for this, but is the forked process lowering its priority so it doesn't compete with the original process?
>>>>
>>>> If it's not, it should be an easy code change to lower the priority in the child process (metadata writer) so that it doesn't compete with the original process at the same priority.
>>>>
>>>> If you check into this, I'm sure the list would appreciate an update. :)
>>>>
>>>> Elliot
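The single-socket design Robert describes is easy to picture: if every request and reply to the master funnels through one connection guarded by one mutex, concurrent open() calls serialize and each waits out the full round trips of the calls ahead of it. A generic sketch of that pattern (an illustration of the design being discussed, not the actual mfsmount code):

    #include <pthread.h>
    #include <unistd.h>

    static int master_fd = -1;                              /* one TCP socket to the master */
    static pthread_mutex_t master_lock = PTHREAD_MUTEX_INITIALIZER;

    static int send_all(int fd, const char *buf, size_t len)
    {
        while (len > 0) {
            ssize_t n = write(fd, buf, len);
            if (n <= 0) return -1;
            buf += n; len -= (size_t)n;
        }
        return 0;
    }

    static int recv_all(int fd, char *buf, size_t len)
    {
        while (len > 0) {
            ssize_t n = read(fd, buf, len);
            if (n <= 0) return -1;
            buf += n; len -= (size_t)n;
        }
        return 0;
    }

    /* Every metadata operation goes through here, so N threads doing open()
       at once queue up behind one lock and one in-flight round trip. */
    int master_request(const char *req, size_t req_len, char *reply, size_t reply_len)
    {
        int rc;
        pthread_mutex_lock(&master_lock);
        rc = send_all(master_fd, req, req_len);
        if (rc == 0)
            rc = recv_all(master_fd, reply, reply_len);
        pthread_mutex_unlock(&master_lock);
        return rc;
    }

The usual ways out are a pool of connections, or tagging requests with IDs so several can be in flight on one socket and replies are matched back to waiting threads; either way the change would have to live in mfsmount.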
From: Michał B. <mic...@ge...> - 2011-10-06 07:41:01
Hi Fyodor,

These extra copies will eventually be erased. It is normal - they can appear e.g. during replication or while cleaning old disks, etc.

Kind regards
Michał

-----Original Message-----
From: Fyodor Ustinov [mailto:uf...@uf...]
Sent: Saturday, August 06, 2011 2:30 PM
To: moo...@li...
Subject: [Moosefs-users] goal count.

Hi!

root@amanda:~# mfsfileinfo /bacula/bacula-client-gen.sh
/bacula/bacula-client-gen.sh:
  chunk 0: 0000000000000001_00000001 / (id:1 ver:1)
    copy 1: 10.5.51.141:9422
    copy 2: 10.5.51.145:9422
    copy 3: 10.5.51.147:9422

I see 3 copies. But:

root@amanda:~# mfsgetgoal /bacula
/bacula: 2
root@amanda:~# mfsgetgoal /bacula/bacula-client-gen.sh
/bacula/bacula-client-gen.sh: 2

Why?

WBR,
Fyodor.
From: Michał B. <mic...@ge...> - 2011-10-06 07:38:51
Hi,

These files should eventually get deleted (and without restarting the master). If you want to speed up the process, simply remounting the folder should help. Files can stay "reserved" for as long as two weeks.

Kind regards
Michał Borychowski
MooseFS Support Manager, Gemius S.A.

-----Original Message-----
From: Fyodor Ustinov [mailto:uf...@uf...]
Sent: Friday, August 19, 2011 2:29 PM
To: moo...@li...
Subject: Re: [Moosefs-users] how to clean "reserved files"

Hi!

I'm still asking about "reserved files". Studying the sources, I have come to the conclusion that the only way to delete such a file is to add a "PURGE" line to the "changelog" file and execute "mfsmetarestore". Has anyone done this?

WBR,
Fyodor.
From: Michał B. <mic...@ge...> - 2011-10-06 07:26:43
Hi Robert!

This is normal behaviour - "open" does several things on the master, among others some lookups and the open itself. We tried to introduce folder and attribute caching here but it didn't work as expected. We could mark some files as "immutable" and keep the cache in mfsmount, but these files would have to be read only, which is rather not acceptable.

If you have any ideas how to speed things up, go ahead :)

Kind regards
Michał Borychowski
MooseFS Support Manager, Gemius S.A.

From: Robert Sandilands [mailto:rsa...@ne...]
Sent: Wednesday, August 31, 2011 2:54 AM
To: moo...@li...
Subject: Re: [Moosefs-users] mfsmaster performance and hardware

Further on this subject.

I wrote a dedicated http server to serve the files instead of using Apache. It allowed me to gain a few extra percent of performance and decreased the memory usage of the web servers. The web server also gave me some interesting timings:

File open average               405.3732 ms
File read average               238.7784 ms
File close average              286.8376 ms
File size average                 0.0026 ms
Net read average                  2.536  ms
Net write average                 2.2148 ms
Log to access log average         0.2526 ms
Log to error log average          0.2234 ms

Average time to process a file  936.2186 ms
Total files processed          1,503,610

What I really find scary is that opening a file takes nearly half a second, and closing a file a quarter of a second. The time spent in open() and close() is nearly 3 times the time spent reading the data. The server always reads in multiples of 64 kB unless less data is available, and it uses posix_fadvise() to try and do some read-ahead. This is the average over 5 machines running mfsmount and my custom web server for about 18 hours.

On a machine that serves only a low number of clients the times for open and close are negligible. open() and close() seem to scale very badly with an increase in clients using mfsmount.

From looking at the code for mfsmount it seems like all communication to the master happens over a single TCP socket, with a global handle and mutex to protect it. This may be the bottleneck? If there are multiple open()'s at the same time they may end up waiting for the mutex to get an opportunity to communicate with the master. The same handle and mutex are also used to read replies, which may not help the situation either.

What prevents multiple sockets to the master?

It also seems to indicate that the only way to get the open() average down is to introduce more web servers, and that a single web server can only serve a very low number of clients. Is that a correct assumption?

Robert

On 8/26/11 3:25 AM, Davies Liu wrote:

Hi Robert,

Another hint to make mfsmaster more responsive is to put metadata.mfs on a separate disk from the change logs, such as a SAS array; you would have to modify the source code of mfsmaster to do this.

PS: what is the average size of your files? MooseFS (like GFS) is designed for large files (100M+); it cannot serve huge numbers of small files well. Haystack from Facebook may be the better choice. We (douban.com) use MooseFS to serve 200+T (1M files) of offline data and beansdb [1] to serve 500 million small online files, and it performs very well.

[1]: http://code.google.com/p/beansdb/

Davies

On Fri, Aug 26, 2011 at 9:08 AM, Robert Sandilands <rsa...@ne...> wrote:

Hi Elliot,

There is nothing in the code to change the priority.

Taking virtually all other load off the chunk and master servers seems to have improved this significantly. I still see timeouts from mfsmount, but not enough to be problematic.

To try and optimize the performance I am experimenting with accessing the data using different APIs and block sizes, but this has been inconclusive. I have tried the effect of posix_fadvise(), sendfile() and different sized buffers for read(). I still want to try mmap(). sendfile() did seem to be slightly slower than read().

Robert

On 8/24/11 11:05 AM, Elliot Finley wrote:
> On Tue, Aug 9, 2011 at 6:46 PM, Robert Sandilands <rsa...@ne...> wrote:
>> Increasing the swap space fixed the fork() issue. It seems that you have to ensure that memory available is always double the memory needed by mfsmaster. None of the swap space was used over the last 24 hours.
>>
>> This did solve the extreme comb-like behavior of mfsmaster. It still does not resolve its sensitivity to load on the server. I am still seeing timeouts on the chunkservers and mounts on the hour due to the high CPU and I/O load when the metadata is dumped to disk. It did however decrease significantly.
> Here is another thought on this...
>
> The process is niced to -19 (very high priority) so that it has good performance. It forks once per hour to write out the metadata. I haven't checked the code for this, but is the forked process lowering its priority so it doesn't compete with the original process?
>
> If it's not, it should be an easy code change to lower the priority in the child process (metadata writer) so that it doesn't compete with the original process at the same priority.
>
> If you check into this, I'm sure the list would appreciate an update. :)
>
> Elliot
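Elliot's suggestion quoted above is straightforward to sketch. This is an illustration of the idea (lower the metadata writer's priority after fork), not a patch against the actual mfsmaster source; the dump routine is a hypothetical stand-in:

    #include <stdio.h>
    #include <sys/resource.h>
    #include <unistd.h>

    /* Hypothetical stand-in for the hourly metadata dump. */
    static void dump_metadata(void) { /* ... write metadata.mfs ... */ }

    void dump_in_low_priority_child(void)
    {
        pid_t pid = fork();
        if (pid == 0) {
            /* The parent runs niced to -19; push the child to the lowest
               priority so the dump does not compete with request handling. */
            if (setpriority(PRIO_PROCESS, 0, 19) != 0)
                perror("setpriority");
            dump_metadata();
            _exit(0);
        }
        if (pid < 0)
            perror("fork");   /* skip this dump or retry later */
        /* the parent returns immediately and keeps serving requests */
    }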
From: Michał B. <mic...@ge...> - 2011-10-06 07:13:40
Hi!

The default values are also 300 and 3600. But changing the goal causes replication to start right away. If you stop a chunkserver and start it again with fewer disks, replication also starts immediately.

Kind regards
Michał Borychowski
MooseFS Support Manager, Gemius S.A.

From: Thomas Schend [mailto:tho...@gm...]
Sent: Friday, September 23, 2011 6:30 PM
To: moo...@li...
Subject: Re: [Moosefs-users] Problem with mfsmaster.cfg not honored

Hi,

I only see this at startup of the mfsmaster daemon:

Sep 23 16:52:39 brick01 mfsmaster[1981]: set gid to 1003
Sep 23 16:52:39 brick01 mfsmaster[1981]: set uid to 1003
Sep 23 16:52:39 brick01 mfsmaster[1981]: sessions have been loaded
Sep 23 16:52:39 brick01 mfsmaster[1981]: exports file has been loaded
Sep 23 16:52:39 brick01 mfsmaster[1981]: stats file has been loaded
Sep 23 16:52:39 brick01 mfsmaster[1981]: master <-> metaloggers module: listen on *:9419
Sep 23 16:52:39 brick01 mfsmaster[1981]: master <-> chunkservers module: listen on *:9420
Sep 23 16:52:39 brick01 mfsmaster[1981]: main master server module: listen on *:9421
Sep 23 16:52:39 brick01 mfsmaster[1981]: open files limit: 5000
Sep 23 16:54:00 brick01 mfsmaster[1981]: chunkservers status:
Sep 23 16:54:00 brick01 mfsmaster[1981]: total: usedspace: 0 (0.00 GiB), totalspace: 0 (0.00 GiB), usage: 0.00%
Sep 23 16:54:00 brick01 mfsmaster[1981]: no meta loggers connected !!!

Nothing about the mfsmaster.cfg.

Regards
Thomas

2011/9/23 Davies Liu <dav...@gm...>:
> You can check in syslog whether the mfsmaster.cfg is used or not.
>
> Davies
>
> On Fri, Sep 23, 2011 at 3:07 PM, Thomas Schend <tho...@gm...> wrote:
>> Hello everyone,
>>
>> I have a small test setup with 4 Debian 6.0 VMs and MooseFS 1.6.20. I have one master and 4 chunkservers running. It works very well so far. The only problem I see is that MooseFS starts replication immediately after one chunkserver goes down or when I change the goal. I also tried setting REPLICATIONS_DELAY_INIT to 300 and REPLICATIONS_DELAY_DISCONNECT to 3600. I even tried passing the config file with -c to mfsmaster, but no change.
>>
>> What can I do to solve this, or am I doing something wrong?
>>
>> Regards
>> Thomas
From: Michał B. <mic...@ge...> - 2011-10-06 07:07:02
Hi,

There are limits on the replication process so that it doesn't choke the whole system, but in your situation they are probably still too high. In the future we'll introduce setting the replication speed on the fly.

Kind regards
Michał Borychowski
MooseFS Support Manager, Gemius S.A.

From: yezhou(Joe) [mailto:fan...@gm...]
Sent: Wednesday, September 28, 2011 11:01 PM
To: moo...@li...
Subject: [Moosefs-users] lots of read block errors and can't connect to proper chunkserver after replacing three hard drives.

I have a chunkserver that had three hard drives die last week, and I replaced them with new drives. Since I have a goal of two, there are no missing chunks, but it still reports lots of read block errors and "can't connect to proper chunkserver". I found out the reason: there are lots of writes to the three new drives, so reads slow down and the chunkserver gets stuck at that moment. Does anyone know what's wrong with it?
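For context on what "limits on the replication process" and "setting the replication speed on the fly" amount to in practice, here is a generic token-bucket throttle sketch (purely illustrative, not MooseFS code): replication work only starts when the bucket has a token, so rebuild traffic to freshly replaced drives cannot starve client reads the way it did here.

    #include <time.h>

    /* Generic token bucket: refill_rate units per second, capped at burst. */
    typedef struct {
        double tokens;
        double refill_rate;   /* e.g. chunks (or MB) of replication per second */
        double burst;
        struct timespec last;
    } throttle_t;

    static void throttle_init(throttle_t *t, double rate, double burst)
    {
        t->tokens = burst;
        t->refill_rate = rate;
        t->burst = burst;
        clock_gettime(CLOCK_MONOTONIC, &t->last);
    }

    /* Returns 1 if one unit of replication work may start now, 0 to retry later. */
    static int throttle_allow(throttle_t *t)
    {
        struct timespec now;
        clock_gettime(CLOCK_MONOTONIC, &now);
        double dt = (now.tv_sec - t->last.tv_sec) + (now.tv_nsec - t->last.tv_nsec) / 1e9;
        t->last = now;
        t->tokens += dt * t->refill_rate;
        if (t->tokens > t->burst) t->tokens = t->burst;
        if (t->tokens >= 1.0) { t->tokens -= 1.0; return 1; }
        return 0;
    }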
From: Andreas H. <ah...@it...> - 2011-10-06 06:23:59
Janis Valtsons <jan...@gm...> writes:
> Short answer - yes.
> ....
> So chunks themselves are not attached to IP or hostname.

Thanks, this worked out fine. I used the disk from one of the defective servers on another machine's eSATA connector:

    # mount /dev/sdc3 /mnt
    # chown -R mfs.mfs /mnt/*
    # echo /mnt >> /etc/mfshdd.cfg
    # /etc/init.d/mfs-chunkserver restart; tail -f /var/log/daemon.log

This dropped my missing chunk count from ~10000 to 1783. After one night no chunks were under goal, so I can now proceed with the next disk. After some failed attempts to mark the disk for removal by prepending * in /etc/mfshdd.cfg, this worked as well.

Thanks for your help!
Andreas

> On 10/04/2011 11:16 AM, Andreas Hirczy wrote:
>> Hi Everyone!
>>
>> We had a power outage which killed the mainboards of three of our chunkservers. Since we have the goal set to two, there are now some chunks missing.
>>
>> Is it possible to mount the disks from these machines on a different chunkserver (different IP address) and export them from there for a faster recovery?
>>
>> Thanks in advance,
>> Andreas

--
Andreas Hirczy <ah...@it...>   http://itp.tugraz.at/~ahi/
Graz University of Technology, Institute of Theoretical and Computational Physics
From: Janis V. <jan...@gm...> - 2011-10-04 10:23:20
Short answer - yes.

I am doing a server replacement, replacing old chunkservers each containing 13 disks with small custom-built Atom-based chunkservers each containing 4 disks. Using disks from the old chunkservers works fine.

As far as I can tell, a chunkserver on startup tells the master which chunks it has, so chunks themselves are not attached to an IP or hostname.

On 10/04/2011 11:16 AM, Andreas Hirczy wrote:
> Hi Everyone!
>
> We had a power outage which killed the mainboards of three of our chunkservers. Since we have the goal set to two, there are now some chunks missing.
>
> Is it possible to mount the disks from these machines on a different chunkserver (different IP address) and export them from there for a faster recovery?
>
> Thanks in advance,
> Andreas
From: Andreas H. <ah...@it...> - 2011-10-04 08:51:28
Hi Everyone!

We had a power outage which killed the mainboards of three of our chunkservers. Since we have the goal set to two, there are now some chunks missing.

Is it possible to mount the disks from these machines on a different chunkserver (different IP address) and export them from there for a faster recovery?

Thanks in advance,
Andreas

--
Andreas Hirczy <ah...@it...>   http://itp.tugraz.at/~ahi/
Graz University of Technology, Institute of Theoretical and Computational Physics
From: Kristofer P. <kri...@cy...> - 2011-10-03 19:21:58
>> Distributed filesystems are always designed for huge space. Waste often exists, e.g. Haystack at Facebook and GFS at Google never recycle the space of deleted files; they just mark them with a deleted flag.
>
> It isn't true that all distributed file systems are designed for huge files. Lustre for instance uses the block size of the underlying file system. I disagree that the concept of distributed file systems is synonymous with large files. That doesn't strike me as a valid reason to dismiss the idea of variable block sizes at compile time.

Just for clarification, he said huge space, not huge files. :)
From: <wk...@bn...> - 2011-10-02 21:33:29
We've been using Moose for about 8 months now and finally got around to playing with snapshots. In particular we are interested in using them for VM backups.

We currently freeze the VM, take the snapshot, unfreeze, back up the snapshot to the backup cluster, and then, depending on the tool used, re-integrate the snapshot back into the live copy.

Since the MooseFS implementation appears to be instantaneous, it could possibly remove the need to 'freeze' the VM before taking the snapshot, as we do with a more traditional snapshot (where completing the snapshot can take some time). Is that the case? Can we skip freezing the VM?

I suppose the freeze may still be necessary in order to force writes to disk within the VM itself, but perhaps we could get away with an fsync and rely on the journaling in ext3/4. Even if we continue to freeze, the MooseFS snapshot is so much quicker.

Are there any known issues with MooseFS snapshots? The developers mentioned some changes they wanted to make regarding them in an upcoming release.

Sincerely,
-bill