From: Michał B. <mic...@ge...> - 2010-06-10 08:27:04
> On Thursday 3 June 2010, Laurent Wandrebeck wrote:
> > - Do you know of any « big user » relying on mfs? I've been able to find several for glusterfs, for example, but nothing for moosefs. Such entries would be nice on the website, and reassuring for potential users.
>
> Well, I was pretty sure I saw a "Who's using" section on the website but I can't find it. Indeed it would be nice to have one.

[MB] No, it has not been created yet. We plan to add one.

[MB] At our company (http://www.gemius.com) we have four deployments; the biggest has almost 30 million files distributed over 70 chunkservers with a total space of 570 TiB. The chunkserver machines are used for other calculations at the same time.

[MB] Another big Polish company which uses MooseFS for data storage is Redefine (http://www.redefine.pl/).

> > I've read that you have something like half a PB. We're up to 70 TB, going to 200 in the next months. Are there any known limits, bottlenecks, or loads that push systems/network to their knees? We are processing satellite images, so I/O is quite heavy, and I'm worrying a bit about the behaviour under real processing load.

[MB] You can have a look at this FAQ entry: http://www.moosefs.org/moosefs-faq.html#mtu

[MB] In our environment we use SATA disks, and even while running lots of additional calculations on the chunkservers we do not fully use the available network bandwidth. If you use SAS disks, you may run into problems we have not encountered yet.

[ ... snip ... ]

> > master failover is a bit tricky, which is really annoying for HA.
>
> That's probably a point for Gluster as it doesn't have a metadata server, but actually there is a master (of sorts) which is the one the clients connect to. If it goes away, there's a delay until another node becomes master, at least in theory, as I didn't test that part.

[MB] You can also refer to this mini how-to: http://www.moosefs.org/mini-howtos.html#redundant-master to see how a fail-safe master setup can be built using CARP.

[ ... snip ... ]

> > - At last, just to be sure I understood correctly: files are automatically striped across the available chunkservers, so for any file with goal 1, if a single chunkserver goes down the file is unavailable, unless it is smaller than 64 MB and not stored on the failed chunkserver, correct?
>
> I believe you're correct, and that's why you should always have a goal of at least 2. I mean, if you consider your data important ;)

[MB] Files smaller than 64 MB are kept in one chunk, so with goal=1, if the chunkserver storing that chunk fails, the file is unavailable. Bigger files are divided into 64 MB chunks, each of which can be stored on a different chunkserver, so there is a substantial probability that a big file with goal=1 will be unavailable (at least in part) if any of its chunks was stored on the failed chunkserver. The general rule is to use goal=2 for normal files and goal=3 for files that are especially important to you.

Kind regards
Michał Borychowski
MooseFS Support Manager
Gemius S.A., ul. Wołoska 7, 02-672 Warszawa
Tel.: +4822 874-41-00, Fax: +4822 874-41-01
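In practice, the goal rules described above are applied with the MooseFS client-side tools on a mounted filesystem. A minimal sketch, assuming a standard 1.6.x client and a mount at /mnt/mfs (these paths are illustrative only, not taken from this thread):

    mfssetgoal -r 2 /mnt/mfs/data               # recommended minimum for ordinary files
    mfssetgoal -r 3 /mnt/mfs/data/critical      # especially important files
    mfsgetgoal /mnt/mfs/data/critical/a.img     # confirm the goal of a single file
    mfscheckfile /mnt/mfs/data/critical/a.img   # show how many valid copies each chunk currently has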
From: Michał B. <mic...@ge...> - 2010-06-08 09:50:07
Hi!

The system says that the chunk numbered "D710" is not available (none of the 3 copies set in the goal exists). If all chunkservers and all disks are connected, this means the chunk simply does not exist. If the reboot took place while the file was being written, such a chunk can be lost. The important question is: was it a reboot of the master server, of the chunkservers, or of the whole system? An abrupt reboot of the whole system (e.g. a power failure) could cause something like this. An fsck on a chunkserver could unfortunately have deleted the chunk; it may be worth looking into "lost+found" on the disks attached to the chunkservers.

You can also issue "mfsfilerepair", but this only helps by writing zeros in the "damaged" part of the file, so that the system no longer tries to read the missing chunk (to be exact, the system does not hang; it retries the read, waiting for the chunk to show up, and gives up after several minutes).

If you need any further assistance please let us know.

Kind regards
Michał Borychowski
MooseFS Support Manager
Gemius S.A., ul. Wołoska 7, 02-672 Warszawa
Tel.: +4822 874-41-00, Fax: +4822 874-41-01

From: kuer ku [mailto:ku...@gm...]
Sent: Saturday, June 05, 2010 12:05 PM
To: moo...@li...
Subject: [Moosefs-users] how to fix unavailable chunk ??

> I set up a moosefs storage with 1 metaserver + 4 chunkservers. Today I found some error messages on the http interface: some files are lost.
> currently unavailable chunk 000000000000D710 (inode: 331 ; index: 0)
> * currently unavailable file 331: sink/fifodata/00126/20100604/00126_20100604164805
> [ ... snip ... ]
> What is the problem? How can I fix it?
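A minimal sketch of how to confirm such a missing chunk from a client mount before resorting to mfsfilerepair, assuming the standard client tools and running from the directory that holds the affected file (the name is the one from the report quoted above):

    mfscheckfile 00126_20100604164805    # per-chunk copy counts; a chunk with 0 valid copies confirms the loss
    mfsfileinfo 00126_20100604164805     # lists the chunkservers that hold (or should hold) each chunk
    mfsfilerepair 00126_20100604164805   # last resort: fills the missing chunk with zeros so reads no longer block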
From: Michał B. <mic...@ge...> - 2010-06-08 08:26:16
The code responsible for communication between chunkservers is in the "mfschunkserver/csserv.c" file; the main functions are "csserv_fwd*", "csserv_forward" and "csserv_write_*".

If you need any further assistance please let us know.

Kind regards
Michał Borychowski
MooseFS Support Manager
Gemius S.A., ul. Wołoska 7, 02-672 Warszawa
Tel.: +4822 874-41-00, Fax: +4822 874-41-01

From: 洪志雄 [mailto:fis...@gm...]
Sent: Monday, June 07, 2010 5:26 AM
To: moo...@li...
Subject: [Moosefs-users] Fwd: I want to know the detail of write process

> I have been reading the source code of MFS (ver 1.6.15) these days and have a question about the write process. Processes 1, 2, 3 and 4 from the MooseFS write-process picture are clearly in write_worker(), but I couldn't find processes 5 and 6, which synchronize the data between chunkservers, in the source code.
> [ ... snip ... ]
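To locate those entry points in an unpacked 1.6.x source tree, a simple search from the top-level source directory is enough (the file and function prefixes are the ones named above):

    grep -nE 'csserv_fwd|csserv_forward|csserv_write_' mfschunkserver/csserv.c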
From: 洪志雄 <fis...@gm...> - 2010-06-07 03:25:54
I have been reading the source code of MFS (ver 1.6.15) these days and I have a question about the write process.

From the source code I saw that when the client wants to write data, a job is created in a queue, and all jobs are handled in write_worker() (in mfsmount/writedata.c). From the picture of the MooseFS write process (on the moosefs index page), processes 1, 2, 3 and 4 are clearly in write_worker(), but I couldn't find processes 5 and 6, which synchronize the data between chunkservers, in the source code.

Could you please provide some clues about this? Thanks :)

--
by 洪志雄
From: <lw...@hy...> - 2010-06-05 20:10:03
Hi,

Please find attached an updated spec file for RPM creation, based on the one Kirby Zhou provided.

Tested on CentOS 5.5 x86_64 and i386. Arch built: i386, i486, i586, i686, athlon, pentium2/3/4, x86_64.
Tested on Fedora 13 x86_64. Arch built: x86_64.

Could you please add the file to the git repo? After all, debian is there :)
I can provide a repo (probably CentOS 5 only) if you want.

Thanks,
Laurent.

PS: not even a warning during compilation, nice!
From: kuer ku <ku...@gm...> - 2010-06-05 10:05:31
Hi all,

I set up a moosefs storage with 1 metaserver + 4 chunkservers. Today I found some error messages on the http interface saying that some files are lost:

currently unavailable chunk 000000000000D710 (inode: 331 ; index: 0)
* currently unavailable file 331: sink/fifodata/00126/20100604/00126_20100604164805

On the box where mfsmount runs, 'ls' shows:

-rw-rw-rw- 1 sea sea 2778996 6 4 17:32 00126_20100604164805

A system reboot occurred on 06/04 17:32; that is the last time the file was written. At present I can list the file, but I cannot cat its contents; the cat command hangs. I can find error messages in /var/log/messages:

Jun 5 17:51:26 nbase07 mfsmount[6625]: file: 331, index: 0, chunk: 55056, version: 2 - there are no valid copies
Jun 5 17:51:26 nbase07 mfsmount[6625]: file: 331, index: 0 - can't connect to proper chunkserver (try counter: 15)
Jun 5 17:52:26 nbase07 mfsmount[6625]: file: 331, index: 0, chunk: 55056, version: 2 - there are no valid copies
Jun 5 17:52:26 nbase07 mfsmount[6625]: file: 331, index: 0 - can't connect to proper chunkserver (try counter: 22)
Jun 5 17:53:26 nbase07 mfsmount[6625]: file: 331, index: 0, chunk: 55056, version: 2 - there are no valid copies
Jun 5 17:53:26 nbase07 mfsmount[6625]: file: 331, index: 0 - can't connect to proper chunkserver (try counter: 29)

The goal of the file should be 3, because I set the goal of its parent directory to 3.

What is the problem? How can I fix it?

My environment:
metaserver : moosefs 1.6.13 built on CentOS 5.3 x86_64
chunkserver : moosefs 1.6.13 built on CentOS 5.3 x86_64
mfsmount : MFS version 1.6.15 (FUSE library version: 2.7.4) on FreeBSD 6.2

Thanks,
- kuer
From: Ricardo J. B. <ric...@da...> - 2010-06-03 15:38:57
On Thursday 3 June 2010, Laurent Wandrebeck wrote:

[ ... snip ... ]

> - Do you know of any « big user » relying on mfs? I've been able to find several for glusterfs, for example, but nothing for moosefs. Such entries would be nice on the website, and reassuring for potential users.

Well, I was pretty sure I saw a "Who's using" section on the website but I can't find it. Indeed it would be nice to have one.

> - How does moosefs compare to glusterfs? What are their respective pros and cons? I haven't been able to find a comprehensive list.

I have tested both (and also Lustre), so here are my two cents:

> moose is quite easy to deploy (easier than glusterfs, I think, but not yet tested).

Yes, I think Moose is the easiest of the three.

> master failover is a bit tricky, which is really annoying for HA.

That's probably a point for Gluster as it doesn't have a metadata server, but actually there is a master (of sorts) which is the one the clients connect to. If it goes away, there's a delay until another node becomes master, at least in theory, as I didn't test that part.

> Goal is just beautiful.

Yes, and IMHO this is a big advantage of Moose. Lustre doesn't even have replication, and with Gluster the number of copies of a file is determined by how many storage nodes you configure as replicas.

> Other than that, stability/performance wise, I have no idea.

My tests showed Moose had the best performance of the three. My Moose cluster (1 master + 1 metalogger + 3 chunkservers = 5.3 TB, with 84 clients doing nightly backups) has been running for only 3 months, but without any problems so far. I never had stability issues with Gluster or Lustre either, but I only ran tests and never put them in production.

> - At last, just to be sure I understood correctly: files are automatically striped across the available chunkservers, so for any file with goal 1, if a single chunkserver goes down the file is unavailable, unless it is smaller than 64 MB and not stored on the failed chunkserver, correct?

I believe you're correct, and that's why you should always have a goal of at least 2. I mean, if you consider your data important ;)

Best regards,
--
Ricardo J. Barberis
Senior SysAdmin - I+D
Dattatec.com :: Soluciones de Web Hosting
Su Hosting hecho Simple..!
From: Laurent W. <lw...@hy...> - 2010-06-03 08:33:57
On Thu, 3 Jun 2010 09:02:29 +0200
Michał Borychowski <mic...@ge...> wrote:

> It will be updated when we push the next release (probably next week).

Nice. 4 months without a commit and 4 new stable versions was a bit annoying. What about 1.7? Is there a not-yet-published dev branch?

> If you need any further assistance please let us know.

Thanks. Now some technical points: I've been able to deploy 6 boxes for testing purposes: 1 master, 1 metalogger and 4 chunkservers, for a little less than 1 TB.

- Are there any known problems with a machine being both mfs client and server? Right now every machine has its own storage; having a single volume is nice, but we can't afford to lose their processing power. I've done some quick tests and it seems to work fine.

- I've read that you have something like half a PB. We're up to 70 TB, going to 200 in the next months. Are there any known limits, bottlenecks, or loads that push systems/network to their knees? We are processing satellite images, so I/O is quite heavy, and I'm worrying a bit about the behaviour under real processing load.

- Do you know of any « big user » relying on mfs? I've been able to find several for glusterfs, for example, but nothing for moosefs. Such entries would be nice on the website, and reassuring for potential users.

- How does moosefs compare to glusterfs? What are their respective pros and cons? I haven't been able to find a comprehensive list. moose is quite easy to deploy (easier than glusterfs, I think, but not yet tested). master failover is a bit tricky, which is really annoying for HA. Goal is just beautiful. Other than that, stability/performance wise, I have no idea.

- At last, just to be sure I understood correctly: files are automatically striped across the available chunkservers, so for any file with goal 1, if a single chunkserver goes down the file is unavailable, unless it is smaller than 64 MB and not stored on the failed chunkserver, correct?

Thanks,
--
Laurent Wandrebeck
HYGEOS, Earth Observation Department / Observation de la Terre
Euratechnologies, 165 Avenue de Bretagne, 59000 Lille, France
tel: +33 3 20 08 24 98
http://www.hygeos.com
GPG fingerprint/Empreinte GPG: F5CA 37A4 6D03 A90C 7A1D 2A62 54E6 EF2C D17C F64C
From: Michał B. <mic...@ge...> - 2010-06-03 07:03:05
It will be updated when we push the next release (probably next week).

If you need any further assistance please let us know.

Kind regards
Michał Borychowski
MooseFS Support Manager
Gemius S.A., ul. Wołoska 7, 02-672 Warszawa
Tel.: +4822 874-41-00, Fax: +4822 874-41-01

> -----Original Message-----
> From: Laurent Wandrebeck [mailto:lw...@hy...]
> Sent: Wednesday, June 02, 2010 5:36 PM
> To: moo...@li...
> Subject: [Moosefs-users] git outdated
>
> Hi,
>
> git repo is outdated.
> where are up to date sources available ?
>
> thanks,
> [ ... snip ... ]
From: Laurent W. <lw...@hy...> - 2010-06-02 16:02:47
Hi,

git repo is outdated.
Where are up-to-date sources available?

thanks,
--
Laurent Wandrebeck
HYGEOS, Earth Observation Department / Observation de la Terre
Euratechnologies, 165 Avenue de Bretagne, 59000 Lille, France
tel: +33 3 20 08 24 98
http://www.hygeos.com
GPG fingerprint/Empreinte GPG: F5CA 37A4 6D03 A90C 7A1D 2A62 54E6 EF2C D17C F64C
From: Guowen S. <guo...@gm...> - 2010-06-01 13:55:03
Hi,

I was testing mfs with 16 GB RAM for the master. After 16,000,000 files had been written, I ran "mfschunkserver stop" on one chunkserver for some time. Trouble followed. The log of the master server filled with plenty of messages like these:

---------------------------------------------------------------------------------------------------------------------
* currently unavailable file 158921: photos/47/74/82/r_163042.jpg
currently unavailable chunk 00000000004465CA (inode: 4353226 ; index: 0)
---------------------------------------------------------------------------------------------------------------------

The master stays in the loop of fs_test_files(), refusing to leave it and continue serving clients. The goal is 1, of course. I attempted to reconnect the chunkserver to the master, but it failed. The master cannot provide normal service to clients and cannot accept packets from the chunkserver; it is even impossible to get information from mfscgiserv. The mfs crashed.

From the code, it decreases the value of "allvalidcopies" and then prints the log messages above in a loop inside fs_test_files(). Because the number of files is so large, this lasts a very long time (maybe most of the time is spent in printf). I think it is quite common for a chunkserver to disconnect temporarily and then connect again, so maybe fs_test_files() could run in a separate thread. What do you think about this problem?

--
Yours sincerely,
Guowen Shen <guo...@gm...>
From: Michał B. <mic...@ge...> - 2010-06-01 07:37:15
Hi!

The caching mechanism would also be helpful for you even with large files, if you have enough RAM on your client machine (e.g. 8 or 16 GB). We will shortly release a beta version with the possibility to enable the cache.

Regards
Michał

> -----Original Message-----
> From: Metin Akyalı [mailto:met...@gm...]
> Sent: Thursday, May 27, 2010 8:32 PM
> To: moo...@li...
> Subject: Re: [Moosefs-users] Fwd: how to install a cloud storage network: which is the best smart automatic file replication solution for cloud storage based systems.
>
> Even if you put a caching mechanism in, I won't be able to cache files, since my average file size would be 500 MB :) But I am sure it will be very useful for other people.
> [ ... snip ... ]
From: Metin A. <met...@gm...> - 2010-05-27 18:31:53
Hello everyone,

Even if you put a caching mechanism in, I won't be able to cache files, since my average file size would be 500 MB :) But I am sure it will be very useful for other people.

I really get excited when I see that there are people working on such projects. I am going to be a contributor to one of these file systems in the next 2-3 months, and moose fs is at the top of my list :)

By the way, are there any other recommended commercial or open source solutions out there?

2010/5/27 Michał Borychowski <mic...@ge...>:
> Ricardo gave very accurate observations about MooseFS and what you would like to achieve.
>
> There is a setting "goal" which tells the system in how many copies you want to store a file. MooseFS won't automatically adjust this parameter for you, but you can prepare a script which examines the popularity of a file and raises its goal. If the file is no longer popular, the goal can be reverted to 2 (recommended) or to 1.
> [ ... snip ... ]
From: Michał B. <mic...@ge...> - 2010-05-27 09:24:07
Ricardo gave very accurate observations about MooseFS and what you would like to achieve.

There is a setting "goal" which tells the system in how many copies you want to store a file. MooseFS won't automatically adjust this parameter for you, but you can prepare a script which examines the popularity of a file and raises its goal. If the file is no longer popular, the goal can be reverted to 2 (recommended) or to 1.

But the most important thing is the performance and throughput of the client machines. What should be interesting for you: we plan to introduce a cache mechanism in the near future, so that once a file has been downloaded by a client machine from the chunkserver it won't be downloaded again unless the file has been modified. This would eliminate the problem of network speed between chunkservers and clients, and it would not be necessary to store the file in more than 2 copies (more copies would not make the system work more quickly).

If you need any further assistance please let us know.

Kind regards
Michał Borychowski
MooseFS Support Manager
Gemius S.A., ul. Wołoska 7, 02-672 Warszawa
Tel.: +4822 874-41-00, Fax: +4822 874-41-01

> -----Original Message-----
> From: Ricardo J. Barberis [mailto:ric...@da...]
> Sent: Wednesday, May 26, 2010 5:05 PM
> To: moo...@li...
> Subject: Re: [Moosefs-users] Fwd: how to install a cloud storage network: which is the best smart automatic file replication solution for cloud storage based systems.
>
> In short: I think your best option is a combination of load balancing/proxy cache and MooseFS with goal >= 2.
> [ ... snip ... ]
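A minimal sketch of the goal-adjusting script suggested above, assuming the MooseFS client tools are on the PATH, the filesystem is mounted under /mnt/mfs, and some external mechanism (web server logs, download counters) produces the two file lists; the list locations and the goal of 4 for hot files are illustrative assumptions, not values from this thread:

    #!/bin/sh
    # Raise the goal of currently popular files and return cooled-off files
    # to the recommended default of 2.
    HOT=/var/tmp/hot_files.txt     # one MooseFS path per line, e.g. /mnt/mfs/pub/video.avi
    COLD=/var/tmp/cold_files.txt
    while read -r f; do
        [ -f "$f" ] && mfssetgoal 4 "$f"   # more copies, so more chunkservers can serve the file
    done < "$HOT"
    while read -r f; do
        [ -f "$f" ] && mfssetgoal 2 "$f"   # back to the recommended minimum
    done < "$COLD"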
From: Ricardo J. B. <ric...@da...> - 2010-05-26 15:27:45
On Tuesday 25 May 2010, Metin Akyalı wrote:
> Hello,

Hi!

> I am looking for a solution for a project I am working on.
>
> We are developing a website where people can upload their files, share them, and let other people download them (similar to the rapidshare.com model).

[ massive snippage ]

> So I need a cloud based system which will push the files onto replicated nodes automatically when demand for those files is high, and when the demand is low they will be deleted from the other nodes and stay on only 1 node.
>
> I have looked at glusterfs and asked about this problem in their irc channel, and got an answer from the guys that gluster can't do such a thing. It can only replicate all of the files or none of them (I have to define which files are to be replicated). But I need the cluster software to do it automatically.

OK, I don't think MooseFS (mfs for short) will help you either, at least not automagically: in mfs you can specify that some files have to have more copies than others, but you have to do it by hand.

However, I would really advise you to store at least 2 copies to avoid data loss in case one of the storage nodes goes kaput.

The only distributed/replicated filesystem that I'm aware of capable of automatic load balancing is Ceph (http://ceph.newdream.net/), but it's currently in alpha status and not recommended for production sites.

Nonetheless, maybe you can reach your goals with mfs, keep reading :)

> I am sure after some time I will have some trouble with the client servers, which I will have to load balance later, but that is the next step which I don't mind right now.

Well, I'm going to actually suggest something along those lines.

Consider this: no matter how many storage nodes you have, the bottleneck will then be the frontend server (client to mfs) bandwidth, so you _will_ have to do frontend load balancing; you might as well do it from the start and avoid doing it while in production.

With that in mind, and to avoid manually increasing file copies on the backend (storage) nodes, you could cache the files on the frontends (assuming the files won't change often once uploaded) to speed up serving them.

In short: I think your best option is a combination of load balancing/proxy cache and MooseFS with goal >= 2.

Best regards,
--
Ricardo J. Barberis
Senior SysAdmin - I+D
Dattatec.com :: Soluciones de Web Hosting
Su Hosting hecho Simple..!
From: Metin A. <met...@gm...> - 2010-05-25 16:03:50
Hello,

I am looking for a solution for a project I am working on.

We are developing a website where people can upload their files, share them, and let other people download them (similar to the rapidshare.com model).

The problem is that some files can be in much higher demand than others, and we can't replicate every file onto other nodes; the cost would increase exponentially.

The scenario is like this: Simon has uploaded his birthday video and shared it with all of his friends. He uploaded it to project.com and it was stored on one of the servers in the cluster, which has a 100 Mbit connection. The problem is that once all of Simon's friends want to download the file, they can't, because the bottleneck is the 100 Mbit link (12.5 MB per second): with 1000 friends trying to download the video, each of them can only get about 12.5 KB per second, which is very, very bad. And I am not even taking the overhead on the HDD into account.

Thus, I need to find a way to replicate only the in-demand files, to scale the network and serve the files without problems (at least 200 KB/sec).

My network infrastructure is as follows: I will have client and storage nodes. For the client nodes I will use 1 Gbit bandwidth with enough RAM and CPU, and that server will be the client. It will be connected to 4 storage nodes, each of which has a 100 Mbit connection. The 1 Gbit server can handle the traffic of 1000 users if a storage node can stream more than 15 MB per second to my 1 Gbit (client) server, and visitors will stream directly from the client server instead of from the storage nodes.

I can do this by replicating the file onto 2 nodes, but I don't want to replicate every file uploaded to my network onto all nodes, since that costs much more. I am sure somebody has had the same problem in the past and has developed a solution to it.

So I need a cloud based system which will push the files onto replicated nodes automatically when demand for those files is high, and when the demand is low they will be deleted from the other nodes and stay on only 1 node.

I have looked at glusterfs and asked about this problem in their irc channel, and got an answer from the guys that gluster can't do such a thing. It can only replicate all of the files or none of them (I have to define which files are to be replicated). But I need the cluster software to do it automatically.

I would use 1 Gbit client servers and 100 Mbit storage servers. All the servers will be in the same DC. I will rent the servers; I don't own my own DC. The reason I am choosing a 1 Gbit server as the client is that I won't have many 1 Gbit servers (they are very expensive), but I will have many storage nodes (100 Mbit is not expensive).

I am sure after some time I will have some trouble with the client servers, which I will have to load balance later, but that is the next step which I don't mind right now.

I would be happy to use open source solutions like (which I searched) glusterfs, gfs, google file system, drbd, parascale, cloudstore, but I really couldn't figure out which is the best way for me. I thought the best way is to listen to other people's experiences. If you can help I will be happy (instead of recommending that I use amazon s3 :) ).

Thanks.
From: Ryszard Ł. <ry...@gm...> - 2010-05-25 07:07:32
Hi.

It is pretty easy to learn about MooseFS features, but I miss a document with MooseFS use cases. Are you able to provide some sort of list? I mean a brief listing of success stories, without much detail, but proven in practice.

TIA,
R.

--
"First they ignore you. Then they laugh at you. Then they fight you. Then you win." - Mohandas Gandhi.
From: Michał B. <mic...@ge...> - 2010-05-21 11:18:04
Hi!

On one of our deployments we have almost 30 million files; the master server has 16 GB RAM, of which 10 GB is occupied by mfsmaster. We do not observe any problems with metadata testing, although we could think about an option to reduce the speed of the tests. Currently the system does the whole test loop over one hour, which with 10 million files means testing about 3,000 inodes per second.

For every save operation a separate process is started which does the work in the background, usually on a different processor core, so it should not slow down the master itself. Of course saving metadata for 30 million files takes some time (about 50 seconds), but we do not encounter any problems during saving.

We think 6 GB RAM may be too little for 10+ million files. What exactly do you mean by 10+? Is it 10.5 million or maybe 15? Could you also give us more detailed information, maybe some screenshots from mfscgiserv?

Kind regards
Michał Borychowski
MooseFS Support Manager
Gemius S.A., ul. Wołoska 7, 02-672 Warszawa
Tel.: +4822 874-41-00, Fax: +4822 874-41-01

From: Shen Guowen [mailto:sh...@ui...]
Sent: Friday, May 21, 2010 10:31 AM
To: moo...@li...
Subject: [Moosefs-users] [MooseFS-user] Master server is overloaded by storing metadata periodically

> The master server stores metadata periodically. However, after the number of test files increased to 10+ million, this process became so time-consuming that the master cannot provide services normally.
> [ ... snip ... ]
From: Shen G. <sh...@ui...> - 2010-05-21 08:46:32
Hi moosefs developers,

The master server stores metadata periodically. However, after the number of test files increased to 10+ million, this process became so time-consuming that the master cannot provide services normally; it is overloaded by the metadata store. And, as seen from mfscgiserv, as more files are written the time spent storing metadata keeps growing.

Besides, during each fs_storeall() a large percentage of fsnodes and chunks are unchanged, because most operations are reads. Maybe a lot of the repeated saving is unnecessary, but I do not know how to reduce it. I think it might be better to refresh the metadata on a separate sub-master server, by combining the original metadata with the changelogs, and to remove the periodic store from the master.

Do you have any solutions or opinions on this issue? Thanks!

OS: CentOS 5.3
Hardware: CPU Xeon 5410, RAM 6 GB
Version: mfs-1.6.15

--
Yours sincerely,
Guowen Shen
From: Michał B. <mic...@ge...> - 2010-05-17 11:15:16
Hi!

Unfortunately, implementing the chunk block size as a parameter would demand too many changes in the system, but we have some other ideas for avoiding wasted space when storing small files which we are thinking of implementing.

Kind regards
Michał Borychowski

From: lwxian_aha [mailto:lwx...@16...]
Sent: Monday, May 17, 2010 3:03 AM
To: Michał_Borychowski
Cc: moosefs-users
Subject: Re: [Moosefs-users] chunkserver can't connect to masterserver

> Thanks for your help! I think: why not make the "block size of 64KiB" a parameter of the MFS system? That would result in less wasted space for small files. Many hard-coded values in MFS could be changed into config file parameters, so the system would be more flexible.

From: Michał_Borychowski
Sent: 2010-05-14 13:34:12
To: 'lwxian_aha'
Cc: 'moosefs-users'
Subject: RE: [Moosefs-users] chunkserver can't connect to masterserver

From: lwxian_aha
Sent: Thursday, May 13, 2010 8:22 AM

> With your suggestions I have fixed the problem. Thanks!!!

[MB] We are very happy that the patch has helped you!

> On my production system there are lots of small files (less than 10K), so there are about 5,000,000 chunks with only 1.4T of data. I have two suggestions:
> 1. You could split the registration packet into several packets, each limited to 500,000 chunks, so there would be no limit on the number of chunks per chunkserver.

[MB] That's what we thought of. Thank you for the suggestion.

> 2. Maybe you could store several files in one chunk (every chunk is limited to 64M), so the number of chunks would be reduced?

[MB] It looks interesting, but a modification of a file in "the middle" of a chunk would bring a big problem.

> Here I have another question: on my chunkserver disks the smallest chunk file I see is 70656 bytes, while my smallest source file is only about 4K. Why does the chunk file size substantially exceed the source file?

[MB] The system was initially designed for keeping large numbers (like several thousands) of very big files (of tens of gigabytes) and has a hard-coded chunk size of 64 MiB and a block size of 64 KiB. That is why even a small file occupies 64 KiB, plus an additional 4 KiB of checksums and 1 KiB for the header. All transfer in the system is done in blocks of 64 KiB. This has no real impact on performance (in a normal filesystem it is also usual to read ahead some superfluous data).

[MB] The issue of the occupied space is more significant, but in our opinion it is still negligible. Take 25 million files with a goal of 2, so about 50 million "last" chunks. If we lose even 64 KiB in every such chunk, we have an overall waste of 3.2 TB, which nowadays should not be a very big concern. In your case, 10,000 files with goal=2 would produce at most about 1.2 GB of wasted space. It is still perfectly fine to keep source files on a MooseFS system if they are going to be served from there or just stored and developed there.

If you need any further assistance please let us know.

Regards
Michał Borychowski
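The space figures above follow from losing at most one partially filled 64 KiB block per chunk copy; a quick sanity check of both numbers (the thread rounds them slightly differently):

    awk 'BEGIN { printf "%.1f TB\n", 25e6 * 2 * 64 * 1024 / 1e12 }'   # ~3.3 TB for 25 M files at goal 2 (quoted above as ~3.2 TB)
    awk 'BEGIN { printf "%.2f GB\n", 1e4 * 2 * 64 * 1024 / 1e9 }'     # ~1.31 GB for 10,000 files at goal 2 (quoted as "maximally 1.2 GB")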
From: l. <lwx...@16...> - 2010-05-17 01:04:14
|
Thanks for your help! Why not make the 64KiB block size a parameter of the MFS system? That would waste less space for small files. Many hard-coded values in MFS could become parameters in the config file, so the system would be more flexible. Regards lwxian_aha 2010-05-14

From: Michał_Borychowski Sent: 2010-05-14 13:34:12 To: 'lwxian_aha' Cc: 'moosefs-users' Subject: RE: [Moosefs-users] chunkserver can't connect to masterserver

[ ... snip ... ] |
From: l. <lwx...@16...> - 2010-05-14 06:33:20
|
Thanks for your help! Why not make the 64KiB block size a parameter of the MFS system? That would waste less space for small files. Many hard-coded values in MFS could become parameters in the config file, so the system would be more flexible. Regards lwxian_aha 2010-05-14

From: Michał_Borychowski Sent: 2010-05-14 13:34:12 To: 'lwxian_aha' Cc: 'moosefs-users' Subject: RE: [Moosefs-users] chunkserver can't connect to masterserver

[ ... snip ... ] |
From: Michał B. <mic...@ge...> - 2010-05-14 05:34:19
|
From: lwxian_aha [mailto:lwx...@16...] Sent: Thursday, May 13, 2010 8:22 AM To: Michał_Borychowski Cc: moosefs-users Subject: Re: [Moosefs-users] chunkserver can't connect to masterserver

With your suggestions I have fixed the problem, thanks!!!

[MB] We are very happy that the patch has helped you!

On my production system there are lots of small files (less than 10K), so there are about 5,000,000 chunks holding only 1.4T of data. I have two suggestions: 1. The first way: you could split the registration packet into several packets, each limited to 500,000 chunks; then there would be no limit on the number of chunks per chunkserver.

[MB] That's what we thought of. Thank you for the suggestion.

2. The second way: maybe you could store several files in one chunk (each chunk is limited to 64M)? That would reduce the number of chunks.

[MB] It looks interesting, but a modification of a file in "the middle" of a chunk would cause big problems.

Here I have another question: on my chunkserver disks the smallest chunk file I see is 70656 bytes, while my smallest source file is only about 4K. Why does the chunk file so substantially exceed the source file?

[MB] The system was initially designed for keeping large numbers (like several thousand) of very big files (tens of gigabytes each) and has a hard-coded chunk size of 64MiB and block size of 64KiB. That's why even a small file will occupy 64KiB, plus an additional 4KiB of checksums and 1KiB for the header. All transfer in the system is done in blocks of 64KiB. This has no real impact on performance (a normal filesystem also reads ahead some superfluous data). The issue of occupied space is more significant, but in our opinion it is still negligible. Take 25 million files with a goal of 2, so about 50 million "last" chunks; if we lose even 64KiB in every one of them, the overall waste is about 3.2TB, which nowadays should not be a very big concern. In your case 10,000 files with goal=2 would produce at most 1.2 GB of wasted space. It is still perfectly fine to keep source files on MooseFS if they are going to be served somewhere or just stored and developed there. If you need any further assistance please let us know.

Regards
Michał Borychowski

Thanks!! 2010-05-13 lwxian_aha

From: Michał_Borychowski Sent: 2010-05-13 01:41:50 To: 'lwxian_aha'; 'moosefs-users' Subject: RE: [Moosefs-users] chunkserver can't connect to masterserver

[ ... snip ... ] |
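As a quick sanity check of the sizes quoted above (illustrative arithmetic only, assuming the 64 KiB block, 4 KiB checksum and 1 KiB header layout): the smallest possible chunk file and the worst-case waste for 50 million "last" chunks both follow directly from those figures:

    $ echo $(( 65536 + 4096 + 1024 ))
    70656            # matches the smallest chunk file observed on disk
    $ echo $(( 50000000 * 65536 ))
    3276800000000    # ~3.3 TB lost in the worst case, consistent with the ~3.2TB quoted above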
From: l. <lwx...@16...> - 2010-05-13 06:23:48
|
With your suggestions I have fixed the problem, thanks!!! On my production system there are lots of small files (less than 10K), so there are about 5,000,000 chunks holding only 1.4T of data. I have two suggestions: 1. The first way: you could split the registration packet into several packets, each limited to 500,000 chunks; then there would be no limit on the number of chunks per chunkserver. 2. The second way: maybe you could store several files in one chunk (each chunk is limited to 64M)? That would reduce the number of chunks. Here I have another question: on my chunkserver disks the smallest chunk file I see is 70656 bytes, while my smallest source file is only about 4K. Why does the chunk file so substantially exceed the source file? Thanks!! 2010-05-13 lwxian_aha

From: Michał_Borychowski Sent: 2010-05-13 01:41:50 To: 'lwxian_aha'; 'moosefs-users' Subject: RE: [Moosefs-users] chunkserver can't connect to masterserver

You have lots of chunks on your chunkservers. The registration packet sent to the master is probably too big and the master rejects it. You need to change this line in the "matocsserv.c" file in the "mfsmaster" folder:

#define MaxPacketSize 50000000

into this:

#define MaxPacketSize 200000000

After changing the source you need to recompile the master server and restart it. Typically a chunkserver holds about 500,000 chunks, for which about 6 million bytes have to be allocated, so a limit of 50 million bytes seemed quite reasonable. But you have about 5,000,000 chunks on one chunkserver, which requires about 60 million bytes to send the information about these chunks, so you exceed the limit. That's why we suggest increasing it to 200 million bytes. Let us know if it fixes your problem.

Kind regards
Michał Borychowski

From: lwxian_aha [mailto:lwx...@16...] Sent: Tuesday, May 11, 2010 5:34 AM To: moosefs-users Subject: [Moosefs-users] chunkserver can't connect to masterserver

Today I have new trouble with my new MFS system: a chunkserver can't connect to the master server. My MFS system consists of one master server and three chunkservers, each chunkserver with 7.2T of disk space; about 1.4T of data and about 7 million files, every file with 2 copies. MFS version is 1.6.11, OS is CentOS 5.0, filesystem is ext3. Following is the error message:

[root@localhost mfs]# tail /var/log/messages
May 11 10:57:00 localhost mfsmaster[25802]: server 1 (ip: 192.168.10.23, port: 9422): usedspace: 33326616576 (31.04 GiB), totalspace: 140025790464 (130.41 GiB), usage: 23.80%
May 11 10:57:00 localhost mfsmaster[25802]: total: usedspace: 33326616576 (31.04 GiB), totalspace: 140025790464 (130.41 GiB), usage: 23.80%
May 11 10:57:03 localhost mfsmaster[25802]: CS(192.168.10.21) packet too long (56130653/50000000)
May 11 10:57:03 localhost mfsmaster[25802]: chunkserver disconnected - ip: 192.168.10.21, port: 0, usedspace: 0 (0.00 GiB), totalspace: 0 (0.00 GiB)
May 11 10:57:08 localhost mfsmaster[25802]: CS(192.168.10.21) packet too long (56130653/50000000)
May 11 10:57:08 localhost mfsmaster[25802]: chunkserver disconnected - ip: 192.168.10.21, port: 0, usedspace: 0 (0.00 GiB), totalspace: 0 (0.00 GiB)
May 11 10:57:13 localhost mfsmaster[25802]: CS(192.168.10.21) packet too long (56130653/50000000)
May 11 10:57:13 localhost mfsmaster[25802]: chunkserver disconnected - ip: 192.168.10.21, port: 0, usedspace: 0 (0.00 GiB), totalspace: 0 (0.00 GiB)
May 11 10:57:18 localhost mfsmaster[25802]: CS(192.168.10.21) packet too long (56130653/50000000)
May 11 10:57:18 localhost mfsmaster[25802]: chunkserver disconnected - ip: 192.168.10.21, port: 0, usedspace: 0 (0.00 GiB), totalspace: 0 (0.00 GiB)

What happened? I need your help, thanks a lot! 2010-05-05 ________________________________________ lwxian_aha |
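For anyone hitting the same "packet too long" error: the numbers in the log are consistent with the explanation above, since at roughly 12 bytes of registration data per chunk the 56,130,653-byte packet corresponds to about 4.7 million chunks, well past the 50,000,000-byte limit. A minimal sketch of applying the suggested MaxPacketSize change to a 1.6.x source tree follows; the directory layout and build/restart commands are assumptions about a standard autotools build, so adapt them to your environment and keep a backup of the original source:

    # raise the registration packet limit in the master, then rebuild it
    cd mfs-1.6.11                  # your unpacked MooseFS source tree
    sed -i 's/#define MaxPacketSize 50000000/#define MaxPacketSize 200000000/' \
        mfsmaster/matocsserv.c
    ./configure && make && make install    # reuse your original ./configure options
    mfsmaster restart                      # restart the master so the new limit applies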