From: Thomas S H. <tha...@gm...> - 2010-12-03 17:04:57
This is the second time a chunkserver has issued this type of failure in our environment. After giving this log message the chunkserver does not crash, but all files on the chunkserver become unavailable and it shows 0% usage on the mfsmaster:

Dec 3 14:58:08 localhost mfschunkserver[6969]: testing chunk: /mnt/moose1/6C/chunk_000000000008F96C_00000001.mfs
Dec 3 14:58:18 localhost mfschunkserver[6969]: testing chunk: /mnt/moose1/0D/chunk_000000000001BB0D_00000001.mfs
Dec 3 14:58:18 localhost mfschunkserver[6969]: chunk_readcrc: file:/mnt/moose1/0D/chunk_000000000001BB0D_00000001.mfs - wrong id/version in header (000000000001BB0D_00000000)
Dec 3 14:58:18 localhost mfschunkserver[6969]: hdd_io_begin: file:/mnt/moose1/0D/chunk_000000000001BB0D_00000001.mfs - read error: Unknown error
Dec 3 14:58:28 localhost mfschunkserver[6969]: testing chunk: /mnt/moose1/0D/chunk_000000000001BB0D_00000001.mfs
Dec 3 14:58:28 localhost mfschunkserver[6969]: chunk_readcrc: file:/mnt/moose1/0D/chunk_000000000001BB0D_00000001.mfs - wrong id/version in header (000000000001BB0D_00000000)
Dec 3 14:58:28 localhost mfschunkserver[6969]: hdd_io_begin: file:/mnt/moose1/0D/chunk_000000000001BB0D_00000001.mfs - read error: Unknown error
Dec 3 14:58:38 localhost mfschunkserver[6969]: testing chunk: /mnt/moose1/0D/chunk_000000000001BB0D_00000001.mfs
Dec 3 14:58:38 localhost mfschunkserver[6969]: chunk_readcrc: file:/mnt/moose1/0D/chunk_000000000001BB0D_00000001.mfs - wrong id/version in header (000000000001BB0D_00000000)
Dec 3 14:58:38 localhost mfschunkserver[6969]: hdd_io_begin: file:/mnt/moose1/0D/chunk_000000000001BB0D_00000001.mfs - read error: Unknown error
Dec 3 14:58:38 localhost mfschunkserver[6969]: 3 errors occurred in 60 seconds on folder: /mnt/moose1/
Dec 3 14:58:39 localhost mfschunkserver[6969]: replicator: hdd_create status: 21

I am running the prerelease of 1.6.18 on Ubuntu 10.04. After restarting the chunkserver everything comes back online without problems. Any ideas as to what could be causing this?

-Tom Hatch
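[Editorial note] Not part of the thread, but a small sketch of how one might look at the on-disk header of the chunk named in those log lines. The only assumption is that the id/version values printed by chunk_readcrc are read from the first bytes of the file, so comparing a hex dump against the id/version encoded in the file name can show whether the header really is damaged on disk:

  # Inspect the chunk file reported by chunk_readcrc (run on the chunkserver).
  CHUNK=/mnt/moose1/0D/chunk_000000000001BB0D_00000001.mfs
  ls -l "$CHUNK"                 # size and mtime of the chunk file
  hexdump -C "$CHUNK" | head -4  # first bytes (the log suggests the id/version header is read from here)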
From: Roast <zha...@gm...> - 2010-12-03 06:56:54
Hi, 丁赞! Can you tell me more about your environment?

2010/12/1 丁赞 <di...@ba...>:
> Hi all,
>
> FUSE is an accept+fork module, which means that if N clients (500+ httpd threads in our environment) read data from MFS through one mfsmount simultaneously, FUSE can create N threads to handle the requests. This mechanism spends much more time on context switches, which made Apache response times much longer in our environment. So, has anyone tried to improve FUSE or mfs_client performance? Would using a thread pool or a user-level cache work?
>
> BTW: I am a new guy here :)
>
> Thanks all,
>
> DingZan @baidu.com

--
The time you enjoy wasting is not wasted time!
From: Laurent W. <lw...@hy...> - 2010-12-02 16:07:38
On Thu, 2 Dec 2010 18:43:22 +0300 "Leonid Satanovsky" <leo...@ar...> wrote:
> Greetings!
> The question is: can we use MooseFS as storage for a Cyrus IMAP server?
> Its docs say that it makes extensive use of file locking (through the flock and/or fcntl system calls).
> As I understand it, this is just a matter of MooseFS supporting these locking mechanisms.
> Is that already available and, if not, for what release is it planned?

Hi Leonid,

File locks are not supported yet. It's on the roadmap, planned for 1.7, unknown release date.

--
Laurent Wandrebeck
HYGEOS, Earth Observation Department / Observation de la Terre
Euratechnologies
165 Avenue de Bretagne
59000 Lille, France
tel: +33 3 20 08 24 98
http://www.hygeos.com
GPG fingerprint/Empreinte GPG: F5CA 37A4 6D03 A90C 7A1D 2A62 54E6 EF2C D17C F64C
From: Leonid S. <leo...@ar...> - 2010-12-02 16:02:55
Greetings!

The question is: can we use MooseFS as storage for a Cyrus IMAP server? Its docs say that it makes extensive use of file locking (through the flock and/or fcntl system calls). As I understand it, this is just a matter of MooseFS supporting these locking mechanisms. Is that already available and, if not, for what release is it planned?

Best regards,
Leonid.
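[Editorial note] For anyone wanting to check lock behaviour empirically, a minimal sketch using the flock(1) utility from util-linux; the mount point /mnt/mfs and the test file name are assumptions, not from the thread, and the outcome on a given MooseFS release simply shows whether an exclusive flock can be taken on the mount:

  # Try to take an exclusive flock on a file on the MooseFS mount
  # (/mnt/mfs and the file name are assumptions).
  touch /mnt/mfs/locktest
  flock -x /mnt/mfs/locktest -c 'echo "exclusive flock acquired"; sleep 1'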
From: jose m. <let...@us...> - 2010-12-01 19:21:29
On Wed, 01-12-2010 at 08:59 +0100, Michał Borychowski wrote:
> Hola Jose!
>
> Yes, that's right - changing the goal for files in trash resets their timer. We will look into it further and probably make a patch.
>
> Saludos
> Michal

* Thanks.
From: Alexander A. <akh...@ri...> - 2010-12-01 12:10:55
Hi!

Great, thank you for this post. I had some similar unexplained bugs with Jumbo Frames. I will try to fix them thanks to your hint.

wbr
Alexander Akhobadze

======================================================

Hi,

I just wanted to give a heads up if anyone gets into the same problems as we did here.

Our new setup is like this:
1 x Master (Ubuntu 10.04)
1 x Metalogger (Ubuntu 10.04)
10 x Chunkservers (Ubuntu 10.04)

And then we have for testing 8 CentOS 5.5 servers using the MFS. All Ubuntu servers have Broadcom BCM5708 network cards and the CentOS blades have BCM5709, and this is all connected through Cisco 3560 and 2960 switches.

Everything was working fine after the initial install. Then I set Jumbo Frames on the switches and changed the MTU on all the servers to 9000, and from that point I could only get a listing of files; we were unable to read the contents of large files on MFS mounts from the CentOS machines, but if the filesystem was mounted from the Master it worked fine. Direct communication seemed to work fine: ping with different packet sizes, ssh and other services did not fail.

After a couple of days looking at this I found that the operating systems have different driver settings for at least Broadcom cards: while Ubuntu has a switch called "generic segmentation offload" set to on by default, CentOS has it set to off. By changing this setting on the CentOS machines it now runs fine just like before. To set it you use "ethtool -K gso on".

/Oli
Ólafur Osvaldsson
From: Ólafur Ó. <osv...@ne...> - 2010-12-01 10:58:17
Hi,

I just wanted to give a heads up if anyone gets into the same problems as we did here.

Our new setup is like this:
1 x Master (Ubuntu 10.04)
1 x Metalogger (Ubuntu 10.04)
10 x Chunkservers (Ubuntu 10.04)

And then we have for testing 8 CentOS 5.5 servers using the MFS. All Ubuntu servers have Broadcom BCM5708 network cards and the CentOS blades have BCM5709, and this is all connected through Cisco 3560 and 2960 switches.

Everything was working fine after the initial install. Then I set Jumbo Frames on the switches and changed the MTU on all the servers to 9000, and from that point I could only get a listing of files; we were unable to read the contents of large files on MFS mounts from the CentOS machines, but if the filesystem was mounted from the Master it worked fine. Direct communication seemed to work fine: ping with different packet sizes, ssh and other services did not fail.

After a couple of days looking at this I found that the operating systems have different driver settings for at least Broadcom cards: while Ubuntu has a switch called "generic segmentation offload" set to on by default, CentOS has it set to off. By changing this setting on the CentOS machines it now runs fine just like before. To set it you use "ethtool -K gso on".

/Oli
--
Ólafur Osvaldsson
System Administrator
Nethonnun ehf.
e-mail: osv...@ne...
phone: +354 517 3418
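[Editorial note] For reference, a short sketch of checking and toggling generic segmentation offload with ethtool; the interface name eth0 is an assumption, and whether gso should be on or off depends on the driver and kernel, as this report shows:

  # Show current offload settings for the NIC (interface name eth0 is an assumption).
  ethtool -k eth0 | grep generic-segmentation-offload
  # Enable (or disable with "off") generic segmentation offload.
  ethtool -K eth0 gso on
  # Jumbo frames: the interface MTU has to match the switch configuration.
  ip link set dev eth0 mtu 9000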
From: Michał B. <mic...@ge...> - 2010-12-01 08:06:29
Hi!

You have some problem here with starting the chunkserver, but for sure it has no connection with the TIMEMODE_RUNONCE constant (it means that after a clock time change something has to be run once and not 60 times). We have also seen some very rare cases where a chunkserver got hung up while the master server was starting. In 1.6.18 this problem should be eliminated.

Kind regards
Michal

From: kuer ku [mailto:ku...@gm...]
Sent: Monday, November 29, 2010 12:18 PM
To: moo...@li...
Subject: [Moosefs-users] how many times chunckserver will retry when disconnecting from metaserver ?

Hi, all,

I deployed mfs-1.6.15 in my environment, and today I found a problem. The symptom is that one of the mfsmount (FUSE) clients complained:

Nov 29 18:26:11 storage04 mfsmount[32233]: file: 43, index: 0 - can't connect to proper chunkserver (try counter: 29)

I do not know which chunkserver caused this. On the web interface, I found that storage01, one of the chunkservers, is not in the server list, and on storage01 there are some logs in /var/log/messages:

Nov 29 14:43:27 storage01 mfsmount[13155]: master: connection lost (1)
Nov 29 14:43:27 storage01 mfsmount[13155]: registered to master
Nov 29 14:44:12 storage01 mfschunkserver[11730]: Master connection lost
  (mfschunkserver found the connection lost, but there are no logs indicating that it tried to reconnect to the master)
Nov 29 15:07:44 storage01 smartd[4268]: System clock time adjusted to the past. Resetting next wakeup time.
(the following log happened because I restarted the chunkserver forcibly)
Nov 29 18:29:59 storage01 h*U¥2[11730]: closing *:19722
Nov 29 18:30:13 storage01 mfschunkserver[6764]: listen on *:19722
Nov 29 18:30:13 storage01 mfschunkserver[6764]: connecting ...
Nov 29 18:30:13 storage01 mfschunkserver[6764]: open files limit: 10000
Nov 29 18:30:13 storage01 mfschunkserver[6764]: connected to Master

And in chunkserver/masterconn.c I found this code:

1311 main_eachloopregister(masterconn_check_hdd_reports);
1312 main_timeregister(TIMEMODE_RUNONCE,ReconnectionDelay,0,masterconn_reconnect);   <-- it will try to reconnect only once?
1313 main_destructregister(masterconn_term);
1314 main_pollregister(masterconn_desc,masterconn_serve);
1315 main_reloadregister(masterconn_reload);

I think the chunkserver should reconnect to the master again and again until it reaches the master, but I do not find that in the code.

P.S. I remember that I adjusted storage01's time with ntpdate against a date server. Does this affect the chunkserver so seriously?

thanks
--
kuer
From: Michał B. <mic...@ge...> - 2010-12-01 07:59:56
Hola Jose!

Yes, that's right - changing the goal for files in trash resets their timer. We will look into it further and probably make a patch.

Saludos
Michal

-----Original Message-----
From: jose maria [mailto:let...@us...]
Sent: Saturday, November 27, 2010 4:23 PM
To: moo...@li...
Subject: Re: [Moosefs-users] reset time count in trash ?

On Fri, 26-11-2010 at 15:57 +0100, jose maria wrote:
> * For testing I have applied a trashtime of 12 hours to the files of the cluster. A script is executed every hour and reduces the files in the trash to goal 2. For one week the number of files in the trash has been increasing and it still has not stabilized; at present there are 400,000.
>
> * Is it possible that applying setgoal 2 resets the time count?
>
> * The number of files in the replica of the secondary cluster, with an identical trashtime of 12 hours but without the script that applies mfssetgoal 2, is trivial.

* Confirmed. I have disabled the cron job that applies mfssetgoal to the files in trash, and in 12 hours the number of files has dropped from 375,000 to 1,500. Any other idea for reducing the goal of trash files? I need trash files retained for 1 week...
From: Michał B. <mic...@ge...> - 2010-12-01 07:53:55
Hi Laurent!

As long as the master doesn't take a full 100% of CPU, it can work even better as one thread. Generally speaking, multithreading is often overestimated. Please have a look at this article: http://www.eecs.berkeley.edu/Pubs/TechRpts/2006/EECS-2006-1.pdf

Regards
Michał

-----Original Message-----
From: Laurent Wandrebeck [mailto:lw...@hy...]
Sent: Friday, November 26, 2010 12:00 PM
To: moo...@li...
Subject: Re: [Moosefs-users] A problem of reading the same file at the same moment

On Thu, 25 Nov 2010 10:55:03 +0100 Michał Borychowski <mic...@ge...> wrote:
> Hi!
>
> As written on http://www.moosefs.org/moosefs-faq.html#goal increasing the goal may only increase the reading speed under certain conditions. You can just try increasing the goal, wait for the replication and see if it helps.

I'm wondering if such behaviour could be due to mfsmaster being a single-threaded program. In high-load cases, the master being busy answering one request effectively queues the others, becoming a performance bottleneck by adding latency and preventing more than one request from being worked on at the same time. Comments?

--
Laurent Wandrebeck
HYGEOS, Earth Observation Department / Observation de la Terre
Euratechnologies
165 Avenue de Bretagne
59000 Lille, France
tel: +33 3 20 08 24 98
http://www.hygeos.com
GPG fingerprint/Empreinte GPG: F5CA 37A4 6D03 A90C 7A1D 2A62 54E6 EF2C D17C F64C
From: 丁赞 <di...@ba...> - 2010-12-01 04:05:28
Hi all,

FUSE is an accept+fork module, which means that if N clients (500+ httpd threads in our environment) read data from MFS through one mfsmount simultaneously, FUSE can create N threads to handle the requests. This mechanism spends much more time on context switches, which made Apache response times much longer in our environment. So, has anyone tried to improve FUSE or mfs_client performance? Would using a thread pool or a user-level cache work?

BTW: I am a new guy here :)

Thanks all,

DingZan @baidu.com
From: Thomas S H. <tha...@gm...> - 2010-11-30 16:34:30
Ignore me :) We had a problem with our routes; our VMs could not see all of our chunkservers.

On Tue, Nov 30, 2010 at 9:00 AM, Thomas S Hatch <tha...@gm...> wrote:
> I am experiencing problems with moosefs mounts inside of kvm virtual machines. There are a number of files which I cannot read on my mfs mount from inside a kvm virtual machine (the read attempts hang and then issue an IOError). But I am confident that the files are good because they can be read without problem from an mfsmount on a bare metal system. I am running moosefs 1.6.18 (a pre-release) on Ubuntu 10.04.
>
> Please let me know if there is any additional information I can send.
>
> -Thomas S Hatch
From: Thomas S H. <tha...@gm...> - 2010-11-30 16:00:34
I am experiencing problems with moosefs mounts inside of kvm virtual machines. There are a number of files which I cannot read on my mfs mount from inside a kvm virtual machine (the read attempts hang and then issue an IOError). But I am confident that the files are good because they can be read without problem from an mfsmount on a bare metal system. I am running moosefs 1.6.18 (a pre-release) on Ubuntu 10.04.

Please let me know if there is any additional information I can send.

-Thomas S Hatch
From: Michał B. <mic...@ge...> - 2010-11-30 10:59:41
We are still carrying out some internal tests. I believe it will be published this week or next.

Regards
Michał Borychowski

-----Original Message-----
From: Josef [mailto:pe...@p-...]
Sent: Tuesday, November 30, 2010 11:20 AM
To: moo...@li...
Subject: [Moosefs-users] release 1.6.18

Hello,

is 1.6.18 going to be released soon? I have reported a bug in 1.6.17 in cgiserver that should be repaired in 1.6.18, so I'm quite excited and would like to know an approximate date, whether I should repair it myself or wait...

Thank you,
Josef
From: Josef <pe...@p-...> - 2010-11-30 10:20:26
Hello,

is 1.6.18 going to be released soon? I have reported a bug in 1.6.17 in cgiserver that should be repaired in 1.6.18, so I'm quite excited and would like to know an approximate date, whether I should repair it myself or wait...

Thank you,
Josef
From: Michał B. <mic...@ge...> - 2010-11-30 08:18:45
Hi!

The kernel messages state that your disk "sdh" is broken. The other messages from MooseFS probably result from this broken disk.

Kind regards
Michał Borychowski
MooseFS Support Manager
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _
Gemius S.A.
ul. Wołoska 7, 02-672 Warszawa
Budynek MARS, klatka D
Tel.: +4822 874-41-00
Fax : +4822 874-41-01

From: 王浩 [mailto:wan...@gm...]
Sent: Monday, November 29, 2010 4:46 AM
To: moo...@li...
Subject: [Moosefs-users] mfschunkserver write error: Broken pipe

hi,

version of MooseFS: 1.6.0
operating system: CentOS 5.4
file system: ext3

In /var/log/messages I found some errors like this:

Nov 27 07:12:12 store-1 kernel: sd 0:0:7:0: SCSI error: return code = 0x08000002
Nov 27 07:12:12 store-1 kernel: sdh: Current: sense key: Medium Error
Nov 27 07:12:12 store-1 kernel: Add. Sense: Unrecovered read error
Nov 27 07:12:12 store-1 kernel: Info fld=0x58a78cb2
Nov 27 07:12:12 store-1 kernel: end_request: I/O error, dev sdh, sector 1487375538
Nov 27 07:12:12 store-1 mfschunkserver[2952]: (write) write error: Connection reset by peer
Nov 27 07:12:12 store-1 mfschunkserver[2952]: (write) write error: Broken pipe
Nov 27 07:12:12 store-1 mfschunkserver[2952]: (write) write error: Connection reset by peer
Nov 27 07:12:12 store-1 mfschunkserver[2952]: (write) write error: Connection reset by peer
Nov 27 07:12:12 store-1 mfschunkserver[2952]: (write) write error: Broken pipe
Nov 27 07:12:12 store-1 mfschunkserver[2952]: (write) write error: Connection reset by peer
Nov 27 07:12:12 store-1 mfschunkserver[2952]: (write) write error: Connection reset by peer
Nov 27 07:12:12 store-1 mfschunkserver[2952]: (write) write error: Broken pipe
Nov 27 07:12:12 store-1 mfschunkserver[2952]: (write) write error: Connection reset by peer
Nov 27 07:12:12 store-1 mfschunkserver[2952]: (write) write error: Connection reset by peer

I want to know why it happened. Please help me. Thank you!

wanghao
2010-11-29
From: Ricardo J. B. <ric...@da...> - 2010-11-29 16:14:07
On Monday, 29 November 2010, 王浩 wrote:
> hi,
> version of MooseFS: 1.6.0
> operating system: CentOS 5.4
> file system: ext3
>
> In /var/log/messages I found some errors like this:
> Nov 27 07:12:12 store-1 kernel: sd 0:0:7:0: SCSI error: return code = 0x08000002
> Nov 27 07:12:12 store-1 kernel: sdh: Current: sense key: Medium Error
> Nov 27 07:12:12 store-1 kernel: Add. Sense: Unrecovered read error
> Nov 27 07:12:12 store-1 kernel: Info fld=0x58a78cb2
> Nov 27 07:12:12 store-1 kernel: end_request: I/O error, dev sdh, sector 1487375538

It looks like your sdh is having errors; you should replace it. Check here:

http://www.moosefs.org/moosefs-faq.html#add_remove
http://www.moosefs.org/moosefs-faq.html#mark_for_removal

If you have goal > 1 you shouldn't lose any files.

> I want to know why it happened.
> Please help me. Thank you!

Hope it helps,
--
Ricardo J. Barberis
Senior SysAdmin / ITI
Dattatec.com :: Soluciones de Web Hosting
Tu Hosting hecho Simple!
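[Editorial note] To make the second FAQ link concrete, a rough sketch of the mark-for-removal steps; the mount point /mnt/sdh is an assumption, and the asterisk convention and the reload action are recalled from the MooseFS chunkserver documentation, so double-check against the FAQ before relying on them:

  # Sketch: mark the failing disk for removal on the chunkserver (path is an assumption).
  # 1. In /etc/mfshdd.cfg, prefix the disk's line with '*':
  #       */mnt/sdh
  # 2. Ask the chunkserver to re-read its configuration:
  mfschunkserver reload
  # 3. Watch the CGI monitor until no undergoal chunks remain, then take the
  #    disk out of mfshdd.cfg, reload again, and replace the drive.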
From: kuer ku <ku...@gm...> - 2010-11-29 11:18:05
Hi, all,

I deployed mfs-1.6.15 in my environment, and today I found a problem. The symptom is that one of the mfsmount (FUSE) clients complained:

Nov 29 18:26:11 storage04 mfsmount[32233]: file: 43, index: 0 - can't connect to proper chunkserver (try counter: 29)

I do not know which chunkserver caused this. On the web interface, I found that storage01, one of the chunkservers, is not in the server list, and on storage01 there are some logs in /var/log/messages:

Nov 29 14:43:27 storage01 mfsmount[13155]: master: connection lost (1)
Nov 29 14:43:27 storage01 mfsmount[13155]: registered to master
Nov 29 14:44:12 storage01 mfschunkserver[11730]: Master connection lost
  (mfschunkserver found the connection lost, but there are no logs indicating that it tried to reconnect to the master)
Nov 29 15:07:44 storage01 smartd[4268]: System clock time adjusted to the past. Resetting next wakeup time.
(the following log happened because I restarted the chunkserver forcibly)
Nov 29 18:29:59 storage01 h*U¥2[11730]: closing *:19722
Nov 29 18:30:13 storage01 mfschunkserver[6764]: listen on *:19722
Nov 29 18:30:13 storage01 mfschunkserver[6764]: connecting ...
Nov 29 18:30:13 storage01 mfschunkserver[6764]: open files limit: 10000
Nov 29 18:30:13 storage01 mfschunkserver[6764]: connected to Master

And in chunkserver/masterconn.c I found this code:

1311 main_eachloopregister(masterconn_check_hdd_reports);
1312 main_timeregister(TIMEMODE_RUNONCE,ReconnectionDelay,0,masterconn_reconnect);   <-- it will try to reconnect only once?
1313 main_destructregister(masterconn_term);
1314 main_pollregister(masterconn_desc,masterconn_serve);
1315 main_reloadregister(masterconn_reload);

I think the chunkserver should reconnect to the master again and again until it reaches the master, but I do not find that in the code.

P.S. I remember that I adjusted storage01's time with ntpdate against a date server. Does this affect the chunkserver so seriously?

thanks
--
kuer
From: Michał B. <mic...@ge...> - 2010-11-29 10:37:48
Hi Laurent!

The system doesn't rebalance disk load within a single chunkserver; it is usually of little use. In your case new chunks will be created mainly on this 46%-occupied disk, and over time everything will be balanced. Maybe in the future we will think about rebalancing disk loads at the level of a single chunkserver.

Regards
Michal

-----Original Message-----
From: Laurent Wandrebeck [mailto:lw...@hy...]
Sent: Thursday, November 25, 2010 10:23 AM
To: moo...@li...
Subject: [Moosefs-users] replicator connection lost

Hi,

I got a chunkserver tower box with 12 SATA disks plugged into a 3ware 9650 (JBOD). A disk failed, so I halted the box, changed the disk, and started mfschunkserver without the new one. I then formatted it, added it back in mfshdd.cfg, and restarted mfschunkserver. Replication/rebalance went on normally. At night, another disk failed on that box:

Nov 25 01:43:24 msg kernel: 3w-9xxx: scsi0: AEN: ERROR (0x04:0x0009): Drive timeout detected:port=10.
Nov 25 01:43:24 msg kernel: sd 0:0:9:0: WARNING: (0x06:0x002C): Command (0x35) timed out, resetting card.
Nov 25 01:44:08 msg kernel: 3w-9xxx: scsi0: AEN: WARNING (0x04:0x0043): Backup DCB read error detected:port=10, error=0x204.
Nov 25 01:44:09 msg kernel: 3w-9xxx: scsi0: AEN: INFO (0x04:0x005E): Cache synchronization completed:unit=0.
Nov 25 01:44:09 msg kernel: 3w-9xxx: scsi0: AEN: INFO (0x04:0x005E): Cache synchronization completed:unit=4.
Nov 25 01:44:09 msg kernel: 3w-9xxx: scsi0: AEN: INFO (0x04:0x005E): Cache synchronization completed:unit=8.

testing chunks bla bla on several lines.

Nov 25 01:50:33 msg kernel: sd 0:0:9:0: WARNING: (0x06:0x002C): Command (0x35) timed out, resetting card.
Nov 25 01:51:03 msg kernel: 3w-9xxx: scsi0: AEN: ERROR (0x04:0x000A): Drive error detected:unit=9, port=10.
Nov 25 01:51:03 msg kernel: 3w-9xxx: scsi0: AEN: INFO (0x04:0x005E): Cache synchronization completed:unit=0.
Nov 25 01:51:03 msg kernel: 3w-9xxx: scsi0: AEN: INFO (0x04:0x005E): Cache synchronization completed:unit=1.
Nov 25 01:51:03 msg kernel: 3w-9xxx: scsi0: AEN: INFO (0x04:0x005E): Cache synchronization completed:unit=2.
Nov 25 01:51:03 msg kernel: 3w-9xxx: scsi0: AEN: INFO (0x04:0x005E): Cache synchronization completed:unit=3.
Nov 25 01:51:03 msg kernel: 3w-9xxx: scsi0: AEN: INFO (0x04:0x005E): Cache synchronization completed:unit=4.
Nov 25 01:51:03 msg kernel: 3w-9xxx: scsi0: AEN: INFO (0x04:0x005E): Cache synchronization completed:unit=6.
Nov 25 01:51:03 msg kernel: 3w-9xxx: scsi0: AEN: INFO (0x04:0x005E): Cache synchronization completed:unit=7.
Nov 25 01:51:04 msg kernel: 3w-9xxx: scsi0: AEN: INFO (0x04:0x005E): Cache synchronization completed:unit=8.
Nov 25 01:51:04 msg kernel: 3w-9xxx: scsi0: AEN: INFO (0x04:0x005E): Cache synchronization completed:unit=9.
Nov 25 01:51:04 msg kernel: 3w-9xxx: scsi0: AEN: INFO (0x04:0x005E): Cache synchronization completed:unit=10.
Nov 25 01:51:14 msg mfschunkserver[3386]: replicator: connection lost

and replication/rebalance kind of stopped. The new disk is being filled at 46%, while the others on that box are around 90%. I restarted mfschunkserver, but replication/rebalance isn't starting over. Any clue before I change the second failing disk (which seems to behave normally again)?

Thanks,
--
Laurent Wandrebeck
HYGEOS, Earth Observation Department / Observation de la Terre
Euratechnologies
165 Avenue de Bretagne
59000 Lille, France
tel: +33 3 20 08 24 98
http://www.hygeos.com
GPG fingerprint/Empreinte GPG: F5CA 37A4 6D03 A90C 7A1D 2A62 54E6 EF2C D17C F64C
From: 王浩 <wan...@gm...> - 2010-11-29 03:45:59
hi,

version of MooseFS: 1.6.0
operating system: CentOS 5.4
file system: ext3

In /var/log/messages I found some errors like this:

Nov 27 07:12:12 store-1 kernel: sd 0:0:7:0: SCSI error: return code = 0x08000002
Nov 27 07:12:12 store-1 kernel: sdh: Current: sense key: Medium Error
Nov 27 07:12:12 store-1 kernel: Add. Sense: Unrecovered read error
Nov 27 07:12:12 store-1 kernel: Info fld=0x58a78cb2
Nov 27 07:12:12 store-1 kernel: end_request: I/O error, dev sdh, sector 1487375538
Nov 27 07:12:12 store-1 mfschunkserver[2952]: (write) write error: Connection reset by peer
Nov 27 07:12:12 store-1 mfschunkserver[2952]: (write) write error: Broken pipe
Nov 27 07:12:12 store-1 mfschunkserver[2952]: (write) write error: Connection reset by peer
Nov 27 07:12:12 store-1 mfschunkserver[2952]: (write) write error: Connection reset by peer
Nov 27 07:12:12 store-1 mfschunkserver[2952]: (write) write error: Broken pipe
Nov 27 07:12:12 store-1 mfschunkserver[2952]: (write) write error: Connection reset by peer
Nov 27 07:12:12 store-1 mfschunkserver[2952]: (write) write error: Connection reset by peer
Nov 27 07:12:12 store-1 mfschunkserver[2952]: (write) write error: Broken pipe
Nov 27 07:12:12 store-1 mfschunkserver[2952]: (write) write error: Connection reset by peer
Nov 27 07:12:12 store-1 mfschunkserver[2952]: (write) write error: Connection reset by peer

I want to know why it happened. Please help me. Thank you!

wanghao
2010-11-29
From: jose m. <let...@us...> - 2010-11-27 15:23:30
On Fri, 26-11-2010 at 15:57 +0100, jose maria wrote:
> * For testing I have applied a trashtime of 12 hours to the files of the cluster. A script is executed every hour and reduces the files in the trash to goal 2. For one week the number of files in the trash has been increasing and it still has not stabilized; at present there are 400,000.
>
> * Is it possible that applying setgoal 2 resets the time count?
>
> * The number of files in the replica of the secondary cluster, with an identical trashtime of 12 hours but without the script that applies mfssetgoal 2, is trivial.

* Confirmed. I have disabled the cron job that applies mfssetgoal to the files in trash, and in 12 hours the number of files has dropped from 375,000 to 1,500. Any other idea for reducing the goal of trash files? I need trash files retained for 1 week...
From: jose m. <let...@us...> - 2010-11-26 14:58:11
* For testing I have applied a trashtime of 12 hours to the files of the cluster. A script is executed every hour and reduces the files in the trash to goal 2. For one week the number of files in the trash has been increasing and it still has not stabilized; at present there are 400,000.

* Is it possible that applying setgoal 2 resets the time count?

* The number of files in the replica of the secondary cluster, with an identical trashtime of 12 hours but without the script that applies mfssetgoal 2, is trivial.

* Cheers.
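[Editorial note] Purely as an illustration of the kind of hourly job being discussed (and with the caveat this very thread establishes: changing the goal resets the trash timer), a hypothetical sketch; the meta mount point /mnt/mfsmeta and the assumption that mfssetgoal accepts paths under it are editorial guesses, not from the message:

  # Hypothetical hourly cron job reducing the goal of trashed files to 2.
  # Assumes the MooseFS meta filesystem (which exposes the trash directory)
  # is mounted at /mnt/mfsmeta and that mfssetgoal works on paths under it.
  # Warning (from this thread): changing the goal resets the trashtime counter.
  mfssetgoal -r 2 /mnt/mfsmeta/trash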
From: Laurent W. <lw...@hy...> - 2010-11-26 11:04:09
Hi,

I've done a quick round of tests using /home on MFS. Firefox, Pidgin, XChat, Akregator and Geany work fine. But Sylpheed (an MUA) chokes when I click the button to send a mail (it chokes after I have typed my GPG passphrase if I asked for a signed mail). I guess it may be due to the lack of support for locks. Anyway, that means MFS is unusable for /home for us right now. Just so you know.

Regards,
--
Laurent Wandrebeck
HYGEOS, Earth Observation Department / Observation de la Terre
Euratechnologies
165 Avenue de Bretagne
59000 Lille, France
tel: +33 3 20 08 24 98
http://www.hygeos.com
GPG fingerprint/Empreinte GPG: F5CA 37A4 6D03 A90C 7A1D 2A62 54E6 EF2C D17C F64C
From: Laurent W. <lw...@hy...> - 2010-11-26 11:00:32
On Thu, 25 Nov 2010 10:55:03 +0100 Michał Borychowski <mic...@ge...> wrote:
> Hi!
>
> As written on http://www.moosefs.org/moosefs-faq.html#goal increasing the goal may only increase the reading speed under certain conditions. You can just try increasing the goal, wait for the replication and see if it helps.

I'm wondering if such behaviour could be due to mfsmaster being a single-threaded program. In high-load cases, the master being busy answering one request effectively queues the others, becoming a performance bottleneck by adding latency and preventing more than one request from being worked on at the same time. Comments?

--
Laurent Wandrebeck
HYGEOS, Earth Observation Department / Observation de la Terre
Euratechnologies
165 Avenue de Bretagne
59000 Lille, France
tel: +33 3 20 08 24 98
http://www.hygeos.com
GPG fingerprint/Empreinte GPG: F5CA 37A4 6D03 A90C 7A1D 2A62 54E6 EF2C D17C F64C
From: Michał B. <mic...@ge...> - 2010-11-25 09:55:29
Hi!

As written on http://www.moosefs.org/moosefs-faq.html#goal increasing the goal may only increase the reading speed under certain conditions. You can just try increasing the goal, wait for the replication and see if it helps.

Kind regards
Michał Borychowski
MooseFS Support Manager
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _
Gemius S.A.
ul. Wołoska 7, 02-672 Warszawa
Budynek MARS, klatka D
Tel.: +4822 874-41-00
Fax : +4822 874-41-01

From: 陈轶博 [mailto:che...@dv...]
Sent: Monday, November 15, 2010 8:37 AM
To: moosefs-users
Subject: [Moosefs-users] A problem of reading the same file at the same moment

Hi, I am Jacky. I'm using MFS to build a movie center for my own video streaming server, and while doing stress testing I ran into a reading problem.

First, in my application the requirement is that many processes read the same file at the same moment, all the time. So my movie center should support as many processes as possible reading one file at the same moment.

Second, please allow me to introduce my test environment:

Hardware:
master: IBM 3550
chunkservers and clients are the same servers:
  cpu: Intel Xeon 5520 * 2, 2.6 GHz (quad-core)
  mem: 16 GB
  RAID card: Adaptec 52445
  disk: 450 GB * 24, SAS
  nic: 3 * PCI-E GE
switch: Gigabit Switch H3 S9306

Software:
MFS version: 1.6.17 (http://www.moosefs.org/download.html)
OS: CentOS 5.4, 64-bit, one disk
FS: XFS for CentOS 5.4
nic: bonding 4 GE, mode=6
RAID: the other 23 disks in RAID6
mfs goal = 1

network structure: (diagram not included)

Third, the results of my testing:

Sequential read testing:
#cat /dev/sda1 > /dev/null .................. 189 MB/s (sda is the single disk for the OS)
#cat /dev/sdb1 > /dev/null .................. 383 MB/s (sdb1 is the RAID6)
#dd if=/dev/sdb1 of=/dev/null bs=4M ......... 413 MB/s

Random read testing on one client (carbon is my test program, written in C, multi-threaded; each thread reads one random file into a buffer and then drops the buffer):
#./carbon fp=/mnt/fent fn=1000 tn=8 bs=8M ------------------ 250 MB/s
#./carbon fp=/mnt/fent fn=1000 tn=16 bs=8M ----------------- 260 MB/s
#./carbon fp=/mnt/fent fn=1000 tn=32 bs=8M ----------------- 240 MB/s
#./carbon fp=/mnt/fent fn=1000 tn=64 bs=8M ----------------- 260 MB/s
(fp=path of files to read, fn=number of files, tn=number of threads, bs=block size (KB))

The problem: there are 3 clients. When I ran {#./carbon fp=/mnt/fent fn=1000 tn=8 bs=8M} on each client, I found that the third client (it could be any of the clients) may keep waiting to read; only when clients 1 and 2 have finished reading some files does the third begin to read.

Then I confirmed this problem in another way: I rebuilt the environment with PCs and otherwise the same configuration. On each client I ran, for files 1 to 8, {#dd if=/mnt/fentN.ts of=/dev/null bs=4M} (N = 1..8), and I found that:
- run on the first client only: the read speed is 70 MB/s
- run on the first and second clients: the read speed is 30~40 MB/s
- run on all 3 clients: the read speed is < 10 MB/s

In my opinion this result means that the more processes (either on one client or on different clients) read the same file at the same moment, the worse the reading performance gets. I could set the goal to a bigger value to improve performance, but in my application each movie file is about 3 GB, and a bigger goal means more storage. The biggest goal value I can afford is 3, and I'm afraid that can't solve the reading problem for me.

Finally, is there anything I can do, apart from changing the goal value?

2010-11-15
陈轶博
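[Editorial note] A sketch of the parallel dd read test described in the message, written as a small script; the file names fent1.ts..fent8.ts under /mnt are an assumption based on the command shown above:

  #!/bin/sh
  # Launch eight concurrent sequential readers against the mount, as in the test above.
  for i in 1 2 3 4 5 6 7 8; do
      dd if=/mnt/fent$i.ts of=/dev/null bs=4M &
  done
  wait   # wait for all readers, then compare the throughput each dd reports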