From: Davies L. <dav...@gm...> - 2011-11-04 05:51:17
|
2011/11/3 Michał Borychowski <mic...@ge...>:
> Hi!
>
> Is this a repeatable problem? Do you know a scenario which causes this behaviour?

No, I have not seen it again.

> Kind regards
> Michał Borychowski
> MooseFS Support Manager
> _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _
> Gemius S.A.
> ul. Wołoska 7, 02-672 Warszawa
> Budynek MARS, klatka D
> Tel.: +4822 874-41-00
> Fax : +4822 874-41-01
>
> -----Original Message-----
> From: Davies Liu [mailto:dav...@gm...]
> Sent: Thursday, November 03, 2011 2:47 AM
> To: moo...@li...
> Subject: [Moosefs-users] mfschunkserver eats 100% CPU
>
> Hi,
>
> Found one bug of mfschunkserver, it eats 100% CPU without any activities.
>
> strace shows:
>
> [strace output snipped - see the original report further down in this thread]
>
> It seems that fd 224 and 125 are ready, but it read fd 29 and 222, with -1, then fall into infinite loop.
>
> --
> - Davies

--
- Davies
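The behaviour described in this thread (poll() returning with only fds 224 and 125 flagged ready, while the loop keeps calling read() on fds 29 and 222 and getting EAGAIN) is the classic busy loop that appears when a poll-based server reads descriptors without consulting revents. Below is a minimal standalone illustration of the difference; it is not the MooseFS source, just a sketch of the pattern:

    #include <errno.h>
    #include <fcntl.h>
    #include <poll.h>
    #include <stdio.h>
    #include <unistd.h>

    int main(void) {
        int a[2], b[2];
        char c;

        /* a[0] will have data waiting, b[0] will stay empty */
        if (pipe(a) != 0 || pipe(b) != 0) return 1;
        fcntl(a[0], F_SETFL, O_NONBLOCK);
        fcntl(b[0], F_SETFL, O_NONBLOCK);
        if (write(a[1], "x", 1) != 1) return 1;

        struct pollfd fds[2] = {
            { .fd = a[0], .events = POLLIN },
            { .fd = b[0], .events = POLLIN },
        };

        if (poll(fds, 2, 50) > 0) {
            /* Buggy pattern: read a descriptor without checking its revents.
             * The empty pipe returns -1/EAGAIN every time poll() wakes up for
             * the other descriptor; with a 50 ms timeout inside a loop this is
             * exactly the CPU-burning pattern seen in the strace above. */
            if (read(b[0], &c, 1) < 0 && errno == EAGAIN)
                printf("read on not-ready fd: EAGAIN (wasted work)\n");

            /* Correct pattern: only act on descriptors flagged in revents. */
            for (int i = 0; i < 2; i++)
                if (fds[i].revents & POLLIN)
                    printf("fd %d readable, read returned %zd\n",
                           fds[i].fd, read(fds[i].fd, &c, 1));
        }
        return 0;
    }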
From: Michał B. <mic...@ge...> - 2011-11-03 13:37:15
|
Hi!

Is this a repeatable problem? Do you know a scenario which causes this behaviour?

Kind regards
Michał Borychowski
MooseFS Support Manager
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _
Gemius S.A.
ul. Wołoska 7, 02-672 Warszawa
Budynek MARS, klatka D
Tel.: +4822 874-41-00
Fax : +4822 874-41-01

-----Original Message-----
From: Davies Liu [mailto:dav...@gm...]
Sent: Thursday, November 03, 2011 2:47 AM
To: moo...@li...
Subject: [Moosefs-users] mfschunkserver eats 100% CPU

Hi,

Found one bug of mfschunkserver, it eats 100% CPU without any activities.

strace shows:

[strace output snipped - see the original report further down in this thread]

It seems that fd 224 and 125 are ready, but it read fd 29 and 222, with -1, then fall into infinite loop.

--
- Davies
From: Davies L. <dav...@gm...> - 2011-11-03 02:04:11
|
Hi,

We found a bug in mfsmaster; in syslog:

    structure error - edge->parent/parent->edges (node: 2109740 ; edge: 1406108512,guessstat-2011-09-26_00000 -> 2516019)

The parent 2516019 of node 2109740 does not exist, which caused mfsmaster to fail to start and mfsmetarestore to fail.

It can be fixed by:

     e->parent = fsnodes_id_to_node(parent_id);
    +if (e->parent==NULL) {
    +    // log
    +    e->parent = fsnodes_id_to_node(1); // HACK
    +}
     if (e->parent==NULL) {

When the parent of a node cannot be found, put the node in / instead. After starting successfully, remove the file (the unexpected node).

The real cause still has to be found; maybe when deleting directories, one of the children was in an unusual state and was ignored.

--
- Davies
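The workaround above reattaches an orphaned edge to the root directory so that metadata loading can complete. Below is a standalone sketch of that idea; the structure and the node_by_id() lookup are simplified stand-ins, not the real fsnodes/fsedge code from mfsmaster, and the syslog message is only an assumption about what the "// log" placeholder would contain:

    #include <stdio.h>
    #include <syslog.h>

    #define ROOT_ID 1u

    typedef struct node { unsigned int id; } node;

    /* hypothetical lookup standing in for fsnodes_id_to_node() */
    static node *node_by_id(unsigned int id) {
        static node root = { ROOT_ID };
        return (id == ROOT_ID) ? &root : NULL;   /* pretend only the root exists */
    }

    /* resolve an edge's parent; fall back to the root directory if it is gone */
    static node *resolve_parent(unsigned int parent_id, unsigned int child_id) {
        node *p = node_by_id(parent_id);
        if (p == NULL) {
            /* the "// log" placeholder from the patch, filled in as an assumption */
            syslog(LOG_WARNING, "edge: parent %u of node %u not found - reattaching to /",
                   parent_id, child_id);
            p = node_by_id(ROOT_ID);
        }
        return p;
    }

    int main(void) {
        node *p = resolve_parent(2516019u, 2109740u);
        printf("node 2109740 reattached under node id %u\n", p->id);
        return 0;
    }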
From: Davies L. <dav...@gm...> - 2011-11-03 01:47:43
|
Hi, Found one bug of mfschunkserver, it eats 100% CPU without any activities. strace shows: [pid 7754] gettimeofday({1320117856, 191716}, NULL) = 0 [pid 7754] read(222, 0x7f397c978d08, 408) = -1 EAGAIN (Resource temporarily unavailable) [pid 7754] read(29, 0x7f397cad2268, 29368) = -1 EAGAIN (Resource temporarily unavailable) [pid 7754] poll([{fd=12, events=POLLIN}, {fd=11, events=POLLIN}, {fd=6, events=POLLIN}, {fd=9, events=POLLIN}, {fd=224, events=POLLIN|POLLOUT}, {fd=222, events=POLLIN}, {fd=125, events=POLLIN|POLLOUT}, {fd=29, events=POLLIN}], 8, 50) = 2 ([{fd=224, revents=POLLOUT}, {fd=125, revents=POLLOUT}]) [pid 7754] gettimeofday({1320117856, 192052}, NULL) = 0 [pid 7754] read(222, 0x7f397c978d08, 408) = -1 EAGAIN (Resource temporarily unavailable) [pid 7754] read(29, 0x7f397cad2268, 29368) = -1 EAGAIN (Resource temporarily unavailable) [pid 7754] poll([{fd=12, events=POLLIN}, {fd=11, events=POLLIN}, {fd=6, events=POLLIN}, {fd=9, events=POLLIN}, {fd=224, events=POLLIN|POLLOUT}, {fd=222, events=POLLIN}, {fd=125, events=POLLIN|POLLOUT}, {fd=29, events=POLLIN}], 8, 50) = 2 ([{fd=224, revents=POLLOUT}, {fd=125, revents=POLLOUT}]) [pid 7754] gettimeofday({1320117856, 192404}, NULL) = 0 [pid 7754] read(222, 0x7f397c978d08, 408) = -1 EAGAIN (Resource temporarily unavailable) [pid 7754] read(29, 0x7f397cad2268, 29368) = -1 EAGAIN (Resource temporarily unavailable) [pid 7754] poll([{fd=12, events=POLLIN}, {fd=11, events=POLLIN}, {fd=6, events=POLLIN}, {fd=9, events=POLLIN}, {fd=224, events=POLLIN|POLLOUT}, {fd=222, events=POLLIN}, {fd=125, events=POLLIN|POLLOUT}, {fd=29, events=POLLIN}], 8, 50) = 2 ([{fd=224, revents=POLLOUT}, {fd=125, revents=POLLOUT}]) [pid 7754] gettimeofday({1320117856, 192740}, NULL) = 0 [pid 7754] read(222, 0x7f397c978d08, 408) = -1 EAGAIN (Resource temporarily unavailable) [pid 7754] read(29, 0x7f397cad2268, 29368) = -1 EAGAIN (Resource temporarily unavailable) [pid 7754] poll([{fd=12, events=POLLIN}, {fd=11, events=POLLIN}, {fd=6, events=POLLIN}, {fd=9, events=POLLIN}, {fd=224, events=POLLIN|POLLOUT}, {fd=222, events=POLLIN}, {fd=125, events=POLLIN|POLLOUT}, {fd=29, events=POLLIN}], 8, 50) = 2 ([{fd=224, revents=POLLOUT}, {fd=125, revents=POLLOUT}]) [pid 7754] gettimeofday({1320117856, 193063}, NULL) = 0 [pid 7754] read(222, 0x7f397c978d08, 408) = -1 EAGAIN (Resource temporarily unavailable) [pid 7754] read(29, 0x7f397cad2268, 29368) = -1 EAGAIN (Resource temporarily unavailable) [pid 7754] poll([{fd=12, events=POLLIN}, {fd=11, events=POLLIN}, {fd=6, events=POLLIN}, {fd=9, events=POLLIN}, {fd=224, events=POLLIN|POLLOUT}, {fd=222, events=POLLIN}, {fd=125, events=POLLIN|POLLOUT}, {fd=29, events=POLLIN}], 8, 50) = 2 ([{fd=224, revents=POLLOUT}, {fd=125, revents=POLLOUT}]) [pid 7754] gettimeofday({1320117856, 193386}, NULL) = 0 [pid 7754] read(222, 0x7f397c978d08, 408) = -1 EAGAIN (Resource temporarily unavailable) [pid 7754] read(29, 0x7f397cad2268, 29368) = -1 EAGAIN (Resource temporarily unavailable) [pid 7754] poll([{fd=12, events=POLLIN}, {fd=11, events=POLLIN}, {fd=6, events=POLLIN}, {fd=9, events=POLLIN}, {fd=224, events=POLLIN|POLLOUT}, {fd=222, events=POLLIN}, {fd=125, events=POLLIN|POLLOUT}, {fd=29, events=POLLIN}], 8, 50) = 2 ([{fd=224, revents=POLLOUT}, {fd=125, revents=POLLOUT}]) [pid 7754] gettimeofday({1320117856, 193710}, NULL) = 0 [pid 7754] read(222, 0x7f397c978d08, 408) = -1 EAGAIN (Resource temporarily unavailable) [pid 7754] read(29, 0x7f397cad2268, 29368) = -1 EAGAIN (Resource temporarily unavailable) [pid 7754] poll([{fd=12, events=POLLIN}, 
{fd=11, events=POLLIN}, {fd=6, events=POLLIN}, {fd=9, events=POLLIN}, {fd=224, events=POLLIN|POLLOUT}, {fd=222, events=POLLIN}, {fd=125, events=POLLIN|POLLOUT}, {fd=29, events=POLLIN}], 8, 50) = 2 ([{fd=224, revents=POLLOUT}, {fd=125, revents=POLLOUT}]) [pid 7754] gettimeofday({1320117856, 194033}, NULL) = 0 [pid 7754] read(222, 0x7f397c978d08, 408) = -1 EAGAIN (Resource temporarily unavailable) [pid 7754] read(29, 0x7f397cad2268, 29368) = -1 EAGAIN (Resource temporarily unavailable) [pid 7754] poll([{fd=12, events=POLLIN}, {fd=11, events=POLLIN}, {fd=6, events=POLLIN}, {fd=9, events=POLLIN}, {fd=224, events=POLLIN|POLLOUT}, {fd=222, events=POLLIN}, {fd=125, events=POLLIN|POLLOUT}, {fd=29, events=POLLIN}], 8, 50) = 2 ([{fd=224, revents=POLLOUT}, {fd=125, revents=POLLOUT}]) [pid 7754] gettimeofday({1320117856, 194893}, NULL) = 0 [pid 7754] read(222, 0x7f397c978d08, 408) = -1 EAGAIN (Resource temporarily unavailable) [pid 7754] read(29, 0x7f397cad2268, 29368) = -1 EAGAIN (Resource temporarily unavailable) [pid 7754] poll([{fd=12, events=POLLIN}, {fd=11, events=POLLIN}, {fd=6, events=POLLIN}, {fd=9, events=POLLIN}, {fd=224, events=POLLIN|POLLOUT}, {fd=222, events=POLLIN}, {fd=125, events=POLLIN|POLLOUT}, {fd=29, events=POLLIN}], 8, 50) = 2 ([{fd=224, revents=POLLOUT}, {fd=125, revents=POLLOUT}]) It seems that fd 224 and 125 are ready , but it read fd 29 and 222, with -1, then fall into infinite loop. -- - Davies |
From: Laurent W. <lw...@hy...> - 2011-11-02 15:39:37
|
On Wed, 2 Nov 2011 20:54:56 +0800 温金超 <wen...@gm...> wrote:
> I followed the instructions to install mfs master server, chunk server with
> no error or warning.
>
> but when i try to start chunk server, it reported that:
> hdd space manager: can't create lock file '/data/.lock': EACCES (Permission
> denied)

If you're using the rpmforge package, you need to chown daemon:daemon /data (or whatever user mfschunkserver runs as).

HTH,
--
Laurent Wandrebeck
HYGEOS, Earth Observation Department / Observation de la Terre
Euratechnologies
165 Avenue de Bretagne
59000 Lille, France
tel: +33 3 20 08 24 98
http://www.hygeos.com
GPG fingerprint/Empreinte GPG: F5CA 37A4 6D03 A90C 7A1D 2A62 54E6 EF2C D17C F64C
From: 温金超 <wen...@gm...> - 2011-11-02 13:05:52
|
Sorry, I forgot the install instructions:

    useradd mfs -s /sbin/nologin
    ./configure --prefix=/usr/local/mfs/ --with-default-user=mfs --with-default-group=mfs --enable-mfsmount
    make && make install

2011/11/2 温金超 <wen...@gm...>
> I followed the instructions to install mfs master server, chunk server
> with no error or warning.
>
> but when i try to start chunk server, it reported that:
> hdd space manager: can't create lock file '/data/.lock': EACCES
> (Permission denied)
>
> Environment:
> Linux 2.6.18-238.19.1.el5 x86_64 x86_64 x86_64 GNU/Linux
> fuse 2.6.5
> mfs 1.6.20
>
> Does anyone know how to fix it, or any hint?
>
> Thanks.
From: 温金超 <wen...@gm...> - 2011-11-02 12:55:24
|
I followed the instructions to install the mfs master server and chunk server with no error or warning.

But when I try to start the chunk server, it reports:

    hdd space manager: can't create lock file '/data/.lock': EACCES (Permission denied)

Environment:
Linux 2.6.18-238.19.1.el5 x86_64 x86_64 x86_64 GNU/Linux
fuse 2.6.5
mfs 1.6.20

Does anyone know how to fix it, or have any hint?

Thanks.
From: 张明富 <zha...@in...> - 2011-11-01 06:59:58
|
Hello friends:

I found a problem while reading the MooseFS source code. In the file mfsmaster/chunks.c there is a function called "chunk_do_jobs", and in step 7c there is a check: "if (vc+tdc>=scount && vc<c->goal && tdc>0 && vc+tdc>1)".

My question is: why are the conditions "vc+tdc>=scount" and "vc<c->goal" used here to restrict deleting the TDVALID chunks? If there are 100 chunkservers in the pool, I want to remove a chunkserver called node_50 from the pool, and a copy of a chunk with goal 2 is just on node_50, then because of the condition "vc+tdc>=scount" the TDVALID chunk can never be deleted by the master! Also the condition "vc<c->goal" is puzzling. Is it a bug? Or does it have some other purpose?

Also the condition "tdc>0 && vc+tdc>1" is not strict enough. If I mark node_50 and node_51 for removal at the same time, and the copies of a chunk with goal 2 are just on them, then the number of VALID copies is 0 and the number of TDVALID copies is 2. With the condition "tdc>0 && vc+tdc>1" alone, the 2 TDVALID chunks would be deleted and there would be no valid copies of the chunk!

I look forward to your reply. Thanks very much.
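The two scenarios in the question can be checked directly against the quoted condition. The sketch below only reproduces the boolean expression from the message, none of the surrounding chunks.c logic, and the vc/tdc values for scenario 1 assume one valid copy elsewhere plus the copy on node_50:

    #include <stdio.h>

    /* The condition quoted from step 7c of chunk_do_jobs() in mfsmaster/chunks.c. */
    static int td_delete_allowed(int vc, int tdc, int scount, int goal) {
        return (vc + tdc >= scount) && (vc < goal) && (tdc > 0) && (vc + tdc > 1);
    }

    int main(void) {
        /* Scenario 1: 100 chunkservers, goal 2, assuming one valid copy elsewhere
         * plus the copy on node_50 which is marked for removal (vc=1, tdc=1). */
        printf("scenario 1: %s\n",
               td_delete_allowed(1, 1, 100, 2) ? "TD copy may be deleted"
                                               : "TD copy is never deleted");

        /* Scenario 2: both copies of a goal-2 chunk are on node_50 and node_51,
         * both marked for removal (vc=0, tdc=2); only the last two sub-conditions
         * are evaluated here, as in the message. */
        int vc = 0, tdc = 2;
        printf("scenario 2 (tdc>0 && vc+tdc>1 alone): %s\n",
               (tdc > 0 && vc + tdc > 1) ? "deletion would be allowed"
                                         : "deletion blocked");
        return 0;
    }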
From: Robert S. <rsa...@ne...> - 2011-11-01 04:08:34
|
On Oct 31, 2011, at 11:24 PM, Mike <isp...@gm...> wrote:
> On 11-10-24 09:25 AM, Robert Sandilands wrote:
>> There are a few bottlenecks in MFS.
>>
>> In general MFS performs best with a high number of mfsmount's, a high
>> number of mfschunkservers and a dedicated machine running mfsmaster.
> Ok, let's suppose I have a collection of PC hardware with 4 effective
> CPU cores per machine running as chunkservers. Will I get better
> performance running a number of chunkserver processes, each handling a
> subset of disks on the machine?

Yes, depending on the availability of RAM.

> If so, will the best number of chunkservers be proportional to the
> number of CPU cores, or the number of disk spindles?

Spindles. My guess is around 10 spindles per instance.

> If I can come up with some spare hardware I may test this myself.
From: Mike <isp...@gm...> - 2011-11-01 03:40:37
|
On 11-10-24 09:25 AM, Robert Sandilands wrote: > There are a few bottlenecks in MFS. > > Most of these are caused by too many connections to a single daemon. > mfsmount seems to peform badly if you make more than 5 - 10 simultaneous > reads or writes per mount. mfschunkserver also seems to perform badly if > you make more than around 10 simultaneous connections per instance. > mfsmaster seems to cause some problems if the load on the system becomes > too high. > > In general MFS performs best with a high number of mfsmount's, a high > number of mfschunkservers and a dedicated machine running mfsmaster. Ok, let's suppose I have a collection of PC hardware with 4 effective CPU cores per machine running as chunkservers. Will I get better performance running a number of chunkserver processes, each handling a subset of disks on the machine? If so, will the best number of chunkservers be proportional to the number of CPU cores, or the number of disk spindles? If I can come up with some spare hardware I may test this myself. |
From: Michał B. <mic...@ge...> - 2011-10-26 07:27:01
|
Hi!

The value of 64MB is hardcoded and cannot be changed. We also do not plan to make it configurable. MooseFS was prepared with big files in mind. But that doesn't mean that every file occupies 64MB: each chunk is divided into 1024 blocks of 64KB.

Kind regards
Michał Borychowski
MooseFS Support Manager
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _
Gemius S.A.
ul. Wołoska 7, 02-672 Warszawa
Budynek MARS, klatka D
Tel.: +4822 874-41-00
Fax : +4822 874-41-01

From: liangzhenfang [mailto:lia...@ba...]
Sent: Wednesday, October 26, 2011 5:36 AM
To: moo...@li...
Cc: lia...@ba...
Subject: [Moosefs-users] Is there any way to set the size of chunk?

Dear mfsers,

Is there any way to set the size of a chunk? 64M for every chunk seems too big to us.

Thanks in advance
Zhenfang, liang
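The geometry Michał describes works out to exactly one 64MB chunk; a tiny sketch with illustrative constant names (not the names used in the MooseFS headers):

    #include <stdio.h>

    int main(void) {
        const unsigned long block_size = 64UL * 1024;    /* 64KB block, as stated above */
        const unsigned long blocks_per_chunk = 1024;     /* blocks per chunk */
        const unsigned long chunk_size = blocks_per_chunk * block_size;
        printf("chunk size = %lu MB\n", chunk_size / (1024 * 1024));  /* prints 64 */
        return 0;
    }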
From: liangzhenfang <lia...@ba...> - 2011-10-26 03:49:50
|
Dear mfsers,

Is there any way to set the size of a chunk? 64M for every chunk seems too big to us.

Thanks in advance
Zhenfang, liang
From: Kristofer P. <kri...@cy...> - 2011-10-25 20:42:43
|
MooseFS was designed to match Google File System (v1). The GFS whitepaper even says its not ideal for many small files. It is most ideal for large files that are appended to. You are going to need to hope for a multi-master set up and a lot of work on the internals to MooseFS to get better performance with small files. On 10/24/2011 05:11 AM, leo...@ar... wrote: > Something is very wrong with MooseFS regarding writing 'lots of small > files'. > You can even see it 'visually': if you tar xzvf linux*.tgz the names of > the files appear much slooowly than this is in case of NFS or local > storage... > This is to say that we use quite a powerfull HW. And we have NO load on > MFS at the moment. The only thing done at the moment is the untaring the > linux kernel sources. > > I'd sincerelly appresiate any comments from the developers of the > MooseFS, the project I had big hopes for ) > Do they observe the same in their environment?.. > > > P.S. > sometimes it takes less than 100 seconds to untar, but still far beyond > NFS, where I get it in 4-7 seconds or so. > > > > > On Fri, 21 Oct 2011 00:00:06 +0800, Flow Jiang wrote: >> We are experiencing the same issue when dealing with a lot of small >> files. >> >> One observation is, when scp a large file through Gigabyte link onto >> our NFS server (Served by a VERY old Sun Raid5 Disk Array), there >> seems to have a large write cache to allow the speed reported at the >> first several seconds greater than the writing limit of hard drives. >> But when copying the same file onto MFS (Served by SAS 300G 10k disk >> on Dell PE 1950), the speed never reaches the hard drive limit. And >> our NFS server do perform better than MFS when copying large amount >> of >> source code files. >> >> I'm not sure if this issue is only related to our SW/HW >> configuration. But seems the write caches works better with NFS than >> MFS. Does any one know how does write cache of MFS work? >> >> Thanks >> Flow >> >> >> On 10/17/2011 10:46 PM, leo...@ar... wrote: >>> Greetings! >>> Resently I've started performance benchmarking for MooseFS, >>> _____________ __ _ _ >>> Chunkservers: >>> CPU: Intel(R) Xeon(R) CPU X3430 @ 2.40GHz >>> RAM: 12G >>> HDD: storage used for mfs chunks is raid0 of two WD 7200rpm caviar >>> black disks. >>> OS: OpenSuSE 11.3 >>> Chunk servers number: 5 >>> >>> ------------------------------- >>> Master: >>> >>> CPU: Intel(R) Xeon(R) CPU X3430 @ 2.40GHz >>> OS: OpenSuSE 11.3 >>> RAM: 12G >>> >>> -------------------------------- >>> Client: >>> One of chunkservers. >>> >>> --------------------------------------------------- >>> LAN: >>> 1Gbit/s LAN. >>> >>> ____________ _ _ __ >>> Problem; >>> I've experimented alot with bonnie, fio, iozone and others in >>> testing >>> other storages... We have people working with the source code here, >>> so >>> we need good random I/O for small files with moderate blocksize from >>> 8k >>> to 128k... Comparative testing of other storage solutions involving >>> ZFS, >>> different hardware raids etc showed that simple taring and untaring >>> Linux kernel sources shows how good can the storage be for that kind >>> of >>> work... and it always correlates with more advanced fio and iozone >>> tests. >>> >>> So, simple untaring Linux kernel sources takes about 7 seconds on >>> chunkservers storage bypassing moosefs... but when I untar it to >>> mounted >>> moosefs it takes more than 230 seconds. >>> Goal is set to 1. CPU load is OK on all the servers, RAM is >>> sufficient. 
>>> Network is not overloaded and I can untar this file in about7 secs >>> to >>> our nfs-mounted NAS.... >>> I even turned on files caching on the client. >>> And this is all is very strange... maybe fuse is the bottleneck?... >>> >>> >>> >>> ------------------------------------------------------------------------------ >>> All the data continuously generated in your IT infrastructure >>> contains a >>> definitive record of customers, application performance, security >>> threats, fraudulent activity and more. Splunk takes this data and >>> makes >>> sense of it. Business sense. IT sense. Common sense. >>> http://p.sf.net/sfu/splunk-d2d-oct >>> _______________________________________________ >>> moosefs-users mailing list >>> moo...@li... >>> https://lists.sourceforge.net/lists/listinfo/moosefs-users > > ------------------------------------------------------------------------------ > The demand for IT networking professionals continues to grow, and the > demand for specialized networking skills is growing even more rapidly. > Take a complimentary Learning@Cisco Self-Assessment and learn > about Cisco certifications, training, and career opportunities. > http://p.sf.net/sfu/cisco-dev2dev > _______________________________________________ > moosefs-users mailing list > moo...@li... > https://lists.sourceforge.net/lists/listinfo/moosefs-users |
From: chen g. <cg...@gm...> - 2011-10-25 05:01:13
|
Hello I have tried according to your suggestion , using command strace o logfile vhd-tuil create n vhd.img s 2048, and execute the command to test whether the mooseFS support the VHD-format file or not?, I find some information as follows:(all command are excuted in client ) 1 in the mooseFs share directory,logfile records: execve("/usr/sbin/vhd-util", ["vhd-util", "create", "-n", "vhd.img", "-s", "2048"], [/* 22 vars */]) = 0 brk(0) = 0x602000 mmap(NULL, 4096, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x2b321cf4a000 uname({sys="Linux", node="node34", ...}) = 0 access("/etc/ld.so.preload", R_OK) = -1 ENOENT (No such file or directory) open("/etc/ld.so.cache", O_RDONLY) = 3 fstat(3, {st_mode=S_IFREG|0644, st_size=73642, ...}) = 0 mmap(NULL, 73642, PROT_READ, MAP_PRIVATE, 3, 0) = 0x2b321cf4b000 close(3) = 0 open("/usr/lib64/libvhd.so.1.0", O_RDONLY) = 3 read(3, "\177ELF\2\1\1\0\0\0\0\0\0\0\0\0\3\0>\0\1\0\0\0\340< \0030\0\0\0"..., 832) = 832 fstat(3, {st_mode=S_IFREG|0644, st_size=376324, ...}) = 0 mmap(NULL, 4096, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x2b321cf5d000 mmap(0x3003200000, 2198912, PROT_READ|PROT_EXEC, MAP_PRIVATE|MAP_DENYWRITE, 3, 0) = 0x3003200000 mprotect(0x3003218000, 2097152, PROT_NONE) = 0 mmap(0x3003418000, 4096, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_FIXED|MAP_DENYWRITE, 3, 0x18000) = 0x3003418000 close(3) = 0 open("/lib64/libuuid.so.1", O_RDONLY) = 3 read(3, "\177ELF\2\1\1\0\0\0\0\0\0\0\0\0\3\0>\0\1\0\0\0\220\25\0\0\0\0\0\0"..., 832) = 832 fstat(3, {st_mode=S_IFREG|0755, st_size=15360, ...}) = 0 mmap(NULL, 2110488, PROT_READ|PROT_EXEC, MAP_PRIVATE|MAP_DENYWRITE, 3, 0) = 0x2b321cf5e000 mprotect(0x2b321cf62000, 2093056, PROT_NONE) = 0 mmap(0x2b321d161000, 4096, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_FIXED|MAP_DENYWRITE, 3, 0x3000) = 0x2b321d161000 close(3) = 0 open("/lib64/libc.so.6", O_RDONLY) = 3 read(3, "\177ELF\2\1\1\0\0\0\0\0\0\0\0\0\3\0>\0\1\0\0\0\220\332a\0010\0\0\0"..., 832) = 832 fstat(3, {st_mode=S_IFREG|0755, st_size=1718120, ...}) = 0 mmap(0x3001600000, 3498328, PROT_READ|PROT_EXEC, MAP_PRIVATE|MAP_DENYWRITE, 3, 0) = 0x3001600000 mprotect(0x300174e000, 2093056, PROT_NONE) = 0 mmap(0x300194d000, 20480, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_FIXED|MAP_DENYWRITE, 3, 0x14d000) = 0x300194d000 mmap(0x3001952000, 16728, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_FIXED|MAP_ANONYMOUS, -1, 0) = 0x3001952000 close(3) = 0 mmap(NULL, 4096, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x2b321d162000 mmap(NULL, 4096, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x2b321d163000 arch_prctl(ARCH_SET_FS, 0x2b321d162dc0) = 0 mprotect(0x300194d000, 16384, PROT_READ) = 0 mprotect(0x300141b000, 4096, PROT_READ) = 0 munmap(0x2b321cf4b000, 73642) = 0 brk(0) = 0x602000 brk(0x623000) = 0x623000 open("vhd.img", O_WRONLY|O_CREAT|O_TRUNC|O_DIRECT, 0644) = 3 stat("vhd.img", {st_mode=S_IFREG|0644, st_size=0, ...}) = 0 time(NULL) = 1319410058 open("/etc/localtime", O_RDONLY) = 4 fstat(4, {st_mode=S_IFREG|0644, st_size=405, ...}) = 0 fstat(4, {st_mode=S_IFREG|0644, st_size=405, ...}) = 0 mmap(NULL, 4096, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x2b321cf4b000 read(4, "TZif2\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\3\0\0\0\3\0\0\0\0"..., 4096) = 405 lseek(4, -240, SEEK_CUR) = 165 read(4, "TZif2\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\3\0\0\0\3\0\0\0\0"..., 4096) = 240 close(4) = 0 munmap(0x2b321cf4b000, 4096) = 0 gettimeofday({1319410058, 344663}, NULL) = 0 open("/dev/urandom", O_RDONLY) = 4 fcntl(4, F_GETFD) = 0 fcntl(4, F_SETFD, 
FD_CLOEXEC) = 0 getpid() = getuid() = 0 getppid() = 31097 gettimeofday({1319410058, 346012}, NULL) = 0 gettimeofday({1319410058, 346216}, NULL) = 0 read(4, "\266#\216\21#\215\354\33\266\255\340\360D?\242\244", 16) = 16 gettid() = lseek(3, 0, SEEK_SET) = 0 write(3, "conectix\0\0\0\2\0\1\0\0\0\0\0\0\0\0\2\0\0267\306\212tap\0"..., 512) = -1 EINVAL (Invalid argument) close(3) = 0 unlink("vhd.img") = 0 exit_group(22) = ? 2 in non-shared directory: (everything is normal) execve("/usr/sbin/vhd-util", ["vhd-util", "create", "-n", "vhd.img", "-s", "2048"], [/* 22 vars */]) = 0 brk(0) = 0x602000 mmap(NULL, 4096, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x2b766ea44000 uname({sys="Linux", node="node34", ...}) = 0 access("/etc/ld.so.preload", R_OK) = -1 ENOENT (No such file or directory) open("/etc/ld.so.cache", O_RDONLY) = 3 fstat(3, {st_mode=S_IFREG|0644, st_size=73642, ...}) = 0 mmap(NULL, 73642, PROT_READ, MAP_PRIVATE, 3, 0) = 0x2b766ea45000 close(3) = 0 open("/usr/lib64/libvhd.so.1.0", O_RDONLY) = 3 read(3, "\177ELF\2\1\1\0\0\0\0\0\0\0\0\0\3\0>\0\1\0\0\0\340< \0030\0\0\0"..., 832) = 832 fstat(3, {st_mode=S_IFREG|0644, st_size=376324, ...}) = 0 mmap(NULL, 4096, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x2b766ea57000 mmap(0x3003200000, 2198912, PROT_READ|PROT_EXEC, MAP_PRIVATE|MAP_DENYWRITE, 3, 0) = 0x3003200000 mprotect(0x3003218000, 2097152, PROT_NONE) = 0 mmap(0x3003418000, 4096, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_FIXED|MAP_DENYWRITE, 3, 0x18000) = 0x3003418000 close(3) = 0 open("/lib64/libuuid.so.1", O_RDONLY) = 3 read(3, "\177ELF\2\1\1\0\0\0\0\0\0\0\0\0\3\0>\0\1\0\0\0\220\25\0\0\0\0\0\0"..., 832) = 832 fstat(3, {st_mode=S_IFREG|0755, st_size=15360, ...}) = 0 mmap(NULL, 2110488, PROT_READ|PROT_EXEC, MAP_PRIVATE|MAP_DENYWRITE, 3, 0) = 0x2b766ea58000 mprotect(0x2b766ea5c000, 2093056, PROT_NONE) = 0 mmap(0x2b766ec5b000, 4096, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_FIXED|MAP_DENYWRITE, 3, 0x3000) = 0x2b766ec5b000 close(3) = 0 open("/lib64/libc.so.6", O_RDONLY) = 3 read(3, "\177ELF\2\1\1\0\0\0\0\0\0\0\0\0\3\0>\0\1\0\0\0\220\332a\0010\0\0\0"..., 832) = 832 fstat(3, {st_mode=S_IFREG|0755, st_size=1718120, ...}) = 0 mmap(0x3001600000, 3498328, PROT_READ|PROT_EXEC, MAP_PRIVATE|MAP_DENYWRITE, 3, 0) = 0x3001600000 mprotect(0x300174e000, 2093056, PROT_NONE) = 0 mmap(0x300194d000, 20480, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_FIXED|MAP_DENYWRITE, 3, 0x14d000) = 0x300194d000 mmap(0x3001952000, 16728, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_FIXED|MAP_ANONYMOUS, -1, 0) = 0x3001952000 close(3) = 0 mmap(NULL, 4096, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x2b766ec5c000 mmap(NULL, 4096, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x2b766ec5d000 arch_prctl(ARCH_SET_FS, 0x2b766ec5cdc0) = 0 mprotect(0x300194d000, 16384, PROT_READ) = 0 mprotect(0x300141b000, 4096, PROT_READ) = 0 munmap(0x2b766ea45000, 73642) = 0 brk(0) = 0x602000 brk(0x623000) = 0x623000 open("vhd.img", O_WRONLY|O_CREAT|O_TRUNC|O_DIRECT, 0644) = 3 stat("vhd.img", {st_mode=S_IFREG|0644, st_size=0, ...}) = 0 time(NULL) = 1319411057 open("/etc/localtime", O_RDONLY) = 4 fstat(4, {st_mode=S_IFREG|0644, st_size=405, ...}) = 0 fstat(4, {st_mode=S_IFREG|0644, st_size=405, ...}) = 0 mmap(NULL, 4096, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x2b766ea45000 read(4, "TZif2\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\3\0\0\0\3\0\0\0\0"..., 4096) = 405 lseek(4, -240, SEEK_CUR) = 165 read(4, "TZif2\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\3\0\0\0\3\0\0\0\0"..., 4096) = 240 close(4) = 0 
munmap(0x2b766ea45000, 4096) = 0 gettimeofday({1319411057, 55555}, NULL) = 0 open("/dev/urandom", O_RDONLY) = 4 fcntl(4, F_GETFD) = 0 fcntl(4, F_SETFD, FD_CLOEXEC) = 0 getpid() = getuid() = 0 getppid() = 31167 gettimeofday({1319411057, 56604}, NULL) = 0 gettimeofday({1319411057, 56766}, NULL) = 0 read(4, "\275A\20\351\201o\10}\230\345J\34\272\n}\r", 16) = 16 gettid() = lseek(3, 0, SEEK_SET) = 0 write(3, "conectix\0\0\0\2\0\1\0\0\0\0\0\0\0\0\2\0\0267\312qtap\0"..., 512) = 512 lseek(3, 512, SEEK_SET) = 512 write(3, "cxsparse\377\377\377\377\377\377\377\377\0\0\0\0\0\0\6\0\0\1\0\0\0\0\4\0".. ., 1024) = 1024 lseek(3, 6144, SEEK_SET) = 6144 write(3, "\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0"..., 512) = 512 lseek(3, 5632, SEEK_SET) = 5632 write(3, "tdbatmap\0\0\0\0\0\0\30\0\0\0\0\1\0\1\0\2\377\377\377\377\0\0\0\0"..., 512) = 512 lseek(3, 1536, SEEK_SET) = 1536 lseek(3, 1536, SEEK_SET) = 1536 write(3, "\377\377\377\377\377\377\377\377\377\377\377\377\377\377\377\377\377\377\37 7\377\377\377\377\377\377\377\377\377\377\377\377\377"..., 4096) = 4096 lseek(3, 512, SEEK_SET) = 512 write(3, "cxsparse\377\377\377\377\377\377\377\377\0\0\0\0\0\0\6\0\0\1\0\0\0\0\4\0".. ., 1024) = 1024 lseek(3, 0, SEEK_END) = 6656 lseek(3, 0, SEEK_CUR) = 6656 lseek(3, 6656, SEEK_SET) = 6656 write(3, "conectix\0\0\0\2\0\1\0\0\0\0\0\0\0\0\2\0\0267\312qtap\0"..., 512) = 512 close(3) = 0 exit_group(0) I don’t know the reason , I really hope you can help me, thanks so much. |
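The only functional difference between the two traces above is that on the MooseFS mount the first 512-byte write() to the descriptor opened with O_DIRECT returns EINVAL, which is consistent with the FUSE layer of that era not supporting O_DIRECT I/O (the thread itself does not settle the cause). A minimal standalone test of that one step, a sketch that is not part of vhd-util, looks like this:

    #define _GNU_SOURCE
    #include <errno.h>
    #include <fcntl.h>
    #include <stdio.h>
    #include <stdlib.h>
    #include <string.h>
    #include <unistd.h>

    int main(int argc, char **argv) {
        const char *path = (argc > 1) ? argv[1] : "odirect-test.img";
        void *buf;

        /* O_DIRECT needs an aligned buffer; 512 bytes matches the failing write above */
        if (posix_memalign(&buf, 4096, 512) != 0) { perror("posix_memalign"); return 1; }
        memset(buf, 0, 512);

        int fd = open(path, O_WRONLY | O_CREAT | O_TRUNC | O_DIRECT, 0644);
        if (fd < 0) { perror("open(O_DIRECT)"); return 1; }

        ssize_t n = write(fd, buf, 512);
        if (n < 0)
            printf("write failed: %s\n", strerror(errno));  /* EINVAL on the FUSE mount */
        else
            printf("write ok: %zd bytes\n", n);

        close(fd);
        unlink(path);
        free(buf);
        return 0;
    }

Running it once in a local directory and once inside the MooseFS mount point should show the same difference as the two vhd-util traces.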
From: Robert S. <rsa...@ne...> - 2011-10-24 12:36:27
|
There are a few bottlenecks in MFS. Most of these are caused by too many connections to a single daemon. mfsmount seems to peform badly if you make more than 5 - 10 simultaneous reads or writes per mount. mfschunkserver also seems to perform badly if you make more than around 10 simultaneous connections per instance. mfsmaster seems to cause some problems if the load on the system becomes too high. In general MFS performs best with a high number of mfsmount's, a high number of mfschunkservers and a dedicated machine running mfsmaster. Fuse can be a bottleneck for asynchronous I/O but I doubt that is what Bonnie is using. Your setup is not quite how MFS was designed to be used and for a single machine file server you will find traditional file systems like ext[234], zfs and xfs will perform significantly better. There are not too many competitors for MFS in the distributed file system space, and most of then are significantly harder to get working and are even more unstable. I can not comment on the performance of the competitors. Robert On 10/23/11 6:36 PM, Marco B wrote: > Hi All, > we are doing some benchmarks with MooseFS using a single server > Linux CentOS 5.7, HP DL585G6 4x6core Opteron 2.8Ghz, 128GB Ram, 160 GB > ioDrive SLC > > Here is Bonnie result over ioDrive partition (unoptimized ext3 filesystem): > > Version 1.03e ------Sequential Output------ --Sequential Input- --Random- > -Per Chr- --Block-- -Rewrite- -Per Chr- --Block-- --Seeks-- > Machine Size K/sec %CP K/sec %CP K/sec %CP K/sec %CP K/sec %CP /sec %CP > mfs1.domain 1000M 72004 92 623230 99 1007946 99 87689 99 +++++ > +++ +++++ +++ > ------Sequential Create------ --------Random Create-------- > -Create-- --Read--- -Delete-- -Create-- --Read--- -Delete-- > files /sec %CP /sec %CP /sec %CP /sec %CP /sec %CP /sec %CP > 16 +++++ +++ +++++ +++ +++++ +++ +++++ +++ +++++ +++ +++++ +++ > > > Here is Bonnie result on the same ioDrive mounted over MooseFS: > > Version 1.03e ------Sequential Output------ --Sequential Input- --Random- > -Per Chr- --Block-- -Rewrite- -Per Chr- --Block-- --Seeks-- > Machine Size K/sec %CP K/sec %CP K/sec %CP K/sec %CP K/sec %CP /sec %CP > mfs1.domain 1000M 46785 65 61631 8 85611 9 84549 98 2026423 > 99 4230 7 > ------Sequential Create------ --------Random Create-------- > -Create-- --Read--- -Delete-- -Create-- --Read--- -Delete-- > files /sec %CP /sec %CP /sec %CP /sec %CP /sec %CP /sec %CP > 16 669 4 29247 20 5808 12 699 3 3405 5 1544 5 > > It seems there is a big bottleneck in mfs or maybe Fuse. > During tests we saw mfschunkserver and mfsmount processes eating 100% > of a single CPU each. > > Waiting for comments/suggestions > Regards > Marco > > ------------------------------------------------------------------------------ > The demand for IT networking professionals continues to grow, and the > demand for specialized networking skills is growing even more rapidly. > Take a complimentary Learning@Cisco Self-Assessment and learn > about Cisco certifications, training, and career opportunities. > http://p.sf.net/sfu/cisco-dev2dev > _______________________________________________ > moosefs-users mailing list > moo...@li... > https://lists.sourceforge.net/lists/listinfo/moosefs-users |
From: <leo...@ar...> - 2011-10-24 10:10:37
|
Something is very wrong with MooseFS regarding writing 'lots of small files'. You can even see it 'visually': if you tar xzvf linux*.tgz the names of the files appear much slooowly than this is in case of NFS or local storage... This is to say that we use quite a powerfull HW. And we have NO load on MFS at the moment. The only thing done at the moment is the untaring the linux kernel sources. I'd sincerelly appresiate any comments from the developers of the MooseFS, the project I had big hopes for ) Do they observe the same in their environment?.. P.S. sometimes it takes less than 100 seconds to untar, but still far beyond NFS, where I get it in 4-7 seconds or so. On Fri, 21 Oct 2011 00:00:06 +0800, Flow Jiang wrote: > We are experiencing the same issue when dealing with a lot of small > files. > > One observation is, when scp a large file through Gigabyte link onto > our NFS server (Served by a VERY old Sun Raid5 Disk Array), there > seems to have a large write cache to allow the speed reported at the > first several seconds greater than the writing limit of hard drives. > But when copying the same file onto MFS (Served by SAS 300G 10k disk > on Dell PE 1950), the speed never reaches the hard drive limit. And > our NFS server do perform better than MFS when copying large amount > of > source code files. > > I'm not sure if this issue is only related to our SW/HW > configuration. But seems the write caches works better with NFS than > MFS. Does any one know how does write cache of MFS work? > > Thanks > Flow > > > On 10/17/2011 10:46 PM, leo...@ar... wrote: >> Greetings! >> Resently I've started performance benchmarking for MooseFS, >> _____________ __ _ _ >> Chunkservers: >> CPU: Intel(R) Xeon(R) CPU X3430 @ 2.40GHz >> RAM: 12G >> HDD: storage used for mfs chunks is raid0 of two WD 7200rpm caviar >> black disks. >> OS: OpenSuSE 11.3 >> Chunk servers number: 5 >> >> ------------------------------- >> Master: >> >> CPU: Intel(R) Xeon(R) CPU X3430 @ 2.40GHz >> OS: OpenSuSE 11.3 >> RAM: 12G >> >> -------------------------------- >> Client: >> One of chunkservers. >> >> --------------------------------------------------- >> LAN: >> 1Gbit/s LAN. >> >> ____________ _ _ __ >> Problem; >> I've experimented alot with bonnie, fio, iozone and others in >> testing >> other storages... We have people working with the source code here, >> so >> we need good random I/O for small files with moderate blocksize from >> 8k >> to 128k... Comparative testing of other storage solutions involving >> ZFS, >> different hardware raids etc showed that simple taring and untaring >> Linux kernel sources shows how good can the storage be for that kind >> of >> work... and it always correlates with more advanced fio and iozone >> tests. >> >> So, simple untaring Linux kernel sources takes about 7 seconds on >> chunkservers storage bypassing moosefs... but when I untar it to >> mounted >> moosefs it takes more than 230 seconds. >> Goal is set to 1. CPU load is OK on all the servers, RAM is >> sufficient. >> Network is not overloaded and I can untar this file in about7 secs >> to >> our nfs-mounted NAS.... >> I even turned on files caching on the client. >> And this is all is very strange... maybe fuse is the bottleneck?... >> >> >> >> ------------------------------------------------------------------------------ >> All the data continuously generated in your IT infrastructure >> contains a >> definitive record of customers, application performance, security >> threats, fraudulent activity and more. 
Splunk takes this data and >> makes >> sense of it. Business sense. IT sense. Common sense. >> http://p.sf.net/sfu/splunk-d2d-oct >> _______________________________________________ >> moosefs-users mailing list >> moo...@li... >> https://lists.sourceforge.net/lists/listinfo/moosefs-users |
From: Marco B <mar...@gm...> - 2011-10-23 22:36:14
|
Hi All,
we are doing some benchmarks with MooseFS using a single server
Linux CentOS 5.7, HP DL585G6 4x6core Opteron 2.8Ghz, 128GB Ram, 160 GB ioDrive SLC

Here is Bonnie result over ioDrive partition (unoptimized ext3 filesystem):

Version 1.03e       ------Sequential Output------ --Sequential Input- --Random-
                    -Per Chr- --Block-- -Rewrite- -Per Chr- --Block-- --Seeks--
Machine        Size K/sec %CP K/sec %CP K/sec %CP K/sec %CP K/sec %CP  /sec %CP
mfs1.domain   1000M 72004  92 623230 99 1007946 99 87689  99 +++++ +++ +++++ +++
                    ------Sequential Create------ --------Random Create--------
                    -Create-- --Read--- -Delete-- -Create-- --Read--- -Delete--
              files  /sec %CP  /sec %CP  /sec %CP  /sec %CP  /sec %CP  /sec %CP
                 16 +++++ +++ +++++ +++ +++++ +++ +++++ +++ +++++ +++ +++++ +++

Here is Bonnie result on the same ioDrive mounted over MooseFS:

Version 1.03e       ------Sequential Output------ --Sequential Input- --Random-
                    -Per Chr- --Block-- -Rewrite- -Per Chr- --Block-- --Seeks--
Machine        Size K/sec %CP K/sec %CP K/sec %CP K/sec %CP K/sec %CP  /sec %CP
mfs1.domain   1000M 46785  65 61631   8 85611   9 84549  98 2026423 99  4230   7
                    ------Sequential Create------ --------Random Create--------
                    -Create-- --Read--- -Delete-- -Create-- --Read--- -Delete--
              files  /sec %CP  /sec %CP  /sec %CP  /sec %CP  /sec %CP  /sec %CP
                 16   669   4 29247  20  5808  12   699   3  3405   5  1544   5

It seems there is a big bottleneck in mfs or maybe Fuse.
During tests we saw mfschunkserver and mfsmount processes eating 100% of a single CPU each.

Waiting for comments/suggestions
Regards
Marco
From: WK <wk...@bn...> - 2011-10-20 21:39:16
|
We stopped seeing the issue a few months ago. We assume recent kernel updates have mitigated the problem or it may simply be that we have been actively upgrading our cluster hardware (especially the masters) with more RAM to avoid the 'rewrite MetaData on the hour' stall We never saw it on our Cent5 machines and some of them are really quite busy (IMAP servers). Of course, now that I have declared that the situation is no longer there, I am sure it will happen in a few days just to spite me. -bill On 10/20/11 12:51 AM, Laurent Wandrebeck wrote: > Unburying this thread :) > Have you found any working solution about it ? > I've tried > echo never> /sys/kernel/mm/redhat_transparent_hugepage/defrag > echo no> /sys/kernel/mm/redhat_transparent_hugepage/khugepaged/defrag > but I still get (less though) task stuck etc etc running C6 x86_64. > Even « funnier », a user have been able to trigger it under C5 > x86_64, running a user-space program (data processing, data being > located on mfs volume ) ! Here's the C5 trace… > > INFO: task polymer-na-spg:17348 blocked for more than 120 seconds. > "echo 0> /proc/sys/kernel/hung_task_timeout_secs" disables this > message. polymer-na-sp D ffff810001025e20 0 17348 > 17347 (NOTLB) ffff81018b9d5c08 0000000000000086 > ffff810210a141d0 ffffffff8863b219 ffff81081e89d000 0000000000000007 > ffff81037fb040c0 ffff81042e1f57e0 003ab01ace8dae8c 0000000000007bc3 > ffff81037fb042a8 000000048863ff35 Call Trace: > [<ffffffff8863b219>] :fuse:flush_bg_queue+0x2b/0x48 > [<ffffffff8006e1db>] do_gettimeofday+0x40/0x90 > [<ffffffff80028a85>] sync_page+0x0/0x43 > [<ffffffff800637ea>] io_schedule+0x3f/0x67 > [<ffffffff80028ac3>] sync_page+0x3e/0x43 > [<ffffffff8006392e>] __wait_on_bit_lock+0x36/0x66 > [<ffffffff8003fbc7>] __lock_page+0x5e/0x64 > [<ffffffff800a0a06>] wake_bit_function+0x0/0x23 > [<ffffffff8000c2e4>] do_generic_mapping_read+0x1df/0x359 > [<ffffffff8000d0fd>] file_read_actor+0x0/0x159 > [<ffffffff8000c5aa>] __generic_file_aio_read+0x14c/0x198 > [<ffffffff800c6774>] generic_file_read+0xac/0xc5 > [<ffffffff800a09d8>] autoremove_wake_function+0x0/0x2e > [<ffffffff8000e129>] do_mmap_pgoff+0x615/0x780 > [<ffffffff8012d629>] selinux_file_permission+0x9f/0xb6 > [<ffffffff8000b69a>] vfs_read+0xcb/0x171 > [<ffffffff80011bac>] sys_read+0x45/0x6e > [<ffffffff8005d28d>] tracesys+0xd5/0xe0 > > (and so on) > Thanks ! |
From: Flow J. <fl...@gm...> - 2011-10-20 16:00:24
|
We are experiencing the same issue when dealing with a lot of small files. One observation is, when scp a large file through Gigabyte link onto our NFS server (Served by a VERY old Sun Raid5 Disk Array), there seems to have a large write cache to allow the speed reported at the first several seconds greater than the writing limit of hard drives. But when copying the same file onto MFS (Served by SAS 300G 10k disk on Dell PE 1950), the speed never reaches the hard drive limit. And our NFS server do perform better than MFS when copying large amount of source code files. I'm not sure if this issue is only related to our SW/HW configuration. But seems the write caches works better with NFS than MFS. Does any one know how does write cache of MFS work? Thanks Flow On 10/17/2011 10:46 PM, leo...@ar... wrote: > Greetings! > Resently I've started performance benchmarking for MooseFS, > _____________ __ _ _ > Chunkservers: > CPU: Intel(R) Xeon(R) CPU X3430 @ 2.40GHz > RAM: 12G > HDD: storage used for mfs chunks is raid0 of two WD 7200rpm caviar > black disks. > OS: OpenSuSE 11.3 > Chunk servers number: 5 > > ------------------------------- > Master: > > CPU: Intel(R) Xeon(R) CPU X3430 @ 2.40GHz > OS: OpenSuSE 11.3 > RAM: 12G > > -------------------------------- > Client: > One of chunkservers. > > --------------------------------------------------- > LAN: > 1Gbit/s LAN. > > ____________ _ _ __ > Problem; > I've experimented alot with bonnie, fio, iozone and others in testing > other storages... We have people working with the source code here, so > we need good random I/O for small files with moderate blocksize from 8k > to 128k... Comparative testing of other storage solutions involving ZFS, > different hardware raids etc showed that simple taring and untaring > Linux kernel sources shows how good can the storage be for that kind of > work... and it always correlates with more advanced fio and iozone > tests. > > So, simple untaring Linux kernel sources takes about 7 seconds on > chunkservers storage bypassing moosefs... but when I untar it to mounted > moosefs it takes more than 230 seconds. > Goal is set to 1. CPU load is OK on all the servers, RAM is sufficient. > Network is not overloaded and I can untar this file in about7 secs to > our nfs-mounted NAS.... > I even turned on files caching on the client. > And this is all is very strange... maybe fuse is the bottleneck?... > > > ------------------------------------------------------------------------------ > All the data continuously generated in your IT infrastructure contains a > definitive record of customers, application performance, security > threats, fraudulent activity and more. Splunk takes this data and makes > sense of it. Business sense. IT sense. Common sense. > http://p.sf.net/sfu/splunk-d2d-oct > _______________________________________________ > moosefs-users mailing list > moo...@li... > https://lists.sourceforge.net/lists/listinfo/moosefs-users |
From: Laurent W. <lw...@hy...> - 2011-10-20 14:58:25
|
Unburying this thread :) Have you found any working solution about it ? I've tried echo never > /sys/kernel/mm/redhat_transparent_hugepage/defrag echo no > /sys/kernel/mm/redhat_transparent_hugepage/khugepaged/defrag but I still get (less though) task stuck etc etc running C6 x86_64. Even « funnier », a user have been able to trigger it under C5 x86_64, running a user-space program (data processing, data being located on mfs volume ) ! Here's the C5 trace… INFO: task polymer-na-spg:17348 blocked for more than 120 seconds. "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message. polymer-na-sp D ffff810001025e20 0 17348 17347 (NOTLB) ffff81018b9d5c08 0000000000000086 ffff810210a141d0 ffffffff8863b219 ffff81081e89d000 0000000000000007 ffff81037fb040c0 ffff81042e1f57e0 003ab01ace8dae8c 0000000000007bc3 ffff81037fb042a8 000000048863ff35 Call Trace: [<ffffffff8863b219>] :fuse:flush_bg_queue+0x2b/0x48 [<ffffffff8006e1db>] do_gettimeofday+0x40/0x90 [<ffffffff80028a85>] sync_page+0x0/0x43 [<ffffffff800637ea>] io_schedule+0x3f/0x67 [<ffffffff80028ac3>] sync_page+0x3e/0x43 [<ffffffff8006392e>] __wait_on_bit_lock+0x36/0x66 [<ffffffff8003fbc7>] __lock_page+0x5e/0x64 [<ffffffff800a0a06>] wake_bit_function+0x0/0x23 [<ffffffff8000c2e4>] do_generic_mapping_read+0x1df/0x359 [<ffffffff8000d0fd>] file_read_actor+0x0/0x159 [<ffffffff8000c5aa>] __generic_file_aio_read+0x14c/0x198 [<ffffffff800c6774>] generic_file_read+0xac/0xc5 [<ffffffff800a09d8>] autoremove_wake_function+0x0/0x2e [<ffffffff8000e129>] do_mmap_pgoff+0x615/0x780 [<ffffffff8012d629>] selinux_file_permission+0x9f/0xb6 [<ffffffff8000b69a>] vfs_read+0xcb/0x171 [<ffffffff80011bac>] sys_read+0x45/0x6e [<ffffffff8005d28d>] tracesys+0xd5/0xe0 (and so on) Thanks ! -- Laurent Wandrebeck HYGEOS, Earth Observation Department / Observation de la Terre Euratechnologies 165 Avenue de Bretagne 59000 Lille, France tel: +33 3 20 08 24 98 http://www.hygeos.com GPG fingerprint/Empreinte GPG: F5CA 37A4 6D03 A90C 7A1D 2A62 54E6 EF2C D17C F64C |
From: Michał B. <mic...@ge...> - 2011-10-20 12:33:18
|
Hi Robert! (My replies are marked with [MB2])

From: Robert Sandilands [mailto:rsa...@ne...]
Sent: Wednesday, October 19, 2011 2:52 AM
To: Michał Borychowski
Cc: moo...@li...
Subject: Re: [Moosefs-users] mfsmaster performance and hardware

Hi Michal,

[MB] Robert, why do you think there is just one socket between mfsmount and mfsmaster?

From looking at the code and from looking at the connections between the machines mfsmount and mfsmaster are running on. Also based on the very significant decrease in the time it takes to open() a file once the number of simultaneous file accesses per mfsmount was limited to a much lower number by increasing the number of times the volume was mounted on a machine. This also significantly increased the perceived speed and interactivity of the file system.

On the machine running mfsmaster:

# netstat -ap | grep mfsmaster | grep http-load-balance-1 | wc -l
9

The machine http-load-balance-1 is running 9 instances of mfsmount and therefore has 9 connections to mfsmaster.

[MB2] Yes, that's right. But on the other hand, having more sockets would probably slow down the master server and things could probably get worse.

[MB] Here, if you want to experiment, you can find this line:

jpool = job_pool_new(10,BGJOBSCNT,&jobfd);

in csserv.c and change 10 to e.g. 20. We thought 10 is quite enough for the regular use of MooseFS. Or maybe we should put it into "mfschunkserver.cfg"? We'll wait for your feedback.

I looked at this code a few weeks ago and modified it to read a value from the configuration file instead of the hard-coded value. There actually seem to be two calls to job_pool_new(). If I read the code correctly, one pool seems to be used for connections from mfsmaster and one pool for connections from mfsmount. Increasing the value did not seem to have a positive effect on the stability of mfschunkserver, so I reverted the change without testing its effect thoroughly. In the short term it seems to be significantly more effective to run multiple instances of mfschunkserver per machine than to increase this constant. If the stability issues can be resolved, it may be useful to make this value user configurable. At this stage I would guess that it is most efficient to run one instance of mfschunkserver for approximately every 10 spindles. It may be useful to allow tuning the number of threads per job pool with a similar relationship.

[MB2] And what do you mean by "stability of mfschunkserver"? It should not have any implications for the stability of mfschunkserver. Could you give us more information about the stability problems you experienced? Did the chunkserver hang completely? Work slowly? Etc.?

Kind regards
Michał

At this stage we have had enough downtime because of mfs that I would rather live with the sub-optimal performance that we have now.

Robert

On 10/18/11 5:17 AM, Michał Borychowski wrote:

From: Robert Sandilands [mailto:rsa...@ne...]
Sent: Thursday, October 06, 2011 1:16 PM
To: Michał Borychowski
Cc: moo...@li...
Subject: Re: [Moosefs-users] mfsmaster performance and hardware

Hi Michal,

I understand open is complex, but as a later email I wrote shows, the limit seems to be the fact that mfsmount uses a single socket to mfsmaster, and this socket is a significant bottleneck.

[MB] Robert, why do you think there is just one socket between mfsmount and mfsmaster?

Another bottleneck seems to be in mfschunkserver, where more than around 10 simultaneous file accesses cause significant deterioration of performance.

[MB] Here, if you want to experiment, you can find this line:

jpool = job_pool_new(10,BGJOBSCNT,&jobfd);

in csserv.c and change 10 to e.g. 20. We thought 10 is quite enough for the regular use of MooseFS. Or maybe we should put it into "mfschunkserver.cfg"? We'll wait for your feedback.

Regards
Michal

Any advice on mitigating or fixing these two bottlenecks would be highly appreciated.

Robert

On 10/6/11 3:25 AM, Michał Borychowski wrote:

Hi Robert!

This is normal behaviour - "open" does several things on the master, among others some lookups and the open itself. We tried to introduce a folder and attribute cache here, but it didn't work as expected. We could mark some files as "immutable" and keep the cache in the mfsmount, but these files would have to be read only, which is rather not acceptable.

If you have any ideas how to speed things up, go ahead :)

Kind regards
Michał Borychowski
MooseFS Support Manager
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _
Gemius S.A.
ul. Wołoska 7, 02-672 Warszawa
Budynek MARS, klatka D
Tel.: +4822 874-41-00
Fax : +4822 874-41-01

From: Robert Sandilands [mailto:rsa...@ne...]
Sent: Wednesday, August 31, 2011 2:54 AM
To: moo...@li...
Subject: Re: [Moosefs-users] mfsmaster performance and hardware

Further on this subject.

I wrote a dedicated http server to serve the files instead of using Apache. It allowed me to gain a few extra percent of performance and decreased the memory usage of the web servers. The web server also gave me some interesting timings:

File open average 405.3732 ms
File read average 238.7784 ms
File close average 286.8376 ms
File size average 0.0026 ms
Net read average 2.536 ms
Net write average 2.2148 ms
Log to access log average 0.2526 ms
Log to error log average 0.2234 ms

Average time to process a file 936.2186 ms
Total files processed 1,503,610

What I really find scary is that opening a file takes nearly half a second, and closing a file a quarter of a second. The time to open() and close() is nearly 3 times more than the time to read the data. The server always reads in multiples of 64 kB except if there is less data available, although it uses posix_fadvise() to try and do some read-ahead. This is the average over 5 machines running mfsmount and my custom web server for about 18 hours.

On a machine that only serves a low number of clients the times for open and close are negligible. open() and close() seem to scale very badly with an increase in clients using mfsmount.

From looking at the code for mfsmount it seems like all communication to the master happens over a single TCP socket with a global handle and mutex to protect it. This may be the bottleneck? If there are multiple open()'s at the same time they may end up waiting for the mutex to get an opportunity to communicate with the master? The same handle and mutex is also used to read replies and this may also not help the situation?

What prevents multiple sockets to the master?

It also seems to indicate that the only way to get the open() average down is to introduce more web servers and that a single web server can only serve a very low number of clients. Is that a correct assumption?

Robert

On 8/26/11 3:25 AM, Davies Liu wrote:

Hi Robert,

Another hint to make mfsmaster more responsive is to locate the metadata.mfs on a separate disk with the change logs, such as a SAS array; you would have to modify the source code of mfsmaster to do this.

PS: what is the average size of your files? MooseFS (like GFS) is designed for large files (100M+); it cannot serve large numbers of small files well. Haystack from Facebook may be the better choice. We (douban.com) use MooseFS to serve 200+ TB (1M files) of offline data and beansdb [1] to serve 500 million online small files, and it performs very well.

[1]: http://code.google.com/p/beansdb/

Davies

On Fri, Aug 26, 2011 at 9:08 AM, Robert Sandilands <rsa...@ne...> wrote:

Hi Elliot,

There is nothing in the code to change the priority.

Taking virtually all other load from the chunk and master servers seems to have improved this significantly. I still see timeouts from mfsmount, but not enough to be problematic.

To try and optimize the performance I am experimenting with accessing the data using different APIs and block sizes, but this has been inconclusive. I have tried the effect of posix_fadvise(), sendfile() and different sized buffers for read(). I still want to try mmap(). Sendfile() did seem to be slightly slower than read().

Robert

On 8/24/11 11:05 AM, Elliot Finley wrote:
> On Tue, Aug 9, 2011 at 6:46 PM, Robert Sandilands<rsa...@ne...> wrote:
>> Increasing the swap space fixed the fork() issue. It seems that you have to ensure that memory available is always double the memory needed by mfsmaster. None of the swap space was used over the last 24 hours.
>>
>> This did solve the extreme comb-like behavior of mfsmaster. It still does not resolve its sensitivity to load on the server. I am still seeing timeouts on the chunkservers and mounts on the hour due to the high CPU and I/O load when the meta data is dumped to disk. It did however decrease significantly.
> Here is another thought on this...
>
> The process is niced to -19 (very high priority) so that it has good performance. It forks once per hour to write out the metadata. I haven't checked the code for this, but is the forked process lowering its priority so it doesn't compete with the original process?
>
> If it's not, it should be an easy code change to lower the priority in the child process (metadata writer) so that it doesn't compete with the original process at the same priority.
>
> If you check into this, I'm sure the list would appreciate an update. :)
>
> Elliot

------------------------------------------------------------------------------
EMC VNX: the world's simplest storage, starting under $10K
The only unified storage solution that offers unified management
Up to 160% more powerful than alternatives and 25% more efficient. Guaranteed. http://p.sf.net/sfu/emc-vnx-dev2dev
_______________________________________________
moosefs-users mailing list
moo...@li...
https://lists.sourceforge.net/lists/listinfo/moosefs-users

--
- Davies
|
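The pool size discussed above is the first argument of job_pool_new() in csserv.c. A minimal sketch of what moving it into "mfschunkserver.cfg" could look like follows; it is only an illustration under assumptions: WORKERS is a made-up configuration key, not an existing option, and cfg_getuint32() is assumed to behave like the chunkserver's other configuration helpers.

    /* Sketch only: WORKERS is a hypothetical mfschunkserver.cfg key and
     * cfg_getuint32() is an assumed config helper; uint32_t comes from
     * <stdint.h>. This would replace the hard-coded first argument of the
     * existing call in csserv.c:
     *     jpool = job_pool_new(10,BGJOBSCNT,&jobfd);
     */
    uint32_t workers = cfg_getuint32("WORKERS",10);  /* default 10 keeps current behaviour */
    if (workers<1 || workers>250) {
        workers = 10;                                /* ignore unreasonable values */
    }
    jpool = job_pool_new(workers,BGJOBSCNT,&jobfd);

The second job_pool_new() call Robert mentions could read a separate key in the same way, so the master-facing and mount-facing pools could be tuned independently.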
From: Robert S. <rsa...@ne...> - 2011-10-19 00:52:16
|
Hi Michal,

[MB] Robert, why do you think there is just one socket between mfsmount and mfsmaster?

From looking at the code and from looking at the connections between the machines mfsmount and mfsmaster are running on. Also based on the very significant decrease in the time it takes to open() a file once the number of simultaneous file accesses per mfsmount was limited to a much lower number by increasing the number of times the volume was mounted on a machine. This also significantly increased the perceived speed and interactivity of the file system.

On the machine running mfsmaster:

# netstat -ap | grep mfsmaster | grep http-load-balance-1 | wc -l
9

The machine http-load-balance-1 is running 9 instances of mfsmount and therefore has 9 connections to mfsmaster.

[MB] Here, if you want to experiment, you can find this line:

jpool = job_pool_new(10,BGJOBSCNT,&jobfd);

in csserv.c and change 10 to e.g. 20. We thought 10 is quite enough for the regular use of MooseFS. Or maybe we should put it into "mfschunkserver.cfg"? We'll wait for your feedback.

I looked at this code a few weeks ago and modified it to read a value from the configuration file instead of the hard-coded value. There actually seem to be two calls to job_pool_new(). If I read the code correctly, one pool seems to be used for connections from mfsmaster and one pool for connections from mfsmount. Increasing the value did not seem to have a positive effect on the stability of mfschunkserver, so I reverted the change without testing its effect thoroughly. In the short term it seems to be significantly more effective to run multiple instances of mfschunkserver per machine than to increase this constant. If the stability issues can be resolved, it may be useful to make this value user configurable. At this stage I would guess that it is most efficient to run one instance of mfschunkserver for approximately every 10 spindles. It may be useful to allow tuning the number of threads per job pool with a similar relationship.

At this stage we have had enough downtime because of mfs that I would rather live with the sub-optimal performance that we have now.

Robert

On 10/18/11 5:17 AM, Michał Borychowski wrote:
> From: Robert Sandilands [mailto:rsa...@ne...]
> Sent: Thursday, October 06, 2011 1:16 PM
> To: Michał Borychowski
> Cc: moo...@li...
> Subject: Re: [Moosefs-users] mfsmaster performance and hardware
>
> Hi Michal,
>
> I understand open is complex, but as a later email I wrote shows, the limit seems to be the fact that mfsmount uses a single socket to mfsmaster, and this socket is a significant bottleneck.
>
> [MB] Robert, why do you think there is just one socket between mfsmount and mfsmaster?
>
> Another bottleneck seems to be in mfschunkserver, where more than around 10 simultaneous file accesses cause significant deterioration of performance.
>
> [MB] Here, if you want to experiment, you can find this line:
>
> jpool = job_pool_new(10,BGJOBSCNT,&jobfd);
>
> in csserv.c and change 10 to e.g. 20. We thought 10 is quite enough for the regular use of MooseFS. Or maybe we should put it into "mfschunkserver.cfg"? We'll wait for your feedback.
>
> Regards
> Michal
>
> Any advice on mitigating or fixing these two bottlenecks would be highly appreciated.
>
> Robert
>
> On 10/6/11 3:25 AM, Michał Borychowski wrote:
>
> Hi Robert!
>
> This is normal behaviour - "open" does several things on the master, among others some lookups and the open itself. We tried to introduce a folder and attribute cache here, but it didn't work as expected. We could mark some files as "immutable" and keep the cache in the mfsmount, but these files would have to be read only, which is rather not acceptable.
>
> If you have any ideas how to speed things up, go ahead :)
>
> Kind regards
> Michał Borychowski
> MooseFS Support Manager
> _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _
> Gemius S.A.
> ul. Wołoska 7, 02-672 Warszawa
> Budynek MARS, klatka D
> Tel.: +4822 874-41-00
> Fax : +4822 874-41-01
>
> From: Robert Sandilands [mailto:rsa...@ne...]
> Sent: Wednesday, August 31, 2011 2:54 AM
> To: moo...@li...
> Subject: Re: [Moosefs-users] mfsmaster performance and hardware
>
> Further on this subject.
>
> I wrote a dedicated http server to serve the files instead of using Apache. It allowed me to gain a few extra percent of performance and decreased the memory usage of the web servers. The web server also gave me some interesting timings:
>
> File open average 405.3732 ms
> File read average 238.7784 ms
> File close average 286.8376 ms
> File size average 0.0026 ms
> Net read average 2.536 ms
> Net write average 2.2148 ms
> Log to access log average 0.2526 ms
> Log to error log average 0.2234 ms
>
> Average time to process a file 936.2186 ms
> Total files processed 1,503,610
>
> What I really find scary is that opening a file takes nearly half a second, and closing a file a quarter of a second. The time to open() and close() is nearly 3 times more than the time to read the data. The server always reads in multiples of 64 kB except if there is less data available, although it uses posix_fadvise() to try and do some read-ahead. This is the average over 5 machines running mfsmount and my custom web server for about 18 hours.
>
> On a machine that only serves a low number of clients the times for open and close are negligible. open() and close() seem to scale very badly with an increase in clients using mfsmount.
>
> From looking at the code for mfsmount it seems like all communication to the master happens over a single TCP socket with a global handle and mutex to protect it. This may be the bottleneck? If there are multiple open()'s at the same time they may end up waiting for the mutex to get an opportunity to communicate with the master? The same handle and mutex is also used to read replies and this may also not help the situation?
>
> What prevents multiple sockets to the master?
>
> It also seems to indicate that the only way to get the open() average down is to introduce more web servers and that a single web server can only serve a very low number of clients. Is that a correct assumption?
>
> Robert
>
> On 8/26/11 3:25 AM, Davies Liu wrote:
>
> Hi Robert,
>
> Another hint to make mfsmaster more responsive is to locate the metadata.mfs on a separate disk with the change logs, such as a SAS array; you would have to modify the source code of mfsmaster to do this.
>
> PS: what is the average size of your files? MooseFS (like GFS) is designed for large files (100M+); it cannot serve large numbers of small files well. Haystack from Facebook may be the better choice. We (douban.com) use MooseFS to serve 200+ TB (1M files) of offline data and beansdb [1] to serve 500 million online small files, and it performs very well.
>
> [1]: http://code.google.com/p/beansdb/
>
> Davies
>
> On Fri, Aug 26, 2011 at 9:08 AM, Robert Sandilands <rsa...@ne...> wrote:
>
> Hi Elliot,
>
> There is nothing in the code to change the priority.
>
> Taking virtually all other load from the chunk and master servers seems to have improved this significantly. I still see timeouts from mfsmount, but not enough to be problematic.
>
> To try and optimize the performance I am experimenting with accessing the data using different APIs and block sizes, but this has been inconclusive. I have tried the effect of posix_fadvise(), sendfile() and different sized buffers for read(). I still want to try mmap(). Sendfile() did seem to be slightly slower than read().
>
> Robert
>
> On 8/24/11 11:05 AM, Elliot Finley wrote:
> > On Tue, Aug 9, 2011 at 6:46 PM, Robert Sandilands<rsa...@ne...> wrote:
> >> Increasing the swap space fixed the fork() issue. It seems that you have to ensure that memory available is always double the memory needed by mfsmaster. None of the swap space was used over the last 24 hours.
> >>
> >> This did solve the extreme comb-like behavior of mfsmaster. It still does not resolve its sensitivity to load on the server. I am still seeing timeouts on the chunkservers and mounts on the hour due to the high CPU and I/O load when the meta data is dumped to disk. It did however decrease significantly.
> > Here is another thought on this...
> >
> > The process is niced to -19 (very high priority) so that it has good performance. It forks once per hour to write out the metadata. I haven't checked the code for this, but is the forked process lowering its priority so it doesn't compete with the original process?
> >
> > If it's not, it should be an easy code change to lower the priority in the child process (metadata writer) so that it doesn't compete with the original process at the same priority.
> >
> > If you check into this, I'm sure the list would appreciate an update. :)
> >
> > Elliot
>
> ------------------------------------------------------------------------------
> EMC VNX: the world's simplest storage, starting under $10K
> The only unified storage solution that offers unified management
> Up to 160% more powerful than alternatives and 25% more efficient. Guaranteed. http://p.sf.net/sfu/emc-vnx-dev2dev
> _______________________________________________
> moosefs-users mailing list
> moo...@li...
> https://lists.sourceforge.net/lists/listinfo/moosefs-users
>
> --
> - Davies
|
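A rough model of the behaviour Robert describes (an illustration only, not mfsmount's actual code): each mfsmount instance holds one TCP connection to mfsmaster and serializes every metadata request/reply behind one mutex, so concurrent open() calls queue up. Running several mfsmount instances per host effectively builds a small connection pool, with one lock per connection:

    /* Illustrative sketch, not MooseFS source. One (socket, mutex) pair models
     * one mfsmount instance: requests on the same connection serialize, while
     * requests on different connections proceed in parallel - which is why the
     * 9 mounts on http-load-balance-1 bring the open() latency down. */
    #include <pthread.h>
    #include <stdatomic.h>
    #include <stddef.h>

    #define CONNS 9                       /* e.g. one per mfsmount instance */

    typedef struct {
        int fd;                           /* TCP socket to mfsmaster */
        pthread_mutex_t lock;             /* one request/reply in flight at a time */
    } master_conn;

    static master_conn pool[CONNS];
    static atomic_uint next_conn;

    static void pool_init(void) {
        for (int i = 0; i < CONNS; i++) {
            pool[i].fd = -1;              /* connect to the master here */
            pthread_mutex_init(&pool[i].lock, NULL);
        }
    }

    static int master_rpc(const void *req, size_t reqlen, void *reply, size_t replen) {
        /* round-robin over the connections, then hold that connection's lock
         * for the whole round trip */
        master_conn *c = &pool[atomic_fetch_add(&next_conn, 1) % CONNS];
        pthread_mutex_lock(&c->lock);
        /* write(c->fd, req, reqlen); ... read(c->fd, reply, replen); */
        (void)req; (void)reqlen; (void)reply; (void)replen;
        pthread_mutex_unlock(&c->lock);
        return 0;
    }

With CONNS set to 1 this collapses to the single global handle and mutex that mfsmount appears to use today.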
From: Robert S. <rsa...@ne...> - 2011-10-18 12:18:16
|
14G metadata_ml.mfs.back

Robert

On 10/18/11 5:03 AM, Michał Borychowski wrote:
> And how much space does the dumped metadata file occupy?
>
> Regards
> Michał
>
> -----Original Message-----
> From: Robert Sandilands [mailto:rsa...@ne...]
> Sent: Monday, October 17, 2011 7:43 PM
> To: Elliot Finley
> Cc: moo...@li...
> Subject: Re: [Moosefs-users] mfsmaster performance and hardware
>
> Around 42 GB. mfschunkserver also seems to use about 6 GB of RAM for 32 million chunks.
>
> Robert
>
> On 10/17/11 10:19 AM, Elliot Finley wrote:
>> 2011/10/17 Robert Sandilands<rsa...@ne...>:
>>> The new master has 72 GB of RAM and it currently has 125 million files.
>> Just out of curiosity (and to plan my mfsmaster upgrade), how much RAM does the mfsmaster process use for 125 million files?
>>
>> Thanks,
>> Elliot
>
> ------------------------------------------------------------------------------
> All the data continuously generated in your IT infrastructure contains a definitive record of customers, application performance, security threats, fraudulent activity and more. Splunk takes this data and makes sense of it. Business sense. IT sense. Common sense.
> http://p.sf.net/sfu/splunk-d2d-oct
> _______________________________________________
> moosefs-users mailing list
> moo...@li...
> https://lists.sourceforge.net/lists/listinfo/moosefs-users
|
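For a sense of scale, those figures work out to roughly 340-360 bytes of mfsmaster RAM per file (42 GB for 125 million files), roughly 110-120 bytes per file in the 14 GB metadata dump, and roughly 190-200 bytes of chunkserver RAM per chunk (6 GB for 32 million chunks) - back-of-the-envelope numbers only, since the reported sizes are rounded.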
From: Michał B. <mic...@ge...> - 2011-10-18 09:19:07
|
From: Robert Sandilands [mailto:rsa...@ne...]
Sent: Thursday, October 06, 2011 1:16 PM
To: Michał Borychowski
Cc: moo...@li...
Subject: Re: [Moosefs-users] mfsmaster performance and hardware

Hi Michal,

I understand open is complex, but as a later email I wrote shows, the limit seems to be the fact that mfsmount uses a single socket to mfsmaster, and this socket is a significant bottleneck.

[MB] Robert, why do you think there is just one socket between mfsmount and mfsmaster?

Another bottleneck seems to be in mfschunkserver, where more than around 10 simultaneous file accesses cause significant deterioration of performance.

[MB] Here, if you want to experiment, you can find this line:

jpool = job_pool_new(10,BGJOBSCNT,&jobfd);

in csserv.c and change 10 to e.g. 20. We thought 10 is quite enough for the regular use of MooseFS. Or maybe we should put it into "mfschunkserver.cfg"? We'll wait for your feedback.

Regards
Michal

Any advice on mitigating or fixing these two bottlenecks would be highly appreciated.

Robert

On 10/6/11 3:25 AM, Michał Borychowski wrote:

Hi Robert!

This is normal behaviour - "open" does several things on the master, among others some lookups and the open itself. We tried to introduce a folder and attribute cache here, but it didn't work as expected. We could mark some files as "immutable" and keep the cache in the mfsmount, but these files would have to be read only, which is rather not acceptable.

If you have any ideas how to speed things up, go ahead :)

Kind regards
Michał Borychowski
MooseFS Support Manager
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _
Gemius S.A.
ul. Wołoska 7, 02-672 Warszawa
Budynek MARS, klatka D
Tel.: +4822 874-41-00
Fax : +4822 874-41-01

From: Robert Sandilands [mailto:rsa...@ne...]
Sent: Wednesday, August 31, 2011 2:54 AM
To: moo...@li...
Subject: Re: [Moosefs-users] mfsmaster performance and hardware

Further on this subject.

I wrote a dedicated http server to serve the files instead of using Apache. It allowed me to gain a few extra percent of performance and decreased the memory usage of the web servers. The web server also gave me some interesting timings:

File open average 405.3732 ms
File read average 238.7784 ms
File close average 286.8376 ms
File size average 0.0026 ms
Net read average 2.536 ms
Net write average 2.2148 ms
Log to access log average 0.2526 ms
Log to error log average 0.2234 ms

Average time to process a file 936.2186 ms
Total files processed 1,503,610

What I really find scary is that opening a file takes nearly half a second, and closing a file a quarter of a second. The time to open() and close() is nearly 3 times more than the time to read the data. The server always reads in multiples of 64 kB except if there is less data available, although it uses posix_fadvise() to try and do some read-ahead. This is the average over 5 machines running mfsmount and my custom web server for about 18 hours.

On a machine that only serves a low number of clients the times for open and close are negligible. open() and close() seem to scale very badly with an increase in clients using mfsmount.

From looking at the code for mfsmount it seems like all communication to the master happens over a single TCP socket with a global handle and mutex to protect it. This may be the bottleneck? If there are multiple open()'s at the same time they may end up waiting for the mutex to get an opportunity to communicate with the master? The same handle and mutex is also used to read replies and this may also not help the situation?

What prevents multiple sockets to the master?

It also seems to indicate that the only way to get the open() average down is to introduce more web servers and that a single web server can only serve a very low number of clients. Is that a correct assumption?

Robert

On 8/26/11 3:25 AM, Davies Liu wrote:

Hi Robert,

Another hint to make mfsmaster more responsive is to locate the metadata.mfs on a separate disk with the change logs, such as a SAS array; you would have to modify the source code of mfsmaster to do this.

PS: what is the average size of your files? MooseFS (like GFS) is designed for large files (100M+); it cannot serve large numbers of small files well. Haystack from Facebook may be the better choice. We (douban.com) use MooseFS to serve 200+ TB (1M files) of offline data and beansdb [1] to serve 500 million online small files, and it performs very well.

[1]: http://code.google.com/p/beansdb/

Davies

On Fri, Aug 26, 2011 at 9:08 AM, Robert Sandilands <rsa...@ne...> wrote:

Hi Elliot,

There is nothing in the code to change the priority.

Taking virtually all other load from the chunk and master servers seems to have improved this significantly. I still see timeouts from mfsmount, but not enough to be problematic.

To try and optimize the performance I am experimenting with accessing the data using different APIs and block sizes, but this has been inconclusive. I have tried the effect of posix_fadvise(), sendfile() and different sized buffers for read(). I still want to try mmap(). Sendfile() did seem to be slightly slower than read().

Robert

On 8/24/11 11:05 AM, Elliot Finley wrote:
> On Tue, Aug 9, 2011 at 6:46 PM, Robert Sandilands<rsa...@ne...> wrote:
>> Increasing the swap space fixed the fork() issue. It seems that you have to ensure that memory available is always double the memory needed by mfsmaster. None of the swap space was used over the last 24 hours.
>>
>> This did solve the extreme comb-like behavior of mfsmaster. It still does not resolve its sensitivity to load on the server. I am still seeing timeouts on the chunkservers and mounts on the hour due to the high CPU and I/O load when the meta data is dumped to disk. It did however decrease significantly.
> Here is another thought on this...
>
> The process is niced to -19 (very high priority) so that it has good performance. It forks once per hour to write out the metadata. I haven't checked the code for this, but is the forked process lowering its priority so it doesn't compete with the original process?
>
> If it's not, it should be an easy code change to lower the priority in the child process (metadata writer) so that it doesn't compete with the original process at the same priority.
>
> If you check into this, I'm sure the list would appreciate an update. :)
>
> Elliot

------------------------------------------------------------------------------
EMC VNX: the world's simplest storage, starting under $10K
The only unified storage solution that offers unified management
Up to 160% more powerful than alternatives and 25% more efficient. Guaranteed. http://p.sf.net/sfu/emc-vnx-dev2dev
_______________________________________________
moosefs-users mailing list
moo...@li...
https://lists.sourceforge.net/lists/listinfo/moosefs-users

--
- Davies
|
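Elliot's suggestion at the end of the quoted thread - lowering the forked metadata writer's priority so it stops competing with the serving process - would look roughly like the sketch below. It is a generic POSIX illustration, not the actual mfsmaster code, and the function name is made up.

    /* Sketch of "drop the child's priority after fork()"; illustrative only. */
    #include <sys/resource.h>
    #include <sys/time.h>
    #include <sys/types.h>
    #include <unistd.h>

    static void store_metadata_in_background(void) {
        pid_t pid = fork();
        if (pid == 0) {
            /* child: the metadata writer. Raising one's own nice value never
             * needs privileges, so this works even if the parent runs at -19. */
            if (setpriority(PRIO_PROCESS, 0, 19) != 0) {
                /* not fatal - dump the metadata anyway */
            }
            /* ... write metadata.mfs to disk ... */
            _exit(0);
        }
        /* parent (pid > 0): keep serving clients at the original priority.
         * pid < 0 means fork() failed, e.g. for lack of memory/swap, which is
         * the situation described earlier in this thread. */
    }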
From: Michał B. <mic...@ge...> - 2011-10-18 09:04:16
|
And how much space does the dumped metadata file occupy?

Regards
Michał

-----Original Message-----
From: Robert Sandilands [mailto:rsa...@ne...]
Sent: Monday, October 17, 2011 7:43 PM
To: Elliot Finley
Cc: moo...@li...
Subject: Re: [Moosefs-users] mfsmaster performance and hardware

Around 42 GB. mfschunkserver also seems to use about 6 GB of RAM for 32 million chunks.

Robert

On 10/17/11 10:19 AM, Elliot Finley wrote:
> 2011/10/17 Robert Sandilands<rsa...@ne...>:
>> The new master has 72 GB of RAM and it currently has 125 million files.
> Just out of curiosity (and to plan my mfsmaster upgrade), how much RAM does the mfsmaster process use for 125 million files?
>
> Thanks,
> Elliot

------------------------------------------------------------------------------
All the data continuously generated in your IT infrastructure contains a definitive record of customers, application performance, security threats, fraudulent activity and more. Splunk takes this data and makes sense of it. Business sense. IT sense. Common sense.
http://p.sf.net/sfu/splunk-d2d-oct
_______________________________________________
moosefs-users mailing list
moo...@li...
https://lists.sourceforge.net/lists/listinfo/moosefs-users
|