Re: [Kosmosfs-users] Unable to retire node: <host> <port> status: -1
From: Josh A. <jo...@gm...> - 2009-06-23 19:49:29
Yep, it's been this way for a few days, even after adding new nodes to the cluster. The new nodes have been getting data, but very slowly. It appears that there's no (or possibly very little) rebalancing going on, and if there's a way for me to nudge kfs into getting rolling on that, I'm all ears :-). Most nodes are running two chunkservers (one per physical disk) and that's been performing pretty well so far.

[kfs@tsaa34 ~]$ /opt/philotic/kfs/0.3/bin/tools/kfsping -m -s tskfs1 -p 21000
Up servers: 45
s=10.0.1.40, p=31001, rack=1, used=558.398(GB), free=210.131(GB), util=72.658%, nblocks=41851, lastheard=3 (sec), ncorrupt=0, nchunksToMove=0
s=10.0.1.39, p=31000, rack=1, used=492.209(GB), free=43.7421(GB), util=91.8384%, nblocks=31373, lastheard=3 (sec), ncorrupt=0, nchunksToMove=0, overloaded=1
s=10.0.1.76, p=31001, rack=1, used=553.128(GB), free=28.4725(GB), util=95.1045%, nblocks=37962, lastheard=3 (sec), ncorrupt=0, nchunksToMove=0, overloaded=1
s=10.0.1.72, p=31001, rack=1, used=582.771(GB), free=267.229(GB), util=68.5613%, nblocks=42224, lastheard=3 (sec), ncorrupt=0, nchunksToMove=0
s=10.0.1.71, p=31000, rack=1, used=452.604(GB), free=111.324(GB), util=80.2592%, nblocks=33305, lastheard=3 (sec), ncorrupt=0, nchunksToMove=0
s=10.0.1.37, p=31001, rack=1, used=582.508(GB), free=190.621(GB), util=75.3442%, nblocks=42425, lastheard=3 (sec), ncorrupt=0, nchunksToMove=0
s=10.0.1.40, p=31000, rack=1, used=538.834(GB), free=70.6716(GB), util=88.4051%, nblocks=40817, lastheard=3 (sec), ncorrupt=0, nchunksToMove=0
s=10.0.1.38, p=31000, rack=1, used=567.559(GB), free=110.687(GB), util=83.6804%, nblocks=42057, lastheard=3 (sec), ncorrupt=0, nchunksToMove=0
s=10.0.1.71, p=31001, rack=1, used=579.41(GB), free=151.791(GB), util=79.2409%, nblocks=42017, lastheard=3 (sec), ncorrupt=0, nchunksToMove=0
s=10.0.1.76, p=31000, rack=1, used=450.049(GB), free=17.0633(GB), util=96.3471%, nblocks=19644, lastheard=3 (sec), ncorrupt=0, nchunksToMove=0, overloaded=1
s=10.0.1.35, p=31000, rack=1, used=524.617(GB), free=172.351(GB), util=75.2713%, nblocks=40479, lastheard=3 (sec), ncorrupt=0, nchunksToMove=0
s=10.0.1.37, p=31000, rack=1, used=572.338(GB), free=170.664(GB), util=77.0305%, nblocks=42197, lastheard=3 (sec), ncorrupt=0, nchunksToMove=0
s=10.0.1.31, p=31001, rack=1, used=580.125(GB), free=188.277(GB), util=75.4976%, nblocks=41902, lastheard=3 (sec), ncorrupt=0, nchunksToMove=0
s=10.0.1.32, p=31001, rack=1, used=573.355(GB), free=195.56(GB), util=74.5668%, nblocks=42022, lastheard=3 (sec), ncorrupt=0, nchunksToMove=0
s=10.0.1.39, p=31001, rack=1, used=579.707(GB), free=193.47(GB), util=74.9773%, nblocks=42234, lastheard=3 (sec), ncorrupt=0, nchunksToMove=0
s=10.0.1.32, p=31000, rack=1, used=544.341(GB), free=59.9674(GB), util=90.0767%, nblocks=40708, lastheard=3 (sec), ncorrupt=0, nchunksToMove=0, overloaded=1
s=10.0.1.38, p=31001, rack=1, used=574.91(GB), free=198.586(GB), util=74.3261%, nblocks=41754, lastheard=3 (sec), ncorrupt=0, nchunksToMove=0
s=10.0.1.35, p=31001, rack=1, used=577.502(GB), free=202.279(GB), util=74.0595%, nblocks=42108, lastheard=3 (sec), ncorrupt=0, nchunksToMove=0
s=10.0.1.31, p=31000, rack=1, used=548.517(GB), free=96.3173(GB), util=85.0632%, nblocks=41114, lastheard=3 (sec), ncorrupt=0, nchunksToMove=0
s=10.0.1.72, p=31000, rack=1, used=482.726(GB), free=97.9717(GB), util=83.1286%, nblocks=34170, lastheard=3 (sec), ncorrupt=0, nchunksToMove=0
s=10.0.1.54, p=31000, rack=1, used=285.032(GB), free=206.941(GB), util=57.9365%, nblocks=34951, lastheard=3 (sec), ncorrupt=0, nchunksToMove=0
s=10.0.1.34, p=31000, rack=1, used=563.095(GB), free=156.392(GB), util=78.2634%, nblocks=41691, lastheard=3 (sec), ncorrupt=0, nchunksToMove=0
s=10.0.1.34, p=31001, rack=1, used=506.618(GB), free=56.0474(GB), util=90.0389%, nblocks=40064, lastheard=3 (sec), ncorrupt=0, nchunksToMove=0, overloaded=1
s=10.0.1.36, p=31000, rack=1, used=531.615(GB), free=59.0238(GB), util=90.0068%, nblocks=37389, lastheard=3 (sec), ncorrupt=0, nchunksToMove=0, overloaded=1
s=10.0.1.36, p=31001, rack=1, used=557.213(GB), free=106.439(GB), util=83.9616%, nblocks=41289, lastheard=3 (sec), ncorrupt=0, nchunksToMove=0
s=10.0.1.33, p=31001, rack=1, used=295.819(GB), free=443.532(GB), util=40.0107%, nblocks=35512, lastheard=3 (sec), ncorrupt=0, nchunksToMove=0
s=10.0.1.33, p=31000, rack=1, used=188.675(GB), free=423.776(GB), util=30.8065%, nblocks=33086, lastheard=3 (sec), ncorrupt=0, nchunksToMove=0
s=10.0.1.79, p=31000, rack=1, used=11.8583(GB), free=431.129(GB), util=2.6769%, nblocks=3125, lastheard=3 (sec), ncorrupt=0, nchunksToMove=0
s=10.0.1.77, p=31000, rack=1, used=12.1811(GB), free=652.833(GB), util=1.83171%, nblocks=3003, lastheard=3 (sec), ncorrupt=0, nchunksToMove=0
s=10.0.1.74, p=31000, rack=1, used=11.5761(GB), free=483.46(GB), util=2.33843%, nblocks=3055, lastheard=3 (sec), ncorrupt=0, nchunksToMove=0
s=10.0.1.80, p=31000, rack=1, used=11.5342(GB), free=378.116(GB), util=2.96013%, nblocks=3057, lastheard=3 (sec), ncorrupt=0, nchunksToMove=0
s=10.0.1.73, p=31000, rack=1, used=11.3578(GB), free=737.449(GB), util=1.51679%, nblocks=3036, lastheard=3 (sec), ncorrupt=0, nchunksToMove=0
s=10.0.1.78, p=31000, rack=1, used=11.1976(GB), free=416.211(GB), util=2.61988%, nblocks=3066, lastheard=3 (sec), ncorrupt=0, nchunksToMove=0
s=10.0.1.74, p=31001, rack=1, used=11.5142(GB), free=767.223(GB), util=1.47857%, nblocks=3085, lastheard=3 (sec), ncorrupt=0, nchunksToMove=0
s=10.0.1.73, p=31001, rack=1, used=12.1274(GB), free=837.873(GB), util=1.42675%, nblocks=3115, lastheard=3 (sec), ncorrupt=0, nchunksToMove=0
s=10.0.1.77, p=31001, rack=1, used=11.7759(GB), free=592.708(GB), util=1.94809%, nblocks=3119, lastheard=3 (sec), ncorrupt=0, nchunksToMove=0
s=10.0.1.80, p=31001, rack=1, used=11.5353(GB), free=838.465(GB), util=1.35709%, nblocks=3082, lastheard=3 (sec), ncorrupt=0, nchunksToMove=0
s=10.0.1.79, p=31001, rack=1, used=11.8359(GB), free=838.164(GB), util=1.39246%, nblocks=3140, lastheard=3 (sec), ncorrupt=0, nchunksToMove=0
s=10.0.1.78, p=31001, rack=1, used=12.0909(GB), free=826.25(GB), util=1.44224%, nblocks=3166, lastheard=3 (sec), ncorrupt=0, nchunksToMove=0
s=10.0.1.75, p=31000, rack=1, used=11.0634(GB), free=173.986(GB), util=5.97864%, nblocks=2979, lastheard=3 (sec), ncorrupt=0, nchunksToMove=0
s=10.0.1.63, p=31000, rack=1, used=11.2592(GB), free=670.855(GB), util=1.65063%, nblocks=3060, lastheard=3 (sec), ncorrupt=0, nchunksToMove=0
s=10.0.1.56, p=31000, rack=1, used=10.7865(GB), free=813.933(GB), util=1.3079%, nblocks=2937, lastheard=3 (sec), ncorrupt=0, nchunksToMove=0
s=10.0.1.75, p=31001, rack=1, used=11.1986(GB), free=838.801(GB), util=1.31748%, nblocks=3029, lastheard=3 (sec), ncorrupt=0, nchunksToMove=0
s=10.0.1.63, p=31001, rack=1, used=11.077(GB), free=838.923(GB), util=1.30318%, nblocks=3022, lastheard=3 (sec), ncorrupt=0, nchunksToMove=0
s=10.0.1.56, p=31001, rack=1, used=11.2347(GB), free=838.765(GB), util=1.32173%, nblocks=3011, lastheard=3 (sec), ncorrupt=0, nchunksToMove=0
Down servers: 9
s=10.0.1.33, p=31000, down=Fri May 29 18:00:19 2009, reason=Unreachable
s=10.0.1.33, p=31001, down=Fri May 29 18:04:19 2009, reason=Unreachable
s=10.0.1.36, p=31000, down=Wed Jun 3 10:29:14 2009, reason=Unreachable
s=10.0.1.36, p=31001, down=Wed Jun 3 10:29:14 2009, reason=Unreachable
s=10.0.1.33, p=31001, down=Wed Jun 10 16:47:20 2009, reason=Unreachable
s=10.0.1.33, p=31000, down=Wed Jun 10 16:51:21 2009, reason=Unreachable
s=10.0.1.33, p=31000, down=Wed Jun 10 18:16:27 2009, reason=Unreachable
s=10.0.1.33, p=31001, down=Wed Jun 10 18:16:27 2009, reason=Unreachable
s=10.0.1.53, p=31000, down=Sun Jun 14 01:10:33 2009, reason=Unreachable

Josh

On Mon, Jun 22, 2009 at 11:57 PM, Sriram Rao<sri...@gm...> wrote:
> That is odd...can you also provide me kfsping output?
>
> Sriram
>
> On Mon, Jun 22, 2009 at 4:08 PM, Josh Adams<jo...@gm...> wrote:
>> Hey Sriram, thanks for the quick response. I ran a few permutations
>> and got the same result for each:
>>
>> kfsretire -m tskfs1 -p 21000 -c tsaa76 -d 31000
>> kfsretire -m tskfs1 -p 21000 -c tsaa76 -d 31000 -v
>> kfsretire -m tskfs1 -p 21000 -c tsaa76 -d 31000 -s 600
>> kfsretire -m tskfs1 -p 21000 -c tsaa76 -d 31000 -s 600 -v
>>
>> I verified that there are no typos with the ports and hostnames too :)
>>
>> Josh
>>
>> On Mon, Jun 22, 2009 at 4:04 PM, Sriram Rao<sri...@gm...> wrote:
>>> What is the full command line that you ran?
>>>
>>> Sriram
>>>
>>> On Mon, Jun 22, 2009 at 3:57 PM, Josh Adams<jo...@gm...> wrote:
>>>> Hey all,
>>>>
>>>> I'm getting this error when I try to run kfsretire on a node that I'd
>>>> like to do some maintenance on. The corresponding chunkserver.log
>>>> shows no relevant lines when I run the command and in metaserver.log I
>>>> see a similar message to the output of kfsretire:
>>>>
>>>> Command Retiring server: <host> <port>, Status: -1
>>>>
>>>> Thanks for your help!
>>>> Josh
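
Editor's note: a listing this long is easier to read once summarized. Below is a minimal Python sketch that pulls out the overloaded and nearly empty chunkservers from kfsping output like that quoted in this message. It is not part of KFS. It assumes one server per line in the comma-separated key=value form seen in this thread, and the function names and the 10% "nearly empty" threshold are made up for illustration.

```python
import re

# A few records copied from the kfsping output in this thread.
SAMPLE = [
    "s=10.0.1.39, p=31000, rack=1, used=492.209(GB), free=43.7421(GB), util=91.8384%, nblocks=31373, lastheard=3 (sec), ncorrupt=0, nchunksToMove=0, overloaded=1",
    "s=10.0.1.72, p=31001, rack=1, used=582.771(GB), free=267.229(GB), util=68.5613%, nblocks=42224, lastheard=3 (sec), ncorrupt=0, nchunksToMove=0",
    "s=10.0.1.79, p=31000, rack=1, used=11.8583(GB), free=431.129(GB), util=2.6769%, nblocks=3125, lastheard=3 (sec), ncorrupt=0, nchunksToMove=0",
    "s=10.0.1.53, p=31000, down=Sun Jun 14 01:10:33 2009, reason=Unreachable",
]


def parse_kfsping_line(line):
    """Split one 's=..., p=..., ...' record into a dict of raw strings."""
    fields = {}
    for part in line.strip().split(", "):
        key, sep, value = part.partition("=")
        if sep:
            fields[key.strip()] = value.strip()
    return fields


def to_float(value):
    """Return the leading number of a value such as '91.8384%' or '492.209(GB)'."""
    match = re.match(r"[-+]?\d+(?:\.\d+)?", value)
    return float(match.group()) if match else None


def summarize(lines, low_util=10.0):
    """Group servers by state. low_util is an arbitrary illustrative cutoff."""
    records = [parse_kfsping_line(l) for l in lines if l.strip().startswith("s=")]
    up = [r for r in records if "util" in r]
    down = [r for r in records if "down" in r]
    return {
        "up": len(up),
        "down": len(down),
        "overloaded": [(r["s"], r["p"]) for r in up if r.get("overloaded") == "1"],
        "nearly_empty": [(r["s"], r["p"]) for r in up if to_float(r["util"]) < low_util],
    }


if __name__ == "__main__":
    for key, value in summarize(SAMPLE).items():
        print(key, value)
```

To run it against a live cluster, save the kfsping output to a file and pass the file's lines to summarize() in place of SAMPLE. If the real output wraps or joins records differently from what is shown here, the line splitting will need adjusting.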