kuer - 2009-07-29

The chunkserver dumped core 3 times in one day, each time with the same backtrace.

(gdb) bt
#0  0x00002b302affa672 in __gnu_cxx::__exchange_and_add () from /usr/lib64/libstdc++.so.6
#1  0x00002b302afdfa4b in std::string::assign () from /usr/lib64/libstdc++.so.6
#2  0x000000000043f404 in KFS::WriteIdAllocOp::Execute ()
#3  0x0000000000432702 in KFS::ClientSM::HandleClientCmd ()
#4  0x0000000000432c8e in KFS::ClientSM::HandleRequest ()
#5  0x000000000045f759 in KFS::NetConnection::HandleReadEvent ()
#6  0x000000000046874a in KFS::NetManager::MainLoop ()
#7  0x0000000000430ea5 in netWorker ()
#8  0x00000038ae406367 in start_thread () from /lib64/libpthread.so.0
#9  0x00000038ad8d2f7d in clone () from /lib64/libc.so.6

The following are the last log lines before each coredump:
** the 1st coredump:
$ grep ' 00:43:' /home/s/logs/kfs.k1/kfs-k1-chunk.run | tail
2009-07-28 00:43:10.465 1127856448 [INFO] (KfsOps.cc:1527) Executing write sync: write-sync: seq = 30628 chunkId = 6 chunkversion = 3 write-id info: 221.194.134.173 39500 5910 221.194.134.174 39500 5450 221.194.134.176 39500 5540
2009-07-28 00:43:10.465 1127856448 [DEBUG] (KfsOps.cc:1639) Fwd'ing write-sync to peer: write-sync: seq = 6 chunkId = 6 chunkversion = 3 write-id info: 221.194.134.173 39500 5910 221.194.134.174 39500 5450 221.194.134.176 39500 5540
2009-07-28 00:43:10.465 1127856448 [DEBUG] (ClientSM.cc:87) Client 221.194.134.175, Command write-sync: seq = 30628 chunkId = 6 chunkversion = 3 write-id info: 221.194.134.173 39500 5910 221.194.134.174 39500 5450 221.194.134.176 39500 5540: Response status: 0
2009-07-28 00:43:10.465 1127856448 [INFO] (ClientSM.cc:95) Ack'ing write sync to 221.194.134.175: write-sync: seq = 30628 chunkId = 6 chunkversion = 3 write-id info: 221.194.134.173 39500 5910 221.194.134.174 39500 5450 221.194.134.176 39500 5540
2009-07-28 00:43:10.758 1127856448 [DEBUG] (ClientSM.cc:301) Got command: write-id-alloc: seq = 127 chunkId = 5 chunkversion = 1 servers = 221.194.134.173 39500 221.194.134.174 39500 221.194.134.175 39500
2009-07-28 00:43:10.758 1127856448 [INFO] (KfsOps.cc:1282) Fwd'ing write-id alloc to peer: write-id-alloc: seq = 4 chunkId = 5 chunkversion = 1 servers = 221.194.134.173 39500 221.194.134.174 39500 221.194.134.175 39500
2009-07-28 00:43:10.758 1127856448 [INFO] (RemoteSyncSM.cc:123) Lost the connection to peer 221.194.134.174 39500; failing ops
2009-07-28 00:43:10.758 1127856448 [INFO] (KfsOps.cc:1343) Sending write-id alloc back (status = -113): write-id-alloc: seq = 4 chunkId = 5 chunkversion = 1 servers = 221.194.134.173 39500 221.194.134.174 39500 221.194.134.175 39500
2009-07-28 00:43:10.758 1127856448 [INFO] (KfsOps.cc:1306) write-id alloc failed: write-id-alloc: seq = 127 chunkId = 5 chunkversion = 1 servers = 221.194.134.173 39500 221.194.134.174 39500 221.194.134.175 39500, code = -113
2009-07-28 00:43:10.758 1127856448 [DEBUG] (ClientSM.cc:87) Client 221.194.134.174, Command write-id-alloc: seq = 127 chunkId = 5 chunkversion = 1 servers = 221.194.134.173 39500 221.194.134.174 39500 221.194.134.175 39500: Response status: -113

** the 2nd coredump:
$ grep ' 09:00:' /home/s/logs/kfs.k1/kfs-k1-chunk.run | tail
2009-07-28 09:00:26.002 1125935424 [DEBUG] (ClientSM.cc:87) Client 221.194.134.174, Command write-sync: seq = 54 chunkId = 190 chunkversion = 1 write-id info: 221.194.134.175 39500 20882 221.194.134.174 39500 19786 221.194.134.173 39500 12082: Response status: 0
2009-07-28 09:00:26.002 1125935424 [INFO] (ClientSM.cc:95) Ack'ing write sync to 221.194.134.174: write-sync: seq = 54 chunkId = 190 chunkversion = 1 write-id info: 221.194.134.175 39500 20882 221.194.134.174 39500 19786 221.194.134.173 39500 12082
2009-07-28 09:00:26.002 1125935424 [DEBUG] (ClientSM.cc:87) Client 221.194.134.175, Command write-sync: seq = 1341 chunkId = 179 chunkversion = 1 write-id info: 221.194.134.174 39500 19787 221.194.134.175 39500 20883 221.194.134.173 39500 12083: Response status: 0
2009-07-28 09:00:26.003 1125935424 [INFO] (ClientSM.cc:95) Ack'ing write sync to 221.194.134.175: write-sync: seq = 1341 chunkId = 179 chunkversion = 1 write-id info: 221.194.134.174 39500 19787 221.194.134.175 39500 20883 221.194.134.173 39500 12083
2009-07-28 09:00:26.036 1125935424 [DEBUG] (ClientSM.cc:301) Got command: write-id-alloc: seq = 25460 chunkId = 5 chunkversion = 3 servers = 221.194.134.173 39500 221.194.134.176 39500 221.194.134.175 39500
2009-07-28 09:00:26.036 1125935424 [INFO] (KfsOps.cc:1282) Fwd'ing write-id alloc to peer: write-id-alloc: seq = 4 chunkId = 5 chunkversion = 3 servers = 221.194.134.173 39500 221.194.134.176 39500 221.194.134.175 39500
2009-07-28 09:00:26.036 1125935424 [INFO] (RemoteSyncSM.cc:123) Lost the connection to peer 221.194.134.176 39500; failing ops
2009-07-28 09:00:26.036 1125935424 [INFO] (KfsOps.cc:1343) Sending write-id alloc back (status = -113): write-id-alloc: seq = 4 chunkId = 5 chunkversion = 3 servers = 221.194.134.173 39500 221.194.134.176 39500 221.194.134.175 39500
2009-07-28 09:00:26.036 1125935424 [INFO] (KfsOps.cc:1306) write-id alloc failed: write-id-alloc: seq = 25460 chunkId = 5 chunkversion = 3 servers = 221.194.134.173 39500 221.194.134.176 39500 221.194.134.175 39500, code = -113
2009-07-28 09:00:26.036 1125935424 [DEBUG] (ClientSM.cc:87) Client 221.194.134.174, Command write-id-alloc: seq = 25460 chunkId = 5 chunkversion = 3 servers = 221.194.134.173 39500 221.194.134.176 39500 221.194.134.175 39500: Response status: -113

** the 3rd coredump:
$ grep ' 17:53' /home/s/logs/kfs.k1/kfs-k1-chunk.run | tail
2009-07-28 17:53:12.693 1115851072 [DEBUG] (ClientSM.cc:87) Client 221.194.134.175, Command write-sync: seq = 166342 chunkId = 784 chunkversion = 1 write-id info: 221.194.134.173 39500 27921 221.194.134.175 39500 51878 221.194.134.174 39500 47862: Response status: 0
2009-07-28 17:53:12.693 1115851072 [INFO] (ClientSM.cc:95) Ack'ing write sync to 221.194.134.175: write-sync: seq = 166342 chunkId = 784 chunkversion = 1 write-id info: 221.194.134.173 39500 27921 221.194.134.175 39500 51878 221.194.134.174 39500 47862
2009-07-28 17:53:12.694 1115851072 [DEBUG] (NetConnection.cc:56) Read 0 bytes...connection dropped
2009-07-28 17:53:12.694 1115851072 [INFO] (ClientSM.cc:176) Closing connection from client 221.194.134.175
2009-07-28 17:53:12.722 1115851072 [DEBUG] (ClientSM.cc:301) Got command: write-id-alloc: seq = 166358 chunkId = 6 chunkversion = 11 servers = 221.194.134.173 39500 221.194.134.176 39500 221.194.134.175 39500
2009-07-28 17:53:12.723 1115851072 [INFO] (KfsOps.cc:1282) Fwd'ing write-id alloc to peer: write-id-alloc: seq = 13 chunkId = 6 chunkversion = 11 servers = 221.194.134.173 39500 221.194.134.176 39500 221.194.134.175 39500
2009-07-28 17:53:12.723 1115851072 [INFO] (RemoteSyncSM.cc:123) Lost the connection to peer 221.194.134.176 39500; failing ops
2009-07-28 17:53:12.723 1115851072 [INFO] (KfsOps.cc:1343) Sending write-id alloc back (status = -113): write-id-alloc: seq = 13 chunkId = 6 chunkversion = 11 servers = 221.194.134.173 39500 221.194.134.176 39500 221.194.134.175 39500
2009-07-28 17:53:12.723 1115851072 [INFO] (KfsOps.cc:1306) write-id alloc failed: write-id-alloc: seq = 166358 chunkId = 6 chunkversion = 11 servers = 221.194.134.173 39500 221.194.134.176 39500 221.194.134.175 39500, code = -113
2009-07-28 17:53:12.723 1115851072 [DEBUG] (ClientSM.cc:87) Client 221.194.134.175, Command write-id-alloc: seq = 166358 chunkId = 6 chunkversion = 11 servers = 221.194.134.173 39500 221.194.134.176 39500 221.194.134.175 39500: Response status: -113

I want to know the reason for the core dump, and also the line number in the source code where the crash occurs.

Can someone kindly tell me?

Thanks all

  -- kuer