Why did the chunkserver core dump?

kuer
2009-07-29
2013-04-25
  • kuer
    2009-07-29

    The chunkserver dumped core three times in one day, each time with the same backtrace.

    (gdb) bt
    #0  0x00002b302affa672 in __gnu_cxx::__exchange_and_add () from /usr/lib64/libstdc++.so.6
    #1  0x00002b302afdfa4b in std::string::assign () from /usr/lib64/libstdc++.so.6
    #2  0x000000000043f404 in KFS::WriteIdAllocOp::Execute ()
    #3  0x0000000000432702 in KFS::ClientSM::HandleClientCmd ()
    #4  0x0000000000432c8e in KFS::ClientSM::HandleRequest ()
    #5  0x000000000045f759 in KFS::NetConnection::HandleReadEvent ()
    #6  0x000000000046874a in KFS::NetManager::MainLoop ()
    #7  0x0000000000430ea5 in netWorker ()
    #8  0x00000038ae406367 in start_thread () from /lib64/libpthread.so.0
    #9  0x00000038ad8d2f7d in clone () from /lib64/libc.so.6
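The frames in this backtrace carry no file or line information, which usually means the binary has no debug info. A minimal sketch of one way to look up source lines, assuming a chunkserver binary with debug info (built with -g) whose addresses match the one that crashed: pick the frames that live in the executable itself and pass their addresses to GNU addr2line. The path ./chunkserver and the helper names below are hypothetical, chosen only for illustration.

```python
import re

# First four frames of the backtrace, copied verbatim.
BACKTRACE = """\
#0  0x00002b302affa672 in __gnu_cxx::__exchange_and_add () from /usr/lib64/libstdc++.so.6
#1  0x00002b302afdfa4b in std::string::assign () from /usr/lib64/libstdc++.so.6
#2  0x000000000043f404 in KFS::WriteIdAllocOp::Execute ()
#3  0x0000000000432702 in KFS::ClientSM::HandleClientCmd ()
"""

# Frame number, address, function name, and an optional "from <library>" suffix.
FRAME_RE = re.compile(
    r"^\s*#(\d+)\s+(0x[0-9a-fA-F]+) in (.+?) \(\)(?: from (\S+))?\s*$"
)


def executable_frames(backtrace):
    """Return (frame_no, address, function) for frames not inside a shared library."""
    frames = []
    for line in backtrace.splitlines():
        match = FRAME_RE.match(line)
        if match and match.group(4) is None:
            frames.append((int(match.group(1)), match.group(2), match.group(3)))
    return frames


def addr2line_command(binary, frames):
    """Build an addr2line command line (-C demangle, -f function names, -e executable)."""
    addresses = " ".join(address for _, address, _ in frames)
    return "addr2line -C -f -e {} {}".format(binary, addresses)


frames = executable_frames(BACKTRACE)
# "./chunkserver" is a hypothetical path to a binary that has debug info.
print(addr2line_command("./chunkserver", frames))
```

The addresses in frames #2 and above are return addresses, so the line reported may be the statement just after the call rather than the call itself.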

    The following are the last log lines before each core dump:
    ** the 1st coredump:
    $ grep ' 00:43:' /home/s/logs/kfs.k1/kfs-k1-chunk.run | tail
    2009-07-28 00:43:10.465 1127856448 [INFO] (KfsOps.cc:1527) Executing write sync: write-sync: seq = 30628 chunkId = 6 chunkversion = 3 write-id info: 221.194.134.173 39500 5910 221.194.134.174 39500 5450 221.194.134.176 39500 5540
    2009-07-28 00:43:10.465 1127856448 [DEBUG] (KfsOps.cc:1639) Fwd'ing write-sync to peer: write-sync: seq = 6 chunkId = 6 chunkversion = 3 write-id info: 221.194.134.173 39500 5910 221.194.134.174 39500 5450 221.194.134.176 39500 5540
    2009-07-28 00:43:10.465 1127856448 [DEBUG] (ClientSM.cc:87) Client 221.194.134.175, Command write-sync: seq = 30628 chunkId = 6 chunkversion = 3 write-id info: 221.194.134.173 39500 5910 221.194.134.174 39500 5450 221.194.134.176 39500 5540: Response status: 0
    2009-07-28 00:43:10.465 1127856448 [INFO] (ClientSM.cc:95) Ack'ing write sync to 221.194.134.175: write-sync: seq = 30628 chunkId = 6 chunkversion = 3 write-id info: 221.194.134.173 39500 5910 221.194.134.174 39500 5450 221.194.134.176 39500 5540
    2009-07-28 00:43:10.758 1127856448 [DEBUG] (ClientSM.cc:301) Got command: write-id-alloc: seq = 127 chunkId = 5 chunkversion = 1 servers = 221.194.134.173 39500 221.194.134.174 39500 221.194.134.175 39500
    2009-07-28 00:43:10.758 1127856448 [INFO] (KfsOps.cc:1282) Fwd'ing write-id alloc to peer: write-id-alloc: seq = 4 chunkId = 5 chunkversion = 1 servers = 221.194.134.173 39500 221.194.134.174 39500 221.194.134.175 39500
    2009-07-28 00:43:10.758 1127856448 [INFO] (RemoteSyncSM.cc:123) Lost the connection to peer 221.194.134.174 39500; failing ops
    2009-07-28 00:43:10.758 1127856448 [INFO] (KfsOps.cc:1343) Sending write-id alloc back (status = -113): write-id-alloc: seq = 4 chunkId = 5 chunkversion = 1 servers = 221.194.134.173 39500 221.194.134.174 39500 221.194.134.175 39500
    2009-07-28 00:43:10.758 1127856448 [INFO] (KfsOps.cc:1306) write-id alloc failed: write-id-alloc: seq = 127 chunkId = 5 chunkversion = 1 servers = 221.194.134.173 39500 221.194.134.174 39500 221.194.134.175 39500, code = -113
    2009-07-28 00:43:10.758 1127856448 [DEBUG] (ClientSM.cc:87) Client 221.194.134.174, Command write-id-alloc: seq = 127 chunkId = 5 chunkversion = 1 servers = 221.194.134.173 39500 221.194.134.174 39500 221.194.134.175 39500: Response status: -113

    ** the 2nd coredump:
    $ grep ' 09:00:' /home/s/logs/kfs.k1/kfs-k1-chunk.run | tail
    2009-07-28 09:00:26.002 1125935424 [DEBUG] (ClientSM.cc:87) Client 221.194.134.174, Command write-sync: seq = 54 chunkId = 190 chunkversion = 1 write-id info: 221.194.134.175 39500 20882 221.194.134.174 39500 19786 221.194.134.173 39500 12082: Response status: 0
    2009-07-28 09:00:26.002 1125935424 [INFO] (ClientSM.cc:95) Ack'ing write sync to 221.194.134.174: write-sync: seq = 54 chunkId = 190 chunkversion = 1 write-id info: 221.194.134.175 39500 20882 221.194.134.174 39500 19786 221.194.134.173 39500 12082
    2009-07-28 09:00:26.002 1125935424 [DEBUG] (ClientSM.cc:87) Client 221.194.134.175, Command write-sync: seq = 1341 chunkId = 179 chunkversion = 1 write-id info: 221.194.134.174 39500 19787 221.194.134.175 39500 20883 221.194.134.173 39500 12083: Response status: 0
    2009-07-28 09:00:26.003 1125935424 [INFO] (ClientSM.cc:95) Ack'ing write sync to 221.194.134.175: write-sync: seq = 1341 chunkId = 179 chunkversion = 1 write-id info: 221.194.134.174 39500 19787 221.194.134.175 39500 20883 221.194.134.173 39500 12083
    2009-07-28 09:00:26.036 1125935424 [DEBUG] (ClientSM.cc:301) Got command: write-id-alloc: seq = 25460 chunkId = 5 chunkversion = 3 servers = 221.194.134.173 39500 221.194.134.176 39500 221.194.134.175 39500
    2009-07-28 09:00:26.036 1125935424 [INFO] (KfsOps.cc:1282) Fwd'ing write-id alloc to peer: write-id-alloc: seq = 4 chunkId = 5 chunkversion = 3 servers = 221.194.134.173 39500 221.194.134.176 39500 221.194.134.175 39500
    2009-07-28 09:00:26.036 1125935424 [INFO] (RemoteSyncSM.cc:123) Lost the connection to peer 221.194.134.176 39500; failing ops
    2009-07-28 09:00:26.036 1125935424 [INFO] (KfsOps.cc:1343) Sending write-id alloc back (status = -113): write-id-alloc: seq = 4 chunkId = 5 chunkversion = 3 servers = 221.194.134.173 39500 221.194.134.176 39500 221.194.134.175 39500
    2009-07-28 09:00:26.036 1125935424 [INFO] (KfsOps.cc:1306) write-id alloc failed: write-id-alloc: seq = 25460 chunkId = 5 chunkversion = 3 servers = 221.194.134.173 39500 221.194.134.176 39500 221.194.134.175 39500, code = -113
    2009-07-28 09:00:26.036 1125935424 [DEBUG] (ClientSM.cc:87) Client 221.194.134.174, Command write-id-alloc: seq = 25460 chunkId = 5 chunkversion = 3 servers = 221.194.134.173 39500 221.194.134.176 39500 221.194.134.175 39500: Response status: -113

    ** the 3rd coredump:
    $ grep ' 17:53' /home/s/logs/kfs.k1/kfs-k1-chunk.run | tail
    2009-07-28 17:53:12.693 1115851072 [DEBUG] (ClientSM.cc:87) Client 221.194.134.175, Command write-sync: seq = 166342 chunkId = 784 chunkversion = 1 write-id info: 221.194.134.173 39500 27921 221.194.134.175 39500 51878 221.194.134.174 39500 47862: Response status: 0
    2009-07-28 17:53:12.693 1115851072 [INFO] (ClientSM.cc:95) Ack'ing write sync to 221.194.134.175: write-sync: seq = 166342 chunkId = 784 chunkversion = 1 write-id info: 221.194.134.173 39500 27921 221.194.134.175 39500 51878 221.194.134.174 39500 47862
    2009-07-28 17:53:12.694 1115851072 [DEBUG] (NetConnection.cc:56) Read 0 bytes...connection dropped
    2009-07-28 17:53:12.694 1115851072 [INFO] (ClientSM.cc:176) Closing connection from client 221.194.134.175
    2009-07-28 17:53:12.722 1115851072 [DEBUG] (ClientSM.cc:301) Got command: write-id-alloc: seq = 166358 chunkId = 6 chunkversion = 11 servers = 221.194.134.173 39500 221.194.134.176 39500 221.194.134.175 39500
    2009-07-28 17:53:12.723 1115851072 [INFO] (KfsOps.cc:1282) Fwd'ing write-id alloc to peer: write-id-alloc: seq = 13 chunkId = 6 chunkversion = 11 servers = 221.194.134.173 39500 221.194.134.176 39500 221.194.134.175 39500
    2009-07-28 17:53:12.723 1115851072 [INFO] (RemoteSyncSM.cc:123) Lost the connection to peer 221.194.134.176 39500; failing ops
    2009-07-28 17:53:12.723 1115851072 [INFO] (KfsOps.cc:1343) Sending write-id alloc back (status = -113): write-id-alloc: seq = 13 chunkId = 6 chunkversion = 11 servers = 221.194.134.173 39500 221.194.134.176 39500 221.194.134.175 39500
    2009-07-28 17:53:12.723 1115851072 [INFO] (KfsOps.cc:1306) write-id alloc failed: write-id-alloc: seq = 166358 chunkId = 6 chunkversion = 11 servers = 221.194.134.173 39500 221.194.134.176 39500 221.194.134.175 39500, code = -113
    2009-07-28 17:53:12.723 1115851072 [DEBUG] (ClientSM.cc:87) Client 221.194.134.175, Command write-id-alloc: seq = 166358 chunkId = 6 chunkversion = 11 servers = 221.194.134.173 39500 221.194.134.176 39500 221.194.134.175 39500: Response status: -113
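All three tails above end with the same sequence: a write-id-alloc is forwarded to a peer, the connection to that peer is lost, and the op comes back with status -113. The sketch below pulls the lost peer and the status out of two verbatim lines per core dump. It assumes the status is a negated Linux errno value (113 is EHOSTUNREACH on Linux); that reading is an assumption, and the helper names are made up for illustration.

```python
import re

# Linux errno numbers, hard-coded so the result does not depend on the
# platform running this script.
LINUX_ERRNO = {110: "ETIMEDOUT", 111: "ECONNREFUSED", 113: "EHOSTUNREACH"}

# Two verbatim log lines from each tail (RemoteSyncSM.cc:123 and KfsOps.cc:1343).
TAILS = {
    "1st": [
        "2009-07-28 00:43:10.758 1127856448 [INFO] (RemoteSyncSM.cc:123) Lost the connection to peer 221.194.134.174 39500; failing ops",
        "2009-07-28 00:43:10.758 1127856448 [INFO] (KfsOps.cc:1343) Sending write-id alloc back (status = -113): write-id-alloc: seq = 4 chunkId = 5 chunkversion = 1 servers = 221.194.134.173 39500 221.194.134.174 39500 221.194.134.175 39500",
    ],
    "2nd": [
        "2009-07-28 09:00:26.036 1125935424 [INFO] (RemoteSyncSM.cc:123) Lost the connection to peer 221.194.134.176 39500; failing ops",
        "2009-07-28 09:00:26.036 1125935424 [INFO] (KfsOps.cc:1343) Sending write-id alloc back (status = -113): write-id-alloc: seq = 4 chunkId = 5 chunkversion = 3 servers = 221.194.134.173 39500 221.194.134.176 39500 221.194.134.175 39500",
    ],
    "3rd": [
        "2009-07-28 17:53:12.723 1115851072 [INFO] (RemoteSyncSM.cc:123) Lost the connection to peer 221.194.134.176 39500; failing ops",
        "2009-07-28 17:53:12.723 1115851072 [INFO] (KfsOps.cc:1343) Sending write-id alloc back (status = -113): write-id-alloc: seq = 13 chunkId = 6 chunkversion = 11 servers = 221.194.134.173 39500 221.194.134.176 39500 221.194.134.175 39500",
    ],
}

PEER_RE = re.compile(r"Lost the connection to peer (\S+) (\d+)")
STATUS_RE = re.compile(r"\(status = (-?\d+)\)")


def summarize(lines):
    """Return (lost_peer, status, errno_name) found in one log tail."""
    peer = None
    status = None
    for line in lines:
        peer_match = PEER_RE.search(line)
        if peer_match:
            peer = "{}:{}".format(peer_match.group(1), peer_match.group(2))
        status_match = STATUS_RE.search(line)
        if status_match:
            status = int(status_match.group(1))
    name = LINUX_ERRNO.get(-status, "unknown") if status is not None else None
    return peer, status, name


summary = {dump: summarize(lines) for dump, lines in TAILS.items()}
for dump, fields in summary.items():
    print(dump, fields)
```

If that reading is right, each crash follows a failed forward of write-id-alloc to an unreachable peer, which narrows the search to the failure path of KFS::WriteIdAllocOp::Execute.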

    I would like to know why the chunkserver dumped core, and which line of the source code it crashed on.

    Can someone kindly tell me?

    Thanks all

      -- kuer