[Kosmosfs-users] metaserver crash
From: Alexey T. <tim...@gm...> - 2008-02-04 09:17:41
Hello,
I discovered that the metaserver sometimes crashes silently during I/O
operations (I was using the Fs2Kfs tool). The symptoms are: the last message
in the log is "Starting layout for req:xxx", and the backtrace from the core
dump looks like this:
------
Program terminated with signal 11, Segmentation fault.
#0 0x0000000000457740 in boost::detail::atomic_increment (pw=0x49)
    at /usr/include/boost/detail/sp_counted_base_gcc_x86.hpp:66
66 );
(gdb) bt
#0 0x0000000000457740 in boost::detail::atomic_increment (pw=0x49)
    at /usr/include/boost/detail/sp_counted_base_gcc_x86.hpp:66
#1 0x00000000004577b5 in boost::detail::sp_counted_base::add_ref_copy (this=0x41)
    at /usr/include/boost/detail/sp_counted_base_gcc_x86.hpp:133
#2 0x00000000004578b6 in shared_count (this=0x6e4938, r=@0x706da8)
    at /usr/include/boost/detail/shared_count.hpp:170
#3 0x000000000045795b in shared_ptr (this=0x6e4930)
    at /usr/include/boost/shared_ptr.hpp:106
#4 0x0000000000457c28 in __gnu_cxx::new_allocator<boost::shared_ptr<KFS::ChunkServer> >::construct (this=0x7094e0, __p=0x6e4930, __val=@0x706da0)
    at /usr/lib/gcc/x86_64-redhat-linux/4.1.2/../../../../include/c++/4.1.2/ext/new_allocator.h:104
#5 0x000000000048f961 in std::vector<boost::shared_ptr<KFS::ChunkServer>, std::allocator<boost::shared_ptr<KFS::ChunkServer> > >::push_back (this=0x7094e0, __x=@0x706da0)
    at /usr/lib/gcc/x86_64-redhat-linux/4.1.2/../../../../include/c++/4.1.2/bits/stl_vector.h:606
#6 0x0000000000485153 in KFS::LayoutManager::AllocateChunk (this=0x6da3e0, r=0x709480)
    at /root/kosmosfs-0.1.2/kfs/src/cc/meta/LayoutManager.cc:359
#7 0x000000000046034b in handle_allocate (r=0x709480)
    at /root/kosmosfs-0.1.2/kfs/src/cc/meta/request.cc:413
#8 0x000000000045c035 in KFS::process_request ()
    at /root/kosmosfs-0.1.2/kfs/src/cc/meta/request.cc:693
#9 0x000000000046f8c7 in request_consumer (dummy=0x0)
    at /root/kosmosfs-0.1.2/kfs/src/cc/meta/startup.cc:82
#10 0x0000003e936062f7 in start_thread () from /lib64/libpthread.so.0
#11 0x0000003e922d0fbd in clone () from /lib64/libc.so.6
--------
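The cause is a typo in LayoutManager::AllocateChunk(MetaAllocate *r) in
src/cc/meta/LayoutManager.cc: the replica-selection loop is bounded by
mChunkServers.size() instead of candidates.size(). The following code

    for (i = 0; r->servers.size() < (uint32_t) r->numReplicas &&
                i < mChunkServers.size(); i++) {

must be replaced with

    for (i = 0; r->servers.size() < (uint32_t) r->numReplicas &&
                i < candidates.size(); i++) {

Presumably the loop body copies candidates[i] into r->servers, so when
candidates holds fewer servers than mChunkServers the loop reads past the
end of candidates. Copying that garbage shared_ptr is consistent with the
crash inside push_back() and add_ref_copy() in the backtrace.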
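To make the off-by-container bug concrete, here is a minimal, self-contained C++ sketch. It assumes the loop body is r->servers.push_back(candidates[i]), which the report does not show. The stand-in ChunkServer and MetaAllocate types and the functions SelectReplicas and BuggyBoundOverruns are hypothetical; only the member names (servers, numReplicas, candidates, mChunkServers) come from the loop above. The second function keeps the wrong bound but uses at(), so the overrun throws std::out_of_range instead of silently reading past the end of candidates.

```cpp
#include <cstddef>
#include <cstdint>
#include <memory>
#include <stdexcept>
#include <vector>

// Hypothetical stand-ins for the KFS types named in the report.
struct ChunkServer {
    int id;
};
using ChunkServerPtr = std::shared_ptr<ChunkServer>;

struct MetaAllocate {
    int numReplicas;
    std::vector<ChunkServerPtr> servers;
};

// Corrected loop (sketch): the bound is the size of the vector actually
// being indexed (candidates), not the size of the full server list.
void SelectReplicas(MetaAllocate* r,
                    const std::vector<ChunkServerPtr>& candidates) {
    for (std::size_t i = 0;
         r->servers.size() < static_cast<std::uint32_t>(r->numReplicas) &&
         i < candidates.size();
         i++) {
        r->servers.push_back(candidates[i]);
    }
}

// Buggy bound (mChunkServers.size()) kept for demonstration. candidates.at(i)
// throws std::out_of_range where candidates[i] would read past the end.
// Returns true if the loop tried to index beyond candidates.
bool BuggyBoundOverruns(MetaAllocate* r,
                        const std::vector<ChunkServerPtr>& mChunkServers,
                        const std::vector<ChunkServerPtr>& candidates) {
    try {
        for (std::size_t i = 0;
             r->servers.size() < static_cast<std::uint32_t>(r->numReplicas) &&
             i < mChunkServers.size();
             i++) {
            r->servers.push_back(candidates.at(i));
        }
    } catch (const std::out_of_range&) {
        return true;
    }
    return false;
}
```

With, say, five registered servers, two candidates and numReplicas = 3, the corrected loop stops after two replicas, while the buggy bound tries to read a third candidate that does not exist.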
Sriram is aware of this issue and is including the fix in the next release.
--
Best regards,
Alexey Timanovsky.