From: Piotr R. K. <pio...@mo...> - 2017-04-07 12:41:48
Hey Zhongbo,

We've followed your suggestion and added this change to the MooseFS sources. Thanks!

Please find the URL of the corresponding commit below:
https://github.com/moosefs/moosefs/commit/709eba4e72c997888f7e319740dcf45cbde5acbd

Best regards,
Peter

--
Piotr Robert Konopelko
MooseFS Technical Support Engineer | moosefs.com <https://moosefs.com/>

> On 30 Mar 2017, at 5:02 AM, 田忠博(Zhongbo Tian) <win...@gm...> wrote:
>
> Hi Aleksander,
>
> Thanks for the quick reply. 3.0.90 DOES fix the issue! Thank you.
>
> On our production cluster, we also found a lot of unconsumed messages in the client's TCP receive queue. This led to periodic high load. After some investigation, we suspect that the client's `conncache` is too slow to digest KEEPALIVE messages. So we modified the source code to halve the sleep time in the keepalive thread, and it seems to be working for us. Here is our patch:
>
> """
> diff --git a/mfscommon/conncache.c b/mfscommon/conncache.c
> index 4d33c19..b7a99bf 100644
> --- a/mfscommon/conncache.c
> +++ b/mfscommon/conncache.c
> @@ -161,7 +161,7 @@ void* conncache_keepalive_thread(void* arg) {
>          }
>          ka = keep_alive;
>          zassert(pthread_mutex_unlock(&glock));
> -        portable_usleep(10000);
> +        portable_usleep(5000);
>      }
>      return arg;
>  }
> """
>
> Finally, I am curious about the progress of MooseFS 4.0. We have been looking forward to the erasure-coding implementation for quite a long time. We would also like to know the MooseFS team's opinion on the Container Storage Interface (CSI); you can find more details on it here: https://github.com/docker/docker/issues/31923
>
> And lastly, thank you for this excellent project.
>
> On Wed, Mar 29, 2017 at 6:04 PM Aleksander Wieliczko <ale...@mo...> wrote:
> Hi.
> Have you tried the latest stable MooseFS version, 3.0.90?
>
> The MooseFS 3.0.86 client has a few bugs, but they have been fixed.
>
> Best regards
> Aleksander Wieliczko
> Technical Support Engineer
> MooseFS.com <http://moosefs.com/>
>
> On 29.03.2017 11:46, 田忠博(Zhongbo Tian) wrote:
>> Hi all,
>>
>> We encountered a weird issue after upgrading to MooseFS 3.0.86. When we run ' TMPDIR=/some/moosefs/path python -c "import ctypes" ', we end up with a SIGBUS.
>> After some investigation, we found that it seems to be related to mmap, and we can reproduce the bug with the following C code:
>> """
>>
>> #include <stdio.h>
>> #include <fcntl.h>
>> #include <unistd.h>
>> #include <sys/mman.h>
>>
>> int main(int argc, char** argv) {
>>     int fd;
>>     char* filename;
>>     char *c2;
>>     if (argc != 2) {
>>         fprintf(stderr, "usage: %s <file>\n", argv[0]);
>>         return 1;
>>     }
>>     filename = argv[1];
>>     unlink(filename);
>>     fd = open(filename, O_RDWR|O_CREAT, 0600);
>>     ftruncate(fd, 4096);
>>     c2 = mmap(NULL, 4096, PROT_READ|PROT_WRITE, MAP_SHARED, fd, 0);
>>     *c2 = '\0'; // SIGBUS
>>     return 0;
>> }
>>
>> """
>> Here is the strace output when we run this on a MooseFS path:
>>
>> """
>>
>> $ strace ./test /mfs/user/tianzhongbo/temp/test
>> execve("./test", ["./test", "/mfs/user/tianzhongbo/temp/test"], [/* 52 vars */]) = 0
>> brk(0) = 0x949000
>> mmap(NULL, 4096, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x7f825d85a000
>> access("/etc/ld.so.preload", R_OK) = -1 ENOENT (No such file or directory)
>> open("/etc/ld.so.cache", O_RDONLY|O_CLOEXEC) = 3
>> fstat(3, {st_mode=S_IFREG|0644, st_size=114873, ...}) = 0
>> mmap(NULL, 114873, PROT_READ, MAP_PRIVATE, 3, 0) = 0x7f825d83d000
>> close(3) = 0
>> open("/lib64/libc.so.6", O_RDONLY|O_CLOEXEC) = 3
>> read(3, "\177ELF\2\1\1\3\0\0\0\0\0\0\0\0\3\0>\0\1\0\0\0p\t\2\0\0\0\0\0"..., 832) = 832
>> fstat(3, {st_mode=S_IFREG|0755, st_size=1697568, ...}) = 0
>> mmap(NULL, 3804928, PROT_READ|PROT_EXEC, MAP_PRIVATE|MAP_DENYWRITE, 3, 0) = 0x7f825d299000
>> mprotect(0x7f825d430000, 2097152, PROT_NONE) = 0
>> mmap(0x7f825d630000, 24576, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_FIXED|MAP_DENYWRITE, 3, 0x197000) = 0x7f825d630000
>> mmap(0x7f825d636000, 16128, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_FIXED|MAP_ANONYMOUS, -1, 0) = 0x7f825d636000
>> close(3) = 0
>> mmap(NULL, 4096, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x7f825d83c000
>> mmap(NULL, 4096, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x7f825d83b000
>> mmap(NULL, 4096, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x7f825d83a000
>> arch_prctl(ARCH_SET_FS, 0x7f825d83b700) = 0
>> mprotect(0x7f825d630000, 16384, PROT_READ) = 0
>> mprotect(0x600000, 4096, PROT_READ) = 0
>> mprotect(0x7f825d85b000, 4096, PROT_READ) = 0
>> munmap(0x7f825d83d000, 114873) = 0
>> unlink("/mfs/user/tianzhongbo/temp/test") = 0
>> open("/mfs/user/tianzhongbo/temp/test", O_RDWR|O_CREAT, 0600) = 3
>> ftruncate(3, 4096) = 0
>> mmap(NULL, 4096, PROT_READ|PROT_WRITE, MAP_SHARED, 3, 0) = 0x7f825d859000
>> --- SIGBUS {si_signo=SIGBUS, si_code=BUS_ADRERR, si_addr=0x7f825d859000} ---
>> +++ killed by SIGBUS +++
>> Bus error
>>
>> """
>>
>> Can anyone help to resolve this?
>>
>>
>> ------------------------------------------------------------------------------
>> Check out the vibrant tech community on one of the world's most
>> engaging tech sites, Slashdot.org! http://sdm.link/slashdot
>>
>> _________________________________________
>> moosefs-users mailing list
>> moo...@li...
>> https://lists.sourceforge.net/lists/listinfo/moosefs-users