From: Gustaf N. <ne...@wu...> - 2020-12-15 16:10:05
On 15.12.20 13:18, David Osborne wrote:
> So I have *removed* DSYSTEM_MALLOC from the default build flags and
> created a new build... so far I've not been able to get that to crash
> in test (whereas it was crashing every 3-4 requests when built with
> DSYSTEM_MALLOC)

Ah, if it was crashing reliably, then it is easy to debug. Reference [3]
below points exactly to a fix for a case where a mix-up of memory
allocators could lead to a crash. I would recommend trying the head
version; this should not crash at all, with or without SYSTEM_MALLOC set.

> I don't fully understand the implications of this - Is it a suitable
> solution which we could use in production?

In our production environment we always use a configuration where all
mallocs are based on the system malloc, and we use TCmalloc as the system
malloc. See [4] for a comparison of malloc implementations with
NaviServer + Tcl. These are the memory and performance implications...
which are irrelevant for small sites, but make a difference on large and
busy sites.

From the deployment side, when using Tcl with SYSTEM_MALLOC, you can't
use the stock (Debian) version of Tcl. We compile and install Tcl with
--prefix=/usr/local/ns/ such that the Tcl version is in the /usr/local/ns
tree. When producing new binaries of NaviServer, we produce new binaries
of Tcl as well.

Everything clear?

-g

[4] https://next-scripting.org/2.3.0/doc/misc/thread-mallocs/index1

> On Mon, 14 Dec 2020 at 20:36, Gustaf Neumann <ne...@wu...> wrote:
>
> Dear David,
>
> the crash looks like a problem in the OpenSSL memory management.
>
> In general, I would not believe that this is a problem in the
> NaviServer code, but rather one of the interplay of the various memory
> management options of OpenSSL, NaviServer and Tcl. We use these
> functions under heavy load on many servers, but we are careful to
> use the same malloc implementation everywhere (actually Google's
> TCmalloc).
>
> OpenSSL:
> ======
>
> In general, OpenSSL supports configuration of memory management
> routines. However, the memory management interface of OpenSSL changed
> with the release of OpenSSL 1.1.0. As a consequence, when compiling
> NaviServer with newer versions of OpenSSL, the native OpenSSL memory
> routines are used. The commit [1] says: "Registering our own functions
> does not seem necessary". So, if one compiles a version of NaviServer
> between 4.99.15 and 4.99.20 with newer versions of OpenSSL, a problem
> might arise when the native OpenSSL malloc implementation is not fully
> thread-safe, or when different malloc implementations are mixed.
>
> NaviServer:
> =======
>
> When NaviServer is compiled with -DSYSTEM_MALLOC, ns_malloc() uses
> malloc() etc., otherwise it uses Tcl's ckalloc() and friends.
>
> Tcl:
> ===
>
> There also exists a patch [2] for using system malloc internally in
> Tcl instead of Tcl's own multi-threaded version.
>
> In October there was also a small patch for NaviServer for cases
> where Tcl and NaviServer are compiled with different memory
> allocators [3].
>
> My first attempt would be to compile NaviServer with SYSTEM_MALLOC
> and check whether you still experience a problem. The next
> recommendation would be to check which malloc versions are used by
> which subsystems and align these if necessary.
>
> I will look into reviving the configuration of OpenSSL so that its
> malloc implementation can be configured, as was possible before
> OpenSSL 1.1.0.
>
> -gn
>
> [1] https://bitbucket.org/naviserver/naviserver/commits/896a4e3765f91b048ccbf570e5afe21b1bb1a41f
> [2] https://github.com/gustafn/install-ns
> [3] https://bitbucket.org/naviserver/naviserver/commits/caab40365f0429a44740db1927e9f459d733db3f
>
> On 14.12.20 18:07, David Osborne wrote:
>> Hi,
>>
>> We're building some Naviserver instances (4.99.19) on Debian
>> Buster (v10.7).
>> One of the instances is a revproxy instance which uses connchans
>> to speak to a back end.
>>
>> We're seeing very frequent signal 11 crashes of NaviServer with
>> this combination.
>> (We also see this infrequently with 4.99.18 running on Debian
>> Stretch (v9))
>>
>> Because of the increased frequency I've managed to take a core
>> dump and the issue appears to be when calling SSL_CTX_new
>> after Ns_TLS_CtxClientCreate.
>>
>> I realise I don't have gdb properly configured, but wondering if
>> the backtrace as it is could shed any light on what's going on or
>> is it still too opaque?
>>
>> Using host libthread_db library "/lib/x86_64-linux-gnu/libthread_db.so.1".
>> Core was generated by `/usr/lib/naviserver/bin/nsd -u nsd -g nsd -b 0.0.0.0:80,0.0.0.0:443 -i -t /etc/'.
>> Program terminated with signal SIGABRT, Aborted.
>> #0 __GI_raise (sig=sig@entry=6) at ../sysdeps/unix/sysv/linux/raise.c:50
>> 50 ../sysdeps/unix/sysv/linux/raise.c: No such file or directory.
>> [Current thread is 1 (Thread 0x7f4405ddf700 (LWP 13613))]
>> (gdb) bt
>> #0 __GI_raise (sig=sig@entry=6) at ../sysdeps/unix/sysv/linux/raise.c:50
>> #1 0x00007f4407936535 in __GI_abort () at abort.c:79
>> #2 0x00007f440847cfe6 in Panic (fmt=<optimized out>) at log.c:928
>> #3 0x00007f44080fbc4a in Tcl_PanicVA () from /lib/x86_64-linux-gnu/libtcl8.6.so
>> #4 0x00007f44080fbdb9 in Tcl_Panic () from /lib/x86_64-linux-gnu/libtcl8.6.so
>> #5 0x00007f44084bbc74 in Abort (signal=<optimized out>) at unix.c:1115
>> #6 <signal handler called>
>> #7 malloc_consolidate (av=av@entry=0x7f43bc000020) at malloc.c:4486
>> #8 0x00007f4407996a58 in _int_malloc (av=av@entry=0x7f43bc000020, bytes=bytes@entry=1024) at malloc.c:3695
>> #9 0x00007f440799856a in __GI___libc_malloc (bytes=1024) at malloc.c:3057
>> #10 0x00007f4407c63559 in CRYPTO_zalloc () from /lib/x86_64-linux-gnu/libcrypto.so.1.1
>> #11 0x00007f4407df7699 in SSL_CTX_new () from /lib/x86_64-linux-gnu/libssl.so.1.1
>> #12 0x00007f44084b4d85 in Ns_TLS_CtxClientCreate (interp=interp@entry=0x7f43bc009ee0, cert=cert@entry=0x0,
>>     caFile=caFile@entry=0x0, caPath=caPath@entry=0x0, verify=verify@entry=false,
>>     ctxPtr=ctxPtr@entry=0x7f4405dde7c0) at tls.c:116
>> #13 0x00007f44084687a4 in ConnChanOpenObjCmd (clientData=<optimized out>, interp=0x7f43bc009ee0,
>>     objc=<optimized out>, objv=<optimized out>) at connchan.c:1010
>> #14 0x00007f44084a7eb8 in Ns_SubcmdObjv (subcmdSpec=subcmdSpec@entry=0x7f4405dde990,
>>     clientData=0x7f43bc047870, interp=0x7f43bc009ee0, objc=13, objv=0x7f43bc017ff8) at tclobjv.c:1849
>> #15 0x00007f4408469d45 in NsTclConnChanObjCmd (clientData=<optimized out>, interp=<optimized out>,
>>     objc=<optimized out>, objv=<optimized out>) at connchan.c:1761
>> #16 0x00007f440802ffb7 in TclNRRunCallbacks () from /lib/x86_64-linux-gnu/libtcl8.6.so
>> #17 0x00007f44080313af in ?? () from /lib/x86_64-linux-gnu/libtcl8.6.so
>> #18 0x00007f4408030d13 in Tcl_EvalEx () from /lib/x86_64-linux-gnu/libtcl8.6.so
>> #19 0x00007f44084a9164 in NsTclFilterProc (arg=0x55af6a3e9880, conn=0x55af6a502480,
>>     why=NS_FILTER_PRE_AUTH) at tclrequest.c:535
>> #20 0x00007f4408478370 in NsRunFilters (conn=conn@entry=0x55af6a502480,
>>     why=why@entry=NS_FILTER_PRE_AUTH) at filter.c:160
>> #21 0x00007f440848654d in ConnRun (connPtr=connPtr@entry=0x55af6a502480) at queue.c:2450
>> #22 0x00007f4408485b33 in NsConnThread (arg=0x55af6a4a0090) at queue.c:2157
>> #23 0x00007f44081b2bb1 in NsThreadMain (arg=0x55af6a354f50) at thread.c:230
>> #24 0x00007f44081b3af9 in ThreadMain (arg=<optimized out>) at pthread.c:836
>> #25 0x00007f44078f5fa3 in start_thread (arg=<optimized out>) at pthread_create.c:486
>> #26 0x00007f4407a0d4cf in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:95
>>
>> --
>> Regards,
>> David
> _______________________________________________
> naviserver-devel mailing list
> nav...@li...
> https://lists.sourceforge.net/lists/listinfo/naviserver-devel
>
> --
>
> *David Osborne | Software Engineer*
> Qcode Software, Castle House, Fairways Business Park, Inverness, IV2 6AA
> *Email:* da...@qc... | *Phone:* 01463 896 484
> www.qcode.co.uk