From: David O. <da...@qc...> - 2020-12-14 17:37:24
|
Hi,

We're building some NaviServer instances (4.99.19) on Debian Buster (v10.7). One of the instances is a revproxy instance which uses connchans to speak to a back end.

We're seeing very frequent signal 11 crashes of NaviServer with this combination. (We also see this infrequently with 4.99.18 running on Debian Stretch (v9).)

Because of the increased frequency I've managed to take a core dump, and the issue appears to be in the call to SSL_CTX_new from within Ns_TLS_CtxClientCreate.

I realise I don't have gdb properly configured, but I'm wondering whether the backtrace as it is could shed any light on what's going on, or is it still too opaque?

Using host libthread_db library "/lib/x86_64-linux-gnu/libthread_db.so.1".
Core was generated by `/usr/lib/naviserver/bin/nsd -u nsd -g nsd -b 0.0.0.0:80,0.0.0.0:443 -i -t /etc/'.
Program terminated with signal SIGABRT, Aborted.
#0  __GI_raise (sig=sig@entry=6) at ../sysdeps/unix/sysv/linux/raise.c:50
50  ../sysdeps/unix/sysv/linux/raise.c: No such file or directory.
[Current thread is 1 (Thread 0x7f4405ddf700 (LWP 13613))]
(gdb) bt
#0  __GI_raise (sig=sig@entry=6) at ../sysdeps/unix/sysv/linux/raise.c:50
#1  0x00007f4407936535 in __GI_abort () at abort.c:79
#2  0x00007f440847cfe6 in Panic (fmt=<optimized out>) at log.c:928
#3  0x00007f44080fbc4a in Tcl_PanicVA () from /lib/x86_64-linux-gnu/libtcl8.6.so
#4  0x00007f44080fbdb9 in Tcl_Panic () from /lib/x86_64-linux-gnu/libtcl8.6.so
#5  0x00007f44084bbc74 in Abort (signal=<optimized out>) at unix.c:1115
#6  <signal handler called>
#7  malloc_consolidate (av=av@entry=0x7f43bc000020) at malloc.c:4486
#8  0x00007f4407996a58 in _int_malloc (av=av@entry=0x7f43bc000020, bytes=bytes@entry=1024) at malloc.c:3695
#9  0x00007f440799856a in __GI___libc_malloc (bytes=1024) at malloc.c:3057
#10 0x00007f4407c63559 in CRYPTO_zalloc () from /lib/x86_64-linux-gnu/libcrypto.so.1.1
#11 0x00007f4407df7699 in SSL_CTX_new () from /lib/x86_64-linux-gnu/libssl.so.1.1
#12 0x00007f44084b4d85 in Ns_TLS_CtxClientCreate (interp=interp@entry=0x7f43bc009ee0, cert=cert@entry=0x0, caFile=caFile@entry=0x0, caPath=caPath@entry=0x0, verify=verify@entry=false, ctxPtr=ctxPtr@entry=0x7f4405dde7c0) at tls.c:116
#13 0x00007f44084687a4 in ConnChanOpenObjCmd (clientData=<optimized out>, interp=0x7f43bc009ee0, objc=<optimized out>, objv=<optimized out>) at connchan.c:1010
#14 0x00007f44084a7eb8 in Ns_SubcmdObjv (subcmdSpec=subcmdSpec@entry=0x7f4405dde990, clientData=0x7f43bc047870, interp=0x7f43bc009ee0, objc=13, objv=0x7f43bc017ff8) at tclobjv.c:1849
#15 0x00007f4408469d45 in NsTclConnChanObjCmd (clientData=<optimized out>, interp=<optimized out>, objc=<optimized out>, objv=<optimized out>) at connchan.c:1761
#16 0x00007f440802ffb7 in TclNRRunCallbacks () from /lib/x86_64-linux-gnu/libtcl8.6.so
#17 0x00007f44080313af in ?? () from /lib/x86_64-linux-gnu/libtcl8.6.so
#18 0x00007f4408030d13 in Tcl_EvalEx () from /lib/x86_64-linux-gnu/libtcl8.6.so
#19 0x00007f44084a9164 in NsTclFilterProc (arg=0x55af6a3e9880, conn=0x55af6a502480, why=NS_FILTER_PRE_AUTH) at tclrequest.c:535
#20 0x00007f4408478370 in NsRunFilters (conn=conn@entry=0x55af6a502480, why=why@entry=NS_FILTER_PRE_AUTH) at filter.c:160
#21 0x00007f440848654d in ConnRun (connPtr=connPtr@entry=0x55af6a502480) at queue.c:2450
#22 0x00007f4408485b33 in NsConnThread (arg=0x55af6a4a0090) at queue.c:2157
#23 0x00007f44081b2bb1 in NsThreadMain (arg=0x55af6a354f50) at thread.c:230
#24 0x00007f44081b3af9 in ThreadMain (arg=<optimized out>) at pthread.c:836
#25 0x00007f44078f5fa3 in start_thread (arg=<optimized out>) at pthread_create.c:486
#26 0x00007f4407a0d4cf in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:95

--
Regards,
David |
From: Gustaf N. <ne...@wu...> - 2020-12-14 20:35:36
|
Dear David,

the crash looks like a problem in the OpenSSL memory management.

I would not attribute this to a problem in the NaviServer code as such, but rather to the interplay of the various memory management options of OpenSSL, NaviServer and Tcl. We use these functions under heavy load on many servers, but we are careful to use the same malloc implementation everywhere (namely Google's TCMalloc).

OpenSSL:
======

In general, OpenSSL supports configuration of its memory management routines. However, the memory management interface of OpenSSL changed with the release of OpenSSL 1.1.0. As a consequence, when NaviServer is compiled against newer versions of OpenSSL, the native OpenSSL memory routines are used. The commit [1] says: "Registering our own functions does not seem necessary". So, if one compiles a version of NaviServer between 4.99.15 and 4.99.20 against a newer version of OpenSSL, a problem might arise when the native OpenSSL malloc implementation is not fully thread-safe, or when different malloc implementations get mixed.

NaviServer:
=======

When NaviServer is compiled with -DSYSTEM_MALLOC, ns_malloc() uses malloc() etc.; otherwise it uses Tcl's ckalloc() and friends.

Tcl:
===

There is also a patch [2] for making Tcl use the system malloc internally instead of Tcl's own multi-threaded allocator.

In October there was also a small patch for NaviServer for cases where Tcl and NaviServer are compiled with different memory allocators [3].

My first attempt would be to compile NaviServer with SYSTEM_MALLOC and check whether you still experience the problem. The next recommendation would be to check which malloc implementations are used by which subsystems, and to align these if necessary.

I will look into reviving the OpenSSL configuration, to allow its malloc implementation to be configured as was possible before OpenSSL 1.1.0.
-gn

[1] https://bitbucket.org/naviserver/naviserver/commits/896a4e3765f91b048ccbf570e5afe21b1bb1a41f
[2] https://github.com/gustafn/install-ns
[3] https://bitbucket.org/naviserver/naviserver/commits/caab40365f0429a44740db1927e9f459d733db3f
|
From: David O. <da...@qc...> - 2020-12-15 12:19:26
|
Thanks very much Gustaf,

Looking at the build output, I see that "-DSYSTEM_MALLOC" was already in place during the build of the binary which is now crashing.

So I have *removed* -DSYSTEM_MALLOC from the default build flags and created a new build... So far I've not been able to get that to crash in test (whereas it was crashing every 3-4 requests when built with -DSYSTEM_MALLOC).

I don't fully understand the implications of this - is it a suitable solution which we could use in production?

--
*David Osborne | Software Engineer*
Qcode Software, Castle House, Fairways Business Park, Inverness, IV2 6AA
*Email:* da...@qc... | *Phone:* 01463 896 484
www.qcode.co.uk |
From: Zoran V. <zv...@ar...> - 2020-12-15 13:13:42
|
On Tue, 15 Dec 2020 12:18:57 +0000 David Osborne <da...@qc...> wrote:

> So I have *removed* DSYSTEM_MALLOC from the default build flags and created
> a new build... so far I've not been able to get that to crash in test
> (where-as it was crashing every 3-4 requests when built with
> DSYSTEM_MALLOC)

If it does not crash with system malloc, the problem is just "masked away" and may (or may not) re-appear at some other place/time. The only right way is to locate the culprit. But this can be a daunting task...

If you want to debug it, the best way is to use the regular system malloc, turn on any/all possible debug features of it (or of the compiler) and have a debugger attached to the process all the time, until it breaks. At that point, a stack trace may point you in the (potentially) right direction (or may not, depending on luck or the lack thereof)...

Alternatively, if running on Linux, a valgrind session may also be helpful. If the problem is reproducible (which most such problems are not) then it is easier... |
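The debugging recipe above translates into commands along these lines (the nsd path and config-file location are illustrative; MALLOC_CHECK_ is a glibc feature):

```shell
# Allow core dumps from the server process
ulimit -c unlimited

# glibc malloc debugging: print a diagnostic and abort immediately when
# heap corruption is detected, instead of crashing at some later malloc
export MALLOC_CHECK_=3

# Run nsd in the foreground under gdb so the debugger is already attached
# when the corruption trips (paths are examples)
gdb --args /usr/lib/naviserver/bin/nsd -f -u nsd -g nsd -t /etc/naviserver/nsd.tcl

# Alternatively, a valgrind session: much slower, but reports the first
# invalid read/write with a full stack trace
valgrind --tool=memcheck --track-origins=yes --num-callers=30 \
    /usr/lib/naviserver/bin/nsd -f -u nsd -g nsd -t /etc/naviserver/nsd.tcl
```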
From: Gustaf N. <ne...@wu...> - 2020-12-15 16:10:05
|
On 15.12.20 13:18, David Osborne wrote:

> So I have *removed* DSYSTEM_MALLOC from the default build flags and
> created a new build... so far I've not been able to get that to crash
> in test (where-as it was crashing every 3-4 requests when built with
> DSYSTEM_MALLOC)

Ah, when it was crashing reliably, then it is easy to debug. The reference [3] in my previous mail points exactly to a fix for a case where a mixup of memory allocators could lead to a crash. I would recommend trying the head version; this should not crash at all, with or without SYSTEM_MALLOC set.

> I don't fully understand the implications of this - Is it a suitable
> solution which we could use in production?

In our production environment we always use a configuration where all mallocs are based on the system malloc, and we use TCMalloc as that system malloc. See [4] for a comparison of malloc implementations with NaviServer + Tcl. These are the memory and performance implications... which are irrelevant for small sites, but make a difference on large and busy sites.

From the deployment side, when using Tcl with SYSTEM_MALLOC, you can't use the stock (Debian) version of Tcl. We compile and install Tcl with --prefix=/usr/local/ns/ such that the Tcl version lives in the /usr/local/ns tree. When producing new binaries of NaviServer, we produce new binaries of Tcl as well.

Everything clear?
-g

[4] https://next-scripting.org/2.3.0/doc/misc/thread-mallocs/index1 |
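The allocator-alignment check described in the thread -- confirming which allocator each component actually links -- can be sketched with ldd, and a private Tcl build into the NaviServer tree looks roughly like this (paths, the Tcl version, and the tcmalloc linkage are illustrative assumptions):

```shell
# Which allocator libraries does each component pull in?
# (paths are examples for a /usr/local/ns installation)
ldd /usr/local/ns/bin/nsd               | egrep 'tcmalloc|jemalloc|libc\.'
ldd /usr/local/ns/lib/libtcl8.6.so      | egrep 'tcmalloc|jemalloc|libc\.'
ldd /lib/x86_64-linux-gnu/libssl.so.1.1 | egrep 'tcmalloc|jemalloc|libc\.'

# Building a private Tcl into the NaviServer tree, so its allocator
# choice is under your control rather than the distro's:
cd tcl8.6.10/unix
./configure --prefix=/usr/local/ns --enable-threads
make && make install
```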