From: Gustaf N. <ne...@wu...> - 2023-09-05 11:29:07
|
On 05.09.23 11:39, Gustaf Neumann wrote:
> On 05.09.23 09:17, John at Decent wrote:
>>
>> However, won’t that cause the external process to get reloaded every
>> time I call proxy::exec ?
> see
>
> https://github.com/openacs/openacs-core/blob/oacs-5-10/packages/acs-tcl/tcl/proxy-procs.tcl#L74-L83
>
> in the link i have sent in my last mail

Maybe there is some confusion with the term "reloaded".
Check also the "-maxruns" parameter...

Or do you want to have - technically speaking - one fork per request? ....

-g
From: Gustaf N. <ne...@wu...> - 2023-09-05 09:40:20
|
On 05.09.23 09:17, John at Decent wrote:
> Thanks Gustaf.
>
> I knew about ns_proxy, but as it is documented as being a proxy to
> external Tcl processes, I didn’t think to use it as a general process
> pool.
>
> The part I was missing was
>
> proxy::exec
>
> However, won’t that cause the external process to get reloaded every
> time I call proxy::exec ?

see

https://github.com/openacs/openacs-core/blob/oacs-5-10/packages/acs-tcl/tcl/proxy-procs.tcl#L74-L83

in the link i have sent in my last mail

-g
From: John at D. <jo...@de...> - 2023-09-05 09:33:14
|
> I'm currently looking into bringing FastCGI support to Naviserver. In
> principle, FastCGI is specified to support a pool of backends. For your
> case, one would have to implement a FastCGI server, maybe directly in
> undroidwish, which does the marshaling to the respective Tcl scripts.
> Would this suit your requirements?

Yes, FastCGI is very similar to what I’m describing, and should work.

However, I’m thinking a bit more about ns_proxy and wondering if:

- a given Tcl interpreter launches a new process via exec, connects to its STDIN/STDOUT
- feeds the input to the process, returns its output
- but leaves the process running
- so that next time a request comes in, the proxy proc first checks to see if it has a running “exec” external process, and if so, just re-uses it.

If that works, then it’s very similar to FastCGI, and likely easier to implement, as it’s all Tcl, and not much of it. The code from https://github.com/openacs/openacs-core/blob/oacs-5-10/packages/acs-tcl/tcl/proxy-procs.tcl would just need to be modified a bit.
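The reuse idea in the list above can be sketched roughly as follows. This is Python rather than Tcl and does not use ns_proxy or any NaviServer API; it only illustrates "start once, keep running, reuse on the next request". The names `PersistentWorker` and `WORKER_CMD` are hypothetical, and the worker is a stand-in that answers one line per request line, which is an assumed protocol, not something undroidwish provides.

```python
import subprocess
import sys

# Hypothetical stand-in for a slow-loading external program: it reads one
# request per line on stdin and answers with one line on stdout.
WORKER_CMD = [
    sys.executable, "-u", "-c",
    "import sys\n"
    "for line in sys.stdin:\n"
    "    print(line.strip().upper(), flush=True)\n",
]


class PersistentWorker:
    """Starts the external process once and reuses it for later requests."""

    def __init__(self, cmd):
        self.cmd = cmd
        self.proc = None

    def _ensure_running(self):
        # Launch only if there is no process yet or the old one has exited.
        if self.proc is None or self.proc.poll() is not None:
            self.proc = subprocess.Popen(
                self.cmd,
                stdin=subprocess.PIPE,
                stdout=subprocess.PIPE,
                text=True,
                bufsize=1,  # line-buffered pipes on our side
            )
        return self.proc

    def request(self, payload):
        """Send one single-line request and return the single-line reply."""
        proc = self._ensure_running()
        proc.stdin.write(payload + "\n")
        proc.stdin.flush()
        return proc.stdout.readline().rstrip("\n")

    def close(self):
        # Closing stdin lets the stand-in worker see EOF and exit.
        if self.proc is not None and self.proc.poll() is None:
            self.proc.stdin.close()
            self.proc.wait()
            self.proc.stdout.close()
        self.proc = None
```

Two calls to `request()` on the same object talk to the same process, so the start-up cost is paid once. A real setup would still need timeouts, error handling for a crashed worker, and a framing rule for multi-line input and output.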
From: Georg L. <jor...@ma...> - 2023-09-05 09:07:27
|
Hello,

I'm currently looking into bringing FastCGI support to Naviserver. In principle, FastCGI is specified to support a pool of backends. For your case, one would have to implement a FastCGI server, maybe directly in undroidwish, which does the marshaling to the respective Tcl scripts. Would this suit your requirements?

Best Regards,

Georg

On 9/5/23 08:10, John at Decent wrote:
> I was talking to Christian Werner about how to have undroidwish work
> well with Naviserver, allowing Tcl scripts to run in undroidwish, and
> return their results to a calling Naviserver page. Why do this?
> Perhaps undroidwish loads libraries that are not (easily) available
> under naviserver?
>
> I think the best way is to have a naviserver module that implements a
> “worker pool of processes”. The module would launch X numbers of this
> external process (say, undroidwish), controlled via STDOUT/STDIN, and
> dispatch requests free members of the pool, returning the result to
> the caller. This would be an efficient way to have slow-loading Unix
> processes brought into Naviserver efficiently.
>
> For some reason, I imagined that this feature existed already, years
> ago, but I can’t find a mention of it. Certainly, there is a lot in
> Naviserver that resembles this idea already.
>
> -john
>
> _______________________________________________
> naviserver-devel mailing list
> nav...@li...
> https://lists.sourceforge.net/lists/listinfo/naviserver-devel

--
MagmaSoft e.U. (FN 601643w)
Inhaber: Ing. Georg Lehner
Kolpingstraße 15, A-4020 Linz
https://magma-soft.at/
mailto:jo...@ma...
From: John at D. <jo...@de...> - 2023-09-05 07:17:29
|
Thanks Gustaf.

I knew about ns_proxy, but as it is documented as being a proxy to external Tcl processes, I didn’t think to use it as a general process pool.

The part I was missing was

> proxy::exec

However, won’t that cause the external process to get reloaded every time I call proxy::exec ?

If I’m running a fairly heavy Unix process, this doesn’t seem efficient. Each run of the external Unix process would cause a new run of exec, and thus a new process loading:

> set return_string [ns_proxy eval $handle [list ::exec {*}$exec_flags {*}$call]]
From: Gustaf N. <ne...@wu...> - 2023-09-05 06:34:36
|
On 05.09.23 08:10, John at Decent wrote:
> I think the best way is to have a naviserver module that implements a
> “worker pool of processes”. The module would launch X numbers of this
> external process (say, undroidwish), controlled via STDOUT/STDIN, and
> dispatch requests free members of the pool, returning the result to
> the caller. This would be an efficient way to have slow-loading Unix
> processes brought into Naviserver efficiently.

The NaviServer module nsproxy does exactly this, i.e. running processes in the background talking to NaviServer via pipes, providing queues, cancellation, .....

The processes executed via the nsproxy module are the "nsproxy-helpers", which execute the incoming Tcl commands and return the results via the pipe, so that for a caller it looks like a local "exec". The connection to the workers is established via handles.

See [1] for an example of how nsproxy is used to implement "exec" in NaviServer, and [2] for the nsproxy API and configuration options.

-g

[1] https://github.com/openacs/openacs-core/blob/oacs-5-10/packages/acs-tcl/tcl/proxy-procs.tcl
[2] https://naviserver.sourceforge.io/n/nsproxy/files/ns_proxy.html
From: John at D. <jo...@de...> - 2023-09-05 06:10:55
|
I was talking to Christian Werner about how to have undroidwish work well with Naviserver, allowing Tcl scripts to run in undroidwish, and return their results to a calling Naviserver page. Why do this? Perhaps undroidwish loads libraries that are not (easily) available under naviserver?

I think the best way is to have a naviserver module that implements a “worker pool of processes”. The module would launch X instances of this external process (say, undroidwish), controlled via STDOUT/STDIN, and dispatch requests to free members of the pool, returning the result to the caller. This would be an efficient way to bring slow-loading Unix processes into Naviserver.

For some reason, I imagined that this feature existed already, years ago, but I can’t find a mention of it. Certainly, there is a lot in Naviserver that resembles this idea already.

-john
From: Brian F. <bri...@ai...> - 2023-08-21 10:36:11
|
Hi Gustaf

thanks for the explanation of the cause of the issue. I have forked the docker image from oupfiz5, and changed it to run 4.99.27, which was how I discovered the issue. That Dockerfile relies on the existence of the official naviserver-"${NS_VERSION}".tar.gz and naviserver-"${NS_MODULES_VERSION}"-modules.tar.gz files, so I can't just easily patch in this fix.

We have no deadline for a new release. For now, we can continue to use 4.99.24 in our testing, but obviously we would like to explain to clients why we're not running the latest version. If we can tell them e.g. there will soon be a new official release which includes the ns_set bugfix, that will keep them happy.

cheers
Brian
From: Gustaf N. <ne...@wu...> - 2023-08-21 09:06:25
|
Dear all,

i've added some improvements to the nsstats module on Bitbucket, such as

- Basic ability to style appearance:
  * switched from hard-coded styling to a page template based on ADP
  * added optional ADP template based on the CSS classes from https://www.w3schools.com/w3css/
  * When the optional ADP template is not available, fall back to the old styling.
- Structured main menu
  * use a hierarchy of commands instead of the flat alphabetical list
  * hierarchy is used in the menu-bar of the new template
- New page for registered request handlers. This should improve transparency, how requests are resolved.

Below are a few screenshots of a minimal installation (plain NaviServer)

All the best
-gustaf

a) start page of nsstats with the new template
   - showing menubar (makes old index page mostly obsolete)
   - shows "process" page per default
   - additional line per server for Request handlers (here 15 handlers are registered)

b) request handlers page (with the new template)
   - provide a filter for HTTP methods
   - provide button for unregistering the handler
   - showing registered ADP pages, Tcl pages, fastpath pages and CGI pages.

c) start page without the new template
From: Gustaf N. <ne...@wu...> - 2023-08-18 18:51:38
|
> FYI, also the "ns_set put" crashes the server e.g. if I run this in /ds/shell:

it is clear. The problem was not the "ns_set update" or the "ns_set put" command itself, but the fact that the key "content-type" was in lower-case and the normalization of the output headers was causing the crash. Some ages ago, it was possible to set two different header values with upper and lower case keys, leading to some unspecified behaviors in browsers.

On 18.08.23 15:55, Brian Fenton wrote:
> Hi again Gustaf
>
> Great news, this seems to have worked according to all my tests. I am
> assuming that I built this correctly with the "version_ns=GIT" parameter:
> version_ns=GIT ns_modules="nsconf nsstats" with_postgres=0
> with_postgres_driver=0 bash install-ns.sh build

by your parameter list, you will get the newest version from git.... which is the unreleased version of NaviServer 5. If one specifies the branch as well, different versions can be installed from git. E.g.,

    ... version_ns=GIT git_branch_ns=release/4.99 ....

installs the newest version from the release/4.99 branch, which is what you probably want. But one can use arbitrary tags as well. For example

    ... version_ns=GIT git_branch_ns=naviserver-4.99.16 ....

will check out the released version of NaviServer 4.99.16 via git.

> Do you have an idea when this will be included in an official
> Naviserver release? Our Docker build relies on the official releases,
> so it would be good to have a timeframe on that.

When you use the docker image from oupfiz5, these are still based on 4.99.24 (released on 2022-06-14, a year ago). Anyhow, when do you need a release?

all the best
-g

> thanks again for the amazing support,
> Brian

--
Univ.Prof. Dr. Gustaf Neumann
Head of the Institute of Information Systems and New Media
of Vienna University of Economics and Business
Program Director of MSc "Information Systems"
From: Brian F. <bri...@ai...> - 2023-08-18 14:11:12
|
Hi again Gustaf

Great news, this seems to have worked according to all my tests. I am assuming that I built this correctly with the "version_ns=GIT" parameter:

    version_ns=GIT ns_modules="nsconf nsstats" with_postgres=0 with_postgres_driver=0 bash install-ns.sh build

Do you have an idea when this will be included in an official Naviserver release? Our Docker build relies on the official releases, so it would be good to have a timeframe on that.

thanks again for the amazing support,
Brian
From: Brian F. <bri...@ai...> - 2023-08-18 13:00:52
|
Wow, that's a quick turnaround, thank you so much Gustaf! I will test out your fix.

FYI, also the "ns_set put" crashes the server e.g. if I run this in /ds/shell:

    ns_set put [ad_conn outputheaders] content-type "text/html"

many thanks
Brian
From: Gustaf N. <ne...@wu...> - 2023-08-18 11:30:53
|
Hi Brian,

With your input, I could locate the source of the problem and fix this in the repository. It was an ns_set bug that could only happen with the output headers, when the code normalized the capitalization of the header fields. There is now a test for this in the regression test suite of NaviServer.

Many thanks for your patience and input!

-g

https://bitbucket.org/naviserver/naviserver/commits/4f8cf8a548bc60f88756a30f33f6a5b589fc6997
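To make "normalizing the capitalization of the header fields" concrete, here is a small sketch of a case-insensitive header update. It is Python, not NaviServer's C code, and it is not the fix from the commit above; `normalize_key` and `update_header` are hypothetical names, and the capitalization rule (each dash-separated part capitalized) is an assumption chosen for illustration.

```python
def normalize_key(key):
    """Capitalize each dash-separated part: 'content-type' -> 'Content-Type'."""
    return "-".join(part.capitalize() for part in key.split("-"))


def update_header(headers, key, value):
    """Return a new header list with `key` set to `value`.

    Any existing entry whose key differs only in letter case is replaced,
    so the result never holds both 'content-type' and 'Content-Type'.
    """
    wanted = key.lower()
    kept = [(k, v) for (k, v) in headers if k.lower() != wanted]
    kept.append((normalize_key(key), value))
    return kept
```

With this rule, updating `content-type` on a list that already holds `Content-Type` leaves exactly one entry, which is the behavior browsers can rely on. Note that `str.capitalize()` lowercases the rest of each part, so a key such as `ETag` would become `Etag` in this sketch.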
From: Brian F. <bri...@ai...> - 2023-08-17 16:16:43
|
Hi Gustaf

thanks to your help, I was able to track down the source of the problem: it was a Tcl/ADP pair that generates some Javascript. This gets called in the blank master and so was breaking on every page request. Once I removed it from the blank master I was able to access the site.

After looking into it further, I discovered a few more places where the same problem occurs: a few more Javascript pages generated from Tcl/ADP pairs and also a file download Tcl page. What all of these have in common are calls like this - removing these lines prevents the crash in each case:

    ns_set update [ns_conn outputheaders] content-type "application/javascript"
    ns_set update [ns_conn outputheaders] content-type "$mime_type"

I created a simple test case which reproduces the problem on our OpenACS system, i.e. browsing to this page immediately crashes Naviserver. However, the same test case runs fine without crashing on the simple nsd-config.tcl system. It also runs fine on a fresh OpenACS Oracle installation - this again suggests an issue specific to our application. I'm not sure how much deeper I can go here - would be grateful for your thoughts.

This is my test case - if I create a file called test.adp and browse to it, Naviserver crashes.

    <html>
    <body>
    Welcome to NaviServer <%=[ns_info patchlevel]%> under <%=[set . "$::tcl_platform(os) $::tcl_platform(osVersion)"]%>
    <%=[ns_set update [ns_conn outputheaders] content-type "text/html"]%>
    </body>
    </html>

regards
Brian
From: Gustaf N. <ne...@wu...> - 2023-08-16 18:32:23
|
On 16.08.23 15:37, Brian Fenton wrote:
> We don't issue any "ns_set cleanup" ourselves, but when I added your
> recommendation of tracing "ns_set", I saw plenty in the logs. The docs
> say that "This command is automatically executed by ns_cleanup, which
> runs after every request, freeing all sets created via *ns_set*", so I
> presume that is where the calls come from.

Hi Brian,

Yes, it is clear from the log that the crash happens in the automatic cleanup, but I just wanted to make sure that the application has not tried the same earlier - this would be a good place to start debugging.

> Is there something in particular I should be looking for in the trace
> output? There is an enormous amount of information the logs.

see below

> I ran another backtrace and this time the error was a little bit
> different

as i said before, the messages one gets from the operating system are not always helpful to find the source of the problem.

> ("error: Cannot access memory at address 0x2" - see below).

This is actually not a message of the operating system, but a message from gdb about an unused C-structure, which is not relevant here.

... but we are getting closer. The crash happens on a dynamic ns_set (in yesterday's mail in "d6").

    #11 0x00007ffff7f3e119 in NsTclSetObjCmd (clientData=0x7fffdc035570, interp=0x7fffdc005250, objc=2, objv=0x7fffdc245650) at tclset.c:330
            key = 0x7fffdc3aa070 "d6"

In this case, one has to watch out for operations on "d6". You might ask "What is a dynamic ns_set?".

Background: ns_sets might be

- C-only structures, or
- Tcl-exposed structures, which might be
  * dynamic (i.e., freed after every request, the Tcl-name of these starts with a "d"), or
  * static (the structures will be reused).

The Tcl-exposed ns_sets are "entered" (NaviServer terminology) into Tcl interpreters.

It might be the case that you do not find anything interesting in the log concerning the crashing ns_set (here "d6") in the Tcl trace output, ... since the ns_sets are manipulated from C as well.

In case you find nothing revealing on "d6", I've added more debugging output for C-level ns_set operations. Get the newest version from the branch "release/4.99", activate debug output on Debug(nsset)

    ns_logctl severity Debug(nsset) on

and then you will see some more output, like e.g. the following:

    [16/Aug/2023:19:53:13.016627][-conn:oacs-head:default:0:11-] Debug(nsset): Ns_SetPut 0x600002845600 [29] key 'item_id' value '1066' size -1
    [16/Aug/2023:19:53:13.016630][-conn:oacs-head:default:0:11-] Debug(nsset): Ns_SetPut 0x600002845600 [30] key 'revision_id' value '163776' size -1
    [16/Aug/2023:19:53:13.016633][-conn:oacs-head:default:0:11-] Debug(nsset): Ns_SetPut 0x600002845600 [31] key 'publish_date' value '2021-11-13 13:35:27.423904+01' size -1
    [16/Aug/2023:19:53:13.016637][-conn:oacs-head:default:0:11-] Debug(nsset): Ns_SetPut 0x600002845600 [32] key 'modifying_user' value '' size -1
    [16/Aug/2023:19:53:13.016640][-conn:oacs-head:default:0:11-] Debug(nsset): Ns_SetPut 0x600002845600 [33] key 'last_modified' value '2017-08-08 13:26:49.138414+02' size -1
    [16/Aug/2023:19:53:13.016644][-conn:oacs-head:default:0:11-] Debug(nsset): Ns_SetPut 0x600002845600 [34] key 'modifying_ip' value '::1' size -1
    [16/Aug/2023:19:53:13.016647][-conn:oacs-head:default:0:11-] Debug(nsset): Ns_SetPut 0x600002845600 [35] key 'modifying_user' value '704' size -1
    [16/Aug/2023:19:53:13.016651][-conn:oacs-head:default:0:11-] Debug(nsset): Ns_SetPut 0x600002845600 [36] key 'parent_id' value '1064' size -1
    [16/Aug/2023:19:53:13.016655][-conn:oacs-head:default:0:11-] Debug(nsset): EnterSet 0x600002845600 with name 'd15'
    [16/Aug/2023:19:53:13.016678][-conn:oacs-head:default:0:11-] Notice: trace: ns_set array d15
    ...
    [16/Aug/2023:20:04:31.283342][-conn:oacs-head:default:0:11-] Debug(nsset): ns_set cleanup key <d15> dynamic 1
    [16/Aug/2023:20:04:31.283344][-conn:oacs-head:default:0:11-] Debug(nsset): Ns_SetFree 0x600002845600 'db': elements 37
    [16/Aug/2023:20:04:31.283346][-conn:oacs-head:default:0:11-] Debug(nsset): ... 0: key <object_type> value <::xowiki::Form>
    [16/Aug/2023:20:04:31.283349][-conn:oacs-head:default:0:11-] Debug(nsset): ... 1: key <package_id> value <1060>
    [16/Aug/2023:20:04:31.283351][-conn:oacs-head:default:0:11-] Debug(nsset): ... 2: key <creation_user> value <704>
    ...

In this example output, there is a bunch of Ns_SetPut() operations on a set "0x600002845600" which is then entered into Tcl as set "d15".... somewhere later it is freed via "ns_set cleanup".

Let me know if you need some help on getting/compiling a branch of NaviServer from Bitbucket.

all the best
-g
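When the log is large, the Debug(nsset) lines for one set can be pulled out with a small script. The following Python sketch assumes the line layout shown in the sample output above (`EnterSet <address> with name '<name>'` and `Ns_SetPut <address> [n] key '...' value '...'`); the function name `puts_for_set` and the regular expressions are illustrative and would need adjusting if the real log format differs. It also assumes an address maps to one set within the inspected log window, although the allocator may reuse addresses across requests.

```python
import re

# Patterns modeled on the sample Debug(nsset) lines in this thread.
ENTER_RE = re.compile(
    r"Debug\(nsset\): EnterSet (0x[0-9a-fA-F]+) with name '([^']+)'"
)
PUT_RE = re.compile(
    r"Debug\(nsset\): Ns_SetPut (0x[0-9a-fA-F]+) \[\d+\] key '([^']*)' value '([^']*)'"
)


def puts_for_set(log_lines, set_name):
    """Return (key, value) pairs put into the set entered as `set_name`."""
    lines = list(log_lines)
    # First pass: find the C-level address(es) entered under the Tcl name.
    addresses = set()
    for line in lines:
        m = ENTER_RE.search(line)
        if m and m.group(2) == set_name:
            addresses.add(m.group(1))
    # Second pass: collect every Ns_SetPut on those addresses, in log order.
    result = []
    for line in lines:
        m = PUT_RE.search(line)
        if m and m.group(1) in addresses:
            result.append((m.group(2), m.group(3)))
    return result


SAMPLE_LOG = [
    "[16/Aug/2023:19:53:13.016647][-conn:oacs-head:default:0:11-] Debug(nsset): "
    "Ns_SetPut 0x600002845600 [35] key 'modifying_user' value '704' size -1",
    "[16/Aug/2023:19:53:13.016651][-conn:oacs-head:default:0:11-] Debug(nsset): "
    "Ns_SetPut 0x600002845600 [36] key 'parent_id' value '1064' size -1",
    "[16/Aug/2023:19:53:13.016655][-conn:oacs-head:default:0:11-] Debug(nsset): "
    "EnterSet 0x600002845600 with name 'd15'",
]
```

Calling `puts_for_set(SAMPLE_LOG, "d15")` lists the keys and values written before the set was entered into Tcl as "d15"; the same call with "d6" on a real log would narrow the search to the crashing set.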
From: Brian F. <bri...@ai...> - 2023-08-16 14:11:08
|
Hi Gustaf

many thanks for that. Our version of OpenACS didn't have the tracing feature, but I was able to add it.

We don't issue any "ns_set cleanup" ourselves, but when I added your recommendation of tracing "ns_set", I saw plenty in the logs. The docs say that "This command is automatically executed by ns_cleanup, which runs after every request, freeing all sets created via ns_set", so I presume that is where the calls come from.

Is there something in particular I should be looking for in the trace output? There is an enormous amount of information in the logs.

I ran another backtrace and this time the error was a little bit different ("error: Cannot access memory at address 0x2" - see below).

I will continue to try and find out which ns_set is the source of the issue (but there may be multiple). Do you have any theory what change in 4.99.25 caused this to stop working?

thanks
Brian

[16/Aug/2023:14:05:13][13608.7ffff4b6f640][-conn:gustaf:default:0:1-] Notice: trace: ns_set iget t7 content-type
[16/Aug/2023:14:05:13][13608.7ffff4b6f640][-conn:gustaf:default:0:1-] Notice: trace: ns_set iget t7 content-type
[16/Aug/2023:14:05:13][13608.7ffff4b6f640][-conn:gustaf:default:0:1-] Notice: trace: ns_set ifind t7 cache-control
[16/Aug/2023:14:05:13][13608.7ffff4b6f640][-conn:gustaf:default:0:1-] Notice: trace: ns_set ifind t7 expires
[16/Aug/2023:14:05:13][13608.7ffff4b6f640][-conn:gustaf:default:0:1-] Notice: trace: ns_set size t7
[16/Aug/2023:14:05:13][13608.7ffff4b6f640][-conn:gustaf:default:0:1-] Notice: trace: ns_set key t7 0
[16/Aug/2023:14:05:13][13608.7ffff4b6f640][-conn:gustaf:default:0:1-] Notice: trace: ns_set size t7
[16/Aug/2023:14:05:13][13608.7ffff4b6f640][-conn:gustaf:default:0:1-] Notice: trace: ns_set update t7 Expires Wed, 16 Aug 2023 13:05:13 GMT
[16/Aug/2023:14:05:13][13608.7ffff4b6f640][-conn:gustaf:default:0:1-] Notice: trace: ns_set put t7 Pragma no-cache
[16/Aug/2023:14:05:13][13608.7ffff4b6f640][-conn:gustaf:default:0:1-] Notice: trace: ns_set put t7 Cache-Control no-cache
[16/Aug/2023:14:05:13][13608.7ffff4b6f640][-conn:gustaf:default:0:1-] Notice: trace: ns_set cleanup
free(): invalid next size (fast)

Thread 4 "nsd" received signal SIGABRT, Aborted.
[Switching to Thread 0x7ffff4b6f640 (LWP 13801)]
__pthread_kill_implementation (no_tid=0, signo=6, threadid=140737299019328) at ./nptl/pthread_kill.c:44
44 ./nptl/pthread_kill.c: No such file or directory.
(gdb) backtrace
#0 __pthread_kill_implementation (no_tid=0, signo=6, threadid=140737299019328) at ./nptl/pthread_kill.c:44
#1 __pthread_kill_internal (signo=6, threadid=140737299019328) at ./nptl/pthread_kill.c:78
#2 __GI___pthread_kill (threadid=140737299019328, signo=signo@entry=6) at ./nptl/pthread_kill.c:89
#3 0x00007ffff7c7d476 in __GI_raise (sig=sig@entry=6) at ../sysdeps/posix/raise.c:26
#4 0x00007ffff7c637f3 in __GI_abort () at ./stdlib/abort.c:79
#5 0x00007ffff7cc46f6 in __libc_message (action=action@entry=do_abort, fmt=fmt@entry=0x7ffff7e16b8c "%s\n") at ../sysdeps/posix/libc_fatal.c:155
#6 0x00007ffff7cdbd7c in malloc_printerr (str=str@entry=0x7ffff7e19740 "free(): invalid next size (fast)") at ./malloc/malloc.c:5664
#7 0x00007ffff7cddb1d in _int_free (av=0x7fffdc000030, p=0x7fffdd865320, have_lock=0) at ./malloc/malloc.c:4522
#8 0x00007ffff7ce04d3 in __GI___libc_free (mem=<optimized out>) at ./malloc/malloc.c:3391
#9 0x00007ffff7bdb1e5 in ns_free (ptr=0x7fffdd865330) at memory.c:94
#10 0x00007ffff7f09b64 in Ns_SetFree (set=0x7fffdde381a0) at set.c:397
#11 0x00007ffff7f3e119 in NsTclSetObjCmd (clientData=0x7fffdc035570, interp=0x7fffdc005250, objc=2, objv=0x7fffdc245650) at tclset.c:330
#12 0x00007ffff79cb18e in Dispatch (data=0x7fffdc34afc8, interp=0x7fffdc005250, result=0) at /usr/local/src/tcl8.6.13/generic/tclBasic.c:4467
#13 0x00007ffff79cb21f in TclNRRunCallbacks (interp=0x7fffdc005250, result=0, rootPtr=0x0) at /usr/local/src/tcl8.6.13/generic/tclBasic.c:4503
#14 0x00007ffff79ca949 in Tcl_EvalObjv (interp=0x7fffdc005250, objc=1, objv=0x7fffdc2453f0, flags=2097168) at /usr/local/src/tcl8.6.13/generic/tclBasic.c:4226
#15 0x00007ffff79cd384 in TclEvalEx (interp=0x7fffdc005250, script=0x7ffff4b6e880 "ns_cleanup", numBytes=10, flags=0, line=1, clNextOuter=0x0, outerScript=0x7ffff4b6e880 "ns_cleanup") at /usr/local/src/tcl8.6.13/generic/tclBasic.c:5372
#16 0x00007ffff79cc5d9 in Tcl_EvalEx (interp=0x7fffdc005250, script=0x7ffff4b6e880 "ns_cleanup", numBytes=10, flags=0) at /usr/local/src/tcl8.6.13/generic/tclBasic.c:5037
#17 0x00007ffff7f18c02 in Ns_TclEvalCallback (interp=0x7fffdc005250, cbPtr=0x5555556a49d0, resultDString=0x0) at tclcallbacks.c:186
#18 0x00007ffff7f29764 in NsTclTraceProc (interp=0x7fffdc005250, arg=0x5555556a49d0) at tclinit.c:1913
#19 0x00007ffff7f2a158 in RunTraces (itPtr=0x7fffdc035570, why=NS_TCL_TRACE_DEALLOCATE) at tclinit.c:2375
#20 0x00007ffff7f29976 in PushInterp (itPtr=0x7fffdc035570) at tclinit.c:2026
#21 0x00007ffff7f29717 in NsFreeConnInterp (connPtr=0x555555630950) at tclinit.c:1885
#22 0x00007ffff7efdf11 in ConnRun (connPtr=0x555555630950) at queue.c:2648
#23 0x00007ffff7efd0de in NsConnThread (arg=0x55555564bea0) at queue.c:2211
#24 0x00007ffff7bdd734 in NsThreadMain (arg=0x555558577960) at thread.c:232
#25 0x00007ffff7bdf6f5 in ThreadMain (arg=0x555558577960) at pthread.c:870
#26 0x00007ffff7ccfb43 in start_thread (arg=<optimized out>) at ./nptl/pthread_create.c:442
#27 0x00007ffff7d61a00 in clone3 () at ../sysdeps/unix/sysv/linux/x86_64/clone3.S:81
(gdb) bt full
#0 __pthread_kill_implementation (no_tid=0, signo=6, threadid=140737299019328) at ./nptl/pthread_kill.c:44
        tid = <optimized out>
        ret = 0
        pd = 0x7ffff4b6f640
        old_mask = {__val = {68118949824, 0, 140737349500896, 140737349117410, 140737299013472, 140736886756960, 140736887828416, 15, 140737299013552, 140737348535152, 140736886756976, 140736884396624, 140736886755600, 140736884396624, 140736884381136, 140736886756960}}
        ret = <optimized out>
        pd = <optimized out>
        old_mask = <optimized out>
        ret = <optimized 
out> tid = <optimized out> ret = <optimized out> resultvar = <optimized out> resultvar = <optimized out> __arg3 = <optimized out> __arg2 = <optimized out> __arg1 = <optimized out> _a3 = <optimized out> _a2 = <optimized out> _a1 = <optimized out> __futex = <optimized out> resultvar = <optimized out> __arg3 = <optimized out> __arg2 = <optimized out> __arg1 = <optimized out> _a3 = <optimized out> _a2 = <optimized out> _a1 = <optimized out> __futex = <optimized out> __private = <optimized out> __oldval = <optimized out> result = <optimized out> #1 __pthread_kill_internal (signo=6, threadid=140737299019328) at ./nptl/pthread_kill.c:78 No locals. #2 __GI___pthread_kill (threadid=140737299019328, signo=signo@entry=6) at ./nptl/pthread_kill.c:89 No locals. #3 0x00007ffff7c7d476 in __GI_raise (sig=sig@entry=6) at ../sysdeps/posix/raise.c:26 ret = <optimized out> #4 0x00007ffff7c637f3 in __GI_abort () at ./stdlib/abort.c:79 save_stage = 1 act = {__sigaction_handler = {sa_handler = 0x7fffdc245a30, sa_sigaction = 0x7fffdc245a30}, sa_mask = {__val = {140736886757216, 140737299014196, 140736884396624, 140736886756976, 140736886757568, 140736886757664, 140736886757840, 140736886757936, 140736886756704, 2203128881648, 140737299013943, 18446744069414584324, 140737349538724, 140733193388042, 140737299014196, 18446744069414584328}}, sa_flags = -600526912, sa_restorer = 0x7ffff4b6e239} --Type <RET> for more, q to quit, c to continue without paging--c sigs = {__val = {32, 93825009984944, 0, 0, 3, 140733193388035, 85899345920, 12884901888, 0, 0, 8589934591, 140736886757664, 140736886757664, 140737299014201, 140736886757840, 140736886757936}} #5 0x00007ffff7cc46f6 in __libc_message (action=action@entry=do_abort, fmt=fmt@entry=0x7ffff7e16b8c "%s\n") at ../sysdeps/posix/libc_fatal.c:155 ap = {{gp_offset = 24, fp_offset = 3, overflow_arg_area = 0x7ffff4b6e240, reg_save_area = 0x7ffff4b6e1d0}} fd = <optimized out> list = <optimized out> nlist = <optimized out> cp = <optimized out> #6 
0x00007ffff7cdbd7c in malloc_printerr (str=str@entry=0x7ffff7e19740 "free(): invalid next size (fast)") at ./malloc/malloc.c:5664 No locals. #7 0x00007ffff7cddb1d in _int_free (av=0x7fffdc000030, p=0x7fffdd865320, have_lock=0) at ./malloc/malloc.c:4522 fail = <optimized out> idx = <optimized out> old = <optimized out> old2 = <optimized out> size = 32 fb = <optimized out> nextchunk = <optimized out> nextsize = <optimized out> nextinuse = <optimized out> prevsize = <optimized out> bck = <optimized out> fwd = <optimized out> __PRETTY_FUNCTION__ = "_int_free" #8 0x00007ffff7ce04d3 in __GI___libc_free (mem=<optimized out>) at ./malloc/malloc.c:3391 ar_ptr = <optimized out> p = <optimized out> err = 25 #9 0x00007ffff7bdb1e5 in ns_free (ptr=0x7fffdd865330) at memory.c:94 No locals. #10 0x00007ffff7f09b64 in Ns_SetFree (set=0x7fffdde381a0) at set.c:397 i = 8 __PRETTY_FUNCTION__ = "Ns_SetFree" #11 0x00007ffff7f3e119 in NsTclSetObjCmd (clientData=0x7fffdc035570, interp=0x7fffdc005250, objc=2, objv=0x7fffdc245650) at tclset.c:330 key = 0x7fffdc3aa070 "d6" itPtr = 0x7fffdc035570 set = 0x7fffdde381a0 ds = {string = 0x2 <error: Cannot access memory at address 0x2>, length = -601598384, spaceAvl = 32767, staticSpace = "\300\344\266\364\377\177\000\000PV$\334\377\177\000\000\v\350b\336\002\000\000\000\200}\021\334\377\177\000\000H\345\266\364\377\177\000\000PR\000\334\377\177\000\000\000\345\266\364\377\177\000\000\025", '\000' <repeats 15 times>, "PR\000\334\377\177\000\000\220n\004\334\377\177\000\000\340\266\062\334\377\177\000\000@\207\236ZUU\000\000\000%9\210\226wv\035\300\345\266\364\377\177\000\000\230\256\234\367\377\177\000\000`\345\266\364\377\177\000\000p\203\252\367\000\000\000\000PR\000\334\377\177\000\000\021\334\377\177\000\000\000\017\000\334\377\177\000\000\000\000\000\000\020\000 \000\002\000\000\000\000\000\000\000\220n\004\334\377\177\000\000\000\000\000\000\000\000\000"} tablePtr = 0x7fffdc035740 hPtr = 0x7fffdc3aa050 search = {tablePtr = 0x7fffdc035740, 
nextIndex = 3, nextEntryPtr = 0x7fffda7c9790} opt = 1 result = 0 opts = {0x7ffff7f89745 "array", 0x7ffff7f8974b "cleanup", 0x7ffff7f89753 "copy", 0x7ffff7f89758 "cput", 0x7ffff7f8975d "create", 0x7ffff7f89764 "delete", 0x7ffff7f8976b "delkey", 0x7ffff7f89772 "find", 0x7ffff7f89777 "free", 0x7ffff7f8977c "get", 0x7ffff7f89780 "icput", 0x7ffff7f89786 "idelkey", 0x7ffff7f8978e "ifind", 0x7ffff7f89794 "iget", 0x7ffff7f89799 "imerge", 0x7ffff7f897a0 "isnull", 0x7ffff7f897a7 "iunique", 0x7ffff7f897af "iupdate", 0x7ffff7f897b7 "key", 0x7ffff7f897bb "keys", 0x7ffff7f897c0 "list", 0x7ffff7f897c5 "merge", 0x7ffff7f897cb "move", 0x7ffff7f897d0 "name", 0x7ffff7f897d5 "new", 0x7ffff7f897d9 "print", 0x7ffff7f897df "put", 0x7ffff7f897e3 "size", 0x7ffff7f897e8 "split", 0x7ffff7f897ee "truncate", 0x7ffff7f897f7 "unique", 0x7ffff7f897fe "update", 0x7ffff7f89805 "value", 0x7ffff7f8980b "values", 0x0} SArrayIdx = SArrayIdx SCleanupIdx = SCleanupIdx SCopyIdx = SCopyIdx SCPutIdx = SCPutIdx SCreateidx = SCreateidx SDeleteIdx = SDeleteIdx SDelkeyIdx = SDelkeyIdx SFindIdx = SFindIdx SFreeIdx = SFreeIdx SGetIdx = SGetIdx SICPutIdx = SICPutIdx SIDelkeyIdx = SIDelkeyIdx SIFindIdx = SIFindIdx SIGetIdx = SIGetIdx SIMergeIdx = SIMergeIdx SIsNullIdx = SIsNullIdx SIUniqueIdx = SIUniqueIdx SIUpdateIdx = SIUpdateIdx SKeyIdx = SKeyIdx SKeysIdx = SKeysIdx SListIdx = SListIdx SMergeIdx = SMergeIdx SMoveIdx = SMoveIdx sINameIdx = sINameIdx SNewIdx = SNewIdx SPrintIdx = SPrintIdx SPutIdx = SPutIdx SSizeIdx = SSizeIdx SSplitIdx = SSplitIdx STruncateIdx = STruncateIdx SUniqueIdx = SUniqueIdx SUpdateIdx = SUpdateIdx SValueIdx = SValueIdx SValuesIdx = SValuesIdx __PRETTY_FUNCTION__ = "NsTclSetObjCmd" #12 0x00007ffff79cb18e in Dispatch (data=0x7fffdc34afc8, interp=0x7fffdc005250, result=0) at /usr/local/src/tcl8.6.13/generic/tclBasic.c:4467 objProc = 0x7ffff7f3df2d <NsTclSetObjCmd> clientData = 0x7fffdc035570 objc = 2 objv = 0x7fffdc245650 iPtr = 0x7fffdc005250 #13 0x00007ffff79cb21f in TclNRRunCallbacks 
(interp=0x7fffdc005250, result=0, rootPtr=0x0) at /usr/local/src/tcl8.6.13/generic/tclBasic.c:4503 callbackPtr = 0x7fffdc34afc0 procPtr = 0x7ffff79cb10e <Dispatch> iPtr = 0x7fffdc005250 #14 0x00007ffff79ca949 in Tcl_EvalObjv (interp=0x7fffdc005250, objc=1, objv=0x7fffdc2453f0, flags=2097168) at /usr/local/src/tcl8.6.13/generic/tclBasic.c:4226 result = 0 rootPtr = 0x0 #15 0x00007ffff79cd384 in TclEvalEx (interp=0x7fffdc005250, script=0x7ffff4b6e880 "ns_cleanup", numBytes=10, flags=0, line=1, clNextOuter=0x0, outerScript=0x7ffff4b6e880 "ns_cleanup") at /usr/local/src/tcl8.6.13/generic/tclBasic.c:5372 wordLine = 1 wordCLNext = 0x0 objectsNeeded = 1 wordStart = 0x7ffff4b6e880 "ns_cleanup" numWords = 1 iPtr = 0x7fffdc005250 p = 0x7ffff4b6e880 "ns_cleanup" next = 0x1f4b6e820 <error: Cannot access memory at address 0x1f4b6e820> minObjs = 20 objv = 0x7fffdc2453f0 objvSpace = 0x7fffdc2453f0 expand = 0x7fffdc2454a0 lines = 0x7fffdc245500 lineSpace = 0x7fffdc245500 tokenPtr = 0x7fffdc2451d0 commandLength = 32767 bytesLeft = 10 expandRequested = 0 code = 0 savedVarFramePtr = 0x7fffdc001550 allowExceptions = 0 gotParse = 1 i = 4105627376 objectsUsed = 1 parsePtr = 0x7fffdc245140 eeFramePtr = 0x7fffdc245390 stackObjArray = 0x7fffdc2453f0 expandStack = 0x7fffdc2454a0 linesStack = 0x7fffdc245500 clNext = 0x0 #16 0x00007ffff79cc5d9 in Tcl_EvalEx (interp=0x7fffdc005250, script=0x7ffff4b6e880 "ns_cleanup", numBytes=10, flags=0) at /usr/local/src/tcl8.6.13/generic/tclBasic.c:5037 No locals. 
#17 0x00007ffff7f18c02 in Ns_TclEvalCallback (interp=0x7fffdc005250, cbPtr=0x5555556a49d0, resultDString=0x0) at tclcallbacks.c:186 arg = 0x0 ii = 0 ap = {{gp_offset = 32, fp_offset = 48, overflow_arg_area = 0x7ffff4b6ea10, reg_save_area = 0x7ffff4b6e950}} ds = {string = 0x7ffff4b6e880 "ns_cleanup", length = 10, spaceAvl = 200, staticSpace = "ns_cleanup\000\367\377\177\000\000\300\350\266\364\377\177\000\000P\351\266\364\377\177\000\000\210\060jUUU\000\000@\351\266\364\377\177\000\000\000\351\266\364\377\177\000\000\340\tcU\001\001\001\000\340\350\266\364\377\177\000\000\360\350\266\364\377\177\000\000\020\351\266\364\377\177\000\000\270\060jUUU\000\000\020\351\266\364\377\177\000\000\332\356\275\367\377\177\000\000\211\311\334d\000\000\000\000 1jUUU\000\000\000\000\000\000\000\000\000\000\270\060jU\005\000\000\000\220\351\266\364\377\177\000\000\023\275\275\367\377\177\000\000\060\352\266\364\377\177\000\000¢\362\367\377\177\000\000\211\311\334d\000\000\000\000p\fJ\334\b\000\000\000\060JjUUU\000"} deallocInterp = false status = 1 __PRETTY_FUNCTION__ = "Ns_TclEvalCallback" #18 0x00007ffff7f29764 in NsTclTraceProc (interp=0x7fffdc005250, arg=0x5555556a49d0) at tclinit.c:1913 cbPtr = 0x5555556a49d0 result = 0 #19 0x00007ffff7f2a158 in RunTraces (itPtr=0x7fffdc035570, why=NS_TCL_TRACE_DEALLOCATE) at tclinit.c:2375 tracePtr = 0x5555556a4a30 servPtr = 0x55555562b470 __PRETTY_FUNCTION__ = "RunTraces" #20 0x00007ffff7f29976 in PushInterp (itPtr=0x7fffdc035570) at tclinit.c:2026 interp = 0x7fffdc005250 ok = true __PRETTY_FUNCTION__ = "PushInterp" #21 0x00007ffff7f29717 in NsFreeConnInterp (connPtr=0x555555630950) at tclinit.c:1885 itPtr = 0x7fffdc035570 #22 0x00007ffff7efdf11 in ConnRun (connPtr=0x555555630950) at queue.c:2648 sockPtr = 0x7fffd8001d80 conn = 0x555555630950 servPtr = 0x55555562b470 status = NS_OK auth = 0x0 __PRETTY_FUNCTION__ = "ConnRun" #23 0x00007ffff7efd0de in NsConnThread (arg=0x55555564bea0) at queue.c:2211 argPtr = 0x55555564bea0 poolPtr = 
0x555555630660 servPtr = 0x55555562b470 connPtr = 0x555555630950 wait = {sec = 1692191231, usec = 653861} timePtr = 0x7ffff4b6ec20 threadId = 0 duringShutdown = 220 fromQueue = false cpt = 1000 ncons = 999 current = 2 status = NS_OK timeout = {sec = 120, usec = 0} exitMsg = 0x7fffdc000b70 "" joinThread = 0x7ffff4b6f640 threadsLockPtr = 0x5555556306d0 tqueueLockPtr = 0x555555630718 wqueueLockPtr = 0x5555556306a8 __PRETTY_FUNCTION__ = "NsConnThread" #24 0x00007ffff7bdd734 in NsThreadMain (arg=0x555558577960) at thread.c:232 thrPtr = 0x555558577960 #25 0x00007ffff7bdf6f5 in ThreadMain (arg=0x555558577960) at pthread.c:870 No locals. #26 0x00007ffff7ccfb43 in start_thread (arg=<optimized out>) at ./nptl/pthread_create.c:442 ret = <optimized out> pd = <optimized out> out = <optimized out> unwind_buf = {cancel_jmp_buf = {{jmp_buf = {140737488346768, 7820541724095505063, 140737299019328, 0, 140737350793296, 140737488347120, -7820557727598026073, -7820559187479928153}, mask_was_saved = 0}}, priv = {pad = {0x0, 0x0, 0x0, 0x0}, data = {prev = 0x0, cleanup = 0x0, canceltype = 0}}} not_first_call = <optimized out> #27 0x00007ffff7d61a00 in clone3 () at ../sysdeps/unix/sysv/linux/x86_64/clone3.S:81 No locals. ________________________________ From: Gustaf Neumann <ne...@wu...> Sent: Tuesday 15 August 2023 6:16 pm To: nav...@li... <nav...@li...> Subject: Re: [naviserver-devel] Crashing on all versions >4.99.24 on Ubuntu Brian, many thanks, the backtrace gives some insights: The problem happens in a Ns_SetFree operation if set "d8" triggered by an "ns_set cleanup" during the cleanup of the request. Something is broken with this nsset. Can it be that your application package issues "ns_set cleanup" as well? Do you have a recent version of OpenACS? If so, you can turn on tracing of the "ns_set" command by adding it to the "traced_cmds". 
-g

=================================================================================
--- acs-tcl/tcl/tcltrace-init.tcl 27 Nov 2020 09:52:06 -0000 1.4.2.4
+++ acs-tcl/tcl/tcltrace-init.tcl 15 Aug 2023 17:13:51 -0000
@@ -32,6 +32,7 @@
 #set traced_cmds {::ns_setcookie ::ns_getcookie ::ns_deletecookie}
 #set traced_cmds {::ns_return ::ns_returnnotfound ::ns_returnfile ::ns_returnmoved}
 #set traced_cmds [lsort [info commands ::ns_return*]]
+set traced_cmds {::ns_set}
 foreach cmd $traced_cmds {
     append trace "\ntrace add execution $cmd enter {::tcltrace::before}"
 }
=================================================================================

On 15.08.23 15:51, Brian Fenton wrote:

Hi

I reproduced the problem using the install-ns.sh script running under gdb. Here's the output of backtrace and bt full. I'm new to using gdb so please let me know if you'd like to see some other info. |
From: Gustaf N. <ne...@wu...> - 2023-08-15 17:16:35
|
Brian,

many thanks, the backtrace gives some insights: The problem happens in an Ns_SetFree operation on set "d8", triggered by an "ns_set cleanup" during the cleanup of the request. Something is broken with this ns_set.

Can it be that your application package issues "ns_set cleanup" as well?

Do you have a recent version of OpenACS? If so, you can turn on tracing of the "ns_set" command by adding it to the "traced_cmds".

-g

=================================================================================
--- acs-tcl/tcl/tcltrace-init.tcl 27 Nov 2020 09:52:06 -0000 1.4.2.4
+++ acs-tcl/tcl/tcltrace-init.tcl 15 Aug 2023 17:13:51 -0000
@@ -32,6 +32,7 @@
 #set traced_cmds {::ns_setcookie ::ns_getcookie ::ns_deletecookie}
 #set traced_cmds {::ns_return ::ns_returnnotfound ::ns_returnfile ::ns_returnmoved}
 #set traced_cmds [lsort [info commands ::ns_return*]]
+set traced_cmds {::ns_set}
 foreach cmd $traced_cmds {
     append trace "\ntrace add execution $cmd enter {::tcltrace::before}"
 }
=================================================================================

On 15.08.23 15:51, Brian Fenton wrote:
> Hi
>
> I reproduced the problem using the install-ns.sh script running under
> gdb. Here's the output of backtrace and bt full. I'm new to using gdb
> so please let me know if you'd like to see some other info. |
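Once "ns_set" tracing is switched on, the log gets large quickly. The following is only a rough sketch, not part of the original exchange, of how the trace lines could be grouped so that the calls leading up to each "ns_set cleanup" are easier to inspect. It assumes the log line layout shown elsewhere in this thread ("[timestamp][pid.thread][context] Notice: trace: ns_set ..."); the function name ops_before_cleanup and the regular expression are hypothetical helpers, written in Python purely for illustration.

```python
import re
from collections import defaultdict

# Assumed layout of a traced call in the server log, e.g.
# [16/Aug/2023:14:05:13][13608.7ffff4b6f640][-conn:gustaf:default:0:1-] Notice: trace: ns_set size t7
TRACE_RE = re.compile(
    r"^\[(?P<ts>[^\]]+)\]\[(?P<thread>[^\]]+)\]\[(?P<conn>[^\]]+)\] "
    r"Notice: trace: ns_set (?P<sub>\S+)(?: (?P<args>.*))?$"
)


def ops_before_cleanup(lines):
    """Collect, for every traced 'ns_set cleanup', the ns_set calls that
    were logged on the same thread since the previous cleanup.

    Returns a list of (thread, [(subcommand, argument_string), ...]).
    Lines that are not ns_set trace lines are ignored.
    """
    pending = defaultdict(list)  # thread id -> calls seen since last cleanup
    batches = []
    for line in lines:
        m = TRACE_RE.match(line.strip())
        if m is None:
            continue
        thread = m["thread"]
        if m["sub"] == "cleanup":
            batches.append((thread, pending.pop(thread, [])))
        else:
            # For most subcommands the first argument is the set handle
            # (such as t7 or d8); for create/new it is a set name instead.
            pending[thread].append((m["sub"], m["args"] or ""))
    return batches


if __name__ == "__main__":
    import sys

    # Usage (hypothetical): python trace_filter.py < server.log
    for thread, calls in ops_before_cleanup(sys.stdin):
        print(f"cleanup on thread {thread} after {len(calls)} ns_set calls")
        for sub, args in calls:
            print(f"    ns_set {sub} {args}")
```

The last batch printed before the crash would then list the sets touched in the failing request, which may help to decide which handle to look at more closely. It cannot show memory corruption caused by C code that bypasses the Tcl-level "ns_set" command.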
From: Brian F. <bri...@ai...> - 2023-08-15 15:24:14
|
Hi I reproduced the problem using the install-ns.sh script running under gdb. Here's the output of backtrace and bt full. I'm new to using gdb so please let me know if you'd like to see some other info. [15/Aug/2023:13:56:52][13147.7fffe35fe640][-driver:nsssl:0-] Notice: ... sockAccept accepted 2 connections free(): invalid next size (fast) Thread 4 "nsd" received signal SIGABRT, Aborted. [Switching to Thread 0x7ffff4ad8640 (LWP 13651)] __pthread_kill_implementation (no_tid=0, signo=6, threadid=140737298400832) at ./nptl/pthread_kill.c:44 44 ./nptl/pthread_kill.c: No such file or directory. (gdb) backtrace #0 __pthread_kill_implementation (no_tid=0, signo=6, threadid=140737016493632) at ./nptl/pthread_kill.c:44 #1 __pthread_kill_internal (signo=6, threadid=140737016493632) at ./nptl/pthread_kill.c:78 #2 __GI___pthread_kill (threadid=140737016493632, signo=signo@entry=6) at ./nptl/pthread_kill.c:89 #3 0x00007ffff7c7d476 in __GI_raise (sig=sig@entry=6) at ../sysdeps/posix/raise.c:26 #4 0x00007ffff7c637f3 in __GI_abort () at ./stdlib/abort.c:79 #5 0x00007ffff7cc46f6 in __libc_message (action=action@entry=do_abort, fmt=fmt@entry=0x7ffff7e16b8c "%s\n") at ../sysdeps/posix/libc_fatal.c:155 #6 0x00007ffff7cdbd7c in malloc_printerr (str=str@entry=0x7ffff7e19230 "munmap_chunk(): invalid pointer") at ./malloc/malloc.c:5664 #7 0x00007ffff7cdc05c in munmap_chunk (p=<optimized out>) at ./malloc/malloc.c:3060 #8 0x00007ffff7ce051a in __GI___libc_free (mem=<optimized out>) at ./malloc/malloc.c:3381 #9 0x00007ffff7bdb1e5 in ns_free (ptr=0x7fffd4de0ba0) at memory.c:94 #10 0x00007ffff7f09b64 in Ns_SetFree (set=0x7fffd5886210) at set.c:397 #11 0x00007ffff7f3e119 in NsTclSetObjCmd (clientData=0x7fffd403d590, interp=0x7fffd4005250, objc=2, objv=0x7fffd453a510) at tclset.c:330 #12 0x00007ffff79cb18e in Dispatch (data=0x7fffd410e3b8, interp=0x7fffd4005250, result=0) at /usr/local/src/tcl8.6.13/generic/tclBasic.c:4467 #13 0x00007ffff79cb21f in TclNRRunCallbacks (interp=0x7fffd4005250, 
result=0, rootPtr=0x0) at /usr/local/src/tcl8.6.13/generic/tclBasic.c:4503 #14 0x00007ffff79ca949 in Tcl_EvalObjv (interp=0x7fffd4005250, objc=1, objv=0x7fffd453a2b0, flags=2097168) at /usr/local/src/tcl8.6.13/generic/tclBasic.c:4226 #15 0x00007ffff79cd384 in TclEvalEx (interp=0x7fffd4005250, script=0x7fffe3dfe880 "ns_cleanup", numBytes=10, flags=0, line=1, clNextOuter=0x0, outerScript=0x7fffe3dfe880 "ns_cleanup") at /usr/local/src/tcl8.6.13/generic/tclBasic.c:5372 #16 0x00007ffff79cc5d9 in Tcl_EvalEx (interp=0x7fffd4005250, script=0x7fffe3dfe880 "ns_cleanup", numBytes=10, flags=0) at /usr/local/src/tcl8.6.13/generic/tclBasic.c:5037 #17 0x00007ffff7f18c02 in Ns_TclEvalCallback (interp=0x7fffd4005250, cbPtr=0x5555556a1b30, resultDString=0x0) at tclcallbacks.c:186 #18 0x00007ffff7f29764 in NsTclTraceProc (interp=0x7fffd4005250, arg=0x5555556a1b30) at tclinit.c:1913 #19 0x00007ffff7f2a158 in RunTraces (itPtr=0x7fffd403d590, why=NS_TCL_TRACE_DEALLOCATE) at tclinit.c:2375 #20 0x00007ffff7f29976 in PushInterp (itPtr=0x7fffd403d590) at tclinit.c:2026 #21 0x00007ffff7f29717 in NsFreeConnInterp (connPtr=0x55555562ebd0) at tclinit.c:1885 #22 0x00007ffff7efdf11 in ConnRun (connPtr=0x55555562ebd0) at queue.c:2648 #23 0x00007ffff7efd0de in NsConnThread (arg=0x555555649030) at queue.c:2211 #24 0x00007ffff7bdd734 in NsThreadMain (arg=0x55555855cdc0) at thread.c:232 #25 0x00007ffff7bdf6f5 in ThreadMain (arg=0x55555855cdc0) at pthread.c:870 #26 0x00007ffff7ccfb43 in start_thread (arg=<optimized out>) at ./nptl/pthread_create.c:442 #27 0x00007ffff7d61a00 in clone3 () at ../sysdeps/unix/sysv/linux/x86_64/clone3.S:81 gdb) bt full #0 __pthread_kill_implementation (no_tid=0, signo=6, threadid=140737016493632) at ./nptl/pthread_kill.c:44 tid = <optimized out> ret = 0 pd = 0x7fffe3dff640 old_mask = {__val = {140737016487840, 140736755639472, 3823099840, 512, 140737016487920, 140737348535152, 140736755639488, 140736750178896, 140736755638224, 140733193388032, 140736750163408, 
140736755639472, 140736755639432, 140736752725904, 93825010219632, 93825010219632}} ret = <optimized out> pd = <optimized out> old_mask = <optimized out> ret = <optimized out> tid = <optimized out> ret = <optimized out> resultvar = <optimized out> resultvar = <optimized out> __arg3 = <optimized out> __arg2 = <optimized out> __arg1 = <optimized out> _a3 = <optimized out> _a2 = <optimized out> _a1 = <optimized out> __futex = <optimized out> resultvar = <optimized out> __arg3 = <optimized out> __arg2 = <optimized out> __arg1 = <optimized out> _a3 = <optimized out> _a2 = <optimized out> _a1 = <optimized out> __futex = <optimized out> __private = <optimized out> __oldval = <optimized out> result = <optimized out> #1 __pthread_kill_internal (signo=6, threadid=140737016493632) at ./nptl/pthread_kill.c:78 No locals. #2 __GI___pthread_kill (threadid=140737016493632, signo=signo@entry=6) at ./nptl/pthread_kill.c:89 No locals. #3 0x00007ffff7c7d476 in __GI_raise (sig=sig@entry=6) at ../sysdeps/posix/raise.c:26 ret = <optimized out> #4 0x00007ffff7c637f3 in __GI_abort () at ./stdlib/abort.c:79 save_stage = 1 act = {__sigaction_handler = {sa_handler = 0x600000004, sa_sigaction = 0x600000004}, sa_mask = {__val = {140736789161264, 140733193388042, 140737347688968, 140737016488736, 279037356156, 18446744073709551615, 140736755314240, 140737016488272, 140733193388033, 140736792190256, 140736790551568, 0, 140736755639936, 93825035611088, 140736753589312, 140736756049120}}, sa_flags = 1487610384, sa_restorer = 0x1} --Type <RET> for more, q to quit, c to continue without paging-- sigs = {__val = {32, 140737350793296, 140737488347040, 140737350862035, 93824993127520, 140736755639576, 8589934656, 93825010219632, 25769803776, 193273528320, 140737016488160, 140737349119905, 3823100240, 4294967296, 2202846355952, 3556773632}} #5 0x00007ffff7cc46f6 in __libc_message (action=action@entry=do_abort, fmt=fmt@entry=0x7ffff7e16b8c "%s\n") at ../sysdeps/posix/libc_fatal.c:155 ap = {{gp_offset = 
24, fp_offset = 0, overflow_arg_area = 0x7fffe3dfe2a0, reg_save_area = 0x7fffe3dfe230}} fd = <optimized out> list = <optimized out> nlist = <optimized out> cp = <optimized out> #6 0x00007ffff7cdbd7c in malloc_printerr (str=str@entry=0x7ffff7e19230 "munmap_chunk(): invalid pointer") at ./malloc/malloc.c:5664 No locals. #7 0x00007ffff7cdc05c in munmap_chunk (p=<optimized out>) at ./malloc/malloc.c:3060 pagesize = <optimized out> size = <optimized out> __PRETTY_FUNCTION__ = "munmap_chunk" mem = <optimized out> block = <optimized out> total_size = <optimized out> #8 0x00007ffff7ce051a in __GI___libc_free (mem=<optimized out>) at ./malloc/malloc.c:3381 ar_ptr = <optimized out> p = <optimized out> err = 25 #9 0x00007ffff7bdb1e5 in ns_free (ptr=0x7fffd4de0ba0) at memory.c:94 No locals. #10 0x00007ffff7f09b64 in Ns_SetFree (set=0x7fffd5886210) at set.c:397 i = 10 __PRETTY_FUNCTION__ = "Ns_SetFree" #11 0x00007ffff7f3e119 in NsTclSetObjCmd (clientData=0x7fffd403d590, interp=0x7fffd4005250, objc=2, objv=0x7fffd453a510) at tclset.c:330 key = 0x7fffd464eb50 "d8" itPtr = 0x7fffd403d590 set = 0x7fffd5886210 ds = {string = 0x7fffd6650c80 "%", length = -738176432, spaceAvl = 32767, staticSpace = "\320\344\337\343\377\177\000\000\240\236t\336\377\177\000\000\260\356\004\324\377\177\000\000PZ\000\324\377\177\000\000\000\345\337\343\377\177\000\000\312Ĝ\367\377\177\000\000\360\357Y\324\377\177\000\000\000\000\000\000\000\000\000\000\200\fe\326\377\177\000\000PR\000\324\377\177\000\000\000\000\000\000\001\000\000\000PR\000\324\377\177\000\000PZ\000\324\377\177\000\000\260\356\004\324\377\177\000\000\300\345\337\343\377\177\000\000Э\234\367\377\177\000\000`\345\337\343\377\177\000\000p\203\252\367\000\000\000\000PR\000\324\377\177\000\000H\003Z\324\377\177\000\000\000\017\000\324\377\177\000\000\000\000\000\000\020\000 \000\002\000\000\000\377\177\000\000\260\356\004\324\377\177\000\000\000\000\000\000\000\000\000"} tablePtr = 0x7fffd403d760 hPtr = 0x7fffd464eb30 search = {tablePtr = 
0x7fffd403d760, nextIndex = 13, nextEntryPtr = 0x0} opt = 1 result = 0 opts = {0x7ffff7f89745 "array", 0x7ffff7f8974b "cleanup", 0x7ffff7f89753 "copy", 0x7ffff7f89758 "cput", 0x7ffff7f8975d "create", 0x7ffff7f89764 "delete", 0x7ffff7f8976b "delkey", 0x7ffff7f89772 "find", 0x7ffff7f89777 "free", 0x7ffff7f8977c "get", 0x7ffff7f89780 "icput", 0x7ffff7f89786 "idelkey", 0x7ffff7f8978e "ifind", 0x7ffff7f89794 "iget", 0x7ffff7f89799 "imerge", 0x7ffff7f897a0 "isnull", 0x7ffff7f897a7 "iunique", 0x7ffff7f897af "iupdate", 0x7ffff7f897b7 "key", 0x7ffff7f897bb "keys", 0x7ffff7f897c0 "list", 0x7ffff7f897c5 "merge", 0x7ffff7f897cb "move", 0x7ffff7f897d0 "name", 0x7ffff7f897d5 "new", 0x7ffff7f897d9 "print", 0x7ffff7f897df "put", 0x7ffff7f897e3 "size", 0x7ffff7f897e8 "split", 0x7ffff7f897ee "truncate", 0x7ffff7f897f7 "unique", 0x7ffff7f897fe "update", 0x7ffff7f89805 "value", 0x7ffff7f8980b "values", 0x0} SArrayIdx = SArrayIdx SCleanupIdx = SCleanupIdx SCopyIdx = SCopyIdx SCPutIdx = SCPutIdx SCreateidx = SCreateidx SDeleteIdx = SDeleteIdx SDelkeyIdx = SDelkeyIdx SFindIdx = SFindIdx SFreeIdx = SFreeIdx SGetIdx = SGetIdx SICPutIdx = SICPutIdx SIDelkeyIdx = SIDelkeyIdx SIFindIdx = SIFindIdx SIGetIdx = SIGetIdx SIMergeIdx = SIMergeIdx SIsNullIdx = SIsNullIdx SIUniqueIdx = SIUniqueIdx SIUpdateIdx = SIUpdateIdx SKeyIdx = SKeyIdx SKeysIdx = SKeysIdx SListIdx = SListIdx SMergeIdx = SMergeIdx SMoveIdx = SMoveIdx sINameIdx = sINameIdx SNewIdx = SNewIdx SPrintIdx = SPrintIdx SPutIdx = SPutIdx SSizeIdx = SSizeIdx SSplitIdx = SSplitIdx STruncateIdx = STruncateIdx SUniqueIdx = SUniqueIdx SUpdateIdx = SUpdateIdx SValueIdx = SValueIdx SValuesIdx = SValuesIdx __PRETTY_FUNCTION__ = "NsTclSetObjCmd" #12 0x00007ffff79cb18e in Dispatch (data=0x7fffd410e3b8, interp=0x7fffd4005250, result=0) at /usr/local/src/tcl8.6.13/generic/tclBasic.c:4467 objProc = 0x7ffff7f3df2d <NsTclSetObjCmd> clientData = 0x7fffd403d590 objc = 2 objv = 0x7fffd453a510 iPtr = 0x7fffd4005250 #13 0x00007ffff79cb21f in 
TclNRRunCallbacks (interp=0x7fffd4005250, result=0, rootPtr=0x0) at /usr/local/src/tcl8.6.13/generic/tclBasic.c:4503 callbackPtr = 0x7fffd410e3b0 procPtr = 0x7ffff79cb10e <Dispatch> iPtr = 0x7fffd4005250 #14 0x00007ffff79ca949 in Tcl_EvalObjv (interp=0x7fffd4005250, objc=1, objv=0x7fffd453a2b0, flags=2097168) at /usr/local/src/tcl8.6.13/generic/tclBasic.c:4226 result = 0 rootPtr = 0x0 #15 0x00007ffff79cd384 in TclEvalEx (interp=0x7fffd4005250, script=0x7fffe3dfe880 "ns_cleanup", numBytes=10, flags=0, line=1, clNextOuter=0x0, outerScript=0x7fffe3dfe880 "ns_cleanup") at /usr/local/src/tcl8.6.13/generic/tclBasic.c:5372 wordLine = 1 wordCLNext = 0x0 objectsNeeded = 1 wordStart = 0x7fffe3dfe880 "ns_cleanup" numWords = 1 iPtr = 0x7fffd4005250 p = 0x7fffe3dfe880 "ns_cleanup" next = 0x1e3dfe820 <error: Cannot access memory at address 0x1e3dfe820> minObjs = 20 objv = 0x7fffd453a2b0 objvSpace = 0x7fffd453a2b0 expand = 0x7fffd453a360 lines = 0x7fffd453a3c0 lineSpace = 0x7fffd453a3c0 tokenPtr = 0x7fffd453a090 commandLength = 32767 bytesLeft = 10 expandRequested = 0 code = 0 savedVarFramePtr = 0x7fffd4001550 allowExceptions = 0 gotParse = 1 i = 3823101680 objectsUsed = 1 parsePtr = 0x7fffd453a000 eeFramePtr = 0x7fffd453a250 stackObjArray = 0x7fffd453a2b0 expandStack = 0x7fffd453a360 linesStack = 0x7fffd453a3c0 clNext = 0x0 #16 0x00007ffff79cc5d9 in Tcl_EvalEx (interp=0x7fffd4005250, script=0x7fffe3dfe880 "ns_cleanup", numBytes=10, flags=0) at /usr/local/src/tcl8.6.13/generic/tclBasic.c:5037 No locals. 
#17 0x00007ffff7f18c02 in Ns_TclEvalCallback (interp=0x7fffd4005250, cbPtr=0x5555556a1b30, resultDString=0x0) at tclcallbacks.c:186
        arg = 0x0
        ii = 0
        ap = {{gp_offset = 32, fp_offset = 48, overflow_arg_area = 0x7fffe3dfea10, reg_save_area = 0x7fffe3dfe950}}
        ds = {string = 0x7fffe3dfe880 "ns_cleanup", length = 10, spaceAvl = 200, staticSpace = "ns_cleanup\000\367\377\177\000\000\300\350\337\343\377\177\000\000P\351\337\343\377\177\000\000\210\277jUUU\000\000@\351\337\343\377\177\000\000\000\351\337\343\377\177\000\000`\354bU\001\001\001\000\340\350\337\343\377\177\000\000\360\350\337\343\377\177\000\000\020\351\337\343\377\177\000\000\270\277jUUU\000\000\020\351\337\343\377\177\000\000\332\356\275\367\377\177\000\000\223z\333d\000\000\000\000 \300jUUU\000\000\000\000\000\000\000\000\000\000\270\277jU\005\000\000\000\220\351\337\343\377\177\000\000\023\275\275\367\377\177\000\000\060\352\337\343\377\177\000\000¢\362\367\377\177\000\000\223z\333d\000\000\000\000P\340\025\324\b\000\000\000\220\033jUUU\000"}
        deallocInterp = false
        status = 1
        __PRETTY_FUNCTION__ = "Ns_TclEvalCallback"
#18 0x00007ffff7f29764 in NsTclTraceProc (interp=0x7fffd4005250, arg=0x5555556a1b30) at tclinit.c:1913
        cbPtr = 0x5555556a1b30
        result = 0
#19 0x00007ffff7f2a158 in RunTraces (itPtr=0x7fffd403d590, why=NS_TCL_TRACE_DEALLOCATE) at tclinit.c:2375
        tracePtr = 0x5555556a1b90
        servPtr = 0x555555628560
        __PRETTY_FUNCTION__ = "RunTraces"
#20 0x00007ffff7f29976 in PushInterp (itPtr=0x7fffd403d590) at tclinit.c:2026
        interp = 0x7fffd4005250
        ok = true
        __PRETTY_FUNCTION__ = "PushInterp"
#21 0x00007ffff7f29717 in NsFreeConnInterp (connPtr=0x55555562ebd0) at tclinit.c:1885
        itPtr = 0x7fffd403d590
#22 0x00007ffff7efdf11 in ConnRun (connPtr=0x55555562ebd0) at queue.c:2648
        sockPtr = 0x7fffd98f68a0
        conn = 0x55555562ebd0
        servPtr = 0x555555628560
        status = NS_OK
        auth = 0x0
        __PRETTY_FUNCTION__ = "ConnRun"
#23 0x00007ffff7efd0de in NsConnThread (arg=0x555555649030) at queue.c:2211
        argPtr = 0x555555649030
        poolPtr = 0x55555562d7c0
        servPtr = 0x555555628560
        connPtr = 0x55555562ebd0
        wait = {sec = 1692105481, usec = 312006}
        timePtr = 0x7fffe3dfec20
        threadId = 1
        duringShutdown = 219
        fromQueue = true
        cpt = 1000
        ncons = 996
        current = 2
        status = NS_OK
        timeout = {sec = 120, usec = 0}
        exitMsg = 0x7fffd4000b70 ""
        joinThread = 0x7fffe3dff640
        threadsLockPtr = 0x55555562d830
        tqueueLockPtr = 0x55555562d878
        wqueueLockPtr = 0x55555562d808
        __PRETTY_FUNCTION__ = "NsConnThread"
#24 0x00007ffff7bdd734 in NsThreadMain (arg=0x55555855cdc0) at thread.c:232
        thrPtr = 0x55555855cdc0
#25 0x00007ffff7bdf6f5 in ThreadMain (arg=0x55555855cdc0) at pthread.c:870
No locals.
#26 0x00007ffff7ccfb43 in start_thread (arg=<optimized out>) at ./nptl/pthread_create.c:442
        ret = <optimized out>
        pd = <optimized out>
        out = <optimized out>
        unwind_buf = {cancel_jmp_buf = {{jmp_buf = {140737488346688, -3886469656811452993, 140737016493632, 0, 140737350793296, 140737488347040, 3886531503754790335, 3886487635365545407}, mask_was_saved = 0}}, priv = {pad = {0x0, 0x0, 0x0, 0x0}, data = {prev = 0x0, cleanup = 0x0, canceltype = 0}}}
        not_first_call = <optimized out>
#27 0x00007ffff7d61a00 in clone3 () at ../sysdeps/unix/sysv/linux/x86_64/clone3.S:81
No locals.

thanks
Brian |
From: Brian F. <bri...@ai...> - 2023-08-14 17:14:21
|
Hi Gustaf

thanks again for the advice. Today I made some more progress on this. There do appear to be some differences between your script and the Oupfiz5 installer, e.g. his ns-build.sh script https://github.com/oupfiz5/tcl-build/blob/master/src/builds/ns-build.sh

I have reached the conclusion that I will be wasting your time if I can't reproduce this problem using your scripts, so my next task will be to run your script and try to reproduce. I am now seeing the downsides to using a non-official Docker approach!

Today I took the approach of installing (through the APM) our OpenACS packages one by one. For example, we use packages such as Categories, General Comments etc. as well as many of our own custom packages. After each package I bounced Naviserver and tested the site. The system worked perfectly until after I installed the last package, which is the main core of our product, very large and old with a lot of features. This makes me very confident that Oracle and nsoracle are working fine. The problem could be some API call in our custom package that maybe changed in 4.99.25.

To answer some of your questions:

* did you run at this state any Oracle queries?
  Yes, I did. I'm 95% confident that Oracle and nsoracle are working fine.
* did you recompile in the "clean install" also the oracle driver?
  Yes, I'm building nsoracle from scratch (I am also running the same version of nsoracle in the 4.99.24 build that is working without issue).
* you mean the crash happens in the plain openacs-config.tcl, with no additional drivers etc, no oracle involved?
  No, this does use Oracle, sorry for not being clear. We have our own heavily modified config file, so I wanted to rule that out by using the openacs-config.tcl that you provide. I just changed the database to Oracle and left everything else as is. The fact that it crashed too means that I can eliminate some strange configuration setting in our custom config file as a possible cause.
* My request in the last mail was to try to reproduce the problem with nsd-config.tcl (i.e. no OpenACS involved).
  Yes, I replied previously that it runs fine. And also a simple OpenACS install on Oracle runs fine. The problems only start with our custom OpenACS package.
* To be on the safe side, all /usr/local/ns/bin/*.so files should be newly compiled.
  Yes, these all appear to be freshly compiled.

# ls -l /usr/local/ns/bin/*.so
-rwxr-xr-x 1 nsadmin nsadmin 32560 Aug 10 15:31 /usr/local/ns/bin/nscgi.so
-rwxr-xr-x 1 nsadmin nsadmin 27360 Aug 10 15:31 /usr/local/ns/bin/nscp.so
-rwxr-xr-x 1 nsadmin nsadmin 15808 Aug 10 15:31 /usr/local/ns/bin/nsdb.so
-rwxr-xr-x 1 nsadmin nsadmin 50808 Aug 10 15:31 /usr/local/ns/bin/nsdbpg.so
-rwxr-xr-x 1 nsadmin nsadmin 16176 Aug 10 15:31 /usr/local/ns/bin/nsdbtest.so
-rwxr-xr-x 1 nsadmin nsadmin 32640 Aug 10 15:31 /usr/local/ns/bin/nslog.so
-rwxr-xr-x 1 nsadmin nsadmin 90688 Aug 10 15:42 /usr/local/ns/bin/nsoracle.so
-rwxr-xr-x 1 nsadmin nsadmin 90848 Aug 10 15:42 /usr/local/ns/bin/nsoraclecass.so
-rwxr-xr-x 1 nsadmin nsadmin 31712 Aug 10 15:31 /usr/local/ns/bin/nsperm.so
-rwxr-xr-x 1 nsadmin nsadmin 15888 Aug 10 15:31 /usr/local/ns/bin/nsproxy.so
-rwxr-xr-x 1 nsadmin nsadmin 16536 Aug 10 15:31 /usr/local/ns/bin/nssock.so
-rwxr-xr-x 1 nsadmin nsadmin 26624 Aug 10 15:31 /usr/local/ns/bin/nsssl.so

So my next steps are to try to reproduce the problem using your install-ns.sh script. Then I can compile with debugging and have some fun with gdb.

thanks
Brian |
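Editorial note: the timestamp check on /usr/local/ns/bin/*.so discussed in this thread can also be scripted. The following is only a minimal sketch, assuming a marker file was touched just before the rebuild started; the function name stale_modules and the marker path are hypothetical and do not come from NaviServer or install-ns.sh.

```shell
#!/bin/sh
# Sketch: list shared-object modules that were NOT rebuilt after a marker file.
# The marker file (e.g. /tmp/build-start) is an assumption for illustration;
# touch it right before starting the build.

# stale_modules DIR MARKER
#   Prints every *.so file under DIR whose modification time is not newer
#   than MARKER, one path per line, sorted.
stale_modules() {
    dir=$1
    marker=$2
    find "$dir" -type f -name '*.so' ! -newer "$marker" | sort
}

# Example call (not executed here):
#   stale_modules /usr/local/ns/bin /tmp/build-start
```

An empty result would suggest that every module was written after the marker; any path printed is a candidate for a leftover binary from an earlier build.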
From: Gustaf N. <ne...@wu...> - 2023-08-12 10:55:25
|
On 11.08.23 20:15, Brian Fenton wrote:
> Hi Gustaf
>
> thanks for the response. I've been looking at this in more detail this
> afternoon and it does appear to be caused by something in the
> interaction of our OpenACS application with 4.99.27. As I previously
> mentioned, it has been running fine on 4.99.24 on the same Ubuntu
> version. I realise that I may not have been clear on this point in my
> previous email: this is Naviserver running on Ubuntu in a Docker
> container. The version of Naviserver is based on this Docker build
> https://github.com/oupfiz5/naviserver-s6 which I have forked and
> updated to 4.99.27 (I may well have missed something in updating NS
> version - maybe I should have waited until oupfiz updates his build).
>
>   * I can confirm that nsd-config.tcl runs fine with 4.99.27
>   * Some good news: I am able to do an OpenACS clean install on Oracle
>     with 4.99.27. I then successfully installed our application using
>     the APM.

did you run at this state any Oracle queries?
did you recompile in the "clean install" also the oracle driver?

>   * However, once I restart Naviserver the problems start.
>   * I tried using the openacs-config.tcl that ships with 4.99.27 and
>     the problems are happening with that too.

you mean the crash happens in the plain openacs-config.tcl, with no additional drivers etc, no oracle involved? this can get us closer to something i might be able to reproduce.

My request in the last mail was to try to reproduce the problem with nsd-config.tcl (i.e. no OpenACS involved). If you can reproduce the crash, you should compile with debugging turned on and run nsd under gdb or lldb. First, one should get the simplest case causing the crash.

> What is odd is that it seems to be able to handle one request before
> crashing. Eg. I type in the URL, it shows the /register page but then
> crashes. After restarting, I enter my login details on the register
> page, press return. It then crashes. After restarting, it successfully
> logs me in, then crashes again.

the memory errors are normally hinting at some buffer overflow, or a mixture between 32bit and 64bit compilation, etc.

> There is no clear pattern in the logs. I thought it might be related
> to OCSP and disabled that, but the problems continued to occur.

if you suspect nsssl, then one potential problem might be a mixture of different OpenSSL versions during compilation (when using install-ns.sh, this will not happen).

> Turning on debug hasn't helped - but maybe there is so much
> information in the log that I have missed something important.
>
> What drivers are you referring to in your question?

actually all naviserver modules you are using, including the db drivers (since you mentioned nsoracle, which is not part of the regular regression tests). To be on the safe side, all /usr/local/ns/bin/*.so files should be newly compiled.

all the best
-gn

--
Univ.Prof. Dr. Gustaf Neumann
Head of the Institute of Information Systems and New Media
of Vienna University of Economics and Business
Program Director of MSc "Information Systems" |
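Editorial note: the advice in this message to run nsd under gdb could look roughly like the sketch below. It only builds and prints the command line so that it can be reviewed before use. The nsd path and the nsadmin user follow the file listing shown elsewhere in this thread; the helper name nsd_gdb_command and the configuration file location are assumptions. The gdb options (-batch, -ex, --args) and the nsd options (-f, -u, -t) are standard, but the exact invocation was not given by the posters.

```shell
#!/bin/sh
# Sketch: print a gdb command line that runs nsd in the foreground and dumps
# a full backtrace of all threads once the process stops on a fatal signal.
# NSD defaults to /usr/local/ns/bin/nsd; override it for other installs.

NSD=${NSD:-/usr/local/ns/bin/nsd}

# nsd_gdb_command CONFIG [USER]
#   CONFIG is the NaviServer configuration file, USER the account nsd runs as
#   (defaults to nsadmin). Nothing is executed; the command is only printed.
nsd_gdb_command() {
    config=$1
    user=${2:-nsadmin}
    printf '%s\n' "gdb -batch -ex run -ex 'thread apply all bt full' --args $NSD -f -u $user -t $config"
}

# Example call (not executed here):
#   nsd_gdb_command /usr/local/ns/conf/nsd-config.tcl
```

Running the printed command against a build compiled with debugging symbols should produce a backtrace with local variables for each frame; without symbols the frames would mostly show raw addresses.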
From: Brian F. <bri...@ai...> - 2023-08-11 19:29:30
|
Hi Gustaf

thanks for the response. I've been looking at this in more detail this afternoon and it does appear to be caused by something in the interaction of our OpenACS application with 4.99.27. As I previously mentioned, it has been running fine on 4.99.24 on the same Ubuntu version. I realise that I may not have been clear on this point in my previous email: this is Naviserver running on Ubuntu in a Docker container. The version of Naviserver is based on this Docker build https://github.com/oupfiz5/naviserver-s6 which I have forked and updated to 4.99.27 (I may well have missed something in updating NS version - maybe I should have waited until oupfiz updates his build).

* I can confirm that nsd-config.tcl runs fine with 4.99.27
* Some good news: I am able to do an OpenACS clean install on Oracle with 4.99.27. I then successfully installed our application using the APM. However, once I restart Naviserver the problems start.
* I tried using the openacs-config.tcl that ships with 4.99.27 and the problems are happening with that too.

Here is a selection of the kinds of errors I'm seeing:

* munmap_chunk(): invalid pointer
* free(): invalid next size (fast)
* corrupted size vs. prev_size
* malloc(): unaligned tcache chunk detected

What is odd is that it seems to be able to handle one request before crashing. Eg. I type in the URL, it shows the /register page but then crashes. After restarting, I enter my login details on the register page, press return. It then crashes. After restarting, it successfully logs me in, then crashes again.

There is no clear pattern in the logs. I thought it might be related to OCSP and disabled that, but the problems continued to occur.
Turning on debug hasn't helped - but maybe there is so much information in the log that I have missed something important.

What drivers are you referring to in your question?

thanks
Brian |
From: Gustaf N. <ne...@wu...> - 2023-08-10 18:27:37
|
Hi Brian,

The new NaviServer versions are running fine on Ubuntu 22.04. Have you recompiled the drivers you are using with the updated version?

A good test for the NaviServer binary is to test it with one of the packaged configuration files, e.g. nsd-config.tcl.

all the best
-gn |
From: Brian F. <bri...@ai...> - 2023-08-10 16:56:31
|
Hello

we have been testing out our OpenACS application on Ubuntu 22.04.2 LTS (previously we only ran on Windows). It was working great with Naviserver 4.99.24 but I have been getting constant crashes on more recent versions.

I get this error on 4.99.25, 4.99.26 and today I also got it on 4.99.27. The server runs fine until I click on a page, then it immediately crashes.
The log has only the following error:
    free(): invalid size

and today I got this one:
    [10/Aug/2023:15:02:23][303.7fa3a64ee640][-conn:openacs:default:1:119-] Fatal: received fatal signal 11

We have an Oracle application and are using the latest nsoracle driver, which might be a factor here.
We have been running it with a pretty old OpenACS config file, so I am currently looking to merge in all the latest changes to ensure that is not an issue.
Also note that I am running Naviserver on Docker on Windows, but as mentioned it was running great on 4.99.24.

thanks for any help
Brian |
From: Gustaf N. <ne...@wu...> - 2023-08-09 10:47:42
|
Many thanks, David, for figuring this out!

The change is incorporated in the nswebpush module on Bitbucket. Against my own rules, I've updated the just-released tar file for the modules, naviserver-4.99.27-modules.tar.gz, to include this change.

all the best!
-g

--
Univ.Prof. Dr. Gustaf Neumann
Head of the Institute of Information Systems and New Media
of Vienna University of Economics and Business
Program Director of MSc "Information Systems" |
From: David O. <da...@qc...> - 2023-08-09 10:27:53
|
Thanks Gustaf - replies inline...

On Wed, 9 Aug 2023 at 10:38, Gustaf Neumann <ne...@wu...> wrote:

> Hi David,
>
> We do not have nswebpush somewhere in production. Can you tell more
> precisely, what "suddenly" means?

About lunchtime on 2nd Aug!

> Does this mean, that you have not changed anything in your environment,
> but google started to refuse it?

Yes exactly...

We've worked out what was angering Google - it was a version of this code in our case:
https://bitbucket.org/naviserver/nswebpush/src/1e412c76626b29a4573b595a069a8ea10feece8a/webpush-procs.tcl#lines-607

Construction of the json from the claim dict was treating "exp" as a string rather than numeric.
Just as an illustration, this quick hack makes the "make test" run cleanly in the nswebpush codebase:

    proc dictToJson {dict} {
        #
        # Serializes a Tcl dict to compact JSON. No testing for
        # nested dicts or arrays, these will be simply added as a
        # string the JSON is in compact form, meaning no whitespaces
        # and newlines between keys/values.

        set pairs {}
        dict for {key value} $dict {
            regsub -all \" $key "\\\"" key
            regsub -all \" $value "\\\"" value
            if { $key eq "exp"} {
                lappend pairs [subst {"$key":$value}]
            } else {
                lappend pairs [subst {"$key":"$value"}]
            }
        }
        return "{[join $pairs ,]}"
    }
|
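Editorial note: the difference described in this message, "exp" emitted as a JSON number rather than a quoted string, can be shown in a few lines outside Tcl. This is a minimal sketch and not the nswebpush code; the function name claims_json, the claim names aud and sub, and the example values are assumptions for illustration, and no escaping of embedded quotes is attempted.

```shell
#!/bin/sh
# Sketch: emit token claims as compact JSON with "exp" as a number.
# Claim names other than "exp" and all values are hypothetical.

# claims_json AUD EXP SUB
#   AUD and SUB are written as JSON strings; EXP is written with %d, so it
#   appears without quotes. A non-numeric EXP makes printf report an error.
claims_json() {
    printf '{"aud":"%s","exp":%d,"sub":"%s"}\n' "$1" "$2" "$3"
}

# Example call (not executed here):
#   claims_json https://push.example 1691576400 mailto:admin@example.org
```

A receiver that validates the expiry as a numeric timestamp would accept "exp":1691576400 but could reject "exp":"1691576400", which matches the behaviour change reported above.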