Sorry to seem incommunicado. I'm not ignoring the request for information;
I was just waiting for the problem to recur so that I could try a few
things. True to form, it hasn't recurred, probably because I'm watching it.
I think that the only C extension we're using is DCOracle2 from zope.org.
I'll try to implement the ideas below. There have also been some other
suggestions (how to use the debugger to attach to a "hung" thread and get a
stack trace; recreating the problem using Webware 0.8 and Webware
latest-from-CVS; and installing Python 2.3a2 and applying the WW patch to
deal with concurrent imports). That last one may be problematic in our
environment, but if it comes to that, I guess we'll have to.
If anybody has more insights into this problem, please reply.
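For the "attach to a hung thread and get a stack trace" idea, here is a minimal sketch. It assumes a modern Python 3 interpreter: `sys._current_frames()` first appeared in Python 2.5 and the `faulthandler` module in Python 3.3, so neither exists in Python 2.3. `dump_all_thread_stacks` and `enable_crash_tracebacks` are hypothetical helper names, not part of Webware.

```python
import faulthandler
import sys
import threading
import traceback

def dump_all_thread_stacks():
    """Return a text report with the current Python stack of every live thread."""
    # Map thread identifiers to readable names (e.g. the app server's worker names).
    names = {t.ident: t.name for t in threading.enumerate()}
    lines = []
    for ident, frame in sys._current_frames().items():
        lines.append(f"--- thread {names.get(ident, '?')} (ident {ident}) ---")
        # format_stack() yields one string per frame, innermost call last.
        lines.extend(entry.rstrip("\n") for entry in traceback.format_stack(frame))
    return "\n".join(lines)

def enable_crash_tracebacks(log_file):
    """On a fatal signal such as SIGSEGV (e.g. a crash inside a C extension),
    write the Python traceback of every thread to log_file before the process
    dies. log_file must have a real file descriptor and must stay open."""
    faulthandler.enable(file=log_file, all_threads=True)
```

Calling `dump_all_thread_stacks()` from a signal handler or an admin-only page while the server is wedged would show which Python line each worker is blocked on; `enable_crash_tracebacks()` would be called once at startup so that a segfault leaves a Python-level trace instead of only raw addresses in the core file.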
David Hancock | dhancock@... | 410-266-4384
From: Geoffrey Talvola [mailto:gtalvola@...]
Sent: Monday, March 10, 2003 3:07 PM
To: 'Hancock, David (DHANCOCK)'; webware-discuss@...
Subject: RE: Wedged threads (was RE: [Webware-discuss] help evaluating We
Which extension modules are you using (for database access or other
purposes)? Being pure Python code, WebKit is extremely unlikely to ever
segfault on its own, but extension modules coded in C certainly will if they
are buggy. Webware is only as reliable as the extension modules you use.
Try upgrading to the latest stable versions of all extension modules.
Webware 0.8 _does_ fix a subtle problem in 0.7 that would eventually cause a
particular servlet to stop responding (along with other important bugfixes).
You are probably experiencing a different problem, but you ought to upgrade
anyhow since this was a known problem with 0.7.
I'll let you know if I make any progress on automatic thread wedge
debugging, but I haven't started it yet. In the meantime, there are some
things you can do yourself:
- Change AppServer.config to use the same number for StartServerThreads,
MinServerThreads, and MaxServerThreads to make analyzing the logs easier.
(As a side note, I always do this myself because I think having the
thread pool adjust dynamically is pointless. I argued this position when
the dynamically adjusting thread pool code was being put into WebKit, but
other people disagreed with me, so it went in.)
- Add code to your servlets to print out threading.currentThread().getName()
which will be a unique name per thread.
- Let this run for a while, then review your logs to see if certain threads
stop responding after a while. You should see the requests rotate through
the number of threads you configured in your AppServer.config. If the
number of threads being used drops over time, you've got a thread wedge.
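The log review described above can be sketched as follows, assuming each request records a hypothetical `(timestamp, thread name, path)` tuple; `threading.current_thread().name` is the modern spelling of `threading.currentThread().getName()`. The function names and record format are illustrative, not Webware APIs.

```python
import threading
import time
from collections import defaultdict

def log_request(log, path):
    """Append one (timestamp, thread name, path) record, as a servlet might."""
    log.append((time.time(), threading.current_thread().name, path))

def threads_per_window(entries, window_seconds):
    """Return the number of distinct thread names seen in each fixed time
    window, oldest first; windows with no requests count as 0."""
    if not entries:
        return []
    start = min(ts for ts, _, _ in entries)
    buckets = defaultdict(set)
    for ts, name, _ in entries:
        buckets[int((ts - start) // window_seconds)].add(name)
    return [len(buckets.get(i, ())) for i in range(max(buckets) + 1)]

def thread_count_dropped(counts):
    """True if the latest window used fewer threads than the busiest earlier
    one. Only meaningful under steady load: a quiet period lowers it too."""
    return len(counts) >= 2 and counts[-1] < max(counts[:-1])
```

With StartServerThreads, MinServerThreads, and MaxServerThreads all set to the same N, a busy server should show N threads in every window; a count that falls and stays down points to wedged threads.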
This type of monitoring is basically what I was planning on automating with
a DebugThreads flag.
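One way such a DebugThreads check could work is sketched below. This is a hypothetical helper, not WebKit code: it records when each worker thread starts its current transaction and reports threads that have been busy longer than a configurable timeout. The `report` callback stands in for the warning email.

```python
import threading
import time

class TransactionWatchdog:
    """Hypothetical DebugThreads-style check: remembers when each worker thread
    started its current transaction and reports threads that have been busy
    longer than timeout_seconds."""

    def __init__(self, timeout_seconds, clock=time.monotonic):
        self.timeout = timeout_seconds
        self._clock = clock
        self._started = {}  # thread name -> start time of its current transaction
        self._lock = threading.Lock()

    def begin(self):
        # Call at the start of each transaction, from the worker thread itself.
        with self._lock:
            self._started[threading.current_thread().name] = self._clock()

    def end(self):
        # Call when the transaction finishes (ideally in a finally: block).
        with self._lock:
            self._started.pop(threading.current_thread().name, None)

    def overdue(self):
        """Return the sorted names of threads busy for more than the timeout."""
        now = self._clock()
        with self._lock:
            return sorted(name for name, began in self._started.items()
                          if now - began > self.timeout)

    def start_monitor(self, report, interval_seconds=1.0):
        """Poll overdue() in a daemon thread and call report(names) whenever
        any are found."""
        def loop():
            while True:
                time.sleep(interval_seconds)
                names = self.overdue()
                if names:
                    report(names)
        monitor = threading.Thread(target=loop, name="watchdog", daemon=True)
        monitor.start()
        return monitor
```

As written, `report` fires again on every poll while a thread stays stuck, so a real mailer would remember which threads it has already reported. The monitor is a daemon thread so it never keeps the process alive at shutdown. It does nothing to unstick or replace the wedged worker itself.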
> -----Original Message-----
> From: Hancock, David (DHANCOCK) [mailto:DHANCOCK@...]
> Sent: Monday, March 10, 2003 2:37 PM
> To: Geoffrey Talvola; webware-discuss@...
> Subject: Wedged threads (was RE: [Webware-discuss] help evaluating WebWare)
> We're now encountering a "wedge" every couple of days or so on our test
> servers (RedHat Linux 6.2, Webware 0.7, Apache 1.3.x with mod_webkit), and
> it's starting to worry us. So if there's progress on that front, we'd love
> to hear about it. And if we can help, let me know.
> Caveat: We're developing WITH Webware, but don't feel competent yet to make
> substantive changes to the core code.
> The symptoms are that NO WebKit threads seem to respond to browser requests.
> Sometimes there's a core dump, which gets labeled with the pid, I'm assuming
> (such as core.34172, where 34171, 34170, etc. are WebKit processes). Loading
> the core in the debugger tells me that ThreadedAppServer encountered a
> segmentation fault (signal 11). There's nothing labeled in the stack trace,
> just memory addresses, so I can't tell what function was executing. netstat
> reports that there's a server listening (we're using port 8086 for WebKit),
> and in fact, we can do 'telnet localhost 8086' and get connected. (Of
> course, we can't do more at that point because it's looking for request
> data marshalled into some structure we don't know how to recreate from the
> command line!) But at least there's a listening socket.
> When this occurs, stopping and starting the ThreadedAppServer clears the
> problem, but the processes need to be 'kill -9'ed before they'll die.
> I'd be grateful for any ideas on how to figure out where this problem is
> coming from, how to fix it, etc.
> I wish I had better information from the core dump (which doesn't always
> happen), so if we need to instrument Webware or even Python to get better
> stack traces, we could try that. My past experience with turning on
> voluminous debug information, though, is that it slows things down enough
> that timing-related bugs don't show up anymore.
> We MAY be able to install 0.8 on the test system, but my handlers would
> like some statement from me BESIDES "I hope the upgrade will fix this
> problem." They keep quoting the title of a recent business book: _Hope Is
> Not a
> David Hancock | dhancock@... | 410-266-4384
> -----Original Message-----
> From: Geoffrey Talvola [mailto:gtalvola@...]
> Sent: Thursday, March 06, 2003 9:41 AM
> To: 'Hancock, David (DHANCOCK)'; webware-discuss@...
> Subject: RE: [Webware-discuss] help evaluating WebWare
> Hancock, David (DHANCOCK) [mailto:DHANCOCK@...] wrote:
> > We have had some occurrences of threads seeming to wedge (mentioned just
> > recently on the list), and we're trying to figure out why.
> I'd like to improve WebKit so that you can enable a "DebugThreads" flag. It
> would then detect when a thread is wedged (i.e. hasn't finished its
> transaction within some configurable timeout) and send a warning email.
> Perhaps it would also add an additional thread to the thread pool to make
> up for the wedged thread. Finally, it would be nice if the wedged thread
> didn't prevent the appserver from shutting down cleanly.
> Any other ideas? I'll probably try to implement this within
> the next couple
> - Geoff