Webware for Python / Patches / #82 Transaction Monitoring and Control

#82 Transaction Monitoring and Control

Status: closed-accepted

Owner: Christoph Zwerschke

Labels: WebKit (45)

Priority: 5

Updated: 2007-11-25

Created: 2007-10-03

Creator: Steve Schwarz

Private: No

Hi,
I've been running a local patch for a couple weeks now with following features:

1 - Added two methods to ThreadedAppServer: startRequest() and endRequest() methods. These methods can be used to implement page statistics and/or click tracking by session. Default implementation supports feature 2 below.

2 - Allow real time monitoring of the requests being processed by each thread. A new ThreadControl page has been added to the Admin context. Setting 'TrackThreads' to True in AppServer.config enables this feature. If the supplied thread2.py module is loaded requests can be cancelled via this page.

3 - Support automatically cancelling requests taking more than a configurable amount of time. Setting 'CancelLongRequests' to the number of seconds after which a request should be cancelled in the AppServer.config file enables this feature. This feature also requires 'TrackThreads' to be enabled.

The patch is relatively small, if there is interest in looking at these changes and moving them into the mainline I'd be happy to forward them to anyone interested. Of course, comments and suggestions are very welcome.

The performance penalty for feature 1 is the cost of some accessors into the current transaction, a couple time() calls and two function calls. I would guess these changes should have a negligible impact on performance.

Features 2 and 3 use a new class which is basically a locked dictionary. So the cost is two additional mutex lock/unlocks for each transaction. If WebKit didn't support adding/removing threads dynamically that mutex could be removed and the additional overhead is some dictionary accesses.

I'd be happy to get feedback on these enhancements.

Thanks,
Steve

Discussion

Steve Schwarz - 2007-10-03

WebKit Thread Mods 2007-10-02

WebKit_threadmods_20071002.tgz

If you would like to refer to this comment somewhere else in this project, copy and paste the following link:

Christoph Zwerschke - 2007-10-28

assigned_to: nobody --> cito
If you would like to refer to this comment somewhere else in this project, copy and paste the following link:

Christoph Zwerschke - 2007-11-23

status: open --> closed-accepted
If you would like to refer to this comment somewhere else in this project, copy and paste the following link:

Christoph Zwerschke - 2007-11-23

Logged In: YES
user_id=193957
Originator: NO

Thanks a lot, Steve. This has now been implemented in r7097 and will go into Webware 0.9.5. I only changed some implementation details. For instance, I am not using any locks. Since we have only one writer, and since setattr operations are atomic in Python, you can actually do without them. I have also added an improved version of your ThreadControl servlet. Let me know if you think this implementation is ok or if you find any bugs or problems.

If you would like to refer to this comment somewhere else in this project, copy and paste the following link:

Christoph Zwerschke - 2007-11-23

Logged In: YES
user_id=193957
Originator: NO

I just noticed that I have only implemented features 1 and 2, and completely forgot about your CancelLongRequests option for canceling long-runners automatically. Reopened this ticket as a reminder to implement that, too.

If you would like to refer to this comment somewhere else in this project, copy and paste the following link:

Christoph Zwerschke - 2007-11-23

status: closed-accepted --> open-accepted
If you would like to refer to this comment somewhere else in this project, copy and paste the following link:

Christoph Zwerschke - 2007-11-25

Logged In: YES
user_id=193957
Originator: NO

In r7107 has now been implemented in r7107. Again, let me know if there is still something missing or not working for you.

If you would like to refer to this comment somewhere else in this project, copy and paste the following link:

Christoph Zwerschke - 2007-11-25

status: open-accepted --> closed-accepted
If you would like to refer to this comment somewhere else in this project, copy and paste the following link:

Steve Schwarz - 2007-11-25

Logged In: YES
user_id=1111832
Originator: YES

Thanks for promoting this code. I'll take a look at it this week.

My concern with removing the mutexes is the situation where the list of active requests is being created for a request (i.e. for the admin screen) at the same time the number of threads is being reduced. I believe I had the map iterator raise an exception when the map changed underneath it in that situation.

If you would like to refer to this comment somewhere else in this project, copy and paste the following link:

Christoph Zwerschke - 2007-11-25

Logged In: YES
user_id=193957
Originator: NO

I don't think that the changing map iteration is a problem since I'm using the items() or values() of the map only.

However, now that you say it, I see a different problem. There is a minuscule chance that while the list of active threads is iterated, one of these threads finishes by itself and processes another request. In this case, the new request will be canceled instead of the one we actually wanted to cancel. I think both of our implementations have this problem. To solve this, we may need a different kind of lock used inside the threadloop(). I'll examine this end of this week. Maybe you can make some suggestions until then already.

If you would like to refer to this comment somewhere else in this project, copy and paste the following link:

Steve Schwarz - 2007-11-27

Logged In: YES
user_id=1111832
Originator: YES

Christoph: "However, now that you say it, I see a different problem. There is a
minuscule chance that while the list of active threads is iterated, one of
these threads finishes by itself and processes another request. In this
case, the new request will be canceled instead of the one we actually
wanted to cancel. I think both of our implementations have this problem. To
solve this, we may need a different kind of lock used inside the
threadloop()."

I only just started looking at the code. One thing that comes to mind is to have the exception that is passed into ThreadWorker.abort() contain the requestId being cancelled. Then that code could compare request ids just before issuing the async thread cancel.

Another idea I played with was having long running tasks and/or cancelled tasks give a traceback when they are cancelled. That way you can see what the page was working on when it was killed - the client doesn't have to get the traceback though. In my original version it did that, the version I supplied in this ticket catches that specific exception and just logged a message. WDYT?

If you would like to refer to this comment somewhere else in this project, copy and paste the following link:

Christoph Zwerschke - 2007-11-29

Logged In: YES
user_id=193957
Originator: NO

Concerning the first problem, checking the thread again just before it's canceled would decrease the (already small) possibility of a wrong cancelation, but the chance would still exist. I've now checked in an air-tight solution in r7112 which should not cost performance and makes sure that always the right request is canceled.

Concerning your second idea: Currently requests are aborted with an ConnectionAbortedError. Actually this exception is used in the case where the client aborted the connection and the response could not be sent to the client any more.

I think you're right and a different exception should be used for aborting requests, e.g. a new class RequestAbortedError. This should be handled just like any Exception, i.e. a traceback is logged and the client gets either the traceback or an error message (depending on the ShowDebugInfoOnErrors setting) if the servlet does not catch this. If you agree, I will implement it that way and we can close this issue.

Also, what do you think about the default setting of MaxRequestTime = 300. Is this ok? it corresponds with the Apache default timeout value. Or should we set it to None (disable the feature) by default?

If you would like to refer to this comment somewhere else in this project, copy and paste the following link:

Christoph Zwerschke - 2007-12-08

Logged In: YES
user_id=193957
Originator: NO

Concerning your second idea, in r7118 different (http service unavailable) exceptions are now used to abort requests.

Please let me know if there are still issues with the current solution.

If you would like to refer to this comment somewhere else in this project, copy and paste the following link:

Log in to post a comment.