The Anatomy of a Webware Transaction

This documents the process that occurs when an HTTP request is made to Webware. It is current as of version 0.5 of Webware, as of my understanding on 25 March 2000. -- Ian Bicking

The First Connection

The process begins when a browser asks the web server for a page. The browser tells the server which page it wants to receive, and passes any cookies that are marked for the server, as well as any form variables (GET or POST).

Because different web servers are supported, and under each server there are several ways to interact with Webware, there are a variety of adapters that will handle the request at this point. With the exception of OneShot, they will generally package up the request and send it over a socket to the AppServer. The AppServer has been started ahead of time, and is waiting to respond.

(@@ give examples/summaries of various adapters and how they work, perhaps describe how request is packaged)

AppServer

The AppServer is generally the subclass AsyncThreadedAppServer, but I will refer to it simply as AppServer.

The AppServer listens for requests from an Adapter*. When it receives a request, it puts it in a queue and the next available thread will handle the request.
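The queue-and-thread arrangement described above can be sketched roughly as follows. This is not Webware's code; it is a minimal illustration in modern Python, and the names request_queue, worker, and results are made up for the example.

```python
import queue
import threading

# Hypothetical stand-in for the AppServer's request queue.
request_queue = queue.Queue()
results = []
results_lock = threading.Lock()

def worker():
    # Each worker blocks until a request is available, then handles it.
    while True:
        request = request_queue.get()
        if request is None:  # sentinel value: shut this worker down
            request_queue.task_done()
            break
        with results_lock:
            results.append("handled " + request)
        request_queue.task_done()

threads = [threading.Thread(target=worker) for _ in range(3)]
for t in threads:
    t.start()

# Incoming requests are queued; whichever thread is free picks up the next one.
for name in ["req-1", "req-2", "req-3", "req-4"]:
    request_queue.put(name)
for _ in threads:
    request_queue.put(None)

request_queue.join()
for t in threads:
    t.join()
```

The order in which requests finish depends on thread scheduling, so only the set of handled requests is predictable.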

The thread (which is an instance of the RequestHandler class) will wait until it has read all the data (RequestHandler.handle_read), placing the data in RequestHandler.reqdata. Then the RequestHandler.handleRequest method is called. The request that was passed over the socket is then unmarshalled*.

RequestHandler handles STATUS and QUIT methods directly. (@@ how would these requests be made?) All other requests are handled by Application. AppServer keeps an instance of Application, and Application.dispatchRawRequest is called with the unmarshalled request*.

OneShot

When you connect through OneShot.cgi you go through largely the same process, except that there is no persistence. The OneShot adapter starts a new AppServer (OneShotAppServer) for each request. This is inefficient but at times convenient.

Application

Creating a Request

Application.dispatchRawRequest takes the dictionary that was passed over the socket, and creates an HTTPRequest. HTTPRequest.__init__ parses the dictionary. It parses fields, cookies, and some internal values that are used by Application.
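To give a feel for what parsing fields and cookies out of the request dictionary involves, here is a rough sketch using the modern Python standard library. HTTPRequest does its own parsing; the function name parse_request and the choice to unwrap single-valued fields are assumptions made for this example.

```python
from http.cookies import SimpleCookie
from urllib.parse import parse_qs

def parse_request(raw):
    """Pull form fields and cookies out of a CGI-style request dictionary."""
    environ = raw['environ']

    # parse_qs returns a list per key; unwrap single values for convenience.
    fields = {}
    for key, values in parse_qs(environ.get('QUERY_STRING', '')).items():
        fields[key] = values[0] if len(values) == 1 else values

    # Cookies arrive in the HTTP_COOKIE variable, as they would under CGI.
    jar = SimpleCookie()
    jar.load(environ.get('HTTP_COOKIE', ''))
    cookies = {name: morsel.value for name, morsel in jar.items()}
    return fields, cookies

raw_request = {
    'format': 'CGI',
    'environ': {
        'REQUEST_METHOD': 'GET',
        'QUERY_STRING': 'filename=Welcome.py',
        'HTTP_COOKIE': 'foo=bar',
    },
}
fields, cookies = parse_request(raw_request)
```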

Creating a Transaction

After the HTTPRequest is created, it is passed to Application.dispatchRequest. This creates a Transaction with Application.createTransactionForRequest, which in turn simply creates a Transaction, passing it the application (self) and the request. Transaction simply acts as a container for the various pieces of a transaction (request, response, session, servlet, and application), and passes messages to them (through methods).

Creating a Response

A response is created with Application.createResponseInTransaction. An HTTPResponse is created, and again Transaction acts as a container.

Finding the Servlet file

The Application asks for HTTPRequest.serverSidePath. This in turn goes right back to Application.serverSidePathForRequest, which tries to find the Servlet that corresponds to the requested URI.

Consider an example URI:
    http://www.server.com/cgi-bin/OneShot.cgi/Welcome

With the filename of the servlet, Application continues.

Dispatching on the Result of serverSidePath

When Application.dispatchRequest gets the resultant serverSidePath, it calls one of a couple methods:

Creating a Servlet

Application.handleGoodURL calls Application.createServletInTransaction. Like the path lookup, this method first looks for a cached Servlet. If it's found, it checks the timestamp on the cache and the source file, invalidating the cache if necessary.

If a cached Servlet wasn't found, or the cache was invalidated, it creates a new cache entry for the Servlet.*
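The timestamp check can be illustrated with a small sketch. This is only an assumption about the general shape of such a cache, not Webware's actual data structure; servlet_cache and cached_source are hypothetical names, and the cache stores source text rather than Servlet classes to keep the example short.

```python
import os
import tempfile

# Hypothetical cache keyed by file path; each entry remembers the file's mtime.
servlet_cache = {}

def cached_source(path):
    """Return (source, 'hit' or 'miss'), reloading when the file has changed."""
    mtime = os.path.getmtime(path)
    entry = servlet_cache.get(path)
    if entry is not None and entry['mtime'] == mtime:
        return entry['source'], 'hit'
    # Not cached, or the source file changed: (re)create the cache entry.
    with open(path) as f:
        source = f.read()
    servlet_cache[path] = {'mtime': mtime, 'source': source}
    return source, 'miss'

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, 'Welcome.py')
    with open(path, 'w') as f:
        f.write('# version 1\n')
    first = cached_source(path)    # nothing cached yet
    second = cached_source(path)   # unchanged file, served from the cache
    with open(path, 'w') as f:
        f.write('# version 2\n')
    newer = os.path.getmtime(path) + 10
    os.utime(path, (newer, newer))  # force a visibly newer timestamp
    third = cached_source(path)    # timestamp differs, cache is invalidated
```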

Application.getServlet creates the Servlet. The cache keeps a queue of available instances of the Servlet, which are reused when possible. (@@ what's up with the factories here? Oy, I need to figure this part out some more)

Waking the Transaction

Once the Transaction has a Servlet to work with, it calls Transaction.awake, Transaction.respond, and finally Transaction.sleep. Transaction in turn calls these methods on both the Session and the Servlet.
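A stripped-down sketch of this container-and-forwarder role might look like the following. The class names TransactionSketch and Recorder are invented for the example, and calling the Session before the Servlet is an assumption; the real Transaction class should be consulted for the actual order.

```python
class Recorder:
    """Hypothetical stand-in for a Session or Servlet; records each call."""
    def __init__(self, name, log):
        self.name = name
        self.log = log

    def awake(self, trans):
        self.log.append(self.name + '.awake')

    def respond(self, trans):
        self.log.append(self.name + '.respond')

    def sleep(self, trans):
        self.log.append(self.name + '.sleep')

class TransactionSketch:
    """Holds the pieces of a transaction and forwards messages to them."""
    def __init__(self, application, request):
        self.application = application
        self.request = request
        self.response = None
        self.session = None
        self.servlet = None

    def _forward(self, method_name):
        # Assumed order: session first, then servlet.
        for target in (self.session, self.servlet):
            if target is not None:
                getattr(target, method_name)(self)

    def awake(self):
        self._forward('awake')

    def respond(self):
        self._forward('respond')

    def sleep(self):
        self._forward('sleep')

log = []
trans = TransactionSketch(application=None, request={})
trans.session = Recorder('session', log)
trans.servlet = Recorder('servlet', log)
trans.awake()
trans.respond()
trans.sleep()
```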

HTTPServlet.awake doesn't do anything, unless you override it in a subclass (@@ why would you override it? What does it mean?)

Session.awake updates the session's last access time and its number of accesses.

Responding

HTTPServlet.respond is called with the transaction as its only argument. It calls a method based on the request type: 'GET', 'POST', 'PUT', 'DELETE', 'OPTIONS', 'TRACE'. 'GET' calls HTTPServlet.respondToGet, 'POST' respondToPost, etc., all with the transaction as an argument. The actual Servlet must override these methods to give the desired behavior.
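The dispatch on request type can be sketched as below. This is an illustration rather than the HTTPServlet source: the transaction is represented by a plain dictionary with hypothetical 'method' and 'output' keys, and raising NotImplementedError for an unhandled method is an assumption.

```python
class HTTPServletSketch:
    def respond(self, trans):
        # 'GET' -> respondToGet, 'POST' -> respondToPost, and so on.
        method = trans['method']
        handler = getattr(self, 'respondTo' + method.capitalize(), None)
        if handler is None:
            raise NotImplementedError('no handler for ' + method)
        handler(trans)

class Hello(HTTPServletSketch):
    # A concrete servlet overrides only the methods it wants to support.
    def respondToGet(self, trans):
        trans['output'].append('Hello from GET')

trans = {'method': 'GET', 'output': []}
Hello().respond(trans)
```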

Session.respond does nothing.

Page

Page (a subclass of HTTPServlet) has more interesting behavior. It is particularly directed towards generating HTML (HTTPServlet is entirely neutral), and consolidates a number of things.

Page.awake initializes a number of variables that will not change for the entire transaction (but may change if the Page is reused for other transactions).

Page.respondToGet and Page.respondToPost both call Page._respond, which looks for a field named '_action_' and dispatches based on that. '_action_' is translated by Page.methodNameForAction, and the result must be among the list returned by Page.actions (cached by Page._actionSet). If no '_action_' field is given, Page.writeHTML is run.
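In outline, the '_action_' dispatch works something like this sketch. The method names follow the description above, but the bodies are guesses: the identity mapping in methodNameForAction, the example actions 'save' and 'cancel', and the fallback to writeHTML for an unlisted action are all assumptions.

```python
class PageSketch:
    def actions(self):
        # The action names this page is willing to dispatch to.
        return ['save', 'cancel']

    def methodNameForAction(self, name):
        # Assumed to be the identity mapping here.
        return name

    def _respond(self, fields):
        action = fields.get('_action_')
        if action is not None:
            method_name = self.methodNameForAction(action)
            if method_name in self.actions():
                return getattr(self, method_name)()
        # No '_action_' field (or, by assumption, an unlisted one).
        return self.writeHTML()

    def save(self):
        return 'saved'

    def cancel(self):
        return 'cancelled'

    def writeHTML(self):
        return 'html'

page = PageSketch()
with_action = page._respond({'_action_': 'save'})
without_action = page._respond({})
```

Checking the name against the list from actions() is what keeps a request from invoking arbitrary methods on the Page.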

Page can also generate the HEAD, TITLE, and other elements of the HTML page. You can override Page.writeBody to generate content, and methods like Page.title() to generate other content. It's easiest just to look at Page.py to see these.

Writing a page

Page.write calls HTTPResponse.write with its arguments. HTTPResponse.write holds these strings in a list.
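A minimal sketch of this buffering, assuming the strings are simply joined at the end (ResponseSketch and its contents method are hypothetical names, not the HTTPResponse API):

```python
class ResponseSketch:
    def __init__(self):
        self._chunks = []

    def write(self, *args):
        # Hold each piece in a list instead of concatenating repeatedly.
        for arg in args:
            self._chunks.append(str(arg))

    def contents(self):
        return ''.join(self._chunks)

response = ResponseSketch()
response.write('<html>', '<body>')
response.write('Hello')
response.write('</body>', '</html>')
body = response.contents()
```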

The Return Path

Application

After having set up the request, we need to back out all the way to the browser.

After Application.dispatchRequest has called Application.handleGoodURL (which calls awake/respond/sleep), it will call HTTPResponse.deliver. (@@ what exactly is this supposed to do?) (@@ then Application.returnInstance is called... I don't understand it either) It then returns the transaction to RequestHandler (in AsyncThreadedAppServer.py).

Response

RequestHandler.handleRequest calls HTTPResponse.rawResponse, which returns a dictionary containing the keys 'headers' and 'contents'. Headers is a list of header/value pairs. For example:

[('Content-Type', 'text/html'),
 ('Set-Cookie', 'foo=bar')
]

RequestHandler.handleRequest then turns this into a normal CGI-style response, with header: value at the top, a blank line, and then 'contents'. It then deletes the transaction.
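The conversion from the rawResponse dictionary to a CGI-style string might be sketched like this. The function name format_cgi_response is made up, and the use of a bare newline as the line ending is an assumption.

```python
def format_cgi_response(raw):
    """Turn {'headers': [...], 'contents': ...} into header lines plus body."""
    lines = ['%s: %s' % (name, value) for name, value in raw['headers']]
    # Headers first, then a blank line, then the contents.
    return '\n'.join(lines) + '\n\n' + raw['contents']

raw = {
    'headers': [('Content-Type', 'text/html'), ('Set-Cookie', 'foo=bar')],
    'contents': '<html>Hello</html>',
}
text = format_cgi_response(raw)
```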

Adapter

Having waited patiently, RequestHandler will finally send the string constructed from the Response to the Adapter over the socket. The adapter will deal with it as appropriate. E.g., the CGI adapter prints the result to stdout.

Finished

The user sees the page, and it is good.


* The AppServer writes the hostname and port to a file address.text. The Adapter reads this file to determine where it can connect to the AppServer.
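As a sketch of how an Adapter might interpret that file, the following assumes the contents take the form host:port on a single line. That format is an assumption for the example, and parse_address is a hypothetical helper.

```python
def parse_address(text):
    """Split 'host:port' (assumed format) into a (host, port) pair."""
    host, port = text.strip().rsplit(':', 1)
    return host, int(port)

address = parse_address('127.0.0.1:8086\n')
```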

* Marshalling takes simple Python values -- strings, lists, numbers, etc., and puts them into a string representation.

* The request is a dictionary with the keys 'format', 'time', 'environ', and 'input':

'format'
    The only currently allowed value for 'format' is 'CGI'.
'time'
    A timestamp (seconds from the Unix Epoch).
'environ'
    A dictionary that looks like what os.environ would look like were this actually a CGI call -- that is, with keys like REQUEST_METHOD, QUERY_STRING, etc.
'input'
    The request that the browser made. This would be something like GET /Examples/View?filename=Welcome.py (@@ POST example too?)
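Putting the pieces together, a request dictionary and its trip through marshalling could look like the sketch below. It uses the marshal module from the Python standard library; the specific environ values are invented for illustration, and how the marshalled string is framed on the socket is not shown.

```python
import marshal
import time

# Illustrative request dictionary; the environ values are made up.
request = {
    'format': 'CGI',
    'time': time.time(),
    'environ': {
        'REQUEST_METHOD': 'GET',
        'PATH_INFO': '/Examples/View',
        'QUERY_STRING': 'filename=Welcome.py',
    },
    'input': 'GET /Examples/View?filename=Welcome.py',
}

# The sending side marshals the dictionary into a byte string...
wire_data = marshal.dumps(request)

# ...and the receiving side unmarshals it back into an equal dictionary.
received = marshal.loads(wire_data)
```

Marshalling only handles simple built-in values, which is why the request is kept as plain dictionaries, strings, and numbers.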

* The cache for the Servlet is used both for the file path lookup, and for the Servlet cache (i.e., two caches keyed by URL/PATH_INFO and by serverSidePath, but pointing to the same cached data). (@@ maybe some information on how the cache is stored)