From: Matt F. <ma...@pa...> - 2002-07-29 21:30:56
We have run into a codebase management problem: we have created a servlet codebase that we want multiple deployments to share as gracefully as possible.

Right now, it's very easy for WebKit applications to share non-servlet modules and packages in the standard Pythonic ways (MiddleKit, cStringIO, PIL, etc.). It's NOT correspondingly easy for WebKit applications to share servlets, so large, shared servlet codebases are quite difficult to manage.

Imagine a WebKit app that provides an "on-line storefront". We want the same code to be used by one company that sells fastening hardware and by another that sells plush toys. Each of these companies has its own context (or even its own app server). If we get a third customer (one that sells tea, for instance), we want to be able to reuse the same codebase a third time. And we want to keep making improvements to that codebase that all three customers benefit from.

Given the requirement of separate app servers (so, no, we don't want to use mod_rewrite), stock WebKit provides only two real options for making this happen:

1) Physically replicate all the servlet files for each deployment. Each deployment has a complete set of code and works fine, but updating the codebase becomes unpleasant because every change has to be applied by hand to each and every deployment. The more deployments, the more pain. Any customizations unique to a deployment make it hurt even more, because you have to manually edit around them.

2) Create abstract "master" servlets somewhere, then import and subclass them in a published context. This works, but now your context is littered with servlet subclasses that, for the most part, don't do anything useful themselves. Most exist only for the AppServer's convenience, because it expects to find a file for each requested URI.
   Furthermore, some of these servlets may implement customizations (overriding methods or whatever), and there's no way to tell which ones do without opening each file. In other words, there's no way to know whether a given file does something important.

Option one is what we have been doing. It sucks, empirically. Option two is what we're facing, and while it buys us an easily managed central codebase in the form of the master tree, all those subclassing servlets in the individual deployments continue to cause trouble.

What we WANT to do is banish every subclassing servlet that contains no custom code. That is the third option we are looking for.

The solution to our problem might lie in the way WebKit maps URIs to objects (servlets). Right now, it's a very strict mapping:

    URI --> <file>.py --> instance of the class named <file>

where <file>.py is a Python module, perhaps inside a package, that defines a class called <file>. The app server then creates an instance of this class, which is the running servlet. If a URI maps to a file that isn't there, WebKit currently returns a 404.

If, however, we could provide one (or more) ALTERNATIVE paths (packages) in which the app server could find files (modules), we would probably be set. If the URI mapped to index.py and there was no index.py in the context where it was expected, the app server could look for the module at the corresponding relative location in an alternate package tree (the master, as it happens) and instantiate the class from there:

    context "foo"             alternate path
    -------------             --------------
    [missing index.py]  --->  index.py

The critical point is that the alternate path is simply an alternate PHYSICAL LOCATION for a module; it has absolutely no namespace implications.
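To make the proposed lookup concrete, here is a rough sketch in modern Python 3. It is not WebKit's actual ServletFactory code (which isn't reproduced here); the function names `resolve_servlet_file` and `servlet_class_for`, and the idea of passing an ordered list of directories, are hypothetical. It keeps the strict URI --> <file>.py --> class <file> mapping, but tries each directory in turn, and it names the loaded module after the deployment's context, so a servlet physically loaded from the master tree still lives in the local namespace.

```python
import importlib.util
import os


def resolve_servlet_file(relative_uri, search_paths):
    """Return the path of the first <file>.py matching relative_uri.

    Hypothetical: search_paths is an ordered list of directories, the
    deployment's own context first, then the shared "master" tree(s).
    Returns None when no directory has the file, i.e. today's 404 case.
    """
    parts = [p for p in relative_uri.split("/") if p] or ["index"]
    if ".." in parts:
        return None  # never climb out of a search directory
    for base in search_paths:
        candidate = os.path.join(base, *parts[:-1], parts[-1] + ".py")
        if os.path.isfile(candidate):
            return candidate
    return None


def servlet_class_for(relative_uri, context_name, search_paths):
    """Load the servlet class for relative_uri, or return None (404).

    The module is named after the context (e.g. "foo.index") no matter
    which directory the file physically came from, so the alternate path
    has no namespace implications.
    """
    path = resolve_servlet_file(relative_uri, search_paths)
    if path is None:
        return None
    parts = [p for p in relative_uri.split("/") if p] or ["index"]
    module_name = ".".join([context_name] + parts)
    spec = importlib.util.spec_from_file_location(module_name, path)
    module = importlib.util.module_from_spec(spec)
    spec.loader.exec_module(module)
    # WebKit's convention: the class has the same name as the file.
    return getattr(module, parts[-1])
```

Caveats: this sketch does not register the module in sys.modules, cache loaded classes, or support relative imports inside the loaded file; a real patch would have to decide how to handle each of those.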
Going back to our "online storefront" example, each deployment could hypothetically have a completely EMPTY directory tree, save for one servlet subclass that provides the right database connection for that customer. If the plush toys store needed to deviate from the master in one of its servlets, we could subclass that servlet for that deployment. Everything else would be loaded from the shared pool. No file or code would exist in a deployment directory unless it was truly unique to that deployment.

From a cursory inspection of the servlet factory code in WebKit, a patch to search additional physical paths doesn't look terribly complicated. The main question is how the list of paths would get in there in the first place. The most elegant approach we have thought of to date is to change how application.config is parsed: if a context is mapped to a string, it works exactly as it does now; if the context is mapped to a list of strings, the app server iterates through that list as necessary until it finds a matching file. In theory a list could make this n levels deep, but in our case we only need two.

Would this implementation work? How can we ensure that the modules/namespaces stay "correct" (in the local namespace, not the master's)? What do we need to take into account when patching the code? Can we do the same thing for non-servlet URIs such as .jpg images and .css files? Are we crazy?

We know that some of you are working on other mappings, but they're not in Webware yet (and the Wiki is down again).
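The string-or-list configuration idea could be parsed roughly as follows. This is a hypothetical sketch: `normalize_contexts`, `resolve_file` and `EXAMPLE_CONTEXTS` are invented names, the example paths are made up, and WebKit's real Contexts handling (including its special 'default' entry) is not modeled. Because the lookup is purely about physical files, the same ordered search also covers static files such as .jpg or .css.

```python
import os

# Hypothetical Contexts setting: a string keeps today's meaning, a list is
# an ordered search path (local deployment first, shared master second).
EXAMPLE_CONTEXTS = {
    "PlushToys": ["/srv/plushtoys", "/srv/storefront-master"],
    "Admin": "/srv/admin",
}


def normalize_contexts(contexts):
    """Map each context name to an ordered list of directories."""
    normalized = {}
    for name, value in contexts.items():
        if isinstance(value, str):
            normalized[name] = [value]  # unchanged, single-directory behavior
        elif isinstance(value, (list, tuple)) and all(isinstance(p, str) for p in value):
            normalized[name] = list(value)  # any number of levels
        else:
            raise ValueError("context %r must map to a string or a list of strings" % name)
    return normalized


def resolve_file(relative_path, search_paths):
    """Return the first existing file for relative_path (servlet or static),
    searching the directories in order; None means 404."""
    parts = [p for p in relative_path.split("/") if p]
    if not parts or ".." in parts:
        return None  # refuse empty paths and attempts to climb out of a tree
    for base in search_paths:
        candidate = os.path.join(base, *parts)
        if os.path.isfile(candidate):
            return candidate
    return None
```

Normalizing once, when the config is read, means the rest of the lookup code only ever deals with lists, so a plain string context simply becomes a one-element search path.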