I'm in the process of moving from mod_webkit to using the built-in
HTTP server (because my hosting co wants to run my webapp behind
My service supports uploads of some very large files. This worked
fine under mod_webkit, which is what we had been using. However,
things are blowing up with the built-in HTTP server. I did some
investigating, and discovered that, for *any* request with a separate
body (a POST, a multipart/form, etc), WebKit.HTTPServer.HTTPHandler
is reading the entire body into memory, then wrapping it in a
StringIO instance before handing it off to the Application class.
This is a disaster for large files (say, anything over 50 MB), and it
just seems like an odd design choice all around.
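To make the difference concrete, here's a minimal sketch in Python 3 (so io.BytesIO stands in for StringIO). This is not WebKit's actual code: `buffered_body`, `stream_body` and the 64 KB chunk size are hypothetical names and values for illustration only. The first function mirrors what HTTPHandler does now. The second copies exactly Content-Length bytes in bounded chunks, so memory use stays roughly constant however big the upload is. My patch below doesn't do this copying itself; it just hands rfile through and lets cgi.py do the reading. The sketch only shows the general principle.

```python
import io

CHUNK_SIZE = 64 * 1024  # hypothetical chunk size; any modest value works


def buffered_body(rfile, content_length):
    """Roughly what the current handler does: the whole body lands in memory."""
    data = rfile.read(content_length)  # allocates content_length bytes at once
    return io.BytesIO(data)            # ...and wraps them for the application


def stream_body(rfile, content_length, sink, chunk_size=CHUNK_SIZE):
    """Copy exactly content_length bytes from rfile to sink, chunk by chunk.

    Peak memory is about chunk_size, however large the upload is.
    Returns the number of bytes actually copied.
    """
    remaining = content_length
    copied = 0
    while remaining > 0:
        chunk = rfile.read(min(chunk_size, remaining))
        if not chunk:  # client closed the connection before sending everything
            break
        sink.write(chunk)
        copied += len(chunk)
        remaining -= len(chunk)
    return copied


if __name__ == "__main__":
    body = b"x" * (3 * CHUNK_SIZE + 10)
    # Bytes after the body must stay unread, as on a keep-alive connection.
    rfile = io.BytesIO(body + b"NEXT REQUEST")
    sink = io.BytesIO()  # stands in for a temp file on disk
    print(stream_body(rfile, len(body), sink) == len(body))  # True
    print(rfile.read())  # b'NEXT REQUEST'
```

Bounding each read by the remaining length matters as much as the chunking: it keeps the handler from consuming bytes that belong to whatever the client sends next.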
After looking carefully at HTTPHandler, I think I've found a way to
avoid holding that file in memory. I wanted to ask the list if there
was a reason for the current design that I'm missing. If not, I'd
propose the patch below as an improvement to the built-in HTTP
server. Here's what I'm doing:
In WebKit/HTTPServer.py, l. 58 reads the whole body into memory from
rfile (a file-like wrapper around the connection from the client,
positioned at the start of the body):
    input = self.headers.has_key('Content-Length') \
Which is then wrapped in StringIO and passed on to the app on l. 140:
I changed that to (again, l. 58):
    if self.headers.has_key('Content-Length'):
        input = self.rfile
        env['CONTENT_LENGTH'] = self.headers['Content-Length']
    else:
        input = StringIO('')
And then just passed that input in directly on l. 140:
I had to set the CONTENT_LENGTH in the environment dict so that
cgi.py could accurately parse it. Without that, it just locked up.
With it, it seems to be working fine.
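As far as I can tell, the lock-up comes from cgi.py having no way to know where the body ends. For a urlencoded POST it calls read() with the length it gets from CONTENT_LENGTH. With no length, it reads until EOF, and on a still-open connection EOF never arrives, because the client is waiting for a response. Here's a minimal sketch of the same idea as a wrapper that reports EOF after Content-Length bytes. `LimitedReader` is a hypothetical name of my own, not part of WebKit or the standard library.

```python
import io


class LimitedReader:
    """Hypothetical file-like wrapper that reports EOF after `length` bytes.

    It makes explicit what CONTENT_LENGTH tells a parser: where the body ends.
    """

    def __init__(self, raw, length):
        self.raw = raw
        self.remaining = length

    def read(self, size=-1):
        if self.remaining <= 0:
            return b""  # body exhausted: behave like EOF instead of blocking
        if size is None or size < 0 or size > self.remaining:
            size = self.remaining  # never read past the end of the body
        data = self.raw.read(size)
        self.remaining -= len(data)
        return data

    def readline(self, size=-1):
        if self.remaining <= 0:
            return b""
        if size is None or size < 0 or size > self.remaining:
            size = self.remaining
        line = self.raw.readline(size)
        self.remaining -= len(line)
        return line


if __name__ == "__main__":
    # Simulated connection: the body is followed by bytes the parser must not
    # touch (on a real open socket, reading past the body would block).
    conn = io.BytesIO(b"name=value&x=1" + b"GET /next HTTP/1.1\r\n")
    body = LimitedReader(conn, 14)  # 14 == Content-Length of the body
    print(body.read())  # b'name=value&x=1'
    print(body.read())  # b''
```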
Basically, I'm just removing an extra step of reading the body into
memory and wrapping it in StringIO, which, as far as I can tell,
serves no purpose and, for large request bodies, is crippling.
Does anyone know why it's set up the way it is now?