Good, good... These are important tests. Let's see if we can explain
these two, and, of course, fix things if necessary. It would be nice to
see the actual scripts, but here's my guess based on what you have
Let's deal with the mod_python case first. In this case, the Spyce
engine is all loaded up. The Spyce script is compiled and ready to go.
Therefore, the slowness is coming as the script is running. There is
some operation that you are performing repeatedly, and this operation is
slower in Spyce than in PHP. Based on your description, I would say that
this is 'print'. Why is 'print' slow? Or, first, how does 'print' work?
When you call print, what really happens is that it makes calls to
sys.stdout.write(), flush() and also sets the softspace variable (to
determine when to create that all-important additional space). Now,
that's the source of problem. To be specific, the problem is that you
don't want to reload the sys module on every request. You don't want to
reload anything on every request, which leaves us with a question of how
to share the one and only sys.stdout across many requests, and the
requests might occur concurrently. We certainly don't want the output of
one response to go to another!
So, here's how we currently solve the problem. We replace sys.stdout
with either an instance of spyceUtil.NonThreadedWriter or
spyceUtil.ThreadedWriter depending on which concurrency mode you choose.
The NonThreaderWriter simply takes all write, flush, close requests and
forwards them on to the underlying stream. There's actually not even a
function call, because we replace the write() function of ThreadWriter
with the write() function of the underlying output object. You can call
setObject() on either of the writers to set this output object, and
that's exactly what the response module does before and after each
request. This is setup is pretty efficient, but apparently not efficient
enough for you. ;) (see below for why and for improvements)
However, the real inefficiency (which is unavoidable) comes in if you
use the ThreadedWriter. This implementation performs a dictionary lookup
using the current thread as a key to find the appropriate underlying
object to use. The hope is that you make up for this inefficiency by
actually using the hardware parallelism available. If you don't have
hardware parallelism to exploit, then just don't use the threaded
engine. There's no alternative to the dictionary lookup (within pure
Python, currently, at least), because 'print' always goes for that
common sys.stdout hook, and that's shared across the threads. We need
this lookup to get a thread-local output.
Now, before I go on to discuss performance improvements within the Spyce
engine, there are two things that you can do improve the performance of
your script without any changes to Spyce. The first is not to call print
so many times. It's not characteristic of real-world operation. You're
stressing the I/O path far too much in this benchmark. The I/O path not
only has to go through the Spyce magic, but it also needs to go through
mod_python to get back to Apache. Clearly, PHP has an advantage here,
whether or not it is compiling the scripts a priori. So, try to write()
larger strings and the performance gap should shrink. The second way to
improve performance without any change to Spyce is to use
response.write('x '), instead of print 'x',. The former goes directly
for the response object, rather than via the NonThreadedWriter sitting
at sys.stdout to the response object. That's not going to cost you much
in the single-threaded case. More importantly, you skip all the
'softspace' logic that 'print' performs. 'print' is not only making two
calls to the underlying output stream for every print (one for the
string and one for the space), but also doing a get and a set on the
softspace field in the underlying output object. If you have a look at
NonThreadedWriter, you'll see that I go great pains to make this as
efficient as possible. But, still there's an overhead. And, that's where
your additional time is being spent... on a string comparison.
I spent some time looking at this problem, and realised that we can do
away with the NonThreadedWriter entirely in the single threaded case.
(In the threaded case, nothing changes.) We can simply *directly*
replace the sys.stdout with the internal response object by making the
Spyce server a tad smarter. (Before, the core Spyce server code treated
the threaded and non-threaded cases the same way and polymorphism took
care of the rest.) I have coded up this change, and tested it on three
small scripts, which I think are similar to yours in their effect on the
Spyce I/O runtime.
The first test is the standard hello.spy. (for comparison)
The second test is:
for i in range(1000):
The third test is:
for i in range(1000):
Note that, in test3.spy, you can replace response with sys.stdout, if
you are more comfortable with that, and the results do not change.
Here are the performance numbers against SpyceWWW using Spyce v1.3.12
and the new implementation. I measured using 'ab' with 500 requests on
localhost. hello.spy is provided only so that you can normalize and
compare against results on your machine. It is clearly doing far less
work than the other two tests.
req/sec t/req(s) req/sec t/req(s)
hello.spy 204.98 4.878 206.86 4.834 (basically equal)
test2.spy 7.51 133.185 22.84 43.784 (3x faster)
test3.spy 61.55 16.247 68.49 14.601 (basically equal)
Ok, so your test looks most like test2.spy. And, test2.spy is where we
are seeing all the performance gain, because we eliminate the string
comparisons for all the 'softspace' operations. In other words, you
should see a three-fold performance improvement for your test (if I
understood it correctly from your description). However, note how much
faster test3.spy is, b/c it basically performs half the number of I/O
operations, does not do any 'softspace' operations, and calls the
response object directly. That does not mean you should not use 'print'.
Just don't exercise the output path one character at a time. That's a
bad idea in C. It's an even worse idea in a scripting language. It kills
your throughput, which is what you are measuring.
Now, onto your second test. The 2-seconds of "think time" is simple to
explain. Part of it comes from loading up the Spyce engine into the
Python runtime. You also need to initialize the Spyce engine, which may
include reading a configuration file. You have to *look* for the config
file, at the very least. There are other, similar one-time operations.
Finally, you need to compile the script. The script compilation is not
terribly efficient, and it is proportional to the size of the script.
You have a 20KB script, so that will take a bit of time. As a
comparison, the docs/doc-single.spy file is 178KB and takes around 4.3
second to compile and process on my machine. (It's actually processed
twice, to generate the table of contents.) Also, it is considerably
faster than JSP compilers that I have seen! And, it only occurs once, if
you've set up Spyce correctly. Finally, you run the script, and that
takes some time. Under mod_python you *only run* the script. The
compiled script is cached in memory and does not get recompiled on each
hit. If you can't run under mod_python, or some other long-lived server
mode, then you can use a disk-based cache for your compiled scripts.
Refer to the configuration section in the manual on how to do this.
I hope that helps. The changes to improve single-threaded 'print'
performance are in the CVS. Please have a look at them and let me know
what you think. I will release once these changes have been tested under
multiple configurations. Thanks for the feedback.
All the best,
On Sun, 11 Jan 2004, Klaus Reimer wrote:
>The statement is "Spyce is fast enough" but I could not resist to do
>some benchmarking myself and I have the odd feeling, that I'm doing
>something wrong because I can't believe that Spyce is THAT slow. I'll
>explain, what I've done in hope you can tell me what is wrong:
>1. I'm using a standard Debian Woody installation (apache 1.3.26,
>mod_python 2.7.8, python 2.1.3) and installed the latest Spyce on it.
>I've added Space to the httpd.conf in that way:
>AddHandler python-program .spy
>Requesting a .spy file results in the header line "X-Spyce:
>Spyce/modpy_1.3.12 Python/2.1" so I think it's correctly installed with
>mod_python which should give me maximum performance.
>My server is a Pentium Celeron 700 MHz. I've used "ab" for benchmarking
>and fired 1000 Requests.
>My first benchmark was the same test-program which was used for
>benchmarking on spyce.sourceforge.net. PHP (4.1.2) gives me 140
>Requests/sec, Spyce gives my 60 Requests/s. This is a little bit slower
>than the benchmark on the spyce website, but it's ok.
>Then I started a second benchmark to prove that spyce will be faster in
>comparison to PHP if the scripts are larger and there are some include
>files (Because PHP does no compilation and no caching). So I created
>six scripts. Each script ist 20 KB large, contains 50% comments and the
>other 50% are print/echo-Commands which are just outputing one
>character per line (This is not a real application, I know). The main
>script includes the five other scripts. PHP responds with 35 Requests
>per second. Spyce repsonds with just 5 Requests. But a lot more
>disturbing is the fact, that the CPU usage is at 100% during the
>Spyce-benchmark. The CPU usage during the PHP benchmark is at normal
>level (ca. 2%). What the hell is going on here?
>Up to now I'm using python on the command line and never experienced
>bad performance so I tried another thing: Comparing Spyce with Python
>on the command line. I changed the 5 include files to normal python
>modules and created a similiar main python file which does the same
>thing as the main spy-file. Then I compared these two commands:
> # python index.py
> # /usr/share/spyce/run_spyceCmd.py index.spy
>The first command (python) is responding immediately. "time" says:
>But the second command is just doing nothing for 2 seconds before the output
>comes. "time" says:
>Any ideas what I'm doing wrong? What is Spyce doing in these 2 seconds
>eating up my CPU power? Loading all the spyce libraries (Which are
>already compiled in .pyo files)? Is this also done during each
>I know my python version is pretty old, I also played a little bit with
>apache2, the latest mod_python and python 2.2, but the performance
>"feels" the same (not benchmarked it).