On Dec 15, 2004, at 5:53 AM, Francesc Altet wrote:
> Hi Andrew,
> On Wednesday, 15 December 2004 03:49, Andrew Straw wrote:
>> I enclose a small patch (against CVS HEAD) which demonstrates how to
>> release the GIL during potentially long operations.
> The patched version is already in CVS HEAD.
>> For example purposes, I've only tracked down this single operation. If
>> you'd like any more explanation/justification/etc., please let me know
>> and I'll do my best.
> I think it would be interesting if you could provide a (small) example
> of how you can speed up your I/O by using threading. I can manage to put
> this example (or a similar one) in the "Optimization Tips" chapter of the
> PyTables manual. By the way, do you think that doing multithreading on a
> single-CPU machine would make some processes more efficient? I mean, if
> you have a thread for reading and another for processing the read data,
> perhaps you could get some speed-up (while I doubt it, that would be just
> great).
I will endeavor to write a demo WhenIHaveFreeTime (DubiousWiki link ;).
It may be clearer to say this patch improves "interactivity" rather
than "performance" (even on single-CPU systems). To explain, imagine
a case where the program must A) acquire data from an instrument and
provide a low-latency data stream to something else (e.g. via
Ethernet) and B) save all this data to a file. (One could further
elaborate by including a discussion of a GUI, but that's extraneous to
the fundamental point.)
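The A/B scenario can be sketched in plain Python with a queue between the two tasks. This is only an illustration under stated assumptions, not PyTables code: the instrument, the network stream and the disk are all simulated, and `time.sleep()` (which releases the GIL) stands in for a disk write that releases the GIL. The names `acquire`, `write_to_disk` and `run` are hypothetical.

```python
import queue
import threading
import time

SENTINEL = None  # tells the writer that acquisition has finished


def acquire(n_samples, period_s, to_disk, stream):
    """Task A: read a (simulated) instrument at a fixed rate.

    Each sample goes straight to `stream` (standing in for a low-latency
    network send) and is handed to task B through the `to_disk` queue.
    """
    for i in range(n_samples):
        sample = (i, time.perf_counter())  # hypothetical instrument reading
        stream.append(sample)              # low-latency path, never waits on disk
        to_disk.put(sample)                # unbounded queue: put() does not block
        time.sleep(period_s)               # wait for the next acquisition slot
    to_disk.put(SENTINEL)


def write_to_disk(to_disk, saved, write_cost_s):
    """Task B: drain the queue and 'save' each sample.

    time.sleep() releases the GIL, so it plays the role of a slow write
    that lets task A keep running while it waits on the drive.
    """
    while True:
        sample = to_disk.get()
        if sample is SENTINEL:
            break
        time.sleep(write_cost_s)  # simulated GIL-releasing disk write
        saved.append(sample)


def run(n_samples=50, period_s=0.002, write_cost_s=0.005):
    to_disk = queue.Queue()
    stream, saved = [], []
    writer = threading.Thread(target=write_to_disk,
                              args=(to_disk, saved, write_cost_s))
    writer.start()
    acquire(n_samples, period_s, to_disk, stream)  # task A in the main thread
    writer.join()
    return stream, saved
```

Because the simulated writer is slower than the acquisition rate, the queue absorbs the backlog while task A keeps to its schedule; if each write held the GIL instead, A's loop would stall for the duration of every write.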
Even if task A is performed in a thread separate from the thread for
task B, the disk writing will block A's thread (unless the GIL is
released), resulting in possible dropped data and unnecessary latency
increases. Because writing to disk takes a potentially long time to
complete but doesn't require the Python interpreter, it's better to
avoid this situation and let A continue while B is ongoing by releasing
the GIL in B's thread. All of this holds true for a single-CPU system,
because the blocking involved in saving to disk is often unrelated to
CPU usage and instead comes from things like waiting for the drive (disk
access is usually I/O bound, not CPU bound). In fact, Python's GIL is a
major obstacle to writing multi-threaded programs that make use of
multiple CPUs, so I envision this patch having a perhaps more significant
impact on single-CPU machines.
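To see the effect of holding versus releasing the GIL without PyTables, one can call the same blocking C function through `ctypes` in two ways: functions loaded via `ctypes.CDLL` release the GIL during the call, while those loaded via `ctypes.PyDLL` keep holding it. The sketch below assumes a POSIX system whose C library exports `usleep()` and a standard (GIL) CPython build. The blocking `usleep()` stands in for a long disk write, and `ticks_during` is a hypothetical helper that counts how often a second thread gets to run in the meantime.

```python
import ctypes
import ctypes.util
import threading
import time

# Assumption: POSIX libc with usleep(); None falls back to the process's
# own symbol table, which includes libc on Linux.
_LIBC_NAME = ctypes.util.find_library("c")


def _load_usleep(loader):
    """Load libc's usleep() through `loader` (ctypes.CDLL or ctypes.PyDLL)."""
    fn = loader(_LIBC_NAME).usleep
    fn.argtypes = [ctypes.c_uint]
    fn.restype = ctypes.c_int
    return fn


# CDLL releases the GIL around each foreign call; PyDLL keeps holding it.
usleep_releasing_gil = _load_usleep(ctypes.CDLL)
usleep_holding_gil = _load_usleep(ctypes.PyDLL)


def ticks_during(blocking_call, usec=300_000):
    """Count heartbeat ticks another thread manages while blocking_call runs."""
    ticks = 0
    stop = threading.Event()

    def heartbeat():
        nonlocal ticks
        while not stop.is_set():
            ticks += 1
            time.sleep(0.01)  # needs the GIL back to continue after waking

    t = threading.Thread(target=heartbeat)
    t.start()
    time.sleep(0.05)          # let the heartbeat get going
    before = ticks
    blocking_call(usec)       # stands in for a long C-level disk write
    after = ticks
    stop.set()
    t.join()
    return after - before
```

Running `ticks_during(usleep_releasing_gil)` and `ticks_during(usleep_holding_gil)` should give a far larger count for the first call than for the second, which ought to stay near zero; the exact numbers depend on the machine.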
Note that PyTables' real-time compression does make "disk access" much
more CPU bound. Thus, releasing the GIL may not result in speed
increases on single-CPU systems with compression turned on. I haven't
dug into the PyTables internals very much, but if the (de)compression
can be done after releasing the GIL, I predict this would improve
performance on a multi-CPU system. In that case, one CPU can busy
itself with compressing data, while another CPU can perform other work.
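The compression point can be illustrated with the standard library alone. CPython's `zlib` module releases the GIL while it deflates data (an implementation detail of CPython, not something PyTables is confirmed to do), so compressing independent chunks from a thread pool can keep several CPUs busy at once. This is a minimal sketch; `compress_chunks` is a hypothetical helper, and splitting the data into independent chunks is an assumption made for illustration.

```python
import zlib
from concurrent.futures import ThreadPoolExecutor


def compress_chunks(chunks, level=6, workers=None):
    """Compress each chunk in a thread pool, returning results in input order.

    zlib.compress() drops the GIL while deflating in CPython, so on a
    multi-CPU machine several chunks can be compressed concurrently.
    """
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(lambda c: zlib.compress(c, level), chunks))
```

On a single CPU this gives no speed-up, which matches the point above; on a multi-CPU machine, timing it against a plain loop over `zlib.compress` would show whether the overlap helps.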
So, I hope this example is relatively clear. As I said, I will endeavor
to write as simple an example as possible.
Thanks again for PyTables!