Re: [Pytables-users] threading/GIL


On Dec 15, 2004, at 5:53 AM, Francesc Altet wrote:

> Hi Andrew,
>
> On Wednesday, 15 December 2004 03:49, Andrew Straw wrote:
>> I enclose a small patch (against CVS HEAD) which demonstrates how to
>> release the GIL during potentially long operations.

> The patched version is already in CVS HEAD.

Super!

>> For example purposes, I've only tracked down this single operation. If
>> you'd like any more explanation/justification/etc., please let me know
>> and I'll do my best.
>
> I think it would be interesting if you could provide a (small) example of
> how you can speed up your I/O by using threading. I can put that example
> (or a similar one) in the "Optimization Tips" chapter of the PyTables
> manual. By the way, do you think that multithreading on a single-CPU
> machine would make some processes more efficient? I mean, if you have one
> thread for reading and another for processing the data read, perhaps you
> can get some speed-up (while I doubt it, that would be just great).

I will endeavor to write a demo WhenIHaveFreeTime (DubiousWiki link ;).

It may be clearer to say that this patch improves "interactivity" rather
than "performance" (even on single-CPU systems). To explain, imagine a
case where the program must A) acquire data from an instrument and
provide a low-latency data stream to something else (e.g. via Ethernet)
and B) save all this data to a file. (One could elaborate further by
including a discussion of a GUI, but that's extraneous to the
fundamental point.)
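A minimal sketch of that two-thread layout, using only the standard library. The instrument read and the network send are stand-ins (`read_sample` and `send_low_latency` are hypothetical helpers), and a plain binary file stands in for the PyTables file. The point is only that task A hands each sample to task B through a queue and never waits on the disk itself; whether A really keeps running while B writes still depends on B's write call releasing the GIL, which is what the patch adds on the PyTables side.

```python
import os
import queue
import struct
import tempfile
import threading

def read_sample(i):
    # Hypothetical stand-in for acquiring one reading from the instrument.
    return float(i)

def send_low_latency(sample, sink):
    # Hypothetical stand-in for streaming the sample out (e.g. over Ethernet).
    sink.append(sample)

def acquire(n_samples, to_disk, sent):
    # Task A: acquire data and forward it immediately; never touches the disk.
    for i in range(n_samples):
        sample = read_sample(i)
        send_low_latency(sample, sent)
        to_disk.put(sample)
    to_disk.put(None)  # sentinel: tells the writer that acquisition is finished

def write_all(path, to_disk):
    # Task B: drain the queue and save every sample as a little-endian double.
    with open(path, "wb") as f:
        while True:
            sample = to_disk.get()
            if sample is None:
                break
            f.write(struct.pack("<d", sample))

def run(n_samples, path):
    to_disk = queue.Queue()
    sent = []
    writer = threading.Thread(target=write_all, args=(path, to_disk))
    writer.start()
    acquire(n_samples, to_disk, sent)  # task A runs in the main thread
    writer.join()
    return sent

if __name__ == "__main__":
    path = os.path.join(tempfile.mkdtemp(), "samples.bin")
    sent = run(1000, path)
    print(len(sent), os.path.getsize(path))  # 1000 8000
```

The unbounded queue is a deliberate simplification: if the disk falls behind for a while, samples pile up in memory instead of stalling acquisition.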

Even if task A is performed in a thread separate from task B's, the disk
writing will block A's thread (unless the GIL is released), possibly
causing dropped data and unnecessary increases in latency. Because
writing to disk can take a long time to complete but doesn't require the
Python interpreter, it's better to avoid this situation by releasing the
GIL in B's thread, so that A can continue while B is ongoing. All of
this holds true for a single-CPU system, because the blocking involved
in saving to disk often has nothing to do with CPU usage, but rather with
things like waiting for the drive (disk access is usually I/O bound, not
CPU bound). In fact, Python's GIL is a major obstacle to writing
multi-threaded programs that make use of multiple CPUs, so I envision
this patch perhaps having a more significant impact on single-CPU
machines.
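One way to check this on a given machine is to have thread A record the longest gap between its own wake-ups while thread B is stuck in a long blocking call. In this sketch `time.sleep` stands in for waiting on the drive, since CPython releases the GIL while sleeping; a C extension call that held the GIL for the same duration would instead stretch A's longest gap to roughly the full blocking time. The helper name `worst_gap_during`, the 0.5 s duration and the 5 ms tick are all illustrative assumptions.

```python
import threading
import time

def worst_gap_during(blocking_call, tick=0.005):
    """Run blocking_call in thread B and return the longest pause (seconds)
    seen by thread A, which tries to wake up every `tick` seconds."""
    done = threading.Event()

    def task_b():
        blocking_call()
        done.set()

    b = threading.Thread(target=task_b)
    worst = 0.0
    last = time.perf_counter()
    b.start()
    while not done.is_set():
        time.sleep(tick)  # task A: periodic low-latency work
        now = time.perf_counter()
        worst = max(worst, now - last)
        last = now
    b.join()
    return worst

if __name__ == "__main__":
    # B waits for half a second without holding the GIL.
    gap = worst_gap_during(lambda: time.sleep(0.5))
    print(f"worst gap seen by A: {gap * 1000:.1f} ms")
```

With a GIL-releasing call in B, the worst gap should stay close to the tick length rather than approaching the half-second that B spends blocked.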

Note that PyTables' real-time compression does make "disk access" much
more CPU bound. Thus, releasing the GIL may not yield speed increases
on single-CPU systems with compression turned on. I haven't dug into
the PyTables internals very much, but if the (de)compression can be done
after releasing the GIL, I predict this would improve performance on a
multi-CPU system: one CPU can busy itself with compressing data while
another CPU performs other tasks.
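If the (de)compression does run without the GIL, chunks can be compressed on several CPUs at once. As a stand-in for PyTables' own compression path (whose internals aren't covered here), this sketch uses the standard `zlib` module, which in CPython releases the GIL around its deflate calls, together with a thread pool. The helper `compress_chunks`, the 1 MiB chunk size and the worker count are illustrative assumptions.

```python
import zlib
from concurrent.futures import ThreadPoolExecutor

def compress_chunks(chunks, workers=2, level=6):
    # Each zlib.compress call may run on its own CPU, because CPython's zlib
    # module drops the GIL while deflating. pool.map keeps the input order.
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(lambda c: zlib.compress(c, level), chunks))

if __name__ == "__main__":
    CHUNK = 1 << 20                    # 1 MiB per chunk (arbitrary)
    data = bytes(range(256)) * 40960   # 10 MiB of sample data
    chunks = [data[i:i + CHUNK] for i in range(0, len(data), CHUNK)]
    packed = compress_chunks(chunks, workers=4)
    # Round-trip check: decompressing in order reproduces the original data.
    assert b"".join(zlib.decompress(c) for c in packed) == data
    print(len(chunks), "chunks compressed")
```

Because the results come back in input order, they can be written out sequentially afterwards. On a single CPU this gives no speed-up, which matches the caveat above.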

So, I hope this example is relatively clear. As I said, I will endeavor
to write a demo that is as simple as possible.

Thanks again for PyTables!

Cheers!
Andrew