From: Andrew S. <str...@as...> - 2004-12-15 17:31:19
On Dec 15, 2004, at 5:53 AM, Francesc Altet wrote:

> Hi Andrew,
>
> On Wednesday 15 December 2004 03:49, Andrew Straw wrote:
>> I enclose a small patch (against CVS HEAD) which demonstrates how to
>> release the GIL during potentially long operations.
>
> The patched version is already in CVS HEAD.

Super!

>> For example purposes, I've only tracked down this single operation.
>> If you'd like any more explanation/justification/etc., please let me
>> know and I'll do my best.
>
> I think it would be interesting if you can provide some (small)
> example on how you can speed up your I/O by using threading. I can
> manage to put that example (or similar) in the chapter of
> "Optimization Tips" in PyTables manual. By the way, do you think that
> doing multithreading on a single-CPU machine would make some processes
> more efficient? I mean, if you have a thread for reading and another
> for processing the read data, perhaps you can get some speed-up (while
> I doubt it, that would be just great).

I will endeavor to write a demo WhenIHaveFreeTime (DubiousWiki link ;).

It may be clearer to say that this patch improves "interactivity"
rather than "performance" (even on single-CPU systems).

To explain, imagine a case where the program must A) acquire data from
an instrument and provide a low-latency data stream to something else
(e.g. via Ethernet) and B) save all this data to a file. (One could
further elaborate by including a discussion of a GUI, but that's
extraneous to the fundamental point.) Even if task A runs in a
different thread from task B, the disk write will block A's thread
unless the GIL is released, resulting in possible dropped data and
unnecessary latency increases. Because writing to disk can take a long
time to complete but doesn't require the Python interpreter, it's
better to avoid this situation by releasing the GIL in B's thread, so
that A can continue while B's write is in progress.

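A minimal sketch of the A/B scenario above, using only the Python
standard library rather than PyTables itself. All names here
(acquire, save, run_demo) are hypothetical and chosen for
illustration. It relies on the fact that CPython releases the GIL
around blocking calls such as os.fsync, which is the behaviour the
patch aims to give PyTables' long operations:

```python
import os
import tempfile
import threading
import time


def acquire(stop, ticks):
    """Task A: stand-in for low-latency acquisition; one timestamp per cycle."""
    while True:
        ticks.append(time.perf_counter())
        # Event.wait() returns True once the stop event has been set.
        if stop.wait(0.001):
            break


def save(path, n_chunks, chunk):
    """Task B: write chunks to disk, forcing each one out with fsync."""
    written = 0
    with open(path, "wb") as f:
        for _ in range(n_chunks):
            written += f.write(chunk)
            f.flush()
            os.fsync(f.fileno())  # blocking call; the GIL is released here
    return written


def run_demo(n_chunks=20, chunk_size=1 << 20):
    """Run A and B concurrently and report how evenly A kept ticking."""
    stop = threading.Event()
    ticks = []
    result = {}
    fd, path = tempfile.mkstemp()
    os.close(fd)
    payload = b"\0" * chunk_size

    def writer():
        result["bytes_written"] = save(path, n_chunks, payload)

    thread_a = threading.Thread(target=acquire, args=(stop, ticks))
    thread_b = threading.Thread(target=writer)
    thread_a.start()
    thread_b.start()
    thread_b.join()
    stop.set()
    thread_a.join()
    os.remove(path)

    # Largest pause between consecutive cycles of task A.
    gaps = [t1 - t0 for t0, t1 in zip(ticks, ticks[1:])]
    result["ticks"] = len(ticks)
    result["max_gap"] = max(gaps, default=0.0)
    return result


if __name__ == "__main__":
    print(run_demo())
```

The number to look at is max_gap, the longest pause seen by task A.
If the writer held the GIL for the whole of each write, that pause
would be expected to grow toward the duration of a write; the actual
figures depend on the disk and the operating system, so none are
claimed here.
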
All of this holds true for a single-CPU system because the blocking
involved in saving to disk is often unrelated to CPU usage and is
instead caused by things like waiting for the drive (disk access is
usually I/O bound, not CPU bound). In fact, Python's GIL is a major
obstacle to writing multi-threaded programs that make use of multiple
CPUs, so I envision this patch having perhaps a more significant impact
on single-CPU machines.

Note that PyTables' real-time compression does make "disk access" much
more CPU bound. Thus, releasing the GIL may not result in speed
increases on single-CPU machines with compression turned on. I haven't
dug into the PyTables internals very much, but if the (de)compression
can be done after releasing the GIL, I predict this would improve
performance on a multi-CPU system. In that case, one CPU can busy
itself with compressing data while another CPU performs other tasks.

So, I hope this example is relatively clear. As I said, I will endeavor
to write an example that is as simple as possible.

Thanks again for PyTables!

Cheers!
Andrew