From: Andrew S. <str...@as...> - 2004-12-15 02:49:15
Dear Francesc (and others),

I enclose a small patch (against CVS HEAD) which demonstrates how to
release the GIL during potentially long operations. I think all
potentially long operations (basically most of the C H5* functions)
should be bracketed by these statements. For example purposes, I've only
tracked down this single operation. If you'd like any more
explanation/justification/etc., please let me know and I'll do my best.

It's worth checking the Pyrex-outputted C file to ensure that there are
no calls to the C Python API between the BEGIN_ALLOW_THREADS and the
END_ALLOW_THREADS. I did this for this example. Other cases may require
that the C return value (e.g. an integer) is stored in an intermediate
variable (as in this case) so that the GIL can be acquired again before
raising a Python exception upon error return from C.

Hoping this (and similar for other potentially long operations) will make
it into PyTables,
Andrew

RCS file: /cvsroot/pytables/pytables/src/hdf5Extension.pyx,v
retrieving revision 1.150
diff -c -r1.150 hdf5Extension.pyx
*** src/hdf5Extension.pyx	9 Dec 2004 13:01:58 -0000	1.150
--- src/hdf5Extension.pyx	15 Dec 2004 02:35:44 -0000
***************
*** 116,121 ****
--- 116,125 ----
    char *PyString_AsString(object string)
    object PyString_FromString(char *)

+   # To release global interpreter lock (GIL) for threading
+   void Py_BEGIN_ALLOW_THREADS()
+   void Py_END_ALLOW_THREADS()
+
    # To access to str and tuple structures. This does not work with Pyrex 0.8
    # This is not necessary, though
    # ctypedef class __builtin__.str [object PyStringObject]:
***************
*** 1658,1666 ****
--- 1662,1677 ----
      if not self._open:
        self._open_append(recarr)

+     # release GIL (allow other threads to use the Python interpreter)
+     Py_BEGIN_ALLOW_THREADS
+
      # Append the records:
      ret = H5TBOappend_records(&self.dataset_id, &self.mem_type_id,
                                nrecords, self.totalrecords, self.rbuf)
+
+     # acquire GIL (disallow other threads from using the Python interpreter)
+     Py_END_ALLOW_THREADS
+
      if ret < 0:
        raise RuntimeError("Problems appending the records.")
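For other potentially long calls, the same bracketing pattern would look
roughly like the sketch below. This is only a sketch: "longop.h" and
my_long_c_operation() are hypothetical stand-ins for whichever H5* routine
is being wrapped, not real PyTables or HDF5 names. The points to copy are
that only pure C runs between the two macros and that the C status code is
held in a plain C variable, so the exception is raised only after the GIL
has been re-acquired.

# Sketch only: "longop.h" and my_long_c_operation() are hypothetical
# stand-ins for any potentially long C routine (e.g. an H5* call).
cdef extern from "Python.h":
    void Py_BEGIN_ALLOW_THREADS()
    void Py_END_ALLOW_THREADS()

cdef extern from "longop.h":
    int my_long_c_operation(int fd)

def long_operation(int fd):
    cdef int ret
    Py_BEGIN_ALLOW_THREADS         # release the GIL; other Python threads may run
    ret = my_long_c_operation(fd)  # pure C only between the two macros
    Py_END_ALLOW_THREADS           # re-acquire the GIL before touching Python again
    if ret < 0:
        # safe to raise here: the GIL is held again
        raise RuntimeError("my_long_c_operation failed")
    return ret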
From: Francesc A. <fa...@ca...> - 2004-12-15 13:53:44
Hi Andrew,

On Wednesday, 15 December 2004 03:49, Andrew Straw wrote:
> I enclose a small patch (against CVS HEAD) which demonstrates how to
> release the GIL during potentially long operations. I think all
> potentially long operations (basically most of the C H5* functions)
> should be bracketed by these statements.

Done. I've applied your patch and, in addition, looked at the places where
potentially long operations could happen and bracketed them with
BEGIN_ALLOW_THREADS/END_ALLOW_THREADS. After some work (some variables
passed to the C functions were Python objects), all the unit tests pass
fine. The patched version is already in CVS HEAD.

> For example purposes, I've only tracked down this single operation. If
> you'd like any more explanation/justification/etc., please let me know
> and I'll do my best.

I think it would be interesting if you could provide some (small) example
of how you can speed up your I/O by using threading. I could then put that
example (or something similar) in the "Optimization Tips" chapter of the
PyTables manual. By the way, do you think that doing multithreading on a
single-CPU machine would make some processes more efficient? I mean, if
you have one thread for reading and another for processing the read data,
perhaps you can get some speed-up (I doubt it, but that would just be
great).

Thanks for your contribution!

--
Francesc Altet
Who's your data daddy? PyTables
From: Andrew S. <str...@as...> - 2004-12-15 17:31:19
On Dec 15, 2004, at 5:53 AM, Francesc Altet wrote:
> Hi Andrew,
>
> On Wednesday, 15 December 2004 03:49, Andrew Straw wrote:
>> I enclose a small patch (against CVS HEAD) which demonstrates how to
>> release the GIL during potentially long operations.
> The patched version is already in CVS HEAD.

Super!

>> For example purposes, I've only tracked down this single operation. If
>> you'd like any more explanation/justification/etc., please let me know
>> and I'll do my best.
>
> I think it would be interesting if you could provide some (small) example
> of how you can speed up your I/O by using threading. I could then put
> that example (or something similar) in the "Optimization Tips" chapter of
> the PyTables manual. By the way, do you think that doing multithreading
> on a single-CPU machine would make some processes more efficient? I mean,
> if you have one thread for reading and another for processing the read
> data, perhaps you can get some speed-up (I doubt it, but that would just
> be great).

I will endeavor to write a demo WhenIHaveFreeTime (DubiousWiki link ;).

It may be clearer to say this patch improves "interactivity" rather than
"performance", even on single-CPU systems. To explain, imagine a case where
the program must A) acquire data from an instrument and provide a
low-latency data stream to something else (e.g. via ethernet), and B) save
all this data to a file. (One could further elaborate by including a
discussion of a GUI, but that's extraneous to the fundamental point.) Even
if task A is performed in a thread separate from the thread for task B, the
disk writing will block A's thread (unless the GIL is released), resulting
in possibly dropped data and unnecessary latency increases. Because writing
to disk takes a potentially long time to complete but doesn't require the
Python interpreter, it's better to avoid this situation and let A continue
while B is ongoing by releasing the GIL in B's thread.

All of this holds true for a single-CPU system, because the blocking
involved in saving to disk is often not related to CPU usage but to things
like waiting for the drive (disk access is usually I/O bound, not CPU
bound). In fact, Python's GIL is a major obstacle to writing multi-threaded
programs that make use of multiple CPUs, so I envision this patch having
perhaps a more significant impact on single-CPU machines.

Note that PyTables' realtime compression does make "disk access" much more
CPU bound. Thus, releasing the GIL may not result in speed increases on
single CPUs with compression turned on. I haven't dug into the PyTables
internals very much, but if the (de)compression can be done after releasing
the GIL, I predict this would improve performance on a multi-CPU system. In
that case, one CPU can busy itself with compressing data while another CPU
performs other tasks.

So, I hope this example is relatively clear. As I said, I will endeavor to
write a simple (as possible) example.

Thanks again for PyTables!

Cheers!
Andrew
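A minimal sketch of the two-thread scenario described above might look like
the following. It is Python 2 of the period and uses the then-current
PyTables calls (openFile, createTable, Table.row); instrument_read() is a
dummy stand-in for the real data source, and the file, table, and column
names are made up for the example.

import threading
import Queue
import time

import tables


class Sample(tables.IsDescription):
    timestamp = tables.Float64Col()
    value     = tables.Float64Col()


def instrument_read():
    # Stand-in for the real instrument: returns a timestamped dummy reading.
    time.sleep(0.001)
    return (time.time(), 42.0)


def writer(q, table):
    # Task B: all blocking disk I/O happens in this thread.  Because the
    # GIL is released inside the C-level append, task A below keeps running
    # while this thread waits on the disk.
    row = table.row
    while True:
        item = q.get()
        if item is None:            # sentinel: acquisition is finished
            break
        row['timestamp'], row['value'] = item
        row.append()
        table.flush()               # a real program would batch these


h5file = tables.openFile("acquired.h5", mode="w")
table = h5file.createTable("/", "samples", Sample, "instrument samples")

q = Queue.Queue()
wt = threading.Thread(target=writer, args=(q, table))
wt.start()

for i in range(10000):              # task A: low-latency acquisition loop
    q.put(instrument_read())

q.put(None)                         # tell the writer thread to finish
wt.join()
h5file.close()

Without the GIL being released around the HDF5 append, the acquisition loop
would stall every time the writer thread blocked on the disk; with it
released, only the writer waits.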
From: Francesc A. <fa...@ca...> - 2004-12-15 18:43:51
On Wednesday, 15 December 2004 18:31, Andrew Straw wrote:
> It may be clearer to say this patch improves "interactivity" rather
> than "performance", even on single-CPU systems. To explain, imagine
[...]
> usually I/O bound, not CPU bound). In fact, Python's GIL is a major
> obstacle to writing multi-threaded programs that make use of
> multiple CPUs, so I envision this patch having perhaps a more
> significant impact on single-CPU machines.

Aha, very good explanation. That was very interesting to learn about.

> Note that PyTables' realtime compression does make "disk access" much
> more CPU bound. Thus, releasing the GIL may not result in speed
> increases on single CPUs with compression turned on. I haven't dug
> into the PyTables internals very much, but if the (de)compression can
> be done after releasing the GIL, I predict this would improve
> performance on a multi-CPU system. In that case, one CPU can busy
> itself with compressing data while another CPU performs other tasks.

I'm afraid that compression/decompression is driven by HDF5 itself, so,
with the current patch, it will happen in the same thread as the one doing
the real I/O. However, as you already said, PyTables' online compression is
more "disk access" bound than CPU bound, especially when fast compressors
(e.g. LZO) or fast CPUs are used. With the advent of newer CPUs the
bottleneck will fall more and more on "disk access", so it may not be worth
the effort to try to make pure I/O and compression happen in different
threads.

> So, I hope this example is relatively clear. As I said, I will endeavor
> to write a simple (as possible) example.

That would be great. Thanks!

--
Francesc Altet
Who's your data daddy? PyTables
From: Andrew S. <str...@as...> - 2004-12-15 19:08:40
> I'm afraid that compression/decompression is driven by HDF5 itself, so,
> with the current patch, it will happen in the same thread as the one
> doing the real I/O.

This is actually good news from the perspective of a multi-CPU system (or
perhaps even one with HyperThreading). It means that substantial CPU
processing can occur simultaneously in two threads, because one of them
(the thread doing compression) does not need the Python GIL, thus allowing
the other thread to continue along in Python.