From: Andrew S. <str...@as...> - 2004-06-22 00:04:25
Francesc Alted wrote:

> On Saturday 19 June 2004 02:49, Andrew Straw wrote:
>
>> I am trying to save realtime video (640x480x100 fps uint8 grayscale
>> data) using PyTables for scientific purposes (lossy compression is
>> bad). So, first off, is using PyTables for this task a reasonable
>> idea? Although I'm no expert, it seems the compression algorithms that
>> PyTables offers may be ideal. It also may be nice to use HDF5 to
>> incorporate some data.
>
> I've never thought of such an application for PyTables, but I think that
> for your case (provided that you can't afford losing information) it may
> be just fine.
>
>> Using this code, I get approximately 4 MB/sec with no compression, and
>> MB/sec with complevel=1 UCL. This is with an XFS filesystem on Linux.
>
> Mmm... How much using UCL? Anyway, you may want to try LZO and ZLIB
> (with different compression levels) as well in order to see if this
> improves the speed.

Sorry, the process never completed while I was writing that email. Playing
around with hdparm, I can now get ~6.5 MB/sec with no compression; UCL,
LZO, and zlib all reduce that rate.

>> So, are there any suggestions for getting this to run faster?
>
> A couple:
>
> 1.- Ensure that your bottleneck is really the call to the .append()
> method by commenting it out and doing the timings again.

Actually, I'm timing purely the call to .append(), which often takes
seconds.

> 2.- The EArray.append() method does many checks to ensure that you pass
> an object compatible with the EArray being saved. If you are going to
> pass a *NumArray* object that you are sure is compliant with the
> underlying EArray shape, you can save quite a bit of time by calling
> ._append(numarrayObject) instead of .append(numarrayObject).
>
> If suggestion 2 is not enough (although I doubt it), things can be
> further sped up by optimizing the number of calls to the underlying HDF5
> library. However, this must be regarded as a commercial service only
> (but you can always do it yourself, of course!).

That does help a little... Anyhow, I think using PyTables/HDF5 is too slow
for this task -- I can easily save at ~50 MB/sec using .write() on plain
Python file objects, so I'll use that for now.

Finally, as a suggestion, you may want to incorporate the following code
into the C source for PyTables, which will allow other Python threads to
continue running while long-running HDF5 tasks are being performed. See
http://docs.python.org/api/threads.html for more information.

    PyThreadState *_save;

    _save = PyEval_SaveThread();
    /* Do work accessing the HDF5 library that does not touch the
       Python API */
    PyEval_RestoreThread(_save);

(The file_write function in Objects/fileobject.c in the Python source code
uses the Py_BEGIN_ALLOW_THREADS and Py_END_ALLOW_THREADS macros to achieve
the same thing, but due to the error handling in PyTables you'll probably
need two copies of "PyEval_RestoreThread(_save)": one in the normal return
path and one in the error handling path.)
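
P.S. For anyone trying the same thing, here is a stripped-down sketch of
the kind of EArray setup I'm describing. It is illustrative only: the
file name and the zeroed frame are placeholders, and the exact Atom /
createEArray signatures vary between PyTables versions.

    import numarray
    import tables

    # One extendible array holding all frames; the 0 in the shape marks
    # the dimension that grows with each append.  complib can be "zlib",
    # "lzo" or "ucl", as discussed above.
    h5file = tables.openFile("frames.h5", mode="w")
    filters = tables.Filters(complevel=1, complib="lzo")
    atom = tables.UInt8Atom(shape=(0, 480, 640))
    frames = h5file.createEArray(h5file.root, "frames", atom,
                                 "raw video frames", filters=filters)

    # Stand-in for a grabbed frame.
    frame = numarray.zeros((1, 480, 640), type=numarray.UInt8)

    frames.append(frame)     # the call being timed above
    # frames._append(frame)  # skips the consistency checks (suggestion 2)

    h5file.close()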
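The timings I mention above isolate just the .append() call, roughly like
this (a sketch; earray and frame come from whatever acquisition loop is
being used):

    import time

    def timed_append(earray, frame):
        # Time only the PyTables .append() call, not frame acquisition.
        t0 = time.time()
        earray.append(frame)
        return time.time() - t0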
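And the plain-file fallback I'm switching to amounts to little more than
this (again a sketch; grab_frame() is a stand-in for the real acquisition
call, and the frame shape has to be recorded separately to read the data
back):

    import numarray

    def grab_frame():
        # Stand-in for the real frame-acquisition call.
        return numarray.zeros((480, 640), type=numarray.UInt8)

    out = open("frames.raw", "wb")
    for i in range(100):
        frame = grab_frame()
        out.write(frame.tostring())   # raw bytes, no HDF5 overhead
    out.close()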