From: Andrew S. <str...@as...> - 2004-06-19 00:49:41
I am trying to save realtime video (640x480 at 100 fps, uint8 grayscale data)
using PyTables for scientific purposes (lossy compression is bad). So, first
off, is using PyTables for this task a reasonable idea? Although I'm no
expert, it seems the compression algorithms that PyTables offers may be
ideal. It also may be nice to use HDF5 to incorporate some data. At the
moment, however, I'm stymied by slow write speeds, and I seek suggestions on
how to speed this up. I need to get this working approximately 10x faster to
be viable. At first pass, I've started with code like this:

import tables
import numarray as na

self.fileh = tables.openFile(filename, mode="w", title="Raw camera stream")
root = self.fileh.root
a = tables.UInt8Atom((self.cam_height, self.cam_width, 0))
filter_args = dict(complevel=0, complib='ucl')
self.hdfarray = self.fileh.createEArray(root, 'images', a,
                                        "Unsigned byte array",
                                        tables.Filters(**filter_args))
while 1:
    # other stuff that fills self.grabbed_frames
    n_frames = len(self.grabbed_frames)

    # set to rank 3
    def add_dim(f):
        f.shape = (self.cam_height, self.cam_width, 1)
    map(add_dim, self.grabbed_frames)

    frames = na.concatenate(self.grabbed_frames, axis=2)
    print frames.shape
    self.hdfarray.append(frames)

Using this code, I get approximately 4 MB/sec with no compression, and
MB/sec with complevel=1 UCL. This is with an XFS filesystem on Linux kernel
2.6.6 using a SerialATA drive which benchmarks writing at 50 MB/sec using
iozone.

So, are there any suggestions for getting this to run faster?

Cheers!
Andrew
From: Francesc A. <fa...@py...> - 2004-06-21 08:19:13
On Saturday 19 June 2004 02:49, Andrew Straw wrote:
> I am trying to save realtime video (640x480 at 100 fps, uint8 grayscale
> data) using PyTables for scientific purposes (lossy compression is
> bad). So, first off, is using PyTables for this task a reasonable
> idea? Although I'm no expert, it seems the compression algorithms that
> PyTables offers may be ideal. It also may be nice to use HDF5 to
> incorporate some data.

I had never thought of such an application for PyTables, but I think that
for your case (provided that you can't afford losing information) it may
work just fine.

> Using this code, I get approximately 4 MB/sec with no compression, and
> MB/sec with complevel=1 UCL. This is with an XFS filesystem on Linux

Mmm... how much with UCL? Anyway, you may want to try LZO and ZLIB (with
different compression levels) as well, to see whether that improves the
speed.

> So, are there any suggestions for getting this to run faster?

A couple:

1.- Ensure that your bottleneck really is the call to the .append() method,
by commenting it out and doing the timings again.

2.- The EArray.append() method does many checks to ensure that you pass an
object compatible with the EArray being saved. If you are going to pass a
*NumArray* object that you are sure is compliant with the underlying EArray
shape, you can save quite a bit of time by calling
._append(numarrayObject) instead of .append(numarrayObject).

If suggestion 2 is not enough (although I'd doubt it), things can be further
sped up by optimizing the number of calls to the underlying HDF5 library.
However, that must be regarded as a commercial service only (but you can
always do it yourself, of course!).

Cheers,

--
Francesc Alted
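A minimal, self-contained sketch of how suggestion 2 could be timed follows.
This is not code from the thread: the file name, frame counts, and the exact
2004-era call signatures are assumptions, but ._append() bypasses the checks
exactly as described above.

*****************************************************
import time
import tables
import numarray as na

cam_height, cam_width = 480, 640
chunk_frames, nchunks = 10, 50          # arbitrary sizes for the sketch

fileh = tables.openFile("append_bench.h5", mode="w")
atom = tables.UInt8Atom((cam_height, cam_width, 0))
earr = fileh.createEArray(fileh.root, 'images', atom, "Unsigned byte array",
                          tables.Filters(complevel=0))

# Repetitive data keeps the focus on the write path rather than the source.
frames = na.zeros(type="UInt8", shape=(cam_height, cam_width, chunk_frames))
nbytes = cam_height * cam_width * chunk_frames * nchunks  # uint8: 1 byte/pixel

t0 = time.time()
for i in range(nchunks):
    earr._append(frames)     # swap in earr.append(frames) to compare
elapsed = time.time() - t0
fileh.close()

print "%.1f MB written at %.1f MB/s" % (nbytes / 1e6, nbytes / 1e6 / elapsed)
*****************************************************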
From: Andrew S. <str...@as...> - 2004-06-22 00:04:25
Francesc Alted wrote:

> On Saturday 19 June 2004 02:49, Andrew Straw wrote:
>
>> I am trying to save realtime video (640x480 at 100 fps, uint8 grayscale
>> data) using PyTables for scientific purposes (lossy compression is
>> bad). So, first off, is using PyTables for this task a reasonable
>> idea? Although I'm no expert, it seems the compression algorithms that
>> PyTables offers may be ideal. It also may be nice to use HDF5 to
>> incorporate some data.
>
> I had never thought of such an application for PyTables, but I think that
> for your case (provided that you can't afford losing information) it may
> work just fine.
>
>> Using this code, I get approximately 4 MB/sec with no compression, and
>> MB/sec with complevel=1 UCL. This is with an XFS filesystem on Linux
>
> Mmm... how much with UCL? Anyway, you may want to try LZO and ZLIB (with
> different compression levels) as well, to see whether that improves the
> speed.

Sorry, the process never completed while I was writing the email. Playing
around with hdparm, I can now get ~6.5 MB/sec with no compression, and UCL,
LZO, and zlib all reduce that rate.

>> So, are there any suggestions for getting this to run faster?
>
> A couple:
>
> 1.- Ensure that your bottleneck really is the call to the .append()
> method, by commenting it out and doing the timings again.

Actually, I'm timing purely the call to .append(), which often takes
seconds.

> 2.- The EArray.append() method does many checks to ensure that you pass an
> object compatible with the EArray being saved. If you are going to pass a
> *NumArray* object that you are sure is compliant with the underlying
> EArray shape, you can save quite a bit of time by calling
> ._append(numarrayObject) instead of .append(numarrayObject).
>
> If suggestion 2 is not enough (although I'd doubt it), things can be
> further sped up by optimizing the number of calls to the underlying HDF5
> library. However, that must be regarded as a commercial service only (but
> you can always do it yourself, of course!).

That does help a little... Anyhow, I think using PyTables/HDF5 is too slow
for this task -- I can easily save at ~50 MB/sec using .write on simple
File objects. So I'll use that for now.

Finally, as a suggestion, you may want to incorporate the following code
into the C source for PyTables, which will allow other Python threads to
continue running while performing long-running HDF5 tasks. See
http://docs.python.org/api/threads.html for more information.

    PyThreadState *_save;
    _save = PyEval_SaveThread();
    /* Do work accessing the HDF5 library which does not touch the Python API */
    PyEval_RestoreThread(_save);

(The file_write function in Objects/fileobject.c in the Python source code
uses the Py_BEGIN_ALLOW_THREADS and Py_END_ALLOW_THREADS macros to achieve
the same thing, but due to the error handling in PyTables, you'll probably
need two copies of "PyEval_RestoreThread(_save)": one in the normal return
path and one in the error-handling path.)
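For reference, a sketch of the plain-File fallback Andrew mentions ("use
.write on simple File objects"). It is not from the thread; the output path
and the zero-filled stand-in frames are placeholders.

*****************************************************
import numarray as na

cam_height, cam_width = 480, 640

# Stand-in for whatever the camera driver delivers: a list of uint8 frames.
grabbed_frames = [na.zeros(type="UInt8", shape=(cam_height, cam_width))
                  for i in range(10)]

outfile = open("frames.raw", "wb")      # hypothetical output path
for frame in grabbed_frames:
    outfile.write(frame.tostring())     # raw bytes: no metadata, no compression
outfile.close()
*****************************************************

The obvious trade-off, as the rest of the thread shows, is that the raw file
carries no structure, metadata, or compression.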
From: Francesc A. <fa...@py...> - 2004-06-22 10:47:20
Attachments:
bench-echunk.py
On Tuesday 22 June 2004 02:04, Andrew Straw wrote:
> Sorry, the process never completed while I was writing the email. Playing
> around with hdparm, I can now get ~6.5 MB/sec with no compression, and
> UCL, LZO, and zlib all reduce that rate.

Mmm, just out of curiosity, I ran some benchmarks simulating your scenario
(see attachment). After some runs I can reproduce your numbers (I can get up
to 5.5 MB/s on my laptop, a P4 Mobile @ 2 GHz, with a hard disk spinning at
4200 RPM and a maximum throughput of 8 MB/s during writes). However, using
._append() instead of .append() *did* help a lot in my case, improving the
output from 2.8 MB/s to 5.5 MB/s (maybe this is because you have a faster
CPU). These figures were collected without compression.

As I'm doing tests with a very slow hard disk, I used very repetitive data
(all zeros) to bypass the bottleneck, but the results are much the same.
So the bottleneck does indeed seem to be in the I/O calls.

In order to determine whether the problem was PyTables or the HDF5 layer, I
used a small C program that opens the EArray only once, writes all the data,
and then closes the array (PyTables, on the other hand, always opens and
closes the dataset on every append() operation). With that, I was able to
achieve 7.7 MB/s, very close to the write limit of my disk. When using
compression (zlib, complevel=1) and shuffling, however, I was able to
achieve 22 MB/s. So, it would perhaps be feasible to reach 30 MB/s or more
without compression by using this kind of optimized writing on a system that
supports faster write speeds, like yours.

So, most probably HDF5 would be able to achieve the speed that you need.
PyTables is quite a bit slower because of the way it does I/O (i.e. opening
and closing the EArray object on every append). Of course, as I said in my
first message, that could be sped up by writing a specialized method that
opens the object first, writes the frame objects, and closes it at the end.

> >If suggestion 2 is not enough (although I'd doubt it), things can be further
                                  ^^^^^^^^^^^^^^^^^^^^^

Ooops, I have too big a mouth ;)

> Finally, as a suggestion, you may want to incorporate the following code
> into the C source for PyTables, which will allow other Python threads to
> continue running while performing long-running HDF5 tasks. See
> http://docs.python.org/api/threads.html for more information.
>
>     PyThreadState *_save;
>     _save = PyEval_SaveThread();
>     /* Do work accessing the HDF5 library which does not touch the Python API */
>     PyEval_RestoreThread(_save);
>
> (The file_write function in Objects/fileobject.c in the Python source code
> uses the Py_BEGIN_ALLOW_THREADS and Py_END_ALLOW_THREADS macros to achieve
> the same thing, but due to the error handling in PyTables, you'll probably
> need two copies of "PyEval_RestoreThread(_save)": one in the normal return
> path and one in the error-handling path.)

Ok. Thanks for the suggestion. This is very interesting indeed :)

Cheers,

--
Francesc Alted
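In PyTables terms, the zlib-plus-shuffle combination that reached 22 MB/s in
the C test corresponds roughly to a filter setup like the sketch below. This
is illustrative only; the file name and array shape are placeholders, not
the benchmark code itself.

*****************************************************
import tables

# zlib at complevel=1 plus the shuffle filter, as in the 22 MB/s figure above;
# swap complib to 'lzo' or 'ucl' to try the other compressors.
filters = tables.Filters(complevel=1, complib='zlib', shuffle=1)

fileh = tables.openFile("filters_demo.h5", mode="w")
atom = tables.UInt8Atom((480, 640, 0))
earr = fileh.createEArray(fileh.root, 'images', atom,
                          "Unsigned byte array", filters)
fileh.close()
*****************************************************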
From: Francesc A. <fa...@py...> - 2004-06-22 11:28:07
On Tuesday 22 June 2004 12:47, Francesc Alted wrote:
> As I'm doing tests with a very slow hard disk, I used very repetitive data
> (all zeros) to bypass the bottleneck, but the results are much the same.
> So the bottleneck does indeed seem to be in the I/O calls.

Oops, I forgot to say that this was using compression.

> In order to determine whether the problem was PyTables or the HDF5 layer,
> I used a small C program that opens the EArray only once, writes all the
> data, and then closes the array (PyTables, on the other hand, always opens
> and closes the dataset on every append() operation). With that, I was able
> to achieve 7.7 MB/s, very close to the write limit of my disk. When using
> compression (zlib, complevel=1) and shuffling, however, I was able to
> achieve 22 MB/s. So, it would perhaps be feasible to reach 30 MB/s or more
> without compression by using this kind of optimized writing on a system
> that supports faster write speeds, like yours.

A small update: I redid this C benchmark using only the zlib compressor
(i.e. without shuffling) and setting all the data to zeros, and obtained
33 MB/s. Without compression, that figure may well grow to 40 MB/s
(provided the hard disk supports that throughput, of course).

--
Francesc Alted
From: Francesc A. <fa...@py...> - 2004-06-23 12:28:03
On Tuesday 22 June 2004 13:27, Francesc Alted wrote:
> A small update: I redid this C benchmark using only the zlib compressor
> (i.e. without shuffling) and setting all the data to zeros, and obtained
> 33 MB/s. Without compression, that figure may well grow to 40 MB/s
> (provided the hard disk supports that throughput, of course).

More updates ;). This morning I remembered that Table objects have a much
more efficient writing interface than EArrays (simply because I've spent
more time optimizing Tables than anything else), and besides, I've recently
reworked the algorithm that computes buffer sizes for Tables in order to
make them still faster. The good news is that all of this bears directly on
this problem :)

So, if you use the latest CVS and use a Table instead of an EArray, this
small script should be far more efficient than the equivalent using EArrays:

*****************************************************
import tables
import numarray as na

class Test(tables.IsDescription):
    var1 = tables.UInt8Col(shape=(640, 480))

nframes = 200
filename = "data.nobackup/test2.h5"
fileh = tables.openFile(filename, mode="w", title="Raw camera stream")
root = fileh.root
filter_args = dict(complevel=1, complib='lzo', shuffle=0)
hdftable = fileh.createTable(root, 'images', Test, "Unsigned byte table",
                             tables.Filters(**filter_args),
                             expectedrows=nframes)
frame = na.zeros(type="UInt8", shape=(640, 480))
for i in range(nframes):
    hdftable.row["var1"] = frame
    hdftable.row.append()
fileh.close()
*****************************************************

With it, I was able to save frames at 46.2 MB/s without compression (this
script generates a 60 MB file, so it fits well in my laptop's cache). Using
ZLIB I got 36.4 MB/s, with LZO compression 54.5 MB/s, and with UCL it drops
down to 8.0 MB/s.

I was curious about how much memory a long run would take, so I made a test
with 20000 frames, for a total dataset size of 6 GB. I used the LZO
compressor in order to keep the file size small. With that, the run took
1m23s, for a total throughput of more than 70 MB/s. The process took 16 MB
of memory during the run, which is quite reasonable. However, there seems to
be a "small" memory leak that grows at a rate of 3 KB/frame. Whether this is
acceptable or not is up to you.

Cheers,

--
Francesc Alted
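A small companion sketch (not from the thread) for reading the frames back
out of the Table created by the script above, assuming the same 'images'
table and 'var1' column and era-appropriate PyTables calls:

*****************************************************
import tables

fileh = tables.openFile("data.nobackup/test2.h5", mode="r")
table = fileh.root.images

# Each row stores one (640, 480) uint8 frame in the 'var1' column.
for row in table.iterrows():
    frame = row["var1"]
    # ... process the frame here ...

fileh.close()
*****************************************************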