From: Andrew S. <str...@as...> - 2004-06-19 00:49:41
I am trying to save realtime video (640x480 at 100 fps, uint8 grayscale data)
using PyTables for scientific purposes (lossy compression is bad). So, first
off, is using PyTables for this task a reasonable idea? Although I'm no
expert, it seems the compression algorithms that PyTables offers may be
ideal. It also may be nice to use HDF5 to incorporate some data. At the
moment, however, I'm stymied by slow write speeds, and I seek suggestions on
how to speed this up. I need to get this working approximately 10x faster to
be viable. At first pass, I've started with code like this:

import tables
import numarray as na

self.fileh = tables.openFile(filename, mode="w", title="Raw camera stream")
root = self.fileh.root
a = tables.UInt8Atom((self.cam_height, self.cam_width, 0))
filter_args = dict(complevel=0, complib='ucl')
self.hdfarray = self.fileh.createEArray(root, 'images', a,
                                        "Unsigned byte array",
                                        tables.Filters(**filter_args))
while 1:
    # other stuff that fills self.grabbed_frames
    n_frames = len(self.grabbed_frames)

    # set to rank 3
    def add_dim(f):
        f.shape = (self.cam_height, self.cam_width, 1)
    map(add_dim, self.grabbed_frames)

    frames = na.concatenate(self.grabbed_frames, axis=2)
    print frames.shape
    self.hdfarray.append(frames)

Using this code, I get approximately 4 MB/sec with no compression, and
MB/sec with complevel=1 UCL. This is with an XFS filesystem on Linux kernel
2.6.6 using a SerialATA drive which benchmarks writing at 50 MB/sec using
iozone.

So, are there any suggestions for getting this to run faster?

Cheers!
Andrew
From: Francesc A. <fa...@py...> - 2004-06-21 08:19:13
On Saturday 19 June 2004 02:49, Andrew Straw wrote:
> I am trying to save realtime video (640x480 at 100 fps, uint8 grayscale
> data) using PyTables for scientific purposes (lossy compression is
> bad). So, first off, is using PyTables for this task a reasonable
> idea? Although I'm no expert, it seems the compression algorithms that
> PyTables offers may be ideal. It also may be nice to use HDF5 to
> incorporate some data.

I had never thought of such an application for PyTables, but I think that
for your case (provided that you can't afford losing information) it may
work just fine.

> Using this code, I get approximately 4 MB/sec with no compression, and
> MB/sec with complevel=1 UCL. This is with an XFS filesystem on Linux

Mmm... how much with UCL? Anyway, you may want to try LZO and ZLIB (with
different compression levels) as well, to see whether that improves the
speed.

> So, are there any suggestions for getting this to run faster?

A couple:

1.- Ensure that your bottleneck really is the call to the .append() method,
by commenting it out and doing the timings again.

2.- The EArray.append() method does many checks to ensure that you pass an
object compatible with the EArray being saved. If you are going to pass a
*NumArray* object that you are sure is compliant with the underlying EArray
shape, you can save quite a bit of time by calling
._append(numarrayObject) instead of .append(numarrayObject).

If suggestion 2 is not enough (although I'd doubt it), things can be further
sped up by optimizing the number of calls to the underlying HDF5 library.
However, that must be regarded as a commercial service only (but you can
always do it yourself, of course!).

Cheers,

--
Francesc Alted
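A minimal, self-contained sketch of how suggestion 2 could be timed follows.
This is not code from the thread: the file name, frame counts, and the exact
2004-era call signatures are assumptions, but ._append() bypasses the checks
exactly as described above.

*****************************************************
import time
import tables
import numarray as na

cam_height, cam_width = 480, 640
chunk_frames, nchunks = 10, 50          # arbitrary sizes for the sketch

fileh = tables.openFile("append_bench.h5", mode="w")
atom = tables.UInt8Atom((cam_height, cam_width, 0))
earr = fileh.createEArray(fileh.root, 'images', atom, "Unsigned byte array",
                          tables.Filters(complevel=0))

# Repetitive data keeps the focus on the write path rather than the source.
frames = na.zeros(type="UInt8", shape=(cam_height, cam_width, chunk_frames))
nbytes = cam_height * cam_width * chunk_frames * nchunks  # uint8: 1 byte/pixel

t0 = time.time()
for i in range(nchunks):
    earr._append(frames)     # swap in earr.append(frames) to compare
elapsed = time.time() - t0
fileh.close()

print "%.1f MB written at %.1f MB/s" % (nbytes / 1e6, nbytes / 1e6 / elapsed)
*****************************************************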
From: Andrew S. <str...@as...> - 2004-06-22 00:04:25
Francesc Alted wrote:

> On Saturday 19 June 2004 02:49, Andrew Straw wrote:
>
>> I am trying to save realtime video (640x480 at 100 fps, uint8 grayscale
>> data) using PyTables for scientific purposes (lossy compression is
>> bad). So, first off, is using PyTables for this task a reasonable
>> idea? Although I'm no expert, it seems the compression algorithms that
>> PyTables offers may be ideal. It also may be nice to use HDF5 to
>> incorporate some data.
>
> I had never thought of such an application for PyTables, but I think that
> for your case (provided that you can't afford losing information) it may
> work just fine.
>
>> Using this code, I get approximately 4 MB/sec with no compression, and
>> MB/sec with complevel=1 UCL. This is with an XFS filesystem on Linux
>
> Mmm... how much with UCL? Anyway, you may want to try LZO and ZLIB (with
> different compression levels) as well, to see whether that improves the
> speed.

Sorry, the process never completed while I was writing the email. Playing
around with hdparm, I can now get ~6.5 MB/sec with no compression, and UCL,
LZO, and zlib all reduce that rate.

>> So, are there any suggestions for getting this to run faster?
>
> A couple:
>
> 1.- Ensure that your bottleneck really is the call to the .append()
> method, by commenting it out and doing the timings again.

Actually, I'm timing purely the call to .append(), which often takes
seconds.

> 2.- The EArray.append() method does many checks to ensure that you pass an
> object compatible with the EArray being saved. If you are going to pass a
> *NumArray* object that you are sure is compliant with the underlying
> EArray shape, you can save quite a bit of time by calling
> ._append(numarrayObject) instead of .append(numarrayObject).
>
> If suggestion 2 is not enough (although I'd doubt it), things can be
> further sped up by optimizing the number of calls to the underlying HDF5
> library. However, that must be regarded as a commercial service only (but
> you can always do it yourself, of course!).

That does help a little... Anyhow, I think using PyTables/HDF5 is too slow
for this task -- I can easily save at ~50 MB/sec using .write on simple
File objects. So I'll use that for now.

Finally, as a suggestion, you may want to incorporate the following code
into the C source for PyTables, which will allow other Python threads to
continue running while performing long-running HDF5 tasks. See
http://docs.python.org/api/threads.html for more information.

    PyThreadState *_save;
    _save = PyEval_SaveThread();
    /* Do work accessing the HDF5 library which does not touch the Python API */
    PyEval_RestoreThread(_save);

(The file_write function in Objects/fileobject.c in the Python source code
uses the Py_BEGIN_ALLOW_THREADS and Py_END_ALLOW_THREADS macros to achieve
the same thing, but due to the error handling in PyTables, you'll probably
need two copies of "PyEval_RestoreThread(_save)": one in the normal return
path and one in the error-handling path.)
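For reference, a sketch of the plain-File fallback Andrew mentions ("use
.write on simple File objects"). It is not from the thread; the output path
and the zero-filled stand-in frames are placeholders.

*****************************************************
import numarray as na

cam_height, cam_width = 480, 640

# Stand-in for whatever the camera driver delivers: a list of uint8 frames.
grabbed_frames = [na.zeros(type="UInt8", shape=(cam_height, cam_width))
                  for i in range(10)]

outfile = open("frames.raw", "wb")      # hypothetical output path
for frame in grabbed_frames:
    outfile.write(frame.tostring())     # raw bytes: no metadata, no compression
outfile.close()
*****************************************************

The obvious trade-off, as the rest of the thread shows, is that the raw file
carries no structure, metadata, or compression.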
From: Francesc A. <fa...@py...> - 2004-06-22 10:47:20
Attachments:
bench-echunk.py
On Tuesday 22 June 2004 02:04, Andrew Straw wrote:
> Sorry, the process never completed while I was writing the email. Playing
> around with hdparm, I can now get ~6.5 MB/sec with no compression, and
> UCL, LZO, and zlib all reduce that rate.

Mmm, just out of curiosity, I ran some benchmarks simulating your scenario
(see attachment). After some runs I can reproduce your numbers (I can get up
to 5.5 MB/s on my laptop, a P4 Mobile @ 2 GHz, with a hard disk spinning at
4200 RPM and a maximum throughput of 8 MB/s during writes). However, using
._append() instead of .append() *did* help a lot in my case, improving the
output from 2.8 MB/s to 5.5 MB/s (maybe this is because you have a faster
CPU). These figures were collected without compression.

As I'm doing tests with a very slow hard disk, I used very repetitive data
(all zeros) to bypass the bottleneck, but the results are much the same.
So the bottleneck does indeed seem to be in the I/O calls.

In order to determine whether the problem was PyTables or the HDF5 layer, I
used a small C program that opens the EArray only once, writes all the data,
and then closes the array (PyTables, on the other hand, always opens and
closes the dataset on every append() operation). With that, I was able to
achieve 7.7 MB/s, very close to the write limit of my disk. When using
compression (zlib, complevel=1) and shuffling, however, I was able to
achieve 22 MB/s. So, it would perhaps be feasible to reach 30 MB/s or more
without compression by using this kind of optimized writing on a system that
supports faster write speeds, like yours.

So, most probably HDF5 would be able to achieve the speed that you need.
PyTables is quite a bit slower because of the way it does I/O (i.e. opening
and closing the EArray object on every append). Of course, as I said in my
first message, that could be sped up by writing a specialized method that
opens the object first, writes the frame objects, and closes it at the end.

> >If suggestion 2 is not enough (although I'd doubt it), things can be further
                                  ^^^^^^^^^^^^^^^^^^^^^

Ooops, I have too big a mouth ;)

> Finally, as a suggestion, you may want to incorporate the following code
> into the C source for PyTables, which will allow other Python threads to
> continue running while performing long-running HDF5 tasks. See
> http://docs.python.org/api/threads.html for more information.
>
>     PyThreadState *_save;
>     _save = PyEval_SaveThread();
>     /* Do work accessing the HDF5 library which does not touch the Python API */
>     PyEval_RestoreThread(_save);
>
> (The file_write function in Objects/fileobject.c in the Python source code
> uses the Py_BEGIN_ALLOW_THREADS and Py_END_ALLOW_THREADS macros to achieve
> the same thing, but due to the error handling in PyTables, you'll probably
> need two copies of "PyEval_RestoreThread(_save)": one in the normal return
> path and one in the error-handling path.)

Ok. Thanks for the suggestion. This is very interesting indeed :)

Cheers,

--
Francesc Alted
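In PyTables terms, the zlib-plus-shuffle combination that reached 22 MB/s in
the C test corresponds roughly to a filter setup like the sketch below. This
is illustrative only; the file name and array shape are placeholders, not
the benchmark code itself.

*****************************************************
import tables

# zlib at complevel=1 plus the shuffle filter, as in the 22 MB/s figure above;
# swap complib to 'lzo' or 'ucl' to try the other compressors.
filters = tables.Filters(complevel=1, complib='zlib', shuffle=1)

fileh = tables.openFile("filters_demo.h5", mode="w")
atom = tables.UInt8Atom((480, 640, 0))
earr = fileh.createEArray(fileh.root, 'images', atom,
                          "Unsigned byte array", filters)
fileh.close()
*****************************************************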
From: Francesc A. <fa...@py...> - 2004-06-22 11:28:07
On Tuesday 22 June 2004 12:47, Francesc Alted wrote:
> As I'm doing tests with a very slow hard disk, I used very repetitive data
> (all zeros) to bypass the bottleneck, but the results are much the same.
> So the bottleneck does indeed seem to be in the I/O calls.

Oops, I forgot to say that this was using compression.

> In order to determine whether the problem was PyTables or the HDF5 layer,
> I used a small C program that opens the EArray only once, writes all the
> data, and then closes the array (PyTables, on the other hand, always opens
> and closes the dataset on every append() operation). With that, I was able
> to achieve 7.7 MB/s, very close to the write limit of my disk. When using
> compression (zlib, complevel=1) and shuffling, however, I was able to
> achieve 22 MB/s. So, it would perhaps be feasible to reach 30 MB/s or more
> without compression by using this kind of optimized writing on a system
> that supports faster write speeds, like yours.

A small update: I redid this C benchmark using only the zlib compressor
(i.e. without shuffling) and setting all the data to zeros, and obtained
33 MB/s. Without compression, that figure may well grow to 40 MB/s
(provided the hard disk supports that throughput, of course).

--
Francesc Alted
From: Francesc A. <fa...@py...> - 2004-06-23 12:28:03
On Tuesday 22 June 2004 13:27, Francesc Alted wrote:
> A small update: I redid this C benchmark using only the zlib compressor
> (i.e. without shuffling) and setting all the data to zeros, and obtained
> 33 MB/s. Without compression, that figure may well grow to 40 MB/s
> (provided the hard disk supports that throughput, of course).

More updates ;). This morning I remembered that Table objects have a much
more efficient writing interface than EArrays (simply because I've spent
more time optimizing Tables than anything else), and besides, I've recently
reworked the algorithm that computes buffer sizes for Tables in order to
make them still faster. The good news is that all of this bears directly on
this problem :)

So, if you use the latest CVS and use a Table instead of an EArray, this
small script should be far more efficient than the equivalent using EArrays:

*****************************************************
import tables
import numarray as na

class Test(tables.IsDescription):
    var1 = tables.UInt8Col(shape=(640, 480))

nframes = 200
filename = "data.nobackup/test2.h5"
fileh = tables.openFile(filename, mode="w", title="Raw camera stream")
root = fileh.root
filter_args = dict(complevel=1, complib='lzo', shuffle=0)
hdftable = fileh.createTable(root, 'images', Test, "Unsigned byte table",
                             tables.Filters(**filter_args),
                             expectedrows=nframes)
frame = na.zeros(type="UInt8", shape=(640, 480))
for i in range(nframes):
    hdftable.row["var1"] = frame
    hdftable.row.append()
fileh.close()
*****************************************************

With it, I was able to save frames at 46.2 MB/s without compression (this
script generates a 60 MB file, so it fits well in my laptop's cache). Using
ZLIB I got 36.4 MB/s, with LZO compression 54.5 MB/s, and with UCL it drops
down to 8.0 MB/s.

I was curious about how much memory a long run would take, so I made a test
with 20000 frames, for a total dataset size of 6 GB. I used the LZO
compressor in order to keep the file size small. With that, the run took
1m23s, for a total throughput of more than 70 MB/s. The process took 16 MB
of memory during the run, which is quite reasonable. However, there seems to
be a "small" memory leak that grows at a rate of 3 KB/frame. Whether this is
acceptable or not is up to you.

Cheers,

--
Francesc Alted
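A small companion sketch (not from the thread) for reading the frames back
out of the Table created by the script above, assuming the same 'images'
table and 'var1' column and era-appropriate PyTables calls:

*****************************************************
import tables

fileh = tables.openFile("data.nobackup/test2.h5", mode="r")
table = fileh.root.images

# Each row stores one (640, 480) uint8 frame in the 'var1' column.
for row in table.iterrows():
    frame = row["var1"]
    # ... process the frame here ...

fileh.close()
*****************************************************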