From: Francesc A. <fa...@py...> - 2008-05-05 09:54:20
On Saturday 03 May 2008, Glenn wrote:
> Francesc Alted <falted <at> pytables.org> writes:
> > On Friday 02 May 2008, Glenn wrote:
> > > Hello,
> > > I would like to use pytables to store the output from a
> > > spectrometer. The spectra come in at a rapid rate. I am having
> > > trouble understanding how to set up a data structure for the
> > > data. The two options that seem reasonable are an EArray and a
> > > Table. The example shown for an EArray leaves me wondering how
> > > to make an array of numpy 1D array rows that I can dynamically
> > > add to.
> >
> > If all the data you want to save is homogeneous, using an EArray
> > is OK. See below an example of use:
> >
> > import numpy, tables
> >
> > N = 10  # your 1D array length
> > f = tables.openFile("test.h5", "w")
> > e = f.createEArray(f.root, 'earray', tables.FloatAtom(), (0, N),
> >                    'test')
> > for i in xrange(10):
> >     e.append([numpy.random.rand(N)])
> > f.close()
> >
> > > With a Table, I tried setting up an IsDescription subclass but
> > > could not figure out how to add a member to again represent a
> > > 1D array.
> >
> > Generally speaking, a Table is best for saving heterogeneous
> > datasets. In addition, the I/O is buffered in PyTables space (and
> > not only in HDF5), and it is generally faster than using an
> > EArray, so it may be more adequate in your case.
> >
> > Representing a 1D column is as easy as passing a 'shape=(N,)'
> > argument to your 1D columns. Look at this example:
> >
> > import numpy, tables
> >
> > N = 10  # your 1D array length
> >
> > class TTable(tables.IsDescription):
> >     col1 = tables.Int32Col(pos=0)
> >     col2 = tables.Float64Col(shape=(N,), pos=1)  # your 1D column
> >
> > f = tables.openFile("test.h5", "w")
> > t = f.createTable(f.root, 'table', TTable, 'table test')
> > for i in xrange(10):
> >     t.append([[i, numpy.random.rand(N)]])
> > t.flush()
> > f.close()
> >
> > Hope that helps,
>
> Thank you for the help, I got it working with a Table now.
> I have a couple of new questions:
>
> My table has a column holding a 1000-element 1D numpy array. I
> would like to do the following types of operations, where I treat
> this column as an N x 1000 2D array, call it X:
>
> mean(X, axis=0)
>
> std(X[k].reshape((k, N/k)))
>
> In the mean case, I could imagine doing something like:
>
> m = zeros((1, 1000))
> for row in X:
>     m = m + row
> m / N
>
> But it seems like this will be slow. I tried just numpy.mean(X)
> out of curiosity, but it took forever and finally ran out of
> memory. I assume it was forming a copy of the array in memory.

Can you be a bit more explicit on how you are building X? A
self-contained code example, with timings, is always nice to have.

Cheers,

--
Francesc Alted
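[Editor's note: the chunked-accumulation idea Glenn sketches above can be made memory-safe by summing over blocks of rows and dividing by the row count at the end. Below is a minimal sketch of that pattern. The sizes `N`, `L`, and `chunk` are arbitrary, and an in-memory NumPy array stands in for the on-disk column; with a real PyTables table, each chunk would instead come from a column slice such as `t.cols.col2[i:i + chunk]`, so only one block of rows is ever resident in memory.]

```python
import numpy

N, L = 10000, 1000            # rows x spectrum length (hypothetical sizes)
X = numpy.random.rand(N, L)   # stand-in for the on-disk 1D-array column

# Accumulate the column-wise sum one block of rows at a time, as one
# would when reading slices of a PyTables column.
chunk = 512
m = numpy.zeros(L)
for i in range(0, N, chunk):
    m += X[i:i + chunk].sum(axis=0)
m /= N

# The chunked result matches the all-in-memory computation.
assert numpy.allclose(m, X.mean(axis=0))
```

Peak memory here is one `chunk x L` block plus the length-`L` accumulator, instead of the full `N x L` array that `numpy.mean(X)` needs.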