From: Francesc A. <fa...@py...> - 2008-05-05 09:54:20
On Saturday 03 May 2008, Glenn wrote:
> Francesc Alted <falted <at> pytables.org> writes:
> > On Friday 02 May 2008, Glenn wrote:
> > > Hello,
> > > I would like to use pytables to store the output from a
> > > spectrometer. The spectra come in at a rapid rate. I am having
> > > trouble understanding how to set up a data structure for the
> > > data. The two options that seem reasonable are an EArray and a
> > > Table. The example shown for an EArray leaves me wondering how
> > > to make an array of numpy 1D array rows that I can dynamically
> > > add to.
> >
> > If all the data you want to save is homogeneous, using an EArray
> > is OK. See below an example of use:
> >
> > import numpy, tables
> >
> > N = 10  # your 1D array length
> > f = tables.openFile("test.h5", "w")
> > e = f.createEArray(f.root, 'earray', tables.FloatAtom(), (0, N),
> >                    'test')
> > for i in xrange(10):
> >     e.append([numpy.random.rand(N)])
> > f.close()
> >
> > > With a Table, I tried setting up an IsDescription subclass but
> > > could not figure out how to add a member to again represent a
> > > 1D array.
> >
> > Generally speaking, a Table is best for saving heterogeneous
> > datasets. In addition, the I/O is buffered in PyTables space (and
> > not only in HDF5), and it is generally faster than using an
> > EArray, so it may be more adequate in your case.
> >
> > Representing a 1D column is as easy as passing a 'shape=(N,)'
> > argument to your 1D columns. Look at this example:
> >
> > import numpy, tables
> >
> > N = 10  # your 1D array length
> >
> > class TTable(tables.IsDescription):
> >     col1 = tables.Int32Col(pos=0)
> >     col2 = tables.Float64Col(shape=(N,), pos=1)  # your 1D column
> >
> > f = tables.openFile("test.h5", "w")
> > t = f.createTable(f.root, 'table', TTable, 'table test')
> > for i in xrange(10):
> >     t.append([[i, numpy.random.rand(N)]])
> > t.flush()
> > f.close()
> >
> > Hope that helps,
>
> Thank you for the help, I got it working with a Table now.
> I have a couple of new questions:
>
> My table has a column holding a 1000-element 1D numpy array. I
> would like to do the following types of operations, where I treat
> this column as an N x 1000 2D array, call it X:
>
> mean(X, axis=0)
>
> std(X[k].reshape((k, N/k)))
>
> In the mean case, I could imagine doing something like:
>
> m = zeros((1, 1000))
> for row in X:
>     m = m + row
> m / N
>
> But it seems like this will be slow. I tried just numpy.mean(X)
> out of curiosity, but it took forever and finally ran out of
> memory. I assume it was forming a copy of the array in memory.

Can you be a bit more explicit on how you are building X? A
self-contained code example, with timings, is always nice to have.

Cheers,

--
Francesc Alted
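[Editor's note: the chunked-accumulation idea Glenn sketches above can be made memory-safe by summing over blocks of rows and dividing by the row count at the end. Below is a minimal sketch of that pattern. The sizes `N`, `L`, and `chunk` are arbitrary, and an in-memory NumPy array stands in for the on-disk column; with a real PyTables table, each chunk would instead come from a column slice such as `t.cols.col2[i:i + chunk]`, so only one block of rows is ever resident in memory.]

```python
import numpy

N, L = 10000, 1000            # rows x spectrum length (hypothetical sizes)
X = numpy.random.rand(N, L)   # stand-in for the on-disk 1D-array column

# Accumulate the column-wise sum one block of rows at a time, as one
# would when reading slices of a PyTables column.
chunk = 512
m = numpy.zeros(L)
for i in range(0, N, chunk):
    m += X[i:i + chunk].sum(axis=0)
m /= N

# The chunked result matches the all-in-memory computation.
assert numpy.allclose(m, X.mean(axis=0))
```

Peak memory here is one `chunk x L` block plus the length-`L` accumulator, instead of the full `N x L` array that `numpy.mean(X)` needs.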