From: Andre' Walker-L. <wal...@gm...> - 2012-08-15 23:52:10
Hi All,

Just a strategy question. I have many HDF5 files containing data for different measurements of the same quantities. My directory tree looks like

    top description  [ group ]
        sub description  [ group ]
            avg  [ group ]
                re  [ numpy array, shape = (96,1,2) ]
                im  [ numpy array, shape = (96,1,2) ]  - only exists for a known subset of data files

I have ~400 of these files. What I want to do is create a single file which collects all of these files with exactly the same directory structure, except at the very bottom:

    re  [ numpy array, shape = (400,96,1,2) ]

The simplest thing I came up with is to loop over the two levels of descriptive group structure and build the numpy array for the final set that way. Basic loop structure:

    import numpy as np
    import tables

    final_file = tables.openFile('all_data.h5', 'a')
    for d1 in top_description:
        final_file.createGroup(final_file.root, d1)
        for d2 in sub_description:
            final_file.createGroup('/' + d1, d2)
            data_re = np.zeros([400, 96, 1, 2])
            # open every source file and read this group's 're' array
            for i, fname in enumerate(hdf5_files):
                tmp = tables.openFile(fname)
                data_re[i] = tmp.getNode('/' + d1 + '/' + d2 + '/avg/re').read()
                tmp.close()
            final_file.createArray('/' + d1 + '/' + d2, 're', data_re)

But this involves opening and closing each of the 400 individual HDF5 files many times, once per (d1, d2) combination. There must be a smarter algorithmic way to do this, or perhaps built-in PyTables tools. Any advice is appreciated.

Andre
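[Editor's note: one natural inversion is to preallocate all of the output arrays first, then open each source file exactly once and fill its slice across every group in a single pass. The sketch below assumes every file contains the full d1/d2 group structure and reuses the names top_description, sub_description, and hdf5_files from the post; it is a minimal illustration, not a tested implementation.]

    import numpy as np
    import tables

    n_files = len(hdf5_files)
    final_file = tables.openFile('all_data.h5', 'a')

    # Preallocate one output array per (d1, d2) pair, keyed by group names.
    data = {}
    for d1 in top_description:
        final_file.createGroup(final_file.root, d1)
        for d2 in sub_description:
            final_file.createGroup('/' + d1, d2)
            data[(d1, d2)] = np.zeros([n_files, 96, 1, 2])

    # Open each source file exactly once, filling its slice in every array.
    for i, fname in enumerate(hdf5_files):
        tmp = tables.openFile(fname)
        for (d1, d2), arr in data.items():
            arr[i] = tmp.getNode('/' + d1 + '/' + d2 + '/avg/re').read()
        tmp.close()

    # Write the assembled arrays into the combined file.
    for (d1, d2), arr in data.items():
        final_file.createArray('/' + d1 + '/' + d2, 're', arr)
    final_file.close()

This keeps all of the combined arrays in memory at once, which is modest here (each is 400*96*1*2 values). If memory were a concern, an alternative would be to create extendable arrays with createEArray (first dimension extendable) and append() each file's data as it is read, still opening each file only once.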