From: Andre' Walker-L. <wal...@gm...> - 2012-08-16 00:25:21
Hi Anthony,

> I am a little confused.  Let me verify.  You have 400 hdf5 files (re and
> im) buried in a unix directory tree.  You want to make a single file
> which concatenates this data.  Is this right?

Sorry for my description - that is not quite right.

The "unix directory tree" is the group tree I have made inside each
individual hdf5 file.  So I have 400 hdf5 files, each with the given group
tree, and I basically want to copy that tree but "merge" all of the files
together.  However, there are bits in each of the small files that I do
not want to merge - I only want to grab the averaged data sets.  The
little files contain many different samples, which I have already averaged
into the "avg" group.  Is this clear?

Thanks,

Andre


> Be Well
> Anthony
>
> On Wed, Aug 15, 2012 at 6:52 PM, Andre' Walker-Loud <wal...@gm...> wrote:
> Hi All,
>
> Just a strategy question.
> I have many hdf5 files containing data for different measurements of the
> same quantities.
>
> My directory tree looks like
>
>     top description        [ group ]
>         sub description    [ group ]
>             avg            [ group ]
>                 re    [ numpy array, shape = (96,1,2) ]
>                 im    [ numpy array, shape = (96,1,2) ]
>                       - only exists for a known subset of the data files
>
> I have ~400 of these files.  What I want to do is create a single file
> which collects all of these files with exactly the same directory
> structure, except at the very bottom:
>
>     re    [ numpy array, shape = (400,96,1,2) ]
>
> The simplest thing I came up with is to loop over the two levels of
> descriptive group structure and build the numpy array for the final set
> that way.
>
> Basic loop structure:
>
>     import numpy as np
>     import tables
>
>     final_file = tables.openFile('all_data.h5', 'a')
>
>     for d1 in top_description:
>         final_file.createGroup(final_file.root, d1)
>         for d2 in sub_description:
>             final_file.createGroup('/' + d1, d2)
>             data_re = np.zeros([400, 96, 1, 2])
>             for i, fname in enumerate(hdf5_files):
>                 tmp = tables.openFile(fname)
>                 data_re[i] = tmp.getNode('/%s/%s/avg/re' % (d1, d2)).read()
>                 tmp.close()
>             final_file.createArray('/' + d1 + '/' + d2, 're', data_re)
>
>     final_file.close()
>
> But this involves opening and closing each of the 400 individual hdf5
> files many times.
> There must be a smarter algorithmic way to do this - or perhaps built-in
> pytables tools.
>
> Any advice is appreciated.
>
>
> Andre
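
A possible single-pass refinement of the loop above - just a rough sketch,
assuming PyTables 2.x, float64 data, and the same hypothetical
top_description / sub_description / hdf5_files lists as in the original
code: preallocate a chunked output array for every (d1, d2) pair first,
then open each input file exactly once and scatter its slice into all of
the output arrays.

    import tables

    n_files = len(hdf5_files)
    final_file = tables.openFile('all_data.h5', 'w')

    # Pass 1: build the group tree and preallocate one chunked array
    # per (d1, d2) pair.  Float64Atom is an assumed dtype.
    out = {}
    for d1 in top_description:
        final_file.createGroup(final_file.root, d1)
        for d2 in sub_description:
            final_file.createGroup('/' + d1, d2)
            out[(d1, d2)] = final_file.createCArray(
                '/' + d1 + '/' + d2, 're',
                tables.Float64Atom(), (n_files, 96, 1, 2))

    # Pass 2: one open/close per input file, filling row i of every
    # output array before moving on to the next file.
    for i, fname in enumerate(hdf5_files):
        tmp = tables.openFile(fname)
        for (d1, d2), arr in out.items():
            arr[i] = tmp.getNode('/%s/%s/avg/re' % (d1, d2)).read()
        tmp.close()

    final_file.close()

This turns 400 opens per (d1, d2) pair into 400 opens total, at the cost
of holding all the output array handles in a dict; PyTables node handles
are lightweight, so that should be cheap.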