From: Andre' Walker-L. <wal...@gm...> - 2012-08-16 00:25:21
Hi Anthony,

> I am a little confused.  Let me verify.  You have 400 hdf5 files (re and
> im) buried in a unix directory tree.  You want to make a single file
> which concatenates this data.  Is this right?

Sorry for my description - that is not quite right.

The "unix directory tree" is the group tree I have made inside each
individual hdf5 file.  So I have 400 hdf5 files, each with the given group
tree, and I basically want to copy that tree but "merge" all of the files
together.  However, there are bits in each of the small files that I do
not want to merge - I only want to grab the averaged data sets.  The
little files contain many different samples, which I have already averaged
into the "avg" group.  Is this clear?

Thanks,

Andre


> Be Well
> Anthony
>
> On Wed, Aug 15, 2012 at 6:52 PM, Andre' Walker-Loud <wal...@gm...> wrote:
> Hi All,
>
> Just a strategy question.
> I have many hdf5 files containing data for different measurements of the
> same quantities.
>
> My directory tree looks like
>
>     top description        [ group ]
>         sub description    [ group ]
>             avg            [ group ]
>                 re    [ numpy array, shape = (96,1,2) ]
>                 im    [ numpy array, shape = (96,1,2) ]
>                       - only exists for a known subset of the data files
>
> I have ~400 of these files.  What I want to do is create a single file
> which collects all of these files with exactly the same directory
> structure, except at the very bottom:
>
>     re    [ numpy array, shape = (400,96,1,2) ]
>
> The simplest thing I came up with is to loop over the two levels of
> descriptive group structure and build the numpy array for the final set
> that way.
>
> Basic loop structure:
>
>     import numpy as np
>     import tables
>
>     final_file = tables.openFile('all_data.h5', 'a')
>
>     for d1 in top_description:
>         final_file.createGroup(final_file.root, d1)
>         for d2 in sub_description:
>             final_file.createGroup('/' + d1, d2)
>             data_re = np.zeros([400, 96, 1, 2])
>             for i, fname in enumerate(hdf5_files):
>                 tmp = tables.openFile(fname)
>                 data_re[i] = tmp.getNode('/%s/%s/avg/re' % (d1, d2)).read()
>                 tmp.close()
>             final_file.createArray('/' + d1 + '/' + d2, 're', data_re)
>
>     final_file.close()
>
> But this involves opening and closing each of the 400 individual hdf5
> files many times.
> There must be a smarter algorithmic way to do this - or perhaps built-in
> pytables tools.
>
> Any advice is appreciated.
>
>
> Andre
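
A possible single-pass refinement of the loop above - just a rough sketch,
assuming PyTables 2.x, float64 data, and the same hypothetical
top_description / sub_description / hdf5_files lists as in the original
code: preallocate a chunked output array for every (d1, d2) pair first,
then open each input file exactly once and scatter its slice into all of
the output arrays.

    import tables

    n_files = len(hdf5_files)
    final_file = tables.openFile('all_data.h5', 'w')

    # Pass 1: build the group tree and preallocate one chunked array
    # per (d1, d2) pair.  Float64Atom is an assumed dtype.
    out = {}
    for d1 in top_description:
        final_file.createGroup(final_file.root, d1)
        for d2 in sub_description:
            final_file.createGroup('/' + d1, d2)
            out[(d1, d2)] = final_file.createCArray(
                '/' + d1 + '/' + d2, 're',
                tables.Float64Atom(), (n_files, 96, 1, 2))

    # Pass 2: one open/close per input file, filling row i of every
    # output array before moving on to the next file.
    for i, fname in enumerate(hdf5_files):
        tmp = tables.openFile(fname)
        for (d1, d2), arr in out.items():
            arr[i] = tmp.getNode('/%s/%s/avg/re' % (d1, d2)).read()
        tmp.close()

    final_file.close()

This turns 400 opens per (d1, d2) pair into 400 opens total, at the cost
of holding all the output array handles in a dict; PyTables node handles
are lightweight, so that should be cheap.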