From: Anthony S. <sc...@gm...> - 2013-08-05 14:50:48
On Mon, Aug 5, 2013 at 4:11 AM, Nyirő Gergő <ger...@gm...> wrote:
> Hello,
>
> We develop a measurement evaluation tool, and we'd like to use
> pytables/hdf5 as a middle layer for signal accessing.
> We have to deal with the silly structure of the recorder device
> measurement format.
>
> The signals can be accessed via two identifiers:
> * device name: <source of the signal>-<channel of the
>   message>-<another tag>-<yet another tag>
> * signal name
>
> The first identifier says the source information of the signal, which
> can be quite long.
> Therefore I grouped the device name into two layers:
> /<source of the signal>
>     /<channel of the message>...
>         /<signal name>
>
> So if you have the same message from two channels, then you will get
> /foo-device-name
>     /channel-1
>         /bar
>         /baz
>     /channel-2
>         /bar
>         /baz
>
> Besides signal loading, we have to search for signal names as fast as
> possible, and return the shortest unique device name part and the
> signal name.
> Using the structure above, iterating over the group names is quite
> slow. So I built up a table from device and signal names.
> As far as I know, the pytables query does not support string searching
> (e.g. startswith, *foo[0-9]ch*, etc.), so fetching this table leads us
> to a pure python loop, which is slow again.
> Therefore I build up a python dictionary from the table, which provides
> fast iteration against the table, but the init time increased from 100
> ms to 3-4 sec (we have more than 40 000 signals).
>
> Do you have any advice how to search for group names in hdf5 with
> pytables in an efficient way?

Hi Gergő,

Searching through group names, like accessing all HDF5 metadata, is slow. For group names this is because rather than searching through a list you are traversing a B-tree, IIRC.
So you have to use the couple of tricks that you used: 1) have another Table / Array of all table names, 2) read this in once to a native Python data structure (a dict here).

However, 4 sec to read in this table seems excessive for data of this size. You are probably not reading it in properly. You should be using:

    raw_grps = f.root.grp_names[:]

or similar, which reads the whole node in a single call. Maybe other people have some other ideas.

Be Well
Anthony

> ps: I would be most happy with a glob interface.
>
> thanks for your advice in advance,
> gergo
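
A minimal sketch of the dict-based lookup with glob-style matching discussed above, assuming the name table has already been read into memory in one bulk call. It uses only the standard library `fnmatch` module, which understands patterns such as `*foo[0-9]ch*`. The sample rows, the function names, and the reading of "shortest unique device name part" as the shortest dash-separated prefix are illustrative assumptions, not part of PyTables.

```python
import fnmatch
from collections import defaultdict

# Hypothetical sample rows of (device name, signal name). In practice these
# would come from a single bulk read of the name table, decoded to str.
ROWS = [
    ("foo-1-a-x", "bar"),
    ("foo-1-a-x", "baz"),
    ("foo-2-a-x", "bar"),
    ("qux-1-b-y", "speed"),
]


def build_index(rows):
    """Map each signal name to the list of device names that carry it."""
    index = defaultdict(list)
    for device, signal in rows:
        index[signal].append(device)
    return index


def shortest_unique_prefix(device, devices):
    """Return the shortest dash-separated prefix of `device` that no other
    entry in `devices` shares (assumed meaning of "unique part")."""
    parts = device.split("-")
    others = [d.split("-") for d in devices if d != device]
    for n in range(1, len(parts) + 1):
        prefix = parts[:n]
        if all(other[:n] != prefix for other in others):
            return "-".join(prefix)
    return device


def search(index, pattern):
    """Return (device prefix, signal name) pairs for every signal whose
    name matches the shell-style `pattern` (case-sensitive)."""
    hits = []
    for signal in sorted(index):
        if fnmatch.fnmatchcase(signal, pattern):
            devices = index[signal]
            for device in devices:
                hits.append((shortest_unique_prefix(device, devices), signal))
    return hits


index = build_index(ROWS)
print(search(index, "ba*"))
```

Building the index is a single pass over the rows, so with roughly 40 000 signals the cost should be dominated by how the table is read rather than by the dictionary itself.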