Re: [Pytables-users] searching for group names

On Mon, Aug 5, 2013 at 4:11 AM, Nyirő Gergő <ger...@gm...> wrote:

> Hello,
>
>
> We develop a measurement evaluation tool, and we'd like to use
> pytables/hdf5 as a middle layer for signal accessing.
>
> We have to deal with the silly structure of the recorder device
> measurement format.
>
>
>
> The signals can be accessed via two identifiers:
>
> * device name: <source of the signal>-<channel of the
> message>-<another tag>-<yet another tag>
>
> * signal name
>
>
>
> The first identifier carries the source information of the signal and
> can be quite long.
>
> Therefore I grouped the device name into two layers:
>
> /<source of the signal>
>     /<channel of the message>...
>         /<signal name>
>
>
>
> So if you have the same message from two channels, then you will get
>
> /foo-device-name
>     /channel-1
>         /bar
>         /baz
>     /channel-2
>         /bar
>         /baz
>
>
>
> Besides loading signals, we have to search for a signal name as fast
> as possible and return the shortest unique part of the device name
> together with the signal name.
>
> With the structure above, iterating over the group names is quite
> slow, so I built a table of device and signal names.
>
> As far as I know, PyTables queries do not support string searching
> (e.g. startswith, *foo[0-9]ch*, etc.), so searching this table leads
> us to a pure Python loop, which is slow again.
>
> Therefore I build a Python dictionary from the table. It is much
> faster to iterate over than the table, but the init time increased
> from 100 ms to 3-4 sec (we have more than 40 000 signals).
>
>
>
> Do you have any advice on how to search for group names in HDF5 with
> PyTables efficiently?
>

Hi Gergő,

Searching through group names, like accessing any HDF5 metadata, is slow.
For group names this is because, rather than scanning a flat list, you
are traversing a B-tree, IIRC. So you have to use the two tricks that you
already used: 1) keep a separate Table / Array of all table names, 2) read
it in once to a native Python data structure (a dict here).
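
To make trick 2 concrete, here is a minimal sketch of the in-memory side,
using only the standard library. The function names (build_index,
glob_signals) are made up for illustration, and it assumes the names have
already been read out of the file as (device, signal) pairs. Shell-style
patterns via fnmatch would also give you roughly the glob interface you
asked for:

```python
import fnmatch
from collections import defaultdict


def build_index(pairs):
    """Map signal name -> list of device names, from (device, signal) pairs."""
    index = defaultdict(list)
    for device, signal in pairs:
        index[signal].append(device)
    return dict(index)


def glob_signals(index, pattern):
    """Return (signal, devices) for each signal name matching a shell-style pattern.

    fnmatchcase() is used so that matching is case-sensitive on every platform.
    """
    return [
        (name, index[name])
        for name in index
        if fnmatch.fnmatchcase(name, pattern)
    ]


# Hypothetical data mirroring the layout from the original mail.
pairs = [
    ("foo-device-name-channel-1", "bar"),
    ("foo-device-name-channel-1", "baz"),
    ("foo-device-name-channel-2", "bar"),
    ("foo-device-name-channel-2", "baz"),
]
index = build_index(pairs)
matches = glob_signals(index, "ba[rz]")
```

Exact-name lookups are then a single dict access, and a pattern search is
one pass over the keys without touching the HDF5 file again.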

However, 4 sec to read in this table seems excessive for data of this size.
You are probably not reading this in properly. You should be using:

raw_grps = f.root.grp_names[:]

or similar.
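
Spelled out a bit more, a bulk read could look like the sketch below. The
node path "/grp_names" and the column names "device" and "signal" are
assumptions about your file, not something I know, and I am assuming the
columns are fixed-width string columns, which come back as bytes under
Python 3:

```python
def read_name_columns(path, node="/grp_names"):
    """Read the whole name table with one call and return two Python lists.

    The node path and the column names 'device' and 'signal' are hypothetical.
    """
    # Imported here so that decode_pairs() below can be used without PyTables.
    import tables

    with tables.open_file(path, mode="r") as f:
        # Slicing with [:] pulls the full table into a NumPy structured
        # array in one go instead of iterating over Row objects.
        raw = f.get_node(node)[:]
    # .tolist() converts each column to plain Python objects in C.
    return raw["device"].tolist(), raw["signal"].tolist()


def decode_pairs(devices, signals, encoding="utf-8"):
    """Turn two lists of bytes into a list of (device, signal) str pairs."""
    return [
        (d.decode(encoding), s.decode(encoding))
        for d, s in zip(devices, signals)
    ]


# Usage with hypothetical values standing in for the file contents.
sample = decode_pairs([b"foo-device-name-channel-1"], [b"bar"])
```

The point is that the only per-row Python work left is the decode and
whatever dict building you do afterwards; the read itself is a single call.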

Maybe other people have some other ideas.

Be Well
Anthony

>
> ps: I would be most happy with a glob interface.
>
>
>
> thanks for your advice in advance,
>
> gergo
>
>