Re: [Pytables-users] [newbie] Array of records ?

> Date: Thu, 05 Oct 2006 10:19:31 +0200
> From: Francesc Altet <fa...@ca...>
> Subject: Re: [Pytables-users] [newbie] Array of records ?
> To: George Sakkis <geo...@gm...>
> 
> On Wed, 04 Oct 2006 at 23:19 -0400, George Sakkis wrote:
> > Hi all,
> > 
> > I found pytables a few hours ago and skimmed through the manual, so
> > this may be basic: is it possible to define a table column as an array
> > of records ? For instance, say I have the python list [('aa',1,2.0),
> > ('ab',2, 3.0), ... ('az',37,-2.0)], where each element is a len-3
> > tuple whose first element is StringCol(2), the second UInt16Col and
> > the third Float32Col. The list as a whole does not have a fixed
> > length (i.e. it may vary for different rows of the table). Ideally,
> > I'd like to be able to access the tuple elements by name (as is
> > typical for records) but even regular indexing would be ok.
> 
> You can have nested types in tables. The problem is that your records
> need to vary in length from row to row, and this is not supported for
> Table objects. For your purposes, you may be better off using a VLArray
> with an ObjectAtom() as a container for your records. It's not the most
> efficient way to save your data, but it works.
> 
> Another possibility is to create a couple of objects: a Table for
> keeping the len-3 tuples and a VLArray where in each row you can save
> the row number of Table records that are part of the python list.
> Admittedly, this is a bit more involved, but a good alternative.
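
A minimal sketch of that Table-plus-VLArray layout, assuming the
PyTables 3.x method names (open_file, create_table, create_vlarray).
The node names "records" and "members", the column names, and the
append_list helper are all made up for illustration. The file lives in
memory only, so nothing is written to disk.

```python
import tables as tb

class Record(tb.IsDescription):
    # pos fixes the column order so plain tuples can be appended as-is.
    tag = tb.StringCol(2, pos=0)
    num = tb.UInt16Col(pos=1)
    val = tb.Float32Col(pos=2)

# In-memory HDF5 file (hypothetical name).
h5 = tb.open_file("rowrefs_demo.h5", mode="w",
                  driver="H5FD_CORE", driver_core_backing_store=0)
records = h5.create_table(h5.root, "records", Record)
members = h5.create_vlarray(h5.root, "members", atom=tb.Int64Atom())

def append_list(tuples):
    """Store the tuples in the Table and their row numbers in the VLArray."""
    first = records.nrows
    records.append(tuples)
    records.flush()
    members.append(list(range(first, first + len(tuples))))

append_list([("aa", 1, 2.0), ("ab", 2, 3.0)])
append_list([("az", 37, -2.0)])

# Recover the second list: look up its row numbers, then read those rows.
second = records.read_coordinates(members[1])
second_nums = second["num"].tolist()
h5.close()
```

Because all tuples sit in one Table, a column search needs only a
single query; the VLArray is consulted only to map rows back to lists.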

Been a while since I've been on the list... guess that means the
software works pretty well!

Anyway, I've taken to using the ObjectAtom as a quick and dirty solution
for stuff that I don't have time to figure out right away, and it works
reasonably well for completely arbitrary data.  To search it, you need
to read the whole array, then filter in Python.
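
Roughly, the quick and dirty version looks like the sketch below. It
assumes the PyTables 3.x method names (open_file, create_vlarray); the
node name "lists" and the filter on the second tuple element are
invented for the example, and the file is kept in memory.

```python
import tables as tb

# In-memory HDF5 file (hypothetical name) so the sketch leaves nothing on disk.
h5 = tb.open_file("objectatom_demo.h5", mode="w",
                  driver="H5FD_CORE", driver_core_backing_store=0)

# Each VLArray row holds one pickled Python object: here, a whole list of tuples.
lists = h5.create_vlarray(h5.root, "lists", atom=tb.ObjectAtom())
lists.append([("aa", 1, 2.0), ("ab", 2, 3.0)])
lists.append([("az", 37, -2.0)])

# The data is pickled, so there is nothing to query: read everything, then filter.
everything = lists.read()
matches = [rec for one_list in everything for rec in one_list if rec[1] > 1]

h5.close()
```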

But I'd like to suggest another indexing strategy that I've used (which
I must credit partly to the MINC 2.0 committee at McGill)... which is to
simply have each one of your related lists of tuples stored as a table
in some group dedicated solely for that purpose.

For example, you could have /tuples/0, /tuples/1, ... /tuples/N.

I've incorporated this as a general paradigm for dealing with the kind
of problem you've described, and you can write a simple "get_next_index"
function to find the next available index in a group.  Or, you may have
meaningful names you could use, that would help you index them better.  
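
One possible shape for such a helper, as a sketch only: it takes any
iterable of child names (in practice you could pass the group's
_v_children, which is keyed by name) and returns one more than the
largest numeric name. It does not reuse gaps left by deleted tables,
which is an assumption about what "next available" should mean.

```python
def get_next_index(child_names):
    """Return one more than the largest purely numeric child name (0 if none)."""
    used = [int(name) for name in child_names if name.isdecimal()]
    return max(used, default=-1) + 1

# Non-numeric names are ignored, so meaningful names can share the group.
next_index = get_next_index(["0", "1", "notes"])
```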

Searching on a column then becomes a simple matter of searching each of
the tables in 'tuples', which will be considerably faster than the
VLArray / ObjectAtom approach.  There are a number of ways to get a
group's children - I like

pytablesfile.root...tuples._v_children
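
Here is a sketch of the whole idea, assuming the PyTables 3.x method
names and a 'tuples' group placed directly under the root (your path
will differ). The column names and the "num > 1" condition are made up
for illustration, and the file is kept in memory.

```python
import warnings
import tables as tb

class Record(tb.IsDescription):
    # pos fixes the column order so plain tuples can be appended as-is.
    tag = tb.StringCol(2, pos=0)
    num = tb.UInt16Col(pos=1)
    val = tb.Float32Col(pos=2)

h5 = tb.open_file("tuples_demo.h5", mode="w",
                  driver="H5FD_CORE", driver_core_backing_store=0)
group = h5.create_group(h5.root, "tuples")

with warnings.catch_warnings():
    # Names like "0" are not Python identifiers, so PyTables warns about them.
    warnings.simplefilter("ignore", tb.NaturalNameWarning)
    for i, tuples in enumerate([[("aa", 1, 2.0), ("ab", 2, 3.0)],
                                [("az", 37, -2.0)]]):
        table = h5.create_table(group, str(i), Record)
        table.append(tuples)
        table.flush()

# Run the same column query against every table in the group.
hits = {}
for name, table in group._v_children.items():
    rows = table.read_where("num > 1")
    if len(rows):
        hits[name] = rows["num"].tolist()
h5.close()
```

Each key in hits is the name of a table that contained a match, so you
know which list every matching tuple came from.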

As always - love the software!

Cheers,
Dav

-- 
Dav Clark
www.eCult.org
917-544-8408