From: Francesc A. <fa...@gm...> - 2012-07-06 07:06:06
On 7/5/12 7:59 PM, Anthony Scopatz wrote:
> On Thu, Jul 5, 2012 at 12:34 PM, Jacob Bennett <jac...@gm...> wrote:
>
> > Hello Pytables Users,
> >
> > I am currently having a maximum number of children error within
> > pytables. I am trying to store stock updates within hdf5. My
> > current schema is to have one file represent a trading day, each
> > table represent a particular instrumentID (stock id) and have each
> > record in the table belong to a specific update with a timestamp
> > (where the timestamp could be considered a primary key).
> >
> > I am currently having all tables be direct descendants of root.
> >
> > The problem with this is that per day I have the following stats:
> >
> > #of tables ::= 20000
> > #of Records per table ::= 250000
> >
> > The problem persists in that 20000 is too many children to be
> > associated with a particular node. Continuing with this schema
> > will consume an exorbitant amount of memory and lead to slower
> > query times.
> >
> > Is there a way to redesign this schema so that it could work
> > better with pytables? Or is this simply too much data?
>
> It certainly isn't too much data. HDF5 scales to petabytes ;)
>
> > Would it help to follow with the current schema and just increase
> > the depth of the tree by taking parts of the instrumentId
> > (instrumentId is an int64) as nodes?
>
> Yes, this would be one approach that would work.

+1

> Basically, nodes in HDF5 only get a fixed amount of storage for
> metadata, including what children they have. (I believe this number
> is 64 kb. In theory, it is possible to increase this number and
> recompile hdf5, but then files generated in this way would only be
> compatible with your altered version of the library.) So if a group
> has so many children that storing their names and locations takes up
> more than 64 kb, you have run out of room. By adding N other
> subgroups to the hierarchy you increase the metadata available to N *
> 64 kb.

No, this is wrong. The hierarchy metadata is stored in a different
place than user metadata, so it is not affected by the 64 KB limit.
The real problem is that having too many children hanging from a
single group hurts performance considerably (the same happens with
regular filesystems when a directory holds too many files).

-- 
Francesc Alted
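
The "increase the depth of the tree" idea from the thread could be sketched roughly as below. This is only an illustration under assumptions that do not appear in the messages: the helper names `bucket_path` and `create_instrument_table`, the `/b<bucket>/i<id>` naming scheme, and the choice of 256 buckets are all hypothetical. With about 20000 tables, 256 buckets would leave roughly 78 tables per group. Recent PyTables releases spell the call `create_table`; older ones used `createTable`.

```python
def bucket_path(instrument_id, n_buckets=256):
    """Return (group_path, table_name) for an instrument ID.

    Hypothetical layout: /b<bucket>/i<instrument_id>, where the bucket is
    the ID modulo n_buckets, so each group holds only a fraction of the
    tables instead of all of them hanging from root.
    """
    if instrument_id < 0:
        raise ValueError("instrument_id must be non-negative")
    bucket = instrument_id % n_buckets
    # Prefix with a letter so the names stay valid Python identifiers,
    # which keeps PyTables natural naming usable.
    return "/b%03d" % bucket, "i%d" % instrument_id


def create_instrument_table(h5file, instrument_id, description):
    """Create the table for one instrument under its bucket group.

    h5file is assumed to be an open, writable tables.File.
    createparents=True asks PyTables to create the bucket group
    the first time a table lands in it.
    """
    where, name = bucket_path(instrument_id)
    return h5file.create_table(where, name, description, createparents=True)
```

Reading a table back would then use the same mapping, for example `h5file.get_node(*bucket_path(instrument_id))`. Whether a modulo bucket or a slice of the ID's digits spreads the tables more evenly depends on how the instrument IDs are distributed, so that is worth checking against real data.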