From: Anthony S. <sc...@gm...> - 2012-07-06 15:48:31
Ahh thanks for clarifying....

On Jul 6, 2012 2:06 AM, "Francesc Alted" <fa...@gm...> wrote:

> On 7/5/12 7:59 PM, Anthony Scopatz wrote:
>
> On Thu, Jul 5, 2012 at 12:34 PM, Jacob Bennett <jac...@gm...> wrote:
>
>> Hello Pytables Users,
>>
>> I am currently having a maximum number of children error within
>> pytables. I am trying to store stock updates within hdf5. My current schema
>> is to have one file represent a trading day, each table represent a
>> particular instrumentID (stock id) and have each record in the table belong
>> to a specific update with a timestamp (where the timestamp could be
>> considered a primary key).
>>
>> I am currently having all tables be direct descendants of root.
>>
>> The problem with this is that per day I have the following stats:
>>
>> #of tables ::= 20000
>> #of Records per table ::= 250000
>>
>> The problem persists in that 20000 is too many children to be
>> associated with a particular node. Continuing with this schema will consume
>> an exorbitant amount of memory and lead to slower query times.
>>
>> Is there a way to redesign this schema so that it could work better
>> with pytables? Or is this simply too much data?
>>
>
> It certainly isn't too much data. HDF5 scales to petabytes ;)
>
>> Would it help to follow with the current schema and just increase the
>> depth of the tree by taking parts of the instrumentId (instrumentId is an
>> int64) as nodes?
>>
>
> Yes, this would be one approach that would work.
>
> +1
>
> Basically, nodes in HDF5 only get a fixed amount of storage for
> metadata, including what children they have. (I believe this number is 64
> kb. In theory, it is possible to increase this number and recompile hdf5,
> but then files generated in this way would only be compatible with your
> altered version of the library.) So if a group has so many children that
> storing their names and locations takes up more than 64 kb, you have run
> out of room. By adding N other subgroups to the hierarchy you increase the
> metadata available to N * 64 kb.
>
> No, this is wrong. The hierarchy metadata is stored on a different place
> than user metadata, and hence it is not affected by the 64 KB limit. The
> problem is rather that having too many children hanging from a single group
> affects quite negatively to performance (the same happens with regular
> filesystems having directories with too many files).
>
> --
> Francesc Alted
>
> _______________________________________________
> Pytables-users mailing list
> Pyt...@li...
> https://lists.sourceforge.net/lists/listinfo/pytables-users
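
For anyone finding this thread later, here is a minimal sketch of the "take parts of the instrumentId as nodes" idea discussed above. It is only an illustration, not anything posted in the thread: the names instrument_path and get_or_create_table, the "g"/"i" prefixes, and the fanout of 100 are all made up for the example, and it assumes non-negative ids and the PyTables 3.x method names (create_table, get_node). As Francesc points out, the motivation is performance with many children per group, not a 64 KB limit.

```python
def instrument_path(instrument_id, fanout=100, depth=1):
    """Map an instrument id to a (group_path, table_name) pair.

    Hypothetical helper: each level of the tree is picked from the
    low-order base-`fanout` digits of the id, so 20000 tables with the
    defaults land in at most 100 groups instead of all under root.
    """
    if instrument_id < 0:
        raise ValueError("this sketch assumes non-negative instrument ids")
    width = len(str(fanout - 1))  # zero-pad so group names sort nicely
    parts = []
    rest = instrument_id
    for _ in range(depth):
        rest, bucket = divmod(rest, fanout)
        parts.append("g%0*d" % (width, bucket))
    return "/" + "/".join(parts), "i%d" % instrument_id


def get_or_create_table(h5file, instrument_id, description):
    """Return the table for an instrument, creating it (and its parent
    groups) on first use.

    `h5file` is assumed to be a tables.File opened for writing (PyTables
    3.x) and `description` a table description such as an IsDescription
    subclass.
    """
    group, name = instrument_path(instrument_id)
    path = group + "/" + name
    if path in h5file:
        return h5file.get_node(path)
    # createparents=True creates the intermediate groups as needed.
    return h5file.create_table(group, name, description, createparents=True)
```

With the defaults, instrument 1234567 goes to table "i1234567" under group "/g67". If the ids are not spread evenly over their last two digits, a different digit range or a hash of the id may balance the groups better.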