From: Jacob B. <jac...@gm...> - 2012-07-18 10:49:34
I really like this way of going about it; however, would it be better to use the built-in hierarchy to separate the tables, or to write them to separate HDF5 files? I am currently experimenting with concurrent read/write operations on a shared HDF5 file without hierarchy, and the only errors I get are occasional read errors (which aren't much of a problem for me). So I am wondering: could there be a way to reduce the metadata within an HDF5 file and, at the same time, use a multi-table approach to solve my problem?

Thanks,
Jacob

On Wed, Jul 18, 2012 at 1:22 AM, Ümit Seren <uem...@gm...> wrote:
> Just to add to what Anthony said:
> In the end it also depends on how unrelated your data is and how you want
> to access it. If the usual access scenario is that you only search
> or select within a specific dataset, then splitting up the datasets and
> putting them into separate tables is the way to go. In RDBMS terms
> this is, btw, called sharding.
> I have such a use case where I have around 30,000 datasets (each of
> them with around 5 million rows). I am only interested in one dataset
> at a time, so I created 30,000 tables. It works really well.
> And in case you want to access the data across the datasets (for
> aggregating or calculating averages), you can take a MapReduce approach,
> which should work very well with this layout.
>
>
> On Tue, Jul 17, 2012 at 11:55 PM, Jacob Bennett
> <jac...@gm...> wrote:
> > Thanks for the input Anthony!
> >
> > -Jake
> >
> >
> > On Tue, Jul 17, 2012 at 4:20 PM, Anthony Scopatz <sc...@gm...> wrote:
> >>
> >> On Tue, Jul 17, 2012 at 3:30 PM, Jacob Bennett <jac...@gm...>
> >> wrote:
> >>>
> >>> Hello PyTables Users & Contributors,
> >>>
> >>> Just a quick question: let's say that I have certain identifiers that
> >>> link to a set of data.
> >>> Would it generally be faster for lookup to have each
> >>> set of data as a separate table with its id as the table's name, or to add this
> >>> id as another column to a universal table of data and then let the in-kernel
> >>> search query only the data with a specific id?
> >>
> >>
> >> I think that in general it is faster to have more tables with ids as
> >> names. For very small data, searching through a single larger table might
> >> be quicker than node access... but even then I doubt it.
> >>
> >>>
> >>> I hope you can understand my question: would 1,000 tables of 100,000
> >>> records each be better for searching than 1 table with 100 million records
> >>> and one extra id column?
> >>
> >>
> >> For these data sizes, more tables is probably faster.
> >>
> >> (It should also be noted that in the more-tables case the data is
> >> actually smaller, because you can eliminate the id column.)
> >>
> >> Be Well
> >> Anthony
> >>
> >>>
> >>>
> >>> Thanks,
> >>> Jacob Bennett
> >>>
> >>> --
> >>> Jacob Bennett
> >>> Massachusetts Institute of Technology
> >>> Department of Electrical Engineering and Computer Science
> >>> Class of 2014 | ben...@mi...
> >>>
> >>> ------------------------------------------------------------------------------
> >>> Live Security Virtual Conference
> >>> Exclusive live event will cover all the ways today's security and
> >>> threat landscape has changed and how IT managers can respond. Discussions
> >>> will include endpoint security, mobile security and the latest in malware
> >>> threats. http://www.accelacomm.com/jaw/sfrnl04242012/114/50122263/
> >>> _______________________________________________
> >>> Pytables-users mailing list
> >>> Pyt...@li...
> >>> https://lists.sourceforge.net/lists/listinfo/pytables-users
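The two layouts compared in the thread, one table per id versus a single table with an extra id column searched in-kernel, can be sketched with PyTables roughly as follows. This is a minimal sketch assuming PyTables 3.x and NumPy; the schema (`t`, `value`), the `/shards/id_<n>` and `/records` node names, and all function names are hypothetical, chosen only for illustration.

```python
import os
import tempfile

import numpy as np
import tables

# Hypothetical schema: the thread never specifies the payload columns.
PAYLOAD = np.dtype([("t", "i8"), ("value", "f8")])
WITH_ID = np.dtype([("id", "i8"), ("t", "i8"), ("value", "f8")])


def make_data(n_ids, rows_per_id, seed=0):
    """Synthetic rows keyed by id (illustrative data only)."""
    rng = np.random.default_rng(seed)
    data = {}
    for i in range(n_ids):
        rows = np.zeros(rows_per_id, dtype=PAYLOAD)
        rows["t"] = np.arange(rows_per_id)
        rows["value"] = rng.random(rows_per_id)
        data[i] = rows
    return data


def write_per_id(path, data):
    """Layout 1: one table per id; the id lives in the node name, not a column."""
    with tables.open_file(path, mode="w") as h5:
        group = h5.create_group("/", "shards")
        for i, rows in data.items():
            table = h5.create_table(group, f"id_{i}", description=PAYLOAD,
                                    expectedrows=len(rows))
            table.append(rows)


def read_per_id(path, i):
    """Lookup is a direct node access; other ids' rows are never read."""
    with tables.open_file(path, mode="r") as h5:
        return h5.get_node(f"/shards/id_{i}").read()


def write_single(path, data):
    """Layout 2: one table holding every id, with an extra id column."""
    total = sum(len(rows) for rows in data.values())
    with tables.open_file(path, mode="w") as h5:
        table = h5.create_table("/", "records", description=WITH_ID,
                                expectedrows=total)
        for i, rows in data.items():
            block = np.zeros(len(rows), dtype=WITH_ID)
            block["id"] = i
            block["t"] = rows["t"]
            block["value"] = rows["value"]
            table.append(block)


def read_single(path, i):
    """In-kernel query; without an index on `id` it scans every row."""
    with tables.open_file(path, mode="r") as h5:
        return h5.root.records.read_where("id == target", condvars={"target": i})


def demo(n_ids=5, rows_per_id=200, target=3):
    """Write both layouts to a temporary directory and fetch one id from each."""
    data = make_data(n_ids, rows_per_id)
    with tempfile.TemporaryDirectory() as tmp:
        per_id = os.path.join(tmp, "per_id.h5")
        single = os.path.join(tmp, "single.h5")
        write_per_id(per_id, data)
        write_single(single, data)
        return read_per_id(per_id, target), read_single(single, target)
```

Both lookups return the same rows; the per-id result simply has no `id` field, which is the storage saving Anthony points out. Timing `read_per_id` against `read_single` at realistic sizes (such as the 1,000 x 100,000 case from the thread) is how one would check the speed claim on a given machine.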
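Ümit's point about aggregating across datasets with a MapReduce approach can be sketched over a layout with one table per id. This is a minimal sketch assuming PyTables 3.x and NumPy; the `/shards/id_<n>` layout, the `value` column, and all function names are hypothetical. Each shard is mapped to a partial (sum, count) pair and the partials are reduced into a global mean.

```python
import os
import tempfile
from functools import reduce

import numpy as np
import tables

# Hypothetical single-column schema for each shard.
SHARD = np.dtype([("value", "f8")])


def write_shards(path, shards):
    """Store each id's values in its own table under /shards (hypothetical layout)."""
    with tables.open_file(path, mode="w") as h5:
        group = h5.create_group("/", "shards")
        for i, values in shards.items():
            rows = np.zeros(len(values), dtype=SHARD)
            rows["value"] = values
            h5.create_table(group, f"id_{i}", description=SHARD).append(rows)


def map_shard(table):
    """Map: turn one shard into a partial (sum, count), reading only its column."""
    values = table.col("value")
    return float(values.sum()), len(values)


def combine(a, b):
    """Reduce: merge two partial (sum, count) pairs."""
    return a[0] + b[0], a[1] + b[1]


def global_mean(path):
    """Mean of `value` across all shards, computed from per-shard partials."""
    with tables.open_file(path, mode="r") as h5:
        partials = [map_shard(t) for t in h5.iter_nodes("/shards", classname="Table")]
    total, count = reduce(combine, partials, (0.0, 0))
    return total / count


def demo():
    """Three shards of unequal size; the global mean is 24 / 6 = 4.0."""
    shards = {0: [1.0, 2.0, 3.0], 1: [10.0], 2: [4.0, 4.0]}
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "shards.h5")
        write_shards(path, shards)
        return global_mean(path)
```

Carrying (sum, count) rather than per-shard means keeps the reduce step exact when shards differ in size: averaging the three shard means in `demo` (2.0, 10.0, 4.0) would give about 5.33 instead of 4.0. The map step runs serially in this sketch; because each call touches only one table, it is the part that could be distributed.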