From: Ümit S. <uem...@gm...> - 2012-07-18 06:22:53
Just to add to what Anthony said: in the end it also depends on how unrelated your data is and how you want to access it. If the access scenario is that you usually only search or select within a specific dataset, then splitting up the datasets and putting them into separate tables is the way to go. In RDBMS terms this is, by the way, called sharding.

I have such a use case, with around 30,000 datasets (each with around 5 million rows). I am only interested in one dataset at a time, so I created 30,000 tables. It works really well.

And in case you want to access the data across the datasets (for aggregating or calculating averages), you can take a MapReduce approach, which should work very well with this layout.

On Tue, Jul 17, 2012 at 11:55 PM, Jacob Bennett <jac...@gm...> wrote:
> Thanks for the input Anthony!
>
> -Jake
>
>
> On Tue, Jul 17, 2012 at 4:20 PM, Anthony Scopatz <sc...@gm...> wrote:
>>
>> On Tue, Jul 17, 2012 at 3:30 PM, Jacob Bennett <jac...@gm...>
>> wrote:
>>>
>>> Hello PyTables Users & Contributors,
>>>
>>> Just a quick question, let's say that I have certain identifiers that
>>> link to a set of data. Would it generally be faster for lookup to have each
>>> set of data as a separate table with an id as the table's name or to add this
>>> id as another column to a universal table of data and then let the in-kernel
>>> search query data only with a specific id?
>>
>>
>> I think that in general it is faster to have more tables with ids as
>> names. For very small data, searching through a single larger table might
>> be quicker than node access...but even then I doubt it.
>>
>>>
>>> I hope you can understand my question would 1,000 tables of 100,000
>>> records each be better for searching than 1 table with 100 million records
>>> and one extra id column?
>>
>>
>> For these data sizes more tables is probably faster.
>>
>> (It should also be noted that in the more tables case, that data is
>> actually smaller, because you can eliminate the id column.)
>>
>> Be Well
>> Anthony
>>
>>>
>>>
>>> Thanks,
>>> Jacob Bennett
>>>
>>> --
>>> Jacob Bennett
>>> Massachusetts Institute of Technology
>>> Department of Electrical Engineering and Computer Science
>>> Class of 2014| ben...@mi...
>>>
>>>
>>>
>>> ------------------------------------------------------------------------------
>>> Live Security Virtual Conference
>>> Exclusive live event will cover all the ways today's security and
>>> threat landscape has changed and how IT managers can respond. Discussions
>>> will include endpoint security, mobile security and the latest in malware
>>> threats. http://www.accelacomm.com/jaw/sfrnl04242012/114/50122263/
>>> _______________________________________________
>>> Pytables-users mailing list
>>> Pyt...@li...
>>> https://lists.sourceforge.net/lists/listinfo/pytables-users
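
For what it's worth, here is a rough sketch of the MapReduce-style aggregation over a one-table-per-dataset layout mentioned in the reply at the top of this message. It is not PyTables code: plain Python lists stand in for the tables, and the names `tables_by_id`, `map_table`, `combine` and `global_mean` are made up for illustration. The idea is only that each table is reduced to a small partial result on its own (map), and the partial results are then merged (reduce), so no single pass ever has to touch more than one dataset.

```python
from functools import reduce

# Hypothetical stand-in for a file holding one table per dataset:
# the key plays the role of the table name, the list the role of its rows.
tables_by_id = {
    "ds_0001": [1.0, 2.0, 3.0],
    "ds_0002": [10.0, 20.0],
    "ds_0003": [4.0],
}


def map_table(rows):
    """Map step: reduce one table to a small partial result (sum, count)."""
    return (sum(rows), len(rows))


def combine(a, b):
    """Reduce step: merge two partial (sum, count) results."""
    return (a[0] + b[0], a[1] + b[1])


def global_mean(tables):
    """Mean over all rows of all tables, visiting one table at a time."""
    partials = (map_table(rows) for rows in tables.values())
    total, count = reduce(combine, partials, (0.0, 0))
    return total / count if count else float("nan")


# Single-dataset access touches exactly one table; no id column is needed.
one_dataset = tables_by_id["ds_0002"]

# Cross-dataset aggregate, built from per-table partial results.
overall = global_mean(tables_by_id)
```

With real tables, the map step would presumably read one column of one table at a time, and since the partial results are independent they could also be computed in parallel.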