Re: [Pytables-users] Faster Performance: A set of nodes vs A new column that ranges within a set?


I really like this approach; however, would it be better to
use the built-in hierarchy to separate the tables, or to write to
separate HDF5 files? In my current experiments with concurrent
read/write operations on a shared HDF5 file without a hierarchy, the
only errors I get are occasional read errors (which aren't much of
a problem for me). So I am wondering: is there a way to reduce the
metadata within an HDF5 file and, at the same time, use a multi-table
approach to solve my problem?

Thanks,
Jacob
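
For reference, the two layouts compared in the quoted thread below can be sketched as follows. This is only a minimal sketch, not a benchmark. It assumes the snake_case API of PyTables 3.x (releases from the time of this thread used camelCase names such as openFile and createTable), and the names Record, RecordWithId, set_id, /sets, all_sets and id_N are hypothetical, chosen just for illustration.

```python
import os
import tempfile

import numpy as np
import tables


class Record(tables.IsDescription):
    # Hypothetical row layout for the per-id tables: no id column is needed.
    timestamp = tables.Int64Col(pos=0)
    value = tables.Float64Col(pos=1)


class RecordWithId(tables.IsDescription):
    # Same layout plus the extra id column used by the single shared table.
    set_id = tables.Int32Col(pos=0)
    timestamp = tables.Int64Col(pos=1)
    value = tables.Float64Col(pos=2)


def build_demo_file(path, n_sets=5, rows_per_set=100):
    """Write the same data twice: as per-id tables and as one shared table."""
    with tables.open_file(path, mode="w") as h5:
        sets_group = h5.create_group("/", "sets")
        big = h5.create_table("/", "all_sets", RecordWithId)
        for set_id in range(n_sets):
            rows = [(ts, set_id * 1000.0 + ts) for ts in range(rows_per_set)]
            per_id = h5.create_table(sets_group, "id_%d" % set_id, Record)
            per_id.append(rows)
            big.append([(set_id, ts, value) for ts, value in rows])


def read_from_own_table(path, set_id):
    """Layout A: the id is the node name, so the lookup is a node access."""
    with tables.open_file(path, mode="r") as h5:
        return h5.get_node("/sets", "id_%d" % set_id).read()


def read_from_shared_table(path, set_id):
    """Layout B: an in-kernel query filters the shared table by id."""
    with tables.open_file(path, mode="r") as h5:
        return h5.root.all_sets.read_where("set_id == %d" % set_id)


def demo():
    path = os.path.join(tempfile.mkdtemp(), "demo.h5")
    build_demo_file(path)
    a = read_from_own_table(path, 3)
    b = read_from_shared_table(path, 3)
    # Both layouts should return the same values for the same id.
    return a, b, np.array_equal(a["value"], b["value"])
```

Timing read_from_own_table against read_from_shared_table on realistically sized data would be the way to check which layout is faster for a given workload.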

On Wed, Jul 18, 2012 at 1:22 AM, Ümit Seren <uem...@gm...> wrote:

> Just to add to what Anthony said:
> In the end it also depends on how unrelated your data is and how you want
> to access it. If the access scenario is that you usually only search
> or select within a specific dataset, then splitting up the datasets and
> putting them into separate tables is the way to go. In RDBMS terms
> this is, by the way, called sharding.
> I have such a use case, with around 30,000 datasets (each of
> them with around 5 million rows). I am only interested in one dataset
> at a time, so I created 30,000 tables. It works really well.
> And in case you want to access the data across the datasets (for
> aggregating or calculating averages), you can take a MapReduce approach,
> which should work very well with this layout.
>
>
> On Tue, Jul 17, 2012 at 11:55 PM, Jacob Bennett
> <jac...@gm...> wrote:
> > Thanks for the input Anthony!
> >
> > -Jake
> >
> >
> > On Tue, Jul 17, 2012 at 4:20 PM, Anthony Scopatz <sc...@gm...> wrote:
> >>
> >> On Tue, Jul 17, 2012 at 3:30 PM, Jacob Bennett <jac...@gm...> wrote:
> >>>
> >>> Hello PyTables Users & Contributors,
> >>>
> >>> Just a quick question: let's say that I have certain identifiers that
> >>> link to a set of data. Would it generally be faster for lookup to have
> >>> each set of data as a separate table with an id as the table's name, or
> >>> to add this id as another column to a universal table of data and then
> >>> let the in-kernel search query only the data with a specific id?
> >>
> >>
> >> I think that in general it is faster to have more tables with ids as
> >> names. For very small data, searching through a single larger table
> >> might be quicker than node access... but even then I doubt it.
> >>
> >>>
> >>> I hope my question is clear: would 1,000 tables of 100,000
> >>> records each be better for searching than 1 table with 100 million
> >>> records and one extra id column?
> >>
> >>
> >> For these data sizes more tables is probably faster.
> >>
> >> (It should also be noted that in the more-tables case the data is
> >> actually smaller, because you can eliminate the id column.)
> >>
> >> Be Well
> >> Anthony
> >>
> >>>
> >>>
> >>> Thanks,
> >>> Jacob Bennett
> >>>
> >>> --
> >>> Jacob Bennett
> >>> Massachusetts Institute of Technology
> >>> Department of Electrical Engineering and Computer Science
> >>> Class of 2014| ben...@mi...
> >>>
> >>>
> >>>
> >>>
> >>> ------------------------------------------------------------------------------
> >>> Live Security Virtual Conference
> >>> Exclusive live event will cover all the ways today's security and
> >>> threat landscape has changed and how IT managers can respond. Discussions
> >>> will include endpoint security, mobile security and the latest in malware
> >>> threats. http://www.accelacomm.com/jaw/sfrnl04242012/114/50122263/
> >>> _______________________________________________
> >>> Pytables-users mailing list
> >>> Pyt...@li...
> >>> https://lists.sourceforge.net/lists/listinfo/pytables-users
> >>>
> >>
> >>
> >>
> >>
> >>
> >
> >
> >
> > --
> > Jacob Bennett
> > Massachusetts Institute of Technology
> > Department of Electrical Engineering and Computer Science
> > Class of 2014| ben...@mi...
> >
> >
> >
> >
>
>
>

-- 
Jacob Bennett
Massachusetts Institute of Technology
Department of Electrical Engineering and Computer Science
Class of 2014| ben...@mi...
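
The MapReduce-style aggregation across per-dataset tables that Ümit mentions in the quoted message above could look roughly like this. It is a minimal sketch under stated assumptions, not his implementation: it uses the PyTables 3.x snake_case API, runs the map step sequentially in one process, and the names Record, /sets, id_N, map_table and combine are hypothetical. Each table is reduced to a partial (sum, count) pair, and the pairs are then merged into a global mean.

```python
import functools
import os
import tempfile

import tables


class Record(tables.IsDescription):
    # Hypothetical row layout; only `value` is aggregated below.
    timestamp = tables.Int64Col(pos=0)
    value = tables.Float64Col(pos=1)


def write_sharded_file(path, datasets):
    """Store each dataset (a list of floats) in its own table under /sets."""
    with tables.open_file(path, mode="w") as h5:
        group = h5.create_group("/", "sets")
        for name, values in datasets.items():
            table = h5.create_table(group, name, Record)
            table.append([(i, v) for i, v in enumerate(values)])


def map_table(table):
    """Map step: reduce one table to a partial (sum, count) pair."""
    values = table.col("value")  # reads the whole column into a NumPy array
    return float(values.sum()), int(values.size)


def combine(left, right):
    """Reduce step: merge two partial (sum, count) pairs."""
    return left[0] + right[0], left[1] + right[1]


def global_mean(path):
    """Mean of `value` over every table under /sets."""
    with tables.open_file(path, mode="r") as h5:
        partials = [
            map_table(t) for t in h5.iter_nodes("/sets", classname="Table")
        ]
    total, count = functools.reduce(combine, partials, (0.0, 0))
    return total / count if count else float("nan")


def demo():
    datasets = {"id_0": [1.0, 2.0, 3.0], "id_1": [10.0], "id_2": [4.0, 6.0]}
    path = os.path.join(tempfile.mkdtemp(), "sharded.h5")
    write_sharded_file(path, datasets)
    return global_mean(path)
```

Because every partial result depends on a single table, the map step could in principle be handed to separate worker processes, each opening the file read-only, with only the small (sum, count) pairs sent back for the reduce step.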