From: Jacob B. <jac...@gm...> - 2012-07-18 12:10:34
Cool, thanks again for your help!

-Jake

On Wed, Jul 18, 2012 at 7:07 AM, Ümit Seren <uem...@gm...> wrote:
> I actually had 30,000 groups attached to the data group. But I guess
> it doesn't really matter whether it is a table or a group. Both are
> nodes.
>
> On Wed, Jul 18, 2012 at 2:04 PM, Jacob Bennett <jac...@gm...> wrote:
> > Good to hear. Were you able to get away with having 30,000 datasets
> > directly linked to a similar node (in this case, data)? I seem to
> > have a problem putting that many nodes under one root.
> >
> > -Jacob
> >
> > On Wed, Jul 18, 2012 at 6:54 AM, Ümit Seren <uem...@gm...> wrote:
> >> On Wed, Jul 18, 2012 at 1:32 PM, Jacob Bennett <jac...@gm...> wrote:
> >> > Sounds awesome, thanks for the help. I also have two more concerns.
> >> >
> >> > #1 - I will never write concurrently; I only have to worry about
> >> > one writer with many readers. Will the HDF5 metadata for a
> >> > tree-like structure be able to hold up in this scenario?
> >>
> >> To be honest, I haven't really tried the concurrent-read,
> >> single-write use case.
> >> In my case I had a CherryPy Python web server (which uses multiple
> >> processes to handle requests), and usually I write from one request
> >> and reading is done from the same or another. But I don't think I
> >> ever had the use case where I read and wrote at the same time.
> >> However, I had to keep the files open because of the way PyTables
> >> handles files (it caches them as singleton objects without a lock).
> >> For example, if you close the file after you have finished writing
> >> while another thread/process is still reading from it, the reader
> >> will raise an exception because it loses the file handle.
> >> So you probably have to take care of this yourself in your code.
> >> > #2 - When you have around 30,000 tables in your HDF5 file, you do
> >> > not want to have every node directly linked to root (plus I don't
> >> > think HDF5 can support that); however, I have no other natural
> >> > grouping besides this. Could this be a concern as well?
> >>
> >> Well, in my case my datasets consisted not only of one table but
> >> also of additional data (CArray, etc.).
> >> So I naturally created a group for each dataset and stored
> >> meta-information as attributes on the group. These groups could
> >> sometimes contain additional groups and the actual data in the form
> >> of tables and CArrays. It looked something like this:
> >>
> >> - root
> >>   - data
> >>     - dataset1
> >>       - table
> >>       - transformation
> >>         - table
> >>       - CArray
> >>     - dataset2
> >>     .
> >>     .
> >>     .
> >>     - dataset30000
> >>
> >> > If you could help me out with these two items, I think I will have
> >> > enough knowledge under my belt to know what I need to do. Thanks
> >> > again! ;)
> >> >
> >> > On Wed, Jul 18, 2012 at 6:21 AM, Ümit Seren <uem...@gm...> wrote:
> >> >> I think it depends, and there are different ways to do it.
> >> >> But concurrent writes to one HDF5 file are not really supported
> >> >> (not even by the underlying HDF5 library, unless you use the MPI
> >> >> version). So if you want to write from different
> >> >> threads/processes, you probably have to use separate HDF5 files.
> >> >> However, writing from one process and reading from another is not
> >> >> much of an issue.
> >> >>
> >> >> Having everything in one HDF5 file has its advantages, as does
> >> >> putting everything in separate HDF5 files.
> >> >> Filesystems can usually cope with one huge file much better than
> >> >> with millions of small files (copying, listing, etc.).
> >> >> Of course, if you have the datasets in separate HDF5 files, it's
> >> >> easier to copy/move just single datasets compared to having
> >> >> everything in one HDF5 file (though that's also possible using
> >> >> ptrepack).
> >> >>
> >> >> You could also create one HDF5 file for the meta-information and
> >> >> create separate HDF5 files for each dataset. Then you can use
> >> >> external links to connect the HDF5 file containing the
> >> >> meta-information to the HDF5 files for the datasets.
> >> >>
> >> >> I usually tend to put everything in one HDF5 file.
> >> >>
> >> >> On Wed, Jul 18, 2012 at 12:49 PM, Jacob Bennett <jac...@gm...> wrote:
> >> >> > I really like this way of going about it; however, would it be
> >> >> > better to use the built-in hierarchy to separate the tables or
> >> >> > to write to separate HDF5 files? When I experiment with
> >> >> > concurrent read/write operations on a shared HDF5 file without
> >> >> > a hierarchy, I notice that the only errors I get are occasional
> >> >> > read errors (which aren't much of a problem for me). So I am
> >> >> > wondering: could there be a way to reduce the metadata within
> >> >> > an HDF5 file and, at the same time, use a multi-table approach
> >> >> > to solve my problem?
> >> >> >
> >> >> > Thanks,
> >> >> > Jacob
> >> >> >
> >> >> > On Wed, Jul 18, 2012 at 1:22 AM, Ümit Seren <uem...@gm...> wrote:
> >> >> >> Just to add to what Anthony said:
> >> >> >> In the end it also depends on how unrelated your data is and
> >> >> >> how you want to access it. If the access scenario is that you
> >> >> >> usually only search or select within a specific dataset, then
> >> >> >> splitting up the datasets and putting them into separate
> >> >> >> tables is the way to go. In RDBMS terms this is, by the way,
> >> >> >> called sharding.
> >> >> >> I have such a use case, where I have around 30,000 datasets
> >> >> >> (each of them with around 5 million rows). I am only
> >> >> >> interested in one dataset at a time, so I created 30,000
> >> >> >> tables. It works really well.
> >> >> >> And in case you want to access the data across the datasets
> >> >> >> (for aggregating or calculating averages), you can take a
> >> >> >> MapReduce approach, which should work very well with this
> >> >> >> layout.
> >> >> >>
> >> >> >> On Tue, Jul 17, 2012 at 11:55 PM, Jacob Bennett <jac...@gm...> wrote:
> >> >> >> > Thanks for the input, Anthony!
> >> >> >> >
> >> >> >> > -Jake
> >> >> >> >
> >> >> >> > On Tue, Jul 17, 2012 at 4:20 PM, Anthony Scopatz <sc...@gm...> wrote:
> >> >> >> >> On Tue, Jul 17, 2012 at 3:30 PM, Jacob Bennett <jac...@gm...> wrote:
> >> >> >> >>> Hello PyTables Users & Contributors,
> >> >> >> >>>
> >> >> >> >>> Just a quick question: let's say that I have certain
> >> >> >> >>> identifiers that each link to a set of data. Would lookup
> >> >> >> >>> generally be faster if each set of data were a separate
> >> >> >> >>> table with the id as the table's name, or if the id were
> >> >> >> >>> added as another column to a universal table of data, with
> >> >> >> >>> an in-kernel search querying only the data with a specific
> >> >> >> >>> id?
> >> >> >> >>
> >> >> >> >> I think that in general it is faster to have more tables
> >> >> >> >> with ids as names. For very small data, searching through
> >> >> >> >> a single larger table might be quicker than node
> >> >> >> >> access... but even then I doubt it.
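To make the two layouts in the question concrete, here is a small sketch using the current PyTables snake_case API (the 2012-era releases spelled these calls differently). The file name, the node names `set0`..`set4`, the `set_id` and `value` columns, and the tiny sizes are all assumptions for illustration, not from the thread, and the sketch says nothing about timing.

```python
import os
import tempfile

import numpy as np
import tables

# Hypothetical demo file in a temporary directory.
path = os.path.join(tempfile.mkdtemp(), "lookup_demo.h5")
n_sets, n_rows = 5, 100

with tables.open_file(path, mode="w") as h5:
    shared = np.zeros(n_sets * n_rows,
                      dtype=[("set_id", "i4"), ("value", "f8")])
    for i in range(n_sets):
        values = np.arange(n_rows, dtype="f8") + 1000 * i
        # Layout A: one table per id, named after the id (no id column).
        own = np.zeros(n_rows, dtype=[("value", "f8")])
        own["value"] = values
        h5.create_table("/data", "set%d" % i, obj=own, createparents=True)
        # Layout B: the same rows go into one shared table with an id column.
        block = slice(i * n_rows, (i + 1) * n_rows)
        shared["set_id"][block] = i
        shared["value"][block] = values
    h5.create_table("/", "universal", obj=shared)

wanted = 3
with tables.open_file(path, mode="r") as h5:
    # Layout A: resolve the node by name, then read the whole table.
    from_own_table = h5.get_node("/data", "set%d" % wanted).read()["value"]
    # Layout B: in-kernel query that filters the shared table by id.
    hits = h5.root.universal.read_where("set_id == wanted",
                                        condvars={"wanted": wanted})
    from_universal = hits["value"]
```

Both lookups return the same 100 values here; which one is faster at realistic sizes is exactly what the thread is discussing, so it is worth measuring on your own data.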
> >> >> >> >>> I hope you can understand my question: would 1,000 tables
> >> >> >> >>> of 100,000 records each be better for searching than 1
> >> >> >> >>> table with 100 million records and one extra id column?
> >> >> >> >>
> >> >> >> >> For these data sizes, more tables is probably faster.
> >> >> >> >>
> >> >> >> >> (It should also be noted that in the more-tables case the
> >> >> >> >> data is actually smaller, because you can eliminate the id
> >> >> >> >> column.)
> >> >> >> >>
> >> >> >> >> Be Well
> >> >> >> >> Anthony
> >> >> >> >>
> >> >> >> >>> Thanks,
> >> >> >> >>> Jacob Bennett
> >> >> >> >>>
> >> >> >> >>> --
> >> >> >> >>> Jacob Bennett
> >> >> >> >>> Massachusetts Institute of Technology
> >> >> >> >>> Department of Electrical Engineering and Computer Science
> >> >> >> >>> Class of 2014 | ben...@mi...
> >> >> >> >>>
> >> >> >> >>> ------------------------------------------------------------------------------
> >> >> >> >>> Live Security Virtual Conference
> >> >> >> >>> Exclusive live event will cover all the ways today's security and
> >> >> >> >>> threat landscape has changed and how IT managers can respond. Discussions
> >> >> >> >>> will include endpoint security, mobile security and the latest in malware
> >> >> >> >>> threats. http://www.accelacomm.com/jaw/sfrnl04242012/114/50122263/
> >> >> >> >>> _______________________________________________
> >> >> >> >>> Pytables-users mailing list
> >> >> >> >>> Pyt...@li...
> >> >> >> >>> https://lists.sourceforge.net/lists/listinfo/pytables-users
--
Jacob Bennett
Massachusetts Institute of Technology
Department of Electrical Engineering and Computer Science
Class of 2014 | ben...@mi...
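For reference, the group-per-dataset layout from Ümit's reply could be built roughly as follows. This is a sketch using the current PyTables snake_case API and several assumptions that are not in the thread: the file name, the `description` attribute, the `value` column, the array shapes, only two datasets instead of 30,000, and the CArray sitting directly under each dataset group.

```python
import os
import tempfile

import numpy as np
import tables

# Hypothetical demo file in a temporary directory.
path = os.path.join(tempfile.mkdtemp(), "layout_demo.h5")

with tables.open_file(path, mode="w") as h5:
    for name in ("dataset1", "dataset2"):
        # createparents=True creates the /data group on first use.
        group = h5.create_group("/data", name, createparents=True)
        # Meta-information stored as an attribute on the dataset group.
        group._v_attrs.description = "example " + name
        rows = np.zeros(10, dtype=[("value", "f8")])
        h5.create_table(group, "table", obj=rows)
        h5.create_carray(group, "carray", obj=np.zeros((4, 4)))
        # A nested group holding further data for the same dataset.
        sub = h5.create_group(group, "transformation")
        h5.create_table(sub, "table", obj=rows)

with tables.open_file(path, mode="r") as h5:
    # Names of the dataset groups attached to /data.
    names = sorted(h5.root.data._v_children)
    # Direct access to one nested node by its full path.
    node = h5.get_node("/data/dataset1/transformation/table")
    n_rows = node.nrows
    description = h5.get_node_attr("/data/dataset1", "description")
```

Each dataset is reached by path, so working with one dataset at a time never requires scanning the others.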