From: Jacob B. <jac...@gm...> - 2012-06-28 15:41:37
Hey Anthony,

Awesome, I think I'm going to take your advice and aim for larger tables.

Just one question, though: suppose you keep a dictionary/hashtable that maps node identifiers (keys) to instances of the node object (values), assigned during node creation, i.e. mydict['id'] = thisFile.createTable(params). I think this could actually help get away from the expensive search calls.

I'm still going to go with larger tables though, since I have to read the data eventually.

Thanks Again For Your Time,
Jacob

On Thu, Jun 28, 2012 at 10:16 AM, Anthony Scopatz <sc...@gm...> wrote:

> Hi Jacob,
>
> This is not solely a PyTables issue. As described, the methods you mention
> all involve attribute (or metadata) access, which is notoriously slow in
> HDF5. Or rather, much slower than read/write from the datasets (Tables,
> Arrays) themselves. Generally, having a single table with 3E8 rows will
> be faster than searching through 3E3 tables with 1E5 rows. If there is
> any way you can represent your data in a sane way to have larger tables, I
> would recommend that you try this.
>
> The other option is to simply have an initialization step where you
> create all of the tables and then another loop where you append to all
> of them, rather than searching through 3000 tables 3000 times. For
> example:
>
> for i in range(3000):
>     f.createTable(f.root, "i" + str(i), ...)
>
> for i in range(3000):
>     tab = f.getNode("/i" + str(i))
>     tab.append(...)
>
> In the above pseudocode, __contains__ is never called - let alone calling
> it 3 times, like in your previous email. In effect the time that you are
> spending searching in your previous email is 3000 tables x 3000 loop
> iterations times 3 if-else branches. So you are automatically in the range
> of 9 - 27 million iterations, just by the way you have been using
> __contains__.
>
> I really think that pre-creating the tables so that you *know* that they
> are there and just have to get the nodes will be far faster for you.
>
> Be Well
> Anthony
>
> On Wed, Jun 27, 2012 at 2:33 PM, Jacob Bennett <jac...@gm...> wrote:
>
>> Hello PyTables Users,
>>
>> I am asking this quick question because my application is currently
>> bottlenecking horribly on these methods, all of which are called once
>> before each Table.append(rows). The table writing, on the other hand, is
>> much, much faster than the search for the table.
>>
>> Any general discussion on this would be great. The current hierarchy
>> consists of root leading to around 3000 nodes, each of which has around
>> 100000 rows.
>>
>> Thanks,
>> Jacob
>>
>> --
>> Jacob Bennett
>> Massachusetts Institute of Technology
>> Department of Electrical Engineering and Computer Science
>> Class of 2014| ben...@mi...
>>
>> ------------------------------------------------------------------------------
>> Live Security Virtual Conference
>> Exclusive live event will cover all the ways today's security and
>> threat landscape has changed and how IT managers can respond. Discussions
>> will include endpoint security, mobile security and the latest in malware
>> threats. http://www.accelacomm.com/jaw/sfrnl04242012/114/50122263/
>> _______________________________________________
>> Pytables-users mailing list
>> Pyt...@li...
>> https://lists.sourceforge.net/lists/listinfo/pytables-users

--
Jacob Bennett
Massachusetts Institute of Technology
Department of Electrical Engineering and Computer Science
Class of 2014| ben...@mi...
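
The dictionary-of-handles idea from the top of this message can be sketched roughly as below. This is only an illustration under stated assumptions: FakeFile, create_table, get_node and append_with_cache are hypothetical stand-ins invented for this sketch, not the PyTables API, and a plain list plays the role of a Table so the example runs without HDF5. It shows only the control flow: create each table once, keep the returned handle in a dict, and append through the dict so that no per-append lookup is needed.

```python
class FakeFile:
    """Hypothetical stand-in for an HDF5 file; NOT the PyTables API."""

    def __init__(self):
        self._nodes = {}
        self.lookups = 0  # counts simulated hierarchy lookups

    def create_table(self, name):
        table = []  # a plain list plays the role of a Table
        self._nodes[name] = table
        return table

    def get_node(self, name):
        self.lookups += 1  # each call stands for one costly lookup
        return self._nodes[name]


def append_with_cache(f, names, batches):
    """Create every table once, cache the handles, append via the cache."""
    cache = {name: f.create_table(name) for name in names}
    for name, rows in batches:
        cache[name].extend(rows)  # no get_node or __contains__ call here
    return cache


f = FakeFile()
names = ["i" + str(i) for i in range(5)]
# Three rounds of appends, two rows per table per round.
batches = [(name, [k, k + 1]) for k in range(3) for name in names]
cache = append_with_cache(f, names, batches)
print(f.lookups, len(cache["i0"]))  # prints: 0 6
```

Whether holding several thousand live node handles is cheap in a real PyTables file is not something this sketch demonstrates; that would need to be measured against the actual library.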