From: Anthony S. <sc...@gm...> - 2012-06-28 15:16:42
Hi Jacob,

This is not solely a PyTables issue. As described, the methods you mention
all involve attribute (or metadata) access, which is notoriously slow in
HDF5. Or rather, it is much slower than reading from or writing to the
datasets (Tables, Arrays) themselves.

Generally, having a single table with 3E8 rows will be faster than searching
through 3E3 tables with 1E5 rows each. If there is any way you can represent
your data sanely in larger tables, I would recommend that you try this.

The other option is to have an initialization step where you create all of
the tables, and then another loop where you append to all of them, rather
than searching through 3000 tables 3000 times. For example:

    # description: placeholder for your table's row description
    for i in range(3000):
        f.createTable(f.root, "i" + str(i), description)

    for i in range(3000):
        tab = f.getNode("/i" + str(i))
        tab.append(...)

In the above pseudocode, __contains__ is never called, let alone called
3 times as in your previous email. In effect, the time you spend searching
in the code from your previous email scales as 3000 tables x 3000 loop
iterations x up to 3 if-else branches. So you are automatically at 9 to 27
million iterations, just from the way you have been using __contains__.

I really think that pre-creating the tables, so that you *know* they are
there and only have to get the nodes, will be far faster for you.

Be Well
Anthony

On Wed, Jun 27, 2012 at 2:33 PM, Jacob Bennett <jac...@gm...> wrote:

> Hello PyTables Users,
>
> I am asking this quick question because my application is currently
> horribly bottlenecking on these methods, all of which are called once
> before each Table.append(rows). The table writing on the other hand is
> much, much faster than the searching for the table.
>
> Any general discussion on this would be great. The current hierarchy
> consists of root leading to around 3000 nodes each of which have around
> 100000 rows.
>
> Thanks,
> Jacob
>
> --
> Jacob Bennett
> Massachusetts Institute of Technology
> Department of Electrical Engineering and Computer Science
> Class of 2014| ben...@mi...
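
For reference, the iteration estimate in the reply can be worked through in a few lines of Python. This is only a rough sketch of that cost model: it assumes, as the reply does, that each membership check touches every existing table, and the names `search_steps`, `N_TABLES`, `N_ITERATIONS`, `MIN_CHECKS` and `MAX_CHECKS` are made up for illustration. It does not measure PyTables or HDF5 itself.

```python
# Cost model taken from the reply: every membership check is assumed to
# touch all existing tables (an assumption, not a measurement of PyTables).
N_TABLES = 3000       # tables under root, from the original question
N_ITERATIONS = 3000   # loop iterations, one append per table
MIN_CHECKS = 1        # at least one membership check per iteration
MAX_CHECKS = 3        # up to three if-else branches per iteration


def search_steps(n_tables, n_iterations, checks_per_iteration):
    """Return the number of table visits spent on membership checks."""
    return n_tables * n_iterations * checks_per_iteration


best_case = search_steps(N_TABLES, N_ITERATIONS, MIN_CHECKS)
worst_case = search_steps(N_TABLES, N_ITERATIONS, MAX_CHECKS)

# With pre-created tables there are no membership checks at all:
# one creation pass plus one direct node fetch per iteration.
precreated_steps = N_TABLES + N_ITERATIONS

print(f"check-then-append: {best_case:,} to {worst_case:,} steps")
print(f"pre-create, then append: {precreated_steps:,} steps")
```

Under these assumptions the check-then-append pattern costs between 9 and 27 million steps, while the pre-create pattern costs a few thousand. The real ratio depends on how HDF5 resolves node lookups, so treat the numbers as an order-of-magnitude guide only.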