From: Anthony S. <sc...@gm...> - 2012-06-28 15:57:19
On Thu, Jun 28, 2012 at 10:41 AM, Jacob Bennett <jac...@gm...> wrote:

> Hey Anthony,
>
> Awesome, I think I'm going to take your advice and aim for larger
> tables. Just an inquiry though: let's say you keep track of a
> dictionary/hashtable that maps node identifiers (keys) to instances of the
> node object (values), which can be assigned during node creation, i.e.
> mydict['id'] = thisFile.createTable(params). I think this could actually
> help get away from the expensive search calls.

Yup. This would probably help a lot. I hadn't even considered it. I guess
you learn something new every day ;)

> I'm still going to go with larger tables though, since I have to read the
> data eventually.

Sounds good! Feel free to ask further questions here!

Be Well
Anthony

>
> Thanks Again For Your Time,
> Jacob
>
>
> On Thu, Jun 28, 2012 at 10:16 AM, Anthony Scopatz <sc...@gm...> wrote:
>
>> Hi Jacob,
>>
>> This is not solely a PyTables issue. As described, the methods you
>> mention all involve attribute (or metadata) access, which is notoriously
>> slow in HDF5, or rather, much slower than reading/writing the datasets
>> (Tables, Arrays) themselves. Generally, having a single table with 3E8
>> rows will be faster than searching through 3E3 tables with 1E5 rows. If
>> there is any way you can represent your data in a sane way to have larger
>> tables, I would recommend that you try this.
>>
>> The other option is to simply have an initialization step where you
>> create all of the tables and then another loop where you append to all
>> of them, rather than searching through 3000 tables 3000 times. For
>> example:
>>
>>     for i in range(3000):
>>         f.createTable(f.root, "i" + str(i), description)
>>
>>     for i in range(3000):
>>         tab = f.getNode("/i" + str(i))
>>         tab.append(...)
>>
>> In the above pseudocode, __contains__ is never called, let alone called
>> 3 times, like in your previous email.
>> In effect, the time that you are spending searching in your previous
>> email is 3000 tables x 3000 loop iterations x 3 if-else branches. So you
>> are automatically at 9 to 27 million iterations, just from the way you
>> have been using __contains__.
>>
>> I really think that pre-creating the tables so that you *know* that they
>> are there and just have to get the nodes will be far faster for you.
>>
>> Be Well
>> Anthony
>>
>> On Wed, Jun 27, 2012 at 2:33 PM, Jacob Bennett <jac...@gm...> wrote:
>>
>>> Hello PyTables Users,
>>>
>>> I am asking this quick question because my application is currently
>>> bottlenecking horribly on these methods, all of which are called once
>>> before each Table.append(rows). The table writing, on the other hand, is
>>> much, much faster than the searching for the table.
>>>
>>> Any general discussion on this would be great. The current hierarchy
>>> consists of root leading to around 3000 nodes, each of which has around
>>> 100000 rows.
>>>
>>> Thanks,
>>> Jacob
>>>
>>> --
>>> Jacob Bennett
>>> Massachusetts Institute of Technology
>>> Department of Electrical Engineering and Computer Science
>>> Class of 2014| ben...@mi...
>>>
>>>
>>>
>>> ------------------------------------------------------------------------------
>>> Live Security Virtual Conference
>>> Exclusive live event will cover all the ways today's security and
>>> threat landscape has changed and how IT managers can respond. Discussions
>>> will include endpoint security, mobile security and the latest in malware
>>> threats. http://www.accelacomm.com/jaw/sfrnl04242012/114/50122263/
>>> _______________________________________________
>>> Pytables-users mailing list
>>> Pyt...@li...
>>> https://lists.sourceforge.net/lists/listinfo/pytables-users
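Jacob's dictionary idea combines naturally with the pre-create-then-append loop from the quoted reply. Below is a minimal sketch under stated assumptions. It uses the PyTables 3.x spellings `open_file` and `create_table`, where the 2.x API used in this thread spelled them `openFile` and `createTable`. The row layout `ROW_DTYPE`, the helpers `build_file` and `append_batches`, and the table count are hypothetical, since the thread never shows the real schema.

```python
import os
import tempfile

import numpy as np
import tables

# Hypothetical row layout: the thread never shows Jacob's real schema.
ROW_DTYPE = np.dtype([("timestamp", "f8"), ("value", "f8")])


def build_file(path, n_tables):
    """Create every table once, up front, and keep each Table object in a dict."""
    h5 = tables.open_file(path, mode="w")
    handles = {}
    for i in range(n_tables):
        name = "i" + str(i)
        # create_table returns the new Table node, so it can be cached directly
        handles[name] = h5.create_table(h5.root, name, description=ROW_DTYPE)
    return h5, handles


def append_batches(handles, batches):
    """Append (name, rows) batches through the cached handles.

    No __contains__ check or get_node() call happens inside this loop;
    each append is preceded only by a plain dictionary lookup.
    """
    for name, rows in batches:
        handles[name].append(rows)


if __name__ == "__main__":
    path = os.path.join(tempfile.mkdtemp(), "demo.h5")
    h5, handles = build_file(path, n_tables=30)  # the thread has ~3000 tables
    try:
        batches = [("i" + str(k % 30), [(float(k), 0.5 * k)]) for k in range(100)]
        append_batches(handles, batches)
        h5.flush()
        print(sum(tab.nrows for tab in handles.values()))  # total rows written
    finally:
        h5.close()
```

Because `create_table` already returns the node, the dict costs one entry per table and turns every per-append search into an ordinary dictionary access. Whether this beats consolidating into fewer, larger tables, as suggested in the quoted reply, is worth measuring on the real data.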