Re: [Pytables-users] How Fast is File.__contains, File.getNode, File.createTable?


Hey Anthony,

Awesome, I think I'm going to take your advice and aim for larger
tables. Just one question, though: suppose you keep a
dictionary/hashtable that maps node identifiers (keys) to node objects
(values), filled in as each node is created, i.e.,
mydict['id'] = thisFile.createTable(params). I think this could actually
help avoid the expensive search calls. I'm still going to go with
larger tables, though, since I have to read the data eventually.
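
A minimal sketch of that dictionary idea, assuming PyTables 3.x method names (create_table / get_node are the current spellings of createTable / getNode) and a hypothetical one-column row layout. The handle returned at creation time is kept in a plain dict, so each later append costs a dict lookup rather than an HDF5 metadata query:

```python
import os
import tempfile

import tables

# Hypothetical row layout, just for illustration.
ROW_DESC = {"value": tables.Float64Col()}


def get_or_create(h5file, cache, name):
    """Return the cached Table for `name`, creating it on first use.

    `cache` is a plain dict mapping node names to Table objects, so
    repeated lookups never go through File.__contains__ or get_node.
    """
    table = cache.get(name)
    if table is None:
        table = h5file.create_table("/", name, ROW_DESC)
        cache[name] = table
    return table


path = os.path.join(tempfile.mkdtemp(), "cache_demo.h5")
with tables.open_file(path, mode="w") as h5file:
    tables_by_id = {}
    for step in range(3):
        for i in range(5):
            tab = get_or_create(h5file, tables_by_id, "i" + str(i))
            tab.append([(float(step),)])
    # Flush buffered data before reading the row counts back.
    h5file.flush()
    row_counts = {name: t.nrows for name, t in tables_by_id.items()}

print(row_counts)
```

The cached handles are only valid while the file stays open; after reopening the file (or removing a node) the dict has to be rebuilt.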

Thanks again for your time,
Jacob

On Thu, Jun 28, 2012 at 10:16 AM, Anthony Scopatz <sc...@gm...> wrote:

> Hi Jacob,
>
> This is not solely a PyTables issue.  As described, the methods you mention
> all involve attribute (or metadata) access, which is notoriously slow in
> HDF5, or rather, much slower than reading/writing the datasets (Tables,
> Arrays) themselves.  Generally, having a single table with 3E8 rows will
> be faster than searching through 3E3 tables with 1E5 rows each.  If there is
> any way you can structure your data so that it lives in fewer, larger
> tables, I would recommend that you try it.
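
As a rough illustration of the single-large-table layout, here is a minimal sketch assuming PyTables 3.x and a hypothetical schema in which a series_id column records which of the former small tables each row belongs to. read_where then pulls one series back out with an in-kernel query instead of a node lookup. The sizes are toy values, not the 3E8-row case:

```python
import os
import tempfile

import numpy as np
import tables


class Sample(tables.IsDescription):
    # Hypothetical schema: which former per-node table the row came from...
    series_id = tables.Int32Col(pos=0)
    # ...and the payload value itself.
    value = tables.Float64Col(pos=1)


path = os.path.join(tempfile.mkdtemp(), "one_table.h5")
n_series, rows_per_series = 30, 1000  # toy sizes

with tables.open_file(path, mode="w") as h5file:
    big = h5file.create_table(
        "/", "samples", Sample, expectedrows=n_series * rows_per_series
    )
    for sid in range(n_series):
        # Build one block of rows per series and append it in a single call.
        block = np.empty(rows_per_series, dtype=big.dtype)
        block["series_id"] = sid
        block["value"] = np.arange(rows_per_series, dtype=np.float64)
        big.append(block)
    big.flush()

    # Optional: an index on series_id speeds up selections like the one below.
    big.cols.series_id.create_index()

    # In-kernel query; returns an in-memory NumPy structured array.
    one_series = big.read_where("series_id == 7")

print(len(one_series), one_series["value"][:3])
```

Whether the index pays off depends on how often you select by series; for a write-once, read-everything workload it can be skipped.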
>
> The other option is to have an initialization step where you
> create all of the tables, and then a second loop where you append to each
> of them, rather than searching through 3000 tables 3000 times.  For
> example:
>
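> import tables  # PyTables 3.x: create_table/get_node (were createTable/getNode)
> f = tables.open_file("data.h5", mode="w")
> desc = {"value": tables.Float64Col()}  # placeholder row layout
>
> # Create every table once, up front.
> for i in range(3000):
>     f.create_table("/", "i" + str(i), desc)
>
> # Then fetch each node directly; no existence checks needed.
> for i in range(3000):
>     f.get_node("/i" + str(i)).append([(1.0,)])
> f.close()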
>
> In the above example, __contains__ is never called, let alone called
> 3 times as in your previous email.  In effect, the search cost in your
> previous email is 3000 tables x 3000 loop iterations x 3 if-else
> branches, so you are automatically doing 9 to 27 million lookups just
> from the way you have been using __contains__.
>
> I really think that pre-creating the tables so that you *know* that they
> are there and just have to get the nodes will be far faster for you.
>
> Be Well
> Anthony
>
> On Wed, Jun 27, 2012 at 2:33 PM, Jacob Bennett <jac...@gm...> wrote:
>
>> Hello PyTables Users,
>>
>> I am asking this quick question because my application is currently
>> badly bottlenecked on these methods, all of which are called once
>> before each Table.append(rows). The table writing, on the other hand, is
>> much, much faster than the search for the table.
>>
>> Any general discussion on this would be great. The current hierarchy
>> consists of the root group leading to around 3000 nodes, each of which
>> has around 100000 rows.
>>
>> Thanks,
>> Jacob
>>
>> --
>> Jacob Bennett
>> Massachusetts Institute of Technology
>> Department of Electrical Engineering and Computer Science
>> Class of 2014| ben...@mi...
>>
>>
>>
>> _______________________________________________
>> Pytables-users mailing list
>> Pyt...@li...
>> https://lists.sourceforge.net/lists/listinfo/pytables-users
>>
>>
>

-- 
Jacob Bennett
Massachusetts Institute of Technology
Department of Electrical Engineering and Computer Science
Class of 2014| ben...@mi...