[Pytables-users] Re: More on random access writing

On Saturday 11 October 2003 04:23, you wrote:
> Francesc,
>
> I have implemented your suggestions to the best of my ability, but the
> resulting code is running very slowly. Could you take a quick look at the
> code and see if I am making any obvious mistakes (other than not using
> Psyco, which I may implement)? Thank you again for your time.

Your code looks pretty good. The problem is rather the approach, which is
very inefficient.

I've just written a new Table.copy() method that can do the job much more
efficiently. This method lets you copy a table to another location and
sort it by any column you want (except for String columns; support for
those might be implemented later on).

Here is an example of use:

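import tables

# Open the file in append mode so that new nodes can be added
fileh = tables.openFile("data.nobackup/test-big.h5", "a")
# Remove any previous copy so that the name "newtable" is free
if hasattr(fileh.root, "newtable"):
    fileh.removeNode(fileh.root, "newtable")
# Copy /tuple0 to /newtable, sorted by the "var3" column
fileh.root.tuple0.copy("newtable", "var3")
fileh.close()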

In this example, the /tuple0 table has been copied to /newtable, with the
rows ordered by the "var3" column of the source table.
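
To make the effect concrete, here is a minimal conceptual sketch in plain
Python. It is not how Table.copy() is implemented: in-memory lists of dicts
stand in for the on-disk tables, and the helper name copy_sorted and the
sample rows are hypothetical, chosen only for illustration.

```python
# Conceptual sketch only: plain Python lists stand in for on-disk tables.
# The helper name copy_sorted and the sample rows are hypothetical.
from operator import itemgetter


def copy_sorted(rows, column):
    """Return a new list of rows ordered by `column`; the source is untouched."""
    return sorted(rows, key=itemgetter(column))


# Stand-in for the /tuple0 table
tuple0 = [
    {"var1": "a", "var3": 30},
    {"var1": "b", "var3": 10},
    {"var1": "c", "var3": 20},
]

# Stand-in for /newtable: same rows, ordered by "var3"
newtable = copy_sorted(tuple0, "var3")
print([row["var3"] for row in newtable])
```

The point is that the source keeps its original order, while the copy holds
the same rows sorted by the chosen column.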

My preliminary timings show that you can copy and sort a table with 100,000
rows in 1.2 s and one with 1,000,000 rows in 27 s (although this depends on
how shuffled your initial data is). So the time does not grow linearly
(it is perhaps closer to n*sqrt(n)), and if you want to sort a table with
20,000,000 entries, that could take a little more than 40 minutes (on a
processor similar to mine, a P4 @ 2 GHz). Well, it's not fast, but it's
affordable.
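
The extrapolation above can be reproduced with a few lines of arithmetic.
This is only a rough sketch: it assumes a pure power law t ~ n**k fitted to
the two timings quoted above, and the helper name estimate_seconds is
hypothetical.

```python
import math

# The two timings quoted above: (rows, seconds)
n1, t1 = 100_000, 1.2
n2, t2 = 1_000_000, 27.0

# Empirical exponent k in t ~ n**k, derived from the two measurements
k = math.log(t2 / t1) / math.log(n2 / n1)


def estimate_seconds(n, exponent):
    """Scale the 1,000,000-row timing to n rows, assuming t ~ n**exponent."""
    return t2 * (n / n2) ** exponent


n3 = 20_000_000
print(f"measured exponent: {k:.2f}")
print(f"n*sqrt(n) estimate: {estimate_seconds(n3, 1.5) / 60:.0f} min")
print(f"measured-exponent estimate: {estimate_seconds(n3, k) / 60:.0f} min")
```

With an exponent of 1.5, i.e. n*sqrt(n), the estimate for 20,000,000 rows
comes out at about 40 minutes. The exponent fitted from the two data points
is somewhat lower, so that figure is probably on the pessimistic side.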

You can find the new code in the pytables CVS repository
(http://sourceforge.net/cvs/?group_id=63486). If you are using Windows, you
will need the MSVC 6.0 compiler to install it.

Ah, I almost forgot: you will also need to update the HDF5 library to at
least 1.6.0 post4. The original 1.6.0 had some bugs that prevent the new
code from running correctly.

Cheers,

-- 
Francesc Alted