From: Francesc A. <fa...@op...> - 2003-10-14 19:24:40
On Saturday 11 October 2003 04:23, you wrote:
> Francesc,
>
> I have implemented your suggestions to the best of my ability, but the
> resulting code is running very slowly. Could you take a quick look at the
> code and see if I am making any obvious mistakes (other than not using
> Psyco, which I may implement). Thank you again for your time.

Your code looks pretty good. No, the problem is the approach itself, which is very inefficient.

I've just written a new Table.copy() method that can do the job much more efficiently. This method lets you copy a table to another location and sort it by any column you want (except for String columns; that might be implemented later on). Here is an example of use:

    import tables

    fileh = tables.openFile("data.nobackup/test-big.h5", "a")
    # Remove any previous copy so the destination name is free
    if hasattr(fileh.root, "newtable"):
        fileh.removeNode(fileh.root, "newtable")
    # Copy /tuple0 to /newtable, sorted by column "var3"
    fileh.root.tuple0.copy("newtable", "var3")
    fileh.close()

In this example, the /tuple0 table is copied to /newtable, ordered by the "var3" column of the source table.

My preliminary timings show that you can copy and sort a table with 100,000 rows in 1.2 s and one with 1,000,000 rows in 27 s (although this depends on how shuffled your initial data is). So the increase in time is not linear (maybe closer to n*sqrt(n)), and sorting a table with 20,000,000 entries could take a little more than 40 minutes (on a processor similar to mine, a P4 @ 2 GHz). Well, it's not fast, but it's affordable.

You can find the new code in the pytables CVS repository (http://sourceforge.net/cvs/?group_id=63486). If you are using Windows, you will need the MSVC 6.0 compiler to install it.

Ah, I almost forgot: you will also need to update the HDF5 library to at least 1.6.0 post4. The original 1.6.0 has some bugs that prevent the new code from running correctly.

Cheers,

--
Francesc Alted
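
For reference, the 40-minute figure above can be reproduced with a quick back-of-the-envelope calculation. This is only a sketch: it assumes the time follows a simple power law t ~ n**k, it uses just the two timings quoted in this message, and the helper names (fitted_exponent, extrapolate) are made up for illustration and are not part of PyTables.

```python
import math

# The two timings quoted in the message: (rows, seconds).
small_rows, small_secs = 100_000, 1.2
large_rows, large_secs = 1_000_000, 27.0
target_rows = 20_000_000


def fitted_exponent(n1, t1, n2, t2):
    """Exponent k for which t ~ n**k passes through both measurements."""
    return math.log(t2 / t1) / math.log(n2 / n1)


def extrapolate(n, n_ref, t_ref, exponent):
    """Scale a reference timing to n rows, assuming t ~ n**exponent."""
    return t_ref * (n / n_ref) ** exponent


# Exponent implied by the two data points alone.
k = fitted_exponent(small_rows, small_secs, large_rows, large_secs)

# Estimate under the n*sqrt(n) guess (exponent 1.5), starting from the
# 1,000,000-row timing.
est_sqrt = extrapolate(target_rows, large_rows, large_secs, 1.5)

# Estimate using the exponent fitted to the two measurements instead.
est_fit = extrapolate(target_rows, large_rows, large_secs, k)

print("fitted exponent: %.2f" % k)
print("n*sqrt(n) estimate: %.0f min" % (est_sqrt / 60))
print("fitted-exponent estimate: %.0f min" % (est_fit / 60))
```

With exponent 1.5 the estimate comes out at roughly 40 minutes, matching the figure quoted above. The exponent fitted to the two measurements is somewhat lower than 1.5, so the n*sqrt(n) guess is probably on the pessimistic side; two data points are too few to say more than that.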