Thread: [Pytables-users] Large (to very large) datasets...

SourceForge Headquarters 1320 Columbia Street Suite 310 San Diego, CA 92101 +1 (858) 422-6466

Hi All,

    I am pretty new to pytables and I am facing a problem of actually
storing and retrieving data to/from a large dataset. My situation is
the following:

1. I am running stochastic simulations of a number of objects
(typically between 100-1,000 simulations);
2. For every simulation, I have around 1,200 "objects", and for each
of them I have 7 timeseries of 600 time-steps each.

I thought of using pytables to try and get some sense out of my
simulations, but I am failing to implement something intelligent (or
fast, which is important as well...).

The attached script (modified from the pytables tutorial) does the following:

1. Create a table containing these "objects";
2. Adds 1,200 rows, one per "object": for each "object", I assign a 3D
array defined as:

results = Float32Col(shape=(NUM_SIM, len(ALL_DATES), 7))

Where NUM_SIM is the number of simulations and ALL_DATES are the timesteps.

3. For every simulation, I update the "object" results (using random
numbers in the script).

The timings on my computer are as follows (in seconds):

H5 file creation time: 22.510

Saving results for simulation 1   : 3.33599996567
Saving results for simulation 2   : 6.2429997921
Saving results for simulation 3   : 9.15199995041
Saving results for simulation 4   : 12.0759999752
Saving results for simulation 5   : 15.2199997902
Saving results for simulation 6   : 17.9159998894
Saving results for simulation 7   : 21.0659999847
Saving results for simulation 8   : 23.6459999084
Saving results for simulation 9   : 26.5359997749
Saving results for simulation 10  : 29.5579998493

As you can see, at every simulation the processing time increases by 3
seconds, so by the time I get to 100 or 1,000 I will have more than
enough time for 15 coffees in the morning :-D
Also, the file creation time is somewhat on the slow side...

I am sure I am missing a lot of things here, so I would appreciate any
suggestion to implement my code in a better/more intelligent way (and
also suggestions on other approaches in order to do what I am trying
to do).

Thank you in advance for your suggestions.

Andrea.

"Imagination Is The Only Weapon In The War Against Reality."
http://www.infinity77.net

Thread: [Pytables-users] Large (to very large) datasets...

pytables-users