pytables-users Mailing List for PyTables - Hierarchical datasets (Page 15)

SourceForge Headquarters 1320 Columbia Street Suite 310 San Diego, CA 92101 +1 (858) 422-6466

On 10/31/12 10:12 AM, Andrea Gavana wrote:
> Hi Francesc & All,
>
> On 31 October 2012 14:13, Francesc Alted wrote:
>> On 10/31/12 4:30 AM, Andrea Gavana wrote:
>>> Thank you for all your suggestions. I managed to slightly modify the
>>> script you attached and I am also experimenting with compression.
>>> However, in the newly attached script the underlying table is not
>>> modified, i.e., this assignment:
>>>
>>> for p in table:
>>>       p['results'][:NUM_SIM, :, :] = numpy.random.random(size=(NUM_SIM,
>>> len(ALL_DATES), 7))
>>>       table.flush()
>> For modifying row values you need to assign a complete row object.
>> Something like:
>>
>> for i in range(len(table)):
>>       myrow = table[i]
>>       myrow['results'][:NUM_SIM, :, :] =
>> numpy.random.random(size=(NUM_SIM, len(ALL_DATES), 7))
>>       table[i] = myrow
>>
>> You may also use Table.modifyColumn() for better efficiency.  Look at
>> the different modification methods here:
>>
>> http://pytables.github.com/usersguide/libref/structured_storage.html#table-methods-writing
>>
>> and experiment with them.
> Thank you, I have tried different approaches and they all seem to run
> more or less at the same speed (see below). I had to slightly modify
> your code from:
>
> table[i] = myrow
>
> to
>
> table[i] = [myrow]
>
> To avoid exceptions.
>
> In the newly attached file, I switched to blosc for compression (but
> with compression level 1) and run a few sensitivities. By calling the
> attached script as:
>
> python pytables_test.py NUM_SIM
>
> where "NUM_SIM" is an integer, I get the following timings and file sizes:
>
> C:\MyProjects\Phaser\tests>python pytables_test.py 10
> Number of simulations   : 10
> H5 file creation time   : 0.879s
> Saving results for table: 6.413s
> H5 file size (MB)       : 193
>
>
> C:\MyProjects\Phaser\tests>python pytables_test.py 100
> Number of simulations   : 100
> H5 file creation time   : 4.155s
> Saving results for table: 86.326s
> H5 file size (MB)       : 1935
>
>
> I dont think I will try the 1,000 simulations case :-) . I believe I
> still don't understand what the best strategy would be for my problem.
> I basically need to save all the simulation results for all the 1,200
> "objects", each of which has a timeseries matrix of 600x7 size. In the
> GUI I have, these 1,200 "objects" are grouped into multiple
> categories, and multiple categories can reference the same "object",
> i.e.:
>
> Category_1: object_1, object_23, object_543, etc...
> Category_2: object_23, object_100, object_543, etc...
>
> So my idea was to save all the "objects" results to disk and, upon the
> user's choice, build the categories results "on the fly", i.e. by
> seeking the H5 file on disk for the "objects" belonging to that
> specific category and summing up all their results (over time, i.e.
> the 600 time-steps). Maybe I would be better off with a 4D array
> (NUM_OBJECTS, NUM_SIM, TSTEPS, 7) as a table, but then I will lose the
> ability to reference the "objects" by their names...

You should keep trying experimenting with different approaches and 
discover the one that works for you the best.  Regarding using the 4D 
array as a table, I might be misunderstanding your problem, but you can 
still reference objects by name by using:

row = table.where("name == %s" % my_name)
table[row.nrow] = ...

You may want to index the 'name' column for better performance.

-- 
Francesc Alted

2002	Jan	Feb	Mar	Apr	May	Jun	Jul	Aug	Sep	Oct	Nov (5)	Dec
2003	Jan	Feb (2)	Mar	Apr (5)	May (11)	Jun (7)	Jul (18)	Aug (5)	Sep (15)	Oct (4)	Nov (1)	Dec (4)
2004	Jan (5)	Feb (2)	Mar (5)	Apr (8)	May (8)	Jun (10)	Jul (4)	Aug (4)	Sep (20)	Oct (11)	Nov (31)	Dec (41)
2005	Jan (79)	Feb (22)	Mar (14)	Apr (17)	May (35)	Jun (24)	Jul (26)	Aug (9)	Sep (57)	Oct (64)	Nov (25)	Dec (37)
2006	Jan (76)	Feb (24)	Mar (79)	Apr (44)	May (33)	Jun (12)	Jul (15)	Aug (40)	Sep (17)	Oct (21)	Nov (46)	Dec (23)
2007	Jan (18)	Feb (25)	Mar (41)	Apr (66)	May (18)	Jun (29)	Jul (40)	Aug (32)	Sep (34)	Oct (17)	Nov (46)	Dec (17)
2008	Jan (17)	Feb (42)	Mar (23)	Apr (11)	May (65)	Jun (28)	Jul (28)	Aug (16)	Sep (24)	Oct (33)	Nov (16)	Dec (5)
2009	Jan (19)	Feb (25)	Mar (11)	Apr (32)	May (62)	Jun (28)	Jul (61)	Aug (20)	Sep (61)	Oct (11)	Nov (14)	Dec (53)
2010	Jan (17)	Feb (31)	Mar (39)	Apr (43)	May (49)	Jun (47)	Jul (35)	Aug (58)	Sep (55)	Oct (91)	Nov (77)	Dec (63)
2011	Jan (50)	Feb (30)	Mar (67)	Apr (31)	May (17)	Jun (83)	Jul (17)	Aug (33)	Sep (35)	Oct (19)	Nov (29)	Dec (26)
2012	Jan (53)	Feb (22)	Mar (118)	Apr (45)	May (28)	Jun (71)	Jul (87)	Aug (55)	Sep (30)	Oct (73)	Nov (41)	Dec (28)
2013	Jan (19)	Feb (30)	Mar (14)	Apr (63)	May (20)	Jun (59)	Jul (40)	Aug (33)	Sep (1)	Oct	Nov	Dec

pytables-users Mailing List for PyTables - Hierarchical datasets (Page 15)

pytables-users — PyTables users discussion list