From: Aquil H. A. <aqu...@gm...> - 2012-06-26 21:19:43
Hello All,

In my newbie state, I called createIndex on two columns in one of my tables:

    import tables

    table_desc = {'timestamp': tables.Time32Col(),
                  'symbol': tables.StringCol(8),
                  'observation': tables.Float32Col()}
    h5f = tables.openFile('test.h5', mode='w')
    group = h5f.createGroup('/', 'data')
    table = h5f.createTable(group, 'test', table_desc, 'Test Table')
    table.cols.timestamp.createIndex()
    table.cols.symbol.createIndex()
    …

Now, from what I've been able to find on the internet, an index is only associated with one column:

    class tables.Index
    Represents the index of a column in a table. This class is used to
    keep the indexing information for columns in a Table dataset (see
    The Table class). It is actually the descendant of the Group class
    (see The Group class), with some added functionality. An Index is
    always associated with one and only one column in a table.

- PyTables 2.3.1 User's Guide - Library Reference/The Index Class
  http://pytables.github.com/usersguide/libref.html#indexclassdescr
- Efficient way to verify that records are unique in Python/PyTables
  http://stackoverflow.com/questions/1315129/efficient-way-to-verify-that-records-are-unique-in-python-pytables
- Hints For SQL Users (Creating an index)
  http://www.pytables.org/moin/HintsForSQLUsers#Creatinganindex

So how does PyTables interpret a table with multiple column indices? The best solution that I've found is to create a hash from the two fields that I am interested in indexing and then index the table on that hash. The other solution would be to shard my data by symbol and then index each symbol table by timestamp.

Can anyone explain what effect having two indexed columns has on PyTables? Also, can anyone tell me if they've come up with a better solution for dealing with tables that require multiple indices than the ones I've mentioned?

Regards,
--
Aquil H. Abdullah
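
For illustration, the hash idea mentioned above might look roughly like the following. This is only a sketch in plain Python, without PyTables: the helper name composite_key, the choice of MD5 and the sample (symbol, timestamp) pairs are all assumptions made for the example, not anything PyTables provides. The resulting integer would presumably go into an extra integer column (for instance a tables.Int64Col) that is then indexed on its own.

```python
import hashlib
import struct


def composite_key(symbol, timestamp):
    """Fold (symbol, timestamp) into one signed 64-bit integer (hypothetical helper)."""
    # "8s" pads or truncates the symbol to 8 bytes, mirroring StringCol(8);
    # "i" packs the timestamp as a signed 32-bit integer, mirroring Time32Col.
    packed = struct.pack("<8si", symbol.encode("ascii"), timestamp)
    digest = hashlib.md5(packed).digest()
    # Keep the first 8 bytes of the digest as a signed 64-bit value.
    return struct.unpack("<q", digest[:8])[0]


# Made-up sample pairs, for illustration only.
pairs = [("IBM", 1340745583), ("IBM", 1340745584), ("AAPL", 1340745583)]
keys = [composite_key(symbol, ts) for symbol, ts in pairs]
```

A key like this can only answer equality lookups on the exact (symbol, timestamp) pair; it does not help with range queries on timestamp. Since a 64-bit hash can in principle collide, any row found through the key should still be checked against the original two fields.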