From: nikola s. <nid...@gm...> - 2012-05-18 10:01:10
*Hi,*

A couple of days ago I ran some experiments with PyTables. I was curious about read and write speed for a future project, so I decided to run some tests. My HDF5 file contains only one table, named *Table_1*. I started with one million rows and then continued testing with 100 000 000 and 500 000 000 rows. This is the table structure:

    /Table_1 (Table(500000000,)) ''
      description := {
      "Device_ID": StringCol(itemsize=14, shape=(), dflt='', pos=0),
      "DateTime": Time32Col(shape=(), dflt=0, pos=1),
      "Value": Float32Col(shape=(), dflt=0.0, pos=2),
      "Status": StringCol(itemsize=10, shape=(), dflt='', pos=3)}
      byteorder := 'little'
      chunkshape := (2048,0)
      autoIndex := True
      colindexes := {
        "DateTime": Index(9, full, shuffle, zlib(1)).is_CSI=True}

I didn't change the chunkshape (the default from table creation, chunkshape=(2048,0)). The only thing I did was create an index on the DateTime column. Everything worked fine. But after reaching 500 000 000 rows, I decided to compare this table with a table that has chunkshape=(65536). So I copied the table into another HDF5 file using the ptrepack tool:

    ptrepack --chunkshape='(65536,0)' /home/azura/a.h5:/Table_1 /home/azura/b.h5:/

The new table worked fine until I created an index (CSIndex()) on the DateTime column. Index creation was successful, but calling methods such as *where()* or *getWhereList()* throws the following exception:

    query = '(DateTime > 1293836400.0) & (DateTime < 1297292400.0)'
    a = numpy.array([ (x['Device_ID'],x['DateTime'],x['Value']) for x in tbl.where(query) ])

    Traceback (most recent call last):
      File "<pyshell#100>", line 1, in <module>
        a = numpy.array([ (x['Device_ID'],x['DateTime'],x['Value']) for x in tbl.where(query) ])
      File "tableExtension.pyx", line 858, in tables.tableExtension.Row.__next__ (tables/tableExtension.c:7788)
      File "tableExtension.pyx", line 879, in tables.tableExtension.Row.__next__indexed (tables/tableExtension.c:7922)
    AssertionError

Then I decided to build the same table without the ptrepack tool.
So I created a new table and filled it with 500 000 000 rows (same chunkshape, same record structure). Everything works fine, so my conclusion is that there is a bug in the ptrepack tool. Note that the exception appears in the copied table after creating the CS index. I'm just curious about this. What could be wrong?

I'm using Ubuntu 12.04 LTS with ext4.

    Processor: Intel® Core™ i3 CPU M 380 @ 2.53GHz × 4
    RAM: 4GB
    Hard disk: 500GB

-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=
PyTables version:  2.3.1
HDF5 version:      1.8.4-patch1
NumPy version:     1.6.0
Numexpr version:   2.0.1 (not using Intel's VML/MKL)
Zlib version:      1.2.3.4 (in Python interpreter)
Blosc version:     1.1.2 (2010-11-04)
Cython version:    0.16
Python version:    2.7.3 (default, Apr 20 2012, 22:44:07) [GCC 4.6.3]
Platform:          linux2-i686
Byte-ordering:     little
Detected cores:    4
-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=

    File(filename=/home/azura/b.h5, title='', mode='a', rootUEP='/', filters=Filters(complevel=0, shuffle=False, fletcher32=False))
    / (RootGroup) ''
    /Table_1 (Table(500000000,)) ''
      description := {
      "Device_ID": StringCol(itemsize=14, shape=(), dflt='', pos=0),
      "DateTime": Time32Col(shape=(), dflt=0, pos=1),
      "Value": Float32Col(shape=(), dflt=0.0, pos=2),
      "Status": StringCol(itemsize=10, shape=(), dflt='', pos=3)}
      byteorder := 'little'
      chunkshape := (65536,)
      autoIndex := True
      colindexes := {
        "DateTime": Index(9, full, shuffle, zlib(1)).is_CSI=True}

*Cheers!*
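
P.S. For anyone comparing the two chunkshapes, here is a rough sketch of what they mean in bytes. It assumes that rows are stored packed, with no padding between fields, so the row size is just the sum of the column sizes from the description above. The format string and the helper name chunk_bytes are only illustrative and are not part of PyTables.

```python
import struct

# Row layout copied from the table description (little-endian, assumed packed):
#   Device_ID: 14-byte string, DateTime: 32-bit time,
#   Value: 32-bit float,       Status: 10-byte string
ROW_FORMAT = "<14sif10s"
row_bytes = struct.calcsize(ROW_FORMAT)


def chunk_bytes(rows_per_chunk):
    """Uncompressed size in bytes of one chunk holding rows_per_chunk rows."""
    return rows_per_chunk * row_bytes


default_chunk = chunk_bytes(2048)       # chunkshape of the original table
repacked_chunk = chunk_bytes(65536)     # chunkshape passed to ptrepack
table_bytes = 500000000 * row_bytes     # raw data size, indexes excluded

print("row size:       %d bytes" % row_bytes)
print("default chunk:  %d bytes" % default_chunk)
print("repacked chunk: %d bytes" % repacked_chunk)
print("table data:     %d bytes" % table_bytes)
```

Under that assumption a row is 32 bytes, so the default chunk is 64 KiB, the repacked chunk is 2 MiB, and the raw table data is about 16 GB before indexes.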