[Pytables-users] Possible bug in ptrepack tool

*Hi,*

A couple of days ago I ran some experiments with PyTables. I was curious
about read and write speed for a future project, so I decided to run some
tests. My HDF5 file contains only one table, named *Table_1*. I started with
one million rows and then continued testing with 100 000 000 and
500 000 000 rows. This is the table structure:

/Table_1 (Table(500000000,)) ''
  description := {
  "Device_ID": StringCol(itemsize=14, shape=(), dflt='', pos=0),
  "DateTime": Time32Col(shape=(), dflt=0, pos=1),
  "Value": Float32Col(shape=(), dflt=0.0, pos=2),
  "Status": StringCol(itemsize=10, shape=(), dflt='', pos=3)}
  byteorder := 'little'
  chunkshape := (2048,0)
  autoIndex := True
  colindexes := {
    "DateTime": Index(9, full, shuffle, zlib(1)).is_CSI=True}

I didn't change the chunkshape (the default from table creation was
chunkshape=(2048,0)). The only thing I did was create an index on the
DateTime column. Everything worked fine. But after reaching 500 000 000 rows,
I decided to compare this table with a table with chunkshape=(65536). So I
copied the table into another HDF5 file using the ptrepack tool:

ptrepack --chunkshape='(65536,0)' /home/azura/a.h5:/Table_1
/home/azura/b.h5:/
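For a sense of scale, the two chunkshapes can be compared with a little arithmetic. The sketch below is only an estimate: it assumes the four columns are stored packed with no padding (14 + 4 + 4 + 10 bytes per row) and ignores compression and index storage. The names `ROW_FORMAT`, `chunk_bytes` and `chunk_count` are made up for illustration, and the code is written for Python 3 rather than the 2.7 interpreter listed below.

```python
import math
import struct

# Packed little-endian layout assumed from the table description:
# Device_ID S14, DateTime 32-bit int, Value float32, Status S10.
ROW_FORMAT = "<14sif10s"
ROW_BYTES = struct.calcsize(ROW_FORMAT)

TOTAL_ROWS = 500_000_000


def chunk_bytes(rows_per_chunk):
    """Uncompressed size in bytes of one chunk holding rows_per_chunk rows."""
    return rows_per_chunk * ROW_BYTES


def chunk_count(rows_per_chunk, total_rows=TOTAL_ROWS):
    """Number of chunks needed to hold total_rows rows."""
    return math.ceil(total_rows / rows_per_chunk)


for rows_per_chunk in (2048, 65536):
    print(rows_per_chunk, chunk_bytes(rows_per_chunk), chunk_count(rows_per_chunk))
# prints:
# 2048 65536 244141
# 65536 2097152 7630
```

Under these assumptions the default chunk is 64 KiB and the repacked chunk is 2 MiB, so the copied table has roughly 32 times fewer, larger chunks.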

My new table worked fine until I created an index (CSIndex()) on the DateTime
column. Index creation was successful, but calling methods such as *where()*
or *getWhereList()* throws the following exception:

query = '(DateTime > 1293836400.0) & (DateTime < 1297292400.0)'
a = numpy.array([ (x['Device_ID'],x['DateTime'],x['Value']) for x in
tbl.where(query) ])
Traceback (most recent call last):
  File "<pyshell#100>", line 1, in <module>
    a = numpy.array([ (x['Device_ID'],x['DateTime'],x['Value']) for x in
tbl.where(query) ])
  File "tableExtension.pyx", line 858, in
tables.tableExtension.Row.__next__ (tables/tableExtension.c:7788)
  File "tableExtension.pyx", line 879, in
tables.tableExtension.Row.__next__indexed (tables/tableExtension.c:7922)
AssertionError
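The bounds in the query string are seconds since the Unix epoch, which is how a Time32Col value is stored. The sketch below shows what range they cover and how the same condition string could be rebuilt from datetimes. The helpers `to_utc` and `build_query` are hypothetical names, not part of PyTables, and the bounds are interpreted as UTC, which is an assumption (the data may well be in local time).

```python
from datetime import datetime, timezone


def to_utc(timestamp):
    """Interpret an epoch timestamp (seconds) as a UTC datetime."""
    return datetime.fromtimestamp(timestamp, tz=timezone.utc)


def build_query(column, start, end):
    """Build an open-interval condition string like the one in the post."""
    return "({0} > {1}) & ({0} < {2})".format(
        column, start.timestamp(), end.timestamp()
    )


start = to_utc(1293836400.0)
end = to_utc(1297292400.0)
print(start.isoformat())  # 2010-12-31T23:00:00+00:00
print(end.isoformat())    # 2011-02-09T23:00:00+00:00
print(build_query("DateTime", start, end))
# (DateTime > 1293836400.0) & (DateTime < 1297292400.0)
```

The selected window is exactly 40 days wide, so the query covers only part of the table.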

Then I decided to build the same table without the ptrepack tool. I created a
new table and filled it with 500 000 000 rows (same chunkshape, same record
structure). Everything works fine, so my conclusion is that there is a bug in
the ptrepack tool. Note that the exception appears in the copied table after
creating the CS index. I'm just curious about this. What can be wrong?

I'm using Ubuntu 12.04 LTS with ext4
Processor: Intel® Core™ i3 CPU M 380 @ 2.53GHz × 4
RAM: 4GB
HARD DISK: 500GB

-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=
PyTables version:  2.3.1
HDF5 version:      1.8.4-patch1
NumPy version:     1.6.0
Numexpr version:   2.0.1 (not using Intel's VML/MKL)
Zlib version:      1.2.3.4 (in Python interpreter)
Blosc version:     1.1.2 (2010-11-04)
Cython version:    0.16
Python version:    2.7.3 (default, Apr 20 2012, 22:44:07)
[GCC 4.6.3]
Platform:          linux2-i686
Byte-ordering:     little
Detected cores:    4
-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=

File(filename=/home/azura/b.h5, title='', mode='a', rootUEP='/',
filters=Filters(complevel=0, shuffle=False, fletcher32=False))
/ (RootGroup) ''
/Table_1 (Table(500000000,)) ''
  description := {
  "Device_ID": StringCol(itemsize=14, shape=(), dflt='', pos=0),
  "DateTime": Time32Col(shape=(), dflt=0, pos=1),
  "Value": Float32Col(shape=(), dflt=0.0, pos=2),
  "Status": StringCol(itemsize=10, shape=(), dflt='', pos=3)}
  byteorder := 'little'
  chunkshape := (65536,)
  autoIndex := True
  colindexes := {
    "DateTime": Index(9, full, shuffle, zlib(1)).is_CSI=True}

*Cheers!*