From: Thadeus B. <tha...@th...> - 2013-03-07 23:27:22
I have a PyTables file that receives many appends to a Table throughout the day: the file is opened, a small bit of data is appended, and the file is closed. The open/append/close can happen many times in a minute, and anywhere from 1-500 rows are appended at any given time. By the end of the day, this file is expected to have roughly 66000 rows. (A minimal sketch of this open/append/close pattern is at the end of this message.)

Chunkshape is set to 1500 for no particular reason (it doesn't seem to make a difference, and some other files can be 5 million rows/day). BLOSC with level 9 compression is used on the table. Data is never deleted from the table, and there are roughly 12 columns on the Table.

The problem is that at the end of the day this file is 1GB in size. I don't understand why the file is growing so big; tbl.size_on_disk shows a meager 20MB. I have used ptrepack with --keep-source-filters and --chunkshape=keep, and the new file is only 30MB in size, which is reasonable. I have also used ptrepack with --chunkshape=auto, and although it set the chunkshape to around 388, there was no significant change in file size compared to a chunkshape of 1500.

Is PyTables not re-using chunks on new appends? When 50 rows are appended, is it still writing a chunk sized for 1500 rows? When the next append comes along, does it write a brand-new chunk instead of opening the old chunk and appending the data? Should my chunkshape really be "expected rows to append each time" instead of "expected total rows"?

-- Thadeus
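For reference, here is a minimal sketch of the open/append/close pattern described above, written against the PyTables >= 3.0 snake_case API; the table name ('data'), the two-column description, and the sample row values are hypothetical stand-ins, since the real ~12-column schema isn't shown here:

    import tables

    # Hypothetical two-column stand-in for the real ~12-column description;
    # the actual column names and types are not part of this post.
    class Row(tables.IsDescription):
        timestamp = tables.Int64Col(pos=0)
        value = tables.Float64Col(pos=1)

    FILTERS = tables.Filters(complevel=9, complib='blosc')  # BLOSC, level 9

    def append_batch(path, rows):
        # Open, append a small batch (1-500 rows), close -- repeated many
        # times per minute throughout the day.
        with tables.open_file(path, mode='a') as h5:
            if '/data' not in h5:
                tbl = h5.create_table('/', 'data', Row,
                                      filters=FILTERS,
                                      chunkshape=1500,
                                      expectedrows=66000)
            else:
                tbl = h5.root.data
            tbl.append(rows)  # rows: list of tuples matching the column order
            tbl.flush()

    # Example call with made-up values:
    # append_batch('day.h5', [(1362700000, 1.23), (1362700001, 1.24)])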