From: Thadeus B. <tha...@th...> - 2013-03-07 23:27:22
I have a PyTables file that receives many appends to a Table throughout the day: the file is opened, a small bit of data is appended, and the file is closed. The open/append/close can happen many times in a minute, and anywhere from 1-500 rows are appended at any given time. By the end of the day, this file is expected to have roughly 66000 rows. (A minimal sketch of this open/append/close pattern is at the end of this message.)

Chunkshape is set to 1500 for no particular reason (it doesn't seem to make a difference, and some other files can be 5 million rows/day). BLOSC with level 9 compression is used on the table. Data is never deleted from the table, and there are roughly 12 columns on the Table.

The problem is that at the end of the day this file is 1GB in size. I don't understand why the file is growing so big; tbl.size_on_disk shows a meager 20MB. I have used ptrepack with --keep-source-filters and --chunkshape=keep, and the new file is only 30MB in size, which is reasonable. I have also used ptrepack with --chunkshape=auto, and although it set the chunkshape to around 388, there was no significant change in file size compared to a chunkshape of 1500.

Is PyTables not re-using chunks on new appends? When 50 rows are appended, is it still writing a chunk sized for 1500 rows? When the next append comes along, does it write a brand-new chunk instead of opening the old chunk and appending the data? Should my chunkshape really be "expected rows to append each time" instead of "expected total rows"?

-- Thadeus
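For reference, here is a minimal sketch of the open/append/close pattern described above, written against the PyTables >= 3.0 snake_case API; the table name ('data'), the two-column description, and the sample row values are hypothetical stand-ins, since the real ~12-column schema isn't shown here:

    import tables

    # Hypothetical two-column stand-in for the real ~12-column description;
    # the actual column names and types are not part of this post.
    class Row(tables.IsDescription):
        timestamp = tables.Int64Col(pos=0)
        value = tables.Float64Col(pos=1)

    FILTERS = tables.Filters(complevel=9, complib='blosc')  # BLOSC, level 9

    def append_batch(path, rows):
        # Open, append a small batch (1-500 rows), close -- repeated many
        # times per minute throughout the day.
        with tables.open_file(path, mode='a') as h5:
            if '/data' not in h5:
                tbl = h5.create_table('/', 'data', Row,
                                      filters=FILTERS,
                                      chunkshape=1500,
                                      expectedrows=66000)
            else:
                tbl = h5.root.data
            tbl.append(rows)  # rows: list of tuples matching the column order
            tbl.flush()

    # Example call with made-up values:
    # append_batch('day.h5', [(1362700000, 1.23), (1362700001, 1.24)])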