pytables-users Mailing List for PyTables - Hierarchical datasets (Page 10)


I have a PyTables file that receives many appends to a Table throughout the
day: the file is opened, a small amount of data is appended, and the file is
closed. This open/append/close cycle can happen many times a minute, and
anywhere from 1 to 500 rows are appended each time. By the end of the day the
file is expected to hold roughly 66,000 rows. Chunkshape is set to 1500 for
no particular reason (it doesn't seem to make a difference, and some other
files can reach 5 million rows a day). Blosc compression at level 9 is used
on the table. Data is never deleted from the table. There are roughly 12
columns on the Table.
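
A minimal sketch of the write pattern described above, assuming PyTables 3.x
naming (open_file, create_table). The two-column Tick schema, the /ticks node
name, and the append_rows helper are hypothetical stand-ins, not the real
table definition.

```python
import tables


class Tick(tables.IsDescription):
    # Hypothetical schema; the real table has roughly 12 columns.
    timestamp = tables.Float64Col(pos=0)
    value = tables.Float64Col(pos=1)


# Blosc at compression level 9, as described in the post.
FILTERS = tables.Filters(complevel=9, complib="blosc")


def append_rows(path, rows, chunkshape=1500):
    """Open the file, append rows to /ticks (creating it if needed), close.

    Returns the number of rows in the table after the append.
    """
    with tables.open_file(path, mode="a") as h5:
        if "/ticks" in h5:
            table = h5.root.ticks
        else:
            table = h5.create_table(
                "/", "ticks", Tick, filters=FILTERS, chunkshape=chunkshape
            )
        table.append(rows)  # rows: list of (timestamp, value) tuples
        table.flush()
        return table.nrows
```

Each call is one full open/append/close cycle, so calling it many times a
minute reproduces the access pattern in question.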

The problem is that at the end of the day this file is 1GB in size. I don't
understand why the file is growing so big. The tbl.size_on_disk shows a
meager 20MB.
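
One way to put the two numbers side by side is to compare the OS-level file
size with the sum of size_on_disk over every leaf in the file. This is only a
sketch: size_report is a hypothetical helper name, and it assumes every leaf
type in the file implements size_on_disk.

```python
import os

import tables


def size_report(path):
    """Return (file_bytes, leaf_bytes) for the HDF5 file at path.

    file_bytes is the size reported by the operating system; leaf_bytes is
    the sum of size_on_disk over all leaves (tables and arrays) in the file.
    """
    with tables.open_file(path, mode="r") as h5:
        leaf_bytes = sum(
            leaf.size_on_disk
            for leaf in h5.walk_nodes("/", classname="Leaf")
        )
    return os.path.getsize(path), leaf_bytes
```

A large gap between the two values is space in the file that is not
attributed to the stored data of any dataset.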

I have run ptrepack with --keep-source-filters and --chunkshape=keep. The
new file is only 30MB in size, which is reasonable.
I have also run ptrepack with --chunkshape=auto. Although it set the
chunkshape to around 388, there was no significant change in file size
compared with a chunkshape of 1500.

Is PyTables not re-using chunks on new appends? When 50 rows are appended,
is it still writing a chunk sized for 1500 rows? And when the next append
comes along, does it write a brand new chunk instead of opening the old
chunk and appending the data?

Should my chunksize really be "expected rows to append each time" instead
of "expected total rows"?
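
To make the chunk-reuse question concrete, the arithmetic below counts chunks
under the two hypotheses: rows packed into shared chunks, versus at least one
fresh chunk per append. It only illustrates the question and does not
describe what PyTables or HDF5 actually does. The uniform 50-row append size
is an assumed figure, and both function names are hypothetical.

```python
import math


def chunks_if_packed(total_rows, chunk_rows):
    """Chunks needed if successive appends fill existing chunks first."""
    return math.ceil(total_rows / chunk_rows)


def chunks_if_one_per_append(append_sizes, chunk_rows):
    """Chunks needed if every append starts on a brand new chunk."""
    return sum(math.ceil(n / chunk_rows) for n in append_sizes)


# 66,000 rows per day with a chunkshape of 1500, as in the post.
packed = chunks_if_packed(66000, 1500)                    # 44
# Assumed: 1320 appends of 50 rows each (1320 * 50 = 66000).
unpacked = chunks_if_one_per_append([50] * 1320, 1500)    # 1320
```

Under the second hypothesis the chunk count tracks the number of appends
rather than the number of rows.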

--
Thadeus

