From: Alvaro T. C. <al...@mi...> - 2012-04-25 12:05:51
Hi, a minor update on this thread.

>> * a bool array of 10**8 elements with True in two separate slices of
>> length 10**6 each compresses by ~350. Using .wheretrue to obtain
>> indices is faster by a factor of 2 to 3 than np.nonzero(normal numpy
>> array). The resulting filesize is 248kb, still far from storing the 4
>> or 6 integer indexes that define the slices (I am experimenting with
>> an approach for scientific databases where this is a concern).
>
> Oh, you were asking for a 8 to 1 compressor (booleans as bits), but
> apparently a 350 to 1 is not enough? :)

Here I expected more from a run-length-like compression scheme. My array would be compressible to the following representation:

(0, x)           : 0
(x, x+10**6)     : 1
(x+10**6, y)     : 0
(y, y+10**6)     : 1
(y+10**6, 10**8) : 0

or just:

(x, x+10**6) : 1
(y, y+10**6) : 1

where x and y are two reasonable integers (i.e. in range and with no overlap).

>> * how blosc chooses the chunklen is black magic for me, but it seems to
>> be quite spot-on. (e.g. it changed from '1' for a 64x15M array to
>> 64*1024 when CArraying only one row).
>
> Uh? You mean 1 byte as a blocksize? This is certainly a bug. Could
> you detail a bit more how you achieve this result? Providing an example
> would be very useful.

I revisited this issue. While in PyTables CArray the guesses are reasonable, the problem is in carray.carray (or in its reporting of chunklen). This is the offender:

carray((64, 15600000), int16)
  nbytes: 1.86 GB; cbytes: 1.04 GB; ratio: 1.78
  cparams := cparams(clevel=5, shuffle=True)

In [87]: x.chunklen
Out[87]: 1

Could it be that carray is not reporting the second dimension of the chunkshape? (In PyTables, this is 262144.)

The fact that both PyTables' CArray and carray.carray are named carray is a bit confusing.
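The interval representation sketched above can be computed directly from a boolean array with NumPy. A minimal sketch; `true_runs` is an illustrative helper, not part of carray or PyTables:

```python
import numpy as np

def true_runs(mask):
    """Return (start, stop) pairs for each run of True in a 1-D bool array."""
    # Pad with False on both ends so every run has a rising and a falling edge.
    padded = np.concatenate(([False], mask, [False])).astype(np.int8)
    edges = np.diff(padded)
    starts = np.flatnonzero(edges == 1)   # False -> True transitions
    stops = np.flatnonzero(edges == -1)   # True -> False transitions
    return [(int(s), int(e)) for s, e in zip(starts, stops)]

# Two True slices in an otherwise-False array, as in the example above
a = np.zeros(10**6, dtype=bool)
a[1000:2000] = True
a[5000:7500] = True
print(true_runs(a))  # [(1000, 2000), (5000, 7500)]
```

Storing just these pairs is the 4-to-6-integer representation discussed in the thread; a chunked block compressor cannot reach that, since it still emits per-block headers.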
> --
> Francesc Alted
>
> _______________________________________________
> Pytables-users mailing list
> Pyt...@li...
> https://lists.sourceforge.net/lists/listinfo/pytables-users
From: Alvaro T. C. <al...@mi...> - 2012-04-25 11:13:37
Hi,

Thanks for the clarification. I retried today both with a normal and a completely sorted index on a blosc-compressed table (complevel 5) and could not reproduce the putative bug either.

-á.

On Tue, Apr 24, 2012 at 04:39, Anthony Scopatz <sc...@gm...> wrote:
> On Mon, Apr 23, 2012 at 9:14 PM, Francesc Alted <fa...@py...> wrote:
>> On 4/19/12 8:43 AM, Alvaro Tejero Cantero wrote:
>> > Some complementary info (I copy the details of the tables below)
>> >
>> > timeit vals = numpy.fromiter((x['val'] for x in
>> > my.root.raw.t0.wtab02.where('val>1')),dtype=np.int16)
>> > 1 loops, best of 3: 30.4 s per loop
>> >
>> > Using the compressed and indexed version, it mysteriously does not
>> > work (output is empty list)
>> >>>> cvals = np.fromiter((x['val'] for x in wctab02.where('val>1')),
>> >>>> dtype=np.int16)
>> >>>> cvals
>> > array([], dtype=int16)
>>
>> This smells like a bug, but I cannot reproduce it. Could you send a
>> self-contained example reproducing this behavior?
>
> I am not able to reproduce this either...
From: Anthony S. <sc...@gm...> - 2012-04-24 03:40:23
On Mon, Apr 23, 2012 at 9:14 PM, Francesc Alted <fa...@py...> wrote:
> On 4/19/12 8:43 AM, Alvaro Tejero Cantero wrote:
> > Some complementary info (I copy the details of the tables below)
> >
> > timeit vals = numpy.fromiter((x['val'] for x in
> > my.root.raw.t0.wtab02.where('val>1')),dtype=np.int16)
> > 1 loops, best of 3: 30.4 s per loop
> >
> > Using the compressed and indexed version, it mysteriously does not
> > work (output is empty list)
> >>>> cvals = np.fromiter((x['val'] for x in wctab02.where('val>1')),
> >>>> dtype=np.int16)
> >>>> cvals
> > array([], dtype=int16)
>
> This smells like a bug, but I cannot reproduce it. Could you send a
> self-contained example reproducing this behavior?

I am not able to reproduce this either...
From: Francesc A. <fa...@py...> - 2012-04-24 02:14:49
On 4/19/12 8:43 AM, Alvaro Tejero Cantero wrote:
> Some complementary info (I copy the details of the tables below)
>
> timeit vals = numpy.fromiter((x['val'] for x in
> my.root.raw.t0.wtab02.where('val>1')),dtype=np.int16)
> 1 loops, best of 3: 30.4 s per loop
>
> Using the compressed and indexed version, it mysteriously does not
> work (output is empty list)
>>>> cvals = np.fromiter((x['val'] for x in wctab02.where('val>1')), dtype=np.int16)
>>>> cvals
> array([], dtype=int16)

This smells like a bug, but I cannot reproduce it. Could you send a self-contained example reproducing this behavior?

--
Francesc Alted
From: Francesc A. <fa...@py...> - 2012-04-24 02:10:21
On 4/18/12 12:33 PM, Alvaro Tejero Cantero wrote:
> A single array with 312 000 000 int16 values.
>
> Two (uncompressed) ways to store it:
>
> * Array
>>>> wa02[:10]
> array([306, 345, 353, 335, 345, 345, 356, 341, 338, 357], dtype=int16)
>
> * Table wtab02 (single column, named 'val')
>>>> wtab02[:10]
> array([(306,), (345,), (353,), (335,), (345,), (345,), (356,), (341,),
>        (338,), (357,)],
>       dtype=[('val', '<i2')])
>
> read time respectively 120 ms, 220 ms.
>
>>>> timeit big=np.nonzero(wa02[:]>1)
> 1 loops, best of 3: 1.66 s per loop
>
>>>> timeit bigtab=wtab02.getWhereList('val>1')
> 1 loops, best of 3: 119 s per loop

Yes, this is expected. The reason one method is much faster than the other is precisely that one is designed for operating out-of-core, while the other operates completely in-memory, and this has a cost. But that does not mean that out-of-core necessarily has to be slower. Look at this:

In [107]: da
Out[107]:
/da (Array(10000000,)) ''
  atom := Int16Atom(shape=(), dflt=0)
  maindim := 0
  flavor := 'numpy'
  byteorder := 'little'
  chunkshape := None

In [108]: dra
Out[108]:
/dra (Table(10000000,), shuffle, blosc(5)) ''
  description := {
  "a": Int16Col(shape=(), dflt=0, pos=0)}
  byteorder := 'little'
  chunkshape := (65536,)

In [127]: time r = np.argwhere(da[:] == 1)
CPU times: user 0.08 s, sys: 0.02 s, total: 0.10 s
Wall time: 0.10 s

In [111]: time l = dra.getWhereList('a == 1')
CPU times: user 0.10 s, sys: 0.01 s, total: 0.11 s
Wall time: 0.11 s

So, tables' getWhereList() performance is pretty close to NumPy, even if the former is using compression. This is a great achievement. The reason I'm getting very different results than you is this:

In [119]: len(l)
Out[119]: 153

That is, the selectivity of the query is extremely high (153 out of 10 million elements), which is the scenario where queries are designed to shine.

If you use indexing, then you can get even more speed:

In [131]: dra.cols.a.createCSIndex()
Out[131]: 10000000

In [132]: time l = dra.getWhereList('a == 1')
CPU times: user 0.02 s, sys: 0.01 s, total: 0.03 s
Wall time: 0.02 s

In your case, using small selectivities (you are asking for possibly almost 50% of the initial dataset, perhaps less or perhaps more, depending on your data pattern), the data object creation (one per iteration in the loop) in PyTables becomes the big overhead:

In [134]: time r = np.argwhere(da[:] > 1)
CPU times: user 1.03 s, sys: 0.03 s, total: 1.06 s
Wall time: 1.12 s

In [135]: time l = dra.getWhereList('a > 1')
CPU times: user 5.62 s, sys: 0.16 s, total: 5.78 s
Wall time: 5.89 s

Now getWhereList() is more than 5x slower. Removing the index helps a bit here:

In [136]: dra.cols.a.removeIndex()

In [137]: time l = dra.getWhereList('a > 1')
CPU times: user 5.10 s, sys: 0.12 s, total: 5.22 s
Wall time: 5.30 s

But if the internal query machinery in PyTables is the same, why does it take longer? The short answer is object creation (and some data copying). getWhereList() can be expressed like this:

In [165]: time l = np.array([r.nrow for r in dra.where('a > 1')])
CPU times: user 5.54 s, sys: 0.09 s, total: 5.63 s
Wall time: 5.71 s

Now, if we count the time to get the coordinates only:

In [159]: time s = [r.nrow for r in dra.where('a > 1')]
CPU times: user 3.86 s, sys: 0.08 s, total: 3.95 s
Wall time: 4.02 s

This time is a bit long, but that is due to the .nrow implementation (a Cython property of the Row class; I wonder if this could be accelerated somewhat).

In general, the Row iterator can be much faster, for example in getting values:

In [161]: time s = [r['a'] for r in dra.where('a > 1')]
CPU times: user 1.57 s, sys: 0.07 s, total: 1.63 s
Wall time: 1.61 s

and you can notice that this is barely more than the time it takes for a pure list creation:

In [139]: time l = [r for r in xrange(len(l))]
CPU times: user 1.44 s, sys: 0.11 s, total: 1.55 s
Wall time: 1.53 s

So, the 'slow' times that you are seeing are a consequence of the different data object creation and the internal data copies (for building the final NumPy array). NumPy is much faster because all of this is done in pure C. But again, this does not preclude the fact that queries in PyTables are actually fast -- and potentially much faster than NumPy for high selectivities and indexing.

Hope this helps,

--
Francesc Alted
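The selectivity point Francesc makes can be illustrated without PyTables: a query's result must be materialized, so its cost grows with the number of hits. A rough pure-NumPy sketch mimicking the 'a == 1' (rare hits) versus 'a > 1' (bulk hits) queries above; the data here is synthetic, not the datasets from the thread:

```python
import numpy as np

rng = np.random.default_rng(0)
# Ten million int16 values in [0, 1000): the value 1 is rare (~1 in 1000),
# while values > 1 cover almost the whole array.
a = rng.integers(0, 1000, size=10_000_000).astype(np.int16)

hits_rare = np.argwhere(a == 1)  # high selectivity: ~10k coordinates
hits_bulk = np.argwhere(a > 1)   # low selectivity: ~99.8% of all rows

print(len(hits_rare), len(hits_bulk))
```

The highly selective query returns a tiny coordinate array, while the low-selectivity one allocates and fills millions of entries; this is the regime where per-row object creation and result copying dominate any query engine, in-core or out-of-core.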
From: Luc K. <luc...@ho...> - 2012-04-23 06:49:51
I've noticed that the solutions to the related issues have been taken up in recent Python versions, for example in the new Python 2.7.3 (http://hg.python.org/cpython/file/d46c1973d3c4/Misc/NEWS). The concerned issues were:

http://bugs.python.org/issue4120
http://bugs.python.org/issue7833

I asked Mark Hammond, who maintains http://pypi.python.org/pypi/pywin32/214, what the next step is in order to get the issues resolved. It appears that PyTables should be rebuilt against Python 2.7.3 if one wishes to use PyTables in combination with pywin32.

--------------------------------------------------
From: "Luc Kesters" <luc...@ho...>
Sent: Wednesday, February 22, 2012 10:52 AM
To: <pyt...@li...>
Subject: Re: [Pytables-users] Installation problems windows

> FYI: I've noticed a new version of win32 and asked what the situation of
> the before-mentioned issues was.
> The new version of win32 doesn't resolve the problem.
> See the reaction of Mark Hammond on Feb 20, 2012, 11:24pm:
> http://python.6.n6.nabble.com/ANN-pywin32-build-217-released-td4463462.html
>
> Message: 4
> Date: Sat, 11 Feb 2012 14:38:15 -0600
> From: Anthony Scopatz <sc...@gm...>
> Subject: Re: [Pytables-users] Pytables-users Digest, Vol 69, Issue 2
> To: Discussion list for PyTables <pyt...@li...>
>
> Hello Luc,
>
> Yes, this does seem to me more along the lines of the issues that you
> are experiencing. Unfortunately, we'll probably just have to wait for
> an upstream solution... Thanks for letting us know about it though.
>
> Be Well
> Anthony
>
> On Fri, Feb 10, 2012 at 2:27 PM, Luc Kesters <luc...@ho...> wrote:
>> The Python path is the same. So no luck, but I think the problem lies
>> elsewhere. Today I had the same problem when upgrading the package
>> pandas.
>> I've posted a question there and looking further I came up with:
>>
>> https://groups.google.com/forum/?fromgroups#!topic/isapi_wsgi-dev/A_orSF7CKB0
>>
>> which mentions the following issues:
>> http://bugs.python.org/issue4120
>> http://bugs.python.org/issue7833
>>
>> I read in the last one that maybe a solution is underway.
From: Anthony S. <sc...@gm...> - 2012-04-19 17:24:02
On Thu, Apr 19, 2012 at 11:46 AM, Alvaro Tejero Cantero <al...@mi...> wrote:
> I have to run, but here's what you requested (I won't be back on this
> computer until Monday)
>
> >>> cvals = np.fromiter([x['val'] for x in wctab02.where('val>1')],
> dtype=np.int16)
> >>> cvals
> array([], dtype=int16)

Hmmm...

> >>> timeit big=np.argwhere(np.greater(wa02[:], 1))
> 1 loops, best of 3: 15.3 s per loop
>
> this gives me a mask,

argwhere() should not give you a mask. It should give you the coordinates:
http://docs.scipy.org/doc/numpy/reference/generated/numpy.argwhere.html

Also, it seems like np.argwhere(np.greater(wa02[:], 1)) and np.argwhere(wa02[:]>1) should run in the same amount of time. At this point, though, we are just comparing the performance of numpy routines. What we really want is to compare numpy to PyTables. Maybe I'll try playing around with this this weekend.

> that I can get with
>
> >>> big2 = wa02[:]>1
> >>> np.alltrue(big == big2)
> True
>
> and in far less time:
> >>> timeit big2 = wa02[:]>1
> 1 loops, best of 3: 348 ms per loop
>
> -á.
>
> /raw/t0/wa02 (Array(312000000,)) ''
>   atom := Int16Atom(shape=(), dflt=0)
>   maindim := 0
>   flavor := 'numpy'
>   byteorder := 'little'
>   chunkshape := None
>
> On Thu, Apr 19, 2012 at 15:33, Anthony Scopatz <sc...@gm...> wrote:
> > I was interested in how long it takes to iterate, since this is arguably
> > where the majority of the time is spent.
> > [...]
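For reference, the three result types conflated in this exchange differ as follows; a small sketch with a toy array (illustrative only, not the wa02 data):

```python
import numpy as np

a = np.array([306, 345, 0, 1, 356], dtype=np.int16)

mask = a > 1                 # boolean mask, same shape as a
coords = np.argwhere(a > 1)  # (N, 1) array of coordinates for a 1-D input
idx = np.nonzero(a > 1)[0]   # flat index array, same as np.flatnonzero(a > 1)

print(mask)            # [ True  True False False  True]
print(coords.ravel())  # [0 1 4]
print(idx)             # [0 1 4]
```

The mask is one bool per element; argwhere and nonzero both return coordinates of the True entries, differing only in layout. So `big` in the quoted session (from argwhere) could not equal the boolean mask `big2` elementwise, which is what makes the reported np.alltrue result surprising.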
From: Alvaro T. C. <al...@mi...> - 2012-04-19 16:47:07
I have to run, but here's what you requested (I won't be back on this computer until Monday).

>>> cvals = np.fromiter([x['val'] for x in wctab02.where('val>1')], dtype=np.int16)
>>> cvals
array([], dtype=int16)

>>> timeit big=np.argwhere(np.greater(wa02[:], 1))
1 loops, best of 3: 15.3 s per loop

this gives me a mask, that I can get with

>>> big2 = wa02[:]>1
>>> np.alltrue(big == big2)
True

and in far less time:

>>> timeit big2 = wa02[:]>1
1 loops, best of 3: 348 ms per loop

-á.

/raw/t0/wa02 (Array(312000000,)) ''
  atom := Int16Atom(shape=(), dflt=0)
  maindim := 0
  flavor := 'numpy'
  byteorder := 'little'
  chunkshape := None

On Thu, Apr 19, 2012 at 15:33, Anthony Scopatz <sc...@gm...> wrote:
> I was interested in how long it takes to iterate, since this is arguably
> where the majority of the time is spent.
> [...]
From: Anthony S. <sc...@gm...> - 2012-04-19 14:33:38
I was interested in how long it takes to iterate, since this is arguably where the majority of the time is spent.

On Thu, Apr 19, 2012 at 8:43 AM, Alvaro Tejero Cantero <al...@mi...> wrote:
> Some complementary info (I copy the details of the tables below)
>
> timeit vals = numpy.fromiter((x['val'] for x in
> my.root.raw.t0.wtab02.where('val>1')),dtype=np.int16)
> 1 loops, best of 3: 30.4 s per loop
>
> Using the compressed and indexed version, it mysteriously does not
> work (output is empty list)
> >>> cvals = np.fromiter((x['val'] for x in wctab02.where('val>1')),
> dtype=np.int16)
> >>> cvals
> array([], dtype=int16)

This doesn't work because numpy doesn't accept generators. The following should work:

>>> cvals = np.fromiter([x['val'] for x in wctab02.where('val>1')], dtype=np.int16)

Also, I am a little concerned that np.nonzero() doesn't really compare to Table.getWhereList('val>1'). Testing for all zero bits *should be* a lot faster than a numeric comparison. Could you instead try the same actual operation in numpy as whereList():

>>> timeit big=np.argwhere(np.greater(wa02[:], 1))

Thanks!
Anthony

> But it does if we skip using where (I don't print cvals, but it is
> correct)
> >>> timeit cvals = np.fromiter((x['val'] for x in wctab02 if x['val']>1),
> dtype=np.int16)
> 1 loops, best of 3: 54.8 s per loop
>
> (the version with longer chunklen works fine and times to 30.7 s).
>
> -á.
> [...]
From: Alvaro T. C. <al...@mi...> - 2012-04-19 13:43:59
Some complementary info (I copy the details of the tables below).

timeit vals = numpy.fromiter((x['val'] for x in my.root.raw.t0.wtab02.where('val>1')),dtype=np.int16)
1 loops, best of 3: 30.4 s per loop

Using the compressed and indexed version, it mysteriously does not work (output is empty list):

>>> cvals = np.fromiter((x['val'] for x in wctab02.where('val>1')), dtype=np.int16)
>>> cvals
array([], dtype=int16)

But it does if we skip using where (I don't print cvals, but it is correct):

>>> timeit cvals = np.fromiter((x['val'] for x in wctab02 if x['val']>1), dtype=np.int16)
1 loops, best of 3: 54.8 s per loop

(the version with longer chunklen works fine and times to 30.7 s).

-á.

wtab02: not compressed, not indexed, small chunklen:
/raw/t0/wtab02 (Table(312000000,)) ''
  description := {
  "val": Int16Col(shape=(), dflt=0, pos=0)}
  byteorder := 'little'
  chunkshape := (32768,)

larger chunklen (as calculated from expectedrows=312000000):
/raw/t0/wcetab02 (Table(312000000,)) 'test'
  description := {
  "val": Int16Col(shape=(), dflt=0, pos=0)}
  byteorder := 'little'
  chunkshape := (131072,)

wctab02: compressed, with CSI index:
/raw/t0/wctab02 (Table(312000000,), shuffle, blosc(9)) 'test'
  description := {
  "val": Int16Col(shape=(), dflt=0, pos=0)}
  byteorder := 'little'
  chunkshape := (32768,)
  autoIndex := True
  colindexes := {
  "val": Index(9, full, shuffle, zlib(1)).is_CSI=True}

On Thu, Apr 19, 2012 at 12:46, Alvaro Tejero Cantero <al...@mi...> wrote:
> where will give me an iterator over the /values/; in this case I
> wanted the indexes. Plus, it will give me an iterator, so it will be
> trivially fast.
>
> Are you interested in the timings of where + building a list? Or where
> + building an array?
>
> -á.
>
> On Wed, Apr 18, 2012 at 19:02, Anthony Scopatz <sc...@gm...> wrote:
>>
From: Alvaro T. C. <al...@mi...> - 2012-04-19 11:46:29
where will give me an iterator over the /values/; in this case I wanted the indexes. Plus, it will give me an iterator, so it will be trivially fast.

Are you interested in the timings of where + building a list? Or where + building an array?

-á.

On Wed, Apr 18, 2012 at 19:02, Anthony Scopatz <sc...@gm...> wrote:
>
From: Anthony S. <sc...@gm...> - 2012-04-18 18:02:33
Hello Alvaro,

What are the timings using the normal where() method?
http://pytables.github.com/usersguide/libref.html?highlight=where#tables.Table.where

Be Well
Anthony

On Wed, Apr 18, 2012 at 12:33 PM, Alvaro Tejero Cantero <al...@mi...> wrote:
> A single array with 312 000 000 int16 values.
> [...]
From: Alvaro T. C. <al...@mi...> - 2012-04-18 17:33:31
|
A single array with 312 000 000 int16 values. Two (uncompressed) ways to store it:

* Array
>>> wa02[:10]
array([306, 345, 353, 335, 345, 345, 356, 341, 338, 357], dtype=int16)

* Table wtab02 (single column, named 'val')
>>> wtab02[:10]
array([(306,), (345,), (353,), (335,), (345,), (345,), (356,), (341,), (338,), (357,)], dtype=[('val', '<i2')])

read time respectively 120 ms, 220 ms.

>>> timeit big=np.nonzero(wa02[:]>1)
1 loops, best of 3: 1.66 s per loop

>>> timeit bigtab=wtab02.getWhereList('val>1')
1 loops, best of 3: 119 s per loop

with a Complete Sorted Index on val and blosc9 compression:
1 loops, best of 3: 149 s per loop

indicating expectedrows=312 000 000 (so that chunklen goes from 32K to 132K):
1 loops, best of 3: 119 s per loop

(I wanted to compare getting a boolean mask, but it seems that Tables don't have a .wheretrue like carrays in Francesc's carray package (?). For reference, just the mask takes 344 ms.)

---

Question: is the difference in speed due to in-core vs out-of-core querying?

If so, and if the maximum unit of data fits in memory (even considering loading a few columns to operate among them), is the corollary 'stay in memory at all costs'?

With this exercise, I was trying to find out what is the best structure to hold raw data (just one column in this case), and whether indexing could help in queries.

-á.
|
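[Editor's note: the fast in-core path from the benchmark above, in isolation. A sketch with synthetic data standing in for wa02; only the NumPy side is reproduced, since it carries no PyTables dependency.]

```python
import numpy as np

# Synthetic stand-in for the 312M-element wa02 Array (much smaller so it runs fast).
data = np.random.randint(0, 400, size=1_000_000).astype(np.int16)

# Equivalent of: big = np.nonzero(wa02[:] > 1) -- read everything, query in-core.
idx = np.nonzero(data > 1)[0]

# The boolean-mask variant (the .wheretrue analogue mentioned in the message).
mask = data > 1
```

The 1.66 s vs 119 s gap in the benchmark is between this whole-array-then-NumPy path and getWhereList's chunked out-of-core scan.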
From: Alvaro T. C. <al...@mi...> - 2012-04-16 16:04:26
|
I'm continuing this thread on the dev list. -á. On Fri, Apr 13, 2012 at 21:17, Anthony Scopatz <sc...@gm...> wrote: > > On Fri, Apr 13, 2012 at 12:30 PM, Alvaro Tejero Cantero <al...@mi...> > wrote: >> >> Hi Anthony, >> >> >> >> >> How does hierarchical help here? do you create a 'singer_name'/song >> >> table? or a 'genre name'/song ?. Most of the time the physical layout >> >> in the form of a hierarchy is just an annoyance. >> > >> > I have to say that I disagree. The hierarchical features make it so >> > that >> > the data maps well to both Python objects and file systems. I feel that >> > both of these are more natural to work with than having to construct a >> > query of joins, groupbys, etc which reconstructs my data. So while this >> > is just my opinion, I feel that hierarchies are much more natural to >> > work >> > with. >> > >> >> I try to see the difference; I think I could be helped by an example. >> SQL gives you views to hide the data layout. You will just have 'a >> table' that you can read by rows. Nothing else. What happens in the >> case outlined before if I want to group songs by genre? if I encode >> the relation in the hierarchy, I will have to write the code that >> generates the view, but since there are many ways to create the >> hierarchy, the code will be specific to one data layout... I'd love to >> be wrong here. > > > Yes, this is exactly correct. You do have to write the code that generates > the view based on how your file is organized. However, encoding this > information explicitly in the hierarchy can make look up faster than if you > have to search and join several large tables. > > Because PyTables makes writing this easy in most cases, I don't mind. > >> >> >> > Also, my sense is that there would be a fair bit of overhead in this >> >> > interface >> >> > layer, which might not get you the speed boost you desire. I could >> >> > be >> >> > wrong >> >> > about this though. 
>> >> >> >> I think you're right in the wrapping of the results via the Python >> >> interface to SQLite. I suspect you're not about the queries executed >> >> in the virtual table, because that is left for you to implement and >> >> thus you could turn the query terms (that are handed over to you) into >> >> in-kernel expressions if you so wish (http://www.sqlite.org/vtab.html) >> > >> > >> > This was informed by my experience with SQLAlchemy which in some >> > situations added an excessively long computation times. With the >> > PyTables >> > infrastructure, we would at least have the option of writing the >> > performance >> > critical parts in C or Cython... >> >> You are talking here about writing specific /queries/ in Cython? just >> for clarification. > > > I am talking about writing the interface code between SQLite and PyTables > in C or (more likely) Cython. The queries themselves still are written in > SQL. > >> >> >> >> > If I saw a proof-of-concept implementation, I may grok better the >> >> > purpose. >> >> > Do you have any code to share? >> >> >> >> No, but I have an example ER diagram which is only part of what I >> >> need. You are welcome to have a look at it[2] >> >> > Sorry the text in this image is too small for me to read. >> >> I uploaded a larger version on the same location. > > > Thanks! > >> >> >> > Writing data-specific relational layers for your applications on top of >> > HDF5 with PyTables is not hard (IMHO). Add in the features of NumPy >> > to perform in-memory manipulations and you have pretty much everything >> > that you need. I think this is why we don't have formal implementation >> > of the SQ Language in PyTables. >> >> Yes, fair enough. There is just no canonical way of doing it. >> >> What do you think of storing in the .attrs something like "This column >> of this table matches (in the sense of foreign key) that column in >> that table" ? 
>> >> Or would you store these relations in a global repository of sorts - a >> specific table? > > > Whenever I have wanted to mimic relation behavior in HDF5 I have used > the second method where you store a table(s) of relations somewhere in your > file and make sure that your 'data' tables have appropriate 'primary key' > columns. > > Attributes are an interesting idea but I would advise against it since space > is > limited [9] and access is slow [10]. > >> >> >> > I guess what i don't understand still is why - if you wanted to do this >> > - >> > would >> > you use the SQLite vtabs? This seems to have the worst of the SQL world >> > in terms of vendor lock in, compatibility with other SQL >> > implementations, >> > etc. >> >> That is true. I did a quick search and couldn't find if/how e.g >> Postgres has such an extensibility mechanism. >> >> At the same time, there's some commonality between SQLite and >> PyTables: single-file, no concurrency approach. If you need a >> full-fledged RDBMS with authorization etc. you are in another league >> and some abstractions may be difficult to map to PyTables. > > > This is a good point that I hadn't considered. > >> >> >> And RDBMS have received, recently, features that are of great interest >> for scientific users - for example, indexes optimized for spatial or >> interval queries[7][8]. >> >> > Instead, why not just write a SQLAlchemy dialect [6] that is backed by >> > PyTables? >> >> I considered this. I don't know how difficult it is. Do you think that >> this would be the way to go for implementing a thin relational layer >> on top of PyTables? >> >> As I have no practical experience with SQLAlchemy, I cannot foresee >> e.g. those performance drops that you were pinpointing above. > > > I am under the impression that a SQLite vtabs implementation would be > faster, but less general, than SQLAlchemy. But this is why I was asking > the question about "Why vtabs?" in the first place. 
I guess it comes down > to whoever implements it ;) > >> >> >> > Yes, this isn't 'self-contained' in that we know have a dependency on >> > SQLAlchemy. >> > However, if done right this would be an *optional* dependency. Are >> > there >> > reasons >> > to not do this that I am missing? I think that including something like >> > this as a >> > subpackage in PyTables is more reasonable than interfacing with SQLite >> > in specific. >> >> More reasonable in a general sense, I don't know. The mirror statement >> would be to say that adding support for Numpy containers to the Python >> database adapter would be a reasonable thing to do. >> >> >> > Thanks for fielding my questions here. >> >> A pleasure. I am trying to wrap my head around all the possibilities >> here. I think a documented PyTables use-case for a moderately complex >> scientific database could do a lot for its story. > > > I agree. I think that having something like what you propose available > would > be really interesting. It would be great to be able to say "And we support > SQL > queries if you need them!" > > However, I am concerned about how this would affect the existing PyTables > code base in terms of maintenance, compatibility with existing objects, > build system dependencies, etc. > > A lot of this decisions get made by the person who actually writes it. Thus > I > was asking if you had any code available. If I saw a partial > implementation > I could review it. > > So to answer your initial question, we would be interested in looking at a > SQL > interface layer for HDF5 using PyTables. At that point we could discuss > what it would take to integrate it back in upstream. However, since I am > not > personally all that interested in SQL, I probably wouldn't be the one to > write > this subpackage ;). > > If you are interested in implementing it in one of the two main ways we > discussed > but don't know which to pursue we can try to work that out here or on the > dev list. 
> If you really want to try vtabs or SQLAlchemy, I encourage you to try and > let us > know how it goes and if you have questions or need help! > > Be Well > Anthony > > [9] http://www.hdfgroup.org/HDF5/doc1.6/UG/13_Attributes.html#SpecIssues > [10] http://www.hdfgroup.org/HDF5/doc/UG/UG_frame13Attributes.html > >> >> >> Cheers, >> >> Álvaro. >> >> [7] http://www.sqlite.org/rtree.html >> [8] http://www.postgresql.org/docs/8.1/static/gist.html >> >> > Be Well >> > Anthony >> > >> > >> > [6] http://docs.sqlalchemy.org/en/latest/#dialect-documentation >> > >> >> >> >> >> >> Cheers, >> >> >> >> Álvaro. >> >> -- >> >> [1] http://www.rmunn.com/sqlalchemy-tutorial/tutorial.html >> >> [2] http://dl.dropbox.com/u/2467197/ER-simple.png (yellow tables link >> >> to HDF5 data, or other tables with the real measurements, white tables >> >> are computed). >> >> [3] http://www.scidb.org/ >> >> [4] See p.26-29 and 32 >> >> >> >> >> >> http://www.itea-wsmr.org/ITEA%20Papers%20%20Presentations/2006%20ITEA%20Papers%20and%20Presentations/folk_HDF5_databases_pres.pdf >> >> [5] >> >> >> >> https://github.com/numpy/numpy/blob/master/numpy/lib/recfunctions.py#L826 >> >> >> >> >> >> > Be Well >> >> > Anthony >> >> > >> >> > On Thu, Apr 12, 2012 at 11:03 AM, Alvaro Tejero Cantero >> >> > <al...@mi...> >> >> > wrote: >> >> >> >> >> >> Hi, >> >> >> >> >> >> The topic of introducing some kind of relational management in >> >> >> PyTables comes up with certain frequency. >> >> >> >> >> >> Would it be possible to combine the virtues of RDBMS and hdf5's >> >> >> speed >> >> >> via a mechanism such as SQLite Virtual Tables? >> >> >> >> >> >> http://www.sqlite.org/vtab.html >> >> >> >> >> >> I wonder if the required x* functions could be written for PyTables, >> >> >> or if it being in Python is an obstacle to this kind of interfacing >> >> >> with SQLite. >> >> >> >> >> >> Something like that would be a truly powerful solution in use cases >> >> >> that don't require concurrency. 
>> >> >> Cheers, >> >> >> -á.
|
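[Editor's note: a toy version of the "relations table" approach discussed above -- data tables with explicit primary/foreign key columns, plus hand-written view code. Table and column names are hypothetical; NumPy structured arrays stand in for PyTables Tables so the sketch is self-contained.]

```python
import numpy as np

# 'Data' tables carrying explicit key columns, as in the second method.
genres = np.array([(0, b'rock'), (1, b'jazz')],
                  dtype=[('genre_id', 'i4'), ('name', 'S16')])
songs = np.array([(0, b'song_a', 1), (1, b'song_b', 0), (2, b'song_c', 1)],
                 dtype=[('song_id', 'i4'), ('title', 'S16'), ('genre_id', 'i4')])

# "Group songs by genre": the view code you write yourself for this layout.
jazz_id = genres['genre_id'][genres['name'] == b'jazz'][0]
jazz_songs = songs[songs['genre_id'] == jazz_id]['title']
```

With PyTables, the same lookup over the foreign-key column could be an in-kernel query (e.g. a where condition on genre_id) instead of a full in-memory comparison.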
From: Anthony S. <sc...@gm...> - 2012-04-13 20:18:06
|
On Fri, Apr 13, 2012 at 12:30 PM, Alvaro Tejero Cantero <al...@mi...>wrote: > Hi Anthony, > > > > >> How does hierarchical help here? do you create a 'singer_name'/song > >> table? or a 'genre name'/song ?. Most of the time the physical layout > >> in the form of a hierarchy is just an annoyance. > > > > I have to say that I disagree. The hierarchical features make it so that > > the data maps well to both Python objects and file systems. I feel that > > both of these are more natural to work with than having to construct a > > query of joins, groupbys, etc which reconstructs my data. So while this > > is just my opinion, I feel that hierarchies are much more natural to work > > with. > > > > I try to see the difference; I think I could be helped by an example. > SQL gives you views to hide the data layout. You will just have 'a > table' that you can read by rows. Nothing else. What happens in the > case outlined before if I want to group songs by genre? if I encode > the relation in the hierarchy, I will have to write the code that > generates the view, but since there are many ways to create the > hierarchy, the code will be specific to one data layout... I'd love to > be wrong here. > Yes, this is exactly correct. You do have to write the code that generates the view based on how your file is organized. However, encoding this information explicitly in the hierarchy can make look up faster than if you have to search and join several large tables. Because PyTables makes writing this easy in most cases, I don't mind. > >> > Also, my sense is that there would be a fair bit of overhead in this > >> > interface > >> > layer, which might not get you the speed boost you desire. I could be > >> > wrong > >> > about this though. > >> > >> I think you're right in the wrapping of the results via the Python > >> interface to SQLite. 
I suspect you're not about the queries executed > >> in the virtual table, because that is left for you to implement and > >> thus you could turn the query terms (that are handed over to you) into > >> in-kernel expressions if you so wish (http://www.sqlite.org/vtab.html) > > > > > > This was informed by my experience with SQLAlchemy which in some > > situations added an excessively long computation times. With the > PyTables > > infrastructure, we would at least have the option of writing the > > performance > > critical parts in C or Cython... > > You are talking here about writing specific /queries/ in Cython? just > for clarification. > I am talking about writing the interface code between SQLite and PyTables in C or (more likely) Cython. The queries themselves still are written in SQL. > > >> > If I saw a proof-of-concept implementation, I may grok better the > >> > purpose. > >> > Do you have any code to share? > >> > >> No, but I have an example ER diagram which is only part of what I > >> need. You are welcome to have a look at it[2] > > > Sorry the text in this image is too small for me to read. > > I uploaded a larger version on the same location. > Thanks! > > > Writing data-specific relational layers for your applications on top of > > HDF5 with PyTables is not hard (IMHO). Add in the features of NumPy > > to perform in-memory manipulations and you have pretty much everything > > that you need. I think this is why we don't have formal implementation > > of the SQ Language in PyTables. > > Yes, fair enough. There is just no canonical way of doing it. > > What do you think of storing in the .attrs something like "This column > of this table matches (in the sense of foreign key) that column in > that table" ? > > Or would you store these relations in a global repository of sorts - a > specific table? 
> Whenever I have wanted to mimic relation behavior in HDF5 I have used the second method where you store a table(s) of relations somewhere in your file and make sure that your 'data' tables have appropriate 'primary key' columns. Attributes are an interesting idea but I would advise against it since space is limited [9] and access is slow [10]. > > > I guess what i don't understand still is why - if you wanted to do this - > > would > > you use the SQLite vtabs? This seems to have the worst of the SQL world > > in terms of vendor lock in, compatibility with other SQL implementations, > > etc. > > That is true. I did a quick search and couldn't find if/how e.g > Postgres has such an extensibility mechanism. > > At the same time, there's some commonality between SQLite and > PyTables: single-file, no concurrency approach. If you need a > full-fledged RDBMS with authorization etc. you are in another league > and some abstractions may be difficult to map to PyTables. > This is a good point that I hadn't considered. > > And RDBMS have received, recently, features that are of great interest > for scientific users - for example, indexes optimized for spatial or > interval queries[7][8]. > > > Instead, why not just write a SQLAlchemy dialect [6] that is backed by > > PyTables? > > I considered this. I don't know how difficult it is. Do you think that > this would be the way to go for implementing a thin relational layer > on top of PyTables? > > As I have no practical experience with SQLAlchemy, I cannot foresee > e.g. those performance drops that you were pinpointing above. > I am under the impression that a SQLite vtabs implementation would be faster, but less general, than SQLAlchemy. But this is why I was asking the question about "Why vtabs?" in the first place. I guess it comes down to whoever implements it ;) > > > Yes, this isn't 'self-contained' in that we know have a dependency on > > SQLAlchemy. > > However, if done right this would be an *optional* dependency. 
Are there > > reasons > > to not do this that I am missing? I think that including something like > > this as a > > subpackage in PyTables is more reasonable than interfacing with SQLite > > in specific. > > More reasonable in a general sense, I don't know. The mirror statement > would be to say that adding support for Numpy containers to the Python > database adapter would be a reasonable thing to do. > > > > Thanks for fielding my questions here. > > A pleasure. I am trying to wrap my head around all the possibilities > here. I think a documented PyTables use-case for a moderately complex > scientific database could do a lot for its story. > I agree. I think that having something like what you propose available would be really interesting. It would be great to be able to say "And we support SQL queries if you need them!" However, I am concerned about how this would affect the existing PyTables code base in terms of maintenance, compatibility with existing objects, build system dependencies, etc. A lot of this decisions get made by the person who actually writes it. Thus I was asking if you had any code available. If I saw a partial implementation I could review it. So to answer your initial question, we would be interested in looking at a SQL interface layer for HDF5 using PyTables. At that point we could discuss what it would take to integrate it back in upstream. However, since I am not personally all that interested in SQL, I probably wouldn't be the one to write this subpackage ;). If you are interested in implementing it in one of the two main ways we discussed but don't know which to pursue we can try to work that out here or on the dev list. If you really want to try vtabs or SQLAlchemy, I encourage you to try and let us know how it goes and if you have questions or need help! Be Well Anthony [9] http://www.hdfgroup.org/HDF5/doc1.6/UG/13_Attributes.html#SpecIssues [10] http://www.hdfgroup.org/HDF5/doc/UG/UG_frame13Attributes.html > > Cheers, > > Álvaro. 
> > [7] http://www.sqlite.org/rtree.html > [8] http://www.postgresql.org/docs/8.1/static/gist.html > > > Be Well > > Anthony > > > > > > [6] http://docs.sqlalchemy.org/en/latest/#dialect-documentation > > > >> > >> > >> Cheers, > >> > >> Álvaro. > >> -- > >> [1] http://www.rmunn.com/sqlalchemy-tutorial/tutorial.html > >> [2] http://dl.dropbox.com/u/2467197/ER-simple.png (yellow tables link > >> to HDF5 data, or other tables with the real measurements, white tables > >> are computed). > >> [3] http://www.scidb.org/ > >> [4] See p.26-29 and 32 > >> > >> > http://www.itea-wsmr.org/ITEA%20Papers%20%20Presentations/2006%20ITEA%20Papers%20and%20Presentations/folk_HDF5_databases_pres.pdf > >> [5] > >> > https://github.com/numpy/numpy/blob/master/numpy/lib/recfunctions.py#L826 > >> > >> > >> > Be Well > >> > Anthony > >> > > >> > On Thu, Apr 12, 2012 at 11:03 AM, Alvaro Tejero Cantero > >> > <al...@mi...> > >> > wrote: > >> >> > >> >> Hi, > >> >> > >> >> The topic of introducing some kind of relational management in > >> >> PyTables comes up with certain frequency. > >> >> > >> >> Would it be possible to combine the virtues of RDBMS and hdf5's speed > >> >> via a mechanism such as SQLite Virtual Tables? > >> >> > >> >> http://www.sqlite.org/vtab.html > >> >> > >> >> I wonder if the required x* functions could be written for PyTables, > >> >> or if it being in Python is an obstacle to this kind of interfacing > >> >> with SQLite. > >> >> > >> >> Something like that would be a truly powerful solution in use cases > >> >> that don't require concurrency. > >> >> > >> >> Cheers, > >> >> > >> >> -á. > >> >> > >> >> > >> >> > >> >> > ------------------------------------------------------------------------------ > >> >> For Developers, A Lot Can Happen In A Second. > >> >> Boundary is the first to Know...and Tell You. > >> >> Monitor Your Applications in Ultra-Fine Resolution. Try it FREE! 
|
From: Alvaro T. C. <al...@mi...> - 2012-04-13 17:31:27
|
Hi Anthony, >> How does hierarchical help here? do you create a 'singer_name'/song >> table? or a 'genre name'/song ?. Most of the time the physical layout >> in the form of a hierarchy is just an annoyance. > > I have to say that I disagree. The hierarchical features make it so that > the data maps well to both Python objects and file systems. I feel that > both of these are more natural to work with than having to construct a > query of joins, groupbys, etc which reconstructs my data. So while this > is just my opinion, I feel that hierarchies are much more natural to work > with. > I try to see the difference; I think I could be helped by an example. SQL gives you views to hide the data layout. You will just have 'a table' that you can read by rows. Nothing else. What happens in the case outlined before if I want to group songs by genre? if I encode the relation in the hierarchy, I will have to write the code that generates the view, but since there are many ways to create the hierarchy, the code will be specific to one data layout... I'd love to be wrong here. >> > Also, my sense is that there would be a fair bit of overhead in this >> > interface >> > layer, which might not get you the speed boost you desire. I could be >> > wrong >> > about this though. >> >> I think you're right in the wrapping of the results via the Python >> interface to SQLite. I suspect you're not about the queries executed >> in the virtual table, because that is left for you to implement and >> thus you could turn the query terms (that are handed over to you) into >> in-kernel expressions if you so wish (http://www.sqlite.org/vtab.html) > > > This was informed by my experience with SQLAlchemy which in some > situations added an excessively long computation times. With the PyTables > infrastructure, we would at least have the option of writing the > performance > critical parts in C or Cython... You are talking here about writing specific /queries/ in Cython? just for clarification. 
>> > If I saw a proof-of-concept implementation, I may grok better the >> > purpose. >> > Do you have any code to share? >> >> No, but I have an example ER diagram which is only part of what I >> need. You are welcome to have a look at it[2] > Sorry the text in this image is too small for me to read. I uploaded a larger version on the same location. > Writing data-specific relational layers for your applications on top of > HDF5 with PyTables is not hard (IMHO). Add in the features of NumPy > to perform in-memory manipulations and you have pretty much everything > that you need. I think this is why we don't have formal implementation > of the SQ Language in PyTables. Yes, fair enough. There is just no canonical way of doing it. What do you think of storing in the .attrs something like "This column of this table matches (in the sense of foreign key) that column in that table" ? Or would you store these relations in a global repository of sorts - a specific table? > I guess what i don't understand still is why - if you wanted to do this - > would > you use the SQLite vtabs? This seems to have the worst of the SQL world > in terms of vendor lock in, compatibility with other SQL implementations, > etc. That is true. I did a quick search and couldn't find if/how e.g Postgres has such an extensibility mechanism. At the same time, there's some commonality between SQLite and PyTables: single-file, no concurrency approach. If you need a full-fledged RDBMS with authorization etc. you are in another league and some abstractions may be difficult to map to PyTables. And RDBMS have received, recently, features that are of great interest for scientific users - for example, indexes optimized for spatial or interval queries[7][8]. > Instead, why not just write a SQLAlchemy dialect [6] that is backed by > PyTables? I considered this. I don't know how difficult it is. Do you think that this would be the way to go for implementing a thin relational layer on top of PyTables? 
As I have no practical experience with SQLAlchemy, I cannot foresee e.g. those performance drops that you were pinpointing above. > Yes, this isn't 'self-contained' in that we know have a dependency on > SQLAlchemy. > However, if done right this would be an *optional* dependency. Are there > reasons > to not do this that I am missing? I think that including something like > this as a > subpackage in PyTables is more reasonable than interfacing with SQLite > in specific. More reasonable in a general sense, I don't know. The mirror statement would be to say that adding support for Numpy containers to the Python database adapter would be a reasonable thing to do. > Thanks for fielding my questions here. A pleasure. I am trying to wrap my head around all the possibilities here. I think a documented PyTables use-case for a moderately complex scientific database could do a lot for its story. Cheers, Álvaro. [7] http://www.sqlite.org/rtree.html [8] http://www.postgresql.org/docs/8.1/static/gist.html > Be Well > Anthony > > > [6] http://docs.sqlalchemy.org/en/latest/#dialect-documentation > >> >> >> Cheers, >> >> Álvaro. >> -- >> [1] http://www.rmunn.com/sqlalchemy-tutorial/tutorial.html >> [2] http://dl.dropbox.com/u/2467197/ER-simple.png (yellow tables link >> to HDF5 data, or other tables with the real measurements, white tables >> are computed). >> [3] http://www.scidb.org/ >> [4] See p.26-29 and 32 >> >> http://www.itea-wsmr.org/ITEA%20Papers%20%20Presentations/2006%20ITEA%20Papers%20and%20Presentations/folk_HDF5_databases_pres.pdf >> [5] >> https://github.com/numpy/numpy/blob/master/numpy/lib/recfunctions.py#L826 >> >> >> > Be Well >> > Anthony >> > >> > On Thu, Apr 12, 2012 at 11:03 AM, Alvaro Tejero Cantero >> > <al...@mi...> >> > wrote: >> >> >> >> Hi, >> >> >> >> The topic of introducing some kind of relational management in >> >> PyTables comes up with certain frequency. 
>> >> >> >> Would it be possible to combine the virtues of RDBMS and hdf5's speed >> >> via a mechanism such as SQLite Virtual Tables? >> >> >> >> http://www.sqlite.org/vtab.html >> >> >> >> I wonder if the required x* functions could be written for PyTables, >> >> or if it being in Python is an obstacle to this kind of interfacing >> >> with SQLite. >> >> >> >> Something like that would be a truly powerful solution in use cases >> >> that don't require concurrency. >> >> >> >> Cheers, >> >> >> >> -á. >> >> >> >> >> >> >> >> ------------------------------------------------------------------------------ >> >> For Developers, A Lot Can Happen In A Second. >> >> Boundary is the first to Know...and Tell You. >> >> Monitor Your Applications in Ultra-Fine Resolution. Try it FREE! >> >> http://p.sf.net/sfu/Boundary-d2dvs2 >> >> _______________________________________________ >> >> Pytables-users mailing list >> >> Pyt...@li... >> >> https://lists.sourceforge.net/lists/listinfo/pytables-users >> > >> > >> > >> > >> > ------------------------------------------------------------------------------ >> > For Developers, A Lot Can Happen In A Second. >> > Boundary is the first to Know...and Tell You. >> > Monitor Your Applications in Ultra-Fine Resolution. Try it FREE! >> > http://p.sf.net/sfu/Boundary-d2dvs2 >> > _______________________________________________ >> > Pytables-users mailing list >> > Pyt...@li... >> > https://lists.sourceforge.net/lists/listinfo/pytables-users >> > >> >> >> ------------------------------------------------------------------------------ >> For Developers, A Lot Can Happen In A Second. >> Boundary is the first to Know...and Tell You. >> Monitor Your Applications in Ultra-Fine Resolution. Try it FREE! >> http://p.sf.net/sfu/Boundary-d2dvs2 >> _______________________________________________ >> Pytables-users mailing list >> Pyt...@li... 
>> https://lists.sourceforge.net/lists/listinfo/pytables-users > > > > ------------------------------------------------------------------------------ > For Developers, A Lot Can Happen In A Second. > Boundary is the first to Know...and Tell You. > Monitor Your Applications in Ultra-Fine Resolution. Try it FREE! > http://p.sf.net/sfu/Boundary-d2dvs2 > _______________________________________________ > Pytables-users mailing list > Pyt...@li... > https://lists.sourceforge.net/lists/listinfo/pytables-users > |
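The two conventions floated in the message above (per-column `.attrs` annotations versus one global repository of relations) can be sketched with plain Python stand-ins. Note that the `FK_*` naming scheme and the `/table:column` path format below are invented for this illustration; they are not an existing PyTables convention:

```python
# Illustrative sketch only: plain dicts stand in for PyTables ``.attrs``
# sets, and the FK_* names and "/table:column" paths are made up here.

# Option 1: per-column attributes on the referring table
songs_attrs = {
    "FK_singer": "/singers:id",       # column 'singer' references /singers.id
    "FK_genre": "/genre_songs:song",  # column 'genre' goes through the n:m table
}

# Option 2: a single global repository of relations (one entry per link)
relations = [
    ("/songs:singer", "/singers:id"),
    ("/songs:genre", "/genre_songs:song"),
]

def foreign_keys(attrs):
    """Collect targets declared through the FK_* attribute convention."""
    return {name[3:]: target
            for name, target in attrs.items()
            if name.startswith("FK_")}

print(foreign_keys(songs_attrs))
# {'singer': '/singers:id', 'genre': '/genre_songs:song'}
```

Either layout is readable by generic tooling; the per-column variant keeps the relation next to the data, while the global table makes it easy to enumerate all links at once.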
From: Anthony S. <sc...@gm...> - 2012-04-13 16:29:40
|
On Fri, Apr 13, 2012 at 6:41 AM, Alvaro Tejero Cantero <al...@mi...>wrote: > Hi Anthony, > > > I can see how the virtual table interface could be made to work with > > PyTables, > > but I guess I don't understand why you would want to. It seems like in > this > > case you are querying using SQL rather than the more expressive Python. > > Yes, you'd be querying using SQL. > SQL is a documented declarative syntax for queries over relations. > Python offers many procedural routes to achieve e.g. joins, all of > them custom. If (a == b) | (c==d) is more expressive to you than > WHERE a=b OR c=d , then you can use SQLAlchemy [1], which wraps SQL in > a Pythonic query syntax. > Hello Alvaro, I am quite familiar with SQL and SQLAlchemy, having used these tools both personally and professionally. My initial question was not "What are you trying to do?" but rather "Why would you want to do it?" > > Moreover, you'd be sacrificing all of the 'H' in HDF5 features to obtain > > this. > [snip] > How does hierarchical help here? do you create a 'singer_name'/song > table? or a 'genre name'/song ?. Most of the time the physical layout > in the form of a hierarchy is just an annoyance. > I have to say that I disagree. The hierarchical features make it so that the data maps well to both Python objects and file systems. I feel that both of these are more natural to work with than having to construct a query of joins, groupbys, etc which reconstructs my data. So while this is just my opinion, I feel that hierarchies are much more natural to work with. > > > Also, my sense is that there would be a fair bit of overhead in this > > interface > > layer, which might not get you the speed boost you desire. I could be > wrong > > about this though. > > I think you're right in the wrapping of the results via the Python > interface to SQLite. 
I suspect you're not right about the queries executed > in the virtual table, because that is left for you to implement and > thus you could turn the query terms (that are handed over to you) into > in-kernel expressions if you so wish (http://www.sqlite.org/vtab.html) > This was informed by my experience with SQLAlchemy, which in some situations added excessively long computation times. With the PyTables infrastructure, we would at least have the option of writing the performance-critical parts in C or Cython... > > If I saw a proof-of-concept implementation, I may grok better the > purpose. > > Do you have any code to share? > > No, but I have an example ER diagram which is only part of what I > need. You are welcome to have a look at it[2] Sorry, the text in this image is too small for me to read. > and tell me how you'd > manage to support the jungle of relationships there with the H of > HDF5. In SQL I have a syntax to declare all those relationships. In > HDF5 I must decide on one hierarchical cut of those relations and > since it won't be enough, implement the relational layer on top of it, > perhaps using attrs to store paths everywhere. It can be done, but > the support out of the box at this point for this is next to nil > (maybe integrating something like recarray.joinby [5] would be > useful?) > Writing data-specific relational layers for your applications on top of HDF5 with PyTables is not hard (IMHO). Add in the features of NumPy to perform in-memory manipulations and you have pretty much everything that you need. I think this is why we don't have a formal implementation of the SQL language in PyTables. > It looks to me, at this moment, that as soon as the data model gets > complicated HDF5 is in trouble, and as soon as very large, contiguous, > read-only datasets are involved relational RDBMSs are in trouble > (subsetting, speed). 
Since this is not a happy situation, several > people are interested in combining the strengths of both [3][4] and my > e-mail was just highlighting that there may be a way to go that may > make a self-contained, clear, understandable package for the scenarios > where PyTables is most often deployed (single-user). > Reference [4] is particularly interesting (they mention PyTables!) and they also propose basically what you are suggesting in their third option (integrated SQL & HDF5). > Or am I not seeing something obvious? > I guess what I don't understand still is why - if you wanted to do this - would you use the SQLite vtabs? This seems to have the worst of the SQL world in terms of vendor lock-in, compatibility with other SQL implementations, etc. Instead, why not just write a SQLAlchemy dialect [6] that is backed by PyTables? Yes, this isn't 'self-contained' in that we now have a dependency on SQLAlchemy. However, if done right this would be an *optional* dependency. Are there reasons to not do this that I am missing? I think that including something like this as a subpackage in PyTables is more reasonable than interfacing with SQLite specifically. Thanks for fielding my questions here. Be Well Anthony [6] http://docs.sqlalchemy.org/en/latest/#dialect-documentation > > Cheers, > > Álvaro. > -- > [1] http://www.rmunn.com/sqlalchemy-tutorial/tutorial.html > [2] http://dl.dropbox.com/u/2467197/ER-simple.png (yellow tables link > to HDF5 data, or other tables with the real measurements, white tables > are computed). > [3] http://www.scidb.org/ > [4] See p.26-29 and 32 > > http://www.itea-wsmr.org/ITEA%20Papers%20%20Presentations/2006%20ITEA%20Papers%20and%20Presentations/folk_HDF5_databases_pres.pdf > [5] > https://github.com/numpy/numpy/blob/master/numpy/lib/recfunctions.py#L826 > > > > Be Well > > Anthony > > > > On Thu, Apr 12, 2012 at 11:03 AM, Alvaro Tejero Cantero <al...@mi... 
> > > > wrote: > >> > >> Hi, > >> > >> The topic of introducing some kind of relational management in > >> PyTables comes up with certain frequency. > >> > >> Would it be possible to combine the virtues of RDBMS and hdf5's speed > >> via a mechanism such as SQLite Virtual Tables? > >> > >> http://www.sqlite.org/vtab.html > >> > >> I wonder if the required x* functions could be written for PyTables, > >> or if it being in Python is an obstacle to this kind of interfacing > >> with SQLite. > >> > >> Something like that would be a truly powerful solution in use cases > >> that don't require concurrency. > >> > >> Cheers, > >> > >> -á. > >> > >> > >> > ------------------------------------------------------------------------------ > >> For Developers, A Lot Can Happen In A Second. > >> Boundary is the first to Know...and Tell You. > >> Monitor Your Applications in Ultra-Fine Resolution. Try it FREE! > >> http://p.sf.net/sfu/Boundary-d2dvs2 > >> _______________________________________________ > >> Pytables-users mailing list > >> Pyt...@li... > >> https://lists.sourceforge.net/lists/listinfo/pytables-users > > > > > > > > > ------------------------------------------------------------------------------ > > For Developers, A Lot Can Happen In A Second. > > Boundary is the first to Know...and Tell You. > > Monitor Your Applications in Ultra-Fine Resolution. Try it FREE! > > http://p.sf.net/sfu/Boundary-d2dvs2 > > _______________________________________________ > > Pytables-users mailing list > > Pyt...@li... > > https://lists.sourceforge.net/lists/listinfo/pytables-users > > > > > ------------------------------------------------------------------------------ > For Developers, A Lot Can Happen In A Second. > Boundary is the first to Know...and Tell You. > Monitor Your Applications in Ultra-Fine Resolution. Try it FREE! > http://p.sf.net/sfu/Boundary-d2dvs2 > _______________________________________________ > Pytables-users mailing list > Pyt...@li... 
> https://lists.sourceforge.net/lists/listinfo/pytables-users > |
From: Alvaro T. C. <al...@mi...> - 2012-04-13 11:42:03
|
Hi Anthony, > I can see how the virtual table interface could be made to work with > PyTables, > but I guess I don't understand why you would want to. It seems like in this > case you are querying using SQL rather than the more expressive Python. Yes, you'd be querying using SQL. SQL is a documented declarative syntax for queries over relations. Python offers many procedural routes to achieve e.g. joins, all of them custom. If (a == b) | (c==d) is more expressive to you than WHERE a=b OR c=d, then you can use SQLAlchemy [1], which wraps SQL in a Pythonic query syntax. > Moreover, you'd be sacrificing all of the 'H' in HDF5 features to obtain > this. What is the benefit of 'H'ierarchical that you have in mind? To me, hierarchy seems less expressive than general relations. After all, file systems are hierarchical and you're still going to HDF5 (and losing the panoply of filesystem-based tools with it). So clearly, the differential benefit of HDF5 is not at all in the hierarchical character. Take a list of e.g. songs with a foreign key 'singer' pointing at one row in the table of singers, and a foreign key 'genre' pointing at the genre_songs table, which in turn points to 'genres' (an n:m relationship). How does a hierarchy help here? Do you create a 'singer_name'/song table? Or a 'genre name'/song one? Most of the time the physical layout in the form of a hierarchy is just an annoyance. > Also, my sense is that there would be a fair bit of overhead in this > interface > layer, which might not get you the speed boost you desire. I could be wrong > about this though. I think you're right in the wrapping of the results via the Python interface to SQLite. 
I suspect you're not right about the queries executed in the virtual table, because that is left for you to implement and thus you could turn the query terms (that are handed over to you) into in-kernel expressions if you so wish (http://www.sqlite.org/vtab.html) > If I saw a proof-of-concept implementation, I may grok better the purpose. > Do you have any code to share? No, but I have an example ER diagram which is only part of what I need. You are welcome to have a look at it[2] and tell me how you'd manage to support the jungle of relationships there with the H of HDF5. In SQL I have a syntax to declare all those relationships. In HDF5 I must decide on one hierarchical cut of those relations and since it won't be enough, implement the relational layer on top of it, perhaps using attrs to store paths everywhere. It can be done, but the support out of the box at this point for this is next to nil (maybe integrating something like recarray.joinby [5] would be useful?) It looks to me, at this moment, that as soon as the data model gets complicated HDF5 is in trouble, and as soon as very large, contiguous, read-only datasets are involved relational RDBMSs are in trouble (subsetting, speed). Since this is not a happy situation, several people are interested in combining the strengths of both [3][4] and my e-mail was just highlighting that there may be a way to go that may make a self-contained, clear, understandable package for the scenarios where PyTables is most often deployed (single-user). Or am I not seeing something obvious? Cheers, Álvaro. -- [1] http://www.rmunn.com/sqlalchemy-tutorial/tutorial.html [2] http://dl.dropbox.com/u/2467197/ER-simple.png (yellow tables link to HDF5 data, or other tables with the real measurements, white tables are computed). 
[3] http://www.scidb.org/ [4] See p.26-29 and 32 http://www.itea-wsmr.org/ITEA%20Papers%20%20Presentations/2006%20ITEA%20Papers%20and%20Presentations/folk_HDF5_databases_pres.pdf [5] https://github.com/numpy/numpy/blob/master/numpy/lib/recfunctions.py#L826 > Be Well > Anthony > > On Thu, Apr 12, 2012 at 11:03 AM, Alvaro Tejero Cantero <al...@mi...> > wrote: >> >> Hi, >> >> The topic of introducing some kind of relational management in >> PyTables comes up with certain frequency. >> >> Would it be possible to combine the virtues of RDBMS and hdf5's speed >> via a mechanism such as SQLite Virtual Tables? >> >> http://www.sqlite.org/vtab.html >> >> I wonder if the required x* functions could be written for PyTables, >> or if it being in Python is an obstacle to this kind of interfacing >> with SQLite. >> >> Something like that would be a truly powerful solution in use cases >> that don't require concurrency. >> >> Cheers, >> >> -á. >> >> >> ------------------------------------------------------------------------------ >> For Developers, A Lot Can Happen In A Second. >> Boundary is the first to Know...and Tell You. >> Monitor Your Applications in Ultra-Fine Resolution. Try it FREE! >> http://p.sf.net/sfu/Boundary-d2dvs2 >> _______________________________________________ >> Pytables-users mailing list >> Pyt...@li... >> https://lists.sourceforge.net/lists/listinfo/pytables-users > > > > ------------------------------------------------------------------------------ > For Developers, A Lot Can Happen In A Second. > Boundary is the first to Know...and Tell You. > Monitor Your Applications in Ultra-Fine Resolution. Try it FREE! > http://p.sf.net/sfu/Boundary-d2dvs2 > _______________________________________________ > Pytables-users mailing list > Pyt...@li... > https://lists.sourceforge.net/lists/listinfo/pytables-users > |
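The songs/singers/genres example from this exchange can be written down directly as relations. A minimal sqlite3 sketch (schema and names invented for illustration) shows the declarative join that needs no up-front hierarchical cut:

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
    CREATE TABLE singers (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE genres  (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE songs   (id INTEGER PRIMARY KEY, title TEXT,
                          singer INTEGER REFERENCES singers(id));
    -- n:m link table between songs and genres
    CREATE TABLE genre_songs (song  INTEGER REFERENCES songs(id),
                              genre INTEGER REFERENCES genres(id));
""")
con.execute("INSERT INTO singers VALUES (1, 'Ella')")
con.execute("INSERT INTO genres VALUES (1, 'Jazz')")
con.execute("INSERT INTO songs VALUES (1, 'Summertime', 1)")
con.execute("INSERT INTO genre_songs VALUES (1, 1)")

# The declarative join: no hierarchy has to be chosen up front.
rows = con.execute("""
    SELECT s.title, si.name, g.name
    FROM songs s
    JOIN singers si ON s.singer = si.id
    JOIN genre_songs gs ON gs.song = s.id
    JOIN genres g ON g.id = gs.genre
""").fetchall()
print(rows)   # [('Summertime', 'Ella', 'Jazz')]
```

The same data in HDF5 would force one hierarchical cut (songs under singers, or under genres) and leave the other relation to be maintained by hand.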
From: Anthony S. <sc...@gm...> - 2012-04-12 20:04:47
|
Hello Alvaro, I can see how the virtual table interface could be made to work with PyTables, but I guess I don't understand why you would want to. It seems like in this case you are querying using SQL rather than the more expressive Python. Moreover, you'd be sacrificing all of the 'H' in HDF5 features to obtain this. Also, my sense is that there would be a fair bit of overhead in this interface layer, which might not get you the speed boost you desire. I could be wrong about this though. If I saw a proof-of-concept implementation, I may grok better the purpose. Do you have any code to share? Be Well Anthony On Thu, Apr 12, 2012 at 11:03 AM, Alvaro Tejero Cantero <al...@mi...>wrote: > Hi, > > The topic of introducing some kind of relational management in > PyTables comes up with certain frequency. > > Would it be possible to combine the virtues of RDBMS and hdf5's speed > via a mechanism such as SQLite Virtual Tables? > > http://www.sqlite.org/vtab.html > > I wonder if the required x* functions could be written for PyTables, > or if it being in Python is an obstacle to this kind of interfacing > with SQLite. > > Something like that would be a truly powerful solution in use cases > that don't require concurrency. > > Cheers, > > -á. > > > ------------------------------------------------------------------------------ > For Developers, A Lot Can Happen In A Second. > Boundary is the first to Know...and Tell You. > Monitor Your Applications in Ultra-Fine Resolution. Try it FREE! > http://p.sf.net/sfu/Boundary-d2dvs2 > _______________________________________________ > Pytables-users mailing list > Pyt...@li... > https://lists.sourceforge.net/lists/listinfo/pytables-users > |
From: Alvaro T. C. <al...@mi...> - 2012-04-12 16:03:32
|
Hi, The topic of introducing some kind of relational management in PyTables comes up with a certain frequency. Would it be possible to combine the virtues of an RDBMS and HDF5's speed via a mechanism such as SQLite Virtual Tables? http://www.sqlite.org/vtab.html I wonder if the required x* functions could be written for PyTables, or if it being in Python is an obstacle to this kind of interfacing with SQLite. Something like that would be a truly powerful solution in use cases that don't require concurrency. Cheers, -á. |
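For context on the x* functions mentioned here: the stdlib sqlite3 module cannot register virtual-table modules from Python, but apsw can. The cursor half of such a module might look roughly like the following pure-Python sketch; the class is not actually wired to SQLite, and the callback names only approximate the apsw convention:

```python
# Rough sketch of the cursor-side callbacks a virtual-table module must
# provide (names follow the apsw convention approximately; this class is
# not registered with SQLite here, it only shows the shape of the API).

class PyTablesCursor:
    def __init__(self, rows):
        self.rows = rows          # e.g. rows read from a PyTables Table
        self.pos = 0

    def Filter(self, indexnum, indexname, constraintargs):
        # This is where WHERE-clause terms handed over by SQLite could be
        # translated into in-kernel PyTables conditions.
        self.pos = 0

    def Eof(self):
        return self.pos >= len(self.rows)

    def Column(self, col):
        return self.rows[self.pos][col]

    def Next(self):
        self.pos += 1

    def Rowid(self):
        return self.pos

    def Close(self):
        pass

# Simulate how SQLite would drive the cursor:
cur = PyTablesCursor([(1, "a"), (2, "b")])
cur.Filter(0, None, ())
values = []
while not cur.Eof():
    values.append(cur.Column(1))
    cur.Next()
print(values)   # ['a', 'b']
```

The interesting part is `Filter`, which receives the constraints SQLite pushed down; that is the hook where a PyTables-backed implementation could evaluate them with in-kernel queries instead of scanning.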
From: Josh M. <jos...@gm...> - 2012-04-11 19:41:19
|
There's been a motion [1] by the development team to drop support for HDF 1.6 in the next PyTables release. For details of the work done, see GitHub issue #105 [2]. Support for files written by 1.6 will be maintained, but if there are any other users who are stuck using the 1.6 HDF libraries, now would be a great time to speak up and outline your situation (including OS version, expected length of support for 1.6 that's needed, etc.). All feedback is welcome. ~Josh [1] https://groups.google.com/d/topic/pytables-dev/0Uhovr0lohc/discussion [2] https://github.com/PyTables/PyTables/issues/105 |
From: Francesc A. <fa...@py...> - 2012-04-03 02:05:03
|
On 4/2/12 5:11 PM, Daπid wrote: > I want to report an inaccuracy in this doc: > http://pytables.github.com/usersguide/condition_syntax.html#condition-syntax > > After listing the types supported in the search table.where: > > "Nevertheless, if the type passed is not among the above ones, it will > be silently upcasted, so you don’t need to worry too much about > passing supported types: just pass whatever type you want and the > interpreter will take care of it." > > But if I pass an unsigned 64-bit integer I get: > > NotImplementedError: variable ``N`` refers to a 64-bit unsigned > integer column, not yet supported in conditions, sorry; please use > regular Python selections > > Of course, that type cannot be upcasted to any of the listed ones, so you > truly cannot expect it to work; but the behaviour is not exactly what > it says in the paragraph. Yes, you are right. These small amendments to the docs are best dealt with via a PR. With GitHub this is easy to do, and it is also very convenient for maintainers to keep track of all these requests for improvement. Cheers, -- Francesc Alted |
From: Daπid <dav...@gm...> - 2012-04-02 22:11:40
|
I want to report an inaccuracy in this doc: http://pytables.github.com/usersguide/condition_syntax.html#condition-syntax After listing the types supported in the search table.where: "Nevertheless, if the type passed is not among the above ones, it will be silently upcasted, so you don’t need to worry too much about passing supported types: just pass whatever type you want and the interpreter will take care of it." But if I pass an unsigned 64-bit integer I get: NotImplementedError: variable ``N`` refers to a 64-bit unsigned integer column, not yet supported in conditions, sorry; please use regular Python selections Of course, that type cannot be upcasted to any of the listed ones, so you truly cannot expect it to work; but the behaviour is not exactly what it says in the paragraph. For the record: I am using the latest released version of PyTables, 2.3. Regards, David. |
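As the NotImplementedError suggests, the fallback for an unsupported column type is a regular Python selection. A sketch of what that looks like, with plain dicts standing in for Table rows (a real Table would be iterated the same way when `where()` refuses the column type):

```python
# Plain-Python stand-in for table rows; values above 2**63 - 1 do not
# fit a signed 64-bit integer, which is what trips up the condition
# machinery for unsigned 64-bit columns.
rows = [
    {"N": 2**63 + 5, "value": 1.5},
    {"N": 42,        "value": 2.5},
    {"N": 2**64 - 1, "value": 3.5},
]

threshold = 2**63
# the in-Python equivalent of a ``table.where('N > threshold')`` query
selected = [r["value"] for r in rows if r["N"] > threshold]
print(selected)   # [1.5, 3.5]
```

This is slower than an in-kernel query, since every row crosses into Python, but it works for any column type.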
From: Daπid <dav...@gm...> - 2012-04-02 21:13:52
|
People here say that it is killing the process: http://stackoverflow.com/questions/1261597/eclipsepydev-cleanup-functions-arent-called-when-pressing-stop So there is no way of fixing that, from any language. David. On Mon, Apr 2, 2012 at 10:55 PM, Francesc Alted <fa...@py...> wrote: > I don't know. I personally do not have experience with PyDev. If you > don't see the message about PyTables closing files, then there is a high > probability that it does not do that. In this case, your suggestion on > using try-except-finally block is your best bet, IMO. > > Francesc > > On 4/2/12 2:59 PM, Daπid wrote: >> I noticed that if a program raises an error, it shows a message >> indicating the file is closed, but it doesn't show anything if I >> terminate it from outside (in my case, stop from PyDev). >> >> Is it being flushed? Is there any way of doing that, apart from >> enveloping the whole program in a try-except-finally block? >> >> On Mon, Apr 2, 2012 at 9:48 PM, Francesc Alted<fa...@py...> wrote: >>> On 4/2/12 12:38 PM, Alvaro Tejero Cantero wrote: >>>> Hi, >>>> >>>> should PyTables flush on __exit__ ? >>>> https://github.com/PyTables/PyTables/blob/master/tables/file.py#L2164 >>>> >>>> it is not clear to me if a File.close() call results in automatic >>>> flushing all the nodes, since Node()._f_close() promises only "On >>>> nodes with data, it may be flushed to disk." >>>> https://github.com/PyTables/PyTables/blob/master/tables/node.py#L512 >>> Yup, it does flush. The message should be more explicit on this. >>> >>> -- >>> Francesc Alted >>> >>> >>> ------------------------------------------------------------------------------ >>> This SF email is sponsosred by: >>> Try Windows Azure free for 90 days Click Here >>> http://p.sf.net/sfu/sfd2d-msazure >>> _______________________________________________ >>> Pytables-users mailing list >>> Pyt...@li... 
>>> https://lists.sourceforge.net/lists/listinfo/pytables-users >> ------------------------------------------------------------------------------ >> This SF email is sponsosred by: >> Try Windows Azure free for 90 days Click Here >> http://p.sf.net/sfu/sfd2d-msazure >> _______________________________________________ >> Pytables-users mailing list >> Pyt...@li... >> https://lists.sourceforge.net/lists/listinfo/pytables-users > > > -- > Francesc Alted > > > ------------------------------------------------------------------------------ > Better than sec? Nothing is better than sec when it comes to > monitoring Big Data applications. Try Boundary one-second > resolution app monitoring today. Free. > http://p.sf.net/sfu/Boundary-dev2dev > _______________________________________________ > Pytables-users mailing list > Pyt...@li... > https://lists.sourceforge.net/lists/listinfo/pytables-users |
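The try/finally pattern discussed in this thread looks like the following; a plain file object stands in for a PyTables File here, since only the control flow matters:

```python
import os
import tempfile

# A plain file stands in for a PyTables File object; the point is the
# control flow: flush/close run even if the work in between raises.
path = os.path.join(tempfile.mkdtemp(), "data.txt")
f = open(path, "w")
try:
    f.write("important payload")    # the work that might raise
finally:
    f.flush()                       # push buffers out explicitly
    f.close()                       # close() also flushes, but be explicit

with open(path) as g:
    print(g.read())                 # important payload
```

As noted above, none of this helps against a hard kill of the interpreter (as PyDev's stop button does); try/finally only covers exceptions and normal exits within the process.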