From: Alvaro T. C. <al...@mi...> - 2012-04-18 17:33:31
|
A single array with 312 000 000 int16 values.

Two (uncompressed) ways to store it:

* Array

>>> wa02[:10]
array([306, 345, 353, 335, 345, 345, 356, 341, 338, 357], dtype=int16)

* Table wtab02 (single column, named 'val')

>>> wtab02[:10]
array([(306,), (345,), (353,), (335,), (345,), (345,), (356,), (341,),
       (338,), (357,)],
      dtype=[('val', '<i2')])

Read times are 120 ms and 220 ms respectively.

>>> timeit big=np.nonzero(wa02[:]>1)
1 loops, best of 3: 1.66 s per loop

>>> timeit bigtab=wtab02.getWhereList('val>1')
1 loops, best of 3: 119 s per loop

With a Complete Sorted Index on val and blosc9 compression:
1 loops, best of 3: 149 s per loop

Indicating expectedrows=312 000 000 (so that chunklen goes from 32K to 128K):
1 loops, best of 3: 119 s per loop

(I wanted to compare getting a boolean mask, but it seems that Tables don't have a .wheretrue like carrays in Francesc's carray package (?). For reference, computing just the mask takes 344 ms.)

---

Question: is the difference in speed due to in-core vs out-of-core?

If so, and if the maximum unit of data fits in memory (even considering loading a few columns to operate among them), is the corollary 'stay in memory at all costs'?

With this exercise, I was trying to find out what is the best structure to hold raw data (just one col in this case), and whether indexing could help in queries.

-á.
|
From: Anthony S. <sc...@gm...> - 2012-04-18 18:02:33
|
Hello Alvaro,

What are the timings using the normal where() method?
http://pytables.github.com/usersguide/libref.html?highlight=where#tables.Table.where

Be Well
Anthony
|
From: Alvaro T. C. <al...@mi...> - 2012-04-19 11:46:29
|
where will give me an iterator over the /values/; in this case I wanted the indexes. Plus, it will give me an iterator, so it will be trivially fast.

Are you interested in the timings of where + building a list? or where + building an array?

-á.
|
From: Alvaro T. C. <al...@mi...> - 2012-04-19 13:43:59
|
Some complementary info (I copy the details of the tables below)

timeit vals = numpy.fromiter((x['val'] for x in my.root.raw.t0.wtab02.where('val>1')),dtype=np.int16)
1 loops, best of 3: 30.4 s per loop

Using the compressed and indexed version, it mysteriously does not work (output is empty list)

>>> cvals = np.fromiter((x['val'] for x in wctab02.where('val>1')), dtype=np.int16)
>>> cvals
array([], dtype=int16)

But it does if we skip using where (I don't print cvals, but it is correct)

>>> timeit cvals = np.fromiter((x['val'] for x in wctab02 if x['val']>1), dtype=np.int16)
1 loops, best of 3: 54.8 s per loop

(the version with longer chunklen works fine and times to 30.7s).

-á.

wtab02: not compressed, not indexed, small chunklen:
/raw/t0/wtab02 (Table(312000000,)) ''
  description := {
  "val": Int16Col(shape=(), dflt=0, pos=0)}
  byteorder := 'little'
  chunkshape := (32768,)

larger chunklen (as calculated from expectedrows=312000000)
/raw/t0/wcetab02 (Table(312000000,)) 'test'
  description := {
  "val": Int16Col(shape=(), dflt=0, pos=0)}
  byteorder := 'little'
  chunkshape := (131072,)

wctab02: compressed, with CSI index
/raw/t0/wctab02 (Table(312000000,), shuffle, blosc(9)) 'test'
  description := {
  "val": Int16Col(shape=(), dflt=0, pos=0)}
  byteorder := 'little'
  chunkshape := (32768,)
  autoIndex := True
  colindexes := {
  "val": Index(9, full, shuffle, zlib(1)).is_CSI=True}
|
From: Anthony S. <sc...@gm...> - 2012-04-19 14:33:38
|
I was interested in how long it takes to iterate, since this is arguably where the majority of the time is spent.

On Thu, Apr 19, 2012 at 8:43 AM, Alvaro Tejero Cantero <al...@mi...> wrote:
> Using the compressed and indexed version, it mysteriously does not
> work (output is empty list)
> >>> cvals = np.fromiter((x['val'] for x in wctab02.where('val>1')),
> dtype=np.int16)
> >>> cvals
> array([], dtype=int16)

This doesn't work because numpy doesn't accept generators. The following should work:

>>> cvals = np.fromiter([x['val'] for x in wctab02.where('val>1')], dtype=np.int16)

Also, I am a little concerned that np.nonzero() doesn't really compare to Table.getWhereList('val>1'). Testing for all zero bits *should be* a lot faster than a numeric comparison. Could you instead try the same actual operation in numpy as getWhereList():

>>> timeit big=np.argwhere(np.greater(wa02[:], 1))

Thanks!
Anthony
|
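For reference, np.fromiter is documented to take any iterable object, so a generator expression is accepted as well as a list. The following is a minimal sketch on synthetic data; the array `values` is made up for illustration and only stands in for the 'val' column of the thread's tables.

```python
import numpy as np

# Synthetic stand-in for the 'val' column (hypothetical data, not the thread's dataset).
values = np.array([0, 306, 1, 345, 353, 0, 2], dtype=np.int16)

# Generator expression: consumed lazily, no intermediate list is built.
from_gen = np.fromiter((v for v in values if v > 1), dtype=np.int16)

# List comprehension: same result, but the full list is materialised first.
from_list = np.fromiter([v for v in values if v > 1], dtype=np.int16)

print(from_gen)
print(np.array_equal(from_gen, from_list))
```

If both forms come back empty on real data, that would suggest the iterable itself yields nothing, which points at the query rather than at fromiter.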
From: Alvaro T. C. <al...@mi...> - 2012-04-19 16:47:07
|
I have to run, but here's what you requested (I won't be back on this computer until Monday)

>>> cvals = np.fromiter([x['val'] for x in wctab02.where('val>1')], dtype=np.int16)
>>> cvals
array([], dtype=int16)

>>> timeit big=np.argwhere(np.greater(wa02[:], 1))
1 loops, best of 3: 15.3 s per loop

this gives me a mask, that I can get with

>>> big2 = wa02[:]>1
>>> np.alltrue(big == big2)
True

and in far less time:

>>> timeit big2 = wa02[:]>1
1 loops, best of 3: 348 ms per loop

-á.

/raw/t0/wa02 (Array(312000000,)) ''
  atom := Int16Atom(shape=(), dflt=0)
  maindim := 0
  flavor := 'numpy'
  byteorder := 'little'
  chunkshape := None
|
From: Anthony S. <sc...@gm...> - 2012-04-19 17:24:02
|
On Thu, Apr 19, 2012 at 11:46 AM, Alvaro Tejero Cantero <al...@mi...> wrote:
> >>> cvals = np.fromiter([x['val'] for x in wctab02.where('val>1')],
> dtype=np.int16)
> >>> cvals
> array([], dtype=int16)

Hmmm...

> >>> timeit big=np.argwhere(np.greater(wa02[:], 1))
> 1 loops, best of 3: 15.3 s per loop
>
> this gives me a mask,

argwhere() should not give you a mask. It should give you the coordinates. <http://docs.scipy.org/doc/numpy/reference/generated/numpy.argwhere.html>

Also it seems like np.argwhere(np.greater(wa02[:], 1)) and np.argwhere(wa02[:]>1) should run in the same amount of time. At this point though we are just comparing the performance of numpy routines. What we really want is to compare numpy to PyTables. Maybe I'll try playing around with this this weekend.
|
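The distinction between a boolean mask and row coordinates can be seen on a tiny array. This is a sketch on made-up data (the array `a` is hypothetical and stands in for wa02[:]); it only illustrates the shapes and dtypes that the three NumPy calls return.

```python
import numpy as np

# Small synthetic sample (hypothetical; stands in for wa02[:]).
a = np.array([306, 0, 345, 1, 353], dtype=np.int16)

mask = a > 1                # boolean array with the same shape as a
idx = np.nonzero(mask)[0]   # 1-D integer array of matching positions
coords = np.argwhere(mask)  # shape (n_matches, a.ndim): one row per match

print(mask.dtype, mask.shape)
print(idx)
print(coords.shape)
```

For a 1-D input, np.flatnonzero(mask) gives the same positions as np.nonzero(mask)[0], and coords[:, 0] holds them as well.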
From: Francesc A. <fa...@py...> - 2012-04-24 02:10:21
|
On 4/18/12 12:33 PM, Alvaro Tejero Cantero wrote:
> >>> timeit big=np.nonzero(wa02[:]>1)
> 1 loops, best of 3: 1.66 s per loop
>
> >>> timeit bigtab=wtab02.getWhereList('val>1')
> 1 loops, best of 3: 119 s per loop

Yes, this is expected. The reason one method is much faster than the other is precisely that one is designed for operating out-of-core, while the other operates completely in-memory, and this has a cost. But that does not mean that out-of-core has to be necessarily slower. Look at this:

In [107]: da
Out[107]:
/da (Array(10000000,)) ''
  atom := Int16Atom(shape=(), dflt=0)
  maindim := 0
  flavor := 'numpy'
  byteorder := 'little'
  chunkshape := None

In [108]: dra
Out[108]:
/dra (Table(10000000,), shuffle, blosc(5)) ''
  description := {
  "a": Int16Col(shape=(), dflt=0, pos=0)}
  byteorder := 'little'
  chunkshape := (65536,)

In [127]: time r = np.argwhere(da[:] == 1)
CPU times: user 0.08 s, sys: 0.02 s, total: 0.10 s
Wall time: 0.10 s

In [111]: time l = dra.getWhereList('a == 1')
CPU times: user 0.10 s, sys: 0.01 s, total: 0.11 s
Wall time: 0.11 s

So, tables' getWhereList() performance is pretty close to NumPy, even if the former is using compression. This is a great achievement. The reason I'm getting very different results than you is this:

In [119]: len(l)
Out[119]: 153

That is, the selectivity of the query is extremely high (153 out of 10 million elements), which is the scenario where queries are designed to shine.

If you use indexing, then you can get even more speed:

In [131]: dra.cols.a.createCSIndex()
Out[131]: 10000000

In [132]: time l = dra.getWhereList('a == 1')
CPU times: user 0.02 s, sys: 0.01 s, total: 0.03 s
Wall time: 0.02 s

In your case, using low selectivities (you are asking for possibly almost 50% of the initial dataset, perhaps less or perhaps more, depending on your data pattern) makes data object creation (one per iteration of the loop) in PyTables the big overhead:

In [134]: time r = np.argwhere(da[:] > 1)
CPU times: user 1.03 s, sys: 0.03 s, total: 1.06 s
Wall time: 1.12 s

In [135]: time l = dra.getWhereList('a > 1')
CPU times: user 5.62 s, sys: 0.16 s, total: 5.78 s
Wall time: 5.89 s

Now getWhereList() is more than 5x slower. Removing the index helps a bit here:

In [136]: dra.cols.a.removeIndex()

In [137]: time l = dra.getWhereList('a > 1')
CPU times: user 5.10 s, sys: 0.12 s, total: 5.22 s
Wall time: 5.30 s

But, if the internal query machinery in PyTables is the same, why does it take longer? The short answer is object creation (and some data copying). getWhereList() can be expressed like this:

In [165]: time l = np.array([r.nrow for r in dra.where('a > 1')])
CPU times: user 5.54 s, sys: 0.09 s, total: 5.63 s
Wall time: 5.71 s

Now, if we count the time to get the coordinates only:

In [159]: time s = [r.nrow for r in dra.where('a > 1')]
CPU times: user 3.86 s, sys: 0.08 s, total: 3.95 s
Wall time: 4.02 s

This time is a bit long, but this is due to the .nrow implementation (a Cython property of the Row class; I wonder if this could be accelerated somewhat).

In general, the Row iterator can be much faster, for example when getting values:

In [161]: time s = [r['a'] for r in dra.where('a > 1')]
CPU times: user 1.57 s, sys: 0.07 s, total: 1.63 s
Wall time: 1.61 s

and you can notice that this is barely more than the time that a pure list creation takes:

In [139]: time l = [r for r in xrange(len(l))]
CPU times: user 1.44 s, sys: 0.11 s, total: 1.55 s
Wall time: 1.53 s

So, the 'slow' times that you are seeing are a consequence of the different data object creation and the internal data copies (for building the final NumPy array). NumPy is much faster because this whole process is done in pure C. But again, this does not preclude the fact that queries in PyTables are actually fast, and potentially much faster than NumPy for high selectivities and indexing.

Hope this helps,

--
Francesc Alted
|
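The object-creation cost described above can be illustrated without PyTables at all. The sketch below is an analogy on synthetic data, not a reproduction of the timings in this thread: it builds the same list of matching positions once through a Python-level loop and once through NumPy's compiled routines. The array size, the value range, and the helper names `coords_python` and `coords_numpy` are assumptions chosen for illustration.

```python
import timeit

import numpy as np

rng = np.random.default_rng(0)
# Synthetic data with values 0..3, so "a > 1" selects roughly half of the rows.
a = rng.integers(0, 4, size=200_000).astype(np.int16)


def coords_python(arr):
    # Per-element work happens in the interpreter; every match becomes a Python int.
    return [i for i, v in enumerate(arr.tolist()) if v > 1]


def coords_numpy(arr):
    # The comparison and the index extraction both run in compiled code.
    return np.flatnonzero(arr > 1)


fraction_selected = np.count_nonzero(a > 1) / a.size
t_py = timeit.timeit(lambda: coords_python(a), number=3)
t_np = timeit.timeit(lambda: coords_numpy(a), number=3)

print(f"fraction of rows selected: {fraction_selected:.2f}")
print(f"python loop: {t_py:.3f} s, numpy: {t_np:.3f} s")
```

Both functions return the same positions; only the place where the per-row work happens differs, which is the effect that grows with the number of rows a query returns.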
From: Francesc A. <fa...@py...> - 2012-04-24 02:14:49
|
On 4/19/12 8:43 AM, Alvaro Tejero Cantero wrote:
> Using the compressed and indexed version, it mysteriously does not
> work (output is empty list)
> >>> cvals = np.fromiter((x['val'] for x in wctab02.where('val>1')), dtype=np.int16)
> >>> cvals
> array([], dtype=int16)

This smells like a bug, but I cannot reproduce it. Could you send a self-contained example reproducing this behavior?

--
Francesc Alted
|
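A self-contained check of this kind might look like the sketch below. It is written under several assumptions: it uses the PyTables 3.x method names (open_file, create_table, get_where_list, create_csindex) rather than the camelCase names used in this thread, a much smaller table than the original 312 million rows, synthetic values, and a hypothetical file name. It builds a blosc level 9 table with a completely sorted index on 'val' and compares the query results against NumPy; it is not known to trigger the empty result reported above.

```python
import os
import tempfile

import numpy as np
import tables  # assumes PyTables 3.x naming

n = 100_000  # hypothetical size, far smaller than the table in the thread
rng = np.random.default_rng(0)
data = np.zeros(n, dtype=[("val", "<i2")])
data["val"] = rng.integers(0, 400, size=n)  # synthetic values

path = os.path.join(tempfile.mkdtemp(), "repro.h5")  # hypothetical file name
filters = tables.Filters(complevel=9, complib="blosc", shuffle=True)

with tables.open_file(path, mode="w") as h5:
    tab = h5.create_table("/", "wctab02", obj=data, filters=filters,
                          expectedrows=n)
    tab.cols.val.create_csindex()  # completely sorted index on 'val'

    # Values through the where() iterator, passed to fromiter as a generator.
    cvals = np.fromiter((row["val"] for row in tab.where("val > 1")),
                        dtype=np.int16)
    # Row coordinates for the same condition.
    coords = tab.get_where_list("val > 1")

expected = np.flatnonzero(data["val"] > 1)
print(len(cvals), len(coords), len(expected))
```

The comparison sorts the results first because an indexed query is not assumed here to return rows in table order.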
From: Anthony S. <sc...@gm...> - 2012-04-24 03:40:23
|
On Mon, Apr 23, 2012 at 9:14 PM, Francesc Alted <fa...@py...> wrote:
> This smells like a bug, but I cannot reproduce it. Could you send a
> self-contained example reproducing this behavior?

I am not able to reproduce this either...
|
From: Alvaro T. C. <al...@mi...> - 2012-04-25 11:13:37
|
Hi,

Thanks for the clarification.

I retried today both with a normal and a completely sorted index on a blosc-compressed table (complevel 5) and could not reproduce the putative bug either.

-á.
|
From: Francesc A. <fa...@py...> - 2012-04-26 03:10:22
|
On 4/25/12 6:13 AM, Alvaro Tejero Cantero wrote:
> I retried today both with a normal and a completely sorted index on a
> blosc-compressed table (complevel 5) and could not reproduce the
> putative bug either.

So could you please confirm if you can reproduce the problem with blosc level 9?

Thanks!

--
Francesc Alted
|
From: Alvaro T. C. <al...@mi...> - 2012-04-26 10:47:53
|
Hi,

I tried again, also with different chunklens, and couldn't reproduce it. Unfortunately the session where I had this result was killed by a power outage and the history buffer does not go as far back, so I can't find out what exactly triggered it.

-á.
|