From: Francesc A. <fa...@gm...> - 2013-06-05 13:10:45
On 6/5/13 11:45 AM, Andreas Hilboll wrote: > On 05.06.2013 10:31, Andreas Hilboll wrote: >> On 05.06.2013 03:29, Tim Burgess wrote: >>> I was playing around with in-memory HDF5 prior to the 3.0 release. >>> Here's an example based on what I was doing. >>> I looked over the docs and it does mention that there is an option to >>> throw away the 'file' rather than write it to disk. >>> Not sure how to do that and can't actually think of a use case where I >>> would want to :-) >>> >>> And be wary, it is H5FD_CORE. >>> >>> >>> On Jun 05, 2013, at 08:38 AM, Anthony Scopatz <sc...@gm...> wrote: >>>> I think that you want to set parameters.DRIVER to H5DF_CORE [1]. I >>>> haven't ever used this personally, but it would be great to have an >>>> example script, if someone wants to write one ;) >>>> >>> >>> >>> import numpy as np >>> import tables >>> >>> CHUNKY = 30 >>> CHUNKX = 8640 >>> >>> if __name__ == '__main__': >>> >>> # create dataset and add global attrs >>> >>> file_path = 'demofile_chunk%sx%d.h5' % (CHUNKY, CHUNKX) >>> >>> with tables.open_file(file_path, 'w', title='PyTables HDF5 In-memory >>> example', driver='H5FD_CORE') as h5f: >>> >>> # dummy some data >>> lats = np.empty([4320]) >>> lons = np.empty([8640]) >>> >>> # create some simple arrays >>> lat_node = h5f.create_array('/', 'lat', lats, title='latitude') >>> lon_node = h5f.create_array('/', 'lon', lons, title='longitude') >>> >>> # create a 365 x 4320 x 8640 CArray of 32bit float >>> shape = (365, 4320, 8640) >>> atom = tables.Float32Atom(dflt=np.nan) >>> >>> # chunk into daily slices and then further chunk days >>> sst_node = h5f.create_carray(h5f.root, 'sst', atom, shape, >>> chunkshape=(1, CHUNKY, CHUNKX)) >>> >>> # dummy up an ndarray >>> sst = np.empty([4320, 8640], dtype=np.float32) >>> sst.fill(30.0) >>> >>> # write ndarray to a 2D plane in the HDF5 >>> sst_node[0] = sst >> Thanks Tim, >> >> I adapted your example for my use case (I'm using the EArray class, >> because I need to continuously update my database), and it works well. >> >> However, when I use this with my own data (but also creating the arrays >> like you did), I'm running into errors like "Could not wait on barrier". >> It seems like the HDF library is spawing several threads. >> >> Any idea what's going wrong? Can I somehow avoid HDF5 multithreading at >> runtime? > Update: > > When setting max_blosc_threads=2 and max_numexpr_threads=2, everything > seems to work as expected (but a bit on the slow side ...). BTW, can you really notice the difference between using 1, 2 or 4 threads? Can you show some figures? Just curious. -- Francesc Alted |
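For readers who hit the same "Could not wait on barrier" problem, a minimal sketch of capping the Blosc and numexpr thread pools through tables.parameters before doing any work on the file; the file name, shapes and filter settings below are placeholders, not taken from the thread:

    import numpy as np
    import tables

    # cap both thread pools *before* any compression or where() work happens;
    # MAX_BLOSC_THREADS and MAX_NUMEXPR_THREADS are documented PyTables parameters
    tables.parameters.MAX_BLOSC_THREADS = 2
    tables.parameters.MAX_NUMEXPR_THREADS = 2

    with tables.open_file('inmem_demo.h5', 'w', driver='H5FD_CORE') as h5f:
        sst = h5f.create_earray(h5f.root, 'sst', tables.Float32Atom(),
                                shape=(0, 360, 720),
                                filters=tables.Filters(complib='blosc', complevel=5))
        sst.append(np.zeros((1, 360, 720), dtype=np.float32))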
From: Francesc A. <fa...@gm...> - 2013-06-05 10:26:23
On 6/5/13 11:45 AM, Andreas Hilboll wrote: > On 05.06.2013 10:31, Andreas Hilboll wrote: >> On 05.06.2013 03:29, Tim Burgess wrote: >>> I was playing around with in-memory HDF5 prior to the 3.0 release. >>> Here's an example based on what I was doing. >>> I looked over the docs and it does mention that there is an option to >>> throw away the 'file' rather than write it to disk. >>> Not sure how to do that and can't actually think of a use case where I >>> would want to :-) >>> >>> And be wary, it is H5FD_CORE. >>> >>> >>> On Jun 05, 2013, at 08:38 AM, Anthony Scopatz <sc...@gm...> wrote: >>>> I think that you want to set parameters.DRIVER to H5DF_CORE [1]. I >>>> haven't ever used this personally, but it would be great to have an >>>> example script, if someone wants to write one ;) >>>> >>> >>> >>> import numpy as np >>> import tables >>> >>> CHUNKY = 30 >>> CHUNKX = 8640 >>> >>> if __name__ == '__main__': >>> >>> # create dataset and add global attrs >>> >>> file_path = 'demofile_chunk%sx%d.h5' % (CHUNKY, CHUNKX) >>> >>> with tables.open_file(file_path, 'w', title='PyTables HDF5 In-memory >>> example', driver='H5FD_CORE') as h5f: >>> >>> # dummy some data >>> lats = np.empty([4320]) >>> lons = np.empty([8640]) >>> >>> # create some simple arrays >>> lat_node = h5f.create_array('/', 'lat', lats, title='latitude') >>> lon_node = h5f.create_array('/', 'lon', lons, title='longitude') >>> >>> # create a 365 x 4320 x 8640 CArray of 32bit float >>> shape = (365, 4320, 8640) >>> atom = tables.Float32Atom(dflt=np.nan) >>> >>> # chunk into daily slices and then further chunk days >>> sst_node = h5f.create_carray(h5f.root, 'sst', atom, shape, >>> chunkshape=(1, CHUNKY, CHUNKX)) >>> >>> # dummy up an ndarray >>> sst = np.empty([4320, 8640], dtype=np.float32) >>> sst.fill(30.0) >>> >>> # write ndarray to a 2D plane in the HDF5 >>> sst_node[0] = sst >> Thanks Tim, >> >> I adapted your example for my use case (I'm using the EArray class, >> because I need to continuously update my database), and it works well. >> >> However, when I use this with my own data (but also creating the arrays >> like you did), I'm running into errors like "Could not wait on barrier". >> It seems like the HDF library is spawing several threads. >> >> Any idea what's going wrong? Can I somehow avoid HDF5 multithreading at >> runtime? > Update: > > When setting max_blosc_threads=2 and max_numexpr_threads=2, everything > seems to work as expected (but a bit on the slow side ...). With > max_blosc_threads=4, the error pops up. Hmm, this seems like a bad interaction among threads in numexpr and blosc. I'm not sure why this is triggering because the libraries should execute at different times. Hmm is your app multi-threaded? Although Blosc has implemented a lock for preventing this situation in the latest releases, numexpr still lacks this protection. As the multithreading engine is the same for both packages, it should be relatively easy to implement the lock support to numexpr too. Volunteers? -- Francesc Alted |
From: Andreas H. <li...@hi...> - 2013-06-05 09:46:12
On 05.06.2013 10:31, Andreas Hilboll wrote: > On 05.06.2013 03:29, Tim Burgess wrote: >> I was playing around with in-memory HDF5 prior to the 3.0 release. >> Here's an example based on what I was doing. >> I looked over the docs and it does mention that there is an option to >> throw away the 'file' rather than write it to disk. >> Not sure how to do that and can't actually think of a use case where I >> would want to :-) >> >> And be wary, it is H5FD_CORE. >> >> >> On Jun 05, 2013, at 08:38 AM, Anthony Scopatz <sc...@gm...> wrote: >>> >>> I think that you want to set parameters.DRIVER to H5DF_CORE [1]. I >>> haven't ever used this personally, but it would be great to have an >>> example script, if someone wants to write one ;) >>> >> >> >> import numpy as np >> import tables >> >> CHUNKY = 30 >> CHUNKX = 8640 >> >> if __name__ == '__main__': >> >> # create dataset and add global attrs >> >> file_path = 'demofile_chunk%sx%d.h5' % (CHUNKY, CHUNKX) >> >> with tables.open_file(file_path, 'w', title='PyTables HDF5 In-memory >> example', driver='H5FD_CORE') as h5f: >> >> # dummy some data >> lats = np.empty([4320]) >> lons = np.empty([8640]) >> >> # create some simple arrays >> lat_node = h5f.create_array('/', 'lat', lats, title='latitude') >> lon_node = h5f.create_array('/', 'lon', lons, title='longitude') >> >> # create a 365 x 4320 x 8640 CArray of 32bit float >> shape = (365, 4320, 8640) >> atom = tables.Float32Atom(dflt=np.nan) >> >> # chunk into daily slices and then further chunk days >> sst_node = h5f.create_carray(h5f.root, 'sst', atom, shape, >> chunkshape=(1, CHUNKY, CHUNKX)) >> >> # dummy up an ndarray >> sst = np.empty([4320, 8640], dtype=np.float32) >> sst.fill(30.0) >> >> # write ndarray to a 2D plane in the HDF5 >> sst_node[0] = sst > > Thanks Tim, > > I adapted your example for my use case (I'm using the EArray class, > because I need to continuously update my database), and it works well. > > However, when I use this with my own data (but also creating the arrays > like you did), I'm running into errors like "Could not wait on barrier". > It seems like the HDF library is spawing several threads. > > Any idea what's going wrong? Can I somehow avoid HDF5 multithreading at > runtime? Update: When setting max_blosc_threads=2 and max_numexpr_threads=2, everything seems to work as expected (but a bit on the slow side ...). With max_blosc_threads=4, the error pops up. Cheers, Andreas. |
From: Andreas H. <li...@hi...> - 2013-06-05 08:48:39
On 05.06.2013 09:15, Seref Arikan wrote:
> You would be surprised to see how convenient HDF5 can be for small-scale
> data :) There are cases where one may need to use binary serialization
> of a few thousand items, while still needing metadata, indexing and other
> nice features provided by HDF5/PyTables.

You're right, Seref! That's why I wrote a small script that saves the script which generates an H5 file into that same H5 file, in a file_node. That way, if you have the data file, you can always see what you did to create it :)

You can find the script here: https://github.com/andreas-h/pyrepsci

It's not cleaned up, but it does the job. Currently it works only via pandas, but when I find the time I'll make it more general. Maybe you find it useful.

-- Andreas.
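A rough sketch of the idea behind that script, using the tables.nodes.filenode module to embed a generating script inside the HDF5 file it produced; the file and node names are invented here and this is not the pyrepsci code itself:

    import tables
    from tables.nodes import filenode

    # store the source of the running script inside the data file
    with tables.open_file('results.h5', 'a') as h5f:
        fnode = filenode.new_node(h5f, where='/', name='generating_script')
        with open(__file__, 'rb') as src:
            fnode.write(src.read())
        fnode.close()

    # later, anyone holding only the data file can recover the script
    with tables.open_file('results.h5', 'r') as h5f:
        node = filenode.open_node(h5f.root.generating_script, 'r')
        print(node.read().decode('utf-8'))
        node.close()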
From: Andreas H. <li...@hi...> - 2013-06-05 08:31:22
On 05.06.2013 03:29, Tim Burgess wrote: > I was playing around with in-memory HDF5 prior to the 3.0 release. > Here's an example based on what I was doing. > I looked over the docs and it does mention that there is an option to > throw away the 'file' rather than write it to disk. > Not sure how to do that and can't actually think of a use case where I > would want to :-) > > And be wary, it is H5FD_CORE. > > > On Jun 05, 2013, at 08:38 AM, Anthony Scopatz <sc...@gm...> wrote: >> >> I think that you want to set parameters.DRIVER to H5DF_CORE [1]. I >> haven't ever used this personally, but it would be great to have an >> example script, if someone wants to write one ;) >> > > > import numpy as np > import tables > > CHUNKY = 30 > CHUNKX = 8640 > > if __name__ == '__main__': > > # create dataset and add global attrs > > file_path = 'demofile_chunk%sx%d.h5' % (CHUNKY, CHUNKX) > > with tables.open_file(file_path, 'w', title='PyTables HDF5 In-memory > example', driver='H5FD_CORE') as h5f: > > # dummy some data > lats = np.empty([4320]) > lons = np.empty([8640]) > > # create some simple arrays > lat_node = h5f.create_array('/', 'lat', lats, title='latitude') > lon_node = h5f.create_array('/', 'lon', lons, title='longitude') > > # create a 365 x 4320 x 8640 CArray of 32bit float > shape = (365, 4320, 8640) > atom = tables.Float32Atom(dflt=np.nan) > > # chunk into daily slices and then further chunk days > sst_node = h5f.create_carray(h5f.root, 'sst', atom, shape, > chunkshape=(1, CHUNKY, CHUNKX)) > > # dummy up an ndarray > sst = np.empty([4320, 8640], dtype=np.float32) > sst.fill(30.0) > > # write ndarray to a 2D plane in the HDF5 > sst_node[0] = sst Thanks Tim, I adapted your example for my use case (I'm using the EArray class, because I need to continuously update my database), and it works well. However, when I use this with my own data (but also creating the arrays like you did), I'm running into errors like "Could not wait on barrier". It seems like the HDF library is spawing several threads. Any idea what's going wrong? Can I somehow avoid HDF5 multithreading at runtime? Cheers, Andreas. |
From: Seref A. <ser...@gm...> - 2013-06-05 07:15:47
You would be suprised to see how convenient HDF5 can be in small scale data :) There are cases where one may need to use binary serialization of a few thousand items, but still needing metadata, indexing and other nice features provided by HDF5/pyTables. On Wed, Jun 5, 2013 at 2:29 AM, Tim Burgess <tim...@ma...> wrote: > I was playing around with in-memory HDF5 prior to the 3.0 release. Here's > an example based on what I was doing. > I looked over the docs and it does mention that there is an option to > throw away the 'file' rather than write it to disk. > Not sure how to do that and can't actually think of a use case where I > would want to :-) > > And be wary, it is H5FD_CORE. > > > On Jun 05, 2013, at 08:38 AM, Anthony Scopatz <sc...@gm...> wrote: > > > I think that you want to set parameters.DRIVER to H5DF_CORE [1]. I > haven't ever used this personally, but it would be great to have an example > script, if someone wants to write one ;) > > > > import numpy as np > import tables > > CHUNKY = 30 > CHUNKX = 8640 > > if __name__ == '__main__': > > # create dataset and add global attrs > > file_path = 'demofile_chunk%sx%d.h5' % (CHUNKY, CHUNKX) > > with tables.open_file(file_path, 'w', title='PyTables HDF5 In-memory > example', driver='H5FD_CORE') as h5f: > > # dummy some data > lats = np.empty([4320]) > lons = np.empty([8640]) > > # create some simple arrays > lat_node = h5f.create_array('/', 'lat', lats, title='latitude') > lon_node = h5f.create_array('/', 'lon', lons, title='longitude') > > # create a 365 x 4320 x 8640 CArray of 32bit float > shape = (365, 4320, 8640) > atom = tables.Float32Atom(dflt=np.nan) > > # chunk into daily slices and then further chunk days > sst_node = h5f.create_carray(h5f.root, 'sst', atom, shape, > chunkshape=(1, CHUNKY, CHUNKX)) > > # dummy up an ndarray > sst = np.empty([4320, 8640], dtype=np.float32) > sst.fill(30.0) > > # write ndarray to a 2D plane in the HDF5 > sst_node[0] = sst > > > > ------------------------------------------------------------------------------ > How ServiceNow helps IT people transform IT departments: > 1. A cloud service to automate IT design, transition and operations > 2. Dashboards that offer high-level views of enterprise services > 3. A single system of record for all IT processes > http://p.sf.net/sfu/servicenow-d2d-j > _______________________________________________ > Pytables-users mailing list > Pyt...@li... > https://lists.sourceforge.net/lists/listinfo/pytables-users > > |
From: Antonio V. <ant...@ti...> - 2013-06-05 07:11:03
Hi Tim, Il 05/06/2013 03:29, Tim Burgess ha scritto: > I was playing around with in-memory HDF5 prior to the 3.0 release. Here's an > example based on what I was doing. > I looked over the docs and it does mention that there is an option to throw away > the 'file' rather than write it to disk. Please see the DRIVER_CORE_BACKING_STORE parameter [1] [1] http://pytables.github.io/usersguide/parameter_files.html#tables.parameters.DRIVER_CORE_BACKING_STORE regards > Not sure how to do that and can't actually think of a use case where I would > want to :-) > > And be wary, it is H5FD_CORE. > > > On Jun 05, 2013, at 08:38 AM, Anthony Scopatz <sc...@gm...> wrote: >> >> I think that you want to set parameters.DRIVER to H5DF_CORE [1]. I haven't >> ever used this personally, but it would be great to have an example script, if >> someone wants to write one ;) >> > > import numpy as np > import tables > > CHUNKY = 30 > CHUNKX = 8640 > > if __name__ == '__main__': > > # create dataset and add global attrs > > file_path = 'demofile_chunk%sx%d.h5' % (CHUNKY, CHUNKX) > > with tables.open_file(file_path, 'w', title='PyTables HDF5 In-memory > example', driver='H5FD_CORE') as h5f: > # dummy some data > lats = np.empty([4320]) > lons = np.empty([8640]) > > # create some simple arrays > lat_node = h5f.create_array('/', 'lat', lats, title='latitude') > lon_node = h5f.create_array('/', 'lon', lons, title='longitude') > > # create a 365 x 4320 x 8640 CArray of 32bit float > shape = (365, 4320, 8640) > atom = tables.Float32Atom(dflt=np.nan) > > # chunk into daily slices and then further chunk days > sst_node = h5f.create_carray(h5f.root, 'sst', atom, shape, > chunkshape=(1, CHUNKY, CHUNKX)) > > # dummy up an ndarray > sst = np.empty([4320, 8640], dtype=np.float32) > sst.fill(30.0) > > # write ndarray to a 2D plane in the HDF5 > sst_node[0] = sst > -- Antonio Valentino |
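For completeness, a small sketch of the option referenced above: the CORE driver combined with DRIVER_CORE_BACKING_STORE disabled, so the 'file' only ever exists in memory and is thrown away on close (the file and array names are made up for the example):

    import numpy as np
    import tables

    # purely in-memory HDF5 file: nothing is written to disk,
    # and the contents disappear when the file is closed
    h5f = tables.open_file('never_written.h5', 'w',
                           driver='H5FD_CORE',
                           driver_core_backing_store=0)
    h5f.create_array('/', 'lat', np.linspace(-90.0, 90.0, 4320), title='latitude')
    print(h5f)
    h5f.close()  # the in-memory image is discarded here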
From: Antonio V. <ant...@ti...> - 2013-06-05 07:06:53
Hi list, Il 05/06/2013 00:38, Anthony Scopatz ha scritto: > On Tue, Jun 4, 2013 at 12:30 PM, Seref Arikan <ser...@gm...> wrote: > >> I think I've seen this in the release notes of 3.0. This is actually >> something that I'm looking into as well. So any experience/feedback about >> creating files in memory would be much appreciated. >> > > I think that you want to set parameters.DRIVER to H5DF_CORE [1]. I haven't > ever used this personally, but it would be great to have an example script, > if someone wants to write one ;) > > Be Well > Anthony > > 1. > http://pytables.github.io/usersguide/parameter_files.html#hdf5-driver-management > thare is also a small example of usage in the cookbook [1] [1] http://pytables.github.io/cookbook/inmemory_hdf5_files.html ciao -- Antonio Valentino |
From: Tim B. <tim...@ma...> - 2013-06-05 01:29:29
I was playing around with in-memory HDF5 prior to the 3.0 release. Here's an example based on what I was doing.
I looked over the docs and it does mention that there is an option to throw away the 'file' rather than write it to disk.
Not sure how to do that and can't actually think of a use case where I would want to :-)

And be wary, it is H5FD_CORE.

On Jun 05, 2013, at 08:38 AM, Anthony Scopatz <sc...@gm...> wrote:

> I think that you want to set parameters.DRIVER to H5DF_CORE [1]. I haven't ever used this
> personally, but it would be great to have an example script, if someone wants to write one ;)

import numpy as np
import tables

CHUNKY = 30
CHUNKX = 8640

if __name__ == '__main__':

    # create dataset and add global attrs

    file_path = 'demofile_chunk%sx%d.h5' % (CHUNKY, CHUNKX)

    with tables.open_file(file_path, 'w', title='PyTables HDF5 In-memory example',
                          driver='H5FD_CORE') as h5f:

        # dummy some data
        lats = np.empty([4320])
        lons = np.empty([8640])

        # create some simple arrays
        lat_node = h5f.create_array('/', 'lat', lats, title='latitude')
        lon_node = h5f.create_array('/', 'lon', lons, title='longitude')

        # create a 365 x 4320 x 8640 CArray of 32bit float
        shape = (365, 4320, 8640)
        atom = tables.Float32Atom(dflt=np.nan)

        # chunk into daily slices and then further chunk days
        sst_node = h5f.create_carray(h5f.root, 'sst', atom, shape,
                                     chunkshape=(1, CHUNKY, CHUNKX))

        # dummy up an ndarray
        sst = np.empty([4320, 8640], dtype=np.float32)
        sst.fill(30.0)

        # write ndarray to a 2D plane in the HDF5
        sst_node[0] = sst
From: Jeff R. <jr...@ya...> - 2013-06-05 01:16:47
Anthony, I created an issue with more info I am not sure if this is a bug, or just a way both ne/pytables treat strings that need to touch an encoded value; I found workaround by specifying the condvars to readWhere. Any more thoughts on this? thanks Jeff https://github.com/PyTables/PyTables/issues/265 I can be reached on my cell (917)971-6387 ________________________________ From: Anthony Scopatz <sc...@gm...> To: Jeff Reback <je...@re...> Cc: Discussion list for PyTables <pyt...@li...> Sent: Tuesday, June 4, 2013 6:39 PM Subject: Re: [Pytables-users] pytable 30 - encoding Hi Jeff, Hmmm, Could you try doing the same thing on just an in-memory numpy array using numexpr. If this succeeds it tells us that the problem is in PyTables, not numexpr. Be Well Anthony On Tue, Jun 4, 2013 at 11:35 AM, Jeff Reback <jr...@ya...> wrote: Anthony, > >I am using numexpr 2.1 (latest) > >this is puzzling; doesn't matter what I pass (bytes or str) , same result? > >(column == 'str-2') >> /mnt/code/arb/test/pytables-3.py(38)<module>() >-> result = handle.root.test.table.readWhere(selector) >(Pdb) handle.root.test.table.readWhere(selector) >*** TypeError: string argument without an encoding >(Pdb) handle.root.test.table.readWhere(selector.encode(encoding)) >*** TypeError: string argument without an encoding >(Pdb) > > > From: Anthony Scopatz <sc...@gm...> >To: Jeff Reback <je...@re...>; Discussion list for PyTables <pyt...@li...> >Sent: Tuesday, June 4, 2013 12:25 PM >Subject: Re: [Pytables-users] pytable 30 - encoding > > > >Hi Jeff, > > >Have you also updated numexpr to the most recent version? The error is coming from numexpr not compiling the expression correctly. Also, you might try making selector a str, rather than bytes: > > >selector = "(column == 'str-2')" > > > >rather than > > >selector = "(column == 'str-2')".encode(encoding) > > > >Be Well >Anthony > > > >On Tue, Jun 4, 2013 at 8:51 AM, Jeff Reback <jr...@ya...> wrote: > >anthony,where am I going wrong here? 
>>#!/usr/local/bin/python3 >>import tables >>import numpy as np >>import datetime, time >>encoding = 'UTF-8' >>test_file = 'test_select.h5' >>handle = tables.openFile(test_file, "w") >>node = handle.createGroup(handle.root, 'test') >>table = handle.createTable(node, 'table', dict( >>index = tables.Int64Col(), >> column = tables.StringCol(25), >>values = tables.FloatCol(shape=(3)), >>)) >> >># add data >>r = table.row >>for i in range(10): >>r['index'] = i >>r['column'] = ("str-%d" % (i % 5)).encode(encoding) >>r['values'] = np.arange(3) >>r.append() >>table.flush() >>handle.close() >># read >>handle = tables.openFile(test_file,"r") >>result = handle.root.test.table.read() >>print("table data\n") >>print(result) >># where >>print("\nselector\n") >>selector = "(column == 'str-2')".encode(encoding) >>print(selector) >>result = handle.root.test.table.readWhere(selector) >>print(result) >> >>and the following out: >> >>[sheep-jreback-/code/arb/test] python3 pytables-3.py >>table data >>[(b'str-0', 0, [0.0, 1.0, 2.0]) (b'str-1', 1, [0.0, 1.0, 2.0]) >>(b'str-2', 2, [0.0, 1.0, 2.0]) (b'str-3', 3, [0.0, 1.0, 2.0]) >>(b'str-4', 4, [0.0, 1.0, 2.0]) (b'str-0', 5, [0.0, 1.0, 2.0]) >>(b'str-1', 6, [0.0, 1.0, 2.0]) (b'str-2', 7, [0.0, 1.0, 2.0]) >>(b'str-3', 8, [0.0, 1.0, 2.0]) (b'str-4', 9, [0.0, 1.0, 2.0])] >>selector >>b"(column == 'str-2')" >>Traceback (most recent call last): >>File "pytables-3.py", line 37, in <module> >>result = handle.root.test.table.readWhere(selector) >>File "/usr/local/lib/python3.3/site-packages/tables-3.0.0-py3.3-linux-x86_64.egg/tables/_past.py", line 35, in oldfunc >>return obj(*args, **kwargs) >>File "/usr/local/lib/python3.3/site-packages/tables-3.0.0-py3.3-linux-x86_64.egg/tables/table.py", line 1522, in read_where >>self._where(condition, condvars, start, stop, step)] >>File "/usr/local/lib/python3.3/site-packages/tables-3.0.0-py3.3-linux-x86_64.egg/tables/table.py", line 1484, in _where >>compiled = self._compile_condition(condition, condvars) >>File "/usr/local/lib/python3.3/site-packages/tables-3.0.0-py3.3-linux-x86_64.egg/tables/table.py", line 1358, in _compile_condition >>compiled = compile_condition(condition, typemap, indexedcols) >>File "/usr/local/lib/python3.3/site-packages/tables-3.0.0-py3.3-linux-x86_64.egg/tables/conditions.py", line 419, in compile_condition >>func = NumExpr(expr, signature) >>File "/usr/local/lib/python3.3/site-packages/numexpr-2.1-py3.3-linux-x86_64.egg/numexpr/necompiler.py", line 559, in NumExpr >>precompile(ex, signature, context) >>File "/usr/local/lib/python3.3/site-packages/numexpr-2.1-py3.3-linux-x86_64.egg/numexpr/necompiler.py", line 511, in precompile >>constants_order, constants = getConstants(ast) >>File "/usr/local/lib/python3.3/site-packages/numexpr-2.1-py3.3-linux-x86_64.egg/numexpr/necompiler.py", line 294, in getConstants >>for a in constants_order] >>File "/usr/local/lib/python3.3/site-packages/numexpr-2.1-py3.3-linux-x86_64.egg/numexpr/necompiler.py", line 294, in <listcomp> >>for a in constants_order] >>File "/usr/local/lib/python3.3/site-packages/numexpr-2.1-py3.3-linux-x86_64.egg/numexpr/necompiler.py", line 284, in convertConstantToKind >>return kind_to_type[kind](x) >>TypeError: string argument without an encoding >>Closing remaining open files: test_select.h5... done >> >>------------------------------------------------------------------------------ >>How ServiceNow helps IT people transform IT departments: >>1. A cloud service to automate IT design, transition and operations >>2. 
Dashboards that offer high-level views of enterprise services >>3. A single system of record for all IT processes >>http://p.sf.net/sfu/servicenow-d2d-j >>_______________________________________________ >>Pytables-users mailing list >>Pyt...@li... >>https://lists.sourceforge.net/lists/listinfo/pytables-users >> >> > > > |
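For anyone landing on the same TypeError, a sketch of the condvars-style workaround mentioned above, passing the searched-for value in as a condition variable instead of embedding it in the condition string; it assumes the test_select.h5 layout from the scripts later in this thread, and whether it is needed at all depends on the numexpr/PyTables versions in play:

    import tables

    with tables.open_file('test_select.h5', 'r') as handle:
        table = handle.root.test.table
        # hand the bytes constant to numexpr as a variable, so it never has to
        # parse a string literal out of the condition text
        rows = table.read_where('column == wanted', condvars={'wanted': b'str-2'})
        print(rows)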
From: Anthony S. <sc...@gm...> - 2013-06-04 22:40:04
Hi Jeff, Hmmm, Could you try doing the same thing on just an in-memory numpy array using numexpr. If this succeeds it tells us that the problem is in PyTables, not numexpr. Be Well Anthony On Tue, Jun 4, 2013 at 11:35 AM, Jeff Reback <jr...@ya...> wrote: > Anthony, > > I am using numexpr 2.1 (latest) > > this is puzzling; doesn't matter what I pass (bytes or str) , same result? > > (column == 'str-2') > > /mnt/code/arb/test/pytables-3.py(38)<module>() > -> result = handle.root.test.table.readWhere(selector) > (Pdb) handle.root.test.table.readWhere(selector) > *** TypeError: string argument without an encoding > (Pdb) handle.root.test.table.readWhere(selector.encode(encoding)) > *** TypeError: string argument without an encoding > (Pdb) > > > *From:* Anthony Scopatz <sc...@gm...> > *To:* Jeff Reback <je...@re...>; Discussion list for PyTables < > pyt...@li...> > *Sent:* Tuesday, June 4, 2013 12:25 PM > *Subject:* Re: [Pytables-users] pytable 30 - encoding > > Hi Jeff, > > Have you also updated numexpr to the most recent version? The error is > coming from numexpr not compiling the expression correctly. Also, you might > try making selector a str, rather than bytes: > > selector = "(column == 'str-2')" > > rather than > > selector = "(column == 'str-2')".encode(encoding) > > Be Well > Anthony > > > On Tue, Jun 4, 2013 at 8:51 AM, Jeff Reback <jr...@ya...> wrote: > > anthony,where am I going wrong here? > #!/usr/local/bin/python3 > import tables > import numpy as np > import datetime, time > encoding = 'UTF-8' > test_file = 'test_select.h5' > handle = tables.openFile(test_file, "w") > node = handle.createGroup(handle.root, 'test') > table = handle.createTable(node, 'table', dict( > index = tables.Int64Col(), > column = tables.StringCol(25), > values = tables.FloatCol(shape=(3)), > )) > > # add data > r = table.row > for i in range(10): > r['index'] = i > r['column'] = ("str-%d" % (i % 5)).encode(encoding) > r['values'] = np.arange(3) > r.append() > table.flush() > handle.close() > # read > handle = tables.openFile(test_file,"r") > result = handle.root.test.table.read() > print("table data\n") > print(result) > # where > print("\nselector\n") > selector = "(column == 'str-2')".encode(encoding) > print(selector) > result = handle.root.test.table.readWhere(selector) > print(result) > and the following out: > > [sheep-jreback-/code/arb/test] python3 pytables-3.py > table data > [(b'str-0', 0, [0.0, 1.0, 2.0]) (b'str-1', 1, [0.0, 1.0, 2.0]) > (b'str-2', 2, [0.0, 1.0, 2.0]) (b'str-3', 3, [0.0, 1.0, 2.0]) > (b'str-4', 4, [0.0, 1.0, 2.0]) (b'str-0', 5, [0.0, 1.0, 2.0]) > (b'str-1', 6, [0.0, 1.0, 2.0]) (b'str-2', 7, [0.0, 1.0, 2.0]) > (b'str-3', 8, [0.0, 1.0, 2.0]) (b'str-4', 9, [0.0, 1.0, 2.0])] > selector > b"(column == 'str-2')" > Traceback (most recent call last): > File "pytables-3.py", line 37, in <module> > result = handle.root.test.table.readWhere(selector) > File > "/usr/local/lib/python3.3/site-packages/tables-3.0.0-py3.3-linux-x86_64.egg/tables/_past.py", > line 35, in oldfunc > return obj(*args, **kwargs) > File > "/usr/local/lib/python3.3/site-packages/tables-3.0.0-py3.3-linux-x86_64.egg/tables/table.py", > line 1522, in read_where > self._where(condition, condvars, start, stop, step)] > File > "/usr/local/lib/python3.3/site-packages/tables-3.0.0-py3.3-linux-x86_64.egg/tables/table.py", > line 1484, in _where > compiled = self._compile_condition(condition, condvars) > File > "/usr/local/lib/python3.3/site-packages/tables-3.0.0-py3.3-linux-x86_64.egg/tables/table.py", > line 1358, in 
_compile_condition > compiled = compile_condition(condition, typemap, indexedcols) > File > "/usr/local/lib/python3.3/site-packages/tables-3.0.0-py3.3-linux-x86_64.egg/tables/conditions.py", > line 419, in compile_condition > func = NumExpr(expr, signature) > File > "/usr/local/lib/python3.3/site-packages/numexpr-2.1-py3.3-linux-x86_64.egg/numexpr/necompiler.py", > line 559, in NumExpr > precompile(ex, signature, context) > File > "/usr/local/lib/python3.3/site-packages/numexpr-2.1-py3.3-linux-x86_64.egg/numexpr/necompiler.py", > line 511, in precompile > constants_order, constants = getConstants(ast) > File > "/usr/local/lib/python3.3/site-packages/numexpr-2.1-py3.3-linux-x86_64.egg/numexpr/necompiler.py", > line 294, in getConstants > for a in constants_order] > File > "/usr/local/lib/python3.3/site-packages/numexpr-2.1-py3.3-linux-x86_64.egg/numexpr/necompiler.py", > line 294, in <listcomp> > for a in constants_order] > File > "/usr/local/lib/python3.3/site-packages/numexpr-2.1-py3.3-linux-x86_64.egg/numexpr/necompiler.py", > line 284, in convertConstantToKind > return kind_to_type[kind](x) > TypeError: string argument without an encoding > Closing remaining open files: test_select.h5... done > > > ------------------------------------------------------------------------------ > How ServiceNow helps IT people transform IT departments: > 1. A cloud service to automate IT design, transition and operations > 2. Dashboards that offer high-level views of enterprise services > 3. A single system of record for all IT processes > http://p.sf.net/sfu/servicenow-d2d-j > _______________________________________________ > Pytables-users mailing list > Pyt...@li... > https://lists.sourceforge.net/lists/listinfo/pytables-users > > > > > |
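A minimal way to run the test Anthony suggests, entirely outside PyTables; on the numexpr 2.1 / Python 3.3 combination discussed here it may well raise the same TypeError, which would localize the bug to numexpr rather than PyTables:

    import numpy as np
    import numexpr as ne

    # same kind of bytes column and condition that PyTables hands to numexpr
    column = np.array([('str-%d' % (i % 5)).encode('UTF-8') for i in range(10)])
    mask = ne.evaluate("column == 'str-2'")
    print(mask)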
From: Anthony S. <sc...@gm...> - 2013-06-04 22:38:31
On Tue, Jun 4, 2013 at 12:30 PM, Seref Arikan <ser...@gm...> wrote: > I think I've seen this in the release notes of 3.0. This is actually > something that I'm looking into as well. So any experience/feedback about > creating files in memory would be much appreciated. > I think that you want to set parameters.DRIVER to H5DF_CORE [1]. I haven't ever used this personally, but it would be great to have an example script, if someone wants to write one ;) Be Well Anthony 1. http://pytables.github.io/usersguide/parameter_files.html#hdf5-driver-management > > Best regards > Seref > > > > On Tue, Jun 4, 2013 at 2:09 PM, Andreas Hilboll <li...@hi...> wrote: > >> On 04.06.2013 05:35, Tim Burgess wrote: >> > My thoughts are: >> > >> > - try it without any compression. Assuming 32 bit floats, your monthly >> > 5760 x 2880 is only about 65MB. Uncompressed data may perform well and >> > at the least it will give you a baseline to work from - and will help if >> > you are investigating IO tuning. >> > >> > - I have found with CArray that the auto chunksize works fairly well. >> > Experiment with that chunksize and with some chunksizes that you think >> > are more appropriate (maybe temporal rather than spatial in your case). >> > >> > On Jun 03, 2013, at 10:45 PM, Andreas Hilboll <li...@hi...> wrote: >> > >> >> On 03.06.2013 14:43, Andreas Hilboll wrote: >> >> > Hi, >> >> > >> >> > I'm storing large datasets (5760 x 2880 x ~150) in a compressed >> EArray >> >> > (the last dimension represents time, and once per month there'll be >> one >> >> > more 5760x2880 array to add to the end). >> >> > >> >> > Now, extracting timeseries at one index location is slow; e.g., for >> four >> >> > indices, it takes several seconds: >> >> > >> >> > In [19]: idx = ((5000, 600, 800, 900), (1000, 2000, 500, 1)) >> >> > >> >> > In [20]: %time AA = np.vstack([_a[i,j] for i,j in zip(*idx)]) >> >> > CPU times: user 4.31 s, sys: 0.07 s, total: 4.38 s >> >> > Wall time: 7.17 s >> >> > >> >> > I have the feeling that this performance could be improved, but I'm >> not >> >> > sure about how to properly use the `chunkshape` parameter in my case. >> >> > >> >> > Any help is greatly appreciated :) >> >> > >> >> > Cheers, Andreas. >> >> >> >> PS: If I could get significant performance gains by not using an EArray >> >> and therefore re-creating the whole database each month, then this >> would >> >> also be an option. >> >> >> >> -- Andreas. >> >> Thanks a lot, Anthony and Tim! I was able to get down the readout time >> considerably using chunkshape=(32, 32, 256) for my 5760x2880x150 array. >> Now, reading times are about as fast as I expected. >> >> the downside is that now, building up the database takes up a lot of >> time, because i get the data in chunks of 5760x2880x1. So I guess that >> writing the data to disk like this causes a load of IO operations ... >> >> My new question: Is there a way to create a file in-memory? If possible, >> I could then build up my database in-memory and then, once it's done, >> just copy the arrays to an on-disk file. Is that possible? If so, how? >> >> Thanks a lot for your help! >> >> -- Andreas. >> >> >> >> ------------------------------------------------------------------------------ >> How ServiceNow helps IT people transform IT departments: >> 1. A cloud service to automate IT design, transition and operations >> 2. Dashboards that offer high-level views of enterprise services >> 3. 
A single system of record for all IT processes >> http://p.sf.net/sfu/servicenow-d2d-j >> _______________________________________________ >> Pytables-users mailing list >> Pyt...@li... >> https://lists.sourceforge.net/lists/listinfo/pytables-users >> > > > > ------------------------------------------------------------------------------ > How ServiceNow helps IT people transform IT departments: > 1. A cloud service to automate IT design, transition and operations > 2. Dashboards that offer high-level views of enterprise services > 3. A single system of record for all IT processes > http://p.sf.net/sfu/servicenow-d2d-j > _______________________________________________ > Pytables-users mailing list > Pyt...@li... > https://lists.sourceforge.net/lists/listinfo/pytables-users > > |
From: Seref A. <ser...@gm...> - 2013-06-04 17:30:24
I think I've seen this in the release notes of 3.0. This is actually something that I'm looking into as well. So any experience/feedback about creating files in memory would be much appreciated. Best regards Seref On Tue, Jun 4, 2013 at 2:09 PM, Andreas Hilboll <li...@hi...> wrote: > On 04.06.2013 05:35, Tim Burgess wrote: > > My thoughts are: > > > > - try it without any compression. Assuming 32 bit floats, your monthly > > 5760 x 2880 is only about 65MB. Uncompressed data may perform well and > > at the least it will give you a baseline to work from - and will help if > > you are investigating IO tuning. > > > > - I have found with CArray that the auto chunksize works fairly well. > > Experiment with that chunksize and with some chunksizes that you think > > are more appropriate (maybe temporal rather than spatial in your case). > > > > On Jun 03, 2013, at 10:45 PM, Andreas Hilboll <li...@hi...> wrote: > > > >> On 03.06.2013 14:43, Andreas Hilboll wrote: > >> > Hi, > >> > > >> > I'm storing large datasets (5760 x 2880 x ~150) in a compressed EArray > >> > (the last dimension represents time, and once per month there'll be > one > >> > more 5760x2880 array to add to the end). > >> > > >> > Now, extracting timeseries at one index location is slow; e.g., for > four > >> > indices, it takes several seconds: > >> > > >> > In [19]: idx = ((5000, 600, 800, 900), (1000, 2000, 500, 1)) > >> > > >> > In [20]: %time AA = np.vstack([_a[i,j] for i,j in zip(*idx)]) > >> > CPU times: user 4.31 s, sys: 0.07 s, total: 4.38 s > >> > Wall time: 7.17 s > >> > > >> > I have the feeling that this performance could be improved, but I'm > not > >> > sure about how to properly use the `chunkshape` parameter in my case. > >> > > >> > Any help is greatly appreciated :) > >> > > >> > Cheers, Andreas. > >> > >> PS: If I could get significant performance gains by not using an EArray > >> and therefore re-creating the whole database each month, then this would > >> also be an option. > >> > >> -- Andreas. > > Thanks a lot, Anthony and Tim! I was able to get down the readout time > considerably using chunkshape=(32, 32, 256) for my 5760x2880x150 array. > Now, reading times are about as fast as I expected. > > the downside is that now, building up the database takes up a lot of > time, because i get the data in chunks of 5760x2880x1. So I guess that > writing the data to disk like this causes a load of IO operations ... > > My new question: Is there a way to create a file in-memory? If possible, > I could then build up my database in-memory and then, once it's done, > just copy the arrays to an on-disk file. Is that possible? If so, how? > > Thanks a lot for your help! > > -- Andreas. > > > > ------------------------------------------------------------------------------ > How ServiceNow helps IT people transform IT departments: > 1. A cloud service to automate IT design, transition and operations > 2. Dashboards that offer high-level views of enterprise services > 3. A single system of record for all IT processes > http://p.sf.net/sfu/servicenow-d2d-j > _______________________________________________ > Pytables-users mailing list > Pyt...@li... > https://lists.sourceforge.net/lists/listinfo/pytables-users > |
From: Jeff R. <jr...@ya...> - 2013-06-04 16:35:52
Anthony, I am using numexpr 2.1 (latest) this is puzzling; doesn't matter what I pass (bytes or str) , same result? (column == 'str-2') > /mnt/code/arb/test/pytables-3.py(38)<module>() -> result = handle.root.test.table.readWhere(selector) (Pdb) handle.root.test.table.readWhere(selector) *** TypeError: string argument without an encoding (Pdb) handle.root.test.table.readWhere(selector.encode(encoding)) *** TypeError: string argument without an encoding (Pdb) ________________________________ From: Anthony Scopatz <sc...@gm...> To: Jeff Reback <je...@re...>; Discussion list for PyTables <pyt...@li...> Sent: Tuesday, June 4, 2013 12:25 PM Subject: Re: [Pytables-users] pytable 30 - encoding Hi Jeff, Have you also updated numexpr to the most recent version? The error is coming from numexpr not compiling the expression correctly. Also, you might try making selector a str, rather than bytes: selector = "(column == 'str-2')" rather than selector = "(column == 'str-2')".encode(encoding) Be Well Anthony On Tue, Jun 4, 2013 at 8:51 AM, Jeff Reback <jr...@ya...> wrote: anthony,where am I going wrong here? >#!/usr/local/bin/python3 >import tables >import numpy as np >import datetime, time >encoding = 'UTF-8' >test_file = 'test_select.h5' >handle = tables.openFile(test_file, "w") >node = handle.createGroup(handle.root, 'test') >table = handle.createTable(node, 'table', dict( >index = tables.Int64Col(), > column = tables.StringCol(25), >values = tables.FloatCol(shape=(3)), >)) > ># add data >r = table.row >for i in range(10): >r['index'] = i >r['column'] = ("str-%d" % (i % 5)).encode(encoding) >r['values'] = np.arange(3) >r.append() >table.flush() >handle.close() ># read >handle = tables.openFile(test_file,"r") >result = handle.root.test.table.read() >print("table data\n") >print(result) ># where >print("\nselector\n") >selector = "(column == 'str-2')".encode(encoding) >print(selector) >result = handle.root.test.table.readWhere(selector) >print(result) > >and the following out: > >[sheep-jreback-/code/arb/test] python3 pytables-3.py >table data >[(b'str-0', 0, [0.0, 1.0, 2.0]) (b'str-1', 1, [0.0, 1.0, 2.0]) >(b'str-2', 2, [0.0, 1.0, 2.0]) (b'str-3', 3, [0.0, 1.0, 2.0]) >(b'str-4', 4, [0.0, 1.0, 2.0]) (b'str-0', 5, [0.0, 1.0, 2.0]) >(b'str-1', 6, [0.0, 1.0, 2.0]) (b'str-2', 7, [0.0, 1.0, 2.0]) >(b'str-3', 8, [0.0, 1.0, 2.0]) (b'str-4', 9, [0.0, 1.0, 2.0])] >selector >b"(column == 'str-2')" >Traceback (most recent call last): >File "pytables-3.py", line 37, in <module> >result = handle.root.test.table.readWhere(selector) >File "/usr/local/lib/python3.3/site-packages/tables-3.0.0-py3.3-linux-x86_64.egg/tables/_past.py", line 35, in oldfunc >return obj(*args, **kwargs) >File "/usr/local/lib/python3.3/site-packages/tables-3.0.0-py3.3-linux-x86_64.egg/tables/table.py", line 1522, in read_where >self._where(condition, condvars, start, stop, step)] >File "/usr/local/lib/python3.3/site-packages/tables-3.0.0-py3.3-linux-x86_64.egg/tables/table.py", line 1484, in _where >compiled = self._compile_condition(condition, condvars) >File "/usr/local/lib/python3.3/site-packages/tables-3.0.0-py3.3-linux-x86_64.egg/tables/table.py", line 1358, in _compile_condition >compiled = compile_condition(condition, typemap, indexedcols) >File "/usr/local/lib/python3.3/site-packages/tables-3.0.0-py3.3-linux-x86_64.egg/tables/conditions.py", line 419, in compile_condition >func = NumExpr(expr, signature) >File "/usr/local/lib/python3.3/site-packages/numexpr-2.1-py3.3-linux-x86_64.egg/numexpr/necompiler.py", line 559, in NumExpr 
>precompile(ex, signature, context) >File "/usr/local/lib/python3.3/site-packages/numexpr-2.1-py3.3-linux-x86_64.egg/numexpr/necompiler.py", line 511, in precompile >constants_order, constants = getConstants(ast) >File "/usr/local/lib/python3.3/site-packages/numexpr-2.1-py3.3-linux-x86_64.egg/numexpr/necompiler.py", line 294, in getConstants >for a in constants_order] >File "/usr/local/lib/python3.3/site-packages/numexpr-2.1-py3.3-linux-x86_64.egg/numexpr/necompiler.py", line 294, in <listcomp> >for a in constants_order] >File "/usr/local/lib/python3.3/site-packages/numexpr-2.1-py3.3-linux-x86_64.egg/numexpr/necompiler.py", line 284, in convertConstantToKind >return kind_to_type[kind](x) >TypeError: string argument without an encoding >Closing remaining open files: test_select.h5... done > >------------------------------------------------------------------------------ >How ServiceNow helps IT people transform IT departments: >1. A cloud service to automate IT design, transition and operations >2. Dashboards that offer high-level views of enterprise services >3. A single system of record for all IT processes >http://p.sf.net/sfu/servicenow-d2d-j >_______________________________________________ >Pytables-users mailing list >Pyt...@li... >https://lists.sourceforge.net/lists/listinfo/pytables-users > > |
From: Anthony S. <sc...@gm...> - 2013-06-04 16:26:14
Hi Jeff, Have you also updated numexpr to the most recent version? The error is coming from numexpr not compiling the expression correctly. Also, you might try making selector a str, rather than bytes: selector = "(column == 'str-2')" rather than selector = "(column == 'str-2')".encode(encoding) Be Well Anthony On Tue, Jun 4, 2013 at 8:51 AM, Jeff Reback <jr...@ya...> wrote: > anthony,where am I going wrong here? > #!/usr/local/bin/python3 > import tables > import numpy as np > import datetime, time > encoding = 'UTF-8' > test_file = 'test_select.h5' > handle = tables.openFile(test_file, "w") > node = handle.createGroup(handle.root, 'test') > table = handle.createTable(node, 'table', dict( > index = tables.Int64Col(), > column = tables.StringCol(25), > values = tables.FloatCol(shape=(3)), > )) > > # add data > r = table.row > for i in range(10): > r['index'] = i > r['column'] = ("str-%d" % (i % 5)).encode(encoding) > r['values'] = np.arange(3) > r.append() > table.flush() > handle.close() > # read > handle = tables.openFile(test_file,"r") > result = handle.root.test.table.read() > print("table data\n") > print(result) > # where > print("\nselector\n") > selector = "(column == 'str-2')".encode(encoding) > print(selector) > result = handle.root.test.table.readWhere(selector) > print(result) > and the following out: > > [sheep-jreback-/code/arb/test] python3 pytables-3.py > table data > [(b'str-0', 0, [0.0, 1.0, 2.0]) (b'str-1', 1, [0.0, 1.0, 2.0]) > (b'str-2', 2, [0.0, 1.0, 2.0]) (b'str-3', 3, [0.0, 1.0, 2.0]) > (b'str-4', 4, [0.0, 1.0, 2.0]) (b'str-0', 5, [0.0, 1.0, 2.0]) > (b'str-1', 6, [0.0, 1.0, 2.0]) (b'str-2', 7, [0.0, 1.0, 2.0]) > (b'str-3', 8, [0.0, 1.0, 2.0]) (b'str-4', 9, [0.0, 1.0, 2.0])] > selector > b"(column == 'str-2')" > Traceback (most recent call last): > File "pytables-3.py", line 37, in <module> > result = handle.root.test.table.readWhere(selector) > File > "/usr/local/lib/python3.3/site-packages/tables-3.0.0-py3.3-linux-x86_64.egg/tables/_past.py", > line 35, in oldfunc > return obj(*args, **kwargs) > File > "/usr/local/lib/python3.3/site-packages/tables-3.0.0-py3.3-linux-x86_64.egg/tables/table.py", > line 1522, in read_where > self._where(condition, condvars, start, stop, step)] > File > "/usr/local/lib/python3.3/site-packages/tables-3.0.0-py3.3-linux-x86_64.egg/tables/table.py", > line 1484, in _where > compiled = self._compile_condition(condition, condvars) > File > "/usr/local/lib/python3.3/site-packages/tables-3.0.0-py3.3-linux-x86_64.egg/tables/table.py", > line 1358, in _compile_condition > compiled = compile_condition(condition, typemap, indexedcols) > File > "/usr/local/lib/python3.3/site-packages/tables-3.0.0-py3.3-linux-x86_64.egg/tables/conditions.py", > line 419, in compile_condition > func = NumExpr(expr, signature) > File > "/usr/local/lib/python3.3/site-packages/numexpr-2.1-py3.3-linux-x86_64.egg/numexpr/necompiler.py", > line 559, in NumExpr > precompile(ex, signature, context) > File > "/usr/local/lib/python3.3/site-packages/numexpr-2.1-py3.3-linux-x86_64.egg/numexpr/necompiler.py", > line 511, in precompile > constants_order, constants = getConstants(ast) > File > "/usr/local/lib/python3.3/site-packages/numexpr-2.1-py3.3-linux-x86_64.egg/numexpr/necompiler.py", > line 294, in getConstants > for a in constants_order] > File > "/usr/local/lib/python3.3/site-packages/numexpr-2.1-py3.3-linux-x86_64.egg/numexpr/necompiler.py", > line 294, in <listcomp> > for a in constants_order] > File > 
"/usr/local/lib/python3.3/site-packages/numexpr-2.1-py3.3-linux-x86_64.egg/numexpr/necompiler.py", > line 284, in convertConstantToKind > return kind_to_type[kind](x) > TypeError: string argument without an encoding > Closing remaining open files: test_select.h5... done > > > ------------------------------------------------------------------------------ > How ServiceNow helps IT people transform IT departments: > 1. A cloud service to automate IT design, transition and operations > 2. Dashboards that offer high-level views of enterprise services > 3. A single system of record for all IT processes > http://p.sf.net/sfu/servicenow-d2d-j > _______________________________________________ > Pytables-users mailing list > Pyt...@li... > https://lists.sourceforge.net/lists/listinfo/pytables-users > > |
From: Jeff R. <jr...@ya...> - 2013-06-04 13:51:28
anthony,

where am I going wrong here?

#!/usr/local/bin/python3
import tables
import numpy as np
import datetime, time

encoding = 'UTF-8'
test_file = 'test_select.h5'

handle = tables.openFile(test_file, "w")
node = handle.createGroup(handle.root, 'test')
table = handle.createTable(node, 'table', dict(
    index = tables.Int64Col(),
    column = tables.StringCol(25),
    values = tables.FloatCol(shape=(3)),
    ))

# add data
r = table.row
for i in range(10):
    r['index'] = i
    r['column'] = ("str-%d" % (i % 5)).encode(encoding)
    r['values'] = np.arange(3)
    r.append()
table.flush()
handle.close()

# read
handle = tables.openFile(test_file, "r")
result = handle.root.test.table.read()
print("table data\n")
print(result)

# where
print("\nselector\n")
selector = "(column == 'str-2')".encode(encoding)
print(selector)
result = handle.root.test.table.readWhere(selector)
print(result)

and the following output:

[sheep-jreback-/code/arb/test] python3 pytables-3.py
table data

[(b'str-0', 0, [0.0, 1.0, 2.0]) (b'str-1', 1, [0.0, 1.0, 2.0])
 (b'str-2', 2, [0.0, 1.0, 2.0]) (b'str-3', 3, [0.0, 1.0, 2.0])
 (b'str-4', 4, [0.0, 1.0, 2.0]) (b'str-0', 5, [0.0, 1.0, 2.0])
 (b'str-1', 6, [0.0, 1.0, 2.0]) (b'str-2', 7, [0.0, 1.0, 2.0])
 (b'str-3', 8, [0.0, 1.0, 2.0]) (b'str-4', 9, [0.0, 1.0, 2.0])]

selector

b"(column == 'str-2')"
Traceback (most recent call last):
  File "pytables-3.py", line 37, in <module>
    result = handle.root.test.table.readWhere(selector)
  File "/usr/local/lib/python3.3/site-packages/tables-3.0.0-py3.3-linux-x86_64.egg/tables/_past.py", line 35, in oldfunc
    return obj(*args, **kwargs)
  File "/usr/local/lib/python3.3/site-packages/tables-3.0.0-py3.3-linux-x86_64.egg/tables/table.py", line 1522, in read_where
    self._where(condition, condvars, start, stop, step)]
  File "/usr/local/lib/python3.3/site-packages/tables-3.0.0-py3.3-linux-x86_64.egg/tables/table.py", line 1484, in _where
    compiled = self._compile_condition(condition, condvars)
  File "/usr/local/lib/python3.3/site-packages/tables-3.0.0-py3.3-linux-x86_64.egg/tables/table.py", line 1358, in _compile_condition
    compiled = compile_condition(condition, typemap, indexedcols)
  File "/usr/local/lib/python3.3/site-packages/tables-3.0.0-py3.3-linux-x86_64.egg/tables/conditions.py", line 419, in compile_condition
    func = NumExpr(expr, signature)
  File "/usr/local/lib/python3.3/site-packages/numexpr-2.1-py3.3-linux-x86_64.egg/numexpr/necompiler.py", line 559, in NumExpr
    precompile(ex, signature, context)
  File "/usr/local/lib/python3.3/site-packages/numexpr-2.1-py3.3-linux-x86_64.egg/numexpr/necompiler.py", line 511, in precompile
    constants_order, constants = getConstants(ast)
  File "/usr/local/lib/python3.3/site-packages/numexpr-2.1-py3.3-linux-x86_64.egg/numexpr/necompiler.py", line 294, in getConstants
    for a in constants_order]
  File "/usr/local/lib/python3.3/site-packages/numexpr-2.1-py3.3-linux-x86_64.egg/numexpr/necompiler.py", line 294, in <listcomp>
    for a in constants_order]
  File "/usr/local/lib/python3.3/site-packages/numexpr-2.1-py3.3-linux-x86_64.egg/numexpr/necompiler.py", line 284, in convertConstantToKind
    return kind_to_type[kind](x)
TypeError: string argument without an encoding
Closing remaining open files: test_select.h5... done
From: Jeff R. <jr...@ya...> - 2013-06-04 13:46:25
anthony, I can be reached on my cell (917)971-6387 |
From: Andreas H. <li...@hi...> - 2013-06-04 13:09:34
On 04.06.2013 05:35, Tim Burgess wrote: > My thoughts are: > > - try it without any compression. Assuming 32 bit floats, your monthly > 5760 x 2880 is only about 65MB. Uncompressed data may perform well and > at the least it will give you a baseline to work from - and will help if > you are investigating IO tuning. > > - I have found with CArray that the auto chunksize works fairly well. > Experiment with that chunksize and with some chunksizes that you think > are more appropriate (maybe temporal rather than spatial in your case). > > On Jun 03, 2013, at 10:45 PM, Andreas Hilboll <li...@hi...> wrote: > >> On 03.06.2013 14:43, Andreas Hilboll wrote: >> > Hi, >> > >> > I'm storing large datasets (5760 x 2880 x ~150) in a compressed EArray >> > (the last dimension represents time, and once per month there'll be one >> > more 5760x2880 array to add to the end). >> > >> > Now, extracting timeseries at one index location is slow; e.g., for four >> > indices, it takes several seconds: >> > >> > In [19]: idx = ((5000, 600, 800, 900), (1000, 2000, 500, 1)) >> > >> > In [20]: %time AA = np.vstack([_a[i,j] for i,j in zip(*idx)]) >> > CPU times: user 4.31 s, sys: 0.07 s, total: 4.38 s >> > Wall time: 7.17 s >> > >> > I have the feeling that this performance could be improved, but I'm not >> > sure about how to properly use the `chunkshape` parameter in my case. >> > >> > Any help is greatly appreciated :) >> > >> > Cheers, Andreas. >> >> PS: If I could get significant performance gains by not using an EArray >> and therefore re-creating the whole database each month, then this would >> also be an option. >> >> -- Andreas. Thanks a lot, Anthony and Tim! I was able to get down the readout time considerably using chunkshape=(32, 32, 256) for my 5760x2880x150 array. Now, reading times are about as fast as I expected. the downside is that now, building up the database takes up a lot of time, because i get the data in chunks of 5760x2880x1. So I guess that writing the data to disk like this causes a load of IO operations ... My new question: Is there a way to create a file in-memory? If possible, I could then build up my database in-memory and then, once it's done, just copy the arrays to an on-disk file. Is that possible? If so, how? Thanks a lot for your help! -- Andreas. |
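Regarding the in-memory question at the end: one possible sketch is the CORE driver with its (default) backing store enabled, so the whole database is assembled in RAM and only materialized on disk when the file is closed. The names, sizes and the dummy monthly loop below are placeholders, and memory use is roughly the final file size:

    import numpy as np
    import tables

    with tables.open_file('database.h5', 'w', driver='H5FD_CORE',
                          driver_core_backing_store=1) as h5f:
        arr = h5f.create_earray(h5f.root, 'data', tables.Float32Atom(),
                                shape=(5760, 2880, 0),
                                chunkshape=(32, 32, 256))
        for month in range(12):                      # placeholder for the monthly updates
            slab = np.zeros((5760, 2880, 1), dtype=np.float32)
            arr.append(slab)
    # leaving the with-block closes the file and writes the in-memory image to disk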
From: Tim B. <tim...@ma...> - 2013-06-04 04:04:39
and for the record...yes, it should be much faster than 4 seconds.

>>> foo = np.empty([5760,2880,150],dtype=np.float32)
>>> idx = ((5000,600,800,900),(1000,2000,500,1))
>>> import time
>>> t0 = time.time();bar=np.vstack([foo[i,j] for i,j in zip(*idx)]);t1=time.time(); print t1-t0
0.000144004821777

On Jun 03, 2013, at 10:45 PM, Andreas Hilboll <li...@hi...> wrote:

On 03.06.2013 14:43, Andreas Hilboll wrote:
> Hi,
>
> I'm storing large datasets (5760 x 2880 x ~150) in a compressed EArray
> (the last dimension represents time, and once per month there'll be one
> more 5760x2880 array to add to the end).
>
> Now, extracting timeseries at one index location is slow; e.g., for four
> indices, it takes several seconds:
>
> In [19]: idx = ((5000, 600, 800, 900), (1000, 2000, 500, 1))
>
> In [20]: %time AA = np.vstack([_a[i,j] for i,j in zip(*idx)])
> CPU times: user 4.31 s, sys: 0.07 s, total: 4.38 s
> Wall time: 7.17 s
>
> I have the feeling that this performance could be improved, but I'm not
> sure about how to properly use the `chunkshape` parameter in my case.
>
> Any help is greatly appreciated :)
>
> Cheers, Andreas.

PS: If I could get significant performance gains by not using an EArray
and therefore re-creating the whole database each month, then this would
also be an option.

-- Andreas.

------------------------------------------------------------------------------
Get 100% visibility into Java/.NET code with AppDynamics Lite
It's a free troubleshooting tool designed for production
Get down to code-level detail for bottlenecks, with <2% overhead.
Download for free and get started troubleshooting in minutes.
http://p.sf.net/sfu/appdyn_d2d_ap2
_______________________________________________
Pytables-users mailing list
Pyt...@li...
https://lists.sourceforge.net/lists/listinfo/pytables-users
From: Anthony S. <sc...@gm...> - 2013-06-04 03:37:48
Opps! I forgot to mention CArray! On Mon, Jun 3, 2013 at 10:35 PM, Tim Burgess <tim...@ma...> wrote: > My thoughts are: > > - try it without any compression. Assuming 32 bit floats, your monthly > 5760 x 2880 is only about 65MB. Uncompressed data may perform well and at > the least it will give you a baseline to work from - and will help if you > are investigating IO tuning. > > - I have found with CArray that the auto chunksize works fairly well. > Experiment with that chunksize and with some chunksizes that you think are > more appropriate (maybe temporal rather than spatial in your case). > > > On Jun 03, 2013, at 10:45 PM, Andreas Hilboll <li...@hi...> wrote: > > On 03.06.2013 14:43, Andreas Hilboll wrote: > > Hi, > > > > I'm storing large datasets (5760 x 2880 x ~150) in a compressed EArray > > (the last dimension represents time, and once per month there'll be one > > more 5760x2880 array to add to the end). > > > > Now, extracting timeseries at one index location is slow; e.g., for four > > indices, it takes several seconds: > > > > In [19]: idx = ((5000, 600, 800, 900), (1000, 2000, 500, 1)) > > > > In [20]: %time AA = np.vstack([_a[i,j] for i,j in zip(*idx)]) > > CPU times: user 4.31 s, sys: 0.07 s, total: 4.38 s > > Wall time: 7.17 s > > > > I have the feeling that this performance could be improved, but I'm not > > sure about how to properly use the `chunkshape` parameter in my case. > > > > Any help is greatly appreciated :) > > > > Cheers, Andreas. > > PS: If I could get significant performance gains by not using an EArray > and therefore re-creating the whole database each month, then this would > also be an option. > > -- Andreas. > > > > ------------------------------------------------------------------------------ > Get 100% visibility into Java/.NET code with AppDynamics Lite > It's a free troubleshooting tool designed for production > Get down to code-level detail for bottlenecks, with <2% overhead. > Download for free and get started troubleshooting in minutes. > http://p.sf.net/sfu/appdyn_d2d_ap2 > _______________________________________________ > Pytables-users mailing list > Pyt...@li... > https://lists.sourceforge.net/lists/listinfo/pytables-users > > > > ------------------------------------------------------------------------------ > How ServiceNow helps IT people transform IT departments: > 1. A cloud service to automate IT design, transition and operations > 2. Dashboards that offer high-level views of enterprise services > 3. A single system of record for all IT processes > http://p.sf.net/sfu/servicenow-d2d-j > _______________________________________________ > Pytables-users mailing list > Pyt...@li... > https://lists.sourceforge.net/lists/listinfo/pytables-users > > |
From: Tim B. <tim...@ma...> - 2013-06-04 03:35:56
|
My thoughts are:

- try it without any compression. Assuming 32 bit floats, your monthly
5760 x 2880 is only about 65MB. Uncompressed data may perform well and at
the least it will give you a baseline to work from - and will help if you
are investigating IO tuning.

- I have found with CArray that the auto chunksize works fairly well.
Experiment with that chunksize and with some chunksizes that you think are
more appropriate (maybe temporal rather than spatial in your case).

On Jun 03, 2013, at 10:45 PM, Andreas Hilboll <li...@hi...> wrote:

On 03.06.2013 14:43, Andreas Hilboll wrote:
> Hi,
>
> I'm storing large datasets (5760 x 2880 x ~150) in a compressed EArray
> (the last dimension represents time, and once per month there'll be one
> more 5760x2880 array to add to the end).
>
> Now, extracting timeseries at one index location is slow; e.g., for four
> indices, it takes several seconds:
>
> In [19]: idx = ((5000, 600, 800, 900), (1000, 2000, 500, 1))
>
> In [20]: %time AA = np.vstack([_a[i,j] for i,j in zip(*idx)])
> CPU times: user 4.31 s, sys: 0.07 s, total: 4.38 s
> Wall time: 7.17 s
>
> I have the feeling that this performance could be improved, but I'm not
> sure about how to properly use the `chunkshape` parameter in my case.
>
> Any help is greatly appreciated :)
>
> Cheers, Andreas.

PS: If I could get significant performance gains by not using an EArray
and therefore re-creating the whole database each month, then this would
also be an option.

-- Andreas.
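A rough sketch of both suggestions in PyTables form; the file name and the 32 x 32 spatial chunk figures are assumptions to experiment with, not recommendations:

import numpy as np
import tables

NLAT, NLON, NTIME = 5760, 2880, 150

with tables.open_file('chunk_experiments.h5', 'w') as h5f:  # hypothetical file
    atom = tables.Float32Atom(dflt=np.nan)

    # Baseline: no compression filters, automatic chunkshape.
    baseline = h5f.create_carray(h5f.root, 'baseline', atom,
                                 shape=(NLAT, NLON, NTIME))
    print('auto chunkshape: %s' % (baseline.chunkshape,))

    # Variant: chunks that are small spatially but span the full time axis,
    # so one grid point's timeseries lives in a single chunk.
    temporal = h5f.create_carray(h5f.root, 'temporal', atom,
                                 shape=(NLAT, NLON, NTIME),
                                 chunkshape=(32, 32, NTIME))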
From: Anthony S. <sc...@gm...> - 2013-06-03 17:14:21
|
Hey everyone,

Leah Silen (CC'd) of NumFOCUS was wondering if anyone wanted to give a talk
or tutorial about PyTables at PyData Boston [1]. I don't think that I'll be
able to make it, but I highly encourage others to take her up on this. This
sort of thing shouldn't be too hard to put together, since I have already
assembled a repo of slides and exercises for a 4-hour-long tutorial [2].
Feel free to use them!

Be Well
Anthony

1. http://pydata.org/bos2013/
2. https://github.com/scopatz/hdf5-is-for-lovers
From: Anthony S. <sc...@gm...> - 2013-06-03 15:50:17
|
Hi Andreas,

First off, nothing should be this bad, but... What is the data type of the
array? Also, are you selecting the chunksize manually or letting PyTables
figure it out?

Here are some things that you can try:

1. Query with fancy indexing, once. That is, rather than using a list
comprehension, just say _a[zip(*idx)]

2. Set _a.nrowsinbuf [1] to a much smaller value (1, 5, or 10), which is
more appropriate for pulling out individual indexes.

Lastly, it is my opinion that the iteration mechanics are slower than they
can / should be. I have a bunch of ideas about how to make them faster AND
clean up the code base, but I won't have a ton of time to work on them in
the near term. However, if this is something that you are interested in,
that would be great! I'd love to help out anyone who is willing to take
this on.

Be Well
Anthony

1. http://pytables.github.io/usersguide/libref/hierarchy_classes.html#tables.Leaf.nrowsinbuf

On Mon, Jun 3, 2013 at 7:45 AM, Andreas Hilboll <li...@hi...> wrote:

> On 03.06.2013 14:43, Andreas Hilboll wrote:
> > Hi,
> >
> > I'm storing large datasets (5760 x 2880 x ~150) in a compressed EArray
> > (the last dimension represents time, and once per month there'll be one
> > more 5760x2880 array to add to the end).
> >
> > Now, extracting timeseries at one index location is slow; e.g., for four
> > indices, it takes several seconds:
> >
> > In [19]: idx = ((5000, 600, 800, 900), (1000, 2000, 500, 1))
> >
> > In [20]: %time AA = np.vstack([_a[i,j] for i,j in zip(*idx)])
> > CPU times: user 4.31 s, sys: 0.07 s, total: 4.38 s
> > Wall time: 7.17 s
> >
> > I have the feeling that this performance could be improved, but I'm not
> > sure about how to properly use the `chunkshape` parameter in my case.
> >
> > Any help is greatly appreciated :)
> >
> > Cheers, Andreas.
>
> PS: If I could get significant performance gains by not using an EArray
> and therefore re-creating the whole database each month, then this would
> also be an option.
>
> -- Andreas.
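A small sketch of those two suggestions against the existing file; the file and node names are made up, and since support for the single point-selection call varies between PyTables versions, it is left commented out:

import numpy as np
import tables

idx = ((5000, 600, 800, 900), (1000, 2000, 500, 1))

with tables.open_file('timeseries.h5', 'r') as h5f:   # hypothetical file
    _a = h5f.root.data                                 # hypothetical node

    # Suggestion 2: shrink the internal I/O buffer.  The default nrowsinbuf
    # is sized for long sequential iteration, not for single-point reads.
    _a.nrowsinbuf = 1

    # Suggestion 1, as written above (one fancy-indexing call instead of a
    # Python-level loop); whether this form is accepted depends on the
    # PyTables version:
    # AA = _a[zip(*idx)]

    # The original per-point loop, kept here for timing against the tweak.
    AA = np.vstack([_a[i, j] for i, j in zip(*idx)])
    print(AA.shape)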
From: Andreas H. <li...@hi...> - 2013-06-03 12:46:05
|
On 03.06.2013 14:43, Andreas Hilboll wrote:
> Hi,
>
> I'm storing large datasets (5760 x 2880 x ~150) in a compressed EArray
> (the last dimension represents time, and once per month there'll be one
> more 5760x2880 array to add to the end).
>
> Now, extracting timeseries at one index location is slow; e.g., for four
> indices, it takes several seconds:
>
> In [19]: idx = ((5000, 600, 800, 900), (1000, 2000, 500, 1))
>
> In [20]: %time AA = np.vstack([_a[i,j] for i,j in zip(*idx)])
> CPU times: user 4.31 s, sys: 0.07 s, total: 4.38 s
> Wall time: 7.17 s
>
> I have the feeling that this performance could be improved, but I'm not
> sure about how to properly use the `chunkshape` parameter in my case.
>
> Any help is greatly appreciated :)
>
> Cheers, Andreas.

PS: If I could get significant performance gains by not using an EArray
and therefore re-creating the whole database each month, then this would
also be an option.

-- Andreas.
From: Andreas H. <li...@hi...> - 2013-06-03 12:43:27
|
Hi,

I'm storing large datasets (5760 x 2880 x ~150) in a compressed EArray
(the last dimension represents time, and once per month there'll be one
more 5760x2880 array to add to the end).

Now, extracting timeseries at one index location is slow; e.g., for four
indices, it takes several seconds:

In [19]: idx = ((5000, 600, 800, 900), (1000, 2000, 500, 1))

In [20]: %time AA = np.vstack([_a[i,j] for i,j in zip(*idx)])
CPU times: user 4.31 s, sys: 0.07 s, total: 4.38 s
Wall time: 7.17 s

I have the feeling that this performance could be improved, but I'm not
sure about how to properly use the `chunkshape` parameter in my case.

Any help is greatly appreciated :)

Cheers, Andreas.
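For reference, one way the `chunkshape` parameter could be used in this setup: a sketch of an EArray that stays extendable along the time axis but is chunked so that a single grid point's timeseries touches only a few chunks. The file name, compression settings, and the 32 x 32 x 128 chunk figures are assumptions to experiment with; note that longer time chunks make each monthly append do more read-modify-write work on partially filled chunks.

import numpy as np
import tables

NLAT, NLON = 5760, 2880

with tables.open_file('timeseries.h5', 'w') as h5f:   # hypothetical file
    atom = tables.Float32Atom(dflt=np.nan)
    filters = tables.Filters(complevel=1, complib='blosc')

    # Length 0 marks the enlargeable (time) axis; one 2-D field per month.
    earr = h5f.create_earray(h5f.root, 'data', atom,
                             shape=(NLAT, NLON, 0),
                             chunkshape=(32, 32, 128),
                             filters=filters)

    # Appending a monthly field: the appended block must carry the
    # enlargeable axis, here as a trailing length-1 time dimension.
    month = np.empty((NLAT, NLON, 1), dtype=np.float32)
    month.fill(30.0)
    earr.append(month)

# Reading a timeseries later only visits the chunks holding column (i, j),
# instead of one chunk per time step.
with tables.open_file('timeseries.h5', 'r') as h5f:
    ts = h5f.root.data[5000, 1000, :]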