From: Jon W. <js...@fn...> - 2012-06-06 15:24:44
|
Hi Anthony, On 06/06/2012 12:45 AM, Anthony Scopatz wrote: > > I think something like > histogram(tables.Expr('col0 + col1**2', mytable.where('col2 > 15 & > abs(col3) < 5')).eval()) > would be ideal, but since where() returns a row iterator, and not > something that I can extract Column objects from, I don't see any > way to make it work. > > > You are probably looking for the readWhere() method > <http://pytables.github.com/usersguide/libref.html#tables.Table.readWhere> which > normally returns a numpy structured array. The line you are looking > for is thus: > > histogram(tables.Expr('col0 + col1**2', mytable.readWhere('col2 > 15 & > abs(col3) < 5')).eval()) > > This will likely be fast in both cases. I hope this helps. Oddly, it doesn't work with tables.Expr, but does work with numexpr.evaluate. In the case I talked about before with 7M rows, when selecting very few rows, it does just fine (between the other two solutions), but when selecting all rows, it is still about 2.75x slower than the technique of using tables.Expr for both the histogram var and the condition. I think that this is because .readWhere() pulls all the table rows satisfying the where condition into memory first, and it furthermore does so for all columns of all selected rows, so, for a table with many columns, it has to read many times as much data into memory. I can use the field parameter, but it only accepts one single field, so I would have to perform the query once per variable used in the histogram variable expression to do that. Using .readWhere() gives a medium-fast performance in both cases, but I still feel like it is not quite the right thing because it reads the data completely into memory instead of allowing the computation to be performed out-of-core. Perhaps it is not really feasible, but I think the ideal would be to have a .where type query operator that returns Column objects or a Table object, with a "view" imposed in either case. Regards, Jon |
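A rough sketch of the per-column readWhere() workaround Jon alludes to above, with the histogram variable then evaluated by numexpr. The file, table and column names follow the thread's hypothetical example (col0..col3), and this is untested; note that the selection gets re-run once per requested field:

    import numpy as np
    import numexpr as ne
    import tables

    h5file = tables.openFile('data.h5', 'r')   # hypothetical file
    mytable = h5file.root.mytable              # table with columns col0..col3

    cond = 'col2 > 15 & abs(col3) < 5'

    # One readWhere() call per column used in the histogram expression; the
    # selection still runs in the numexpr kernel, but only the requested field
    # is materialized in memory (at the cost of evaluating the condition twice).
    col0 = mytable.readWhere(cond, field='col0')
    col1 = mytable.readWhere(cond, field='col1')

    # Compute the histogram variable itself with numexpr (multithreaded).
    histvar = ne.evaluate('col0 + col1**2',
                          local_dict={'col0': col0, 'col1': col1})
    counts, edges = np.histogram(histvar, bins=100)

    h5file.close()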
From: Anthony S. <sc...@gm...> - 2012-06-06 05:45:40
|
On Tue, Jun 5, 2012 at 10:32 PM, Jon Wilson <js...@fn...> wrote: [snip] > I think something like > histogram(tables.Expr('col0 + col1**2', mytable.where('col2 > 15 & > abs(col3) < 5')).eval()) > would be ideal, but since where() returns a row iterator, and not > something that I can extract Column objects from, I don't see any way to > make it work. > You are probably looking for the readWhere() method<http://pytables.github.com/usersguide/libref.html#tables.Table.readWhere> which normally returns a numpy structured array. The line you are looking for is thus: histogram(tables.Expr('col0 + col1**2', mytable.readWhere('col2 > 15 & abs(col3) < 5')).eval()) This will likely be fast in both cases. I hope this helps. Be Well Anthony > > So, am I missing some way to compute the histogram variable in the numexpr > kernel, but only for rows I'm interested in? > Regards, > Jon > > > On 06/05/2012 09:45 PM, Anthony Scopatz wrote: > > Hello Jon, > > I believe that the where() method just uses the Expr / numexpr > functionality under the covers. Anything that you can do in Expr you > should be able to do in where(). Can you provide a short example where > this is not the case? > > Be Well > Anthony > > On Tue, Jun 5, 2012 at 6:17 PM, Jon Wilson <js...@fn...> wrote: > >> Hi all, >> In looking through the docs, I see two very nice features: the .where() >> query method, and the tables.Expr computation mechanism. But, it >> doesn't appear to be possible to combine the two. It appears that, if I >> want to compute some function of my columns, but only for certain rows, >> I have two options. >> - I can use tables.Expr to compute the function, and then filter the >> results in python >> - I can use mytable.where() to select the rows I'm interested in, and >> then compute the function in python >> >> Am I missing anything? Is it possible to perform fast out-of-core >> computations with numexpr, but only on a subset of the existing rows? >> Regards, >> Jon Wilson >> >> >> ------------------------------------------------------------------------------ >> Live Security Virtual Conference >> Exclusive live event will cover all the ways today's security and >> threat landscape has changed and how IT managers can respond. Discussions >> will include endpoint security, mobile security and the latest in malware >> threats. http://www.accelacomm.com/jaw/sfrnl04242012/114/50122263/ >> _______________________________________________ >> Pytables-users mailing list >> Pyt...@li... >> https://lists.sourceforge.net/lists/listinfo/pytables-users >> > > > > ------------------------------------------------------------------------------ > Live Security Virtual Conference > Exclusive live event will cover all the ways today's security and > threat landscape has changed and how IT managers can respond. Discussions > will include endpoint security, mobile security and the latest in malware > threats. http://www.accelacomm.com/jaw/sfrnl04242012/114/50122263/ > > > > _______________________________________________ > Pytables-users mailing lis...@li...https://lists.sourceforge.net/lists/listinfo/pytables-users > > > > > ------------------------------------------------------------------------------ > Live Security Virtual Conference > Exclusive live event will cover all the ways today's security and > threat landscape has changed and how IT managers can respond. Discussions > will include endpoint security, mobile security and the latest in malware > threats. 
http://www.accelacomm.com/jaw/sfrnl04242012/114/50122263/ > _______________________________________________ > Pytables-users mailing list > Pyt...@li... > https://lists.sourceforge.net/lists/listinfo/pytables-users > > |
From: Jon W. <js...@fn...> - 2012-06-06 03:32:07
|
Hi Anthony, Allow me to clarify. I wish to perform a reduction (histogramming, specifically) over a function of some values, but only including certain rows. For instance, say I have a table with three columns, col0 -- col3. I would like to create a histogram of col0 + col1**2, but only where col2 > 15 and abs(col3) < 5. As far as I understand, I can do the following: histogram(array([row['col0'] + row['col1']**2 for row in mytable.where('col2 > 15 & abs(col3) < 5')])) And this does produce the desired histogram. However, mytable.where() returns an iterator over rows, and then the list comprehension computes col0 + col1**2 for each row in python space, which lacks the optimization and multithreading of the numexpr kernel. It seems as though it should be possible to have both the condition and the histogramming variable (col0 + col1**2) computed in the parallelized and optimized numexpr kernel, but I do not see a way to do this using where(). The alternative that I can see would be to do something like: histvar = tables.Expr('col0 + col1**2', vars(mytable.cols)).eval() selection = tables.Expr('col2 > 15 & abs(col3) < 5', vars(mytable.cols)).eval() histogram(histvar, weights = selection) This should produce the same histogram as above, and it does compute both the histogram variable and the query condition in the numexpr kernel, but it requires the computation of the histogram variable even for rows I do not wish to include in the histogram. If the table is very large and relatively few rows are selected, or if computing the histogram variable is expensive, this is quite undesirable. So, it seems that I can either a) use the fast query operator where(); or, b) perform all computation in numexpr. But not both. FWIW, a quick timeit test shows that, on a table with ~1M rows, for a very simple condition and a very simple histogram variable, the first method is faster than the second method even when all rows are selected. For a table with ~7M rows, for a more complex histogram variable and still a very simple condition, the first method is faster than the second method when only a few rows are selected, but when all rows are selected, the second method is more than 10x faster. (2.16s vs 3.27s for few rows, 43.1s vs 3.19s for all 7M rows) So it is clear that in some cases, method 2 could be sped up substantially, and in other cases, method 1 could be sped up enormously. I think something like histogram(tables.Expr('col0 + col1**2', mytable.where('col2 > 15 & abs(col3) < 5')).eval()) would be ideal, but since where() returns a row iterator, and not something that I can extract Column objects from, I don't see any way to make it work. So, am I missing some way to compute the histogram variable in the numexpr kernel, but only for rows I'm interested in? Regards, Jon On 06/05/2012 09:45 PM, Anthony Scopatz wrote: > Hello Jon, > > I believe that the where() method just uses the Expr / numexpr > functionality under the covers. Anything that you can do in Expr you > should be able to do in where(). Can you provide a short example > where this is not the case? > > Be Well > Anthony > > On Tue, Jun 5, 2012 at 6:17 PM, Jon Wilson <js...@fn... > <mailto:js...@fn...>> wrote: > > Hi all, > In looking through the docs, I see two very nice features: the > .where() > query method, and the tables.Expr computation mechanism. But, it > doesn't appear to be possible to combine the two. It appears > that, if I > want to compute some function of my columns, but only for certain > rows, > I have two options. 
> - I can use tables.Expr to compute the function, and then filter the > results in python > - I can use mytable.where() to select the rows I'm interested in, and > then compute the function in python > > Am I missing anything? Is it possible to perform fast out-of-core > computations with numexpr, but only on a subset of the existing rows? > Regards, > Jon Wilson > > ------------------------------------------------------------------------------ > Live Security Virtual Conference > Exclusive live event will cover all the ways today's security and > threat landscape has changed and how IT managers can respond. > Discussions > will include endpoint security, mobile security and the latest in > malware > threats. http://www.accelacomm.com/jaw/sfrnl04242012/114/50122263/ > _______________________________________________ > Pytables-users mailing list > Pyt...@li... > <mailto:Pyt...@li...> > https://lists.sourceforge.net/lists/listinfo/pytables-users > > > > > ------------------------------------------------------------------------------ > Live Security Virtual Conference > Exclusive live event will cover all the ways today's security and > threat landscape has changed and how IT managers can respond. Discussions > will include endpoint security, mobile security and the latest in malware > threats. http://www.accelacomm.com/jaw/sfrnl04242012/114/50122263/ > > > _______________________________________________ > Pytables-users mailing list > Pyt...@li... > https://lists.sourceforge.net/lists/listinfo/pytables-users |
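For reference, the two approaches Jon compares above, condensed into a runnable sketch (the file, table and column names are the thread's hypothetical example; untested):

    import numpy as np
    import tables

    h5file = tables.openFile('data.h5', 'r')   # hypothetical file
    mytable = h5file.root.mytable              # table with columns col0..col3

    # Method 1: fast in-kernel selection via where(), but the histogram
    # variable is computed row by row in pure Python.
    hv1 = np.array([r['col0'] + r['col1']**2
                    for r in mytable.where('col2 > 15 & abs(col3) < 5')])
    h1, edges1 = np.histogram(hv1, bins=100)

    # Method 2: both expressions run in the numexpr kernel, but the histogram
    # variable is computed for every row, selected or not; unselected rows
    # only get zero weight.
    cols = vars(mytable.cols)
    histvar = tables.Expr('col0 + col1**2', cols).eval()
    selected = tables.Expr('(col2 > 15) & (abs(col3) < 5)', cols).eval()
    h2, edges2 = np.histogram(histvar, weights=selected, bins=100)

    h5file.close()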
From: Anthony S. <sc...@gm...> - 2012-06-06 02:46:17
|
Hello Jon, I believe that the where() method just uses the Expr / numexpr functionality under the covers. Anything that you can do in Expr you should be able to do in where(). Can you provide a short example where this is not the case? Be Well Anthony On Tue, Jun 5, 2012 at 6:17 PM, Jon Wilson <js...@fn...> wrote: > Hi all, > In looking through the docs, I see two very nice features: the .where() > query method, and the tables.Expr computation mechanism. But, it > doesn't appear to be possible to combine the two. It appears that, if I > want to compute some function of my columns, but only for certain rows, > I have two options. > - I can use tables.Expr to compute the function, and then filter the > results in python > - I can use mytable.where() to select the rows I'm interested in, and > then compute the function in python > > Am I missing anything? Is it possible to perform fast out-of-core > computations with numexpr, but only on a subset of the existing rows? > Regards, > Jon Wilson > > > ------------------------------------------------------------------------------ > Live Security Virtual Conference > Exclusive live event will cover all the ways today's security and > threat landscape has changed and how IT managers can respond. Discussions > will include endpoint security, mobile security and the latest in malware > threats. http://www.accelacomm.com/jaw/sfrnl04242012/114/50122263/ > _______________________________________________ > Pytables-users mailing list > Pyt...@li... > https://lists.sourceforge.net/lists/listinfo/pytables-users > |
From: Jon W. <js...@fn...> - 2012-06-05 23:31:17
|
Hi all, In looking through the docs, I see two very nice features: the .where() query method, and the tables.Expr computation mechanism. But, it doesn't appear to be possible to combine the two. It appears that, if I want to compute some function of my columns, but only for certain rows, I have two options. - I can use tables.Expr to compute the function, and then filter the results in python - I can use mytable.where() to select the rows I'm interested in, and then compute the function in python Am I missing anything? Is it possible to perform fast out-of-core computations with numexpr, but only on a subset of the existing rows? Regards, Jon Wilson |
From: Francesc A. <fa...@py...> - 2012-06-02 09:58:43
|
Hi Chao, On 6/2/12 11:55 AM, Chao YUE wrote: > if I use gdalinfo to check the file: > chaoyue@chaoyue-Aspire-4750:~/Downloads/LISOTD$ gdalinfo > LISOTD_HRMC_V2.3.2011.hdf > Driver: HDF4/Hierarchical Data Format Release 4 [clip] This says that the file is in HDF4 format, not HDF5. Please note that PyTables can only deal with HDF5 files. For HDF4 I'd rather use pyhdf: http://pysclint.sourceforge.net/pyhdf/ -- Francesc Alted |
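A small sketch of how the format mix-up could be caught up front: check for HDF5 with tables.isHDF5File() and fall back to pyhdf for HDF4 files. The dataset name comes from the gdalinfo listing above; the snippet is untested:

    import tables

    fname = 'LISOTD_HRMC_V2.3.2011.hdf'

    if tables.isHDF5File(fname):
        h5file = tables.openFile(fname, 'r')
        print h5file
        h5file.close()
    else:
        # HDF4 file: read it with pyhdf instead of PyTables.
        from pyhdf.SD import SD, SDC
        sd = SD(fname, SDC.READ)
        print sd.datasets()                  # list the scientific datasets
        data = sd.select('HRMC_COM_FR')[:]   # 360x720x12 float32 array
        sd.end()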
From: Chao Y. <cha...@gm...> - 2012-06-02 09:56:02
|
if I use gdalinfo to check the file: chaoyue@chaoyue-Aspire-4750:~/Downloads/LISOTD$ gdalinfo LISOTD_HRMC_V2.3.2011.hdf Driver: HDF4/Hierarchical Data Format Release 4 Files: LISOTD_HRMC_V2.3.2011.hdf Size is 512, 512 Coordinate System is `' Subdatasets: SUBDATASET_1_NAME=HDF4_SDS:UNKNOWN:"LISOTD_HRMC_V2.3.2011.hdf":0 SUBDATASET_1_DESC=[360x720x12] HRMC_COM_FR (32-bit floating-point) SUBDATASET_2_NAME=HDF4_SDS:UNKNOWN:"LISOTD_HRMC_V2.3.2011.hdf":4 SUBDATASET_2_DESC=[360x720x12] HRSC_COM_FR (32-bit floating-point) Corner Coordinates: Upper Left ( 0.0, 0.0) Lower Left ( 0.0, 512.0) Upper Right ( 512.0, 0.0) Lower Right ( 512.0, 512.0) Center ( 256.0, 256.0) Chao 2012/6/2 Chao YUE <cha...@gm...> > Dear all, > > I tried to use pytalbes to read a hdf file, but I got error: > I searched a little bit online, there might be cases you have more than 2 > file handlers for the same file and they are opened for both read and > write, you'll probably have this error. > But it's not my case that I open it only for the first time. From the > error message, it seems that there is no root group. > > In [1]: h5file=tables.openFile('LISOTD_HRMC_V2.3.2011.hdf','r') > --------------------------------------------------------------------------- > HDF5ExtError Traceback (most recent call last) > /home/chaoyue/Downloads/LISOTD/<ipython-input-1-b53d861308cf> in <module>() > ----> 1 h5file=tables.openFile('LISOTD_HRMC_V2.3.2011.hdf','r') > > /usr/local/lib/python2.7/dist-packages/tables/file.pyc in > openFile(filename, mode, title, rootUEP, filters, **kwargs) > 256 return filehandle > 257 # Finally, create the File instance, and return it > > --> 258 return File(filename, mode, title, rootUEP, filters, **kwargs) > 259 > 260 > > /usr/local/lib/python2.7/dist-packages/tables/file.pyc in __init__(self, > filename, mode, title, rootUEP, filters, **kwargs) > 565 > 566 # Get the root group from this file > > --> 567 self.root = root = self.__getRootGroup(rootUEP, title, > filters) > 568 # Complete the creation of the root node > > 569 # (see the explanation in ``RootGroup.__init__()``. > > > /usr/local/lib/python2.7/dist-packages/tables/file.pyc in > __getRootGroup(self, rootUEP, title, filters) > 614 # Create new attributes for the root Group instance and > > 615 # create the object tree > > --> 616 return RootGroup(self, rootUEP, title=title, new=new, > filters=filters) > 617 > 618 > > /usr/local/lib/python2.7/dist-packages/tables/group.pyc in __init__(self, > ptFile, name, title, new, filters) > 1155 self._g_new(ptFile, name, init=True) > 1156 # Open the node and get its object ID. > > -> 1157 self._v_objectID = self._g_open() > 1158 > 1159 # Set disk attributes and read children names. > > > /usr/local/lib/python2.7/dist-packages/tables/hdf5Extension.so in > tables.hdf5Extension.Group._g_open (tables/hdf5Extension.c:5521)() > > HDF5ExtError: Can't open the group: '/'. 
> > If I open the file with this command, it gave no error: > > In [2]: h5file=tables.openFile('LISOTD_HRMC_V2.3.2011.hdf',mode='r') > > But when I try to print the file information, I get: > > In [3]: print h5file > Exception tables.exceptions.HDF5ExtError: HDF5ExtError('Problems closing > the Group /',) in ignored > --------------------------------------------------------------------------- > AttributeError Traceback (most recent call last) > /home/chaoyue/Downloads/LISOTD/<ipython-input-3-64c76de88957> in <module>() > ----> 1 print h5file > > /usr/local/lib/python2.7/dist-packages/tables/file.pyc in __str__(self) > 2197 # Print all the nodes (Group and Leaf objects) on object > tree > > 2198 date = > time.asctime(time.localtime(os.stat(self.filename)[8])) > -> 2199 astring = self.filename + ' (File) ' + repr(self.title) + > '\n' > 2200 # astring += 'rootUEP :=' + repr(self.rootUEP) + '; ' > > 2201 # astring += 'format_version := ' + self.format_version + > '\n' > > > /usr/local/lib/python2.7/dist-packages/tables/file.pyc in _gettitle(self) > 474 > 475 def _gettitle(self): > --> 476 return self.root._v_title > 477 def _settitle(self, title): > 478 self.root._v_title = title > > AttributeError: 'File' object has no attribute 'root' > > I have not too much experience handling HDF data but it's the second time > I have this problem. > In both cases the data are downloaded from official release of research > data so I think it's unlikely that the data itself are badly produced. > But if anyone has any interest trying to have a look of the issue, the > data is at: > ftp://ghrc.nsstc.nasa.gov/pub/lis/climatology/HRMC/data/ > > The ftp is anonymous and the data released by NASA. > > thanks for any help in advance, > > best regards, > > Chao > > -- > > *********************************************************************************** > Chao YUE > Laboratoire des Sciences du Climat et de l'Environnement (LSCE-IPSL) > UMR 1572 CEA-CNRS-UVSQ > Batiment 712 - Pe 119 > 91191 GIF Sur YVETTE Cedex > Tel: (33) 01 69 08 29 02; Fax:01.69.08.77.16 > > ************************************************************************************ > > -- *********************************************************************************** Chao YUE Laboratoire des Sciences du Climat et de l'Environnement (LSCE-IPSL) UMR 1572 CEA-CNRS-UVSQ Batiment 712 - Pe 119 91191 GIF Sur YVETTE Cedex Tel: (33) 01 69 08 29 02; Fax:01.69.08.77.16 ************************************************************************************ |
From: Chao Y. <cha...@gm...> - 2012-06-02 09:36:21
|
Dear all, I tried to use pytalbes to read a hdf file, but I got error: I searched a little bit online, there might be cases you have more than 2 file handlers for the same file and they are opened for both read and write, you'll probably have this error. But it's not my case that I open it only for the first time. From the error message, it seems that there is no root group. In [1]: h5file=tables.openFile('LISOTD_HRMC_V2.3.2011.hdf','r') --------------------------------------------------------------------------- HDF5ExtError Traceback (most recent call last) /home/chaoyue/Downloads/LISOTD/<ipython-input-1-b53d861308cf> in <module>() ----> 1 h5file=tables.openFile('LISOTD_HRMC_V2.3.2011.hdf','r') /usr/local/lib/python2.7/dist-packages/tables/file.pyc in openFile(filename, mode, title, rootUEP, filters, **kwargs) 256 return filehandle 257 # Finally, create the File instance, and return it --> 258 return File(filename, mode, title, rootUEP, filters, **kwargs) 259 260 /usr/local/lib/python2.7/dist-packages/tables/file.pyc in __init__(self, filename, mode, title, rootUEP, filters, **kwargs) 565 566 # Get the root group from this file --> 567 self.root = root = self.__getRootGroup(rootUEP, title, filters) 568 # Complete the creation of the root node 569 # (see the explanation in ``RootGroup.__init__()``. /usr/local/lib/python2.7/dist-packages/tables/file.pyc in __getRootGroup(self, rootUEP, title, filters) 614 # Create new attributes for the root Group instance and 615 # create the object tree --> 616 return RootGroup(self, rootUEP, title=title, new=new, filters=filters) 617 618 /usr/local/lib/python2.7/dist-packages/tables/group.pyc in __init__(self, ptFile, name, title, new, filters) 1155 self._g_new(ptFile, name, init=True) 1156 # Open the node and get its object ID. -> 1157 self._v_objectID = self._g_open() 1158 1159 # Set disk attributes and read children names. /usr/local/lib/python2.7/dist-packages/tables/hdf5Extension.so in tables.hdf5Extension.Group._g_open (tables/hdf5Extension.c:5521)() HDF5ExtError: Can't open the group: '/'. If I open the file with this command, it gave no error: In [2]: h5file=tables.openFile('LISOTD_HRMC_V2.3.2011.hdf',mode='r') But when I try to print the file information, I get: In [3]: print h5file Exception tables.exceptions.HDF5ExtError: HDF5ExtError('Problems closing the Group /',) in ignored --------------------------------------------------------------------------- AttributeError Traceback (most recent call last) /home/chaoyue/Downloads/LISOTD/<ipython-input-3-64c76de88957> in <module>() ----> 1 print h5file /usr/local/lib/python2.7/dist-packages/tables/file.pyc in __str__(self) 2197 # Print all the nodes (Group and Leaf objects) on object tree 2198 date = time.asctime(time.localtime(os.stat(self.filename)[8])) -> 2199 astring = self.filename + ' (File) ' + repr(self.title) + '\n' 2200 # astring += 'rootUEP :=' + repr(self.rootUEP) + '; ' 2201 # astring += 'format_version := ' + self.format_version + '\n' /usr/local/lib/python2.7/dist-packages/tables/file.pyc in _gettitle(self) 474 475 def _gettitle(self): --> 476 return self.root._v_title 477 def _settitle(self, title): 478 self.root._v_title = title AttributeError: 'File' object has no attribute 'root' I have not too much experience handling HDF data but it's the second time I have this problem. In both cases the data are downloaded from official release of research data so I think it's unlikely that the data itself are badly produced. 
But if anyone has any interest trying to have a look of the issue, the data is at: ftp://ghrc.nsstc.nasa.gov/pub/lis/climatology/HRMC/data/ The ftp is anonymous and the data released by NASA. thanks for any help in advance, best regards, Chao -- *********************************************************************************** Chao YUE Laboratoire des Sciences du Climat et de l'Environnement (LSCE-IPSL) UMR 1572 CEA-CNRS-UVSQ Batiment 712 - Pe 119 91191 GIF Sur YVETTE Cedex Tel: (33) 01 69 08 29 02; Fax:01.69.08.77.16 ************************************************************************************ |
From: Francesc A. <fa...@py...> - 2012-05-22 09:18:16
|
On 5/21/12 9:31 PM, Josh Ayers wrote: > Hi Alex, > > Reading a PyTables file in another platform should be easy, as long as > you use a compression library that is supported on both platforms. > The most widely available is likely to be zlib, since it is included > in the pre-built binaries available from the HDF group's website. > There are C and Fortran versions available here: > http://www.hdfgroup.org/HDF5/release/obtain5.html. It looks like > there's also a partial .NET wrapper of the library here: http://hdf5.net/. > > Recent versions of Matlab also have support for HDF5 (the v7.3 > "mat-file" format is based on it). Since I have it available, I just > verified that Matlab R2011b can read PyTables files in uncompressed > and zlib compressed formats, using Matlab's h5read function. It > failed when the PyTables file was compressed with bzip2, lzo, or > blosc. I only tested it with a PyTables table, which is read into > Matlab as a struct. Blosc is prepared to interact with the generic HDF5 library, so your current HDF5 applications can read datasets compressed with it. But you need to re-compile your HDF5 library for getting this support: https://github.com/FrancescAlted/blosc/tree/master/hdf5 > > As far as writing files on another platform and then reading them in > PyTables, that will be a little more difficult. There are certain > HDF5 attributes that are required by PyTables on each group and > dataset. All the details are documented here: > http://pytables.github.com/usersguide/file_format.html. Nope. These attributes are not required, they are optional. PyTables generally makes a good job at accessing HDF5 without this info. FYI, these attributes are a superset of the High Level HDF5 library: http://www.hdfgroup.org/HDF5/hdf5_hl/ -- Francesc Alted |
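Following the portability advice in this thread, a minimal sketch of writing a table with zlib compression so that stock HDF5 tools on other platforms can read it (the file, table and column names are made up):

    import tables

    class Sample(tables.IsDescription):
        time = tables.Float64Col()
        value = tables.Float32Col()

    # zlib and shuffle are built into the stock HDF5 library, so files
    # compressed this way stay readable from C, Fortran, Matlab, etc.
    filters = tables.Filters(complevel=5, complib='zlib', shuffle=True)

    f = tables.openFile('shared_data.h5', 'w')
    tbl = f.createTable('/', 'samples', Sample, filters=filters)
    row = tbl.row
    for i in xrange(1000):
        row['time'] = i * 0.1
        row['value'] = i
        row.append()
    tbl.flush()
    f.close()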
From: Anthony S. <sc...@gm...> - 2012-05-21 20:45:07
|
Hi Alex, In general, HDF5 files are very portable to many platforms and many languages. Indeed, that is sort of the purpose behind the HDF Group. While there are some incompatible edge cases, you sort of have to look for them. Josh did a very good job of outlining the support for HDF5 across the board. However, I would like to add that I have been using HDF5 / PyTables for 3 - 4 years and have never had a compatibility issue. Be Well Anthony On Mon, May 21, 2012 at 2:31 PM, Josh Ayers <jos...@gm...> wrote: > Hi Alex, > > Reading a PyTables file in another platform should be easy, as long as you > use a compression library that is supported on both platforms. The most > widely available is likely to be zlib, since it is included in the > pre-built binaries available from the HDF group's website. There are C and > Fortran versions available here: > http://www.hdfgroup.org/HDF5/release/obtain5.html. It looks like there's > also a partial .NET wrapper of the library here: http://hdf5.net/. > > Recent versions of Matlab also have support for HDF5 (the v7.3 "mat-file" > format is based on it). Since I have it available, I just verified that > Matlab R2011b can read PyTables files in uncompressed and zlib compressed > formats, using Matlab's h5read function. It failed when the PyTables file > was compressed with bzip2, lzo, or blosc. I only tested it with a PyTables > table, which is read into Matlab as a struct. > > As far as writing files on another platform and then reading them in > PyTables, that will be a little more difficult. There are certain HDF5 > attributes that are required by PyTables on each group and dataset. All > the details are documented here: > http://pytables.github.com/usersguide/file_format.html. > > Hope that helps, > > Josh > > > On Mon, May 21, 2012 at 10:12 AM, Alex Liberzon <ale...@gm...>wrote: > >> Dear PyTables developers, >> >> Thanks for the great project. >> >> I would like to suggest to a small scientific community (Lagrangian >> particle tracking velocimetry and numerical simulations of turbulent flows) >> to start using PyTables as a common platform for exchanging of large >> datasets (few gigas to tens of terabytes). The major advantage I see in the >> great query and on-disk analysis capabilities that are not present in the >> original HDF5. However, one major drawback from some of the groups is the >> question of software: people work with C, C#, Fortran, Python, Matlab and >> use a wide range of visualization software platforms. In order to get to a >> common ground we need something that I couldn't find so far: C, Fortran >> libraries to access PyTables HDF files created by this great Python >> library. What are the suggestions? Does somebody have a similar experience >> of sharing data between groups that do not use Python? >> >> Thank you, >> Alex Liberzon >> Turbulence Structure Laboratory >> Tel Aviv University >> >> >> >> >> ------------------------------------------------------------------------------ >> Live Security Virtual Conference >> Exclusive live event will cover all the ways today's security and >> threat landscape has changed and how IT managers can respond. Discussions >> will include endpoint security, mobile security and the latest in malware >> threats. http://www.accelacomm.com/jaw/sfrnl04242012/114/50122263/ >> _______________________________________________ >> Pytables-users mailing list >> Pyt...@li... 
>> https://lists.sourceforge.net/lists/listinfo/pytables-users >> >> > > > ------------------------------------------------------------------------------ > Live Security Virtual Conference > Exclusive live event will cover all the ways today's security and > threat landscape has changed and how IT managers can respond. Discussions > will include endpoint security, mobile security and the latest in malware > threats. http://www.accelacomm.com/jaw/sfrnl04242012/114/50122263/ > _______________________________________________ > Pytables-users mailing list > Pyt...@li... > https://lists.sourceforge.net/lists/listinfo/pytables-users > > |
From: Josh A. <jos...@gm...> - 2012-05-21 19:31:12
|
Hi Alex, Reading a PyTables file in another platform should be easy, as long as you use a compression library that is supported on both platforms. The most widely available is likely to be zlib, since it is included in the pre-built binaries available from the HDF group's website. There are C and Fortran versions available here: http://www.hdfgroup.org/HDF5/release/obtain5.html. It looks like there's also a partial .NET wrapper of the library here: http://hdf5.net/. Recent versions of Matlab also have support for HDF5 (the v7.3 "mat-file" format is based on it). Since I have it available, I just verified that Matlab R2011b can read PyTables files in uncompressed and zlib compressed formats, using Matlab's h5read function. It failed when the PyTables file was compressed with bzip2, lzo, or blosc. I only tested it with a PyTables table, which is read into Matlab as a struct. As far as writing files on another platform and then reading them in PyTables, that will be a little more difficult. There are certain HDF5 attributes that are required by PyTables on each group and dataset. All the details are documented here: http://pytables.github.com/usersguide/file_format.html. Hope that helps, Josh On Mon, May 21, 2012 at 10:12 AM, Alex Liberzon <ale...@gm...>wrote: > Dear PyTables developers, > > Thanks for the great project. > > I would like to suggest to a small scientific community (Lagrangian > particle tracking velocimetry and numerical simulations of turbulent flows) > to start using PyTables as a common platform for exchanging of large > datasets (few gigas to tens of terabytes). The major advantage I see in the > great query and on-disk analysis capabilities that are not present in the > original HDF5. However, one major drawback from some of the groups is the > question of software: people work with C, C#, Fortran, Python, Matlab and > use a wide range of visualization software platforms. In order to get to a > common ground we need something that I couldn't find so far: C, Fortran > libraries to access PyTables HDF files created by this great Python > library. What are the suggestions? Does somebody have a similar experience > of sharing data between groups that do not use Python? > > Thank you, > Alex Liberzon > Turbulence Structure Laboratory > Tel Aviv University > > > > > ------------------------------------------------------------------------------ > Live Security Virtual Conference > Exclusive live event will cover all the ways today's security and > threat landscape has changed and how IT managers can respond. Discussions > will include endpoint security, mobile security and the latest in malware > threats. http://www.accelacomm.com/jaw/sfrnl04242012/114/50122263/ > _______________________________________________ > Pytables-users mailing list > Pyt...@li... > https://lists.sourceforge.net/lists/listinfo/pytables-users > > |
From: Alex L. <ale...@gm...> - 2012-05-21 17:12:58
|
Dear PyTables developers, Thanks for the great project. I would like to suggest to a small scientific community (Lagrangian particle tracking velocimetry and numerical simulations of turbulent flows) that it start using PyTables as a common platform for exchanging large datasets (a few gigabytes to tens of terabytes). The major advantage I see is the great query and on-disk analysis capabilities that are not present in the original HDF5. However, one major drawback for some of the groups is the question of software: people work with C, C#, Fortran, Python, Matlab and use a wide range of visualization software platforms. To get to a common ground we need something that I couldn't find so far: C and Fortran libraries to access PyTables HDF files created by this great Python library. What are the suggestions? Does somebody have similar experience of sharing data between groups that do not use Python? Thank you, Alex Liberzon Turbulence Structure Laboratory Tel Aviv University |
From: Anthony S. <sc...@gm...> - 2012-05-21 14:34:41
|
Hi Uwe, Sorry, I wrote this when I was away from my computer and so I couldn't test it. Our documentation is clearly wrong then. However, what you *can* do is take the dtype from a known VideoNode table and then compare using this. known_dtype = f.root.path_to_a_video_node.dtype bar = filter(x.dtype == known_dtype for x in f.walkNodes('/', 'Table')) Note that in your file your file you could create an empty table with the VideoNode description at a specific location just so that you can read out this dtype. Be Well Anthony On Mon, May 21, 2012 at 6:20 AM, Uwe Mayer <uwe...@df...> wrote: > Hi Anthony, > > On 05/19/2012 08:12 PM, Anthony Scopatz wrote: > > Hello Uwe, > > > > Why don't you try something like: > > > > bar = filter(x.description == VideoNode for x in f.walkNodes('/', > 'Table')) > > > > or > > > > bar = filter(x.dtype == VideoNode._v_dtype for x in f.walkNodes('/', > > 'Table')) > > > > to compare the dtype / description directly? > > correction on my behalf, that would be exactly what I needed, but: > > - x.description compares false to a (correct) subclass of > tables.IsDescription > > - a subclass of tables.IsDescription has no property _v_dtype to compare > to x.dtype (from your example above) > > > Any other ideas? > > Thanks in advance, > Uwe > > > > On May 18, 2012 8:00 AM, "Uwe Mayer" <uwe...@df... > > <mailto:uwe...@df...>> wrote: > > > > Hi, > > > > I have several leaf nodes of the same table dtype: > > > > class VideoNode(tables.IsDescription): > > ... > > > > Not all tables in the hdf5 file are of the same type, however. How > > do I iterate > > over all leafes which are tables of the above class, while ignoring > > tables with > > different signatures? > > > > i.e. I'd like to write something like: > > <code> > > f = tables.openFile(...) > > foo = f.walkNodes('/', classname='VideoNode') > > </code> > > which does not work because only the class name is "Table"... > > > > or > > <code> > > bar = filter(isinstance(x, VideoNode) for x in f.walkNodes('/', > > 'Table'))) > > </code> > > > > which does not work, because x is never an instance of VideoNode. > > > > Any ideas? > > > > Thanks in advance > > Uwe > > > > > > > ------------------------------------------------------------------------------ > > Live Security Virtual Conference > > Exclusive live event will cover all the ways today's security and > > threat landscape has changed and how IT managers can respond. > > Discussions > > will include endpoint security, mobile security and the latest in > > malware > > threats. http://www.accelacomm.com/jaw/sfrnl04242012/114/50122263/ > > _______________________________________________ > > Pytables-users mailing list > > Pyt...@li... > > <mailto:Pyt...@li...> > > https://lists.sourceforge.net/lists/listinfo/pytables-users > > > > > > > > > ------------------------------------------------------------------------------ > > Live Security Virtual Conference > > Exclusive live event will cover all the ways today's security and > > threat landscape has changed and how IT managers can respond. Discussions > > will include endpoint security, mobile security and the latest in malware > > threats. http://www.accelacomm.com/jaw/sfrnl04242012/114/50122263/ > > > > > > > > _______________________________________________ > > Pytables-users mailing list > > Pyt...@li... 
> > https://lists.sourceforge.net/lists/listinfo/pytables-users > > > > > ------------------------------------------------------------------------------ > Live Security Virtual Conference > Exclusive live event will cover all the ways today's security and > threat landscape has changed and how IT managers can respond. Discussions > will include endpoint security, mobile security and the latest in malware > threats. http://www.accelacomm.com/jaw/sfrnl04242012/114/50122263/ > _______________________________________________ > Pytables-users mailing list > Pyt...@li... > https://lists.sourceforge.net/lists/listinfo/pytables-users > |
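A self-contained variant of the suggestion above, with the filter() call rewritten as a list comprehension; the file name, node path and VideoNode columns are placeholders:

    import tables

    class VideoNode(tables.IsDescription):   # columns here are placeholders
        frame = tables.Int32Col()
        timestamp = tables.Float64Col()

    f = tables.openFile('videos.h5', 'a')    # hypothetical file

    # Take the reference dtype from a table known to use the VideoNode
    # description; if none exists yet, create a throwaway empty one.
    try:
        known_dtype = f.root.known_video_table.dtype
    except tables.NoSuchNodeError:
        known_dtype = f.createTable('/', 'videonode_template', VideoNode).dtype

    video_tables = [node for node in f.walkNodes('/', classname='Table')
                    if node.dtype == known_dtype]

    f.close()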
From: Uwe M. <uwe...@df...> - 2012-05-21 11:20:50
|
Hi Anthony, On 05/19/2012 08:12 PM, Anthony Scopatz wrote: > Hello Uwe, > > Why don't you try something like: > > bar = filter(x.description == VideoNode for x in f.walkNodes('/', 'Table')) > > or > > bar = filter(x.dtype == VideoNode._v_dtype for x in f.walkNodes('/', > 'Table')) > > to compare the dtype / description directly? correction on my behalf, that would be exactly what I needed, but: - x.description compares false to a (correct) subclass of tables.IsDescription - a subclass of tables.IsDescription has no property _v_dtype to compare to x.dtype (from your example above) Any other ideas? Thanks in advance, Uwe > On May 18, 2012 8:00 AM, "Uwe Mayer" <uwe...@df... > <mailto:uwe...@df...>> wrote: > > Hi, > > I have several leaf nodes of the same table dtype: > > class VideoNode(tables.IsDescription): > ... > > Not all tables in the hdf5 file are of the same type, however. How > do I iterate > over all leafes which are tables of the above class, while ignoring > tables with > different signatures? > > i.e. I'd like to write something like: > <code> > f = tables.openFile(...) > foo = f.walkNodes('/', classname='VideoNode') > </code> > which does not work because only the class name is "Table"... > > or > <code> > bar = filter(isinstance(x, VideoNode) for x in f.walkNodes('/', > 'Table'))) > </code> > > which does not work, because x is never an instance of VideoNode. > > Any ideas? > > Thanks in advance > Uwe > > > ------------------------------------------------------------------------------ > Live Security Virtual Conference > Exclusive live event will cover all the ways today's security and > threat landscape has changed and how IT managers can respond. > Discussions > will include endpoint security, mobile security and the latest in > malware > threats. http://www.accelacomm.com/jaw/sfrnl04242012/114/50122263/ > _______________________________________________ > Pytables-users mailing list > Pyt...@li... > <mailto:Pyt...@li...> > https://lists.sourceforge.net/lists/listinfo/pytables-users > > > > ------------------------------------------------------------------------------ > Live Security Virtual Conference > Exclusive live event will cover all the ways today's security and > threat landscape has changed and how IT managers can respond. Discussions > will include endpoint security, mobile security and the latest in malware > threats. http://www.accelacomm.com/jaw/sfrnl04242012/114/50122263/ > > > > _______________________________________________ > Pytables-users mailing list > Pyt...@li... > https://lists.sourceforge.net/lists/listinfo/pytables-users |
From: Uwe M. <uwe...@df...> - 2012-05-21 08:15:04
|
Hi Anthony, On 05/19/2012 08:12 PM, Anthony Scopatz wrote: > Why don't you try something like: > > bar = filter(x.description == VideoNode for x in f.walkNodes('/', 'Table')) > > or > > bar = filter(x.dtype == VideoNode._v_dtype for x in f.walkNodes('/', > 'Table')) > > to compare the dtype / description directly? Oh. *g* This is exactly what I was looking for. I did not know how to use the class in a comparison for this. Thank you! Uwe > On May 18, 2012 8:00 AM, "Uwe Mayer" <uwe...@df... > <mailto:uwe...@df...>> wrote: > > Hi, > > I have several leaf nodes of the same table dtype: > > class VideoNode(tables.IsDescription): > ... > > Not all tables in the hdf5 file are of the same type, however. How > do I iterate > over all leafes which are tables of the above class, while ignoring > tables with > different signatures? > > i.e. I'd like to write something like: > <code> > f = tables.openFile(...) > foo = f.walkNodes('/', classname='VideoNode') > </code> > which does not work because only the class name is "Table"... > > or > <code> > bar = filter(isinstance(x, VideoNode) for x in f.walkNodes('/', > 'Table'))) > </code> > > which does not work, because x is never an instance of VideoNode. > > Any ideas? > > Thanks in advance > Uwe > > > ------------------------------------------------------------------------------ > Live Security Virtual Conference > Exclusive live event will cover all the ways today's security and > threat landscape has changed and how IT managers can respond. > Discussions > will include endpoint security, mobile security and the latest in > malware > threats. http://www.accelacomm.com/jaw/sfrnl04242012/114/50122263/ > _______________________________________________ > Pytables-users mailing list > Pyt...@li... > <mailto:Pyt...@li...> > https://lists.sourceforge.net/lists/listinfo/pytables-users > > > > ------------------------------------------------------------------------------ > Live Security Virtual Conference > Exclusive live event will cover all the ways today's security and > threat landscape has changed and how IT managers can respond. Discussions > will include endpoint security, mobile security and the latest in malware > threats. http://www.accelacomm.com/jaw/sfrnl04242012/114/50122263/ > > > > _______________________________________________ > Pytables-users mailing list > Pyt...@li... > https://lists.sourceforge.net/lists/listinfo/pytables-users |
From: Anthony S. <sc...@gm...> - 2012-05-19 18:13:03
|
Hello Uwe, Why don't you try something like: bar = filter(x.description == VideoNode for x in f.walkNodes('/', 'Table')) or bar = filter(x.dtype == VideoNode._v_dtype for x in f.walkNodes('/', 'Table')) to compare the dtype / description directly? Be Well Anthony On May 18, 2012 8:00 AM, "Uwe Mayer" <uwe...@df...> wrote: > Hi, > > I have several leaf nodes of the same table dtype: > > class VideoNode(tables.IsDescription): > ... > > Not all tables in the hdf5 file are of the same type, however. How do I > iterate > over all leafes which are tables of the above class, while ignoring tables > with > different signatures? > > i.e. I'd like to write something like: > <code> > f = tables.openFile(...) > foo = f.walkNodes('/', classname='VideoNode') > </code> > which does not work because only the class name is "Table"... > > or > <code> > bar = filter(isinstance(x, VideoNode) for x in f.walkNodes('/', 'Table'))) > </code> > > which does not work, because x is never an instance of VideoNode. > > Any ideas? > > Thanks in advance > Uwe > > > > ------------------------------------------------------------------------------ > Live Security Virtual Conference > Exclusive live event will cover all the ways today's security and > threat landscape has changed and how IT managers can respond. Discussions > will include endpoint security, mobile security and the latest in malware > threats. http://www.accelacomm.com/jaw/sfrnl04242012/114/50122263/ > _______________________________________________ > Pytables-users mailing list > Pyt...@li... > https://lists.sourceforge.net/lists/listinfo/pytables-users > |
From: Anthony S. <sc...@gm...> - 2012-05-19 17:57:50
|
Hello Nikola, Thanks for reporting this issue (and sorry about the delayed reply). I have two requests for you: 1. could you come up with a self contained example that reproduces this behaviour? 2. and could you maybe make a github issue related to this problem? #1 is much more important. Thanks a ton! Be Well Anthony On May 18, 2012 5:01 AM, "nikola stevanovic" <nid...@gm...> wrote: > *Hi,* > > Couple days ago, I make some experiments with pytables. I was curious > about reading and writing speed for my future project. > So, I decided make some tests. In my hdf5 files I have only one table > named *Table_1*. I started tests with one million rows and after that > keep continue testing with 100 000 000 and 500 000 000. This is how looks > table structure: > > /Table_1 (Table(500000000,)) '' > description := { > "Device_ID": StringCol(itemsize=14, shape=(), dflt='', pos=0), > "DateTime": Time32Col(shape=(), dflt=0, pos=1), > "Value": Float32Col(shape=(), dflt=0.0, pos=2), > "Status": StringCol(itemsize=10, shape=(), dflt='', pos=3)} > byteorder := 'little' > chunkshape := (2048,0) > autoIndex := True > colindexes := { > "DateTime": Index(9, full, shuffle, zlib(1)).is_CSI=True} > > > I didn't change chunkshape (default from creating table > chunkshape=(2048,0)). Only thing I did is creating index on column > DateTime. Everything worked fine. But, after 500 000 000 rows, I decide > compare this table and table whith chunkshape=(65536). So I copy this table > in other hdf5 file using ptrepack tool: > > ptrepack --chunkshape='(65536,0)' /home/azura/a.h5:/Table_1 > /home/azura/b.h5:/ > > My new table work fine until I create index (CSIndex()) on DateTime > column. Index creation was successful, but calling methods as *where(), > getWhereList()* throws following exception: > > query = '(DateTime > 1293836400.0) & (DateTime < 1297292400.0)' > a = numpy.array([ (x['Device_ID'],x['DateTime'],x['Value']) for x in > tbl.where(query) ]) > Traceback (most recent call last): > File "<pyshell#100>", line 1, in <module> > a = numpy.array([ (x['Device_ID'],x['DateTime'],x['Value']) for x in > tbl.where(query) ]) > File "tableExtension.pyx", line 858, in > tables.tableExtension.Row.__next__ (tables/tableExtension.c:7788) > File "tableExtension.pyx", line 879, in > tables.tableExtension.Row.__next__indexed (tables/tableExtension.c:7922) > AssertionError > > > Then I decide make same table without ptrepack tool. So I created new > table and fill with 500 000 000 rows (same chunkshape, same record > structure). Everythings works fine, so my conclusion is that there is a bug > in ptrepack tool. Note that exception appear in copied table after creating > CS index. I'm just curious about this. What can be wrong? 
> > I'm using Ubuntu 12.04TLS with ext4 > Processor: Intel® Core™ i3 CPU M 380 @ 2.53GHz × 4 > RAM: 4GB > HARD DISK: 500GB > > > -=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-= > PyTables version: 2.3.1 > HDF5 version: 1.8.4-patch1 > NumPy version: 1.6.0 > Numexpr version: 2.0.1 (not using Intel's VML/MKL) > Zlib version: 1.2.3.4 (in Python interpreter) > Blosc version: 1.1.2 (2010-11-04) > Cython version: 0.16 > Python version: 2.7.3 (default, Apr 20 2012, 22:44:07) > [GCC 4.6.3] > Platform: linux2-i686 > Byte-ordering: little > Detected cores: 4 > > -=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-= > > File(filename=/home/azura/b.h5, title='', mode='a', rootUEP='/', > filters=Filters(complevel=0, shuffle=False, fletcher32=False)) > / (RootGroup) '' > /Table_1 (Table(500000000,)) '' > description := { > "Device_ID": StringCol(itemsize=14, shape=(), dflt='', pos=0), > "DateTime": Time32Col(shape=(), dflt=0, pos=1), > "Value": Float32Col(shape=(), dflt=0.0, pos=2), > "Status": StringCol(itemsize=10, shape=(), dflt='', pos=3)} > byteorder := 'little' > chunkshape := (65536,) > autoIndex := True > colindexes := { > "DateTime": Index(9, full, shuffle, zlib(1)).is_CSI=True} > > *Cheers!* > > > ------------------------------------------------------------------------------ > Live Security Virtual Conference > Exclusive live event will cover all the ways today's security and > threat landscape has changed and how IT managers can respond. Discussions > will include endpoint security, mobile security and the latest in malware > threats. http://www.accelacomm.com/jaw/sfrnl04242012/114/50122263/ > _______________________________________________ > Pytables-users mailing list > Pyt...@li... > https://lists.sourceforge.net/lists/listinfo/pytables-users > > |
From: Uwe M. <uwe...@df...> - 2012-05-18 13:00:18
|
Hi, I have several leaf nodes of the same table dtype: class VideoNode(tables.IsDescription): ... Not all tables in the HDF5 file are of the same type, however. How do I iterate over all leaves which are tables of the above class, while ignoring tables with different signatures? i.e. I'd like to write something like: <code> f = tables.openFile(...) foo = f.walkNodes('/', classname='VideoNode') </code> which does not work because only the class name is "Table"... or <code> bar = filter(isinstance(x, VideoNode) for x in f.walkNodes('/', 'Table'))) </code> which does not work, because x is never an instance of VideoNode. Any ideas? Thanks in advance Uwe |
From: nikola s. <nid...@gm...> - 2012-05-18 10:01:10
|
*Hi,* Couple days ago, I make some experiments with pytables. I was curious about reading and writing speed for my future project. So, I decided make some tests. In my hdf5 files I have only one table named *Table_1*. I started tests with one million rows and after that keep continue testing with 100 000 000 and 500 000 000. This is how looks table structure: /Table_1 (Table(500000000,)) '' description := { "Device_ID": StringCol(itemsize=14, shape=(), dflt='', pos=0), "DateTime": Time32Col(shape=(), dflt=0, pos=1), "Value": Float32Col(shape=(), dflt=0.0, pos=2), "Status": StringCol(itemsize=10, shape=(), dflt='', pos=3)} byteorder := 'little' chunkshape := (2048,0) autoIndex := True colindexes := { "DateTime": Index(9, full, shuffle, zlib(1)).is_CSI=True} I didn't change chunkshape (default from creating table chunkshape=(2048,0)). Only thing I did is creating index on column DateTime. Everything worked fine. But, after 500 000 000 rows, I decide compare this table and table whith chunkshape=(65536). So I copy this table in other hdf5 file using ptrepack tool: ptrepack --chunkshape='(65536,0)' /home/azura/a.h5:/Table_1 /home/azura/b.h5:/ My new table work fine until I create index (CSIndex()) on DateTime column. Index creation was successful, but calling methods as *where(), getWhereList()* throws following exception: query = '(DateTime > 1293836400.0) & (DateTime < 1297292400.0)' a = numpy.array([ (x['Device_ID'],x['DateTime'],x['Value']) for x in tbl.where(query) ]) Traceback (most recent call last): File "<pyshell#100>", line 1, in <module> a = numpy.array([ (x['Device_ID'],x['DateTime'],x['Value']) for x in tbl.where(query) ]) File "tableExtension.pyx", line 858, in tables.tableExtension.Row.__next__ (tables/tableExtension.c:7788) File "tableExtension.pyx", line 879, in tables.tableExtension.Row.__next__indexed (tables/tableExtension.c:7922) AssertionError Then I decide make same table without ptrepack tool. So I created new table and fill with 500 000 000 rows (same chunkshape, same record structure). Everythings works fine, so my conclusion is that there is a bug in ptrepack tool. Note that exception appear in copied table after creating CS index. I'm just curious about this. What can be wrong? I'm using Ubuntu 12.04TLS with ext4 Processor: Intel® Core™ i3 CPU M 380 @ 2.53GHz × 4 RAM: 4GB HARD DISK: 500GB -=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-= PyTables version: 2.3.1 HDF5 version: 1.8.4-patch1 NumPy version: 1.6.0 Numexpr version: 2.0.1 (not using Intel's VML/MKL) Zlib version: 1.2.3.4 (in Python interpreter) Blosc version: 1.1.2 (2010-11-04) Cython version: 0.16 Python version: 2.7.3 (default, Apr 20 2012, 22:44:07) [GCC 4.6.3] Platform: linux2-i686 Byte-ordering: little Detected cores: 4 -=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-= File(filename=/home/azura/b.h5, title='', mode='a', rootUEP='/', filters=Filters(complevel=0, shuffle=False, fletcher32=False)) / (RootGroup) '' /Table_1 (Table(500000000,)) '' description := { "Device_ID": StringCol(itemsize=14, shape=(), dflt='', pos=0), "DateTime": Time32Col(shape=(), dflt=0, pos=1), "Value": Float32Col(shape=(), dflt=0.0, pos=2), "Status": StringCol(itemsize=10, shape=(), dflt='', pos=3)} byteorder := 'little' chunkshape := (65536,) autoIndex := True colindexes := { "DateTime": Index(9, full, shuffle, zlib(1)).is_CSI=True} *Cheers!* |
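A skeleton of the kind of self-contained reproduction requested earlier in the thread; the row count, column values and query bounds are placeholders, and whether this actually triggers the AssertionError presumably depends on the table size and index:

    import subprocess
    import tables

    class Record(tables.IsDescription):
        Device_ID = tables.StringCol(14, pos=0)
        DateTime = tables.Time32Col(pos=1)
        Value = tables.Float32Col(pos=2)
        Status = tables.StringCol(10, pos=3)

    # 1. Build a table like the one in the report (scale N up as needed).
    f = tables.openFile('a.h5', 'w')
    tbl = f.createTable('/', 'Table_1', Record)
    row = tbl.row
    for i in xrange(1000000):
        row['Device_ID'] = 'dev%02d' % (i % 10)
        row['DateTime'] = 1290000000 + i
        row['Value'] = float(i)
        row['Status'] = 'ok'
        row.append()
    tbl.flush()
    f.close()

    # 2. Copy it with a different chunkshape, as in the report.
    subprocess.check_call(['ptrepack', '--chunkshape=(65536,)',
                           'a.h5:/Table_1', 'b.h5:/'])

    # 3. Index the copy and run an indexed query on it.
    f = tables.openFile('b.h5', 'a')
    tbl = f.root.Table_1
    tbl.cols.DateTime.createCSIndex()
    rows = tbl.getWhereList('(DateTime > 1290000100) & (DateTime < 1290000200)')
    print len(rows)
    f.close()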
From: Francesc A. <fa...@py...> - 2012-05-14 21:11:39
|
On 5/14/12 3:12 PM, Anthony Scopatz wrote: > > > On Mon, May 14, 2012 at 3:05 PM, Francesc Alted <fa...@py... > <mailto:fa...@py...>> wrote: > > [snip] > > However, do not expect to use all your cores at full speed in this > cases, as the reductions in numexpr can only make use of one > thread (this is because this has not been implemented yet, not due > to a intrinsic limitation of numexpr). > > > Hello Francesc, > > Not to side track the discussion too much, but is there a ticket open > for this in numexpr? There is: http://code.google.com/p/numexpr/issues/detail?id=73 > It seems that at least for certain reductions (sum, mult, etc), > splitting this up over many cores would be pretty easy. I may to > wrong about this though ;) Apparently should be easy, but the reality proves it to be a bit harder ;) I remember to spent some quality time on this, but did not get able to solve the problem. But it is *solvable* for sure. Anyway, after looking into the ticket above, the next code could be faster: fn_str = '(a - (b + %s))**2' % db expr = Expr(fn_str,uservars=uv) # returning the "sum of squares" return expr.eval().sum() Which is basically what you was suggesting: using numpy.sum(). But definitely, the elegant solution would be to make reductions use multiple cores on numexpr. Francesc > > Be Well > Anthony > > > Francesc > > >> >> I hope this helps. If you need other tips on speeding up the >> sum operation, please let us know. >> >> Be Well >> Anthony >> >> Timer unit: 1e-06 s >> >> File: pytables_expr_test.py >> Function: fn at line 66 >> Total time: 1.63254 s >> >> Line # Hits Time Per Hit % Time Line Contents >> ============================================================== >> 66 def fn(p, h5table): >> 67 ''' >> 68 actual >> function we are going to minimize. It consists of >> 69 the >> pytables Table object and a list of parameters. >> 70 ''' >> 71 1 14 14.0 0.0 uv = >> h5table.colinstances >> 72 >> 73 # store >> parameters in a dict object with names >> 74 # like p0, >> p1, p2, etc. so they can be used in >> 75 # the Expr >> object. >> 76 4 21 5.2 0.0 for i in >> xrange(len(p)): >> 77 3 19 6.3 0.0 k = >> 'p'+str(i) >> 78 3 14 4.7 0.0 uv[k] = p[i] >> 79 >> 80 # systematic >> shift on b is a polynomial in a >> 81 1 4 4.0 0.0 db = 'p0 * >> a*a + p1 * a + p2' >> 82 >> 83 # the >> element-wise function >> 84 1 6 6.0 0.0 fn_str = '(a >> - (b + %s))**2' % db >> 85 >> 86 1 16427 16427.0 1.0 expr = >> Expr(fn_str,uservars=uv) >> 87 1 21438 21438.0 1.3 expr.eval() >> 88 >> 89 # returning >> the "sum of squares" >> 90 1 1594600 1594600.0 97.7 return >> sum(expr) >> >> >> >> >> On Mon, May 14, 2012 at 1:59 PM, Johann Goetz <jg...@uc... >> <mailto:jg...@uc...>> wrote: >> >> SHORT VERSION: >> >> Please take a look at the fn() function in the attached file >> (pasted below). When I run this with 10M events or more I >> notice that the total CPU usage never goes above the >> percentage I get using single-threaded eval(). Am I at some >> other limit or can I improve performance by doing something else? >> >> LONG VERSION: >> >> I have been trying to use the tables.Expr object to speed up >> a sophisticated calculation over an entire dataset (a >> pytables Table object). The calculation took so long that I >> had to create a simple example to make sure I knew what I was >> doing. I apologize in advance for the lengthy code below, but >> I wanted the example to mimic exactly what I'm trying to do >> and to be totally self-contained. 
>> >> I have attached a file (and pasted it below) in which I >> create a hdf5 file with a single large Table of two columns. >> As you can see, I'm not worried about writing speed at all - >> I'm concerned about read speed. >> >> I would like to draw your attention to the fn() function. >> This is where I evaluate a "chi-squared" value on the >> dataset. My strategy is to populate the >> "h5table.colinstances" dict object with several parameters >> which I call p0, p1, etc and then create the Expr object >> using these and the column names from the Table. >> >> If I create 10M rows (77 MB file) in the Table (with the >> command below), the evaluation seems to be CPU bound (one of >> my cores is at 100% - the others are idle) and it takes about >> 7 seconds (about 10 MB/s). Similarly, I get about 70 seconds >> for 100M events. >> >> python pytables_expr_test.py 10000000 >> python pytables_expr_test.py 100000000 >> >> So my question: It seems to me that I am not fully using the >> CPU power available on my computer (see next paragraph). Am I >> missing something or doing something wrong in the fn() >> function below? >> >> A few side-notes: My hard-disk is capable of over 200 MB/s in >> sequential reading (sustained and tested with large files >> using the iozone program), I have two 4-core CPU's on this >> machine but the total CPU usage during eval() never goes >> above the percentage I get using single-threaded mode with >> "numexpr.set_num_threads(1)". >> >> I am using pytables 2.3.1 and numexpr 2.0.1 >> >> -- >> Johann T. Goetz, PhD. >> <http://sites.google.com/site/theodoregoetz/> >> jg...@uc... <mailto:jg...@uc...> >> Nefkens Group, UCLA Dept. of Physics & Astronomy >> Hall-B, Jefferson Lab, Newport News, VA >> >> >> ### BEGIN file: pytables_expr_test.py >> >> from tables import openFile, Expr >> >> ### Control of the number of threads used when issuing the >> ### Expr::eval() command >> #import numexpr >> #numexpr.set_num_threads(2) >> >> def create_ntuple_file(filename, npoints, pmodel): >> ''' >> create an hdf5 file with a single table which contains >> npoints number of rows of type row_t (defined below) >> ''' >> from numpy import random, poly1d >> from tables import IsDescription, Float32Col >> >> class row_t(IsDescription): >> ''' >> the rows of the table to be created >> ''' >> a = Float32Col() >> b = Float32Col() >> >> def append_row(h5row, pmodel): >> ''' >> consider this a single "event" being appended >> to the dataset (table) >> ''' >> h5row['a'] = random.uniform(0,10) >> >> h5row['b'] = h5row['a'] # reality (or model) >> h5row['b'] = h5row['b'] - poly1d(pmodel)(h5row['a']) >> # systematics >> h5row['b'] = h5row['b'] + random.normal(0,0.1) # noise >> >> h5row.append() >> >> h5file = openFile(filename, 'w') >> h5table = h5file.createTable('/', 'table', row_t, "Data") >> h5row = h5table.row >> >> # recording data to file... >> for n in xrange(npoints): >> append_row(h5row, pmodel) >> >> h5file.close() >> >> def create_ntuple_file_if_needed(filename, npoints, pmodel): >> ''' >> looks to see if the file is already there and if so, >> it makes sure its the right size. Otherwise, it >> removes the existing file and creates a new one. 
>> ''' >> from os import path, remove >> >> print 'model parameters:', pmodel >> >> if path.exists(filename): >> h5file = openFile(filename, 'r') >> h5table = h5file.root.table >> if len(h5table) != npoints: >> h5file.close() >> remove(filename) >> >> if not path.exists(filename): >> create_ntuple_file(filename, npoints, pmodel) >> >> def fn(p, h5table): >> ''' >> actual function we are going to minimize. It consists of >> the pytables Table object and a list of parameters. >> ''' >> uv = h5table.colinstances >> >> # store parameters in a dict object with names >> # like p0, p1, p2, etc. so they can be used in >> # the Expr object. >> for i in xrange(len(p)): >> k = 'p'+str(i) >> uv[k] = p[i] >> >> # systematic shift on b is a polynomial in a >> db = 'p0 * a*a + p1 * a + p2' >> >> # the element-wise function >> fn_str = '(a - (b + %s))**2' % db >> >> expr = Expr(fn_str,uservars=uv) >> expr.eval() >> >> # returning the "sum of squares" >> return sum(expr) >> >> if __name__ == '__main__': >> ''' >> usage: >> python pytables_expr_test.py [npoints] >> >> Hint: try this with 10M points >> ''' >> from sys import argv >> from time import time >> >> npoints = 1000000 >> if len(argv) > 1: >> npoints = int(argv[1]) >> >> filename = 'tmp.'+str(npoints)+'.hdf5' >> >> pmodel = [-0.04,0.002,0.001] >> >> print 'creating file (if it doesn\'t exist)...' >> create_ntuple_file_if_needed(filename, npoints, pmodel) >> >> h5file = openFile(filename, 'r') >> h5table = h5file.root.table >> >> print 'evaluating function' >> starttime = time() >> print fn([0.,0.,0.], h5table) >> print 'evaluated file in',time()-starttime,'seconds.' >> >> #EOF >> >> >> ------------------------------------------------------------------------------ >> Live Security Virtual Conference >> Exclusive live event will cover all the ways today's security and >> threat landscape has changed and how IT managers can respond. >> Discussions >> will include endpoint security, mobile security and the >> latest in malware >> threats. >> http://www.accelacomm.com/jaw/sfrnl04242012/114/50122263/ >> _______________________________________________ >> Pytables-users mailing list >> Pyt...@li... >> <mailto:Pyt...@li...> >> https://lists.sourceforge.net/lists/listinfo/pytables-users >> >> >> >> >> ------------------------------------------------------------------------------ >> Live Security Virtual Conference >> Exclusive live event will cover all the ways today's security and >> threat landscape has changed and how IT managers can respond. Discussions >> will include endpoint security, mobile security and the latest in malware >> threats.http://www.accelacomm.com/jaw/sfrnl04242012/114/50122263/ >> >> >> _______________________________________________ >> Pytables-users mailing list >> Pyt...@li... <mailto:Pyt...@li...> >> https://lists.sourceforge.net/lists/listinfo/pytables-users > > > -- > Francesc Alted > > > ------------------------------------------------------------------------------ > Live Security Virtual Conference > Exclusive live event will cover all the ways today's security and > threat landscape has changed and how IT managers can respond. > Discussions > will include endpoint security, mobile security and the latest in > malware > threats. http://www.accelacomm.com/jaw/sfrnl04242012/114/50122263/ > _______________________________________________ > Pytables-users mailing list > Pyt...@li... 
> <mailto:Pyt...@li...> > https://lists.sourceforge.net/lists/listinfo/pytables-users > > > > > ------------------------------------------------------------------------------ > Live Security Virtual Conference > Exclusive live event will cover all the ways today's security and > threat landscape has changed and how IT managers can respond. Discussions > will include endpoint security, mobile security and the latest in malware > threats. http://www.accelacomm.com/jaw/sfrnl04242012/114/50122263/ > > > _______________________________________________ > Pytables-users mailing list > Pyt...@li... > https://lists.sourceforge.net/lists/listinfo/pytables-users -- Francesc Alted |
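Pulling Francesc's suggestion out of the quoted thread into one place, the corrected fn() would look roughly like the sketch below. The parameter names p0..p2 and the columns a and b come from Johann's script; the only changes are that the evaluated array is actually bound and the reduction is done with NumPy rather than Python's built-in sum().

from tables import Expr

def fn(p, h5table):
    '''Sum of squares of (a - (b + polynomial in a)) over the whole table.'''
    uv = h5table.colinstances
    for i in xrange(len(p)):
        uv['p' + str(i)] = p[i]           # expose p0, p1, p2 to the expression

    db = 'p0 * a*a + p1 * a + p2'         # systematic shift on b
    expr = Expr('(a - (b + %s))**2' % db, uservars=uv)

    # eval() returns a NumPy array; ndarray.sum() reduces it in C, whereas
    # the built-in sum() iterates over it one element at a time.
    return expr.eval().sum()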
From: Anthony S. <sc...@gm...> - 2012-05-14 20:12:56
|
On Mon, May 14, 2012 at 3:05 PM, Francesc Alted <fa...@py...> wrote: [snip] However, do not expect to use all your cores at full speed in this cases, > as the reductions in numexpr can only make use of one thread (this is > because this has not been implemented yet, not due to a intrinsic > limitation of numexpr). > Hello Francesc, Not to side track the discussion too much, but is there a ticket open for this in numexpr? It seems that at least for certain reductions (sum, mult, etc), splitting this up over many cores would be pretty easy. I may to wrong about this though ;) Be Well Anthony > > Francesc > > > > > I hope this helps. If you need other tips on speeding up the > sum operation, please let us know. > > Be Well > Anthony > > Timer unit: 1e-06 s > > File: pytables_expr_test.py > Function: fn at line 66 > Total time: 1.63254 s > > Line # Hits Time Per Hit % Time Line Contents > ============================================================== > 66 def fn(p, h5table): > 67 ''' > 68 actual function > we are going to minimize. It consists of > 69 the pytables > Table object and a list of parameters. > 70 ''' > 71 1 14 14.0 0.0 uv = > h5table.colinstances > 72 > 73 # store parameters in > a dict object with names > 74 # like p0, p1, p2, > etc. so they can be used in > 75 # the Expr object. > 76 4 21 5.2 0.0 for i in > xrange(len(p)): > 77 3 19 6.3 0.0 k = 'p'+str(i) > 78 3 14 4.7 0.0 uv[k] = p[i] > 79 > 80 # systematic shift on > b is a polynomial in a > 81 1 4 4.0 0.0 db = 'p0 * a*a + p1 > * a + p2' > 82 > 83 # the element-wise > function > 84 1 6 6.0 0.0 fn_str = '(a - (b + > %s))**2' % db > 85 > 86 1 16427 16427.0 1.0 expr = > Expr(fn_str,uservars=uv) > 87 1 21438 21438.0 1.3 expr.eval() > 88 > 89 # returning the "sum > of squares" > 90 1 1594600 1594600.0 97.7 return sum(expr) > > > > > On Mon, May 14, 2012 at 1:59 PM, Johann Goetz <jg...@uc...> wrote: > >> SHORT VERSION: >> >> Please take a look at the fn() function in the attached file (pasted >> below). When I run this with 10M events or more I notice that the total CPU >> usage never goes above the percentage I get using single-threaded eval(). >> Am I at some other limit or can I improve performance by doing something >> else? >> >> LONG VERSION: >> >> I have been trying to use the tables.Expr object to speed up a >> sophisticated calculation over an entire dataset (a pytables Table object). >> The calculation took so long that I had to create a simple example to make >> sure I knew what I was doing. I apologize in advance for the lengthy code >> below, but I wanted the example to mimic exactly what I'm trying to do and >> to be totally self-contained. >> >> I have attached a file (and pasted it below) in which I create a hdf5 >> file with a single large Table of two columns. As you can see, I'm not >> worried about writing speed at all - I'm concerned about read speed. >> >> I would like to draw your attention to the fn() function. This is where I >> evaluate a "chi-squared" value on the dataset. My strategy is to populate >> the "h5table.colinstances" dict object with several parameters which I call >> p0, p1, etc and then create the Expr object using these and the column >> names from the Table. >> >> If I create 10M rows (77 MB file) in the Table (with the command below), >> the evaluation seems to be CPU bound (one of my cores is at 100% - the >> others are idle) and it takes about 7 seconds (about 10 MB/s). Similarly, I >> get about 70 seconds for 100M events. 
>> >> python pytables_expr_test.py 10000000 >> python pytables_expr_test.py 100000000 >> >> So my question: It seems to me that I am not fully using the CPU power >> available on my computer (see next paragraph). Am I missing something or >> doing something wrong in the fn() function below? >> >> A few side-notes: My hard-disk is capable of over 200 MB/s in sequential >> reading (sustained and tested with large files using the iozone program), I >> have two 4-core CPU's on this machine but the total CPU usage during eval() >> never goes above the percentage I get using single-threaded mode with >> "numexpr.set_num_threads(1)". >> >> I am using pytables 2.3.1 and numexpr 2.0.1 >> >> -- >> Johann T. Goetz, PhD. <http://sites.google.com/site/theodoregoetz/> >> jg...@uc... >> Nefkens Group, UCLA Dept. of Physics & Astronomy >> Hall-B, Jefferson Lab, Newport News, VA >> >> >> ### BEGIN file: pytables_expr_test.py >> >> from tables import openFile, Expr >> >> ### Control of the number of threads used when issuing the >> ### Expr::eval() command >> #import numexpr >> #numexpr.set_num_threads(2) >> >> def create_ntuple_file(filename, npoints, pmodel): >> ''' >> create an hdf5 file with a single table which contains >> npoints number of rows of type row_t (defined below) >> ''' >> from numpy import random, poly1d >> from tables import IsDescription, Float32Col >> >> class row_t(IsDescription): >> ''' >> the rows of the table to be created >> ''' >> a = Float32Col() >> b = Float32Col() >> >> def append_row(h5row, pmodel): >> ''' >> consider this a single "event" being appended >> to the dataset (table) >> ''' >> h5row['a'] = random.uniform(0,10) >> >> h5row['b'] = h5row['a'] # reality (or model) >> h5row['b'] = h5row['b'] - poly1d(pmodel)(h5row['a']) # systematics >> h5row['b'] = h5row['b'] + random.normal(0,0.1) # noise >> >> h5row.append() >> >> h5file = openFile(filename, 'w') >> h5table = h5file.createTable('/', 'table', row_t, "Data") >> h5row = h5table.row >> >> # recording data to file... >> for n in xrange(npoints): >> append_row(h5row, pmodel) >> >> h5file.close() >> >> def create_ntuple_file_if_needed(filename, npoints, pmodel): >> ''' >> looks to see if the file is already there and if so, >> it makes sure its the right size. Otherwise, it >> removes the existing file and creates a new one. >> ''' >> from os import path, remove >> >> print 'model parameters:', pmodel >> >> if path.exists(filename): >> h5file = openFile(filename, 'r') >> h5table = h5file.root.table >> if len(h5table) != npoints: >> h5file.close() >> remove(filename) >> >> if not path.exists(filename): >> create_ntuple_file(filename, npoints, pmodel) >> >> def fn(p, h5table): >> ''' >> actual function we are going to minimize. It consists of >> the pytables Table object and a list of parameters. >> ''' >> uv = h5table.colinstances >> >> # store parameters in a dict object with names >> # like p0, p1, p2, etc. so they can be used in >> # the Expr object. 
>> for i in xrange(len(p)): >> k = 'p'+str(i) >> uv[k] = p[i] >> >> # systematic shift on b is a polynomial in a >> db = 'p0 * a*a + p1 * a + p2' >> >> # the element-wise function >> fn_str = '(a - (b + %s))**2' % db >> >> expr = Expr(fn_str,uservars=uv) >> expr.eval() >> >> # returning the "sum of squares" >> return sum(expr) >> >> if __name__ == '__main__': >> ''' >> usage: >> python pytables_expr_test.py [npoints] >> >> Hint: try this with 10M points >> ''' >> from sys import argv >> from time import time >> >> npoints = 1000000 >> if len(argv) > 1: >> npoints = int(argv[1]) >> >> filename = 'tmp.'+str(npoints)+'.hdf5' >> >> pmodel = [-0.04,0.002,0.001] >> >> print 'creating file (if it doesn\'t exist)...' >> create_ntuple_file_if_needed(filename, npoints, pmodel) >> >> h5file = openFile(filename, 'r') >> h5table = h5file.root.table >> >> print 'evaluating function' >> starttime = time() >> print fn([0.,0.,0.], h5table) >> print 'evaluated file in',time()-starttime,'seconds.' >> >> #EOF >> >> >> >> ------------------------------------------------------------------------------ >> Live Security Virtual Conference >> Exclusive live event will cover all the ways today's security and >> threat landscape has changed and how IT managers can respond. Discussions >> will include endpoint security, mobile security and the latest in malware >> threats. http://www.accelacomm.com/jaw/sfrnl04242012/114/50122263/ >> _______________________________________________ >> Pytables-users mailing list >> Pyt...@li... >> https://lists.sourceforge.net/lists/listinfo/pytables-users >> >> > > > ------------------------------------------------------------------------------ > Live Security Virtual Conference > Exclusive live event will cover all the ways today's security and > threat landscape has changed and how IT managers can respond. Discussions > will include endpoint security, mobile security and the latest in malware > threats. http://www.accelacomm.com/jaw/sfrnl04242012/114/50122263/ > > > > _______________________________________________ > Pytables-users mailing lis...@li...https://lists.sourceforge.net/lists/listinfo/pytables-users > > > > -- > Francesc Alted > > > > ------------------------------------------------------------------------------ > Live Security Virtual Conference > Exclusive live event will cover all the ways today's security and > threat landscape has changed and how IT managers can respond. Discussions > will include endpoint security, mobile security and the latest in malware > threats. http://www.accelacomm.com/jaw/sfrnl04242012/114/50122263/ > _______________________________________________ > Pytables-users mailing list > Pyt...@li... > https://lists.sourceforge.net/lists/listinfo/pytables-users > > |
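Until reductions inside numexpr can use several threads, one possible workaround (not proposed in the thread, so treat it as a sketch) is to keep the element-wise arithmetic in numexpr, which is multi-threaded, and accumulate the sum block by block; this also keeps memory bounded for tables that do not fit in RAM. The file name and column names are assumed from Johann's test script.

import numexpr
import tables

h5file = tables.openFile('tmp.10000000.hdf5', 'r')    # name produced by the test script
tbl = h5file.root.table

p0, p1, p2 = 0.0, 0.0, 0.0
total = 0.0
chunk = 1000000                                        # rows held in memory at once
for start in xrange(0, tbl.nrows, chunk):
    block = tbl.read(start, start + chunk)             # structured array with 'a' and 'b'
    # numexpr parallelizes the element-wise part; only the per-block .sum()
    # runs single-threaded in NumPy.
    total += numexpr.evaluate(
        '(a - (b + p0*a*a + p1*a + p2))**2',
        local_dict={'a': block['a'], 'b': block['b'],
                    'p0': p0, 'p1': p1, 'p2': p2}).sum()

print(total)
h5file.close()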
From: Francesc A. <fa...@py...> - 2012-05-14 20:05:13
|
On 5/14/12 2:51 PM, Anthony Scopatz wrote: > Hi Johann, > > Thanks for bring this up. I believe that I have determined that this > is not a PyTables / pthreads issue. Doing some profiling > npoints=1000000, I found that most of the time (97%) was being spent > in the sum() call (see below). This ratio doesn't change much with > different values of npoints. Since there is no implicit parallelism > here, I would recommend using numpy.sum() instead of Python's. Also, I have noticed that Johann is not using tables.Expr optimally, i.e. this code: fn_str = '(a - (b + %s))**2' % db expr = Expr(fn_str,uservars=uv) expr.eval() # [1] # returning the "sum of squares" return sum(expr) performs the evaluation of the expression and returns it as a NumPy object [1], but the result is not bound to any variable, so it is lost. A better version would be: fn_str = 'sum((a - (b + %s))**2)' % db expr = Expr(fn_str,uservars=uv) # returning the "sum of squares" return expr.eval() However, do not expect to use all your cores at full speed in this cases, as the reductions in numexpr can only make use of one thread (this is because this has not been implemented yet, not due to a intrinsic limitation of numexpr). Francesc > > I hope this helps. If you need other tips on speeding up the > sum operation, please let us know. > > Be Well > Anthony > > Timer unit: 1e-06 s > > File: pytables_expr_test.py > Function: fn at line 66 > Total time: 1.63254 s > > Line # Hits Time Per Hit % Time Line Contents > ============================================================== > 66 def fn(p, h5table): > 67 ''' > 68 actual > function we are going to minimize. It consists of > 69 the pytables > Table object and a list of parameters. > 70 ''' > 71 1 14 14.0 0.0 uv = > h5table.colinstances > 72 > 73 # store > parameters in a dict object with names > 74 # like p0, p1, > p2, etc. so they can be used in > 75 # the Expr object. > 76 4 21 5.2 0.0 for i in > xrange(len(p)): > 77 3 19 6.3 0.0 k = 'p'+str(i) > 78 3 14 4.7 0.0 uv[k] = p[i] > 79 > 80 # systematic > shift on b is a polynomial in a > 81 1 4 4.0 0.0 db = 'p0 * a*a + > p1 * a + p2' > 82 > 83 # the > element-wise function > 84 1 6 6.0 0.0 fn_str = '(a - (b > + %s))**2' % db > 85 > 86 1 16427 16427.0 1.0 expr = > Expr(fn_str,uservars=uv) > 87 1 21438 21438.0 1.3 expr.eval() > 88 > 89 # returning the > "sum of squares" > 90 1 1594600 1594600.0 97.7 return sum(expr) > > > > > On Mon, May 14, 2012 at 1:59 PM, Johann Goetz <jg...@uc... > <mailto:jg...@uc...>> wrote: > > SHORT VERSION: > > Please take a look at the fn() function in the attached file > (pasted below). When I run this with 10M events or more I notice > that the total CPU usage never goes above the percentage I get > using single-threaded eval(). Am I at some other limit or can I > improve performance by doing something else? > > LONG VERSION: > > I have been trying to use the tables.Expr object to speed up a > sophisticated calculation over an entire dataset (a pytables Table > object). The calculation took so long that I had to create a > simple example to make sure I knew what I was doing. I apologize > in advance for the lengthy code below, but I wanted the example to > mimic exactly what I'm trying to do and to be totally self-contained. > > I have attached a file (and pasted it below) in which I create a > hdf5 file with a single large Table of two columns. As you can > see, I'm not worried about writing speed at all - I'm concerned > about read speed. > > I would like to draw your attention to the fn() function. 
This is > where I evaluate a "chi-squared" value on the dataset. My strategy > is to populate the "h5table.colinstances" dict object with several > parameters which I call p0, p1, etc and then create the Expr > object using these and the column names from the Table. > > If I create 10M rows (77 MB file) in the Table (with the command > below), the evaluation seems to be CPU bound (one of my cores is > at 100% - the others are idle) and it takes about 7 seconds (about > 10 MB/s). Similarly, I get about 70 seconds for 100M events. > > python pytables_expr_test.py 10000000 > python pytables_expr_test.py 100000000 > > So my question: It seems to me that I am not fully using the CPU > power available on my computer (see next paragraph). Am I missing > something or doing something wrong in the fn() function below? > > A few side-notes: My hard-disk is capable of over 200 MB/s in > sequential reading (sustained and tested with large files using > the iozone program), I have two 4-core CPU's on this machine but > the total CPU usage during eval() never goes above the percentage > I get using single-threaded mode with "numexpr.set_num_threads(1)". > > I am using pytables 2.3.1 and numexpr 2.0.1 > > -- > Johann T. Goetz, PhD. <http://sites.google.com/site/theodoregoetz/> > jg...@uc... <mailto:jg...@uc...> > Nefkens Group, UCLA Dept. of Physics & Astronomy > Hall-B, Jefferson Lab, Newport News, VA > > > ### BEGIN file: pytables_expr_test.py > > from tables import openFile, Expr > > ### Control of the number of threads used when issuing the > ### Expr::eval() command > #import numexpr > #numexpr.set_num_threads(2) > > def create_ntuple_file(filename, npoints, pmodel): > ''' > create an hdf5 file with a single table which contains > npoints number of rows of type row_t (defined below) > ''' > from numpy import random, poly1d > from tables import IsDescription, Float32Col > > class row_t(IsDescription): > ''' > the rows of the table to be created > ''' > a = Float32Col() > b = Float32Col() > > def append_row(h5row, pmodel): > ''' > consider this a single "event" being appended > to the dataset (table) > ''' > h5row['a'] = random.uniform(0,10) > > h5row['b'] = h5row['a'] # reality (or model) > h5row['b'] = h5row['b'] - poly1d(pmodel)(h5row['a']) # > systematics > h5row['b'] = h5row['b'] + random.normal(0,0.1) # noise > > h5row.append() > > h5file = openFile(filename, 'w') > h5table = h5file.createTable('/', 'table', row_t, "Data") > h5row = h5table.row > > # recording data to file... > for n in xrange(npoints): > append_row(h5row, pmodel) > > h5file.close() > > def create_ntuple_file_if_needed(filename, npoints, pmodel): > ''' > looks to see if the file is already there and if so, > it makes sure its the right size. Otherwise, it > removes the existing file and creates a new one. > ''' > from os import path, remove > > print 'model parameters:', pmodel > > if path.exists(filename): > h5file = openFile(filename, 'r') > h5table = h5file.root.table > if len(h5table) != npoints: > h5file.close() > remove(filename) > > if not path.exists(filename): > create_ntuple_file(filename, npoints, pmodel) > > def fn(p, h5table): > ''' > actual function we are going to minimize. It consists of > the pytables Table object and a list of parameters. > ''' > uv = h5table.colinstances > > # store parameters in a dict object with names > # like p0, p1, p2, etc. so they can be used in > # the Expr object. 
> for i in xrange(len(p)): > k = 'p'+str(i) > uv[k] = p[i] > > # systematic shift on b is a polynomial in a > db = 'p0 * a*a + p1 * a + p2' > > # the element-wise function > fn_str = '(a - (b + %s))**2' % db > > expr = Expr(fn_str,uservars=uv) > expr.eval() > > # returning the "sum of squares" > return sum(expr) > > if __name__ == '__main__': > ''' > usage: > python pytables_expr_test.py [npoints] > > Hint: try this with 10M points > ''' > from sys import argv > from time import time > > npoints = 1000000 > if len(argv) > 1: > npoints = int(argv[1]) > > filename = 'tmp.'+str(npoints)+'.hdf5' > > pmodel = [-0.04,0.002,0.001] > > print 'creating file (if it doesn\'t exist)...' > create_ntuple_file_if_needed(filename, npoints, pmodel) > > h5file = openFile(filename, 'r') > h5table = h5file.root.table > > print 'evaluating function' > starttime = time() > print fn([0.,0.,0.], h5table) > print 'evaluated file in',time()-starttime,'seconds.' > > #EOF > > > ------------------------------------------------------------------------------ > Live Security Virtual Conference > Exclusive live event will cover all the ways today's security and > threat landscape has changed and how IT managers can respond. > Discussions > will include endpoint security, mobile security and the latest in > malware > threats. http://www.accelacomm.com/jaw/sfrnl04242012/114/50122263/ > _______________________________________________ > Pytables-users mailing list > Pyt...@li... > <mailto:Pyt...@li...> > https://lists.sourceforge.net/lists/listinfo/pytables-users > > > > > ------------------------------------------------------------------------------ > Live Security Virtual Conference > Exclusive live event will cover all the ways today's security and > threat landscape has changed and how IT managers can respond. Discussions > will include endpoint security, mobile security and the latest in malware > threats. http://www.accelacomm.com/jaw/sfrnl04242012/114/50122263/ > > > _______________________________________________ > Pytables-users mailing list > Pyt...@li... > https://lists.sourceforge.net/lists/listinfo/pytables-users -- Francesc Alted |
From: Anthony S. <sc...@gm...> - 2012-05-14 19:51:37
|
Hi Johann, Thanks for bring this up. I believe that I have determined that this is not a PyTables / pthreads issue. Doing some profiling npoints=1000000, I found that most of the time (97%) was being spent in the sum() call (see below). This ratio doesn't change much with different values of npoints. Since there is no implicit parallelism here, I would recommend using numpy.sum() instead of Python's. I hope this helps. If you need other tips on speeding up the sum operation, please let us know. Be Well Anthony Timer unit: 1e-06 s File: pytables_expr_test.py Function: fn at line 66 Total time: 1.63254 s Line # Hits Time Per Hit % Time Line Contents ============================================================== 66 def fn(p, h5table): 67 ''' 68 actual function we are going to minimize. It consists of 69 the pytables Table object and a list of parameters. 70 ''' 71 1 14 14.0 0.0 uv = h5table.colinstances 72 73 # store parameters in a dict object with names 74 # like p0, p1, p2, etc. so they can be used in 75 # the Expr object. 76 4 21 5.2 0.0 for i in xrange(len(p)): 77 3 19 6.3 0.0 k = 'p'+str(i) 78 3 14 4.7 0.0 uv[k] = p[i] 79 80 # systematic shift on b is a polynomial in a 81 1 4 4.0 0.0 db = 'p0 * a*a + p1 * a + p2' 82 83 # the element-wise function 84 1 6 6.0 0.0 fn_str = '(a - (b + %s))**2' % db 85 86 1 16427 16427.0 1.0 expr = Expr(fn_str,uservars=uv) 87 1 21438 21438.0 1.3 expr.eval() 88 89 # returning the "sum of squares" 90 1 1594600 1594600.0 97.7 return sum(expr) On Mon, May 14, 2012 at 1:59 PM, Johann Goetz <jg...@uc...> wrote: > SHORT VERSION: > > Please take a look at the fn() function in the attached file (pasted > below). When I run this with 10M events or more I notice that the total CPU > usage never goes above the percentage I get using single-threaded eval(). > Am I at some other limit or can I improve performance by doing something > else? > > LONG VERSION: > > I have been trying to use the tables.Expr object to speed up a > sophisticated calculation over an entire dataset (a pytables Table object). > The calculation took so long that I had to create a simple example to make > sure I knew what I was doing. I apologize in advance for the lengthy code > below, but I wanted the example to mimic exactly what I'm trying to do and > to be totally self-contained. > > I have attached a file (and pasted it below) in which I create a hdf5 file > with a single large Table of two columns. As you can see, I'm not worried > about writing speed at all - I'm concerned about read speed. > > I would like to draw your attention to the fn() function. This is where I > evaluate a "chi-squared" value on the dataset. My strategy is to populate > the "h5table.colinstances" dict object with several parameters which I call > p0, p1, etc and then create the Expr object using these and the column > names from the Table. > > If I create 10M rows (77 MB file) in the Table (with the command below), > the evaluation seems to be CPU bound (one of my cores is at 100% - the > others are idle) and it takes about 7 seconds (about 10 MB/s). Similarly, I > get about 70 seconds for 100M events. > > python pytables_expr_test.py 10000000 > python pytables_expr_test.py 100000000 > > So my question: It seems to me that I am not fully using the CPU power > available on my computer (see next paragraph). Am I missing something or > doing something wrong in the fn() function below? 
> > A few side-notes: My hard-disk is capable of over 200 MB/s in sequential > reading (sustained and tested with large files using the iozone program), I > have two 4-core CPU's on this machine but the total CPU usage during eval() > never goes above the percentage I get using single-threaded mode with > "numexpr.set_num_threads(1)". > > I am using pytables 2.3.1 and numexpr 2.0.1 > > -- > Johann T. Goetz, PhD. <http://sites.google.com/site/theodoregoetz/> > jg...@uc... > Nefkens Group, UCLA Dept. of Physics & Astronomy > Hall-B, Jefferson Lab, Newport News, VA > > > ### BEGIN file: pytables_expr_test.py > > from tables import openFile, Expr > > ### Control of the number of threads used when issuing the > ### Expr::eval() command > #import numexpr > #numexpr.set_num_threads(2) > > def create_ntuple_file(filename, npoints, pmodel): > ''' > create an hdf5 file with a single table which contains > npoints number of rows of type row_t (defined below) > ''' > from numpy import random, poly1d > from tables import IsDescription, Float32Col > > class row_t(IsDescription): > ''' > the rows of the table to be created > ''' > a = Float32Col() > b = Float32Col() > > def append_row(h5row, pmodel): > ''' > consider this a single "event" being appended > to the dataset (table) > ''' > h5row['a'] = random.uniform(0,10) > > h5row['b'] = h5row['a'] # reality (or model) > h5row['b'] = h5row['b'] - poly1d(pmodel)(h5row['a']) # systematics > h5row['b'] = h5row['b'] + random.normal(0,0.1) # noise > > h5row.append() > > h5file = openFile(filename, 'w') > h5table = h5file.createTable('/', 'table', row_t, "Data") > h5row = h5table.row > > # recording data to file... > for n in xrange(npoints): > append_row(h5row, pmodel) > > h5file.close() > > def create_ntuple_file_if_needed(filename, npoints, pmodel): > ''' > looks to see if the file is already there and if so, > it makes sure its the right size. Otherwise, it > removes the existing file and creates a new one. > ''' > from os import path, remove > > print 'model parameters:', pmodel > > if path.exists(filename): > h5file = openFile(filename, 'r') > h5table = h5file.root.table > if len(h5table) != npoints: > h5file.close() > remove(filename) > > if not path.exists(filename): > create_ntuple_file(filename, npoints, pmodel) > > def fn(p, h5table): > ''' > actual function we are going to minimize. It consists of > the pytables Table object and a list of parameters. > ''' > uv = h5table.colinstances > > # store parameters in a dict object with names > # like p0, p1, p2, etc. so they can be used in > # the Expr object. > for i in xrange(len(p)): > k = 'p'+str(i) > uv[k] = p[i] > > # systematic shift on b is a polynomial in a > db = 'p0 * a*a + p1 * a + p2' > > # the element-wise function > fn_str = '(a - (b + %s))**2' % db > > expr = Expr(fn_str,uservars=uv) > expr.eval() > > # returning the "sum of squares" > return sum(expr) > > if __name__ == '__main__': > ''' > usage: > python pytables_expr_test.py [npoints] > > Hint: try this with 10M points > ''' > from sys import argv > from time import time > > npoints = 1000000 > if len(argv) > 1: > npoints = int(argv[1]) > > filename = 'tmp.'+str(npoints)+'.hdf5' > > pmodel = [-0.04,0.002,0.001] > > print 'creating file (if it doesn\'t exist)...' > create_ntuple_file_if_needed(filename, npoints, pmodel) > > h5file = openFile(filename, 'r') > h5table = h5file.root.table > > print 'evaluating function' > starttime = time() > print fn([0.,0.,0.], h5table) > print 'evaluated file in',time()-starttime,'seconds.' 
> > #EOF > > > > ------------------------------------------------------------------------------ > Live Security Virtual Conference > Exclusive live event will cover all the ways today's security and > threat landscape has changed and how IT managers can respond. Discussions > will include endpoint security, mobile security and the latest in malware > threats. http://www.accelacomm.com/jaw/sfrnl04242012/114/50122263/ > _______________________________________________ > Pytables-users mailing list > Pyt...@li... > https://lists.sourceforge.net/lists/listinfo/pytables-users > > |
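The gap Anthony's profile points to is easy to reproduce in isolation. In the sketch below, a million-element float32 array stands in for what expr.eval() returns; the exact timings will vary by machine, but the built-in sum() is typically orders of magnitude slower than the NumPy reduction.

import numpy as np
from timeit import timeit

values = np.random.normal(size=1000000).astype('float32')   # stand-in for expr.eval()

t_builtin = timeit(lambda: sum(values), number=1)    # Python loops over a million scalars
t_numpy = timeit(lambda: values.sum(), number=1)     # one vectorized C reduction

print('builtin sum: %.4f s, numpy sum: %.4f s' % (t_builtin, t_numpy))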
From: Anthony S. <sc...@gm...> - 2012-05-11 14:24:05
|
Hello Nikola, In general, larger chunk sizes will increase read speed. Additionally, your problem sounds like a perfect place to use compression, which can both decrease storage space and increase read speed (use Blosc compression for this). Please refer to [1] for more information. In general, if you know *a priori* that you have a hard maximum table size that you will never go over, you can simply set your chunksize to this value. On the other hand, if you know a minimum size that you will be removing, and this size is "large enough", then it sometimes makes sense to use this as the chunksize too. Be Well Anthony 1. http://pytables.github.com/usersguide/optimization.html On Fri, May 11, 2012 at 5:15 AM, nikola stevanovic <nid...@gm...> wrote: > *Hi everyone, * > > I'm new member and it's nice to meet you all. > I need some advices about my work with pytables. The problem is next. I'm > working on some kind of database using pytables and of course hdf5 format. > I created table with *six columns, row size 92B*. One column in table is > Time32Col. This column will be *indexed*. Table *will be updated* every > couple days (rows will be appended on existing table). *Between every > update users can create queries on table and consume data*. My question > is how efficiently balance chunksize between updates, because numbers of > rows in table will be start from *0 to 10 000 000 000* during the time? > After this number I will start archiving process, i.e. for example remove > first five billions rows and store in some other table for archiving. Of > course, I need this balance because *reading speed*. So, what is most > efficient way for setting chunksize for my problem? Sorry for my english. > > * > Thanks for advice guys. > Cheers! > Nikola* > > > ------------------------------------------------------------------------------ > Live Security Virtual Conference > Exclusive live event will cover all the ways today's security and > threat landscape has changed and how IT managers can respond. Discussions > will include endpoint security, mobile security and the latest in malware > threats. http://www.accelacomm.com/jaw/sfrnl04242012/114/50122263/ > _______________________________________________ > Pytables-users mailing list > Pyt...@li... > https://lists.sourceforge.net/lists/listinfo/pytables-users > > |
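A minimal sketch of the kind of setup described above, reusing the column layout from Nikola's table. The file name, compression level and expectedrows figure are placeholders, and whether Blosc plus an expectedrows hint (or an explicit chunkshape) is the best combination would have to be measured on the real data.

import tables

class Reading(tables.IsDescription):
    Device_ID = tables.StringCol(14)
    DateTime = tables.Time32Col()
    Value = tables.Float32Col()
    Status = tables.StringCol(10)

h5file = tables.openFile('readings.h5', 'w')                 # hypothetical file name
filters = tables.Filters(complevel=5, complib='blosc', shuffle=True)

# expectedrows lets PyTables pick a chunkshape suited to a very large table;
# alternatively an explicit chunkshape=(65536,) could be passed.
table = h5file.createTable('/', 'Table_1', Reading, 'sensor readings',
                           filters=filters, expectedrows=1000000000)

table.cols.DateTime.createCSIndex()   # completely sorted index for time-range queries
h5file.close()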
From: nikola s. <nid...@gm...> - 2012-05-11 10:15:18
|
*Hi everyone,* I'm a new member and it's nice to meet you all. I need some advice about my work with PyTables. The problem is as follows. I'm working on some kind of database using PyTables and, of course, the HDF5 format. I created a table with *six columns, row size 92 B*. One column in the table is a Time32Col; this column will be *indexed*. The table *will be updated* every couple of days (rows will be appended to the existing table). *Between updates, users can run queries on the table and consume data*. My question is how to choose an efficient chunksize for this pattern, because the number of rows in the table will grow from *0 to 10 000 000 000* over time. After that I will start an archiving process, i.e. remove, for example, the first five billion rows and store them in another table for archiving. Of course, I need this balance because of *reading speed*. So, what is the most efficient way to set the chunksize for my problem? Sorry for my English. *Thanks for the advice, guys. Cheers! Nikola* |
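For the archiving step sketched in the question (moving the oldest rows out of the live table), the stock Table API is in principle enough; the fragment below only illustrates the calls involved, with invented file and node names, and removeRows() over billions of rows would be slow, so the real procedure would need tuning on the actual data.

import tables

live = tables.openFile('readings.h5', 'a')       # hypothetical live file
archive = tables.openFile('archive.h5', 'a')     # hypothetical archive file

tbl = live.root.Table_1
n_oldest = 5000000000                            # rows to move out of the live table

# copy() honours start/stop, so only the oldest block is written to the archive.
tbl.copy(archive.root, 'Table_1_old', start=0, stop=n_oldest)

# Then drop the same range from the live table.
tbl.removeRows(0, n_oldest)
live.flush()

live.close()
archive.close()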