From: Seref A. <ser...@gm...> - 2013-06-03 10:44:36
Many thanks for keeping such a great piece of work up and running. I've just seen some features in the release notes that I was going to need in the very near future! Great job!

Best regards,
Seref Arikan
From: Anthony S. <sc...@gm...> - 2013-06-03 00:39:43
Hi Tim,

Thanks! I think for what you want to do you should be using the read() method [1], which does support the out argument, rather than literal slicing.

Be Well
Anthony

1. http://pytables.github.io/usersguide/libref/structured_storage.html#tables.Table.read
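A minimal sketch of the pattern Anthony describes, assuming a hypothetical file data.h5 holding a 3D array node anom (as in Tim's example): pre-allocate one NumPy buffer and let read() fill it in place, instead of slicing, which allocates a fresh array on every access:

    import numpy as np
    import tables as tb

    h5f = tb.open_file("data.h5", mode="r")  # hypothetical file
    anom = h5f.root.anom                     # e.g. a 3D array (time, lat, lon)

    # Pre-allocate a single buffer and reuse it for every 2D plane.
    out = np.empty((1,) + anom.shape[1:], dtype=anom.dtype)

    for firstindex in range(anom.shape[0]):
        # Unlike anom[firstindex], read() accepts `out` and fills the
        # existing buffer instead of allocating a new array per plane.
        anom.read(start=firstindex, stop=firstindex + 1, out=out)
        plane = out[0]  # the 2D plane for this index
        # ... process plane ...

    h5f.close()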
From: Tim B. <tim...@ma...> - 2013-06-03 00:26:55
Yes, congratulations on the new release folks! I am trying out some of my codebase with 3.0.0 at present.

> - All read methods now have an optional *out* argument that allows
>   passing a pre-allocated array to store data.

I have a question about the above: is this also available when reading slices?

Looking through the master branch codebase, it seems that a bit of user code like:

    hotspot = h5f.root.anom[firstindex]

where I am taking a 2D plane out of a 3D array using just the first index, will use __getitem__(), which in turn uses _read_slice(startl, stopl, stepl, shape). So at present, the 'out' parm is not available for slicing?

Tim Burgess
From: Julio T. <jul...@gm...> - 2013-06-02 21:52:35
Thank you from a happy user :)))
From: Francesc A. <fa...@gm...> - 2013-06-02 21:09:32
My congrats for the hard effort too. I am very pleased to see the PyTables project so healthy and well managed. Thanks to all the developers, most especially Antonio and Anthony. You guys rock!

Francesc
From: Anthony S. <sc...@gm...> - 2013-06-02 15:54:40
Congratulations All!

This is a huge and important milestone for PyTables and I am glad to have been a part of it!

Be Well
Anthony
From: Antonio V. <ant...@ti...> - 2013-06-01 11:33:51
===========================
 Announcing PyTables 3.0.0
===========================

We are happy to announce PyTables 3.0.0.

PyTables 3.0.0 comes about 5 years after the last major release (2.0) and 7 months after the last stable release (2.4.0).

This is a new major release and an important milestone for the PyTables project, since it provides the long-awaited support for Python 3.x, which has been around for 4 years.

Almost all of the core numeric/scientific packages for Python already support Python 3, so we are very happy that PyTables can now also provide this important feature.

What's new
==========

A short summary of the main new features:

- Since this release, PyTables provides full support for Python 3.
- The entire code base is now more compliant with the coding style guidelines described in PEP 8.
- Basic support for HDF5 drivers. It is now possible to open/create an HDF5 file using one of the SEC2, DIRECT, LOG, WINDOWS, STDIO or CORE drivers.
- Basic support for in-memory image files. An HDF5 file can be set from or copied into a memory buffer.
- Implemented methods to get/set the user block size in an HDF5 file.
- All read methods now have an optional *out* argument that allows passing a pre-allocated array to store data.
- Added support for floating point data types with extended precision (Float96, Float128, Complex192 and Complex256).
- Consistent ``create_xxx()`` signatures. It is now possible to create all data sets (Array, CArray, EArray, VLArray, and Table) from existing Python objects.
- Complete rewrite of the `nodes.filenode` module. It is now fully compliant with the interfaces defined in the standard `io` module. Only non-buffered binary I/O is supported currently.

Please refer to the RELEASE_NOTES document for a more detailed list of changes in this release.

As always, a large number of bugs have been addressed and squashed as well.

In case you want to know in more detail what has changed in this version, please refer to: http://pytables.github.io/release_notes.html

You can download a source package with generated PDF and HTML docs, as well as binaries for Windows, from: http://sourceforge.net/projects/pytables/files/pytables/3.0.0

For an online version of the manual, visit: http://pytables.github.io/usersguide/index.html

What is it?
===========

PyTables is a library for managing hierarchical datasets, designed to efficiently cope with extremely large amounts of data, with support for full 64-bit file addressing. PyTables runs on top of the HDF5 library and the NumPy package to achieve maximum throughput and convenient use. PyTables includes OPSI, a new indexing technology that can perform data lookups in tables exceeding 10 gigarows (10**10 rows) in less than a tenth of a second.

Resources
=========

About PyTables: http://www.pytables.org

About the HDF5 library: http://hdfgroup.org/HDF5/

About NumPy: http://numpy.scipy.org/

Acknowledgments
===============

Thanks to the many users who provided feature improvements, patches, bug reports, support and suggestions. See the ``THANKS`` file in the distribution package for an (incomplete) list of contributors. Most especially, a lot of kudos go to the HDF5 and NumPy makers. Without them, PyTables simply would not exist.

Share your experience
=====================

Let us know of any bugs, suggestions, gripes, kudos, etc. you may have.

----

**Enjoy data!**

-- The PyTables Developers
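Two of the headline features above are easy to try together; a small sketch (the file name and data are made up) that opens a purely in-memory file via the CORE driver and uses the new obj parameter of the consistent create_*() signatures:

    import tables as tb

    # The CORE driver with the backing store disabled keeps the whole
    # HDF5 "file" in memory and never touches the filesystem.
    h5f = tb.open_file("in_memory.h5", mode="w",
                       driver="H5FD_CORE", driver_core_backing_store=0)

    # The consistent create_*() signatures accept existing Python objects.
    arr = h5f.create_array(h5f.root, "squares",
                           obj=[x * x for x in range(10)])
    print(arr.read())  # [ 0  1  4  9 16 25 36 49 64 81]

    h5f.close()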
From: Antonio V. <ant...@ti...> - 2013-05-29 17:17:30
Announcing PyTables 3.0.0rc3
============================

We are happy to announce PyTables 3.0.0rc3.

Changes from 3.0rc2 to 3.0rc3
-----------------------------

* Fixed a crash on 32-bit platforms.
* Fixed a couple of issues related to stepped read/iteration on tables (see :issue:`260` and :issue:`262`).

**Enjoy data!**

-- The PyTables Developers
From: Anthony S. <sc...@gm...> - 2013-05-26 17:05:05
Hi Nolan,

The PyTables-specific metadata is for PyTables (and ViTables) consumption only and does not (or should not) interfere with other methods of HDF5 consumption. Since PyTables and h5py both link to the hdf5 library, I have never had any interoperability problems.

Be Well
Anthony
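One quick way to check this interoperability for yourself, assuming a hypothetical file data.h5 written earlier by PyTables with a node /mytable: open it with h5py and look at the extra bookkeeping attributes (CLASS, VERSION, TITLE) that PyTables attaches and that other readers are free to ignore:

    import h5py

    # "data.h5" and "/mytable" are placeholders for a PyTables-written file.
    with h5py.File("data.h5", "r") as f:
        dset = f["/mytable"]
        print(dset[:10])         # the data itself reads back normally
        print(dict(dset.attrs))  # PyTables bookkeeping: CLASS, TITLE, ...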
From: Nolan P. <ncp...@gm...> - 2013-05-26 16:05:32
Hi,

I have a question about the metadata that PyTables inserts into HDF5 files.

Is this data stored in the files themselves, but just not user defined? The important question is: does this metadata make the HDF5 files inaccessible by other means, such as the standard C library or h5py?

Thanks!

Nolan
From: Antonio V. <ant...@ti...> - 2013-05-25 16:31:05
Hi Andreas,

Thanks, it would be a nice addition to PyTables.

In PyTables 3.0 (currently we have rc2 out) we introduced support for the float16 data type. It is not as flexible as the solution you are suggesting, but IMO it could help in your case.

Best regards

-- Antonio Valentino
From: Francesc A. <fa...@gm...> - 2013-05-25 15:59:16
Yeah, quantize used to be in the netcdf3 module in old versions of PyTables (with the introduction of netcdf4-python this was removed), but it would be interesting to have it around again. It would be nice if you could contribute the PR, together with some docs (a small tutorial would be really great).

For efficiency, the right place for this filter would be inside Blosc, but that is another story :)

Thanks,

-- Francesc Alted
From: Andreas H. <li...@hi...> - 2013-05-25 15:07:17
Actually, I can now answer my own question: yes, it does save some space. As a test, I created a file with two 5760x2880x12 arrays of dtype float32. The data values are all in the range between ±1E17. When I truncate the input values to 1E11 (least_significant_digit=-11), I get about 20% space reduction:

    -rw-r--r-- 1 andreas andreas 418M Mai 25 16:47 satdb_blosc9-11.h5
    -rw-r--r-- 1 andreas andreas 578M Mai 25 16:34 satdb_blosc9.h5

Would you guys be interested in having this as an optional filter? If so, I'd be happy to submit a PR for this.

-- Andreas.
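For reference, the truncation itself is only a couple of lines. A sketch along the lines of the netcdf4-python helper linked below (rounding to a power-of-two scale so the discarded mantissa bits compress well); this mirrors, but is not, the exact library code:

    import numpy as np

    def quantize(data, least_significant_digit):
        # Keep a precision of 10**-least_significant_digit: compute how
        # many binary bits that requires, then round to that scale. The
        # zeroed low-order bits compress much better with zlib/blosc.
        bits = np.ceil(np.log2(10.0 ** least_significant_digit))
        scale = 2.0 ** bits
        return np.around(scale * data) / scale

    # least_significant_digit=-11 keeps values only to the nearest ~1E11,
    # as in the test above.
    data = np.random.uniform(-1e17, 1e17, size=1000).astype(np.float32)
    truncated = quantize(data, -11)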
From: Andreas H. <li...@hi...> - 2013-05-25 12:28:15
Hi,

the netcdf4-python project (http://netcdf4-python.googlecode.com/svn/trunk/docs/netCDF4.Dataset-class.html#createVariable) supports a "least_significant_digit" attribute when creating a variable/array. This leads to a truncation of the array data before storing it to disk (https://code.google.com/p/netcdf4-python/source/browse/trunk/netCDF4_utils.py#26), which makes zlib compression more effective.

My question: is the same true when I compress the array data with blosc? Will I get significant compression improvements when truncating my data before storing it in pytables?

Thanks for your insight :)

-- Andreas.
From: Antonio V. <ant...@ti...> - 2013-05-17 18:23:53
=============================
 Announcing PyTables 3.0.0rc2
=============================

We are happy to announce PyTables 3.0.0rc2.

PyTables 3.0.0rc2 comes about 5 years after the last major release (2.0) and 7 months after the last stable release (2.4.0). This is a new major release and an important milestone for the PyTables project, since it provides the long-awaited support for Python 3.x, which has been around for 4 years. Almost all of the core numeric/scientific packages for Python already support Python 3, so we are very happy that PyTables can now also provide this important feature.

Changes from 3.0rc1 to 3.0rc2
=============================

- The internal Blosc_ library has been upgraded to version 1.2.3.
- All methods of the :class:`Table` class that take *start*, *stop* and *step* parameters (including :meth:`Table.read`, :meth:`Table.where`, :meth:`Table.iterrows`, etc.) have been redesigned to have a consistent behaviour. The meaning of *start*, *stop* and *step* and their default values now always work exactly like in standard :class:`slice` objects. Closes :issue:`44` and :issue:`255`.
- The :meth:`iterrows` method of :class:`*Array` and :class:`Table`, as well as :meth:`Table.itersorted`, now behave like functions in the standard :mod:`itertools` module. If the *start* parameter is provided and *stop* is None, the array/table is iterated from *start* to the last line. In PyTables < 3.0 only one element was returned.
- Fixed :issue:`119`, :issue:`230` and :issue:`232`, where an index on :class:`Time64Col` (only; :class:`Time32Col` was OK) hid the data on selection from a Table. Thanks to Jeff Reback.
- Fixed an issue with reverse iteration in :meth:`Table.itersorted` (closes :issue:`252` and :issue:`253`).

-- Antonio Valentino
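A small self-contained illustration of the new slicing semantics described above, using an in-memory file so nothing is left on disk:

    import numpy as np
    import tables as tb

    h5f = tb.open_file("slices.h5", mode="w",
                       driver="H5FD_CORE", driver_core_backing_store=0)
    arr = h5f.create_array(h5f.root, "a", obj=np.arange(100))

    # start/stop/step now follow standard slice rules everywhere ...
    assert list(arr.iterrows(2, 20, 3)) == list(arr[2:20:3])

    # ... and iterrows(start=...) with stop=None iterates to the end
    # (PyTables < 3.0 returned a single element here).
    assert list(arr.iterrows(start=95)) == [95, 96, 97, 98, 99]

    h5f.close()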
From: Francesc A. <fa...@gm...> - 2013-05-13 10:40:01
===============================================================
 Announcing Blosc 1.2.2
 A blocking, shuffling and lossless compression library
===============================================================

What is new?
============

- All important warnings removed for all tested platforms. This allows less intrusive compilation experiences with applications including Blosc source code.
- The `bench/bench.c` benchmark has been updated so that it can be compiled on Windows again.
- The new web site has been set to: http://www.blosc.org

For more info, please see the release notes in: https://github.com/FrancescAlted/blosc/wiki/Release-notes

What is it?
===========

Blosc (http://www.blosc.org) is a high performance compressor optimized for binary data. It has been designed to transmit data to the processor cache faster than the traditional, non-compressed, direct memory fetch approach via a memcpy() OS call. Blosc is the first compressor (that I'm aware of) that is meant not only to reduce the size of large datasets on-disk or in-memory, but also to accelerate object manipulations that are memory-bound.

There is also a handy command-line interface for Blosc called Bloscpack (https://github.com/esc/bloscpack) that allows you to compress large binary datafiles on-disk. Although the format for Bloscpack has not stabilized yet, it allows you to effectively use Blosc from your favorite shell.

Download sources
================

For more details on what it is, please go to the main web site: http://www.blosc.org/

The github repository is over here: https://github.com/FrancescAlted/blosc

Blosc is distributed using the MIT license; see LICENSES/BLOSC.txt for details.

Mailing list
============

There is an official Blosc mailing list at: bl...@go... http://groups.google.es/group/blosc

Enjoy Data!

-- Francesc Alted
From: Anthony S. <sc...@gm...> - 2013-05-10 19:49:46
Thanks Antonio!

PyTables users,

This is (hopefully) your last chance to bang on the code base and let the developers know if there are any problems prior to the full v3.0 release. Note that there have been many important changes even since the v3.0.0beta that came out about a week or two ago.

Be Well
Anthony
From: Antonio V. <ant...@ti...> - 2013-05-10 19:45:05
=============================
 Announcing PyTables 3.0.0rc1
=============================

We are happy to announce PyTables 3.0.0rc1.

PyTables 3.0.0rc1 comes about 5 years after the last major release (2.0) and 7 months after the last stable release (2.4.0).

This is a new major release and an important milestone for the PyTables project, since it provides the long-awaited support for Python 3.x, which has been around for 4 years.

Almost all of the core numeric/scientific packages for Python already support Python 3, so we are very happy that PyTables can now also provide this important feature.

What's new
==========

A short summary of the main new features:

- Since this release, PyTables provides full support for Python 3.
- The entire code base is now more compliant with the coding style guidelines described in PEP 8.
- Basic support for HDF5 drivers. It is now possible to open/create an HDF5 file using one of the SEC2, DIRECT, LOG, WINDOWS, STDIO or CORE drivers.
- Basic support for in-memory image files. An HDF5 file can be set from or copied into a memory buffer.
- Implemented methods to get/set the user block size in an HDF5 file.
- All read methods now have an optional *out* argument that allows passing a pre-allocated array to store data.
- Added support for floating point data types with extended precision (Float96, Float128, Complex192 and Complex256).
- Consistent ``create_xxx()`` signatures. It is now possible to create all data sets (Array, CArray, EArray, VLArray, and Table) from existing Python objects.
- Complete rewrite of the `nodes.filenode` module. It is now fully compliant with the interfaces defined in the standard `io` module. Only non-buffered binary I/O is supported currently.

Please refer to the RELEASE_NOTES document for a more detailed list of changes in this release.

As always, a large number of bugs have been addressed and squashed as well.

In case you want to know in more detail what has changed in this version, please refer to: http://pytables.github.io/release_notes.html

You can download a source package with generated PDF and HTML docs, as well as binaries for Windows, from: http://sourceforge.net/projects/pytables/files/pytables/3.0.0rc1

For an online version of the manual, visit: http://pytables.github.io/usersguide/index.html

What is it?
===========

PyTables is a library for managing hierarchical datasets, designed to efficiently cope with extremely large amounts of data, with support for full 64-bit file addressing. PyTables runs on top of the HDF5 library and the NumPy package to achieve maximum throughput and convenient use. PyTables includes OPSI, a new indexing technology that can perform data lookups in tables exceeding 10 gigarows (10**10 rows) in less than a tenth of a second.

Resources
=========

About PyTables: http://www.pytables.org

About the HDF5 library: http://hdfgroup.org/HDF5/

About NumPy: http://numpy.scipy.org/

Acknowledgments
===============

Thanks to the many users who provided feature improvements, patches, bug reports, support and suggestions. See the ``THANKS`` file in the distribution package for an (incomplete) list of contributors. Most especially, a lot of kudos go to the HDF5 and NumPy makers. Without them, PyTables simply would not exist.

Share your experience
=====================

Let us know of any bugs, suggestions, gripes, kudos, etc. you may have.

----

**Enjoy data!**

--
The PyTables Developers
From: Anthony S. <sc...@gm...> - 2013-05-10 16:29:51
[dropping scipy-user]

Hello Andreas,

PyTables is a great option, and using compression (zlib, blosc, etc.) will probably help. Additionally, I would note that since your values are between [0, 100], you can probably get away with using 32-bit floats rather than 64-bit floats. This size reduction will speed things up, but you probably don't want to go down to 16-bit floats.

I would recommend that you store your dataset on disk and then use PyTables expressions [1, 2] with the "out" argument to keep your results on disk as well. If this strategy fails because you need to simultaneously look at multiple indexes in the same array, then I would use partially offset iterators as described in this thread [3]. In both cases, since iterators are automatically chunked, you never read in the whole dataset at one time, and what you are interpolating can be as large as you want :)

Let us know if you have further specific questions.

Be Well
Anthony

1. http://pytables.github.io/usersguide/libref.html#the-expr-class-a-general-purpose-expression-evaluator
2. https://github.com/scopatz/hdf5-is-for-lovers/blob/master/hdf5-is-for-lovers.pdf?raw=true
3. "Nested Iteration of HDF5 using PyTables" http://blog.gmane.org/gmane.comp.python.pytables.user/month=20130101
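A minimal sketch of the disk-to-disk pattern Anthony recommends; the file and node names (interp.h5, x, y) are hypothetical and the linear expression is a stand-in for real interpolation weights:

    import tables as tb

    h5f = tb.open_file("interp.h5", mode="a")  # hypothetical file
    x = h5f.root.x  # large float32 arrays already stored on disk
    y = h5f.root.y

    # Pre-create an on-disk array for the result so it never has to
    # fit in RAM.
    result = h5f.create_carray(h5f.root, "result",
                               atom=tb.Float32Atom(), shape=x.shape)

    # tables.Expr evaluates chunk by chunk via numexpr; set_output()
    # plays the role of the "out" argument, keeping the result on disk.
    expr = tb.Expr("0.3 * x + 0.7 * y")  # stand-in expression
    expr.set_output(result)
    expr.eval()

    h5f.close()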
From: Andreas H. <li...@hi...> - 2013-05-10 09:59:07
Hi,

I'll have to code multilinear interpolation in n dimensions, n~7. My data space is quite large, ~10**9 points. The values are given on a rectangular (but not square) grid. The values are numbers in a range of approx. [0.0, 100.0].

The challenge is to do this efficiently, and it would be great if the whole thing were able to run fast on a machine with only 8G (or better, 4G) of RAM.

A common task will be to interpolate 10**6 points, which shouldn't take too long.

Any ideas on how to do this efficiently are welcome:

* which dtype to use?
* is using pytables/blosc an option? How can this be integrated into the interpolation?
* you name it ... ;)

Cheers, Andreas.
From: Anthony S. <sc...@gm...> - 2013-05-03 19:56:57
I think that append_where() is going to be the fastest, then. The other option is to pull the table out in chunks into numpy arrays and then write it back out. This is almost certainly slower, because you will be iterating in Python (not C) and you will not be multi-threaded. You get the multi-threading through append_where(dst, "True"), because it is using numexpr under the hood.

Be Well
Anthony
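For the chunked fallback, a sketch of Jim's loop reworked to copy fixed-size slices (the destination path and source_h5_path_list are placeholders from his message), so no table ever has to fit in memory at once:

    import tables as tb

    CHUNK = 1000000  # rows per batch; tune to available memory
    source_h5_path_list = ["box1.h5", "box2.h5"]  # placeholder source files

    dest_h5f = tb.open_file("big_master.h5", mode="a")  # placeholder path
    for source_path in source_h5_path_list:
        with tb.open_file(source_path, mode="r") as h5f:
            for node in h5f.root:
                dest_table = dest_h5f.get_node("/", name=node.name)
                # Append fixed-size slices instead of node.read(), which
                # materializes the whole source table in one go.
                for start in range(0, node.nrows, CHUNK):
                    dest_table.append(node.read(start=start,
                                                stop=start + CHUNK))
                dest_table.flush()
    dest_h5f.close()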
From: Jim K. <jim...@sp...> - 2013-05-03 19:52:25
Speed is the problem. I am looking for the fastest possible way to do this. I was thinking of using pandas, and was able to achieve fair performance using that lib; it just seemed like I was using pandas as a middle man, and it introduces some issues with the data types. Could it be faster to pull it into a numpy array in chunks and write it out?
From: Anthony S. <sc...@gm...> - 2013-05-03 19:14:08
Hi Jim,

You can just iterate over each row in the table (i.e. "for row in node"). This is slow, but would solve the problem.

As for taking a table in one h5 file and appending it to a table in another: you could append directly by using the append_where() method with the condition "True" to append the whole table. This will automatically do the chunking for you.

Be Well
Anthony
From: Jim K. <jim...@sp...> - 2013-05-03 18:28:09
I am trying to make this better / faster...

Data comes faster than I can store it on one box, so my thought was to have many boxes, each storing their own part in their own table. Later I would concatenate the tables together with something like this:

    dest_h5f = pt.openFile(path + 'big_mater.h5', 'a')
    for source_path in source_h5_path_list:
        h5f = pt.openFile(source_path, 'r')
        for node in h5f.root:
            dest_table = dest_h5f.getNode('/', name=node.name)
            print node.nrows
            if node.nrows > 0 and node.nrows < 1000000:
                # found I needed to limit the max size or I would crash
                dest_table.append(node.read())
                dest_table.flush()
        h5f.close()
    dest_h5f.close()

I could add the logic to iterate in chunks over the source data to overcome the crash, but I suspect there could be a better way: take a table in one h5 file and append it to a table in another h5 file. Table.copy() looked like it would do the trick, but I don't see how I get it to append to an existing table.

My h5 files have 4 rec arrays, all stored in root.

Any suggestions?

Jim Knoll
From: Anthony S. <sc...@gm...> - 2013-05-02 17:47:16
Hello Tim,

PyTables simply calls HDF5 attributes "attrs". See [1] for more info.

Be Well
Anthony

1. http://pytables.github.io/usersguide/libref.html?highlight=attrs#the-attributeset-class

On Thu, May 2, 2013 at 5:13 AM, Tim Michelsen <tim...@gm...> wrote:

> Hello,
> I would like to add the following to my hdf5 archives:
> * physical dimensions
> * metadata with a note on how the data in the Table or Group was produced
>
> I found one example for h5py [1].
>
> Is there a similar method for pytables, which is the preferred connector
> by pandas?
>
> I would be happy to receive a pointer or hint.
>
> Thanks and regards,
> Timmie
>
> [1] How do I assign scales (or physical dimensions) to HDF5 datasets in h5py?
> http://stackoverflow.com/questions/11432309
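A short sketch of Anthony's pointer, with a hypothetical file and table: any node's attrs set stores arbitrary HDF5 attributes, which covers both the physical units and the free-text note Tim asks about:

    import tables as tb

    h5f = tb.open_file("archive.h5", mode="a")  # hypothetical file
    tbl = h5f.root.measurements                 # hypothetical table

    # Plain attribute assignment writes real HDF5 attributes on the node.
    tbl.attrs.units = "W/m^2"
    tbl.attrs.note = "Hourly means derived from the raw station logs."

    print(tbl.attrs.units)  # read back the same way
    h5f.close()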