From: Antonio V. <ant...@ti...> - 2012-07-08 10:55:58
|
Hi Christoph, thank you for reporting. Can you please tell us what the output of the attached script is on your machine? Thanks in advance. On 07/07/2012 21:18, Christoph Gohlke wrote: > Looks good. Only one test failure on win-amd64-py2.7 (attached). > > Christoph > > On 7/7/2012 11:47 AM, Antonio Valentino wrote: >> =========================== >> Announcing PyTables 2.4.0b1 >> =========================== >> [CUT] -- Antonio Valentino |
From: Christoph G. <cg...@uc...> - 2012-07-07 19:18:12
|
Looks good. Only one test failure on win-amd64-py2.7 (attached). Christoph On 7/7/2012 11:47 AM, Antonio Valentino wrote: > =========================== > Announcing PyTables 2.4.0b1 > =========================== > [CUT] |
From: Anthony S. <sc...@gm...> - 2012-07-07 18:50:08
|
Great Success! Please hammer on this everybody. On Sat, Jul 7, 2012 at 1:47 PM, Antonio Valentino < ant...@ti...> wrote: > =========================== > Announcing PyTables 2.4.0b1 > =========================== > [CUT] |
From: Antonio V. <ant...@ti...> - 2012-07-07 18:47:41
|
-----BEGIN PGP SIGNED MESSAGE----- Hash: SHA1

===========================
Announcing PyTables 2.4.0b1
===========================

We are happy to announce PyTables 2.4.0b1. This is an incremental release which includes many changes to prepare for future Python 3 support.

What's new
==========

This release includes support for the float16 data type and read-only support for variable-length string attributes.

The handling of HDF5 errors has been improved. The user will no longer see HDF5 error stacks dumped to the console. All HDF5 error messages are trapped and attached to a proper Python exception.

PyTables now supports only HDF5 v1.8.4+. All the code has been updated to the new HDF5 API. Supporting only the HDF5 1.8 series is beneficial for future development.

As always, a large number of bugs have been addressed and squashed as well.

For a detailed account of what has changed in this version, please refer to: http://pytables.github.com/release_notes.html

You can download a source package with generated PDF and HTML docs, as well as binaries for Windows, from: http://sourceforge.net/projects/pytables/files/pytables/2.4.0b1

For an online version of the manual, visit: http://pytables.github.com/usersguide/index.html

What is it?
===========

PyTables is a library for managing hierarchical datasets, designed to efficiently cope with extremely large amounts of data, with support for full 64-bit file addressing. PyTables runs on top of the HDF5 library and the NumPy package to achieve maximum throughput and convenient use. PyTables includes OPSI, a new indexing technology, which allows data lookups in tables exceeding 10 gigarows (10**10 rows) in less than a tenth of a second.

Resources
=========

About PyTables: http://www.pytables.org
About the HDF5 library: http://hdfgroup.org/HDF5/
About NumPy: http://numpy.scipy.org/

Acknowledgments
===============

Thanks to the many users who provided feature improvements, patches, bug reports, support, and suggestions. See the ``THANKS`` file in the distribution package for an (incomplete) list of contributors. Most especially, a lot of kudos go to the HDF5 and NumPy (and numarray!) makers. Without them, PyTables simply would not exist.

Share your experience
=====================

Let us know of any bugs, suggestions, gripes, kudos, etc. you may have.

- ----

**Enjoy data!**

- --
The PyTables Team
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.11 (GNU/Linux)
Comment: Using GnuPG with Mozilla - http://enigmail.mozdev.org/

iEYEARECAAYFAk/4hDwACgkQ1JUs2CS3bP7TUwCfcobS3KI7L/6k3Bbbt2VBOz5B
TqAAn0DhrSdtd7XTPOj0RR/mpr2FtseE
=T5iQ
-----END PGP SIGNATURE----- |
From: Anthony S. <sc...@gm...> - 2012-07-06 15:48:31
|
Ahh thanks for clarifying.... On Jul 6, 2012 2:06 AM, "Francesc Alted" <fa...@gm...> wrote: > [CUT] |
From: Francesc A. <fa...@gm...> - 2012-07-06 07:06:06
|
On 7/5/12 7:59 PM, Anthony Scopatz wrote: > On Thu, Jul 5, 2012 at 12:34 PM, Jacob Bennett <jac...@gm...> wrote: > > [CUT] > > Would it help to follow with the current schema and just increase the depth of the tree by taking parts of the instrumentId (instrumentId is an int64) as nodes? > Yes, this would be one approach that would work. +1 > Basically, nodes in HDF5 only get a fixed amount of storage for metadata, including what children they have. (I believe this number is 64 kb.) So if a group has so many children that storing their names and locations takes up more than 64 kb, you have run out of room. By adding N other subgroups to the hierarchy you increase the metadata available to N * 64 kb. No, this is wrong. The hierarchy metadata is stored in a different place than user metadata, and hence is not affected by the 64 KB limit. The problem is rather that having too many children hanging from a single group degrades performance quite badly (the same happens with regular filesystems when a directory holds too many files). -- Francesc Alted |
From: Anthony S. <sc...@gm...> - 2012-07-05 18:00:01
|
On Thu, Jul 5, 2012 at 12:34 PM, Jacob Bennett <jac...@gm...> wrote: > Hello Pytables Users, > > [CUT] > > Is there a way to redesign this schema so that it could work better with pytables? Or is this simply too much data? It certainly isn't too much data. HDF5 scales to petabytes ;) > Would it help to follow with the current schema and just increase the depth of the tree by taking parts of the instrumentId (instrumentId is an int64) as nodes? Yes, this would be one approach that would work. Basically, nodes in HDF5 only get a fixed amount of storage for metadata, including what children they have. (I believe this number is 64 kb. In theory, it is possible to increase this number and recompile hdf5, but then files generated in this way would only be compatible with your altered version of the library.) So if a group has so many children that storing their names and locations takes up more than 64 kb, you have run out of room. By adding N other subgroups to the hierarchy you increase the metadata available to N * 64 kb. This is probably the easiest thing to do given your current setup. Anything else would require changing the table description. There are probably some natural groupings within your instrumentIDs (e.g., all commodities in one group) that you could use. Be Well Anthony |
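The subgroup approach suggested above can be sketched concretely. The bucketing scheme below (a fanout of 256 and `/bucketNNN/idNNN` paths) is a hypothetical illustration, not something from the thread; the point is only that hashing the int64 instrumentId into a fixed number of subgroups caps the child count of any one group:

```python
def group_path(instrument_id, fanout=256):
    """Map an int64 instrumentId to a two-level node path (hypothetical scheme).

    With fanout=256, 20000 tables spread over 256 subgroups,
    so each group holds roughly 80 children instead of 20000.
    """
    bucket = instrument_id % fanout
    return '/bucket%03d/id%d' % (bucket, instrument_id)

# The table for an instrument would then be created under this path
# (e.g. with createTable() in PyTables 2.x) instead of directly at root.
path = group_path(123456789)  # -> '/bucket021/id123456789'
```

Any deterministic function of the id works; a modulus keeps buckets evenly filled when ids are roughly uniform, while a semantic grouping (commodities, equities, ...) makes browsing easier.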
From: Jacob B. <jac...@gm...> - 2012-07-05 17:34:22
|
Hello Pytables Users, I am currently hitting a maximum-number-of-children error in PyTables. I am trying to store stock updates in HDF5. My current schema has one file represent a trading day, each table represent a particular instrumentID (stock id), and each record in a table belong to a specific update with a timestamp (where the timestamp could be considered a primary key). Currently all tables are direct descendants of root. The problem with this is that, per day, I have the following stats: # of tables ::= 20000 # of records per table ::= 250000 The issue is that 20000 is too many children to be associated with a single node. Continuing with this schema will consume an exorbitant amount of memory and lead to slower query times. Is there a way to redesign this schema so that it works better with PyTables? Or is this simply too much data? Would it help to keep the current schema and just increase the depth of the tree by taking parts of the instrumentId (an int64) as nodes? Thanks, Jacob -- Jacob Bennett Massachusetts Institute of Technology Department of Electrical Engineering and Computer Science Class of 2014| ben...@mi... |
From: Anthony S. <sc...@gm...> - 2012-07-03 05:59:00
|
Why not read in just the date and ID columns to start with, then do a numpy.unique() or python set() on these, then query based on the unique values? Seems like it might be faster.... Be Well Anthony On Mon, Jul 2, 2012 at 5:16 PM, Aquil H. Abdullah <aqu...@gm...> wrote: > Hello All, > > I have a table that is indexed by two keys, and I would like to search for duplicate keys. So here is my naive slow implementation: (code I posted on stackoverflow) > > [CUT] |
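The column-wise approach suggested above can be sketched as follows. The arrays here are stand-ins for the two key columns read in one shot (e.g. `tbl.read(field='date')` in PyTables 2.x), and combining the keys as `'date|userID'` strings is an assumption made for illustration:

```python
import numpy as np

# Stand-ins for the two key columns pulled out of the table in one read.
dates = np.array([1, 1, 2, 3, 3, 3])
uids = np.array(['a', 'b', 'b', 'c', 'c', 'd'])

# Build one composite key per row, then count occurrences of each key.
keys = np.array(['%d|%s' % (d, u) for d, u in zip(dates, uids)])
uniq, counts = np.unique(keys, return_counts=True)

# Only these keys need a follow-up readWhere() query on the table.
dup_keys = uniq[counts > 1]  # -> ['3|c']
```

This replaces one `readWhere()` per row with one per *duplicated* key, which is where the speedup comes from.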
From: Anthony S. <sc...@gm...> - 2012-07-03 00:43:14
|
No worries ;) On Mon, Jul 2, 2012 at 5:41 PM, Jacob Bennett <jac...@gm...> wrote: > Cool, this seems pretty straightforward. Thanks again Anthony! > > -Jacob > > [CUT] |
From: Jacob B. <jac...@gm...> - 2012-07-03 00:41:32
|
Cool, this seems pretty straightforward. Thanks again Anthony! -Jacob On Mon, Jul 2, 2012 at 1:10 PM, Anthony Scopatz <sc...@gm...> wrote: > [CUT] -- Jacob Bennett Massachusetts Institute of Technology Department of Electrical Engineering and Computer Science Class of 2014| ben...@mi... |
From: Aquil H. A. <aqu...@gm...> - 2012-07-03 00:16:20
|
Hello All, I have a table that is indexed by two keys, and I would like to search for duplicate keys. Here is my naive, slow implementation (code I posted on Stack Overflow):

import tables

h5f = tables.openFile('filename.h5')
tbl = h5f.getNode('/data', 'data_table')  # assumes group /data and table data_table
counter = 0
for row in tbl:
    ts = row['date']  # timestamp (ts) or date
    uid = row['userID']
    query = '(date == %d) & (userID == "%s")' % (ts, uid)
    result = tbl.readWhere(query)
    if len(result) > 1:
        # Do something here
        pass
    counter += 1
    if counter % 1000 == 0:
        print '%d rows processed' % counter

-- Aquil H. Abdullah aqu...@gm... |
From: Anthony S. <sc...@gm...> - 2012-07-02 18:11:12
|
Hello Jacob, It seems like you have answered your own question ;). The thing is that the locking doesn't have to be all that hard. You can simply check if the file is already in the open files cache (from previous thread). If it isn't, the thread will open the file. Or you don't even have to do the check yourself because tables.openFile() will do this for you. If the file is already there, openFile() will just return you a reference to that File instance, which is what you wanted anyways. This takes care of opening. On the other side of things, don't allow any thread to close a file. Simply close all files in the cache when your code is about to exit. Keeping the file handles open and available for future reading isn't *that*expensive. So use openFile() for opening and don't close until the end and this should be thread safe for reading. Obviously, writing is more difficult. Be Well Anthony On Mon, Jul 2, 2012 at 5:38 AM, Jacob Bennett <jac...@gm...>wrote: > Hello PyTables Users, > > I am developing an API to access the current data stored in my pytables > instance. Note at this point that this is only reading, no writing to the > files. The big question on my mind at this point is how am I supposed to > handle the opening and closing of files on read requests that are > multithreaded? PyTables supports multithreading for read only; however, I > don't know how to handle two threads opening the same file or one thread > closing a file while the other is still reading it, besides putting a lock > on it thus disabling the multithreaded operations. > > Thanks, > Jacob > > -- > Jacob Bennett > Massachusetts Institute of Technology > Department of Electrical Engineering and Computer Science > Class of 2014| ben...@mi... > > > > ------------------------------------------------------------------------------ > Live Security Virtual Conference > Exclusive live event will cover all the ways today's security and > threat landscape has changed and how IT managers can respond. 
Discussions > will include endpoint security, mobile security and the latest in malware > threats. http://www.accelacomm.com/jaw/sfrnl04242012/114/50122263/ > _______________________________________________ > Pytables-users mailing list > Pyt...@li... > https://lists.sourceforge.net/lists/listinfo/pytables-users > > |
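Anthony's recipe above (one shared handle per file, opened once and closed only at program exit) can be sketched as a small thread-safe cache. This is a hedged illustration, not code from the thread: the helper names (`get_file`, `close_all`) are invented, and the `opener` parameter exists only so the sketch can be exercised without PyTables installed; in practice it defaults to the PyTables 2.x call `tables.openFile(filename, mode='r')`.

```python
import threading

_open_files = {}            # filename -> open File handle (the cache)
_cache_lock = threading.Lock()

def get_file(filename, opener=None):
    """Return a shared read-only handle for `filename`, opening it on first use.

    `opener` is a hypothetical hook that defaults to tables.openFile;
    it is a parameter here only to make the sketch testable.
    """
    if opener is None:
        import tables
        opener = lambda fn: tables.openFile(fn, mode='r')
    with _cache_lock:
        handle = _open_files.get(filename)
        if handle is None:
            # First thread to ask for this file opens it; later threads
            # reuse the same handle instead of reopening or closing it.
            _open_files[filename] = handle = opener(filename)
        return handle

def close_all():
    """Close every cached handle; call exactly once, at program exit."""
    with _cache_lock:
        for handle in _open_files.values():
            handle.close()
        _open_files.clear()
```

Reader threads then only ever call `get_file()`; no thread closes anything until `close_all()` runs at shutdown, which matches the "don't close until the end" advice above.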
From: Jacob B. <jac...@gm...> - 2012-07-02 12:38:16
|
Hello PyTables Users,

I am developing an API to access the current data stored in my PyTables instance. Note that at this point this is only reading; there is no writing to the files. The big question on my mind is how I am supposed to handle the opening and closing of files on read requests that are multithreaded. PyTables supports multithreading for read-only access; however, I don't know how to handle two threads opening the same file, or one thread closing a file while another is still reading it, short of putting a lock on it and thus disabling the multithreaded operations.

Thanks,
Jacob

--
Jacob Bennett
Massachusetts Institute of Technology
Department of Electrical Engineering and Computer Science
Class of 2014 | ben...@mi... |
From: Anthony S. <sc...@gm...> - 2012-06-29 19:23:42
|
Hi Jacob,

Hmmmm, this shouldn't be happening; the data isn't large enough. While there *may* be a memory leak (use valgrind to find it), it is more likely that you are just failing to dereference something. Is there any place in your calculation where you might accidentally keep data around? Note that PyTables/HDF5 caches a bunch of stuff behind the scenes.

What does `ps ux` say while you are running the code, or right before the failure? Does the code break with data that is 1/10th the size?

Be Well
Anthony

On Fri, Jun 29, 2012 at 1:48 PM, Jacob Bennett <jac...@gm...> wrote:
> Hello PyTables Users,
>
> My current implementation works pretty well now and has the write speeds
> that I am looking for; however, after around 20 minutes of execution and at a
> file size of around 127MB with level-3 Blosc compression I seem to get
> memory allocation errors. Here is the trace that I get; if anybody can shed
> light on this, that would be excellent. Does my implementation hog all of my
> memory? Is there a memory leak?
> > HDF5-DIAG: Error detected in HDF5 (1.8.8) thread 0: > #000: ..\..\hdf5-1.8.8\src\H5Dio.c line 266 in H5Dwrite(): can't write > data > major: Dataset > minor: Write failed > #001: ..\..\hdf5-1.8.8\src\H5Dio.c line 671 in H5D_write(): can't write > data > major: Dataset > minor: Write failed > #002: ..\..\hdf5-1.8.8\src\H5Dchunk.c line 1861 in H5D_chunk_write(): > unable t > o read raw data chunk > major: Low-level I/O > minor: Read failed > #003: ..\..\hdf5-1.8.8\src\H5Dchunk.c line 2776 in H5D_chunk_lock(): > memory al > location failed for raw data chunk > major: Resource unavailable > minor: No space available for allocation > Exception in thread bookthread: > Traceback (most recent call last): > File "C:\Python27bit\lib\threading.py", line 551, in __bootstrap_inner > self.run() > File "../PyTablesInterface\Acceptor.py", line 21, in run > BookDataWrapper.acceptDict() > File "../PyTablesInterface\BookDataWrapper.py", line 50, in acceptDict > tableD.append(dataArray) > File "C:\Python27bit\lib\site-packages\tables\table.py", line 2081, in > append > self._saveBufferedRows(wbufRA, lenrows) > File "C:\Python27bit\lib\site-packages\tables\table.py", line 2016, in > _saveBu > fferedRows > self._append_records(lenrows) > File "tableExtension.pyx", line 454, in > tables.tableExtension.Table._append_re > cords (tables\tableExtension.c:4623) > HDF5ExtError: Problems appending the records. 
> ##################################### > ######THIS IS A LATER ERROR######### > ##################################### > Exception in thread CME_10_B: > Traceback (most recent call last): > File "C:\Python27bit\lib\threading.py", line 551, in __bootstrap_inner > self.run() > File > "C:\Users\jacob.bennett\development\MarketDataReader\IO\__init__.py", lin > e 19, in run > self.socket.rec() > File > "C:\Users\jacob.bennett\development\MarketDataReader\IO\MarketSocket.py", > line 33, in rec > Parser.parse(self.sock.recv(1024*16), self.exchange) > File "../Parser\Parser.py", line 39, in parse > SendInBatch.acceptBookData(instrumentId, timestamp, 0, i, bidPrice, > bidQuant > , bidOrders, exchange, source) > File "../PyTablesInterface\SendInBatch.py", line 28, in acceptBookData > maindict[(instrumentId, yearmonthday)] = [(timestamp1, timestamp2, > side, lev > el, price, quant, orders, source, 1)] > MemoryError > > Thanks, > Jacob Bennett > > > -- > Jacob Bennett > Massachusetts Institute of Technology > Department of Electrical Engineering and Computer Science > Class of 2014| ben...@mi... > > > > ------------------------------------------------------------------------------ > Live Security Virtual Conference > Exclusive live event will cover all the ways today's security and > threat landscape has changed and how IT managers can respond. Discussions > will include endpoint security, mobile security and the latest in malware > threats. http://www.accelacomm.com/jaw/sfrnl04242012/114/50122263/ > _______________________________________________ > Pytables-users mailing list > Pyt...@li... > https://lists.sourceforge.net/lists/listinfo/pytables-users > > |
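One concrete way to act on Anthony's "failing to dereference" hint: make sure each buffered batch is dropped as soon as it has been appended, instead of letting the batching dict grow between writes. A hedged sketch; all names here (`drain_batches`, `maindict`, `tables_by_key`) are hypothetical, loosely modeled on the traceback above.

```python
def drain_batches(maindict, tables_by_key):
    """Append each buffered batch to its table and drop the reference.

    `maindict` maps a (instrument, day) key to a list of row tuples;
    `tables_by_key` maps the same key to an open Table-like object.
    popitem() removes the batch from the dict before it is written, so
    once append() returns, nothing keeps the rows alive and the memory
    can be reclaimed rather than accumulating until a MemoryError.
    """
    while maindict:
        key, rows = maindict.popitem()
        table = tables_by_key[key]
        table.append(rows)
        table.flush()   # push PyTables' internal write buffers to disk
```

Calling something like this on every batch cycle keeps the resident set roughly constant; if memory still climbs, the leak is elsewhere (e.g. the socket/parser layer in the second traceback).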
From: Jacob B. <jac...@gm...> - 2012-06-29 18:48:52
|
Hello PyTables Users,

My current implementation works pretty well now and has the write speeds that I am looking for; however, after around 20 minutes of execution and at a file size of around 127MB with level-3 Blosc compression I seem to get memory allocation errors. Here is the trace that I get; if anybody can shed light on this, that would be excellent. Does my implementation hog all of my memory? Is there a memory leak?

HDF5-DIAG: Error detected in HDF5 (1.8.8) thread 0:
  #000: ..\..\hdf5-1.8.8\src\H5Dio.c line 266 in H5Dwrite(): can't write data
    major: Dataset
    minor: Write failed
  #001: ..\..\hdf5-1.8.8\src\H5Dio.c line 671 in H5D_write(): can't write data
    major: Dataset
    minor: Write failed
  #002: ..\..\hdf5-1.8.8\src\H5Dchunk.c line 1861 in H5D_chunk_write(): unable to read raw data chunk
    major: Low-level I/O
    minor: Read failed
  #003: ..\..\hdf5-1.8.8\src\H5Dchunk.c line 2776 in H5D_chunk_lock(): memory allocation failed for raw data chunk
    major: Resource unavailable
    minor: No space available for allocation
Exception in thread bookthread:
Traceback (most recent call last):
  File "C:\Python27bit\lib\threading.py", line 551, in __bootstrap_inner
    self.run()
  File "../PyTablesInterface\Acceptor.py", line 21, in run
    BookDataWrapper.acceptDict()
  File "../PyTablesInterface\BookDataWrapper.py", line 50, in acceptDict
    tableD.append(dataArray)
  File "C:\Python27bit\lib\site-packages\tables\table.py", line 2081, in append
    self._saveBufferedRows(wbufRA, lenrows)
  File "C:\Python27bit\lib\site-packages\tables\table.py", line 2016, in _saveBufferedRows
    self._append_records(lenrows)
  File "tableExtension.pyx", line 454, in tables.tableExtension.Table._append_records (tables\tableExtension.c:4623)
HDF5ExtError: Problems appending the records.
#####################################
###### THIS IS A LATER ERROR #######
#####################################
Exception in thread CME_10_B:
Traceback (most recent call last):
  File "C:\Python27bit\lib\threading.py", line 551, in __bootstrap_inner
    self.run()
  File "C:\Users\jacob.bennett\development\MarketDataReader\IO\__init__.py", line 19, in run
    self.socket.rec()
  File "C:\Users\jacob.bennett\development\MarketDataReader\IO\MarketSocket.py", line 33, in rec
    Parser.parse(self.sock.recv(1024*16), self.exchange)
  File "../Parser\Parser.py", line 39, in parse
    SendInBatch.acceptBookData(instrumentId, timestamp, 0, i, bidPrice, bidQuant, bidOrders, exchange, source)
  File "../PyTablesInterface\SendInBatch.py", line 28, in acceptBookData
    maindict[(instrumentId, yearmonthday)] = [(timestamp1, timestamp2, side, level, price, quant, orders, source, 1)]
MemoryError

Thanks,
Jacob Bennett

--
Jacob Bennett
Massachusetts Institute of Technology
Department of Electrical Engineering and Computer Science
Class of 2014 | ben...@mi... |
From: Anthony S. <sc...@gm...> - 2012-06-29 01:13:16
|
Thanks Jacob,

This definitely sounds like a bug. If you come up with a self-contained example, please report it at https://github.com/PyTables/PyTables/issues

Thanks!
Anthony

On Thu, Jun 28, 2012 at 7:40 PM, Jacob Bennett <jac...@gm...> wrote:
> It's strange really. It seems like anything int64-sized in Python (greater
> than 4 billion) fails to convert and throws this message; however, numbers
> that can be represented by an int32 convert fine. Btw, this is for a field
> that is defined as UInt64 in pytables and it only fails if I do
> Table.append(row). If I do insertion based upon table.row, then it works
> fine.
>
> I will look at this issue more later tonight, and will report my findings.
>
> Thanks,
> Jacob
>
> On Thu, Jun 28, 2012 at 5:37 PM, Anthony Scopatz <sc...@gm...> wrote:
>> Hello Again Jacob,
>>
>> Hmm, are they of Python type long? Also, what exactly is the number that
>> is failing?
>>
>> Be Well
>> Anthony
>>
>> On Thu, Jun 28, 2012 at 4:18 PM, Jacob Bennett <jac...@gm...> wrote:
>>> Hello PyTables Users,
>>>
>>> I have a concern with a very strange error that references that my
>>> python ints cannot be converted to C longs when trying to run
>>> Table.append(rows). My python integers are definitely not big; at most they
>>> would probably be around 3 billion in size, which shouldn't be any problem
>>> for conversion to C long.
>>>
>>> This is the error that I am receiving...
>>> >>> Exception in thread bookthread: >>> Traceback (most recent call last): >>> File "C:\Python27\lib\threading.py", line 551, in __bootstrap_inner >>> self.run() >>> File >>> "C:\Users\jacob.bennett\development\MarketDataReader\PyTablesInterface\Acceptor.py", >>> line 21, in run >>> BookDataWrapper.acceptDict() >>> File >>> "C:\Users\jacob.bennett\development\MarketDataReader\PyTablesInterface\BookDataWrapper.py", >>> line 49, in acceptDict >>> tableD.append(dataArray) >>> File "C:\Python27\lib\site-packages\tables\table.py", line 2076, in >>> append >>> "rows parameter cannot be converted into a recarray object compliant >>> with table '%s'. The error was: <%s>" % (str(self), exc) >>> ValueError: rows parameter cannot be converted into a recarray object >>> compliant with table '/t301491615959191971 (Table(0,), shuffle, blosc(3)) >>> 'Instrument''. The error was: <Python int too large to convert to C long> >>> >>> Thanks, >>> Jacob >>> >>> -- >>> Jacob Bennett >>> Massachusetts Institute of Technology >>> Department of Electrical Engineering and Computer Science >>> Class of 2014| ben...@mi... >>> >>> >>> >>> ------------------------------------------------------------------------------ >>> Live Security Virtual Conference >>> Exclusive live event will cover all the ways today's security and >>> threat landscape has changed and how IT managers can respond. Discussions >>> will include endpoint security, mobile security and the latest in malware >>> threats. http://www.accelacomm.com/jaw/sfrnl04242012/114/50122263/ >>> _______________________________________________ >>> Pytables-users mailing list >>> Pyt...@li... >>> https://lists.sourceforge.net/lists/listinfo/pytables-users >>> >>> >> >> >> ------------------------------------------------------------------------------ >> Live Security Virtual Conference >> Exclusive live event will cover all the ways today's security and >> threat landscape has changed and how IT managers can respond. 
Discussions >> will include endpoint security, mobile security and the latest in malware >> threats. http://www.accelacomm.com/jaw/sfrnl04242012/114/50122263/ >> _______________________________________________ >> Pytables-users mailing list >> Pyt...@li... >> https://lists.sourceforge.net/lists/listinfo/pytables-users >> >> > > > -- > Jacob Bennett > Massachusetts Institute of Technology > Department of Electrical Engineering and Computer Science > Class of 2014| ben...@mi... > > > > ------------------------------------------------------------------------------ > Live Security Virtual Conference > Exclusive live event will cover all the ways today's security and > threat landscape has changed and how IT managers can respond. Discussions > will include endpoint security, mobile security and the latest in malware > threats. http://www.accelacomm.com/jaw/sfrnl04242012/114/50122263/ > _______________________________________________ > Pytables-users mailing list > Pyt...@li... > https://lists.sourceforge.net/lists/listinfo/pytables-users > > |
From: Jacob B. <jac...@gm...> - 2012-06-29 00:40:42
|
It's strange really. It seems like anything int64-sized in Python (greater than 4 billion) fails to convert and throws this message; however, numbers that can be represented by an int32 convert fine. Btw, this is for a field that is defined as UInt64 in PyTables, and it only fails if I do Table.append(row). If I do insertion based upon table.row, then it works fine.

I will look at this issue more later tonight, and will report my findings.

Thanks,
Jacob

On Thu, Jun 28, 2012 at 5:37 PM, Anthony Scopatz <sc...@gm...> wrote:
> Hello Again Jacob,
>
> Hmm, are they of Python type long? Also, what exactly is the number that
> is failing?
>
> Be Well
> Anthony
>
> On Thu, Jun 28, 2012 at 4:18 PM, Jacob Bennett <jac...@gm...> wrote:
>> Hello PyTables Users,
>>
>> I have a concern with a very strange error that references that my python
>> ints cannot be converted to C longs when trying to run Table.append(rows).
>> My python integers are definitely not big; at most they would probably be
>> around 3 billion in size, which shouldn't be any problem for conversion to
>> C long.
>>
>> This is the error that I am receiving...
>>
>> Exception in thread bookthread:
>> Traceback (most recent call last):
>>   File "C:\Python27\lib\threading.py", line 551, in __bootstrap_inner
>>     self.run()
>>   File "C:\Users\jacob.bennett\development\MarketDataReader\PyTablesInterface\Acceptor.py", line 21, in run
>>     BookDataWrapper.acceptDict()
>>   File "C:\Users\jacob.bennett\development\MarketDataReader\PyTablesInterface\BookDataWrapper.py", line 49, in acceptDict
>>     tableD.append(dataArray)
>>   File "C:\Python27\lib\site-packages\tables\table.py", line 2076, in append
>>     "rows parameter cannot be converted into a recarray object compliant with table '%s'. The error was: <%s>" % (str(self), exc)
>> ValueError: rows parameter cannot be converted into a recarray object
>> compliant with table '/t301491615959191971 (Table(0,), shuffle, blosc(3))
>> 'Instrument''.
The error was: <Python int too large to convert to C long> >> >> Thanks, >> Jacob >> >> -- >> Jacob Bennett >> Massachusetts Institute of Technology >> Department of Electrical Engineering and Computer Science >> Class of 2014| ben...@mi... >> >> >> >> ------------------------------------------------------------------------------ >> Live Security Virtual Conference >> Exclusive live event will cover all the ways today's security and >> threat landscape has changed and how IT managers can respond. Discussions >> will include endpoint security, mobile security and the latest in malware >> threats. http://www.accelacomm.com/jaw/sfrnl04242012/114/50122263/ >> _______________________________________________ >> Pytables-users mailing list >> Pyt...@li... >> https://lists.sourceforge.net/lists/listinfo/pytables-users >> >> > > > ------------------------------------------------------------------------------ > Live Security Virtual Conference > Exclusive live event will cover all the ways today's security and > threat landscape has changed and how IT managers can respond. Discussions > will include endpoint security, mobile security and the latest in malware > threats. http://www.accelacomm.com/jaw/sfrnl04242012/114/50122263/ > _______________________________________________ > Pytables-users mailing list > Pyt...@li... > https://lists.sourceforge.net/lists/listinfo/pytables-users > > -- Jacob Bennett Massachusetts Institute of Technology Department of Electrical Engineering and Computer Science Class of 2014| ben...@mi... |
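A likely explanation, offered here as an assumption rather than a confirmed diagnosis: on Windows a C long is 32 bits even under 64-bit Python, so converting a plain Python int above 2**31 - 1 through that path can overflow. Building the batch as a NumPy structured array with an explicit uint64 dtype sidesteps the per-element conversion; the column names below are invented for illustration.

```python
import numpy as np

# Declare the field as uint64 up front, mirroring a UInt64Col in the
# table description, so values above 2**31 - 1 never pass through a
# 32-bit C long conversion.
dtype = np.dtype([('timestamp', np.uint64), ('price', np.float64)])

rows = np.array([(3000000000, 101.5),
                 (3000000001, 101.75)], dtype=dtype)

# table.append(rows)  # `table` would be a Table with a matching UInt64 field
```

Appending a typed array like this hands PyTables data that already matches the on-disk description, instead of asking it to infer types from Python ints.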
From: Alvaro T. C. <al...@mi...> - 2012-06-29 00:09:48
|
Thank you Josh, that is representative enough. In my system the speedup of structured arrays is ~30x. A copy of the whole array is still ~6x faster.

-á.

On Thu, Jun 28, 2012 at 10:13 PM, Josh Ayers <jos...@gm...> wrote:
> import time
> import numpy as np
>
> dtype = np.format_parser(['i4', 'i4'], [], [])
> N = 100000
> rec = np.recarray((N, ), dtype=dtype)
> struc = np.zeros((N, ), dtype=dtype)
>
> t1 = time.clock()
> for row in rec:
>     pass
> print time.clock() - t1
>
> t1 = time.clock()
> for row in struc:
>     pass
> print time.clock() - t1 |
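For anyone in this thread who wants attribute access on the structured array that `table[:]` returns, a structured array can be re-viewed as a recarray without copying; the slower recarray iteration measured above only matters if you then iterate the recarray itself. A minimal sketch (the field names are invented):

```python
import numpy as np

# A structured array, like what reading a Table gives you by default.
struc = np.zeros(3, dtype=[('a', 'i4'), ('b', 'f8')])

# Re-view it as a recarray: same underlying buffer, no copy, so writes
# through the view are visible in the original structured array.
rec = struc.view(np.recarray)
rec.a[:] = [1, 2, 3]   # attribute-style access instead of struc['a']
```

This gets the recarray convenience on demand while keeping the faster structured array as the default return type.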
From: Anthony S. <sc...@gm...> - 2012-06-28 22:38:07
|
Hello Again Jacob, Hmm are they of Python type long? Also, what exactly is the number that is failing? Be Well Anthony On Thu, Jun 28, 2012 at 4:18 PM, Jacob Bennett <jac...@gm...>wrote: > Hello PyTables Users, > > I have a concern with a very strange error that references that my python > ints cannot be converted to C longs when trying to run Table.append(rows). > My python integers are definitely not big, at most they would probably be > around 3 billion in size, which shouldn't be any problem for conversion to > C long. > > This is the error that I am receiving... > > Exception in thread bookthread: > Traceback (most recent call last): > File "C:\Python27\lib\threading.py", line 551, in __bootstrap_inner > self.run() > File > "C:\Users\jacob.bennett\development\MarketDataReader\PyTablesInterface\Acceptor.py", > line 21, in run > BookDataWrapper.acceptDict() > File > "C:\Users\jacob.bennett\development\MarketDataReader\PyTablesInterface\BookDataWrapper.py", > line 49, in acceptDict > tableD.append(dataArray) > File "C:\Python27\lib\site-packages\tables\table.py", line 2076, in > append > "rows parameter cannot be converted into a recarray object compliant > with table '%s'. The error was: <%s>" % (str(self), exc) > ValueError: rows parameter cannot be converted into a recarray object > compliant with table '/t301491615959191971 (Table(0,), shuffle, blosc(3)) > 'Instrument''. The error was: <Python int too large to convert to C long> > > Thanks, > Jacob > > -- > Jacob Bennett > Massachusetts Institute of Technology > Department of Electrical Engineering and Computer Science > Class of 2014| ben...@mi... > > > > ------------------------------------------------------------------------------ > Live Security Virtual Conference > Exclusive live event will cover all the ways today's security and > threat landscape has changed and how IT managers can respond. Discussions > will include endpoint security, mobile security and the latest in malware > threats. 
http://www.accelacomm.com/jaw/sfrnl04242012/114/50122263/ > _______________________________________________ > Pytables-users mailing list > Pyt...@li... > https://lists.sourceforge.net/lists/listinfo/pytables-users > > |
From: Anthony S. <sc...@gm...> - 2012-06-28 22:17:48
|
That is reason enough for me really. If someone really wants a recarray, they could always convert an ndarray to this. I think it is still worth asking the numpy list what the status is... Be Well Anthony On Thu, Jun 28, 2012 at 4:13 PM, Josh Ayers <jos...@gm...> wrote: > There is a big difference in speed when iterating over the rows. Possibly > that was the reason structured arrays were chosen? The issue is mentioned > here: http://www.scipy.org/Cookbook/Recarray > > In a simple test, I get a difference of about 15x, so it is significant. > Iterating over a recarray with 100,000 rows takes 0.22s, versus 0.014s for > the structured array. Here's the code. > > import time > import numpy as np > > dtype = np.format_parser(['i4', 'i4'], [], []) > N = 100000 > rec = np.recarray((N, ), dtype=dtype) > struc = np.zeros((N, ), dtype=dtype) > > t1 = time.clock() > for row in rec: > pass > print time.clock() - t1 > > t1 = time.clock() > for row in struc: > pass > print time.clock() - t1 > > > > > On Thu, Jun 28, 2012 at 1:31 PM, Anthony Scopatz <sc...@gm...>wrote: > >> On Thu, Jun 28, 2012 at 3:23 PM, Francesc Alted <fa...@py...>wrote: >> >>> Yes, I think it would make more sense to return a recarray too. >>> However, I remember many time ago (3, 4 years?) that NumPy developers were >>> recommending using structured arrays instead of recarrays. I don't >>> remember exactly the arguments, but I think that was the reason why the >>> structured arrays were declared the default for reading tables. But this >>> could be changed, of course... >>> >> >> I remember this too Francesc. I don't think that this has changed, but I >> forgot the reasons. Maybe I'll write to the numpy list later tonight, >> unless someone else wants to... >> >> >>> >>> Francesc >>> >>> >>> On 6/28/12 8:25 PM, Anthony Scopatz wrote: >>> >>> Hmmm Ok. Maybe there needs to be a recarray flavor. >>> >>> I kind of like just returning a normal ndarray, though I see your >>> argument for returning a recarray. 
Maybe some of the other devs can jump >>> in here with an opinion. >>> >>> Be Well >>> Anthony >>> >>> On Thu, Jun 28, 2012 at 12:37 PM, Alvaro Tejero Cantero <al...@mi... >>> > wrote: >>> >>>> I just tested: passing an object of type numpy.core.records.recarray >>>> to the constructor of createTable and then reading back it into memory >>>> via slicing (h5f.root.myobj[:] ) returns to me a numpy.ndarray. >>>> >>>> Best, >>>> >>>> -á. >>>> >>>> >>>> On Thu, Jun 28, 2012 at 5:30 PM, Anthony Scopatz <sc...@gm...> >>>> wrote: >>>> > Hi Alvaro, >>>> > >>>> > I think if you save the table as a record array, it should return you >>>> a >>>> > record array. Or does it return a structured array? Have you tried >>>> this? >>>> > >>>> > Be Well >>>> > Anthony >>>> > >>>> > On Thu, Jun 28, 2012 at 11:22 AM, Alvaro Tejero Cantero < >>>> al...@mi...> >>>> > wrote: >>>> >> >>>> >> Hi, >>>> >> >>>> >> I've noticed that tables are loaded in memory as structured arrays. >>>> >> >>>> >> It seems that returning recarrays by default would be much in the >>>> >> spirit of the natural naming preferences of PyTables. >>>> >> >>>> >> Is there a reason not to do so? >>>> >> >>>> >> Cheers, >>>> >> >>>> >> Álvaro. >>>> >> >>>> >> >>>> >> >>>> ------------------------------------------------------------------------------ >>>> >> Live Security Virtual Conference >>>> >> Exclusive live event will cover all the ways today's security and >>>> >> threat landscape has changed and how IT managers can respond. >>>> Discussions >>>> >> will include endpoint security, mobile security and the latest in >>>> malware >>>> >> threats. http://www.accelacomm.com/jaw/sfrnl04242012/114/50122263/ >>>> >> _______________________________________________ >>>> >> Pytables-users mailing list >>>> >> Pyt...@li... 
>>>> >> https://lists.sourceforge.net/lists/listinfo/pytables-users >>>> > >>>> > >>>> > >>>> > >>>> ------------------------------------------------------------------------------ >>>> > Live Security Virtual Conference >>>> > Exclusive live event will cover all the ways today's security and >>>> > threat landscape has changed and how IT managers can respond. >>>> Discussions >>>> > will include endpoint security, mobile security and the latest in >>>> malware >>>> > threats. http://www.accelacomm.com/jaw/sfrnl04242012/114/50122263/ >>>> > _______________________________________________ >>>> > Pytables-users mailing list >>>> > Pyt...@li... >>>> > https://lists.sourceforge.net/lists/listinfo/pytables-users >>>> > >>>> >>>> >>>> ------------------------------------------------------------------------------ >>>> Live Security Virtual Conference >>>> Exclusive live event will cover all the ways today's security and >>>> threat landscape has changed and how IT managers can respond. >>>> Discussions >>>> will include endpoint security, mobile security and the latest in >>>> malware >>>> threats. http://www.accelacomm.com/jaw/sfrnl04242012/114/50122263/ >>>> _______________________________________________ >>>> Pytables-users mailing list >>>> Pyt...@li... >>>> https://lists.sourceforge.net/lists/listinfo/pytables-users >>>> >>> >>> >>> >>> ------------------------------------------------------------------------------ >>> Live Security Virtual Conference >>> Exclusive live event will cover all the ways today's security and >>> threat landscape has changed and how IT managers can respond. Discussions >>> will include endpoint security, mobile security and the latest in malware >>> threats. 
http://www.accelacomm.com/jaw/sfrnl04242012/114/50122263/ >>> >>> >>> >>> _______________________________________________ >>> Pytables-users mailing lis...@li...https://lists.sourceforge.net/lists/listinfo/pytables-users >>> >>> >>> >>> -- >>> Francesc Alted >>> >>> >>> >>> ------------------------------------------------------------------------------ >>> Live Security Virtual Conference >>> Exclusive live event will cover all the ways today's security and >>> threat landscape has changed and how IT managers can respond. Discussions >>> will include endpoint security, mobile security and the latest in malware >>> threats. http://www.accelacomm.com/jaw/sfrnl04242012/114/50122263/ >>> _______________________________________________ >>> Pytables-users mailing list >>> Pyt...@li... >>> https://lists.sourceforge.net/lists/listinfo/pytables-users >>> >>> >> >> >> ------------------------------------------------------------------------------ >> Live Security Virtual Conference >> Exclusive live event will cover all the ways today's security and >> threat landscape has changed and how IT managers can respond. Discussions >> will include endpoint security, mobile security and the latest in malware >> threats. http://www.accelacomm.com/jaw/sfrnl04242012/114/50122263/ >> _______________________________________________ >> Pytables-users mailing list >> Pyt...@li... >> https://lists.sourceforge.net/lists/listinfo/pytables-users >> >> > > > ------------------------------------------------------------------------------ > Live Security Virtual Conference > Exclusive live event will cover all the ways today's security and > threat landscape has changed and how IT managers can respond. Discussions > will include endpoint security, mobile security and the latest in malware > threats. http://www.accelacomm.com/jaw/sfrnl04242012/114/50122263/ > _______________________________________________ > Pytables-users mailing list > Pyt...@li... 
> https://lists.sourceforge.net/lists/listinfo/pytables-users > > |
From: Jacob B. <jac...@gm...> - 2012-06-28 21:18:35
|
Hello PyTables Users,

I have a concern with a very strange error that references that my python ints cannot be converted to C longs when trying to run Table.append(rows). My python integers are definitely not big; at most they would probably be around 3 billion in size, which shouldn't be any problem for conversion to C long.

This is the error that I am receiving...

Exception in thread bookthread:
Traceback (most recent call last):
  File "C:\Python27\lib\threading.py", line 551, in __bootstrap_inner
    self.run()
  File "C:\Users\jacob.bennett\development\MarketDataReader\PyTablesInterface\Acceptor.py", line 21, in run
    BookDataWrapper.acceptDict()
  File "C:\Users\jacob.bennett\development\MarketDataReader\PyTablesInterface\BookDataWrapper.py", line 49, in acceptDict
    tableD.append(dataArray)
  File "C:\Python27\lib\site-packages\tables\table.py", line 2076, in append
    "rows parameter cannot be converted into a recarray object compliant with table '%s'. The error was: <%s>" % (str(self), exc)
ValueError: rows parameter cannot be converted into a recarray object compliant with table '/t301491615959191971 (Table(0,), shuffle, blosc(3)) 'Instrument''. The error was: <Python int too large to convert to C long>

Thanks,
Jacob

--
Jacob Bennett
Massachusetts Institute of Technology
Department of Electrical Engineering and Computer Science
Class of 2014 | ben...@mi... |
From: Josh A. <jos...@gm...> - 2012-06-28 21:13:17
|
There is a big difference in speed when iterating over the rows. Possibly that was the reason structured arrays were chosen? The issue is mentioned here: http://www.scipy.org/Cookbook/Recarray

In a simple test, I get a difference of about 15x, so it is significant. Iterating over a recarray with 100,000 rows takes 0.22s, versus 0.014s for the structured array. Here's the code.

import time
import numpy as np

dtype = np.format_parser(['i4', 'i4'], [], [])
N = 100000
rec = np.recarray((N, ), dtype=dtype)
struc = np.zeros((N, ), dtype=dtype)

t1 = time.clock()
for row in rec:
    pass
print time.clock() - t1

t1 = time.clock()
for row in struc:
    pass
print time.clock() - t1

On Thu, Jun 28, 2012 at 1:31 PM, Anthony Scopatz <sc...@gm...> wrote:
> On Thu, Jun 28, 2012 at 3:23 PM, Francesc Alted <fa...@py...> wrote:
>> Yes, I think it would make more sense to return a recarray too.
>> However, I remember many time ago (3, 4 years?) that NumPy developers were
>> recommending using structured arrays instead of recarrays. I don't
>> remember exactly the arguments, but I think that was the reason why the
>> structured arrays were declared the default for reading tables. But this
>> could be changed, of course...
>
> I remember this too Francesc. I don't think that this has changed, but I
> forgot the reasons. Maybe I'll write to the numpy list later tonight,
> unless someone else wants to...
>
>> Francesc
>>
>> On 6/28/12 8:25 PM, Anthony Scopatz wrote:
>> Hmmm Ok. Maybe there needs to be a recarray flavor.
>>
>> I kind of like just returning a normal ndarray, though I see your
>> argument for returning a recarray. Maybe some of the other devs can jump in here with an opinion.
>> [CUT]
|
From: Anthony S. <sc...@gm...> - 2012-06-28 20:32:08
|
On Thu, Jun 28, 2012 at 3:23 PM, Francesc Alted <fa...@py...> wrote:

> Yes, I think it would make more sense to return a recarray too.
> However, I remember, a long time ago (3, 4 years?), that NumPy
> developers were recommending structured arrays instead of recarrays.
> I don't remember exactly the arguments, but I think that was the
> reason why structured arrays were made the default for reading
> tables. But this could be changed, of course...

I remember this too, Francesc. I don't think that this has changed, but
I forgot the reasons. Maybe I'll write to the numpy list later tonight,
unless someone else wants to...

> Francesc
>
> [CUT]
|
From: Francesc A. <fa...@py...> - 2012-06-28 20:23:40
|
Yes, I think it would make more sense to return a recarray too.
However, I remember, a long time ago (3, 4 years?), that NumPy
developers were recommending structured arrays instead of recarrays.
I don't remember exactly the arguments, but I think that was the reason
why structured arrays were made the default for reading tables. But
this could be changed, of course...

Francesc

On 6/28/12 8:25 PM, Anthony Scopatz wrote:
> Hmmm Ok. Maybe there needs to be a recarray flavor.
>
> I kind of like just returning a normal ndarray, though I see your
> argument for returning a recarray. Maybe some of the other devs can
> jump in here with an opinion.
>
> Be Well
> Anthony
>
> On Thu, Jun 28, 2012 at 12:37 PM, Alvaro Tejero Cantero
> <al...@mi...> wrote:
>
> I just tested: passing an object of type numpy.core.records.recarray
> to the constructor of createTable and then reading it back into
> memory via slicing (h5f.root.myobj[:]) returns a numpy.ndarray.
>
> Best,
>
> -á.
>
> On Thu, Jun 28, 2012 at 5:30 PM, Anthony Scopatz
> <sc...@gm...> wrote:
> > Hi Alvaro,
> >
> > I think if you save the table as a record array, it should return
> > you a record array. Or does it return a structured array? Have
> > you tried this?
> >
> > Be Well
> > Anthony
> >
> > On Thu, Jun 28, 2012 at 11:22 AM, Alvaro Tejero Cantero
> > <al...@mi...> wrote:
> >>
> >> Hi,
> >>
> >> I've noticed that tables are loaded into memory as structured
> >> arrays.
> >>
> >> It seems that returning recarrays by default would be much in the
> >> spirit of the natural naming preferences of PyTables.
> >>
> >> Is there a reason not to do so?
> >>
> >> Cheers,
> >>
> >> Álvaro.
> >> [CUT]

_______________________________________________
Pytables-users mailing list
Pyt...@li...
https://lists.sourceforge.net/lists/listinfo/pytables-users

--
Francesc Alted
|
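[Editor's note] The difference the thread turns on, and a way to get a recarray from the structured array that a table read returns, can be sketched in plain NumPy. This is generic NumPy usage under the editor's assumptions, not part of the PyTables API discussed above; `structured` stands in for the result of a slice like `h5f.root.myobj[:]`:

```python
import numpy as np

# A structured array, as returned by slicing a PyTables table:
# fields are reachable only via dict-style indexing, e.g. structured['x'].
structured = np.array([(1, 2.0), (3, 4.0)],
                      dtype=[('x', '<i4'), ('y', '<f8')])

# A zero-copy view as np.recarray adds attribute-style field access
# (rec.x, rec.y), closer in spirit to PyTables natural naming.
rec = structured.view(np.recarray)

print(type(structured).__name__)  # ndarray
print(type(rec).__name__)         # recarray
print(rec.x)                      # same data as structured['x'], no copy
```

Because the view shares the underlying buffer, this conversion is essentially free, which is one argument for leaving structured arrays as the default and letting users opt in to recarray semantics.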