From: Antonio V. <ant...@ti...> - 2012-07-08 10:55:58
|
Hi Christoph, thank you for reporting. Can you please tell us what the output of the attached script is on your machine? Thanks in advance. On 07/07/2012 21:18, Christoph Gohlke wrote: > Looks good. Only one test failure on win-amd64-py2.7 (attached). > > Christoph > > On 7/7/2012 11:47 AM, Antonio Valentino wrote: >> =========================== >> Announcing PyTables 2.4.0b1 >> =========================== >> [CUT] -- Antonio Valentino |
From: Christoph G. <cg...@uc...> - 2012-07-07 19:18:12
|
Looks good. Only one test failure on win-amd64-py2.7 (attached). Christoph On 7/7/2012 11:47 AM, Antonio Valentino wrote: > =========================== > Announcing PyTables 2.4.0b1 > =========================== > [CUT] |
From: Anthony S. <sc...@gm...> - 2012-07-07 18:50:08
|
Great Success! Please hammer on this everybody. On Sat, Jul 7, 2012 at 1:47 PM, Antonio Valentino < ant...@ti...> wrote: > =========================== > Announcing PyTables 2.4.0b1 > =========================== > [CUT] |
From: Antonio V. <ant...@ti...> - 2012-07-07 18:47:41
|
-----BEGIN PGP SIGNED MESSAGE----- Hash: SHA1

===========================
Announcing PyTables 2.4.0b1
===========================

We are happy to announce PyTables 2.4.0b1. This is an incremental release which includes many changes to prepare for future Python 3 support.

What's new
==========

This release includes support for the float16 data type and read-only support for variable-length string attributes.

The handling of HDF5 errors has been improved. The user will no longer see HDF5 error stacks dumped to the console. All HDF5 error messages are trapped and attached to a proper Python exception.

PyTables now supports only HDF5 v1.8.4+. All the code has been updated to the new HDF5 API. Supporting only the HDF5 1.8 series is beneficial for future development.

As always, a large number of bugs have been addressed and squashed as well.

For a detailed account of what has changed in this version, please refer to: http://pytables.github.com/release_notes.html

You can download a source package with generated PDF and HTML docs, as well as binaries for Windows, from: http://sourceforge.net/projects/pytables/files/pytables/2.4.0b1

For an online version of the manual, visit: http://pytables.github.com/usersguide/index.html

What is it?
===========

PyTables is a library for managing hierarchical datasets, designed to efficiently cope with extremely large amounts of data, with support for full 64-bit file addressing. PyTables runs on top of the HDF5 library and the NumPy package to achieve maximum throughput and convenient use. PyTables includes OPSI, a new indexing technology, which allows data lookups in tables exceeding 10 gigarows (10**10 rows) in less than a tenth of a second.

Resources
=========

About PyTables: http://www.pytables.org
About the HDF5 library: http://hdfgroup.org/HDF5/
About NumPy: http://numpy.scipy.org/

Acknowledgments
===============

Thanks to the many users who provided feature improvements, patches, bug reports, support, and suggestions. See the ``THANKS`` file in the distribution package for an (incomplete) list of contributors. Most especially, a lot of kudos go to the HDF5 and NumPy (and numarray!) makers. Without them, PyTables simply would not exist.

Share your experience
=====================

Let us know of any bugs, suggestions, gripes, kudos, etc. you may have.

- ----

**Enjoy data!**

- --
The PyTables Team
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.11 (GNU/Linux)
Comment: Using GnuPG with Mozilla - http://enigmail.mozdev.org/

iEYEARECAAYFAk/4hDwACgkQ1JUs2CS3bP7TUwCfcobS3KI7L/6k3Bbbt2VBOz5B
TqAAn0DhrSdtd7XTPOj0RR/mpr2FtseE
=T5iQ
-----END PGP SIGNATURE----- |
From: Anthony S. <sc...@gm...> - 2012-07-06 15:48:31
|
Ahh thanks for clarifying.... On Jul 6, 2012 2:06 AM, "Francesc Alted" <fa...@gm...> wrote: > [CUT] |
From: Francesc A. <fa...@gm...> - 2012-07-06 07:06:06
|
On 7/5/12 7:59 PM, Anthony Scopatz wrote: > On Thu, Jul 5, 2012 at 12:34 PM, Jacob Bennett <jac...@gm...> wrote: > > [CUT] > > Would it help to follow with the current schema and just increase the depth of the tree by taking parts of the instrumentId (instrumentId is an int64) as nodes? > Yes, this would be one approach that would work. +1 > Basically, nodes in HDF5 only get a fixed amount of storage for metadata, including what children they have. (I believe this number is 64 kb.) So if a group has so many children that storing their names and locations takes up more than 64 kb, you have run out of room. By adding N other subgroups to the hierarchy you increase the metadata available to N * 64 kb. No, this is wrong. The hierarchy metadata is stored in a different place than user metadata, and hence is not affected by the 64 KB limit. The problem is rather that having too many children hanging from a single group degrades performance quite badly (the same happens with regular filesystems when a directory holds too many files). -- Francesc Alted |
From: Anthony S. <sc...@gm...> - 2012-07-05 18:00:01
|
On Thu, Jul 5, 2012 at 12:34 PM, Jacob Bennett <jac...@gm...> wrote: > Hello Pytables Users, > > [CUT] > > Is there a way to redesign this schema so that it could work better with pytables? Or is this simply too much data? It certainly isn't too much data. HDF5 scales to petabytes ;) > Would it help to follow with the current schema and just increase the depth of the tree by taking parts of the instrumentId (instrumentId is an int64) as nodes? Yes, this would be one approach that would work. Basically, nodes in HDF5 only get a fixed amount of storage for metadata, including what children they have. (I believe this number is 64 kb. In theory, it is possible to increase this number and recompile hdf5, but then files generated in this way would only be compatible with your altered version of the library.) So if a group has so many children that storing their names and locations takes up more than 64 kb, you have run out of room. By adding N other subgroups to the hierarchy you increase the metadata available to N * 64 kb. This is probably the easiest thing to do given your current setup. Anything else would require changing the table description. There are probably some natural groupings within your instrumentIDs (e.g., all commodities in one group) that you could use. Be Well Anthony |
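The subgroup approach suggested above can be sketched concretely. The bucketing scheme below (a fanout of 256 and `/bucketNNN/idNNN` paths) is a hypothetical illustration, not something from the thread; the point is only that hashing the int64 instrumentId into a fixed number of subgroups caps the child count of any one group:

```python
def group_path(instrument_id, fanout=256):
    """Map an int64 instrumentId to a two-level node path (hypothetical scheme).

    With fanout=256, 20000 tables spread over 256 subgroups,
    so each group holds roughly 80 children instead of 20000.
    """
    bucket = instrument_id % fanout
    return '/bucket%03d/id%d' % (bucket, instrument_id)

# The table for an instrument would then be created under this path
# (e.g. with createTable() in PyTables 2.x) instead of directly at root.
path = group_path(123456789)  # -> '/bucket021/id123456789'
```

Any deterministic function of the id works; a modulus keeps buckets evenly filled when ids are roughly uniform, while a semantic grouping (commodities, equities, ...) makes browsing easier.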
From: Jacob B. <jac...@gm...> - 2012-07-05 17:34:22
|
Hello Pytables Users, I am currently hitting a maximum-number-of-children error in PyTables. I am trying to store stock updates in HDF5. My current schema has one file represent a trading day, each table represent a particular instrumentID (stock id), and each record in a table belong to a specific update with a timestamp (where the timestamp could be considered a primary key). Currently all tables are direct descendants of root. The problem with this is that, per day, I have the following stats: # of tables ::= 20000 # of records per table ::= 250000 The issue is that 20000 is too many children to be associated with a single node. Continuing with this schema will consume an exorbitant amount of memory and lead to slower query times. Is there a way to redesign this schema so that it works better with PyTables? Or is this simply too much data? Would it help to keep the current schema and just increase the depth of the tree by taking parts of the instrumentId (an int64) as nodes? Thanks, Jacob -- Jacob Bennett Massachusetts Institute of Technology Department of Electrical Engineering and Computer Science Class of 2014| ben...@mi... |
From: Anthony S. <sc...@gm...> - 2012-07-03 05:59:00
|
Why not read in just the date and ID columns to start with, then do a numpy.unique() or python set() on these, then query based on the unique values? Seems like it might be faster.... Be Well Anthony On Mon, Jul 2, 2012 at 5:16 PM, Aquil H. Abdullah <aqu...@gm...> wrote: > Hello All, > > I have a table that is indexed by two keys, and I would like to search for duplicate keys. So here is my naive slow implementation: (code I posted on stackoverflow) > > [CUT] |
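The column-wise approach suggested above can be sketched as follows. The arrays here are stand-ins for the two key columns read in one shot (e.g. `tbl.read(field='date')` in PyTables 2.x), and combining the keys as `'date|userID'` strings is an assumption made for illustration:

```python
import numpy as np

# Stand-ins for the two key columns pulled out of the table in one read.
dates = np.array([1, 1, 2, 3, 3, 3])
uids = np.array(['a', 'b', 'b', 'c', 'c', 'd'])

# Build one composite key per row, then count occurrences of each key.
keys = np.array(['%d|%s' % (d, u) for d, u in zip(dates, uids)])
uniq, counts = np.unique(keys, return_counts=True)

# Only these keys need a follow-up readWhere() query on the table.
dup_keys = uniq[counts > 1]  # -> ['3|c']
```

This replaces one `readWhere()` per row with one per *duplicated* key, which is where the speedup comes from.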
From: Anthony S. <sc...@gm...> - 2012-07-03 00:43:14
|
No worries ;) On Mon, Jul 2, 2012 at 5:41 PM, Jacob Bennett <jac...@gm...> wrote: > Cool, this seems pretty straightforward. Thanks again Anthony! > > -Jacob > > [CUT] |
From: Jacob B. <jac...@gm...> - 2012-07-03 00:41:32
|
Cool, this seems pretty straightforward. Thanks again Anthony! -Jacob On Mon, Jul 2, 2012 at 1:10 PM, Anthony Scopatz <sc...@gm...> wrote: > [CUT] -- Jacob Bennett Massachusetts Institute of Technology Department of Electrical Engineering and Computer Science Class of 2014| ben...@mi... |
From: Aquil H. A. <aqu...@gm...> - 2012-07-03 00:16:20
|
Hello All, I have a table that is indexed by two keys, and I would like to search for duplicate keys. Here is my naive, slow implementation (code I posted on Stack Overflow):

import tables

h5f = tables.openFile('filename.h5')
tbl = h5f.getNode('/data', 'data_table')  # assumes group /data and table data_table
counter = 0
for row in tbl:
    ts = row['date']  # timestamp (ts) or date
    uid = row['userID']
    query = '(date == %d) & (userID == "%s")' % (ts, uid)
    result = tbl.readWhere(query)
    if len(result) > 1:
        # Do something here
        pass
    counter += 1
    if counter % 1000 == 0:
        print '%d rows processed' % counter

-- Aquil H. Abdullah aqu...@gm... |
From: Anthony S. <sc...@gm...> - 2012-07-02 18:11:12
|
Hello Jacob, It seems like you have answered your own question ;). The thing is that the locking doesn't have to be all that hard. You can simply check if the file is already in the open files cache (from previous thread). If it isn't, the thread will open the file. Or you don't even have to do the check yourself because tables.openFile() will do this for you. If the file is already there, openFile() will just return you a reference to that File instance, which is what you wanted anyways. This takes care of opening. On the other side of things, don't allow any thread to close a file. Simply close all files in the cache when your code is about to exit. Keeping the file handles open and available for future reading isn't *that*expensive. So use openFile() for opening and don't close until the end and this should be thread safe for reading. Obviously, writing is more difficult. Be Well Anthony On Mon, Jul 2, 2012 at 5:38 AM, Jacob Bennett <jac...@gm...>wrote: > Hello PyTables Users, > > I am developing an API to access the current data stored in my pytables > instance. Note at this point that this is only reading, no writing to the > files. The big question on my mind at this point is how am I supposed to > handle the opening and closing of files on read requests that are > multithreaded? PyTables supports multithreading for read only; however, I > don't know how to handle two threads opening the same file or one thread > closing a file while the other is still reading it, besides putting a lock > on it thus disabling the multithreaded operations. > > Thanks, > Jacob > > -- > Jacob Bennett > Massachusetts Institute of Technology > Department of Electrical Engineering and Computer Science > Class of 2014| ben...@mi... > > > > ------------------------------------------------------------------------------ > Live Security Virtual Conference > Exclusive live event will cover all the ways today's security and > threat landscape has changed and how IT managers can respond. 
Discussions > will include endpoint security, mobile security and the latest in malware > threats. http://www.accelacomm.com/jaw/sfrnl04242012/114/50122263/ > _______________________________________________ > Pytables-users mailing list > Pyt...@li... > https://lists.sourceforge.net/lists/listinfo/pytables-users > > |
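Anthony's recipe above (one shared handle per file, opened once and closed only at program exit) can be sketched as a small thread-safe cache. This is a hedged illustration, not code from the thread: the helper names (`get_file`, `close_all`) are invented, and the `opener` parameter exists only so the sketch can be exercised without PyTables installed; in practice it defaults to the PyTables 2.x call `tables.openFile(filename, mode='r')`.

```python
import threading

_open_files = {}            # filename -> open File handle (the cache)
_cache_lock = threading.Lock()

def get_file(filename, opener=None):
    """Return a shared read-only handle for `filename`, opening it on first use.

    `opener` is a hypothetical hook that defaults to tables.openFile;
    it is a parameter here only to make the sketch testable.
    """
    if opener is None:
        import tables
        opener = lambda fn: tables.openFile(fn, mode='r')
    with _cache_lock:
        handle = _open_files.get(filename)
        if handle is None:
            # First thread to ask for this file opens it; later threads
            # reuse the same handle instead of reopening or closing it.
            _open_files[filename] = handle = opener(filename)
        return handle

def close_all():
    """Close every cached handle; call exactly once, at program exit."""
    with _cache_lock:
        for handle in _open_files.values():
            handle.close()
        _open_files.clear()
```

Reader threads then only ever call `get_file()`; no thread closes anything until `close_all()` runs at shutdown, which matches the "don't close until the end" advice above.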
From: Jacob B. <jac...@gm...> - 2012-07-02 12:38:16
|
Hello PyTables Users,

I am developing an API to access the current data stored in my PyTables instance. Note that at this point this is only reading; there is no writing to the files. The big question on my mind is how I am supposed to handle the opening and closing of files on read requests that are multithreaded. PyTables supports multithreading for read-only access; however, I don't know how to handle two threads opening the same file, or one thread closing a file while another is still reading it, short of putting a lock on it and thus disabling the multithreaded operations.

Thanks,
Jacob

--
Jacob Bennett
Massachusetts Institute of Technology
Department of Electrical Engineering and Computer Science
Class of 2014 | ben...@mi... |
From: Anthony S. <sc...@gm...> - 2012-06-29 19:23:42
|
Hi Jacob,

Hmmmm, this shouldn't be happening; the data isn't large enough. While there *may* be a memory leak (use valgrind to find it), it is more likely that you are just failing to dereference something. Is there any place in your calculation where you might accidentally keep data around? Note that PyTables/HDF5 caches a bunch of stuff behind the scenes.

What does `ps ux` say while you are running the code, or right before the failure? Does the code break with data that is 1/10th the size?

Be Well
Anthony

On Fri, Jun 29, 2012 at 1:48 PM, Jacob Bennett <jac...@gm...> wrote:
> Hello PyTables Users,
>
> My current implementation works pretty well now and has the write speeds
> that I am looking for; however, after around 20 minutes of execution and at a
> file size of around 127MB with level-3 Blosc compression I seem to get
> memory allocation errors. Here is the trace that I get; if anybody can shed
> light on this, that would be excellent. Does my implementation hog all of my
> memory? Is there a memory leak?
> > HDF5-DIAG: Error detected in HDF5 (1.8.8) thread 0: > #000: ..\..\hdf5-1.8.8\src\H5Dio.c line 266 in H5Dwrite(): can't write > data > major: Dataset > minor: Write failed > #001: ..\..\hdf5-1.8.8\src\H5Dio.c line 671 in H5D_write(): can't write > data > major: Dataset > minor: Write failed > #002: ..\..\hdf5-1.8.8\src\H5Dchunk.c line 1861 in H5D_chunk_write(): > unable t > o read raw data chunk > major: Low-level I/O > minor: Read failed > #003: ..\..\hdf5-1.8.8\src\H5Dchunk.c line 2776 in H5D_chunk_lock(): > memory al > location failed for raw data chunk > major: Resource unavailable > minor: No space available for allocation > Exception in thread bookthread: > Traceback (most recent call last): > File "C:\Python27bit\lib\threading.py", line 551, in __bootstrap_inner > self.run() > File "../PyTablesInterface\Acceptor.py", line 21, in run > BookDataWrapper.acceptDict() > File "../PyTablesInterface\BookDataWrapper.py", line 50, in acceptDict > tableD.append(dataArray) > File "C:\Python27bit\lib\site-packages\tables\table.py", line 2081, in > append > self._saveBufferedRows(wbufRA, lenrows) > File "C:\Python27bit\lib\site-packages\tables\table.py", line 2016, in > _saveBu > fferedRows > self._append_records(lenrows) > File "tableExtension.pyx", line 454, in > tables.tableExtension.Table._append_re > cords (tables\tableExtension.c:4623) > HDF5ExtError: Problems appending the records. 
> ##################################### > ######THIS IS A LATER ERROR######### > ##################################### > Exception in thread CME_10_B: > Traceback (most recent call last): > File "C:\Python27bit\lib\threading.py", line 551, in __bootstrap_inner > self.run() > File > "C:\Users\jacob.bennett\development\MarketDataReader\IO\__init__.py", lin > e 19, in run > self.socket.rec() > File > "C:\Users\jacob.bennett\development\MarketDataReader\IO\MarketSocket.py", > line 33, in rec > Parser.parse(self.sock.recv(1024*16), self.exchange) > File "../Parser\Parser.py", line 39, in parse > SendInBatch.acceptBookData(instrumentId, timestamp, 0, i, bidPrice, > bidQuant > , bidOrders, exchange, source) > File "../PyTablesInterface\SendInBatch.py", line 28, in acceptBookData > maindict[(instrumentId, yearmonthday)] = [(timestamp1, timestamp2, > side, lev > el, price, quant, orders, source, 1)] > MemoryError > > Thanks, > Jacob Bennett > > > -- > Jacob Bennett > Massachusetts Institute of Technology > Department of Electrical Engineering and Computer Science > Class of 2014| ben...@mi... > > > > ------------------------------------------------------------------------------ > Live Security Virtual Conference > Exclusive live event will cover all the ways today's security and > threat landscape has changed and how IT managers can respond. Discussions > will include endpoint security, mobile security and the latest in malware > threats. http://www.accelacomm.com/jaw/sfrnl04242012/114/50122263/ > _______________________________________________ > Pytables-users mailing list > Pyt...@li... > https://lists.sourceforge.net/lists/listinfo/pytables-users > > |
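One concrete way to act on Anthony's "failing to dereference" hint: make sure each buffered batch is dropped as soon as it has been appended, instead of letting the batching dict grow between writes. A hedged sketch; all names here (`drain_batches`, `maindict`, `tables_by_key`) are hypothetical, loosely modeled on the traceback above.

```python
def drain_batches(maindict, tables_by_key):
    """Append each buffered batch to its table and drop the reference.

    `maindict` maps a (instrument, day) key to a list of row tuples;
    `tables_by_key` maps the same key to an open Table-like object.
    popitem() removes the batch from the dict before it is written, so
    once append() returns, nothing keeps the rows alive and the memory
    can be reclaimed rather than accumulating until a MemoryError.
    """
    while maindict:
        key, rows = maindict.popitem()
        table = tables_by_key[key]
        table.append(rows)
        table.flush()   # push PyTables' internal write buffers to disk
```

Calling something like this on every batch cycle keeps the resident set roughly constant; if memory still climbs, the leak is elsewhere (e.g. the socket/parser layer in the second traceback).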
From: Jacob B. <jac...@gm...> - 2012-06-29 18:48:52
|
Hello PyTables Users,

My current implementation works pretty well now and has the write speeds that I am looking for; however, after around 20 minutes of execution and at a file size of around 127MB with level-3 Blosc compression I seem to get memory allocation errors. Here is the trace that I get; if anybody can shed light on this, that would be excellent. Does my implementation hog all of my memory? Is there a memory leak?

HDF5-DIAG: Error detected in HDF5 (1.8.8) thread 0:
  #000: ..\..\hdf5-1.8.8\src\H5Dio.c line 266 in H5Dwrite(): can't write data
    major: Dataset
    minor: Write failed
  #001: ..\..\hdf5-1.8.8\src\H5Dio.c line 671 in H5D_write(): can't write data
    major: Dataset
    minor: Write failed
  #002: ..\..\hdf5-1.8.8\src\H5Dchunk.c line 1861 in H5D_chunk_write(): unable to read raw data chunk
    major: Low-level I/O
    minor: Read failed
  #003: ..\..\hdf5-1.8.8\src\H5Dchunk.c line 2776 in H5D_chunk_lock(): memory allocation failed for raw data chunk
    major: Resource unavailable
    minor: No space available for allocation
Exception in thread bookthread:
Traceback (most recent call last):
  File "C:\Python27bit\lib\threading.py", line 551, in __bootstrap_inner
    self.run()
  File "../PyTablesInterface\Acceptor.py", line 21, in run
    BookDataWrapper.acceptDict()
  File "../PyTablesInterface\BookDataWrapper.py", line 50, in acceptDict
    tableD.append(dataArray)
  File "C:\Python27bit\lib\site-packages\tables\table.py", line 2081, in append
    self._saveBufferedRows(wbufRA, lenrows)
  File "C:\Python27bit\lib\site-packages\tables\table.py", line 2016, in _saveBufferedRows
    self._append_records(lenrows)
  File "tableExtension.pyx", line 454, in tables.tableExtension.Table._append_records (tables\tableExtension.c:4623)
HDF5ExtError: Problems appending the records.
#####################################
###### THIS IS A LATER ERROR #######
#####################################
Exception in thread CME_10_B:
Traceback (most recent call last):
  File "C:\Python27bit\lib\threading.py", line 551, in __bootstrap_inner
    self.run()
  File "C:\Users\jacob.bennett\development\MarketDataReader\IO\__init__.py", line 19, in run
    self.socket.rec()
  File "C:\Users\jacob.bennett\development\MarketDataReader\IO\MarketSocket.py", line 33, in rec
    Parser.parse(self.sock.recv(1024*16), self.exchange)
  File "../Parser\Parser.py", line 39, in parse
    SendInBatch.acceptBookData(instrumentId, timestamp, 0, i, bidPrice, bidQuant, bidOrders, exchange, source)
  File "../PyTablesInterface\SendInBatch.py", line 28, in acceptBookData
    maindict[(instrumentId, yearmonthday)] = [(timestamp1, timestamp2, side, level, price, quant, orders, source, 1)]
MemoryError

Thanks,
Jacob Bennett

--
Jacob Bennett
Massachusetts Institute of Technology
Department of Electrical Engineering and Computer Science
Class of 2014 | ben...@mi... |
From: Anthony S. <sc...@gm...> - 2012-06-29 01:13:16
|
Thanks Jacob,

This definitely sounds like a bug. If you come up with a self-contained example, please report it at https://github.com/PyTables/PyTables/issues

Thanks!
Anthony

On Thu, Jun 28, 2012 at 7:40 PM, Jacob Bennett <jac...@gm...> wrote:
> It's strange really. It seems like anything int64-sized in Python (greater
> than 4 billion) fails to convert and throws this message; however, numbers
> that can be represented by an int32 convert fine. Btw, this is for a field
> that is defined as UInt64 in pytables and it only fails if I do
> Table.append(row). If I do insertion based upon table.row, then it works
> fine.
>
> I will look at this issue more later tonight, and will report my findings.
>
> Thanks,
> Jacob
>
> On Thu, Jun 28, 2012 at 5:37 PM, Anthony Scopatz <sc...@gm...> wrote:
>> Hello Again Jacob,
>>
>> Hmm, are they of Python type long? Also, what exactly is the number that
>> is failing?
>>
>> Be Well
>> Anthony
>>
>> On Thu, Jun 28, 2012 at 4:18 PM, Jacob Bennett <jac...@gm...> wrote:
>>> Hello PyTables Users,
>>>
>>> I have a concern with a very strange error that references that my
>>> python ints cannot be converted to C longs when trying to run
>>> Table.append(rows). My python integers are definitely not big; at most they
>>> would probably be around 3 billion in size, which shouldn't be any problem
>>> for conversion to C long.
>>>
>>> This is the error that I am receiving...
>>> >>> Exception in thread bookthread: >>> Traceback (most recent call last): >>> File "C:\Python27\lib\threading.py", line 551, in __bootstrap_inner >>> self.run() >>> File >>> "C:\Users\jacob.bennett\development\MarketDataReader\PyTablesInterface\Acceptor.py", >>> line 21, in run >>> BookDataWrapper.acceptDict() >>> File >>> "C:\Users\jacob.bennett\development\MarketDataReader\PyTablesInterface\BookDataWrapper.py", >>> line 49, in acceptDict >>> tableD.append(dataArray) >>> File "C:\Python27\lib\site-packages\tables\table.py", line 2076, in >>> append >>> "rows parameter cannot be converted into a recarray object compliant >>> with table '%s'. The error was: <%s>" % (str(self), exc) >>> ValueError: rows parameter cannot be converted into a recarray object >>> compliant with table '/t301491615959191971 (Table(0,), shuffle, blosc(3)) >>> 'Instrument''. The error was: <Python int too large to convert to C long> >>> >>> Thanks, >>> Jacob >>> >>> -- >>> Jacob Bennett >>> Massachusetts Institute of Technology >>> Department of Electrical Engineering and Computer Science >>> Class of 2014| ben...@mi... >>> >>> >>> >>> ------------------------------------------------------------------------------ >>> Live Security Virtual Conference >>> Exclusive live event will cover all the ways today's security and >>> threat landscape has changed and how IT managers can respond. Discussions >>> will include endpoint security, mobile security and the latest in malware >>> threats. http://www.accelacomm.com/jaw/sfrnl04242012/114/50122263/ >>> _______________________________________________ >>> Pytables-users mailing list >>> Pyt...@li... >>> https://lists.sourceforge.net/lists/listinfo/pytables-users >>> >>> >> >> >> ------------------------------------------------------------------------------ >> Live Security Virtual Conference >> Exclusive live event will cover all the ways today's security and >> threat landscape has changed and how IT managers can respond. 
Discussions >> will include endpoint security, mobile security and the latest in malware >> threats. http://www.accelacomm.com/jaw/sfrnl04242012/114/50122263/ >> _______________________________________________ >> Pytables-users mailing list >> Pyt...@li... >> https://lists.sourceforge.net/lists/listinfo/pytables-users >> >> > > > -- > Jacob Bennett > Massachusetts Institute of Technology > Department of Electrical Engineering and Computer Science > Class of 2014| ben...@mi... > > > > ------------------------------------------------------------------------------ > Live Security Virtual Conference > Exclusive live event will cover all the ways today's security and > threat landscape has changed and how IT managers can respond. Discussions > will include endpoint security, mobile security and the latest in malware > threats. http://www.accelacomm.com/jaw/sfrnl04242012/114/50122263/ > _______________________________________________ > Pytables-users mailing list > Pyt...@li... > https://lists.sourceforge.net/lists/listinfo/pytables-users > > |
From: Jacob B. <jac...@gm...> - 2012-06-29 00:40:42
|
It's strange really. It seems like anything int64-sized in Python (greater than 4 billion) fails to convert and throws this message; however, numbers that can be represented by an int32 convert fine. Btw, this is for a field that is defined as UInt64 in PyTables, and it only fails if I do Table.append(row). If I do insertion based upon table.row, then it works fine.

I will look at this issue more later tonight, and will report my findings.

Thanks,
Jacob

On Thu, Jun 28, 2012 at 5:37 PM, Anthony Scopatz <sc...@gm...> wrote:
> Hello Again Jacob,
>
> Hmm, are they of Python type long? Also, what exactly is the number that
> is failing?
>
> Be Well
> Anthony
>
> On Thu, Jun 28, 2012 at 4:18 PM, Jacob Bennett <jac...@gm...> wrote:
>> Hello PyTables Users,
>>
>> I have a concern with a very strange error that references that my python
>> ints cannot be converted to C longs when trying to run Table.append(rows).
>> My python integers are definitely not big; at most they would probably be
>> around 3 billion in size, which shouldn't be any problem for conversion to
>> C long.
>>
>> This is the error that I am receiving...
>>
>> Exception in thread bookthread:
>> Traceback (most recent call last):
>>   File "C:\Python27\lib\threading.py", line 551, in __bootstrap_inner
>>     self.run()
>>   File "C:\Users\jacob.bennett\development\MarketDataReader\PyTablesInterface\Acceptor.py", line 21, in run
>>     BookDataWrapper.acceptDict()
>>   File "C:\Users\jacob.bennett\development\MarketDataReader\PyTablesInterface\BookDataWrapper.py", line 49, in acceptDict
>>     tableD.append(dataArray)
>>   File "C:\Python27\lib\site-packages\tables\table.py", line 2076, in append
>>     "rows parameter cannot be converted into a recarray object compliant with table '%s'. The error was: <%s>" % (str(self), exc)
>> ValueError: rows parameter cannot be converted into a recarray object
>> compliant with table '/t301491615959191971 (Table(0,), shuffle, blosc(3))
>> 'Instrument''.
The error was: <Python int too large to convert to C long> >> >> Thanks, >> Jacob >> >> -- >> Jacob Bennett >> Massachusetts Institute of Technology >> Department of Electrical Engineering and Computer Science >> Class of 2014| ben...@mi... >> >> >> >> ------------------------------------------------------------------------------ >> Live Security Virtual Conference >> Exclusive live event will cover all the ways today's security and >> threat landscape has changed and how IT managers can respond. Discussions >> will include endpoint security, mobile security and the latest in malware >> threats. http://www.accelacomm.com/jaw/sfrnl04242012/114/50122263/ >> _______________________________________________ >> Pytables-users mailing list >> Pyt...@li... >> https://lists.sourceforge.net/lists/listinfo/pytables-users >> >> > > > ------------------------------------------------------------------------------ > Live Security Virtual Conference > Exclusive live event will cover all the ways today's security and > threat landscape has changed and how IT managers can respond. Discussions > will include endpoint security, mobile security and the latest in malware > threats. http://www.accelacomm.com/jaw/sfrnl04242012/114/50122263/ > _______________________________________________ > Pytables-users mailing list > Pyt...@li... > https://lists.sourceforge.net/lists/listinfo/pytables-users > > -- Jacob Bennett Massachusetts Institute of Technology Department of Electrical Engineering and Computer Science Class of 2014| ben...@mi... |
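A likely explanation, offered here as an assumption rather than a confirmed diagnosis: on Windows a C long is 32 bits even under 64-bit Python, so converting a plain Python int above 2**31 - 1 through that path can overflow. Building the batch as a NumPy structured array with an explicit uint64 dtype sidesteps the per-element conversion; the column names below are invented for illustration.

```python
import numpy as np

# Declare the field as uint64 up front, mirroring a UInt64Col in the
# table description, so values above 2**31 - 1 never pass through a
# 32-bit C long conversion.
dtype = np.dtype([('timestamp', np.uint64), ('price', np.float64)])

rows = np.array([(3000000000, 101.5),
                 (3000000001, 101.75)], dtype=dtype)

# table.append(rows)  # `table` would be a Table with a matching UInt64 field
```

Appending a typed array like this hands PyTables data that already matches the on-disk description, instead of asking it to infer types from Python ints.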
From: Alvaro T. C. <al...@mi...> - 2012-06-29 00:09:48
|
Thank you Josh, that is representative enough. In my system the speedup of structured arrays is ~30x. A copy of the whole array is still ~6x faster.

-á.

On Thu, Jun 28, 2012 at 10:13 PM, Josh Ayers <jos...@gm...> wrote:
> import time
> import numpy as np
>
> dtype = np.format_parser(['i4', 'i4'], [], [])
> N = 100000
> rec = np.recarray((N, ), dtype=dtype)
> struc = np.zeros((N, ), dtype=dtype)
>
> t1 = time.clock()
> for row in rec:
>     pass
> print time.clock() - t1
>
> t1 = time.clock()
> for row in struc:
>     pass
> print time.clock() - t1 |
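For anyone in this thread who wants attribute access on the structured array that `table[:]` returns, a structured array can be re-viewed as a recarray without copying; the slower recarray iteration measured above only matters if you then iterate the recarray itself. A minimal sketch (the field names are invented):

```python
import numpy as np

# A structured array, like what reading a Table gives you by default.
struc = np.zeros(3, dtype=[('a', 'i4'), ('b', 'f8')])

# Re-view it as a recarray: same underlying buffer, no copy, so writes
# through the view are visible in the original structured array.
rec = struc.view(np.recarray)
rec.a[:] = [1, 2, 3]   # attribute-style access instead of struc['a']
```

This gets the recarray convenience on demand while keeping the faster structured array as the default return type.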
From: Anthony S. <sc...@gm...> - 2012-06-28 22:38:07
|
Hello Again Jacob, Hmm are they of Python type long? Also, what exactly is the number that is failing? Be Well Anthony On Thu, Jun 28, 2012 at 4:18 PM, Jacob Bennett <jac...@gm...>wrote: > Hello PyTables Users, > > I have a concern with a very strange error that references that my python > ints cannot be converted to C longs when trying to run Table.append(rows). > My python integers are definitely not big, at most they would probably be > around 3 billion in size, which shouldn't be any problem for conversion to > C long. > > This is the error that I am receiving... > > Exception in thread bookthread: > Traceback (most recent call last): > File "C:\Python27\lib\threading.py", line 551, in __bootstrap_inner > self.run() > File > "C:\Users\jacob.bennett\development\MarketDataReader\PyTablesInterface\Acceptor.py", > line 21, in run > BookDataWrapper.acceptDict() > File > "C:\Users\jacob.bennett\development\MarketDataReader\PyTablesInterface\BookDataWrapper.py", > line 49, in acceptDict > tableD.append(dataArray) > File "C:\Python27\lib\site-packages\tables\table.py", line 2076, in > append > "rows parameter cannot be converted into a recarray object compliant > with table '%s'. The error was: <%s>" % (str(self), exc) > ValueError: rows parameter cannot be converted into a recarray object > compliant with table '/t301491615959191971 (Table(0,), shuffle, blosc(3)) > 'Instrument''. The error was: <Python int too large to convert to C long> > > Thanks, > Jacob > > -- > Jacob Bennett > Massachusetts Institute of Technology > Department of Electrical Engineering and Computer Science > Class of 2014| ben...@mi... > > > > ------------------------------------------------------------------------------ > Live Security Virtual Conference > Exclusive live event will cover all the ways today's security and > threat landscape has changed and how IT managers can respond. Discussions > will include endpoint security, mobile security and the latest in malware > threats. 
http://www.accelacomm.com/jaw/sfrnl04242012/114/50122263/ > _______________________________________________ > Pytables-users mailing list > Pyt...@li... > https://lists.sourceforge.net/lists/listinfo/pytables-users > > |
From: Anthony S. <sc...@gm...> - 2012-06-28 22:17:48
|
That is reason enough for me really. If someone really wants a recarray, they could always convert an ndarray to this. I think it is still worth asking the numpy list what the status is... Be Well Anthony On Thu, Jun 28, 2012 at 4:13 PM, Josh Ayers <jos...@gm...> wrote: > There is a big difference in speed when iterating over the rows. Possibly > that was the reason structured arrays were chosen? The issue is mentioned > here: http://www.scipy.org/Cookbook/Recarray > > In a simple test, I get a difference of about 15x, so it is significant. > Iterating over a recarray with 100,000 rows takes 0.22s, versus 0.014s for > the structured array. Here's the code. > > import time > import numpy as np > > dtype = np.format_parser(['i4', 'i4'], [], []) > N = 100000 > rec = np.recarray((N, ), dtype=dtype) > struc = np.zeros((N, ), dtype=dtype) > > t1 = time.clock() > for row in rec: > pass > print time.clock() - t1 > > t1 = time.clock() > for row in struc: > pass > print time.clock() - t1 > > > > > On Thu, Jun 28, 2012 at 1:31 PM, Anthony Scopatz <sc...@gm...>wrote: > >> On Thu, Jun 28, 2012 at 3:23 PM, Francesc Alted <fa...@py...>wrote: >> >>> Yes, I think it would make more sense to return a recarray too. >>> However, I remember many time ago (3, 4 years?) that NumPy developers were >>> recommending using structured arrays instead of recarrays. I don't >>> remember exactly the arguments, but I think that was the reason why the >>> structured arrays were declared the default for reading tables. But this >>> could be changed, of course... >>> >> >> I remember this too Francesc. I don't think that this has changed, but I >> forgot the reasons. Maybe I'll write to the numpy list later tonight, >> unless someone else wants to... >> >> >>> >>> Francesc >>> >>> >>> On 6/28/12 8:25 PM, Anthony Scopatz wrote: >>> >>> Hmmm Ok. Maybe there needs to be a recarray flavor. >>> >>> I kind of like just returning a normal ndarray, though I see your >>> argument for returning a recarray. 
Maybe some of the other devs can jump >>> in here with an opinion. >>> >>> Be Well >>> Anthony >>> >>> On Thu, Jun 28, 2012 at 12:37 PM, Alvaro Tejero Cantero <al...@mi... >>> > wrote: >>> >>>> I just tested: passing an object of type numpy.core.records.recarray >>>> to the constructor of createTable and then reading back it into memory >>>> via slicing (h5f.root.myobj[:] ) returns to me a numpy.ndarray. >>>> >>>> Best, >>>> >>>> -á. >>>> >>>> >>>> On Thu, Jun 28, 2012 at 5:30 PM, Anthony Scopatz <sc...@gm...> >>>> wrote: >>>> > Hi Alvaro, >>>> > >>>> > I think if you save the table as a record array, it should return you >>>> a >>>> > record array. Or does it return a structured array? Have you tried >>>> this? >>>> > >>>> > Be Well >>>> > Anthony >>>> > >>>> > On Thu, Jun 28, 2012 at 11:22 AM, Alvaro Tejero Cantero < >>>> al...@mi...> >>>> > wrote: >>>> >> >>>> >> Hi, >>>> >> >>>> >> I've noticed that tables are loaded in memory as structured arrays. >>>> >> >>>> >> It seems that returning recarrays by default would be much in the >>>> >> spirit of the natural naming preferences of PyTables. >>>> >> >>>> >> Is there a reason not to do so? >>>> >> >>>> >> Cheers, >>>> >> >>>> >> Álvaro. >>>> >> >>>> >> >>>> >> >>>> ------------------------------------------------------------------------------ >>>> >> Live Security Virtual Conference >>>> >> Exclusive live event will cover all the ways today's security and >>>> >> threat landscape has changed and how IT managers can respond. >>>> Discussions >>>> >> will include endpoint security, mobile security and the latest in >>>> malware >>>> >> threats. http://www.accelacomm.com/jaw/sfrnl04242012/114/50122263/ >>>> >> _______________________________________________ >>>> >> Pytables-users mailing list >>>> >> Pyt...@li... 
>>>> >> https://lists.sourceforge.net/lists/listinfo/pytables-users >>>> > >>>> > >>>> > >>>> > >>>> ------------------------------------------------------------------------------ >>>> > Live Security Virtual Conference >>>> > Exclusive live event will cover all the ways today's security and >>>> > threat landscape has changed and how IT managers can respond. >>>> Discussions >>>> > will include endpoint security, mobile security and the latest in >>>> malware >>>> > threats. http://www.accelacomm.com/jaw/sfrnl04242012/114/50122263/ >>>> > _______________________________________________ >>>> > Pytables-users mailing list >>>> > Pyt...@li... >>>> > https://lists.sourceforge.net/lists/listinfo/pytables-users >>>> > >>>> >>>> >>>> ------------------------------------------------------------------------------ >>>> Live Security Virtual Conference >>>> Exclusive live event will cover all the ways today's security and >>>> threat landscape has changed and how IT managers can respond. >>>> Discussions >>>> will include endpoint security, mobile security and the latest in >>>> malware >>>> threats. http://www.accelacomm.com/jaw/sfrnl04242012/114/50122263/ >>>> _______________________________________________ >>>> Pytables-users mailing list >>>> Pyt...@li... >>>> https://lists.sourceforge.net/lists/listinfo/pytables-users >>>> >>> >>> >>> >>> ------------------------------------------------------------------------------ >>> Live Security Virtual Conference >>> Exclusive live event will cover all the ways today's security and >>> threat landscape has changed and how IT managers can respond. Discussions >>> will include endpoint security, mobile security and the latest in malware >>> threats. 
http://www.accelacomm.com/jaw/sfrnl04242012/114/50122263/ >>> >>> >>> >>> _______________________________________________ >>> Pytables-users mailing lis...@li...https://lists.sourceforge.net/lists/listinfo/pytables-users >>> >>> >>> >>> -- >>> Francesc Alted >>> >>> >>> >>> ------------------------------------------------------------------------------ >>> Live Security Virtual Conference >>> Exclusive live event will cover all the ways today's security and >>> threat landscape has changed and how IT managers can respond. Discussions >>> will include endpoint security, mobile security and the latest in malware >>> threats. http://www.accelacomm.com/jaw/sfrnl04242012/114/50122263/ >>> _______________________________________________ >>> Pytables-users mailing list >>> Pyt...@li... >>> https://lists.sourceforge.net/lists/listinfo/pytables-users >>> >>> >> >> >> ------------------------------------------------------------------------------ >> Live Security Virtual Conference >> Exclusive live event will cover all the ways today's security and >> threat landscape has changed and how IT managers can respond. Discussions >> will include endpoint security, mobile security and the latest in malware >> threats. http://www.accelacomm.com/jaw/sfrnl04242012/114/50122263/ >> _______________________________________________ >> Pytables-users mailing list >> Pyt...@li... >> https://lists.sourceforge.net/lists/listinfo/pytables-users >> >> > > > ------------------------------------------------------------------------------ > Live Security Virtual Conference > Exclusive live event will cover all the ways today's security and > threat landscape has changed and how IT managers can respond. Discussions > will include endpoint security, mobile security and the latest in malware > threats. http://www.accelacomm.com/jaw/sfrnl04242012/114/50122263/ > _______________________________________________ > Pytables-users mailing list > Pyt...@li... 
> https://lists.sourceforge.net/lists/listinfo/pytables-users > > |
From: Jacob B. <jac...@gm...> - 2012-06-28 21:18:35
|
Hello PyTables Users,

I have a concern with a very strange error that references that my python ints cannot be converted to C longs when trying to run Table.append(rows). My python integers are definitely not big; at most they would probably be around 3 billion in size, which shouldn't be any problem for conversion to C long.

This is the error that I am receiving...

Exception in thread bookthread:
Traceback (most recent call last):
  File "C:\Python27\lib\threading.py", line 551, in __bootstrap_inner
    self.run()
  File "C:\Users\jacob.bennett\development\MarketDataReader\PyTablesInterface\Acceptor.py", line 21, in run
    BookDataWrapper.acceptDict()
  File "C:\Users\jacob.bennett\development\MarketDataReader\PyTablesInterface\BookDataWrapper.py", line 49, in acceptDict
    tableD.append(dataArray)
  File "C:\Python27\lib\site-packages\tables\table.py", line 2076, in append
    "rows parameter cannot be converted into a recarray object compliant with table '%s'. The error was: <%s>" % (str(self), exc)
ValueError: rows parameter cannot be converted into a recarray object compliant with table '/t301491615959191971 (Table(0,), shuffle, blosc(3)) 'Instrument''. The error was: <Python int too large to convert to C long>

Thanks,
Jacob

--
Jacob Bennett
Massachusetts Institute of Technology
Department of Electrical Engineering and Computer Science
Class of 2014 | ben...@mi... |
From: Josh A. <jos...@gm...> - 2012-06-28 21:13:17
|
There is a big difference in speed when iterating over the rows. Possibly that was the reason structured arrays were chosen? The issue is mentioned here: http://www.scipy.org/Cookbook/Recarray

In a simple test, I get a difference of about 15x, so it is significant. Iterating over a recarray with 100,000 rows takes 0.22s, versus 0.014s for the structured array. Here's the code.

import time
import numpy as np

dtype = np.format_parser(['i4', 'i4'], [], [])
N = 100000
rec = np.recarray((N, ), dtype=dtype)
struc = np.zeros((N, ), dtype=dtype)

t1 = time.clock()
for row in rec:
    pass
print time.clock() - t1

t1 = time.clock()
for row in struc:
    pass
print time.clock() - t1

On Thu, Jun 28, 2012 at 1:31 PM, Anthony Scopatz <sc...@gm...> wrote:
> On Thu, Jun 28, 2012 at 3:23 PM, Francesc Alted <fa...@py...> wrote:
>> Yes, I think it would make more sense to return a recarray too.
>> However, I remember many time ago (3, 4 years?) that NumPy developers were
>> recommending using structured arrays instead of recarrays. I don't
>> remember exactly the arguments, but I think that was the reason why the
>> structured arrays were declared the default for reading tables. But this
>> could be changed, of course...
>
> I remember this too Francesc. I don't think that this has changed, but I
> forgot the reasons. Maybe I'll write to the numpy list later tonight,
> unless someone else wants to...
>
>> Francesc
>>
>> On 6/28/12 8:25 PM, Anthony Scopatz wrote:
>> Hmmm Ok. Maybe there needs to be a recarray flavor.
>>
>> I kind of like just returning a normal ndarray, though I see your
>> argument for returning a recarray. Maybe some of the other devs can jump in here with an opinion.
>> [CUT]
|
From: Anthony S. <sc...@gm...> - 2012-06-28 20:32:08
|
On Thu, Jun 28, 2012 at 3:23 PM, Francesc Alted <fa...@py...> wrote:

> Yes, I think it would make more sense to return a recarray too.
> However, I remember, a long time ago (3, 4 years?), that NumPy
> developers were recommending structured arrays instead of recarrays.
> I don't remember exactly the arguments, but I think that was the
> reason why structured arrays were made the default for reading
> tables. But this could be changed, of course...

I remember this too, Francesc. I don't think that this has changed, but
I forgot the reasons. Maybe I'll write to the numpy list later tonight,
unless someone else wants to...

> Francesc
>
> [CUT]
|
From: Francesc A. <fa...@py...> - 2012-06-28 20:23:40
|
Yes, I think it would make more sense to return a recarray too.
However, I remember, a long time ago (3, 4 years?), that NumPy
developers were recommending structured arrays instead of recarrays.
I don't remember exactly the arguments, but I think that was the reason
why structured arrays were made the default for reading tables. But
this could be changed, of course...

Francesc

On 6/28/12 8:25 PM, Anthony Scopatz wrote:
> Hmmm Ok. Maybe there needs to be a recarray flavor.
>
> I kind of like just returning a normal ndarray, though I see your
> argument for returning a recarray. Maybe some of the other devs can
> jump in here with an opinion.
>
> Be Well
> Anthony
>
> On Thu, Jun 28, 2012 at 12:37 PM, Alvaro Tejero Cantero
> <al...@mi...> wrote:
>
> I just tested: passing an object of type numpy.core.records.recarray
> to the constructor of createTable and then reading it back into
> memory via slicing (h5f.root.myobj[:]) returns a numpy.ndarray.
>
> Best,
>
> -á.
>
> On Thu, Jun 28, 2012 at 5:30 PM, Anthony Scopatz
> <sc...@gm...> wrote:
> > Hi Alvaro,
> >
> > I think if you save the table as a record array, it should return
> > you a record array. Or does it return a structured array? Have
> > you tried this?
> >
> > Be Well
> > Anthony
> >
> > On Thu, Jun 28, 2012 at 11:22 AM, Alvaro Tejero Cantero
> > <al...@mi...> wrote:
> >>
> >> Hi,
> >>
> >> I've noticed that tables are loaded into memory as structured
> >> arrays.
> >>
> >> It seems that returning recarrays by default would be much in the
> >> spirit of the natural naming preferences of PyTables.
> >>
> >> Is there a reason not to do so?
> >>
> >> Cheers,
> >>
> >> Álvaro.
> >> [CUT]

_______________________________________________
Pytables-users mailing list
Pyt...@li...
https://lists.sourceforge.net/lists/listinfo/pytables-users

--
Francesc Alted
|
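[Editor's note] The difference the thread turns on, and a way to get a recarray from the structured array that a table read returns, can be sketched in plain NumPy. This is generic NumPy usage under the editor's assumptions, not part of the PyTables API discussed above; `structured` stands in for the result of a slice like `h5f.root.myobj[:]`:

```python
import numpy as np

# A structured array, as returned by slicing a PyTables table:
# fields are reachable only via dict-style indexing, e.g. structured['x'].
structured = np.array([(1, 2.0), (3, 4.0)],
                      dtype=[('x', '<i4'), ('y', '<f8')])

# A zero-copy view as np.recarray adds attribute-style field access
# (rec.x, rec.y), closer in spirit to PyTables natural naming.
rec = structured.view(np.recarray)

print(type(structured).__name__)  # ndarray
print(type(rec).__name__)         # recarray
print(rec.x)                      # same data as structured['x'], no copy
```

Because the view shares the underlying buffer, this conversion is essentially free, which is one argument for leaving structured arrays as the default and letting users opt in to recarray semantics.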