pytables-users Mailing List for PyTables - Hierarchical datasets (Page 22)

pytables-users — PyTables users discussion list

Re: [Pytables-users] How to get the name of a current table instance?

From: Anthony S. <sc...@gm...> - 2012-07-20 16:24:52

Hey Jacob,

There is always Node._v_name and Node._v_pathname for all nodes in the tree
[1].

Be Well
Anthony

[1] http://pytables.github.com/usersguide/libref.html?highlight=path#tables.Node._v_name
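
For readers following along, here is a minimal sketch of reading those two attributes while iterating over every table in a file. It uses the PyTables 3.x spellings open_file and walk_nodes (when this thread was written the calls were openFile and walkNodes). The file name, the Reading row layout, and the table_names helper are hypothetical and exist only for the demo.

```python
import os
import tempfile

import tables


class Reading(tables.IsDescription):
    """Hypothetical two-column row layout, used only for this demo."""
    timestamp = tables.Int64Col()
    value = tables.Float64Col()


def table_names(path):
    """Return (name, pathname) for every Table node in the file at `path`."""
    with tables.open_file(path, mode="r") as h5:
        return [(t._v_name, t._v_pathname)
                for t in h5.walk_nodes("/", classname="Table")]


# Build a small throwaway file with two tables under /data.
demo_path = os.path.join(tempfile.mkdtemp(), "names_demo.h5")
with tables.open_file(demo_path, mode="w") as h5:
    group = h5.create_group("/", "data")
    h5.create_table(group, "sensor_a", Reading)
    h5.create_table(group, "sensor_b", Reading)

names = table_names(demo_path)
for name, pathname in names:
    print(name, pathname)
```

Leaf nodes such as tables also expose the same value through the shorter alias table.name, which is what the original question was guessing at.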

On Fri, Jul 20, 2012 at 11:08 AM, Jacob Bennett
<jac...@gm...> wrote:

> Hello PyTables Gurus,
>
> I am trying to look up the name of a particular table while iterating
> through all of the tables in my file; however, there doesn't seem to be an
> accessible name attribute or a public method that returns the name
> (at least according to the current documentation). There must be such an
> attribute; maybe it's table.name?
>
> Thanks,
> Jacob
>
> --
> Jacob Bennett
> Massachusetts Institute of Technology
> Department of Electrical Engineering and Computer Science
> Class of 2014| ben...@mi...

[Pytables-users] How to get the name of a current table instance?

From: Jacob B. <jac...@gm...> - 2012-07-20 16:08:53

Hello PyTables Gurus,

I am trying to look up the name of a particular table while iterating
through all of the tables in my file; however, there doesn't seem to be an
accessible name attribute or a public method that returns the name
(at least according to the current documentation). There must be such an
attribute; maybe it's table.name?

Thanks,
Jacob

-- 
Jacob Bennett
Massachusetts Institute of Technology
Department of Electrical Engineering and Computer Science
Class of 2014| ben...@mi...

Re: [Pytables-users] Another Incredibly Quick Question

From: Antonio V. <ant...@ti...> - 2012-07-18 20:45:49

Hi Jacob,

On 18/07/2012 21:01, Jacob Bennett wrote:
> Hello Gurus,
> 
> I cannot seem to find how to pass custom parameters such as
> NODE_CACHE_SLOTS and METADATA_CACHE_SIZE to the openFile function so that I
> don't have to set the default parameters again when I move to another
> machine. I know you can do this, but I can't remember where I saw it.
> 
> Thanks,
> Jake

keyword parameters?


cheers

-- 
Antonio Valentino
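
To spell out the keyword-parameter route: the function that opens a file accepts extra keyword arguments that override the defaults from the tables.parameters module for that one file handle. The sketch below uses the PyTables 3.x name open_file (openFile at the time of this thread). The file name and the two values are arbitrary choices for illustration, not recommendations.

```python
import os
import tempfile

import tables

# Arbitrary illustrative values; tune them for the machine at hand.
cache_settings = {
    "NODE_CACHE_SLOTS": 1024,
    "METADATA_CACHE_SIZE": 2 * 1024 * 1024,  # bytes
}

demo_path = os.path.join(tempfile.mkdtemp(), "params_demo.h5")
with tables.open_file(demo_path, mode="w", **cache_settings) as h5:
    # File.params holds the effective parameters for this file handle.
    effective = {key: h5.params[key] for key in cache_settings}

print(effective)
```

Because the overrides travel with the call rather than with the installed parameters module, the same script behaves the same way on another machine.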

[Pytables-users] Another Incredibly Quick Question

From: Jacob B. <jac...@gm...> - 2012-07-18 19:01:43

Hello Gurus,

I cannot seem to find how to pass custom parameters such as
NODE_CACHE_SLOTS and METADATA_CACHE_SIZE to the openFile function so that I
don't have to set the default parameters again when I move to another
machine. I know you can do this, but I can't remember where I saw it.

Thanks,
Jake

-- 
Jacob Bennett
Massachusetts Institute of Technology
Department of Electrical Engineering and Computer Science
Class of 2014| ben...@mi...

Re: [Pytables-users] Faster Performance: A set of nodes vs A new column that ranges within a set?

From: Ümit S. <uem...@gm...> - 2012-07-18 14:48:08

Actually, it did complain that a certain limit was exceeded, and it also
suggested a flag with which I could turn off the warning. But performance
seemed fine: when I randomly accessed any of the 30,000 groups, I got the
group handle in a fraction of a second.

On 18.07.2012 16:40, "Francesc Alted" <fa...@py...> wrote:

> On 7/18/12 4:11 PM, Ümit Seren wrote:
> > Actually I had 30,000 groups in a parent group.
> > Each of the 30,000 groups had maybe 3 datasets.
> > So to be honest I never had 30,000 datasets in a single group.
> > I guess you will probably have to disable the LRU cache in that case,
> > right?
>
> Okay.  So I'd say that having 30,000 entries (no matter whether they are
> groups or datasets) in one group would be bad practice for performance in
> general, but maybe there is a difference between groups and datasets (i.e.
> it affects datasets more than groups)?  Just curious: did PyTables not
> complain when you created 30,000 groups in the same group?
>
> Regarding the LRU cache: no, I don't think this is the problem, but
> rather how HDF5 implements the 'inodes' (or whatever they call them).
> This is a big issue in general (inodes in filesystems have similar
> problems too), and it is what hurts performance in this case.
>
> --
> Francesc Alted

Re: [Pytables-users] Faster Performance: A set of nodes vs A new column that ranges within a set?

From: Francesc A. <fa...@py...> - 2012-07-18 14:39:53

On 7/18/12 4:11 PM, Ümit Seren wrote:
> Actually I had 30,000 groups in a parent group.
> Each of the 30,000 groups had maybe 3 datasets.
> So to be honest I never had 30,000 datasets in a single group.
> I guess you will probably have to disable the LRU cache in that case, right?

Okay.  So I'd say that having 30,000 entries (no matter whether they are
groups or datasets) in one group would be bad practice for performance in
general, but maybe there is a difference between groups and datasets (i.e.
it affects datasets more than groups)?  Just curious: did PyTables not
complain when you created 30,000 groups in the same group?

Regarding the LRU cache: no, I don't think this is the problem, but
rather how HDF5 implements the 'inodes' (or whatever they call them).
This is a big issue in general (inodes in filesystems have similar
problems too), and it is what hurts performance in this case.

-- 
Francesc Alted

Re: [Pytables-users] Faster Performance: A set of nodes vs A new column that ranges within a set?

From: Ümit S. <uem...@gm...> - 2012-07-18 14:11:50

Actually I had 30,000 groups in a parent group.
Each of the 30,000 groups had maybe 3 datasets.
So to be honest I never had 30,000 datasets in a single group.
I guess you will probably have to disable the LRU cache in that case, right?



On Wed, Jul 18, 2012 at 3:55 PM, Francesc Alted <fa...@py...> wrote:
> On 7/18/12 2:07 PM, Ümit Seren wrote:
>> I actually had 30,000 groups attached to the data group. But I guess
>> it doesn't really matter whether it is a table or a group. They both
>> are nodes.
>
> 30,000 datasets attached to the same group?  I'm interested in knowing
> whether you detected performance problems because of this.  My experience is
> that it is better to split the datasets across different groups, so that you
> don't exceed, say, 1000 per group.  But I might be wrong...
>
> --
> Francesc Alted

Re: [Pytables-users] Faster Performance: A set of nodes vs A new column that ranges within a set?

From: Francesc A. <fa...@py...> - 2012-07-18 13:55:18

On 7/18/12 2:07 PM, Ümit Seren wrote:
> I actually had 30,000 groups attached to the data group. But I guess
> it doesn't really matter whether it is a table or a group. They both
> are nodes.

30,000 datasets attached to the same group?  I'm interested in knowing
whether you detected performance problems because of this.  My experience is
that it is better to split the datasets across different groups, so that you
don't exceed, say, 1000 per group.  But I might be wrong...

-- 
Francesc Alted
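
One way to follow the "no more than about 1000 per group" rule of thumb when the data has no natural grouping is to bucket datasets by their numeric id. The sketch below is plain Python and only computes node paths. The bucketed_path helper, the bucket_NNNN naming scheme, and the /data root are hypothetical choices, not something PyTables prescribes.

```python
def bucketed_path(dataset_id, per_group=1000, root="/data"):
    """Return an HDF5 path that places `dataset_id` in a numbered bucket.

    Each bucket group receives at most `per_group` datasets, so no single
    group ends up with tens of thousands of direct children.
    """
    if dataset_id < 0:
        raise ValueError("dataset_id must be non-negative")
    if per_group < 1:
        raise ValueError("per_group must be at least 1")
    bucket = dataset_id // per_group
    return f"{root}/bucket_{bucket:04d}/dataset_{dataset_id:06d}"


paths = [bucketed_path(i) for i in range(30000)]
buckets = {p.rsplit("/", 1)[0] for p in paths}

print(len(buckets))  # 30 intermediate groups for 30,000 datasets
print(paths[-1])     # /data/bucket_0029/dataset_029999
```

Each path can then be split into a parent and a name and passed to the usual node-creation calls with createparents=True, so the bucket groups are created on demand.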

Re: [Pytables-users] Faster Performance: A set of nodes vs A new column that ranges within a set?

From: Jacob B. <jac...@gm...> - 2012-07-18 12:10:34

Cool, thanks again for your help!

-Jake

On Wed, Jul 18, 2012 at 7:07 AM, Ümit Seren <uem...@gm...> wrote:

> I actually had 30,000 groups attached to the data group. But I guess
> it doesn't really matter whether it is a table or a group. They both
> are nodes.
>
>
> On Wed, Jul 18, 2012 at 2:04 PM, Jacob Bennett
> <jac...@gm...> wrote:
> > Good to hear. Were you able to get away with having 30,000 datasets directly
> > linked to the same node (in this case, data)? I seem to have a problem
> > hanging that many nodes off one root.
> >
> > -Jacob
> >
> >
> > On Wed, Jul 18, 2012 at 6:54 AM, Ümit Seren <uem...@gm...> wrote:
> >>
> >> On Wed, Jul 18, 2012 at 1:32 PM, Jacob Bennett
> >> <jac...@gm...> wrote:
> >> > Sounds awesome, thanks for the help, I also have two more concerns.
> >> >
> >> > #1 - I will never write concurrently; I only have to worry about one
> >> > writer with many readers. Will the HDF5 metadata for a tree-like
> >> > structure be able to hold up in this scenario?
> >>
> >> To be honest I haven't really tried the concurrent read and single
> >> write use case.
> >> In my case I had a cherrypy python web-server (which uses multiple
> >> processes to handle requests) and usually I write from one request and
> >> reading is done from the same or another. But I don't think I ever had
> >> the use case where I read and wrote at the same time.
> >> However I had to keep the files open because of the way PyTables
> >> handles files (it caches them as singleton objects without a lock).
> >> For example if you close the file after you finished writing and at
> >> the same time you are reading from another process it will cause an
> >> exception in the read thread/process because it loses the file handle.
> >> So you probably have to take care of this yourself in your code.
> >>
> >>
> >> > #2 - When you have around 30,000 tables in your HDF5 file, you do not
> >> > want to have every node directly linked to root (plus I don't think HDF5
> >> > can support that); however, I have no other natural grouping besides
> >> > this. Could this be a concern also?
> >>
> >>
> >> Well, in my case each dataset consisted not only of one table but also of
> >> additional data (CArray, etc.).
> >> So I naturally created a group for each dataset and stored
> >> meta-information as attributes on the group. These groups could
> >> sometimes contain additional groups and the actual data in the form of
> >> tables and CArrays. It looked something like this:
> >>
> >>  - root
> >>     - data
> >>         - dataset1
> >>             - table
> >>             - transformation
> >>                 - table
> >>                 - CArray
> >>         - dataset2
> >>         .
> >>         .
> >>         .
> >>         - dataset30000
> >>
> >>
> >> > If you could help me out with these two items, I think I will have
> >> > enough knowledge under my belt to know what I need to do. Thanks again! ;)
> >> >
> >> >
> >> > On Wed, Jul 18, 2012 at 6:21 AM, Ümit Seren <uem...@gm...>
> >> > wrote:
> >> >>
> >> >> I think it depends and there are different ways to do it.
> >> >> But concurrent writes to one HDF5 file are not really supported (not
> >> >> even by the underlying HDF5 library unless you use the MPI version).
> >> >> So in case you want to write from different threads/processes you
> >> >> probably have to use separate HDF5 files.
> >> >> However, writing from one process and reading from another is not much
> >> >> of an issue.
> >> >>
> >> >> Having everything in one HDF5 file has its advantages, as does
> >> >> putting everything in separate HDF5 files.
> >> >> Filesystems can usually cope with one huge file much better than with
> >> >> millions of small files (copying, listing, etc.).
> >> >> Of course, if you have the datasets in separate HDF5 files it's easier
> >> >> to copy/move just single datasets compared to having everything in one
> >> >> HDF5 file (though that's also possible using ptrepack).
> >> >>
> >> >> You could also create one HDF5 file for the meta-information and
> >> >> create separate HDF5 files for each dataset. Then you can use
> >> >> external links (HDF5 hard links cannot span files) to connect the HDF5
> >> >> file containing the meta-information to the HDF5 files for the datasets.
> >> >>
> >> >> I usually tend to put everything in one hdf5 file.
> >> >>
> >> >> On Wed, Jul 18, 2012 at 12:49 PM, Jacob Bennett
> >> >> <jac...@gm...> wrote:
> >> >> > I really like this way of going about it; however, would it be
> >> >> > better to use the built-in hierarchy for separation of the tables or
> >> >> > to write to separate HDF5 files? While experimenting with concurrent
> >> >> > read/write operations on a shared HDF5 file without hierarchy, I
> >> >> > notice that the only errors I get are occasional read errors (which
> >> >> > aren't much of a problem for me). So I am wondering: could there be a
> >> >> > way to reduce the metadata within an HDF5 file and at the same time
> >> >> > use a multi-table approach to solve my problem?
> >> >> >
> >> >> > Thanks,
> >> >> > Jacob
> >> >> >
> >> >> >
> >> >> > On Wed, Jul 18, 2012 at 1:22 AM, Ümit Seren <uem...@gm...>
> >> >> > wrote:
> >> >> >>
> >> >> >> Just to add to what Anthony said:
> >> >> >> In the end it also depends on how unrelated your data is and how you
> >> >> >> want to access it. If the access scenario is that you usually only
> >> >> >> search or select within a specific dataset, then splitting up the
> >> >> >> datasets and putting them into separate tables is the way to go. In
> >> >> >> RDBMS terms this is, by the way, called sharding.
> >> >> >> I have such a use case where I have around 30,000 datasets (each
> >> >> >> of them with around 5 million rows). I am only interested in one
> >> >> >> dataset at a time. So I created 30,000 tables. It works really well.
> >> >> >> And in case you want to access the data across the datasets (for
> >> >> >> aggregating or calculating averages) you can take a MapReduce
> >> >> >> approach, which should work very well with this layout.
> >> >> >>
> >> >> >>
> >> >> >> On Tue, Jul 17, 2012 at 11:55 PM, Jacob Bennett
> >> >> >> <jac...@gm...> wrote:
> >> >> >> > Thanks for the input Anthony!
> >> >> >> >
> >> >> >> > -Jake
> >> >> >> >
> >> >> >> >
> >> >> >> > On Tue, Jul 17, 2012 at 4:20 PM, Anthony Scopatz
> >> >> >> > <sc...@gm...>
> >> >> >> > wrote:
> >> >> >> >>
> >> >> >> >> On Tue, Jul 17, 2012 at 3:30 PM, Jacob Bennett
> >> >> >> >> <jac...@gm...>
> >> >> >> >> wrote:
> >> >> >> >>>
> >> >> >> >>> Hello PyTables Users & Contributors,
> >> >> >> >>>
> >> >> >> >>> Just a quick question: let's say that I have certain identifiers
> >> >> >> >>> that link to a set of data. Would it generally be faster for
> >> >> >> >>> lookup to have each set of data as a separate table with an id as
> >> >> >> >>> the table's name, or to add this id as another column to a
> >> >> >> >>> universal table of data and then let the in-kernel search query
> >> >> >> >>> data only with a specific id?
> >> >> >> >>
> >> >> >> >>
> >> >> >> >> I think that in general it is faster to have more tables with ids
> >> >> >> >> as names.  For very small data, searching through a single larger
> >> >> >> >> table might be quicker than node access...but even then I doubt it.
> >> >> >> >>
> >> >> >> >>>
> >> >> >> >>> I hope you can understand my question: would 1,000 tables of
> >> >> >> >>> 100,000 records each be better for searching than 1 table with
> >> >> >> >>> 100 million records and one extra id column?
> >> >> >> >>
> >> >> >> >>
> >> >> >> >> For these data sizes more tables is probably faster.
> >> >> >> >>
> >> >> >> >> (It should also be noted that in the more-tables case the data
> >> >> >> >> is actually smaller, because you can eliminate the id column.)
> >> >> >> >>
> >> >> >> >> Be Well
> >> >> >> >> Anthony
> >> >> >> >>
> >> >> >> >>>
> >> >> >> >>>
> >> >> >> >>> Thanks,
> >> >> >> >>> Jacob Bennett



-- 
Jacob Bennett
Massachusetts Institute of Technology
Department of Electrical Engineering and Computer Science
Class of 2014| ben...@mi...

Re: [Pytables-users] Faster Performance: A set of nodes vs A new column that ranges within a set?

From: Ümit S. <uem...@gm...> - 2012-07-18 12:07:49

I actually had 30,000 groups attached to the data group. But I guess
it doesn't really matter whether it is a table or a group. They both
are nodes.
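
As a rough illustration of the group-per-dataset layout (one group per dataset under a common data group, metadata stored as group attributes, tables and a CArray inside), here is a small sketch written against the PyTables 3.x API. Only two datasets are created. The Measurement row layout, the description attribute, and the file name are hypothetical.

```python
import os
import tempfile

import numpy as np
import tables


class Measurement(tables.IsDescription):
    """Hypothetical row layout; the thread does not describe the columns."""
    position = tables.Int64Col()
    score = tables.Float64Col()


demo_path = os.path.join(tempfile.mkdtemp(), "layout_demo.h5")
with tables.open_file(demo_path, mode="w") as h5:
    for i in (1, 2):
        # One group per dataset, with metadata stored as group attributes.
        dataset = h5.create_group("/data", "dataset%d" % i, createparents=True)
        dataset._v_attrs.description = "hypothetical metadata for dataset %d" % i
        h5.create_table(dataset, "table", Measurement)

        # Optional nested group holding derived data.
        transformation = h5.create_group(dataset, "transformation")
        h5.create_table(transformation, "table", Measurement)
        h5.create_carray(transformation, "carray", obj=np.zeros((4, 4)))

    layout = sorted(node._v_pathname for node in h5.walk_nodes("/data"))
    description = h5.get_node("/data/dataset1")._v_attrs.description

print("\n".join(layout))
```

Whether the children of the data group are groups or tables, each one is still a single node hanging off the same parent, which is the point made above.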


On Wed, Jul 18, 2012 at 2:04 PM, Jacob Bennett
<jac...@gm...> wrote:
> Good to hear. Were you able to get away with having 30,000 datasets directly
> linked to the same node (in this case, data)? I seem to have a problem
> hanging that many nodes off one root.
>
> -Jacob
>
>
> On Wed, Jul 18, 2012 at 6:54 AM, Ümit Seren <uem...@gm...> wrote:
>>
>> On Wed, Jul 18, 2012 at 1:32 PM, Jacob Bennett
>> <jac...@gm...> wrote:
>> > Sounds awesome, thanks for the help, I also have two more concerns.
>> >
>> > #1 - I will never write concurrently; I only have to worry about one
>> > writer with many readers. Will the HDF5 metadata for a tree-like
>> > structure be able to hold up in this scenario?
>>
>> To be honest I haven't really tried the concurrent read and single
>> write use case.
>> In my case I had a cherrypy python web-server (which uses multiple
>> processes to handle requests) and usually I write from one request and
>> reading is done from the same or another. But I don't think I ever had
>> the use case where I read and wrote at the same time.
>> However I had to keep the files open because of the way PyTables
>> handles files (it caches them as singleton objects without a lock).
>> For example if you close the file after you finished writing and at
>> the same time you are reading from another process it will cause an
>> exception in the read thread/process because it loses the file handle.
>> So you probably have to take care of this yourself in your code.
>>
>>
>> > #2 - When you have around 30,000 tables in your HDF5 file, you do not
>> > want to have every node directly linked to root (plus I don't think HDF5
>> > can support that); however, I have no other natural grouping besides
>> > this. Could this be a concern also?
>>
>>
>> Well, in my case each dataset consisted not only of one table but also of
>> additional data (CArray, etc.).
>> So I naturally created a group for each dataset and stored
>> meta-information as attributes on the group. These groups could
>> sometimes contain additional groups and the actual data in the form of
>> tables and CArrays. It looked something like this:
>>
>>  - root
>>     - data
>>         - dataset1
>>             - table
>>             - transformation
>>                 - table
>>                 - CArray
>>         - dataset2
>>         .
>>         .
>>         .
>>         - dataset30000
>>
>>
>> > If you could help me out with these two items, I think I will have
>> > enough knowledge under my belt to know what I need to do. Thanks again! ;)
>> >
>> >
>> > On Wed, Jul 18, 2012 at 6:21 AM, Ümit Seren <uem...@gm...>
>> > wrote:
>> >>
>> >> I think it depends and there are different ways to do it.
>> >> But concurrent writes to one HDF5 file are not really supported (not
>> >> even by the underlying HDF5 library unless you use the MPI version).
>> >> So in case you want to write from different threads/processes you
>> >> probably have to use separate HDF5 files.
>> >> However, writing from one process and reading from another is not much
>> >> of an issue.
>> >>
>> >> Having everything in one HDF5 file has its advantages, as does
>> >> putting everything in separate HDF5 files.
>> >> Filesystems can usually cope with one huge file much better than with
>> >> millions of small files (copying, listing, etc.).
>> >> Of course, if you have the datasets in separate HDF5 files it's easier
>> >> to copy/move just single datasets compared to having everything in one
>> >> HDF5 file (though that's also possible using ptrepack).
>> >>
>> >> You could also create one HDF5 file for the meta-information and
>> >> create separate HDF5 files for each dataset. Then you can use
>> >> external links (HDF5 hard links cannot span files) to connect the HDF5
>> >> file containing the meta-information to the HDF5 files for the datasets.
>> >>
>> >> I usually tend to put everything in one hdf5 file.
>> >>
>> >> On Wed, Jul 18, 2012 at 12:49 PM, Jacob Bennett
>> >> <jac...@gm...> wrote:
>> >> > I really like this way of going about it; however, would it be
>> >> > better to use the built-in hierarchy for separation of the tables or
>> >> > to write to separate HDF5 files? While experimenting with concurrent
>> >> > read/write operations on a shared HDF5 file without hierarchy, I
>> >> > notice that the only errors I get are occasional read errors (which
>> >> > aren't much of a problem for me). So I am wondering: could there be a
>> >> > way to reduce the metadata within an HDF5 file and at the same time
>> >> > use a multi-table approach to solve my problem?
>> >> >
>> >> > Thanks,
>> >> > Jacob
>> >> >
>> >> >
>> >> > On Wed, Jul 18, 2012 at 1:22 AM, Ümit Seren <uem...@gm...>
>> >> > wrote:
>> >> >>
>> >> >> Just to add to what Anthony said:
>> >> >> In the end it also depends on how unrelated your data is and how you
>> >> >> want to access it. If the access scenario is that you usually only
>> >> >> search or select within a specific dataset, then splitting up the
>> >> >> datasets and putting them into separate tables is the way to go. In
>> >> >> RDBMS terms this is, by the way, called sharding.
>> >> >> I have such a use case where I have around 30,000 datasets (each
>> >> >> of them with around 5 million rows). I am only interested in one
>> >> >> dataset at a time. So I created 30,000 tables. It works really well.
>> >> >> And in case you want to access the data across the datasets (for
>> >> >> aggregating or calculating averages) you can take a MapReduce
>> >> >> approach, which should work very well with this layout.
>> >> >>
>> >> >>
>> >> >> On Tue, Jul 17, 2012 at 11:55 PM, Jacob Bennett
>> >> >> <jac...@gm...> wrote:
>> >> >> > Thanks for the input Anthony!
>> >> >> >
>> >> >> > -Jake
>> >> >> >
>> >> >> >
>> >> >> > On Tue, Jul 17, 2012 at 4:20 PM, Anthony Scopatz
>> >> >> > <sc...@gm...>
>> >> >> > wrote:
>> >> >> >>
>> >> >> >> On Tue, Jul 17, 2012 at 3:30 PM, Jacob Bennett
>> >> >> >> <jac...@gm...>
>> >> >> >> wrote:
>> >> >> >>>
>> >> >> >>> Hello PyTables Users & Contributors,
>> >> >> >>>
>> >> >> >>> Just a quick question: let's say that I have certain identifiers
>> >> >> >>> that link to a set of data. Would it generally be faster for
>> >> >> >>> lookup to have each set of data as a separate table with an id
>> >> >> >>> as the table's name, or to add this id as another column to a
>> >> >> >>> universal table of data and then let the in-kernel search query
>> >> >> >>> data only with a specific id?
>> >> >> >>
>> >> >> >>
>> >> >> >> I think that in general it is faster to have more tables with ids
>> >> >> >> as names.  For very small data, searching through a single larger
>> >> >> >> table might be quicker than node access...but even then I doubt it.
>> >> >> >>
>> >> >> >>>
>> >> >> >>> I hope you can understand my question: would 1,000 tables of
>> >> >> >>> 100,000 records each be better for searching than 1 table with
>> >> >> >>> 100 million records and one extra id column?
>> >> >> >>
>> >> >> >>
>> >> >> >> For these data sizes more tables is probably faster.
>> >> >> >>
>> >> >> >> (It should also be noted that in the more-tables case, the data
>> >> >> >> is actually smaller, because you can eliminate the id column.)
>> >> >> >>
>> >> >> >> Be Well
>> >> >> >> Anthony
>> >> >> >>
>> >> >> >>>
>> >> >> >>>
>> >> >> >>> Thanks,
>> >> >> >>> Jacob Bennett
>> >> >> >>>
>> >> >> >>> --
>> >> >> >>> Jacob Bennett
>> >> >> >>> Massachusetts Institute of Technology
>> >> >> >>> Department of Electrical Engineering and Computer Science
>> >> >> >>> Class of 2014| ben...@mi...
>> >> >> >>>
>> >> >> >>>
>> >> >> >>>
>> >> >> >>>
>> >> >> >>>
>> >> >> >>>
>> >> >> >>> ------------------------------------------------------------------------------
>> >> >> >>> Live Security Virtual Conference
>> >> >> >>> Exclusive live event will cover all the ways today's security
>> >> >> >>> and
>> >> >> >>> threat landscape has changed and how IT managers can respond.
>> >> >> >>> Discussions
>> >> >> >>> will include endpoint security, mobile security and the latest
>> >> >> >>> in
>> >> >> >>> malware
>> >> >> >>> threats.
>> >> >> >>> http://www.accelacomm.com/jaw/sfrnl04242012/114/50122263/
>> >> >> >>> _______________________________________________
>> >> >> >>> Pytables-users mailing list
>> >> >> >>> Pyt...@li...
>> >> >> >>> https://lists.sourceforge.net/lists/listinfo/pytables-users
>> >> >> >>>
>> >> >> >>
>> >> >> >>
>> >> >> >>
>> >> >> >>
>> >> >> >>
>> >> >> >>
>> >> >> >>
>> >> >> >
>> >> >> >
>> >> >> >
>> >> >> > --
>> >> >> > Jacob Bennett
>> >> >> > Massachusetts Institute of Technology
>> >> >> > Department of Electrical Engineering and Computer Science
>> >> >> > Class of 2014| ben...@mi...
>> >> >> >
>> >> >> >
>> >> >> >
>> >> >> >
>> >> >> >
>> >> >> >
>> >> >>
>> >> >>
>> >> >>
>> >> >>
>> >> >
>> >> >
>> >> >
>> >> >
>> >> > --
>> >> > Jacob Bennett
>> >> > Massachusetts Institute of Technology
>> >> > Department of Electrical Engineering and Computer Science
>> >> > Class of 2014| ben...@mi...
>> >> >
>> >> >
>> >> >
>> >> >
>> >> >
>> >>
>> >>
>> >>
>> >
>> >
>> >
>> >
>> > --
>> > Jacob Bennett
>> > Massachusetts Institute of Technology
>> > Department of Electrical Engineering and Computer Science
>> > Class of 2014| ben...@mi...
>> >
>> >
>> >
>> >
>>
>>
>
>
>
>
> --
> Jacob Bennett
> Massachusetts Institute of Technology
> Department of Electrical Engineering and Computer Science
> Class of 2014| ben...@mi...
>
>
>

Re: [Pytables-users] Faster Performance: A set of nodes vs A new column that ranges within a set?

From: Jacob B. <jac...@gm...> - 2012-07-18 12:05:10

Good to hear. Were you able to get away with having 30,000 datasets
linked directly to the same node (in this case, data)? I seem to have a
problem attaching that many nodes to one root.

-Jacob

On Wed, Jul 18, 2012 at 6:54 AM, Ümit Seren <uem...@gm...> wrote:

> On Wed, Jul 18, 2012 at 1:32 PM, Jacob Bennett
> <jac...@gm...> wrote:
> > Sounds awesome, thanks for the help, I also have two more concerns.
> >
> > #1 - I will never concurrently write, I only have to worry about one
> > write with many reads, will the hdf5 metadata for a tree-like structure
> > be able to hold up in this scenario?
>
> To be honest, I haven't really tried the concurrent-read, single-write
> use case.
> In my case I had a CherryPy Python web server (which uses multiple
> processes to handle requests); usually I write from one request and
> read from the same or another one. But I don't think I ever had
> the case where I read and wrote at the same time.
> However, I had to keep the files open because of the way PyTables
> handles files (it caches them as singleton objects without a lock).
> For example, if you close the file after you have finished writing while
> another process is reading from it, this causes an exception in the
> reading thread/process because it loses the file handle.
> So you will probably have to take care of this yourself in your code.
>
>
> > #2 - When you have around 30,000 tables in your hdf5 file, you do not
> > want to have every node directly linked to root (plus I don't think hdf5
> > can support that); however, I have no other natural grouping besides
> > this, could this be a concern also.
>
>
> Well, in my case my datasets consisted not only of one table but also
> of additional data (CArray, etc.).
> So I naturally created a group for each dataset and stored
> meta-information as attributes on the group. These groups could
> sometimes contain additional groups, and the actual data in the form of
> tables and CArrays. It looked something like this:
>
>  - root
>     - data
>         - dataset1
>             - table
>             - transformation
>                 - table
>                 - CArray
>         - dataset2
>         .
>         .
>         .
>         - dataset30000
>
>
> > If you could help me out with these two items, I think I will have enough
> > knowledge under my belt to know what I need to do. Thanks again! ;)
> >
> >
> > On Wed, Jul 18, 2012 at 6:21 AM, Ümit Seren <uem...@gm...> wrote:
> >>
> >> I think it depends and there are different ways to do it.
> >> But concurrent writes to one HDF5 file are not really supported (not
> >> even by the underlying HDF5 library unless you use the MPI version).
> >> So in case you want to write from different threads/processes you
> >> probably have to use separate hdf5 files.
> >> However writing from one process and reading from another is not much
> >> of an issue.
> >>
> >> Having everything in one hdf5 file has its advantages, as does
> >> putting everything in separate hdf5 files.
> >> Filesystems can usually cope with one huge file much better than with
> >> millions of small files (copying, listing, etc).
> >> Of course if you have the datasets in separate hdf5 files it's easier
> >> to copy/move just single datasets compared to having everything in one
> >> hdf5 file (though that's also possible using ptrepack).
> >>
> >> You could also create one hdf5 file for the meta information and
> >> create separate hdf5 files for each dataset. Then you can use
> >> hardlinks to connect the hdf5 file containing the meta-information to
> >> the hdf5 files for the datasets.
> >>
> >> I usually tend to put everything in one hdf5 file.
> >>
> >> On Wed, Jul 18, 2012 at 12:49 PM, Jacob Bennett
> >> <jac...@gm...> wrote:
> >> > I really like this way of going about it; however, would it be
> >> > better to use the built-in hierarchy to separate the tables, or to
> >> > write to separate hdf5 files? When experimenting with concurrent
> >> > read/write operations on a shared hdf5 file w/o hierarchy, I notice
> >> > that the only errors I get are occasional read errors (which isn't
> >> > much of a problem for me). So I am wondering: could there be a way
> >> > to reduce the metadata within an hdf5 file and, at the same time,
> >> > use a multi-table approach to solve my problem?
> >> >
> >> > Thanks,
> >> > Jacob
> >> >
> >> >
> >> > On Wed, Jul 18, 2012 at 1:22 AM, Ümit Seren <uem...@gm...>
> >> > wrote:
> >> >>
> >> >> Just to add to what Anthony said:
> >> >> In the end it also depends on how unrelated your data is and how you
> >> >> want to access it. If the access scenario is that you usually only
> >> >> search or select within a specific dataset, then splitting up the
> >> >> datasets and putting them into separate tables is the way to go. In
> >> >> RDBMS terms this is, by the way, called sharding.
> >> >> I have such a use case where I have around 30,000 datasets (each of
> >> >> them with around 5 million rows). I am only interested in one dataset
> >> >> at a time, so I created 30,000 tables. It works really well.
> >> >> And in case you want to access the data across the datasets (for
> >> >> aggregating or calculating averages) you can take a MapReduce
> >> >> approach, which should work very well with this layout.
> >> >>
> >> >>
> >> >> On Tue, Jul 17, 2012 at 11:55 PM, Jacob Bennett
> >> >> <jac...@gm...> wrote:
> >> >> > Thanks for the input Anthony!
> >> >> >
> >> >> > -Jake
> >> >> >
> >> >> >
> >> >> > On Tue, Jul 17, 2012 at 4:20 PM, Anthony Scopatz <sc...@gm...>
> >> >> > wrote:
> >> >> >>
> >> >> >> On Tue, Jul 17, 2012 at 3:30 PM, Jacob Bennett
> >> >> >> <jac...@gm...>
> >> >> >> wrote:
> >> >> >>>
> >> >> >>> Hello PyTables Users & Contributors,
> >> >> >>>
> >> >> >>> Just a quick question: let's say that I have certain identifiers
> >> >> >>> that link to a set of data. Would it generally be faster for
> >> >> >>> lookup to have each set of data as a separate table with an id
> >> >> >>> as the table's name, or to add this id as another column to a
> >> >> >>> universal table of data and then let the in-kernel search query
> >> >> >>> data only with a specific id?
> >> >> >>
> >> >> >>
> >> >> >> I think that in general it is faster to have more tables with ids
> >> >> >> as names.  For very small data, searching through a single larger
> >> >> >> table might be quicker than node access...but even then I doubt it.
> >> >> >>
> >> >> >>>
> >> >> >>> I hope you can understand my question: would 1,000 tables of
> >> >> >>> 100,000 records each be better for searching than 1 table with
> >> >> >>> 100 million records and one extra id column?
> >> >> >>
> >> >> >>
> >> >> >> For these data sizes more tables is probably faster.
> >> >> >>
> >> >> >> (It should also be noted that in the more-tables case, the data
> >> >> >> is actually smaller, because you can eliminate the id column.)
> >> >> >>
> >> >> >> Be Well
> >> >> >> Anthony
> >> >> >>
> >> >> >>>
> >> >> >>>
> >> >> >>> Thanks,
> >> >> >>> Jacob Bennett
> >> >> >>>
> >> >> >>> --
> >> >> >>> Jacob Bennett
> >> >> >>> Massachusetts Institute of Technology
> >> >> >>> Department of Electrical Engineering and Computer Science
> >> >> >>> Class of 2014| ben...@mi...
> >> >> >>>
> >> >> >>>
> >> >> >>>
> >> >> >>>
> >> >> >>>
> >> >> >>>
> >> >> >>>
> >> >> >>
> >> >> >>
> >> >> >>
> >> >> >>
> >> >> >>
> >> >> >>
> >> >> >>
> >> >> >
> >> >> >
> >> >> >
> >> >> > --
> >> >> > Jacob Bennett
> >> >> > Massachusetts Institute of Technology
> >> >> > Department of Electrical Engineering and Computer Science
> >> >> > Class of 2014| ben...@mi...
> >> >> >
> >> >> >
> >> >> >
> >> >> >
> >> >> >
> >> >> >
> >> >>
> >> >>
> >> >>
> >> >>
> >> >
> >> >
> >> >
> >> >
> >> > --
> >> > Jacob Bennett
> >> > Massachusetts Institute of Technology
> >> > Department of Electrical Engineering and Computer Science
> >> > Class of 2014| ben...@mi...
> >> >
> >> >
> >> >
> >> >
> >> >
> >>
> >>
> >>
> >
> >
> >
> >
> > --
> > Jacob Bennett
> > Massachusetts Institute of Technology
> > Department of Electrical Engineering and Computer Science
> > Class of 2014| ben...@mi...
> >
> >
> >
> >
>
>
>



-- 
Jacob Bennett
Massachusetts Institute of Technology
Department of Electrical Engineering and Computer Science
Class of 2014| ben...@mi...

Re: [Pytables-users] Faster Performance: A set of nodes vs A new column that ranges within a set?

From: Ümit S. <uem...@gm...> - 2012-07-18 11:55:09

On Wed, Jul 18, 2012 at 1:32 PM, Jacob Bennett
<jac...@gm...> wrote:
> Sounds awesome, thanks for the help, I also have two more concerns.
>
> #1 - I will never concurrently write, I only have to worry about one write
> with many reads, will the hdf5 metadata for a tree-like structure be able to
> hold up in this scenario?

To be honest, I haven't really tried the concurrent-read, single-write
use case.
In my case I had a CherryPy Python web server (which uses multiple
processes to handle requests); usually I write from one request and
read from the same or another one. But I don't think I ever had
the case where I read and wrote at the same time.
However, I had to keep the files open because of the way PyTables
handles files (it caches them as singleton objects without a lock).
For example, if you close the file after you have finished writing while
another process is reading from it, this causes an exception in the
reading thread/process because it loses the file handle.
So you will probably have to take care of this yourself in your code.
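
A minimal sketch of one way to "take care of this yourself": keep a
single long-lived handle per file inside each process and close all of
them only at interpreter exit, so no request handler ever closes a handle
that another one is still using. The class name HandleRegistry and the
opener argument are hypothetical, made up for this illustration; they are
not part of PyTables.

```python
import atexit
import threading


class HandleRegistry:
    """Keep one open handle per (path, mode) for the life of the process.

    `opener` is any callable taking (path, mode) and returning an object
    with a close() method. Passing PyTables' file-opening function here is
    an assumption of this sketch, not something the library requires.
    """

    def __init__(self, opener):
        self._opener = opener
        self._handles = {}
        self._lock = threading.Lock()  # guards the dict, not the file itself
        atexit.register(self.close_all)

    def get(self, path, mode="r"):
        """Return the cached handle, opening the file on first use."""
        with self._lock:
            key = (path, mode)
            handle = self._handles.get(key)
            if handle is None:
                handle = self._opener(path, mode)
                self._handles[key] = handle
            return handle

    def close_all(self):
        """Close every cached handle; intended for shutdown only."""
        with self._lock:
            for handle in self._handles.values():
                handle.close()
            self._handles.clear()
```

With a recent PyTables this could be used as
registry = HandleRegistry(tables.open_file) followed by
registry.get("data.h5") in each request handler (tables.openFile in the
2.x releases). The lock only protects the registry's dictionary; it does
not make concurrent access to the HDF5 file itself safe.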


> #2 - When you have around 30,000 tables in your hdf5 file, you do not want
> to have every node directly linked to root (plus I don't think hdf5 can
> support that); however, I have no other natural grouping besides this, could
> this be a concern also.


Well, in my case my datasets consisted not only of one table but also
of additional data (CArray, etc.).
So I naturally created a group for each dataset and stored
meta-information as attributes on the group. These groups could
sometimes contain additional groups, and the actual data in the form of
tables and CArrays. It looked something like this:

 - root
    - data
        - dataset1
            - table
            - transformation
                - table
                - CArray
        - dataset2
        .
        .
        .
        - dataset30000
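
The layout above can be addressed with plain path strings. Below is a
small sketch that derives the node paths for one dataset from its number.
The function names and the optional bucket_size argument are hypothetical;
the bucketing is an assumption added for the case where there is no
natural grouping and one would rather not hang all 30,000 groups off a
single parent. It is not what the layout above does.

```python
def dataset_group_path(dataset_id, bucket_size=None, root="/data"):
    """Return the group path for one dataset, e.g. /data/dataset1.

    If bucket_size is given, datasets are spread over intermediate
    groups (an assumption of this sketch), so each parent holds at most
    bucket_size children.
    """
    name = "dataset%d" % dataset_id
    if bucket_size is None:
        return "%s/%s" % (root, name)
    bucket = dataset_id // bucket_size
    return "%s/bucket%04d/%s" % (root, bucket, name)


def dataset_node_paths(dataset_id, bucket_size=None):
    """Return the paths of the leaves shown in the layout above."""
    base = dataset_group_path(dataset_id, bucket_size)
    return {
        "table": base + "/table",
        "transformed_table": base + "/transformation/table",
        "transformed_carray": base + "/transformation/CArray",
    }
```

With PyTables 3.x, such a path could be passed to File.get_node(path) for
reading, and the parent groups could be created with
File.create_group(where, name, createparents=True) (getNode and
createGroup in the 2.x releases).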


> If you could help me out with these two items, I think I will have enough
> knowledge under my belt to know what I need to do. Thanks again! ;)
>
>
> On Wed, Jul 18, 2012 at 6:21 AM, Ümit Seren <uem...@gm...> wrote:
>>
>> I think it depends and there are different ways to do it.
>> But concurrent writes to one HDF5 file are not really supported (not
>> even by the underlying HDF5 library unless you use the MPI version).
>> So in case you want to write from different threads/processes you
>> probably have to use separate hdf5 files.
>> However writing from one process and reading from another is not much
>> of an issue.
>>
>> Having everything in one hdf5 file has its advantages, as does
>> putting everything in separate hdf5 files.
>> Filesystems can usually cope with one huge file much better than with
>> millions of small files (copying, listing, etc).
>> Of course if you have the datasets in separate hdf5 files it's easier
>> to copy/move just single datasets compared to having everything in one
>> hdf5 file (though that's also possible using ptrepack).
>>
>> You could also create one hdf5 file for the meta information and
>> create separate hdf5 files for each dataset. Then you can use
>> hardlinks to connect the hdf5 file containing the meta-information to
>> the hdf5 files for the datasets.
>>
>> I usually tend to put everything in one hdf5 file.
>>
>> On Wed, Jul 18, 2012 at 12:49 PM, Jacob Bennett
>> <jac...@gm...> wrote:
>> > I really like this way of going about it; however, would it be better
>> > to use the built-in hierarchy to separate the tables, or to write to
>> > separate hdf5 files? When experimenting with concurrent read/write
>> > operations on a shared hdf5 file w/o hierarchy, I notice that the
>> > only errors I get are occasional read errors (which isn't much of a
>> > problem for me). So I am wondering: could there be a way to reduce
>> > the metadata within an hdf5 file and, at the same time, use a
>> > multi-table approach to solve my problem?
>> >
>> > Thanks,
>> > Jacob
>> >
>> >
>> > On Wed, Jul 18, 2012 at 1:22 AM, Ümit Seren <uem...@gm...>
>> > wrote:
>> >>
>> >> Just to add to what Anthony said:
>> >> In the end it also depends on how unrelated your data is and how you
>> >> want to access it. If the access scenario is that you usually only
>> >> search or select within a specific dataset, then splitting up the
>> >> datasets and putting them into separate tables is the way to go. In
>> >> RDBMS terms this is, by the way, called sharding.
>> >> I have such a use case where I have around 30,000 datasets (each of
>> >> them with around 5 million rows). I am only interested in one dataset
>> >> at a time, so I created 30,000 tables. It works really well.
>> >> And in case you want to access the data across the datasets (for
>> >> aggregating or calculating averages) you can take a MapReduce
>> >> approach, which should work very well with this layout.
>> >>
>> >>
>> >> On Tue, Jul 17, 2012 at 11:55 PM, Jacob Bennett
>> >> <jac...@gm...> wrote:
>> >> > Thanks for the input Anthony!
>> >> >
>> >> > -Jake
>> >> >
>> >> >
>> >> > On Tue, Jul 17, 2012 at 4:20 PM, Anthony Scopatz <sc...@gm...>
>> >> > wrote:
>> >> >>
>> >> >> On Tue, Jul 17, 2012 at 3:30 PM, Jacob Bennett
>> >> >> <jac...@gm...>
>> >> >> wrote:
>> >> >>>
>> >> >>> Hello PyTables Users & Contributors,
>> >> >>>
>> >> >>> Just a quick question: let's say that I have certain identifiers
>> >> >>> that link to a set of data. Would it generally be faster for lookup
>> >> >>> to have each set of data as a separate table with an id as the
>> >> >>> table's name, or to add this id as another column to a universal
>> >> >>> table of data and then let the in-kernel search query data only
>> >> >>> with a specific id?
>> >> >>
>> >> >>
>> >> >> I think that in general it is faster to have more tables with ids as
>> >> >> names.  For very small data, searching through a single larger table
>> >> >> might be quicker than node access...but even then I doubt it.
>> >> >>
>> >> >>>
>> >> >>> I hope you can understand my question: would 1,000 tables of 100,000
>> >> >>> records each be better for searching than 1 table with 100 million
>> >> >>> records and one extra id column?
>> >> >>
>> >> >>
>> >> >> For these data sizes more tables is probably faster.
>> >> >>
>> >> >> (It should also be noted that in the more-tables case, the data is
>> >> >> actually smaller, because you can eliminate the id column.)
>> >> >>
>> >> >> Be Well
>> >> >> Anthony
>> >> >>
>> >> >>>
>> >> >>>
>> >> >>> Thanks,
>> >> >>> Jacob Bennett
>> >> >>>
>> >> >>> --
>> >> >>> Jacob Bennett
>> >> >>> Massachusetts Institute of Technology
>> >> >>> Department of Electrical Engineering and Computer Science
>> >> >>> Class of 2014| ben...@mi...
>> >> >>>
>> >> >>>
>> >> >>>
>> >> >>>
>> >> >>>
>> >> >>>
>> >> >>
>> >> >>
>> >> >>
>> >> >>
>> >> >>
>> >> >>
>> >> >
>> >> >
>> >> >
>> >> > --
>> >> > Jacob Bennett
>> >> > Massachusetts Institute of Technology
>> >> > Department of Electrical Engineering and Computer Science
>> >> > Class of 2014| ben...@mi...
>> >> >
>> >> >
>> >> >
>> >> >
>> >> >
>> >>
>> >>
>> >>
>> >> ------------------------------------------------------------------------------
>> >> Live Security Virtual Conference
>> >> Exclusive live event will cover all the ways today's security and
>> >> threat landscape has changed and how IT managers can respond.
>> >> Discussions
>> >> will include endpoint security, mobile security and the latest in
>> >> malware
>> >> threats. http://www.accelacomm.com/jaw/sfrnl04242012/114/50122263/
>> >> _______________________________________________
>> >> Pytables-users mailing list
>> >> Pyt...@li...
>> >> https://lists.sourceforge.net/lists/listinfo/pytables-users
>> >
>> >
>> >
>> >
>> > --
>> > Jacob Bennett
>> > Massachusetts Institute of Technology
>> > Department of Electrical Engineering and Computer Science
>> > Class of 2014| ben...@mi...
>> >
>> >
>> >
>> > ------------------------------------------------------------------------------
>> > Live Security Virtual Conference
>> > Exclusive live event will cover all the ways today's security and
>> > threat landscape has changed and how IT managers can respond.
>> > Discussions
>> > will include endpoint security, mobile security and the latest in
>> > malware
>> > threats. http://www.accelacomm.com/jaw/sfrnl04242012/114/50122263/
>> > _______________________________________________
>> > Pytables-users mailing list
>> > Pyt...@li...
>> > https://lists.sourceforge.net/lists/listinfo/pytables-users
>> >
>>
>>
>> ------------------------------------------------------------------------------
>> Live Security Virtual Conference
>> Exclusive live event will cover all the ways today's security and
>> threat landscape has changed and how IT managers can respond. Discussions
>> will include endpoint security, mobile security and the latest in malware
>> threats. http://www.accelacomm.com/jaw/sfrnl04242012/114/50122263/
>> _______________________________________________
>> Pytables-users mailing list
>> Pyt...@li...
>> https://lists.sourceforge.net/lists/listinfo/pytables-users
>
>
>
>
> --
> Jacob Bennett
> Massachusetts Institute of Technology
> Department of Electrical Engineering and Computer Science
> Class of 2014| ben...@mi...
>
>
> ------------------------------------------------------------------------------
> Live Security Virtual Conference
> Exclusive live event will cover all the ways today's security and
> threat landscape has changed and how IT managers can respond. Discussions
> will include endpoint security, mobile security and the latest in malware
> threats. http://www.accelacomm.com/jaw/sfrnl04242012/114/50122263/
> _______________________________________________
> Pytables-users mailing list
> Pyt...@li...
> https://lists.sourceforge.net/lists/listinfo/pytables-users
>

Re: [Pytables-users] Faster Performance: A set of nodes vs A new column that ranges within a set?

From: Jacob B. <jac...@gm...> - 2012-07-18 11:32:24

Sounds awesome, thanks for the help. I also have two more concerns.

#1 - I will never write concurrently; I only have to worry about one writer
with many readers. Will the hdf5 metadata for a tree-like structure be able
to hold up in this scenario?
#2 - When you have around 30,000 tables in your hdf5 file, you do not want
to have every node directly linked to root (plus I don't think hdf5 can
support that); however, I have no other natural grouping besides this.
Could this also be a concern?

If you could help me out with these two items, I think I will have enough
knowledge under my belt to know what I need to do. Thanks again! ;)
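
A minimal sketch for concern #2, assuming there really is no natural grouping: hash each table id into a fixed number of intermediate groups, so that no single group has to hold all 30,000 tables. The names bucket_path and n_buckets, the /bucket_NNN layout and the t_ prefix are hypothetical choices made for illustration; they do not come from this thread or from PyTables.

```python
import hashlib
from collections import Counter


def bucket_path(table_id, n_buckets=256):
    """Map a table id to a (group_path, table_name) pair.

    md5 is used only because it gives the same bucket on every run,
    unlike Python's built-in hash() for strings.
    """
    digest = hashlib.md5(str(table_id).encode("utf-8")).hexdigest()
    bucket = int(digest, 16) % n_buckets
    return "/bucket_%03d" % bucket, "t_%s" % table_id


# Check how evenly 30,000 ids spread over the buckets.
counts = Counter(bucket_path(i)[0] for i in range(30000))
print("groups used:", len(counts))
print("largest group:", max(counts.values()))
```

The returned pair could then serve as the where and name arguments when each table is created. The bucket count of 256 is arbitrary; it only needs to keep each group comfortably small.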

On Wed, Jul 18, 2012 at 6:21 AM, Ümit Seren <uem...@gm...> wrote:

> I think it depends, and there are different ways to do it.
> But concurrent writes to one HDF5 file are not really supported (not
> even by the underlying HDF5 library unless you use the MPI version).
> So in case you want to write from different threads/processes you
> probably have to use separate hdf5 files.
> However, writing from one process and reading from another is not much
> of an issue.
>
> Having everything in one hdf5 file has its advantages, as does
> putting everything in separate hdf5 files.
> Filesystems can usually cope with one huge file much better than with
> millions of small files (copying, listing, etc).
> Of course, if you have the datasets in separate hdf5 files it's easier
> to copy/move just single datasets compared to having everything in one
> hdf5 file (though that's also possible using ptrepack).
>
> You could also create one hdf5 file for the meta information and
> create separate hdf5 files for each dataset. Then you can use
> external links to connect the hdf5 file containing the meta-information to
> the hdf5 files for the datasets.
>
> I usually tend to put everything in one hdf5 file.
>



-- 
Jacob Bennett
Massachusetts Institute of Technology
Department of Electrical Engineering and Computer Science
Class of 2014| ben...@mi...

Re: [Pytables-users] Faster Performance: A set of nodes vs A new column that ranges within a set?

From: Ümit S. <uem...@gm...> - 2012-07-18 11:22:22

I think it depends, and there are different ways to do it.
But concurrent writes to one HDF5 file are not really supported (not
even by the underlying HDF5 library unless you use the MPI version).
So in case you want to write from different threads/processes you
probably have to use separate hdf5 files.
However, writing from one process and reading from another is not much
of an issue.

Having everything in one hdf5 file has its advantages, as does
putting everything in separate hdf5 files.
Filesystems can usually cope with one huge file much better than with
millions of small files (copying, listing, etc).
Of course, if you have the datasets in separate hdf5 files it's easier
to copy/move just single datasets compared to having everything in one
hdf5 file (though that's also possible using ptrepack).

You could also create one hdf5 file for the meta information and
create separate hdf5 files for each dataset. Then you can use
external links to connect the hdf5 file containing the meta-information to
the hdf5 files for the datasets.

I usually tend to put everything in one hdf5 file.
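
For the "one meta file plus one file per dataset" layout, here is a rough sketch using the PyTables 3.x names (open_file, create_external_link); the 2.x releases current at the time of this thread spelled these in camelCase. Links that cross file boundaries are external links in PyTables terms. The file names, the Reading schema and the readings table are made up for the example.

```python
import os
import tempfile

import tables


class Reading(tables.IsDescription):
    # Hypothetical schema, only for this sketch.
    timestamp = tables.Int64Col()
    value = tables.Float64Col()


workdir = tempfile.mkdtemp()
data_path = os.path.join(workdir, "dataset_0001.h5")
meta_path = os.path.join(workdir, "meta.h5")

# One HDF5 file per dataset.
with tables.open_file(data_path, "w") as data_file:
    table = data_file.create_table("/", "readings", Reading)
    table.append([(1, 0.5), (2, 1.5), (3, 2.5)])

# The meta file only holds a link of the form "filename:/path/to/node".
with tables.open_file(meta_path, "w") as meta_file:
    meta_file.create_external_link("/", "dataset_0001",
                                   data_path + ":/readings")

# Calling the link dereferences it: the external file is opened
# (read-only by default) and the target node is returned.
with tables.open_file(meta_path, "r") as meta_file:
    target = meta_file.root.dataset_0001()
    n_rows = target.nrows
    values = target.col("value").tolist()

print(n_rows, values)
```

An absolute path is used in the link target here to keep the sketch simple; for files that get moved around together, a relative target is probably the more practical choice.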

On Wed, Jul 18, 2012 at 12:49 PM, Jacob Bennett
<jac...@gm...> wrote:
> I really like this approach; however, would it be better to
> use the built-in hierarchy for separation of the tables or to write to
> separate hdf5 files? In my current experiments with concurrent
> read/write operations on a shared hdf5 file w/o hierarchy, I notice that the
> only errors that I get are occasional read errors (which isn't much of a
> problem for me). So I am wondering: could there be a way to reduce the
> metadata within an hdf5 file and, at the same time, use a multi-table
> approach to solve my problem?
>
> Thanks,
> Jacob

Re: [Pytables-users] Faster Performance: A set of nodes vs A new column that ranges within a set?

From: Jacob B. <jac...@gm...> - 2012-07-18 10:49:34

I really like this approach; however, would it be better to
use the built-in hierarchy for separation of the tables or to write to
separate hdf5 files? In my current experiments with concurrent
read/write operations on a shared hdf5 file w/o hierarchy, I notice that
the only errors that I get are occasional read errors (which isn't much of
a problem for me). So I am wondering: could there be a way to reduce the
metadata within an hdf5 file and, at the same time, use a multi-table
approach to solve my problem?

Thanks,
Jacob

On Wed, Jul 18, 2012 at 1:22 AM, Ümit Seren <uem...@gm...> wrote:

> Just to add to what Anthony said:
> In the end it also depends on how unrelated your data is and how you want
> to access it. If the access scenario is that you usually only search
> or select within a specific dataset, then splitting up the datasets and
> putting them into separate tables is the way to go. In RDBMS terms
> this is, btw, called sharding.
> I have such a use case where I have around 30,000 datasets (each of
> them with around 5 million rows). I am only interested in one dataset
> at a time, so I created 30,000 tables. It works really well.
> And in case you want to access the data across the datasets (for
> aggregating or calculating averages) you can take a MapReduce approach,
> which should work very well with this layout.
>



-- 
Jacob Bennett
Massachusetts Institute of Technology
Department of Electrical Engineering and Computer Science
Class of 2014| ben...@mi...

Re: [Pytables-users] Faster Performance: A set of nodes vs A new column that ranges within a set?

From: Ümit S. <uem...@gm...> - 2012-07-18 06:22:53

Just to add to what Anthony said:
In the end it also depends on how unrelated your data is and how you want
to access it. If the access scenario is that you usually only search
or select within a specific dataset, then splitting up the datasets and
putting them into separate tables is the way to go. In RDBMS terms
this is, btw, called sharding.
I have such a use case where I have around 30,000 datasets (each of
them with around 5 million rows). I am only interested in one dataset
at a time, so I created 30,000 tables. It works really well.
And in case you want to access the data across the datasets (for
aggregating or calculating averages) you can take a MapReduce approach,
which should work very well with this layout.
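
To make the one-table-per-dataset idea concrete, here is a small sketch with PyTables 3.x method names. It is only an illustration under assumptions: the /datasets group, the dNNNN table names and the single value column are invented, and the "MapReduce" step is just a per-table partial sum combined in plain Python.

```python
import os
import tempfile

import tables


class Sample(tables.IsDescription):
    # Hypothetical one-column schema for the sketch.
    value = tables.Float64Col()


path = os.path.join(tempfile.mkdtemp(), "shards.h5")

# One table per dataset id; no id column is needed inside the tables.
with tables.open_file(path, "w") as h5:
    for dataset_id in range(5):
        table = h5.create_table("/datasets", "d%04d" % dataset_id, Sample,
                                createparents=True)
        table.append([(float(dataset_id + k),) for k in range(4)])

with tables.open_file(path, "r") as h5:
    # Usual case: go straight to the one dataset of interest by name.
    one = h5.get_node("/datasets", "d0003")
    one_values = one.col("value").tolist()

    # Cross-dataset case. Map: (sum, row count) per table.
    partials = [(t.col("value").sum(), t.nrows)
                for t in h5.walk_nodes("/datasets", classname="Table")]
    # Reduce: combine the partial results into one overall mean.
    total = sum(s for s, _ in partials)
    count = sum(n for _, n in partials)
    overall_mean = float(total / count)

print(one_values, overall_mean)
```

With millions of rows per table, the map step would more likely read each column in chunks or run in separate reader processes, but the shape of the computation stays the same.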


On Tue, Jul 17, 2012 at 11:55 PM, Jacob Bennett
<jac...@gm...> wrote:
> Thanks for the input Anthony!
>
> -Jake
>
>
> On Tue, Jul 17, 2012 at 4:20 PM, Anthony Scopatz <sc...@gm...> wrote:
>>
>> On Tue, Jul 17, 2012 at 3:30 PM, Jacob Bennett <jac...@gm...>
>> wrote:
>>>
>>> Hello PyTables Users & Contributors,
>>>
>>> Just a quick question: let's say that I have certain identifiers that
>>> link to a set of data. Would it generally be faster for lookup to have each
>>> set of data as a separate table with an id as the table's name, or to add this
>>> id as another column to a universal table of data and then let the in-kernel
>>> search query data only with a specific id?
>>
>>
>> I think that in general it is faster to have more tables with ids as
>> names.  For very small data, searching through a single larger table might
>> be quicker than node access...but even then I doubt it.
>>
>>>
>>> I hope you can understand my question: would 1,000 tables of 100,000
>>> records each be better for searching than 1 table with 100 million records
>>> and one extra id column?
>>
>>
>> For these data sizes more tables is probably faster.
>>
>> (It should also be noted that in the more-tables case the data is
>> actually smaller, because you can eliminate the id column.)
>>
>> Be Well
>> Anthony
>>
>>>
>>>
>>> Thanks,
>>> Jacob Bennett

Re: [Pytables-users] Pytables file structure

From: Anthony S. <sc...@gm...> - 2012-07-17 23:31:49

Hello Juan,

Just make an account at GitHub [1] and then go to the PyTables issues page [2].

Be Well
Anthony

1. https://github.com/
2. https://github.com/PyTables/PyTables/issues
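
The workaround from the 2012/7/15 message quoted below (read the whole column once, then slice it with NumPy in memory) could look roughly like this. It is a sketch under assumptions: the file is a throwaway temporary one, the (3, 4) cell shape and the value of n are invented, and the method names are the PyTables 3.x ones.

```python
import os
import tempfile

import numpy as np
import tables


class Record(tables.IsDescription):
    # Each cell of my_col is a 3x4 array (shape chosen for the example).
    my_col = tables.Float64Col(shape=(3, 4))


path = os.path.join(tempfile.mkdtemp(), "multidim.h5")
data = np.arange(5 * 3 * 4, dtype=np.float64).reshape(5, 3, 4)

with tables.open_file(path, "w") as f:
    table = f.create_table("/", "table", Record)
    row = table.row
    for cell in data:
        row["my_col"] = cell
        row.append()
    table.flush()

n = 1  # index of the row wanted inside every multidimensional cell

with tables.open_file(path, "r") as f:
    # One bulk read of the column instead of one read per table row.
    my_col = f.root.table.cols.my_col[:]
    # The multidimensional slicing then happens in memory with NumPy.
    my_selection = my_col[:, n, :]

print(my_col.shape, my_selection.shape)
```

This trades memory for fewer round trips, so it only fits when the full column is small enough to hold in RAM.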

On Tue, Jul 17, 2012 at 6:27 PM, Juan Manuel Vázquez Tovar <
jmv...@gm...> wrote:

> Thank you very much Anthony.
> Do I have to sign up to file a ticket?
>
>
> 2012/7/15 Anthony Scopatz <sc...@gm...>
>
>> Ahh I see, tricky.
>>
>> So I think what is killing you is that you are pulling each row of the
>> table individually over the network.  Ideally you should be able to do
>> something like the following:
>>
>> f.root.table.cols.my_col[:,n,:]
>>
>>
>> using numpy-esque multidimensional slicing.  However, this fails when I
>> just tested it.  So instead, I would just pull over the full column and
>> slice using numpy in memory.
>>
>> my_col = f.root.table.cols.my_col[:]
>> my_selection = my_col[:,n,:]
>>
>>
>> We should open a ticket so that the top method works (though I think
>> there might already be one).
>>
>> I hope this helps!
>>
>> On Sun, Jul 15, 2012 at 4:27 PM, Juan Manuel Vázquez Tovar <
>> jmv...@gm...> wrote:
>>
>>> The column I'm requesting the data from has multidimensional cells, so
>>> each time I request data from the table, I need to get a specific row for
>>> all the multidimensional cells in the column. I hope this clarifies a bit.
>>> I have at the office a Linux workstation, but it is part of a computing
>>> cluster where all the users have access, so the files are in a folder of
>>> the cluster, not in my hard drive.
>>>
>>> Thank you,
>>> Juanma
>>>
>>> 2012/7/15 Anthony Scopatz <sc...@gm...>
>>>
>>>> Rereading the original post, I am a little confused: are you trying to
>>>> read the whole table, just a couple of rows that meet some condition,
>>>> just one whole column, or one part of a column?
>>>>
>>>> To request the whole table without looping over each row in Python,
>>>> index every element:
>>>>
>>>> f.root.table[:]
>>>>
>>>>
>>>> To just get certain rows, use where().
>>>>
>>>> To get a single column, use the cols namespace:
>>>>
>>>> f.root.table.cols.my_column[:]
>>>>
>>>>
>>>> Why is this file elsewhere on the network?
>>>>
>>>> Be Well
>>>>  Anthony
>>>>
>>>> On Sun, Jul 15, 2012 at 4:08 PM, Juan Manuel Vázquez Tovar <
>>>> jmv...@gm...> wrote:
>>>>
>>>>> Hello Anthony,
>>>>>
>>>>> I have to loop over the whole set of rows. Does the where method have
>>>>> any advantages in that case?
>>>>>
>>>>> Thank you,
>>>>> Juanma
>>>>>
>>>>> 2012/7/15 Anthony Scopatz <sc...@gm...>
>>>>>
>>>>>> Hello Juan,
>>>>>>
>>>>>> Try using the where() method [1]. It has a lot of
>>>>>> nice features under the covers.
>>>>>>
>>>>>> Be Well
>>>>>> Anthony
>>>>>>
>>>>>> 1.
>>>>>> http://pytables.github.com/usersguide/libref.html?highlight=where#tables.Table.where
>>>>>>
>>>>>> On Sun, Jul 15, 2012 at 4:01 PM, Juan Manuel Vázquez Tovar <
>>>>>> jmv...@gm...> wrote:
>>>>>>
>>>>>>> Hello,
>>>>>>>
>>>>>>> I have been using pytables for a few months. The main structure of my
>>>>>>> files has a four column table, two of which have multidimensional cells,
>>>>>>> (56,1) and (133,6) respectively. The previous structure had more columns
>>>>>>> instead of storing the 56x1 array into the same cell. The largest file has
>>>>>>> almost three million rows in the table.
>>>>>>> I usually request data from the table looping through the entire
>>>>>>> table and getting for each row one specific row of the 133x6 2d array.
>>>>>>> Currently, each of the requests can take from 15 sec up to 10
>>>>>>> minutes, depending, I believe, on the status of the office network.
>>>>>>> Could you please advise on how to improve the reading time?
>>>>>>> I have tried to compress the data with zlib, but it takes more or
>>>>>>> less the same time.
>>>>>>>
>>>>>>> Thanks in advance,
>>>>>>>
>>>>>>> Juan Manuel
>>>>>>>

Re: [Pytables-users] Pytables file structure

From: Juan M. V. T. <jmv...@gm...> - 2012-07-17 23:27:51

Thank you very much Anthony.
Do I have to sign up to file a ticket?

2012/7/15 Anthony Scopatz <sc...@gm...>

> Ahh I see, tricky.
>
> So I think what is killing you is that you are pulling each row of the
> table individually over the network.  Ideally you should be able to do
> something like the following:
>
> f.root.table.cols.my_col[:,n,:]
>
>
> using numpy-esque multidimensional slicing.  However, this fails when I
> just tested it.  So instead, I would just pull over the full column and
> slice using numpy in memory.
>
> my_col = f.root.table.cols.my_col[:]
> my_selection = my_col[:,n,:]
>
>
> We should open a ticket so that the top method works (though I think there
> might already be one).
>
> I hope this helps!
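
The workaround quoted above (pull the full column once, then slice in memory) can be sketched as follows. This is only a minimal illustration, not the original poster's code: the file name, the `Record` description and the `idx` column are made up, and the calls use the PyTables 3.x spellings (`open_file`, `create_table`), which postdate this thread.

```python
import os
import tempfile

import numpy as np
import tables


class Record(tables.IsDescription):
    # Hypothetical layout: a scalar id plus one (133, 6) cell per row.
    idx = tables.Int32Col(pos=0)
    my_col = tables.Float64Col(shape=(133, 6), pos=1)


def build_demo_file(path, nrows=10):
    """Write a small table so the read below has something to work on."""
    with tables.open_file(path, mode="w") as f:
        table = f.create_table("/", "table", Record)
        data = np.zeros(nrows, dtype=table.dtype)
        data["idx"] = np.arange(nrows)
        data["my_col"] = np.arange(nrows * 133 * 6, dtype="f8").reshape(nrows, 133, 6)
        table.append(data)
        table.flush()


def read_cell_row(path, n):
    """Read the whole column in one request, then slice it with NumPy."""
    with tables.open_file(path, mode="r") as f:
        my_col = f.root.table.cols.my_col[:]  # shape (nrows, 133, 6)
    return my_col[:, n, :]                    # shape (nrows, 6)


demo_path = os.path.join(tempfile.mkdtemp(), "demo.h5")
build_demo_file(demo_path)
selection = read_cell_row(demo_path, n=5)
print(selection.shape)
```

The point of the design is one large read instead of one small read per row; the price is that the whole column has to fit in memory.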
>
> On Sun, Jul 15, 2012 at 4:27 PM, Juan Manuel Vázquez Tovar <
> jmv...@gm...> wrote:
>
>> The column I'm requesting the data from has multidimensional cells, so
>> each time I request data from the table, I need to get a specific row for
>> all the multidimensional cells in the column. I hope this clarifies a bit.
>> I have at the office a Linux workstation, but it is part of a computing
>> cluster where all the users have access, so the files are in a folder of
>> the cluster, not in my hard drive.
>>
>> Thank you,
>> Juanma
>>
>> 2012/7/15 Anthony Scopatz <sc...@gm...>
>>
>>> Rereading the original post, I am a little confused: are you trying to
>>> read the whole table, just a couple of rows that meet some condition,
>>> just one whole column, or one part of a column?
>>>
>>> To request the whole table without looping over each row in Python,
>>> index every element:
>>>
>>> f.root.table[:]
>>>
>>>
>>> To just get certain rows, use where().
>>>
>>> To get a single column, use the cols namespace:
>>>
>>> f.root.table.cols.my_column[:]
>>>
>>>
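
The three access patterns listed above (whole table, where(), cols namespace) might look like this on a toy file. It is a sketch under assumptions: the table layout, column names and condition are invented for illustration, and the PyTables 3.x names are used rather than the camelCase API current in 2012.

```python
import os
import tempfile

import numpy as np
import tables

path = os.path.join(tempfile.mkdtemp(), "demo.h5")

# Hypothetical two-column table built from a NumPy structured array.
rows = np.zeros(100, dtype=[("my_column", "f8"), ("flag", "i4")])
rows["my_column"] = np.arange(100)
rows["flag"] = np.arange(100) % 2

with tables.open_file(path, mode="w") as f:
    f.create_table("/", "table", obj=rows)

with tables.open_file(path, mode="r") as f:
    table = f.root.table
    everything = table[:]             # whole table in a single read
    column = table.cols.my_column[:]  # one column only
    # where() evaluates the condition string and yields matching rows;
    # the values are copied out while the file is still open.
    selected = [row["my_column"]
                for row in table.where("(flag == 1) & (my_column < 10)")]

print(len(everything), column.shape, selected)
```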
>>> Why is this file elsewhere on the network?
>>>
>>> Be Well
>>>  Anthony
>>>
>>> On Sun, Jul 15, 2012 at 4:08 PM, Juan Manuel Vázquez Tovar <
>>> jmv...@gm...> wrote:
>>>
>>>> Hello Anthony,
>>>>
>>>> I have to loop over the whole set of rows. Does the where method have
>>>> any advantages in that case?
>>>>
>>>> Thank you,
>>>> Juanma
>>>>
>>>> 2012/7/15 Anthony Scopatz <sc...@gm...>
>>>>
>>>>> Hello Juan,
>>>>>
>>>>> Try using the where() method [1]. It has a lot of nice features under
>>>>> the covers.
>>>>>
>>>>> Be Well
>>>>> Anthony
>>>>>
>>>>> 1.
>>>>> http://pytables.github.com/usersguide/libref.html?highlight=where#tables.Table.where
>>>>>
>>>>> On Sun, Jul 15, 2012 at 4:01 PM, Juan Manuel Vázquez Tovar <
>>>>> jmv...@gm...> wrote:
>>>>>
>>>>>> Hello,
>>>>>>
>>>>>> I have been using pytables for a few months. The main structure of my
>>>>>> files has a four column table, two of which have multidimensional cells,
>>>>>> (56,1) and (133,6) respectively. The previous structure had more columns
>>>>>> instead of storing the 56x1 array into the same cell. The largest file has
>>>>>> almost three million rows in the table.
>>>>>> I usually request data from the table looping through the entire
>>>>>> table and getting for each row one specific row of the 133x6 2d array.
>>>>>> Currently, each of the requests can take from 15 sec up to 10
>>>>>> minutes, depending, I believe, on the status of the office network.
>>>>>> Could you please advise on how to improve the reading time?
>>>>>> I have tried to compress the data with zlib, but it takes more or
>>>>>> less the same time.
>>>>>>
>>>>>> Thanks in advance,
>>>>>>
>>>>>> Juan Manuel
>>>>>>

Re: [Pytables-users] Faster Performance: A set of nodes vs A new column that ranges within a set?

From: Jacob B. <jac...@gm...> - 2012-07-17 21:55:34

Thanks for the input Anthony!

-Jake

On Tue, Jul 17, 2012 at 4:20 PM, Anthony Scopatz <sc...@gm...> wrote:

> On Tue, Jul 17, 2012 at 3:30 PM, Jacob Bennett <jac...@gm...> wrote:
>
>> Hello PyTables Users & Contributors,
>>
>> Just a quick question: let's say that I have certain identifiers that
>> link to a set of data. Would it generally be faster for lookup to have each
>> set of data as a separate table with an id as the table's name, or to add
>> this id as another column to a universal table of data and then let the
>> in-kernel search query data only with a specific id?
>>
>
> I think that in general it is faster to have more tables with ids as
> names.  For very small data, searching through a single larger table might
> be quicker than node access...but even then I doubt it.
>
>
>> I hope you can understand my question: would 1,000 tables of 100,000
>> records each be better for searching than 1 table with 100 million records
>> and one extra id column?
>>
>
> For these data sizes more tables is probably faster.
>
> (It should also be noted that in the more-tables case the data is
> actually smaller, because you can eliminate the id column.)
>
> Be Well
> Anthony
>
>
>>
>> Thanks,
>> Jacob Bennett
>>
>> --
>> Jacob Bennett
>> Massachusetts Institute of Technology
>> Department of Electrical Engineering and Computer Science
>> Class of 2014| ben...@mi...
>>


-- 
Jacob Bennett
Massachusetts Institute of Technology
Department of Electrical Engineering and Computer Science
Class of 2014| ben...@mi...

Re: [Pytables-users] [pytables-dev] SciPy 2012 Tutorial

From: Anthony S. <sc...@gm...> - 2012-07-17 21:53:34

On Tue, Jul 17, 2012 at 2:43 AM, Alvaro Tejero Cantero <al...@mi...> wrote:

> It is a very nice presentation.
>
> Makes me wonder if using the terminology
>
> 'in memory' for 'in-core' and 'on disk' for 'out of core' would not be
> more straightforward!
>

Thanks Alvaro!

I agree the existing terminology here is very confusing....

Be Well
Anthony


>
> -á.
>
>
> On 17 July 2012 06:46, Anthony Scopatz <sc...@gm...> wrote:
> > Hello PyTables,
> >
> > I'd like to present the tutorial I gave at SciPy 2012 this afternoon.
> > There were 50+ people in attendance!  My slides are attached and you can
> > find the repository with all of my exercises at [1].  This is released
> > under CC BY-SA, so feel free to poach as needed.  I'll be sure to let you
> > know when the video goes up.  I think that we definitely had some
> > PyTables / HDF5 converts today.
> >
> > I should also note that Antonio put out the v2.4-rc during my tutorial ;0.
> >
> > Enjoy data!
> > Anthony
> >
> > 1. https://github.com/scopatz/scipy2012/tree/master/hdf5
>

Re: [Pytables-users] Faster Performance: A set of nodes vs A new column that ranges within a set?

From: Anthony S. <sc...@gm...> - 2012-07-17 21:20:30

On Tue, Jul 17, 2012 at 3:30 PM, Jacob Bennett <jac...@gm...> wrote:

> Hello PyTables Users & Contributors,
>
> Just a quick question: let's say that I have certain identifiers that link
> to a set of data. Would it generally be faster for lookup to have each set
> of data as a separate table with an id as the table's name, or to add this id
> as another column to a universal table of data and then let the in-kernel
> search query data only with a specific id?
>

I think that in general it is faster to have more tables with ids as names.
 For very small data, searching through a single larger table might be
quicker than node access...but even then I doubt it.


> I hope you can understand my question: would 1,000 tables of 100,000
> records each be better for searching than 1 table with 100 million records
> and one extra id column?
>

For these data sizes more tables is probably faster.

(It should also be noted that in the more-tables case the data is
actually smaller, because you can eliminate the id column.)

Be Well
Anthony
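
To make the two layouts in this exchange concrete, here is a small sketch that stores the same data both ways and fetches the records for one id from each. Everything specific in it is an assumption for illustration: the node names (`by_id`, `id_3`, `universal`), the column names, and the scaled-down sizes; it says nothing about which layout is faster on real data, which would need timing on the actual file.

```python
import os
import tempfile

import numpy as np
import tables

N_IDS, N_ROWS = 5, 1000  # scaled down from 1,000 tables x 100,000 records

rng = np.random.default_rng(0)
path = os.path.join(tempfile.mkdtemp(), "layouts.h5")
universal = np.zeros(N_IDS * N_ROWS, dtype=[("ident", "i4"), ("value", "f8")])

with tables.open_file(path, mode="w") as f:
    # Layout 1: one table per id; the id lives in the node name.
    group = f.create_group("/", "by_id")
    for ident in range(N_IDS):
        rows = np.zeros(N_ROWS, dtype=[("value", "f8")])
        rows["value"] = rng.random(N_ROWS)
        f.create_table(group, "id_%d" % ident, obj=rows)
        block = slice(ident * N_ROWS, (ident + 1) * N_ROWS)
        universal["ident"][block] = ident
        universal["value"][block] = rows["value"]
    # Layout 2: a single table with an extra id column.
    f.create_table("/", "universal", obj=universal)

with tables.open_file(path, mode="r") as f:
    # Layout 1 lookup: go straight to the node and read it whole.
    from_node = f.get_node("/by_id", "id_3")[:]
    # Layout 2 lookup: in-kernel query on the id column.
    from_query = f.root.universal.read_where("ident == 3")

print(len(from_node), len(from_query))
```

Note that the per-id tables carry only the `value` column, which is the size saving mentioned above.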


>
> Thanks,
> Jacob Bennett
>
> --
> Jacob Bennett
> Massachusetts Institute of Technology
> Department of Electrical Engineering and Computer Science
> Class of 2014| ben...@mi...
>

[Pytables-users] Faster Performance: A set of nodes vs A new column that ranges within a set?

From: Jacob B. <jac...@gm...> - 2012-07-17 20:30:31

Hello PyTables Users & Contributors,

Just a quick question: let's say that I have certain identifiers that link
to a set of data. Would it generally be faster for lookup to have each set
of data as a separate table with an id as the table's name, or to add this id
as another column to a universal table of data and then let the in-kernel
search query data only with a specific id?

I hope you can understand my question: would 1,000 tables of 100,000 records
each be better for searching than 1 table with 100 million records and one
extra id column?

Thanks,
Jacob Bennett

-- 
Jacob Bennett
Massachusetts Institute of Technology
Department of Electrical Engineering and Computer Science
Class of 2014| ben...@mi...

Re: [Pytables-users] [pytables-dev] SciPy 2012 Tutorial

From: Francesc A. <fa...@py...> - 2012-07-17 13:41:03

Hey Anthony,

I was not there, but judging by the slides, this should have been a very
nice tutorial.

Some remarks:

- In slide 19, you state that, if data comes from datasets with the 
'numpy' flavor, they can be accessed in a numpy-like fashion. In fact, 
you should be able to access data this way for any flavor.

- In the same slide 19, you mention that data accessed this way is
'memory mapped'.  I'm not sure why you are using this expression, but
memory mapped files are normally a way for the operating system to load
data in a lazy way (on-demand).  It is very similar to what you are
describing there, but how both systems work is not quite the same.  In a
truly 'memory mapped' system, the mapping is done at the operating
system level.  In your example, it is PyTables that has done the
job, without any help from the operating system's 'memory mapping'
capability.  But probably you already explained this to the audience.

- In slide 46, you seem to suggest that 'out-of-core' would be a similar
concept to 'in-kernel', but it is not.  When I introduced the
'in-kernel' concept in PyTables I meant something like a computation
that is made at C-level (i.e. in the computational kernel, that is,
numexpr).  It is true that the way to evaluate out-of-core computations
is via 'in-kernel', but you can also do in-kernel operations with
in-memory data (like in 'in-kernel' selections).

- And finally, I don't see your name anywhere in slide 72.  Why is that??
Come on, don't be shy! :)
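
For contrast with the 'memory mapped' remark above, this is roughly what an operating-system-level memory map looks like from Python. It is a minimal sketch using numpy.memmap on a throwaway raw binary file (the file name and sizes are made up); it does not involve PyTables at all, which, as noted above, does its own on-demand reads without the OS mapping facility.

```python
import os
import tempfile

import numpy as np

# Write a plain binary file of float64 values (hypothetical demo data).
path = os.path.join(tempfile.mkdtemp(), "data.bin")
np.arange(1_000_000, dtype="f8").tofile(path)

# np.memmap asks the operating system to map the file into the address
# space; pages are brought in lazily when they are first touched.
mapped = np.memmap(path, dtype="f8", mode="r")

# Only the pages backing this small slice need to be loaded.
chunk = np.array(mapped[500_000:500_010])  # copy into an ordinary array
print(mapped.shape, chunk[0])
```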


At any rate, I have found most of the material to be of very high 
quality.  Thanks for the effort!

Francesc


On 7/17/12 7:46 AM, Anthony Scopatz wrote:
> Hello PyTables,
>
> I'd like to present the tutorial I gave at SciPy 2012 this afternoon. 
>  There were 50+ people in attendance!  My slides are attached and you 
> can find the repository with all of my exercises at [1].  This is 
> released under CC BY-SA, so feel free to poach as needed.  I'll be 
> sure to let you know when the video goes up.  I think that we 
> definitely had some PyTables / HDF5 converts today.
>
> I should also note that Antonio put out the v2.4-rc /during/ my 
> tutorial ;0.
>
> Enjoy data!
> Anthony
>
> 1. https://github.com/scopatz/scipy2012/tree/master/hdf5


-- 
Francesc Alted

Re: [Pytables-users] [pytables-dev] SciPy 2012 Tutorial

From: Alvaro T. C. <al...@mi...> - 2012-07-17 07:43:59

It is a very nice presentation.

Makes me wonder if using the terminology

'in memory' for 'in-core' and 'on disk' for 'out of core' would not be
more straightforward!

-á.


On 17 July 2012 06:46, Anthony Scopatz <sc...@gm...> wrote:
> Hello PyTables,
>
> I'd like to present the tutorial I gave at SciPy 2012 this afternoon.  There
> were 50+ people in attendance!  My slides are attached and you can find the
> repository with all of my exercises at [1].  This is released under CC
> BY-SA, so feel free to poach as needed.  I'll be sure to let you know when
> the video goes up.  I think that we definitely had some PyTables / HDF5
> converts today.
>
> I should also note that Antonio put out the v2.4-rc during my tutorial ;0.
>
> Enjoy data!
> Anthony
>
> 1. https://github.com/scopatz/scipy2012/tree/master/hdf5

Re: [Pytables-users] PyTables Simultaneous Read Write from Current File

From: Anthony S. <sc...@gm...> - 2012-07-17 05:03:40

On Mon, Jul 16, 2012 at 3:30 PM, Jacob Bennett <jac...@gm...> wrote:

> Wait, is there perhaps a way to simultaneously read and write without any
> kind of blocking? Perhaps the "a" mode or the "r+" mode might help for
> simultaneous read/write? I am currently implementing the
> multiprocessing.Queue, but I think that a large number of query requests
> might put an unnecessary load on my writing queue since the data comes in
> sooooo fast. ;)


Hmm I'll have to look into it, but I vaguely recall a file access mode that
HDF5 has that PyTables doesn't expose... I may be wrong about this....


> Btw, I will submit the example soon.
>

+1!


>
> -Jacob
>
>
> On Sat, Jul 14, 2012 at 1:39 PM, Anthony Scopatz <sc...@gm...> wrote:
>
>> +1 to example of this!
>>
>>
>> On Sat, Jul 14, 2012 at 1:36 PM, Jacob Bennett <jac...@gm...> wrote:
>>
>>> Awesome, I think this sounds like a very workable solution and the idea
>>> is very neat. I will try to implement this right away. I definitely agree
>>> to putting a small example.
>>>
>>> Let you know how this works, thanks guys!
>>>
>>> Thanks,
>>> Jacob
>>>
>>>
>>> On Sat, Jul 14, 2012 at 2:36 AM, Antonio Valentino <
>>> ant...@ti...> wrote:
>>>
>>>> Hi all,
>>>> On 14/07/2012 00:44, Josh Ayers wrote:
>>>> > My first instinct would be to handle all access (read and write) to
>>>> > that file from a single process.  You could create two
>>>> > multiprocessing.Queue objects, one for data to write and one for read
>>>> > requests.  Then the process would check the queues in a loop and
>>>> > handle each request serially.  The data read from the file could be
>>>> > sent back to the originating process using another queue or pipe.  You
>>>> > should be able to do the same thing with sockets if the other parts of
>>>> > your application are in languages other than Python.
>>>> >
>>>> > I do something similar to handle writing to a log file from multiple
>>>> > processes and it works well.  In that case the file is write-only -
>>>> > and just a simple text file rather than HDF5 - but I don't see any
>>>> > reason why it wouldn't work for read and write as well.
>>>> >
>>>> > Hope that helps,
>>>> > Josh
>>>> >
>>>>
>>>> I totally agree with Josh.
>>>>
>>>> I don't have test code to demonstrate it but IMHO parallelizing I/O
>>>> to/from a single file on a single disk does not make much sense
>>>> unless you have special HW.  Is this your case Jacob?
>>>>
>>>> IMHO with standard SATA devices you could have a marginal speedup (in
>>>> the best case), but if your bottleneck is the I/O this will not solve
>>>> your problem.
>>>>
>>>> If someone finds the time to implement a toy example of what Josh
>>>> suggested we could put it on the cookbook :)
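
A toy version of the single-owner pattern Josh describes could look like the sketch below. It deliberately swaps in simpler parts so that it runs anywhere: a thread and queue.Queue from the standard library instead of a separate process and multiprocessing.Queue, one request queue instead of two (so that reads and writes keep their arrival order), and a plain dict standing in for the HDF5 file. All names are hypothetical.

```python
import queue
import threading

STOP = object()  # sentinel telling the owner to shut down


def file_owner(requests, store):
    """Serve requests one at a time; only this thread touches `store`."""
    while True:
        request = requests.get()
        if request is STOP:
            break
        kind, payload, reply = request
        if kind == "write":
            key, value = payload
            store.setdefault(key, []).append(value)
        elif kind == "read":
            # Send a copy back so callers never share the owner's data.
            reply.put(list(store.get(payload, [])))


def write(requests, key, value):
    """Queue a write; returns immediately."""
    requests.put(("write", (key, value), None))


def read(requests, key):
    """Queue a read and block until the owner answers."""
    reply = queue.Queue(maxsize=1)
    requests.put(("read", key, reply))
    return reply.get()


requests = queue.Queue()
store = {}  # stand-in for the open HDF5 file handle
owner = threading.Thread(target=file_owner, args=(requests, store))
owner.start()

for i in range(5):
    write(requests, "ticks", i)
result = read(requests, "ticks")

requests.put(STOP)
owner.join()
print(result)
```

Because every request goes through one loop, no lock is needed around the store; the trade-off, as raised later in the thread, is that a burst of read requests queues up behind (and ahead of) incoming writes.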
>>>>
>>>>
>>>> regards
>>>>
>>>> > On Fri, Jul 13, 2012 at 12:18 PM, Anthony Scopatz <sc...@gm...> wrote:
>>>> >> On Fri, Jul 13, 2012 at 2:09 PM, Jacob Bennett <jac...@gm...> wrote:
>>>> >>
>>>> >> [snip]
>>>> >>
>>>> >>>
>>>> >>> My first implementation was to have a set of current files stay in
>>>> >>> write mode and have an overall lock over these files for the current
>>>> >>> day, but (stupidly) I forgot that lock instances cannot be shared
>>>> >>> over separate processes, only threads.
>>>> >>>
>>>> >>> So could you give me any advice in this situation? I'm sure it has
>>>> >>> come up before. ;)
>>>> >>
>>>> >>
>>>> >> Hello All, I previously suggested to Jacob a setup where only one proc
>>>> >> would have a write handle and all of the other processes would be in
>>>> >> read-only mode.  I am not sure that this would work.
>>>> >>
>>>> >> Francesc, Antonio, Josh, etc or anyone else, how would you solve this
>>>> >> problem where you may want many processors to query the file, while
>>>> >> something else may be writing to it?  I defer to people with more
>>>> >> experience...  Thanks for your help!
>>>> >>
>>>> >> Be Well
>>>> >> Anthony
>>>> >>
>>>> >>>
>>>> >>> Thanks,
>>>> >>> Jacob Bennett
>>>> >>>
>>>>
>>>>
>>>> --
>>>> Antonio Valentino
>>>>
>>> --
>>> Jacob Bennett
>>> Massachusetts Institute of Technology
>>> Department of Electrical Engineering and Computer Science
>>> Class of 2014| ben...@mi...
>>>
>
> --
> Jacob Bennett
> Massachusetts Institute of Technology
> Department of Electrical Engineering and Computer Science
> Class of 2014| ben...@mi...
>

22 messages have been excluded from this view by a project administrator.
