From: Anthony S. <sc...@gm...> - 2012-06-28 18:25:38
Hmmm, OK. Maybe there needs to be a recarray flavor. I kind of like just returning a normal ndarray, though I see your argument for returning a recarray. Maybe some of the other devs can jump in here with an opinion.

Be Well
Anthony

On Thu, Jun 28, 2012 at 12:37 PM, Alvaro Tejero Cantero <al...@mi...> wrote:
> I just tested: passing an object of type numpy.core.records.recarray
> to the constructor of createTable and then reading it back into memory
> via slicing (h5f.root.myobj[:]) returns a numpy.ndarray.
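A minimal sketch of the behavior discussed in this thread, assuming the PyTables 2.x API used above (openFile/createTable and natural naming); the file and node names are hypothetical. A read returns a plain structured ndarray, and a numpy recarray view restores attribute access without copying:

import numpy as np
import tables

# A small recarray to store (the schema is made up for illustration).
arr = np.zeros(5, dtype=[('x', 'f8'), ('y', 'i4')]).view(np.recarray)

h5f = tables.openFile('demo.h5', mode='w')
h5f.createTable('/', 'myobj', arr)     # accepts a recarray directly

data = h5f.root.myobj[:]               # slicing returns a structured ndarray
print(type(data))                      # numpy.ndarray, not recarray

rec = data.view(np.recarray)           # cheap view, no copy
print(rec.x, rec.y)                    # recarray-style attribute access
h5f.close()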
From: Alvaro T. C. <al...@mi...> - 2012-06-28 17:38:21
I just tested: passing an object of type numpy.core.records.recarray to the constructor of createTable and then reading it back into memory via slicing (h5f.root.myobj[:]) returns a numpy.ndarray.

Best,

-á.

On Thu, Jun 28, 2012 at 5:30 PM, Anthony Scopatz <sc...@gm...> wrote:
> I think if you save the table as a record array, it should return you
> a record array. Or does it return a structured array? Have you tried
> this?
From: Anthony S. <sc...@gm...> - 2012-06-28 16:30:37
Hi Alvaro,

I think if you save the table as a record array, it should return you a record array. Or does it return a structured array? Have you tried this?

Be Well
Anthony

On Thu, Jun 28, 2012 at 11:22 AM, Alvaro Tejero Cantero <al...@mi...> wrote:
> I've noticed that tables are loaded in memory as structured arrays.
> It seems that returning recarrays by default would be much in the
> spirit of the natural naming preferences of PyTables. Is there a
> reason not to do so?
From: Alvaro T. C. <al...@mi...> - 2012-06-28 16:23:21
Hi,

I've noticed that tables are loaded in memory as structured arrays.

It seems that returning recarrays by default would be much in the spirit of the natural naming preferences of PyTables.

Is there a reason not to do so?

Cheers,

Álvaro.
From: Anthony S. <sc...@gm...> - 2012-06-28 15:57:19
On Thu, Jun 28, 2012 at 10:41 AM, Jacob Bennett <jac...@gm...> wrote:
> Hey Anthony,
>
> Awesome, I think I'm going to take your advice and aim for larger
> tables. Just an inquiry though: let's say you keep a
> dictionary/hashtable that maps node identifiers (keys) to instances
> of the node object (values), which can be assigned during node
> creation, i.e. mydict['id'] = thisFile.createTable(params). I think
> this could actually help get away from the expensive search calls.

Yup. This would probably help a lot. I hadn't even considered it. I guess you learn something new every day ;)

> I'm still going to go with larger tables though, since I have to read
> the data eventually.

Sounds good! Feel free to ask further questions here!

Be Well
Anthony
From: Jacob B. <jac...@gm...> - 2012-06-28 15:41:37
Hey Anthony,

Awesome, I think I'm going to take your advice and aim for larger tables. Just an inquiry though: let's say you keep a dictionary/hashtable that maps node identifiers (keys) to instances of the node object (values), which can be assigned during node creation, i.e. mydict['id'] = thisFile.createTable(params). I think this could actually help get away from the expensive search calls.

I'm still going to go with larger tables though, since I have to read the data eventually.

Thanks Again For Your Time,
Jacob

--
Jacob Bennett
Massachusetts Institute of Technology
Department of Electrical Engineering and Computer Science
Class of 2014 | ben...@mi...
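A minimal sketch of the node-caching idea Jacob describes above, using the PyTables 2.x API from this thread; the row schema and identifiers are hypothetical:

import tables

class Tick(tables.IsDescription):          # hypothetical row schema
    timestamp = tables.Time32Col(pos=0)
    price = tables.Float64Col(pos=1)

h5f = tables.openFile('ticks.h5', mode='w')

# Build the cache once, at creation time, so later appends never
# have to search the file for the node.
nodes = {}
for ident in ('AAA', 'BBB', 'CCC'):        # hypothetical identifiers
    nodes[ident] = h5f.createTable('/', ident, Tick, expectedrows=100000)

# Appending now costs a dict lookup instead of a node search.
nodes['AAA'].append([(0, 1.5)])
nodes['AAA'].flush()
h5f.close()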
From: Anthony S. <sc...@gm...> - 2012-06-28 15:16:42
Hi Jacob,

This is not a solely PyTables issue. As described, the methods you mention all involve attribute (or metadata) access, which is notoriously slow in HDF5 -- or rather, much slower than reads and writes of the datasets (Tables, Arrays) themselves. Generally, having a single table with 3e8 rows will be faster than searching through 3e3 tables of 1e5 rows each. If there is any way you can represent your data sanely with larger tables, I would recommend that you try this.

The other option is to have an initialization step where you create all of the tables, and then another loop where you append to all of them, rather than searching through 3000 tables 3000 times. For example:

for i in range(3000):
    f.root.createTable("i" + str(i))

for i in range(3000):
    tab = f.getNode("/i" + str(i))
    tab.append(...)

In the above pseudocode, __contains__ is never called -- let alone called 3 times, as in your previous email. In effect, the time that you are spending searching in your previous email is 3000 tables x 3000 loop iterations x 3 if-else branches, so you are automatically in a 9- to 27-million-iteration loop just by the way you have been using __contains__.

I really think that pre-creating the tables, so that you *know* they are there and only have to get the nodes, will be far faster for you.

Be Well
Anthony
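A runnable version of Anthony's pseudocode, assuming the PyTables 2.x API used in this thread (openFile/createTable/getNode); the one-column schema is hypothetical:

import tables

h5f = tables.openFile('many_tables.h5', mode='w')
desc = {'value': tables.Float64Col()}      # hypothetical schema

# Initialization step: create every table up front.
for i in range(3000):
    h5f.createTable('/', 'i' + str(i), desc)

# Append step: getNode() is a plain lookup, no existence checks needed.
for i in range(3000):
    tab = h5f.getNode('/i' + str(i))
    tab.append([(float(i),)])
    tab.flush()

h5f.close()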
From: Jacob B. <jac...@gm...> - 2012-06-27 19:33:48
Hello PyTables Users,

I am asking this quick question because my application is currently bottlenecking horribly on these methods, all of which are called once before each Table.append(rows). The table writing, on the other hand, is much, much faster than the searching for the table.

Any general discussion on this would be great. The current hierarchy consists of root leading to around 3000 nodes, each of which has around 100000 rows.

Thanks,
Jacob

--
Jacob Bennett
Massachusetts Institute of Technology
Department of Electrical Engineering and Computer Science
Class of 2014 | ben...@mi...
From: Jacob B. <jac...@gm...> - 2012-06-27 17:01:21
Update: it actually seems that I am bottlenecking before I even get to writing data. My current search procedure is very computationally inefficient. I have pasted the active portion of my code below, and I have also attached a copy of my dataWrapper. I have narrowed the problem down to this; now I just have to see what the problem is exactly.

I am currently iterating over a large dictionary with keys as tuples (tableloc, group1, group2, date (not important)) and values as the data that is supposed to be loaded into the table. Any help would be appreciated!

if not multiInstances:
    if ("/" + mainTuple[1] + "/" + source + "/" + tick) in openFi:
        tableD = openFi.getNode("/" + mainTuple[1] + "/" + source + "/" + tick)
    elif ("/" + mainTuple[1] + "/" + source) in openFi:
        group2 = openFi.getNode("/" + mainTuple[1] + "/" + source)
        tableD = openFi.createTable(group2, tick, Tick, "Instrument",
                                    Filters(complevel=2, complib='blosc'), 100000)
    elif ("/" + mainTuple[1]) in openFi:
        group1 = openFi.getNode("/" + mainTuple[1])
        group2 = openFi.createGroup(group1, source, source)
        tableD = openFi.createTable(group2, tick, Tick, "Instrument",
                                    Filters(complevel=2, complib='blosc'), 100000)
    else:
        group1 = openFi.createGroup("/", mainTuple[1], mainTuple[1])
        group2 = openFi.createGroup(group1, source, source)
        tableD = openFi.createTable(group2, tick, Tick, "Instrument",
                                    Filters(complevel=2, complib='blosc'), 100000)
    tableD.append(dataArray)
    tableD.flush()

Thanks,
Jacob

--
Jacob Bennett
Massachusetts Institute of Technology
Department of Electrical Engineering and Computer Science
Class of 2014 | ben...@mi...
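Jacob's if/elif cascade can usually collapse to a single call: File.createTable() in the PyTables 2.x API used here takes a createparents flag (visible in the createTable signature in the traceback later on this page) that builds any missing intermediate groups. A sketch, reusing the hypothetical names from the snippet above:

path = "/%s/%s" % (mainTuple[1], source)
if (path + "/" + tick) in openFi:
    tableD = openFi.getNode(path + "/" + tick)
else:
    # createparents=True creates any missing groups along the way.
    tableD = openFi.createTable(path, tick, Tick, "Instrument",
                                filters=Filters(complevel=2, complib='blosc'),
                                expectedrows=100000, createparents=True)
tableD.append(dataArray)
tableD.flush()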
From: Jacob B. <jac...@gm...> - 2012-06-27 15:39:19
Sorry about that; I uploaded the code, but since it requires many dependencies, I was not expecting you to run it. That being said, I would say the expected number of rows per table is 100,000, and I am currently working on an Intel Xeon with 4 processors and 8 threads.

I also found that PyTables has a __contains__ method, which can replace the try-except statements I had before.

Thanks,
Jacob

--
Jacob Bennett
Massachusetts Institute of Technology
Department of Electrical Engineering and Computer Science
Class of 2014 | ben...@mi...
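A quick illustration of the __contains__ membership test Jacob mentions, assuming the PyTables 2.x File API used throughout this thread; names are hypothetical:

import tables

h5f = tables.openFile('contains_demo.h5', mode='w')
h5f.createGroup('/', 'book')

# Test for a node path directly instead of wrapping getNode()
# in a try-except block.
if '/book' in h5f:
    grp = h5f.getNode('/book')
if '/trades' not in h5f:
    h5f.createGroup('/', 'trades')
h5f.close()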
From: Aquil H. A. <aqu...@gm...> - 2012-06-27 13:55:58
Hello Francesc,

Thank you for your response! I guess I need to read the User's Guide cover to cover.

--
Aquil H. Abdullah
From: Francesc A. <fa...@py...> - 2012-06-27 08:43:59
On 6/26/12 11:19 PM, Aquil H. Abdullah wrote:
> So how does PyTables interpret a table with multiple column indices?

If a table has multiple indexes, PyTables will use its internal query optimizer to try to use them in your queries. It is not always possible for PyTables to use all indexes, though. Please see:

http://pytables.github.com/usersguide/optimization.html#indexed-searches

for a series of examples where different indexes can be used.

> The best solution that I've found is creating a hash from the two
> fields that I am interested in indexing and then indexing that table
> on that hash.

In case several indexes cannot be used in your case, that could be an alternate solution for what you are trying to do, yes.

> The other solution would be to shard my data by symbol and then index
> each symbol table by timestamp.

The range of possibilities is really large, yes, but I'd try to avoid sharding because it is normally harder to set up and manage. You are indeed free to try whatever approaches feel best for you.

HTH,

--
Francesc Alted
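A minimal sketch of a two-index query of the kind Francesc describes, using Aquil's table layout and the PyTables 2.x API from this thread; the data values are made up:

import time
import tables

table_desc = {'timestamp': tables.Time32Col(),
              'symbol': tables.StringCol(8),
              'observation': tables.Float32Col()}

h5f = tables.openFile('test.h5', mode='w')
table = h5f.createTable('/', 'test', table_desc, 'Test Table')
table.cols.timestamp.createIndex()
table.cols.symbol.createIndex()

# Append via the row accessor, so field order does not matter.
row = table.row
row['timestamp'] = int(time.time())
row['symbol'] = 'IBM'
row['observation'] = 1.0
row.append()
table.flush()

# The query optimizer can draw on both column indexes here.
rows = table.readWhere('(symbol == "IBM") & (timestamp > 0)')
h5f.close()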
From: Anthony S. <sc...@gm...> - 2012-06-27 03:38:26
Hi Jacob,

On Tue, Jun 26, 2012 at 5:35 PM, Jacob Bennett <jac...@gm...> wrote:
> Hello Anthony,
>
> With the above being said, and with more work put into the initial
> attempt, could you suggest other methods that might improve write
> performance? The optimization page, I feel, mostly talks about data
> retrieval, which will be more important later on, but I have to meet a
> writing bound before then. The only thing that I have done to improve
> performance is to use table.append(rows).

Typically, having larger chunksizes will also increase performance. Additionally, adding compression via filters may increase write speeds. However, the exact strategy you take will depend on the size of the data that you are writing and the number of processors that you have. Out of curiosity, what are your data and nproc sizes?

> My updated code is attached to this email, thanks again!

I tried running this but there was no TimeHandler module.... However, I will note that this doesn't look like the most efficient code, with all of the try-except blocks. I think that hasattr() will work in a lot of these cases for you.

Be Well
Anthony
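A sketch of the two knobs Anthony mentions -- chunk size and compression filters -- at table creation time, using the PyTables 2.x API from this thread; the schema and numbers are hypothetical:

import tables

desc = {'value': tables.Float64Col()}      # hypothetical schema
h5f = tables.openFile('tuned.h5', mode='w')

# Compression filter: zlib level 5 is a common starting point; blosc
# (used elsewhere in this thread) trades compression ratio for speed.
filters = tables.Filters(complevel=5, complib='zlib')

# expectedrows lets PyTables pick a sensible chunkshape automatically;
# it can also be forced explicitly, e.g. chunkshape=(8192,).
tab = h5f.createTable('/', 'data', desc, filters=filters,
                      expectedrows=10**7)
print(tab.chunkshape)
h5f.close()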
From: Anthony S. <sc...@gm...> - 2012-06-26 21:30:43
On Tue, Jun 26, 2012 at 4:19 PM, Aquil H. Abdullah <aqu...@gm...> wrote:
> Can anyone explain what effect two index columns has on PyTables?
> Also, can anyone tell me if they've come up with a better solution for
> dealing with tables that require multiple indices than any that I've
> mentioned?

I don't have a lot of time right now, but maybe create a nested column, or a column with a compound data type that is just a tuple of the two data types you are interested in. Then index against the super column. Storing a hash in another column is probably not the greatest way to do this... Hopefully someone else can jump in and answer this one.
From: Aquil H. A. <aqu...@gm...> - 2012-06-26 21:19:43
Hello All,

In my newbist state, I called createIndex on two columns in one of my tables:

import tables
table_desc = {'timestamp': tables.Time32Col(),
              'symbol': tables.StringCol(8),
              'observation': tables.Float32Col()}
h5f = tables.openFile('test.h5', mode='w')
group = h5f.createGroup('/', 'data')
table = h5f.createTable(group, 'test', table_desc, 'Test Table')
table.cols.timestamp.createIndex()
table.cols.symbol.createIndex()
...

Now, from what I've been able to find on the internet, an index is only associated with one column:

  class tables.Index
  Represents the index of a column in a table.

  This class is used to keep the indexing information for columns in a
  Table dataset (see The Table class). It is actually a descendant of
  the Group class (see The Group class), with some added functionality.
  An Index is always associated with one and only one column in a table.

- PyTables 2.3.1 User's Guide - Library Reference / The Index Class
  http://pytables.github.com/usersguide/libref.html#indexclassdescr
- Efficient way to verify that records are unique in Python/PyTables
  http://stackoverflow.com/questions/1315129/efficient-way-to-verify-that-records-are-unique-in-python-pytables
- Hints For SQL Users (Creating an index)
  http://www.pytables.org/moin/HintsForSQLUsers#Creatinganindex

So how does PyTables interpret a table with multiple column indices? The best solution that I've found is creating a hash from the two fields that I am interested in indexing and then indexing the table on that hash.

The other solution would be to shard my data by symbol and then index each symbol table by timestamp.

Can anyone explain what effect two index columns has on PyTables? Also, can anyone tell me if they've come up with a better solution for dealing with tables that require multiple indices than any that I've mentioned?

Regards,

--
Aquil H. Abdullah
From: Anthony S. <sc...@gm...> - 2012-06-26 05:57:22
Hello Jacob,

This is not surprising. The HDF5 parallel library requires MPI and comes with some special restrictions (no compression on write). As such, the pain of implementing a parallel-write version of PyTables has not been worth it. We certainly welcome pull requests and further discussion on this issue ;). Often it is easier (and faster -- writing is expensive) to do the computation in parallel followed by a single write. Or you could have a dedicated thread which queues and executes write commands as they come in. Just some thoughts on how to avoid this problem.

Parallel reads are supported.

Let me know if you have further questions or really want to dive into this issue.

Be Well
Anthony

On Mon, Jun 25, 2012 at 1:33 PM, Jacob Bennett <jac...@gm...> wrote:
> Hello PyTables Users,
>
> I am very new to PyTables, and if you all could help me out, that
> would be splendid.
>
> I'm currently having trouble with writing to two separate HDF5 files
> using PyTables. Each file itself is only accessible by a single
> thread, so there really shouldn't be any threading issues. When I run
> my Python script, however, it just seems to crash at random time
> intervals without any error messages received or exceptions thrown.
>
> I write data to the HDF5 files as follows. I have two HDF5 files that
> represent book snapshots and trade snapshots. The data of these
> snapshots come in the form of Python dictionaries whose values are the
> data itself in an array. Two threads run on each file. One thread
> controls when to create new files and close others based upon the time
> of day, while the other thread iterates over each key-value pair in
> the dictionary and loads data into the file. When a thread has access
> to the file, the file is locked.
>
> I have my two dataWrappers attached to the email. Please take a look
> at them. One thread runs acceptDict in a loop while the other runs
> changeFile in a loop. This is really frustrating when I don't get any
> errors and Python just crashes unexpectedly.
>
> Thanks,
> Jacob
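A minimal sketch of the dedicated-writer-thread pattern Anthony suggests, with all names hypothetical; a single thread owns the file and drains a queue of (node path, rows) write commands, matching the Python 2 era of this thread:

import threading
import Queue            # Python 2 stdlib
import tables

write_q = Queue.Queue()

def writer(h5f):
    """Single thread that owns the file; everyone else just enqueues."""
    while True:
        item = write_q.get()
        if item is None:            # sentinel: shut down
            break
        path, rows = item
        tab = h5f.getNode(path)
        tab.append(rows)
        tab.flush()
        write_q.task_done()

h5f = tables.openFile('serial_writes.h5', mode='w')
h5f.createTable('/', 'book', {'price': tables.Float64Col()})

t = threading.Thread(target=writer, args=(h5f,))
t.start()

# Any number of producer threads can safely do this:
write_q.put(('/book', [(101.25,)]))

write_q.put(None)       # stop the writer
t.join()
h5f.close()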
From: Antonio V. <ant...@ti...> - 2012-06-25 17:47:08
Hi Alvaro,

thank you for reporting. I filed an issue on GitHub to track the problem:

https://github.com/PyTables/PyTables/issues/160

ciao

--
Antonio Valentino
From: Mythsmith <sp...@mo...> - 2012-06-25 15:29:39
|
Done: https://github.com/PyTables/PyTables/issues/159

I did not understand how to attach a file to a GitHub issue. Anyway, I
posted it at a pastebin address and attached the unittest here.

Regards,
Daniele

On 25/06/2012 09:35, Antonio Valentino wrote:
> Ciao Daniele,
>
> On Mon, 25 Jun 2012 09:17:00 +0200, Mythsmith <sp...@mo...> wrote:
>
>> Hi Anthony,
>> Shouldn't the close() method also clear the cache? I think a file
>> should be either opened or closed... Should I file a bug report?
>> Best regards,
>> Daniele
>
> The close method also removes the file from the cache if there are no
> more references to it:
>
> https://github.com/PyTables/PyTables/blob/6fccb7495ba1bc758c7b04960fe1cd392abe9b96/tables/file.py#L2098
>
> Anyway yes, if you have some problem with the file caching system,
> please file a bug report on GitHub.
>
> Of course test scripts or patches are very welcome.
>
> ciao
>
> [earlier messages in the thread quoted below]
From: Alvaro T. C. <al...@mi...> - 2012-06-25 10:03:38
|
Hi,

In view of the upcoming release I thought I'd report this because at the
time I cannot fix it myself.

I am using a structured array with a dtype specified with the following
numpy-accepted format (quotation follows from
http://docs.scipy.org/doc/numpy/reference/arrays.dtypes.html):

    [(field_name, field_dtype, field_shape), ...]

    obj should be a list of fields where each field is described by a
    tuple of length 2 or 3. (Equivalent to the descr item in the
    __array_interface__ attribute.)

    The first element, field_name, is the field name (if this is '' then
    a standard field name, 'f#', is assigned). The field name may also be
    a 2-tuple of strings where the first string is either a “title”
    (which may be any string or unicode string) or meta-data for the
    field which can be any object, and the second string is the “name”
    which must be a valid Python identifier.

This is my concrete example:

    header = [(('timestamp', 't'), 'u4'),
              (('unit (cluster) id', 'unit'), 'u2')]

This is what PyTables says upon passing either the structured array or
np.dtype(header) to the createTable function:

    > test.createTable('/', 'spike', s, 'test')
    ---------------------------------------------------------------------------
    ValueError                                Traceback (most recent call last)
    /home/tejero/Dropbox/O/ridge/doc/<ipython-input-40-5fdbd9feb41d> in <module>()
    ----> 1 test.createTable('/', 'spike', s, 'test')

    /home/tejero/Local/Envs/test/lib/python2.7/site-packages/tables/file.pyc
    in createTable(self, where, name, description, title, filters,
    expectedrows, chunkshape, byteorder, createparents)
        768             description=description, title=title,
        769             filters=filters, expectedrows=expectedrows,
    --> 770             chunkshape=chunkshape, byteorder=byteorder)
        771
        772

    /home/tejero/Local/Envs/test/lib/python2.7/site-packages/tables/table.pyc
    in __init__(self, parentNode, name, description, title, filters,
    expectedrows, chunkshape, byteorder, _log)
        805             self._v_recarray = nparray
        806             self.description, self._rabyteorder = \
    --> 807                 descr_from_dtype(nparray.dtype)
        808
        809         # No description yet?

    /home/tejero/Local/Envs/test/lib/python2.7/site-packages/tables/description.pyc
    in descr_from_dtype(dtype_)
        723     fields = {}
        724     fbyteorder = '|'
    --> 725     for (name, (dtype, pos)) in dtype_.fields.items():
        726         kind = dtype.base.kind
        727         byteorder = dtype.base.byteorder

    ValueError: too many values to unpack

-á.
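A workaround until this is fixed (a sketch, assuming s is the structured array from the report above) is to strip the titles from the dtype before handing the array to createTable, since the titles are what produce the 3-tuples:

    import numpy as np

    header = [(('timestamp', 't'), 'u4'),
              (('unit (cluster) id', 'unit'), 'u2')]
    titled = np.dtype(header)

    # dtype.names never includes titles, so rebuilding from it yields a
    # plain dtype with the same field names, types, and layout:
    plain = np.dtype([(name, titled.fields[name][0])
                      for name in titled.names])

    # View (not copy) the existing structured array with the stripped
    # dtype, then pass that to createTable:
    # s_plain = s.view(plain)
    # test.createTable('/', 'spike', s_plain, 'test')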
From: Antonio V. <ant...@ti...> - 2012-06-25 07:53:23
|
Ciao Daniele,

On Mon, 25 Jun 2012 09:17:00 +0200, Mythsmith <sp...@mo...> wrote:

> Hi Anthony,
> Shouldn't the close() method also clear the cache? I think a file
> should be either opened or closed... Should I file a bug report?
> Best regards,
> Daniele

The close method also removes the file from the cache if there are no
more references to it:

https://github.com/PyTables/PyTables/blob/6fccb7495ba1bc758c7b04960fe1cd392abe9b96/tables/file.py#L2098

Anyway yes, if you have some problem with the file caching system,
please file a bug report on GitHub. Of course test scripts or patches
are very welcome.

ciao

> [Anthony's workaround quoted below in the original message]

--
Antonio Valentino
From: Mythsmith <sp...@mo...> - 2012-06-25 07:17:15
|
Hi Anthony,

Shouldn't the close() method also clear the cache? I think a file should
be either opened or closed... Should I file a bug report?

Best regards,
Daniele

On 21/06/2012 19:23, Anthony Scopatz wrote:
> Hi Daniele,
>
> This is probably because of the way PyTables caches its file objects.
> As a temporary workaround, why don't you try clearing the cache or at
> least removing this file from it. The cache is just a dictionary and
> it is located at "tables.file._open_files", i.e. try:
>
>     tables.file._open_files.clear()
>     # -or-
>     tables.file._open_files.pop("touch.h5")
>
> Be Well
> Anthony
>
> [original report quoted below]
From: Antonio V. <ant...@ti...> - 2012-06-23 18:12:18
|
Ciao Daniele,

On 21/06/2012 17:43, Mythsmith wrote:
> Hi,
> I noticed that if I open an erroneous file (e.g. empty), then it seems
> not possible to completely close it and reopen the same path, even if a
> valid file was created in the meanwhile.
> The error is:
> ValueError: The file 'touch.h5' is already opened. Please close it
> before reopening in write mode.
>
> You find a complete example attached.
>
> Regards,
> daniele

Thank you for reporting. The issue has already been fixed in the
development branch, and it should be available in PyTables 2.4. I filed
a ticket on GitHub (https://github.com/PyTables/PyTables/issues/158) to
track the issue.

ciao

--
Antonio Valentino
From: Josh M. <jos...@gm...> - 2012-06-22 14:47:09
|
On Jun 19, 2012, at 4:37 AM, David Donovan wrote:

> Hi Anthony,
>
> Thanks for the response. I installed HDF5 1.8.9 using the following
> flags for configure:
>
>     ./configure --prefix=/usr/local \
>         --with-szlib=/Library/Frameworks/Python.framework/Versions/Current \
>         CPPFLAGS=-I/Library/Frameworks/Python.framework/Versions/Current/include \
>         LDFLAGS=-L/Library/Frameworks/Python.framework/Versions/Current/lib
>
> Also, I had to modify the optimization flag for gcc-4 in order to pass
> the make check part, as noted on the HDF5 page:
>
>     Conversion tests fail on Mac OS X 10.7 (Lion). Users have reported
>     that when building HDF5, the conversion tests failed (make check)
>     in dt_arith.chk. A workaround is to edit
>     <HDF5 source>/config/gnu-flags, search for PROD_CFLAGS under
>     "gcc-4.*", and change the value of PROD_CFLAGS to "-O0".
>
> Then:
>
>     make
>     make check
>     sudo make install
>
> Is there a better way? Is tables somehow having a hard time finding
> the HDF5 library, do you think?

Hi David,

I always use homebrew for any of my prerequisites:
http://mxcl.github.com/homebrew/

Cheers
~Josh

> Thanks!
>
> Best Regards,
> David
>
> On Sat, Jun 16, 2012 at 12:57 AM, Anthony Scopatz <sc...@gm...> wrote:
>> Hi David,
>>
>> How did you build / install HDF5?
>>
>> Be Well
>> Anthony
>>
>> On Fri, Jun 15, 2012 at 7:14 PM, David Donovan <don...@gm...> wrote:
>>>
>>> Hi Everyone,
>>>
>>> I am having problems running the tests for PyTables on Mac OS X Lion.
>>> I have tried HDF5 version 1.8.5 as well, but I still get the same issue.
>>>
>>> Any thoughts would be helpful... Thanks for any help you can provide.
>>>
>>> Best Regards,
>>> David Donovan
>>>
>>> Python 2.7.1 (r271:86832, Jul 31 2011, 19:30:53)
>>> [GCC 4.2.1 (Based on Apple Inc. build 5658) (LLVM build 2335.15.00)] on darwin
>>> Type "help", "copyright", "credits" or "license" for more information.
>>> >>> import tables
>>> >>> tables.test()
>>> -=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=
>>> PyTables version:  2.3.1
>>> HDF5 version:      1.8.9
>>> NumPy version:     1.7.0.dev-0c5f480
>>> Numexpr version:   2.0.1 (not using Intel's VML/MKL)
>>> Zlib version:      1.2.5 (in Python interpreter)
>>> LZO version:       2.06 (Aug 12 2011)
>>> BZIP2 version:     1.0.6 (6-Sept-2010)
>>> Blosc version:     1.1.2 (2010-11-04)
>>> Cython version:    0.16
>>> Python version:    2.7.1 (r271:86832, Jul 31 2011, 19:30:53)
>>> [GCC 4.2.1 (Based on Apple Inc. build 5658) (LLVM build 2335.15.00)]
>>> Platform:          darwin-i386
>>> Byte-ordering:     little
>>> Detected cores:    2
>>> -=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=
>>> Performing only a light (yet comprehensive) subset of the test suite.
>>> If you want a more complete test, try passing the --heavy flag to this
>>> script (or set the 'heavy' parameter in case you are using the
>>> tables.test() call). The whole suite will take more than 4 hours to
>>> complete on a relatively modern CPU and around 512 MB of main memory.
>>> -=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=
>>>
>>> .....................................................................................................................................F
>>> Segmentation fault: 11
From: Anthony S. <sc...@gm...> - 2012-06-21 17:23:57
|
Hi Daniele,

This is probably because of the way PyTables caches its file objects. As
a temporary workaround, why don't you try clearing the cache, or at least
removing this file from it. The cache is just a dictionary and it is
located at "tables.file._open_files", i.e. try:

    tables.file._open_files.clear()
    # -or-
    tables.file._open_files.pop("touch.h5")

Be Well
Anthony

On Thu, Jun 21, 2012 at 10:43 AM, Mythsmith <sp...@mo...> wrote:

> Hi,
> I noticed that if I open an erroneous file (e.g. empty), then it seems
> not possible to completely close it and reopen the same path, even if a
> valid file was created in the meanwhile.
> The error is:
> ValueError: The file 'touch.h5' is already opened. Please close it
> before reopening in write mode.
>
> You find a complete example attached.
>
> Regards,
> daniele
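Putting the workaround together (a sketch against the PyTables 2.3 internals, where tables.file._open_files is a plain dict keyed by filename; it is private API and may change between releases):

    import tables

    try:
        h5f = tables.openFile("touch.h5", mode="w")
    except ValueError:
        # Evict the stale handle left behind by the earlier failed
        # open, close it if possible, then retry the open.
        stale = tables.file._open_files.pop("touch.h5", None)
        if stale is not None:
            try:
                stale.close()
            except Exception:
                pass
        h5f = tables.openFile("touch.h5", mode="w")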
From: Mythsmith <sp...@mo...> - 2012-06-21 15:44:08
|
Hi,

I noticed that if I open an erroneous file (e.g. empty), then it seems not
possible to completely close it and reopen the same path, even if a valid
file was created in the meanwhile. The error is:

    ValueError: The file 'touch.h5' is already opened. Please close it
    before reopening in write mode.

You find a complete example attached.

Regards,
daniele
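For anyone without the attachment, the reported failure mode can be sketched like this (file name as in the report; the cache diagnosis comes from the replies above):

    import tables

    # Create a zero-byte file, which is not valid HDF5.
    open("touch.h5", "w").close()

    try:
        tables.openFile("touch.h5", mode="r")  # fails: not an HDF5 file
    except Exception as e:
        print "open failed as expected:", e

    # The failed open can leave a stale entry in the file cache, so even
    # a fresh write-mode open of the same path then raises:
    #     ValueError: The file 'touch.h5' is already opened. Please
    #     close it before reopening in write mode.
    tables.openFile("touch.h5", mode="w")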