From: Anthony S. <sc...@gm...> - 2012-06-28 18:25:38
Hmmm, OK. Maybe there needs to be a recarray flavor. I kind of like just returning a normal ndarray, though I see your argument for returning a recarray. Maybe some of the other devs can jump in here with an opinion.

Be Well
Anthony

On Thu, Jun 28, 2012 at 12:37 PM, Alvaro Tejero Cantero <al...@mi...> wrote:
> I just tested: passing an object of type numpy.core.records.recarray
> to the constructor of createTable and then reading it back into memory
> via slicing (h5f.root.myobj[:]) returns a numpy.ndarray.
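A minimal sketch of the behavior discussed in this thread, assuming the PyTables 2.x API used above (openFile/createTable and natural naming); the file and node names are hypothetical. A read returns a plain structured ndarray, and a numpy recarray view restores attribute access without copying:

import numpy as np
import tables

# A small recarray to store (the schema is made up for illustration).
arr = np.zeros(5, dtype=[('x', 'f8'), ('y', 'i4')]).view(np.recarray)

h5f = tables.openFile('demo.h5', mode='w')
h5f.createTable('/', 'myobj', arr)     # accepts a recarray directly

data = h5f.root.myobj[:]               # slicing returns a structured ndarray
print(type(data))                      # numpy.ndarray, not recarray

rec = data.view(np.recarray)           # cheap view, no copy
print(rec.x, rec.y)                    # recarray-style attribute access
h5f.close()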
From: Alvaro T. C. <al...@mi...> - 2012-06-28 17:38:21
I just tested: passing an object of type numpy.core.records.recarray to the constructor of createTable and then reading it back into memory via slicing (h5f.root.myobj[:]) returns a numpy.ndarray.

Best,

-á.

On Thu, Jun 28, 2012 at 5:30 PM, Anthony Scopatz <sc...@gm...> wrote:
> I think if you save the table as a record array, it should return you
> a record array. Or does it return a structured array? Have you tried
> this?
From: Anthony S. <sc...@gm...> - 2012-06-28 16:30:37
Hi Alvaro,

I think if you save the table as a record array, it should return you a record array. Or does it return a structured array? Have you tried this?

Be Well
Anthony

On Thu, Jun 28, 2012 at 11:22 AM, Alvaro Tejero Cantero <al...@mi...> wrote:
> I've noticed that tables are loaded in memory as structured arrays.
> It seems that returning recarrays by default would be much in the
> spirit of the natural naming preferences of PyTables. Is there a
> reason not to do so?
From: Alvaro T. C. <al...@mi...> - 2012-06-28 16:23:21
Hi,

I've noticed that tables are loaded in memory as structured arrays.

It seems that returning recarrays by default would be much in the spirit of the natural naming preferences of PyTables.

Is there a reason not to do so?

Cheers,

Álvaro.
From: Anthony S. <sc...@gm...> - 2012-06-28 15:57:19
On Thu, Jun 28, 2012 at 10:41 AM, Jacob Bennett <jac...@gm...> wrote:
> Hey Anthony,
>
> Awesome, I think I'm going to take your advice and aim for larger
> tables. Just an inquiry though: let's say you keep a
> dictionary/hashtable that maps node identifiers (keys) to instances
> of the node object (values), which can be assigned during node
> creation, i.e. mydict['id'] = thisFile.createTable(params). I think
> this could actually help get away from the expensive search calls.

Yup. This would probably help a lot. I hadn't even considered it. I guess you learn something new every day ;)

> I'm still going to go with larger tables though, since I have to read
> the data eventually.

Sounds good! Feel free to ask further questions here!

Be Well
Anthony
From: Jacob B. <jac...@gm...> - 2012-06-28 15:41:37
Hey Anthony,

Awesome, I think I'm going to take your advice and aim for larger tables. Just an inquiry though: let's say you keep a dictionary/hashtable that maps node identifiers (keys) to instances of the node object (values), which can be assigned during node creation, i.e. mydict['id'] = thisFile.createTable(params). I think this could actually help get away from the expensive search calls.

I'm still going to go with larger tables though, since I have to read the data eventually.

Thanks Again For Your Time,
Jacob

--
Jacob Bennett
Massachusetts Institute of Technology
Department of Electrical Engineering and Computer Science
Class of 2014 | ben...@mi...
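A minimal sketch of the node-caching idea Jacob describes above, using the PyTables 2.x API from this thread; the row schema and identifiers are hypothetical:

import tables

class Tick(tables.IsDescription):          # hypothetical row schema
    timestamp = tables.Time32Col(pos=0)
    price = tables.Float64Col(pos=1)

h5f = tables.openFile('ticks.h5', mode='w')

# Build the cache once, at creation time, so later appends never
# have to search the file for the node.
nodes = {}
for ident in ('AAA', 'BBB', 'CCC'):        # hypothetical identifiers
    nodes[ident] = h5f.createTable('/', ident, Tick, expectedrows=100000)

# Appending now costs a dict lookup instead of a node search.
nodes['AAA'].append([(0, 1.5)])
nodes['AAA'].flush()
h5f.close()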
From: Anthony S. <sc...@gm...> - 2012-06-28 15:16:42
Hi Jacob,

This is not a solely PyTables issue. As described, the methods you mention all involve attribute (or metadata) access, which is notoriously slow in HDF5 -- or rather, much slower than reads and writes of the datasets (Tables, Arrays) themselves. Generally, having a single table with 3e8 rows will be faster than searching through 3e3 tables of 1e5 rows each. If there is any way you can represent your data sanely with larger tables, I would recommend that you try this.

The other option is to have an initialization step where you create all of the tables, and then another loop where you append to all of them, rather than searching through 3000 tables 3000 times. For example:

for i in range(3000):
    f.root.createTable("i" + str(i))

for i in range(3000):
    tab = f.getNode("/i" + str(i))
    tab.append(...)

In the above pseudocode, __contains__ is never called -- let alone called 3 times, as in your previous email. In effect, the time that you are spending searching in your previous email is 3000 tables x 3000 loop iterations x 3 if-else branches, so you are automatically in a 9- to 27-million-iteration loop just by the way you have been using __contains__.

I really think that pre-creating the tables, so that you *know* they are there and only have to get the nodes, will be far faster for you.

Be Well
Anthony
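A runnable version of Anthony's pseudocode, assuming the PyTables 2.x API used in this thread (openFile/createTable/getNode); the one-column schema is hypothetical:

import tables

h5f = tables.openFile('many_tables.h5', mode='w')
desc = {'value': tables.Float64Col()}      # hypothetical schema

# Initialization step: create every table up front.
for i in range(3000):
    h5f.createTable('/', 'i' + str(i), desc)

# Append step: getNode() is a plain lookup, no existence checks needed.
for i in range(3000):
    tab = h5f.getNode('/i' + str(i))
    tab.append([(float(i),)])
    tab.flush()

h5f.close()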
From: Jacob B. <jac...@gm...> - 2012-06-27 19:33:48
Hello PyTables Users,

I am asking this quick question because my application is currently bottlenecking horribly on these methods, all of which are called once before each Table.append(rows). The table writing, on the other hand, is much, much faster than the searching for the table.

Any general discussion on this would be great. The current hierarchy consists of root leading to around 3000 nodes, each of which has around 100000 rows.

Thanks,
Jacob

--
Jacob Bennett
Massachusetts Institute of Technology
Department of Electrical Engineering and Computer Science
Class of 2014 | ben...@mi...
From: Jacob B. <jac...@gm...> - 2012-06-27 17:01:21
Update: it actually seems that I am bottlenecking before I even get to writing data. My current search procedure is very computationally inefficient. I have pasted the active portion of my code below, and I have also attached a copy of my dataWrapper. I have narrowed the problem down to this; now I just have to see what the problem is exactly.

I am currently iterating over a large dictionary with keys as tuples (tableloc, group1, group2, date (not important)) and values as the data that is supposed to be loaded into the table. Any help would be appreciated!

if not multiInstances:
    if ("/" + mainTuple[1] + "/" + source + "/" + tick) in openFi:
        tableD = openFi.getNode("/" + mainTuple[1] + "/" + source + "/" + tick)
    elif ("/" + mainTuple[1] + "/" + source) in openFi:
        group2 = openFi.getNode("/" + mainTuple[1] + "/" + source)
        tableD = openFi.createTable(group2, tick, Tick, "Instrument",
                                    Filters(complevel=2, complib='blosc'), 100000)
    elif ("/" + mainTuple[1]) in openFi:
        group1 = openFi.getNode("/" + mainTuple[1])
        group2 = openFi.createGroup(group1, source, source)
        tableD = openFi.createTable(group2, tick, Tick, "Instrument",
                                    Filters(complevel=2, complib='blosc'), 100000)
    else:
        group1 = openFi.createGroup("/", mainTuple[1], mainTuple[1])
        group2 = openFi.createGroup(group1, source, source)
        tableD = openFi.createTable(group2, tick, Tick, "Instrument",
                                    Filters(complevel=2, complib='blosc'), 100000)
    tableD.append(dataArray)
    tableD.flush()

Thanks,
Jacob

--
Jacob Bennett
Massachusetts Institute of Technology
Department of Electrical Engineering and Computer Science
Class of 2014 | ben...@mi...
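Jacob's if/elif cascade can usually collapse to a single call: File.createTable() in the PyTables 2.x API used here takes a createparents flag (visible in the createTable signature in the traceback later on this page) that builds any missing intermediate groups. A sketch, reusing the hypothetical names from the snippet above:

path = "/%s/%s" % (mainTuple[1], source)
if (path + "/" + tick) in openFi:
    tableD = openFi.getNode(path + "/" + tick)
else:
    # createparents=True creates any missing groups along the way.
    tableD = openFi.createTable(path, tick, Tick, "Instrument",
                                filters=Filters(complevel=2, complib='blosc'),
                                expectedrows=100000, createparents=True)
tableD.append(dataArray)
tableD.flush()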
From: Jacob B. <jac...@gm...> - 2012-06-27 15:39:19
Sorry about that; I uploaded the code, but since it requires many dependencies, I was not expecting you to run it. That being said, I would say the expected number of rows per table is 100,000, and I am currently working on an Intel Xeon with 4 processors and 8 threads.

I also found that PyTables has a __contains__ method, which can replace the try-except statements I had before.

Thanks,
Jacob

--
Jacob Bennett
Massachusetts Institute of Technology
Department of Electrical Engineering and Computer Science
Class of 2014 | ben...@mi...
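A quick illustration of the __contains__ membership test Jacob mentions, assuming the PyTables 2.x File API used throughout this thread; names are hypothetical:

import tables

h5f = tables.openFile('contains_demo.h5', mode='w')
h5f.createGroup('/', 'book')

# Test for a node path directly instead of wrapping getNode()
# in a try-except block.
if '/book' in h5f:
    grp = h5f.getNode('/book')
if '/trades' not in h5f:
    h5f.createGroup('/', 'trades')
h5f.close()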
From: Aquil H. A. <aqu...@gm...> - 2012-06-27 13:55:58
Hello Francesc,

Thank you for your response! I guess I need to read the User's Guide cover to cover.

--
Aquil H. Abdullah
From: Francesc A. <fa...@py...> - 2012-06-27 08:43:59
On 6/26/12 11:19 PM, Aquil H. Abdullah wrote:
> So how does PyTables interpret a table with multiple column indices?

If a table has multiple indexes, PyTables will use its internal query optimizer to try to use them in your queries. It is not always possible for PyTables to use all indexes, though. Please see:

http://pytables.github.com/usersguide/optimization.html#indexed-searches

for a series of examples where different indexes can be used.

> The best solution that I've found is creating a hash from the two
> fields that I am interested in indexing and then indexing that table
> on that hash.

In case several indexes cannot be used in your case, that could be an alternate solution for what you are trying to do, yes.

> The other solution would be to shard my data by symbol and then index
> each symbol table by timestamp.

The range of possibilities is really large, yes, but I'd try to avoid sharding because it is normally harder to set up and manage. You are indeed free to try whatever approaches feel best for you.

HTH,

--
Francesc Alted
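A minimal sketch of a two-index query of the kind Francesc describes, using Aquil's table layout and the PyTables 2.x API from this thread; the data values are made up:

import time
import tables

table_desc = {'timestamp': tables.Time32Col(),
              'symbol': tables.StringCol(8),
              'observation': tables.Float32Col()}

h5f = tables.openFile('test.h5', mode='w')
table = h5f.createTable('/', 'test', table_desc, 'Test Table')
table.cols.timestamp.createIndex()
table.cols.symbol.createIndex()

# Append via the row accessor, so field order does not matter.
row = table.row
row['timestamp'] = int(time.time())
row['symbol'] = 'IBM'
row['observation'] = 1.0
row.append()
table.flush()

# The query optimizer can draw on both column indexes here.
rows = table.readWhere('(symbol == "IBM") & (timestamp > 0)')
h5f.close()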
From: Anthony S. <sc...@gm...> - 2012-06-27 03:38:26
Hi Jacob,

On Tue, Jun 26, 2012 at 5:35 PM, Jacob Bennett <jac...@gm...> wrote:
> Hello Anthony,
>
> With the above being said, and with more work put into the initial
> attempt, could you suggest other methods that might improve write
> performance? The optimization page, I feel, mostly talks about data
> retrieval, which will be more important later on, but I have to meet a
> writing bound before then. The only thing that I have done to improve
> performance is to use table.append(rows).

Typically, having larger chunksizes will also increase performance. Additionally, adding compression via filters may increase write speeds. However, the exact strategy you take will depend on the size of the data that you are writing and the number of processors that you have. Out of curiosity, what are your data and nproc sizes?

> My updated code is attached to this email, thanks again!

I tried running this but there was no TimeHandler module.... However, I will note that this doesn't look like the most efficient code, with all of the try-except blocks. I think that hasattr() will work in a lot of these cases for you.

Be Well
Anthony
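A sketch of the two knobs Anthony mentions -- chunk size and compression filters -- at table creation time, using the PyTables 2.x API from this thread; the schema and numbers are hypothetical:

import tables

desc = {'value': tables.Float64Col()}      # hypothetical schema
h5f = tables.openFile('tuned.h5', mode='w')

# Compression filter: zlib level 5 is a common starting point; blosc
# (used elsewhere in this thread) trades compression ratio for speed.
filters = tables.Filters(complevel=5, complib='zlib')

# expectedrows lets PyTables pick a sensible chunkshape automatically;
# it can also be forced explicitly, e.g. chunkshape=(8192,).
tab = h5f.createTable('/', 'data', desc, filters=filters,
                      expectedrows=10**7)
print(tab.chunkshape)
h5f.close()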
From: Anthony S. <sc...@gm...> - 2012-06-26 21:30:43
On Tue, Jun 26, 2012 at 4:19 PM, Aquil H. Abdullah <aqu...@gm...> wrote:
> Can anyone explain what effect two index columns has on PyTables?
> Also, can anyone tell me if they've come up with a better solution for
> dealing with tables that require multiple indices than any that I've
> mentioned?

I don't have a lot of time right now, but maybe create a nested column, or a column with a compound data type that is just a tuple of the two data types you are interested in. Then index against the super column. Storing a hash in another column is probably not the greatest way to do this... Hopefully someone else can jump in and answer this one.
From: Aquil H. A. <aqu...@gm...> - 2012-06-26 21:19:43
Hello All,

In my newbist state, I called createIndex on two columns in one of my tables:

import tables
table_desc = {'timestamp': tables.Time32Col(),
              'symbol': tables.StringCol(8),
              'observation': tables.Float32Col()}
h5f = tables.openFile('test.h5', mode='w')
group = h5f.createGroup('/', 'data')
table = h5f.createTable(group, 'test', table_desc, 'Test Table')
table.cols.timestamp.createIndex()
table.cols.symbol.createIndex()
...

Now, from what I've been able to find on the internet, an index is only associated with one column:

  class tables.Index
  Represents the index of a column in a table.

  This class is used to keep the indexing information for columns in a
  Table dataset (see The Table class). It is actually a descendant of
  the Group class (see The Group class), with some added functionality.
  An Index is always associated with one and only one column in a table.

- PyTables 2.3.1 User's Guide - Library Reference / The Index Class
  http://pytables.github.com/usersguide/libref.html#indexclassdescr
- Efficient way to verify that records are unique in Python/PyTables
  http://stackoverflow.com/questions/1315129/efficient-way-to-verify-that-records-are-unique-in-python-pytables
- Hints For SQL Users (Creating an index)
  http://www.pytables.org/moin/HintsForSQLUsers#Creatinganindex

So how does PyTables interpret a table with multiple column indices? The best solution that I've found is creating a hash from the two fields that I am interested in indexing and then indexing the table on that hash.

The other solution would be to shard my data by symbol and then index each symbol table by timestamp.

Can anyone explain what effect two index columns has on PyTables? Also, can anyone tell me if they've come up with a better solution for dealing with tables that require multiple indices than any that I've mentioned?

Regards,

--
Aquil H. Abdullah
From: Anthony S. <sc...@gm...> - 2012-06-26 05:57:22
Hello Jacob,

This is not surprising. The HDF5 parallel library requires MPI and comes with some special restrictions (no compression on write). As such, the pain of implementing a parallel-write version of PyTables has not been worth it. We certainly welcome pull requests and further discussion on this issue ;). Often it is easier (and faster -- writing is expensive) to do the computation in parallel followed by a single write. Or you could have a dedicated thread which queues and executes write commands as they come in. Just some thoughts on how to avoid this problem.

Parallel reads are supported.

Let me know if you have further questions or really want to dive into this issue.

Be Well
Anthony

On Mon, Jun 25, 2012 at 1:33 PM, Jacob Bennett <jac...@gm...> wrote:
> Hello PyTables Users,
>
> I am very new to PyTables, and if you all could help me out, that
> would be splendid.
>
> I'm currently having trouble with writing to two separate HDF5 files
> using PyTables. Each file itself is only accessible by a single
> thread, so there really shouldn't be any threading issues. When I run
> my Python script, however, it just seems to crash at random time
> intervals without any error messages received or exceptions thrown.
>
> I write data to the HDF5 files as follows. I have two HDF5 files that
> represent book snapshots and trade snapshots. The data of these
> snapshots come in the form of Python dictionaries whose values are the
> data itself in an array. Two threads run on each file. One thread
> controls when to create new files and close others based upon the time
> of day, while the other thread iterates over each key-value pair in
> the dictionary and loads data into the file. When a thread has access
> to the file, the file is locked.
>
> I have my two dataWrappers attached to the email. Please take a look
> at them. One thread runs acceptDict in a loop while the other runs
> changeFile in a loop. This is really frustrating when I don't get any
> errors and Python just crashes unexpectedly.
>
> Thanks,
> Jacob
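A minimal sketch of the dedicated-writer-thread pattern Anthony suggests, with all names hypothetical; a single thread owns the file and drains a queue of (node path, rows) write commands, matching the Python 2 era of this thread:

import threading
import Queue            # Python 2 stdlib
import tables

write_q = Queue.Queue()

def writer(h5f):
    """Single thread that owns the file; everyone else just enqueues."""
    while True:
        item = write_q.get()
        if item is None:            # sentinel: shut down
            break
        path, rows = item
        tab = h5f.getNode(path)
        tab.append(rows)
        tab.flush()
        write_q.task_done()

h5f = tables.openFile('serial_writes.h5', mode='w')
h5f.createTable('/', 'book', {'price': tables.Float64Col()})

t = threading.Thread(target=writer, args=(h5f,))
t.start()

# Any number of producer threads can safely do this:
write_q.put(('/book', [(101.25,)]))

write_q.put(None)       # stop the writer
t.join()
h5f.close()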
From: Antonio V. <ant...@ti...> - 2012-06-25 17:47:08
Hi Alvaro,

thank you for reporting. I filed an issue on GitHub to track the problem:

https://github.com/PyTables/PyTables/issues/160

ciao

--
Antonio Valentino
From: Mythsmith <sp...@mo...> - 2012-06-25 15:29:39
|
Done: https://github.com/PyTables/PyTables/issues/159

I did not understand how to attach a file to a GitHub issue. Anyway, I
posted it at a pastebin address and attached the unittest here.

Regards,
Daniele

On 25/06/2012 09:35, Antonio Valentino wrote:
> Ciao Daniele,
>
> On Mon, 25 Jun 2012 09:17:00 +0200, Mythsmith <sp...@mo...> wrote:
>
>> Hi Anthony,
>> Shouldn't the close() method also clear the cache? I think a file
>> should be either opened or closed... Should I file a bug report?
>> Best regards,
>> Daniele
>
> The close method also removes the file from the cache if there are no
> more references to it:
>
> https://github.com/PyTables/PyTables/blob/6fccb7495ba1bc758c7b04960fe1cd392abe9b96/tables/file.py#L2098
>
> Anyway yes, if you have some problem with the file caching system,
> please file a bug report on GitHub.
>
> Of course test scripts or patches are very welcome.
>
> ciao
>
> [earlier messages in the thread quoted below]
From: Alvaro T. C. <al...@mi...> - 2012-06-25 10:03:38
|
Hi,

In view of the upcoming release I thought I'd report this because at the
time I cannot fix it myself.

I am using a structured array with a dtype specified with the following
numpy-accepted format (quotation follows from
http://docs.scipy.org/doc/numpy/reference/arrays.dtypes.html):

    [(field_name, field_dtype, field_shape), ...]

    obj should be a list of fields where each field is described by a
    tuple of length 2 or 3. (Equivalent to the descr item in the
    __array_interface__ attribute.)

    The first element, field_name, is the field name (if this is '' then
    a standard field name, 'f#', is assigned). The field name may also be
    a 2-tuple of strings where the first string is either a “title”
    (which may be any string or unicode string) or meta-data for the
    field which can be any object, and the second string is the “name”
    which must be a valid Python identifier.

This is my concrete example:

    header = [(('timestamp', 't'), 'u4'),
              (('unit (cluster) id', 'unit'), 'u2')]

This is what PyTables says upon passing either the structured array or
np.dtype(header) to the createTable function:

    > test.createTable('/', 'spike', s, 'test')
    ---------------------------------------------------------------------------
    ValueError                                Traceback (most recent call last)
    /home/tejero/Dropbox/O/ridge/doc/<ipython-input-40-5fdbd9feb41d> in <module>()
    ----> 1 test.createTable('/', 'spike', s, 'test')

    /home/tejero/Local/Envs/test/lib/python2.7/site-packages/tables/file.pyc
    in createTable(self, where, name, description, title, filters,
    expectedrows, chunkshape, byteorder, createparents)
        768             description=description, title=title,
        769             filters=filters, expectedrows=expectedrows,
    --> 770             chunkshape=chunkshape, byteorder=byteorder)
        771
        772

    /home/tejero/Local/Envs/test/lib/python2.7/site-packages/tables/table.pyc
    in __init__(self, parentNode, name, description, title, filters,
    expectedrows, chunkshape, byteorder, _log)
        805             self._v_recarray = nparray
        806             self.description, self._rabyteorder = \
    --> 807                 descr_from_dtype(nparray.dtype)
        808
        809         # No description yet?

    /home/tejero/Local/Envs/test/lib/python2.7/site-packages/tables/description.pyc
    in descr_from_dtype(dtype_)
        723     fields = {}
        724     fbyteorder = '|'
    --> 725     for (name, (dtype, pos)) in dtype_.fields.items():
        726         kind = dtype.base.kind
        727         byteorder = dtype.base.byteorder

    ValueError: too many values to unpack

-á.
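A workaround until this is fixed (a sketch, assuming s is the structured array from the report above) is to strip the titles from the dtype before handing the array to createTable, since the titles are what produce the 3-tuples:

    import numpy as np

    header = [(('timestamp', 't'), 'u4'),
              (('unit (cluster) id', 'unit'), 'u2')]
    titled = np.dtype(header)

    # dtype.names never includes titles, so rebuilding from it yields a
    # plain dtype with the same field names, types, and layout:
    plain = np.dtype([(name, titled.fields[name][0])
                      for name in titled.names])

    # View (not copy) the existing structured array with the stripped
    # dtype, then pass that to createTable:
    # s_plain = s.view(plain)
    # test.createTable('/', 'spike', s_plain, 'test')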
From: Antonio V. <ant...@ti...> - 2012-06-25 07:53:23
|
Ciao Daniele,

On Mon, 25 Jun 2012 09:17:00 +0200, Mythsmith <sp...@mo...> wrote:

> Hi Anthony,
> Shouldn't the close() method also clear the cache? I think a file
> should be either opened or closed... Should I file a bug report?
> Best regards,
> Daniele

The close method also removes the file from the cache if there are no
more references to it:

https://github.com/PyTables/PyTables/blob/6fccb7495ba1bc758c7b04960fe1cd392abe9b96/tables/file.py#L2098

Anyway yes, if you have some problem with the file caching system,
please file a bug report on GitHub. Of course test scripts or patches
are very welcome.

ciao

> [Anthony's workaround quoted below in the original message]

--
Antonio Valentino
From: Mythsmith <sp...@mo...> - 2012-06-25 07:17:15
|
Hi Anthony,

Shouldn't the close() method also clear the cache? I think a file should
be either opened or closed... Should I file a bug report?

Best regards,
Daniele

On 21/06/2012 19:23, Anthony Scopatz wrote:
> Hi Daniele,
>
> This is probably because of the way PyTables caches its file objects.
> As a temporary workaround, why don't you try clearing the cache or at
> least removing this file from it. The cache is just a dictionary and
> it is located at "tables.file._open_files", i.e. try:
>
>     tables.file._open_files.clear()
>     # -or-
>     tables.file._open_files.pop("touch.h5")
>
> Be Well
> Anthony
>
> [original report quoted below]
From: Antonio V. <ant...@ti...> - 2012-06-23 18:12:18
|
Ciao Daniele,

On 21/06/2012 17:43, Mythsmith wrote:
> Hi,
> I noticed that if I open an erroneous file (e.g. empty), then it seems
> not possible to completely close it and reopen the same path, even if a
> valid file was created in the meanwhile.
> The error is:
> ValueError: The file 'touch.h5' is already opened. Please close it
> before reopening in write mode.
>
> You find a complete example attached.
>
> Regards,
> daniele

Thank you for reporting. The issue has already been fixed in the
development branch, and it should be available in PyTables 2.4. I filed
a ticket on GitHub (https://github.com/PyTables/PyTables/issues/158) to
track the issue.

ciao

--
Antonio Valentino
From: Josh M. <jos...@gm...> - 2012-06-22 14:47:09
|
On Jun 19, 2012, at 4:37 AM, David Donovan wrote:

> Hi Anthony,
>
> Thanks for the response. I installed HDF5 1.8.9 using the following
> flags for configure:
>
>     ./configure --prefix=/usr/local \
>         --with-szlib=/Library/Frameworks/Python.framework/Versions/Current \
>         CPPFLAGS=-I/Library/Frameworks/Python.framework/Versions/Current/include \
>         LDFLAGS=-L/Library/Frameworks/Python.framework/Versions/Current/lib
>
> Also, I had to modify the optimization flag for gcc-4 in order to pass
> the make check part, as noted on the HDF5 page:
>
>     Conversion tests fail on Mac OS X 10.7 (Lion). Users have reported
>     that when building HDF5, the conversion tests failed (make check)
>     in dt_arith.chk. A workaround is to edit
>     <HDF5 source>/config/gnu-flags, search for PROD_CFLAGS under
>     "gcc-4.*", and change the value of PROD_CFLAGS to "-O0".
>
> Then:
>
>     make
>     make check
>     sudo make install
>
> Is there a better way? Is tables somehow having a hard time finding
> the HDF5 library, do you think?

Hi David,

I always use homebrew for any of my prerequisites:
http://mxcl.github.com/homebrew/

Cheers
~Josh

> Thanks!
>
> Best Regards,
> David
>
> On Sat, Jun 16, 2012 at 12:57 AM, Anthony Scopatz <sc...@gm...> wrote:
>> Hi David,
>>
>> How did you build / install HDF5?
>>
>> Be Well
>> Anthony
>>
>> On Fri, Jun 15, 2012 at 7:14 PM, David Donovan <don...@gm...> wrote:
>>>
>>> Hi Everyone,
>>>
>>> I am having problems running the tests for PyTables on Mac OS X Lion.
>>> I have tried HDF5 version 1.8.5 as well, but I still get the same issue.
>>>
>>> Any thoughts would be helpful... Thanks for any help you can provide.
>>>
>>> Best Regards,
>>> David Donovan
>>>
>>> Python 2.7.1 (r271:86832, Jul 31 2011, 19:30:53)
>>> [GCC 4.2.1 (Based on Apple Inc. build 5658) (LLVM build 2335.15.00)] on darwin
>>> Type "help", "copyright", "credits" or "license" for more information.
>>> >>> import tables
>>> >>> tables.test()
>>> -=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=
>>> PyTables version:  2.3.1
>>> HDF5 version:      1.8.9
>>> NumPy version:     1.7.0.dev-0c5f480
>>> Numexpr version:   2.0.1 (not using Intel's VML/MKL)
>>> Zlib version:      1.2.5 (in Python interpreter)
>>> LZO version:       2.06 (Aug 12 2011)
>>> BZIP2 version:     1.0.6 (6-Sept-2010)
>>> Blosc version:     1.1.2 (2010-11-04)
>>> Cython version:    0.16
>>> Python version:    2.7.1 (r271:86832, Jul 31 2011, 19:30:53)
>>> [GCC 4.2.1 (Based on Apple Inc. build 5658) (LLVM build 2335.15.00)]
>>> Platform:          darwin-i386
>>> Byte-ordering:     little
>>> Detected cores:    2
>>> -=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=
>>> Performing only a light (yet comprehensive) subset of the test suite.
>>> If you want a more complete test, try passing the --heavy flag to this
>>> script (or set the 'heavy' parameter in case you are using the
>>> tables.test() call). The whole suite will take more than 4 hours to
>>> complete on a relatively modern CPU and around 512 MB of main memory.
>>> -=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=
>>>
>>> .....................................................................................................................................F
>>> Segmentation fault: 11
From: Anthony S. <sc...@gm...> - 2012-06-21 17:23:57
|
Hi Daniele,

This is probably because of the way PyTables caches its file objects. As
a temporary workaround, why don't you try clearing the cache, or at least
removing this file from it. The cache is just a dictionary and it is
located at "tables.file._open_files", i.e. try:

    tables.file._open_files.clear()
    # -or-
    tables.file._open_files.pop("touch.h5")

Be Well
Anthony

On Thu, Jun 21, 2012 at 10:43 AM, Mythsmith <sp...@mo...> wrote:

> Hi,
> I noticed that if I open an erroneous file (e.g. empty), then it seems
> not possible to completely close it and reopen the same path, even if a
> valid file was created in the meanwhile.
> The error is:
> ValueError: The file 'touch.h5' is already opened. Please close it
> before reopening in write mode.
>
> You find a complete example attached.
>
> Regards,
> daniele
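Putting the workaround together (a sketch against the PyTables 2.3 internals, where tables.file._open_files is a plain dict keyed by filename; it is private API and may change between releases):

    import tables

    try:
        h5f = tables.openFile("touch.h5", mode="w")
    except ValueError:
        # Evict the stale handle left behind by the earlier failed
        # open, close it if possible, then retry the open.
        stale = tables.file._open_files.pop("touch.h5", None)
        if stale is not None:
            try:
                stale.close()
            except Exception:
                pass
        h5f = tables.openFile("touch.h5", mode="w")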
From: Mythsmith <sp...@mo...> - 2012-06-21 15:44:08
|
Hi,

I noticed that if I open an erroneous file (e.g. empty), then it seems not
possible to completely close it and reopen the same path, even if a valid
file was created in the meanwhile. The error is:

    ValueError: The file 'touch.h5' is already opened. Please close it
    before reopening in write mode.

You find a complete example attached.

Regards,
daniele
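For anyone without the attachment, the reported failure mode can be sketched like this (file name as in the report; the cache diagnosis comes from the replies above):

    import tables

    # Create a zero-byte file, which is not valid HDF5.
    open("touch.h5", "w").close()

    try:
        tables.openFile("touch.h5", mode="r")  # fails: not an HDF5 file
    except Exception as e:
        print "open failed as expected:", e

    # The failed open can leave a stale entry in the file cache, so even
    # a fresh write-mode open of the same path then raises:
    #     ValueError: The file 'touch.h5' is already opened. Please
    #     close it before reopening in write mode.
    tables.openFile("touch.h5", mode="w")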