From: Jacob B. <jac...@gm...> - 2012-07-16 20:30:54

Wait, is there perhaps a way to simultaneously read and write without any kind of blocking? Perhaps the "a" mode or the "r+" mode might help for simultaneous read/write? I am currently implementing the multiprocessing.Queue approach, but I think that a large number of query requests might put an unnecessary load on my writing queue, since the data comes in so fast. ;)

Btw, I will submit the example soon.

-Jacob

On Sat, Jul 14, 2012 at 1:39 PM, Anthony Scopatz <sc...@gm...> wrote:
> +1 to example of this!
>
> On Sat, Jul 14, 2012 at 1:36 PM, Jacob Bennett <jac...@gm...> wrote:
>> Awesome, I think this sounds like a very workable solution and the idea
>> is very neat. I will try to implement this right away. I definitely agree
>> to putting a small example.
>>
>> Let you know how this works, thanks guys!
>>
>> Thanks,
>> Jacob
>>
>> On Sat, Jul 14, 2012 at 2:36 AM, Antonio Valentino <ant...@ti...> wrote:
>>> Hi all,
>>> On 14/07/2012 00:44, Josh Ayers wrote:
>>> > My first instinct would be to handle all access (read and write) to
>>> > that file from a single process. You could create two
>>> > multiprocessing.Queue objects, one for data to write and one for read
>>> > requests. Then the process would check the queues in a loop and
>>> > handle each request serially. The data read from the file could be
>>> > sent back to the originating process using another queue or pipe. You
>>> > should be able to do the same thing with sockets if the other parts of
>>> > your application are in languages other than Python.
>>> >
>>> > I do something similar to handle writing to a log file from multiple
>>> > processes and it works well. In that case the file is write-only -
>>> > and just a simple text file rather than HDF5 - but I don't see any
>>> > reason why it wouldn't work for read and write as well.
>>> >
>>> > Hope that helps,
>>> > Josh
>>>
>>> I totally agree with Josh.
>>>
>>> I don't have test code to demonstrate it, but IMHO parallelizing I/O
>>> to/from a single file on a single disk does not make much sense
>>> unless you have special HW. Is this your case, Jacob?
>>>
>>> IMHO with standard SATA devices you could get a marginal speedup (in
>>> the best case), but if your bottleneck is the I/O this will not solve
>>> your problem.
>>>
>>> If someone finds the time to implement a toy example of what Josh
>>> suggested, we could put it in the cookbook :)
>>>
>>> regards
>>>
>>> > On Fri, Jul 13, 2012 at 12:18 PM, Anthony Scopatz <sc...@gm...> wrote:
>>> >> On Fri, Jul 13, 2012 at 2:09 PM, Jacob Bennett <jac...@gm...> wrote:
>>> >>
>>> >> [snip]
>>> >>
>>> >>> My first implementation was to have a set of current files stay in write
>>> >>> mode and have an overall lock over these files for the current day, but
>>> >>> (stupidly) I forgot that lock instances cannot be shared over separate
>>> >>> processes, only threads.
>>> >>>
>>> >>> So could you give me any advice in this situation? I'm sure it has come up
>>> >>> before. ;)
>>> >>
>>> >> Hello All, I previously suggested to Jacob a setup where only one proc
>>> >> would have a write handle and all of the other processes would be in
>>> >> read-only mode. I am not sure that this would work.
>>> >>
>>> >> Francesc, Antonio, Josh, etc., or anyone else, how would you solve this
>>> >> problem where you may want many processors to query the file, while
>>> >> something else may be writing to it? I defer to people with more
>>> >> experience... Thanks for your help!
>>> >>
>>> >> Be Well
>>> >> Anthony
>>> >>
>>> >>> Thanks,
>>> >>> Jacob Bennett
>>>
>>> --
>>> Antonio Valentino
>>>
>>> _______________________________________________
>>> Pytables-users mailing list
>>> Pyt...@li...
>>> https://lists.sourceforge.net/lists/listinfo/pytables-users

--
Jacob Bennett
Massachusetts Institute of Technology
Department of Electrical Engineering and Computer Science
Class of 2014 | ben...@mi...
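The single-writer pattern Josh describes above can be sketched in pure Python. This is a toy analogue only: a dict stands in for the HDF5 file, and `queue.Queue` plus a thread stand in for `multiprocessing.Queue` plus a serializer process (the structure is the same; in the real setup the serializer would hold the application's only open PyTables handle):

```python
import queue
import threading

def serializer(write_q, read_q):
    """Single owner of the 'file' (a dict here); serves all I/O serially."""
    store = {}
    while True:
        key, reply = read_q.get()      # (key, reply_queue), or (None, None) to stop
        if key is None:
            return
        # Apply every write queued before this read so the reader sees fresh data.
        try:
            while True:
                k, v = write_q.get_nowait()
                store[k] = v
        except queue.Empty:
            pass
        reply.put(store.get(key))

write_q, read_q = queue.Queue(), queue.Queue()
worker = threading.Thread(target=serializer, args=(write_q, read_q))
worker.start()

write_q.put(("tick", 42))              # a "write request"
reply = queue.Queue()
read_q.put(("tick", reply))            # a "read request" with its reply channel
result = reply.get()
print(result)                          # -> 42

read_q.put((None, None))               # shutdown sentinel
worker.join()
```

With `multiprocessing.Queue` the calling code is identical; the serializer just becomes a `multiprocessing.Process`, which sidesteps the "locks cannot be shared across processes" problem entirely because only one process ever touches the file.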
From: Antonio V. <ant...@ti...> - 2012-07-16 20:18:14

===========================
Announcing PyTables 2.4.0rc1
===========================

We are happy to announce PyTables 2.4.0rc1. This is an incremental release which includes many changes to prepare for future Python 3 support.

What's new
==========

This release includes support for the float16 data type and read-only support for variable length string attributes.

The handling of HDF5 errors has been improved. The user will no longer see HDF5 error stacks dumped to the console. All HDF5 error messages are trapped and attached to a proper Python exception.

PyTables now only supports HDF5 v1.8.4+. All the code has been updated to the new HDF5 API. Supporting only the HDF5 1.8 series is beneficial for future development.

Documentation has been improved. As always, a large number of bugs have been addressed and squashed as well.

In case you want to know in more detail what has changed in this version, please refer to: http://pytables.github.com/release_notes.html

You can download a source package with generated PDF and HTML docs, as well as binaries for Windows, from: http://sourceforge.net/projects/pytables/files/pytables/2.4.0rc1

For an online version of the manual, visit: http://pytables.github.com/usersguide/index.html

What it is?
===========

PyTables is a library for managing hierarchical datasets, designed to efficiently cope with extremely large amounts of data, with support for full 64-bit file addressing. PyTables runs on top of the HDF5 library and the NumPy package to achieve maximum throughput and convenient use. PyTables includes OPSI, a new indexing technology that allows data lookups in tables exceeding 10 gigarows (10**10 rows) in less than a tenth of a second.

Resources
=========

About PyTables: http://www.pytables.org
About the HDF5 library: http://hdfgroup.org/HDF5/
About NumPy: http://numpy.scipy.org/

Acknowledgments
===============

Thanks to the many users who provided feature improvements, patches, bug reports, support and suggestions. See the ``THANKS`` file in the distribution package for an (incomplete) list of contributors. Most specially, a lot of kudos go to the HDF5 and NumPy (and numarray!) makers. Without them, PyTables simply would not exist.

Share your experience
=====================

Let us know of any bugs, suggestions, gripes, kudos, etc. you may have.

----

**Enjoy data!**

-- The PyTables Team
From: Anthony S. <sc...@gm...> - 2012-07-15 21:50:19

Ahh I see, tricky. So I think what is killing you is that you are pulling each row of the table individually over the network. Ideally you should be able to do something like the following:

    f.root.table.cols.my_col[:, n, :]

using numpy-esque multidimensional slicing. However, this fails when I just tested it. So instead, I would just pull over the full column and slice using numpy in memory:

    my_col = f.root.table.cols.my_col[:]
    my_selection = my_col[:, n, :]

We should open a ticket so that the top method works (though I think there might already be one). I hope this helps!

On Sun, Jul 15, 2012 at 4:27 PM, Juan Manuel Vázquez Tovar <jmv...@gm...> wrote:
> The column I'm requesting the data from has multidimensional cells, so
> each time I request data from the table, I need to get a specific row from
> all the multidimensional cells in the column. I hope this clarifies a bit.
> I have a Linux workstation at the office, but it is part of a computing
> cluster where all the users have access, so the files are in a folder on
> the cluster, not on my hard drive.
>
> Thank you,
> Juanma
>
> [snip]
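Anthony's "pull the column, then slice in memory" fix can be illustrated without PyTables or NumPy installed. The nested lists below are a tiny hypothetical stand-in for the (rows, 133, 6) column; the commented lines show the real PyTables/NumPy equivalent from the message above:

```python
# Real PyTables/NumPy equivalent of the advice above (one bulk read, then
# in-memory slicing), shown as comments since no HDF5 file is open here:
#     my_col = f.root.table.cols.my_col[:]
#     my_selection = my_col[:, n, :]

# Tiny stand-in: 3 table rows, each cell a 4x2 array (vs. ~3e6 rows of 133x6).
rows, cell_rows, width = 3, 4, 2
col = [[[r * 100 + c * 10 + w for w in range(width)]
        for c in range(cell_rows)]
       for r in range(rows)]

n = 1  # the sub-row wanted from every cell
# One pass over in-memory data instead of one network round-trip per table row.
selection = [cell[n] for cell in col]
print(selection)  # -> [[10, 11], [110, 111], [210, 211]]
```

The point of the design is that the file (and the network) is touched once, for a single contiguous column read; all per-row extraction then happens at memory speed.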
From: Juan M. V. T. <jmv...@gm...> - 2012-07-15 21:27:58

The column I'm requesting the data from has multidimensional cells, so each time I request data from the table, I need to get a specific row from all the multidimensional cells in the column. I hope this clarifies a bit.

I have a Linux workstation at the office, but it is part of a computing cluster where all the users have access, so the files are in a folder on the cluster, not on my hard drive.

Thank you,
Juanma

2012/7/15 Anthony Scopatz <sc...@gm...>
> Rereading the original post, I am a little confused: are you trying to
> read the whole table, just a couple of rows that meet some condition,
> just one whole column, or one part of the column?
>
> [snip]
From: Anthony S. <sc...@gm...> - 2012-07-15 21:15:56

Rereading the original post, I am a little confused: are you trying to read the whole table, just a couple of rows that meet some condition, just one whole column, or one part of the column?

To request the whole table without looping over each row in Python, index every element:

    f.root.table[:]

To just get certain rows, use where().

To get a single column, use the cols namespace:

    f.root.table.cols.my_column[:]

Why is this file elsewhere on the network?

Be Well
Anthony

On Sun, Jul 15, 2012 at 4:08 PM, Juan Manuel Vázquez Tovar <jmv...@gm...> wrote:
> Hello Anthony,
>
> I have to loop over the whole set of rows. Does the where() method have
> any advantages in that case?
>
> [snip]
From: Juan M. V. T. <jmv...@gm...> - 2012-07-15 21:08:36

Hello Anthony,

I have to loop over the whole set of rows. Does the where() method have any advantages in that case?

Thank you,
Juanma

2012/7/15 Anthony Scopatz <sc...@gm...>
> Hello Juan,
>
> Try using the where() method [1]. It has a lot of nice features under
> the covers.
>
> Be Well
> Anthony
>
> 1. http://pytables.github.com/usersguide/libref.html?highlight=where#tables.Table.where
>
> [snip]
From: Anthony S. <sc...@gm...> - 2012-07-15 21:04:58

Hello Juan,

Try using the where() method [1]. It has a lot of nice features under the covers.

Be Well
Anthony

1. http://pytables.github.com/usersguide/libref.html?highlight=where#tables.Table.where

On Sun, Jul 15, 2012 at 4:01 PM, Juan Manuel Vázquez Tovar <jmv...@gm...> wrote:
> Hello,
>
> I have been using PyTables for a few months. The main structure of my
> files has a four-column table, two of which have multidimensional cells,
> (56,1) and (133,6) respectively.
>
> [snip]
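As a rough illustration of what where() buys you, here is a plain-Python analogue. The data is hypothetical, and a Python predicate stands in for the condition string that the real `Table.where()` compiles with numexpr and evaluates chunk-wise inside the kernel:

```python
# Hypothetical rows standing in for a PyTables table.
table = [{"pressure": p, "temp": 20 + p} for p in range(5, 15)]

def where(rows, cond):
    # Toy analogue of Table.where('pressure > 10'): yield matching rows
    # lazily instead of materializing the whole table first.
    for row in rows:
        if cond(row):
            yield row

hot = [r["temp"] for r in where(table, lambda r: r["pressure"] > 10)]
print(hot)  # -> [31, 32, 33, 34]
```

The win in real PyTables is that the condition is evaluated close to the data, so only matching rows are turned into Python objects; a hand-written loop over every row pays Python-level overhead for all three million rows either way, which is why it helps less when you truly need every row.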
From: Juan M. V. T. <jmv...@gm...> - 2012-07-15 21:01:32

Hello,

I have been using PyTables for a few months. The main structure of my files has a four-column table, two of which have multidimensional cells, (56,1) and (133,6) respectively. The previous structure had more columns instead of storing the 56x1 array in a single cell. The largest file has almost three million rows in the table.

I usually request data from the table by looping through the entire table and getting, for each row, one specific row of the 133x6 2d array. Currently, each of the requests can take from 15 sec up to 10 minutes, I believe depending on the status of the office network.

Could you please advise on how to improve the reading time? I have tried to compress the data with zlib, but it takes more or less the same time.

Thanks in advance,

Juan Manuel
From: Anthony S. <sc...@gm...> - 2012-07-14 18:39:31

+1 to example of this!

On Sat, Jul 14, 2012 at 1:36 PM, Jacob Bennett <jac...@gm...> wrote:
> Awesome, I think this sounds like a very workable solution and the idea is
> very neat. I will try to implement this right away. I definitely agree to
> putting a small example.
>
> [snip]
From: Jacob B. <jac...@gm...> - 2012-07-14 18:37:04
|
Awesome, I think this sounds like a very workable solution and the idea is very neat. I will try to implement this right away. I definitely agree with putting up a small example. I'll let you know how this works. Thanks, guys! Thanks, Jacob On Sat, Jul 14, 2012 at 2:36 AM, Antonio Valentino < ant...@ti...> wrote: > Hi all, > Il 14/07/2012 00:44, Josh Ayers ha scritto: > > My first instinct would be to handle all access (read and write) to > > that file from a single process. You could create two > > multiprocessing.Queue objects, one for data to write and one for read > > requests. Then the process would check the queues in a loop and > > handle each request serially. The data read from the file could be > > sent back to the originating process using another queue or pipe. You > > should be able to do the same thing with sockets if the other parts of > > your application are in languages other than Python. > > > > I do something similar to handle writing to a log file from multiple > > processes and it works well. In that case the file is write-only - > > and just a simple text file rather than HDF5 - but I don't see any > > reason why it wouldn't work for read and write as well. > > > > Hope that helps, > > Josh > > > > I totally agree with Josh. > > I don't have a test code to demonstrate it but IMHO parallelizing I/O > to/from a single file on a single disk do not makes too much sense > unless you have special HW. Is this your case Jacob? > > IMHO with standard SATA devices you could have a marginal speedup (in > the best case), but if your bottleneck is the I/O this will not solve > your problem. 
> > If someone finds the time to implement a toy example of what Josh > suggested we could put it on the cookbook :) > > > regards > > > On Fri, Jul 13, 2012 at 12:18 PM, Anthony Scopatz <sc...@gm...> > wrote: > >> On Fri, Jul 13, 2012 at 2:09 PM, Jacob Bennett < > jac...@gm...> > >> wrote: > >> > >> [snip] > >> > >>> > >>> My first implementation was to have a set of current files stay in > write > >>> mode and have an overall lock over these files for the current day, but > >>> (stupidly) I forgot that lock instances cannot be shared over separate > >>> processes, only threads. > >>> > >>> So could you give me any advice in this situation? I'm sure it has > come up > >>> before. ;) > >> > >> > >> Hello All, I previously suggested to Jacob a setup where only one proc > would > >> have a write handle and all of the other processes would be in read-only > >> mode. I am not sure that this would work. > >> > >> Francesc, Antonio, Josh, etc or anyone else, how would you solve this > >> problem where you may want many processors to query the file, while > >> something else may be writing to it? I defer to people with more > >> experience... Thanks for your help! > >> > >> Be Well > >> Anthony > >> > >>> > >>> Thanks, > >>> Jacob Bennett > >>> > > > -- > Antonio Valentino > > > > > ------------------------------------------------------------------------------ > Live Security Virtual Conference > Exclusive live event will cover all the ways today's security and > threat landscape has changed and how IT managers can respond. Discussions > will include endpoint security, mobile security and the latest in malware > threats. http://www.accelacomm.com/jaw/sfrnl04242012/114/50122263/ > _______________________________________________ > Pytables-users mailing list > Pyt...@li... > https://lists.sourceforge.net/lists/listinfo/pytables-users > -- Jacob Bennett Massachusetts Institute of Technology Department of Electrical Engineering and Computer Science Class of 2014| ben...@mi... 
|
From: Antonio V. <ant...@ti...> - 2012-07-14 07:36:29
|
Hi all, Il 14/07/2012 00:44, Josh Ayers ha scritto: > My first instinct would be to handle all access (read and write) to > that file from a single process. You could create two > multiprocessing.Queue objects, one for data to write and one for read > requests. Then the process would check the queues in a loop and > handle each request serially. The data read from the file could be > sent back to the originating process using another queue or pipe. You > should be able to do the same thing with sockets if the other parts of > your application are in languages other than Python. > > I do something similar to handle writing to a log file from multiple > processes and it works well. In that case the file is write-only - > and just a simple text file rather than HDF5 - but I don't see any > reason why it wouldn't work for read and write as well. > > Hope that helps, > Josh > I totally agree with Josh. I don't have test code to demonstrate it, but IMHO parallelizing I/O to/from a single file on a single disk does not make much sense unless you have special HW. Is this your case, Jacob? IMHO with standard SATA devices you could get a marginal speedup (in the best case), but if your bottleneck is the I/O this will not solve your problem. If someone finds the time to implement a toy example of what Josh suggested we could put it on the cookbook :) regards > On Fri, Jul 13, 2012 at 12:18 PM, Anthony Scopatz <sc...@gm...> wrote: >> On Fri, Jul 13, 2012 at 2:09 PM, Jacob Bennett <jac...@gm...> >> wrote: >> >> [snip] >> >>> >>> My first implementation was to have a set of current files stay in write >>> mode and have an overall lock over these files for the current day, but >>> (stupidly) I forgot that lock instances cannot be shared over separate >>> processes, only threads. >>> >>> So could you give me any advice in this situation? I'm sure it has come up >>> before. 
;) >> >> >> Hello All, I previously suggested to Jacob a setup where only one proc would >> have a write handle and all of the other processes would be in read-only >> mode. I am not sure that this would work. >> >> Francesc, Antonio, Josh, etc or anyone else, how would you solve this >> problem where you may want many processors to query the file, while >> something else may be writing to it? I defer to people with more >> experience... Thanks for your help! >> >> Be Well >> Anthony >> >>> >>> Thanks, >>> Jacob Bennett >>> -- Antonio Valentino |
From: Josh A. <jos...@gm...> - 2012-07-13 22:44:29
|
My first instinct would be to handle all access (read and write) to that file from a single process. You could create two multiprocessing.Queue objects, one for data to write and one for read requests. Then the process would check the queues in a loop and handle each request serially. The data read from the file could be sent back to the originating process using another queue or pipe. You should be able to do the same thing with sockets if the other parts of your application are in languages other than Python. I do something similar to handle writing to a log file from multiple processes and it works well. In that case the file is write-only - and just a simple text file rather than HDF5 - but I don't see any reason why it wouldn't work for read and write as well. Hope that helps, Josh On Fri, Jul 13, 2012 at 12:18 PM, Anthony Scopatz <sc...@gm...> wrote: > On Fri, Jul 13, 2012 at 2:09 PM, Jacob Bennett <jac...@gm...> > wrote: > > [snip] > >> >> My first implementation was to have a set of current files stay in write >> mode and have an overall lock over these files for the current day, but >> (stupidly) I forgot that lock instances cannot be shared over separate >> processes, only threads. >> >> So could you give me any advice in this situation? I'm sure it has come up >> before. ;) > > > Hello All, I previously suggested to Jacob a setup where only one proc would > have a write handle and all of the other processes would be in read-only > mode. I am not sure that this would work. > > Francesc, Antonio, Josh, etc or anyone else, how would you solve this > problem where you may want many processors to query the file, while > something else may be writing to it? I defer to people with more > experience... Thanks for your help! > > Be Well > Anthony > >> >> Thanks, >> Jacob Bennett >> >> -- >> Jacob Bennett >> Massachusetts Institute of Technology >> Department of Electrical Engineering and Computer Science >> Class of 2014| ben...@mi... 
>> >> >> >> ------------------------------------------------------------------------------ >> Live Security Virtual Conference >> Exclusive live event will cover all the ways today's security and >> threat landscape has changed and how IT managers can respond. Discussions >> will include endpoint security, mobile security and the latest in malware >> threats. http://www.accelacomm.com/jaw/sfrnl04242012/114/50122263/ >> _______________________________________________ >> Pytables-users mailing list >> Pyt...@li... >> https://lists.sourceforge.net/lists/listinfo/pytables-users >> > > > ------------------------------------------------------------------------------ > Live Security Virtual Conference > Exclusive live event will cover all the ways today's security and > threat landscape has changed and how IT managers can respond. Discussions > will include endpoint security, mobile security and the latest in malware > threats. http://www.accelacomm.com/jaw/sfrnl04242012/114/50122263/ > _______________________________________________ > Pytables-users mailing list > Pyt...@li... > https://lists.sourceforge.net/lists/listinfo/pytables-users > |
From: Anthony S. <sc...@gm...> - 2012-07-13 19:18:33
|
On Fri, Jul 13, 2012 at 2:09 PM, Jacob Bennett <jac...@gm...>wrote: [snip] > My first implementation was to have a set of current files stay in write > mode and have an overall lock over these files for the current day, but > (stupidly) I forgot that lock instances cannot be shared over separate > processes, only threads. > > So could you give me any advice in this situation? I'm sure it has come up > before. ;) > Hello All, I previously suggested to Jacob a setup where only one proc would have a write handle and all of the other processes would be in read-only mode. I am not sure that this would work. Francesc, Antonio, Josh, etc or anyone else, how would you solve this problem where you may want many processors to query the file, while something else may be writing to it? I defer to people with more experience... Thanks for your help! Be Well Anthony > Thanks, > Jacob Bennett > > -- > Jacob Bennett > Massachusetts Institute of Technology > Department of Electrical Engineering and Computer Science > Class of 2014| ben...@mi... > > > > ------------------------------------------------------------------------------ > Live Security Virtual Conference > Exclusive live event will cover all the ways today's security and > threat landscape has changed and how IT managers can respond. Discussions > will include endpoint security, mobile security and the latest in malware > threats. http://www.accelacomm.com/jaw/sfrnl04242012/114/50122263/ > _______________________________________________ > Pytables-users mailing list > Pyt...@li... > https://lists.sourceforge.net/lists/listinfo/pytables-users > > |
From: Jacob B. <jac...@gm...> - 2012-07-13 19:09:14
|
Hello PyTables Discussion, Could you perhaps give me the best advice on read/write from a single file in PyTables? I currently have a parsing system that accepts a stream of data and consistently writes the data to pytables; however, I also have an independent server that accepts requests for data, and those queries might target the current day (so data in the same file). My first implementation was to have a set of current files stay in write mode and have an overall lock over these files for the current day, but (stupidly) I forgot that lock instances cannot be shared over separate processes, only threads. So could you give me any advice in this situation? I'm sure it has come up before. ;) Thanks, Jacob Bennett -- Jacob Bennett Massachusetts Institute of Technology Department of Electrical Engineering and Computer Science Class of 2014| ben...@mi... |
From: Anthony S. <sc...@gm...> - 2012-07-12 15:24:06
|
On Thu, Jul 12, 2012 at 10:17 AM, <ben...@lf...> wrote: > > Hello Benjamin, > > > > Not knowing to much about the ASTERIX format, other than what you said > > and what is in the links, I would say that this is a good fit for HDF5 > > and PyTables. PyTables will certainly help you read in the data and > > manipulate it. > > > > However, before you abandon hachoir completely, I will say it is a lot > > easier to write hdf5 files in PyTables than to use the HDF5 C API. If > > hachoir is too slow, have you tried profiling the code to see what is > > taking up the most time? Maybe you could just rewrite these parts in > > C? Have you looked into Cythonizing it? Also, you don't seem to be > > using numpy to read in the data... (there are some tricks given ASTERIX > > here, but not insurmountable). > > > > I ask the above, just so you don't have to completely rewrite > > everything. You are correct though that pure python is probably not > > sufficient. Feel free to ask more questions here. > > Hello Anthony, > > Thanks for your answer. > Now that I discovered ipython and the line_profiler extension, I've done > more profiling. > Most of the time is spent in creating the field objects defined in hachoir. > A field object has nice attributes (size, value, description, address...) > but it adds a lot of overhead. > Understood! Good luck. > Contrary to my intuition, not that much time is spent in reading the data > (despite some bitfields to read). > So it's probably worth trying to parse and write an hdf5 file directly > with PyTables. > I need to read more PyTables doc and examples. > I might ask more questions when I have :-) > Feel free ;) > > Regards, > > Benjamin > > > > > > Be Well > > Anthony > > > > On Wed, Jul 11, 2012 at 6:52 AM, <ben...@lf...> wrote: > > > > > > Hi, > > > > I'm working with Air Traffic Management and would like to > > perform checks / compute statistics on ASTERIX data. 
> > ASTERIX is an ATM Surveillance Data Binary Messaging Format > > (http://www.eurocontrol.int/asterix/public/standard_page/overview.html) > > > > The data consist of a concatenation of consecutive data > > blocks. > > Each data block consists of data category + length + > > records. > > Each record is of variable length and consists of several > > data items (that are well defined for each category). > > Some data items might be present or not depending on a field > > specification (bitfield). > > > > I started to write a parser using hachoir > > (https://bitbucket.org/haypo/hachoir/overview) a pure python library. > > But the parsing was really too slow and taking a lot of > > memory. > > That's not really useable. > > > > >From what I read, PyTables could really help to manipulate > > and analyze the data. > > So I've been thinking about writing a tool (probably in C) > > to convert my ASTERIX format to HDF5. > > > > Before I start, I'd like confirmation that this seems like a > > suitable application for PyTables. > > Is there another approach than writing a conversion tool to > > HDF5? > > > > Thanks in advance > > > > Benjamin > > > > ------------------------------------------------------------ > > ------------------ > > Live Security Virtual Conference > > Exclusive live event will cover all the ways today's > > security and > > threat landscape has changed and how IT managers can > > respond. Discussions > > will include endpoint security, mobile security and the > > latest in malware > > threats. > > http://www.accelacomm.com/jaw/sfrnl04242012/114/50122263/ > > _______________________________________________ > > Pytables-users mailing list > > Pyt...@li... 
> > https://lists.sourceforge.net/lists/listinfo/pytables-users > > > > > > > > ------------------------------------------------------------------------------ > Live Security Virtual Conference > Exclusive live event will cover all the ways today's security and > threat landscape has changed and how IT managers can respond. Discussions > will include endpoint security, mobile security and the latest in malware > threats. http://www.accelacomm.com/jaw/sfrnl04242012/114/50122263/ > _______________________________________________ > Pytables-users mailing list > Pyt...@li... > https://lists.sourceforge.net/lists/listinfo/pytables-users > |
From: <ben...@lf...> - 2012-07-12 15:19:17
|
> Hello Benjamin, > > Not knowing to much about the ASTERIX format, other than what you said > and what is in the links, I would say that this is a good fit for HDF5 > and PyTables. PyTables will certainly help you read in the data and > manipulate it. > > However, before you abandon hachoir completely, I will say it is a lot > easier to write hdf5 files in PyTables than to use the HDF5 C API. If > hachoir is too slow, have you tried profiling the code to see what is > taking up the most time? Maybe you could just rewrite these parts in > C? Have you looked into Cythonizing it? Also, you don't seem to be > using numpy to read in the data... (there are some tricks given ASTERIX > here, but not insurmountable). > > I ask the above, just so you don't have to completely rewrite > everything. You are correct though that pure python is probably not > sufficient. Feel free to ask more questions here. Hello Anthony, Thanks for your answer. Now that I discovered ipython and the line_profiler extension, I've done more profiling. Most of the time is spent in creating the field objects defined in hachoir. A field object has nice attributes (size, value, description, address...) but it adds a lot of overhead. Contrary to my intuition, not that much time is spent in reading the data (despite some bitfields to read). So it's probably worth trying to parse and write an hdf5 file directly with PyTables. I need to read more PyTables doc and examples. I might ask more questions when I have :-) Regards, Benjamin > > Be Well > Anthony > > On Wed, Jul 11, 2012 at 6:52 AM, <ben...@lf...> wrote: > > > Hi, > > I'm working with Air Traffic Management and would like to > perform checks / compute statistics on ASTERIX data. > ASTERIX is an ATM Surveillance Data Binary Messaging Format > (http://www.eurocontrol.int/asterix/public/standard_page/overview.html) > > The data consist of a concatenation of consecutive data > blocks. > Each data block consists of data category + length + > records. 
> Each record is of variable length and consists of several > data items (that are well defined for each category). > Some data items might be present or not depending on a field > specification (bitfield). > > I started to write a parser using hachoir > (https://bitbucket.org/haypo/hachoir/overview) a pure python library. > But the parsing was really too slow and taking a lot of > memory. > That's not really useable. > > >From what I read, PyTables could really help to manipulate > and analyze the data. > So I've been thinking about writing a tool (probably in C) > to convert my ASTERIX format to HDF5. > > Before I start, I'd like confirmation that this seems like a > suitable application for PyTables. > Is there another approach than writing a conversion tool to > HDF5? > > Thanks in advance > > Benjamin > > ------------------------------------------------------------ > ------------------ > Live Security Virtual Conference > Exclusive live event will cover all the ways today's > security and > threat landscape has changed and how IT managers can > respond. Discussions > will include endpoint security, mobile security and the > latest in malware > threats. > http://www.accelacomm.com/jaw/sfrnl04242012/114/50122263/ > _______________________________________________ > Pytables-users mailing list > Pyt...@li... > https://lists.sourceforge.net/lists/listinfo/pytables-users > > |
From: Francesc A. <fa...@py...> - 2012-07-12 08:36:37
|
On 7/11/12 7:06 PM, Antonio Valentino wrote: > >> BTW, which is the status of the 3.x support? I vaguely remember >> you asking me for some help on this, but I don't remember well. Not >> that I have a lot of time to spend on it, but perhaps I can use >> some hours in the next days. > well, in PyTables 2.4 we made some job in preparation of the porting > to python3, but the porting itself is still not started. > > One of the main issues is that numexpr still not supports python3 so > we have a missing dependency. > > I started a porting of numexpr to python3 (see [1]) but it is still > incomplete. > I hope it is good enough to let us start working on the porting of > PyTables. > > Of course if you would like to give a look to numexpr for python3 it > would be of great help. Ah yes. I have also tried with the porting of numexpr to python3, but failed. I'll try to have a look at your patches and see if we can make it. Not sure when I'll have time to tackle this, but hopefully I'll be able to tell you something soon. > > After the PyTables 2.4 final I plan to publish a wiki page with my > roadmap proposal. IMHO main points are: > > * open a new branch in the repo > * remove al deprecated code (Numeric, numarray, netcdf3, etc). This > breaks the API and, IMHO, we will also need to bump the format version > * ensure that all the required SW work (enough) on python3 > * handle str/unicode issues > * full support to unicode HDF5 object names > * start working an a good setup for 2to3 (needs some investigation) > * ... > > Please let me know if you think there are other point that are > important for python3 support Ok. That looks good. This is a lot of work though, but I hope you will manage. Thanks for the fine work! -- Francesc Alted |
From: Anthony S. <sc...@gm...> - 2012-07-11 22:02:29
|
Hello Benjamin, Not knowing too much about the ASTERIX format, other than what you said and what is in the links, I would say that this is a good fit for HDF5 and PyTables. PyTables will certainly help you read in the data and manipulate it. However, before you abandon hachoir completely, I will say it is a lot easier to write hdf5 files in PyTables than to use the HDF5 C API. If hachoir is too slow, have you tried profiling the code to see what is taking up the most time? Maybe you could just rewrite these parts in C? Have you looked into Cythonizing it? Also, you don't seem to be using numpy to read in the data... (there are some tricks given the ASTERIX layout, but nothing insurmountable). I ask the above just so you don't have to completely rewrite everything. You are correct, though, that pure python is probably not sufficient. Feel free to ask more questions here. Be Well Anthony On Wed, Jul 11, 2012 at 6:52 AM, <ben...@lf...> wrote: > Hi, > > I'm working with Air Traffic Management and would like to perform checks / > compute statistics on ASTERIX data. > ASTERIX is an ATM Surveillance Data Binary Messaging Format ( > http://www.eurocontrol.int/asterix/public/standard_page/overview.html) > > The data consist of a concatenation of consecutive data blocks. > Each data block consists of data category + length + records. > Each record is of variable length and consists of several data items (that > are well defined for each category). > Some data items might be present or not depending on a field specification > (bitfield). > > I started to write a parser using hachoir ( > https://bitbucket.org/haypo/hachoir/overview) a pure python library. > But the parsing was really too slow and taking a lot of memory. > That's not really useable. > > >From what I read, PyTables could really help to manipulate and analyze > the data. > So I've been thinking about writing a tool (probably in C) to convert my > ASTERIX format to HDF5. 
> > Before I start, I'd like confirmation that this seems like a suitable > application for PyTables. > Is there another approach than writing a conversion tool to HDF5? > > Thanks in advance > > Benjamin > > > ------------------------------------------------------------------------------ > Live Security Virtual Conference > Exclusive live event will cover all the ways today's security and > threat landscape has changed and how IT managers can respond. Discussions > will include endpoint security, mobile security and the latest in malware > threats. http://www.accelacomm.com/jaw/sfrnl04242012/114/50122263/ > _______________________________________________ > Pytables-users mailing list > Pyt...@li... > https://lists.sourceforge.net/lists/listinfo/pytables-users > |
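[Editor's note] Anthony's profiling suggestion needs no extra dependencies; the standard library's cProfile already gives a per-function breakdown. The `parse` function below is a hypothetical stand-in for the hachoir-based parser, not real hachoir code.

```python
import cProfile
import io
import pstats

def parse(data):
    # Hypothetical stand-in for the hachoir-based ASTERIX parser.
    return sum(data)

profiler = cProfile.Profile()
profiler.enable()
parse(bytes(range(256)) * 1000)
profiler.disable()

# Report the five most expensive calls by cumulative time.
buf = io.StringIO()
pstats.Stats(profiler, stream=buf).sort_stats("cumulative").print_stats(5)
report = buf.getvalue()
print(report)
```

The hot spots this surfaces are the candidates for rewriting in C or Cython.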
From: Antonio V. <ant...@ti...> - 2012-07-11 17:06:36
|
-----BEGIN PGP SIGNED MESSAGE----- Hash: SHA1 Hi Francesc, Il 11/07/2012 11:33, Francesc Alted ha scritto: > Hey Antonio, this looks great. > Thanks :) > BTW, which is the status of the 3.x support? I vaguely remember > you asking me for some help on this, but I don't remember well. Not > that I have a lot of time to spend on it, but perhaps I can use > some hours in the next days. well, in PyTables 2.4 we did some work in preparation for the porting to python3, but the porting itself has still not started. One of the main issues is that numexpr still does not support python3, so we have a missing dependency. I started a port of numexpr to python3 (see [1]) but it is still incomplete. I hope it is good enough to let us start working on the porting of PyTables. Of course, if you could take a look at numexpr for python3 it would be of great help. After the PyTables 2.4 final I plan to publish a wiki page with my roadmap proposal. IMHO the main points are: * open a new branch in the repo * remove all deprecated code (Numeric, numarray, netcdf3, etc). This breaks the API and, IMHO, we will also need to bump the format version * ensure that all the required SW works (well enough) on python3 * handle str/unicode issues * full support for unicode HDF5 object names * start working on a good setup for 2to3 (needs some investigation) * ... Please let me know if you think there are other points that are important for python3 support [1] https://groups.google.com/forum/?fromgroups#!topic/numexpr/M2MVjXsBR0c cheers > > Thanks, > > Francesc > > On 7/7/12 8:47 PM, Antonio Valentino wrote: > =========================== Announcing PyTables 2.4.0b1 > =========================== > > We are happy to announce PyTables 2.4.0b1. 
> [CUT] - -- Antonio Valentino -----BEGIN PGP SIGNATURE----- Version: GnuPG v1.4.11 (GNU/Linux) Comment: Using GnuPG with Mozilla - http://enigmail.mozdev.org/ iEYEARECAAYFAk/9soUACgkQ1JUs2CS3bP4u1ACeJKnMQRFF1hATXFMG3lPH2xyU 9DwAoJNPp6L8gHf+s5hA2Jhj4JLyl3jr =AYqd -----END PGP SIGNATURE----- |
From: <ben...@lf...> - 2012-07-11 12:19:03
|
Hi, I'm working with Air Traffic Management and would like to perform checks / compute statistics on ASTERIX data. ASTERIX is an ATM Surveillance Data Binary Messaging Format (http://www.eurocontrol.int/asterix/public/standard_page/overview.html) The data consist of a concatenation of consecutive data blocks. Each data block consists of data category + length + records. Each record is of variable length and consists of several data items (that are well defined for each category). Some data items might be present or not depending on a field specification (bitfield). I started to write a parser using hachoir (https://bitbucket.org/haypo/hachoir/overview), a pure python library. But the parsing was really too slow and took a lot of memory. That's not really usable. From what I read, PyTables could really help to manipulate and analyze the data. So I've been thinking about writing a tool (probably in C) to convert my ASTERIX format to HDF5. Before I start, I'd like confirmation that this seems like a suitable application for PyTables. Is there another approach besides writing a conversion tool to HDF5? Thanks in advance Benjamin |
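[Editor's note] The block layout described above (category + length + records) can be walked in pure Python with `struct`. This sketch assumes the common ASTERIX data-block header of a 1-byte category and a 2-byte big-endian total length (header included); check those widths against the spec, and note that per-category record/data-item decoding (the FSPEC bitfield) is omitted entirely.

```python
import struct

def iter_blocks(buf):
    """Yield (category, payload) for each ASTERIX-style data block.

    Assumed layout: 1-byte category, 2-byte big-endian total block
    length (including the 3 header bytes), then the records.
    """
    offset = 0
    while offset + 3 <= len(buf):
        cat, length = struct.unpack_from(">BH", buf, offset)
        if length < 3 or offset + length > len(buf):
            raise ValueError("truncated or corrupt data block")
        yield cat, buf[offset + 3:offset + length]
        offset += length

# Two toy blocks: CAT 48 with 4 payload bytes, CAT 34 with 2.
data = b"\x30\x00\x07ABCD" + b"\x22\x00\x05XY"
blocks = list(iter_blocks(data))
```

The per-block payloads could then be decoded into numpy record arrays and appended to a PyTables table in batches.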
From: Francesc A. <fa...@gm...> - 2012-07-11 09:34:00
|
Hey Antonio, this looks great. BTW, which is the status of the 3.x support? I vaguely remember you asking me for some help on this, but I don't remember well. Not that I have a lot of time to spend on it, but perhaps I can use some hours in the next days. Thanks, Francesc On 7/7/12 8:47 PM, Antonio Valentino wrote: > -----BEGIN PGP SIGNED MESSAGE----- > Hash: SHA1 > > =========================== > Announcing PyTables 2.4.0b1 > =========================== > > We are happy to announce PyTables 2.4.0b1. > > This is an incremental release which includes many changes to prepare > for future Python 3 support. > > > What's new > ========== > > This release includes support for the float16 data type and read-only > support for variable length string attributes. > > The handling of HDF5 errors has been improved. The user will no > longer see HDF5 error stacks dumped to the console. All HDF5 error > messages are trapped and attached to a proper Python exception. > > Now PyTables only supports HDF5 v1.8.4+. All the code has been updated > to the new HDF5 API. Supporting only HDF5 1.8 series is beneficial > for future development. > > As always, a large amount of bugs have been addressed and squashed as > well. > > In case you want to know more in detail what has changed in this > version, please refer to: > http://pytables.github.com/release_notes.html > > You can download a source package with generated PDF and HTML docs, as > well as binaries for Windows, from: > http://sourceforge.net/projects/pytables/files/pytables/2.4.0b1 > > For an online version of the manual, visit: > http://pytables.github.com/usersguide/index.html > > > What it is? > =========== > > PyTables is a library for managing hierarchical datasets and > designed to efficiently cope with extremely large amounts of data with > support for full 64-bit file addressing. PyTables runs on top of > the HDF5 library and NumPy package for achieving maximum throughput and > convenient use. 
PyTables includes OPSI, a new indexing technology, > allowing to perform data lookups in tables exceeding 10 gigarows > (10**10 rows) in less than a tenth of a second. > > > Resources > ========= > > About PyTables: http://www.pytables.org > > About the HDF5 library: http://hdfgroup.org/HDF5/ > > About NumPy: http://numpy.scipy.org/ > > > Acknowledgments > =============== > > Thanks to many users who provided feature improvements, patches, bug > reports, support and suggestions. See the ``THANKS`` file in the > distribution package for a (incomplete) list of contributors. Most > specially, a lot of kudos go to the HDF5 and NumPy (and numarray!) > makers. Without them, PyTables simply would not exist. > > > Share your experience > ===================== > > Let us know of any bugs, suggestions, gripes, kudos, etc. you may have. > > > - ---- > > **Enjoy data!** > > > - -- > The PyTables Team > -----BEGIN PGP SIGNATURE----- > Version: GnuPG v1.4.11 (GNU/Linux) > Comment: Using GnuPG with Mozilla - http://enigmail.mozdev.org/ > > iEYEARECAAYFAk/4hDwACgkQ1JUs2CS3bP7TUwCfcobS3KI7L/6k3Bbbt2VBOz5B > TqAAn0DhrSdtd7XTPOj0RR/mpr2FtseE > =T5iQ > -----END PGP SIGNATURE----- > > ------------------------------------------------------------------------------ > Live Security Virtual Conference > Exclusive live event will cover all the ways today's security and > threat landscape has changed and how IT managers can respond. Discussions > will include endpoint security, mobile security and the latest in malware > threats. http://www.accelacomm.com/jaw/sfrnl04242012/114/50122263/ > _______________________________________________ > Pytables-announce mailing list > Pyt...@li... > https://lists.sourceforge.net/lists/listinfo/pytables-announce -- Francesc Alted |
From: Antonio V. <ant...@ti...> - 2012-07-08 17:47:01
|
Thank you very much Christoph. On 08/07/2012 19:41, Christoph Gohlke wrote: > I submitted a PR at <https://github.com/PyTables/PyTables/pull/161> > > Christoph > > > On 7/8/2012 10:06 AM, Antonio Valentino wrote: >> Hi Christoph, >> >> On 08/07/2012 18:21, Christoph Gohlke wrote: >>> Hi Antonio, >>> >>> here's the stderr output: >>> >>> HDF5-DIAG: Error detected in HDF5 (1.8.8) thread 0: >>> #000: ..\..\hdf5-1.8.8\src\H5F.c line 1522 in H5Fopen(): unable to >>> open file >>> major: File accessability >>> minor: Unable to open file >>> #001: ..\..\hdf5-1.8.8\src\H5F.c line 1313 in H5F_open(): unable to >>> read superblock >>> major: File accessability >>> minor: Read failed >>> #002: ..\..\hdf5-1.8.8\src\H5Fsuper.c line 334 in H5F_super_read(): >>> unable to find file signature >>> major: File accessability >>> minor: Not an HDF5 file >>> #003: ..\..\hdf5-1.8.8\src\H5Fsuper.c line 155 in >>> H5F_locate_signature(): unable to find a valid file signature >>> major: Low-level I/O >>> minor: Unable to initialize object >>> >>> >>> Christoph >>> >> >> thank you. >> This is strange, "HDF5-DIAG" is actually in the output, so you should not >> have the reported error: >> >> ====================================================================== >> FAIL: test_enable_messages (tables.tests.test_basics.HDF5ErrorHandling) >> ---------------------------------------------------------------------- >> Traceback (most recent call last): >> File "X:\Python27-x64\lib\site-packages\tables\tests\common.py", line >> 259, in newmethod >> return oldmethod(self, *args, **kwargs) >> File "X:\Python27-x64\lib\site-packages\tables\tests\test_basics.py", >> line 2445, in test_enable_messages >> self.assertTrue("HDF5-DIAG" in stderr) >> AssertionError: False is not true >> >> >> Looking at the docs, it seems that there is no particular issue at the >> subprocess level. >> >> Umm, can you please check whether the message actually comes from stderr >> and not from stdout? 
>> >> thanks >> >>> >>> On 7/8/2012 3:55 AM, Antonio Valentino wrote: >>>> Hi Christoph, >>>> thank you for reporting. >>>> >>>> Can you please tell us what the output of the attached script is on >>>> your machine? >>>> >>>> thanks in advance >>>> >>>> >>>> >>>> On 07/07/2012 21:18, Christoph Gohlke wrote: >>>>> Looks good. Only one test failure on win-amd64-py2.7 (attached). >>>>> >>>>> Christoph >>>>> >>>>> On 7/7/2012 11:47 AM, Antonio Valentino wrote: >>>>>> -----BEGIN PGP SIGNED MESSAGE----- >>>>>> Hash: SHA1 >>>>>> >>>>>> =========================== >>>>>> Announcing PyTables 2.4.0b1 >>>>>> =========================== >>>>>> >>>> [CUT] -- Antonio Valentino |
From: Christoph G. <cg...@uc...> - 2012-07-08 17:42:00
|
I submitted a PR at <https://github.com/PyTables/PyTables/pull/161> Christoph On 7/8/2012 10:06 AM, Antonio Valentino wrote: > Hi Christoph, > > On 08/07/2012 18:21, Christoph Gohlke wrote: >> Hi Antonio, >> >> here's the stderr output: >> >> HDF5-DIAG: Error detected in HDF5 (1.8.8) thread 0: >> #000: ..\..\hdf5-1.8.8\src\H5F.c line 1522 in H5Fopen(): unable to >> open file >> major: File accessability >> minor: Unable to open file >> #001: ..\..\hdf5-1.8.8\src\H5F.c line 1313 in H5F_open(): unable to >> read superblock >> major: File accessability >> minor: Read failed >> #002: ..\..\hdf5-1.8.8\src\H5Fsuper.c line 334 in H5F_super_read(): >> unable to find file signature >> major: File accessability >> minor: Not an HDF5 file >> #003: ..\..\hdf5-1.8.8\src\H5Fsuper.c line 155 in >> H5F_locate_signature(): unable to find a valid file signature >> major: Low-level I/O >> minor: Unable to initialize object >> >> >> Christoph >> > > thank you. > This is strange, "HDF5-DIAG" is actually in the output, so you should not > have the reported error: > > ====================================================================== > FAIL: test_enable_messages (tables.tests.test_basics.HDF5ErrorHandling) > ---------------------------------------------------------------------- > Traceback (most recent call last): > File "X:\Python27-x64\lib\site-packages\tables\tests\common.py", line > 259, in newmethod > return oldmethod(self, *args, **kwargs) > File "X:\Python27-x64\lib\site-packages\tables\tests\test_basics.py", > line 2445, in test_enable_messages > self.assertTrue("HDF5-DIAG" in stderr) > AssertionError: False is not true > > > Looking at the docs, it seems that there is no particular issue at the > subprocess level. > > Umm, can you please check whether the message actually comes from stderr > and not from stdout? > > thanks > >> >> On 7/8/2012 3:55 AM, Antonio Valentino wrote: >>> Hi Christoph, >>> thank you for reporting. 
>>> >>> Can you please tell us what the output of the attached script is on >>> your machine? >>> >>> thanks in advance >>> >>> >>> >>> On 07/07/2012 21:18, Christoph Gohlke wrote: >>>> Looks good. Only one test failure on win-amd64-py2.7 (attached). >>>> >>>> Christoph >>>> >>>> On 7/7/2012 11:47 AM, Antonio Valentino wrote: >>>>> -----BEGIN PGP SIGNED MESSAGE----- >>>>> Hash: SHA1 >>>>> >>>>> =========================== >>>>> Announcing PyTables 2.4.0b1 >>>>> =========================== >>>>> >>> [CUT] >>> >>> >> > > |
From: Antonio V. <ant...@ti...> - 2012-07-08 17:07:10
|
Hi Christoph, On 08/07/2012 18:21, Christoph Gohlke wrote: > Hi Antonio, > > here's the stderr output: > > HDF5-DIAG: Error detected in HDF5 (1.8.8) thread 0: > #000: ..\..\hdf5-1.8.8\src\H5F.c line 1522 in H5Fopen(): unable to > open file > major: File accessability > minor: Unable to open file > #001: ..\..\hdf5-1.8.8\src\H5F.c line 1313 in H5F_open(): unable to > read superblock > major: File accessability > minor: Read failed > #002: ..\..\hdf5-1.8.8\src\H5Fsuper.c line 334 in H5F_super_read(): > unable to find file signature > major: File accessability > minor: Not an HDF5 file > #003: ..\..\hdf5-1.8.8\src\H5Fsuper.c line 155 in > H5F_locate_signature(): unable to find a valid file signature > major: Low-level I/O > minor: Unable to initialize object > > > Christoph > thank you. This is strange, "HDF5-DIAG" is actually in the output, so you should not have the reported error: ====================================================================== FAIL: test_enable_messages (tables.tests.test_basics.HDF5ErrorHandling) ---------------------------------------------------------------------- Traceback (most recent call last): File "X:\Python27-x64\lib\site-packages\tables\tests\common.py", line 259, in newmethod return oldmethod(self, *args, **kwargs) File "X:\Python27-x64\lib\site-packages\tables\tests\test_basics.py", line 2445, in test_enable_messages self.assertTrue("HDF5-DIAG" in stderr) AssertionError: False is not true Looking at the docs, it seems that there is no particular issue at the subprocess level. Umm, can you please check whether the message actually comes from stderr and not from stdout? thanks > > On 7/8/2012 3:55 AM, Antonio Valentino wrote: >> Hi Christoph, >> thank you for reporting. >> >> Can you please tell us what the output of the attached script is on >> your machine? >> >> thanks in advance >> >> >> >> On 07/07/2012 21:18, Christoph Gohlke wrote: >>> Looks good. Only one test failure on win-amd64-py2.7 (attached). 
>>> >>> Christoph >>> >>> On 7/7/2012 11:47 AM, Antonio Valentino wrote: >>>> -----BEGIN PGP SIGNED MESSAGE----- >>>> Hash: SHA1 >>>> >>>> =========================== >>>> Announcing PyTables 2.4.0b1 >>>> =========================== >>>> >> [CUT] >> >> > -- Antonio Valentino |
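The stderr-versus-stdout question Antonio raises above can be checked with a small stdlib-only script. This is a sketch using a dummy child process that emits an "HDF5-DIAG" marker, not the actual test script attached to the thread; the point is that `subprocess` captures the two streams separately:

```python
import subprocess
import sys

# Spawn a child interpreter that writes a distinct marker to each
# stream, then capture stdout and stderr separately.
child_code = (
    "import sys\n"
    "sys.stdout.write('regular output')\n"
    "sys.stderr.write('HDF5-DIAG: example diagnostic')\n"
)

proc = subprocess.Popen(
    [sys.executable, "-c", child_code],
    stdout=subprocess.PIPE,
    stderr=subprocess.PIPE,
)
out, err = proc.communicate()

# communicate() returns bytes; decode before the membership test.
out, err = out.decode(), err.decode()

assert "HDF5-DIAG" in err      # the diagnostic landed on stderr...
assert "HDF5-DIAG" not in out  # ...and not on stdout
```

If the marker showed up in `out` instead, that would explain the `test_enable_messages` failure, since the test only inspects the captured stderr.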
From: Christoph G. <cg...@uc...> - 2012-07-08 16:21:34
|
Hi Antonio, here's the stderr output: HDF5-DIAG: Error detected in HDF5 (1.8.8) thread 0: #000: ..\..\hdf5-1.8.8\src\H5F.c line 1522 in H5Fopen(): unable to open file major: File accessability minor: Unable to open file #001: ..\..\hdf5-1.8.8\src\H5F.c line 1313 in H5F_open(): unable to read superblock major: File accessability minor: Read failed #002: ..\..\hdf5-1.8.8\src\H5Fsuper.c line 334 in H5F_super_read(): unable to find file signature major: File accessability minor: Not an HDF5 file #003: ..\..\hdf5-1.8.8\src\H5Fsuper.c line 155 in H5F_locate_signature(): unable to find a valid file signature major: Low-level I/O minor: Unable to initialize object Christoph On 7/8/2012 3:55 AM, Antonio Valentino wrote: > Hi Christoph, > thank you for reporting. > > Can you please tell us what the output of the attached script is on > your machine? > > thanks in advance > > > > On 07/07/2012 21:18, Christoph Gohlke wrote: >> Looks good. Only one test failure on win-amd64-py2.7 (attached). >> >> Christoph >> >> On 7/7/2012 11:47 AM, Antonio Valentino wrote: >>> -----BEGIN PGP SIGNED MESSAGE----- >>> Hash: SHA1 >>> >>> =========================== >>> Announcing PyTables 2.4.0b1 >>> =========================== >>> > [CUT] > > |
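The "unable to find a valid file signature" / "Not an HDF5 file" lines in the stack above mean HDF5 could not locate its 8-byte format signature in the file. A stdlib-only sketch of that check (the real `H5F_locate_signature()` also probes offsets 512, 1024, 2048, … to allow for user blocks; this sketch only looks at offset 0):

```python
import os
import tempfile

# Every HDF5 superblock starts with this 8-byte signature.
HDF5_SIGNATURE = b"\x89HDF\r\n\x1a\n"

def looks_like_hdf5(path):
    """Return True if the file begins with the HDF5 format signature."""
    with open(path, "rb") as f:
        return f.read(8) == HDF5_SIGNATURE

# A plain text file fails the check -- the same condition that produces
# the "Not an HDF5 file" error in the stack above.
fd, path = tempfile.mkstemp()
try:
    with os.fdopen(fd, "wb") as f:
        f.write(b"this is not an HDF5 file")
    assert not looks_like_hdf5(path)
finally:
    os.remove(path)
```

A cheap pre-check like this can turn the raw HDF5 error stack into a clearer "not an HDF5 file" message before the library is even called.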