From: Anthony S. <sc...@gm...> - 2012-08-15 21:48:12

On Wed, Aug 15, 2012 at 12:33 PM, Adam Dershowitz <ade...@ex...> wrote:
> I am trying to find all cases where a value transitions above a
> threshold. So, my code first does a getWhereList() to find values that
> are above the threshold, then it uses that list to find immediately prior
> values that are below. The code is working, but the second part, searching
> through just a smaller subset, is much slower (the first search is on the
> order of 1 second, while the second takes a minute).
> Is there any way to get this second part of the search in-kernel? Or any
> more general way to do a search for values above a threshold, where the
> prior value is below?
> Essentially, what I am looking for is a way to speed up that second
> search for "all rows in a prior defined list, where a condition is
> applied to the table".
>
> My table is just seconds and values, in chronological order.
>
> Here is the code that I am using now:
>
> h5data = tb.openFile("AllData.h5", "r")
> table1 = h5data.root.table1
>
> # Find all values above the threshold:
> thelist = table1.getWhereList("""Value > 150""")
>
> # From the above list, find all values where the immediately prior
> # value is below:
> transition = []
> for i in thelist:
>     if (i != 0) and (table1[i-1]['Value'] < 150):
>         transition.append(i)

Hey Adam,

Sorry for taking a while to respond. Assuming you don't mind one of these
being <= or >=, you don't really need the second loop, with a little index
arithmetic:

import numpy as np
inds = np.array(thelist)
dinds = inds[1:] - inds[:-1]
transition = inds[1:][dinds > 1]

This should get you an array of all of the transition indices, since
wherever the difference between consecutive indices is greater than 1 the
Value must have dropped below the threshold and then returned back up.

Be Well
Anthony

> Thanks,
>
> ------------------------------------------------------------------------------
> Live Security Virtual Conference
> Exclusive live event will cover all the ways today's security and
> threat landscape has changed and how IT managers can respond. Discussions
> will include endpoint security, mobile security and the latest in malware
> threats. http://www.accelacomm.com/jaw/sfrnl04242012/114/50122263/
> _______________________________________________
> Pytables-users mailing list
> Pyt...@li...
> https://lists.sourceforge.net/lists/listinfo/pytables-users
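[Editor's note] The index-arithmetic trick above can be sketched end to end on toy data (the sample values and names below are illustrative, not from Adam's file):

```python
import numpy as np

# Toy "seconds and values" series with several excursions above 150.
values = np.array([100, 160, 170, 120, 130, 155, 151, 90, 200])
threshold = 150

# Equivalent of table1.getWhereList("Value > 150"): row indices above threshold.
inds = np.nonzero(values > threshold)[0]

# A gap > 1 between consecutive above-threshold indices means the series
# dipped below the threshold in between, so the later index is a transition.
dinds = np.diff(inds)
transitions = inds[1:][dinds > 1]

# The first above-threshold index is also a transition when the prior
# value is at or below the threshold.
if inds.size and inds[0] > 0 and values[inds[0] - 1] <= threshold:
    transitions = np.concatenate(([inds[0]], transitions))

print(transitions)  # -> [1 5 8]
```

Because the gap test never touches the table again, the whole second pass runs in NumPy instead of one row lookup per hit.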
From: Adam D. <ade...@ex...> - 2012-08-15 17:33:17

I am trying to find all cases where a value transitions above a threshold.
So, my code first does a getWhereList() to find values that are above the
threshold, then it uses that list to find immediately prior values that are
below. The code is working, but the second part, searching through just a
smaller subset, is much slower (the first search is on the order of 1
second, while the second takes a minute). Is there any way to get this
second part of the search in-kernel? Or any more general way to do a search
for values above a threshold, where the prior value is below? Essentially,
what I am looking for is a way to speed up that second search for "all rows
in a prior defined list, where a condition is applied to the table".

My table is just seconds and values, in chronological order.

Here is the code that I am using now:

h5data = tb.openFile("AllData.h5", "r")
table1 = h5data.root.table1

# Find all values above the threshold:
thelist = table1.getWhereList("""Value > 150""")

# From the above list, find all values where the immediately prior
# value is below:
transition = []
for i in thelist:
    if (i != 0) and (table1[i-1]['Value'] < 150):
        transition.append(i)

Thanks,
From: Anthony S. <sc...@gm...> - 2012-08-15 16:15:48

Hello Ask,

I bet this is because you are storing these as attrs... which will default
back to some pickled Python representation. Can you check if this works as
expected when saving as actual arrays? Something like:

import numpy as np
import tables

with tables.openFile("test.h5", "w") as f:
    A = np.array([[0, 1], [2, 3]])
    a = f.createArray("/", "a", A)
    b = f.createArray("/", "b", A.T.copy())
    c = f.createArray("/", "c", A.T)

    assert np.all(a == A)
    assert np.all(b == A.T)
    assert np.all(c == A)  # AssertionError!
    assert np.all(c == A.T)

Be Well
Anthony

On Wed, Aug 15, 2012 at 4:13 AM, Ask F. Jakobsen <as...@li...> wrote:
> Hey all,
>
> When I store a view of a numpy array as an attribute, it appears to be
> stored as the array that owns the data. Is this a bug? I find it
> confusing that the user has to check whether the numpy array owns the
> data, or always remember to do a copy() before storing a numpy array as
> an attribute.
>
> Below is some sample code that highlights the problem.
>
> Best regards, Ask
>
> import numpy as np
> import tables
>
> with tables.openFile("test.h5", "w") as f:
>     x = f.createArray("/", "test", [0])
>     A = np.array([[0, 1], [2, 3]])
>
>     x.attrs['a'] = A
>     x.attrs['b'] = A.T.copy()
>     x.attrs['c'] = A.T
>
>     assert np.all(x.attrs['a'] == A)
>     assert np.all(x.attrs['b'] == A.T)
>     assert np.all(x.attrs['c'] == A)
>     assert np.all(x.attrs['c'] == A.T)  # AssertionError!
From: Ask F. J. <as...@li...> - 2012-08-15 09:30:29

Hey all,

When I store a view of a numpy array as an attribute, it appears to be
stored as the array that owns the data. Is this a bug? I find it confusing
that the user has to check whether the numpy array owns the data, or always
remember to do a copy() before storing a numpy array as an attribute.

Below is some sample code that highlights the problem.

Best regards,
Ask

import numpy as np
import tables

with tables.openFile("test.h5", "w") as f:
    x = f.createArray("/", "test", [0])
    A = np.array([[0, 1], [2, 3]])

    x.attrs['a'] = A
    x.attrs['b'] = A.T.copy()
    x.attrs['c'] = A.T

    assert np.all(x.attrs['a'] == A)
    assert np.all(x.attrs['b'] == A.T)
    assert np.all(x.attrs['c'] == A)
    assert np.all(x.attrs['c'] == A.T)  # AssertionError!
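[Editor's note] Until the attribute behavior is settled, one defensive pattern is to copy any array that does not own its data before storing it. A minimal NumPy-only sketch (the helper name `as_owned` is made up for illustration):

```python
import numpy as np

def as_owned(arr):
    """Return `arr` itself if it owns its data, else a contiguous copy.

    Useful before handing an array to a storage layer that may serialize
    the underlying buffer of a view rather than the view itself.
    """
    return arr if arr.base is None else np.ascontiguousarray(arr)

A = np.array([[0, 1], [2, 3]])

# A.T is a view: it shares A's buffer and does not own its data.
assert A.T.base is A

safe = as_owned(A.T)
assert safe.base is None            # the copy owns its data
assert np.array_equal(safe, A.T)    # and preserves the transposed layout
```

With this, `x.attrs['c'] = as_owned(A.T)` would behave like the explicit `A.T.copy()` case above.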
From: Anthony S. <sc...@gm...> - 2012-08-07 16:55:15
|
On Tue, Aug 7, 2012 at 11:50 AM, Daniel Wheeler <dan...@gm...>wrote: > > > On Tue, Aug 7, 2012 at 12:46 PM, Anthony Scopatz <sc...@gm...>wrote: > >> >> On Tue, Aug 7, 2012 at 11:43 AM, Daniel Wheeler < >> dan...@gm...> wrote: >> >>> >>> >>>> They should know what do do and how to fix it. >>> >>> >>> Maybe mpi init issues with either pytrilinos or mpi4py as a wild guess. >>> Both are imported by fipy. >>> >> >> Your guess is as good, or much better, than mine. >> > > Thanks for your questions and answers. > BTW If it turns out that you need us to change something in PyTables to play nicely with fipy, please let us know! > > -- > Daniel Wheeler > > > ------------------------------------------------------------------------------ > Live Security Virtual Conference > Exclusive live event will cover all the ways today's security and > threat landscape has changed and how IT managers can respond. Discussions > will include endpoint security, mobile security and the latest in malware > threats. http://www.accelacomm.com/jaw/sfrnl04242012/114/50122263/ > _______________________________________________ > Pytables-users mailing list > Pyt...@li... > https://lists.sourceforge.net/lists/listinfo/pytables-users > > |
From: Daniel W. <dan...@gm...> - 2012-08-07 16:50:56

On Tue, Aug 7, 2012 at 12:46 PM, Anthony Scopatz <sc...@gm...> wrote:
>
> On Tue, Aug 7, 2012 at 11:43 AM, Daniel Wheeler <dan...@gm...> wrote:
>>
>>> They should know what to do and how to fix it.
>>
>> Maybe mpi init issues with either pytrilinos or mpi4py as a wild guess.
>> Both are imported by fipy.
>
> Your guess is as good, or much better, than mine.

Thanks for your questions and answers.

--
Daniel Wheeler
From: Anthony S. <sc...@gm...> - 2012-08-07 16:46:56

On Tue, Aug 7, 2012 at 11:43 AM, Daniel Wheeler <dan...@gm...> wrote:
>
> On Tue, Aug 7, 2012 at 11:49 AM, Anthony Scopatz <sc...@gm...> wrote:
>> Yeah, that is probably it. There is probably some weird overlap
>> of resources for fipy and hdf5. Sorry, I don't know what we can do about
>> this, but I would bring it up with the fipy people.
>
> I am the fipy people.

All apologies ;)

>> They should know what to do and how to fix it.
>
> Maybe mpi init issues with either pytrilinos or mpi4py as a wild guess.
> Both are imported by fipy.

Your guess is as good, or much better, than mine.

> --
> Daniel Wheeler
From: Daniel W. <dan...@gm...> - 2012-08-07 16:43:10

On Tue, Aug 7, 2012 at 11:49 AM, Anthony Scopatz <sc...@gm...> wrote:
> Yeah, that is probably it. There is probably some weird overlap
> of resources for fipy and hdf5. Sorry, I don't know what we can do about
> this, but I would bring it up with the fipy people.

I am the fipy people.

> They should know what to do and how to fix it.

Maybe mpi init issues with either pytrilinos or mpi4py as a wild guess.
Both are imported by fipy.

--
Daniel Wheeler
From: Anthony S. <sc...@gm...> - 2012-08-07 15:49:54

Yeah, that is probably it. There is probably some weird overlap of
resources for fipy and hdf5. Sorry, I don't know what we can do about this,
but I would bring it up with the fipy people. They should know what to do
and how to fix it.

Be Well
Anthony

On Tue, Aug 7, 2012 at 8:22 AM, Daniel Wheeler <dan...@gm...> wrote:
>
> On Mon, Aug 6, 2012 at 6:32 PM, Anthony Scopatz <sc...@gm...> wrote:
>> Hmm interesting. Was your HDF5 itself configured & compiled with MPI
>> support? Or are you using a serial version?
>
> Yes, compiled with MPI support.
>
> --
> Daniel Wheeler
From: Daniel W. <dan...@gm...> - 2012-08-07 13:23:08

On Mon, Aug 6, 2012 at 6:32 PM, Anthony Scopatz <sc...@gm...> wrote:
> Hmm interesting. Was your HDF5 itself configured & compiled with MPI
> support? Or are you using a serial version?

Yes, compiled with MPI support.

--
Daniel Wheeler
From: Anthony S. <sc...@gm...> - 2012-08-06 22:33:17

On Mon, Aug 6, 2012 at 5:25 PM, Daniel Wheeler <dan...@gm...> wrote:
> On Mon, Aug 6, 2012 at 5:59 PM, Anthony Scopatz <sc...@gm...> wrote:
>> Hi Daniel,
>>
>> I am glad that it is working for you. I don't think that those version
>> differences are significant enough to have caused the problem you
>> were experiencing. I bet there was some other externality in virtualenv
>> that was causing this, though figuring out what it was might be more
>> trouble than it is worth. Once again, glad it is working for you.
>
> Definitely not worth figuring out until it is easily reproducible. There
> seem to have been two issues that were confusing me. I don't think the
> version differences mattered at all. First, I can't open an h5 file in
> ipython for some reason, though it works fine from the regular python
> prompt. Second, the order in which imports occur seems to influence the
> hanging behavior. As long as "tables" is imported before the "fipy"
> package, it works fine. Fipy imports a lot of things, including stuff
> with mpi and threads, that seem always to give issues. I suspect it is
> these packages playing badly in some way.

Hmm interesting. Was your HDF5 itself configured & compiled with MPI
support? Or are you using a serial version?

> --
> Daniel Wheeler
From: Daniel W. <dan...@gm...> - 2012-08-06 22:25:06

On Mon, Aug 6, 2012 at 5:59 PM, Anthony Scopatz <sc...@gm...> wrote:
> Hi Daniel,
>
> I am glad that it is working for you. I don't think that those version
> differences are significant enough to have caused the problem you
> were experiencing. I bet there was some other externality in virtualenv
> that was causing this, though figuring out what it was might be more
> trouble than it is worth. Once again, glad it is working for you.

Definitely not worth figuring out until it is easily reproducible. There
seem to have been two issues that were confusing me. I don't think the
version differences mattered at all. First, I can't open an h5 file in
ipython for some reason, though it works fine from the regular python
prompt. Second, the order in which imports occur seems to influence the
hanging behavior. As long as "tables" is imported before the "fipy"
package, it works fine. Fipy imports a lot of things, including stuff with
mpi and threads, that seem always to give issues. I suspect it is these
packages playing badly in some way.

--
Daniel Wheeler
From: Anthony S. <sc...@gm...> - 2012-08-06 21:59:30

Hi Daniel,

I am glad that it is working for you. I don't think that those version
differences are significant enough to have caused the problem you
were experiencing. I bet there was some other externality in virtualenv
that was causing this, though figuring out what it was might be more
trouble than it is worth. Once again, glad it is working for you.

Be Well
Anthony

On Mon, Aug 6, 2012 at 3:55 PM, Daniel Wheeler <dan...@gm...> wrote:
>
> On Mon, Aug 6, 2012 at 12:20 PM, Anthony Scopatz <sc...@gm...> wrote:
>> Hmm What if you try opening in 'r' or 'w' mode, rather than 'a'?
>
> Doesn't make a difference.
>
>> I am not an expert in virtualenv, but I have heard of people having
>> problems with it with other packages. Let us know if there is anything
>> that we can do.
>
> Starting with a clean virtualenv seems to have worked. The new
> configuration is:
>
> -=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=
> PyTables version:  2.4.0
> HDF5 version:      1.8.4-patch1
> NumPy version:     1.6.2
> Numexpr version:   2.0.1 (not using Intel's VML/MKL)
> Zlib version:      1.2.3.4 (in Python interpreter)
> BZIP2 version:     1.0.5 (10-Dec-2007)
> Blosc version:     1.1.3 (2010-11-16)
> Cython version:    0.16
> Python version:    2.6.6 (r266:84292, Dec 26 2010, 22:31:48) [GCC 4.4.5]
> Platform:          linux2-x86_64
> Byte-ordering:     little
> Detected cores:    4
> -=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=
>
> The only differences seem to be cython and numpy. Tests run to
> completion.
>
> --
> Daniel Wheeler
From: Daniel W. <dan...@gm...> - 2012-08-06 20:55:22

On Mon, Aug 6, 2012 at 12:20 PM, Anthony Scopatz <sc...@gm...> wrote:
> Hmm What if you try opening in 'r' or 'w' mode, rather than 'a'?

Doesn't make a difference.

> I am not an expert in virtualenv, but I have heard of people having
> problems with it with other packages. Let us know if there is anything
> that we can do.

Starting with a clean virtualenv seems to have worked. The new
configuration is:

-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=
PyTables version:  2.4.0
HDF5 version:      1.8.4-patch1
NumPy version:     1.6.2
Numexpr version:   2.0.1 (not using Intel's VML/MKL)
Zlib version:      1.2.3.4 (in Python interpreter)
BZIP2 version:     1.0.5 (10-Dec-2007)
Blosc version:     1.1.3 (2010-11-16)
Cython version:    0.16
Python version:    2.6.6 (r266:84292, Dec 26 2010, 22:31:48) [GCC 4.4.5]
Platform:          linux2-x86_64
Byte-ordering:     little
Detected cores:    4
-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=

The only differences seem to be cython and numpy. Tests run to completion.

--
Daniel Wheeler
From: Anthony S. <sc...@gm...> - 2012-08-06 16:20:56

Hmm What if you try opening in 'r' or 'w' mode, rather than 'a'?

I am not an expert in virtualenv, but I have heard of people having
problems with it with other packages. Let us know if there is anything
that we can do.

Be Well
Anthony

On Mon, Aug 6, 2012 at 11:17 AM, Daniel Wheeler <dan...@gm...> wrote:
>
> On Mon, Aug 6, 2012 at 12:13 PM, Anthony Scopatz <sc...@gm...> wrote:
>> Hi Daniel,
>>
>> Does this always happen when opening files? Or just occasionally?
>
> I just installed pytables this morning and it happens every time so far.
> I was using pytables a lot about a year ago without any issues. This
> morning, I installed it in a virtualenv with a lot of other packages. I
> might try it in a completely clean virtualenv and see if that helps any.
>
> --
> Daniel Wheeler
From: Daniel W. <dan...@gm...> - 2012-08-06 16:17:57

On Mon, Aug 6, 2012 at 12:13 PM, Anthony Scopatz <sc...@gm...> wrote:
> Hi Daniel,
>
> Does this always happen when opening files? Or just occasionally?

I just installed pytables this morning and it happens every time so far. I
was using pytables a lot about a year ago without any issues. This morning,
I installed it in a virtualenv with a lot of other packages. I might try it
in a completely clean virtualenv and see if that helps any.

--
Daniel Wheeler
From: Anthony S. <sc...@gm...> - 2012-08-06 16:13:41

Hi Daniel,

Does this always happen when opening files? Or just occasionally?

Be Well
Anthony

On Mon, Aug 6, 2012 at 11:08 AM, Daniel Wheeler <dan...@gm...> wrote:
> The following just seems to hang indefinitely.
>
> In [1]: import tables
>
> In [2]: f = tables.openFile('tmp.h5', mode='a')
>
> The tests hang as well.
>
> In [3]: tables.test()
> -=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=
> PyTables version:  2.4.0
> HDF5 version:      1.8.4-patch1
> NumPy version:     1.6.1
> Numexpr version:   2.0.1 (not using Intel's VML/MKL)
> Zlib version:      1.2.3.4 (in Python interpreter)
> BZIP2 version:     1.0.5 (10-Dec-2007)
> Blosc version:     1.1.3 (2010-11-16)
> Cython version:    0.15.1
> Python version:    2.6.6 (r266:84292, Dec 26 2010, 22:31:48) [GCC 4.4.5]
> Platform:          linux2-x86_64
> Byte-ordering:     little
> Detected cores:    4
> -=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=
> Performing only a light (yet comprehensive) subset of the test suite.
> If you want a more complete test, try passing the --heavy flag to this
> script (or set the 'heavy' parameter in case you are using the
> tables.test() call).
> The whole suite will take more than 4 hours to complete on a relatively
> modern CPU and around 512 MB of main memory.
> -=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=
> /users/wd15/.virtualenvs/trunk/lib/python2.6/site-packages/tables/filters.py:253:
> FiltersWarning: compression library ``lzo`` is not available; using
> ``zlib`` instead
>   % (complib, default_complib), FiltersWarning )
>
> Any ideas are greatly appreciated.
>
> Thanks.
>
> --
> Daniel Wheeler
From: Daniel W. <dan...@gm...> - 2012-08-06 16:09:01

The following just seems to hang indefinitely.

In [1]: import tables

In [2]: f = tables.openFile('tmp.h5', mode='a')

The tests hang as well.

In [3]: tables.test()
-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=
PyTables version:  2.4.0
HDF5 version:      1.8.4-patch1
NumPy version:     1.6.1
Numexpr version:   2.0.1 (not using Intel's VML/MKL)
Zlib version:      1.2.3.4 (in Python interpreter)
BZIP2 version:     1.0.5 (10-Dec-2007)
Blosc version:     1.1.3 (2010-11-16)
Cython version:    0.15.1
Python version:    2.6.6 (r266:84292, Dec 26 2010, 22:31:48) [GCC 4.4.5]
Platform:          linux2-x86_64
Byte-ordering:     little
Detected cores:    4
-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=
Performing only a light (yet comprehensive) subset of the test suite.
If you want a more complete test, try passing the --heavy flag to this
script (or set the 'heavy' parameter in case you are using the
tables.test() call).
The whole suite will take more than 4 hours to complete on a relatively
modern CPU and around 512 MB of main memory.
-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=
/users/wd15/.virtualenvs/trunk/lib/python2.6/site-packages/tables/filters.py:253:
FiltersWarning: compression library ``lzo`` is not available; using
``zlib`` instead
  % (complib, default_complib), FiltersWarning )

Any ideas are greatly appreciated.

Thanks.

--
Daniel Wheeler
From: Juan M. V. T. <jmv...@gm...> - 2012-08-06 15:32:46

Hi Antonio,

Last question about this: from the pytables point of view, and based on
your experience, is it better to manage a table with 3 million rows and
multidimensional cells, or a table with 300 million rows and plain cells?

Thank you,
Juanma

On Aug 5, 2012, at 17:32, Antonio Valentino <ant...@ti...> wrote:
> Hi Juan Manuel,
>
> On 05/08/2012 22:52, Juan Manuel Vázquez Tovar wrote:
>> Hi Antonio,
>>
>> This is the piece of code I use to read the part of the table I need:
>>
>> data = [case['loads'][i] for case in table]
>>
>> where i is the index of the row that I need to read from the matrix
>> (133x6) stored in each cell of the column "loads".
>>
>> Juanma
>
> That looks perfectly fine to me.
> No idea about what could be the issue :/
>
> You can perform partial reads using Table.iterrows:
>
> data = [case['loads'][i] for case in table.iterrows(start, stop)]
>
> Please also consider that using a single np.array with 1e8 rows instead
> of a list of arrays will allow you to save the memory overhead of 1e8
> array objects.
> Considering that 6 doubles are 48 bytes while an empty np.array takes
> 80 bytes:
>
> In [64]: sys.getsizeof(np.zeros((0,)))
> Out[64]: 80
>
> you should be able to reduce the memory footprint by far more than a
> half.
>
> cheers
>
>> 2012/8/5 Antonio Valentino <ant...@ti...>
>>> Hi Juan Manuel,
>>>
>>> On 05/08/2012 22:28, Juan Manuel Vázquez Tovar wrote:
>>>> Hi Antonio,
>>>>
>>>> You are right, I don't need to load the entire table into memory.
>>>> The fourth column has multidimensional cells, and when I read a
>>>> single row from every cell in the column, I almost fill the
>>>> workstation memory.
>>>> I didn't expect that process to use so much memory, but the fact is
>>>> that it does.
>>>> Maybe I didn't explain it very well last time.
>>>>
>>>> Thank you,
>>>>
>>>> Juanma
>>>
>>> Sorry, I still don't understand.
>>> Can you please post a short code snippet that shows how exactly you
>>> read data into your program?
>>>
>>> My impression is that somewhere you use some instruction that
>>> triggers loading of unnecessary data into memory.
>
> --
> Antonio Valentino
From: Juan M. V. T. <jmv...@gm...> - 2012-08-05 22:15:24

Thank you Antonio, I will try.

Cheers,
Juanma

On Aug 5, 2012, at 17:32, Antonio Valentino <ant...@ti...> wrote:
> Hi Juan Manuel,
>
> On 05/08/2012 22:52, Juan Manuel Vázquez Tovar wrote:
>> Hi Antonio,
>>
>> This is the piece of code I use to read the part of the table I need:
>>
>> data = [case['loads'][i] for case in table]
>>
>> where i is the index of the row that I need to read from the matrix
>> (133x6) stored in each cell of the column "loads".
>>
>> Juanma
>
> That looks perfectly fine to me.
> No idea about what could be the issue :/
>
> You can perform partial reads using Table.iterrows:
>
> data = [case['loads'][i] for case in table.iterrows(start, stop)]
>
> Please also consider that using a single np.array with 1e8 rows instead
> of a list of arrays will allow you to save the memory overhead of 1e8
> array objects.
> Considering that 6 doubles are 48 bytes while an empty np.array takes
> 80 bytes:
>
> In [64]: sys.getsizeof(np.zeros((0,)))
> Out[64]: 80
>
> you should be able to reduce the memory footprint by far more than a
> half.
>
> cheers
>
>> 2012/8/5 Antonio Valentino <ant...@ti...>
>>> Hi Juan Manuel,
>>>
>>> On 05/08/2012 22:28, Juan Manuel Vázquez Tovar wrote:
>>>> Hi Antonio,
>>>>
>>>> You are right, I don't need to load the entire table into memory.
>>>> The fourth column has multidimensional cells, and when I read a
>>>> single row from every cell in the column, I almost fill the
>>>> workstation memory.
>>>> I didn't expect that process to use so much memory, but the fact is
>>>> that it does.
>>>> Maybe I didn't explain it very well last time.
>>>>
>>>> Thank you,
>>>>
>>>> Juanma
>>>
>>> Sorry, I still don't understand.
>>> Can you please post a short code snippet that shows how exactly you
>>> read data into your program?
>>>
>>> My impression is that somewhere you use some instruction that
>>> triggers loading of unnecessary data into memory.
From: Antonio V. <ant...@ti...> - 2012-08-05 21:32:57

Hi Juan Manuel,

On 05/08/2012 22:52, Juan Manuel Vázquez Tovar wrote:
> Hi Antonio,
>
> This is the piece of code I use to read the part of the table I need:
>
> data = [case['loads'][i] for case in table]
>
> where i is the index of the row that I need to read from the matrix
> (133x6) stored in each cell of the column "loads".
>
> Juanma

That looks perfectly fine to me.
No idea about what could be the issue :/

You can perform partial reads using Table.iterrows:

data = [case['loads'][i] for case in table.iterrows(start, stop)]

Please also consider that using a single np.array with 1e8 rows instead
of a list of arrays will allow you to save the memory overhead of 1e8
array objects.
Considering that 6 doubles are 48 bytes while an empty np.array takes 80
bytes:

In [64]: sys.getsizeof(np.zeros((0,)))
Out[64]: 80

you should be able to reduce the memory footprint by far more than a half.

cheers

> 2012/8/5 Antonio Valentino <ant...@ti...>
>> Hi Juan Manuel,
>>
>> On 05/08/2012 22:28, Juan Manuel Vázquez Tovar wrote:
>>> Hi Antonio,
>>>
>>> You are right, I don't need to load the entire table into memory.
>>> The fourth column has multidimensional cells, and when I read a
>>> single row from every cell in the column, I almost fill the
>>> workstation memory.
>>> I didn't expect that process to use so much memory, but the fact is
>>> that it does.
>>> Maybe I didn't explain it very well last time.
>>>
>>> Thank you,
>>>
>>> Juanma
>>
>> Sorry, I still don't understand.
>> Can you please post a short code snippet that shows how exactly you
>> read data into your program?
>>
>> My impression is that somewhere you use some instruction that triggers
>> loading of unnecessary data into memory.

--
Antonio Valentino
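[Editor's note] Antonio's overhead argument (48 bytes of payload vs. roughly 80+ bytes of per-object cost) suggests collecting the per-row vectors into one preallocated 2-D array instead of a Python list of small arrays. A sketch with illustrative sizes (the row count here is a toy value, not Juanma's 8e6):

```python
import sys
import numpy as np

n_rows = 1000  # stand-in for the ~8e6 table rows

# Option 1: a list of small per-row arrays (one 6-element vector per row).
as_list = [np.zeros(6) for _ in range(n_rows)]

# Option 2: one preallocated 2-D array, filled in place row by row.
as_block = np.empty((n_rows, 6))
for i, row in enumerate(as_list):
    as_block[i] = row

# The block stores just the raw doubles behind a single ndarray header,
# while the list pays the ndarray object overhead once per row.
per_row_overhead = sys.getsizeof(np.zeros(6)) - np.zeros(6).nbytes
assert as_block.nbytes == n_rows * 6 * 8
assert per_row_overhead > 0  # each small array costs more than its data
```

In the iterrows() loop this means writing each `case['loads'][i]` into `as_block[i]` directly, never keeping a list of 1e8 little arrays alive.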
From: Juan M. V. T. <jmv...@gm...> - 2012-08-05 20:52:30
|
Hi Antonio,

This is the piece of code I use to read the part of the table I need:

data = [case['loads'][i] for case in table]

where i is the index of the row that I need to read from the matrix
(133x6) stored in each cell of the column "loads".

Juanma

2012/8/5 Antonio Valentino <ant...@ti...>

> Hi Juan Manuel,
>
> On 05/08/2012 22:28, Juan Manuel Vázquez Tovar wrote:
> > Hi Antonio,
> >
> > You are right, I don't need to load the entire table into memory.
> > The fourth column has multidimensional cells and when I read a single
> > row from every cell in the column, I almost fill the workstation memory.
> > I didn't expect that process to use so much memory, but the fact is
> > that it does.
> > Maybe I didn't explain it very well last time.
> >
> > Thank you,
> >
> > Juanma
> >
>
> Sorry, I still don't understand.
> Can you please post a short code snippet that shows exactly how you
> read data into your program?
>
> My impression is that somewhere you use some instruction that triggers
> loading of unnecessary data into memory.
>
> > 2012/8/5 Antonio Valentino <ant...@ti...>
> >
> >> Hi Juan Manuel,
> >>
> >> On 04/08/2012 01:55, Juan Manuel Vázquez Tovar wrote:
> >>> Hello all,
> >>>
> >>> I'm managing a file close to 26 GB in size. Its main structure is a
> >>> table with a bit more than 8 million rows. The table is made of four
> >>> columns: the first two columns store names, the 3rd one has a 53-item
> >>> array in each cell and the last column has a 133x6 matrix in each cell.
> >>> I usually work on a Linux workstation with 24 GB of RAM. My usual way
> >>> of working with the file is to retrieve, from each cell in the 4th
> >>> column of the table, the same row of the 133x6 matrix.
> >>> I store the information in a numpy array with shape 8e6x6. In this
> >>> process I use almost the whole workstation memory.
> >>> Is there any way to optimize the memory usage?
> >>
> >> I'm not sure I understand.
> >> My impression is that you do not actually need to have the entire
> >> 8e6x6 matrix in memory at once, is that correct?
> >>
> >> In that case you could simply try to load less data using something like
> >>
> >> data = table.read(0, 5e7, field='name of the 4-th field')
> >> process(data)
> >> data = table.read(5e7, 1e8, field='name of the 4-th field')
> >> process(data)
> >>
> >> See also [1] and [2].
> >>
> >> Does it make sense to you?
> >>
> >> [1] http://pytables.github.com/usersguide/libref.html#table-methods-reading
> >> [2] http://pytables.github.com/usersguide/libref.html#tables.Table.read
> >>
> >>> If not, I have been thinking about splitting the file.
> >>>
> >>> Thank you,
> >>>
> >>> Juanma
> >>
> >> cheers
> >>
> >> --
> >> Antonio Valentino
>
> --
> Antonio Valentino
>
> ------------------------------------------------------------------------------
> Live Security Virtual Conference
> Exclusive live event will cover all the ways today's security and
> threat landscape has changed and how IT managers can respond. Discussions
> will include endpoint security, mobile security and the latest in malware
> threats. http://www.accelacomm.com/jaw/sfrnl04242012/114/50122263/
> _______________________________________________
> Pytables-users mailing list
> Pyt...@li...
> https://lists.sourceforge.net/lists/listinfo/pytables-users
>
|
From: Antonio V. <ant...@ti...> - 2012-08-05 20:45:05
|
Hi Juan Manuel,

On 05/08/2012 22:28, Juan Manuel Vázquez Tovar wrote:
> Hi Antonio,
>
> You are right, I don't need to load the entire table into memory.
> The fourth column has multidimensional cells and when I read a single
> row from every cell in the column, I almost fill the workstation memory.
> I didn't expect that process to use so much memory, but the fact is
> that it does.
> Maybe I didn't explain it very well last time.
>
> Thank you,
>
> Juanma
>

Sorry, I still don't understand.
Can you please post a short code snippet that shows exactly how you
read data into your program?

My impression is that somewhere you use some instruction that triggers
loading of unnecessary data into memory.

> 2012/8/5 Antonio Valentino <ant...@ti...>
>
>> Hi Juan Manuel,
>>
>> On 04/08/2012 01:55, Juan Manuel Vázquez Tovar wrote:
>>> Hello all,
>>>
>>> I'm managing a file close to 26 GB in size. Its main structure is a
>>> table with a bit more than 8 million rows. The table is made of four
>>> columns: the first two columns store names, the 3rd one has a 53-item
>>> array in each cell and the last column has a 133x6 matrix in each cell.
>>> I usually work on a Linux workstation with 24 GB of RAM. My usual way
>>> of working with the file is to retrieve, from each cell in the 4th
>>> column of the table, the same row of the 133x6 matrix.
>>> I store the information in a numpy array with shape 8e6x6. In this
>>> process I use almost the whole workstation memory.
>>> Is there any way to optimize the memory usage?
>>
>> I'm not sure I understand.
>> My impression is that you do not actually need to have the entire
>> 8e6x6 matrix in memory at once, is that correct?
>>
>> In that case you could simply try to load less data using something like
>>
>> data = table.read(0, 5e7, field='name of the 4-th field')
>> process(data)
>> data = table.read(5e7, 1e8, field='name of the 4-th field')
>> process(data)
>>
>> See also [1] and [2].
>>
>> Does it make sense to you?
>>
>> [1] http://pytables.github.com/usersguide/libref.html#table-methods-reading
>> [2] http://pytables.github.com/usersguide/libref.html#tables.Table.read
>>
>>> If not, I have been thinking about splitting the file.
>>>
>>> Thank you,
>>>
>>> Juanma
>>
>> cheers
>>
>> --
>> Antonio Valentino

--
Antonio Valentino
|
From: Juan M. V. T. <jmv...@gm...> - 2012-08-05 20:28:17
|
Hi Antonio,

You are right, I don't need to load the entire table into memory.
The fourth column has multidimensional cells and when I read a single
row from every cell in the column, I almost fill the workstation memory.
I didn't expect that process to use so much memory, but the fact is
that it does.
Maybe I didn't explain it very well last time.

Thank you,

Juanma

2012/8/5 Antonio Valentino <ant...@ti...>

> Hi Juan Manuel,
>
> On 04/08/2012 01:55, Juan Manuel Vázquez Tovar wrote:
> > Hello all,
> >
> > I'm managing a file close to 26 GB in size. Its main structure is a
> > table with a bit more than 8 million rows. The table is made of four
> > columns: the first two columns store names, the 3rd one has a 53-item
> > array in each cell and the last column has a 133x6 matrix in each cell.
> > I usually work on a Linux workstation with 24 GB of RAM. My usual way
> > of working with the file is to retrieve, from each cell in the 4th
> > column of the table, the same row of the 133x6 matrix.
> > I store the information in a numpy array with shape 8e6x6. In this
> > process I use almost the whole workstation memory.
> > Is there any way to optimize the memory usage?
>
> I'm not sure I understand.
> My impression is that you do not actually need to have the entire
> 8e6x6 matrix in memory at once, is that correct?
>
> In that case you could simply try to load less data using something like
>
> data = table.read(0, 5e7, field='name of the 4-th field')
> process(data)
> data = table.read(5e7, 1e8, field='name of the 4-th field')
> process(data)
>
> See also [1] and [2].
>
> Does it make sense to you?
>
> [1] http://pytables.github.com/usersguide/libref.html#table-methods-reading
> [2] http://pytables.github.com/usersguide/libref.html#tables.Table.read
>
> > If not, I have been thinking about splitting the file.
> >
> > Thank you,
> >
> > Juanma
>
> cheers
>
> --
> Antonio Valentino
>
|
From: Antonio V. <ant...@ti...> - 2012-08-05 18:53:19
|
Hi Juan Manuel,

On 04/08/2012 01:55, Juan Manuel Vázquez Tovar wrote:
> Hello all,
>
> I'm managing a file close to 26 GB in size. Its main structure is a
> table with a bit more than 8 million rows. The table is made of four
> columns: the first two columns store names, the 3rd one has a 53-item
> array in each cell and the last column has a 133x6 matrix in each cell.
> I usually work on a Linux workstation with 24 GB of RAM. My usual way
> of working with the file is to retrieve, from each cell in the 4th
> column of the table, the same row of the 133x6 matrix.
> I store the information in a numpy array with shape 8e6x6. In this
> process I use almost the whole workstation memory.
> Is there any way to optimize the memory usage?

I'm not sure I understand.
My impression is that you do not actually need to have the entire 8e6x6
matrix in memory at once, is that correct?

In that case you could simply try to load less data using something like

data = table.read(0, 5e7, field='name of the 4-th field')
process(data)
data = table.read(5e7, 1e8, field='name of the 4-th field')
process(data)

See also [1] and [2].

Does it make sense to you?

[1] http://pytables.github.com/usersguide/libref.html#table-methods-reading
[2] http://pytables.github.com/usersguide/libref.html#tables.Table.read

> If not, I have been thinking about splitting the file.
>
> Thank you,
>
> Juanma

cheers

--
Antonio Valentino
|
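The chunked-read pattern suggested in this thread can be sketched end to end. In this sketch a NumPy array stands in for the on-disk 'loads' column and the sizes are scaled down from the thread's 8e6 rows; with a real PyTables table, each slab would come from table.read(start, stop, field='loads') instead of slicing:

```python
import numpy as np

nrows, chunk = 2_000, 500   # scaled down from the thread's 8e6 rows
i = 5                       # row of each 133x6 matrix to keep

# Stand-in for the table's 'loads' column (one 133x6 matrix per row).
loads = np.arange(nrows * 133 * 6, dtype=np.float64).reshape(nrows, 133, 6)

# Preallocate the (nrows, 6) result and fill it chunk by chunk, so only
# one (chunk, 133, 6) slab of matrices is resident at a time.
out = np.empty((nrows, 6))
for start in range(0, nrows, chunk):
    stop = min(start + chunk, nrows)
    block = loads[start:stop]         # real code: table.read(start, stop, field='loads')
    out[start:stop] = block[:, i, :]  # keep only row i of each matrix

print(np.array_equal(out, loads[:, i, :]))  # → True
```

With the thread's real sizes, each slab is chunk x 133 x 6 doubles, so peak memory is governed by the chunk size rather than by the full 26 GB column.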