From: Anthony S. <sc...@gm...> - 2013-07-17 21:59:44

Hi Pushkar,

I agree with Antonio. You should load your data with NumPy functions and
then write back out to PyTables. This is the fastest way to do things.

Be Well
Anthony

On Wed, Jul 17, 2013 at 2:12 PM, Antonio Valentino <ant...@ti...> wrote:
> numpy has some tools for loading data from csv files like loadtxt [1],
> genfromtxt [2] and other variants. None of them is OK for you?
>
> [1] http://docs.scipy.org/doc/numpy/reference/generated/numpy.loadtxt.html#numpy.loadtxt
> [2] http://docs.scipy.org/doc/numpy/reference/generated/numpy.genfromtxt.html#numpy.genfromtxt
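A minimal sketch of the NumPy route suggested above, using the PyTables 2.x
API seen elsewhere in this archive. The two-column dtype and the file names
are hypothetical stand-ins for the real 100-plus-column schema:

import numpy as np
import tables

# Hypothetical schema; extend the dtype to cover all real columns.
dtype = np.dtype([('id', np.int64), ('value', np.float64)])

# genfromtxt parses and type-checks in compiled NumPy code;
# filling_values supplies a default wherever a field fails to convert.
data = np.genfromtxt('input.csv', delimiter=',', dtype=dtype,
                     filling_values=0)

h5file = tables.openFile('output.h5', mode='w')
# A NumPy structured array can serve directly as the table description,
# and its contents are appended in a single call.
table = h5file.createTable('/', 'records', data)
h5file.close()

This keeps the per-row work out of the Python loop entirely, which is where
the reported factor of ~50 was being lost.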
From: Valeriy S. <sok...@gm...> - 2013-07-17 20:15:02

Not sure if the quoted message was delivered to the list (maybe because I
was not registered on this list), so reposting it this way...

On Fri, Jul 12, 2013 at 5:40 PM, Valeriy Sokolov <sok...@gm...> wrote:
> Hi,
>
> I am trying to store lots of small (~2Kb) files in filenodes with
> PyTables, and I ran into trouble with size overhead.
>
> 200 such files, which consume ~2Mb in total on the filesystem, take 14Mb
> in the .h5 file produced by PyTables. My experiments show that if I
> create 200 file nodes and store 1 byte in each, I get an .h5 of 14Mb.
> From roughly 200Kb per file node onwards, size increases linearly: 400Kb
> per node leads to 89Mb, and 800Kb per node leads to 164Mb.
>
> But I would like to store ~2Kb per node, and the current overhead (about
> 70Kb per file node) is pretty huge.
>
> Could you please help me with a work-around for this issue?
>
> Thank you in advance.
>
> --
> Best regards,
> Valeriy Sokolov.

--
Best regards,
Valeriy Sokolov.
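A hedged sketch of one possible work-around, again using the PyTables 2.x
API: instead of one filenode per payload, pack all the small payloads into
a single VLArray, one row per file, so the fixed per-node HDF5 cost is paid
once rather than 200 times. The input file names here are hypothetical:

import tables

h5file = tables.openFile('blobs.h5', mode='w')

# One enlargeable node holds every payload; each row is one
# variable-length byte string.
blobs = h5file.createVLArray(h5file.root, 'blobs', tables.VLStringAtom())

for fname in ('a.txt', 'b.txt'):
    blobs.append(open(fname, 'rb').read())

data = h5file.root.blobs[0]   # read payload 0 back as a string
h5file.close()

The trade-off is that the rows are no longer exposed through the file-like
interface that filenode provides.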
From: Antonio V. <ant...@ti...> - 2013-07-17 19:13:02

Hi Pushkar,

Il 17/07/2013 19:28, Pushkar Raj Pande ha scritto:
> I am trying to figure out the best way to bulk load data into pytables.
> [...]

numpy has some tools for loading data from csv files like loadtxt [1],
genfromtxt [2] and other variants. None of them is OK for you?

[1] http://docs.scipy.org/doc/numpy/reference/generated/numpy.loadtxt.html#numpy.loadtxt
[2] http://docs.scipy.org/doc/numpy/reference/generated/numpy.genfromtxt.html#numpy.genfromtxt

cheers

--
Antonio Valentino
From: Pushkar R. P. <top...@gm...> - 2013-07-17 17:29:01

Hi all,

I am trying to figure out the best way to bulk load data into PyTables.
This question may have already been answered, but I couldn't find what I
was looking for.

The source data is in the form of CSV, which may require parsing, type
checking, and setting default values when a field doesn't conform to the
type of its column. There are over 100 columns in a record. Doing this in
a Python loop for each row is very slow compared to just fetching the rows
from one PyTables file and writing them to another; the difference is
almost a factor of ~50.

I believe that if I load the data using a C procedure that does the
parsing and builds the records to write to PyTables, I can get close to
the speed of just copying rows from one PyTables file to another. But
maybe something simple and better already exists. Can someone please
advise? If a C procedure is what I should write, can someone point me to
some examples or snippets that I can refer to in putting this together?

Thanks,
Pushkar
From: Anthony S. <sc...@gm...> - 2013-07-12 18:40:58

Hi Robert,

Glad these materials can be helpful. (Note: these questions really should
be asked on the pytables-users mailing list, CC'd here, so please join
that list: https://lists.sourceforge.net/lists/listinfo/pytables-users)

On Fri, Jul 12, 2013 at 12:48 PM, Robert Nelson <rrn...@at...> wrote:
> Dr. Scopatz,
>
> I came across your SciPy 2012 "HDF5 is for lovers" video and thought you
> might be able to help me.
>
> I'm trying to read large (>1GB) HDF files and do multidimensional
> indexing (with repeated values) on them. I saw a post of yours
> <http://www.mail-archive.com/pyt...@li.../msg02586.html>
> from over a year ago saying that the best solution would be to convert
> it to a NumPy array, but this takes too long.

I think the strategy is the same as before. Ask (to the best of my
recollection) did not open an issue, so no changes have been made to
PyTables to handle this. Also, in this strategy you should only be loading
the indices to start with. I doubt (though I could be wrong) that you have
1 GB worth of index data alone. The whole idea here is to do a unique
(set) and a sort operation on the much smaller index data AND THEN use
fancy indexing to pull the actual data back out.

As always, some sample code and a sample file would be extremely helpful.
I don't think I can do much more for you without these.

Be Well
Anthony

> Have there been any updates in PyTables that would make this possible?
>
> Thank you!
>
> Robert Nelson
> Colorado State University
> Rob...@gm...
> 763-354-8411
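A sketch of that unique/sort/fancy-index strategy. The node and index names
are hypothetical, and it assumes a PyTables version whose arrays accept
NumPy-style point selection along the first axis; on versions that do not,
the point selection can be replaced by a read() over a bounding slice:

import numpy as np
import tables

h5file = tables.openFile('big.h5', mode='r')
node = h5file.root.data              # large on-disk array

# Index data with repeats: much smaller than the array itself.
idx = np.array([5, 2, 5, 9, 2])

uniq = np.unique(idx)                # unique() also sorts its result
subset = node[uniq, ...]             # one read of only the needed rows

# searchsorted maps each original index to its position in uniq,
# reintroducing the repeats cheaply, in memory.
result = subset[np.searchsorted(uniq, idx)]
h5file.close()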
From: Anthony S. <sc...@gm...> - 2013-07-12 16:13:25

On Fri, Jul 12, 2013 at 1:51 AM, Mathieu Dubois <dub...@ya...> wrote:
> Hi Anthony,
>
> Thank you very much for your answer (it works). I will try to remodel my
> code around this trick but I'm not sure it's possible because I use a
> framework that needs arrays.

I think that this method still works. You can always send back a numpy
array to the main process that you pull out from a subprocess.

> Can somebody explain what is going on? I was thinking that PyTables
> keeps a weakref to the file for lazy loading but I'm not sure.
>
> In any case, the PyTables community is very helpful.

Glad to help!

Be Well
Anthony

> [...]
From: Mathieu D. <dub...@ya...> - 2013-07-12 06:51:34

Hi Anthony,

Thank you very much for your answer (it works). I will try to remodel my
code around this trick, but I'm not sure it's possible because I use a
framework that needs arrays.

Can somebody explain what is going on? I was thinking that PyTables keeps
a weakref to the file for lazy loading, but I'm not sure.

In any case, the PyTables community is very helpful.

Thanks,
Mathieu

Le 12/07/2013 00:44, Anthony Scopatz a écrit :
> Hi Mathieu,
>
> I think you should try opening a new file handle per process. The
> following works for me on v3.0:
> [...]
From: Anthony S. <sc...@gm...> - 2013-07-11 22:44:40

Hi Mathieu,

I think you should try opening a new file handle per process. The
following works for me on v3.0:

import tables
import random
import multiprocessing

# Use multiprocessing to perform a simple computation (column average)
def f(filename):
    h5file = tables.openFile(filename, mode='r')
    name = multiprocessing.current_process().name
    column = random.randint(0, 10)
    print '%s use column %i' % (name, column)
    rtn = h5file.root.X[:, column].mean()
    h5file.close()
    return rtn

p = multiprocessing.Pool(2)
col_mean = p.map(f, ['test.hdf5', 'test.hdf5', 'test.hdf5'])

Be well
Anthony

On Thu, Jul 11, 2013 at 3:43 PM, Mathieu Dubois <dub...@ya...> wrote:
> Thanks for your answer. Maybe you can point me to a working example?
>
> Here is the script that I have used to generate the data:
> [...]
>
> I hope it's not a stupid mistake. I am using PyTables 2.3.1 on Ubuntu
> 12.04 (libhdf5 is 1.8.4patch1).
From: Mathieu D. <dub...@ya...> - 2013-07-11 20:43:48

Le 11/07/2013 21:56, Anthony Scopatz a écrit :
> I have used multiprocessing and files opened in read mode many times so
> I am not sure what is going on here.

Thanks for your answer. Maybe you can point me to a working example?

> Could you provide the test.hdf5 file so that we could try to reproduce
> this?

Here is the script that I have used to generate the data:

import tables
import numpy

# Create data & store it
n_features = 10
n_obs = 100
X = numpy.random.rand(n_obs, n_features)

h5file = tables.openFile('test.hdf5', mode='w')
Xatom = tables.Atom.from_dtype(X.dtype)
Xhdf5 = h5file.createCArray(h5file.root, 'X', Xatom, X.shape)
Xhdf5[:] = X
h5file.close()

I hope it's not a stupid mistake. I am using PyTables 2.3.1 on Ubuntu
12.04 (libhdf5 is 1.8.4patch1).

> Only the slice that you ask for is brought into memory and it is
> returned as a non-view numpy array.

OK. I will be careful about that.
From: Anthony S. <sc...@gm...> - 2013-07-11 19:57:20

On Thu, Jul 11, 2013 at 2:49 PM, Mathieu Dubois <dub...@ya...> wrote:
> Hello,
>
> I wanted to use PyTables in conjunction with multiprocessing for some
> embarrassingly parallel tasks. However, it seems that it is not
> possible. [...]
>
> PicklingError: Can't pickle <type 'weakref'>: attribute lookup
> __builtin__.weakref failed
>
> I have googled for weakref and pickle but can't find a solution.
> Any help?

Hello Mathieu,

I have used multiprocessing and files opened in read mode many times so I
am not sure what is going on here. Could you provide the test.hdf5 file so
that we could try to reproduce this?

> By the way, I have noticed that by slicing a CArray, I get a numpy array
> (I created the HDF5 file with numpy). Therefore, everything is copied to
> memory. Is there a way to avoid that?

Only the slice that you ask for is brought into memory, and it is returned
as a non-view numpy array.

Be Well
Anthony
From: Mathieu D. <dub...@ya...> - 2013-07-11 19:49:33

Hello,

I wanted to use PyTables in conjunction with multiprocessing for some
embarrassingly parallel tasks. However, it seems that it is not possible.
In the following (very stupid) example, X is a CArray of size (100, 10)
stored in the file test.hdf5:

import random
import tables
import multiprocessing

n_features = 10

# Reload the data
h5file = tables.openFile('test.hdf5', mode='r')
X = h5file.root.X

# Use multiprocessing to perform a simple computation (column average)
def f(X):
    name = multiprocessing.current_process().name
    column = random.randint(0, n_features)
    print '%s use column %i' % (name, column)
    return X[:, column].mean()

p = multiprocessing.Pool(2)
col_mean = p.map(f, [X, X, X])

Executing it gives the following error:

Exception in thread Thread-2:
Traceback (most recent call last):
  File "/usr/lib/python2.7/threading.py", line 551, in __bootstrap_inner
    self.run()
  File "/usr/lib/python2.7/threading.py", line 504, in run
    self.__target(*self.__args, **self.__kwargs)
  File "/usr/lib/python2.7/multiprocessing/pool.py", line 319, in _handle_tasks
    put(task)
PicklingError: Can't pickle <type 'weakref'>: attribute lookup __builtin__.weakref failed

I have googled for weakref and pickle but can't find a solution. Any help?

By the way, I have noticed that by slicing a CArray, I get a numpy array
(I created the HDF5 file with numpy). Therefore, everything is copied to
memory. Is there a way to avoid that?

Mathieu
From: Tony Yu <ts...@gm...> - 2013-07-09 19:34:27

On Tue, Jul 9, 2013 at 1:58 PM, Anthony Scopatz <sc...@gm...> wrote:
> I have made my comments on the issue, but the short version is that I
> don't think this is a bug, iteration needs a rewrite, and you should use
> iterrows().
>
> PS you should upgrade to 3.0 and use the new API :)

Hey Anthony,

Thanks for your thorough response and explanation on the ticket. I closed
the ticket, and I'll be using `iterrows` instead of `islice` from now on.

I'll have to wait a bit to upgrade to 3.0, but I'm looking forward to
getting rid of all the camelCase.

Cheers!
-Tony
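For reference, a small sketch of the behavioral difference, reusing the
test.h5 file from the demonstration script later in this thread. Unlike
islice over the node's shared iterator, iterrows() takes absolute bounds:

import itertools
import tables

h5 = tables.openFile('test.h5')

for i in range(5):
    # islice resumes the node's internal iterator, so each pass
    # starts where the previous one stopped:
    print list(itertools.islice(h5.root.array, 0, 10))

for i in range(5):
    # iterrows takes explicit start/stop, so every pass yields rows 0..9:
    print list(h5.root.array.iterrows(0, 10))

h5.close()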
From: Anthony S. <sc...@gm...> - 2013-07-09 18:59:20

On Tue, Jul 9, 2013 at 8:57 AM, Tony Yu <ts...@gm...> wrote:
> Thanks for your quick reply. Ticket filed here:
>
> https://github.com/PyTables/PyTables/issues/267

Thanks Tony,

I have made my comments on the issue, but the short version is that I
don't think this is a bug, iteration needs a rewrite, and you should use
iterrows().

Be Well
Anthony

PS you should upgrade to 3.0 and use the new API :)
From: Tony Yu <ts...@gm...> - 2013-07-09 13:58:30

On Tue, Jul 9, 2013 at 12:58 AM, Antonio Valentino <ant...@ti...> wrote:
> Yes, this is a bug IMO. Thank you for reporting and thank you for the
> small demonstration script.
>
> Can you please file a bug report on github [1]?
> Please also add info about the PyTables version you used for the test.
>
> [1] https://github.com/PyTables/PyTables/issues

Thanks for your quick reply. Ticket filed here:

https://github.com/PyTables/PyTables/issues/267

Best,
-Tony
From: Antonio V. <ant...@ti...> - 2013-07-09 05:59:20

Hi Tony,

Il giorno 09/lug/2013, alle ore 06:38, Tony Yu <ts...@gm...> ha scritto:
> Hi,
>
> I ran into a subtle, unexpected issue while using `itertools.islice`. I
> wanted to pass slices of an array for processing without actually
> reading the entire array, and I wanted the processing function to know
> nothing about how I'm taking that slice. To that end, I had a loop that
> sliced the array using `itertools.islice` and called the function on
> each slice. Instead of returning the slice I specified, `islice`
> treated the previous slice's end as the starting point of the next
> slice.
>
> That description is a bit confusing, but the example below (along with
> the attached test data) should illustrate the point. Maybe I'm missing
> something, but the only work-around I found was to set a private flag
> (e.g. `h5.root.array._init = False`) on each call to `islice` to reset
> the counter used in `__iter__`.
>
> I'm not sure if this is expected behavior or not, but it does differ
> from how `islice` works on numpy arrays (as demonstrated in the example
> below). I googled and nothing similar came up, so I thought I'd post
> here.
>
> Best,
> -Tony
>
> import tables
> import itertools
> import numpy as np
>
> h5 = tables.openFile('test.h5')
> array = np.arange(100)
> for i in range(5):
>     # NumPy array slice always returns 0..10
>     print list(itertools.islice(array, 0, 10))
>     # PyTables array slice shifts with each iteration
>     print list(itertools.islice(h5.root.array, 0, 10))
> h5.close()

Yes, this is a bug IMO. Thank you for reporting and thank you for the
small demonstration script.

Can you please file a bug report on github [1]?
Please also add info about the PyTables version you used for the test.

[1] https://github.com/PyTables/PyTables/issues

--
Antonio Valentino
From: Anthony S. <sc...@gm...> - 2013-07-05 23:54:22

Thanks Mathieu! I am glad this is working for you now. File this one under
"Mysterious Errors of the Universe" :).

Be Well
Anthony

On Fri, Jul 5, 2013 at 6:51 PM, Mathieu Dubois <dub...@ya...> wrote:
> Hi,
>
> Sorry for the late response.
>
> First of all, I have managed to achieve what I wanted to do differently.
> [...]
From: Mathieu D. <dub...@ya...> - 2013-07-05 23:52:07

Hi,

Sorry for the late response.

First of all, I have managed to achieve what I wanted to do differently.

The code Francesc sent works well (I had to adapt it because I use version
2.3.1 under Ubuntu 12.04).

I was also able to reproduce something similar with a class like this
(copied & pasted from the tutorial):

import tables as tb
import numpy as np

class Subject(tb.IsDescription):
    # Subject information
    Id = tb.UInt16Col()
    Image = tb.Float32Col(shape=(121, 145, 121))

h5file = tb.openFile("tutorial1.h5", mode="w", title="Test file")
group = h5file.createGroup("/", 'subject', 'Subject information')
table = h5file.createTable(group, 'readout', Subject, "Readout example")

subject = table.row
for i in xrange(10):
    subject['Id'] = i
    subject['Image'] = np.ones((121, 145, 121))
    subject.append()

This code works well too.

So I don't really know why nothing was working yesterday: it was the same
class and a very similar program. I will try to investigate this later.

Thanks for everything,
Mathieu

Le 05/07/2013 16:54, Anthony Scopatz a écrit :
> Hi Francesc,
>
> I disagree that this shape is too large for a table. [...]
From: Anthony S. <sc...@gm...> - 2013-07-05 14:55:22

On Fri, Jul 5, 2013 at 8:40 AM, Francesc Alted <fa...@gm...> wrote:
> This is a bit large for a row in the Table object. My recommendation
> for these cases is to use an associated EArray with shape
> (0, 121, 145, 121) and then append the images there. [...]

Hi Francesc,

I disagree that this shape is too large for a table. Here is a minimal
example that works for me:

import tables as tb
import numpy as np

images = np.ones(100, dtype=[('id', np.uint16),
                             ('image', np.float32, (121, 145, 121))])

with tb.open_file('temp.h5', 'w') as f:
    f.create_table('/', 'images', images)

I think that there is something else going on with the initialization, but
Mathieu hasn't given us enough information to figure it out =/. A minimal
failing script would be super helpful here!

(BTW Mathieu, Tables can also take advantage of compression. Though
Francesc's solution is nicer for a lot of reasons too.)

Be Well
Anthony
From: Francesc A. <fa...@gm...> - 2013-07-05 13:40:12

On 7/5/13 1:33 AM, Mathieu Dubois wrote:
>> This shouldn't be the case. What is the value of IMAGE_SIZE?
>
> IMAGE_SIZE is a tuple containing (121, 145, 121).

This is a bit large for a row in the Table object. My recommendation for
these cases is to use an associated EArray with shape (0, 121, 145, 121)
and then append the images there. You can always refer to an image by
issuing a __getitem__() operation on the EArray object with the index of
the row in the table. Easy as pie, and you will allow the compression
library (in case you are using compression) to work much more efficiently
than it would on the table.

HTH,

-- Francesc Alted
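A minimal sketch of that layout, using the PyTables 2.x API seen elsewhere
in this archive; the file name, Filters settings, and single-column
Subject class are illustrative stand-ins:

import numpy as np
import tables

class Subject(tables.IsDescription):
    Id = tables.UInt16Col()    # the other subject columns go here

h5file = tables.openFile('subjects.h5', mode='w')
table = h5file.createTable(h5file.root, 'subjects', Subject)

# Enlargeable along the first axis: one slot per stored image.
images = h5file.createEArray(h5file.root, 'images', tables.Float32Atom(),
                             shape=(0, 121, 145, 121),
                             filters=tables.Filters(complevel=5))

img = np.ones((121, 145, 121), dtype=np.float32)   # stand-in image
images.append(img[np.newaxis])   # append() expects a (1, ...) block

row = table.row
row['Id'] = 0                    # row n of the table pairs with images[n]
row.append()
table.flush()

first_image = h5file.root.images[0]   # fetch an image by row index
h5file.close()

Keeping the row order of the table aligned with the first axis of the
EArray preserves the subject-to-image relation without storing the big
array inside the table row.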
From: Mathieu D. <dub...@ya...> - 2013-07-05 05:33:41

Le 05/07/2013 00:31, Anthony Scopatz a écrit :
> On Thu, Jul 4, 2013 at 4:13 PM, Mathieu Dubois <dub...@ya...> wrote:
>> Hello,
>>
>> I'm a beginner with PyTables.
>>
>> I wanted to store a database in an HDF5 file using PyTables. The DB is
>> made of a CSV file (which contains the subject information) and a lot
>> of images (I work on MRI, so the images are 3-dimensional float32
>> arrays of shape (121, 145, 121)). The relation is very simple: there
>> are 3 images per subject.
>>
>> My first idea was to create a class Subject like this:
>>
>> class Subject(tables.IsDescription):
>>     # Subject information
>>     Id = tables.UInt16Col()
>>     ...
>>     Image = tables.Float32Col(shape=IMAGE_SIZE)
>>
>> And then proceed like in the tutorial (open a file, create a group and
>> a table associated to the Subject class, and then append data to this
>> table).
>>
>> Unfortunately I got an error when creating the table (even before
>> inserting data):
>>
>> HDF5-DIAG: Error detected in HDF5 (1.8.4-patch1) thread 140612945950464:
>>   #000: ../../../src/H5Ddeprec.c line 170 in H5Dcreate1(): unable to create dataset
>>     major: Dataset
>>     minor: Unable to initialize object
>>   #001: ../../../src/H5Dint.c line 428 in H5D_create_named(): unable to create and link to dataset
>>     major: Dataset
>>     minor: Unable to initialize object
>>   #002: ../../../src/H5L.c line 1639 in H5L_link_object(): unable to create new link to object
>>     major: Links
>>     minor: Unable to initialize object
>>   #003: ../../../src/H5L.c line 1862 in H5L_create_real(): can't insert link
>>     major: Symbol table
>>     minor: Unable to insert object
>>   #004: ../../../src/H5Gtraverse.c line 877 in H5G_traverse(): internal path traversal failed
>>     major: Symbol table
>>     minor: Object not found
>>   #005: ../../../src/H5Gtraverse.c line 703 in H5G_traverse_real(): traversal operator failed
>>     major: Symbol table
>>     minor: Callback failed
>>   #006: ../../../src/H5L.c line 1685 in H5L_link_cb(): unable to create object
>>     major: Object header
>>     minor: Unable to initialize object
>>   #007: ../../../src/H5O.c line 2677 in H5O_obj_create(): unable to open object
>>     major: Object header
>>     minor: Can't open object
>>   #008: ../../../src/H5Doh.c line 296 in H5O_dset_create(): unable to create dataset
>>     major: Dataset
>>     minor: Unable to initialize object
>>   #009: ../../../src/H5Dint.c line 1034 in H5D_create(): can't update the metadata cache
>>     major: Dataset
>>     minor: Unable to initialize object
>>   #010: ../../../src/H5Dint.c line 799 in H5D_update_oh_info(): unable to update new fill value header message
>>     major: Dataset
>>     minor: Unable to initialize object
>>   #011: ../../../src/H5Omessage.c line 188 in H5O_msg_append_oh(): unable to create new message in header
>>     major: Attribute
>>     minor: Unable to insert object
>>   #012: ../../../src/H5Omessage.c line 228 in H5O_msg_append_real(): unable to create new message
>>     major: Object header
>>     minor: No space available for allocation
>>   #013: ../../../src/H5Omessage.c line 1940 in H5O_msg_alloc(): unable to allocate space for message
>>     major: Object header
>>     minor: Unable to initialize object
>>   #014: ../../../src/H5Oalloc.c line 1032 in H5O_alloc(): object header message is too large
>>     major: Object header
>>     minor: Unable to initialize object
>>
>> Traceback (most recent call last):
>>   File "00_build_dataset.tmp.py", line 52, in <module>
>>     dump_in_hdf5(**vars(args))
>>   File "00_build_dataset.tmp.py", line 32, in dump_in_hdf5
>>     data_api.Subject)
>>   File "/usr/lib/python2.7/dist-packages/tables/file.py", line 770, in createTable
>>     chunkshape=chunkshape, byteorder=byteorder)
>>   File "/usr/lib/python2.7/dist-packages/tables/table.py", line 832, in __init__
>>     byteorder, _log)
>>   File "/usr/lib/python2.7/dist-packages/tables/leaf.py", line 291, in __init__
>>     super(Leaf, self).__init__(parentNode, name, _log)
>>   File "/usr/lib/python2.7/dist-packages/tables/node.py", line 296, in __init__
>>     self._v_objectID = self._g_create()
>>   File "/usr/lib/python2.7/dist-packages/tables/table.py", line 983, in _g_create
>>     self._v_new_title, self.filters.complib or '', obversion)
>>   File "tableExtension.pyx", line 195, in tables.tableExtension.Table._createTable (tables/tableExtension.c:2181)
>> tables.exceptions.HDF5ExtError: Problems creating the table
>>
>> I think that the size of the column is too large (if I remove the Image
>> field, everything works perfectly).
>
> Hi Mathieu,
>
> This shouldn't be the case. What is the value of IMAGE_SIZE?

IMAGE_SIZE is a tuple containing (121, 145, 121).

> Be Well
> Anthony

>> Therefore what is the best way to store the images (while keeping the
>> relation)? I have read various posts about this subject on the web but
>> could not find a definitive answer (the most helpful was
>> http://stackoverflow.com/questions/8843062/python-how-to-store-a-numpy-multidimensional-array-in-pytables).
>>
>> I was thinking of creating an extensible array and storing each image
>> in the same order as the subjects. However, I would feel more
>> comfortable if the subject Id could be inserted too (to join the
>> tables).
>>
>> Any help?
>>
>> Mathieu
From: Tim B. <tim...@ma...> - 2013-07-04 23:11:45
|
Hi Mathieu,

As Anthony indicates, it's hard to discern the exact issue when you don't
provide much in the way of code to look at. If it helps, here is an example
of creating an HDF5 file with a float32 array of the dimensions you
specified. The shape value should be a tuple.

>>> import numpy as np
>>> import tables
>>> x = np.random.random((121, 145, 121))
>>> x.shape
(121, 145, 121)
>>> f = tables.openFile('patient.h5', 'w')
>>> atom = tables.Float32Atom()
>>> image1 = f.createCArray(f.root, 'image1', atom, x.shape)
>>> image1[:] = x
>>> f.close()

You could have an HDF5 file per patient and keep the three float32 arrays
and the CSV data as separate nodes. I would suggest that you have the HDF5
structure and the resulting PyTables code worked out before thinking about
how to wrap it in an object.

Cheers,
Tim

On 05/07/2013, at 7:13 AM, Mathieu Dubois <dub...@ya...> wrote:

> Hello,
>
> I'm a beginner with PyTables.
>
> I wanted to store a database in an HDF5 file using PyTables. The DB is
> made of a CSV file (which contains the subject information) and a lot of
> images (I work on MRI, so the images are 3-dimensional float32 arrays of
> shape (121, 145, 121)). The relation is very simple: there are 3 images
> per subject.
>
> My first idea was to create a class Subject like this:
> class Subject(tables.IsDescription):
>     # Subject information
>     Id = tables.UInt16Col()
>     ...
>     Image = tables.Float32Col(shape=IMAGE_SIZE)
>
> And then proceed as in the tutorial (open a file, create a group and a
> table associated with the Subject class, and then append data to this
> table).
>
> Unfortunately I got an error when creating the table (even before
> inserting data):
> HDF5-DIAG: Error detected in HDF5 (1.8.4-patch1) thread 140612945950464:
>   #000: ../../../src/H5Ddeprec.c line 170 in H5Dcreate1(): unable to
<snip>
|
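Building on Tim's suggestion, a minimal sketch of a per-patient file with the three image arrays plus the subject's CSV fields kept as attributes on the root group. The node and attribute names are illustrative assumptions, not part of the thread, and the 2.x-style method names follow Tim's example:

import numpy as np
import tables

IMAGE_SIZE = (121, 145, 121)

f = tables.openFile('patient_0001.h5', 'w')   # PyTables 2.x-style API
atom = tables.Float32Atom()
for name in ('image1', 'image2', 'image3'):
    arr = f.createCArray(f.root, name, atom, IMAGE_SIZE)
    arr[:] = np.random.random(IMAGE_SIZE).astype('float32')  # placeholder data

# Subject information from the CSV, stored as root-group attributes
f.root._v_attrs.subject_id = 1
f.root._v_attrs.age = 42
f.close()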
From: Anthony S. <sc...@gm...> - 2013-07-04 22:31:50
|
On Thu, Jul 4, 2013 at 4:13 PM, Mathieu Dubois <dub...@ya...> wrote:

> Hello,
>
> I'm a beginner with PyTables.
>
> I wanted to store a database in an HDF5 file using PyTables. The DB is
> made of a CSV file (which contains the subject information) and a lot of
> images (I work on MRI, so the images are 3-dimensional float32 arrays of
> shape (121, 145, 121)). The relation is very simple: there are 3 images
> per subject.
>
> My first idea was to create a class Subject like this:
>
> class Subject(tables.IsDescription):
>     # Subject information
>     Id = tables.UInt16Col()
>     ...
>     Image = tables.Float32Col(shape=IMAGE_SIZE)
>
> And then proceed as in the tutorial (open a file, create a group and a
> table associated with the Subject class, and then append data to this
> table).
>
> Unfortunately I got an error when creating the table (even before
> inserting data):
>
> HDF5-DIAG: Error detected in HDF5 (1.8.4-patch1) thread 140612945950464:
>   #000: ../../../src/H5Ddeprec.c line 170 in H5Dcreate1(): unable to create dataset
> <snip>
>   #014: ../../../src/H5Oalloc.c line 1032 in H5O_alloc(): object header message is too large
>     major: Object header
>     minor: Unable to initialize object
> <snip>
> tables.exceptions.HDF5ExtError: Problems creating the table
>
> I think that the size of the column is too large (if I remove the Image
> field, everything works perfectly).

Hi Mathieu,

This shouldn't be the case. What is the value of IMAGE_SIZE?

Be Well
Anthony

> Therefore, what is the best way to store the images (while keeping the
> relation)? I have read various posts about this subject on the web but
> could not find a definitive answer (the most helpful was
> http://stackoverflow.com/questions/8843062/python-how-to-store-a-numpy-multidimensional-array-in-pytables).
>
> I was thinking of creating an extensible array and storing each image in
> the same order as the subjects. However, I would feel more comfortable if
> the subject Id could be inserted too (to join the tables).
>
> Any help?
>
> Mathieu
|
From: Mathieu D. <dub...@ya...> - 2013-07-04 21:14:08
|
Hello,

I'm a beginner with PyTables.

I wanted to store a database in an HDF5 file using PyTables. The DB is made
of a CSV file (which contains the subject information) and a lot of images
(I work on MRI, so the images are 3-dimensional float32 arrays of shape
(121, 145, 121)). The relation is very simple: there are 3 images per
subject.

My first idea was to create a class Subject like this:

class Subject(tables.IsDescription):
    # Subject information
    Id = tables.UInt16Col()
    ...
    Image = tables.Float32Col(shape=IMAGE_SIZE)

And then proceed as in the tutorial (open a file, create a group and a
table associated with the Subject class, and then append data to this
table).

Unfortunately I got an error when creating the table (even before inserting
data):

HDF5-DIAG: Error detected in HDF5 (1.8.4-patch1) thread 140612945950464:
  #000: ../../../src/H5Ddeprec.c line 170 in H5Dcreate1(): unable to create dataset
    major: Dataset
    minor: Unable to initialize object
  #001: ../../../src/H5Dint.c line 428 in H5D_create_named(): unable to create and link to dataset
    major: Dataset
    minor: Unable to initialize object
  #002: ../../../src/H5L.c line 1639 in H5L_link_object(): unable to create new link to object
    major: Links
    minor: Unable to initialize object
  #003: ../../../src/H5L.c line 1862 in H5L_create_real(): can't insert link
    major: Symbol table
    minor: Unable to insert object
  #004: ../../../src/H5Gtraverse.c line 877 in H5G_traverse(): internal path traversal failed
    major: Symbol table
    minor: Object not found
  #005: ../../../src/H5Gtraverse.c line 703 in H5G_traverse_real(): traversal operator failed
    major: Symbol table
    minor: Callback failed
  #006: ../../../src/H5L.c line 1685 in H5L_link_cb(): unable to create object
    major: Object header
    minor: Unable to initialize object
  #007: ../../../src/H5O.c line 2677 in H5O_obj_create(): unable to open object
    major: Object header
    minor: Can't open object
  #008: ../../../src/H5Doh.c line 296 in H5O_dset_create(): unable to create dataset
    major: Dataset
    minor: Unable to initialize object
  #009: ../../../src/H5Dint.c line 1034 in H5D_create(): can't update the metadata cache
    major: Dataset
    minor: Unable to initialize object
  #010: ../../../src/H5Dint.c line 799 in H5D_update_oh_info(): unable to update new fill value header message
    major: Dataset
    minor: Unable to initialize object
  #011: ../../../src/H5Omessage.c line 188 in H5O_msg_append_oh(): unable to create new message in header
    major: Attribute
    minor: Unable to insert object
  #012: ../../../src/H5Omessage.c line 228 in H5O_msg_append_real(): unable to create new message
    major: Object header
    minor: No space available for allocation
  #013: ../../../src/H5Omessage.c line 1940 in H5O_msg_alloc(): unable to allocate space for message
    major: Object header
    minor: Unable to initialize object
  #014: ../../../src/H5Oalloc.c line 1032 in H5O_alloc(): object header message is too large
    major: Object header
    minor: Unable to initialize object
Traceback (most recent call last):
  File "00_build_dataset.tmp.py", line 52, in <module>
    dump_in_hdf5(**vars(args))
  File "00_build_dataset.tmp.py", line 32, in dump_in_hdf5
    data_api.Subject)
  File "/usr/lib/python2.7/dist-packages/tables/file.py", line 770, in createTable
    chunkshape=chunkshape, byteorder=byteorder)
  File "/usr/lib/python2.7/dist-packages/tables/table.py", line 832, in __init__
    byteorder, _log)
  File "/usr/lib/python2.7/dist-packages/tables/leaf.py", line 291, in __init__
    super(Leaf, self).__init__(parentNode, name, _log)
  File "/usr/lib/python2.7/dist-packages/tables/node.py", line 296, in __init__
    self._v_objectID = self._g_create()
  File "/usr/lib/python2.7/dist-packages/tables/table.py", line 983, in _g_create
    self._v_new_title, self.filters.complib or '', obversion )
  File "tableExtension.pyx", line 195, in tables.tableExtension.Table._createTable (tables/tableExtension.c:2181)
tables.exceptions.HDF5ExtError: Problems creating the table

I think that the size of the column is too large (if I remove the Image
field, everything works perfectly).

Therefore, what is the best way to store the images (while keeping the
relation)? I have read various posts about this subject on the web but
could not find a definitive answer (the most helpful was
http://stackoverflow.com/questions/8843062/python-how-to-store-a-numpy-multidimensional-array-in-pytables).

I was thinking of creating an extensible array and storing each image in
the same order as the subjects. However, I would feel more comfortable if
the subject Id could be inserted too (to join the tables).

Any help?

Mathieu
|
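The extensible-array idea from the last paragraph can keep the relation by row order: one EArray holds the images, and a small table holds the subject Ids in the same order, so row i of the table describes images[i]. A minimal sketch, assuming the 2.x-style names that appear in the traceback above and an illustrative one-column description:

import numpy as np
import tables

IMAGE_SIZE = (121, 145, 121)

class SubjectInfo(tables.IsDescription):
    # One row per image; row order matches the 'images' EArray below
    Id = tables.UInt16Col()

f = tables.openFile('db.h5', 'w')
images = f.createEArray(f.root, 'images', tables.Float32Atom(),
                        shape=(0,) + IMAGE_SIZE)   # extensible first axis
info = f.createTable(f.root, 'subjects', SubjectInfo)

img = np.zeros(IMAGE_SIZE, dtype='float32')        # placeholder image
images.append(img[np.newaxis, ...])                # append along axis 0
row = info.row
row['Id'] = 1
row.append()
info.flush()
f.close()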
From: Wagner S. <Seb...@ai...> - 2013-06-27 10:38:20
|
Dear PyTables users and devs,

I tried to figure out whether PyTables keeps indexes up to date on inserts
and updates, or whether I have to call reindex() or reindex_dirty() manually
after every change or series of changes, but I couldn't find any clear
statement in the docs or in the mailing list archives (the search function
provided by SourceForge is not very helpful).

If the index is always updated automatically: how can this be turned off,
to avoid the indexing overhead when applying a bunch of changes?

If the index is "dirty" after every change: could you state this clearly in
the docs (the SQL pages, and also the reference sections on create_index
and reindex)?

It is, IMO, unclear to the user/reader of the docs how indexes have to be
maintained.

Regards,
Sebastian
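The thread does not show an answer, but the documented behaviour appears to be that an indexed table is reindexed automatically on changes while its autoindex property is true, and that switching it off defers the work to an explicit reindex(). A minimal sketch of that pattern, assuming the 3.x-style names Sebastian uses:

import tables

class Record(tables.IsDescription):
    key = tables.Int64Col()

f = tables.open_file('indexed.h5', 'w')
t = f.create_table(f.root, 'records', Record)
t.cols.key.create_index()

t.autoindex = False        # defer index maintenance during bulk changes
row = t.row
for i in range(100000):
    row['key'] = i
    row.append()
t.flush()
t.reindex()                # rebuild the now-dirty index in one pass
t.autoindex = True
f.close()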
From: Andre' Walker-L. <wal...@gm...> - 2013-06-26 05:33:51
|
Hi Andreas, Josh, Anthony and Antonio,

Thanks for your help.

Andre

On Jun 26, 2013, at 2:48 AM, Antonio Valentino wrote:

> Hi Andre',
>
> On 25/06/2013 10:26, Andre' Walker-Loud wrote:
>> Dear PyTables users,
>>
>> I am trying to figure out the best way to write some metadata into some
>> files I have.
>>
>> The hdf5 file looks like
>>
>> /root/data_1/stat
>> /root/data_1/sys
>>
>> where "stat" and "sys" are Arrays containing statistical and systematic
>> fluctuations of numerical fits to some data I have. What I would like
>> to do is add another object
>>
>> /root/data_1/fit
>>
>> where "fit" is just a metadata key that describes all the choices I
>> made in performing the fit, such as the seed for the random number
>> generator, and many choices of fitting options, like initial guesses
>> for parameters, the fitting range, etc.
>>
>> I began to follow the example in the PyTables manual, in Section 1.2
>> "The Object Tree", where first a class is defined
>>
>> class Particle(tables.IsDescription):
>>     identity = tables.StringCol(itemsize=22, dflt=" ", pos=0)
>>     ...
>>
>> and then this class is used to populate a table.
>>
>> In my case, I won't have a table, but really just want a single object
>> containing my metadata. I am wondering if there is a recommended way to
>> do this? The "Table" does not seem optimal, but I don't see what else I
>> would use.
>>
>> Thanks,
>>
>> Andre
>
> For leaf nodes (Tables, Arrays, etc.) you can use the "attrs" attribute
> set [1] as described in [2].
> For group objects (like e.g. "root") you can use the "set_node_attr"
> method [3] of File objects, or "_v_attrs".
>
> cheers
>
> [1] http://pytables.github.io/usersguide/libref/declarative_classes.html#attributesetclassdescr
> [2] http://pytables.github.io/usersguide/tutorials.html#setting-and-getting-user-attributes
> [3] http://pytables.github.io/usersguide/libref/file_class.html#tables.File.set_node_attr
>
> --
> Antonio Valentino
|
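A short sketch of Antonio's suggestion as applied to the question: the fit choices become attributes on the data_1 group rather than a separate node. The attribute names mirror the question and are otherwise assumptions:

import tables

f = tables.open_file('fits.h5', 'a')
grp = f.create_group(f.root, 'data_1')

# Attributes on the group itself, via _v_attrs ...
grp._v_attrs.seed = 12345
grp._v_attrs.fit_range = (2, 15)         # illustrative fit options
grp._v_attrs.initial_guess = [1.0, 0.5]

# ... or equivalently via File.set_node_attr
f.set_node_attr(grp, 'fitter', 'least_squares')

print(f.get_node_attr(grp, 'seed'))      # -> 12345
f.close()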