From: Anthony S. <sc...@gm...> - 2013-09-10 07:18:04

Hello PyTables Users,

In an effort to close #81 <https://github.com/PyTables/PyTables/issues/81>, Andrea Bedini has been working very hard to migrate this mailing list (pyt...@li...) from SourceForge to Google Groups (pyt...@go...). Andrea has done a really amazing job: all of the history of the old list has been imported into the new list. If you are subscribed to the old list, you should soon receive an invite from Google to join the new one as well.

*Please do not send any more messages to pyt...@li...*. This list is now deprecated, long-term backups have been made, and we will no longer be supporting it. You can manually add yourself to the new list by visiting: https://groups.google.com/forum/#!forum/pytables-users

We understand that migrating can sometimes be painful, but we hope that you'll follow us. Thanks, and thanks again to Andrea!

Be Well
Anthony
From: Anthony S. <sc...@gm...> - 2013-08-29 17:21:50

Hello Premal,

This is just how HDF5 works. When you delete a Leaf, the reference to that node is removed and the space in the file becomes available for future use. However, HDF5 will never shrink a file; it will only grow it. New data can fill in the freed space, but the space itself doesn't go away; it just sits there empty. If you really want to get rid of this extraneous space, you should use the ptrepack or h5repack command-line utilities to create a clean copy of the file.

Hope this helps.

Be Well
Anthony
From: Forafo S. <ppv...@gm...> - 2013-08-29 15:40:32

Hello All,

I have some data in an HDF5 file that is created with PyTables. Occasionally, I update the data by reading in one of the tables and adding or deleting rows. Then, I create a new table containing the updated data, give it a random name, and let it reside in the same group where the old table resides. I flush the new table, then use the table.remove() (or Leaf.remove()) method to delete the old table and the table.rename() method to rename the randomly-named new table to the same name as the old table.

Problem: In a small table, the size of the HDF5 file doubles with the above process even when no new rows or other modifications are made (let's assume that the HDF5 file contains only this table). A ptdump indicates no presence of the old table. In a medium-sized table, the size of the HDF5 file rises substantially (20% or 30%) even when no new rows or columns are added.

Do I understand table.remove() right as completely deleting the table? Does it leave some residue that I should be aware of?

All help is appreciated. Thanks,
Premal
From: Anthony S. <sc...@gm...> - 2013-08-28 00:51:59

Glad I could help!
From: Oleksandr H. <guz...@gm...> - 2013-08-28 00:45:06

2013/8/27 Anthony Scopatz <sc...@gm...>
> You are right that this loads the entire computed array into memory and is
> therefore not optimal. I would do something like the following:
>
>     h = tb.open_file(path, mode="a")
>     varTable = h.get_node("/", var_name)
>     coef = 3 * 60 * 60  # output step
>     c = varTable.cols.field
>     expr = tb.Expr("c * m", uservars={"c": c, "m": coef})
>     expr.set_output(c)
>     expr.eval()
>     varTable.flush()
>     h.close()

Aha, this is cool. Thanks Anthony.

Cheers
--
Sasha
From: Anthony S. <sc...@gm...> - 2013-08-27 23:58:39

On Tue, Aug 27, 2013 at 6:50 PM, Oleksandr Huziy <guz...@gm...> wrote:
> I just wanted to make sure: is it possible to use an assignment in
> expressions? (this gives me a syntax error exception, complaining about the
> equal sign in the expression)
>
>     expr = tb.Expr("c = c * m", uservars={"c": varTable.cols.field, "m": coef})

Hi Sasha,

Assignment is a statement, not an expression, so it is not possible to use it here. This is why you are getting a syntax error.

> Is this an optimal way of multiplying a column? (this one works, but I
> think it loads all the data into memory... right?)
>
>     expr = tb.Expr("c * m", uservars={"c": varTable.cols.field, "m": coef})
>     varTable.cols.field[:] = expr.eval()

You are right that this loads the entire computed array into memory and is therefore not optimal. I would do something like the following:

    h = tb.open_file(path, mode="a")
    varTable = h.get_node("/", var_name)
    coef = 3 * 60 * 60  # output step
    c = varTable.cols.field
    expr = tb.Expr("c * m", uservars={"c": c, "m": coef})
    expr.set_output(c)
    expr.eval()
    varTable.flush()
    h.close()

Be Well
Anthony
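[Editor's note] The point of expr.set_output() above is that the result is written back block by block instead of materializing the whole computed array. The same pattern can be sketched without PyTables using plain NumPy; the function name, block size, and scale factor here are illustrative, not part of the PyTables API:

```python
import numpy as np

def scale_in_place(col, factor, blocksize=1024):
    # Multiply `col` by `factor` block by block, so only one block-sized
    # temporary is ever resident in memory (the same idea as tb.Expr
    # with set_output() writing back into the source column).
    n = len(col)
    for start in range(0, n, blocksize):
        stop = min(start + blocksize, n)
        col[start:stop] = col[start:stop] * factor

data = np.arange(10, dtype=np.float64)
scale_in_place(data, 3 * 60 * 60, blocksize=4)  # data[1] is now 10800.0
```

For a disk-backed column the savings come from never allocating a full-length temporary; for an in-memory NumPy array `col *= factor` would of course be simpler.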
From: Oleksandr H. <guz...@gm...> - 2013-08-27 23:50:43

Hi Again:

2013/8/27 Anthony Scopatz <sc...@gm...>
> Hey Sasha,
>
> You probably want to look at the Expr class [1] where you set "out" to be
> the same as the original array.
>
> 1. http://pytables.github.io/usersguide/libref/expr_class.html

I just wanted to make sure: is it possible to use an assignment in expressions? (this gives me a syntax error exception, complaining about the equal sign in the expression)

    h = tb.open_file(path, mode="a")
    varTable = h.get_node("/", var_name)
    coef = 3 * 60 * 60  # output step
    expr = tb.Expr("c = c * m", uservars={"c": varTable.cols.field, "m": coef})
    expr.eval()
    varTable.flush()
    h.close()

Is this an optimal way of multiplying a column? (this one works, but I think it loads all the data into memory... right?)

    expr = tb.Expr("c * m", uservars={"c": varTable.cols.field, "m": coef})
    varTable.cols.field[:] = expr.eval()

Thank you

Cheers
--
Sasha
From: Oleksandr H. <guz...@gm...> - 2013-08-27 19:44:06

Thank you Anthony.

Cheers
From: Anthony S. <sc...@gm...> - 2013-08-27 18:38:07

Hey Sasha,

You probably want to look at the Expr class [1], where you set "out" to be the same as the original array.

Be Well
Anthony

1. http://pytables.github.io/usersguide/libref/expr_class.html
From: Oleksandr H. <guz...@gm...> - 2013-08-27 16:44:37

Hi All:

I have a huge table imported from other binary files to HDF, and I forgot to multiply the data by a factor in one case. Is there an easy way to multiply a column by a constant factor using PyTables? To modify it in place?

Thank you

--
Sasha
From: Gabriel J.L. B. <pyt...@gb...> - 2013-08-08 15:25:34

Anthony Scopatz <sc...@gm...> schreef:
> Are you using compression on this EArray? This method is basically a thin
> wrapper over some HDF5 functions. I think that the data that you are asking
> for (inadvertently, maybe) is just expensive to get.

No, no compression. But I saw this is one of the first pytables data sets I created years ago. The chunk size was not chosen well. I improved that now (better chunk size/shape, transposed axes, and using CArray) and things are roughly 50% faster.

But I still don't understand why so much data is apparently being read when I only want to know which children (i.e. the leaf names) a group contains. To do this in my program I loop over _v_children.items(), like:

    d = {}
    for label, node in f.root.recordings.AB_5000._v_children.items():
        d[label] = node

I would have expected code like this to yield a dictionary of node objects without reading/inspecting the data content that the nodes contain. But apparently, under the hood, HDF5 is looking at the contents of the nodes, which takes a while if they are large, especially over a USB 3 connection. It is not reading the full array into RAM, because the memory footprint of the Python session doesn't increase appreciably if I run the code above.

Thanks, all the best,
Gabriel
From: Anthony S. <sc...@gm...> - 2013-08-08 07:06:04

Hi David,

I think that you can do what you want in one, rather long line:

    hfile.createTable(grp, 'signal', description=np.array(zip(*some_func(t, v)),
                      dtype=[('time', np.float64), ('value', np.float64)]))

Or two nicer lines:

    arr = np.array(zip(*some_func(t, v)), dtype=[('time', np.float64), ('value', np.float64)])
    hfile.createTable(grp, 'signal', description=arr)

zip() is your friend =). If zip is too slow and you don't want to make more than one copy, you could try something like this:

    temparr = np.array(some_func(t, v)).T
    arr = temparr.view(dtype=[('time', np.float64), ('value', np.float64)])

This really only works because both columns have the same dtype. Of course, you can always keep basically what you have and loop through the column names programmatically:

    for name, col in zip(A.dtype.names, some_func(t, v)):
        A[name] = col

I hope this helps!

Be Well
Anthony
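[Editor's note] A minimal runnable version of the zip-based repackaging Anthony describes. The `some_func` here (returning the time array unchanged and the values doubled) is a stand-in for David's transform, not anything from PyTables:

```python
import numpy as np

def some_func(t, v):
    # Stand-in for a generic time-series transform that returns
    # (time, value) as two separate arrays.
    return t, v * 2.0

t = np.linspace(0.0, 1.0, 5)
v = np.ones(5)

# Pack the two returned arrays into one structured array, row by row.
# list() around zip() is needed on Python 3, where zip is lazy.
arr = np.array(list(zip(*some_func(t, v))),
               dtype=[('time', np.float64), ('value', np.float64)])

print(arr['time'][-1], arr['value'][0])  # 1.0 2.0
```

The resulting structured array can then be handed to createTable() as the description, exactly as in the two-line version above.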
From: David R. <dav...@gm...> - 2013-08-08 00:58:35

Hi there,

I have some generic functions that take time series data with 2 numpy array arguments, time and value, and return 2 numpy arrays of time and value. I would like to place these arrays into a numpy structured array or directly into a new PyTables table with fields time and value. Now I've found I could do this:

    t, v = some_func(t, v)

    A = np.empty(len(t), dtype=[('time', np.float64), ('value', np.float64)])
    A['time'] = t
    A['value'] = v

    hfile.createTable(grp, 'signal', description=A)
    hfile.flush()

But this seems rather clunky and inefficient. Any suggestions to make this repackaging a little smoother?
From: Anthony S. <sc...@gm...> - 2013-08-07 21:03:26

Hi Jason,

A key-value store pattern is definitely supported. However, be forewarned that groups are implemented using B-trees, not hash tables. Still, with data of your size, most of the access time will be spent in the leaf nodes, not in getting the group. I'd say try it out and see.

Be Well
Anthony

On Wed, Aug 7, 2013 at 11:33 AM, Xianli Xu <xia...@gm...> wrote:
> Hi all,
>
> I'm developing a data processing service and evaluating PyTables. Since
> HDF5 supports hierarchical data like a tree of folders, can I use such a
> tree-like structure as a K-V store, possibly storing a million tables or
> arrays under one group, and randomly access any one of them in O(1) time?
> e.g.
>
>     root/
>         user_log/
>             uid1 -> table / array (tens of thousands of rows/elements, ETL'ed user log info in int format)
>             uid2 -> table / array
>             uid3 -> table / array
>             uid4 -> table / array
>             uid5 -> table / array
>             ... (perhaps a million users)
>
> Just wondering how the hierarchical structure is implemented and whether
> such a usage pattern is supported? If no, is there any running or better
> way to store such type of information? We adopt PyTables because the data
> is stored at higher density, loads faster, and has no ACID/concurrency
> overhead, so traditional DBs and NoSQL DBs are not an option for us.
>
> Thanks,
> Jason
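[Editor's note] Since group lookup is a B-tree traversal rather than a hash lookup, a common mitigation for "a million children under one group" is to shard the keys across intermediate groups so each B-tree stays small. The layout and names below are illustrative, not a PyTables convention:

```python
def uid_to_path(uid, fanout=256):
    # Map a numeric uid to a sharded HDF5 group path so that no single
    # group holds a million children. `fanout` and the two-level layout
    # are arbitrary choices for illustration.
    shard = uid % fanout
    return "/user_log/shard%03d/uid%d" % (shard, uid)

print(uid_to_path(123456))  # /user_log/shard064/uid123456
```

The returned path can then be used with get_node() (or created once with createGroup/createTable); lookups stay fast because each group's B-tree holds at most a few thousand entries.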
From: Chao Y. <cha...@gm...> - 2013-08-07 19:44:15
|
Thanks Anthony, I think I will give a try, apprently at some stage I would like to flush the data into disk :p cheers, Chao On Wed, Aug 7, 2013 at 6:44 PM, Anthony Scopatz <sc...@gm...> wrote: > On Wed, Aug 7, 2013 at 5:44 AM, Chao YUE <cha...@gm...> wrote: > >> Dear all, >> >> I have a hierachical nested python dictionaries with the end of the >> branch as either pandas dataframe, or np.ndarray or list or plain scalars. >> >> let's say the different levels of keys are: >> >> 1st level: ['top1', 'top2', 'top3'] >> 2nd level: ['mid1','mid2','mid3'] >> 3rd level: ['bot1','bot2','bot3','bot4'] >> >> I think I am looking for some data strucuture that allow easy retrieving >> of the data at different levels as dictionaries (I cannot think out >> something better yet). >> >> for example: data.ix['top1',:,'bot1'] will have keys only at the middle >> levels. >> >> I have a quick look of pytables document but not very sure, should I use >> pytables for this purpose? >> > > Hello Chao, > > If you are only ever going to use this data structure in memory, you > shouldn't use pytables. If you are going to persist this information to > disk than pytables is a great choice! Every dictionary will become a group > and every leaf data structure will become an Array or a Table. > > Be Well > Anthony > > >> >> thanks a lot for any idea. >> >> cheers, >> >> Chao >> >> -- >> >> *********************************************************************************** >> Chao YUE >> Laboratoire des Sciences du Climat et de l'Environnement (LSCE-IPSL) >> UMR 1572 CEA-CNRS-UVSQ >> Batiment 712 - Pe 119 >> 91191 GIF Sur YVETTE Cedex >> Tel: (33) 01 69 08 29 02; Fax:01.69.08.77.16 >> >> ************************************************************************************ >> >> >> ------------------------------------------------------------------------------ >> Get 100% visibility into Java/.NET code with AppDynamics Lite! >> It's a free troubleshooting tool designed for production. 
>> Get down to code-level detail for bottlenecks, with <2% overhead. >> Download for free and get started troubleshooting in minutes. >> >> http://pubads.g.doubleclick.net/gampad/clk?id=48897031&iu=/4140/ostg.clktrk >> _______________________________________________ >> Pytables-users mailing list >> Pyt...@li... >> https://lists.sourceforge.net/lists/listinfo/pytables-users >> >> > > > ------------------------------------------------------------------------------ > Get 100% visibility into Java/.NET code with AppDynamics Lite! > It's a free troubleshooting tool designed for production. > Get down to code-level detail for bottlenecks, with <2% overhead. > Download for free and get started troubleshooting in minutes. > http://pubads.g.doubleclick.net/gampad/clk?id=48897031&iu=/4140/ostg.clktrk > _______________________________________________ > Pytables-users mailing list > Pyt...@li... > https://lists.sourceforge.net/lists/listinfo/pytables-users > > -- *********************************************************************************** Chao YUE Laboratoire des Sciences du Climat et de l'Environnement (LSCE-IPSL) UMR 1572 CEA-CNRS-UVSQ Batiment 712 - Pe 119 91191 GIF Sur YVETTE Cedex Tel: (33) 01 69 08 29 02; Fax:01.69.08.77.16 ************************************************************************************ |
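[Editor's note] Anthony's recipe (each dict becomes a group, each leaf becomes an Array or Table) can be sketched roughly as below. The helper name and the sample data are invented for illustration, and pandas DataFrame leaves would need separate handling (e.g. converting to a record array first):

```python
import numpy as np
import tables as tb

def save_nested_dict(h5file, group, d):
    """Mirror a nested dict in HDF5: sub-dicts become groups,
    leaf values (lists, scalars, ndarrays) become Arrays."""
    for key, value in d.items():
        if isinstance(value, dict):
            child = h5file.create_group(group, key)
            save_nested_dict(h5file, child, value)
        else:
            # create_array accepts lists, scalars and ndarrays directly
            h5file.create_array(group, key, value)

data = {'top1': {'mid1': {'bot1': [1, 2, 3], 'bot2': 4.5}},
        'top2': {'mid1': {'bot1': np.arange(5)}}}

with tb.open_file('nested.h5', 'w') as f:
    save_nested_dict(f, f.root, data)

with tb.open_file('nested.h5', 'r') as f:
    bot1 = f.root.top1.mid1.bot1[:]   # read one leaf back via natural naming

print(bot1)
```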
From: Xianli Xu <xia...@gm...> - 2013-08-07 18:40:08
|
oops sorry, seem auto-correction of my email client created some typo for me : P here's the corrections, On 8 Aug, 2013, at 2:33 AM, Xianli Xu <xia...@gm...> wrote: > Hi all, > > I'm developing data processing service and evaluating if Pytable. Since hdf5 supports hierarchical data like a tree of folder, can I use such a tree-like structure as a K-V store like possibly store million of tables or arrays under one group and randomly access any one of them in O(1) time? e.g. > > root/ > user_log/ > uid1-> table / array, (of tens of thousand rows / elements, ETL'ed user log info in int format) > uid2-> table / array, > uid3-> table / array, > uid4-> table / array, > uid5-> table / array, > …… (perhaps million user) > > Just wondering how the hierarchical structure is implemented and such usage pattern is supported? if no, is there any running or better way to store such type of information? We adopt Pytables because the data is stored in running -> tuning > higher density, faster loaded and no ACID / concurrency overhead, so traditional DB and no-sql db is not our option.. > > Thanks, > Jason |
From: Xianli Xu <xia...@gm...> - 2013-08-07 18:33:42
|
Hi all, I'm developing a data processing service and evaluating PyTables. Since hdf5 supports hierarchical data like a tree of folders, can I use such a tree-like structure as a K-V store, i.e. store possibly millions of tables or arrays under one group and randomly access any one of them in O(1) time? e.g. root/ user_log/ uid1-> table / array, (of tens of thousands of rows / elements, ETL'ed user log info in int format) uid2-> table / array, uid3-> table / array, uid4-> table / array, uid5-> table / array, …… (perhaps a million users) Just wondering how the hierarchical structure is implemented and whether such a usage pattern is supported? If not, is there any tuning or better way to store this type of information? We adopted PyTables because the data is stored at higher density, loads faster, and has no ACID / concurrency overhead, so traditional DBs and NoSQL DBs are not an option for us. Thanks, Jason
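[Editor's note] A small sketch of the layout Jason describes (names and sizes invented): each uid becomes one leaf under /user_log, and any leaf can be fetched directly by its path with get_node(), with no iteration over siblings. Note that HDF5 resolves node names through a B-tree, so lookup is logarithmic rather than strictly O(1), and at this scale PyTables' node cache (the NODE_CACHE_SLOTS parameter) is worth tuning:

```python
import numpy as np
import tables as tb

with tb.open_file('userlog.h5', 'w') as f:
    log = f.create_group(f.root, 'user_log')
    for uid in range(200):                     # scale up towards millions
        f.create_array(log, 'uid%d' % uid, np.arange(10) + uid)

with tb.open_file('userlog.h5', 'r') as f:
    # direct lookup by path, no loop over the group's children
    value = f.get_node('/user_log/uid42')[3]

print(value)   # 45
```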
From: Anthony S. <sc...@gm...> - 2013-08-07 17:14:08
|
On Wed, Aug 7, 2013 at 4:39 AM, Gabriel J.L. Beckers < pyt...@gb...> wrote: > Hi, > > I don't know if this is related in any way to Gergo's problem, but I > have slow responses when querying which children a group contains, if > that group contains big leafs. I am using pytables 2.5 and hdf5 1.8.9 > on linux 64 bit. > > Specifically, I found that using the _g_get_objinfo method (which is > used by other methods that I use) is slow when used on a large leaf. > The slowness is proportional to the size of the leaf. It is almost as > if some process is actually reading the data instead of just info on > the type of data. I am noticing this because my data is on an external > usb3 disk. To give you an idea: that method takes almost 80 seconds to > return the string 'Leaf' when used on a 5 Gb EArray. That should > roughly correspond to reading the complete disk-based array. The info > is cached somehow, because if I run the method a second time in the > same python session it is very fast. > > If I copy my hdf5 file to my SSD disk, things are much faster, but > running the method still takes 2 seconds or so on a 5 Gb leaf. > > Is this expected behavior and should I just avoid this method in my > applications, or is something wrong? > Hi Gabriel, Are you using compression on this EArray? This method is basically a thin wrapper over some HDF5 functions. I think that the data that you are asking for (inadvertently, maybe) is just expensive to get. Be Well Anthony > > Best, Gabriel > > Anthony Scopatz <sc...@gm...> schreef: > > > On Mon, Aug 5, 2013 at 4:11 AM, Nyirő Gergő <ger...@gm...> > wrote: > > > >> Hello, > >> > >> > >> We develop a measurement evaluation tool, and we'd like to use > >> pytables/hdf5 as a middle layer for signal accessing. > >> > >> We have to deal with the silly structure of the recorder device > >> measurement format. 
> >> > >> > >> > >> The signals can be accessed via two identifiers: > >> > >> * device name: <source of the signal>-<channel of the > >> message>-<another tag>-<yet another tag> > >> > >> * signal name > >> > >> > >> > >> The first identifier says the source information of the signal, which > >> can be quite long. > >> > >> Therefore I grouped the device name into two layers: > >> > >> /<source of the signal> > >> > >> /<channel of the message>... > >> > >> /<signal name> > >> > >> > >> > >> So if you have the same message from two channels, than you will get > >> /foo-device-name > >> > >> /channel-1 > >> > >> /bar > >> > >> /baz > >> > >> /channel-2 > >> > >> /bar > >> > >> /baz > >> > >> > >> > >> Besides signal loading, we have to search for signal name as fast as > >> possible, and return with the shortest unique device name part and the > >> signal name. > >> > >> Using the structure above, iterating over the group names is quite > >> slow. So I build up a table from device and signal name. > >> > >> As far as I know, the pytables query does not support string searching > >> (e.g. startswidth, *foo[0-9]ch*, etc.), so fetching this table lead us > >> to a pure python loop which is slow again. > >> > >> Therefore I build up a python dictionary from the table, which provide > >> fast iteration against the table, but the init time increased from 100 > >> ms to 3-4 sec (we have more than 40 000 signals). > >> > >> > >> > >> Do you have any advice how to search for group names in hdf5 with > >> pytables in an efficient way? > >> > > > > Hi grego, > > > > Searching through group names, like accessing all HDF5 metadata, is slow. > > For group names this is because rather than searching through a list you > > are traversing a B-tree, IIRC. So you have to use the couple of tricks > > that you used: 1) have another Table / Array of all table names, 2) read > > this in once to a native Python data structure (dict here). 
> > > > However, 4 sec to read in this table seems excessive for data of this > size. > > You are probably not reading this in properly. You should be using: > > > > raw_grps = f.root.grp_names[:] > > > > or similar. > > > > Maybe other people have some other ideas. > > > > Be Well > > Anthony > > > > > >> > >> ps: I would be most happy with a glob interface. > >> > >> > >> > >> thanks for your advices in advance, > >> > >> gergo > >> > >> > >> > ------------------------------------------------------------------------------ > >> Get your SQL database under version control now! > >> Version control is standard for application code, but databases havent > >> caught up. So what steps can you take to put your SQL databases under > >> version control? Why should you start doing it? Read more to find out. > >> > http://pubads.g.doubleclick.net/gampad/clk?id=49501711&iu=/4140/ostg.clktrk > >> _______________________________________________ > >> Pytables-users mailing list > >> Pyt...@li... > >> https://lists.sourceforge.net/lists/listinfo/pytables-users > >> > > > > > > ------------------------------------------------------------------------------ > Get 100% visibility into Java/.NET code with AppDynamics Lite! > It's a free troubleshooting tool designed for production. > Get down to code-level detail for bottlenecks, with <2% overhead. > Download for free and get started troubleshooting in minutes. > http://pubads.g.doubleclick.net/gampad/clk?id=48897031&iu=/4140/ostg.clktrk > _______________________________________________ > Pytables-users mailing list > Pyt...@li... > https://lists.sourceforge.net/lists/listinfo/pytables-users > |
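[Editor's note] Anthony's two tricks combined might look like this (the table and field names are invented for illustration): keep a side Table of (device, signal) pairs, pull it into memory with one [:] slice, and build the lookup dict from the resulting numpy structured array rather than looping over on-disk rows:

```python
import numpy as np
import tables as tb

dtype = np.dtype([('device', 'S64'), ('signal', 'S32')])

with tb.open_file('signals.h5', 'w') as f:
    t = f.create_table(f.root, 'names', dtype)
    t.append([(b'foo-device/channel-1', b'bar'),
              (b'foo-device/channel-1', b'baz'),
              (b'foo-device/channel-2', b'bar')])

with tb.open_file('signals.h5', 'r') as f:
    rows = f.root.names[:]        # one bulk read into a numpy array

# the dict build runs on the in-memory array, not against the file
by_signal = {}
for dev, sig in rows:
    by_signal.setdefault(sig, []).append(dev)

print(sorted(by_signal))
```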
From: Anthony S. <sc...@gm...> - 2013-08-07 16:45:02
|
On Wed, Aug 7, 2013 at 5:44 AM, Chao YUE <cha...@gm...> wrote: > Dear all, > > I have a hierachical nested python dictionaries with the end of the branch > as either pandas dataframe, or np.ndarray or list or plain scalars. > > let's say the different levels of keys are: > > 1st level: ['top1', 'top2', 'top3'] > 2nd level: ['mid1','mid2','mid3'] > 3rd level: ['bot1','bot2','bot3','bot4'] > > I think I am looking for some data strucuture that allow easy retrieving > of the data at different levels as dictionaries (I cannot think out > something better yet). > > for example: data.ix['top1',:,'bot1'] will have keys only at the middle > levels. > > I have a quick look of pytables document but not very sure, should I use > pytables for this purpose? > Hello Chao, If you are only ever going to use this data structure in memory, you shouldn't use pytables. If you are going to persist this information to disk then pytables is a great choice! Every dictionary will become a group and every leaf data structure will become an Array or a Table. Be Well Anthony > > thanks a lot for any idea. > > cheers, > > Chao > > -- > > *********************************************************************************** > Chao YUE > Laboratoire des Sciences du Climat et de l'Environnement (LSCE-IPSL) > UMR 1572 CEA-CNRS-UVSQ > Batiment 712 - Pe 119 > 91191 GIF Sur YVETTE Cedex > Tel: (33) 01 69 08 29 02; Fax:01.69.08.77.16 > > ************************************************************************************ > > > ------------------------------------------------------------------------------ > Get 100% visibility into Java/.NET code with AppDynamics Lite! > It's a free troubleshooting tool designed for production. > Get down to code-level detail for bottlenecks, with <2% overhead. > Download for free and get started troubleshooting in minutes. 
> http://pubads.g.doubleclick.net/gampad/clk?id=48897031&iu=/4140/ostg.clktrk > _______________________________________________ > Pytables-users mailing list > Pyt...@li... > https://lists.sourceforge.net/lists/listinfo/pytables-users > > |
From: Chao Y. <cha...@gm...> - 2013-08-07 12:44:33
|
Dear all, I have a hierarchical nested python dictionary with the ends of the branches being either pandas dataframes, np.ndarrays, lists, or plain scalars. let's say the different levels of keys are: 1st level: ['top1', 'top2', 'top3'] 2nd level: ['mid1','mid2','mid3'] 3rd level: ['bot1','bot2','bot3','bot4'] I think I am looking for some data structure that allows easy retrieval of the data at different levels as dictionaries (I cannot think of something better yet). for example: data.ix['top1',:,'bot1'] will have keys only at the middle levels. I have had a quick look at the pytables documentation but am not very sure: should I use pytables for this purpose? thanks a lot for any idea. cheers, Chao -- *********************************************************************************** Chao YUE Laboratoire des Sciences du Climat et de l'Environnement (LSCE-IPSL) UMR 1572 CEA-CNRS-UVSQ Batiment 712 - Pe 119 91191 GIF Sur YVETTE Cedex Tel: (33) 01 69 08 29 02; Fax:01.69.08.77.16 ************************************************************************************
From: Gabriel J.L. B. <pyt...@gb...> - 2013-08-07 11:39:31
|
Hi, I don't know if this is related in any way to Gergo's problem, but I have slow responses when querying which children a group contains, if that group contains big leafs. I am using pytables 2.5 and hdf5 1.8.9 on linux 64 bit. Specifically, I found that using the _g_get_objinfo method (which is used by other methods that I use) is slow when used on a large leaf. The slowness is proportional to the size of the leaf. It is almost as if some process is actually reading the data instead of just info on the type of data. I am noticing this because my data is on an external usb3 disk. To give you an idea: that method takes almost 80 seconds to return the string 'Leaf' when used on a 5 Gb EArray. That should roughly correspond to reading the complete disk-based array. The info is cached somehow, because if I run the method a second time in the same python session it is very fast. If I copy my hdf5 file to my SSD disk, things are much faster, but running the method still takes 2 seconds or so on a 5 Gb leaf. Is this expected behavior and should I just avoid this method in my applications, or is something wrong? Best, Gabriel Anthony Scopatz <sc...@gm...> schreef: > On Mon, Aug 5, 2013 at 4:11 AM, Nyirő Gergő <ger...@gm...> wrote: > >> Hello, >> >> >> We develop a measurement evaluation tool, and we'd like to use >> pytables/hdf5 as a middle layer for signal accessing. >> >> We have to deal with the silly structure of the recorder device >> measurement format. >> >> >> >> The signals can be accessed via two identifiers: >> >> * device name: <source of the signal>-<channel of the >> message>-<another tag>-<yet another tag> >> >> * signal name >> >> >> >> The first identifier says the source information of the signal, which >> can be quite long. >> >> Therefore I grouped the device name into two layers: >> >> /<source of the signal> >> >> /<channel of the message>... 
>> >> /<signal name> >> >> >> >> So if you have the same message from two channels, than you will get >> /foo-device-name >> >> /channel-1 >> >> /bar >> >> /baz >> >> /channel-2 >> >> /bar >> >> /baz >> >> >> >> Besides signal loading, we have to search for signal name as fast as >> possible, and return with the shortest unique device name part and the >> signal name. >> >> Using the structure above, iterating over the group names is quite >> slow. So I build up a table from device and signal name. >> >> As far as I know, the pytables query does not support string searching >> (e.g. startswidth, *foo[0-9]ch*, etc.), so fetching this table lead us >> to a pure python loop which is slow again. >> >> Therefore I build up a python dictionary from the table, which provide >> fast iteration against the table, but the init time increased from 100 >> ms to 3-4 sec (we have more than 40 000 signals). >> >> >> >> Do you have any advice how to search for group names in hdf5 with >> pytables in an efficient way? >> > > Hi grego, > > Searching through group names, like accessing all HDF5 metadata, is slow. > For group names this is because rather than searching through a list you > are traversing a B-tree, IIRC. So you have to use the couple of tricks > that you used: 1) have another Table / Array of all table names, 2) read > this in once to a native Python data structure (dict here). > > However, 4 sec to read in this table seems excessive for data of this size. > You are probably not reading this in properly. You should be using: > > raw_grps = f.root.grp_names[:] > > or similar. > > Maybe other people have some other ideas. > > Be Well > Anthony > > >> >> ps: I would be most happy with a glob interface. >> >> >> >> thanks for your advices in advance, >> >> gergo >> >> >> ------------------------------------------------------------------------------ >> Get your SQL database under version control now! 
>> Version control is standard for application code, but databases havent >> caught up. So what steps can you take to put your SQL databases under >> version control? Why should you start doing it? Read more to find out. >> http://pubads.g.doubleclick.net/gampad/clk?id=49501711&iu=/4140/ostg.clktrk >> _______________________________________________ >> Pytables-users mailing list >> Pyt...@li... >> https://lists.sourceforge.net/lists/listinfo/pytables-users >> |
From: Giovanni L. C. <glc...@gm...> - 2013-08-06 15:07:08
|
hi Anthony and Antonio, thanks for the explanations. I was hoping I could do it programmatically, but if it is an inherent limitation of HDF5 there is little I can do. Compression plus a sensible chunk size should help though. Best, Giovanni On 08/06/2013 04:11 AM, pyt...@li... wrote: > Hi Anthony, hi Giovanni, > > Il giorno 06/ago/2013, alle ore 00:45, Anthony Scopatz<sc...@gm...> ha scritto: > >> On Mon, Aug 5, 2013 at 3:14 PM, Giovanni Luca Ciampaglia<glc...@gm...> wrote: >> Hi Anthony, >> >> what do you mean precisely? I tried >> >> del ca[:,:] >> >> but CArray does not support __delitem__. Looking in the documentation I could >> only find a method called remove_rows, but it's in Table, not CArray. Maybe I am >> missing something? >> >> Huh, it should... This is definitely an oversight on our part. If you could please open an issue for this -- or better yet -- write a pull request that implements delitem, that'd be great! >> >> So I think you are right that there is no current way to delete rows from a CArray. Oops! (Of course, I may still be missing something as well). >> >> It looks like EArray also has this problem too, otherwise I would just tell you to use that. > I'm not sure to understand the problem. > > The "truncate" method of arrays can be used to remove rows from an extendable array. > It seems to me that it is not documented but we should add it to the UG. > > CArrays cannot be resized. > >> Be Well >> Anthony >> >> >> Thank, >> >> Giovanni >> >> On Mon 05 Aug 2013 03:43:42 PM EDT,pyt...@li... >> wrote: >>>> Hello Giovanni, I think you may need to del that slice and then possibly >>>> repack. Hope this helps. Be Well Anthony On Mon, Aug 5, 2013 at 2:09 PM, >>>> Giovanni Luca Ciampaglia <glc...@gm...> wrote: >>>>> Hello all, >>>>> >>>>> is there a way to clear out a chunk from a CArray? I noticed that setting >>>>> the >>>>> data to zero actually takes disk space, i.e. 
>>>>> >>>>> *** >>>>> from tables import open_file, BoolAtom >>>>> >>>>> h5f = open_file('test.h5', 'w') >>>>> ca = h5f.create_carray(h5f.root, 'carray', BoolAtom(), shape=(1000,1000), >>>>> chunkshape=(1,1000)) >>>>> ca[:,:] = False >>>>> h5f.close() >>>>> *** >>>>> >>>>> The resulting file takes 249K ... >>>>> >>>>> Best, >>>>> >>>>> -- >>>>> Giovanni Luca Ciampaglia > HDF5 handles efficiently chunks that have never been written saving some disk space but I doubt that chunks can be "de-initializad". > If my understanding is correct, once one write some value in a chunk (even if it is the default value) the chunk is allocated at HDF5 level and written to disk. > At they point one can only change item values. > Also I doubt that a repack can help in this case (not tested). > > The only solution IMO is compression. > > > cheers > > -- > Antonio Valentino -- Giovanni Luca Ciampaglia Postdoctoral fellow Center for Complex Networks and Systems Research Indiana University ✎ 910 E 10th St ∙ Bloomington ∙ IN 47408 ☞ http://cnets.indiana.edu/ ✉ gci...@in... |
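[Editor's note] For reference, a compressed variant of the example from this thread; the all-False chunks compress to almost nothing, so the file should come out far smaller than the 249K reported for the uncompressed case (the exact size depends on the HDF5 and zlib versions):

```python
import os
import tables as tb

# same 1000x1000 boolean CArray as in the thread, but with zlib enabled
filters = tb.Filters(complevel=5, complib='zlib')

with tb.open_file('test.h5', 'w') as h5f:
    ca = h5f.create_carray(h5f.root, 'carray', tb.BoolAtom(),
                           shape=(1000, 1000), chunkshape=(1, 1000),
                           filters=filters)
    ca[:, :] = False

size = os.path.getsize('test.h5')
print(size)
```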
From: Anthony S. <sc...@gm...> - 2013-08-06 08:11:17
|
Hi Antonio, Now that you mention it I think that you are right that there is no way to remove a chunk from an existing data set. If you think about it this makes a lot of sense, since you would have to alter the B-tree in strange and unfortunate ways. So HDF5 doesn't even try. However, you can fake it by copying over only the data you want to keep, deleting the old data set, and repacking. Or you can fake it by copying over the data you want to keep to a new file. Neither of these is ideal, but they would work. For existing data sets, especially if you are using compression, setting all of the contents of a chunk to the same value should work extraordinarily well. Be Well Anthony On Mon, Aug 5, 2013 at 11:29 PM, Antonio Valentino < ant...@ti...> wrote: > Hi Anthony, hi Giovanni, > > Il giorno 06/ago/2013, alle ore 00:45, Anthony Scopatz <sc...@gm...> > ha scritto: > > > > > On Mon, Aug 5, 2013 at 3:14 PM, Giovanni Luca Ciampaglia < > glc...@gm...> wrote: > > Hi Anthony, > > > > what do you mean precisely? I tried > > > > del ca[:,:] > > > > but CArray does not support __delitem__. Looking in the documentation I > could > > only find a method called remove_rows, but it's in Table, not CArray. > Maybe I am > > missing something? > > > > Huh, it should... This is definitely an oversight on our part. If you > could please open an issue for this -- or better yet -- write a pull > request that implements delitem, that'd be great! > > > > So I think you are right that there is no current way to delete rows > from a CArray. Oops! (Of course, I may still be missing something as > well). > > > > It looks like EArray also has this problem too, otherwise I would just > tell you to use that. > > I'm not sure to understand the problem. > > The "truncate" method of arrays can be used to remove rows from an > extendable array. > It seems to me that it is not documented but we should add it to the UG. > > CArrays cannot be resized. 
> > > > > Be Well > > Anthony > > > > > > Thank, > > > > Giovanni > > > > On Mon 05 Aug 2013 03:43:42 PM EDT, > pyt...@li... > > wrote: > > > > > >> Hello Giovanni, I think you may need to del that slice and then > possibly > > >> repack. Hope this helps. Be Well Anthony On Mon, Aug 5, 2013 at 2:09 > PM, > > >> Giovanni Luca Ciampaglia < glc...@gm...> wrote: > > >>> Hello all, > > >>> > > >>> is there a way to clear out a chunk from a CArray? I noticed that > setting > > >>> the > > >>> data to zero actually takes disk space, i.e. > > >>> > > >>> *** > > >>> from tables import open_file, BoolAtom > > >>> > > >>> h5f = open_file('test.h5', 'w') > > >>> ca = h5f.create_carray(h5f.root, 'carray', BoolAtom(), > shape=(1000,1000), > > >>> chunkshape=(1,1000)) > > >>> ca[:,:] = False > > >>> h5f.close() > > >>> *** > > >>> > > >>> The resulting file takes 249K ... > > >>> > > >>> Best, > > >>> > > >>> -- > > >>> Giovanni Luca Ciampaglia > > > HDF5 handles efficiently chunks that have never been written saving some > disk space but I doubt that chunks can be "de-initializad". > If my understanding is correct, once one write some value in a chunk (even > if it is the default value) the chunk is allocated at HDF5 level and > written to disk. > At they point one can only change item values. > Also I doubt that a repack can help in this case (not tested). > > The only solution IMO is compression. > > > cheers > > -- > Antonio Valentino > > > > ------------------------------------------------------------------------------ > Get your SQL database under version control now! > Version control is standard for application code, but databases havent > caught up. So what steps can you take to put your SQL databases under > version control? Why should you start doing it? Read more to find out. > http://pubads.g.doubleclick.net/gampad/clk?id=48897031&iu=/4140/ostg.clktrk > _______________________________________________ > Pytables-users mailing list > Pyt...@li... 
> https://lists.sourceforge.net/lists/listinfo/pytables-users > > |
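[Editor's note] Anthony's copy-what-you-keep workaround in a minimal form (file and node names invented); for whole-node copies, File.copy_node() or copy_file() would do a similar job, while here the surviving rows are simply re-written into a fresh file and the old file discarded:

```python
import numpy as np
import tables as tb

with tb.open_file('old.h5', 'w') as f:
    f.create_array(f.root, 'data', np.arange(100))

# "delete" the last 50 rows by copying only the first 50 to a new file
with tb.open_file('old.h5', 'r') as src, \
     tb.open_file('new.h5', 'w') as dst:
    keep = src.root.data[:50]
    dst.create_array(dst.root, 'data', keep)

with tb.open_file('new.h5', 'r') as f:
    kept = f.root.data[:]

print(len(kept))   # 50
```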
From: Antonio V. <ant...@ti...> - 2013-08-06 06:30:59
|
Hi Anthony, hi Giovanni, Il giorno 06/ago/2013, alle ore 00:45, Anthony Scopatz <sc...@gm...> ha scritto: > > On Mon, Aug 5, 2013 at 3:14 PM, Giovanni Luca Ciampaglia <glc...@gm...> wrote: > Hi Anthony, > > what do you mean precisely? I tried > > del ca[:,:] > > but CArray does not support __delitem__. Looking in the documentation I could > only find a method called remove_rows, but it's in Table, not CArray. Maybe I am > missing something? > > Huh, it should... This is definitely an oversight on our part. If you could please open an issue for this -- or better yet -- write a pull request that implements delitem, that'd be great! > > So I think you are right that there is no current way to delete rows from a CArray. Oops! (Of course, I may still be missing something as well). > > It looks like EArray also has this problem too, otherwise I would just tell you to use that. I'm not sure to understand the problem. The "truncate" method of arrays can be used to remove rows from an extendable array. It seems to me that it is not documented but we should add it to the UG. CArrays cannot be resized. > > Be Well > Anthony > > > Thank, > > Giovanni > > On Mon 05 Aug 2013 03:43:42 PM EDT, pyt...@li... > wrote: > > > >> Hello Giovanni, I think you may need to del that slice and then possibly > >> repack. Hope this helps. Be Well Anthony On Mon, Aug 5, 2013 at 2:09 PM, > >> Giovanni Luca Ciampaglia < glc...@gm...> wrote: > >>> Hello all, > >>> > >>> is there a way to clear out a chunk from a CArray? I noticed that setting > >>> the > >>> data to zero actually takes disk space, i.e. > >>> > >>> *** > >>> from tables import open_file, BoolAtom > >>> > >>> h5f = open_file('test.h5', 'w') > >>> ca = h5f.create_carray(h5f.root, 'carray', BoolAtom(), shape=(1000,1000), > >>> chunkshape=(1,1000)) > >>> ca[:,:] = False > >>> h5f.close() > >>> *** > >>> > >>> The resulting file takes 249K ... 
> >>> > >>> Best, > >>> > >>> -- > >>> Giovanni Luca Ciampaglia HDF5 handles chunks that have never been written efficiently, saving some disk space, but I doubt that chunks can be "de-initialized". If my understanding is correct, once one writes some value into a chunk (even if it is the default value) the chunk is allocated at the HDF5 level and written to disk. At that point one can only change item values. Also I doubt that a repack can help in this case (not tested). The only solution IMO is compression. cheers -- Antonio Valentino
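[Editor's note] A quick sketch of the truncate() call Antonio mentions, on an EArray (CArrays, as he notes, cannot be resized):

```python
import numpy as np
import tables as tb

with tb.open_file('ea.h5', 'w') as f:
    # EArrays are enlargeable along the dimension with extent 0
    ea = f.create_earray(f.root, 'ea', tb.Int64Atom(), shape=(0,))
    ea.append(np.arange(10))
    ea.truncate(4)            # drop everything past the first 4 rows
    remaining = ea[:]

print(remaining)   # [0 1 2 3]
```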
From: Anthony S. <sc...@gm...> - 2013-08-05 22:46:11
|
On Mon, Aug 5, 2013 at 3:14 PM, Giovanni Luca Ciampaglia < glc...@gm...> wrote: > Hi Anthony, > > what do you mean precisely? I tried > > del ca[:,:] > > but CArray does not support __delitem__. Looking in the documentation I > could > only find a method called remove_rows, but it's in Table, not CArray. > Maybe I am > missing something? > Huh, it should... This is definitely an oversight on our part. If you could please open an issue for this -- or better yet -- write a pull request that implements delitem, that'd be great! So I think you are right that there is no current way to delete rows from a CArray. Oops! (Of course, I may still be missing something as well). It looks like EArray also has this problem too, otherwise I would just tell you to use that. Be Well Anthony > > Thank, > > Giovanni > > On Mon 05 Aug 2013 03:43:42 PM EDT, > pyt...@li... > wrote: > > > >> Hello Giovanni, I think you may need to del that slice and then possibly > >> repack. Hope this helps. Be Well Anthony On Mon, Aug 5, 2013 at 2:09 PM, > >> Giovanni Luca Ciampaglia < glc...@gm...> wrote: > >>> Hello all, > >>> > >>> is there a way to clear out a chunk from a CArray? I noticed that > setting > >>> the > >>> data to zero actually takes disk space, i.e. > >>> > >>> *** > >>> from tables import open_file, BoolAtom > >>> > >>> h5f = open_file('test.h5', 'w') > >>> ca = h5f.create_carray(h5f.root, 'carray', BoolAtom(), > shape=(1000,1000), > >>> chunkshape=(1,1000)) > >>> ca[:,:] = False > >>> h5f.close() > >>> *** > >>> > >>> The resulting file takes 249K ... > >>> > >>> Best, > >>> > >>> -- > >>> Giovanni Luca Ciampaglia > >>> > >>> Postdoctoral fellow > >>> Center for Complex Networks and Systems Research > >>> Indiana University > >>> > >>> ? 910 E 10th St ? Bloomington ? IN 47408 > >>> ?http://cnets.indiana.edu/ > >>> ?gci...@in... 
> >>> > > > > > > -- > Giovanni Luca Ciampaglia > > Postdoctoral fellow > Center for Complex Networks and Systems Research > Indiana University > > ✎ 910 E 10th St ∙ Bloomington ∙ IN 47408 > ☞ http://cnets.indiana.edu/ > ✉ gci...@in... > > > > ------------------------------------------------------------------------------ > Get your SQL database under version control now! > Version control is standard for application code, but databases havent > caught up. So what steps can you take to put your SQL databases under > version control? Why should you start doing it? Read more to find out. > http://pubads.g.doubleclick.net/gampad/clk?id=49501711&iu=/4140/ostg.clktrk > _______________________________________________ > Pytables-users mailing list > Pyt...@li... > https://lists.sourceforge.net/lists/listinfo/pytables-users > |