From: Derek S. <der...@gm...> - 2012-09-25 17:30:35
Hi Anthony,

It doesn't happen if I set start=0 or seemingly any number below 3257 (though I didn't try them *all*). I am new to PyTables and HDF5, so I'm not sure about the chunksize or if I'm at a boundary. I did however notice that the table's chunkshape is 203, and this happens for exactly 203 sequential records, so I doubt that's a coincidence. The table description is below.

Thanks,
Derek

/events (Table(5988,)) ''
  description := {
  "client_id": StringCol(itemsize=24, shape=(), dflt='', pos=0),
  "data_01": StringCol(itemsize=36, shape=(), dflt='', pos=1),
  "data_02": StringCol(itemsize=36, shape=(), dflt='', pos=2),
  "data_03": StringCol(itemsize=36, shape=(), dflt='', pos=3),
  "data_04": StringCol(itemsize=36, shape=(), dflt='', pos=4),
  "data_05": StringCol(itemsize=36, shape=(), dflt='', pos=5),
  "device_id": StringCol(itemsize=36, shape=(), dflt='', pos=6),
  "id": StringCol(itemsize=36, shape=(), dflt='', pos=7),
  "timestamp": Time64Col(shape=(), dflt=0.0, pos=8),
  "type": UInt16Col(shape=(), dflt=0, pos=9),
  "user_id": StringCol(itemsize=36, shape=(), dflt='', pos=10)}
  byteorder := 'little'
  chunkshape := (203,)
  autoIndex := True
  colindexes := {
    "timestamp": Index(9, full, shuffle, zlib(1)).is_CSI=True,
    "type": Index(9, full, shuffle, zlib(1)).is_CSI=True,
    "id": Index(9, full, shuffle, zlib(1)).is_CSI=True,
    "user_id": Index(9, full, shuffle, zlib(1)).is_CSI=True}
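A sketch of the kind of check being discussed here (file and table names are hypothetical; getWhereList() is the PyTables 2.x API): compare the row numbers a windowed query returns against a full-table query, alongside the table's chunkshape.

    import tables

    h5 = tables.openFile("events.h5", mode="r")   # hypothetical file
    table = h5.root.events                        # hypothetical table node

    cond = "id == 'ceec536a-394e-4dd7-a182-eea557f3bb93'"

    # Row numbers from a full-table query (the known-good reference) ...
    full = table.getWhereList(cond)
    # ... versus the windowed query that misbehaves.
    windowed = table.getWhereList(cond, start=3257, stop=table.nrows)

    print "chunkshape:", table.chunkshape
    print "full scan hits:", list(full)
    print "windowed hits:", list(windowed)

    h5.close()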
From: Anthony S. <sc...@gm...> - 2012-09-25 16:33:08
Hi Derek,

Ok. That is very strange. I cannot reproduce this on any of my data. A quick couple of extra questions:

1) Does this still happen when you set start=0?
2) What is the chunksize of this data set (are you at a boundary)?
3) Could you send us the full table information, i.e. repr(table)?

Be Well
Anthony
From: Derek S. <der...@gm...> - 2012-09-25 05:42:24
I ran the tests. All 4988 passed. The information it output is:

PyTables version: 2.4.0
HDF5 version: 1.8.9
NumPy version: 1.6.2
Numexpr version: 2.0.1 (not using Intel's VML/MKL)
Zlib version: 1.2.5 (in Python interpreter)
LZO version: 2.06 (Aug 12 2011)
BZIP2 version: 1.0.6 (6-Sept-2010)
Blosc version: 1.1.3 (2010-11-16)
Cython version: 0.16
Python version: 2.7.3 (default, Jul 6 2012, 00:17:51)
[GCC 4.2.1 Compatible Apple Clang 3.1 (tags/Apple/clang-318.0.58)]
Platform: darwin-x86_64
Byte-ordering: little
Detected cores: 4

-Derek
From: Anthony S. <sc...@gm...> - 2012-09-25 04:12:34
PS: When I do this on Linux, all 5077 tests pass for me.
From: Anthony S. <sc...@gm...> - 2012-09-25 04:09:53
Hi Derek,

Can you please run the following command and report back what you see?

python -c "import tables; tables.test()"

Be Well
Anthony
From: Derek S. <der...@gm...> - 2012-09-25 03:56:18
Hello,

I'm hoping someone can help me. When I specify start and stop values for calls to where() and readWhere(), it is returning blatantly incorrect results:

>>> table.readWhere("id == 'ceec536a-394e-4dd7-a182-eea557f3bb93'",
...                 start=3257, stop=table.nrows)[0]['id']
'7f589d3e-a0e1-4882-b69b-0223a7de3801'

>>> table.where("id == 'ceec536a-394e-4dd7-a182-eea557f3bb93'",
...             start=3257, stop=table.nrows).next()['id']
'7f589d3e-a0e1-4882-b69b-0223a7de3801'

This happens with a sequential block of about 150 rows of data, and each time it seems to be 8 rows off (i.e. the row it returns is 8 rows ahead of the row it should be returning). If I remove the start and stop args, it behaves correctly. This seems to be a bug, unless I am misunderstanding something. I'm using Python 2.7.3, PyTables 2.4.0, and HDF5 1.8.9 on OS X 10.8.2.

Any ideas?

Thanks,
Derek
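One possible workaround while this is being tracked down (a sketch, not an official fix; file and table names are hypothetical): run the query over the whole table, which behaves correctly, and apply the [start, stop) window to the returned row coordinates yourself.

    import tables

    h5 = tables.openFile("events.h5", mode="r")   # hypothetical file
    table = h5.root.events

    cond = "id == 'ceec536a-394e-4dd7-a182-eea557f3bb93'"
    start, stop = 3257, table.nrows

    # Full-table query, then window the row numbers rather than the query.
    coords = [c for c in table.getWhereList(cond) if start <= c < stop]
    rows = table.readCoordinates(coords)
    print rows['id']

    h5.close()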
From: Ümit S. <uem...@gm...> - 2012-09-24 13:35:21
With CArrays you can only have one specific type for the array (int, float, etc.), whereas with a table each column can have a different type (string, float, etc.). If you want to replicate this with CArrays, you would have to use multiple CArrays, one per column type.

I think for storing numerical data where querying isn't that important, CArrays are just fine. But even if you have to query, you can replicate the indexing behavior, for example by adding a second CArray with the values you want to index.
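A minimal sketch of that layout (all names hypothetical): one CArray per column under a common group, each with its own atom, so mixed column types survive the move away from Table.

    import numpy as np
    import tables

    h5 = tables.openFile("columns.h5", mode="w")          # hypothetical file
    grp = h5.createGroup("/", "events", "one CArray per column")
    n = 1000

    # Each column gets its own atom (type), recovering Table's mixed columns.
    ids = h5.createCArray(grp, "id", tables.StringAtom(itemsize=36), (n,))
    vals = h5.createCArray(grp, "value", tables.Float64Atom(), (n,))

    ids[:] = np.array(["row-%04d" % i for i in range(n)], dtype="S36")
    vals[:] = np.random.random(n)

    # Reading a whole "column" is now one contiguous read.
    col = h5.root.events.value[:]
    h5.close()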
From: Luke L. <dur...@gm...> - 2012-09-24 13:28:02
Thanks for the information guys. I have joined the dev group on Google groups. I'm sure I can learn a lot just by watching the discussions.

Also, I think for my current situation I'm going to stick with PyTables CArrays. We already have PyTables as a dependency, and we are using it for some other stuff in the project as well. I will definitely keep the stand-alone carray project in mind for the future though.

I guess by using PyTables CArrays I'm just losing the ability to query, etc.? Are there any other downsides in a PyTables CArray vs. PyTables Table comparison?
From: Anthony S. <sc...@gm...> - 2012-09-21 21:01:05
On Fri, Sep 21, 2012 at 4:55 PM, Francesc Alted <fa...@gm...> wrote:
> Well, that was true until version 0.5, where disk persistency was
> introduced. Now, carray supports both in-memory and on-disk objects,
> and they work exactly in the same way.

Sorry for not being exactly up to date ;)
From: Francesc A. <fa...@gm...> - 2012-09-21 20:55:20
On 9/21/12 10:07 PM, Anthony Scopatz wrote:
> On Fri, Sep 21, 2012 at 10:49 AM, Luke Lee <dur...@gm...> wrote:
> > 1. What is the benefit of using the stand-alone carray project
> > (https://github.com/FrancescAlted/carray) vs PyTables CArray?
>
> Hello Luke,
>
> carrays are in-memory, not on disk.

Well, that was true until version 0.5, where disk persistency was introduced. Now, carray supports both in-memory and on-disk objects, and they work exactly in the same way.

-- 
Francesc Alted
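A sketch of what that looks like; the parameter names below are recalled from the carray 0.5 API and may differ slightly, and the rootdir path is hypothetical.

    import numpy as np
    import carray as ca

    # In-memory compressed array.
    a = ca.carray(np.arange(1e7))

    # On-disk: passing rootdir persists the object to a directory, and the
    # resulting array is used exactly like the in-memory one.
    b = ca.carray(np.arange(1e7), rootdir="mydata.carray", mode="w")
    b.flush()

    print a[:5], b[:5]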
From: Anthony S. <sc...@gm...> - 2012-09-21 20:08:14
On Fri, Sep 21, 2012 at 10:49 AM, Luke Lee <dur...@gm...> wrote:
> 1. What is the benefit of using the stand-alone carray project
> (https://github.com/FrancescAlted/carray) vs PyTables CArray?

Hello Luke,

carrays are in-memory, not on disk.

> 2. I realized my code base never uses the query functionality of a Table.
> So, I changed all my columns to be just PyTables CArray objects instead.
> They are all sitting at the top of the hierarchy, just below root. Is
> this a good idea?
>
> I see a big speed increase from this, obviously, because now everything
> is stored contiguously. However, are there any downsides to doing this?
> I suppose I could also use EArray, but we are never actually changing
> the data once it is stored in HDF5.

If it works for you, then great!

> 3. Is compression automatically happening with the CArray? I know the
> documentation says that compression is supported, but what do I need to
> do to enable it? Maybe it's already happening and this is contributing
> to my big speed improvement.

For compression to be enabled, you need to define the appropriate filter [1] on either the node or the file.

> 4. I would certainly love to take a look at contributing something like
> this in my free time. I don't have a whole lot at this time, so the
> changes could take a while. I'm sure I need to learn a lot more about
> the codebase before really giving it a try. I'm going to take a look at
> this though, thanks for the suggestion!

No problem ;)

> 5. How do I subscribe to the dev mailing list? I only see announcements
> and users.

Here is the dev list site: https://groups.google.com/forum/?fromgroups#!forum/pytables-dev

> 6. Any idea why I'm not getting the emails from the list? I signed up 2
> days ago and didn't get any of your replies via email.

We have been having problems with this list. I think it might be time to transition...

Be Well
Anthony

1. http://pytables.github.com/usersguide/libref/helper_classes.html?highlight=filter#tables.Filters
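For question 3, a minimal sketch of attaching a Filters instance to a new CArray (file and node names hypothetical; see the Filters reference in [1] above):

    import numpy as np
    import tables

    h5 = tables.openFile("compressed.h5", mode="w")   # hypothetical file

    # Compression is off unless a Filters instance is attached to the node
    # (or passed file-wide to openFile).
    filters = tables.Filters(complevel=5, complib="blosc")
    arr = h5.createCArray("/", "value", tables.Float64Atom(), (400000,),
                          filters=filters)
    arr[:] = np.random.random(400000)

    print arr.filters    # shows which filters the node actually carries
    h5.close()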
From: Luke L. <dur...@gm...> - 2012-09-21 14:50:10
Hi again,

I haven't been getting the updates via email so I'm attempting to post again to respond.

Thanks everyone for the suggestions. I have a few questions:

1. What is the benefit of using the stand-alone carray project (https://github.com/FrancescAlted/carray) vs PyTables CArray?

2. I realized my code base never uses the query functionality of a Table. So, I changed all my columns to be just PyTables CArray objects instead. They are all sitting at the top of the hierarchy, just below root. Is this a good idea?

I see a big speed increase from this, obviously, because now everything is stored contiguously. However, are there any downsides to doing this? I suppose I could also use EArray, but we are never actually changing the data once it is stored in HDF5.

3. Is compression automatically happening with the CArray? I know the documentation says that compression is supported, but what do I need to do to enable it? Maybe it's already happening and this is contributing to my big speed improvement.

4. I would certainly love to take a look at contributing something like this in my free time. I don't have a whole lot at this time, so the changes could take a while. I'm sure I need to learn a lot more about the codebase before really giving it a try. I'm going to take a look at this though, thanks for the suggestion!

5. How do I subscribe to the dev mailing list? I only see announcements and users.

6. Any idea why I'm not getting the emails from the list? I signed up 2 days ago and didn't get any of your replies via email.

Thanks!
From: Alvaro T. C. <al...@mi...> - 2012-09-21 09:50:34
Hi!

You may want to have a look at | reuse | combine your approach with the one implemented in pandas (pandas.io.pytables.HDFStore):

https://github.com/pydata/pandas/blob/master/pandas/io/pytables.py

(see the _write_array method)

A certain liberality in pandas with dtypes (partly induced by the missing-data problem) often leads to VLArrays being created, which might not be the most performant solution. But if the types of the columns in the data frames are guessed right, then CArrays embedded in groups will be used, as far as I understand (as suggested above).

Best,
-á.
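For reference, the HDFStore round trip being described looks roughly like this (a sketch against the pandas API of the time; the frame contents and file name are hypothetical):

    import numpy as np
    import pandas as pd

    # Concrete dtypes let HDFStore map columns onto fixed-type arrays
    # instead of falling back to VLArrays.
    df = pd.DataFrame({
        "value": np.random.random(1000),
        "count": np.arange(1000, dtype=np.int64),
    })

    store = pd.HDFStore("frames.h5")   # hypothetical file
    store["df"] = df                   # written via PyTables under the hood
    roundtrip = store["df"]
    store.close()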
From: Anthony S. <sc...@gm...> - 2012-09-21 00:15:21
Luke,

I'd also like to mention that if you don't want to wait for us to implement this, we will gladly take contributions ;). If you need help getting started, or throughout the process, we are also happy to provide that too. Please sign up for PyTables Dev (pyt...@go...) so we move implementation discussions away from users. Clearly, people would benefit from you taking this upon yourself, should you choose to accept this mission!

Be Well
Anthony
From: Josh A. <jos...@gm...> - 2012-09-20 19:26:15
Depending on your use case, you may be able to get around this by storing each column in its own table. That will effectively store the data in column-first order. Instead of creating a table, you would create a group, which then contains a separate table for each column.

If you want, you can wrap all the functionality you need in a single object that hides the complexity and makes it act just like a single table. I did something similar to this recently and it's worked well. However, I wasn't too concerned with exactly matching the Table API or implementing all of its features.

Creating a more general version that does duplicate the Table class interface and can be included in PyTables is definitely possible and is something I'd like to do, but I've never had the necessary time to dedicate to it.

Hope that helps,
Josh
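A sketch of that idea (all names hypothetical): one single-field Table per column inside a group, plus a thin helper that reads a "column" back.

    import numpy as np
    import tables

    h5 = tables.openFile("colstore.h5", mode="w")     # hypothetical file
    grp = h5.createGroup("/", "events", "one single-column table per field")

    # Each "column" is a Table with a one-field description, so reading a
    # column never drags the other fields through the I/O path.
    h5.createTable(grp, "name", {"name": tables.StringCol(12)})
    h5.createTable(grp, "value", {"value": tables.Float64Col()})

    h5.root.events.value.append([(x,) for x in np.random.random(1000)])

    def read_column(group, name):
        # Fetch the single field of the single-column table as one array.
        return group._f_getChild(name).col(name)

    col = read_column(h5.root.events, "value")
    h5.close()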
From: Francesc A. <fa...@py...> - 2012-09-19 17:55:52
On 9/19/12 3:37 PM, Luke Lee wrote:
> My access pattern for this data is generally to read an entire column
> out at a time. So, I want to minimize the number of disk accesses this
> takes and store data contiguously by column.

To start with, you must be aware that the Table object stores data in row order, not column order. In practice, that means that whenever you want to access a single column, you will need to traverse the *entire* table.

I always wished to implement a column-order table in PyTables, but that did not happen in the end.

> I think the proper way to do this via HDF5 is to use 'chunking.' I'm
> creating my HDF5 files via PyTables, so I guess using the 'chunkshape'
> parameter during creation is the correct way to do this?

Yes, it is.

> All of the HDF5 documentation I read discusses 'chunksize' in terms of
> rows and columns. However, the PyTables 'chunkshape' parameter only
> takes a single number. I looked through the source and see that I can
> in fact pass a tuple, which I assume is (row, column) as the HDF5
> documentation would suggest.

Not quite. The Table object is actually a uni-dimensional beast, but with a 'compound' datatype (which in some way can be regarded as another dimension, but it is not a 'true' dimension).

> Is it best to use the 'expectedrows' parameter instead of the
> 'chunkshape', or use both?

You can try both. The `expectedrows` parameter was introduced to ease the life of users, and it 'optimizes' the `chunkshape` for 'normal' usage. For specific requirements, playing directly with the `chunkshape` normally gives better results.

> I'm seeking a bit more knowledge on how PyTables uses each of these
> parameters, how they relate to the HDF5 'chunking' concept, and best
> practices. Is there any documentation on best practices for using the
> 'expectedrows' and 'chunkshape' parameters?

Well, there is:

http://pytables.github.com/usersguide/optimization.html

but I'm sure you already know this.

Frankly, if you want to enhance the speed of column retrieval, you are going to need an object that is stored in column order. In this sense, you may want to experiment with the ctable object in the carray package (https://github.com/FrancescAlted/carray). It supports nearly the same capabilities as the Table object, but column order is implemented properly, so a ctable will probably buy you a nice speed-up.

Hope this helps,

-- 
Francesc Alted
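A sketch of the two knobs being compared (the description and numbers are hypothetical stand-ins for Luke's dataset):

    import tables

    class Event(tables.IsDescription):
        # Hypothetical stand-in for the real 25-column description.
        name = tables.StringCol(12)
        v0 = tables.Float64Col()
        v1 = tables.Float64Col()

    h5 = tables.openFile("tuned.h5", mode="w")

    # Let PyTables derive the chunkshape from the expected row count ...
    t1 = h5.createTable("/", "auto", Event, expectedrows=400000)

    # ... or force it explicitly; for a Table it is 1-D (rows only), since
    # the compound type is not a true dimension.
    t2 = h5.createTable("/", "manual", Event, chunkshape=(1000,))

    print t1.chunkshape, t2.chunkshape
    h5.close()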
From: Luke L. <dur...@gm...> - 2012-09-19 13:38:00
Hi all,

I'm attempting to optimize my HDF5/PyTables application for reading entire columns at a time. I was wondering what the best way to go about this is.

My HDF5 file has the following properties:

- 400,000+ rows
- 25 columns
- 147 MB in total size
- 1 string column of size 12
- 1 column of type 'Float'
- 23 columns of type 'Float64'

My access pattern for this data is generally to read an entire column out at a time. So, I want to minimize the number of disk accesses this takes and store data contiguously by column.

I think the proper way to do this via HDF5 is to use 'chunking.' I'm creating my HDF5 files via PyTables, so I guess using the 'chunkshape' parameter during creation is the correct way to do this?

All of the HDF5 documentation I read discusses 'chunksize' in terms of rows and columns. However, the PyTables 'chunkshape' parameter only takes a single number. I looked through the source and see that I can in fact pass a tuple, which I assume is (row, column) as the HDF5 documentation would suggest.

Is it best to use the 'expectedrows' parameter instead of the 'chunkshape', or use both?

I have done some debugging/profiling and discovered that my default chunkshape is 321 for this dataset. I have increased this to 1000 and see quite a bit better speeds. I'm sure I could keep changing these numbers and find what is best for this particular dataset. However, I'm seeking a bit more knowledge on how PyTables uses each of these parameters, how they relate to the HDF5 'chunking' concept, and best practices. This will help me understand how to optimize in the future instead of just for this particular dataset. Is there any documentation on best practices for using the 'expectedrows' and 'chunkshape' parameters?

Thank you for your time,
Luke Lee
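A quick way to measure what a chunkshape change buys for this access pattern (a sketch; file, table, and column names are hypothetical): time a whole-column read via Table.col().

    import time
    import tables

    h5 = tables.openFile("mydata.h5", mode="r")   # hypothetical file
    table = h5.root.mytable                       # hypothetical table

    t0 = time.time()
    col = table.col("pressure")                   # hypothetical Float64 column
    print "read %d values in %.3f s" % (len(col), time.time() - t0)

    h5.close()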
From: Anthony S. <sc...@gm...> - 2012-09-16 17:26:46
Hello Gelin,

Unless you were using the undo/redo mechanism, I don't think there is. You'll probably have to fix the file manually, using PyTables normally and the provided tools like ptrepack.

Be Well
Anthony
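As a sketch of the manual route (file names hypothetical): ptrepack copies every node that is still readable into a fresh file, which is often the practical salvage path.

    ptrepack damaged.h5:/ recovered.h5:/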
From: gelin y. <dyn...@gm...> - 2012-09-16 17:22:13
Hi All,

I have a question about data corruption. Is it possible to repair a data file after a situation like a power outage or a process crash? I have poked around the manual; however, I failed to find anything about how to repair corrupted data.

Thanks

Regards,
gelin yan
From: Anthony S. <sc...@gm...> - 2012-09-16 16:43:56
Great! Thanks to you both.
From: Antonio V. <ant...@ti...> - 2012-09-16 16:42:12
Hi Francesc,

Thank you. Just pushed updates into pytables.

ciao

-- 
Antonio Valentino
From: Francesc A. <fa...@gm...> - 2012-09-16 10:07:32
===============================================================
 Announcing Blosc 1.1.4
 A blocking, shuffling and lossless compression library
===============================================================

What is new?
============

- Redefinition of the BLOSC_MAX_BUFFERSIZE constant as (INT_MAX - BLOSC_MAX_OVERHEAD) instead of just INT_MAX. This prevents producing outputs larger than INT_MAX, which is not supported.

- The `exit()` call has been replaced by a ``return -1`` in blosc_compress() when checking for buffer sizes. Now programs will not just exit when the buffer is too large, but return a negative code.

- Improvements in explicit casts. Blosc compiles without warnings (with GCC) now.

- Lots of improvements in docs, in particular a nice ascii-art diagram of the Blosc format (Valentin Haenel).

- [HDF5 filter] Adapted the HDF5 filter to use HDF5 1.8 by default (Antonio Valentino).

For more info, please see the release notes in:

https://github.com/FrancescAlted/blosc/wiki/Release-notes

What is it?
===========

Blosc (http://blosc.pytables.org) is a high-performance compressor optimized for binary data. It has been designed to transmit data to the processor cache faster than the traditional, non-compressed, direct memory fetch approach via a memcpy() OS call. Blosc is the first compressor (that I'm aware of) that is meant not only to reduce the size of large datasets on-disk or in-memory, but also to accelerate object manipulations that are memory-bound.

It also comes with a filter for HDF5 (http://www.hdfgroup.org/HDF5) so that you can easily implement support for Blosc in your favourite HDF5 tool.

Download sources
================

Please go to the main web site:

http://blosc.pytables.org/sources/

or the github repository:

https://github.com/FrancescAlted/blosc

and download the most recent release from there. Blosc is distributed using the MIT license; see LICENSES/BLOSC.txt for details.

Mailing list
============

There is an official Blosc mailing list at:

bl...@go...
http://groups.google.es/group/blosc

----

**Enjoy data!**

-- 
Francesc Alted
From: Josh A. <jos...@gm...> - 2012-09-02 18:48:11
|
Jacob,

I just put together a small example demonstrating this.  You can find it in the develop branch of the PyTables repository.

https://github.com/PyTables/PyTables/blob/develop/examples/multiprocess_access_queues.py

It's somewhat limited, because all the client processes have to be known at the time the file-accessing process is created.  It should be possible to create a more flexible implementation (and probably a faster one as well) using sockets.  I'll put together an example of that at some point.

Hope that helps,
Josh

On Mon, Jul 16, 2012 at 10:03 PM, Anthony Scopatz <sc...@gm...> wrote:

> On Mon, Jul 16, 2012 at 3:30 PM, Jacob Bennett <jac...@gm...> wrote:
>
>> Wait, is there perhaps a way to simultaneously read and write without
>> any kind of blocking?  Perhaps the "a" mode or the "r+" mode might help
>> for simultaneous read/write?  I am currently implementing the
>> multiprocessing.Queue, but I think that a large number of query
>> requests might put an unnecessary load on my writing queue since the
>> data comes in so fast. ;)
>
> Hmm, I'll have to look into it, but I vaguely recall a file access mode
> that HDF5 has that PyTables doesn't expose... I may be wrong about this...
>
>> Btw, I will submit the example soon.
>
> +1!
>
>> -Jacob
>>
>> On Sat, Jul 14, 2012 at 1:39 PM, Anthony Scopatz <sc...@gm...> wrote:
>>
>>> +1 to an example of this!
>>>
>>> On Sat, Jul 14, 2012 at 1:36 PM, Jacob Bennett <jac...@gm...> wrote:
>>>
>>>> Awesome, I think this sounds like a very workable solution and the
>>>> idea is very neat.  I will try to implement this right away.  I
>>>> definitely agree about putting up a small example.
>>>>
>>>> I'll let you know how this works.  Thanks, guys!
>>>>
>>>> Thanks,
>>>> Jacob
>>>>
>>>> On Sat, Jul 14, 2012 at 2:36 AM, Antonio Valentino <ant...@ti...> wrote:
>>>>
>>>>> Hi all,
>>>>>
>>>>> On 14/07/2012 00:44, Josh Ayers wrote:
>>>>> > My first instinct would be to handle all access (read and write)
>>>>> > to that file from a single process.  You could create two
>>>>> > multiprocessing.Queue objects, one for data to write and one for
>>>>> > read requests.  Then the process would check the queues in a loop
>>>>> > and handle each request serially.  The data read from the file
>>>>> > could be sent back to the originating process using another queue
>>>>> > or pipe.  You should be able to do the same thing with sockets if
>>>>> > the other parts of your application are in languages other than
>>>>> > Python.
>>>>> >
>>>>> > I do something similar to handle writing to a log file from
>>>>> > multiple processes and it works well.  In that case the file is
>>>>> > write-only -- and just a simple text file rather than HDF5 -- but
>>>>> > I don't see any reason why it wouldn't work for read and write as
>>>>> > well.
>>>>> >
>>>>> > Hope that helps,
>>>>> > Josh
>>>>>
>>>>> I totally agree with Josh.
>>>>>
>>>>> I don't have test code to demonstrate it, but IMHO parallelizing I/O
>>>>> to/from a single file on a single disk does not make much sense
>>>>> unless you have special hardware.  Is this your case, Jacob?
>>>>>
>>>>> IMHO with standard SATA devices you could get a marginal speedup (in
>>>>> the best case), but if your bottleneck is the I/O this will not
>>>>> solve your problem.
>>>>>
>>>>> If someone finds the time to implement a toy example of what Josh
>>>>> suggested, we could put it in the cookbook :)
>>>>>
>>>>> regards
>>>>>
>>>>> > On Fri, Jul 13, 2012 at 12:18 PM, Anthony Scopatz <sc...@gm...> wrote:
>>>>> >> On Fri, Jul 13, 2012 at 2:09 PM, Jacob Bennett <jac...@gm...> wrote:
>>>>> >>
>>>>> >> [snip]
>>>>> >>
>>>>> >>> My first implementation was to have a set of current files stay
>>>>> >>> in write mode and have an overall lock over these files for the
>>>>> >>> current day, but (stupidly) I forgot that lock instances cannot
>>>>> >>> be shared over separate processes, only threads.
>>>>> >>>
>>>>> >>> So could you give me any advice in this situation?  I'm sure it
>>>>> >>> has come up before. ;)
>>>>> >>
>>>>> >> Hello All, I previously suggested to Jacob a setup where only one
>>>>> >> proc would have a write handle and all of the other processes
>>>>> >> would be in read-only mode.  I am not sure that this would work.
>>>>> >>
>>>>> >> Francesc, Antonio, Josh, etc., or anyone else, how would you
>>>>> >> solve this problem where you may want many processors to query
>>>>> >> the file, while something else may be writing to it?  I defer to
>>>>> >> people with more experience... Thanks for your help!
>>>>> >>
>>>>> >> Be Well
>>>>> >> Anthony
>>>>> >>
>>>>> >>> Thanks,
>>>>> >>> Jacob Bennett
>>>>>
>>>>> --
>>>>> Antonio Valentino
>>>>
>>>> --
>>>> Jacob Bennett
>>>> Massachusetts Institute of Technology
>>>> Department of Electrical Engineering and Computer Science
>>>> Class of 2014 | ben...@mi...
>>
>> --
>> Jacob Bennett
>> Massachusetts Institute of Technology
>> Department of Electrical Engineering and Computer Science
>> Class of 2014 | ben...@mi...
|
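To make the single-owner pattern discussed in this thread concrete, here is a minimal sketch (the file path, the /events node, and the tiny command protocol are all illustrative assumptions, not the actual example file; it uses the PyTables 2.x camel-case API and Python 2 syntax):

    import multiprocessing as mp

    import tables


    def file_server(path, requests, results):
        # The only process that touches the HDF5 file; every read and
        # write funnels through its queues, so no cross-process locking
        # is needed.
        h5 = tables.openFile(path, mode='a')   # PyTables 2.x API
        table = h5.root.events                 # assumes this table exists
        while True:
            cmd, payload = requests.get()
            if cmd == 'stop':
                break
            elif cmd == 'append':
                table.append(payload)
                table.flush()
            elif cmd == 'query':
                # payload is a condition string for readWhere()
                results.put(table.readWhere(payload))
        h5.close()


    if __name__ == '__main__':
        requests, results = mp.Queue(), mp.Queue()
        server = mp.Process(target=file_server,
                            args=('data.h5', requests, results))
        server.start()
        requests.put(('query', 'type == 1'))
        print results.get()
        requests.put(('stop', None))
        server.join()

Because the server process serializes all requests, readers never see a half-written row; the trade-off is that every query is funneled through one queue, which is exactly the load concern Jacob raises above.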
From: Anthony S. <sc...@gm...> - 2012-08-30 16:41:28
|
Passing the bad news along, in case you hadn't heard.

---------- Forwarded message ----------
From: Fernando Perez <fpe...@gm...>
Date: Wed, Aug 29, 2012 at 9:32 PM
Subject: A sad day for our community. John Hunter: 1968-2012.
To: matplotlib development list <mat...@li...>, Matplotlib Users <mat...@li...>, IPython Development list <ipy...@sc...>, IPython User list <ipy...@sc...>, Discussion of Numerical Python <num...@sc...>, SciPy Developers List <sci...@sc...>, SciPy Users List <sci...@sc...>, num...@go..., py...@go..., scikit-learn-general <sci...@li...>, networkx-discuss <net...@go...>, sage-devel <sag...@go...>, pys...@go..., enthought-dev <ent...@ma...>, yt...@li...

Dear friends and colleagues,

I am terribly saddened to report that yesterday, August 28 2012 at 10am, John D. Hunter died from complications arising from cancer treatment at the University of Chicago hospital, after a brief but intense battle with this terrible illness. John is survived by his wife Miriam, his three daughters Rahel, Ava and Clara, his sisters Layne and Mary, and his mother Sarah.

Note: if you decide not to read any further (I know this is a long message), please go to this page for some important information about how you can thank John for everything he gave in a decade of generous contributions to the Python and scientific communities: http://numfocus.org/johnhunter.

Just a few weeks ago, John delivered his keynote address at the SciPy 2012 conference in Austin, centered around the evolution of matplotlib:

http://www.youtube.com/watch?v=e3lTby5RI54

but tragically, shortly after his return home he was diagnosed with advanced colon cancer. This diagnosis was a terrible discovery to us all, but John took it with his usual combination of calm and resolve, and initiated treatment procedures. Unfortunately, the first round of chemotherapy treatments led to severe complications that sent him to the intensive care unit, and despite the best efforts of the University of Chicago medical center staff, he never fully recovered from these. Yesterday morning, he died peacefully at the hospital with his loved ones at his bedside.

John fought with grace and courage, enduring every necessary procedure with a smile on his face and a kind word for all of his caretakers, and becoming a loved patient of the many teams that ended up involved with his case. This was no surprise for those of us who knew him, but he clearly left a deep and lasting mark even amongst staff hardened by the rigors of oncology floors and intensive care units.

I don't need to explain to this community the impact of John's work, but allow me to briefly recap, in case this is read by some who don't know the whole story. In 2002, John was a postdoc at the University of Chicago hospital working on the analysis of epilepsy seizure data in children. Frustrated with the state of the existing proprietary solutions for this class of problems, he started using Python for his work, back when the scientific Python ecosystem was much, much smaller than it is today and this could have been seen as a crazy risk. Furthermore, he found that there were many half-baked solutions for data visualization in Python at the time, but none that truly met his needs. Undeterred, he went on to create matplotlib (http://matplotlib.org) and thus overcome one of the key obstacles for Python to become the best solution for open source scientific and technical computing.

Matplotlib is both an amazing technical achievement and a shining example of open source community building, as John not only created its backbone but also fostered the development of a very strong development team, ensuring that the talent of many others could also contribute to this project. The value and importance of this are now painfully clear: despite having lost John, matplotlib continues to thrive thanks to the leadership of Michael Droettboom, the support of Perry Greenfield at the Hubble Telescope Science Institute, and the daily work of the rest of the team. I want to thank Perry and Michael for putting their resources and talent once more behind matplotlib, securing the future of the project.

It is difficult to overstate the value and importance of matplotlib, and therefore of John's contributions (which do not end in matplotlib, by the way; but a biography will have to wait for another day...). Python has become a major force in the technical and scientific computing world, leading the open source offers and challenging expensive proprietary platforms with large teams and millions of dollars of resources behind them. But this would be impossible without a solid data visualization tool that would allow both ad-hoc data exploration and the production of complex, fine-tuned figures for papers, reports or websites. John had the vision to make matplotlib easy to use, but powerful and flexible enough to work in graphical user interfaces and as a server-side library, enabling a myriad of use cases beyond his personal needs. This means that now, matplotlib powers everything from plots in dissertations and journal articles to custom data analysis projects and websites.

And despite having left his academic career a few years ago for a job in industry, he remained engaged enough that as of today, he is still the top committer to matplotlib; this is the git shortlog of those with more than 1000 commits to the project:

  2145  John Hunter <jd...@gm...>
  2130  Michael Droettboom <md...@gm...>
  1060  Eric Firing <ef...@ha...>

All of this was done by a man who had three children to raise and who still always found the time to help those on the mailing lists, solve difficult technical problems in matplotlib, teach courses and seminars about scientific Python, and more recently help create the NumFOCUS foundation project. Despite the challenges that raising three children in an expensive city like Chicago presented, he never once wavered from his commitment to open source. But unfortunately now he is not here anymore to continue providing for their well-being, and I hope that all those who have so far benefited from his generosity will thank this wonderful man who always gave far more than he received.

Thanks to the rapid action of Travis Oliphant, the NumFOCUS foundation is now acting as an escrow agent to accept donations that will go into a fund to support the education and care of his wonderful girls Rahel, Ava and Clara. If you have benefited from John's many contributions, please say thanks in the way that would matter most to him, by helping Miriam continue the task of caring for and educating Rahel, Ava and Clara. You will find all the information necessary to make a donation here:

http://numfocus.org/johnhunter

Remember that even a small donation helps! If all those who ever use matplotlib give just a little bit, in the long run I am sure that we can make a difference.

If you are a company that benefits in a serious way from matplotlib, remember that John was a staunch advocate of keeping all scientific Python projects under the BSD license so that commercial users could benefit from them without worry. Please say thanks to John in a way commensurate with your resources (and check how much a yearly matlab license would cost you in case you have any doubts about the value you are getting...).

John's family is planning a private burial in Tennessee, but (most likely in September) there will also be a memorial service in Chicago that friends and members of the community can attend. We don't have the final scheduling details at this point, but I will post them once we know.

I would like to again express my gratitude to Travis Oliphant for moving quickly with the setup of the donation support, and to Eric Jones (the founder of Enthought and another one of the central figures in our community), who immediately upon learning of John's plight contributed resources to support the family with everyday logistics while John was facing treatment, as well as my travel to Chicago to assist. This kind of immediate urge to come to the help of others that Eric and Travis displayed is a hallmark of our community.

Before closing, I want to take a moment to publicly thank the incredible staff of the University of Chicago medical center. The last two weeks were an intense and brutal ordeal for John and his loved ones, but the hospital staff offered a sometimes hard to believe, unending supply of generosity, care and humanity in addition to their technical competence. The latter is something we expect from a first-rate hospital at a top university, where the attending physicians can be world-renowned specialists in their field. But the former is often forgotten in a world often ruled by a combination of science and concerns about regulations and liability. Instead, we found generous and tireless staff who did everything in their power to ease the pain, always putting our well-being ahead of any mindless adherence to protocol, patiently tending to every need we had, and working far beyond their stated responsibilities to support us.

To name only one person (and many others are equally deserving), I want to thank Dr. Carla Moreira, chief surgical resident, who spent the last few hours of John's life with us despite having just completed a solid night shift of surgical work. Instead of resting she came to the ICU and worked to ensure that those last hours were as comfortable as possible for John; her generous actions helped us through a very difficult moment.

It is now time to close this already too long message...

John, thanks for everything you gave all of us, and for the privilege of knowing you.

Fernando.

ps - I have sent this with my 'mailing lists' email. If you need to contact me directly for anything regarding the above, please write to my regular address at Fer...@be..., where I do my best to reply more promptly.
|
From: Stuart M. <Stu...@ob...> - 2012-08-27 18:53:33
|
On 8/27/2012 2:45 PM, Christoph Gohlke wrote:
> On 8/27/2012 10:58 AM, Stuart Mentzer wrote:
>> On 8/27/2012 1:22 PM, Christoph Gohlke wrote:
>>> On 8/27/2012 9:42 AM, Antonio Valentino wrote:
>>>> Hi Stuart,
>>>>
>>>> On 27/08/2012 17:43, Stuart Mentzer wrote:
>>>>> Hello,
>>>>>
>>>>> I upgraded to PyTables 2.4.0 and I was "freezing" an application on
>>>>> Windows with PyInstaller.  The frozen app fails at this new
>>>>> find_library call in __init__.py:
>>>>>
>>>>>     if not ctypes.util.find_library('hdf5dll.dll'):
>>>>>         raise ImportError('Could not load "hdf5dll.dll", please ensure' +
>>>>>                           ' that it can be found in the system path')
>>>>>
>>>>> PyInstaller correctly places this DLL in the same directory as the
>>>>> application .exe, where standard Windows DLL search logic will find
>>>>> it.  Apparently find_library doesn't do that in a frozen
>>>>> application.  That is a big problem.  I had to comment this code out
>>>>> to get a working frozen app.
>>>>>
>>>>> That code was added in revision e9f6919.
>>>>
>>>> It is mainly a sanity check added at the request of one of our users:
>>>> https://github.com/PyTables/PyTables/pull/146
>>>>
>>>>> This is on Windows 7 64-bit with a 32-bit Python toolchain, trying
>>>>> both PyInstaller 1.5.1 and 2.0.
>>>>>
>>>>> Should I file a bug report?  Any easy work-around?
>>>>>
>>>>> Thanks,
>>>>> Stuart
>>>>
>>>> Yes, please file a pull request with your patch.
>>>> It would be nice to preserve the sanity check in the standard case,
>>>> so maybe a good solution could be adding some check on sys.frozen or
>>>> something like that.
>>>>
>>>> Thank you
>>>
>>> Hello,
>>>
>>> As a workaround for frozen distributions, try adding the
>>> sys.executable directory to os.environ['PATH'] before importing
>>> tables.
>>>
>>> Ctypes only tries to find a library in the os.environ['PATH']
>>> directories, not the current directory or the sys.executable
>>> directory as one might expect.
>>> http://hg.python.org/cpython/file/64640a02b0ca/Lib/ctypes/util.py#l48
>>>
>>> As a workaround, for distributions that place the HDF5 and other DLLs
>>> in the tables package directory, tables/__init__.py adds the tables
>>> package directory to os.environ['PATH'].  This also makes sure that
>>> the DLLs are found when loading hdf5Extension.pyd and the other C
>>> extension modules (another common problem).  The use of __file__ to
>>> get the tables directory should better be wrapped in a try..except
>>> statement.
>>> https://github.com/PyTables/PyTables/blob/develop/tables/__init__.py#L24
>>>
>>> Christoph
>>
>> Hi Christoph,
>>
>> Thanks for the info/suggestions.  It might be nice to add your
>> comments to the issue #177 I created.
>>
>> I was aware that altering the PATH is a work-around.  Patching
>> PyTables is cleaner and seems like the proper fix, and I think we
>> agree that there is a problem here that should be addressed.  Maybe
>> the PyTables test suite should even include a frozen application test.
>>
>> Thanks,
>> Stuart
>
> Hi Stuart,
>
> I'll try to work on a patch tonight.  It's probably better to use the
> ctypes LoadLibrary machinery instead of find_library, because that
> makes sure all HDF5 dependencies are found, it (supposedly) searches
> the Windows DLL search path, not just os.environ['PATH'], and
> (supposedly) takes into account libraries already loaded into the
> process.
>
> Christoph

Great. I'll be happy to test a patch in the frozen context.

Stuart
|
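For anyone hitting the same problem before a patch lands, a minimal sketch of the PATH workaround Christoph describes, combined with the sys.frozen check Antonio suggests (sys.frozen is the convention PyInstaller sets on frozen apps; treat the details as illustrative, not an official fix):

    import os
    import sys

    # In a frozen (e.g. PyInstaller) app the bundled HDF5 DLLs sit next
    # to the executable, so put that directory on PATH before PyTables
    # runs its find_library() sanity check at import time.
    if getattr(sys, 'frozen', False):
        exe_dir = os.path.dirname(sys.executable)
        os.environ['PATH'] = exe_dir + os.pathsep + os.environ.get('PATH', '')

    import tables  # imported only after PATH has been adjusted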