From: Francesc A. <fa...@ca...> - 2005-01-25 16:40:23

Hi,

On Saturday 22 January 2005 14:06, Norbert Nemec wrote:
> On Thursday, 20 January 2005 21:07, Andreu Alted Abad wrote:
> > And finally I would like to mention an example that was used many times
> > along this thread:
> >
> > mygroup.mynode = Node(...)
> >
> > Some time ago I was discussing with Francesc about the correctness of
> > using node constructors in the client code. My perception was that this
> > notation only could bring problems since the instance created with the
> > constructor is not bound to any pytables file, and more importantly, it
> > has no utility except to be assigned to a group after its creation; any
> > other uses would end with an error.
>
> You call this thread "Unbound node" which already shows one possible
> solution: Make unbound nodes an official part of PyTables. A node can be
> created, and is unbound at first, but it already contains all the necessary
> data to create a physical node in a file once it is assigned to a group
> member.

We have given this a good deal of consideration. While we agree that allowing:

mygroup.mynode = Node(...)  # this can be done now, but not a good practice IMO!

or

mygroup['mynode'] = Node(...)

can ease the work of the programmer (or at least, of some programmers), the fact that a Node() in itself has no real utility (or at least, not the utility it was primarily designed for) may lead to unnecessary confusion for users. So, my opinion is that this should not be regarded (and hence, not documented) as a feature.

We have even considered the possibility of adding a new parameter to the node constructor, for specifying the path where the object should be created, something like:

Table(where=mygroup, name='mynode', ...)

but, again, I find this a bit confusing (and it would require some work both in PyTables and CSTables). I very much prefer creating the node through factories rather than through class constructors. Maybe I'm biased because of my habits, but this is how I feel.

However, we plan to add a few methods to Group in order to ease the creation of nodes. Most probably, we will implement:

Group._f_createGroup, Group._f_create*Array and Group._f_createTable

which will do the same thing as their counterparts in File. Besides, we plan to add support for Group.__getitem__('nodename') and Group.__delitem__('nodename'). With that, we think that node creation and referencing would be somewhat improved.

> The simple option would then be that - as you say it - writing or reading
> unbound nodes is prohibited and raises an exception. Maybe, though, it
> would even prove useful to consider unbound nodes just like regular nodes
> that are simply located in memory rather than on disk. I don't know the
> implications of this, but maybe, temporary tables in memory even have some
> practical use - be it only to defer physical writing to some defined point
> in the future.

Temporary tables (or arrays) in memory would be nice, and of course, they are an option for the future. Their mere existence would justify the mygroup.mynode = Table() notation. However, this would imply quite a lot of work, and this is not a priority for us right now.

Cheers,

-- 
>qo<   Francesc Altet     http://www.carabos.com/
V  V   Cárabos Coop. V.   Enjoy Data
 ""

From: Norbert N. <Nor...@gm...> - 2005-01-25 14:52:07

Hi there,

I would have assumed that the following works:

-------------------------
#!/usr/bin/env python
from tables import *

file = openFile('tryout.h5','w')
desc = {}
desc['x'] = FloatCol()
file.root.mytable = Table(desc)
mytable = file.root.mytable

row = mytable.row
for i in range(3):
    row['x'] = i
    row.append()
mytable.flush()

for row in mytable.iterrows():
    row['x'] = row['x'] + 0.5
mytable.flush()

for row in mytable.iterrows():
    print row['x']

file.close()
-------------------------

but obviously, the writing is completely ignored. Is this intuitive approach wrong? Is it just not implemented yet? Is there some other way the same thing could be done?

Ciao,
Norbert

-- 
Norbert Nemec
Bernhardstr. 2 ... D-93053 Regensburg
Tel: 0941 - 2009638 ... Mobil: 0179 - 7475199
eMail: <No...@Ne...>

From: Norbert N. <Nor...@gm...> - 2005-01-24 19:51:04

Hi there,

has anybody considered merging ideas from netcdf-4 into pytables?

http://my.unidata.ucar.edu/content/software/netcdf/netcdf-4/index.html

I have used NetCDF in the past to store my data. Now that I use HDF5, I badly miss the concept of 'dimensions' from NetCDF. In short pseudo-notation:

I want to store the result of a function f(x,y,time) as well as another (time independent) function g(x,y) with

    x in range(0.0,2.0,0.1)
    y in range(-1.0,1.0,0.1)
    time in range(0.0,500.0,10.0)

NetCDF allows me to first define the dimensions 'x', 'y' and 'time' and then write datasets (arrays) that extend in a certain subset of these dimensions.

Now, NetCDF 4 is a project to implement NetCDF data structures on top of HDF5, meaning that the metadata needed to store such dependencies is somehow put into existing HDF5 metadata.

In pytables, I can certainly think up my own convention for storing such data conveniently, but why should everybody have to reinvent the wheel? Offering and supporting conventions for this common use case might certainly be an idea for PyTables.

Greetings,
Norbert Nemec

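The 'dimensions' idea in the message above can be illustrated without any HDF5 code. What follows is only a minimal sketch of one possible convention, not a PyTables or NetCDF API: the Dimension and Dataset classes and their method names are hypothetical, and each axis is described by start, step and count so that the three ranges from the message become 20, 20 and 50 points.

```python
class Dimension:
    """A named axis with evenly spaced coordinates (hypothetical helper)."""

    def __init__(self, name, start, step, count):
        self.name = name
        self.start = start
        self.step = step
        self.count = count

    def values(self):
        # Index-based computation avoids accumulating rounding error.
        return [self.start + i * self.step for i in range(self.count)]


class Dataset:
    """Named dimensions plus variables defined over subsets of them."""

    def __init__(self):
        self.dimensions = {}
        self.variables = {}

    def add_dimension(self, name, start, step, count):
        self.dimensions[name] = Dimension(name, start, step, count)

    def add_variable(self, name, dim_names):
        # A variable may only refer to dimensions that were declared first.
        for dim in dim_names:
            if dim not in self.dimensions:
                raise KeyError("unknown dimension: %r" % (dim,))
        self.variables[name] = tuple(dim_names)

    def shape(self, var_name):
        # The array shape follows from the dimensions the variable spans.
        return tuple(self.dimensions[d].count for d in self.variables[var_name])


ds = Dataset()
ds.add_dimension("x", 0.0, 0.1, 20)       # range(0.0, 2.0, 0.1)
ds.add_dimension("y", -1.0, 0.1, 20)      # range(-1.0, 1.0, 0.1)
ds.add_dimension("time", 0.0, 10.0, 50)   # range(0.0, 500.0, 10.0)
ds.add_variable("f", ["x", "y", "time"])  # f(x, y, time)
ds.add_variable("g", ["x", "y"])          # g(x, y), time independent
print(ds.shape("f"), ds.shape("g"))
```

In an HDF5 file the same information would have to live in attributes or auxiliary arrays; how NetCDF-4 actually encodes it is not covered here.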
From: Norbert N. <Nor...@gm...> - 2005-01-24 19:01:56

Hi there,

I noticed that ptdump shows much redundant information. The attached patch changes that by

* not displaying system attributes in AttributeSet.__repr__
* sticking with group.__str__ instead of group.__repr__ even for verbose mode (the latter only displays a list of children which is dumped later on anyway)

There probably still is some room for optimization...

Ciao,
Nobbi

-- 
Norbert Nemec

From: Norbert N. <Nor...@gm...> - 2005-01-24 14:43:37

On Monday, 24 January 2005 10:12, Ivan Vilata i Balaguer wrote:
> Hi all! There has been quite a lot of discussion on the topics of
> natural naming, __getattr__ vs. __getitem__ and unbound nodes on the
> last week. Since verbal descriptions seem to be incomplete most of the
> time, or simply misleading, I've decided to make a sample implementation
> as a proof of concept for those topics.

Very nice! Looks fine to me.

Now one remaining question that could be reconsidered: What about HDF5 attributes?

The minimum would be to unify mygroup._v_attrs and myleaf.attrs to _v_attrs in both cases (the old version being deprecated but kept for a while for compatibility) and to add __{get|set|del}item__ to AttributeSet.

Any further ideas?

-- 
Norbert Nemec

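The suggestion to add __{get|set|del}item__ to AttributeSet can be pictured with a small stand-in class. This is only a sketch under the assumption that both access styles share one underlying store; the class below is hypothetical and is not the PyTables AttributeSet.

```python
class AttributeSet:
    """Hypothetical stand-in: dot access and item access share one store."""

    def __init__(self):
        # Bypass our own __setattr__ so the store itself is a plain attribute.
        object.__setattr__(self, "_store", {})

    # Item access accepts any string, including names that are not identifiers.
    def __getitem__(self, name):
        return self._store[name]

    def __setitem__(self, name, value):
        self._store[name] = value

    def __delitem__(self, name):
        del self._store[name]

    # Attribute access is routed to the same store.
    def __getattr__(self, name):
        # Only called when normal lookup fails; guard against recursion.
        if name == "_store":
            raise AttributeError(name)
        try:
            return self._store[name]
        except KeyError:
            raise AttributeError(name)

    def __setattr__(self, name, value):
        self._store[name] = value

    def __delattr__(self, name):
        try:
            del self._store[name]
        except KeyError:
            raise AttributeError(name)


attrs = AttributeSet()
attrs.date = "d01_04_05"          # natural naming
attrs["index date"] = "d050104"   # arbitrary name, item access only
print(attrs["date"], attrs["index date"])
```

With such an interface, group and leaf attributes could be handled identically whichever name (_v_attrs or attrs) ends up exposing them.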
From: Francesc A. <fa...@ca...> - 2005-01-24 09:27:46

On Monday 24 January 2005 09:33, kevin lester wrote:
> I thought I'd provide this too.
> Three things are different... 150000nrows; I changed
> the query from where(r['H1']<3) to where(r['H1']>3).
> Notice the radical change in the indexed query. I also
> included the numarray "where"
> function to compare by H = table['H1']; print
> where(H>3). The script is also below.

Yes, that has been somewhat explained in my earlier message. Hopefully, the next query optimizer will take care of that.

-- 
Francesc Altet

From: Francesc A. <fa...@ca...> - 2005-01-24 09:25:02

On Monday 24 January 2005 08:29, kevin lester wrote:
> Hi Francesc,
>
> My apologies for not reciprocating to you a quick
> response.

No problem.

> The only radical difference I noticed was the page
> file usage as noted. The CPU ran between 50-54%
> consistently. Every other calib. on the Win task
> manager stayed approximately equal for each test.

Sorry, does page file usage mean more access to disk?

> I do run into intermittent crashes (w/pytables) every once
> in a while, especially after a fresh reboot of Win.

Perhaps this is a consequence of the new procedure for generating extensions for Python 2.4 on Windows. Would you be able to run these tests on Python 2.3 and check whether the crashes still happen?

> Finally, notice that the UserWarning message is always
> being displayed even though all comp. package dll's
> are in order.

Uh, I have no idea what can be happening here.

Regarding your figures, I notice (with some surprise) that the speed-up for bigger tables is not as good as desired. I'm afraid that this is a consequence of having a larger list of rows that pass the selections. The current indexing algorithm implemented in PyTables 0.9.1 is quite slow when the number of rows that pass the cuts is relatively large. However, it works much better when this number is small compared with the total number of rows in the Table.

In the future, we plan to improve this by looking at the number of rows that pass the cuts and, if that number is bigger than a certain percentage of the total number of rows (although other factors may be considered as well), falling back to an in-kernel search. In fact, this will be part of a query optimizer that we are working on, and that will be part of the future PyTables Pro.

Thanks for the more exhaustive benchmarks anyway.

-- 
Francesc Altet

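The planned behaviour described above (use the index when few rows pass, otherwise fall back to an in-kernel scan) can be sketched as a simple decision rule. This is not the PyTables optimizer: the function name, the way the hit count is estimated, and the 10% threshold are all assumptions made for illustration.

```python
def choose_strategy(estimated_hits, total_rows, max_indexed_fraction=0.1):
    """Return 'indexed' when few rows are expected to pass, else 'inkernel'.

    max_indexed_fraction is a hypothetical tuning value, not a PyTables setting.
    """
    if total_rows <= 0:
        # Nothing to search; the scan is trivially cheap.
        return "inkernel"
    fraction = estimated_hits / total_rows
    if fraction <= max_indexed_fraction:
        return "indexed"
    return "inkernel"


# A selective query versus a query that matches almost every row.
print(choose_strategy(2344, 150000))
print(choose_strategy(147656, 150000))
```

A real optimizer would presumably weigh other factors as well, as the message notes; the sketch only captures the percentage test.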
From: Ivan V. i B. <iv...@ca...> - 2005-01-24 09:12:53

Hi all! There has been quite a lot of discussion on the topics of natural naming, __getattr__ vs. __getitem__ and unbound nodes over the last week. Since verbal descriptions seem to be incomplete most of the time, or simply misleading, I've decided to make a sample implementation as a proof of concept for those topics.

Basically, the attached code does the following:

* define a new Node class from which Group and Leaf inherit.
* allow nodes to be created unbound and then be bound to a group by simple assignment. Nodes are not allowed to handle data while unbound, but a partial hierarchy can be built using unbound nodes (this last feature might be more difficult to implement in PyTables).
* allow quasi-arbitrary node names in a group. They can always be accessed using __{get,set,del}item__, and also using __{get,set,del}attr__ if they match natural naming conventions.
* put every group child name in its __dict__, to allow the user to see all possibilities on completion (although some of them may actually not be accessible using dot notation).
* only allow Node objects in a Group.

Please try to read the whole code. It is quite straightforward and simple. Don't be frightened by its length, there are a lot of blank lines in there. ;)

Hope this helps to clarify some things. Of course, suggestions are accepted.

Regards,
Ivan

import disclaimer
-- 
Ivan Vilata i Balaguer   >qo<   http://www.carabos.com/
Cárabos Coop. V.         V  V   Enjoy Data
                          ""

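The attachment referred to above is not part of this archive. The following is an independent sketch of the listed behaviours, written under stated assumptions: every class and attribute name is hypothetical, a node counts as "bound" once its chain of parents reaches a group flagged as a file root, and only names that are valid identifiers and do not clash with class attributes are exposed through __dict__ (a slight narrowing of the list above).

```python
import keyword


class UnboundNodeError(Exception):
    """Raised when data is requested from a node outside any file tree."""


class Node(object):
    def __init__(self):
        object.__setattr__(self, "_v_parent", None)
        object.__setattr__(self, "_v_name", None)

    @property
    def bound(self):
        # Bound means: some ancestor (or the node itself) is a root group.
        node = self
        while node is not None:
            if getattr(node, "_v_is_root", False):
                return True
            node = node._v_parent
        return False


class Leaf(Node):
    def __init__(self, data):
        Node.__init__(self)
        self._data = list(data)

    def read(self):
        if not self.bound:
            raise UnboundNodeError("read is not supported in unbound nodes")
        return list(self._data)


class Group(Node):
    def __init__(self, is_root=False):
        Node.__init__(self)
        object.__setattr__(self, "_v_children", {})
        object.__setattr__(self, "_v_is_root", is_root)

    def _is_natural(self, name):
        return (name.isidentifier() and not keyword.iskeyword(name)
                and not name.startswith("_") and not hasattr(type(self), name))

    def __setitem__(self, name, node):
        if not isinstance(node, Node):
            raise TypeError("only Node objects are allowed in a Group")
        if not name or "/" in name:
            raise ValueError("invalid node name: %r" % (name,))
        if name in self._v_children:
            raise KeyError("a node named %r already exists" % (name,))
        if node._v_parent is not None:
            raise ValueError("node already belongs to a group")
        ancestor = self
        while ancestor is not None:  # refuse to create cycles
            if ancestor is node:
                raise ValueError("a group cannot contain itself")
            ancestor = ancestor._v_parent
        object.__setattr__(node, "_v_parent", self)
        object.__setattr__(node, "_v_name", name)
        self._v_children[name] = node
        if self._is_natural(name):
            self.__dict__[name] = node  # visible to dir() and completion

    def __getitem__(self, name):
        return self._v_children[name]

    def __delitem__(self, name):
        node = self._v_children.pop(name)
        self.__dict__.pop(name, None)
        object.__setattr__(node, "_v_parent", None)
        object.__setattr__(node, "_v_name", None)

    def __setattr__(self, name, value):
        if name.startswith("_v_"):
            object.__setattr__(self, name, value)
        else:
            self[name] = value

    def __delattr__(self, name):
        if name in self._v_children:
            del self[name]
        else:
            object.__delattr__(self, name)


root = Group(is_root=True)       # stands in for file.root
sub = Group()
sub.mynode = Leaf([1, 2])        # partial hierarchy, still unbound
root.results = sub               # binding by simple assignment
root["odd name"] = Leaf([3])     # reachable through item access only
print(root.results.mynode.read(), sorted(root._v_children))
```

Calling read() on sub.mynode before the root.results assignment would raise UnboundNodeError, which corresponds to the "exception" option debated elsewhere in the thread.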
From: kevin l. <lke...@ya...> - 2005-01-24 08:33:15

I thought I'd provide this too.
Three things are different... 150000nrows; I changed the query from where(r['H1']<3) to where(r['H1']>3). Notice the radical change in the indexed query. I also included the numarray "where" function to compare by H = table['H1']; print where(H>3). The script is also below.

Nrows----> 150000*20cols (times in seconds)

Filters   Standard query   Inkernel query   Indexed query    numarray
'None'    0.18799996376    0.139999866486   2.14100003242    0.0310001373291
'None'    0.18799996376    0.155999898911   2.17200016975    0.0309998989105
'ZLIB'    0.25             0.219000101089   2.45300006866    0.0929999351501
'LZO'     0.203999996185   0.18700003624    2.28099989891    0.0629999637604
'UCL'     0.234999895096   0.203000068665   2.32799983025    0.0620000362396

Running 'C:\H5\klester.py' ...
C:\Python24\lib\site-packages\tables\Leaf.py:90: UserWarning: zlib compression library is not available. Using zlib instead!.
  warnings.warn( \
Nrows----> 150000*20cols
Filters----> 'None'
Time for standard query--> 0.18799996376
Time for inkernel query--> 0.156000137329
Time for indexed query--> 2.17199993134

from tables import *
from numarray import *

class ES(IsDescription):
    H1 = UInt8Col(indexed=1)
    H2 = UInt8Col(indexed=0)
    M1 = Int8Col(indexed=1)
    S1 = Int8Col(indexed=1)
    BS = Int16Col(indexed=1)
    BP = Float32Col(indexed=1)
    AP = Float32Col(indexed=1)
    AS = Int16Col(indexed=1)
    L1 = Float32Col(indexed=1)
    V1 = Int16Col(indexed=1)
    A1 = Int8Col(indexed=1)
    UD = BoolCol(indexed=1)
    TV = Int64Col(indexed=1)
    DT = Float32Col(indexed=1)
    HI = Float32Col(indexed=1)
    LO = Float32Col(indexed=1)

def create():
    nrows = 150000
    dat = arange(nrows*20, shape=(nrows,20), type=UInt8)
    date = 'd01_04_05'
    idxdate = 'd050104'
    length = (len(dat))+20
    filt = Filters(complevel=1,complib='ucl',shuffle=1,fletcher32=0)
    file = openFile("/H5/ES_DATA139.h5",mode="w",title="ES_DATA_FILE",filters=None)
    root = file.root
    group1 = file.createGroup("/", 'd050104', 'd01_04_05')
    table1 = file.createTable(group1,'raw',ES,"RAW", expectedrows=length)
    # ATTRIBUTES---------
    g1 = file.root.d050104
    g1._v_attrs.date = date
    g1._v_attrs.idxdate = idxdate
    t1 = file.root.d050104.raw
    t1.attrs.date = date
    t1.attrs.idxdate = idxdate
    eS = table1.row
    for i in xrange(len(dat)):
        eS['H1'] = dat[i][13]
        eS['H2'] = dat[i][13]
        eS['M1'] = dat[i][14]
        eS['S1'] = int(dat[i][15])
        eS['BS'] = dat[i][0]
        eS['BP'] = dat[i][1]
        eS['AP'] = dat[i][2]
        eS['AS'] = dat[i][3]
        eS['L1'] = dat[i][4]
        eS['V1'] = dat[i][5]
        eS['UD'] = dat[i][12]
        eS['A1'] = dat[i][11]
        eS['TV'] = dat[i][8]
        eS['DT'] = dat[i][16]
        eS['HI'] = dat[i][6]
        eS['LO'] = dat[i][7]
        eS.append()
    table1.flush()
    file.close()

def select():
    from time import time
    print 'Nrows----> 150000*20cols'
    print "Filters----> 'None'"
    file = openFile("/H5/ES_DATA139.h5")
    table = file.root.d050104.raw
    t1=time()
    results = [r["H1"] for r in table if r['H1']>3]
    print "Time for standard query-->", time()-t1
    t1=time()
    results = [r["H2"] for r in table.where(table.cols.H2>3)]
    print "Time for inkernel query-->", time()-t1
    t1=time()
    results = [r["H1"] for r in table.where(table.cols.H1>3)]
    print "Time for indexed query-->", time()-t1
    t1 = time()
    h = table['H1']
    results = where(h>3)[0]
    print 'Time for numarray-->', time()-t1
    file.close()

if __name__ == '__main__':
    create()
    select()

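The jump in the indexed timing after switching from H1<3 to H1>3 is easier to see once the number of matching rows is counted. The sketch below does that in plain Python, assuming that arange(nrows*20, ..., type=UInt8) wraps values modulo 256 (an assumption about numarray's behaviour, not something shown in the thread), so that row i stores (i*20 + 13) % 256 in H1. The helper name h1_value is made up for this illustration.

```python
def h1_value(i, ncols=20, col=13):
    """H1 value for row i, assuming UInt8 wrap-around of arange(nrows*20)."""
    return (i * ncols + col) % 256


nrows = 150000
# Rows selected by the earlier query (H1 < 3) and by the new one (H1 > 3).
low = sum(1 for i in range(nrows) if h1_value(i) < 3)
high = sum(1 for i in range(nrows) if h1_value(i) > 3)
print("H1 < 3:", low, "rows; H1 > 3:", high, "rows")
```

Under that assumption roughly one row in 64 satisfies H1<3, while nearly every row satisfies H1>3, which matches the explanation given in the replies that the 0.9.1 index is slow when many rows pass the cut.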
From: kevin l. <lke...@ya...> - 2005-01-24 07:32:00

--- kevin lester <lke...@ya...> wrote:

Oops, Python 2.4 is the version.

> Hi Francesc,
>
> My apologies for not reciprocating to you a quick
> response.
>
> > ...where the speed-up is clearly inferior. Just out of
> > curiosity, which processor, speed, and python version are you using?
>
> I have a Pent4;WinXP;1G_DDR2;2.6GHz;800FSB.

From: kevin l. <lke...@ya...> - 2005-01-24 07:29:54

Hi Francesc,

My apologies for not reciprocating to you a quick response.

> ...where the speed-up is clearly inferior. Just out of
> curiosity, which processor, speed, and python version are you using?

I have a Pent4;WinXP;1G_DDR2;2.6GHz;800FSB. I provided a couple of tests below. These were only for queries, though, not 'file writes', and only a single run for each.

The only radical difference I noticed was the page file usage as noted. The CPU ran between 50-54% consistently. Every other calib. on the Win task manager stayed approximately equal for each test. I do run into intermittent crashes (w/pytables) every once in a while, especially after a fresh reboot of Win.

Finally, notice that the UserWarning message is always being displayed even though all comp. package dll's are in order.

Thank you.

Running 'C:\H5\klester.py' ...
C:\Python24\lib\site-packages\tables\Leaf.py:90: UserWarning: zlib compression library is not available. Using zlib instead!.
  warnings.warn( \

Times in seconds:

File Size   Nrows           Filters   Standard query    Inkernel query    Indexed query     Notes
6.39M       100000*20cols   None      0.0620000362396   0.0469999313354   0.0160000324249
2.07M       100000*20cols   'ZLIB'    0.139999866486    0.0940001010895   0.0629999637604
2.05M       100000*20cols   'LZO'     0.0929999351501   0.0629999637604   0.0469999313354
2.04M       100000*20cols   'UCL'     0.0940001010895   0.0780000686646   0.0469999313354
31.1M       500000*20cols   'None'    0.358999967575    0.171999931335    0.125
10.1M       500000*20cols   'ZLIB'    0.516000032425    0.375             0.31299996376
10.0M       500000*20cols   'LZO'     0.43799996376     0.280999898911    0.219000101089    Page file usage more than doubled
10.0M       500000*20cols   'UCL'     0.484999895096    0.344000101089    0.25              Page file usage nearly doubled

From: russ <ru...@ag...> - 2005-01-23 05:38:34

>> Actually I'd disagree with that. I think the beauty of allowing any
>> arbitrary string in the __getitem__ is that anything is allowed; there
>> are no special cases for the programmer to have to worry about. It is
>> very simple to use. It doesn't make sense to me to provide a special
>> interpretation for the "/" character purely in order to save typing a
>> few characters in an interactive session.

> '/' is prohibited by the underlying HDF5 as part of an identifier. It is
> therefore not an option to allow it in a PyTables identifier. The only
> question is whether it should raise an exception or have the special meaning.
> Personally, I have no strong opinion on this question.

Well, in that case maybe it is best to allow the special "/" interpretation.

Cheers,
-Russ

From: Norbert N. <Nor...@gm...> - 2005-01-22 13:06:27

On Thursday, 20 January 2005 21:07, Andreu Alted Abad wrote:
> And finally I would like to mention an example that was used many times
> along this thread:
>
> mygroup.mynode = Node(...)
>
> Some time ago I was discussing with Francesc about the correctness of using
> node constructors in the client code. My perception was that this notation
> only could bring problems since the instance created with the constructor
> is not bound to any pytables file, and more importantly, it has no
> utility except to be assigned to a group after its creation; any other uses
> would end with an error.

You call this thread "Unbound node" which already shows one possible solution: Make unbound nodes an official part of PyTables. A node can be created, and is unbound at first, but it already contains all the necessary data to create a physical node in a file once it is assigned to a group member.

The simple option would then be that - as you say it - writing or reading unbound nodes is prohibited and raises an exception. Maybe, though, it would even prove useful to consider unbound nodes just like regular nodes that are simply located in memory rather than on disk. I don't know the implications of this, but maybe, temporary tables in memory even have some practical use - be it only to defer physical writing to some defined point in the future.

The nice thing about the above syntax is that it allows natural naming throughout the lifetime of an item. Having to switch between 'stringified' naming and natural naming for the same item just looks ugly in my eyes. The above syntax makes clear that the created object has just one name:

mygroup.mynode

used for creation, handling and deletion. Any aliases made for this name are explicit and easy to understand.

-- 
Norbert Nemec

From: Norbert N. <Nor...@gm...> - 2005-01-22 12:53:00

On Friday, 21 January 2005 08:00, russ wrote:
> > it might make sense to do '/' parsing in __getitem__, such
> > that something["abc/djk"] is internally read equivalently
> > to something["abc"]["djk"]
>
> Actually I'd disagree with that. I think the beauty of allowing any
> arbitrary string in the __getitem__ is that anything is allowed; there
> are no special cases for the programmer to have to worry about. It is
> very simple to use. It doesn't make sense to me to provide a special
> interpretation for the "/" character purely in order to save typing a
> few characters in an interactive session.

'/' is prohibited by the underlying HDF5 as part of an identifier. It is therefore not an option to allow it in a PyTables identifier. The only question is whether it should raise an exception or have the special meaning. Personally, I have no strong opinion on this question.

-- 
Norbert Nemec

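The "special meaning" option discussed above, where something["abc/djk"] is read as something["abc"]["djk"], could look roughly like the following. This is a minimal sketch with a hypothetical Group class, not PyTables code; it rejects '/' when a child is stored (since HDF5 forbids it in identifiers) and splits on '/' when a child is looked up.

```python
class Group:
    """Hypothetical container illustrating '/' parsing in __getitem__."""

    def __init__(self):
        self._children = {}

    def __setitem__(self, name, node):
        # '/' can never be part of a single identifier.
        if name == "" or "/" in name:
            raise ValueError("invalid node name: %r" % (name,))
        self._children[name] = node

    def __getitem__(self, path):
        node = self
        for part in path.split("/"):
            if part == "":
                raise KeyError("empty component in path %r" % (path,))
            if not isinstance(node, Group):
                raise KeyError("%r is not a group in path %r" % (part, path))
            node = node._children[part]  # KeyError if the child is missing
        return node


root = Group()
root["abc"] = Group()
root["abc"]["djk"] = "some leaf"
print(root["abc/djk"] is root["abc"]["djk"])
```

The alternative mentioned in the message would keep __setitem__ as it is and make __getitem__ raise for any name containing '/'.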
From: Norbert N. <Nor...@gm...> - 2005-01-22 12:46:33

On Thursday, 20 January 2005 13:35, Francesc Altet wrote:
> Yeah, but I have to give to that a more complete thought before stepping
> ahead and implementing that. There are some issues with CSTables (the
> forthcoming client-server version of PyTables) that would have problems
> using:
>
> group["name"] = Array()
>
> or even
>
> group.name = Array()
>
> This last sentence is perfectly possible now, but may be deprecated in the
> future. However, as I said before, this needs more thought on our part.

Well, I hope you'll find a way to allow it. In any case, adding children should be a method of the group, not of the file.

As I see it, creating, moving and changing groups, items and subtrees should happen completely transparently to the user, no matter whether they are in the same file or in different files. File should offer root as an entry point, but the tree should just span from there transparently.

And once you are at that point, the detailed method of adding members should be mostly cosmetic, so the above-mentioned syntax should be rather simple to implement.

-- 
Norbert Nemec

From: Andreu A. A. <aa...@ar...> - 2005-01-21 16:46:25

> From: pyt...@li... [mailto:pyt...@li...]
> On behalf of Ivan Vilata i Balaguer
> Sent: Friday, 21 January 2005 10:10
> __getattr__ vs. __getitem__ (Natural naming)]
>
> On Thu, Jan 20, 2005 at 09:07:16PM +0100, Andreu Alted Abad wrote:
> [...]
> > And finally I would like to mention an example that was used many
> > times along this thread:
> >
> > mygroup.mynode = Node(...)
> >
> > Some time ago I was discussing with Francesc about the correctness of
> > using node constructors in the client code. My perception was that
> > this notation only could bring problems since the instance created
> > with the constructor is not bound to any pytables file, and more
> > importantly, it has no utility except to be assigned to a group after
> > its creation; any other uses would end with an error.
> [...]
>
> I don't see much technical difficulty in this change:
> File.create* methods basically do not do anything beyond
> getting the destination group, creating the object and
> assigning it using setattr. With the new model, some very
> simple checking should be added to leafs and groups to check
> if they are bound before doing some operations (including
> assigning them into a Group, which is already done by using a
> mutual call pattern).

And what will the result of this check be?

a=Array([1,2])
a.read()

1) Exception: "read is not supported in unbound nodes"
2) None

The problem is why we should maintain the concept of unbound nodes at all: unbound nodes are not supported in pytables, yet creating them is supported. The following sequence of instructions results in an exception:

a=Array([1,2])
a.read()

The same goes for any other method you call. There is a mechanism to convert unbound nodes into bound nodes:

mygroup.mynode=a

And after that no errors will be produced, yes. But why keep the concept of unbound nodes? I can think of very useful applications for unbound nodes, like in-memory nodes that could be rebound to the file, allowing a personalized in-memory data cache for the user, or simply using them to work with volatile data. The problem is that this is not implemented. So the only use of unbound nodes is to be bound to a file. Why not create them already bound, and avoid: 1) a concept that is not supported, 2) errors derived from that.

> Although having unbound nodes might seem incorrect, using
> create* methods forces some knowledge into the File (it would
> also on Group) objects which I think they need not have.

Hmm, I don't know which is easier to remember: methods on Group nodes or the parameters required by each node constructor. File.create* methods are supported and are actually the standard; Group.create* would be nearly identical and simpler. And we could invent a NodeCreator interface that File and Group implement (or make them descendants of a Creator class), so only the NodeCreator methods would need to be remembered.

> However, if create* methods were to be used anyway, I prefer
> them to belong in Group objects rather than File, to avoid
> walking the whole tree structure as you mentioned.

File.create* must be supported for compatibility. For ease of use, the best placement for create* methods is on Group objects. But from the implementation point of view, File should keep all the logic of these methods, because File is the real container (groups are containers of other groups, but following the recursion, the container of the root Group is a File instance). So implementing Group.create* would be very simple:

class Group:
    def createArray(self, name, data):
        return self._v_file.createArray(self, name, data)

I prefer File as the instance that contains all the logic because there will be many instances of Group, and the code will be more maintainable in a File instance. Besides, from a conceptual point of view rather than an ease-of-use one, File should be the class in charge of this task. So I think File.create* methods should exist forever, but in the future, if Group.create* methods are implemented, it could be considered whether they should become private.

From: Ivan V. i B. <iv...@ca...> - 2005-01-21 09:10:13
|
On Thu, Jan 20, 2005 at 09:07:16PM +0100, Andreu Alted Abad wrote:
[...]
> And finally I would like to mention an example that was used many times
> along this thread:
>
>     mygroup.mynode = Node(...)
>
> Some time ago I was discussing with Francesc about the correctness of using
> node constructors in the client code. My perception was that this notation
> only could bring problems since the instance created with the constructor is
> not bounded to any pytables file, and more importantly, it has no utility
> except be assigned to a group after his creation; any other uses would end
> with an error.
[...]

I don't see much technical difficulty in this change: File.create* methods basically do nothing beyond getting the destination group, creating the object and assigning it using setattr. With the new model, some very simple checks should be added to leaves and groups to verify that they are bound before doing some operations (including assigning them into a Group, which is already done by using a mutual call pattern).

Although having unbound nodes might seem incorrect, using create* methods forces some knowledge into the File (it would also on Group) objects which I think they need not have. However, if create* methods were to be used anyway, I prefer them to belong in Group objects rather than File, to avoid walking the whole tree structure as you mentioned.

Regards,
Ivan

import disclaimer
--
Ivan Vilata i Balaguer   >qo<   http://www.carabos.com/
Cárabos Coop. V.         V  V   Enjoy Data
                          ""
|
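The "check that a node is bound before operating on it" idea from this message could look roughly like the sketch below. It uses toy classes only; Node, Group, UnboundNodeError, is_bound and _bind are hypothetical names chosen for the example and say nothing about how PyTables actually does it.

```python
class UnboundNodeError(RuntimeError):
    """Hypothetical error for operations that need a bound node."""


class Node:
    """Toy node that starts unbound and is bound on assignment."""

    def __init__(self, data):
        self.data = list(data)
        self.name = None
        self._parent = None  # None means "not bound yet"

    @property
    def is_bound(self):
        return self._parent is not None

    def _bind(self, parent, name):
        # Called by the group: the second half of the mutual call.
        if self.is_bound:
            raise ValueError("node is already bound")
        self._parent, self.name = parent, name

    def read(self):
        if not self.is_bound:
            raise UnboundNodeError("read() requires a bound node")
        return list(self.data)


class Group:
    """Toy group that binds nodes assigned as attributes."""

    def __init__(self):
        object.__setattr__(self, "_children", {})

    def __setattr__(self, name, value):
        if isinstance(value, Node):
            value._bind(self, name)       # the node learns its parent
            self._children[name] = value  # the group learns its child
        else:
            object.__setattr__(self, name, value)

    def __getattr__(self, name):
        # Only reached when normal attribute lookup fails.
        if name.startswith("_"):
            raise AttributeError(name)
        try:
            return self._children[name]
        except KeyError:
            raise AttributeError(name) from None


node = Node([1, 2])
try:
    node.read()
    unbound_read_failed = False
except UnboundNodeError:
    unbound_read_failed = True

mygroup = Group()
mygroup.mynode = node
values = mygroup.mynode.read()
```

The cost of this model is the one raised elsewhere in the thread: every operation on a node needs a guard like the one in read().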
From: russ <ru...@ag...> - 2005-01-21 07:00:18
|
> it might make sense to do '/' parsing in __getitem__, such
> that something["abc/djk"] is internally read equivalently
> to something["abc"]["djk"]

Actually I'd disagree with that. I think the beauty of allowing any arbitrary string in the __getitem__ is that anything is allowed; there are no special cases for the programmer to have to worry about. It is very simple to use. It doesn't make sense to me to provide a special interpretation for the "/" character purely in order to save typing a few characters in an interactive session.

In general, where the programmer knows the structure of the database in advance and preassigns all the names, then using the natural naming (__getattr__) adds a lot in readability and interactive exploration. In the other case, where the programmer doesn't know the structure in advance and the names are chosen, for example, by the user, then you're going to use the __getitem__ method of node retrieval, since the names are already in variables to begin with. To mix up these two common situations in a halfway measure doesn't make any sense.

Cheers,
-Russ
|
From: Andreu A. A. <aa...@ar...> - 2005-01-20 20:07:37
|
In the very interesting thread "__getattr__ vs. __getitem__ (Natural naming)", the following improvements were suggested:

>>>To sum up:
>>> * __dict__ should contain only the immediate children. It is not clear yet,
>>>whether it should only contain the pythonic names or all names.

I also think that __dict__ should allow non-pythonic names for the children (if __getitem__ is added to Group). Anyone who wants to use non-pythonic names must accept that he might run into problems when accessing nodes through group.__getattr__, or with key completion, but IMO it should be allowed. The names used to identify or classify things in a computer are becoming ever closer to human language, so I think it is a good idea to support names that are not valid programming-language identifiers.

>>>* __getitem__ and __getattr__ might well allow additional keys that are not
>>>advertised in __dict__: these will not show up for key-completion but they
>>> will work nontheless.
>>>* it might make sense to do "/" parsing in __getitem__, such that
>>>     something["abc/djk"]
>>>is internally read equivalently to
>>>     something["abc"]["djk"]
>>> or
>>>     something.abc.djk
>>>Of course, "abc/djk" should not appear in __dict__

I agree with you and I do think this would be really useful, since allowing a pathname in __getitem__ will allow direct access to the desired node without walking through the hierarchy. Besides, this would improve performance even in the case of a cached object tree [I'm thinking from a client-server point of view, i.e. CSTables]. Of course, pathnames used in __getitem__ should be relative to the group in which __getitem__ is used. For absolute paths we can use file.root (which will be relative to root too).

And finally I would like to mention an example that was used many times along this thread:

    mygroup.mynode = Node(...)
Some time ago I was discussing with Francesc the correctness of using node constructors in client code. My perception was that this notation could only bring problems, since the instance created with the constructor is not bound to any PyTables file and, more importantly, it has no use except to be assigned to a group after its creation; any other use would end with an error. So I suggested the following (which is not currently implemented):

    a = mygroup.createArray('mynode', [1,2])

That is pretty simple and clear, and will return a correctly bound array. Another option for using the constructor with bound objects is:

    a = Array(file.root, '/group/myarray', [1,2])

or

    a = Array(file.root.group, 'myarray', [1,2])

which results in a bound and actually correct node. Any suggestions about these alternatives?
|
From: Ivan V. i B. <iv...@ca...> - 2005-01-20 14:51:26
|
On Wed, Jan 19, 2005 at 02:50:56PM +0100, Norbert Nemec wrote:
> On Wednesday, 19 January 2005 09:34, Ivan Vilata i Balaguer wrote:
> > On Mon, Jan 17, 2005 at 09:17:08AM +0100, Norbert Nemec wrote:
[...]
> Seems to me that you are cofusing __getattr__ and __getitem__ - the latter has
> nothing to do with key completion. if you start with
>     something["
> and hit TAB, nothing will happen, no matter which IDE or shell you use.
>
> Key-completion only works for natural naming. Most shells simply use __dict__
> to find out what to offer for completion. It is true, that eric seems to have
> some additional place to look for but that does not affect this discussion.

I may not have been clear enough, but I was not mixing up __getattr__ and __getitem__. In fact, Eric 3 *does* show completions for this:

    >>> d = {'foo': 1, 'bar': 10, 'foobar': 20}
    >>> d["fo

But, as you say, this does not affect the discussion, which is mainly about __dict__ and __*attr__ behavior.

>
> To sum up:
> * __dict__ should contain only the immediate children. It is not clear yet,
> whether it should only contain the pythonic names or all names.
> * __getitem__ and __getattr__ might well allow additional keys that are not
> advertised in __dict__: these will not show up for key-completion but they
> will work nontheless.
> * it might make sense to do '/' parsing in __getitem__, such that
>     something["abc/djk"]
> is internally read equivalently to
>     something["abc"]["djk"]
> or
>     something.abc.djk
> Of course, "abc/djk" should not appear in __dict__
[...]

I agree with these three points and with the lack of utility of completing non-pythonic names. However, not completing a non-pythonic name may give the impression that the name does not exist! Hopefully, the user will stop at this point, print the node, see that the named child exists and use [] notation instead (or go to the manual to see what is happening). It seems coherent to me.

Bye!
import disclaimer
--
Ivan Vilata i Balaguer   >qo<   http://www.carabos.com/
Cárabos Coop. V.         V  V   Enjoy Data
                          ""
|
From: Francesc A. <fa...@ca...> - 2005-01-20 12:38:33
|
On Monday 17 January 2005 10:59, Norbert Nemec wrote:
> Hi there,
>
> is there any specific reason why in-kernel iterators cannot be used more
> then once:
>
> Doing something like:
>
>     selection = table.where(table.cols.somecol == someval)
>     val1 = [ x.val1 for x in selection ]
>     val2 = [ x.val2 for x in selection ]
>
> gives me a rather intimidating errormessage from somewhere deep inside the
> HDF library. I think, it is a rather common case, so it should definitely
> work.

Good suggestion. We are looking into it, and we expect to provide this functionality in PyTables 1.0.

Cheers,
Francesc
|
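The single-use behaviour is the ordinary one for Python iterators, and it can be reproduced with a plain generator. The sketch below does not use PyTables at all; matching_rows and the sample rows are made up for the illustration, and whether the objects yielded by table.where() can safely be stored in a list is not something this thread settles.

```python
def matching_rows(rows, wanted):
    """Yield the rows whose 'somecol' equals wanted (single-use)."""
    for row in rows:
        if row["somecol"] == wanted:
            yield row


rows = [
    {"somecol": 1, "val1": 10, "val2": 100},
    {"somecol": 2, "val1": 20, "val2": 200},
    {"somecol": 1, "val1": 30, "val2": 300},
]

# A generator is exhausted by the first loop over it.
once = matching_rows(rows, 1)
first_pass = [r["val1"] for r in once]
second_pass = [r["val1"] for r in once]  # nothing left to iterate

# Option 1: materialize the selection once, then reuse the list.
selection = list(matching_rows(rows, 1))
val1 = [r["val1"] for r in selection]
val2 = [r["val2"] for r in selection]

# Option 2: collect both columns in a single pass.
pairs = [(r["val1"], r["val2"]) for r in matching_rows(rows, 1)]
```

Either option avoids iterating the same selection twice; the second one also avoids holding the selected rows in memory.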
From: Francesc A. <fa...@ca...> - 2005-01-20 12:35:18
|
Ooops, I've recently made changes to my local MTA and set it up badly :(
I'm resending a couple of messages that were not sent correctly.

Regards,

---------- Forwarded message ----------
Subject: Re: [Pytables-users] Re: natural naming versus getitem/setitem
Date: Tuesday 18 January 2005 08:57
From: Francesc Altet <fa...@ca...>
To: pyt...@li...

On Monday 17 January 2005 09:09, Norbert Nemec wrote:
> On Monday, 17 January 2005 06:12, russ wrote:
> > I would agree that the natural naming can easily get in the way. For
> > example, I tried storing stock data into pytables using the ticker
> > symbol as the table name. However, as it stands you can't use "def" as
> > a table name; the "checkNameValidity" function throws a NameError, since
> > "def" is a python reserved word.
>
> This problem should already be solved in the newest CVS version of pytables
> (just a little past 0.9.1, I think). A little while ago, it was the first

That's correct, Norbert. With the new version, if one uses a name that is not compliant with the natural naming requirements, just a warning is issued:

    warnings.warn(""""%s" is not a valid python identifier, so the associated
    element cannot be accessed by natural naming. Check for special symbols
    ($, %%, @, ...), spaces or reserved words, or be sure to access it later
    on by using getattr().""" % (name), NaturalNameWarning)

thus warning users that such a name cannot be used in natural naming references.

> step to allow non-pythonic names in the first place. Now it would be the
> second step to make it more convenient to access such names.

Yeah, but I have to give that more thought before going ahead and implementing it. There are some issues with CSTables (the forthcoming client-server version of PyTables), which would have problems with:

    group["name"] = Array()

or even

    group.name = Array()

This last statement is perfectly possible now, but may be deprecated in the future.
However, as I said before, this needs more thought on our part.

Cheers,
Francesc

-------------------------------------------------------
|
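The "warn instead of raise" policy described in this message can be illustrated with the standard library alone. This is a sketch in current Python, not the PyTables code: is_natural_name and check_name are hypothetical helpers, and the NaturalNameWarning class defined here is only a local stand-in for the one named in the message.

```python
import keyword
import warnings


class NaturalNameWarning(Warning):
    """Local stand-in for the warning class named in the message."""


def is_natural_name(name):
    """Return True if obj.<name> would be valid Python syntax."""
    return name.isidentifier() and not keyword.iskeyword(name)


def check_name(name):
    """Warn, rather than raise, when natural naming cannot be used."""
    if not is_natural_name(name):
        warnings.warn(
            f"{name!r} is not a valid Python identifier; "
            "access it with getattr() instead",
            NaturalNameWarning,
        )
    return name


candidates = ["prices", "def", "my table", "IBM"]
with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    for candidate in candidates:
        check_name(candidate)

flagged = [n for n in candidates if not is_natural_name(n)]
```

Here "def" is flagged because it is a reserved word and "my table" because it contains a space, while both names remain usable through getattr() or item access.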
From: Norbert N. <Nor...@gm...> - 2005-01-19 13:51:31
|
On Wednesday, 19 January 2005 09:34, Ivan Vilata i Balaguer wrote:
> On Mon, Jan 17, 2005 at 09:17:08AM +0100, Norbert Nemec wrote:
> > The dictionary should, of course, contain only the names of immediate
> > children. It would be __getitem__ that does the '/' parsing and looks up
> > the individual steps. dictionary key completion is not an issue.
>
> I was thinking about the Eric IDE, which somehow manages to complete
> values which are only returned by __getitem__ but are not in __dict__
> (don't ask me how it does that). However, __dict__ lookup still works
> right in Eric regardless of __getitem__, so it seems that my proposal
> would break nothing indeed. I still admit that it might be confusing,
> so maybe another new method (e.g. getItem() or getPath()) would be a
> better choice.

Seems to me that you are confusing __getattr__ and __getitem__ - the latter has nothing to do with key completion. If you start with

    something["

and hit TAB, nothing will happen, no matter which IDE or shell you use.

Key completion only works for natural naming. Most shells simply use __dict__ to find out what to offer for completion. It is true that Eric seems to have some additional place to look, but that does not affect this discussion.

To sum up:
* __dict__ should contain only the immediate children. It is not clear yet whether it should only contain the pythonic names or all names.
* __getitem__ and __getattr__ might well allow additional keys that are not advertised in __dict__: these will not show up for key completion but they will work nonetheless.
* it might make sense to do '/' parsing in __getitem__, such that
      something["abc/djk"]
  is internally read equivalently to
      something["abc"]["djk"]
  or
      something.abc.djk
  Of course, "abc/djk" should not appear in __dict__

> > Actually: __dict__ can even contain non-pythonic names. You just cannot
> > query them by natural naming, but ipython even shows these names in key
> > completion.
> > Anyhow: it certainly would be confusing to have non-pythonic
> > names in a __dict__ ...
>
> I don't see potential trouble, unless you are using __dict__ keys to
> build expressions for 'eval' or 'exec', which could be avoided by the
> use of __getitem__.

The potential confusion would be:

    >>> mygroup["some non-pythonic name"] = Array(somevalue)
    >>> mygroup.so<TAB>

resulting in

    >>> mygroup.some non-pythonic name

appearing at the command line. Of course this is rather harmless unless you hit return afterwards. But it may be confusing nonetheless.

Ciao,
Norbert

--
_________________________________________Norbert Nemec
Bernhardstr. 2 ... D-93053 Regensburg
Tel: 0941 - 2009638 ... Mobil: 0179 - 7475199
eMail: <No...@Ne...>
|
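The three summary points can be tried out on a toy class. The sketch below is only an assumption-laden illustration, not PyTables code: the Group class and its _children dict are hypothetical, and it uses __dir__ as a stand-in for "what completion gets to see", which may differ from how a given shell actually builds its completion list.

```python
import keyword


class Group:
    """Toy group; children are kept in a private dict."""

    def __init__(self):
        self._children = {}

    def __setitem__(self, name, node):
        if "/" in name:
            raise ValueError("a child name may not contain '/'")
        self._children[name] = node

    def __getitem__(self, path):
        # Relative path: walk one step at a time from this group.
        node = self
        for step in path.split("/"):
            node = node._children[step]
        return node

    def __getattr__(self, name):
        # Only reached when normal lookup fails, i.e. for child names.
        if name.startswith("_"):
            raise AttributeError(name)
        try:
            return self._children[name]
        except KeyError:
            raise AttributeError(name) from None

    def __dir__(self):
        # Advertise only immediate children usable in natural naming.
        return [n for n in self._children
                if n.isidentifier() and not keyword.iskeyword(n)]


root = Group()
root["abc"] = Group()
root["abc"]["djk"] = [1, 2]
root["some non-pythonic name"] = "leaf"

by_path = root["abc/djk"]
by_steps = root["abc"]["djk"]
by_attr = root.abc.djk
completions = dir(root)
```

In this sketch the path "abc/djk" resolves to the same object as the step-by-step forms, while neither the path nor the non-pythonic child name is offered by dir(); the latter is still reachable through [] access.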
From: Ivan V. i B. <iv...@ca...> - 2005-01-19 08:34:51
|
On Mon, Jan 17, 2005 at 09:17:08AM +0100, Norbert Nemec wrote:
> The dictionary should, of course, contain only the names of immediate
> children. It would be __getitem__ that does the '/' parsing and looks up the
> individual steps. dictionary key completion is not an issue.

I was thinking about the Eric IDE, which somehow manages to complete values which are only returned by __getitem__ but are not in __dict__ (don't ask me how it does that). However, __dict__ lookup still works right in Eric regardless of __getitem__, so it seems that my proposal would break nothing indeed. I still admit that it might be confusing, so maybe another new method (e.g. getItem() or getPath()) would be a better choice.

>
> Actually: __dict__ can even contain non-pythonic names. You just cannot query
> them by natural naming, but ipython even shows these names in key completion.
> Anyhow: it certainly would be confusing to have non-pythonic names in a
> __dict__ ...

I don't see potential trouble, unless you are using __dict__ keys to build expressions for 'eval' or 'exec', which could be avoided by the use of __getitem__.

--
Ivan Vilata i Balaguer   >qo<   http://www.carabos.com/
Cárabos Coop. V.         V  V   Enjoy Data
                          ""
|
From: Antonio V. <val...@co...> - 2005-01-19 08:15:20
|
Ok, I sent the data you asked for yesterday.
I ran some more tests with:

Python 2.3.4
[GCC 3.4.2 20041017 (Red Hat 3.4.2-6.fc3)] on linux2
numarray v. 1.1.1
tables v. 0.9.1
hdf5 v. 1.6.2

and

Python 2.4
Win XP SP2
numarray v. 1.1.1
tables v. 0.9.1
hdf5 v. 1.6.3-patch

The result is always the same.

bye

At 17:18 on Monday 17 January 2005, Francesc Altet wrote:
> Hi Antonio,
>
> Can you send me privately both files so as to see what's going on with
> them?
>
> Cheers,
>
> On Sunday 16 January 2005 16:19, Antonio Valentino wrote:
> > hi,
> > I'm not an expert user and I'm having some problems trying to open an
> > hdf5 file containing a chunked dataset.
> > Here it is some infos
> >
> > Python 2.3.4
> > [GCC 3.4.2 20041017 (Red Hat 3.4.2-6.fc3)] on linux2
> > numarray v. 1.1.1
> > tables v. 0.9.1
> > hdf5 v. 1.6.3-patch
> >
> > this is the test program
> >
> > # BEGIN file test-uchar.py
> > import tables
> > h5file = tables.openFile('data.h5')
> > print h5file
> > # END file test-uchar.py
> >
> > an this is the data
> >
> > # chunk (128x128x2)
> >
> > [antonio@m6n h5]$ h5dump -A data.h5
> > HDF5 "data.h5" {
> > GROUP "/" {
> >    DATASET "ComplexUCharArray" {
> >       DATATYPE H5T_STD_U8LE
> >       DATASPACE SIMPLE { ( 200, 150, 2 ) / ( 200, 150, 2 ) }
> >    }
> > }
> > }
> >
> > When i run the test program i get a segfault
> >
> > [antonio@m6n h5]$ python test-uchar.py
> > /usr/lib/python2.3/site-packages/tables/File.py:192: UserWarning:
> > 'data.h5' does exist, is an HDF5 file, but has not a PyTables format.
> > Trying toguess what's there using HDF5 metadata. I can't promise you
> > getting the correctobjects, but I will do my best!.
> > path, UserWarning)
> > Segmentation fault
> >
> >
> > If I try it with a *non* chunked dataset ...
> >
> > [antonio@m6n h5]$ python test-uchar.py
> > /usr/lib/python2.3/site-packages/tables/File.py:192: UserWarning:
> > 'data.h5' does exist, is an HDF5 file, but has not a PyTables format.
> > Trying to guess what's there using HDF5 metadata. I can't promise you
> > getting the correct objects, but I will do my best!.
> > path, UserWarning)
> > Traceback (most recent call last):
> >   File "test-uchar.py", line 6, in ?
> >     print h5file
> >   File "/usr/lib/python2.3/site-packages/tables/File.py", line 1000, in __str__
> >     astring += str(leaf) + '\n'
> >   File "/usr/lib/python2.3/site-packages/tables/Leaf.py", line 472, in __str__
> >     title = self.attrs.TITLE
> >   File "/usr/lib/python2.3/site-packages/tables/AttributeSet.py", line 166, in __getattr__
> >     raise AttributeError, \
> >
> > [SNIP]
> >
> >   File "/usr/lib/python2.3/site-packages/tables/AttributeSet.py", line 166, in __getattr__
> >     raise AttributeError, \
> >   File "/usr/lib/python2.3/site-packages/tables/Leaf.py", line 472, in __str__
> >     title = self.attrs.TITLE
> >   File "/usr/lib/python2.3/site-packages/tables/Leaf.py", line 189, in _get_attrs
> >     return AttributeSet(self)
> > RuntimeError: maximum recursion depth exceeded
> >
> >
> > in this case the file seems to be correctly opened but some problem is
> > met in the print statement.
> >
> > antonio

--
Antonio Valentino
Consorzio Innova S.r.l.
Home Page: www.consorzio-innova.it
|