From: Ivan V. i B. <iv...@ca...> - 2005-10-10 14:03:53
Hi list!

I'm posting this document as a Request For Comments for a possible new natural naming approach for a future version of PyTables. Please note that this is *not* going to be implemented in the near future, but I would like to hear your opinions, to know whether we are moving in the right direction.

All criticism and opinions welcome, thank you!

::

    Ivan Vilata i Balaguer   >qo<   http://www.carabos.com/
           Cárabos Coop. V.  V  V   Enjoy Data
                              ""
From: Norbert N. <Nor...@gm...> - 2005-10-11 07:25:40
The proposal sounds reasonable to me. The only "uglification" that it introduces is the trailing underscore for regular items. I think people can bear that. All the escaping should probably not appear in everyday use (unless somebody wants to punish him/herself). It certainly is good to have the predictability and the full power of tab-expansion.

One idea: would it make sense to consider __getitem__/__setitem__ as an alternative access method for children? It cannot offer tab-completion, but for non-interactive use, it might come in handy.
From: Ivan V. i B. <iv...@ca...> - 2005-10-11 09:10:16
Norbert Nemec wrote::

> The proposal sounds reasonable to me. The only "uglification" that it
> introduces is the trailing underscore for regular items. I think people
> can bear that. All the escaping should probably not appear in everyday
> use. (Unless somebody wants to punish him/herself.) It certainly is good
> to have the predictability and the full power of tab-expansion.

The trailing underscore may be ugly, but I find it quite interesting that it makes child nodes instantly distinguishable from instance members. As you noted, escaping will not happen in most cases, since people tend to use id-like names. Those of us who use accents and such in computing know we will run into trouble sooner or later... :-/

Norbert Nemec wrote::

> One idea: would it make sense to consider __getitem__/__setitem__ as
> alternative access method for children? It cannot offer tab-completion,
> but for non-interactive use, it might come in handy.

I was quite convinced for a while that this would be nice to have in PyTables, but I am not so sure now. First, ``__setitem__()`` would require unbound nodes, an idea we do not like much (this is why ``__setattr__()`` of nodes no longer works). Second, I have developed some arguments against using the item interface:

* I think the semantics of items are not related to the semantics of child nodes, so it may not be obvious to the user what ``group['nodename']`` does.

* The natural naming interface is meant for interactive usage, and programmatic usage would already have ``getChild()`` and similarly named methods. These are already two ways of doing the same thing; a third one would only be confusing.

I would *really* like to discuss these points a little more. Should groups, in your opinion, behave more like dictionaries? If that were the case, the item interface would make more sense, but then it would look stranger to have node attributes placed in a separate ``attrs`` object.

Cheers,
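As a rough illustration of the three access styles weighed in this message, here is a minimal sketch built on a toy ``Group`` class. It is not the PyTables API: the class, its ``_children`` dictionary and the exact handling of the trailing underscore are assumptions made for the example. Only the names ``getChild()`` and ``group['nodename']`` and the trailing-underscore convention come from the thread.

```python
class Group:
    """Toy stand-in for a group node; not the real PyTables class."""

    def __init__(self, **children):
        # Hypothetical storage: child name -> child object.
        self._children = dict(children)

    def getChild(self, name):
        """Programmatic access through an explicit method call."""
        return self._children[name]

    def __getitem__(self, name):
        """Item access, the alternative raised in the thread."""
        return self._children[name]

    def __getattr__(self, attr):
        """Natural naming: 'name_' (trailing underscore) maps to child 'name'."""
        # Python only calls this when normal attribute lookup has failed.
        if attr.endswith("_") and not attr.startswith("_"):
            try:
                return self._children[attr[:-1]]
            except KeyError:
                pass
        raise AttributeError(attr)

    def __dir__(self):
        """Offer children to tab completion, with the trailing underscore."""
        return list(super().__dir__()) + [n + "_" for n in self._children]


group = Group(myarray=[1, 2, 3])
by_attr = group.myarray_               # interactive, natural naming
by_method = group.getChild("myarray")  # programmatic, method
by_item = group["myarray"]             # programmatic, item interface
```

All three expressions return the same object, which is the redundancy the message worries about: the item interface adds nothing that ``getChild()`` does not already provide, apart from shorter syntax.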
From: Norbert N. <Nor...@gm...> - 2005-10-12 22:46:48
Ivan Vilata i Balaguer wrote:

>> One idea: would it make sense to consider __getitem__/__setitem__ as
>> alternative access method for children? It cannot offer tab-completion,
>> but for non-interactive use, it might come in handy.
>
> I have been quite convinced for a while that this would be nice to have
> in PyTables, but I am not so sure now. At first, ``__setitem__()``
> would require unbound nodes, an idea we do not like much (this is why
> ``__setattr__()`` of nodes no longer works).

OK, this opens a different topic: unbound nodes. Personally, I think that creating a node like

    h5file.root.somenode = Array(somedata)

is the most natural way to do the job. It is always said that unbound nodes should be avoided. I believe, however, that this is a move in the wrong direction. Well, that is a matter of taste and of discussion. In the end it is up to you to decide.

As for the original question: should child nodes be accessible via getitem/setitem? Thinking about your words, I notice that the Python principle actually answers the question: "There should be one - and only one - obvious way to do the job." I guess my proposal is killed by exactly that principle. True, we already have two ways of accessing child nodes. Those are well distinguished and both have their purpose. My alternative might at best save some typing.

I'm withdrawing my proposal.

Greetings,
Norbert
From: Ivan V. i B. <iv...@ca...> - 2005-10-18 14:28:56
Norbert Nemec wrote::

> Ivan Vilata i Balaguer wrote:
>> The motto you quote was in fact what prevented me from accepting one
>> more interface. But if we were to make groups more dictionary-like,
>> then the ``getChild()``-like interface would be the one out of place! ;)
>>
>> So, do you see advantages in the dictionary-like interface?
>
> Well, why not? It could be made clear that __get/setitem__ and
> __get/setattr__ are basically the same thing, accessing the children of
> any node, except that item will use the unquoted original name from
> HDF5, while attr will quote the name in the style that you propose.
>
> For HDF5-Attributes, the behavior should be the same, except that it
> does not work on node itself, but on node.attrs.
>
> In general, use of operators should be handled with care, but in this
> case, since accessing and setting the children of a node is the one
> major purpose of any group, it makes sense to give it this special syntax.
>
> Of course, get/setChild should then go away, or at least its use should
> be discouraged.

Francesc and I have been giving this issue some thought, and some problems have immediately come to mind.

In the first place, as you say, some operators would not make sense or would be ambiguous, e.g. does ``del group[childgroup]`` act recursively or not? Does ``for n in group`` yield names of nodes or the nodes themselves? Does it recurse? What does ``group.pop()`` mean? How do you rename a child node?

Some of this ambiguity goes away by using more descriptive methods, but then we end up implementing half a dictionary interface with lots of extra methods to complete it. Maybe there's no point in making groups dictionary-like, then. But maybe it's OK to implement only a part of the dictionary interface.

In the second place, leaves will also have their own, very differently behaving item interface, so having an item interface for both leaves and groups may get confusing.

Now, I know we seem to oscillate from one side to the other on this question; maybe we need the ultimate advice! ;)
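The iteration question raised above can be made concrete. A plain Python ``dict`` yields its keys when iterated and never recurses, so a dictionary-like group would presumably yield child names only. The sketch below contrasts that with explicitly named methods that say what they return and whether they descend. The class and the method names ``child_names()`` and ``walk()`` are hypothetical, chosen for the example, and are not PyTables API.

```python
class Group:
    """Toy group holding named children; not the real PyTables class."""

    def __init__(self, **children):
        self._children = dict(children)

    def child_names(self):
        """Names of the direct children only, no recursion."""
        return list(self._children)

    def walk(self):
        """Yield (path, node) pairs for every descendant, depth first."""
        for name, node in self._children.items():
            yield name, node
            if isinstance(node, Group):
                for subpath, subnode in node.walk():
                    yield name + "/" + subpath, subnode


root = Group(data=[1, 2], sub=Group(more=[3]))

# What a real dict does: iteration yields keys, not values, and stays
# at the top level.
dict_style = list({"data": [1, 2], "sub": {"more": [3]}})

names = root.child_names()
paths = [path for path, _ in root.walk()]
```

Here ``dict_style`` and ``names`` both hold the two top-level names, while ``paths`` also includes ``sub/more``. The descriptive methods remove the ambiguity, at the cost the message points out: they sit outside the dictionary interface.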
From: Francesc A. <fa...@ca...> - 2005-10-13 18:46:38
On Thursday 13 October 2005 00:46, Norbert Nemec wrote:
> OK, this opens a different topic: unbound nodes. Personally, I think
> that creating a node like
>
>     h5file.root.somenode = Array(somedata)
>
> is the most natural way to do the job. It is always said that unbound
> nodes should be avoided. I believe, however, that this is a move in the
> wrong direction.

I also like this way of creating nodes *very much*. However, as you already said in a previous message, the problem is what the user is supposed to do with such an unbound node other than binding it to a node in a file. If we could enforce something like:

    h5file.root.somenode = Array(somedata)

as the only thing that the user is allowed to do, then I'd gladly vote for this. However, I'm afraid that we can't prevent the user from doing things like:

    node = EArray(somedata)
    node.append(someotherdata)
    <lots of error messages appearing....>

So, at least until in-memory nodes are supported, I don't think this is going to be a good idea. You can still use it, if you want, but documenting it and encouraging its use for every kind of user could be an unnecessary source of confusion, IMO.

Cheers,

-- 
>0,0<   Francesc Altet     http://www.carabos.com/
 V V    Cárabos Coop. V.   Enjoy Data
 "-"
From: Norbert N. <Nor...@gm...> - 2005-10-15 13:25:16
What is wrong with checking whether a node is bound and saying "manipulations to unbound nodes not yet implemented" if people do anything to them before they are assigned to an on-disk group?

Even if unbound nodes are still a long way off, there is nothing wrong with following the design idea now. I think the idea of unbound nodes is something very clear for the user to understand, even if, for the time being, these nodes are seriously limited until they are actually written to disk.

Of course, checking whether a node is bound costs a tiny bit of performance, but that certainly can be minimized.

Greetings,
Norbert
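The check proposed here could look roughly like the following. This is a minimal sketch with toy classes, not PyTables code: ``UnboundNodeError``, the ``_parent`` attribute and the ``_check_bound()`` helper are names invented for the example, and the toy ``EArray`` keeps its data in a Python list.

```python
class UnboundNodeError(NotImplementedError):
    """Raised when an unbound node is manipulated (hypothetical name)."""


class EArray:
    """Toy enlargeable array; not the real PyTables EArray."""

    def __init__(self, data):
        self._data = list(data)
        self._parent = None  # None means "not yet bound to a group"

    def _check_bound(self):
        if self._parent is None:
            raise UnboundNodeError(
                "manipulations to unbound nodes not yet implemented")

    def append(self, rows):
        self._check_bound()
        self._data.extend(rows)


class Group:
    """Toy group that binds a node when it is assigned as an attribute."""

    def __setattr__(self, name, value):
        if isinstance(value, EArray):
            value._parent = self  # binding happens on assignment
        object.__setattr__(self, name, value)


node = EArray([1, 2])
try:
    node.append([3])          # still unbound: rejected with a clear message
    message = None
except UnboundNodeError as exc:
    message = str(exc)

root = Group()
root.somenode = node          # the assignment binds the node
node.append([3])              # now allowed
```

The cost is one attribute comparison per manipulation, which matches the "tiny bit of performance" mentioned in the message.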
From: Francesc A. <fa...@ca...> - 2005-10-17 14:35:54
On Saturday 15 October 2005 15:25, Norbert Nemec wrote:
> What is wrong with checking whether a node is bound and saying
> "manipulations to unbound nodes not yet implemented" if people do
> anything to them before they are assigned to an on-disk group?

Indeed, it's a possibility.

> Even if unbound nodes are still a far way off, there is nothing wrong
> about following the design idea now. I think the idea of unbound nodes
> is something very clear to understand for the user, even if - for the
> time being - these nodes are seriously limited until they are actually
> written to disk.
>
> Of course, checking whether a node is bound costs a tiny bit of
> performance, but that certainly can be minimized.

One can always use an "assert" statement and, if maximum speed is needed, pass the -O option to python.

What do other people think? Would implementing this improve the readability of the code? My opinion is starting to change, and I think it would.

Cheers,
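To show what the ``assert`` suggestion implies, the sketch below compiles the same toy function twice using the ``optimize`` argument of the built-in ``compile()``: level 0 keeps ``assert`` statements, and level 1 drops them, which is what running ``python -O`` does. The dictionary-based "node" and the ``append()`` function are stand-ins invented for the example, not PyTables code.

```python
SOURCE = '''
def append(node, row):
    assert node["bound"], "manipulations to unbound nodes not yet implemented"
    node["data"].append(row)
'''


def load(optimize):
    """Compile SOURCE at the given optimization level and return append()."""
    namespace = {}
    exec(compile(SOURCE, "<sketch>", "exec", optimize=optimize), namespace)
    return namespace["append"]


checked = load(0)    # asserts kept, as in a normal run
unchecked = load(1)  # asserts removed, as with "python -O"

node = {"bound": False, "data": []}

try:
    checked(node, 1)
    raised = False
except AssertionError:
    raised = True

unchecked(node, 2)   # no check at all: the row is appended silently
```

One design consequence worth keeping in mind: under ``-O`` the check disappears entirely rather than becoming cheaper, so an unbound node would again fail in whatever way the underlying code happens to fail.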
From: Ivan V. i B. <iv...@ca...> - 2005-10-18 11:17:57
Francesc Altet wrote::

> On Saturday 15 October 2005 15:25, Norbert Nemec wrote:
>> Even if unbound nodes are still a far way off, there is nothing wrong
>> about following the design idea now. I think the idea of unbound nodes
>> is something very clear to understand for the user, even if - for the
>> time being - these nodes are seriously limited until they are actually
>> written to disk.
>>
>> Of course, checking whether a node is bound costs a tiny bit of
>> performance, but that certainly can be minimized.
>
> One can always use an "assert" statement, and if maximum speed is
> needed, pass the -O option to python.
>
> What other people think? Implementing this would improve readability
> of the code? My opinion is starting to change and I think it does.

The following three threads may be interesting, because they touch on the same problems (but keep in mind we are now talking about an entirely different release of PyTables):

* https://sourceforge.net/mailarchive/forum.php?thread_id=6361641&forum_id=13760
* https://sourceforge.net/mailarchive/forum.php?thread_id=6391459&forum_id=13760
* https://sourceforge.net/mailarchive/forum.php?thread_id=6412440&forum_id=13760

Please also have a look at the syntax used in the HDF5 RFC at http://hdf.ncsa.uiuc.edu/RFC/linkEncodings/Character_Encoding.pdf::

    dataset_id = H5Dcreate(dataspace, datatype, DCPL, DAPL, DXPL)
    H5Lcreate("dataset name", dataset_id, LCPL)

Dataset creation and link (name/directory entry) creation are separated in the new interface, so maybe PyTables 2 would have no problem *actually storing data* in unbound nodes.
From: Ivan V. i B. <iv...@ca...> - 2005-10-18 14:55:04
Ivan Vilata i Balaguer wrote::

> Please also have a look at the syntax used in the HDF5 RFC at
> http://hdf.ncsa.uiuc.edu/RFC/linkEncodings/Character_Encoding.pdf::
>
>     dataset_id = H5Dcreate(dataspace, datatype, DCPL, DAPL, DXPL)
>     H5Lcreate("dataset name", dataset_id, LCPL)
>
> Dataset creation and link (name/directory entry) creation are separated
> in the new interface, so maybe PyTables 2 would not have problems in
> *actually storing data* in unbound nodes.

The previous example would translate to PyTables 2 like this::

    >>> array = Array([1,2,3])
    >>> group.myarray_ = array  # interactive interface, or...
    >>> group.link(array, 'myarray')  # programmatic interface (methods), or...
    >>> group['myarray'] = array  # programmatic interface (dictionary)

(``link()`` could also be named ``addChild()`` or similar.) But the HDF5 hierarchy is not strictly a tree, but a graph. So we could continue with::

    >>> group.samearray_ = array

with no problems on the HDF5 side. This reminds us of a truth about HDF5 and Un*x filesystems: files in themselves *have neither a name nor a path*. Paths and names are only a concatenation of links in groups. Moving and renaming operations should therefore be provided by groups, not nodes, i.e. no ``node.rename('newname')``, but ``group.rename('oldchildname', 'newchildname')``.

This opens a new range of opportunities and problems, but I think we should not avoid them, so as not to lag behind HDF5 functionality.
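The link-centred model described in this message can be sketched with a toy ``Group`` that is nothing more than a table of names pointing at node objects. The method names ``link()`` and ``rename()`` are the ones floated in the message; the class itself, its ``_links`` dictionary, the ``names()`` helper and the error handling are assumptions made for the example and do not describe any released PyTables API.

```python
class Group:
    """Toy group: a mapping from link names to node objects."""

    def __init__(self):
        self._links = {}

    def link(self, node, name):
        """Add a new name for a node; the node may already have others."""
        if name in self._links:
            raise KeyError("child %r already exists" % name)
        self._links[name] = node

    def rename(self, oldname, newname):
        """Rename a link; the node it points at is left untouched."""
        if newname in self._links:
            raise KeyError("child %r already exists" % newname)
        self._links[newname] = self._links.pop(oldname)

    def __getitem__(self, name):
        return self._links[name]

    def names(self):
        """Sorted list of link names in this group."""
        return sorted(self._links)


array = [1, 2, 3]                    # stands in for an Array node
group = Group()
group.link(array, "myarray")
group.link(array, "samearray")       # second name for the same object
group.rename("myarray", "renamed")   # renaming is a group operation
```

Because the node carries no name of its own, ``group["renamed"]`` and ``group["samearray"]`` are the same object, and a ``node.rename()`` method would have no single name to change.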
From: travlr <vel...@gm...> - 2005-10-20 02:17:40
Hi All,

I just wanted to mention that, whichever way necessity dictates the syntax and semantics of PyTables, great care should be taken to make PyTables as congruent as possible with numarray/scipy in those respects.

I also hope that the PyTables roadmap will allow it to integrate seamlessly into scipy itself... That would be terrific.
From: Francesc A. <fa...@ca...> - 2005-10-20 10:25:04
On Thursday 20 October 2005 04:17, travlr wrote:
> Hi All,
>
> I just wanted to mention that which ever way necessity dictates the
> syntax and semantics of pyTables, that great care should be considered
> in making pyTables congruent as much as possible with numarray/scipy
> in those respects.
>
> I also hope that pyTables roadmap would be to able to seamlessly
> integrate in scipy itself... That would be terrific.

I agree. If you have any particular suggestions in this regard, apart from indexing "a la numarray" and "supporting scipy_core", I'd love to see them.

Thanks,
From: travlr <vel...@gm...> - 2005-10-20 12:16:36
On 10/20/05, Francesc Altet <fa...@ca...> wrote:
> I agree. If you have some particular suggestions on this regard apart
> of indexing "a la numarray" and "supporting scipy_core", I'd love to
> see them.

Hi Francesc,

Could it be possible that, as both PyTables and scipy mature, HDF5/PyTables database functionality could live under scipy's hood, as some sort of optional behavior? What I mean is that array types could perhaps be creatively integrated in both HDF5's physical storage and scipy's virtual memory through pointers/assignments/flags or some such manner.

Just thinking off the cuff here :-). I really don't know what I'm talking about technically; I'm only just now teaching myself ANSI C. In the near future, I should be able to be more helpful, in the open-source way. ;-)