From: Travis O. <oli...@ee...> - 2006-06-08 20:27:20
|
One of the hopes for the Summer of Code project involving getting the multidimensional array object into Python 2.6 is advertisement of the array protocol or array interface.

I think one way to simplify the array protocol is simply to have only one attribute that is consulted to provide access to the protocol. I would like to deprecate all the array protocol attributes except for __array_struct__ (perhaps we could call this __array_interface__ but I'm happy keeping the name the same too.)

If __array_struct__ is a CObject then it behaves as it does now.

If __array_struct__ is a tuple then each entry in the tuple is one of the items currently obtained by an additional attribute access (except the first item is always an integer indicating the version of the protocol --- unused entries are None).

This should simplify the array interface and allow easier future changes. It should also simplify NumPy so that it doesn't have to check for multiple attributes on arbitrary objects.

I would like to eliminate all the other array protocol attributes before NumPy 1.0 (and re-label those such as __array_data__ that are useful in other contexts --- like ctypes).

Comments?

-Travis
|
From: Sasha <nd...@ma...> - 2006-06-08 21:07:58
|
On 6/8/06, Travis Oliphant <oli...@ee...> wrote:
> ...
> __array_struct__ (perhaps we could call this __array_interface__ but I'm happy keeping the name the same too.)

+0 on the name change and consider making it a method rather than an attribute.

> If __array_struct__ is a CObject then it behaves as it does now.
>
> If __array_struct__ is a tuple then each entry in the tuple is one of the items currently obtained by an additional attribute access (except the first item is always an integer indicating the version of the protocol --- unused entries are None).

-1

This will complicate the use of array interface. I would propose creating a subtype of CObject that has the necessary attributes so that one can do a.__array_interface__.shape, for example. I did not check if CObject is subclassable in 2.5, but if not, we can propose to make it subclassable for 2.6.

> ...
> I would like to eliminate all the other array protocol attributes before NumPy 1.0 (and re-label those such as __array_data__ that are useful in other contexts --- like ctypes).

+1
|
From: David M. C. <co...@ph...> - 2006-06-08 21:29:54
|
On Thu, 8 Jun 2006 17:07:55 -0400 Sasha <nd...@ma...> wrote:

> On 6/8/06, Travis Oliphant <oli...@ee...> wrote:
> > ...
> > __array_struct__ (perhaps we could call this __array_interface__ but I'm happy keeping the name the same too.)
>
> +0 on the name change and consider making it a method rather than an attribute.

+0 for name change; I'm happy with it as an attribute.

> > If __array_struct__ is a CObject then it behaves as it does now.
> >
> > If __array_struct__ is a tuple then each entry in the tuple is one of the items currently obtained by an additional attribute access (except the first item is always an integer indicating the version of the protocol --- unused entries are None).
>
> -1
>
> This will complicate the use of array interface. I would propose creating a subtype of CObject that has the necessary attributes so that one can do a.__array_interface__.shape, for example. I did not check if CObject is subclassable in 2.5, but if not, we can propose to make it subclassable for 2.6.

The idea behind the array interface was to have 0 external dependencies: any array-like object from any package could add the interface, without requiring a 3rd-party module. That's why the C version uses a CObject. Subclasses of CObject start getting into 3rd-party requirements.

How about a dict instead of a tuple? With keys matching the attributes it's replacing: "shape", "typestr", "descr", "data", "strides", "mask", and "offset". The problem with a tuple from my point of view is I can never remember which order things go (this is why in the standard library the results of os.stat() and time.localtime() are now "tuple-like" classes with attributes).

We still need __array_descr__, as the C struct doesn't provide all the info that this does.

> > I would like to eliminate all the other array protocol attributes before NumPy 1.0 (and re-label those such as __array_data__ that are useful in other contexts --- like ctypes).
>
> +1

+1 also

--
|>|\/|<
/--------------------------------------------------------------------------\
|David M. Cooke http://arbutus.physics.mcmaster.ca/dmc/
|co...@ph...
|
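The dict proposal above can be illustrated with a minimal sketch of a pure-Python producer, using only the standard library. The class name PyBuffer1D is hypothetical, the "version" entry and its value are assumptions borrowed from the tuple proposal (the thread has not settled on either), and passing "data" as an (address, read-only) pair is just one possible convention, not something agreed on here.

```python
import array
import sys


class PyBuffer1D:
    """Hypothetical pure-Python 1-D array-like backed by array.array."""

    def __init__(self, values):
        self._buf = array.array("d", values)  # contiguous C doubles

    @property
    def __array_interface__(self):
        # buffer_info() returns (memory address, number of elements)
        address, length = self._buf.buffer_info()
        byteorder = "<" if sys.byteorder == "little" else ">"
        return {
            "version": 3,                  # assumed protocol version number
            "shape": (length,),
            "typestr": byteorder + "f" + str(self._buf.itemsize),
            "data": (address, False),      # assumed (pointer, read-only) form
            "strides": None,               # None taken to mean contiguous
        }


obj = PyBuffer1D([1.0, 2.0, 3.0])
iface = obj.__array_interface__
print(iface["shape"], iface["typestr"])
```

Because every entry is looked up by name, a consumer never has to remember a positional order, which is the point made against the tuple form.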
From: Sasha <nd...@ma...> - 2006-06-09 02:52:55
|
On 6/8/06, David M. Cooke <co...@ph...> wrote:
> ...
> +0 for name change; I'm happy with it as an attribute.

My rule of thumb for choosing between an attribute and a method is that attribute access should not create new objects. In addition, to me __array_interface__ feels like a generalization of the __array__ method, so I personally expected it to be a method the first time I tried to use it.

> ...
> The idea behind the array interface was to have 0 external dependencies: any array-like object from any package could add the interface, without requiring a 3rd-party module. That's why the C version uses a CObject. Subclasses of CObject start getting into 3rd-party requirements.

Not necessarily. Different packages don't need to share the subclass, but subclassing CObject is probably a bad idea for the reasons I will explain below.

> How about a dict instead of a tuple? With keys matching the attributes it's replacing: "shape", "typestr", "descr", "data", "strides", "mask", and "offset". The problem with a tuple from my point of view is I can never remember which order things go (this is why in the standard library the result of os.stat() and time.localtime() are now "tuple-like" classes with attributes).

My problem with __array_struct__ returning either a tuple or a CObject is that the array protocol should really provide both. CObject is useless for interoperability at the Python level and a tuple (or dict) is inefficient at the C level. Thus a good array-like object should really provide both __array_struct__ for use by C modules and __array_tuple__ (or whatever) for use by Python modules. On the other hand, making both required attributes/methods will put an extra burden on package writers. Moreover, a pure Python implementation of an array-like object will not be able to provide __array_struct__ at all. One possible solution would be an array protocol metaclass that adds __array_struct__ to a class with __array_tuple__ and __array_tuple__ to a class with __array_struct__ (yet another argument to make both methods).

> We still need __array_descr__, as the C struct doesn't provide all the info that this does.

What do you have in mind?
|
From: Tim H. <tim...@co...> - 2006-06-09 16:06:31
|
Sasha wrote:

>On 6/8/06, David M. Cooke <co...@ph...> wrote:
>>...
>>+0 for name change; I'm happy with it as an attribute.
>
>My rule of thumb for choosing between an attribute and a method is that attribute access should not create new objects.

Conceptually at least, couldn't there be a single __array_interface__ object associated with a given array? In that sense, it doesn't really feel like creating a new object.

>In addition, to me __array_interface__ feels like a generalization of the __array__ method, so I personally expected it to be a method the first time I tried to use it.
>
>>...
>>The idea behind the array interface was to have 0 external dependencies: any array-like object from any package could add the interface, without requiring a 3rd-party module. That's why the C version uses a CObject. Subclasses of CObject start getting into 3rd-party requirements.
>
>Not necessarily. Different packages don't need to share the subclass, but subclassing CObject is probably a bad idea for the reasons I will explain below.
>
>>How about a dict instead of a tuple? With keys matching the attributes it's replacing: "shape", "typestr", "descr", "data", "strides", "mask", and "offset". The problem with a tuple from my point of view is I can never remember which order things go (this is why in the standard library the result of os.stat() and time.localtime() are now "tuple-like" classes with attributes).
>
>My problem with __array_struct__ returning either a tuple or a CObject is that the array protocol should really provide both. CObject is useless for interoperability at the Python level and a tuple (or dict) is inefficient at the C level. Thus a good array-like object should really provide both __array_struct__ for use by C modules and __array_tuple__ (or whatever) for use by Python modules. On the other hand, making both required attributes/methods will put an extra burden on package writers. Moreover, a pure Python implementation of an array-like object will not be able to provide __array_struct__ at all. One possible solution would be an array protocol metaclass that adds __array_struct__ to a class with __array_tuple__ and __array_tuple__ to a class with __array_struct__ (yet another argument to make both methods).

I don't understand this. I don't see how bringing in a metaclass is going to help a pure Python type provide a sensible __array_struct__. That seems like a hopeless task. Shouldn't pure Python implementations just provide __array__?

A single attribute seems pretty appealing to me; I don't see much use for anything else.

>>We still need __array_descr__, as the C struct doesn't provide all the info that this does.
>
>What do you have in mind?

Is there any prospect of merging this data into the C struct? It would be cleaner if all of the information could be embedded into the C struct, but I can see how that might be a backward compatibility nightmare.

-tim

>_______________________________________________
>Numpy-discussion mailing list
>Num...@li...
>https://lists.sourceforge.net/lists/listinfo/numpy-discussion
|
From: Sasha <nd...@ma...> - 2006-06-09 16:50:19
|
On 6/9/06, Tim Hochberg <tim...@co...> wrote:
> Sasha wrote:
> ...
> >My rule of thumb for choosing between an attribute and a method is that attribute access should not create new objects.
>
> Conceptually at least, couldn't there be a single __array_interface__ object associated with a given array? In that sense, it doesn't really feel like creating a new object.

In my view, conceptually, __array_interface__ creates an adaptor to the array-like object. What are the advantages of it being an attribute? It is never settable, so the most common advantage of packing get/set methods in a single attribute can be ruled out. Saving typing of '()' cannot be taken seriously when the name contains a pair of double underscores :-).

There was a similar issue discussed on the python-3000 mailing list with respect to the __hash__ method <http://mail.python.org/pipermail/python-3000/2006-April/000362.html>.

> ....
> >My problem with __array_struct__ returning either a tuple or a CObject is that the array protocol should really provide both. CObject is useless for interoperability at the Python level and a tuple (or dict) is inefficient at the C level. Thus a good array-like object should really provide both __array_struct__ for use by C modules and __array_tuple__ (or whatever) for use by Python modules. On the other hand, making both required attributes/methods will put an extra burden on package writers. Moreover, a pure Python implementation of an array-like object will not be able to provide __array_struct__ at all. One possible solution would be an array protocol metaclass that adds __array_struct__ to a class with __array_tuple__ and __array_tuple__ to a class with __array_struct__ (yet another argument to make both methods).
>
> I don't understand this. I don't see how bringing in a metaclass is going to help a pure Python type provide a sensible __array_struct__. That seems like a hopeless task. Shouldn't pure Python implementations just provide __array__?

My metaclass idea is very similar to your unpack_interface suggestion. A metaclass can automatically add

    def __array_tuple__(self):
        return unpack_interface(self.__array_interface__())

or

    def __array_interface__(self):
        return pack_interface(self.__array_tuple__())

to a class that implements only one of the two required methods.

> A single attribute seems pretty appealing to me; I don't see much use for anything else.

I don't mind just having __array_struct__ that must return a CObject. My main objection was against a method/attribute that may return either a CObject or something else. That felt like shifting the burden from the package writer to the package user.
|
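The metaclass idea described above could be sketched roughly as follows. This is only an illustration under stated assumptions: pack_interface and unpack_interface are hypothetical stand-ins (a list plays the role of the C-level object, since a real CObject cannot be built from pure Python), the name ArrayProtocol is invented, and the syntax is that of current Python rather than the Python 2.x of this thread.

```python
def unpack_interface(struct):
    """Hypothetical stand-in: C-level form -> Python-level tuple."""
    return tuple(struct)


def pack_interface(fields):
    """Hypothetical stand-in: Python-level tuple -> C-level form."""
    return list(fields)


class ArrayProtocol(type):
    """Metaclass that adds whichever of the two methods a class lacks."""

    def __init__(cls, name, bases, namespace):
        super().__init__(name, bases, namespace)
        has_tuple = hasattr(cls, "__array_tuple__")
        has_iface = hasattr(cls, "__array_interface__")
        if has_iface and not has_tuple:
            # Derive the Python-level view from the C-level one.
            cls.__array_tuple__ = lambda self: unpack_interface(
                self.__array_interface__())
        elif has_tuple and not has_iface:
            # Derive the C-level view from the Python-level one.
            cls.__array_interface__ = lambda self: pack_interface(
                self.__array_tuple__())


class OnlyTuple(metaclass=ArrayProtocol):
    def __array_tuple__(self):
        return (1, (2, 3), "<f8")  # (version, shape, typestr), illustrative


print(OnlyTuple().__array_interface__())
```

The sketch only shows the wiring; whether a meaningful pack_interface can exist for a pure-Python type is exactly the objection raised in the quoted text.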
From: Tim H. <tim...@co...> - 2006-06-09 17:56:58
|
Sasha wrote:

>On 6/9/06, Tim Hochberg <tim...@co...> wrote:
>>Sasha wrote:
>>...
>>>My rule of thumb for choosing between an attribute and a method is that attribute access should not create new objects.
>>
>>Conceptually at least, couldn't there be a single __array_interface__ object associated with a given array? In that sense, it doesn't really feel like creating a new object.
>
>In my view, conceptually, __array_interface__ creates an adaptor to the array-like object. What are the advantages of it being an attribute? It is never settable, so the most common advantage of packing get/set methods in a single attribute can be ruled out. Saving typing of '()' cannot be taken seriously when the name contains a pair of double underscores :-).
>
>There was a similar issue discussed on the python-3000 mailing list with respect to the __hash__ method <http://mail.python.org/pipermail/python-3000/2006-April/000362.html>.

Isn't __array_interface__ always O(1)? By the criteria in that thread, that would make it a good candidate for being an attribute.

[Stare at __array_interface__ spec...think..stare...]

OK, I think I'm coming around to making it a function. Presumably, in:

    >>> a = arange(6)
    >>> ai1 = a.__array_interface__()
    >>> a.shape = [3, 2]
    >>> ai2 = a.__array_interface__()

ai1 and ai2 will be different objects, pointing to structs with different shape and stride attributes. So, in that sense it's not conceptually constant and should be a function.

What happens if I then delete or resize a? Hmmm. It looks like that's probably OK since CObject grabs a reference to a.

FWIW, at this point, I marginally prefer array_struct to array_interface.

>>....
>>>My problem with __array_struct__ returning either a tuple or a CObject is that the array protocol should really provide both. CObject is useless for interoperability at the Python level and a tuple (or dict) is inefficient at the C level. Thus a good array-like object should really provide both __array_struct__ for use by C modules and __array_tuple__ (or whatever) for use by Python modules. On the other hand, making both required attributes/methods will put an extra burden on package writers. Moreover, a pure Python implementation of an array-like object will not be able to provide __array_struct__ at all. One possible solution would be an array protocol metaclass that adds __array_struct__ to a class with __array_tuple__ and __array_tuple__ to a class with __array_struct__ (yet another argument to make both methods).
>>
>>I don't understand this. I don't see how bringing in a metaclass is going to help a pure Python type provide a sensible __array_struct__. That seems like a hopeless task. Shouldn't pure Python implementations just provide __array__?
>
>My metaclass idea is very similar to your unpack_interface suggestion. A metaclass can automatically add
>
>    def __array_tuple__(self):
>        return unpack_interface(self.__array_interface__())
>
>or
>
>    def __array_interface__(self):
>        return pack_interface(self.__array_tuple__())
>
>to a class that implements only one of the two required methods.

It seems like 99% of the people will never care about this at the Python level, so adding an extra attribute is mostly clutter. For those few who do care, a function seems preferable. To be honest, I don't actually see a need for anything other than the basic __array_struct__.

>>A single attribute seems pretty appealing to me; I don't see much use for anything else.
>
>I don't mind just having __array_struct__ that must return a CObject. My main objection was against a method/attribute that may return either a CObject or something else. That felt like shifting the burden from the package writer to the package user.

I concur.
|
From: Sasha <nd...@ma...> - 2006-06-09 16:53:23
|
On 6/9/06, Tim Hochberg <tim...@co...> wrote: > Shouldn't pure python implementations > just provide __array__? > You cannot implement __array__ without importing numpy. |
From: Travis O. <oli...@ee...> - 2006-06-09 18:08:56
|
Tim Hochberg wrote:
> Sasha wrote:
>> On 6/8/06, David M. Cooke <co...@ph...> wrote:
>>> ...
>>> +0 for name change; I'm happy with it as an attribute.
>>
>> My rule of thumb for choosing between an attribute and a method is that attribute access should not create new objects.

Interesting rule. In NumPy this is not quite the rule followed. Basically, attributes are used when getting or setting intrinsic "properties" of the array. Attributes are used for properties that are important in defining what an array *is*. The flags attribute, for example, is an important intrinsic property of the array but it returns a flags object when it is accessed. The flat attribute also returns a new object (it is arguable whether it should have been a method or an attribute but it is enough of an intrinsic property --- setting the flat attribute sets elements of the array -- that with historical precedence it was left as an attribute).

By this measure, the array interface should be an attribute.

>> My problem with __array_struct__ returning either a tuple or a CObject is that the array protocol should really provide both.

This is a convincing argument. Yes, the array protocol should provide both. Thus, we can't over-ride the usage of the same name unless that name produces an object through which both interfaces can be obtained.

Is that Sasha's suggestion?

> A single attribute seems pretty appealing to me; I don't see much use for anything else.
>
>>> We still need __array_descr__, as the C struct doesn't provide all the info that this does.
>>
>> What do you have in mind?
>
> Is there any prospect of merging this data into the C struct? It would be cleaner if all of the information could be embedded into the C struct, but I can see how that might be a backward compatibility nightmare.

I do think it should be merged into the C struct. The simplest thing to do is to have an additional PyObject * as part of the C struct which could be NULL (or unassigned). The backward compatibility is a concern but when thinking about what Python 2.6 should support we should not be too crippled by it.

Perhaps we should just keep __array_struct__ and compress all the other array_interface methods into the __array_interface__ attribute which returns a dictionary from which the Python-side interface can be produced.

Keep in mind there are two different (but related) issues at play here.

1) What goes into NumPy 1.0
2) What we propose should go into Python 2.6

I think for #1 we should compress the Python-side array protocol into a single __array_interface__ attribute that returns a dictionary. We should also expand the C-struct to contain what __array_descr__ currently provides.

-Travis
|
From: Alexander B. <ale...@gm...> - 2006-06-09 18:55:15
|
On 6/9/06, Travis Oliphant <oli...@ee...> wrote:
> ... In NumPy this is not quite the rule followed. Basically, attributes are used when getting or setting intrinsic "properties" of the array. Attributes are used for properties that are important in defining what an array *is*. The flags attribute, for example, is an important intrinsic property of the array but it returns a flags object when it is accessed. The flat attribute also returns a new object (it is arguable whether it should have been a method or an attribute but it is enough of an intrinsic property --- setting the flat attribute sets elements of the array -- that with historical precedence it was left as an attribute).
>
> By this measure, the array interface should be an attribute.

Array interface is not an intrinsic property of the array, but rather an alternative representation of the array itself.

Flags are properly an attribute because they are settable. Something like

    >>> x.flags()['WRITEABLE'] = False

although technically possible, would be quite ugly.

Similarly, the shape attribute, although it fails my rule of thumb by creating a new object,

    >>> x.shape is x.shape
    False

is justifiably an attribute because otherwise two methods, get_shape and set_shape, would be required.

I don't think "flat" should be an attribute, however. I could not find the reference, but I remember a discussion of why __iter__ should not be an attribute, and IIRC the answer was that an iterator has a mutable state that is not reflected in the underlying object:

    >>> x = arange(5)
    >>> i = x.flat
    >>> list(i)
    [0, 1, 2, 3, 4]
    >>> list(i)
    []
    >>> list(x.flat)
    [0, 1, 2, 3, 4]

> >> My problem with __array_struct__ returning either a tuple or a CObject is that the array protocol should really provide both.
>
> This is a convincing argument. Yes, the array protocol should provide both. Thus, we can't over-ride the usage of the same name unless that name produces an object through which both interfaces can be obtained.
>
> Is that Sasha's suggestion?

It was, but I quickly retracted it in favor of a mechanism to unpack the CObject. FWIW, I am also now -0 on the name change from __array_struct__ to __array_interface__ if what it provides is just a struct wrapped in a CObject.
|
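The iterator argument above does not depend on NumPy. As a rough illustration, the following standard-library sketch uses a hypothetical Vec class whose flat property hands out a fresh iterator on every access; the class and its names are invented for this example only.

```python
class Vec:
    """Hypothetical container whose `flat` property returns a new iterator."""

    def __init__(self, values):
        self._values = list(values)

    @property
    def flat(self):
        # Each access builds a new iterator with its own position.
        return iter(self._values)


x = Vec(range(5))
i = x.flat
first = list(i)       # consumes the iterator
second = list(i)      # same iterator, now exhausted
fresh = list(x.flat)  # a new access starts over
print(first, second, fresh)
print(x.flat is x.flat)
```

The object named i carries state (how far it has advanced) that x itself knows nothing about, which is the property the message argues makes a method the more natural spelling.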
From: Sasha <nd...@ma...> - 2006-06-09 18:56:14
|
On 6/9/06, Travis Oliphant <oli...@ee...> wrote:
> ... In NumPy this is not quite the rule followed. Basically, attributes are used when getting or setting intrinsic "properties" of the array. Attributes are used for properties that are important in defining what an array *is*. The flags attribute, for example, is an important intrinsic property of the array but it returns a flags object when it is accessed. The flat attribute also returns a new object (it is arguable whether it should have been a method or an attribute but it is enough of an intrinsic property --- setting the flat attribute sets elements of the array -- that with historical precedence it was left as an attribute).
>
> By this measure, the array interface should be an attribute.

Array interface is not an intrinsic property of the array, but rather an alternative representation of the array itself.

Flags are properly an attribute because they are settable. Something like

    >>> x.flags()['WRITEABLE'] = False

although technically possible, would be quite ugly.

Similarly, the shape attribute, although it fails my rule of thumb by creating a new object,

    >>> x.shape is x.shape
    False

is justifiably an attribute because otherwise two methods, get_shape and set_shape, would be required.

I don't think "flat" should be an attribute, however. I could not find the reference, but I remember a discussion of why __iter__ should not be an attribute, and IIRC the answer was that an iterator has a mutable state that is not reflected in the underlying object:

    >>> x = arange(5)
    >>> i = x.flat
    >>> list(i)
    [0, 1, 2, 3, 4]
    >>> list(i)
    []
    >>> list(x.flat)
    [0, 1, 2, 3, 4]

> >> My problem with __array_struct__ returning either a tuple or a CObject is that the array protocol should really provide both.
>
> This is a convincing argument. Yes, the array protocol should provide both. Thus, we can't over-ride the usage of the same name unless that name produces an object through which both interfaces can be obtained.
>
> Is that Sasha's suggestion?

It was, but I quickly retracted it in favor of a mechanism to unpack the CObject. FWIW, I am also now -0 on the name change from __array_struct__ to __array_interface__ if what it provides is just a struct wrapped in a CObject.
|
From: Tim H. <tim...@co...> - 2006-06-09 19:55:00
|
Sasha wrote:

>On 6/9/06, Travis Oliphant <oli...@ee...> wrote:
>>... In NumPy this is not quite the rule followed. Basically, attributes are used when getting or setting intrinsic "properties" of the array. Attributes are used for properties that are important in defining what an array *is*. The flags attribute, for example, is an important intrinsic property of the array but it returns a flags object when it is accessed. The flat attribute also returns a new object (it is arguable whether it should have been a method or an attribute but it is enough of an intrinsic property --- setting the flat attribute sets elements of the array -- that with historical precedence it was left as an attribute).
>>
>>By this measure, the array interface should be an attribute.
>
>Array interface is not an intrinsic property of the array, but rather an alternative representation of the array itself.

I was going to say that it may help to think of array_interface as returning a *view*, since that seems to be the semantics that could probably be implemented safely without too much trouble. However, it looks like that's not what happens. array_interface->shape and strides point to the raw shape and strides for the array. That looks like it's a problem. Isn't:

    >>> ai = a.__array_interface__
    >>> a.shape = newshape

going to result in ai having stale pointers to shape and strides that no longer exist? Potentially resulting in a segfault?

It seems the safe approach is to give array_interface its own shape and strides data. An implementation shortcut could be to actually generate a new view in array_struct_get and then pass that to PyCObject_FromVoidPtrAndDesc. Thus the CObject would have the only handle to the new view and it couldn't be corrupted.

[SNIP]

-tim
|
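The difference between the two behaviours discussed above can be mimicked in pure Python. This is only an analogy, assuming a hypothetical ToyArray class: in C the aliased pointers could dangle and crash, whereas here the aliased entry merely changes underneath the holder.

```python
class ToyArray:
    """Hypothetical array-like with a mutable shape list."""

    def __init__(self, shape):
        self.shape = list(shape)  # stands in for the raw C shape storage

    def interface_alias(self):
        # Shares the array's own shape storage (alias semantics).
        return {"shape": self.shape}

    def interface_view(self):
        # Carries its own copy of the shape (view semantics).
        return {"shape": tuple(self.shape)}


a = ToyArray([6])
alias = a.interface_alias()
view = a.interface_view()

a.shape[:] = [3, 2]  # reshape the array in place

print(alias["shape"])  # follows the array
print(view["shape"])   # still describes the array as it was handed out
```

Giving the interface object its own shape and strides, as suggested in the message, corresponds to interface_view here: whoever holds it sees a consistent description regardless of later changes to the original.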
From: Travis O. <oli...@ee...> - 2006-06-09 20:06:58
|
Tim Hochberg wrote:

>I was going to say that it may help to think of array_interface as returning a *view*, since that seems to be the semantics that could probably be implemented safely without too much trouble. However, it looks like that's not what happens. array_interface->shape and strides point to the raw shape and strides for the array. That looks like it's a problem. Isn't:
>
>    >>> ai = a.__array_interface__
>    >>> a.shape = newshape
>
>going to result in ai having stale pointers to shape and strides that no longer exist?

This is an implementation detail. I'm still trying to gather some kind of consensus on what to actually do here. There is no such __array_interface__ attribute at this point.

-Travis
|
From: Tim H. <tim...@co...> - 2006-06-09 21:10:50
|
Travis Oliphant wrote:

>Tim Hochberg wrote:
>
>>I was going to say that it may help to think of array_interface as returning a *view*, since that seems to be the semantics that could probably be implemented safely without too much trouble. However, it looks like that's not what happens. array_interface->shape and strides point to the raw shape and strides for the array. That looks like it's a problem. Isn't:
>>
>>    >>> ai = a.__array_interface__
>>    >>> a.shape = newshape
>>
>>going to result in ai having stale pointers to shape and strides that no longer exist?
>
>This is an implementation detail. I'm still trying to gather some kind of consensus on what to actually do here.

There were three things mixed together in my post:

   1. The current implementation of __array_struct__ looks buggy. Should I go ahead and file a bug report so that this behaviour doesn't get blindly copied over from __array_struct__ to whatever the final doohickey is called, or is that going to be totally rewritten in any case?
   2. Whether __array_struct__ or __array_interface__ or whatever it gets called returns something that's kind of like a view (mainly, has its own copies of shape and strides) versus an alias for the original array (somehow tries to track the original array's shape and strides) is a semantic difference, not an implementation detail. I suspect that no one really cares that much about this and we'll end up doing what's easiest to get right; I'm pretty certain that is view semantics. It may be helpful to pronounce on that now, since it's possible the semantics might influence the name chosen, but I don't think it's critical.
   3. The implementation details I provided were, uh, implementation details.

-tim

>There is no such __array_interface__ attribute at this point.
>
>-Travis
|
From: Tim H. <tim...@co...> - 2006-06-09 22:51:27
|
Which of the following should we require for an object to be "supporting the array interface"? Here a producer is something that supplies array_struct or array_interface (where the latter is the Python-level version of the former as per recent messages). Consumers do something with the results.

   1. Producers can supply either array_struct (if implemented in C) or array_interface (if implemented in Python). Consumers must accept both.
   2. Producers must supply both array_struct and array_interface. Consumers may accept either.
   3. Producers must supply both array_struct and array_interface. Consumers must accept both as well.

A possibly related point: array_interface['data'] should be required to be a buffer object; a 2-tuple of address/read-only should not be allowed as that's a simple way to crash the interpreter.

I see some reasonable arguments for either 1 or 2. 3 seems like excess work.

-tim
|
From: Andrew S. <str...@as...> - 2006-06-09 23:03:47
|
Tim Hochberg wrote:

>Which of the following should we require for an object to be "supporting the array interface"? Here a producer is something that supplies array_struct or array_interface (where the latter is the Python-level version of the former as per recent messages). Consumers do something with the results.
>
>   1. Producers can supply either array_struct (if implemented in C) or array_interface (if implemented in Python). Consumers must accept both.
>   2. Producers must supply both array_struct and array_interface. Consumers may accept either.
>   3. Producers must supply both array_struct and array_interface. Consumers must accept both as well.

I haven't been following as closely as I could, but is the following a possibility?

   4. Producers can supply either array_struct or array_interface. Consumers may accept either. The intermediary is a small, standalone (does not depend on NumPy) extension module that does automatic translation if necessary by providing two functions: as_array_struct() (which returns a CObject) and as_array_interface() (which returns a tuple/dict/whatever).
|
From: David M. C. <co...@ph...> - 2006-06-09 23:31:00
|
On Fri, 09 Jun 2006 16:03:32 -0700 Andrew Straw <str...@as...> wrote:

> Tim Hochberg wrote:
>
> >Which of the following should we require for an object to be "supporting
> >the array interface"? Here a producer is something that supplies
> >array_struct or array_interface (where the latter is the Python level
> >version of the former as per recent messages). Consumers do something
> >with the results.
> >
> >   1. Producers can supply either array_struct (if implemented in C) or
> >      array_interface (if implemented in Python). Consumers must accept
> >      both.
> >   2. Producers must supply both array_struct and array_interface.
> >      Consumers may accept either.
> >   3. Producers must supply both array_struct and array_interface.
> >      Consumers must accept both as well.
> >
>
> I haven't been following as closely as I could, but is the following a
> possibility?
>    4. Producers can supply either array_struct or array_interface.
> Consumers may accept either. The intermediate is a small, standalone
> (does not depend on NumPy) extension module that does automatic
> translation if necessary by providing two functions: as_array_struct()
> (which returns a CObject) and as_array_interface() (which returns a
> tuple/dict/whatever).

For something to go in the Python standard library this is certainly possible. Heck, if it's in the standard library we can have one attribute which is a special ArrayInterface object, which can be queried from both Python and C efficiently.

For something like numpy (where we don't require a special object: the "producer" and "consumers" in Tim's terminology could be Numeric and numarray, for instance), we don't want a 3rd-party dependency. There's one case that I mentioned in another email:

5. Producers must supply array_interface, and may supply array_struct. Consumers can use either.

Requiring array_struct means that Python-only modules can't play along, so I think it should be optional (of course, if you're concerned about speed, you would provide it).

Or maybe we should revisit the "no external dependencies" rule. Perhaps one module would make everything easier, with helper functions and consistent handling of special cases. Packages wouldn't need it if they don't interact: you could conditionally import it when __array_interface__ is requested, and fail if you don't have it. It would just be required if you want to do sharing.

--
|>|\/|<
/--------------------------------------------------------------------------\
|David M. Cooke http://arbutus.physics.mcmaster.ca/dmc/
|co...@ph... |
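[Editor's sketch] Option 5 above could look roughly like this from the consumer's side. This is only an illustration under stated assumptions: read_array_info and PurePythonProducer are hypothetical names, not part of NumPy or of any proposal in this thread, and a real consumer would unpack the C struct instead of returning it untouched.

```python
def read_array_info(obj):
    """Return (source, info) for a producer following 'option 5'.

    Hypothetical helper: prefer the optional C-level __array_struct__,
    otherwise fall back to the mandatory __array_interface__.
    """
    struct = getattr(obj, "__array_struct__", None)
    if struct is not None:
        # Fast path: a C consumer would unpack the struct at this point.
        return ("struct", struct)
    interface = getattr(obj, "__array_interface__", None)
    if interface is None:
        raise TypeError("object does not support the array interface")
    return ("interface", interface)


class PurePythonProducer:
    """A Python-only producer: it supplies the dict and no C struct."""

    __array_interface__ = {"shape": (3,), "typestr": "<f8"}


source, info = read_array_info(PurePythonProducer())
```

The point of the ordering is that a Python-only module can still take part, while a producer that cares about speed adds __array_struct__ and is picked up first.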
From: Tim H. <tim...@co...> - 2006-06-10 04:01:08
|
David M. Cooke wrote:
>On Fri, 09 Jun 2006 16:03:32 -0700
>Andrew Straw <str...@as...> wrote:
>
>>Tim Hochberg wrote:
>>
>>>Which of the following should we require for an object to be "supporting
>>>the array interface"? Here a producer is something that supplies
>>>array_struct or array_interface (where the latter is the Python level
>>>version of the former as per recent messages). Consumers do something
>>>with the results.
>>>
>>>   1. Producers can supply either array_struct (if implemented in C) or
>>>      array_interface (if implemented in Python). Consumers must accept
>>>      both.
>>>   2. Producers must supply both array_struct and array_interface.
>>>      Consumers may accept either.
>>>   3. Producers must supply both array_struct and array_interface.
>>>      Consumers must accept both as well.
>>>
>>
>>I haven't been following as closely as I could, but is the following a
>>possibility?
>>   4. Producers can supply either array_struct or array_interface.
>>Consumers may accept either. The intermediate is a small, standalone
>>(does not depend on NumPy) extension module that does automatic
>>translation if necessary by providing two functions: as_array_struct()
>>(which returns a CObject) and as_array_interface() (which returns a
>>tuple/dict/whatever).
>>
>
>For something to go in the Python standard library this is certainly
>possible. Heck, if it's in the standard library we can have one attribute
>which is a special ArrayInterface object, which can be queried from both
>Python and C efficiently.
>
>For something like numpy (where we don't require a special object: the
>"producer" and "consumers" in Tim's terminology could be Numeric and
>numarray, for instance), we don't want a 3rd-party dependence. There's one
>case that I mentioned in another email:
>
>5. Producers must supply array_interface, and may supply array_struct.
>Consumers can use either.
>
>Requiring array_struct means that Python-only modules can't play along, so I
>think it should be optional (of course, if you're concerned about speed, you
>would provide it).
>
>Or maybe we should revisit the "no external dependencies". Perhaps one module
>would make everything easier, with helper functions and consistent handling
>of special cases. Packages wouldn't need it if they don't interact: you could
>conditionally import it when __array_interface__ is requested, and fail if
>you don't have it. It would just be required if you want to do sharing.
>

Here's another idea: move array_struct *into* array_interface. That is, array_interface becomes a dictionary with the following items:

    shape: sequence specifying the shape
    typestr: the typestring
    descr: you get the idea
    strides: ...
    mask: ...
    offset: ...
    data: A buffer object
    struct: the array_struct or None.

The downside is that you have to do two lookups to get the array_struct, and that should be the fast path. A partial solution is to instead have array_interface be a super_tuple similar to the result of os.stat. This should be faster since a tuple is quite fast to index if you know what index you want.

An advantage of having one module that you need to import is that we could use something other than CObject, which would allow us to bulletproof the array interface at the Python level. One nit with using a CObject is that I can pass an object that doesn't refer to a PyArrayInterface, with unpleasant results.

-tim |
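[Editor's sketch] The dictionary layout Tim describes might be produced along these lines. This is a minimal sketch, assuming a pure-Python producer built on the standard array module; the class name Samples and the helper get_struct_or_interface are made up for illustration, and the values chosen for the unused entries (None, 0) are assumptions rather than anything the thread specifies.

```python
import array
import sys


class Samples:
    """Hypothetical producer exposing a single dict-valued attribute."""

    def __init__(self, values):
        self._buf = array.array("d", values)  # 8-byte floats

    @property
    def __array_interface__(self):
        order = "<" if sys.byteorder == "little" else ">"
        return {
            "shape": (len(self._buf),),
            "typestr": order + "f8",
            "descr": None,      # unused in this sketch
            "strides": None,    # unused in this sketch
            "mask": None,       # unused in this sketch
            "offset": 0,
            "data": memoryview(self._buf),  # the buffer object
            "struct": None,     # no C-level array_struct available
        }


def get_struct_or_interface(obj):
    """Show the two lookups: the attribute first, then the 'struct' key."""
    info = obj.__array_interface__
    struct = info["struct"]
    return struct if struct is not None else info


samples = Samples([1.0, 2.0])
info = get_struct_or_interface(samples)
```

The two steps inside get_struct_or_interface are the "two lookups" cost mentioned above: one attribute access, then one dictionary access, before a consumer reaches the C struct.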
From: Andrew S. <str...@as...> - 2006-06-10 21:22:25
|
OK, here's another (semi-crazy) idea: __array_struct__ is the interface. ctypes lets us use it in "pure" Python. We provide a "reference implementation" so that newbies don't get segfaults. |
From: Andrew S. <str...@as...> - 2006-06-09 19:26:51
|
On the one hand, I feel we should keep __array_struct__ behaving exactly as it is now. There's already lots of code that uses it, and it's tremendously useful despite (because of?) its simplicity. For these use cases, the __array_descr__ information has already proven unnecessary. I must say that I, and probably others, thought that __array_struct__ would be future-proof. Although the magnitude of the proposed change to add this information to the C-struct PyArrayInterface is minor, it still breaks code in the wild.

On the other hand, I'm only beginning to grasp the power of the __array_descr__ information. So perhaps bumping the PyArrayInterface.version to 3 (2 is the current, and as far as I can tell, original version) and going forward would be justified. Perhaps there's a way towards backwards-compatibility -- the various array consumers could presumably support _reading_ both v2 and v3 nearly forever, but could spit out warnings when reading v2. It seems v3 would be a simple superset of v2, so implementation of this wouldn't be hard. The challenge will be when an implementor returns a v3 __array_struct__ to something that reads only v2. For this reason, maybe it's better to break backwards compatibility now before even more code is written to read v2.

Is it clear what would need to be done to provide a C-struct giving the __array_descr__ information? What's the problem with keeping __array_descr__ access available only at the Python level? Your original email suggested limiting the number of attributes, which I agree with, but I don't think we need to go to the logical extreme. Does simply keeping __array_descr__ as part of the Python array interface avoid these issues? At what cost?

Cheers!
Andrew

Travis Oliphant wrote:
>Keep in mind there are two different (but related) issues at play here.
>
>1) What goes in to NumPy 1.0
>2) What we propose should go into Python 2.6
>
>
>I think for #1 we should compress the Python-side array protocol into a
>single __array_interface__ attribute that returns a dictionary. We
>should also expand the C-struct to contain what __array_descr__ currently
>provides.
>
>
>-Travis
>
>
>
>_______________________________________________
>Numpy-discussion mailing list
>Num...@li...
>https://lists.sourceforge.net/lists/listinfo/numpy-discussion
>
> |
From: David M. C. <co...@ph...> - 2006-06-09 21:04:16
|
On Fri, 09 Jun 2006 12:08:51 -0600 Travis Oliphant <oli...@ee...> wrote:

> Tim Hochberg wrote:
>
> > Sasha wrote:
> >
> >> On 6/8/06, David M. Cooke <co...@ph...> wrote:
> >>
> >> My problem with __array_struct__ returning either a tuple or a CObject
> >> is that array protocol should really provide both.
> >
> This is a convincing argument. Yes, the array protocol should provide
> both. Thus, we can't override the usage of the same name unless that
> name produces an object through which both interfaces can be obtained.

True, didn't think about that. +1.

> >>> We still need __array_descr__, as the C struct doesn't provide all
> >>> the info
> >>> that this does.
> >>
> >> What do you have in mind?
> >>
> > Is there any prospect of merging this data into the C struct? It would
> > be cleaner if all of the information could be embedded into the C
> > struct, but I can see how that might be a backward compatibility
> > nightmare.
>
> I do think it should be merged into the C struct. The simplest thing
> to do is to have an additional PyObject * as part of the C struct which
> could be NULL (or unassigned). The backward compatibility is a concern
> but when thinking about what Python 2.6 should support we should not be
> too crippled by it.
>
> Perhaps we should just keep __array_struct__ and compress all the other
> array_interface methods into the __array_interface__ attribute which
> returns a dictionary from which the Python-side interface can be produced.

+1. I'm ok with two attributes: __array_struct__ (for C), and __array_interface__ (as a dict for Python).

For __array_descr__, I would require that everything that provides an __array_struct__ must also provide an __array_interface__; then __array_descr__ can become a 'descr' key in __array_interface__. Requiring that would also mean that any array-like object can be introspected from Python or C.

I think that the array_descr is complicated enough that keeping it as a Python object is ok: you don't have to reinvent routines to make tuple-like objects, and handle memory for strings, etc. If you're using the array interface, you've got Python available: use it.

If you *do* want a C-level version, I'd make it simple, and concatenate the typestr descriptions of each field together, like '>i2>f8', and forget the names (you can grab them out of __array_interface__['descr'] if you need them). That's simple enough to be parseable with sscanf.

> Keep in mind there are two different (but related) issues at play here.
>
> 1) What goes in to NumPy 1.0
> 2) What we propose should go into Python 2.6
>
>
> I think for #1 we should compress the Python-side array protocol into a
> single __array_interface__ attribute that returns a dictionary. We
> should also expand the C-struct to contain what __array_descr__ currently
> provides.

--
|>|\/|<
/--------------------------------------------------------------------------\
|David M. Cooke http://arbutus.physics.mcmaster.ca/dmc/
|co...@ph... |
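[Editor's sketch] The concatenated-typestr idea ('>i2>f8') is simple enough to split with one pattern. The sketch below uses a Python regular expression rather than the sscanf call David mentions, and it assumes each field is exactly one byte-order character, one type letter, and a decimal size; split_typestrs is a hypothetical name, and anything richer (nested or named fields) is out of scope here.

```python
import re

# One field: byte order, type kind letter, item size in bytes (assumed form).
_FIELD = re.compile(r"([<>|=])([A-Za-z])(\d+)")


def split_typestrs(packed):
    """Split a string such as '>i2>f8' into (byteorder, kind, size) tuples."""
    fields = []
    pos = 0
    while pos < len(packed):
        match = _FIELD.match(packed, pos)
        if match is None:
            raise ValueError("cannot parse typestr at offset %d" % pos)
        fields.append((match.group(1), match.group(2), int(match.group(3))))
        pos = match.end()
    return fields


fields = split_typestrs(">i2>f8")
# fields == [('>', 'i', 2), ('>', 'f', 8)]
```

As in the message, the field names are deliberately absent; a consumer that needs them would still look them up in __array_interface__['descr'].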
From: Andrew S. <str...@as...> - 2006-06-09 20:52:31
|
Travis Oliphant wrote:
> Andrew Straw wrote:
>
>> On the one hand, I feel we should keep __array_struct__ behaving
>> exactly as it is now. There's already lots of code that uses it, and
>> it's tremendously useful despite (because of?) its simplicity. For
>> these use cases, the __array_descr__ information has already
>> proven unnecessary. I must say that I, and probably others, thought
>> that __array_struct__ would be future-proof. Although the magnitude
>> of the proposed change to add this information to the C-struct
>> PyArrayInterface is minor, it still breaks code in the wild.
>>
> I don't see how it breaks any code in the wild to add an additional
> member to the C-struct. We could easily handle it in new code with a
> flag setting (like Python uses). The only possible problem is
> looking for it when it is not there.

Ahh, thanks for clarifying. Let me paraphrase to make sure I got it right: given a C-struct "inter" of type PyArrayInterface, if and only if ((inter.flags & HAS_ARRAY_DESCR) == HAS_ARRAY_DESCR), inter could safely be cast as PyArrayInterfaceWithArrayDescr and thus expose a new member.

This does seem to avoid all the issues and maintain backwards compatibility. I guess the only potential complaint is that it's a little C trick which might be unpalatable to the core Python devs, but it doesn't seem egregious to me.

If I do understand this issue, I'm +1 for the above scheme provided the core Python devs don't mind.

Cheers!
Andrew |
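[Editor's sketch] The flag-then-cast scheme paraphrased above can be mimicked with ctypes. Everything here is an assumption made for illustration: the field layout is a simplified stand-in for PyArrayInterface, the bit value of HAS_ARRAY_DESCR is invented, the extra member is modelled as a raw pointer, and read_descr is a hypothetical helper. The sketch only shows why an old reader is unaffected while a new reader can find the extra member.

```python
import ctypes

HAS_ARRAY_DESCR = 0x0800  # assumed bit value, for illustration only


class PyArrayInterface(ctypes.Structure):
    # Simplified stand-in for the existing C struct (layout is an assumption).
    _fields_ = [
        ("two", ctypes.c_int),
        ("nd", ctypes.c_int),
        ("typekind", ctypes.c_char),
        ("itemsize", ctypes.c_int),
        ("flags", ctypes.c_int),
        ("shape", ctypes.POINTER(ctypes.c_ssize_t)),
        ("strides", ctypes.POINTER(ctypes.c_ssize_t)),
        ("data", ctypes.c_void_p),
    ]


class PyArrayInterfaceWithArrayDescr(ctypes.Structure):
    # Same leading members, plus one trailing member for the description.
    _fields_ = PyArrayInterface._fields_ + [("descr", ctypes.c_void_p)]


def read_descr(inter_ptr):
    """Return the descr pointer if the flag says it exists, else None."""
    inter = inter_ptr.contents
    if (inter.flags & HAS_ARRAY_DESCR) == HAS_ARRAY_DESCR:
        extended = ctypes.cast(
            inter_ptr, ctypes.POINTER(PyArrayInterfaceWithArrayDescr)
        )
        return extended.contents.descr
    return None  # old-style struct: never look past its end
```

A consumer that predates the flag simply ignores the bit and reads the leading members as before, which is the backward-compatibility argument made in the message.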
From: Tim H. <tim...@co...> - 2006-06-08 21:59:53
|
Sasha wrote:
>On 6/8/06, Travis Oliphant <oli...@ee...> wrote:
>
>>...
>>__array_struct__ (perhaps we could call this __array_interface__ but
>>I'm happy keeping the name the same too.)
>>
>
>+0 on the name change and consider making it a method rather than an attribute.
>

I'm not thrilled with either name, nor do I have a better one, so put me down as undecided on name. I marginally prefer an attribute to a method here. I'm +1 on narrowing the interface though.

>>If __array_struct__ is a CObject then it behaves as it does now.
>>
>>If __array_struct__ is a tuple then each entry in the tuple is one of
>>the items currently obtained by an additional attribute access (except
>>the first item is always an integer indicating the version of the
>>protocol --- unused entries are None).
>>
>
>-1
>
>This will complicate the use of array interface.
>

I concur.

>I would propose
>creating a subtype of CObject that has the necessary attributes so
>that one can do a.__array_interface__.shape, for example. I did not
>check if CObject is subclassable in 2.5, but if not, we can propose to
>make it subclassable for 2.6.
>

Alternatively, if this proves to be a hassle, a function, unpack_interface or some such, could be provided that takes an __array_interface__ object and spits out the appropriate tuple or, perhaps better, an object with the appropriate fields.

>>...
>>
>>I would like to eliminate all the other array protocol attributes before
>>NumPy 1.0 (and re-label those such as __array_data__ that are useful in
>>other contexts --- like ctypes).
>>
>+1
>

+1.

-tim |
From: Albert S. <fu...@gm...> - 2006-06-09 09:54:29
|
Hello all

> -----Original Message-----
> From: num...@li... [mailto:numpy-
> dis...@li...] On Behalf Of Travis Oliphant
> Sent: 08 June 2006 22:27
> To: numpy-discussion
> Subject: [Numpy-discussion] Array Protocol change for Python 2.6
>
> ...
>
> I would like to eliminate all the other array protocol attributes before
> NumPy 1.0 (and re-label those such as __array_data__ that are useful in
> other contexts --- like ctypes).

Just out of curiosity:

    In [1]: x = N.array([])

    In [2]: x.__array_data__
    Out[2]: ('0x01C23EE0', False)

Is there a reason why the __array_data__ tuple stores the address as a hex string? I would guess that this representation of the address isn't the most useful one for most applications.

Regards,

Albert |
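[Editor's sketch] For a consumer that wants a numeric address, the hex string shown above converts directly. This is only an illustration: address_from_array_data is a hypothetical helper, the tuple is copied from the session above, and the resulting pointer is never dereferenced because the address is not valid in any other process.

```python
import ctypes


def address_from_array_data(array_data):
    """Turn an (address_as_hex_string, read_only) tuple into (int, bool)."""
    addr_text, read_only = array_data
    # int() with base 16 accepts the leading '0x' prefix.
    return int(addr_text, 16), read_only


address, read_only = address_from_array_data(("0x01C23EE0", False))
pointer = ctypes.c_void_p(address)  # what a ctypes-based consumer might build
```

The extra parsing step is small, but it is the kind of per-consumer work that an integer in the tuple would avoid.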
From: Francesc A. <fa...@ca...> - 2006-06-09 10:06:53
|
On Friday 09 June 2006 11:54, Albert Strasheim wrote:
> Just out of curiosity:
>
> In [1]: x = N.array([])
>
> In [2]: x.__array_data__
> Out[2]: ('0x01C23EE0', False)
>
> Is there a reason why the __array_data__ tuple stores the address as a hex
> string? I would guess that this representation of the address isn't the
> most useful one for most applications.

Good point. I hit this before and forgot to send a message about this. I agree that an integer would be better. Although, now that I think about this, I suppose that the issue should be the difference of representation of longs in 32-bit and 64-bit platforms, isn't it?

Cheers,

--
>0,0<   Francesc Altet     http://www.carabos.com/
V   V   Cárabos Coop. V.   Enjoy Data
 "-" |