From: Travis O. <oli...@ee...> - 2005-12-02 01:11:30
Christopher Hanley wrote:

>Hi Travis,
>
>About a year ago (summer 2004) on the numpy distribution list there was
>a lot of discussion of the records interface. I will dig through my
>notes and put together a summary.
>

Thanks for the pointers. I had forgotten about that discussion. I went back and re-read the thread. Here's a good link for others who want to re-read (the end of) this thread:

http://news.gmane.org/find-root.php?message_id=%3cBD22BAC0.E9EB%25perry%40stsci.edu%3e

I think some very good points were made. These points should be addressed in the context of scipy arrays, which now support records in a very basic way. Because of this, we can support nested records of records, but how this is to be presented to the user is still an open question (i.e., how do you build one?).

I've finally been converted to the belief that the notion of records is very important, because it speaks to how to correctly build the basic (typeless, mathless) array object that will go into Python. If we can get the general records type done right, then all the other types are examples of it.

Thus, I would like to revive discussion of the record object for inclusion in scipy core. I pretty much agree with the semantics that Perry described in his final email (is this all implemented in numarray yet?), except that I agree with Francesc Alted that a titles or labels concept should be allowed.

I'm more enthusiastic about code than discussion, so I'm hoping for a short-lived discussion followed by actual code. I'm ready to do the implementation this week (I've already borrowed lots of great code from numarray, which makes it easier), but feel free to chime in even if you read this later.

In my mind, the discussion about the records array is primarily a discussion about the records data-type. The way I'm thinking, the scipy ndarray is a homogeneous collection of the same "thing."
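As a rough illustration of an ndarray whose single, homogeneous "thing" is a record (including a record nested inside a record), here is a minimal sketch using the structured dtypes found in later NumPy releases. This is an assumption for illustration only: it is not the scipy_core API under discussion here, and the field names (pos, mass, tag) are hypothetical.

```python
import numpy as np

# Hypothetical record layout for illustration: a nested "pos" record
# (two float64 fields) followed by a float32 and a 4-byte string.
pos = np.dtype([("x", np.float64), ("y", np.float64)])
particle = np.dtype([
    ("pos", pos),          # nested record: a record inside a record
    ("mass", np.float32),  # 4 bytes
    ("tag", "S4"),         # 4-byte string
])

# The array stays homogeneous: every element is one 24-byte "particle".
parts = np.zeros(3, dtype=particle)
parts["mass"] = [1.0, 2.0, 3.0]        # field access is a view; this writes in place
parts["pos"]["x"] = [0.5, 1.5, 2.5]    # nested field access also writes in place
parts["tag"] = [b"a", b"b", b"c"]

# Selecting one element gives an array scalar (np.void), not a tuple.
one = parts[1]
print(type(one), one["pos"]["x"], one["mass"], one["tag"])

# The dtype carries the name -> (sub-dtype, byte offset) mapping.
# Here: pos at offset 0, mass at 16, tag at 20.
for name, (sub_dtype, offset) in particle.fields.items():
    print(name, sub_dtype, offset)
```

In this sketch the nested field is built simply by using one record dtype as the type of a field in another, which is one possible answer to the "how do you build one" question; titles or labels are left out to keep the field mapping a plain name-to-(dtype, offset) table.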
The big change in scipy core is that Numeric used to allow only certain data types, but now the ndarray can contain an arbitrary "void" data type. You can also add data-types to scipy core. These data-types are "almost" full members of the scipy data-type community. The "almost" is because the N*N casting matrix is not updated (this would require a re-design of how casting is handled). At some point, I'd like to fix this wart and make it possible to add data-types at will; I think if we get the record type right, I'll be able to figure out how to do this.

We need to add a "record" data-type to scipy. Then any array can be of "record" type, and there will be an additional "array scalar" that is returned when selecting a single element from the array. So a record array would simply be an array of "records" plus some extra machinery for mapping field names to actual segments of the array element. (We may decide that this mapping is general enough that all scipy arrays should be able to assign names to sub-bytes of their main data-type and to access those sub-bytes, in which case the subclass is unnecessary.)

Let me explain further: right now, the machinery is in place in scipy_core to get and set an arbitrary "field" in any ndarray, regardless of its data-type. A "field" in this context is a sub-section of the basic element making up the array. Generically, the sub-section is defined by an offset plus either a data-type or a tuple of a data-type and a shape (to allow sub-arrays in a record). What I understand users to want is the binding of a name to this generic sub-section descriptor.

1) Should we allow this for every scipy ndarray? Complex data types have an obvious binding, but would anybody want to name the first two bytes of their int32 array? I suggest holding off on this one until a records array is working.
2) Supposing we don't go with number 1, we need to design a record data type that has this name-binding capability. The recarray class in scipy core SVN essentially does just this.

Question: how important is backwards compatibility with the old numarray specification? In particular, would it be acceptable to go with the .fields access described by Perry and eliminate the .field() approach?

Thanks for reading, and for any comments you can make.

-Travis