From: Michael D. <md...@st...> - 2007-09-12 14:04:22
|
[Background: I'm working on refactoring the transforms framework with the end goal of making it easier to add new kinds of non-linear transforms and projections to matplotlib. I've been talking a bit with John Hunter about this -- this question is mainly for John and Ken McIvor, though there are probably some other interested parties on this list as well.]

I've studied John's mpl1.py and Ken's mpl1_displaypdf.py to try to get a sense of where things could go. I appreciate the ideas both of these present as clean slates -- however, I think what I'm running into is "how to get there from here" in manageable steps.

My first baby step in this large task has been to try to remove transforms.py/.cpp and replace it with something based on standard 3x3 affine matrices, using Python/numpy only. The way transforms.py/.cpp works now, everything is built around live updates of a tree of interdependent bounding boxes and transforms, where a change to a single scalar in any object automatically propagates through the tree.

My first thought was to make something out of immutable transforms, where a transform would have to be calculated from its dependencies immediately before drawing, and therefore get rid of these "magical" side-effects by not allowing transforms to change in place. Reading between the lines, this seems to be what mpl1_displaypdf.py suggests. I quickly came to the conclusion that that is perhaps a step too far -- matplotlib is very much built around these side-effects, and I would hate to replace hundreds of lines of well-tested code. On the other hand, there is probably a pattern to those changes, and it may be worth the effort if others agree it's useful.

My second kick at the can was to build a live-updating tree of transforms. This is similar to what I saw in mpl1.py, using "changed" callbacks so that a change in one transform would affect all transforms that depend on it.
[I worry about a pure callback approach because of the likelihood of computing many partial values. For example, if 'a' depends on 'b' and 'c', and I change 'b' then 'c', 'a' will get recomputed twice. Instead, I used an "invalidation" technique, where a change in 'b' simply invalidates 'a', and 'a' doesn't get recomputed until it is later requested. This is something we used a lot when I programmed for gaming hardware. The resulting semantics are very similar to using callbacks, however.]

This approach got closer, until I hit the wall that dependencies work at an even lower level -- single lazy values get borrowed from one bounding box and referenced in another (e.g. Axes.autoscale_view()). Certainly, this could be implemented in my new affine-based framework, but then we're almost back to square one and have basically re-implemented transforms.py/.cpp as something that is probably slower -- though perhaps more flexible, in that more kinds of transforms could be added using only Python. Of course, autoscale_view() (and other instances of this pattern) could be rewritten to work differently, but it's hard to know where that might end.

(You can see my semi-working sketch of this here: http://matplotlib.svn.sourceforge.net/viewvc/matplotlib/branches/transforms/lib/matplotlib/affine.py?revision=3835&view=markup -- if you check out r3835 from my branch, simple_plot.py is working, with the exception of things that rely on this really low-level interdependence, e.g. the data limits.)

So, I feel like I'm going in a bit of a circle here, and I might need a reality check. I thought I'd better check in and see where you guys (who've thought about this a lot longer than I have) see this going. A statement of objectives for this part of the task would be helpful (e.g. what's the biggest problem with how transforms work now, and what model would be a better fit). John, I know you've mentioned some to me before, e.g.
the LazyValue concept is quirky and relies on C, and the PDF stateful transforms model is close but not quite what we need, etc. I feel I have a better sense of the overall code structure now, but you guys may have a better "gut" sense of what will fit best.

My next planned step -- moving more (affine) transformations to the backends, to allow the same path data to be transformed in the backend without retransmitting/converting the path data each time -- doesn't actually seem to depend on getting the above done. (The existing transforms.cpp code already has a way to get a representation as an affine matrix.) John, I have a note from our phone conversation indicating you thought these two things would be dependent, but I don't remember the reason you gave -- could you maybe refresh my memory? I was much less aware of the code structure then.

Sorry if this is sort of an open-ended question... Any pointers or impressions, no matter how small, are appreciated.

Cheers,
Mike

--
Michael Droettboom
Operations and Engineering Division
Space Telescope Science Institute
Operated by AURA for NASA |
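The "invalidation" technique Mike describes -- a change marks dependents stale, and nothing is recomputed until a value is actually requested -- can be sketched in a few lines. All class and method names below are illustrative, not taken from the branch:

```python
class LazyNode:
    """A value computed on demand from parent nodes."""

    def __init__(self, compute, *parents):
        self._compute = compute          # callable taking the parent values
        self._parents = parents
        self._children = []
        self._value = None
        self._valid = False
        for parent in parents:
            parent._children.append(self)

    def invalidate(self):
        # Mark this node and everything downstream stale; no math happens here.
        if self._valid:
            self._valid = False
            for child in self._children:
                child.invalidate()

    def get(self):
        # Recompute lazily, at most once per invalidation.
        if not self._valid:
            self._value = self._compute(*[p.get() for p in self._parents])
            self._valid = True
        return self._value


class Scalar(LazyNode):
    """A leaf value that can be set directly."""

    def __init__(self, value):
        LazyNode.__init__(self, None)    # leaf: compute is never called
        self._value = value
        self._valid = True

    def set(self, value):
        self._value = value
        for child in self._children:
            child.invalidate()
```

With this, changing 'b' then 'c' only marks 'a' stale; 'a' is recomputed once, at the next get() -- exactly the double-recomputation a pure callback scheme would incur.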
From: John H. <jd...@gm...> - 2007-09-12 14:41:13
|
On 9/12/07, Michael Droettboom <md...@st...> wrote: > If you check out r3835 from my branch, simple_plot.py is working, with > the exception of things that rely on this really low-level > interdependence, e.g. the data limits.) I am at 3836 in the transforms branch, but I do not see "pbox". Perhaps you forgot to svn add it? JDH |
From: Michael D. <md...@st...> - 2007-09-12 14:46:52
|
Yes. Sorry. It's in r3837 on the branch. Cheers, Mike John Hunter wrote: > On 9/12/07, Michael Droettboom <md...@st...> wrote: > >> If you check out r3835 from my branch, simple_plot.py is working, with >> the exception of things that rely on this really low-level >> interdependence, e.g. the data limits.) > > I am at 3836 in the transforms branch, but I do not see "pbox". > Perhaps you forgot to svn add it? > > JDH -- Michael Droettboom Operations and Engineering Division Space Telescope Science Institute Operated by AURA for NASA |
From: Michael D. <md...@st...> - 2007-09-12 14:47:19
|
I should also add -- it's only working with the Agg backend. John Hunter wrote: > On 9/12/07, Michael Droettboom <md...@st...> wrote: > >> If you check out r3835 from my branch, simple_plot.py is working, with >> the exception of things that rely on this really low-level >> interdependence, e.g. the data limits.) > > I am at 3836 in the transforms branch, but I do not see "pbox". > Perhaps you forgot to svn add it? > > JDH -- Michael Droettboom Operations and Engineering Division Space Telescope Science Institute Operated by AURA for NASA |
From: John H. <jd...@gm...> - 2007-09-12 15:41:47
|
On 9/12/07, Michael Droettboom <md...@st...> wrote:
> This approach got closer, until I hit the wall that dependencies work at
> an even lower level -- single lazy values get borrowed from one bounding
> box and referenced in another (e.g. Axes.autoscale_view()) Certainly,
> this could be implemented in my new affine-based framework, but then
> we're almost back to square one and have basically re-implemented
> transforms.py/.cpp into something that is probably slower -- though
> perhaps more flexible in that more kinds of transforms could be added
> using only Python. Of course, autoscale_view() (and other instances of
> this) could be rewritten to work differently, but it's hard to know
> where that might end.

The locators do have a reference to the datalim and viewlim intervals, which is what they use to compute their autoscale limits and tick locations, but they return scalars, and autoscale_view simply sets the new limits with these scalars. So the fact that there is a reference here is easy to work around. I made a minor change in your code (ticker.py and axis.py) to illustrate. Instead of relying on the Interval to pass information from the Axis -> Locator/Formatter, I simply set the axis instance instead. Then, eg, the Locator can do

  vmin, vmax = self.axis.get_view_interval()
  dmin, dmax = self.axis.get_data_interval()

so there are no confusing intertwined references to deal with, and the axis can be responsible for knowing its data and view limits, which seems reasonable. I made these changes just to the MaxNLocator and ScalarFormatter classes for proof of concept, but it should be trivial to port to the others. I think in general communicating by scalar values, passed explicitly or through callbacks, will make for clearer code than the deeply nested references we have been using in the existing code.
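The scalar-passing pattern John describes might be sketched like this. The MaxNLocator name and the get_view_interval/get_data_interval accessors follow the email; the Axis stub and the tick-rounding arithmetic are simplified stand-ins, not matplotlib's actual code:

```python
import math

class Axis:
    """Illustrative stub: the axis owns its data and view limits."""

    def __init__(self, dmin, dmax):
        self._data_interval = (dmin, dmax)
        self._view_interval = (dmin, dmax)

    def get_data_interval(self):
        return self._data_interval

    def get_view_interval(self):
        return self._view_interval

    def set_view_interval(self, vmin, vmax):
        self._view_interval = (vmin, vmax)


class MaxNLocator:
    """Place up to nbins+1 nicely rounded tick locations."""

    def __init__(self, nbins=5):
        self.nbins = nbins
        self.axis = None            # set by the Axis that owns this locator

    def set_axis(self, axis):
        self.axis = axis

    def __call__(self):
        # Pull plain scalars from the axis -- no shared Interval objects.
        vmin, vmax = self.axis.get_view_interval()
        step = (vmax - vmin) / self.nbins
        # Round the step up to a power of ten times 1, 2 or 5.
        mag = 10 ** math.floor(math.log10(step))
        for m in (1, 2, 5, 10):
            if m * mag >= step:
                step = m * mag
                break
        start = math.floor(vmin / step) * step
        ticks, t = [], start
        while t <= vmax + 1e-10:
            if t >= vmin - 1e-10:
                ticks.append(round(t, 10))
            t += step
        return ticks
```

The key point is the direction of the dependency: the locator asks the axis for scalars when called, instead of holding a live reference into a tree of lazily updated values.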
There are places where one bounding box value is shared with another (most clearly in sharex and sharey support, eg left = self._sharex.viewLim.xmin()). The ability to "share" an axis, eg so changes in pan and zoom on one are reflected in another, is extremely useful, but a better approach may be to use callbacks (or something like them) rather than shared, composited transforms which are updated in place.

I need to spend more time reading through your code before I comment further, but I just wanted to make a quick comment vis-a-vis the locators and formatters. I committed these changes to your branch, and autoscaling is now working there :-) I'll keep poking and learning more about what you are doing before commenting on some of your bigger questions.

I made a couple of comments in affine.py as well, prefixed by 'JDH'

JDH |
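The callback alternative John suggests for axis sharing could look roughly like this: a pan/zoom on one axes notifies followers with plain scalars, instead of two axes holding the same live viewLim object. All names here are illustrative:

```python
class ViewLim:
    """View limits that notify observers when they change."""

    def __init__(self, xmin, xmax):
        self.xmin, self.xmax = xmin, xmax
        self._observers = []

    def connect(self, func):
        # func is called with the new (xmin, xmax) scalars.
        self._observers.append(func)

    def set_x(self, xmin, xmax):
        self.xmin, self.xmax = xmin, xmax
        for func in self._observers:
            func(xmin, xmax)


class SharedAxes:
    """Illustrative stand-in for an Axes participating in sharex."""

    def __init__(self, viewlim):
        self.viewLim = viewlim

    def sharex(self, other):
        # Follow pan/zoom on `other` by listening for its limit changes.
        other.viewLim.connect(self._on_xlim_changed)

    def _on_xlim_changed(self, xmin, xmax):
        self.viewLim.xmin, self.viewLim.xmax = xmin, xmax
```

Each axes still owns its own limits; only scalar values cross the boundary, so there is no composited transform to keep consistent in place.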
From: Michael D. <md...@st...> - 2007-09-12 15:50:24
|
John Hunter wrote: > On 9/12/07, Michael Droettboom <md...@st...> wrote: > I need to spend more time reading through your code before I comment > further, but I just wanted to make a quick comment vis-a-vis the > locators and formatters. I commited these changes to your branch, and > autoscaling is now working there :-) I'll keep poking and learning > more about what you are doing before commenting on some of your bigger > questions. > > I made a couple of comments in affine.py as well, prefixed by 'JDH' Thanks for taking the time. Very helpful (and please excuse the mess in the code -- I was just trying to get something end-to-end working before refining/optimizing/documenting etc...) Cheers, Mike -- Michael Droettboom Operations and Engineering Division Space Telescope Science Institute Operated by AURA for NASA |
From: John H. <jd...@gm...> - 2007-09-12 16:06:34
|
On 9/12/07, Michael Droettboom <md...@st...> wrote:
> Thanks for taking the time. Very helpful (and please excuse the mess in
> the code -- I was just trying to get something end-to-end working before
> refining/optimizing/documenting etc...)

I think this is definitely the right approach -- get something that works in the existing framework and understand where the various issues are, and then try and peel away the stuff that is not ideal.

I looked at the tick labels -- if you just comment out the transformation offset

  trans = trans + Affine2D().translate(0, -1 * self._padPixels)

the tick labels show up too (minus the pad of course), so my guess is some reference is being lost in the addition.... |
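As an aside on why such an addition can drop a reference: in an affine framework like the branch's, `+` would compose two transforms into a brand-new object built from a 3x3 matrix product, so anything still holding the original object never sees the composite. A minimal numpy stand-in (not the branch's actual Affine2D) illustrates this:

```python
import numpy as np

class Affine2D:
    """Illustrative 3x3 homogeneous affine, not the branch's class."""

    def __init__(self, mtx=None):
        self.mtx = np.identity(3) if mtx is None else mtx

    def translate(self, tx, ty):
        # Mutate in place: prepend a translation.
        t = np.array([[1.0, 0.0, tx],
                      [0.0, 1.0, ty],
                      [0.0, 0.0, 1.0]])
        self.mtx = np.dot(t, self.mtx)
        return self

    def __add__(self, other):
        # Compose: apply self first, then other.  Note this returns a NEW
        # object -- code holding a reference to `self` will not see later
        # changes to the composite (the "lost reference" suspicion above).
        return Affine2D(np.dot(other.mtx, self.mtx))

    def transform(self, points):
        # points: Nx2 array -> Nx2 array, via homogeneous coordinates.
        xy = np.column_stack([points, np.ones(len(points))])
        return np.dot(self.mtx, xy.T).T[:, :2]
```

Since composition copies the matrices, later in-place updates to one operand leave the composite untouched, unlike the live trees in transforms.py/.cpp.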
From: John H. <jd...@gm...> - 2007-09-12 18:12:07
|
On 9/12/07, Michael Droettboom <md...@st...> wrote:
> So, I feel like I'm going in a bit of a circle here, and I might need a
> reality check. I thought I'd better check in and see where you guys
> (who've thought about this a lot longer than I have) see this going. A
> statement of objectives of this part of the task would be helpful.
> (e.g. what's the biggest problem with how transforms work now, and what
> model would be a better fit). John, I know you've mentioned some to me
> before, e.g. the LazyValue concept is quirky and relies on C and the PDF
> stateful transforms model is close, but not quite what we need, etc. I
> feel I have a better sense of the overall code structure now, but you
> guys may have a better "gut" sense of what will fit best.

Here is a brief summary of what I see as some of the problems with the existing approach to transformations, and what I would like to see improved in a refactoring. The three major objectives are clarity, extensibility and efficiency.

Clarity:

The existing transformation framework, written in C++ and making extensive use of deferred evaluation of binary operation trees and values by reference, is difficult for most developers to understand (and hence enhance). Additionally, since all the heavy lifting is done in C++, python developers who are not versed in C++ have an additional barrier to making contributions.

Extensibility:

We would like to make it fairly easy for users to add additional non-linear transformations. The current framework requires adding a new function at the C++ layer, and hacking into axes.py to support additional functions. We would like the existing nonlinear transformations (log and polar) to be part of a general infrastructure where users could supply their own nonlinear functions which map (possibly nonseparable) (xhat, yhat) -> separable (x, y). There are two parts to this: one pretty easy and one pretty hard.
The easy part is supporting a transformation which has a separation callable that takes, eg, an Nx2 array and returns an Nx2 array. For log, this will simply be log(XY); for polar, it will be r*cos(theta), r*sin(theta) with theta = X[:,0] and r = X[:,1]. Presumably we will want to take advantage of masked arrays to support invalid transformations, eg log of nonpositive data.

The harder part is to support axis, tick and label layout generically. Currently we do this by special casing log and polar, either with special tick locators and formatters (log) or special derived Axes (polar).

Efficiency:

There are three parts to the efficiency question: the efficiency of the transformation itself, the efficiency with which transformation data structures are updated in the presence of viewlim changes (panning and zooming, window resizing), and the efficiency of getting transformed data to the backends. My guess is that the new design may be slower, or not dramatically faster, for the first two (which are not the bottleneck in most cases anyhow), but you might get significant savings on the 3rd.

What we would like to support is something like an operation which pushes the partially transformed data to the backend; the backend then stores this data in a path or other data structure, and when the view limits are changed or the window is resized, the backend merely needs to get an updated affine to redraw the data. I say "partially transformed" because in the case of nonlinear or separable transformations, we will probably want to do the nonlinear/separation part first, and then push this to the backend, which can build a path (eg an agg::path in agg); on pan and zoom we would only need to let the backend know what the current affine is. There is more than one way to solve this problem: in mpl1 I used a path dictionary keyed off of a path id which was updated on renderer changes.
Then the front end (eg Line2D) could do something like

  def on_renderer_change(self, renderer):
      # on renderer change; path data already has the
      # separable/nonlinear part handled
      self.pathid = renderer.push_path(pathdata)

Additionally, you would need to track when either the data or the nonlinear mapping function is changed, in order to remove the old path and push out a new one. Then at draw time, you would not need to push any data to the backend:

  def on_draw(self, renderer):
      # on draw the line just needs to inform the backend to draw the
      # cached path with the current separable transformation
      renderer.draw_path(self.pathid, gc, septrans)

Ken, I believe, used some metaclass magic to solve this problem. It may be that the ideal approach is different from either of these, so feel free to experiment and I'll give it some more thought too. Ken will hopefully pipe in with his perspective too. |
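The backend half of this idea -- a path dictionary keyed by id, with redraws needing only a fresh affine -- might be sketched as follows. The push_path/draw_path names follow the email; the body is a minimal, purely illustrative sketch (here the "affine" is just a callable applied per vertex):

```python
import itertools

class Renderer:
    """Illustrative backend that caches partially transformed paths."""

    _ids = itertools.count()

    def __init__(self):
        self._paths = {}

    def push_path(self, pathdata):
        # Store the (already separated/nonlinearly transformed) vertices
        # once; return a handle the artist keeps.
        pathid = next(self._ids)
        self._paths[pathid] = pathdata
        return pathid

    def pop_path(self, pathid):
        # Called when the artist's data or mapping function changes.
        del self._paths[pathid]

    def draw_path(self, pathid, gc, affine):
        # On pan/zoom only the affine changes; the cached vertices are
        # reused.  gc is unused in this sketch; we just return the
        # transformed points instead of rasterizing them.
        return [affine(x, y) for x, y in self._paths[pathid]]
```

The point is the data flow: vertices cross the frontend/backend boundary once, and subsequent redraws send only a transform.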
From: Gael V. <gae...@no...> - 2007-09-12 22:32:59
|
On Wed, Sep 12, 2007 at 01:11:54PM -0500, John Hunter wrote:
> Then the front end (eg Line2D) could do something like
>   def on_renderer_change(self, renderer):
>       # on renderer change; path data already has the
>       # separable/nonlinear part handled
>       self.pathid = renderer.push_path(pathdata)
> Additionally, you would need to track when either the data or
> nonlinear mapping function are changed in order to remove the old
> path and push out a new one. Then at draw time, you would not need
> to push any data to the backend
>   def on_draw(self, renderer):
>       # on draw the line just needs to inform the backend to draw the
>       # cached path with the current separable transformation
>       renderer.draw_path(self.pathid, gc, septrans)

I am a bit tired, and I haven't been following the discussion too closely, but I have the feeling this is the kind of pattern that Traits makes both obvious and optimizes a lot. If you give me a concrete minimal example, I would be able to say more, and maybe try to see how it flows in Traits. I guess I am just trying to sell Traits, but it seems to me it solves the problem very well, and has been well optimized (quicker than vanilla Python for these kinds of things). If you are interested in checking this line out, posting a minimal example of what you are trying to achieve on the enthought-dev mailing-list would get you higher-quality answers.

Gaël |
From: Paul K. <pki...@ni...> - 2007-09-12 20:27:22
|
On Wed, Sep 12, 2007 at 01:11:54PM -0500, John Hunter wrote: > On 9/12/07, Michael Droettboom <md...@st...> wrote: > > > So, I feel like I'm going in a bit of a circle here, and I might need a > > reality check. I thought I'd better check in and see where you guys > > (who've thought about this a lot longer than I have) see this going. A > > statement of objectives of this part of the task would be helpful. > > (e.g. what's the biggest problem with how transforms work now, and what > > model would be a better fit). John, I know you've mentioned some to me > > before, e.g. the LazyValue concept is quirky and relies on C and the PDF > > stateful transforms model is close, but not quite what we need, etc. I > > feel I have a better sense of the overall code structure now, but you > > guys may have a better "gut" sense of what will fit best. > > Here is a brief summary of what I see some of the problems to be with > the existing approach to transformations, and what I would like to see > improved in a refactoring. The three major objectives are clarity, > extensibility and efficiency. > > Clarity: > > The existing transformation framework, written in C++ and > making extensive use of deferred evaluation of binary operation > trees and values by reference, is difficult for most developers to > understand (and hence enhance). Additionally, since all the heavy > lifting is done in C++, python developers who are not versed in C++ > have an additional barrier to making contributions. Indeed! > Extensibilty: > > We would like to make it fairly easy for users to add additional > non-linear transformations. The current framework requires adding a > new function at the C++ layer, and hacking into axes.py to support > additional functions. We would like the existing nonlinear > transformations (log and polar) to be part of a general > infrastructure where users could supply their own nonlinear > functions which map (possibly nonseparable) (xhat, yhat) -> > separable (x, y). 
> There are two parts to this: one pretty easy and
> one pretty hard.
>
> The easy part is supporting a transformation which has a separation
> callable that takes, eg an Nx2 array and returns an Nx2 array. For
> log, this will simply be log(XY), for polar, it will be
> r*cos(X[:,0]), r*sin(X[:,1]). Presumably we will want to take
> advantage of masked arrays to support invalid transformations, eg
> log of nonpositive data.
>
> The harder part is to support axis, tick and label layout
> generically. Currently we do this by special casing log and polar,
> either with special tick locators and formatters (log) or special
> derived Axes (polar).

Another hard part is grids. More generally, a straight line in x,y becomes curved in x',y'. Ideally, a sequence of points plotted on a straight line should lie directly on the transformed line. This would make the caps on the polar_bar demo follow the arcs of the grid. The extreme case is map projections, where for some projections a straight line will not even be connected.

Another issue is zooming and panning. For amusement, try it with polar_demo.

> Efficiency:
>
> There are three parts to the efficiency question: the efficiency of
> the transformation itself, the efficiency with which transformation
> data structures are updated in the presence of viewlim changes
> (panning and zooming, window resizing) and the efficiency in getting
> transformed data to the backends. My guess is that the new design
> may be slower or not dramatically faster for the first two (which
> are not the bottleneck in most cases anyhow) but you might get
> significant savings on the 3rd.

Changing the internal representation of things like collections so that the transform can be done using numpy vectors will help a lot.

- Paul |
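Paul's point about numpy vectors lines up with John's "easy part": the nonlinear stage becomes a whole-array callable, Nx2 in, Nx2 out, with masked arrays flagging invalid points. A sketch, assuming the column convention (theta, r) for polar and treating the function names as illustrative:

```python
import numpy as np
import numpy.ma as ma

def polar_transform(X):
    """Map an Nx2 array of (theta, r) pairs to (x, y) pairs in one shot."""
    theta, r = X[:, 0], X[:, 1]
    return np.column_stack([r * np.cos(theta), r * np.sin(theta)])

def log_transform(XY):
    """Elementwise log10 of an Nx2 array, masking nonpositive input as
    invalid rather than raising or producing NaNs."""
    return ma.log10(ma.masked_less_equal(XY, 0.0))
```

Because the whole collection is transformed as one array operation, there is no per-point Python loop, which is where the savings Paul mentions would come from.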
From: Peter W. <pw...@en...> - 2007-09-13 16:24:18
|
On Sep 12, 2007, at 3:27 PM, Paul Kienzle wrote: >> Extensibilty: >> >> We would like to make it fairly easy for users to add additional >> non-linear transformations. The current framework requires >> adding a >> new function at the C++ layer, and hacking into axes.py to support >> additional functions. We would like the existing nonlinear >> transformations (log and polar) to be part of a general >> infrastructure where users could supply their own nonlinear >> functions which map (possibly nonseparable) (xhat, yhat) -> >> separable (x, y). There are two parts to this: one pretty easy and >> one pretty hard. >> >> The easy part is supporting a transformation which has a separation >> callable that takes, eg an Nx2 array and returns and Nx2 array. >> For >> log, this will simply be log(XY), for polar, it will be >> r*cos(X[:,0]), r*sin(X[:,1]). Presumably we will want to take >> advantage of masked arrays to support invalid transformations, eg >> log of nonpositive data. >> >> The harder part is to support axis, tick and label layout >> generically. Currently we do this by special casing log and polar, >> either with special tick locators and formatters (log) or special >> derived Axes (polar). > > Another hard part is grids. More generally, a straight line in > x,y becomes curved in x',y'. Ideally, a sequence of points plotted > on a straight line should lie directly on the transformed line. This > would make the caps on the polar_bar demo follow the arcs of the grid. > > The extreme case is map projections, where for some projections, a > straight line will not even be connected. Just wanted to chime in because I've done some thinking on this problem for Chaco. Right now chaco's coordinate transformation process ("mapping") is handled by explicit objects that subclass from 1D and 2D mapper base classes. 
We're talking about moving to a scheme where the DisplayPDF GraphicsContext is extended into a MathematicalCanvas that is both aware of the transformation stack and also aware of "screen" properties such as subpixel alignment and such. You would then be able to hand off dataspace coordinates to methods like move_to(), line_to(), rect(), etc., so you could move_to() a dataspace coordinate and then draw a screen-aligned box. The MathCanvas would also have additional methods like geodesic_to() for rendering manifold-aware grids and axes. (Of course, grids aren't necessarily geodesics all the time.) I don't know if discontinuous map projections could be handled cleanly in such a framework without the renderer querying the canvas about the screenspace limits of the current transformation.

> Another issue is zooming and panning. For amusement, try it with
> polar_demo.

Yes, one of the problems with non-linear transformations is that panning is very much a screen-space interaction, and you have to map it back into data space to do proper data clipping and transformation. Unfortunately (and this is a problem even with logarithmic plots), the user may sometimes want to view things on the screen that are outside the valid domain of the coordinate transform, in which case the code handling the interaction (the "tool", in chaco parlance) has to be smart enough to maintain screen-space coordinates only.

-Peter |
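The screen-space panning problem Peter mentions can be made concrete for a logarithmic axis: the drag delta is linear in pixels, so it has to be applied in log space and mapped back through the inverse transform to get new data limits. A tiny sketch (function name and arguments are illustrative):

```python
import math

def pan_log_axis(xmin, xmax, pixel_dx, axes_width_px):
    """Pan a log10 x axis by pixel_dx pixels; returns new (xmin, xmax).

    Screen pixels map linearly onto log-data coordinates, so the pixel
    delta becomes a constant shift in log space -- i.e. a multiplicative
    change in data space.
    """
    lo, hi = math.log10(xmin), math.log10(xmax)
    shift = pixel_dx * (hi - lo) / axes_width_px
    return 10 ** (lo + shift), 10 ** (hi + shift)
```

Note the valid-domain issue Peter raises: this only works while both limits stay positive; a tool that lets the view slide past the domain boundary has to keep working in screen coordinates instead.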