From: Thomas C. <tca...@gm...> - 2015-07-25 23:03:25
|
Hey all,

Everyone should be aware of https://github.com/matplotlib/matplotlib/pull/4787, which is a very simple but very important change to the mpl API: it provides a minimal API for passing labeled data (that is, any object for which `foo[key]` returns an array-like object) into mpl plotting functions.

This is due to Fernando and Brian's persuasive case for the importance of starting to address labeled data in mpl; it is either now or in 6-9 months.

The general approach follows R / seaborn / pandas and allows users to pass in a `data` kwarg. If it is present, any data fields which are strings are replaced by a call to `data[key]`. In code,

    ax.plot(labeled_data['a'], labeled_data['b'])

and

    ax.plot('a', 'b', data=labeled_data)

are equivalent.

This is the minimal change to get quality of life for users who work with labeled data at the REPL and to plant a flag for the API that downstream projects should be targeting.

Major changes to what the plotting functions do (inferring labels, inferring what computation to do, etc.) are out of scope for _this_ PR, which I want to see included in 1.5. What a higher-level API that can make use of the additional metadata looks like is a much larger discussion, which must have input from all of the stakeholders (e.g. IPython, pandas, bokeh, seaborn, xray).

Tom |
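The string-to-`data[key]` replacement described above can be sketched in a few lines. This is only an illustration of the idea, not the code in the PR: the helper name `resolve_labels` is hypothetical, and plain lists stand in for array-like objects.

```python
def resolve_labels(args, data=None):
    """Replace string arguments with data[key]; pass everything else through.

    Hypothetical helper for illustration only; not the implementation in the PR.
    """
    if data is None:
        # No `data` kwarg given: leave the arguments untouched.
        return list(args)
    return [data[arg] if isinstance(arg, str) else arg for arg in args]


# Any object supporting `labeled_data[key]` would do; a dict is the simplest case.
labeled_data = {'a': [0, 1, 2, 3], 'b': [0, 1, 4, 9]}

# Mirrors ax.plot('a', 'b', data=labeled_data) resolving to
# ax.plot(labeled_data['a'], labeled_data['b']).
xs, ys = resolve_labels(('a', 'b'), data=labeled_data)
```

Under this sketch, non-string arguments are passed through unchanged, so labels and plain arrays could be mixed in one call. Whether the PR behaves the same way is not stated in this thread.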
From: Benjamin R. <ben...@ou...> - 2015-07-26 03:42:13
|
A couple of immediate thoughts: what if the data is spread across a mix of objects? Also, I think "labeled" might be a better kwarg name, since it is less likely to conflict with existing APIs.

I'll give this a careful look-see tomorrow.

Ben Root |
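On the question of data spread across several objects: assuming the `data` argument only needs to support `data[key]`, one possible approach is to combine the sources into a single lookup object before plotting. The sketch below uses `collections.ChainMap` from the standard library; the variable names are made up, and whether the PR accepts such an object has not been checked here.

```python
from collections import ChainMap

# Two separate labeled sources (hypothetical example data).
positions = {'x': [0, 1, 2, 3]}
readings = {'temperature': [20.5, 21.0, 19.8, 22.1]}

# ChainMap looks each key up in the underlying mappings in order,
# so it behaves like one object supporting combined[key].
combined = ChainMap(positions, readings)

x_values = combined['x']
temperatures = combined['temperature']
```

A call such as `ax.plot('x', 'temperature', data=combined)` would then only need `combined[key]` to work. If both sources define the same key, the first mapping wins.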
From: Jouni K. S. <jk...@ik...> - 2015-07-29 15:00:06
Attachments:
expression_of_labels.png
|
Thomas Caswell <tca...@gm...> writes:

> The general approach follows R / seaborn / pandas and allows users to pass
> in a `data` kwarg which if present, any data fields which are strings are
> replaced by a call to `data[key]`. In code
>
>     ax.plot(labeled_data['a'], labeled_data['b'])
>
> and
>
>     ax.plot('a', 'b', data=labeled_data)
>
> are equivalent.

I commented on GitHub briefly, but here's an expanded argument. I'm proposing that instead of using strings (or only strings) as labels, we allow arbitrary (hashable) objects to be looked up from the data dict.

I think using strings, or at least restricting to strings only, is a mistake for two reasons. One reason has been touched upon: in

    ax.scatter('a', 'b', c='b', data=data)

should c='b' be interpreted as a constant blue color or as a sequence to be looked up from data['b']?

Another is that since this functionality seems to be modeled after R's plot functions, people will want to do more than just lookups. A simple labeled plot in R is

    plot(speed ~ dist, data=cars)

but you can also do expressions, e.g.

    plot(speed^2 ~ dist, data=cars)

if you want to plot the square of speed against dist. This is pretty neat for trying to find transformations for variables that depend on each other non-linearly.

If we only allow strings as placeholders for plottable variables, implementing expressions gets pretty clunky. We'd basically end up defining a mini-language for parsing expressions from strings. But if we allow objects for which you can implement methods like __add__, it's much nicer. There's sample code below.

I'm proposing a small change to the patch.
This still allows using strings but also user-defined objects:
https://github.com/jkseppan/matplotlib/commit/b4709b38426ad5c2905f3ce253ce1bb68d314e7e

Here's a demo of implementing expressions on top of that patch:
https://github.com/jkseppan/matplotlib/blob/label-with-nonstrings/lib/matplotlib/tests/test_labeled.py

Here's how the test case looks; the (admittedly incomplete) expression classes and evaluator to support this are about 50 lines of pretty simple code.

    import numpy as np
    import matplotlib.pyplot as plt

    # Expr and Evaluator are the expression classes from the demo linked above.

    def test_expression_of_labels():
        fig, axes = plt.subplots(2, 2)
        x, y, z = Expr.vars('x y z')
        data = {'x': np.arange(10),
                'y': np.array([3, 1, 4, 1, 5, 9, 2, 6, 5, 3]),
                'z': np.array([2, 7, 1, 8, 2, 8, 1, 8, 2, 8])}
        ev = Evaluator(data)
        axes[0, 0].plot(x, y, data=ev)
        axes[0, 1].plot(x, 2 * y + 1, data=ev)
        axes[1, 0].plot(x, y ** 2, data=ev)
        axes[1, 1].plot(x, 2 * y ** z, data=ev)

The output:
https://raw.githubusercontent.com/jkseppan/matplotlib/label-with-nonstrings/lib/matplotlib/tests/baseline_images/test_labeled/expression_of_labels.png |
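For readers who do not want to open the linked branch, the idea of label objects that implement methods like `__add__` can be sketched roughly as follows. This is not the code from the linked demo: the names `Node`, `Var`, `BinOp`, and `LazyData` are hypothetical, and plain lists with a small elementwise helper stand in for NumPy arrays so the sketch has no dependencies.

```python
import operator


def _elementwise(op, left, right):
    """Apply op elementwise, broadcasting scalars over lists."""
    if isinstance(left, list) and isinstance(right, list):
        return [op(a, b) for a, b in zip(left, right)]
    if isinstance(left, list):
        return [op(a, right) for a in left]
    if isinstance(right, list):
        return [op(left, b) for b in right]
    return op(left, right)


def _value(operand, data):
    """Evaluate an expression node, or return a plain constant unchanged."""
    return operand.evaluate(data) if isinstance(operand, Node) else operand


class Node:
    """Arithmetic on nodes builds an expression tree instead of computing."""

    def __add__(self, other):
        return BinOp(operator.add, self, other)

    def __radd__(self, other):
        return BinOp(operator.add, other, self)

    def __mul__(self, other):
        return BinOp(operator.mul, self, other)

    def __rmul__(self, other):
        return BinOp(operator.mul, other, self)

    def __pow__(self, other):
        return BinOp(operator.pow, self, other)


class Var(Node):
    """A named placeholder, looked up in the data when evaluated."""

    def __init__(self, name):
        self.name = name

    def evaluate(self, data):
        return data[self.name]


class BinOp(Node):
    """A deferred binary operation on two operands."""

    def __init__(self, op, left, right):
        self.op, self.left, self.right = op, left, right

    def evaluate(self, data):
        return _elementwise(self.op, _value(self.left, data),
                            _value(self.right, data))


class LazyData:
    """Wrapper whose data[key] evaluates expression keys against raw data."""

    def __init__(self, raw):
        self.raw = raw

    def __getitem__(self, key):
        if isinstance(key, Node):
            return key.evaluate(self.raw)
        return self.raw[key]  # plain string keys still work


y, z = Var('y'), Var('z')
data = LazyData({'y': [3, 1, 4], 'z': [2, 7, 1]})

line = data[2 * y + 1]      # [7, 3, 9]
squares = data[y ** 2]      # [9, 1, 16]
powers = data[2 * y ** z]   # 2 * (y ** z) -> [18, 2, 8]
```

Since the node classes do not override `__eq__`, instances stay hashable, which matches the "arbitrary (hashable) objects" wording of the proposal. A `Var('b')` is also a different type from the string `'b'`, so in principle a plotting function could tell a lookup key apart from a color name.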