From: David C. <da...@ar...> - 2006-12-12 12:41:13
|
Hi, I am a regular user of matplotlib since I moved from matlab to python/numpy/scipy. Even though I find matplotlib to be a real help during the transition from matlab to python, I must confess I found it the most disappointing compared to the other packages (essentially numpy/scipy/ipython). This is not a rant; I want to know whether this slowness comes from my lack of matplotlib knowledge or not; I apologize in advance if the following hurts anyone's feelings :)

First, I must admit that whereas I took a significant amount of time to study numpy and scipy, I didn't take that same time for matplotlib. So this disappointment may just be a consequence of that laziness.

My main problem with matplotlib is speed: I find it really annoying to use interactively. For example, when I need to display some 2d information, such as a spectrogram or correlogram, this takes 1 or 2 seconds for a small signal (~4500 frames of 256 samples). My function correlogram (similar to specgram, but computing correlation instead of log spectrum) uses imshow, and this function takes 20 times longer than matlab's imagesc for the same size.

Also, I find changing the size of the matplotlib window really 'annoying to the eye': I compared with matlab, and this may be because the whole window is redrawn in matplotlib, including the toolbar, whereas in matlab the top toolbar is not redrawn.

Finally, plotting a lot of data (using plot(X, Y) with X and Y around 1000/10000 samples) is 'slow' (the quotes are because I don't know much about computer graphics, and I understand that slowness in rendering is often just a perception).

So, is this a current limitation of matplotlib, is matplotlib optimized for good rendering for publication rather than for interactive use, or am I just misguided in my use of matplotlib?
Config info:
- ubuntu edgy on a dual Xeon 3.2 GHz with 2 GB of RAM
- numpy SVN (post 1.0)
- matplotlib 0.87.7
- matplotlibrc: uses numpy for numerix, GTK as a backend (or GTKAgg for anti-aliasing, but this makes the problem worse).
Cheers, David |
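For reference, the imshow case described above can be reduced to a self-contained timing sketch like the following. This is a sketch using the modern pyplot API and the Agg backend, not what David actually ran; the array size is taken from the message above, everything else is an assumption:

```python
import time

import matplotlib
matplotlib.use('Agg')  # headless raster backend: isolates drawing from any GUI
import numpy as np
import matplotlib.pyplot as plt

# ~4500 frames of 256 samples, as in the message above
data = np.random.randn(4500, 256)

fig, ax = plt.subplots()
t0 = time.time()
ax.imshow(data, aspect='auto')
fig.canvas.draw()  # force the actual rasterization, not just artist creation
print('imshow + draw: %.2f s' % (time.time() - t0))
```

The explicit canvas.draw() matters: imshow alone only creates the image artist, and the colormapping/rendering cost shows up at draw time.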
From: John H. <jdh...@ac...> - 2006-12-12 17:05:12
|
>>>>> "David" == David Cournapeau <da...@ar...> writes:

David> Hi, I am a regular user of matplotlib since I moved from
David> matlab to python/numpy/scipy. Even if I find matplotlib to
David> be a real help during the transition from matlab to python,
David> I must confess I found it the most disappointing compare
David> other packages ( essentially numpy/scipy/ipython). This is

Meatloaf: Now don't be sad, cause two out of three ain't bad

If you consider the fact that matplotlib was originally an ipython patch that was rejected, you can see why we are such a bastard child of the scientific python world. There is a seed of truth in this; Numeric, scipy and ipython were all mature packages in widespread use before the first line of matplotlib code was written. So they are farther along in terms of maturity, documentation, usability, etc... than matplotlib is. But we've achieved a lot in a comparatively short time. When I started working on matplotlib there were probably two dozen plotting packages that people used and recommended. Now we are down to 5 or 6, with matplotlib doing most of what most people need.

I've focused on making something that does most of what people (and I) need rather than doing it the fastest, so it is too slow for some purposes but fast enough for most. When we get a well-defined, important test case that is too slow, we typically try to optimize it, sometimes with dramatic results (eg 25-fold speedups); more on this below.

A consequence of trying to support most of the needs of most users is this: we run on all major operating systems and all major GUIs with all major array packages. Consider the combinatorial problem -- 5 graphical user interfaces, with two or more versions in the wild, across 3 operating systems -- and you will get a feel for the support problem we have. This is not an academic point. Most of the GUI maintainers for *a single backend* burn out in short order.
Most graphics packages *solve* this problem by supporting a single output format (PyX) or GUI (chaco), which is a damned fine and admirable solution. But the consequence of this is plotting fragmentation: people who need GTK cannot use Chaco, people who need SVG cannot use PyX, and so on, and so they'll write their own plotting library for their own GUI or output format (the situation before matplotlib). You can certainly get closer to bare-metal speed by reducing choices and focusing on a single target -- part of the performance price we pay is in our abstraction layers, part is in trying to support features that may be rarely used but cost something (masked array support, rotated text with newlines), and part is because we need to get to work and optimize the slow parts.

David> not a rant; I want to know if this slowness is coming from
David> my lack of matplotlib knowledge or not; I apologize in
David> advance if the following hurts anyone feelings :)

Meatloaf: But -- there ain't no way I'm ever gonna love you

OK, I'll stop now.

David> First, I must admit that whereas I took a significant
David> amount of time to study numpy and scipy, I didn't take that
David> same time for matplotlib. So this disappointment may just
David> be a consequences of this laziness.

I suspect this is partly true; see below.

David> My main problem with matplotlib is speed: I find it
David> really annoying to use in an interactive manner. For
David> example, when I need to display some 2d information, such
David> as spectrogramm or correlogram, this take 1 or 2 seconds
David> for a small signal (~4500 frames of 256 samples). My
David> function correlogram (similar to specgram, but compute
David> correlation instead of log spectrum) uses imshow, and this
David> function takes 20 times more time than imagesc of matlab
David> for the same size. Also, I found changing the size of the

This is where you can help us.
Saying specgram is slow is only marginally more useful than saying matplotlib is slow or python is slow. What is helpful is to post a complete, free-standing script that we can run, with some attached performance numbers. For starters, just run it with the Agg backend so we can isolate matplotlib from the respective GUIs. Show us how the performance scales with the specgram parameters (frames and samples). specgram is divided into two parts: if you look at Axes.specgram you will see that it calls matplotlib.mlab.specgram to do the computation and Axes.imshow to visualize it. Which part is slow: the mlab.specgram computation, the visualization (imshow) part, or both? You can paste this function into your own python file and start timing different parts. The most helpful "this is slow" posts come with profiler output so we can see where the bottlenecks are. Such a post by Fernando Perez on "plot" with markers yielded performance boosts of 25x for large numbers of points, when he showed we were making about one hundred thousand function calls per plot.

David> matplotlib window really 'annoying to the eye': I compared
David> to matlab, and this may be due to the fact that the whole
David> window is redrawn with matplotlib, including the toolbar,
David> whereas in matlab, the top toolbar is not redrawn.

It would be nice if we exposed the underlying GTK widgets to you so you could customize the "expand" and "fill" properties of the gtk toolbar, but this gets us into the multiple-GUI, multiple-version problem discussed above. Providing an abstract interface to such details that works across the mpl backends is a lot of work that takes us away from our core incompetency -- plotting. What we do is enable you to write your own widgets and embed mpl in them; see examples/embedding_in_gtk2.py, which shows you how to do this for GTK/GTKAgg. You can then customize the toolbar to your heart's content.
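The computation/rendering split John describes can be timed separately. A sketch under stated assumptions: the spectrogram power is computed directly with numpy as a hypothetical stand-in for matplotlib.mlab.specgram (it requires numpy >= 1.20 for sliding_window_view), and only the imshow/draw step is counted as rendering:

```python
import time

import numpy as np
import matplotlib
matplotlib.use('Agg')  # no GUI, so we time only matplotlib itself
import matplotlib.pyplot as plt

nfft, hop = 256, 128                    # window and hop sizes from the thread
x = np.random.randn(4500 * hop + nfft)  # enough samples for ~4500 frames

# Part 1: the spectrogram computation (stand-in for mlab.specgram).
t0 = time.time()
frames = np.lib.stride_tricks.sliding_window_view(x, nfft)[::hop]
pxx = np.abs(np.fft.rfft(frames * np.hanning(nfft), axis=1)) ** 2
t_compute = time.time() - t0

# Part 2: the visualization (imshow) plus an explicit draw.
fig, ax = plt.subplots()
t0 = time.time()
ax.imshow(10 * np.log10(pxx + 1e-12).T, aspect='auto', origin='lower')
fig.canvas.draw()
t_render = time.time() - t0

print('compute %.3f s, render %.3f s' % (t_compute, t_render))
```

Whichever of the two numbers dominates tells you whether the numeric code or the rendering pipeline needs attention.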
David> Finally, plotting many data (using plot(X, Y) with X and Y
David> around 1000/10000 samples) is 'slow' (the '' are because I
David> don't know much about computer graphics, and I understand
David> that slow in the rendering is often just a perception)

This shouldn't be slow -- again, a test script with some performance numbers would help so we can compare what we are getting. One thought: make sure you are using the numerix layer properly -- ie, if you are creating arrays with numpy, make sure you have numerix set to numpy (I see below that you set numerix to numpy, but --verbose-helpful will confirm the setting). A good way to start is to write a demonstration script that you find too slow which makes a call to savefig, and run it with

    > time myscript.py --verbose-helpful -dAgg

and post the output and script. Then we might be able to help.

David> So, is this a current limitation of matplotlib, is
David> matplotlib optimized for good rendering for publication,
David> and not for interactive use, or I am just misguided in my
David> use of matplotlib ?

Many people use it interactively, but a number of power users find it slow.

JDH |
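In the spirit of that suggestion, a hypothetical myscript.py might look like the following sketch (Agg backend forced in code, savefig into an in-memory buffer so no GUI or disk timing is mixed in; the point count is in the range David mentions, and all names are made up for illustration):

```python
import io
import time

import matplotlib
matplotlib.use('Agg')  # equivalent of running with -dAgg
import numpy as np
import matplotlib.pyplot as plt

n = 10000                      # number of points to plot
x = np.linspace(0.0, 1.0, n)
y = np.random.randn(n)

t0 = time.time()
plt.plot(x, y)
buf = io.BytesIO()
plt.savefig(buf, format='png')  # forces a full render without touching disk
print('plot + savefig of %d points: %.2f s' % (n, time.time() - t0))
```

Running it as `time python myscript.py` and posting both the script and the numbers gives the list something concrete to reproduce.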
From: Fernando P. <fpe...@gm...> - 2006-12-12 17:26:10
|
On 12/12/06, John Hunter <jdh...@ac...> wrote:
> --verbose-helpful will confirm the setting). A good way to start is
> to write a demonstration script that you find too slow which makes a
> call to savefig, and run it with
>
>     > time myscript.py --verbose-helpful -dAgg

It may be worth mentioning here this little utility (Linux only, unfortunately): http://amath.colorado.edu/faculty/fperez/python/profiling/

For profiling more complex code, it's really a godsend. And note that the generated cachegrind files are typically small and can be sent to others for analysis, so you can run it locally (if, for example, the run depends on data you can't share) and then send the generated profile to the list. Anyone with KCachegrind will then be able to load your profile info and study it in detail. Cheers, f |
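With today's standard library the same workflow can be sketched with cProfile/pstats (hotshot, used later in this thread, has since been removed from Python); third-party converters such as pyprof2calltree can then turn the stats into a KCachegrind-loadable file. The work() function here is a made-up stand-in for whatever slow code is being profiled:

```python
import cProfile
import io
import pstats

def work():
    # stand-in for the slow code under investigation
    total = 0
    for i in range(100000):
        total += i * i
    return total

pr = cProfile.Profile()
pr.enable()
total = work()
pr.disable()

# Print the top entries sorted by cumulative time, like the thread's tables.
s = io.StringIO()
pstats.Stats(pr, stream=s).sort_stats('cumulative').print_stats(5)
print(s.getvalue())
```

Dumping the stats with pr.dump_stats('out.prof') gives a file that others can load and inspect without rerunning your code.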
From: David C. <da...@ar...> - 2006-12-13 07:07:22
Attachments:
slowmatplotlib.tbz2
|
John Hunter wrote:
> This is where you can help us. Saying specgram is slow is only
> marginally more useful than saying matplotlib is slow or python is
> slow. What is helpful is to post a complete, free-standing script
> that we can run, with some attached performance numbers. For
> starters, just run it with the Agg backend so we can isolate
> matplotlib from the respective GUIs. Show us how the performance
> scales with the specgram parameters (frames and samples). specgram is
> divided into two parts (if you look at the Axes.specgram you will see
> that it calls matplotlib.mlab.specgram to do the computation and
> Axes.imshow to visualize it). Which part is slow: the mlab.specgram
> computation or the visualization (imshow) part or both? You can paste
> this function into your own python file and start timing different
> parts. The most helpful "this is slow" posts come with profiler
> output so we can see where the bottlenecks are.

(sorry for double posting)

Ok, here we go: I believe the rendering of the figure returned by imshow is what is slow. For example, say I have a 2 minute signal at an 8 kHz sampling rate, with windows of 256 samples and 50% overlap. That is around 64 frames/second, i.e. ~8000 frames of 256 samples. So for benchmark purposes, we can just send random data of shape 8000x256 to imshow. In ipython, this takes a long time (around 2 seconds for imshow(data), where data = random(8000, 256)).
Now, a small script to get a better idea:

    import numpy as N
    import pylab as P

    def generate_data_2d(fr, nwin, hop, len):
        nframes = int(1.0 * fr / hop * len)
        return N.random.randn(nframes, nwin)

    def bench_imshow(fr, nwin, hop, len, show = True):
        data = generate_data_2d(fr, nwin, hop, len)
        P.imshow(data)
        if show:
            P.show()

    if __name__ == '__main__':
        # 2 minutes (120 sec) of sound @ 8 kHz, 256-sample windows, 50 % overlap
        bench_imshow(8000, 256, 128, 120, show = False)

Now, I have a problem, because I don't know how to benchmark when show is True (I have to close the figure manually). If I run the above script with time, I get 1.5 seconds with show = False (after several trials, to be sure the matplotlib files are in the system cache: this matters because my home dir is on NFS). If I set show = True and close the figure by hand once it is plotted, I get 4.5 sec instead.

If I run the above script with -dAgg --verbose-helpful (I was looking for this one to check that numerix is correctly set to numpy :) ), with show = False:

    matplotlib data path /home/david/local/lib/python2.4/site-packages/matplotlib/mpl-data
    $HOME=/home/david
    CONFIGDIR=/home/david/.matplotlib
    loaded rc file /home/david/.matplotlib/matplotlibrc
    matplotlib version 0.87.7
    verbose.level helpful
    interactive is False
    platform is linux2
    numerix numpy 1.0.2.dev3484
    font search path ['/home/david/local/lib/python2.4/site-packages/matplotlib/mpl-data']
    loaded ttfcache file /home/david/.matplotlib/ttffont.cache
    backend Agg version v2.2

    real 0m1.185s
    user 0m0.808s
    sys 0m0.224s

with show = True (same startup output):

    real 0m1.193s
    user 0m0.848s
    sys 0m0.192s

So the problem is in the rendering, right? (I'm not sure I understand exactly what the Agg backend is doing.)

Now, using hotshot (kcachegrind profiles attached to the email), for the noshow case:

    ncalls tottime percall cumtime percall filename:lineno(function)
    1 0.001 0.001 0.839 0.839 slowmatplotlib.py:181(bench_imshow_noshow)
    1 0.000 0.000 0.837 0.837 slowmatplotlib.py:163(bench_imshow)
    1 0.000 0.000 0.586 0.586 /home/david/local/lib/python2.4/site-packages/matplotlib/pylab.py:1894(imshow)
    3 0.000 0.000 0.510 0.170 /home/david/local/lib/python2.4/site-packages/matplotlib/pylab.py:883(gca)
    1 0.000 0.000 0.509 0.509 /home/david/local/lib/python2.4/site-packages/matplotlib/pylab.py:950(ishold)
    4 0.000 0.000 0.409 0.102 /home/david/local/lib/python2.4/site-packages/matplotlib/pylab.py:903(gcf)
    1 0.000 0.000 0.409 0.409 /home/david/local/lib/python2.4/site-packages/matplotlib/pylab.py:818(figure)
    1 0.000 0.000 0.408 0.408 /home/david/local/lib/python2.4/site-packages/matplotlib/backends/backend_gtkagg.py:36(new_figure_manager)
    1 0.003 0.003 0.400 0.400 /home/david/local/lib/python2.4/site-packages/matplotlib/backends/backend_gtk.py:401(__init__)
    1 0.000 0.000 0.397 0.397 /home/david/local/lib/python2.4/site-packages/matplotlib/backends/backend_gtkagg.py:25(_get_toolbar)
    1 0.001 0.001 0.397 0.397 /home/david/local/lib/python2.4/site-packages/matplotlib/backends/backend_gtk.py:496(__init__)
    1 0.000 0.000 0.396 0.396 /home/david/local/lib/python2.4/site-packages/matplotlib/backend_bases.py:1112(__init__)
    1 0.000 0.000 0.396 0.396 /home/david/local/lib/python2.4/site-packages/matplotlib/backends/backend_gtk.py:557(_init_toolbar)
    1 0.008 0.008 0.396 0.396 /home/david/local/lib/python2.4/site-packages/matplotlib/backends/backend_gtk.py:595(_init_toolbar2_4)
    1 0.388 0.388 0.388 0.388 /home/david/local/lib/python2.4/site-packages/matplotlib/backends/backend_gtk.py:967(__init__)
    1 0.251 0.251 0.251 0.251 slowmatplotlib.py:155(generate_data_2d)
    3 0.000 0.000 0.101 0.034 /home/david/local/lib/python2.4/site-packages/matplotlib/figure.py:629(gca)
    1 0.000 0.000 0.101 0.101 /home/david/local/lib/python2.4/site-packages/matplotlib/figure.py:449(add_subplot)
    1 0.000 0.000 0.100 0.100 /home/david/local/lib/python2.4/site-packages/matplotlib/axes.py:4523(__init__)
    1 0.000 0.000 0.100 0.100 /home/david/local/lib/python2.4/site-packages/matplotlib/axes.py:337(__init__)

But the show case is more interesting:

    ncalls tottime percall cumtime percall filename:lineno(function)
    1 0.002 0.002 3.886 3.886 slowmatplotlib.py:177(bench_imshow_show)
    1 0.000 0.000 3.884 3.884 slowmatplotlib.py:163(bench_imshow)
    1 0.698 0.698 3.003 3.003 /home/david/local/lib/python2.4/site-packages/matplotlib/backends/backend_gtk.py:70(show)
    2 0.000 0.000 2.266 1.133 /home/david/local/lib/python2.4/site-packages/matplotlib/backends/backend_gtk.py:275(expose_event)
    1 0.009 0.009 2.266 2.266 /home/david/local/lib/python2.4/site-packages/matplotlib/backends/backend_gtkagg.py:71(_render_figure)
    1 0.000 0.000 2.256 2.256 /home/david/local/lib/python2.4/site-packages/matplotlib/backends/backend_agg.py:385(draw)
    1 0.000 0.000 2.253 2.253 /home/david/local/lib/python2.4/site-packages/matplotlib/figure.py:510(draw)
    1 0.000 0.000 2.251 2.251 /home/david/local/lib/python2.4/site-packages/matplotlib/axes.py:994(draw)
    1 0.005 0.005 1.951 1.951 /home/david/local/lib/python2.4/site-packages/matplotlib/image.py:173(draw)
    1 0.096 0.096 1.946 1.946 /home/david/local/lib/python2.4/site-packages/matplotlib/image.py:109(make_image)
    1 0.002 0.002 1.850 1.850 /home/david/local/lib/python2.4/site-packages/matplotlib/cm.py:50(to_rgba)
    1 0.001 0.001 0.949 0.949 /home/david/local/lib/python2.4/site-packages/matplotlib/colors.py:735(__call__)
    1 0.097 0.097 0.899 0.899 /home/david/local/lib/python2.4/site-packages/matplotlib/colors.py:568(__call__)
    325 0.050 0.000 0.671 0.002 /home/david/local/lib/python2.4/site-packages/numpy/core/ma.py:533(__init__)
    1 0.600 0.600 0.600 0.600 /home/david/local/lib/python2.4/site-packages/numpy/core/fromnumeric.py:282(resize)
    1 0.000 0.000 0.596 0.596 /home/david/local/lib/python2.4/site-packages/matplotlib/pylab.py:1894(imshow)
    10 0.570 0.057 0.570 0.057 /home/david/local/lib/python2.4/site-packages/numpy/oldnumeric/functions.py:117(where)
    3 0.000 0.000 0.513 0.171 /home/david/local/lib/python2.4/site-packages/matplotlib/pylab.py:883(gca)
    1 0.000 0.000 0.513 0.513 /home/david/local/lib/python2.4/site-packages/matplotlib/pylab.py:950(ishold)
    4 0.000 0.000 0.408 0.102 /home/david/local/lib/python2.4/site-packages/matplotlib/pylab.py:903(gcf)

For more details, see the .kc files, which are in the tbz2 archive along with the script for generating profiles for kcachegrind. I will post another email for the other problem (with several subplots). cheers, David |
From: David C. <da...@ar...> - 2006-12-13 08:37:36
|
David Cournapeau wrote:
> But the show case is more interesting:
>
> ncalls tottime percall cumtime percall filename:lineno(function)
> 1 0.002 0.002 3.886 3.886 slowmatplotlib.py:177(bench_imshow_show)
> 1 0.000 0.000 3.884 3.884 slowmatplotlib.py:163(bench_imshow)
> 1 0.698 0.698 3.003 3.003 /home/david/local/lib/python2.4/site-packages/matplotlib/backends/backend_gtk.py:70(show)
> 2 0.000 0.000 2.266 1.133 /home/david/local/lib/python2.4/site-packages/matplotlib/backends/backend_gtk.py:275(expose_event)
> 1 0.009 0.009 2.266 2.266 /home/david/local/lib/python2.4/site-packages/matplotlib/backends/backend_gtkagg.py:71(_render_figure)
> 1 0.000 0.000 2.256 2.256 /home/david/local/lib/python2.4/site-packages/matplotlib/backends/backend_agg.py:385(draw)
> 1 0.000 0.000 2.253 2.253 /home/david/local/lib/python2.4/site-packages/matplotlib/figure.py:510(draw)
> 1 0.000 0.000 2.251 2.251 /home/david/local/lib/python2.4/site-packages/matplotlib/axes.py:994(draw)
> 1 0.005 0.005 1.951 1.951 /home/david/local/lib/python2.4/site-packages/matplotlib/image.py:173(draw)
> 1 0.096 0.096 1.946 1.946 /home/david/local/lib/python2.4/site-packages/matplotlib/image.py:109(make_image)
> 1 0.002 0.002 1.850 1.850 /home/david/local/lib/python2.4/site-packages/matplotlib/cm.py:50(to_rgba)
> 1 0.001 0.001 0.949 0.949 /home/david/local/lib/python2.4/site-packages/matplotlib/colors.py:735(__call__)
> 1 0.097 0.097 0.899 0.899 /home/david/local/lib/python2.4/site-packages/matplotlib/colors.py:568(__call__)
> 325 0.050 0.000 0.671 0.002 /home/david/local/lib/python2.4/site-packages/numpy/core/ma.py:533(__init__)
> 1 0.600 0.600 0.600 0.600 /home/david/local/lib/python2.4/site-packages/numpy/core/fromnumeric.py:282(resize)
> 1 0.000 0.000 0.596 0.596 /home/david/local/lib/python2.4/site-packages/matplotlib/pylab.py:1894(imshow)
> 10 0.570 0.057 0.570 0.057 /home/david/local/lib/python2.4/site-packages/numpy/oldnumeric/functions.py:117(where)
> 3 0.000 0.000 0.513 0.171 /home/david/local/lib/python2.4/site-packages/matplotlib/pylab.py:883(gca)
> 1 0.000 0.000 0.513 0.513 /home/david/local/lib/python2.4/site-packages/matplotlib/pylab.py:950(ishold)
> 4 0.000 0.000 0.408 0.102 /home/david/local/lib/python2.4/site-packages/matplotlib/pylab.py:903(gcf)
>
> For more details, see the .kc files which are in the tbz2 archive,
> with the script for generating profiles for kcachegrind,

Here is some of what I tried:

- First, we can see that in expose_event (one call is expensive, the other negligible, from my understanding), two calls are pretty expensive: the __call__ at line 735 (the normalize functor) and the __call__ at line 568 (the colormap functor).
- For the normalize functor, one line is expensive: val = ma.array(clip(val.filled(vmax), vmin, vmax), mask=mask). If I add a test on mask for the case where mask is None (which it is in my case), the function becomes negligible.
- For the colormap functor, the 3 where calls are expensive. I am not sure I understand in which cases they are useful; if I understand correctly, one tries to avoid values out of the range (0, N) and forces out-of-range values to be clipped. Isn't there an easier way than using where?

If I remove the where calls in the colormap functor, I get a 4x speed increase for the to_rgba function. After that, it becomes a bit trickier to change things for someone like me who has no knowledge of matplotlib internals. Cheers, David |
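The 4x observation about the where calls can be illustrated in isolation: each where() call allocates a full new array, while in-place boolean assignment touches only the offending entries. A sketch with current numpy (plain clipping of lookup indices, not the full colormap logic; the sizes are assumptions matching the thread's 8000x256 case):

```python
import numpy as np

N = 256
# Scaled values, many of them outside [0, N-1]
xa = (np.random.randn(8000, 256) * 200).astype(int)

# where(): builds a complete new array on every call...
out1 = np.where(xa > N - 1, N - 1, xa)
out1 = np.where(out1 < 0, 0, out1)

# ...while in-place boolean assignment only writes the out-of-range entries.
out2 = xa.copy()
out2[out2 > N - 1] = N - 1
out2[out2 < 0] = 0
```

Both produce identical results; timing them (e.g. with timeit) shows the in-place version doing far less allocation and copying for mostly in-range data.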
From: Eric F. <ef...@ha...> - 2006-12-13 18:30:01
|
David, > - first, we can see that in expose_event (one is expensive, the other > negligeable, from my understanding), two calls are pretty expensive: > the __call__ at line 735 (for normalize functor) and one for __call__ > at line 568 (for colormap functor). > - for normalize functor, one line is expensive: val = > ma.array(clip(val.filled(vmax), vmin, vmax), mask=mask). If I put a test > on mask when mask is None (which it is in my case), then the function > becomes negligeable. > - for colormap functor, the 3 where calls are expensive. I am not > sure to understand in which case they are useful; if I understand > correctly, one tries to avoid > values out of range (0, N), and force out of range values to be clipped. > Isn't there an easier way than using where ? > > If I remove the where in the colormap functor, I have a 4x speed > increase for the to_rgba function. After that, it becomes a bit more > tricky to change things for someone like me who have no knowledge about > matplotlib internals. The things you have identified were added by me to support masked array bad values and special colors for regions above or below the mapped range of values. I will be happy to make changes to speed them up. Regarding the clip line, I think that your test for mask is None is not the right solution because it knocks out the clipping operation, but the clipping is intended regardless of the state of the mask. I had expected it to be a very fast operation, so I am surprised it is a bottleneck; in any case I can take a look to see how it can be sped up, or whether it can be bypassed in some cases. Maybe it is also using "where" internally. Now I recall very recent discussion explaining why "where" is slow compared to indexing with a boolean, so I know I can speed it up with numpy. Unfortunately Numeric does not support this, so maybe what will be needed is numerix functions that take advantage of numpy when available. 
This is one of those times when I really wish we could drop Numeric and numarray support *now* and start taking full advantage of numpy. In any case, thanks for pointing out the slowdowns--I will fix them as best I can--and keep at it. I share your interest in speeding up interactive use of matplotlib, along with fixing bugs, filling holes in functionality, and smoothing rough edges. There is a lot to be done. As John noted, though, there will always be tradeoffs among flexibility, code simplicity, generality, and speed. Eric |
From: David C. <da...@ar...> - 2006-12-14 03:09:35
|
Eric Firing wrote:
> Regarding the clip line, I think that your test for mask is None is
> not the right solution because it knocks out the clipping operation,
> but the clipping is intended regardless of the state of the mask. I
> had expected it to be a very fast operation, so I am surprised it is a
> bottleneck; in any case I can take a look to see how it can be sped
> up, or whether it can be bypassed in some cases. Maybe it is also
> using "where" internally.

(again, sorry for the double posting, I always forget that some MLs do not reply automatically to the ML)

My wording was vague at best :) The clipping operation is *not* removed, and it was not the culprit (it becomes a bottleneck once you get the 4x speedup, though). What I did was:

    if self.clip:
        mask = ma.getmaskorNone(val)
        if mask == None:
            val = ma.array(clip(val.filled(vmax), vmin, vmax))
        else:
            val = ma.array(clip(val.filled(vmax), vmin, vmax),
                           mask=mask)

Actually, the problem is in ma.array: with a mask value of None, it should not make a difference between mask = None and no mask argument, right? I didn't change ma.array to keep my change as local as possible. Changing only this operation as above gives a speedup from 1.8 s to ~1.0 s for to_rgba, which means calling show goes from ~2.2 s to ~1.4 s. I also changed

    result = (val-vmin)/float(vmax-vmin)

to

    invcache = 1.0 / (vmax - vmin)
    result = (val-vmin) * invcache

which gives a moderate speedup (around 100 ms for an 8000x256 point array; still in the 5-10 % range of the whole cost, though, and not likely to cause any hidden bug). Once you make both of those changes, the clip call is by far the most expensive operation in the normalize functor, but the functor is not really expensive anymore compared to the rest, so this is not where I looked next.

For the where calls in the Colormap functor, I was wondering if they are necessary in all cases: some of those calls seem redundant, and it may be possible to detect that before calling them. This should be both easier and faster, at least in this case, than having a fast where?

I understand that support for multiple array backends and for masked arrays has cost consequences. But it looks like it may be possible to speed things up for cases where an array has only meaningful values/no mask. cheers, David |
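The ma.array behaviour in question can be seen in a small sketch with current numpy.ma, where the "no mask" sentinel is ma.nomask rather than None (the array size is an assumption; the clip pattern mirrors the normalize code quoted above):

```python
import numpy as np
import numpy.ma as ma

val = ma.asarray(np.random.randn(100, 100))  # no masked entries

# For an unmasked array, getmask() returns the ma.nomask sentinel, not None.
mask = ma.getmask(val)
print(mask is ma.nomask)  # -> True

# Passing that sentinel straight through keeps ma.array on its fast path,
# with no per-element mask handling.
clipped = ma.array(np.clip(val.filled(1.0), -1.0, 1.0), mask=mask)
```

The distinction between None and nomask is exactly what turns out to matter in the follow-up messages.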
From: Eric F. <ef...@ha...> - 2006-12-14 18:14:49
|
David, I have made some changes in svn that address all but one of the points you made:

[....]

> if self.clip:
>     mask = ma.getmaskorNone(val)
>     if mask == None:
>         val = ma.array(clip(val.filled(vmax), vmin, vmax))
>     else:
>         val = ma.array(clip(val.filled(vmax), vmin, vmax),
>                        mask=mask)

The real problem here is that I should not have been using getmaskorNone(). In numpy.ma, we need nomask, not None, so we want an ordinary getmask() call. ma.array(...., mask=ma.nomask) is very fast, so the problem goes away.

> Actually, the problem is in ma.array: with a value of mask to None, it
> should not make a difference between mask = None or no mask arg, right ?

But it does, because for numpy it needs to be nomask; it does something with None, but whatever it is, it is very slow.

> I didn't change ma.array to keep my change as local as possible. To
> change only this operation as above gives a speed up from 1.8 s to ~ 1.0
> s for to_rgba, which means calling show goes from ~ 2.2 s to ~1.4 s. I
> also changed
>
>     result = (val-vmin)/float(vmax-vmin)
>
> to
>
>     invcache = 1.0 / (vmax - vmin)
>     result = (val-vmin) * invcache

This is the one I did not address. I don't understand how this could be making much difference, and some testing using ipython and %prun with 1-line operations showed little difference with variations on this theme. The fastest would appear to be (and logically should be, I think) result = (val-vmin)*(1.0/(vmax-vmin)), but I don't think it makes much difference--it looks to me like maybe 10-20 msec, not 100, on my Pentium M 1.6 GHz. Maybe still worthwhile, so I may yet make the change after more careful testing.

> which gives a moderate speed up (around 100 ms for a 8000x256 points
> array). Once you make both those changes, the clip call is by far the
> most expensive operation in normalize functor, but the functor is not
> really expensive anymore compared to the rest, so this is not where I
> looked at.
>
> For the where calls in Colormap functor, I was wondering if they are
> necessary in all cases: some of those calls seem redundant, and it may
> be possible to detect that before calling them. This should be both
> easier and faster, at least in this case, than having a fast where ?

You hit the nail squarely: where() is the wrong function to use, and I have eliminated it from colors.py. The much faster replacement is putmask, which does as well as direct indexing with a Boolean but works with all three numerical packages. I think that using the fast putmask is better than trying to figure out special cases in which there would be nothing to put, although I could be convinced otherwise.

> I understand that support of multiple array backend, support of mask
> arrays have cost consequences. But it looks like it may be possible to
> speed things up for cases where an array has only meaningful values/no
> mask.

The big gains here were essentially bug fixes--picking the appropriate function (getmask versus getmaskorNone, and putmask versus where).
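The ordering point behind the putmask change (over-range before under-range) can be demonstrated with a toy sketch; the sentinel indices N and N+1 are assumptions standing in for the _i_over/_i_under lookup-table slots:

```python
import numpy as np

N = 256
i_under, i_over = N, N + 1        # hypothetical sentinel slots past the LUT

xa = np.array([-3, 0, 100, 255, 300, 999])

good = xa.copy()
np.putmask(good, good > N - 1, i_over)   # over-range first...
np.putmask(good, good < 0, i_under)      # ...then under-range

bad = xa.copy()
np.putmask(bad, bad < 0, i_under)        # wrong order: i_under == N
np.putmask(bad, bad > N - 1, i_over)     # ...is itself > N-1, so it is clobbered

print(good)   # [256   0 100 255 257 257]
print(bad)    # [257   0 100 255 257 257]
```

In the wrong order the under-range value -3 ends up with the over-range sentinel, which is exactly the bug the comment in the diff warns about.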
Here is the colors.py diff:

    --- trunk/matplotlib/lib/matplotlib/colors.py    2006/12/03 21:54:38    2906
    +++ trunk/matplotlib/lib/matplotlib/colors.py    2006/12/14 08:27:04    2923
    @@ -30,9 +30,9 @@
     """
     import re
    -from numerix import array, arange, take, put, Float, Int, where, \
    +from numerix import array, arange, take, put, Float, Int, putmask, \
         zeros, asarray, sort, searchsorted, sometrue, ravel, divide,\
    -    ones, typecode, typecodes, alltrue
    +    ones, typecode, typecodes, alltrue, clip
     from numerix.mlab import amin, amax
     import numerix.ma as ma
     import numerix as nx
    @@ -536,8 +536,9 @@
             lut[0] = y1[0]
             lut[-1] = y0[-1]
             # ensure that the lut is confined to values between 0 and 1 by clipping it
    -        lut = where(lut > 1., 1., lut)
    -        lut = where(lut < 0., 0., lut)
    +        clip(lut, 0.0, 1.0)
    +        #lut = where(lut > 1., 1., lut)
    +        #lut = where(lut < 0., 0., lut)
             return lut
    @@ -588,16 +589,16 @@
                 vtype = 'array'
                 xma = ma.asarray(X)
                 xa = xma.filled(0)
    -            mask_bad = ma.getmaskorNone(xma)
    +            mask_bad = ma.getmask(xma)
             if typecode(xa) in typecodes['Float']:
    -            xa = where(xa == 1.0, 0.9999999, xa) # Tweak so 1.0 is in range.
    +            putmask(xa, xa==1.0, 0.9999999) #Treat 1.0 as slightly less than 1.
             xa = (xa * self.N).astype(Int)
    -        mask_under = xa < 0
    -        mask_over = xa > self.N-1
    -        xa = where(mask_under, self._i_under, xa)
    -        xa = where(mask_over, self._i_over, xa)
    -        if mask_bad is not None: # and sometrue(mask_bad):
    -            xa = where(mask_bad, self._i_bad, xa)
    +        # Set the over-range indices before the under-range;
    +        # otherwise the under-range values get converted to over-range.
    +        putmask(xa, xa>self.N-1, self._i_over)
    +        putmask(xa, xa<0, self._i_under)
    +        if mask_bad is not None and mask_bad.shape == xa.shape:
    +            putmask(xa, mask_bad, self._i_bad)
             rgba = take(self._lut, xa)
             if vtype == 'scalar':
                 rgba = tuple(rgba[0,:])
    @@ -752,7 +753,7 @@
                 return 0.*value
             else:
                 if clip:
    -                mask = ma.getmaskorNone(val)
    +                mask = ma.getmask(val)
                     val = ma.array(nx.clip(val.filled(vmax), vmin, vmax),
                                    mask=mask)
                 result = (val-vmin)/float(vmax-vmin)
    @@ -804,7 +805,7 @@
                 return 0.*value
             else:
                 if clip:
    -                mask = ma.getmaskorNone(val)
    +                mask = ma.getmask(val)
                     val = ma.array(nx.clip(val.filled(vmax), vmin, vmax),
                                    mask=mask)
                 result = (ma.log(val)-nx.log(vmin))/(nx.log(vmax)-nx.log(vmin))

Eric |
From: Simson G. <si...@ac...> - 2006-12-15 03:37:07
|
Hi. I want to have just horizontal grid lines. Is there any way to do this? Thanks! |
From: David C. <da...@ar...> - 2006-12-18 05:41:57
|
Eric Firing wrote:
> David,
>
> I have made some changes in svn that address all but one of the points
> you made:
>
> [....]
>>     if self.clip:
>>         mask = ma.getmaskorNone(val)
>>         if mask == None:
>>             val = ma.array(clip(val.filled(vmax), vmin, vmax))
>>         else:
>>             val = ma.array(clip(val.filled(vmax), vmin, vmax),
>>                            mask=mask)
>
> The real problem here is that I should not have been using
> getmaskorNone(). In numpy.ma, we need nomask, not None, so we want an
> ordinary getmask() call. ma.array(...., mask=ma.nomask) is very fast,
> so the problem goes away.
>
>> Actually, the problem is in ma.array: with a value of mask to None,
>> it should not make a difference between mask = None or no mask arg,
>> right ?
>
> But it does, because for numpy it needs to be nomask; it does
> something with None, but whatever it is, it is very slow.
>
>> I didn't change ma.array to keep my change as local as possible. To
>> change only this operation as above gives a speed up from 1.8 s to
>> ~1.0 s for to_rgba, which means calling show goes from ~2.2 s to
>> ~1.4 s. I also changed
>>
>>     result = (val-vmin)/float(vmax-vmin)
>>
>> to
>>
>>     invcache = 1.0 / (vmax - vmin)
>>     result = (val-vmin) * invcache
>
> This is the one I did not address. I don't understand how this could
> be making much difference, and some testing using ipython and %prun
> with 1-line operations showed little difference with variations on
> this theme. The fastest would appear to be (and logically should be,
> I think) result = (val-vmin)*(1.0/(vmax-vmin)), but I don't think it
> makes much difference--it looks to me like maybe 10-20 msec, not 100,
> on my Pentium M 1.6 Ghz. Maybe still worthwhile, so I may yet make
> the change after more careful testing.
>
>> which gives a moderate speed up (around 100 ms for a 8000x256 points
>> array). Once you make both those changes, the clip call is by far the
>> most expensive operation in the normalize functor, but the functor is
>> not really expensive anymore compared to the rest, so this is not
>> where I looked at.
>>
>> For the where calls in the Colormap functor, I was wondering if they
>> are necessary in all cases: some of those calls seem redundant, and
>> it may be possible to detect that before calling them. This should be
>> both easier and faster, at least in this case, than having a fast
>> where ?
>
> You hit the nail squarely: where() is the wrong function to use, and I
> have eliminated it from colors.py. The much faster replacement is
> putmask, which does as well as direct indexing with a Boolean but
> works with all three numerical packages. I think that using the fast
> putmask is better than trying to figure out special cases in which
> there would be nothing to put, although I could be convinced
> otherwise.
>
>> I understand that support of multiple array backends and support of
>> masked arrays have cost consequences. But it looks like it may be
>> possible to speed things up for cases where an array has only
>> meaningful values/no mask.
>
> The big gains here were essentially bug fixes--picking the appropriate
> function (getmask versus getmaskorNone and putmask versus where).

Ok, I've installed the latest svn, and now there is still one function
which is much slower than a direct numpy implementation, so I would like
to know whether this is inherent to the multiple-backend nature of
matplotlib or not. The Normalize functor uses the clip function, and a
direct numpy version would be 3 times faster (giving the show call a
20 % speedup in my really limited benchmarks):

    if clip:
        mask = ma.getmask(val)
        #val = ma.array(nx.clip(val.filled(vmax), vmin, vmax),
        #               mask=mask)
        def myclip(a, m, M):
            a[a<m] = m
            a[a>M] = M
            return a
        val = ma.array(myclip(val.filled(vmax), vmin, vmax), mask=mask)

I am a bit lost in the matplotlib code trying to see where clip is
implemented (is it in numerix, and as such using the numpy clip
function?). Still, I must confess that all this looks quite good,
because it was possible to speed things up quite considerably without
too much effort,

cheers,

David |
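David's comparison can be reproduced standalone roughly like this (a sketch; the 8000x256 size matches his benchmark, but absolute timings depend heavily on numpy version and hardware, and on modern numpy the built-in clip is typically competitive or faster):

```python
import timeit
import numpy as np

def myclip(a, m, M):
    # In-place clipping via boolean-mask assignment, as in the snippet above.
    a[a < m] = m
    a[a > M] = M
    return a

a = np.random.randn(8000, 256)

# Sanity check: both versions must agree before timing them.
assert np.array_equal(np.clip(a, -1.0, 1.0), myclip(a.copy(), -1.0, 1.0))

# a.copy() keeps the source array intact (myclip mutates its argument),
# so the in-place version's timing includes one copy per call.
t_np = timeit.timeit(lambda: np.clip(a, -1.0, 1.0), number=10) / 10
t_my = timeit.timeit(lambda: myclip(a.copy(), -1.0, 1.0), number=10) / 10
print("np.clip: %.1f ms   myclip: %.1f ms" % (t_np * 1e3, t_my * 1e3))
```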
From: Eric F. <ef...@ha...> - 2006-12-18 06:53:40
|
David Cournapeau wrote:
[...]
> Ok, I've installed last svn, and now, there is still one function
> which is much slower than a direct numpy implementation, so I would
> like to know if this is inherent to the multiple backend nature of
> matplotlib or not. The functor Normalize uses the clip function, and
> a direct numpy would be 3 times faster (giving the show call a 20 %
> speed in my really limited benchmarks):
>
>     if clip:
>         mask = ma.getmask(val)
>         #val = ma.array(nx.clip(val.filled(vmax), vmin, vmax),
>         #               mask=mask)
>         def myclip(a, m, M):
>             a[a<m] = m
>             a[a>M] = M
>             return a
>         val = ma.array(myclip(val.filled(vmax), vmin, vmax), mask=mask)
>
> I am a bit lost in the matplotlib code to see where clip is
> implemented (is it in numerix and as such using the numpy function
> clip ?).

There is a clip function in all three numeric packages, so a native clip
is being used.

If numpy.clip is actually slower than your version, that sounds like a
problem with the implementation in numpy. By all logic a single clip
function should either be the same (if it is implemented like yours) or
faster (if it is a single loop in C code, as I would expect). This
warrants a little more investigation before changing the mpl code. The
best thing would be if you could make a simple standalone numpy test
case profiling both versions and post the results as a question to the
numpy-discussion list. Many such questions in the past have resulted in
big speedups in numpy.

One more thought: it is possible that the difference is because myclip
operates on the array in place while clip generates a new array. If
this is the cause of the difference, then changing your last line to
"return a.copy()" probably would slow it down to the numpy clip speed or
slower.

Eric |
From: David C. <da...@ar...> - 2006-12-18 07:10:15
|
Eric Firing wrote:
>
> There is a clip function in all three numeric packages, so a native
> clip is being used.
>
> If numpy.clip is actually slower than your version, that sounds like a
> problem with the implementation in numpy. By all logic a single clip
> function should either be the same (if it is implemented like yours)
> or faster (if it is a single loop in C-code, as I would expect). This
> warrants a little more investigation before changing the mpl code.
> The best thing would be if you could make a simple standalone numpy
> test case profiling both versions and post the results as a question
> to the numpy-discussion list. Many such questions in the past have
> resulted in big speedups in numpy.

I am much more familiar with internal numpy code than matplotlib's, so
this is much easier for me, too :)

> One more thought: it is possible that the difference is because myclip
> operates on the array in place while clip generates a new array. If
> this is the cause of the difference then changing your last line to
> "return a.copy()" probably would slow it down to the numpy clip speed
> or slower.

It would be scary if a copy of an 8008x256 array of doubles took
100 ms... Fortunately it does not, so this does not seem to be the
problem.

cheers,

David |
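David's back-of-the-envelope point is easy to check: an 8008x256 float64 array is about 16 MB, and copying it is essentially a memcpy, far below 100 ms on any recent machine (a quick sketch; the exact figure depends on memory bandwidth):

```python
import timeit
import numpy as np

a = np.zeros((8008, 256))  # ~16 MB of float64

# Average the cost of ndarray.copy() over many calls.
t = timeit.timeit(a.copy, number=100) / 100
print("copy per call: %.2f ms" % (t * 1e3))
```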
From: John H. <jdh...@ac...> - 2006-12-19 14:59:39
|
>>>>> "David" == David Cournapeau <da...@ar...> writes:

    David> In make_image, most of the time is taken into to_rgba:
    David> almost half of it is taken in by the take call in the
    David> Colormap.__call__. Almost 200 ms to get colors from the
    David> indexes seems quite a lot (this means 280 cycles / pixel on
    David> average !). I can reproduce this number by using a small
    David> numpy test.

    David> On my laptop (pentium M, 1.2 Ghz), make_image takes almost
    David> 85 % of the time, which seems to imply that this is where
    David> one should focus if one wants to improve the speed,

This may have been lost in the longer thread above, but what
interpolation are you using? You may see a good performance boost by
using interpolation='nearest'. Also, with your clip changes and with
Eric's changes, is it still painfully slow for you -- how much have
these changes helped?

Of the time spent in make_image, how much is _image.fromarray,
ScalarMappable.to_rgba and _image.resize? |
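John's suggestion, in today's pyplot API, looks like the following (a sketch using the off-screen Agg backend so it runs without a GUI; the speedup applies to the resampling step and will vary by image size and backend):

```python
import numpy as np
import matplotlib
matplotlib.use("Agg")  # off-screen backend; no GUI needed
import matplotlib.pyplot as plt

z = np.random.rand(512, 512)
fig, ax = plt.subplots()
# 'nearest' skips the interpolation filter and just replicates pixels,
# typically the cheapest way to draw a large array.
im = ax.imshow(z, interpolation="nearest")
fig.canvas.draw()
```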
From: David C. <da...@ar...> - 2006-12-20 08:09:20
|
John Hunter wrote:
>
>     David> In make_image, most of the time is taken into to_rgba:
>     David> almost half of it is taken in by the take call in the
>     David> Colormap.__call__. Almost 200 ms to get colors from the
>     David> indexes seems quite a lot (this means 280 cycles / pixel
>     David> on average !). I can reproduce this number by using a
>     David> small numpy test.
>
>     David> On my laptop (pentium M, 1.2 Ghz), make_image takes almost
>     David> 85 % of the time, which seems to imply that this is where
>     David> one should focus if one wants to improve the speed,
>
> This may have been lost in the longer thread above,

I am a bit lost myself between the numpy and mpl mailing lists, sorry
for the inconvenience.

> but what interpolation are you using? You may see a good performance
> boost by using interpolation='nearest'.

At what point is interpolation used?

> Also, with your clip changes and with Eric's changes is it still
> painfully slow for you

Painfully is a strong word :) It is still 10 to 15 times slower than
matlab on the same computer: the show call takes around 800 ms instead
of 70 ms with matlab, and matlab's image is actually equivalent to the
imshow + show calls. Matlab, having only one toolkit, obviously has an
advantage, but I don't think the problem is on the GUI side anyway.

> -- how much have these changes helped?

With the original profiling, it took a bit more than 2100 ms for a show
call after an imshow call for an 8000x256 array, according to a saved
kcachegrind profile. Now it is around 800 ms, which is already much
better, and with minimal changes (eg without using a special fast path
more prone to bugs). I estimate that squeezing it to a bit less than
500 ms should be easily possible by improving things on the numpy side
(clip, float-to-int conversion and the take function), which has the
nice effect of improving mpl without touching one line of it, and
improving numpy at the same time :) The last 500 ms would be much more
difficult to squeeze out: half of it is used to 'launch' the figure
anyway. And below a few hundred ms, it becomes unnoticeable in
interactive use (whereas the change from 2.1 s to 0.8 s is; on my
laptop it is even more noticeable, because its CPU is kind of slow).

David |
From: Christopher B. <Chr...@no...> - 2006-12-13 19:54:41
|
Eric Firing wrote:
> Regarding the clip line, I think that your test for mask is None is
> not the right solution because it knocks out the clipping operation,
> but the clipping is intended regardless of the state of the mask. I
> had expected it to be a very fast operation,

for what it's worth, a few years ago I wrote a "fast_clip" C extension
that did clip without making nearly as many temporary arrays as the
Numeric one -- I don't know what numpy does, I haven't needed a fast
clip recently. I'd be glad to send the code to anyone interested.

> Now I recall very recent discussion explaining why "where" is slow
> compared to indexing with a boolean, so I know I can speed it up with
> numpy. Unfortunately Numeric does not support this, so maybe what
> will be needed is numerix functions that take advantage of numpy when
> available.

good idea.

> This is one of those times when I really wish we could drop Numeric
> and numarray support *now* and start taking full advantage of numpy.

I'd love that too. Maybe your proposal is a good one, though -- make
numerix functions that are optimized for numpy. I think that's a good
way to transition.

-Chris

--
Christopher Barker, Ph.D.
Oceanographer

Emergency Response Division
NOAA/NOS/OR&R            (206) 526-6959  voice
7600 Sand Point Way NE   (206) 526-6329  fax
Seattle, WA 98115        (206) 526-6317  main reception

Chr...@no... |
From: Eric F. <ef...@ha...> - 2006-12-15 07:50:03
|
Simson Garfinkel wrote:
> Hi. I want to have just horizontal grid lines. Is there any way to do
> this? Thanks!

gca().yaxis.grid(True)
gca().xaxis.grid(False)

Here is the grid method docstring:

    def grid(self, b=None, which='major', **kwargs):
        """
        Set the axis grid on or off; b is a boolean. Use which =
        'major' | 'minor' to set the grid for major or minor ticks.

        If b is None and len(kwargs)==0, toggle the grid state. If
        kwargs are supplied, it is assumed you want the grid on and b
        will be set to True.

        kwargs are used to set the line properties of the grids, eg,

            xax.grid(color='r', linestyle='-', linewidth=2)
        """

Eric |
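A complete, runnable version of Eric's two lines might look like this (a sketch with made-up data, using the object-oriented API and the off-screen Agg backend instead of gca()):

```python
import matplotlib
matplotlib.use("Agg")  # off-screen backend; no GUI needed
import matplotlib.pyplot as plt

fig, ax = plt.subplots()
ax.plot([0, 1, 2, 3], [0, 1, 4, 9])
# Horizontal grid lines come from the y-axis ticks;
# keep the x-axis (vertical) grid off.
ax.yaxis.grid(True)
ax.xaxis.grid(False)
fig.savefig("hgrid.png")
```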
From: Simson G. <si...@ac...> - 2006-12-15 23:56:18
|
Looks like I need to read *all* of the docstrings. I wish there were an
easy way to search them....

On Dec 15, 2006, at 2:49 AM, Eric Firing wrote:

> Simson Garfinkel wrote:
>> Hi. I want to have just horizontal grid lines. Is there any way to
>> do this? Thanks!
>
> gca().yaxis.grid(True)
> gca().xaxis.grid(False)
>
> Here is the grid method docstring:
>
>     def grid(self, b=None, which='major', **kwargs):
>         """
>         Set the axis grid on or off; b is a boolean. Use which =
>         'major' | 'minor' to set the grid for major or minor ticks.
>
>         If b is None and len(kwargs)==0, toggle the grid state. If
>         kwargs are supplied, it is assumed you want the grid on and b
>         will be set to True.
>
>         kwargs are used to set the line properties of the grids, eg,
>
>             xax.grid(color='r', linestyle='-', linewidth=2)
>
> Eric |
From: David C. <da...@ar...> - 2006-12-19 07:14:13
|
David Cournapeau wrote:
> Eric Firing wrote:
>> There is a clip function in all three numeric packages, so a native
>> clip is being used.
>>
>> If numpy.clip is actually slower than your version, that sounds like
>> a problem with the implementation in numpy. By all logic a single
>> clip function should either be the same (if it is implemented like
>> yours) or faster (if it is a single loop in C-code, as I would
>> expect). This warrants a little more investigation before changing
>> the mpl code. The best thing would be if you could make a simple
>> standalone numpy test case profiling both versions and post the
>> results as a question to the numpy-discussion list. Many such
>> questions in the past have resulted in big speedups in numpy.
>
> I am much more familiar with internal numpy code than matplotlib's, so
> this is much easier for me, too :)
>
>> One more thought: it is possible that the difference is because
>> myclip operates on the array in place while clip generates a new
>> array. If this is the cause of the difference then changing your
>> last line to "return a.copy()" probably would slow it down to the
>> numpy clip speed or slower.
>
> It would be scary if a copy of an 8008x256 array of doubles took
> 100 ms... Fortunately, it does not, this does not seem to be the
> problem.
>
> cheers,
>
> David

Ok, so now, with my clip function, still for an 8000x256 double array:
we have show() after imshow taking around 760 ms. 3/5 of that is in
make_image, 2/5 in the function blop, which is just an alias I put in to
measure the difference between axes.py:1043(draw) and
image.py:173(draw) in the function Axes.draw (file axes.py):

    def blop(dsu):
        for zorder, i, a in dsu:
            a.draw(renderer)
    blop(dsu)

In make_image, most of the time is taken in to_rgba: almost half of it
is taken by the take call in Colormap.__call__. Almost 200 ms to get
colors from the indexes seems quite a lot (this means 280 cycles/pixel
on average!). I can reproduce this number by using a small numpy test.

On my laptop (Pentium M, 1.2 GHz), make_image takes almost 85 % of the
time, which seems to imply that this is where one should focus if one
wants to improve the speed,

cheers,

David |
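The take measurement is easy to reproduce outside mpl (a sketch; the 256-entry RGBA table and 8000x256 index count mirror the sizes discussed in the thread, and a modern machine should come in well under the 200 ms reported here):

```python
import timeit
import numpy as np

# A colormap lookup table: 256 RGBA rows, plus three extra rows
# standing in for the under/over/bad sentinel entries.
lut = np.random.rand(256 + 3, 4)
# One lookup index per pixel of an 8000x256 image.
xa = np.random.randint(0, 256, size=8000 * 256)

rgba = np.take(lut, xa, axis=0)  # the operation being profiled
t = timeit.timeit(lambda: np.take(lut, xa, axis=0), number=5) / 5
print("take: %.1f ms per call, output shape %s" % (t * 1e3, rgba.shape))
```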