From: John G. <jn...@eu...> - 2004-02-09 10:28:18
|
I have some plots I'd like to print out and they would make better use of the paper if they were done landscape. Can the postscript backend do this? John |
From: John H. <jdh...@ac...> - 2004-02-10 15:19:30
|
>>>>> "John" == John Gill <jn...@eu...> writes:

    John> I have some plots I'd like to print out and they would make
    John> better use of the paper if they were done landscape.

    John> Can the postscript backend do this?

Hi John,

I haven't had time to take a close look at this. My initial
suggestion is to experiment with the paper size:

  import matplotlib
  matplotlib.use('PS')
  import matplotlib.backends.backend_ps as backend_ps
  backend_ps.defaultPaperSize = 11, 8.5  # default is 8.5, 11

You may also have to specify a landscape orientation at print time, or
rotate it.

I'll take a closer look later.

JDH |
From: John G. <jn...@eu...> - 2004-02-12 20:38:09
|
John,

I've looked into this a bit more. I think I'll have it working
shortly. It seems I have to do three things to get landscape:

  * set the paper size
  * add a '90 rotate' incantation to the postscript at a suitable point
  * add an 'x y translate' command to adjust the origin

Getting the last bit right is making my head hurt, but I should have
something working by tomorrow.

John |
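The three steps above can be sketched as a post-processing pass over a generated PostScript file. This is an illustrative sketch, not matplotlib code: `make_landscape` is a hypothetical helper, and the `0 -612 translate` incantation assumes US-letter paper (612 points = 8.5 in short edge).

```python
def make_landscape(ps_text, paper_width_pts=612):
    """Rotate a portrait PostScript page to landscape (rough sketch).

    After each %%Page comment, emit a 90-degree rotation followed by a
    translate that shifts the origin so the rotated drawing lands back
    on the sheet.  paper_width_pts is the paper's short edge in points
    (612 pt = 8.5 in for US letter).
    """
    out = []
    for line in ps_text.splitlines():
        out.append(line)
        if line.startswith('%%Page:'):
            # rotate counter-clockwise, then move the origin back onto
            # the page: (x, y) ends up at device (612 - y, x)
            out.append('90 rotate')
            out.append('0 -%d translate' % paper_width_pts)
    return '\n'.join(out)
```

A real implementation would also rewrite the %%BoundingBox and %%Orientation comments so viewers know the page is landscape.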
From: Peter G. <pgr...@ge...> - 2004-02-11 19:17:09
|
Hello:

We will be dealing with large (> 100,000 but in some instances as big
as 500,000 points) data sets. They are to be plotted, and I would like
to use matplotlib. I did a few preliminary tests, and it seems like
plotting that many pairs is a little too much for the system to
handle. Currently we are using gnuplot (as a backend to some other
software) for this plotting. It seems to be "lightning-fast", but I
suspect (I may be wrong!) that it reduces the data before the plotting
takes place and only selects every nth point. I have to go through the
code that calls it to be certain. I would imagine that it is not
necessary to plot every one of 100,000 points to produce a page-size
plot, but I'm not sure if simply grabbing every nth point and reducing
the data like that is the best way to go about this.

So my question is to anyone else out there who is also dealing with
these large (and very large) data sets: what do you do? Any library
routines that you use before plotting to massage the data? Are there
any ways (i.e. flags to set) to optimize this in matplotlib? Any other
software you use? I should note that I use the GD backend and pipe the
output to stdout for a cgi script to pick up.

Thanks.

--
Peter Groszkowski      Gemini Observatory
Tel: +1 808 974-2509   670 N. A'ohoku Place
Fax: +1 808 935-9235   Hilo, Hawai'i 96720, USA |
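The every-nth-point reduction Peter suspects gnuplot of doing can be sketched in a few lines. This is an illustrative helper (`decimate` is not a gnuplot or matplotlib function); as the thread goes on to discuss, it is fine for smooth, densely sampled data but can silently drop narrow spikes.

```python
def decimate(x, y, max_points=1000):
    """Keep at most max_points (x, y) pairs by taking every nth point.

    A naive reduction: pick a stride so that at most max_points samples
    survive.  Adequate for smooth data sampled far above the output
    resolution, but spikes narrower than the stride disappear.
    """
    n = len(x)
    if n <= max_points:
        return x, y
    step = -(-n // max_points)  # ceiling division, so len(result) <= max_points
    return x[::step], y[::step]
```

Usage: `decimate(x, y, 1000)` before plotting reduces a 100,000-point trace to 1,000 points, roughly one per horizontal pixel of a page-size plot.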
From: Perry G. <pe...@st...> - 2004-02-11 19:22:34
|
How are you plotting the data? As a scatter plot (e.g., symbols or
points) or as a connected line plot? The former can be quite a bit
slower, and we have some thoughts about speeding that up (which we
haven't broached with JDH yet). How long is it taking, and how much
faster do you need it?

Perry Greenfield |
From: John H. <jdh...@ac...> - 2004-02-11 19:39:28
|
>>>>> "Peter" == Peter Groszkowski <pgr...@ge...> writes:

    Peter> Hello: We will be dealing with large (> 100,000 but in some
    Peter> instances as big as 500,000 points) data sets. They are to
    Peter> be plotted, and I would like to use matplotlib.

Are you working with plot/loglog/etc (line data) or
pcolor/hist/scatter/bar (patch data)?

I routinely plot data sets this large. 500,000 data points is a
typical 10 seconds of EEG, which is the application that led me to
write matplotlib. EEG is fairly special: the x axis time is
monotonically increasing and the y axis is smooth. This lets me take
advantage of level of detail subsampling. If your xdata are sorted,
ie like time, the following

  l = plot(blah, blah)
  set(l, 'lod', True)

could be a big win. LOD is "Level of Detail" and if true subsamples
the data according to the pixel width of the output, as you described.
Whether this is appropriate or not depends on the data set of course,
whether it is continuous, and so on. Can you describe your dataset in
more detail, because I would like to add whatever optimizations are
appropriate -- if others can pipe in here too that would help.

Secondly, the standard gdmodule will iterate over the x, y values in a
python loop in gd.py. This is slow for lines with lots of points. I
have a patched gdmodule that I can send you (provide platform info)
that moves this step to the extension module. Potentially a very big
win.

Another possibility: change backends. The GTK backend is
significantly faster than GD. If you want to work off line (ie, draw
to image only and not display to screen) and are on a linux box, you
can do this with GTK and Xvfb. I'll give you instructions if
interested. In the next release of matplotlib, there will be a libart
paint backend (cross platform) that may be faster than GD. I'm
working on an Agg backend that should be considerably faster than all
the other backends since it does everything in extension code -- we'll
see :-).

JDH |
From: Perry G. <pe...@st...> - 2004-02-11 19:50:45
|
John Hunter writes:

> could be a big win. LOD is "Level of Detail" and if true subsamples
> the data according to the pixel width of the output, as you described.
> Whether this is appropriate or not depends on the data set of course,
> whether it is continuous, and so on. Can you describe your dataset in
> more detail, because I would like to add whatever optimizations are
> appropriate -- if others can pipe in here too that would help.

What I was alluding to was a backend primitive that allowed plotting a
symbol (patch?) or point for an array of points. The base
implementation would just do a python loop over the single point case,
so there is no requirement for a backend to overload this call. But a
backend could do so if it wanted to loop over all points in C. How
flexible to make this is open to discussion (e.g., allowing x and y
scaling factors, as arrays, for the symbol to be plotted, and other
attributes that may vary with point, such as color).

Perry |
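Perry's scheme, a backend call that takes arrays with a default pure-Python loop that fast backends may override, might look like the following sketch. The class and method names here are illustrative, not the actual matplotlib backend API.

```python
class RendererBase:
    """Sketch of the proposal above: draw_points has a default
    implementation that loops over a single-point primitive, so no
    backend is required to overload it, but a fast backend can
    override it and do the loop in C."""

    def draw_point(self, x, y):
        raise NotImplementedError("backends must supply a point primitive")

    def draw_points(self, xs, ys):
        # default: pure-Python loop over the single-point case
        for x, y in zip(xs, ys):
            self.draw_point(x, y)


class RecordingRenderer(RendererBase):
    """Toy backend that records what it was asked to draw,
    standing in for a real GD/GTK/PS renderer."""

    def __init__(self):
        self.points = []

    def draw_point(self, x, y):
        self.points.append((x, y))
```

The key design point is that the vectorized entry point is additive: existing backends keep working unchanged, and only backends that care about large scatter plots need to override `draw_points`.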
From: John H. <jdh...@ac...> - 2004-02-11 22:43:33
|
>>>>> "Perry" == Perry Greenfield <pe...@st...> writes:

    Perry> What I was alluding to was a backend primitive that
    Perry> allowed plotting a symbol (patch?) or point for an array
    Perry> of points. The base implementation would just do a python
    Perry> loop over the single point case, so there is no
    Perry> requirement for a backend to overload this call. But a
    Perry> backend could do so if it wanted to loop over all points
    Perry> in C. How flexible to make this is open to discussion
    Perry> (e.g., allowing x and y scaling factors, as arrays, for
    Perry> the symbol to be plotted, and other attributes that may
    Perry> vary with point, such as color).

To make this work in the current design, you'll need more than a new
backend method. Plot commands like scatter instantiate Artists
(Circle) and add them to the Axes as generic patch instances. On a
call to draw, the Axes instance iterates over all of its patch
instances and forwards the call on to the artists it contains. These,
in turn, instantiate gc instances which contain information like
linewidth, facecolor, edgecolor, alpha, etc. The patch instance also
transforms its data into display units and calls the relevant backend
method. Eg, a Circle instance would call

  renderer.draw_arc(gc, x, y, width, ...)

This makes it relatively easy to write a backend since you only have
to worry about one coordinate system (display) and don't need to know
anything about the Artist objects (Circle, Line, Rectangle, Text,
...).

The point is that no existing entity knows that a collection of
patches are all circles, and no one is keeping track of whether they
share a property or not. This buys you total flexibility to set
individual properties, but you pay for it in performance, since you
have to set every property for every object, call render methods for
each one, and so on.

My first response to this problem was to use a naive container class,
eg Circles, and an appropriate backend method, eg draw_circles. In
this case, scatter would instantiate a Circles instance with a list of
circles. When Circles was called to render, it would need to create a
sequence of location data and a sequence of gcs

  locs = [(x0, y0, w0, h0), (x1, y1, w1, h1), ...]
  gcs  = [circ0.get_gc(), circ1.get_gc(), ...]

and then call renderer.draw_ellipses(locs, gcs). This would provide
some savings, but probably not dramatic ones. The backends would need
to know how to read the GCs. In backend_agg extension code, I've
implemented the code (in CVS) to read the python GraphicsContextBase
information using the python API:

  _gc_get_linecap
  _gc_get_joinstyle
  _gc_get_color   # returns rgb

This is kind of backward, implementing an object in python and then
accessing it at the extension level using the Python API, but it does
keep as much of the frontend in python as possible, which is
desirable. The point is that for your approach to work and not break
encapsulation, the backends have to know about the GC.

The discussion above was focused on preserving all the individual
properties of the actors (eg every circle can have its own linewidth,
color, alpha, dash style). But this is rare. Usually we just want to
vary one or two properties across a large collection, eg color in
pcolor, and size and color in scatter. Much better is to implement a
GraphicsContextCollection, where the relevant properties can be either
individual elements or len(collection) sequences. If a property is an
element, it's homogeneous across the collection. If it's
len(collection), iterate over it. The CircleCollection, instead of
storing individual Circle instances as I wrote about above, stores
just the location and size data in arrays and a single
GraphicsContextCollection:

  def scatter(x, y, s, c):
      collection = CircleCollection(x, y, s)

      gc = GraphicsContextCollection()
      gc.set_linewidth(1.0)   # a single line width
      gc.set_foreground(c)    # a len(x) array of facecolors
      gc.set_edgecolor('k')   # a single edgecolor

      collection.set_gc(gc)
      axes.add_collection(collection)
      return collection

And this will be blazingly fast compared to the solution above since,
for example, you transform the x, y, and s coordinates as numeric
arrays rather than individually, and there is almost no function call
overhead. And as you say, if the backend doesn't implement a
draw_circles method, the CircleCollection can just fall back on
calling the existing methods in a loop.

Thoughts?

JDH |
From: Perry G. <pe...@st...> - 2004-02-11 23:03:19
|
John Hunter writes:

> >>>>> "Perry" == Perry Greenfield <pe...@st...> writes:
>
>     Perry> What I was alluding to was a backend primitive that
>     Perry> allowed plotting a symbol (patch?) or point for an array
>     Perry> of points. [...]
>
> To make this work in the current design, you'll need more than a new
> backend method.
> [much good explanation of why...]

OK, I understand.

> My first response to this problem was to use a naive container class,
> eg Circles, and an appropriate backend method, eg draw_circles. In
> this case, scatter would instantiate a Circles instance with a list of
> circles. When Circles was called to render, it would need to create a
> sequence of location data and a sequence of gcs
> [...]

I'd agree that this doesn't seem worth the trouble.

> Much better is to implement a GraphicsContextCollection, where the
> relevant properties can be either individual elements or
> len(collection) sequences. If a property is an element, it's
> homogeneous across the collection. If it's len(collection), iterate
> over it. The CircleCollection, instead of storing individual Circle
> instances as I wrote about above, stores just the location and size
> data in arrays and a single GraphicsContextCollection:
>
>   def scatter(x, y, s, c):
>       collection = CircleCollection(x, y, s)
>
>       gc = GraphicsContextCollection()
>       gc.set_linewidth(1.0)   # a single line width
>       gc.set_foreground(c)    # a len(x) array of facecolors
>       gc.set_edgecolor('k')   # a single edgecolor
>
>       collection.set_gc(gc)
>       axes.add_collection(collection)
>       return collection
>
> And this will be blazingly fast compared to the solution above since,
> for example, you transform the x, y, and s coordinates as numeric
> arrays rather than individually, and there is almost no function call
> overhead. And as you say, if the backend doesn't implement a
> draw_circles method, the CircleCollection can just fall back on
> calling the existing methods in a loop.
>
> Thoughts?

I like the sound of this approach even more, but I wonder if it can be
made somewhat more generic. This approach (if I read it correctly)
seems to need a backend function for each shape: perhaps only for
circle? What I was thinking of was a way to pass it the vectors or
path for a symbol (for very often, many points will share the same
shape, if not the same x, y scale). Here the circle is a bit of a
special case compared to crosses, error bars, triangles, and other
symbols that are usually made up of a few straight lines. In these
cases you could pass the backend the context collection along with the
shape (and perhaps some scaling info, if that isn't part of the
context). That way only one backend routine is needed. I suppose
circles and other curved items could be handled with a bezier-type
call. But perhaps I still misunderstand.

Thanks for your very detailed response.

Perry |
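Perry's "one shared path, many offsets" idea can be sketched with array broadcasting. This is an illustrative helper (`stamp_symbol` is not a matplotlib function; the thread's code used Numeric, but the same idea is shown here with modern numpy): one symbol path is replicated at many offsets, optionally with a per-point scale.

```python
import numpy as np

def stamp_symbol(path, offsets, scales=None):
    """Replicate one symbol path at many (x, y) offsets.

    path    : (N, 2) array of vertices shared by every marker
    offsets : (M, 2) array of marker positions in display units
    scales  : optional length-M array of per-point size factors

    Returns an (M, N, 2) array of transformed vertex sets that a
    backend could render in a single C loop.
    """
    path = np.asarray(path, dtype=float)
    offsets = np.asarray(offsets, dtype=float)
    if scales is None:
        scales = np.ones(len(offsets))
    scales = np.asarray(scales, dtype=float)
    # broadcast: (M,1,1) * (1,N,2) + (M,1,2) -> (M,N,2)
    return scales[:, None, None] * path[None, :, :] + offsets[:, None, :]
```

Because the transform runs over whole arrays instead of per-marker Python objects, this is the same "almost no function call overhead" win John describes for collections.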
From: John H. <jdh...@ac...> - 2004-02-11 23:51:49
|
>>>>> "Perry" == Perry Greenfield <pe...@st...> writes:

    Perry> I like the sound of this approach even more, but I wonder
    Perry> if it can be made somewhat more generic. This approach (if
    Perry> I read it correctly) seems to need a backend function for
    Perry> each shape: perhaps only for circle? What I was thinking
    Perry> of was a way to pass it the vectors or path for a symbol
    Perry> (for very often, many points will share the same shape, if
    Perry> not the same x, y scale).

Of course (slaps self on head). matplotlib 0.1 was designed around
gtk drawing, which doesn't support paths. Although I've been mumbling
about adding paths for some time (what with paint, ps, and agg), I'm
still thinking inside the box. A collection of paths is the natural
solution.

    Perry> I suppose circles and other curved items could be handled
    Perry> with a bezier-type call.

Agg special cases this one with a dedicated ellipse function

  ellipse(x, y, width, height, numsteps)

It's still a path, but you have a dedicated function to build that
path up speedily.

One potential drawback: how do you bring along the other backends that
don't have path support? In the RectangleCollection approach, we can
always fall back on draw_rectangle. In the path collection, it's more
difficult.

  backend_gtk (pygtk)     - no support for paths AFAIK
  backend_wx (wxpython)   - no support for paths AFAIK; Jeremy?
  backend_ps              - full path support
  backend_agg             - ditto
  backend_gd              - partial, I think; gotta check
  backend_paint (libart)  - full, perhaps with bugs

JDH |
From: Perry G. <pe...@st...> - 2004-02-12 05:05:43
|
John Hunter writes:

> >>>>> "Perry" == Perry Greenfield <pe...@st...> writes:
>
>     Perry> I like the sound of this approach even more, but I wonder
>     Perry> if it can be made somewhat more generic. [...]
>
> Of course (slaps self on head). matplotlib 0.1 was designed around
> gtk drawing, which doesn't support paths. Although I've been mumbling
> about adding paths for some time (what with paint, ps, and agg), I'm
> still thinking inside the box. A collection of paths is the natural
> solution.

Based on your previous description of a collection of circles, I think
so (though I wonder about the name: "paths" may imply many independent
paths, whereas I'm implying the sharing of one path by a collection of
points. Since circles are all identical in shape, that confusion
doesn't exist with the plural.) But I can see this approach being used
for things like error bars (one can view them as scalable symbols).

>     Perry> I suppose circles and other curved items could be handled
>     Perry> with a bezier-type call.
>
> Agg special cases this one with a dedicated ellipse function
>
>   ellipse(x, y, width, height, numsteps)
>
> It's still a path, but you have a dedicated function to build that
> path up speedily.
>
> One potential drawback: how do you bring along the other backends that
> don't have path support? In the RectangleCollection approach, we can
> always fall back on draw_rectangle. In the path collection, it's more
> difficult.

Maybe I'm still missing something, but couldn't paths be implemented
using the backend lines primitive? After all, any path is a finite set
of lines (unless you are using bezier curves). And if lines are
available in python, the loop could also be coded in C. Now, it is
true that some backends don't have the concept of defining a path that
can be reused with different coordinate transforms. But that isn't
needed for the functionality of rendering it, is it? It just makes it
a bit more work to keep rendering the same set of points with
different offsets and scales (i.e., you must keep giving the
transformed path array(s) to the backend to render within a loop, in
python or C). Right?

Perry |
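Perry's point, that any path can be rendered through a plain lines primitive by flattening it into segments, is easy to illustrate. The sketch below plays the role of the `ellipse(x, y, width, height, numsteps)` builder John mentions; the function name is hypothetical, not an actual backend call.

```python
import math

def ellipse_polyline(x, y, width, height, numsteps=32):
    """Approximate an ellipse as a closed polyline.

    A backend that only knows how to draw connected line segments can
    still render curved paths once they are flattened like this: more
    numsteps gives a smoother curve at the cost of more segments.
    Returns numsteps + 1 points, with the last repeating the first to
    close the outline.
    """
    pts = []
    for i in range(numsteps + 1):
        theta = 2.0 * math.pi * i / numsteps
        pts.append((x + 0.5 * width * math.cos(theta),
                    y + 0.5 * height * math.sin(theta)))
    return pts
```

Handing the returned point list to a draw-lines routine, once per offset/scale in a loop, is exactly the fallback Perry describes for backends without native path support.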
From: Peter G. <pgr...@ge...> - 2004-02-11 20:22:32
|
Perry:

Currently using connected line plots, but I do not want to limit
myself in any way when it comes to presenting data. I am certain that
at one point I will use every plot available in the matplotlib
arsenal. On a 3.2GHz P4 with 2GB RAM I get ~90 seconds for a
100,000-point data set, ~50 seconds for 50,000, and ~9 seconds for
10,000 (roughly linear). This is way too long for my purposes. I was
hoping more for ~5 seconds for 100,000 points.

John:

> I routinely plot data sets this large. 500,000 data points is a
> typical 10 seconds of EEG, which is the application that led me to
> write matplotlib.

That sounds good!

> If your xdata are sorted, ie like time, the following
>
>   l = plot(blah, blah)
>   set(l, 'lod', True)
>
> could be a big win.
>
> Whether this is appropriate or not depends on the data set of course,
> whether it is continuous, and so on. Can you describe your dataset in
> more detail, because I would like to add whatever optimizations are
> appropriate -- if others can pipe in here too that would help.

Will mostly be plotting time vs value(time), but in certain cases I
will need plots of other data, and therefore have to look at the
worst case scenario. Not exactly sure what you mean by "continuous"
since all are discrete data points. The data may not be smooth (could
have misbehaving sensors giving garbage) and may jump all over the
place.

> Secondly, the standard gdmodule will iterate over the x, y values in
> a python loop in gd.py. This is slow for lines with lots of points.
> I have a patched gdmodule that I can send you (provide platform
> info) that moves this step to the extension module. Potentially a
> very big win.

Yes, that would be great! System info:

  OS: RedHat9 (kernel 2.4.20)
  gcc version from running 'gcc -v':
    Reading specs from /usr/lib/gcc-lib/i386-redhat-linux/3.2.2/specs
    Configured with: ../configure --prefix=/usr --mandir=/usr/share/man
      --infodir=/usr/share/info --enable-shared --enable-threads=posix
      --disable-checking --with-system-zlib --enable-__cxa_atexit
      --host=i386-redhat-linux
    Thread model: posix
    gcc version 3.2.2 20030222 (Red Hat Linux 3.2.2-5)
  Python: Python 2.2.2 (#1, Feb 24 2003, 19:13:11)
  matplotlib: matplotlib-0.50e
  gdpython: 0.51 (with modified _gdmodule.c)
  gd: gd-2.0.21

> Another possibility: change backends. The GTK backend is
> significantly faster than GD. If you want to work off line (ie, draw
> to image only and not display to screen) and are on a linux box, you
> can do this with GTK and Xvfb. I'll give you instructions if
> interested. In the next release of matplotlib, there will be a
> libart paint backend (cross platform) that may be faster than GD.
> I'm working on an Agg backend that should be considerably faster
> than all the other backends since it does everything in extension
> code -- we'll see

Yes, I am only planning to work offline. I want to be able to pipe the
output images to stdout. I am looking for the fastest solution
possible.

Thanks again.

Peter |
From: John H. <jdh...@ac...> - 2004-02-11 21:16:46
|
>>>>> "Peter" == Peter Groszkowski <pgr...@ge...> writes:

    Peter> Will mostly be plotting time vs value(time), but in
    Peter> certain cases I will need plots of other data, and
    Peter> therefore have to look at the worst case scenario. Not
    Peter> exactly sure what you mean by "continuous" since all are
    Peter> discrete data points. The data may not be smooth (could
    Peter> have misbehaving sensors giving garbage) and may jump all
    Peter> over the place.

Bad terminology: for x I meant sorted (monotonic), and for y the ideal
case is smooth and not varying too rapidly. Try the lod feature and
see if it works for you.

Perhaps it would be better to extend the LOD functionality so that you
control the extent of subsampling. Eg, suppose you have 100,000 x data
points but only 1000 pixels of display. Then for every 100 data points
you could set the decimation factor, perhaps as a percentage. More
generally, we could implement a LOD base class so users could supply
their own derived instances to subsample the data however they see
fit, eg, min and max over the 100 points, and so on. By reshaping the
points into a 1000x100 matrix, this could be done in Numeric
efficiently.

    >> Secondly, the standard gdmodule will iterate over the x, y
    >> values in a python loop in gd.py. This is slow for lines with
    >> lots of points. I have a patched gdmodule that I can send you
    >> (provide platform info) that moves this step to the extension
    >> module. Potentially a very big win.

    Peter> Yes, that would be great! System info:

Here is the link

  http://nitace.bsd.uchicago.edu:8080/files/share/gdmodule-0.52b.tar.gz

You must also upgrade gd to 2.0.22 (alas, 2.0.21 is obsolete!) since I
needed the latest version to get this sucker ported to win32.

    >> Another possibility: change backends. The GTK backend is
    >> significantly faster than GD. If you want to work off line
    >> (ie, draw to image only and not display to screen) and are on
    >> a linux box, you can do this with GTK and Xvfb. I'll give you
    >> instructions if interested. In the next release of matplotlib,
    >> there will be a libart paint backend (cross platform) that may
    >> be faster than GD. I'm working on an Agg backend that should
    >> be considerably faster than all the other backends since it
    >> does everything in extension code -- we'll see

    Peter> Yes, I am only planning to work offline. I want to be able
    Peter> to pipe the output images to stdout. I am looking for the
    Peter> fastest solution possible.

I don't know how to write a GTK pixbuf to stdout. I inquired on the
pygtk mailing list, so perhaps we'll learn something soon. To use GTK
in Xvfb, make sure you have Xvfb (X virtual frame buffer) installed
(/usr/X11R6/bin/Xvfb). There is probably an RPM, but I don't
remember. You then need to start it with something like

  XVFB_HOME=/usr/X11R6
  $XVFB_HOME/bin/Xvfb :1 -co $XVFB_HOME/lib/X11/rgb -fp $XVFB_HOME/lib/X11/fonts/misc/,$XVFB_HOME/lib/X11/fonts/Speedo/,$XVFB_HOME/lib/X11/fonts/Type1/,$XVFB_HOME/lib/X11/fonts/75dpi/,$XVFB_HOME/lib/X11/fonts/100dpi/ &

and connect your display to it

  > setenv DISPLAY :1

Now you can use gtk as follows

  from matplotlib.matlab import *
  from matplotlib.backends.backend_gtk import show_xvfb

  def f(t):
      s1 = cos(2*pi*t)
      e1 = exp(-t)
      return multiply(s1, e1)

  t1 = arange(0.0, 5.0, 0.1)
  t2 = arange(0.0, 5.0, 0.02)
  t3 = arange(0.0, 2.0, 0.01)

  subplot(211)
  plot(t1, f(t1), 'bo', t2, f(t2), 'k')
  title('A tale of 2 subplots')
  ylabel('Damped oscillation')

  subplot(212)
  plot(t3, cos(2*pi*t3), 'r--')
  xlabel('time (s)')
  ylabel('Undamped')

  savefig('subplot_demo')
  show_xvfb()  # not show! |
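The min/max subsampling John sketches above (reshape 100,000 points into a 1000x100 matrix, keep each block's extremes) can be written in a few lines. This is an illustrative sketch; the thread's code used Numeric, but the same reshape trick is shown here with modern numpy, and a real implementation would handle a tail that does not divide evenly.

```python
import numpy as np

def minmax_envelope(y, npixels):
    """Level-of-detail reduction by per-pixel min/max.

    Reshape the first npixels * k samples of y into an (npixels, k)
    matrix and keep each block's min and max.  Unlike every-nth-point
    subsampling, narrow spikes always survive because every sample
    contributes to some block's extremes.  Assumes len(y) >= npixels;
    any tail samples beyond npixels * k are dropped for simplicity.
    """
    k = len(y) // npixels
    blocks = np.asarray(y, dtype=float)[:npixels * k].reshape(npixels, k)
    lo = blocks.min(axis=1)   # one min per pixel column
    hi = blocks.max(axis=1)   # one max per pixel column
    # interleave min and max so a drawn line still sweeps each block
    out = np.empty(2 * npixels)
    out[0::2] = lo
    out[1::2] = hi
    return out
```

The whole reduction is three vectorized array operations, which is why doing it "in Numeric efficiently" beats a Python loop over 100,000 points.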
From: Peter G. <pgr...@ge...> - 2004-02-12 00:49:14
|
Thanks for the prompt answers.

> Bad terminology: for x I meant sorted (monotonic), and for y the
> ideal case is smooth and not varying too rapidly. Try the lod
> feature and see if it works for you.

Although the data I'm playing with right now is monotonic (in x), I
cannot assume that this will always be the case, and I need an
efficient solution for all situations. The 'lod' option in

  l = plot(arange(10000), arange(20000,30000))  # dummy data: 10,000 pairs
  set(l, 'lod', True)

does not work for me. It's still roughly 1000 points/second.

> Here is the link
>
>   http://nitace.bsd.uchicago.edu:8080/files/share/gdmodule-0.52b.tar.gz
>
> You must also upgrade gd to 2.0.22 (alas, 2.0.21 is obsolete!) since
> I needed the latest version to get this sucker ported to win32.

Installed gd 2.0.22 and gdmodule-0.52b (from the link you provided),
but there is no change in the times. Not sure why; I should probably
notice at least a little difference.

> I don't know how to write a GTK pixbuf to stdout. I inquired on the
> pygtk mailing list, so perhaps we'll learn something soon. To use
> GTK in Xvfb, make sure you have Xvfb (X virtual frame buffer)
> installed (/usr/X11R6/bin/Xvfb). There is probably an RPM, but I
> don't remember.
> [...]

Installed Xvfb and ran the little script you included. It complained
about:

  File "/usr/lib/python2.2/site-packages/matplotlib/backends/backend_gtk.py",
    line 528, in _quit_after_print_xvfb
      if len(manager.drawingArea._printQued): break
  AttributeError: FigureManagerGTK instance has no attribute 'drawingArea'

I didn't inquire further because in my case it is crucial to have
stdout output; I have to be able to pipe these plots to cgi scripts.
If you have any other ideas, please let me know.

Can anyone else tell me what kind of performance they're getting doing
these 10k, 50k, 100k plots?

Best,
Peter |
From: John H. <jdh...@ac...> - 2004-02-12 04:26:33
|
>>>>> "Peter" == Peter Groszkowski <pgr...@ge...> writes:

    Peter> Although the data I'm playing with right now is monotonic
    Peter> (in x), I cannot assume that this will always be the case,
    Peter> and I need an efficient solution for all situations.

Agreed.

    Peter> The 'lod' option in
    Peter>
    Peter>   l = plot(arange(10000), arange(20000,30000))
    Peter>   set(l, 'lod', True)
    Peter>
    Peter> does not work for me. It's still roughly 1000
    Peter> points/second.

I left out a *critical* detail. The new gd backend code implements
antialiased drawing by default. Very slow. Check out the numbers
below, based on the demo script you supplied:

  backend = 'GD'
  import matplotlib
  matplotlib.use(backend)
  from matplotlib.matlab import *

  l = plot(arange(10000), arange(20000,30000))  # dummy data: 10,000 pairs
  lod, aa = False, False
  print 'Backend: %s, LOD %d, AA %d' % (backend, lod, aa)
  set(l, 'lod', lod, 'antialiased', aa)
  savefig('test')

  Backend: GD, LOD 1, AA 1
  23.770u 0.030s 0:23.77 100.1%  0+0k 0+0io 793pf+0w
  Backend: GD, LOD 0, AA 1
  23.500u 0.020s 0:23.52 100.0%  0+0k 0+0io 793pf+0w
  Backend: GD, LOD 1, AA 0
  0.270u 0.000s 0:00.28 96.4%  0+0k 0+0io 794pf+0w
  Backend: GD, LOD 0, AA 0
  0.240u 0.030s 0:00.27 100.0%  0+0k 0+0io 794pf+0w

In other words, if you are using the new GD in its default
configuration, you are paying a *100-fold performance hit* for
antialiased line drawing. Without it, I can draw and save your figure
(including python startup time, etc) in 0.25s on a 2GHz Pentium 4. Is
this in the ballpark for you, performance wise?

While we're on the subject of performance, I took the opportunity to
test the other backends. Note the numbers are not strictly comparable
(discussed below) but are informative.

  Backend: Paint, LOD 0, AA 0
  0.520u 0.000s 0:00.52 100.0%  0+0k 0+0io 726pf+0w
  Backend: PS, LOD 0, AA 0
  1.030u 0.040s 0:01.08 99.0%  0+0k 0+0io 582pf+0w
  Backend: Agg, LOD 0, AA 0
  0.320u 0.010s 0:00.28 117.8%  0+0k 0+0io 681pf+0w
  Backend: GTK, LOD 0, AA 0
  0.650u 0.020s 0:00.66 101.5%  0+0k 0+0io 3031pf+0w

The GTK results are in xvfb, so it appears to be a no-go for you even
if we could figure out how to print to stdout.

These numbers are repeatable and consistent. Worthy of comment:

  * GD with antialiasing off wins
  * paint is not as fast as I hoped
  * GTK is not as fast as I thought
  * Agg is an interesting case. It is doing antialiased drawing
    despite the AA 0 flag because I haven't made this conditional in
    the backend; it currently draws antialiased unconditionally. But
    it hasn't implemented text yet. So it's not strictly comparable,
    but it is noteworthy that it is 100 times faster than GD at
    antialiased lines. It remains to be seen what speed we can get
    with plain vanilla aliased rendering.

My guess is: when you turn off antialiasing you'll be a whole lot
happier. Let me know.

The last thing I looked at was how the GD numbers scale with line
size. Below, N is the number of data points (with LOD false the
numbers are very close to these results where LOD is true).

  Backend: GD, LOD 1, AA 0, N 10000
  0.230u 0.040s 0:00.24 112.5%  0+0k 0+0io 794pf+0w
  Backend: GD, LOD 1, AA 0, N 20000
  0.260u 0.060s 0:00.31 103.2%  0+0k 0+0io 794pf+0w
  Backend: GD, LOD 1, AA 0, N 40000
  0.390u 0.030s 0:00.41 102.4%  0+0k 0+0io 794pf+0w
  Backend: GD, LOD 1, AA 0, N 80000
  0.590u 0.060s 0:00.60 108.3%  0+0k 0+0io 815pf+0w
  Backend: GD, LOD 1, AA 0, N 160000
  1.070u 0.090s 0:01.13 102.6%  0+0k 0+0io 818pf+0w

JDH |
From: Peter G. <pgr...@ge...> - 2004-02-12 19:16:02
|
John: Thanks very much for your investigative work.

> antialiased line drawing.  Without it, I can draw and save your figure
> (including python startup time, etc, etc) in 0.25s on a 2GHz Pentium
> 4.  Is this in the ballpark for you, performance wise?

yes.. yes.. yes..

> My guess is: when you turn off antialiasing you'll be a whole lot
> happier.  Let me know.

With antialiasing off, the performance is superb!  I plot 500,000
points in ~4-5 seconds.  The visual quality of the graphs is
(naturally) inferior to the antialiased counterparts, but the software
is now feasible for my purposes.  Just a couple more questions:

1) It seems that setting 'lod' to true does not improve performance?
I would imagine it should, because it limits the number of points
used.  What am I missing?

2) Is there any way to make the graphs look "prettier"?  They really
look quite OK, but in some cases having a little more detail would be
nice.  Is it possible to specify just how much antialiasing is needed?
Are there any other "visual enhancement options" that can be set and
will not impact performance too much?

3) When I do:

    plot1 = plot(arange(10000), arange(20000,30000))  # dummy data: 10,000 pairs
    lod, aa = False, False
    set(l, 'lod', lod, 'antialiased', aa)

do these options only apply to the current plot (i.e. plot1)?  Is it
possible to have a plot inside a plot, with one being antialiased and
the other one not?  Do I have to re-set them after I call savefig()?
(Will test this..)

I have been playing around with the dpi setting a little.  Is it
supposed to change the size of the image and/or the resolution?

Thanks again.

-- 
Peter Groszkowski            Gemini Observatory
Tel: +1 808 974-2509         670 N. A'ohoku Place
Fax: +1 808 935-9235         Hilo, Hawai'i 96720, USA
|
From: John H. <jdh...@ac...> - 2004-02-12 19:58:41
|
>>>>> "Peter" == Peter Groszkowski <pgr...@ge...> writes:

    Peter> With antialiasing off, the performance is superb!..  I plot
    Peter> 500,000 points in ~4-5 seconds..  The visual quality of the
    Peter> graphs is (naturally) inferior to the antialiased
    Peter> counterparts, but the software is now feasible for my
    Peter> purposes.

Glad to hear it.  The next big performance boost will come from some
frontend refactoring along the lines Perry discussed, but it's good
that it's already usable for you now.

    Peter> 1) Seems like setting 'lod' to true does not improve
    Peter> performance?  I would imagine it should, because it limits
    Peter> the number of points used.  What am I missing?

I'll look into this further.  In the special case of EEG (128 channels
plotted simultaneously over the same time axis), I do see significant
benefits, but there I cache the sampling indexes from one line to the
next.  It may be that for single lines, the time it takes to do the
subsampling balances the time it takes to plot them in a fast backend.

    Peter> 2) Is there any way to make the graphs look "prettier"?
    Peter> They really look quite OK, but in some cases having a
    Peter> little more detail would be nice.  Is it possible to
    Peter> specify just how much antialiasing is needed?  Are there
    Peter> any other "visual enhancement options" that can be set and
    Peter> will not impact performance too much?

Well, fortunately for you, I just finished the agg backend this
morning.  This backend draws antialiased lines as fast as GD draws
unaliased lines.  I still don't have support for turning off
antialiasing in agg, but it sounds like you want antialiasing anyway.
Also, agg doesn't suffer from a known color allocation and fill bug
that GD has.  See the install instructions at the end of this email.

    Peter> 3) When I do:
    Peter>
    Peter>     plot1 = plot(arange(10000), arange(20000,30000))
    Peter>     lod, aa = False, False
    Peter>     set(l, 'lod', lod, 'antialiased', aa)

This code isn't correct.  plot returns a list of lines, and the set
command should operate on that list of lines.
It applies only to the lines returned.  So *you can* apply
antialiasing to one set of lines and not another, in the same axes:

    lines1 = plot(arange(10000), arange(20000,30000))
    set(lines1, 'antialiased', False)

    lines2 = plot([1,2,3])  # a small plot
    set(lines2, 'antialiased', True)

Now lines1 is aliased and lines2 is antialiased.

    Peter> I have been playing around with the dpi setting a little.
    Peter> Is it supposed to change the size of the image and/or the
    Peter> resolution?

The figure size in pixels is determined by the figsize parameter and
dpi:

    width, height = figsize
    width  *= dpi
    height *= dpi

Everything scales with DPI: line width, text size, dash spacing, etc.
So the answer to your question is: both figure size and resolution
increase with dpi.  If you want to change the figure size without
changing the resolution, change the figsize argument to figure.

The agg backend

Warning: you will be the first agg crash-test dummy.  I just ran a
suite of examples across all backends and agg was the fastest -- it's
even faster than template, which does no rendering or file saving!
And in my opinion it also produced the highest quality output.

Features that are implemented:

  * capstyles and join styles
  * dashes
  * linewidth
  * lines, rectangles, ellipses, polygons
  * clipping to a rectangle
  * output to RGBA and PNG
  * alpha blending
  * DPI scaling (dashes, linewidths, fontsizes, etc)
  * freetype1

TODO:

  * use the ttf manager to get fonts -- right now I just use Vera

INSTALLING

Grab the latest matplotlib from

    http://nitace.bsd.uchicago.edu:8080/files/share/matplotlib-0.50l.tar.gz

REQUIREMENTS

    python 2.2+
    Numeric 22+
    agg2 (see below)
    freetype 1
    libpng
    libz ?
Install AGG2 (cut and paste the below into an xterm should work):

    wget http://www.antigrain.com/agg2.tar.gz
    tar xvfz agg2.tar.gz
    cd agg2
    make

(Optional) if you want to make the examples:

    cd examples/X11
    make

Installing backend_agg

Edit setup.py: change aggsrc to point to the agg2 src tree, and
replace "if 0:" with "if 1:" in the backend_agg section.  Then just do
the usual thing:

    python setup.py build

Please let me know if you encounter build problems, and tell me your
platform, gcc version, etc...  Currently the paths in setupext.py
assume a linux-like filesystem (eg the X11 include dir, the location
of libttf, etc.), so you may need to tweak these.  But if I recall
correctly, we're both on RHL9, so you shouldn't have a problem.

Using the agg backend:

    python somefile.py -dAgg

or

    import matplotlib
    matplotlib.use('Agg')

Let me know how it works out!  Note also that backend agg is the first
backend to support alpha blending; see scatter_demo2.py.

JDH
|
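[Editor's sketch: the figsize/dpi rule JDH describes above -- pixel
size is figsize in inches times dpi, and everything scales together --
can be written out as a tiny helper.  `pixel_size` is a name of my
choosing, not a matplotlib function.]

```python
def pixel_size(figsize, dpi):
    """Rendered image size in pixels for a figure of `figsize` inches at `dpi`.

    Doubling dpi doubles both dimensions (a larger image AND higher
    resolution); to grow the figure at a fixed resolution, change
    figsize instead.
    """
    width_in, height_in = figsize
    return int(width_in * dpi), int(height_in * dpi)

# e.g. an 8x6-inch figure at 100 dpi renders at 800x600 pixels
```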
From: Peter G. <pgr...@ge...> - 2004-02-12 20:16:27
|
>     Peter> 3) When I do:
>
>     plot1 = plot(arange(10000), arange(20000,30000))
>     lod, aa = False, False
>     set(l, 'lod', lod, 'antialiased', aa)
>
> This code isn't correct.  plot returns a list of lines.  The set

Of course.. I changed the 'l' to 'plot1' because I hate how my 'l's
look like '1's, but I didn't do the same in the 'set' command.  I
meant it though.. :)

>     lines1 = plot(arange(10000), arange(20000,30000))
>     set(lines1, 'antialiased', False)
>
>     lines2 = plot([1,2,3])  # a small plot
>     set(lines2, 'antialiased', True)
>
> Now lines1 is aliased and lines2 is antialiased.

Great!  This provides awesome flexibility!  Thanks for all the info.
I will get a usable-skeleton-proof-of-concept-type app going with GD
first, and once that is working, will experiment with agg.

Peter
|
From: Perry G. <pe...@st...> - 2004-02-12 19:44:48
|
Peter Groszkowski wrote:
>
> Yes I am only planning to work offline.  Want to be able to pipe the
> output images to stdout.  I am looking for the fastest solution
> possible.

Following up on this, I was curious what exactly you meant by this: a
stream of byte values in ascii separated by spaces, or the actual
binary bytes?  If the latter, it wouldn't appear to be difficult to
write a C extension to return the image as a string, but I'm figuring
there is more to it than that, since the representation of the image
structure can change from one backend to another.

Perry
|
From: Peter G. <pgr...@ge...> - 2004-02-12 20:05:33
|
Perry Greenfield wrote:

That was a response to:

> If you want to work off line (ie, draw to image only and not display
> to screen) and are on a linux box, you can do this with GTK and Xvfb

Perhaps it was a misinterpretation on my part.  All I meant was that I
do not need to look at the images through any of the standard tools
(ie, via show()).  For my purposes I need to write the image to stdout
(not to disk) in its binary form -- which is what I do now, after
modifying some stuff in backend_gd.py.

> Peter Groszkowski wrote:
>
>> Yes I am only planning to work offline.  Want to be able to pipe the
>> output images to stdout.  I am looking for the fastest solution
>> possible.
>
> Following up on this, I was curious what exactly you meant by this.
> A stream of byte values in ascii separated by spaces?  Or the actual
> binary bytes?  If the latter, it wouldn't appear to be difficult to
> write a C extension to return the image as a string, but I'm figuring
> there is more to it than that since the representations for the
> image structure can change from one backend to another.
>
> Perry
|