From: John G. <jn...@eu...> - 2004-02-09 10:28:18
|
I have some plots I'd like to print out and they would make better use of the paper if they were done landscape. Can the postscript backend do this? John |
From: John H. <jdh...@ac...> - 2004-02-10 15:19:30
|
>>>>> "John" == John Gill <jn...@eu...> writes:

    John> I have some plots I'd like to print out and they would make
    John> better use of the paper if they were done landscape.

    John> Can the postscript backend do this?

Hi John,

I haven't had time to take a close look at this. My initial
suggestion is to experiment with the paper size:

  import matplotlib
  matplotlib.use('PS')
  import matplotlib.backends.backend_ps as backend_ps
  backend_ps.defaultPaperSize = 11, 8.5  # default is 8.5, 11

You may also have to specify a landscape orientation at print time, or
rotate it.

I'll take a closer look later.

JDH |
From: John G. <jn...@eu...> - 2004-02-12 20:38:09
|
John,

I've looked into this a bit more. I think I'll have it working
shortly. It seems I have to do three things to get landscape:

  * set the paper size
  * add a '90 rotate' incantation to the postscript at a suitable point
  * add an 'x y translate' command to adjust the origin

Getting the last bit right is making my head hurt, but I should have
something working by tomorrow.

John |
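The three steps above can be sketched as a post-processing pass over a generated PostScript file. This is an illustrative sketch, not matplotlib code: `make_landscape` is a hypothetical helper, and the `0 -612 translate` incantation assumes US-letter paper (612 points = 8.5 in short edge).

```python
def make_landscape(ps_text, paper_width_pts=612):
    """Rotate a portrait PostScript page to landscape (rough sketch).

    After each %%Page comment, emit a 90-degree rotation followed by a
    translate that shifts the origin so the rotated drawing lands back
    on the sheet.  paper_width_pts is the paper's short edge in points
    (612 pt = 8.5 in for US letter).
    """
    out = []
    for line in ps_text.splitlines():
        out.append(line)
        if line.startswith('%%Page:'):
            # rotate counter-clockwise, then move the origin back onto
            # the page: (x, y) ends up at device (612 - y, x)
            out.append('90 rotate')
            out.append('0 -%d translate' % paper_width_pts)
    return '\n'.join(out)
```

A real implementation would also rewrite the %%BoundingBox and %%Orientation comments so viewers know the page is landscape.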
From: Peter G. <pgr...@ge...> - 2004-02-11 19:17:09
|
Hello:

We will be dealing with large (> 100,000 but in some instances as big
as 500,000 points) data sets. They are to be plotted, and I would like
to use matplotlib. I did a few preliminary tests, and it seems like
plotting that many pairs is a little too much for the system to
handle. Currently we are using gnuplot (as a backend to some other
software) for this plotting. It seems to be "lightning-fast", but I
suspect (I may be wrong!) that it reduces the data before the plotting
takes place and only selects every nth point. I have to go through the
code that calls it to be certain. I would imagine that it is not
necessary to plot every one of 100,000 points to produce a page-size
plot, but I'm not sure if simply grabbing every nth point and reducing
the data like that is the best way to go about this.

So my question is to anyone else out there who is also dealing with
these large (and very large) data sets: what do you do? Any library
routines that you use before plotting to massage the data? Are there
any ways (i.e. flags to set) to optimize this in matplotlib? Any other
software you use? I should note that I use the GD backend and pipe the
output to stdout for a cgi script to pick up.

Thanks.

--
Peter Groszkowski      Gemini Observatory
Tel: +1 808 974-2509   670 N. A'ohoku Place
Fax: +1 808 935-9235   Hilo, Hawai'i 96720, USA |
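The every-nth-point reduction Peter suspects gnuplot of doing can be sketched in a few lines. This is an illustrative helper (`decimate` is not a gnuplot or matplotlib function); as the thread goes on to discuss, it is fine for smooth, densely sampled data but can silently drop narrow spikes.

```python
def decimate(x, y, max_points=1000):
    """Keep at most max_points (x, y) pairs by taking every nth point.

    A naive reduction: pick a stride so that at most max_points samples
    survive.  Adequate for smooth data sampled far above the output
    resolution, but spikes narrower than the stride disappear.
    """
    n = len(x)
    if n <= max_points:
        return x, y
    step = -(-n // max_points)  # ceiling division, so len(result) <= max_points
    return x[::step], y[::step]
```

Usage: `decimate(x, y, 1000)` before plotting reduces a 100,000-point trace to 1,000 points, roughly one per horizontal pixel of a page-size plot.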
From: Perry G. <pe...@st...> - 2004-02-11 19:22:34
|
How are you plotting the data? As a scatter plot (e.g., symbols or
points) or as a connected line plot? The former can be quite a bit
slower, and we have some thoughts about speeding that up (which we
haven't broached with JDH yet). How long is it taking, and how much
faster do you need it?

Perry Greenfield |
From: John H. <jdh...@ac...> - 2004-02-11 19:39:28
|
>>>>> "Peter" == Peter Groszkowski <pgr...@ge...> writes:

    Peter> Hello: We will be dealing with large (> 100,000 but in some
    Peter> instances as big as 500,000 points) data sets. They are to
    Peter> be plotted, and I would like to use matplotlib.

Are you working with plot/loglog/etc (line data) or
pcolor/hist/scatter/bar (patch data)?

I routinely plot data sets this large. 500,000 data points is a
typical 10 seconds of EEG, which is the application that led me to
write matplotlib. EEG is fairly special: the x axis time is
monotonically increasing and the y axis is smooth. This lets me take
advantage of level of detail subsampling. If your xdata are sorted,
ie like time, the following

  l = plot(blah, blah)
  set(l, 'lod', True)

could be a big win. LOD is "Level of Detail" and if true subsamples
the data according to the pixel width of the output, as you described.
Whether this is appropriate or not depends on the data set of course,
whether it is continuous, and so on. Can you describe your dataset in
more detail, because I would like to add whatever optimizations are
appropriate -- if others can pipe in here too that would help.

Secondly, the standard gdmodule will iterate over the x, y values in a
python loop in gd.py. This is slow for lines with lots of points. I
have a patched gdmodule that I can send you (provide platform info)
that moves this step to the extension module. Potentially a very big
win.

Another possibility: change backends. The GTK backend is
significantly faster than GD. If you want to work off line (ie, draw
to image only and not display to screen) and are on a linux box, you
can do this with GTK and Xvfb. I'll give you instructions if
interested. In the next release of matplotlib, there will be a libart
paint backend (cross platform) that may be faster than GD. I'm
working on an Agg backend that should be considerably faster than all
the other backends since it does everything in extension code -- we'll
see :-).

JDH |
From: Perry G. <pe...@st...> - 2004-02-11 19:50:45
|
John Hunter writes:

> could be a big win. LOD is "Level of Detail" and if true subsamples
> the data according to the pixel width of the output, as you described.
> Whether this is appropriate or not depends on the data set of course,
> whether it is continuous, and so on. Can you describe your dataset in
> more detail, because I would like to add whatever optimizations are
> appropriate -- if others can pipe in here too that would help.

What I was alluding to was a backend primitive that allowed plotting a
symbol (patch?) or point for an array of points. The base
implementation would just do a python loop over the single point case,
so there is no requirement for a backend to overload this call. But a
backend could do so if it wanted to loop over all points in C. How
flexible to make this is open to discussion (e.g., allowing x and y
scaling factors, as arrays, for the symbol to be plotted, and other
attributes that may vary with point, such as color).

Perry |
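Perry's scheme, a backend call that takes arrays with a default pure-Python loop that fast backends may override, might look like the following sketch. The class and method names here are illustrative, not the actual matplotlib backend API.

```python
class RendererBase:
    """Sketch of the proposal above: draw_points has a default
    implementation that loops over a single-point primitive, so no
    backend is required to overload it, but a fast backend can
    override it and do the loop in C."""

    def draw_point(self, x, y):
        raise NotImplementedError("backends must supply a point primitive")

    def draw_points(self, xs, ys):
        # default: pure-Python loop over the single-point case
        for x, y in zip(xs, ys):
            self.draw_point(x, y)


class RecordingRenderer(RendererBase):
    """Toy backend that records what it was asked to draw,
    standing in for a real GD/GTK/PS renderer."""

    def __init__(self):
        self.points = []

    def draw_point(self, x, y):
        self.points.append((x, y))
```

The key design point is that the vectorized entry point is additive: existing backends keep working unchanged, and only backends that care about large scatter plots need to override `draw_points`.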
From: John H. <jdh...@ac...> - 2004-02-11 22:43:33
|
>>>>> "Perry" == Perry Greenfield <pe...@st...> writes:

    Perry> What I was alluding to was a backend primitive that
    Perry> allowed plotting a symbol (patch?) or point for an array
    Perry> of points. The base implementation would just do a python
    Perry> loop over the single point case, so there is no
    Perry> requirement for a backend to overload this call. But a
    Perry> backend could do so if it wanted to loop over all points
    Perry> in C. How flexible to make this is open to discussion
    Perry> (e.g., allowing x and y scaling factors, as arrays, for
    Perry> the symbol to be plotted, and other attributes that may
    Perry> vary with point, such as color).

To make this work in the current design, you'll need more than a new
backend method. Plot commands like scatter instantiate Artists
(Circle) and add them to the Axes as generic patch instances. On a
call to draw, the Axes instance iterates over all of its patch
instances and forwards the call on to the artists it contains. These,
in turn, instantiate gc instances which contain information like
linewidth, facecolor, edgecolor, alpha, etc. The patch instance also
transforms its data into display units and calls the relevant backend
method. Eg, a Circle instance would call

  renderer.draw_arc(gc, x, y, width, ...)

This makes it relatively easy to write a backend since you only have
to worry about one coordinate system (display) and don't need to know
anything about the Artist objects (Circle, Line, Rectangle, Text,
...).

The point is that no existing entity knows that a collection of
patches are all circles, and no one is keeping track of whether they
share a property or not. This buys you total flexibility to set
individual properties, but you pay for it in performance, since you
have to set every property for every object, call render methods for
each one, and so on.

My first response to this problem was to use a naive container class,
eg Circles, and an appropriate backend method, eg draw_circles. In
this case, scatter would instantiate a Circles instance with a list of
circles. When Circles was called to render, it would need to create a
sequence of location data and a sequence of gcs

  locs = [(x0, y0, w0, h0), (x1, y1, w1, h1), ...]
  gcs  = [circ0.get_gc(), circ1.get_gc(), ...]

and then call renderer.draw_ellipses(locs, gcs). This would provide
some savings, but probably not dramatic ones. The backends would need
to know how to read the GCs. In backend_agg extension code, I've
implemented the code (in CVS) to read the python GraphicsContextBase
information using the python API:

  _gc_get_linecap
  _gc_get_joinstyle
  _gc_get_color   # returns rgb

This is kind of backward, implementing an object in python and then
accessing it at the extension level using the Python API, but it does
keep as much of the frontend in python as possible, which is
desirable. The point is that for your approach to work and not break
encapsulation, the backends have to know about the GC.

The discussion above was focused on preserving all the individual
properties of the actors (eg every circle can have its own linewidth,
color, alpha, dash style). But this is rare. Usually we just want to
vary one or two properties across a large collection, eg color in
pcolor, and size and color in scatter. Much better is to implement a
GraphicsContextCollection, where the relevant properties can be either
individual elements or len(collection) sequences. If a property is an
element, it's homogeneous across the collection. If it's
len(collection), iterate over it. The CircleCollection, instead of
storing individual Circle instances as I wrote about above, stores
just the location and size data in arrays and a single
GraphicsContextCollection:

  def scatter(x, y, s, c):
      collection = CircleCollection(x, y, s)

      gc = GraphicsContextCollection()
      gc.set_linewidth(1.0)   # a single line width
      gc.set_foreground(c)    # a len(x) array of facecolors
      gc.set_edgecolor('k')   # a single edgecolor

      collection.set_gc(gc)
      axes.add_collection(collection)
      return collection

And this will be blazingly fast compared to the solution above since,
for example, you transform the x, y, and s coordinates as numeric
arrays rather than individually, and there is almost no function call
overhead. And as you say, if the backend doesn't implement a
draw_circles method, the CircleCollection can just fall back on
calling the existing methods in a loop.

Thoughts?

JDH |
From: Perry G. <pe...@st...> - 2004-02-11 23:03:19
|
John Hunter writes:

> >>>>> "Perry" == Perry Greenfield <pe...@st...> writes:
>
>     Perry> What I was alluding to was a backend primitive that
>     Perry> allowed plotting a symbol (patch?) or point for an array
>     Perry> of points. [...]
>
> To make this work in the current design, you'll need more than a new
> backend method.
> [much good explanation of why...]

OK, I understand.

> My first response to this problem was to use a naive container class,
> eg Circles, and an appropriate backend method, eg draw_circles. In
> this case, scatter would instantiate a Circles instance with a list of
> circles. When Circles was called to render, it would need to create a
> sequence of location data and a sequence of gcs
> [...]

I'd agree that this doesn't seem worth the trouble.

> Much better is to implement a GraphicsContextCollection, where the
> relevant properties can be either individual elements or
> len(collection) sequences. If a property is an element, it's
> homogeneous across the collection. If it's len(collection), iterate
> over it. The CircleCollection, instead of storing individual Circle
> instances as I wrote about above, stores just the location and size
> data in arrays and a single GraphicsContextCollection:
>
>   def scatter(x, y, s, c):
>       collection = CircleCollection(x, y, s)
>
>       gc = GraphicsContextCollection()
>       gc.set_linewidth(1.0)   # a single line width
>       gc.set_foreground(c)    # a len(x) array of facecolors
>       gc.set_edgecolor('k')   # a single edgecolor
>
>       collection.set_gc(gc)
>       axes.add_collection(collection)
>       return collection
>
> And this will be blazingly fast compared to the solution above since,
> for example, you transform the x, y, and s coordinates as numeric
> arrays rather than individually, and there is almost no function call
> overhead. And as you say, if the backend doesn't implement a
> draw_circles method, the CircleCollection can just fall back on
> calling the existing methods in a loop.
>
> Thoughts?

I like the sound of this approach even more, but I wonder if it can be
made somewhat more generic. This approach (if I read it correctly)
seems to need a backend function for each shape: perhaps only for
circle? What I was thinking of was a way to pass it the vectors or
path for a symbol (for very often, many points will share the same
shape, if not the same x, y scale). Here the circle is a bit of a
special case compared to crosses, error bars, triangles, and other
symbols that are usually made up of a few straight lines. In these
cases you could pass the backend the context collection along with the
shape (and perhaps some scaling info, if that isn't part of the
context). That way only one backend routine is needed. I suppose
circles and other curved items could be handled with a bezier-type
call. But perhaps I still misunderstand.

Thanks for your very detailed response.

Perry |
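Perry's "one shared path, many offsets" idea can be sketched with array broadcasting. This is an illustrative helper (`stamp_symbol` is not a matplotlib function; the thread's code used Numeric, but the same idea is shown here with modern numpy): one symbol path is replicated at many offsets, optionally with a per-point scale.

```python
import numpy as np

def stamp_symbol(path, offsets, scales=None):
    """Replicate one symbol path at many (x, y) offsets.

    path    : (N, 2) array of vertices shared by every marker
    offsets : (M, 2) array of marker positions in display units
    scales  : optional length-M array of per-point size factors

    Returns an (M, N, 2) array of transformed vertex sets that a
    backend could render in a single C loop.
    """
    path = np.asarray(path, dtype=float)
    offsets = np.asarray(offsets, dtype=float)
    if scales is None:
        scales = np.ones(len(offsets))
    scales = np.asarray(scales, dtype=float)
    # broadcast: (M,1,1) * (1,N,2) + (M,1,2) -> (M,N,2)
    return scales[:, None, None] * path[None, :, :] + offsets[:, None, :]
```

Because the transform runs over whole arrays instead of per-marker Python objects, this is the same "almost no function call overhead" win John describes for collections.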
From: John H. <jdh...@ac...> - 2004-02-11 23:51:49
|
>>>>> "Perry" == Perry Greenfield <pe...@st...> writes:

    Perry> I like the sound of this approach even more, but I wonder
    Perry> if it can be made somewhat more generic. This approach (if
    Perry> I read it correctly) seems to need a backend function for
    Perry> each shape: perhaps only for circle? What I was thinking
    Perry> of was a way to pass it the vectors or path for a symbol
    Perry> (for very often, many points will share the same shape, if
    Perry> not the same x, y scale).

Of course (slaps self on head). matplotlib 0.1 was designed around
gtk drawing, which doesn't support paths. Although I've been mumbling
about adding paths for some time (what with paint, ps, and agg), I'm
still thinking inside the box. A collection of paths is the natural
solution.

    Perry> I suppose circles and other curved items could be handled
    Perry> with a bezier-type call.

Agg special cases this one with a dedicated ellipse function

  ellipse(x, y, width, height, numsteps)

It's still a path, but you have a dedicated function to build that
path up speedily.

One potential drawback: how do you bring along the other backends that
don't have path support? In the RectangleCollection approach, we can
always fall back on draw_rectangle. In the path collection, it's more
difficult.

  backend_gtk (pygtk)     - no support for paths AFAIK
  backend_wx (wxpython)   - no support for paths AFAIK; Jeremy?
  backend_ps              - full path support
  backend_agg             - ditto
  backend_gd              - partial, I think; gotta check
  backend_paint (libart)  - full, perhaps with bugs

JDH |
From: Perry G. <pe...@st...> - 2004-02-12 05:05:43
|
John Hunter writes:

> >>>>> "Perry" == Perry Greenfield <pe...@st...> writes:
>
>     Perry> I like the sound of this approach even more, but I wonder
>     Perry> if it can be made somewhat more generic. [...]
>
> Of course (slaps self on head). matplotlib 0.1 was designed around
> gtk drawing, which doesn't support paths. Although I've been mumbling
> about adding paths for some time (what with paint, ps, and agg), I'm
> still thinking inside the box. A collection of paths is the natural
> solution.

Based on your previous description of a collection of circles, I think
so (though I wonder about the name: "paths" may imply many independent
paths, whereas I'm implying the sharing of one path by a collection of
points. Since circles are all identical in shape, that confusion
doesn't exist with the plural.) But I can see this approach being used
for things like error bars (one can view them as scalable symbols).

>     Perry> I suppose circles and other curved items could be handled
>     Perry> with a bezier-type call.
>
> Agg special cases this one with a dedicated ellipse function
>
>   ellipse(x, y, width, height, numsteps)
>
> It's still a path, but you have a dedicated function to build that
> path up speedily.
>
> One potential drawback: how do you bring along the other backends that
> don't have path support? In the RectangleCollection approach, we can
> always fall back on draw_rectangle. In the path collection, it's more
> difficult.

Maybe I'm still missing something, but couldn't paths be implemented
using the backend lines primitive? After all, any path is a finite set
of lines (unless you are using bezier curves). And if lines are
available in python, the loop could also be coded in C. Now, it is
true that some backends don't have the concept of defining a path that
can be reused with different coordinate transforms. But that isn't
needed for the functionality of rendering it, is it? It just makes it
a bit more work to keep rendering the same set of points with
different offsets and scales (i.e., you must keep giving the
transformed path array(s) to the backend to render within a loop, in
python or C). Right?

Perry |
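Perry's point, that any path can be rendered through a plain lines primitive by flattening it into segments, is easy to illustrate. The sketch below plays the role of the `ellipse(x, y, width, height, numsteps)` builder John mentions; the function name is hypothetical, not an actual backend call.

```python
import math

def ellipse_polyline(x, y, width, height, numsteps=32):
    """Approximate an ellipse as a closed polyline.

    A backend that only knows how to draw connected line segments can
    still render curved paths once they are flattened like this: more
    numsteps gives a smoother curve at the cost of more segments.
    Returns numsteps + 1 points, with the last repeating the first to
    close the outline.
    """
    pts = []
    for i in range(numsteps + 1):
        theta = 2.0 * math.pi * i / numsteps
        pts.append((x + 0.5 * width * math.cos(theta),
                    y + 0.5 * height * math.sin(theta)))
    return pts
```

Handing the returned point list to a draw-lines routine, once per offset/scale in a loop, is exactly the fallback Perry describes for backends without native path support.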
From: Peter G. <pgr...@ge...> - 2004-02-11 20:22:32
|
Perry:

Currently using connected line plots, but I do not want to limit
myself in any way when it comes to presenting data. I am certain that
at one point I will use every plot available in the matplotlib
arsenal. On a 3.2GHz P4 with 2GB RAM I get ~90 seconds for a
100,000-point data set, ~50 seconds for 50,000, and ~9 seconds for
10,000 (roughly linear). This is way too long for my purposes. I was
hoping more for ~5 seconds for 100,000 points.

John:

> I routinely plot data sets this large. 500,000 data points is a
> typical 10 seconds of EEG, which is the application that led me to
> write matplotlib.

That sounds good!

> If your xdata are sorted, ie like time, the following
>
>   l = plot(blah, blah)
>   set(l, 'lod', True)
>
> could be a big win.
>
> Whether this is appropriate or not depends on the data set of course,
> whether it is continuous, and so on. Can you describe your dataset in
> more detail, because I would like to add whatever optimizations are
> appropriate -- if others can pipe in here too that would help.

Will mostly be plotting time vs value(time), but in certain cases I
will need plots of other data, and therefore have to look at the
worst case scenario. Not exactly sure what you mean by "continuous"
since all are discrete data points. The data may not be smooth (could
have misbehaving sensors giving garbage) and may jump all over the
place.

> Secondly, the standard gdmodule will iterate over the x, y values in
> a python loop in gd.py. This is slow for lines with lots of points.
> I have a patched gdmodule that I can send you (provide platform
> info) that moves this step to the extension module. Potentially a
> very big win.

Yes, that would be great! System info:

  OS: RedHat9 (kernel 2.4.20)
  gcc version from running 'gcc -v':
    Reading specs from /usr/lib/gcc-lib/i386-redhat-linux/3.2.2/specs
    Configured with: ../configure --prefix=/usr --mandir=/usr/share/man
      --infodir=/usr/share/info --enable-shared --enable-threads=posix
      --disable-checking --with-system-zlib --enable-__cxa_atexit
      --host=i386-redhat-linux
    Thread model: posix
    gcc version 3.2.2 20030222 (Red Hat Linux 3.2.2-5)
  Python: Python 2.2.2 (#1, Feb 24 2003, 19:13:11)
  matplotlib: matplotlib-0.50e
  gdpython: 0.51 (with modified _gdmodule.c)
  gd: gd-2.0.21

> Another possibility: change backends. The GTK backend is
> significantly faster than GD. If you want to work off line (ie, draw
> to image only and not display to screen) and are on a linux box, you
> can do this with GTK and Xvfb. I'll give you instructions if
> interested. In the next release of matplotlib, there will be a
> libart paint backend (cross platform) that may be faster than GD.
> I'm working on an Agg backend that should be considerably faster
> than all the other backends since it does everything in extension
> code -- we'll see

Yes, I am only planning to work offline. I want to be able to pipe the
output images to stdout. I am looking for the fastest solution
possible.

Thanks again.

Peter |
From: John H. <jdh...@ac...> - 2004-02-11 21:16:46
|
>>>>> "Peter" == Peter Groszkowski <pgr...@ge...> writes:

    Peter> Will mostly be plotting time vs value(time), but in
    Peter> certain cases I will need plots of other data, and
    Peter> therefore have to look at the worst case scenario. Not
    Peter> exactly sure what you mean by "continuous" since all are
    Peter> discrete data points. The data may not be smooth (could
    Peter> have misbehaving sensors giving garbage) and may jump all
    Peter> over the place.

Bad terminology: for x I meant sorted (monotonic), and for y the ideal
case is smooth and not varying too rapidly. Try the lod feature and
see if it works for you.

Perhaps it would be better to extend the LOD functionality so that you
control the extent of subsampling. Eg, suppose you have 100,000 x data
points but only 1000 pixels of display. Then for every 100 data points
you could set the decimation factor, perhaps as a percentage. More
generally, we could implement a LOD base class so users could supply
their own derived instances to subsample the data however they see
fit, eg, min and max over the 100 points, and so on. By reshaping the
points into a 1000x100 matrix, this could be done in Numeric
efficiently.

    >> Secondly, the standard gdmodule will iterate over the x, y
    >> values in a python loop in gd.py. This is slow for lines with
    >> lots of points. I have a patched gdmodule that I can send you
    >> (provide platform info) that moves this step to the extension
    >> module. Potentially a very big win.

    Peter> Yes, that would be great! System info:

Here is the link

  http://nitace.bsd.uchicago.edu:8080/files/share/gdmodule-0.52b.tar.gz

You must also upgrade gd to 2.0.22 (alas, 2.0.21 is obsolete!) since I
needed the latest version to get this sucker ported to win32.

    >> Another possibility: change backends. The GTK backend is
    >> significantly faster than GD. If you want to work off line
    >> (ie, draw to image only and not display to screen) and are on
    >> a linux box, you can do this with GTK and Xvfb. I'll give you
    >> instructions if interested. In the next release of matplotlib,
    >> there will be a libart paint backend (cross platform) that may
    >> be faster than GD. I'm working on an Agg backend that should
    >> be considerably faster than all the other backends since it
    >> does everything in extension code -- we'll see

    Peter> Yes, I am only planning to work offline. I want to be able
    Peter> to pipe the output images to stdout. I am looking for the
    Peter> fastest solution possible.

I don't know how to write a GTK pixbuf to stdout. I inquired on the
pygtk mailing list, so perhaps we'll learn something soon. To use GTK
in Xvfb, make sure you have Xvfb (X virtual frame buffer) installed
(/usr/X11R6/bin/Xvfb). There is probably an RPM, but I don't
remember. You then need to start it with something like

  XVFB_HOME=/usr/X11R6
  $XVFB_HOME/bin/Xvfb :1 -co $XVFB_HOME/lib/X11/rgb -fp $XVFB_HOME/lib/X11/fonts/misc/,$XVFB_HOME/lib/X11/fonts/Speedo/,$XVFB_HOME/lib/X11/fonts/Type1/,$XVFB_HOME/lib/X11/fonts/75dpi/,$XVFB_HOME/lib/X11/fonts/100dpi/ &

and connect your display to it

  > setenv DISPLAY :1

Now you can use gtk as follows

  from matplotlib.matlab import *
  from matplotlib.backends.backend_gtk import show_xvfb

  def f(t):
      s1 = cos(2*pi*t)
      e1 = exp(-t)
      return multiply(s1, e1)

  t1 = arange(0.0, 5.0, 0.1)
  t2 = arange(0.0, 5.0, 0.02)
  t3 = arange(0.0, 2.0, 0.01)

  subplot(211)
  plot(t1, f(t1), 'bo', t2, f(t2), 'k')
  title('A tale of 2 subplots')
  ylabel('Damped oscillation')

  subplot(212)
  plot(t3, cos(2*pi*t3), 'r--')
  xlabel('time (s)')
  ylabel('Undamped')

  savefig('subplot_demo')
  show_xvfb()  # not show! |
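The min/max subsampling John sketches above (reshape 100,000 points into a 1000x100 matrix, keep each block's extremes) can be written in a few lines. This is an illustrative sketch; the thread's code used Numeric, but the same reshape trick is shown here with modern numpy, and a real implementation would handle a tail that does not divide evenly.

```python
import numpy as np

def minmax_envelope(y, npixels):
    """Level-of-detail reduction by per-pixel min/max.

    Reshape the first npixels * k samples of y into an (npixels, k)
    matrix and keep each block's min and max.  Unlike every-nth-point
    subsampling, narrow spikes always survive because every sample
    contributes to some block's extremes.  Assumes len(y) >= npixels;
    any tail samples beyond npixels * k are dropped for simplicity.
    """
    k = len(y) // npixels
    blocks = np.asarray(y, dtype=float)[:npixels * k].reshape(npixels, k)
    lo = blocks.min(axis=1)   # one min per pixel column
    hi = blocks.max(axis=1)   # one max per pixel column
    # interleave min and max so a drawn line still sweeps each block
    out = np.empty(2 * npixels)
    out[0::2] = lo
    out[1::2] = hi
    return out
```

The whole reduction is three vectorized array operations, which is why doing it "in Numeric efficiently" beats a Python loop over 100,000 points.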
From: Peter G. <pgr...@ge...> - 2004-02-12 00:49:14
|
Thanks for the prompt answers.

> Bad terminology: for x I meant sorted (monotonic), and for y the
> ideal case is smooth and not varying too rapidly. Try the lod
> feature and see if it works for you.

Although the data I'm playing with right now is monotonic (in x), I
cannot assume that this will always be the case, and I need an
efficient solution for all situations. The 'lod' option in

  l = plot(arange(10000), arange(20000,30000))  # dummy data: 10,000 pairs
  set(l, 'lod', True)

does not work for me. It's still roughly 1000 points/second.

> Here is the link
>
>   http://nitace.bsd.uchicago.edu:8080/files/share/gdmodule-0.52b.tar.gz
>
> You must also upgrade gd to 2.0.22 (alas, 2.0.21 is obsolete!) since
> I needed the latest version to get this sucker ported to win32.

Installed gd 2.0.22 and gdmodule-0.52b (from the link you provided),
but there is no change in the times. Not sure why; I should probably
notice at least a little difference.

> I don't know how to write a GTK pixbuf to stdout. I inquired on the
> pygtk mailing list, so perhaps we'll learn something soon. To use
> GTK in Xvfb, make sure you have Xvfb (X virtual frame buffer)
> installed (/usr/X11R6/bin/Xvfb). There is probably an RPM, but I
> don't remember.
> [...]

Installed Xvfb and ran the little script you included. It complained
about:

  File "/usr/lib/python2.2/site-packages/matplotlib/backends/backend_gtk.py",
    line 528, in _quit_after_print_xvfb
      if len(manager.drawingArea._printQued): break
  AttributeError: FigureManagerGTK instance has no attribute 'drawingArea'

I didn't inquire further because in my case it is crucial to have
stdout output; I have to be able to pipe these plots to cgi scripts.
If you have any other ideas, please let me know.

Can anyone else tell me what kind of performance they're getting doing
these 10k, 50k, 100k plots?

Best,
Peter |
From: John H. <jdh...@ac...> - 2004-02-12 04:26:33
|
>>>>> "Peter" == Peter Groszkowski <pgr...@ge...> writes:

    Peter> Although the data I'm playing with right now is monotonic
    Peter> (in x), I cannot assume that this will always be the case,
    Peter> and I need an efficient solution for all situations.

Agreed.

    Peter> The 'lod' option in
    Peter>
    Peter>   l = plot(arange(10000), arange(20000,30000))
    Peter>   set(l, 'lod', True)
    Peter>
    Peter> does not work for me. It's still roughly 1000
    Peter> points/second.

I left out a *critical* detail. The new gd backend code implements
antialiased drawing by default. Very slow. Check out the numbers
below, based on the demo script you supplied:

  backend = 'GD'
  import matplotlib
  matplotlib.use(backend)
  from matplotlib.matlab import *

  l = plot(arange(10000), arange(20000,30000))  # dummy data: 10,000 pairs
  lod, aa = False, False
  print 'Backend: %s, LOD %d, AA %d' % (backend, lod, aa)
  set(l, 'lod', lod, 'antialiased', aa)
  savefig('test')

  Backend: GD, LOD 1, AA 1
  23.770u 0.030s 0:23.77 100.1%  0+0k 0+0io 793pf+0w
  Backend: GD, LOD 0, AA 1
  23.500u 0.020s 0:23.52 100.0%  0+0k 0+0io 793pf+0w
  Backend: GD, LOD 1, AA 0
  0.270u 0.000s 0:00.28 96.4%  0+0k 0+0io 794pf+0w
  Backend: GD, LOD 0, AA 0
  0.240u 0.030s 0:00.27 100.0%  0+0k 0+0io 794pf+0w

In other words, if you are using the new GD in its default
configuration, you are paying a *100-fold performance hit* for
antialiased line drawing. Without it, I can draw and save your figure
(including python startup time, etc) in 0.25s on a 2GHz Pentium 4. Is
this in the ballpark for you, performance wise?

While we're on the subject of performance, I took the opportunity to
test the other backends. Note the numbers are not strictly comparable
(discussed below) but are informative.

  Backend: Paint, LOD 0, AA 0
  0.520u 0.000s 0:00.52 100.0%  0+0k 0+0io 726pf+0w
  Backend: PS, LOD 0, AA 0
  1.030u 0.040s 0:01.08 99.0%  0+0k 0+0io 582pf+0w
  Backend: Agg, LOD 0, AA 0
  0.320u 0.010s 0:00.28 117.8%  0+0k 0+0io 681pf+0w
  Backend: GTK, LOD 0, AA 0
  0.650u 0.020s 0:00.66 101.5%  0+0k 0+0io 3031pf+0w

The GTK results are in xvfb, so it appears to be a no-go for you even
if we could figure out how to print to stdout.

These numbers are repeatable and consistent. Worthy of comment:

  * GD with antialiasing off wins
  * paint is not as fast as I hoped
  * GTK is not as fast as I thought
  * Agg is an interesting case. It is doing antialiased drawing
    despite the AA 0 flag because I haven't made this conditional in
    the backend; it currently draws antialiased unconditionally. But
    it hasn't implemented text yet. So it's not strictly comparable,
    but it is noteworthy that it is 100 times faster than GD at
    antialiased lines. It remains to be seen what speed we can get
    with plain vanilla aliased rendering.

My guess is: when you turn off antialiasing you'll be a whole lot
happier. Let me know.

The last thing I looked at was how the GD numbers scale with line
size. Below, N is the number of data points (with LOD false the
numbers are very close to these results where LOD is true).

  Backend: GD, LOD 1, AA 0, N 10000
  0.230u 0.040s 0:00.24 112.5%  0+0k 0+0io 794pf+0w
  Backend: GD, LOD 1, AA 0, N 20000
  0.260u 0.060s 0:00.31 103.2%  0+0k 0+0io 794pf+0w
  Backend: GD, LOD 1, AA 0, N 40000
  0.390u 0.030s 0:00.41 102.4%  0+0k 0+0io 794pf+0w
  Backend: GD, LOD 1, AA 0, N 80000
  0.590u 0.060s 0:00.60 108.3%  0+0k 0+0io 815pf+0w
  Backend: GD, LOD 1, AA 0, N 160000
  1.070u 0.090s 0:01.13 102.6%  0+0k 0+0io 818pf+0w

JDH |
From: Peter G. <pgr...@ge...> - 2004-02-12 19:16:02
|
John: Thanks very much for your investigative work.

> antialiased line drawing.  Without it, I can draw and save your figure
> (including python startup time, etc, etc) in 0.25s on a 2GHz Pentium
> 4.  Is this in the ballpark for you, performance wise?

yes.. yes.. yes..

> My guess is: when you turn off antialiasing you'll be a whole lot
> happier.  Let me know.

With antialiasing off, the performance is superb!  I plot 500,000
points in ~4-5 seconds.  The visual quality of the graphs is
(naturally) inferior to the antialiased counterparts, but the software
is now feasible for my purposes.  Just a couple more questions:

1) It seems that setting 'lod' to true does not improve performance?
I would imagine it should, because it limits the number of points
used.  What am I missing?

2) Is there any way to make the graphs look "prettier"?  They really
look quite OK, but in some cases having a little more detail would be
nice.  Is it possible to specify just how much antialiasing is needed?
Are there any other "visual enhancement options" that can be set and
will not impact performance too much?

3) When I do:

    plot1 = plot(arange(10000), arange(20000,30000))  # dummy data: 10,000 pairs
    lod, aa = False, False
    set(l, 'lod', lod, 'antialiased', aa)

do these options only apply to the current plot (i.e. plot1)?  Is it
possible to have a plot inside a plot, with one being antialiased and
the other one not?  Do I have to re-set them after I call savefig()?
(Will test this..)

I have been playing around with the dpi setting a little.  Is it
supposed to change the size of the image and/or the resolution?

Thanks again.

-- 
Peter Groszkowski            Gemini Observatory
Tel: +1 808 974-2509         670 N. A'ohoku Place
Fax: +1 808 935-9235         Hilo, Hawai'i 96720, USA
|
From: John H. <jdh...@ac...> - 2004-02-12 19:58:41
|
>>>>> "Peter" == Peter Groszkowski <pgr...@ge...> writes:

    Peter> With antialiasing off, the performance is superb!..  I plot
    Peter> 500,000 points in ~4-5 seconds..  The visual quality of the
    Peter> graphs is (naturally) inferior to the antialiased
    Peter> counterparts, but the software is now feasible for my
    Peter> purposes.

Glad to hear it.  The next big performance boost will come from some
frontend refactoring along the lines Perry discussed, but it's good
that it's already usable for you now.

    Peter> 1) Seems like setting 'lod' to true does not improve
    Peter> performance?  I would imagine it should, because it limits
    Peter> the number of points used.  What am I missing?

I'll look into this further.  In the special case of EEG (128 channels
plotted simultaneously over the same time axis), I do see significant
benefits, but there I cache the sampling indexes from one line to the
next.  It may be that for single lines, the time it takes to do the
subsampling balances the time it takes to plot them in a fast backend.

    Peter> 2) Is there any way to make the graphs look "prettier"?
    Peter> They really look quite OK, but in some cases having a
    Peter> little more detail would be nice.  Is it possible to
    Peter> specify just how much antialiasing is needed?  Are there
    Peter> any other "visual enhancement options" that can be set and
    Peter> will not impact performance too much?

Well, fortunately for you, I just finished the agg backend this
morning.  This backend draws antialiased lines as fast as GD draws
unaliased lines.  I still don't have support for turning off
antialiasing in agg, but it sounds like you want antialiasing anyway.
Also, agg doesn't suffer from a known color allocation and fill bug
that GD has.  See the install instructions at the end of this email.

    Peter> 3) When I do:
    Peter>
    Peter>     plot1 = plot(arange(10000), arange(20000,30000))
    Peter>     lod, aa = False, False
    Peter>     set(l, 'lod', lod, 'antialiased', aa)

This code isn't correct.  plot returns a list of lines, and the set
command should operate on that list of lines.
It applies only to the lines returned.  So *you can* apply
antialiasing to one set of lines and not another, in the same axes:

    lines1 = plot(arange(10000), arange(20000,30000))
    set(lines1, 'antialiased', False)

    lines2 = plot([1,2,3])  # a small plot
    set(lines2, 'antialiased', True)

Now lines1 is aliased and lines2 is antialiased.

    Peter> I have been playing around with the dpi setting a little.
    Peter> Is it supposed to change the size of the image and/or the
    Peter> resolution?

The figure size in pixels is determined by the figsize parameter and
dpi:

    width, height = figsize
    width  *= dpi
    height *= dpi

Everything scales with DPI: line width, text size, dash spacing, etc.
So the answer to your question is: both figure size and resolution
increase with dpi.  If you want to change the figure size without
changing the resolution, change the figsize argument to figure.

The agg backend

Warning: you will be the first agg crash-test dummy.  I just ran a
suite of examples across all backends and agg was the fastest -- it's
even faster than template, which does no rendering or file saving!
And in my opinion it also produced the highest quality output.

Features that are implemented:

  * capstyles and join styles
  * dashes
  * linewidth
  * lines, rectangles, ellipses, polygons
  * clipping to a rectangle
  * output to RGBA and PNG
  * alpha blending
  * DPI scaling (dashes, linewidths, fontsizes, etc)
  * freetype1

TODO:

  * use the ttf manager to get fonts -- right now I just use Vera

INSTALLING

Grab the latest matplotlib from

    http://nitace.bsd.uchicago.edu:8080/files/share/matplotlib-0.50l.tar.gz

REQUIREMENTS

    python 2.2+
    Numeric 22+
    agg2 (see below)
    freetype 1
    libpng
    libz ?
Install AGG2 (cut and paste the below into an xterm should work):

    wget http://www.antigrain.com/agg2.tar.gz
    tar xvfz agg2.tar.gz
    cd agg2
    make

(Optional) if you want to make the examples:

    cd examples/X11
    make

Installing backend_agg

Edit setup.py: change aggsrc to point to the agg2 src tree, and
replace "if 0:" with "if 1:" in the backend_agg section.  Then just do
the usual thing:

    python setup.py build

Please let me know if you encounter build problems, and tell me your
platform, gcc version, etc...  Currently the paths in setupext.py
assume a linux-like filesystem (eg the X11 include dir, the location
of libttf, etc.), so you may need to tweak these.  But if I recall
correctly, we're both on RHL9, so you shouldn't have a problem.

Using the agg backend:

    python somefile.py -dAgg

or

    import matplotlib
    matplotlib.use('Agg')

Let me know how it works out!  Note also that backend agg is the first
backend to support alpha blending; see scatter_demo2.py.

JDH
|
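[Editor's sketch: the figsize/dpi rule JDH describes above -- pixel
size is figsize in inches times dpi, and everything scales together --
can be written out as a tiny helper.  `pixel_size` is a name of my
choosing, not a matplotlib function.]

```python
def pixel_size(figsize, dpi):
    """Rendered image size in pixels for a figure of `figsize` inches at `dpi`.

    Doubling dpi doubles both dimensions (a larger image AND higher
    resolution); to grow the figure at a fixed resolution, change
    figsize instead.
    """
    width_in, height_in = figsize
    return int(width_in * dpi), int(height_in * dpi)

# e.g. an 8x6-inch figure at 100 dpi renders at 800x600 pixels
```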
From: Peter G. <pgr...@ge...> - 2004-02-12 20:16:27
|
>     Peter> 3) When I do:
>
>     plot1 = plot(arange(10000), arange(20000,30000))
>     lod, aa = False, False
>     set(l, 'lod', lod, 'antialiased', aa)
>
> This code isn't correct.  plot returns a list of lines.  The set

Of course.. I changed the 'l' to 'plot1' because I hate how my 'l's
look like '1's, but I didn't do the same in the 'set' command.  I
meant it though.. :)

>     lines1 = plot(arange(10000), arange(20000,30000))
>     set(lines1, 'antialiased', False)
>
>     lines2 = plot([1,2,3])  # a small plot
>     set(lines2, 'antialiased', True)
>
> Now lines1 is aliased and lines2 is antialiased.

Great!  This provides awesome flexibility!  Thanks for all the info.
I will get a usable-skeleton-proof-of-concept-type app going with GD
first, and once that is working, will experiment with agg.

Peter
|
From: Perry G. <pe...@st...> - 2004-02-12 19:44:48
|
Peter Groszkowski wrote:
>
> Yes I am only planning to work offline.  Want to be able to pipe the
> output images to stdout.  I am looking for the fastest solution
> possible.

Following up on this, I was curious what exactly you meant by this: a
stream of byte values in ascii separated by spaces, or the actual
binary bytes?  If the latter, it wouldn't appear to be difficult to
write a C extension to return the image as a string, but I'm figuring
there is more to it than that, since the representation of the image
structure can change from one backend to another.

Perry
|
From: Peter G. <pgr...@ge...> - 2004-02-12 20:05:33
|
Perry Greenfield wrote:

That was a response to:

> If you want to work off line (ie, draw to image only and not display
> to screen) and are on a linux box, you can do this with GTK and Xvfb

Perhaps it was a misinterpretation on my part.  All I meant was that I
do not need to look at the images through any of the standard tools
(ie, via show()).  For my purposes I need to write the image to stdout
(not to disk) in its binary form -- which is what I do now, after
modifying some stuff in backend_gd.py.

> Peter Groszkowski wrote:
>
>> Yes I am only planning to work offline.  Want to be able to pipe the
>> output images to stdout.  I am looking for the fastest solution
>> possible.
>
> Following up on this, I was curious what exactly you meant by this.
> A stream of byte values in ascii separated by spaces?  Or the actual
> binary bytes?  If the latter, it wouldn't appear to be difficult to
> write a C extension to return the image as a string, but I'm figuring
> there is more to it than that since the representations for the
> image structure can change from one backend to another.
>
> Perry
|