|
From: nertskull <ner...@gm...> - 2014-05-01 12:09:11
|
I am trying to create a multipage pdf of about 750 different graphs. Each graph has around 5,000-15,000 data points, giving me roughly 7 million points across the pdf. I make it as a large pdf with a page length of about 20 inches and plot about 10 graphs to a page, so I end up with basically 75 pages in my pdf. I'm basically trying to graph a line of XY data points.

The problem is that the pdf is unbearably slow when plotting as a scatter plot or as a line with markers. If I make a regular line plot, with no markers, just a single line, it is plotted and the pdf is fine. But then it connects my points, which I don't want.

I assume this is all because it's making the pdf in vector format. When I convert it to single lines, I only have ~750 line vectors. But when I try to scatter plot, or line plot with markers, I end up with millions of vectors.

I've tried 'rasterized=True' and that definitely works, but the quality is really bad. I need to be able to zoom in close on the pdf and still see rough resolution of the points. For clarity, I don't actually need to see each individual point. The graphs have two lines on them, and I just need to be able to distinguish between the two lines; the two lines are just made up of thousands of points each.

Is there any way to keep scalable vectors and do this? Or will I just be forced to go to a rasterized image file in order to load the pdf in a reasonable time?

Thanks.

--
View this message in context: http://matplotlib.1069221.n5.nabble.com/Millions-of-data-points-saved-to-pdf-tp43338.html
Sent from the matplotlib - users mailing list archive at Nabble.com.
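[Editor's note: one middle ground worth knowing about here, sketched below rather than taken from the thread, is that `rasterized=True` only rasterizes the artist it is applied to (axes, ticks and text stay as vectors), and the `dpi` argument to `savefig` controls the resolution of that rasterized layer. A higher dpi can make the rasterized points hold up much better under zoom. The data and file name are illustrative.]

```python
import os
import tempfile

import matplotlib
matplotlib.use('Agg')          # headless backend, suitable for scripts
import matplotlib.pyplot as plt
import numpy as np

x = np.linspace(0, 1, 5000)
y = np.sin(40 * x) + 0.05 * np.random.randn(5000)

fig, ax = plt.subplots()
# rasterized=True embeds only this artist as an image inside the pdf;
# the surrounding axes remain vector graphics.  The dpi passed to
# savefig sets the resolution of the rasterized layer, so 600 dpi
# keeps zoomed-in views far sharper than the default.
ax.plot(x, y, '.', markersize=1, rasterized=True)

out = os.path.join(tempfile.mkdtemp(), 'demo.pdf')
fig.savefig(out, dpi=600)
size = os.path.getsize(out)
```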
|
From: Alan G I. <ala...@gm...> - 2014-05-01 12:41:11
|
Suppose each data point is only 1 point (1/72 ") in diameter. A solid line across a 20" page is less than 1500 points. You're using a fraction of a page per graph and trying to plot 5,000-15,000 points per graph. This is pointless (pun intended) for visual display, especially since you do not care about the individual points. What happens if you decimate the points? Is the result acceptable? Perhaps you could do even better than that, given your posted description. Fit a line to the points, and only plot the fitted line. Or use something like `hexbin`. Alan Isaac |
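[Editor's note: the decimation Alan suggests is just a stride slice; a minimal sketch with made-up names and data follows.]

```python
def decimate(xs, ys, step=10):
    """Keep every `step`-th point (simple uniform decimation)."""
    return xs[::step], ys[::step]

x = list(range(10000))
y = [v * 2 for v in x]

# 10,000 points reduced to 1,000 -- usually indistinguishable on paper
xd, yd = decimate(x, y, step=10)
```

The caveat raised later in the thread applies: plain every-Nth decimation can drop isolated outliers and very short features.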
|
From: Shahar S. K. <ka...@po...> - 2014-05-01 12:48:54
|
How about different line styles or colors instead of markers?
|
From: Shahar S. K. <ka...@po...> - 2014-05-01 13:08:44
|
What do you consider a gap? Perhaps, if you know that, you can find those in your data; and if you really want to visualize the gaps, plot those instead of the data.
|
From: Dominik K. <dk...@as...> - 2014-05-01 13:22:18
|
Hi,

when reading the number of points you have in each plot, I have to ask why you need so many (plotted) data points. If you plot e.g. every 10th or 50th data point, you reduce the number of points by a factor of 10 (or 50). This should make the PDF smaller and faster, and even if you zoom into each plot you should still be able to see enough detail (of course, if there are one or two outliers you might not see them). And you are probably not able to distinguish between two data points that are too close to each other anyway, so you probably don't need every data point.

Cheers,
Dominik

--
Dominik Klaes
Argelander-Institut für Astronomie, Bonn
|
From: nertskull <ner...@gm...> - 2014-05-01 13:28:17
|
No, we definitely aren't really interested in the gaps. Gaps are just where we were unable to collect the data.

I don't know if we can attach pictures to this thread or not, but I'm going to try. The attached is roughly what I want, but with all 750 as vectors. I want to see the 'movement' of the line, but I need the gaps to remain, so I know where they are.

The problem with plotting a reduced data set is that I lose some of the very small sections of line. I'll play around with that idea, but we want to be able to zoom in on a vector file and see the tiny areas of fewer than 10 points that would be lost if we plot a reduced data set.

But it sounds like this is unlikely to work in vector graphics form. It's just too much to do without reducing the dataset.

<http://matplotlib.1069221.n5.nabble.com/file/n43344/figure_1.png>
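[Editor's note: matplotlib breaks a line at NaN values, which fits this exact requirement: insert NaNs at the gaps and plot with '-' only, so the gaps stay visible without any markers. A sketch, with an assumed gap threshold `max_dx` and made-up helper name:]

```python
import math

def with_gaps(x, y, max_dx=1.0):
    """Insert a NaN wherever consecutive x values are farther apart
    than max_dx, so ax.plot(x, y, '-') leaves a visible gap instead
    of connecting across missing data."""
    xo, yo = [x[0]], [y[0]]
    for i in range(1, len(x)):
        if x[i] - x[i - 1] > max_dx:
            xo.append(float('nan'))
            yo.append(float('nan'))
        xo.append(x[i])
        yo.append(y[i])
    return xo, yo

x = [0, 1, 2, 10, 11]
y = [5, 5, 5, 7, 7]
xg, yg = with_gaps(x, y)
# ax.plot(xg, yg, '-') would now draw two separate segments
```

This keeps the output a pure vector line plot (a handful of moveto/lineto commands per graph), avoiding markers entirely.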
|
From: Benjamin R. <ben...@ou...> - 2014-05-01 13:35:49
|
This makes me wonder if you would be better served with something like bokeh: http://bokeh.pydata.org/

Cheers!
Ben Root
|
From: Jouni K. S. <jk...@ik...> - 2014-05-01 17:19:03
|
nertskull <ner...@gm...> writes:
> The problem, is the pdf is unbearably slow when plotting as a scatter plot
> or as a line with markers.
>
> If I make a regular line plot, with no markers, just a single line, it is
> plotted and the pdf is fine. But then it connects my points which I don't
> want.
Others have commented on the volume of data, but that paragraph makes
me curious: are you saying that the results are acceptable if you do
something like
plot(x, y, '-')
but not if you do
plot(x, y, 'o') or plot(x, y, '-o')?
The amount of data in the pdf file should be within a constant factor in
all cases, but in the '-' case there are only moveto and lineto commands,
while the two other cases render markers as something called an XObject,
which is repeated a lot of times on the page. I wonder if the overhead
from using an XObject is making the rendering application slow.
Does it help at all to use a simpler marker, e.g. plot(x, y, ',')? One
change you could try if you're feeling adventurous is the following
function in lib/matplotlib/backends/backend_pdf.py:
def draw_markers(self, gc, marker_path, marker_trans, path, trans,
                 rgbFace=None):
    # For simple paths or small numbers of markers, don't bother
    # making an XObject
    if len(path) * len(marker_path) <= 10:
        RendererBase.draw_markers(self, gc, marker_path, marker_trans,
                                  path, trans, rgbFace)
        return
    # ...
The comment is not quite right: only if the path is short *and* the
number of markers is small does the XObject code get skipped. You could
just change the if statement to "if True:" and rerun your code (possibly
with the ',' marker style). If that helps, it's evidence that we need to
revisit the condition for using XObjects for markers.
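[Editor's note: for anyone who would rather not edit the installed backend_pdf.py, the same experiment can be tried with a monkeypatch. This is only a sketch (the function name is made up), forcing the pdf renderer onto the generic, non-XObject marker path; it assumes the `RendererPdf` class and the `RendererBase.draw_markers` signature shown above.]

```python
from matplotlib.backend_bases import RendererBase
from matplotlib.backends import backend_pdf

def draw_markers_no_xobject(self, gc, marker_path, marker_trans,
                            path, trans, rgbFace=None):
    # Always fall back to the generic implementation, which emits each
    # marker as part of the page content instead of an XObject.
    RendererBase.draw_markers(self, gc, marker_path, marker_trans,
                              path, trans, rgbFace)

# Patch before saving the figure to pdf.
backend_pdf.RendererPdf.draw_markers = draw_markers_no_xobject
```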
--
Jouni K. Seppänen
http://www.iki.fi/jks
|
|
From: nertskull <ner...@gm...> - 2014-05-01 17:50:44
|
That definitely helps. Here's what I did.

First: yeah, the results are totally acceptable if I do '-' as my line/marker. The pdf renders and loads just fine. If I do 'o' or even ',' as my marker, then the pdf is horrendously slow. I'm talking minutes to render a page.

So I tried your idea of altering the backend. If I change that line to "if True:" then I get MUCH better results. But I also get enormous file sizes.

I've taken a subset of 10 of my 750 graphs. Those 10, before changing the backend, would make file sizes of about 290KiB. After changing the backend, if I use plot(x, y, '-') I still get a file size of about 290KiB. But after changing the backend, if I use plot(x, y, '.') for my markers, my file size is now 21+ MB, just for 10 of my graphs. I'm afraid making all 750 in the same pdf may be impossible at that size.

BUT, at least now I can render those 10 in vector format. Before, it took the pdf minutes to load a page; now it only takes maybe 15-20 seconds to load a page of 10 graphs. So that definitely helped. Thanks!

Is there any way to do this even better? At this rate I'd have to split my pdf into multiple chunks, and it really isn't ideal to have to send people 70 pdf files. Is there any way to have reasonable pdf sizes as well as this improved performance, while keeping them in vector format?

Thanks again.
|
From: Daniele N. <da...@gr...> - 2014-05-02 12:25:45
|
On 01/05/2014 19:50, nertskull wrote:
> Is there any way to have reasonable pdf sizes as well as this improved
> performance for keeping them in vector format?

As others have tried to explain, plotting that many points in a single plot does not make much sense. The only thing that makes sense is to down-sample your data to a manageable size. Depending on which features of your data you are interested in, there are different methods for doing that.

PS: which viewer are you using to render the PDF? I believe different renderers may have substantially different performance on such PDFs...

Cheers,
Daniele
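[Editor's note: down-sampling can preserve the small features nertskull worries about if done with a min/max envelope per bucket rather than plain every-Nth decimation: spikes survive because each bucket contributes its extremes. A sketch with a made-up helper name:]

```python
def minmax_downsample(y, bucket=50):
    """Collapse each bucket of samples to its (min, max) pair,
    preserving spikes that every-Nth decimation would drop."""
    out = []
    for i in range(0, len(y), bucket):
        chunk = y[i:i + bucket]
        out.append(min(chunk))
        out.append(max(chunk))
    return out

y = [0.0] * 1000
y[137] = 9.9                       # a one-sample spike
ys = minmax_downsample(y, bucket=50)
# 1000 samples -> 40, and the spike survives
```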
|
From: Jouni K. S. <jk...@ik...> - 2014-05-02 15:54:23
|
nertskull <ner...@gm...> writes:
> If I change that line to "if True:" then I get MUCH better results.
> But I also get enormous file sizes.
That's interesting! It means that your pdf viewing program (which one,
by the way? Adobe Reader or some alternative?) is slow at compositing a
large number of prerendered markers, or perhaps it just renders each of
them again and again instead of prerendering, and does so more slowly
than if they were part of the same path.
> I've taken a subset of 10 of my 750 graphs.
>
> Those 10, before changing the backend, would make file sizes of about
> 290KiB. After changing the backend, if I use plot(x, y, '-') I still
> get a file size about 290KiB.
>
> But after changing the backend, if I use plot(x, y, '.') for my markers,
> my file size is now 21+ MB. Just for 10 of my graphs. I'm afraid making
> all 750 in the same pdf may be impossible at those size.
Does using ',' (comma) instead of '.' (full stop) as the marker help? I
think the '.' marker is a circle, just at a small size, while the ','
marker is just two very short lines in the pdf backend. If the ','
marker produces an acceptable file size but its shape is not good
enough, we could experiment with creating a marker of intermediate
complexity.
One thing that I never thought about much is the precision in the
numbers the pdf backend outputs in the file. It seems that they are
being output with a fixed precision of ten digits after the decimal
point, which is probably overkill. There is currently no way to change
this except by editing the source code - the critical line is
r = ("%.10f" % obj).encode('ascii')
where 10 is the number of digits used. The same precision is used for
all floating-point numbers, including various transformation matrices,
so I can't offer a simple rule for how large deviations you will cause
by reducing the precision - you could experiment by making one figure
with the existing code and another with '%.3f', and see if the latter
looks good enough at the kind of zoom levels you are going to use (and
if it really reduces the file size much - there's a compression layer on
top of the ASCII representation).
That reminds me: one thing that could have an effect is the
pdf.compression setting, which defaults to 6 but you can set it to 9
to make the compressed size a little bit smaller, at the expense of
spending more time when writing the file. That's not going to be a major
difference, though.
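[Editor's note: the pdf.compression setting Jouni mentions can be changed per-script through rcParams rather than an rc file; a minimal sketch:]

```python
import matplotlib

# 0 disables compression, 6 is the default, 9 is smallest but slowest.
matplotlib.rcParams['pdf.compression'] = 9
level = matplotlib.rcParams['pdf.compression']
```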
> Is there anyway to have reasonable pdf sizes as well as this improved
> performance for keeping them in vector format?
Like others have recommended, rendering huge clouds of single points is
a problematic task. I think it's an entirely valid thing to ask for, but
it's not likely that there will be a perfect solution, and some other
way of visualizing the data may be needed. Bokeh (suggested by Benjamin
Root) looks like something that could fit your needs better than a pdf
file in a viewer.
--
Jouni K. Seppänen
http://www.iki.fi/jks
|
|
From: <cl...@br...> - 2014-05-02 17:05:29
|
Dear colleagues,
I had a similar issue with a large plot and several thousand elements,
printed under Linux with the Qt4Agg back-end. With the PDF renderer I got
some vector overlay and distortion of markers in the drawing, so I changed
the plotting output into a two-step process: first generating a high
resolution ".png" file, and then using the Python Imaging Library to
compress it into a much smaller .jpeg output, which produces a browser
friendly file or an input source for PDF editors like OpenOffice.
Source:
from PIL import Image  # works with both classic PIL and Pillow
# figure size in inches; the PNG is rendered at dpi_resolution and
# later resized to 16000 x 12000 pixels
w = 80
h = 60
#
dpi_resolution = 400
fig.set_size_inches(w, h)
DPI = fig.get_dpi()
print("DPI:", DPI)
Size = fig.get_size_inches()
print("Size in inches:", Size)
myformats = plt.gcf().canvas.get_supported_filetypes()
print("Supported formats are: " + str(myformats))
mybackend = plt.get_backend()
print("Backend used is: " + str(mybackend))
# save a high-resolution screen copy
fig.savefig('myplot.png', format='png', dpi=dpi_resolution)
# JPEG compression with quality of 10
myimage = Image.open('myplot.png')
myimage = myimage.resize((16000, 12000), Image.ANTIALIAS)
# quality = 10% .. very high compression with few blurs
quality_val = 10
myimage.save('myplot.jpg', 'JPEG', quality=quality_val)
The visual result looks acceptable, with no distortion. This process gives
you some control over compression and quality.
Hope this is useful.
Regards,
Claude
Claude Falbriard
Certified IT Specialist L2 - Middleware
AMS Hortolândia / SP - Brazil
phone: +55 13 9 9760 0453
cell: +55 13 9 8117 3316
e-mail: cl...@br...
|