From: <gn...@di...> - 2012-09-22 23:32:14
Attachments:
x11.fixedaspectratio.patch
|
Hi all. I tackled the long-standing issue of the x11 terminal not respecting the requested plot aspect ratio. There have been many bugs about this on the tracker. The main one appears to be

http://sourceforge.net/tracker/index.php?func=detail&aid=3331162&group_id=2055&atid=102055

I have a branch that handles this issue similarly to the way the wxt terminal does:

- the inboard driver computes particular x,y scale factors
- the terminal ALWAYS respects this aspect ratio
- resizing the window does NOT touch the aspect ratio
- a replot is required to re-compute the scale factors

Similar to the qt terminal, I added a replot-on-resize option to make things 'just work', at the expense of some extra cpu cycles.

The code works for my test cases, but I had to touch enough stuff to make me concerned about cases that I missed. How are such changes tested, usually? I didn't see a test suite.

The code is in a git repo at

https://github.com/dkogan/gnuplot

I'm also attaching a patch with a diff from the latest code in CVS, as of Sep 2012.

As an aside, the main reason I'm touching the x11 terminal at all (instead of just moving to wxt or qt) is that it appears to be much faster than the newer terminals. In particular, if you have a 3d plot with lots of points, interactively rotating the data is noticeably snappier with x11. Is this expected?

dima |
From: Jonathan T. <jt...@as...> - 2012-09-23 02:27:19
|
On Sat, 22 Sep 2012, gn...@di... wrote:
> As an aside, the main reason I'm touching the x11 terminal at all (instead of just moving to wxt or qt) is that it appears to be much faster than the newer terminals. Particularly, if you have a 3d plot with lots of points, interactively rotating the data is noticeably much snappier with x11.

Another advantage of the x11 terminal is that (in my experience on various slightly unusual Unix-flavored systems) it's often easier to build gnuplot with x11 terminal support than with wxt or qt support.

ciao,

--
-- "Jonathan Thornburg [remove -animal to reply]" <jt...@as...>
   Dept of Astronomy & IUCSS, Indiana University, Bloomington, Indiana, USA
   on sabbatical in Canada starting August 2012
   "Washing one's hands of the conflict between the powerful and the powerless means to side with the powerful, not to be neutral." -- quote by Freire / poster by Oxfam |
From: Daniel J S. <dan...@ie...> - 2012-09-23 03:48:24
|
On 09/22/2012 06:14 PM, gn...@di... wrote:
> Hi all.
>
> I tackled the long-standing issue of the x11 terminal not respecting the requested plot aspect ratio. There have been many bugs about this on the tracker. The main one appears to be http://sourceforge.net/tracker/index.php?func=detail&aid=3331162&group_id=2055&atid=102055
>
> I have a branch that handles this issue similar to the way the wxt terminal does:
> - inboard driver computes a particular x,y scale factors
> - terminal ALWAYS respects this aspect ratio
> - resizing the window does NOT touch the aspect ratio
> - a replot is required to re-compute the scale factors
>
> Similar to the qt terminal, I added a replot-on-resize option to make things 'just work', at the expense of some extra cpu cycles.
>
> The code works for my test cases, but I had to touch enough stuff to make me concerned about cases that I missed. How are such changes tested, usually? I didn't see a test suite.
>
> The code is in a git repo at https://github.com/dkogan/gnuplot
>
> I'm also attaching a patch for a diff from the latest code in CVS, as of Sep 2012.
>
> As an aside, the main reason I'm touching the x11 terminal at all (instead of just moving to wxt or qt) is that it appears to be much faster than the newer terminals. Particularly, if you have a 3d plot with lots of points, interactively rotating the data is noticeably much snappier with x11. Is this expected?

I'm not sure many of us here know what to expect, given limited experience with Qt. I thought I had read somewhere that Qt can be tweaked to refresh at a faster rate. I suspect too that the Qt terminal is doing much more internally, such as anti-aliasing and/or alpha scaling. The x11 terminal is really efficient code, but has limited features on account of the limited nature of X terminals.

Anyway, I think x11 is still something good to have around. Some systems might not have Qt installed for whatever reason.

Dan |
From: <pl...@pi...> - 2012-09-23 09:17:47
|
On 09/23/12 01:14, gn...@di... wrote:
> As an aside, the main reason I'm touching the x11 terminal at all (instead of just moving to wxt or qt) is that it appears to be much faster than the newer terminals. Particularly, if you have a 3d plot with lots of points, interactively rotating the data is noticeably much snappier with x11. Is this expected?
>
> dima

I'll have to try x11 again; I've been doing some 3D plots this week, and smoother, faster rotation would be nice. I'm in the habit of using wxt since it looks a little less "bare-bones" than x11 and the button functions are useful.

Since both qt and wxt end up driving x11 windows, it would be expected that working directly with x11 would be more efficient.

Thanks for the patch.

Peter. |
From: sfeam (E. Merritt) <eam...@gm...> - 2012-09-23 18:04:33
|
On Saturday, 22 September 2012, gn...@di... wrote:
> Hi all.
>
> I tackled the long-standing issue of the x11 terminal not respecting the requested plot aspect ratio.

That's great. As you say, it has been a very long-standing request.

> I have a branch that handles this issue similar to the way the wxt terminal does:
> - inboard driver computes a particular x,y scale factors
> - terminal ALWAYS respects this aspect ratio
> - resizing the window does NOT touch the aspect ratio
> - a replot is required to re-compute the scale factors
>
> Similar to the qt terminal, I added a replot-on-resize option to make things 'just work', at the expense of some extra cpu cycles.
>
> The code works for my test cases, but I had to touch enough stuff to make me concerned about cases that I missed. How are such changes tested, usually? I didn't see a test suite.

Several demos rely on "set size square", notably including poldat.dem and polar.dem. These work well with your patch.

We don't seem to have a demo that exercises "set view equal xyz". That would be a nice addition.

It's hard to make a test suite for manual interaction with the terminal window. What did you have in mind?

> The code is in a git repo at https://github.com/dkogan/gnuplot
>
> I'm also attaching a patch for a diff from the latest code in CVS, as of Sep 2012.
>
> As an aside, the main reason I'm touching the x11 terminal at all (instead of just moving to wxt or qt) is that it appears to be much faster than the newer terminals. Particularly, if you have a 3d plot with lots of points, interactively rotating the data is noticeably much snappier with x11. Is this expected?

I suppose so. The x11 output goes through fewer layers of processing and buffer transfer. Also it isn't being antialiased or oversampled. Still, for plots containing only points and lines I find the 3D response in both wxt and qt to be perfectly acceptable.

Both the wxt and qt terminals allow you to disable antialiasing and oversampling; that may or may not make a noticeable difference to your particular plots. 3D image plots are another story, slower, particularly in qt.

Note the following caveat about the speed of qt rendering [from "help set term qt"]:

    The Qt rendering speed is affected strongly by the rendering mode used.
    In Qt version 4.7 or newer this can be controlled by the environment
    variable QT_GRAPHICSSYSTEM. The options are "native", "raster", or
    "opengl" in order of increasing rendering speed. For earlier versions
    of Qt the terminal defaults to "raster". Note that there may be
    additional platform-dependent rendering options (e.g. "openvg" on
    symbian).

Ethan |
From: Dima K. <gn...@di...> - 2012-09-23 21:31:14
|
> On Sun, 23 Sep 2012 11:04:18 -0700
> "sfeam (Ethan Merritt)" <eam...@gm...> wrote:
>
> Several demos rely on "set size square", notably including poldat.dem and polar.dem. These work well with your patch.
>
> We don't seem to have a demo that exercises "set view equal xyz". That would be a nice addition.
>
> It's hard to make a test suite for manual interaction with the terminal window. What did you have in mind?

It would be a large undertaking to write a reliable tester that actually looks at the generated output, so I'm not suggesting anything in particular.

I did manually test:
- 2d, 3d plots
- pressing '7' and 'e' multiple times
- resizing windows
- starting with various aspect ratio settings
- all of the above in various combinations

I did NOT test:
- multiplots
- switching terminals (i.e. making sure that nothing blows up when you set term pdf, and then set term x11 again)
- fancier x11 features, such as plotting into embedded windows
- any OS other than amd64 Debian
- any window manager other than ion3

These things are complicated, so subtle regressions may have been introduced. I built a package with the patch, and will use it as my everyday plotter for a while to at least exercise the common cases. If some people on this list try it out as well, that'd be great.

> > As an aside, the main reason I'm touching the x11 terminal at all (instead of just moving to wxt or qt) is that it appears to be much faster than the newer terminals. Particularly, if you have a 3d plot with lots of points, interactively rotating the data is noticeably much snappier with x11. Is this expected?
>
> I suppose so. The x11 output goes through fewer layers of processing and buffer transfer. Also it isn't being antialiased or oversampled. Still, for plots containing only points and lines I find the 3D response in both wxt and qt to be perfectly acceptable.
>
> Both the wxt and qt terminals allow you to disable antialiasing and oversampling; that may or may not make a noticeable difference to your particular plots. 3D image plots are another story, slower, particularly in qt.
>
> Note the following caveat about the speed of qt rendering [from "help set term qt"]:
>
>     The Qt rendering speed is affected strongly by the rendering mode used.
>     In Qt version 4.7 or newer this can be controlled by the environment
>     variable QT_GRAPHICSSYSTEM. The options are "native", "raster", or
>     "opengl" in order of increasing rendering speed. For earlier versions
>     of Qt the terminal defaults to "raster".

When I did these tests a few months ago, x11 was by far the most responsive, even after turning off all fancy rendering options in the newer terminals (antialiasing and such). I'm perfectly happy to just use the x11 terminal, so at least for me these aren't major issues.

dima |
From: sfeam (E. Merritt) <eam...@gm...> - 2012-09-26 04:32:07
|
On Saturday, 22 September 2012, Dima Kogan <gn...@di...> wrote:
> I tackled the long-standing issue of the x11 terminal not respecting the requested plot aspect ratio. There have been many bugs about this on the tracker. The main one appears to be http://sourceforge.net/tracker/index.php?func=detail&aid=3331162&group_id=2055&atid=102055
>
> I have a branch that handles this issue similar to the way the wxt terminal does:
> - inboard driver computes a particular x,y scale factors
> - terminal ALWAYS respects this aspect ratio
> - resizing the window does NOT touch the aspect ratio
> - a replot is required to re-compute the scale factors
>
> Similar to the qt terminal, I added a replot-on-resize option to make things 'just work', at the expense of some extra cpu cycles.
>
> The code works for my test cases, but I had to touch enough stuff to make me concerned about cases that I missed. How are such changes tested, usually? I didn't see a test suite.

Hi Dima.

I've applied your patch to CVS in the development branch after review and testing. This is a great addition. Thanks so much!

I made some trivial code changes to conform to coding style and wrapped the relevant sections with #ifdef X11 so that it does not affect builds for non-X11 systems. Also I flipped the default to "replotonresize", because at least for me the speed penalty on resizing is negligible, while the benefit is clear.

The only actual logic change I made was to restore the previous on_event() code for terminals other than x11. I.e., it now reads

    #ifdef X11
    if (!strcmp(term->name,"x11")) {
        /* New code */
    } else
    #endif
        /* Old code */

Now to go close some old, old feature requests and bug tracker entries.

Ethan |
From: Dima K. <gn...@di...> - 2012-09-27 08:40:27
Attachments:
x11_larger_points.patch
|
> On Tue, 25 Sep 2012 21:31:55 -0700
> "sfeam (Ethan Merritt)" <eam...@gm...> wrote:
>
> > On Saturday, 22 September 2012, Dima Kogan <gn...@di...> wrote:
> > > I tackled the long-standing issue of the x11 terminal not respecting the requested plot aspect ratio
> > <snip>
>
> Hi Dima.
>
> I've applied your patch to CVS in the development branch after review and testing. <snip>

Hi Ethan. Thanks for applying the patch. The main difference I noticed so far is that the point sizes appear smaller in the x11 terminal than they were before this patch.

I'm attaching another patch that tries to handle the point sizes in the same way as the wxt and qt terminals do. After this, the points look similar to the way they looked before.

dima |
From: Ethan A M. <sf...@us...> - 2012-09-27 18:43:46
|
On Thursday, September 27, 2012 01:40:18 am Dima Kogan wrote:
> > On Tue, 25 Sep 2012 21:31:55 -0700
> > "sfeam (Ethan Merritt)" <eam...@gm...> wrote:
> >
> > I've applied your patch to CVS in the development branch after review and testing. <snip>
>
> Hi Ethan. Thanks for applying the patch. Main difference I noticed so far is that the point sizes appear smaller in the x11 terminal than they were before this patch.

I have found another problem. I suspect it may be an integer overflow, but if so I can't figure out where. Here's a simple recipe to show the problem:

    gnuplot> set term x11
    gnuplot> set sample 15
    gnuplot> plot x with points pt 5

1) Adjust the X-window so that the plot is very tall.
2) Gradually make the plot window narrower and narrower. At some point the top border of the plot stops being correctly drawn near the top of the window; instead it is drawn near the bottom.

    gnuplot> show var GPVAL_TERM
    GPVAL_TERM_XMIN = 473
    GPVAL_TERM_XMAX = 3837
    GPVAL_TERM_YMIN = 280
    GPVAL_TERM_YMAX = 10072
    GPVAL_TERM_XSIZE = 4096
    GPVAL_TERM_YSIZE = 10245

The problem seems to trigger when YMAX crosses exactly 10000, but I don't see any relevant use of 9999 or 10000 as a constant in the code. Perhaps it's a formatted field overflow from 4 digits to 5 in the y coordinate?

> I'm attaching another patch that tries to handle the point sizes in the same way as the wxt and qt terminals do. After this, the points look similar to the way they looked before.

OK, thanks. But please have a look at the problem described above.

Ethan |
From: Ethan M. <merritt@u.washington.edu> - 2012-09-27 19:06:40
|
On Thursday, September 27, 2012 11:41:50 am Ethan A Merritt wrote:
> On Thursday, September 27, 2012 01:40:18 am Dima Kogan wrote:
> > Hi Ethan. Thanks for applying the patch. Main difference I noticed so far is that the point sizes appear smaller in the x11 terminal than they were before this patch.
>
> I have found another problem. I suspect it may be an integer overflow, but if so I can't figure out where. Here's a simple recipe to show the problem:
>
>     gnuplot> set term x11
>     gnuplot> set sample 15
>     gnuplot> plot x with points pt 5
>
> 1) Adjust the X-window so that the plot is very tall.
> 2) Gradually make the plot window narrower and narrower. At some point the top border of the plot stops being correctly drawn near the top of the window; instead it is drawn near the bottom.
>
>     gnuplot> show var GPVAL_TERM
>     GPVAL_TERM_XMIN = 473
>     GPVAL_TERM_XMAX = 3837
>     GPVAL_TERM_YMIN = 280
>     GPVAL_TERM_YMAX = 10072
>     GPVAL_TERM_XSIZE = 4096
>     GPVAL_TERM_YSIZE = 10245
>
> The problem seems to trigger when YMAX crosses exactly 10000, but I don't see any relevant use of 9999 or 10000 as a constant in the code. Perhaps it's a formatted field overflow from 4 digits to 5 in the y coordinate?

I think that must be it.

The coordinates sent from x11.trm to gnuplot_x11 use format statements like

    PRINT2("V%04d%04d\n", x, y);

It looks like either the field width has to be increased everywhere to 5 digits, or else the new rescaling code needs to be rethought (perhaps by adjusting both xsize and ysize rather than only adjusting ysize?). Increasing the field width has the drawback that more bytes will be sent over the channel regardless of the window size, which can slow down the communication.

What to do?

Ethan

--
Ethan A Merritt
Biomolecular Structure Center, K-428 Health Sciences Bldg
University of Washington, Seattle 98195-7742 |
From: Dima K. <gn...@di...> - 2012-09-28 08:35:20
Attachments:
x11_comm_separated_scanf.patch
|
> On Thu, 27 Sep 2012 12:00:10 -0700
> Ethan Merritt <merritt@u.washington.edu> wrote:
>
> > The problem seems to trigger when YMAX crosses exactly 10000, but I don't see any relevant use of 9999 or 10000 as a constant in the code. Perhaps it's a formatted field overflow from 4 digits to 5 in the y coordinate?
>
> I think that must be it. The coordinates sent from x11.trm to gnuplot_x11 use format statements like PRINT2("V%04d%04d\n", x, y);
>
> It looks like either the field width has to be increased everywhere to 5 digits, or else the new rescaling code needs to be rethought (perhaps by adjusting both xsize and ysize rather than only adjusting ysize?). Increasing the field width has the drawback that more bytes will be sent over the channel regardless of the window size, which can slow down the communication.
>
> What to do?
>
> Ethan

Hi Ethan.

Good that you found the issue. Do you know why field widths are specified at all? Why don't we PRINT2("V%d %d\n") and then sscanf("%d %d", ...) on the other side? No efficiency is gained by specifying the widths; it only builds in limitations and makes the code brittle, as we have just observed.

I'm attaching a patch that removes hardcoded field sizes from ALL %d and %u fields. Some extra care had to be taken in places where a numerical field is followed by a string (%s), since the '4' was hardcoded in those places too. The bug you described is now gone, and the code is more robust.

I noticed another, very related issue. x11.trm has

    /* badly outrange labels can overflow into text field */
    if (x < 10000 && y < 10000) {
        PRINT3("T%d %d %s\n", x, y, str);
    }

Here we're hardcoding the 10000 again. This has the effect of not printing the legend label when the window is skinny (as you were making it). This test needs to be adjusted or removed entirely; not sure which. CVS says this test was added by lhecking in 1998. This predates all of sourceforge. Newsgroup logs go back that far, but I didn't see any mention of this issue. Thoughts?

Finally, in writing the patch, I stumbled on what looks like a bug in what looks like dead code. In gplt_x11.c, look at the

    else if (*buffer == X11_GR_FILLED_POLYGON) { /* filled polygon */
        ...
    }

block. Look at where the ptr is updated (the ptr += ... lines). I suspect the 2nd update of this pointer (the one not in an if() or a while()) was meant to happen only if the first one didn't happen. Otherwise we're skipping over data we haven't looked at yet. The attached patch could be wrong here, depending on what is actually meant.

In any case, I can't find anything in the inboard driver that sends X11_GR_FILLED_POLYGON commands. None of the demos trigger this block either. I suspect the whole block is dead, and this bug is thus never hit. If this whole block is truly dead, it should be removed. If this actually IS called somehow, I'd like to know from where, to make sure the attached patch is good.

dima |
From: Ethan A M. <sf...@us...> - 2012-09-28 16:40:38
|
On Friday, September 28, 2012 01:35:07 am Dima Kogan wrote:
> Good you found the issue. Do you know why field widths are specified at all? Why don't we PRINT2("V%d %d\n") and then sscanf("%d %d", ...) on the other side? No efficiency is gained by specifying the widths, it only builds in limitations and makes the code brittle, as we have just observed.

The basic X11 framework was in place long before my involvement, so I do not know. My best guess is that the fixed field width was chosen so as to reduce the amount of data sent through the gnuplot->gnuplot_x11 channel. Adding a space in front of every coordinate pair would increase the traffic by as much as 25%.

As I recall, when the polygon encoding was switched from formatted to binary (2004), the advocates of binary encoding/decoding had benchmarks showing data transfer really was a performance bottleneck, and reducing the number of bytes sent over the channel resulted in faster plotting. I do not know if this would still be true on modern machines.

We should benchmark before and after your latest patch.

> I'm attaching a patch that removes hardcoded field sizes from ALL %d and %u fields. Some extra care had to be taken in places where a numerical field is followed by a string (%s), since the '4' was hardcoded in those places too. The bug you described is now gone, and the code is more robust.
>
> I noticed another, very related issue. x11.trm has
>
>     /* badly outrange labels can overflow into text field */
>     if (x < 10000 && y < 10000) {
>         PRINT3("T%d %d %s\n", x, y, str);
>     }
>
> Here we're hardcoding the 10000 again. This has the effect of not printing the legend label when the window is skinny (as you were making it). This test needs to be adjusted or removed entirely; not sure which. CVS says this test was added by lhecking in 1998. This predates all of sourceforge. Newsgroup logs go back that far, but I didn't see any mention of this issue. Thoughts?

That does look like an early instance of the same problem. But I wonder how that code ever triggered?

> Finally, in writing the patch, I stumbled on what looks like a bug in what looks like dead code. In gplt_x11.c, look at the
>
>     else if (*buffer == X11_GR_FILLED_POLYGON) { /* filled
> [snip]
>
> I suspect the whole block is dead, and this bug is thus never hit. If this whole block is truly dead, it should be removed.

I think it must be left over from when the choice between binary/ascii polygon encoding was a configuration option. Since 2009 the ascii option no longer exists. So yeah, that section is dead code.

Ethan |
From: Dima K. <gn...@di...> - 2012-09-28 17:37:13
|
> > Hi Ethan.
> >
> > Good you found the issue. Do you know why field widths are specified at all? Why don't we PRINT2("V%d %d\n") and then sscanf("%d %d", ...) on the other side? No efficiency is gained by specifying the widths, it only builds in limitations and makes the code brittle, as we have just observed.
>
> The basic X11 framework was in place long before my involvement, so I do not know. My best guess is that the fixed field width was chosen so as to reduce the amount of data sent through the gnuplot->gnuplot_x11 channel. Adding a space in front of every coordinate pair would increase the traffic by as much as 25%.
>
> As I recall, when the polygon encoding was switched from formatted to binary (2004), the advocates of binary encoding/decoding had benchmarks showing data transfer really was a performance bottleneck, and reducing the number of bytes sent over the channel resulted in faster plotting. I do not know if this would still be true on modern machines.
>
> We should benchmark before and after your latest patch.

The extra space can also save bytes when the values being sent across have fewer than 4 digits in them. As I see it, one should use ASCII data links if they want robustness and readability, and binary ones if they want speed. Here we're sending data in ASCII, while worrying about a few extra cycles. If you make up a test that benchmarks before and after this patch, I'll write another version of the data passing that uses binary data; if there are any performance gains here, that's where they are.

> > I noticed another, very related issue. x11.trm has
> >
> >     /* badly outrange labels can overflow into text field */
> >     if (x < 10000 && y < 10000) {
> >         PRINT3("T%d %d %s\n", x, y, str);
> >     }
> >
> > Here we're hardcoding the 10000 again. <snip> CVS says this test was added by lhecking in 1998. This predates all of sourceforge. Newsgroup logs go back that far, but I didn't see any mention of this issue. Thoughts?
>
> That does look like an early instance of the same problem. But I wonder how that code ever triggered?

Is lhecking still around? Do you think he still remembers?

> > Finally, in writing the patch, I stumbled on what looks like a bug in what looks like dead code. In gplt_x11.c, look at the
> >
> >     else if (*buffer == X11_GR_FILLED_POLYGON) { /* filled
> > [snip]
> >
> > I suspect the whole block is dead, and this bug is thus never hit. If this whole block is truly dead, it should be removed.
>
> I think it must be left over from when the choice between binary/ascii polygon encoding was a configuration option. Since 2009 the ascii option no longer exists. So yeah, that section is dead code.

Great. Can you remove it, then?

Dima |
From: Dima K. <gn...@di...> - 2012-09-29 09:55:55
Attachments:
0001-V-command-now-binary.patch
|
> On Fri, 28 Sep 2012 10:37:06 -0700 > Dima Kogan <gn...@di...> wrote: > > > > Hi Ethan. > > > > > > Good you found the issue. Do you know why field widths are > > > specified at all? Why don't we PRINT2("V%d %d\n") and then > > > sscanf("%d %d", ...) on the other side? No efficiency is gained > > > by specifying the widths, it only builds in limitations and makes > > > the code brittle, as we have just observed. > > > > The basic X11 framework was in place long before my involvement, so > > I do not know. My best guess is that the fixed field width was > > chosen so as to reduce the amount of data sent through the > > gnuplot->gnuplot_x11 channel. > > Adding a space in front of every coordinate pair would increase the > > traffic by as much as 25%. > > > > As I recall, when the polygon encoding was switched from formatted > > to binary (2004), the advocates of binary encoding/decoding had > > benchmarks showing data transfer really was a performance > > bottleneck, and reducing the number of bytes sent over the channel > > resulted in faster plotting. I do not know if this would still be > > true on modern machines. > > > > We should benchmark before and after your latest patch. > > The extra space can also save bytes when the values being sent across > have fewer than 4 digits in them. As I see it, one should use ASCII > data links if they want robustness and readability, and binary ones > if they want speed. Here we're sending data in ASCII, while worrying > about a few extra cycles. If you make up a test that benchmarks > before and after this patch, I'll write another version of the data > passing, that uses binary data; if there're any performance gains > here, that's where they are. I just ran some benchmarks myself. The results are quite interesting. 
First off, the test machine description: gcc (Debian 4.7.0-12) 4.7.0 CPU: vendor_id : GenuineIntel cpu family : 6 model : 15 model name : Intel(R) Core(TM)2 CPU T7400 @ 2.16GHz stepping : 6 This is a 2-core machine, but everything in gnuplot is single-threaded so this doesn't matter. I generated 2 large-ish data files. Both contain an identical sinusoid: one stores it in ascii, another in binary with packed single-precision floats. This isn't what we're testing, but I wanted to get this conversion step into the data as a reference. Commands to generate data: $ perl -e 'for(0..2000000) { print pack "f*", $_, sin($_/100000); }' > dat.bin $ perl -e 'for(0..2000000) { print "$_ " . sin($_/100000) . "\n"; }' > dat.ascii For each gnuplot build I tested, I timed 3 things: 1. inboard x11 from binary (with 'terminal xlib') 2. inboard x11 from ascii (with 'terminal xlib') 3. outboard x11 (just gnuplot_x11 executable) I tested 4 different gnuplot builds: 1. before the split-printf patch (uses %04d format) 2. after the split-printf patch (uses space-separated %d format) 3. after the split-printf-patch but ALSO sending the V command in binary. V commands were the bulk of the xlib-generated data stream, so this is our hotspot. 16-bit integers for each argument of V 4. Same as previous, but using 32-bit integers ASCII input tests were run by making a 'tst.gp' file with ================ set term xlib set output "out.xlib" plot "dat.ascii" with lines ================ Then generating timings by running multiple times $ time ./gnuplot tst.gp $ time ./gnuplot_x11 < out.xlib > /dev/null Binary input tests were run similarly, but with a 'tst.gp' such as ================ set term xlib set output "out.xlib" plot "dat.bin" binary format="%float32%float32" with lines ================ Results. Each record is usertime,systemtime in seconds. 
|                     | %04d      | %d        | %d, 16-bit binary V | %d, 32-bit binary V |
|---------------------+-----------+-----------+---------------------+---------------------|
| inboard from ASCII  | 1.91,0.21 | 1.86,0.21 | 1.53,0.19           | 1.53,0.21           |
| inboard from binary | 0.95,0.20 | 0.89,0.20 | 0.56,0.16           | 0.56,0.20           |
| outboard            | 0.82,0.11 | 0.85,0.11 | 0.19,0.10           | 0.21,0.11           |

First off, we see that splitting the fields with whitespace speeds up the inboard driver a little bit, while slowing down the outboard one a little bit. I'm not completely sure why, but the difference is negligible, so I didn't go digging. However, sending the data over in binary produces HUGE performance gains. The inboard driver is 37% faster (when the original input is binary too), while the outboard one is a whopping 76% faster. Not quite sure why this is so uneven; maybe the outboard driver parses the data more times than it needs to? None of this is really surprising, but it really reinforces the earlier point that seeking performance gains in our ASCII representation is foolish, leading to minimal speedups while making the code less manageable. When I started this, I wasn't going to advocate that we move to a binary data stream, but the speedup is so significant that I think we really should. I'm attaching a patch that changes the V command to work in binary. Note that this patch is not at all good enough to merge yet; I'm attaching it so that others can run these tests as well. So, should we move the intensive commands to binary? What commands are these, other than 'V'? dima |
From: sfeam (E. Merritt) <eam...@gm...> - 2012-09-29 22:30:54
Attachments:
x11_strtol_is_faster.patch
|
On Saturday, 29 September 2012, Dima Kogan wrote: > Results. Each record is usertime,systemtime in seconds. > > | | %04d | %d | %d, 16-bit binary V | %d, 32-bit binary V | > |---------------------+-----------+-----------+---------------------+---------------------| > | inboard from ASCII | 1.91,0.21 | 1.86,0.21 | 1.53,0.19 | 1.53,0.21 | > | inboard from binary | 0.95,0.20 | 0.89,0.20 | 0.56,0.16 | 0.56,0.20 | > | outboard | 0.82,0.11 | 0.85,0.11 | 0.19,0.10 | 0.21,0.11 | > > First off, we see that splitting the fields with whitespace speeds up > the inboard driver a little bit, while slowing down the outboard one a > little bit. Not completely sure why, but the difference is negligible, > so I didn't go digging. > > However, sending the data over in binary produces HUGE performance > gains. The inboard driver is 37% faster (when the original input is > binary too), while the outboard one is a whopping 76% faster. Not quite > sure why this is so uneven; maybe the outboard driver parses the data > more times than it needs to? I get a 50% speed improvement in the ascii version by replacing scanf() with strtol() (see attached patch). That reduces the speed difference between ascii and binary from ~4x to ~2x. There is a slight further improvement by replacing the second call to strtol() with atoi(); Ethan |
From: sfeam (E. Merritt) <eam...@gm...> - 2012-09-29 18:44:24
|
On Saturday, 29 September 2012, Dima Kogan wrote: > > On Fri, 28 Sep 2012 10:37:06 -0700 Hi Dima, Just some preliminary thoughts... I quickly replicated your benchmarks and see roughly the same results (faster CPU but limited memory; 32bit environment). However, I have a few general concerns. 1) The output of the ascii version is hugely redundant. Most of the successive V commands are identical because the vectors are shorter than the resolution of the plot coordinates. If this were a common thing we would do well to add a filtering step in x11.trm so that a new "V" command is only sent if it differs from the previous command. For example running the ascii output through "uniq" reduces the file size by a factor of 7, with a speed increase of 50x(!!) running through gnuplot_x11. 2) The binary output is not terminating the "V" records with a '\n'. This is fine on linux, but there is a long history of problem reports on Windows arising from very long buffers sent through a pipe. I don't know the details well enough to say whether this would trigger similar problems, but I do worry. I think it's worth also trying a variant that writes a trailing '\n'. If writing a newline for every V command causes a significant slowdown then perhaps we could generalize the existing code in X11_filled_polygon() that breaks long binary buffers into smaller chunks. 3) I worry that the binary code might not work on all supported platforms. Since it's just a few lines of code, I suggest that rather than replacing the existing 'V' command we add a parallel command 'B' for the binary version. In x11.trm the choice between using 'V' or 'B' could be a compile-time option. We can default to the binary version, but anyone having problems with it could revert to the old ascii version with a configuration flag. I'll play around with more serious benchmarking if I get time later this weekend. Oh, and I noticed something else about the new aspect-ratio code while running the benchmarks. 
I use ./gnuplot_x11 -noevents < foo.x11 to benchmark the outboard driver. Since there is no feedback in place, the inboard driver doesn't know about the aspect ratio and the plot is not scaled properly to the plot window. Now this is not the usual path for display x11 output, but I wonder if we can fix that easily. Maybe disabling the rescaling code if the feedback pipe is not present? Or maybe sending an initial set of scaling commands that are always correct, which are later over-ridden as needed in the interactive case but remain in effect for the case of -noevents or no feedback pipe? Ethan > I just ran some benchmarks myself. The results are quite interesting. > > First off, the test machine description: > > gcc (Debian 4.7.0-12) 4.7.0 > > CPU: > vendor_id : GenuineIntel > cpu family : 6 > model : 15 > model name : Intel(R) Core(TM)2 CPU T7400 @ 2.16GHz > stepping : 6 > > This is a 2-core machine, but everything in gnuplot is single-threaded > so this doesn't matter. > > I generated 2 large-ish data files. Both contain an identical sinusoid: > one stores it in ascii, another in binary with packed single-precision > floats. This isn't what we're testing, but I wanted to get this > conversion step into the data as a reference. Commands to generate data: > > $ perl -e 'for(0..2000000) { print pack "f*", $_, sin($_/100000); }' > dat.bin > $ perl -e 'for(0..2000000) { print "$_ " . sin($_/100000) . "\n"; }' > dat.ascii > > > For each gnuplot build I tested, I timed 3 things: > > 1. inboard x11 from binary (with 'terminal xlib') > 2. inboard x11 from ascii (with 'terminal xlib') > 3. outboard x11 (just gnuplot_x11 executable) > > I tested 4 different gnuplot builds: > > 1. before the split-printf patch (uses %04d format) > 2. after the split-printf patch (uses space-separated %d format) > 3. after the split-printf-patch but ALSO sending the V command in > binary. V commands were the bulk of the xlib-generated data stream, so > this is our hotspot. 
16-bit integers for each argument of V > 4. Same as previous, but using 32-bit integers > > ASCII input tests were run by making a 'tst.gp' file with > > ================ > set term xlib > set output "out.xlib" > plot "dat.ascii" with lines > ================ > > Then generating timings by running multiple times > $ time ./gnuplot tst.gp > $ time ./gnuplot_x11 < out.xlib > /dev/null > > > Binary input tests were run similarly, but with a 'tst.gp' such as > > ================ > set term xlib > set output "out.xlib" > plot "dat.bin" binary format="%float32%float32" with lines > ================ > > > Results. Each record is usertime,systemtime in seconds. > > | | %04d | %d | %d, 16-bit binary V | %d, 32-bit binary V | > |---------------------+-----------+-----------+---------------------+---------------------| > | inboard from ASCII | 1.91,0.21 | 1.86,0.21 | 1.53,0.19 | 1.53,0.21 | > | inboard from binary | 0.95,0.20 | 0.89,0.20 | 0.56,0.16 | 0.56,0.20 | > | outboard | 0.82,0.11 | 0.85,0.11 | 0.19,0.10 | 0.21,0.11 | > > First off, we see that splitting the fields with whitespace speeds up > the inboard driver a little bit, while slowing down the outboard one a > little bit. Not completely sure why, but the difference is negligible, > so I didn't go digging. > > However, sending the data over in binary produces HUGE performance > gains. The inboard driver is 37% faster (when the original input is > binary too), while the outboard one is a whopping 76% faster. Not quite > sure why this is so uneven; maybe the outboard driver parses the data > more times than it needs to? > > None of this is really surprising, but it really reinforces the earlier > point that seeking performance gains in our ASCII representation is > foolish, leading to minimal speedups while making the code less > manageable. When I started this, I wasn't going to advocate that we > move to a binary data stream, but the speedup is so significant that we > really should, I think. 
> > I'm attaching a patch that changes the V command to work in binary. > Note that this patch is not at all good-enough to merge yet; I'm > attaching it so that others could run these tests as well. So, should > we move the intensive commands to binary? What commands are these, > other than 'V'? > > dima > |
From: Dima K. <gn...@di...> - 2012-09-29 23:24:21
|
Hi! > On Sat, 29 Sep 2012 11:44:09 -0700 > "sfeam (Ethan Merritt)" <eam...@gm...> wrote: > > On Saturday, 29 September 2012, Dima Kogan wrote: > > > On Fri, 28 Sep 2012 10:37:06 -0700 > > Hi Dima, > > Just some preliminary thoughts... > I quickly replicated your benchmarks and see roughly the same results > (faster CPU but limited memory; 32bit environment). > > However, I have a few general concerns. > > 1) The output of the ascii version is hugely redundant. Most of the > successive V commands are identical because the vectors are shorter > than the resolution of the plot coordinates. If this were a common > thing we would do well to add a filtering step in x11.trm so that > a new "V" command is only sent if it differs from the previous command. > For example running the ascii output through "uniq" reduces the file > size by a factor of 7, with a speed increase of 50x(!!) running > through gnuplot_x11. Sounds great. Should I do this, or do you want to? > 2) The binary output is not terminating the "V" records with a '\n'. > This is fine on linux, but there is a long history of problem reports > on Windows arising from very long buffers sent through a pipe. I don't > know the details well enough to say whether this would trigger similar > problems, but I do worry. I think it's worth also trying a variant > that writes a trailing '\n'. If writing a newline for every V command > causes a significant slowdown then perhaps we could generalize the > existing code in X11_filled_polygon() that breaks long binary buffers > into smaller chunks. The binary-V implementation I had wouldn't touch the trailing '\n', since a particular binary field size is assumed. Furthermore, the binary numbers could have '\n' bytes in them, making a trailing '\n' even less meaningful. Out of curiosity, I changed the code to output a '\n' after each binary V record. There was a slight but noticeable speed penalty. 
If we go down the binary-data path, I'd prefer not to add the '\n' simply because if anything ever looks at that byte, then something has already gone wrong. > 3) I worry that the binary code might not work on all supported > platforms. Since it's just a few lines of code, I suggest that rather > than replacing the existing 'V' command we add a parallel command > 'B' for the binary version. In x11.trm the choice between using 'V' > or 'B' could be a compile-time option. We can default to the binary > version, but anyone having problems with it could revert to the old > ascii version with a configuration flag. My preference would be to switch entirely so that the amount of code being maintained doesn't grow, but you're the boss. :) > Oh, and I noticed something else about the new aspect-ratio code while > running the benchmarks. I use > ./gnuplot_x11 -noevents < foo.x11 > to benchmark the outboard driver. Since there is no feedback in place, > the inboard driver doesn't know about the aspect ratio and the plot > is not scaled properly to the plot window. Now this is not the usual > path for display x11 output, but I wonder if we can fix that easily. > Maybe disabling the rescaling code if the feedback pipe is not present? > Or maybe sending an initial set of scaling commands that are always > correct, which are later over-ridden as needed in the interactive case > but remain in effect for the case of -noevents or no feedback pipe? I'll take a look into this. Keeping track of all the patches and proposals, the following are currently being considered: 1. Patch for variable-width, space-separated fields to allow 5-digit values to 'just work'; sent out to this list a few days ago. There was a concern that the extra spaces carry with them a performance cost. The benchmarks I sent out indicate that the performance cost is negligible at worst. I think this patch should be merged. 2. Missing legend label with tall windows. 
Caused by the 'if (x < 10000 && y < 10000)' test in x11.trm. Removing this test entirely seems to work for the most part. The only issue I've seen is that if the window is too tall, the label can overflow past the plot edge. However, the existing test is a very poor way to check for that condition anyway; and if an overflow is detected then both the legend label AND the legend symbol should be removed. So I argue the current test is not useful, and should be removed entirely. 3. Dead code such as the 'else if (*buffer == X11_GR_FILLED_POLYGON)' block. I'd like to push for these to be removed. These make it more difficult for new people to look at the codebase, and make maintenance more difficult than it should be. 4. strtol() instead of scanf() sounds like an easy win. Can you think of any reason to NOT make that change? 5. Adding default window size to the inboard x11 driver so that the 'terminal xlib' produces plots that have decent defaults when sent to the outboard driver manually. I'll do this. 6. Removing duplicate messages (such as duplicate consecutive V commands) sounds great. We should do it. 7. General thoughts about speeding up the inboard -> outboard link by making some things binary. I like binary. Making this switch will probably make some things break at first, but it'll likely be worth it. If we're touching that at all, more radical methods may be better. For instance, instead of a V command for each point, we could have a V command that predeclares a long stream of points. So for instance you'd have "V100\n" to indicate that 100 binary tuples follow. This would reduce the overhead even more, but would require read_input() to be changed to work with long streams of binary data. I prefer a method like this much more than what was in the last patch. If you'd consider merging something like this, tell me and I'll write it. dima |
From: Ethan M. <merritt@u.washington.edu> - 2012-09-30 00:08:19
|
On Saturday, 29 September 2012, Dima Kogan wrote: > Hi! > > > On Sat, 29 Sep 2012 11:44:09 -0700 > > "sfeam (Ethan Merritt)" <eam...@gm...> wrote: > > > > On Saturday, 29 September 2012, Dima Kogan wrote: > > > > On Fri, 28 Sep 2012 10:37:06 -0700 > > 1) The output of the ascii version is hugely redundant. Most of the > > successive V commands are identical because the vectors are shorter > > than the resolution of the plot coordinates. If this were a common > > thing we would do well to add a filtering step in x11.trm so that > > a new "V" command is only sent if it differs from the previous command. > > For example running the ascii output through "uniq" reduces the file > > size by a factor of 7, with a speed increase of 50x(!!) running > > through gnuplot_x11. > > Sounds great. Should I do this, or do you want to? I will look into it. > > 3) I worry that the binary code might not work on all supported > > platforms. Since it's just a few lines of code, I suggest that rather > > than replacing the existing 'V' command we add a parallel command > > 'B' for the bainry version. In x11.trm the choice between using 'V' > > or 'B' could be a compile-time option. We can default to the binary > > version, but anyone having problems with it could revert to the old > > ascii version with a configuration flag. > > My preference would be to switch entirely so that the amount of code being > maintained doesn't grow, but you're the boss. :) I am inclined to let both sit in CVS while we test. If no problems turn up with the binary version we can remove the old ascii code before release. > Keeping track of all the patches, proposals, the following are currently being > considered: > > 1. Patch for variable-width, space-separated fields to allow 5-digit values Merged yesterday. > 2. Missing legend label with tall windows. Caused by the 'if (x < 10000 && y < > 10000)' test in x11.trm. Removing this test entirely seems to work for the > most part. ... should be removed entirely. 
Merged yesterday. > 3. Dead code such as the 'else if (*buffer == X11_GR_FILLED_POLYGON)' block. Merged yesterday. > 4. strtolstrtol() instead of scanf() sounds like an easy win. Can you think of > any reason to NOT make that change? Only that you are in the process of replacing it altogether with a binary version :-) But if it's a win for 'V' it's almost certainly a win everywhere else as well. > 5. Adding default window size to the inboard x11 driver so that the 'terminal > xlib' produces plots that have decent defaults when sent to the outboard > driver manually. I'll do this. OK. It's not just xlib, however. I think (not 100% sure) that the same problem arises whenever (ipc_back_fd == IPC_BACK_UNUSABLE), e.g. x11 output from a script run non-interactively. > 6. Removing duplicate messages (such as duplicate consecutive V commands) sounds > great. We should do it OK. Low priority because it's relatively rare for normal plots. > 7. General thoughts about speeding up the inboard -> outboard link by making > some things binary. I like binary. Making this switch will probably make some > things break at first, but it'll likely be worth it. If we're touching that > at all, more radical methods may be better. For instance, instead of a V > command for each point, we could have a V command that predeclares a long > stream of points. We do. That's what the X11_POLYLINE code is for. Ethan |
From: Mojca M. <moj...@gm...> - 2012-09-30 09:23:30
|
On Sun, Sep 30, 2012 at 2:07 AM, Ethan Merritt wrote: > >> 6. Removing duplicate messages (such as duplicate consecutive V commands) sounds >> great. We should do it > OK. Low priority because it's relatively rare for normal plots. Not necessarily, in particular when plotting "using 0:1" with a huge number of points (or almost any number bigger than the screen resolution). Actually, it is what I do all the time. A dedicated program for drawing long signals that I'm using goes even a step further and does the following optimisation: - keep collecting all the points that would use the same pixel in x-direction - out of those points remember the minimum, maximum, first and last value of y. This only works for plots (1) with more or less sequential values of x (most plots?) and it only makes any difference (2) when the number of points is significantly bigger than the width of the plot. I agree with Ethan that most plots don't meet the second criterion, but for plots that do, it makes an enormous difference (unless the bottleneck is reading the data from input). When drawing one million points on the screen with 1000 pixels this results in 2-4 points per 1000 values. That's perfectly consistent with Ethan's observation of 50-fold speed-up (and again, it is something that I do on a daily basis). I'll try to figure out how exactly you do the benchmarking of Dima's patches. I'm very curious about the effect on Mac OS X. (I really hated the x11 terminal on Mac OS X, but now that I'm using those long traces and want to zoom in, AquaTerm still lacks mouse support and Qt has some serious efficiency problems, so x11 is currently the only terminal left that does the job quickly and a lot more efficiently than the rest.) Mojca |
From: Dima K. <gn...@di...> - 2012-10-03 08:39:17
|
> On Sat, 29 Sep 2012 17:07:51 -0700 > Ethan Merritt <merritt@u.washington.edu> wrote: > > > 5. Adding default window size to the inboard x11 driver so that the 'terminal > > xlib' produces plots that have decent defaults when sent to the outboard > > driver manually. I'll do this. > OK. It's not just xlib, however. I think (not 100% sure) that the same > problem arises whenever (ipc_back_fd == IPC_BACK_UNUSABLE), e.g. x11 output > from a script run non-interactively. Still on my list; haven't gotten to it yet. > > 6. Removing duplicate messages (such as duplicate consecutive V commands) sounds > > great. We should do it > OK. Low priority because it's relatively rare for normal plots. I did a first pass at this. Patch attached. Good news is that as expected, the traffic drops dramatically. Inboard timing drops from about 0.9s to about 0.45s. The outboard, however, drops from 0.85s to 0.01s! The reason the inboard didn't drop as much is that it still has to parse the original huge data file. I have some lingering concerns about the patch I'm attaching. I use ftell() to check to see that the V commands are indeed consecutive. This might have a non-negligible cost, so I'd check before committing this. At this point, my test case is clearly broken since we've been able to optimize away all its complexity. Is the same optimization valid for P commands? What would be a good real-world test case where this optimization isn't valid? I'm thinking of just generating a bunch of discrete points, and sending them over as P commands. That sounds good, right? > > 7. General thoughts about speeding up the inboard -> outboard link by making > > some things binary. I like binary. Making this switch will probably make some > > things break at first, but it'll likely be worth it. If we're touching that > > at all, more radical methods may be better. For instance, instead of a V > > command for each point, we could have a V command that predeclares a long > > stream of points. 
> We do. That's what the X11_POLYLINE code is for. That's not quite what I meant. The polyline code we have chunks up data from the V command we receive before sending it to X. I'm talking about doing a similar thing, but in the inboard->outboard link: chunk up the V commands before sending it across. I'll write and benchmark a patch in a bit. Are P and V commands the main ones to worry about? dima |
From: Ethan A M. <sf...@us...> - 2012-10-03 17:24:58
Attachments:
x11_nonredundancy.patch
|
On Wednesday, October 03, 2012 01:39:08 am Dima Kogan wrote: > > On Sat, 29 Sep 2012 17:07:51 -0700 > > Ethan Merritt <merritt@u.washington.edu> wrote: > > > > > 6. Removing duplicate messages (such as duplicate consecutive V commands) sounds > > > great. We should do it > > OK. Low priority because it's relatively rare for normal plots. > > I did a first pass at this. Patch attached. Good news is that as expected, the > traffic drops dramatically. Inboard timing drops from about 0.9s to about 0.45s. > The outboard, however, drops from 0.85s to 0.01s! The reason the inboard didn't > drop as much is that it still has to parse the original huge data file. I have > some lingering concerns about the patch I'm attaching. > I use ftell() to check to see that the V commands are indeed consecutive. > This might have a non-negligible cost, so I'd check before committing this. That's clever, but I agree that there is potential cost. Other terminal drivers do the same job without resorting to ftell(). The trick is that any command that potentially affects the current active position must either update or invalidate the inboard copy of x_last and y_last. For example, term->put_text() would set x_last = y_last = INVALID; /* #define INVALID -1 */ before leaving. The down side is that unlike your ftell() version this approach requires finding all the places that might affect current position. I've attached a first-pass patch that catches most of them, but I probably missed some. Other terminal drivers can serve as a model. > At this point, my test case is clearly broken since we've been able to optimize > away all its complexity. Right. I modified your original data generation script to produce longer vectors and some gaps, so that the plot would contain move commands as well as vector commands. This gives a more realistic mix of commands: perl -e 'for(0..2000000) \ { print "$_ " . sin($_/1000) . 
"\n"; print "\n" if $_ % 100 == 0; }' \ > breaks.ascii With this test data the reduction in size from removing redundant commands is less than 10%. I tested using the "uniq" command rather than patching the driver source code. 10% max didn't seem very significant to me, which was why I said it was low priority. > Is the same optimization valid for P commands? I don't think so. But it does apply to M commands. > I'm thinking of just generating a bunch of discrete points, and sending > them over as P commands. That sounds good, right? I don't think that the active position after drawing a point symbol is guaranteed to be at the center of the point. So in the sequence Move(x,y); Point(x,y); Move(x,y); Vector(x1,y1); the second Move is not redundant. Of course, we could change the code so that Point(x,y) it _is_ guaranteed to leave the active position at (x,y). > > > 7. General thoughts about speeding up the inboard -> outboard link by making > > > some things binary. I like binary. Making this switch will probably make some > > > things break at first, but it'll likely be worth it. If we're touching that > > > at all, more radical methods may be better. For instance, instead of a V > > > command for each point, we could have a V command that predeclares a long > > > stream of points. > > We do. That's what the X11_POLYLINE code is for. > > That's not quite what I meant. The polyline code we have chunks up data from the > V command we receive before sending it to X. I'm talking about doing a similar > thing, but in the inboard->outboard link: chunk up the V commands before sending > it across. I'll write and benchmark a patch in a bit. I doubt you'll see any gain. Since there is not a flush() command between successive writes to the output stream, the system should do this chunking for you anyhow. > Are P and V commands the main ones to worry about? M and V. Ethan |
From: Dima K. <gn...@di...> - 2012-10-19 12:46:34
|
> On Wed, 3 Oct 2012 10:21:19 -0700 > Ethan A Merritt <sf...@us...> wrote: > > On Wednesday, October 03, 2012 01:39:08 am Dima Kogan wrote: > > > On Sat, 29 Sep 2012 17:07:51 -0700 > > > Ethan Merritt <merritt@u.washington.edu> wrote: > > > > > > > 6. Removing duplicate messages (such as duplicate consecutive V commands) sounds > > > > great. We should do it > > > OK. Low priority because it's relatively rare for normal plots. > > > > I did a first pass at this. Patch attached. Good news is that as expected, the > > traffic drops dramatically. Inboard timing drops from about 0.9s to about 0.45s. > > The outboard, however, drops from 0.85s to 0.01s! The reason the inboard didn't > > drop as much is that it still has to parse the original huge data file. I have > > some lingering concerns about the patch I'm attaching. > > > I use ftell() to check to see that the V commands are indeed consecutive. > > This might have a non-negligible cost, so I'd check before committing this. > > That's clever, but I agree that there is potential cost. > Other terminal drivers do the same job without resorting to ftell(). > The trick is that any command that potentially affects the current > active position must either update or invalidate the inboard copy > of x_last and y_last. For example, term->put_text() would set > x_last = y_last = INVALID; /* #define INVALID -1 */ > before leaving. > > The down side is that unlike your ftell() version this approach requires > finding all the places that might affect current position. I've attached a > first-pass patch that catches most of them, but I probably missed some. > Other terminal drivers can serve as a model. > > > > At this point, my test case is clearly broken since we've been able to optimize > > away all its complexity. > > Right. I modified your original data generation script to produce longer > vectors and some gaps, so the the plot would contain move commands as well > as vector commands. 
This gives a more realistic mix of commands: > > perl -e 'for(0..2000000) \ > { print "$_ " . sin($_/1000) . "\n"; print "\n" if $_ % 100 == 0; }' \ > > breaks.ascii > > With this test data the reduction in size from removing redundant commands > is less than 10%. I tested using the "uniq" command rather than patching > the driver source code. 10% max didn't seem very significant to me, > which was why I said it was low priority. > > > Is the same optimization valid for P commands? > > I don't think so. But it does apply to M commands. > > > I'm thinking of just generating a bunch of discrete points, and sending > > them over as P commands. That sounds good, right? > > I don't think that the active position after drawing a point symbol is > guaranteed to be at the center of the point. So in the sequence > Move(x,y); Point(x,y); Move(x,y); Vector(x1,y1); > the second Move is not redundant. > > Of course, we could change the code so that Point(x,y) it _is_ guaranteed > to leave the active position at (x,y). OK. I revisited this (duplication suppression). Patch attached. I believe this optimization is equally applicable to P, M, and V. Note that this is all purely inboard, so there's no active position at all; that's an outboard concept. So for instance, a duplicated P command would normally draw the same point glyph multiple times in the same exact position, thus removing the duplication doesn't change the output. Tell me if I'm misunderstanding. I took your suggestion to keep track of pipe accesses myself, instead of using ftell. I'm troubled by the manual maintenance this requires, but the performance difference looks significant. The data file you generated in your post had fewer duplicates because it was a much higher frequency sinusoid, so it was aliasing heavily. 
I took measurements from a highly redundant sinusoid and a highly aliased one: No suppression at all:

|      | /1e6 (1% uniq) | /500 dataset (90% uniq) |
|------+----------------+-------------------------|
| user | 0.91           | 0.92                    |
| sys  | 0.20           | 0.21                    |

Suppression with attached patch:

|      | /1e6 (1% uniq) | /500 dataset (90% uniq) |
|------+----------------+-------------------------|
| user | 0.40           | 0.89                    |
| sys  | 0.14           | 0.19                    |

Suppression with ftell:

|      | /1e6 (1% uniq) | /500 dataset (90% uniq) |
|------+----------------+-------------------------|
| user | 0.46           | 1.75                    |
| sys  | 0.21           | 4.3                     |

Conclusions:

1. ftell() is way too slow
2. the code with the attached patch is significantly faster than before in the best case, and about the same in the worst case

Next, I'm going to look into binary communication again. dima |
From: sfeam (E. Merritt) <eam...@gm...> - 2012-10-20 16:41:48
|
On Friday, 19 October 2012, Dima Kogan wrote:
> > On Wed, 3 Oct 2012 10:21:19 -0700
> > Ethan A Merritt <sf...@us...> wrote:
> >
> > On Wednesday, October 03, 2012 01:39:08 am Dima Kogan wrote:
> > > > On Sat, 29 Sep 2012 17:07:51 -0700
> > > > Ethan Merritt <merritt@u.washington.edu> wrote:
> > > >
> > > > > 6. Removing duplicate messages (such as duplicate consecutive V commands) sounds
> > > > > great. We should do it
> > > > OK. Low priority because it's relatively rare for normal plots.
> > > I did a first pass at this. Patch attached. Good news is that as expected, the
> > > traffic drops dramatically. Inboard timing drops from about 0.9s to about 0.45s.
> > > The outboard, however, drops from 0.85s to 0.01s! The reason the inboard didn't
> > > drop as much is that it still has to parse the original huge data file. I have
> > > some lingering concerns about the patch I'm attaching.
> > > I use ftell() to check to see that the V commands are indeed consecutive.
> > > This might have a non-negligible cost, so I'd check before committing this.
> > That's clever, but I agree that there is potential cost.
> > Other terminal drivers do the same job without resorting to ftell().
> > The trick is that any command that potentially affects the current
> > active position must either update or invalidate the inboard copy
> > of x_last and y_last. For example, term->put_text() would set
> >     x_last = y_last = INVALID; /* #define INVALID -1 */
> > before leaving.
> >
> > The down side is that unlike your ftell() version this approach requires
> > finding all the places that might affect current position. I've attached a
> > first-pass patch that catches most of them, but I probably missed some.
> > Other terminal drivers can serve as a model.
> >
> > > At this point, my test case is clearly broken since we've been able to optimize
> > > away all its complexity.
> >
> > Right. I modified your original data generation script to produce longer
> > vectors and some gaps, so that the plot would contain move commands as well
> > as vector commands. This gives a more realistic mix of commands:
> >
> >     perl -e 'for(0..2000000) \
> >         { print "$_ " . sin($_/1000) . "\n"; print "\n" if $_ % 100 == 0; }' \
> >         > breaks.ascii
> >
> > With this test data the reduction in size from removing redundant commands
> > is less than 10%. I tested using the "uniq" command rather than patching
> > the driver source code. 10% max didn't seem very significant to me,
> > which was why I said it was low priority.
> >
> > > Is the same optimization valid for P commands?
> >
> > I don't think so. But it does apply to M commands.
> >
> > > I'm thinking of just generating a bunch of discrete points, and sending
> > > them over as P commands. That sounds good, right?
> >
> > I don't think that the active position after drawing a point symbol is
> > guaranteed to be at the center of the point. So in the sequence
> >     Move(x,y); Point(x,y); Move(x,y); Vector(x1,y1);
> > the second Move is not redundant.
> >
> > Of course, we could change the code so that Point(x,y) _is_ guaranteed
> > to leave the active position at (x,y).
>
> OK. I revisited this (duplication suppression). Patch attached. I believe this
> optimization is equally applicable to P, M, and V. Note that this is all purely
> inboard, so there's no active position at all; that's an outboard concept. So
> for instance, a duplicated P command would normally draw the same point glyph
> multiple times in the same exact position, thus removing the duplication doesn't
> change the output. Tell me if I'm misunderstanding.

I think you are not understanding what I was trying to say. The issue is not
whether repeated P commands are redundant, the question is whether or not an M
command is needed after the P.

Scenario: one might think that a series of points connected by lines could be
drawn as
    P(x1,y1) V(x2,y2) P(x2,y2) V(x3,y3) ...
But that doesn't work because P(x1,y1) doesn't leave the current position
at (x1,y1). So instead one needs to do
    P(x1,y1) M(x1,y1) V(x2,y2) P(x2,y2) M(x2,y2) V(x3,y3) ...
My point is that all the M commands may seem redundant but they are not.
[NB: This is not the ordering produced by "with linespoints"]

> Conclusions:
>
> 1. ftell() is way too slow
> 2. the code with the attached patch is significantly faster than before in the
>    best case, and about the same in the worst case

I've been too busy to have a serious look at your non-redundancy patch,
but at first glance it looks more complicated than necessary.
Do you really need to set X11_IPC_LASTDATARUN_NONE for every single
command that doesn't change the current position?
Other terminal drivers manage just fine without this.
If the small set of commands that _do_ change the position (M, V, P, T, ??)
track the current position then no one else needs to care.

	Ethan

> Next, I'm going to look into binary communication again.
>
> dima
|
From: Dima K. <gn...@di...> - 2012-10-21 00:46:45
|
> > OK. I revisited this (duplication suppression). Patch attached. I believe this
> > optimization is equally applicable to P, M, and V. Note that this is all purely
> > inboard, so there's no active position at all; that's an outboard concept. So
> > for instance, a duplicated P command would normally draw the same point glyph
> > multiple times in the same exact position, thus removing the duplication doesn't
> > change the output. Tell me if I'm misunderstanding.
>
> I think you are not understanding what I was trying to say. The issue is not
> whether repeated P commands are redundant, the question is whether or not an M
> command is needed after the P. Scenario: one might think that a series of
> points connected by lines could be drawn as
>     P(x1,y1) V(x2,y2) P(x2,y2) V(x3,y3) ...
> But that doesn't work because P(x1,y1) doesn't leave the current position
> at (x1,y1). So instead one needs to do
>     P(x1,y1) M(x1,y1) V(x2,y2) P(x2,y2) M(x2,y2) V(x3,y3) ...
> My point is that all the M commands may seem redundant but they are not.
> [NB: This is not the ordering produced by "with linespoints"]
>
> > Conclusions:
> >
> > 1. ftell() is way too slow
> > 2. the code with the attached patch is significantly faster than before in the
> >    best case, and about the same in the worst case
>
> I've been too busy to have a serious look at your non-redundancy patch,
> but at first glance it looks more complicated than necessary.
> Do you really need to set X11_IPC_LASTDATARUN_NONE for every single
> command that doesn't change the current position?
> Other terminal drivers manage just fine without this.
> If the small set of commands that _do_ change the position (M, V, P, T, ??)
> track the current position then no one else needs to care.

You're right, I wasn't fully understanding what you meant. The optimization
implemented in the patch I attached is a bit simpler than what you're
describing. It only looks for identical, consecutive M, P or V commands, and
suppresses any found duplicates. If ANY command is seen between successive P
commands, say, nothing is suppressed. So for instance,

    P 1 2 3
    M 3 4 5
    P 1 2 3

would NOT suppress anything because there's an M between the two Ps. To make
this optimization I need to remember what the last command was, and its
arguments. The enum is the command type. To be clear, in the patch I attached,
only duplicate commands are removed, thus the M is always preserved after the P
in your example. The optimization you're describing is more powerful. Do you
think there will be a noticeable difference between the two approaches with
real-world data?

The main thing I don't like about the patch is all the X11_IPC_LASTDATARUN_NONE
resets, but I can't think of a better way to eliminate these without resorting
to ftell() or redefining printf(). It's likely that I've gone way overboard on
the resets, and you can get away with them in just the few places you mention
(M, V, P, T, ...). Let me know what you think.

For the record, I did find a way to handle this without requiring the manual
X11_IPC_LASTDATARUN_NONE reset or suffering the performance hits of doing an
ftell. One can redefine the functions that interact with the pipe:

    static int X11_ipc_counter;
    static int _fprintf_return, _fputs_return, _fputc_return;

    #define fprintf(stream, format, ...)                              \
        ( _fprintf_return = fprintf(stream, format, ## __VA_ARGS__ ), \
          (( stream == X11_ipc ) && X11_ipc_counter++),               \
          _fprintf_return )

    #define fputs(s, stream)                                          \
        ( _fputs_return = fputs(s, stream),                           \
          (( stream == X11_ipc ) && X11_ipc_counter++),               \
          _fputs_return )

    #define fputc(c, stream)                                          \
        ( _fputc_return = fputc(c, stream),                           \
          (( stream == X11_ipc ) && X11_ipc_counter++),               \
          _fputc_return )

This is sorta weird, but serves to automatically keep track of the pipe
position. The extra code then is localized to the functions that need the
duplicate suppression (X11_vector, X11_point, etc). I.e. no resets are required
anywhere (just as with the ftell() approach), and the performance is as good as
with the attached patch.

The downsides are that it

1. uses a C99-only construct (__VA_ARGS__ in the preprocessor)
2. uses a gcc-ism: , ## __VA_ARGS__
3. adds an unexpected surprise to anybody looking at the code, since a
   plain-looking fprintf() now does more than an uninitiated reader would
   expect.

Maybe these are acceptable, or somebody can see a way to work around these
downsides.

dima
|
From: Dima K. <gn...@di...> - 2012-10-21 05:09:02
|
> On Sat, 20 Oct 2012 23:23:35 -0500
> Daniel J Sebald <dan...@ie...> wrote:
>
> Dima,
>
> Is this optimization something that should happen? The duplicate moves
> is no issue. However, I'm not completely comfortable with tossing out
> duplicate P and duplicate V, in a general sense. I think in x11's
> current form throwing out duplicates won't cause an issue. However,
> from the general viewpoint, it probably isn't good practice.
>
> Imagine I have some results with data points (0.123,0.321) (0,0) (0,0)
> (0.654,0.456). The data appearing on the graph is then (0.123,0.321)
> (0,0) (0.654,0.456). Say my terminal can create a file version of the
> graph, such as Qt currently does. Then I have some secondary program
> that can import that graph and somehow pull out the data points for
> processing (e.g., statistics). The resulting data would be missing one
> of the original samples, thereby throwing off statistics. I have no
> specific example, but I'm imagining things like CAD programs with 2D
> splines interpolation that might have a point on top of another. Or,
> think of the googlemaps where one can interactively drag points around.

I'm only looking at the x11 terminal, and have no intentions of applying this
elsewhere, although this could certainly be done if one so desires.

I would argue that the main purpose of gnuplot is visualization, and we're
allowed to interpret the input in whichever way we like, as long as the
visualization result doesn't change. If somebody wants to manipulate the data,
they should be manipulating the input data, NOT the gnuplot output. At the
very least this is true of the x11 terminal. On top of that, the terminal
already makes modifications to the input: scaling and quantizing to the
terminal coordinates.

> Is it correct that the scenario you have in mind is where the user plots
> a relatively low frequency function and extremely oversamples that
> function? This optimization is then sort of correcting something the
> user should know better about. How often will this situation arise?

Yes, this is for heavily oversampled data. Most of the time this optimization
would do nothing, probably, but for particular data sets, the performance
gains are pretty big (see some earlier posts in this thread). This
optimization is pretty low-hanging fruit, so I think we should apply it. If I
want to plot a huge, smooth data set, it'd be great if I didn't have to
manually downsample it first, and if gnuplot was able to efficiently deal with
it all by itself.

dima
|