From: Alan W. I. <ir...@be...> - 2017-10-04 17:06:53
On 2017-10-04 12:11+0100 Phil Rosenberg wrote:

Note, I am including everything you wrote here (as opposed to dropping
parts of it) since the list has not seen what you wrote because of
the plplot-devel address that I flubbed. I also respond in a few places
below.

> On 4 October 2017 at 05:54, Alan W. Irwin <ir...@be...> wrote:
>> On 2017-10-03 23:44+0100 Phil Rosenberg wrote:
>>
>>> On Windows the fill test took 5 seconds using the old comms method and
>>> 12 with the new. That's with optimisations turned on and I just timed
>>> it with my phone stopwatch from the point where I hit enter after
>>> choosing the driver.
>>>
>>> Interestingly I ran the viewer in a profiler to see why the
>>> differences. Running the 3sem version first, it spent almost all its
>>> time in a GDI rendering function, so no reason to think that the
>>> different comms would make any difference. However, when I profiled
>>> the old comms method, the profiler showed that the viewer spent all
>>> its time in a different GDI rendering function - this time called
>>> NtGDIPolyPolyDraw. I saw something in wxWidgets the other day. This
>>> was a function also called something like PolyPolygonFill and it said
>>> that using this function plotting many polygons at once was faster
>>> than plotting them all individually. So I am going to guess that GDI
>>> maybe has some runtime optimisation or something and it was able to
>>> better optimise the old comms than the new 3sem one. Maybe the
>>> polygons arrive more rapidly?????
>>
>>
>> That's an interesting comparison, and it sure is a surprise that the
>> IPC method affects how the GDI rendering is optimized. My bet is it
>> has nothing to do with specifically how the data are transmitted and
>> assembled, and instead that difference in GDI rendering optimization
>> is due to some "minor" difference in the code paths between the IPC3
>> and non-IPC3 cases on the viewer side.
>> In other words, instead of looking
>> at transmitBytes and receiveBytes details, I think you should be
>> looking for IPC3 and non-IPC3 differences in utils/wxplframe.cpp
>> concerning how wxPlFrame::ReadTransmission() is called and also the
>> large number of IPC3 versus non-IPC3 code-path differences within that
>> routine.
>>
>> Since the above is an interesting comparison I have decided to add
>> it to my results as well.
>>
>> Just to be clear about nomenclature,
>>
>> IPC3 wxwidgets is what I previously called default wxwidgets and which you
>> have called new comms. You get that by default or by
>> specifying -DOLD_WXWIDGETS=OFF -DPL_WXWIDGETS_IPC3=ON
>>
>> The non-IPC3 wxwidgets result I have added is what you have called
>> old comms. You get that by specifying -DOLD_WXWIDGETS=OFF
>> -DPL_WXWIDGETS_IPC3=OFF
>>
>> The old wxwidgets result corresponds to Werner's wxwidgets-related
>> software as updated by you until you decided to completely rewrite
>> that software. You get that by specifying -DOLD_WXWIDGETS=ON
>>
>> So here is my old timing result table with non-IPC3 wxwidgets timings added
>> where those added timings are defined in exactly the same way and with
>> the same compiler options as the others.
>>
>> device              plline test   plfill test
>> IPC3 wxwidgets      26 seconds    32 seconds
>> non-IPC3 wxwidgets  27 seconds    32 seconds
>> old wxwidgets       18 seconds    30 seconds
>> xcairo              1.4 seconds   2.2 seconds
>> qtwidget            1.5 seconds   1.6 seconds
>> xwin                9.5 seconds   3.4 seconds
>>
>> So on Linux there is no significant measured time difference between
>> what you call new comms (IPC3) and old comms (non-IPC3) contrary to
>> your results on MSVC Windows.
>>
>> So just one timing comparison like you did on a given platform is
>> tricky to generalize, and to get a better idea of what is going on for
>> a given platform it is a good idea to get as many comparisons as
>> possible.
>> Therefore, could you please fill out a similar table to the
>> above with the first 3 devices the same and the last two for wingcc
>> and wingdi? For example, if the three wxwidgets variants are roughly
>> the same speed as wingcc and wingdi, then it is likely there is
>> some remaining efficiency issue that just occurs for the Linux case.
>> But if on your platform all wxwidgets variants are roughly an order of
>> magnitude slower than wingcc and wingdi, then we likely have a
>> cross-platform efficiency issue with -dev wxwidgets.
>
> For some reason I cannot build wingcc or wingdi, they do not come up
> as enabled on my system when I run cmake. I have never looked into why
> as I don't use them.

I will say more on this topic separately, but for now please fill in
the first three rows since you do have access to all those variations
of wxwidgets.

>
>>
>>> I wonder why so slow on Linux?
>>
>>
>> I have been wondering about that issue forever.... :-)
>>
>> More seriously though, it is certainly possible there is a unique
>> inefficiency issue on Linux that makes all IPC3 versus non-IPC3
>> comparisons look identical in (very slow) speed. Also, as you know
>> such cross-platform time comparisons are notoriously unreliable since
>> we have different hardware, different underlying graphics systems
>> which wxwidgets necessarily wraps in extremely different ways,
>> different wxwidgets releases (probably), different compilers, and
>> different levels of optimizations of libraries and PLplot. So I would
>> prefer to reserve judgement on MSVC Windows versus Linux comparisons
>> until you fill in the rest of the requested table, and probably only
>> pay attention to the relative results even then rather than the
>> absolute results.
>>
>> By the way, I should have mentioned the above table was created
>> with the current HEAD of master branch (commit 124a0c3) with
>> no local changes (other than the two different patches
>> to examples/c/x00c.c to produce the plline and plfill tests above).
>> So when you produce your table would you be sure to do the same?
>>
>>> Do you have a profiler you can use?
>>> Again if you uncomment #define WXPLVIEWER_DEBUG, then set the example
>>> running normally it will display the command line params that you can
>>> use to execute wxPLViewer in a profiler to see where it is spending
>>> its time. There really is no other good way to work out the timings
>>> other than by using a profiler as there are so many unexpected
>>> optimisations that can happen.
>>
>>
>> I have never done profiling, but I agree this is an excellent idea
>> both for core and viewer for the 6 simple examples (plline and
>> plfill for the three wxwidgets variants).
>>
>> I am quite familiar with valgrind so I am thinking of using callgrind
>> <https://www.cs.cmu.edu/afs/cs.cmu.edu/project/cmt-40/Nice/RuleRefinement/bin/valgrind-3.2.0/docs/html/cl-manual.html>
>> to do the profiling.
>>
>> What do you think of that callgrind description and have you heard any
>> caveats/kudos about it?
>>
>> One caveat with valgrind (and presumably callgrind) is identification
>> of source code lines depends on the -g option symbols being available
>> for the library. For wxwidgets, Debian apparently provides those
>> symbols in separate packaged form, e.g., package libwxbase3.0-0-dbg,
>> and my extrapolation from some web discussion is such
>> wxWidgets-related *-dbg packages will automatically allow me to
>> profile (with source-code line identifications) the official wxWidgets
>> Debian libraries. But I will see.
>
> I think that is a feature of all profilers and debuggers. But I got
> the impression that most linux libraries were distributed with the
> debug information. Maybe I'm mistaken.
> I can't really comment on
> callgrind. I chose to use a tool called Very Sleepy. It was
> recommended as very easy to use. It is a graphical tool - I simply
> enter a command to run or select a currently running process and it
> repeatedly checks which function the current execution
> path is in and which line of code it is executing until either the
> process ends or you tell it to stop. Then I can view a hierarchy of %
> time spent in each function or load a page of code and see time spent
> on each line. I think some profilers work a bit like debuggers
> tracking function calls. I don't know which is better. I've only used
> Very Sleepy and found it perfect for my needs so never changed. Visual
> Studio now has a profiler built in, but I haven't played with it
> really other than that it shows diagnostics like CPU and memory usage
> when I run stuff from the IDE.
>
> Of course beware - your optimiser will aggressively inline things, so I
> would often find that whole classes had 0 execution time because they
> had been totally optimised away. This is of course a good thing as it
> is the optimiser doing its job, but it just means a little care must
> be taken when interpreting profiler info.

OK. Thanks for that profiling advice.

Alan
__________________________
Alan W. Irwin

Astronomical research affiliation with Department of Physics and
Astronomy, University of Victoria (astrowww.phys.uvic.ca).

Programming affiliations with the FreeEOS equation-of-state
implementation for stellar interiors (freeeos.sf.net); the Time
Ephemerides project (timeephem.sf.net); PLplot scientific plotting
software package (plplot.sf.net); the libLASi project
(unifont.org/lasi); the Loads of Linux Links project (loll.sf.net);
and the Linux Brochure Project (lbproject.sf.net).
__________________________

Linux-powered Science
__________________________