Re: [Plplot-devel] Some remaining wxwidgets inefficiency concerns for examples 17 and 08


On 2017-10-04 12:11+0100 Phil Rosenberg wrote:

Note, I am including everything you wrote here (as opposed to dropping
parts of it) because the list has not seen it: I flubbed the
plplot-devel address.

I also respond in a few places below.

> On 4 October 2017 at 05:54, Alan W. Irwin <ir...@be...> wrote:
>> On 2017-10-03 23:44+0100 Phil Rosenberg wrote:
>>
>>> On Windows the fill test took 5 seconds using the old comms method and
>>> 12 with the new. That's with optimisations turned on and I just timed
>>> it with my phone stopwatch from the point where I hit enter after
>>> choosing the driver.
>>>
>>> Interestingly I ran the viewer in a profiler to see why the
>>> differences. Running the 3sem version first, it spent almost all its
>>> time in a GDI rendering function, so no reason to think that the
>>> different comms would make any difference. However, when I profiled
>>> the old comms method, the profiler showed that the viewer spent all
>>> its time in a different GDI rendering function - this time called
>>> NtGDIPolyPolyDraw. I saw something in wxWidgets the other day. This
>>> was a function also called something like PolyPolygonFill and it said
>>> that using this function plotting many polygons at once was faster
>>> than plotting them all individually. So I am going to guess that GDI
>>> maybe has some runtime optimisation or something and it was able to
>>> better optimise the old comms than the new 3sem one. Maybe the
>>> polygons arrive more rapidly?????
>>
>>
>> That's an interesting comparison, and it sure is a surprise that the
>> IPC method affects how the GDI rendering is optimized.  My bet is it
>> has nothing to do with specifically how the data are transmitted and
>> assembled, and instead that difference in GDI rendering optimization
>> is due to some "minor" difference in the code paths between IPC3 and
>> non-IPC3 case on the viewer side.  In other words, instead of looking
>> at transmitBytes and receiveBytes details, I think you should be
>> looking for IPC3 and non-IPC3 differences in utils/wxplframe.cpp
>> concerning how wxPlFrame::ReadTransmission() is called and also the
>> large number of IPC3 versus non-IPC3 code-path differences within that
>> routine.
>>
>> Since the above is an interesting comparison I have decided to add
>> it to my results as well.
>>
>> Just to be clear about nomenclature,
>>
>> IPC3 wxwidgets is what I previously called default wxwidgets and which you
>> have called new comms.  You get that by default or by
>> specifying -DOLD_WXWIDGETS=OFF -DPL_WXWIDGETS_IPC3=ON
>>
>> The non-IPC3 wxwidgets result I have added is what you have called
>> old comms.  You get that by specifying -DOLD_WXWIDGETS=OFF
>> -DPL_WXWIDGETS_IPC3=OFF
>>
>> The old wxwidgets result corresponds to Werner's wxwidgets-related
>> software as updated by you until you decided to completely rewrite
>> that software.  You get that by specifying -DOLD_WXWIDGETS=ON
>>
>> So here is my old timing result table with non-IPC3 wxwidgets timings added
>> where those added timings are defined in exactly the same way and with
>> the same compiler options as the others.
>>
>> device              plline test    plfill test
>> IPC3 wxwidgets      26  seconds    32  seconds
>> non-IPC3 wxwidgets  27  seconds    32  seconds
>> old wxwidgets       18  seconds    30  seconds
>> xcairo              1.4 seconds    2.2 seconds
>> qtwidget            1.5 seconds    1.6 seconds
>> xwin                9.5 seconds    3.4 seconds
>>
>> So on Linux there is no significant measured time difference between
>> what you call new comms (IPC3) and old comms (non-IPC3) contrary to
>> your results on MSVC Windows.
>>
>> So just one timing comparison like you did on a given platform is
>> tricky to generalize, and to get a better idea of what is going on for
>> a given platform it is a good idea to get as many comparisons as
>> possible. Therefore, could you please fill out a similar table to the
>> above with the first 3 devices the same and the last two for wingcc
>> and wingdi?  For example, if the three wxwidgets variants are roughly
>> the same speed as wingcc and wingdi, then it is likely there is
>> some remaining efficiency issue that just occurs for the Linux case.
>> But if on your platform all wxwidgets variants are roughly an order of
>> magnitude slower than wingcc and wingdi, then we likely have a
>> cross-platform efficiency issue with -dev wxwidgets.
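
One way to make the "order of magnitude" comparison concrete is to
divide each timing by the timing of a reference device. The following
is a minimal Python sketch using the Linux timings from the table
above. Taking xcairo as the reference is an assumption made here only
for illustration, and the names TIMINGS and slowdown are hypothetical.

```python
# Timings in seconds, copied from the Linux table above:
# device -> (plline test, plfill test)
TIMINGS = {
    "IPC3 wxwidgets": (26.0, 32.0),
    "non-IPC3 wxwidgets": (27.0, 32.0),
    "old wxwidgets": (18.0, 30.0),
    "xcairo": (1.4, 2.2),
    "qtwidget": (1.5, 1.6),
    "xwin": (9.5, 3.4),
}


def slowdown(timings, reference="xcairo"):
    """Return each device's timings divided by the reference device's."""
    ref_line, ref_fill = timings[reference]
    return {
        device: (line / ref_line, fill / ref_fill)
        for device, (line, fill) in timings.items()
    }


if __name__ == "__main__":
    # Print one row per device: slowdown factor for plline, then plfill.
    for device, (line_ratio, fill_ratio) in slowdown(TIMINGS).items():
        print(f"{device:<20}{line_ratio:7.1f}x{fill_ratio:7.1f}x")
```

The same function can be applied to a table measured on another
platform by swapping in that platform's timings and reference device,
which keeps the comparison relative rather than absolute.
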
>
> For some reason I cannot build wingcc or wingdi, they do not come up
> as enabled on my system when I run cmake. I have never looked into why
> as I don't use them.

I will say more on this topic separately, but for now please fill in
the first three rows since you do have access to all those variations
of wxwidgets.
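
As a reminder of which configure flags go with which row, here is a
small Python sketch that prints one cmake command line per wxwidgets
variant. The flags are the ones given in the nomenclature above; the
source directory "../plplot" and the names VARIANT_FLAGS and
configure_command are placeholders assumed for illustration.

```python
import shlex

# cmake flags for each wxwidgets variant, as listed in the nomenclature above.
VARIANT_FLAGS = {
    "IPC3 wxwidgets": ["-DOLD_WXWIDGETS=OFF", "-DPL_WXWIDGETS_IPC3=ON"],
    "non-IPC3 wxwidgets": ["-DOLD_WXWIDGETS=OFF", "-DPL_WXWIDGETS_IPC3=OFF"],
    "old wxwidgets": ["-DOLD_WXWIDGETS=ON"],
}


def configure_command(variant, source_dir="../plplot"):
    """Build the cmake argument list for one variant.

    source_dir is a hypothetical path to the PLplot source tree.
    """
    return ["cmake", *VARIANT_FLAGS[variant], source_dir]


if __name__ == "__main__":
    # Print the commands only; nothing is executed.
    for name in VARIANT_FLAGS:
        print(f"# {name}")
        print(shlex.join(configure_command(name)))
```

Each command would presumably be run in its own empty build directory
so that the three configurations do not share a CMake cache.
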

>
>>
>>> I wonder why so slow on Linux?
>>
>>
>> I have been wondering about that issue forever.... :-)
>>
>> More seriously though, it is certainly possible there is a unique
>> inefficiency issue on Linux that makes all IPC3 versus non-IPC3
>> comparisons look identical in (very slow) speed.  Also, as you know
>> such cross-platform time comparisons are notoriously unreliable since
>> we have different hardware, different underlying graphics systems
>> which wxwidgets necessarily wraps in extremely different ways,
>> different wxwidgets releases (probably), different compilers, and
>> different levels of optimizations of libraries and PLplot.  So I would
>> prefer to reserve judgement on MSVC Windows versus Linux comparisons
>> until you fill in the rest of the requested table, and probably only
>> pay attention to the relative results even then rather than the
>> absolute results.
>>
>> By the way, I should have mentioned the above table was created
>> with the current HEAD of master branch (commit 124a0c3) with
>> no local changes (other than the two different patches
>> to examples/c/x00c.c to produce the plline and plfill tests above).
>> So when you produce your table would you be sure to do the same?
>>
>>> Do you have a profiler you can use?
>>> Again if you uncomment #define WXPLVIEWER_DEBUG, then set the example
>>> running normally it will display the command line params that you can
>>> use to execute wxPLViewer in a profiler to see where it is spending
>>> its time. There really is no other good way to work out the timings
>>> other than by using a profiler as there are so many unexpected
>>> optimisations that can happen.
>>
>>
>> I have never done profiling, but I agree this is an excellent idea
>> both for core and viewer for the 6 simple examples (plline and
>> plfill for the three wxwidgets variants).
>>
>> I am quite familiar with valgrind so I am thinking of using callgrind
>> <https://www.cs.cmu.edu/afs/cs.cmu.edu/project/cmt-40/Nice/RuleRefinement/bin/valgrind-3.2.0/docs/html/cl-manual.html>
>> to do the profiling.
>>
>> What do you think of that callgrind description and have you heard any
>> caveats/kudos about it?
>>
>> One caveat with valgrind (and presumably callgrind) is identification
>> of source code lines depends on the -g option symbols being available
>> for the library. For wxwidgets, Debian apparently provides those
>> symbols in separate packaged form, e.g., package libwxbase3.0-0-dbg,
>> and my extrapolation from some web discussion is such
>> wxWidgets-related *-dbg packages will automatically allow me to
>> profile (with source-code line identifications) the official wxWidgets
>> Debian libraries.  But I will see.
>
> I think that is a feature of all profilers and debuggers. But I got
> the impression that most linux libraries were distributed with the
> debug information. Maybe I'm mistaken. I can't really comment on
> callgrind. I chose to use a tool called very sleepy. It was
> recommended as very easy to use. It is a graphical tool - I simply
> enter a command to run or select a currently running process and it
> repeatedly checks over and over which function the current execution
> path is in and which line of code it is executing until either the
> process ends or you tell it to stop. Then I can view a hierarchy of %
> time spent in each function or load a page of code and see time spent
> on each line.  I think some profilers work a bit like debuggers
> tracking function calls. I don't know which is better. I've only used
> very sleepy and found it perfect for my needs so never changed. Visual
> Studio now has a profiler built in, but I haven't played with it
> really other than that it shows diagnostics like CPU and memory usage
> when I run stuff from the IDE.
>
> Of course beware - your optimiser will aggressively inline things, so I
> would often find that whole classes had 0 execution time because they
> had been totally optimised away. This is of course a good thing as it
> is the optimiser doing its job, but it just means a little care must
> be taken when interpreting profiler info.

OK.  Thanks for that profiling advice.
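
For the callgrind route, the viewer command line displayed by a
WXPLVIEWER_DEBUG build could be wrapped roughly as follows. This is
only a sketch: the viewer arguments are a placeholder to be replaced
with whatever the debug build actually displays, the output file name
is an arbitrary choice, and callgrind_command is a hypothetical helper.

```python
import shlex


def callgrind_command(viewer_argv, out_file="callgrind.out.wxplviewer"):
    """Prefix a viewer command line with the valgrind/callgrind invocation."""
    return [
        "valgrind",
        "--tool=callgrind",
        f"--callgrind-out-file={out_file}",
        *viewer_argv,
    ]


if __name__ == "__main__":
    # Placeholder: replace with the command line shown by the debug build.
    viewer_argv = ["wxPLViewer", "<arguments shown by the debug build>"]
    print(shlex.join(callgrind_command(viewer_argv)))
    # Afterwards the collected profile can be summarised with callgrind_annotate.
    print(shlex.join(["callgrind_annotate", "callgrind.out.wxplviewer"]))
```

As noted above, source-line attribution depends on -g symbols being
available both for PLplot and for the wxWidgets libraries.
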

Alan
__________________________
Alan W. Irwin

Astronomical research affiliation with Department of Physics and Astronomy,
University of Victoria (astrowww.phys.uvic.ca).

Programming affiliations with the FreeEOS equation-of-state
implementation for stellar interiors (freeeos.sf.net); the Time
Ephemerides project (timeephem.sf.net); PLplot scientific plotting
software package (plplot.sf.net); the libLASi project
(unifont.org/lasi); the Loads of Linux Links project (loll.sf.net);
and the Linux Brochure Project (lbproject.sf.net).
__________________________

Linux-powered Science
__________________________
