|
From: Philippe W. <phi...@sk...> - 2014-07-25 06:07:43
|
The 2 nightly builds that have perf comparison enabled (gcc110/fedora20/ppc64, sless390/suse11/s390) have huge difference in performance between valgrind-old and valgrind-new. (the pattern of what slows down is not always the same but both shows big regressions e.g. in massif or memcheck). massif seems consistenly slowed down on s390 and ppc64. We also have other huge slowdown at various places. I re-measured on gcc110/ppc64 the memcheck/heap_pdb4 slowdown. heap_pdb4 on my old pentium seems also to slowdown significantly for massif (40%). So, it looks like a commit between r14183 and 14190 causes this. They are a bunch of changes (we work too hard :) so several candidates are possible. Intuitively, the biggest change (the VexendNess change) is the first candidate to look at. Philippe NB: we only have a ppc64 and s390 "perf measurement enabled" nightly build, which is maybe not good enough (we are lacking many platforms, in particular x86 and amd64). Any volunteers having computers doing nothing most of the time ? :) |
|
From: Tom H. <to...@co...> - 2014-07-25 06:12:10
|
On 25/07/14 07:07, Philippe Waroquiers wrote: > NB: we only have a ppc64 and s390 "perf measurement enabled" nightly > build, which is maybe not good enough (we are lacking many platforms, > in particular x86 and amd64). > Any volunteers having computers doing nothing most of the time ? > :) Well I haven't deliberately disabled anything for any of my test systems, and this is the first I've heard of the idea of a "perf measurement enabled" build. What exactly does this involve? Tom -- Tom Hughes (to...@co...) http://compton.nu/ |
|
From: Philippe W. <phi...@sk...> - 2014-07-25 06:17:32
|
On Fri, 2014-07-25 at 07:11 +0100, Tom Hughes wrote: > On 25/07/14 07:07, Philippe Waroquiers wrote: > > > NB: we only have a ppc64 and s390 "perf measurement enabled" nightly > > build, which is maybe not good enough (we are lacking many platforms, > > in particular x86 and amd64). > > Any volunteers having computers doing nothing most of the time ? > > :) > > Well I haven't deliberately disabled anything for any of my test > systems, and this is the first I've heard of the idea of a "perf > measurement enabled" build. > > What exactly does this involve? The gcc110 conf file I am using is: export ABT_DETAILS="`head -1 /etc/issue`, `uname -m`" export ABT_JOBS=16 export ABT_PERF="--vg=../valgrind-new --vg=../valgrind-old" There are other vars you can set to configure the perf tests e.g. depending on the cpu you want to allow for such perf, you can run the perf for a subset of the tools, or do less 'repeats', or only valgrind-new or ... See nightly/README.txt for more details. Philippe |
|
From: Philippe W. <phi...@sk...> - 2014-07-25 06:30:26
|
On Fri, 2014-07-25 at 07:11 +0100, Tom Hughes wrote: > Well I haven't deliberately disabled anything for any of my test > systems, and this is the first I've heard of the idea of a "perf > measurement enabled" build. > > What exactly does this involve? I saw I only replied in terms of configuration, while that of course involves also cpu consumption :). In terms of cpu, if you configure to run for both valgrind-old and valgrind-new (easier to compare), with the rest being defaulted, I guess it costs between 30 minutes to 2 or 3 hours of cpu, depending on the computer (gcc110 is one hour, the suse/s390 uses somewhat more. I guess an ARM system will take 3 hours or so). Modern intel should be faster (30 minutes ? 45 minutes) ? The cpu time spent for perf is given in the nightly mail, after the comparison Philippe |
|
From: Philippe W. <phi...@sk...> - 2014-07-25 19:37:10
|
On Fri, 2014-07-25 at 08:07 +0200, Philippe Waroquiers wrote:
> Intuitively, the biggest change (the VexendNess change) is
> the first candidate to look at.
And as extremely often with performance related subjects,
intuition gives reliable guesses, as reliable as
interpreting the way the birds fly
or as a crystal ball,
or interpret coffee grounds
and other similar scientific ways to predict perf impact.
So, it is not at all the VexendNess change.
Rather from (manual) bisect, the regression is caused by r14186,
which is given below.
The perf problem is created because the new
implementation of VG_(strncpy_safely) "paints" complete big
arrays of chars with null bytes.
strncpy man page:
If the length of src is less than n, strncpy() pads the remainder of
dest with null bytes.
So, VG_(strncpy_safely) was not compliant with strncpy manual,
and is now compliant on that padding aspect, but causes a huge
performance regression (and explicitely differs in behaviour from
strncpy/VG_(strncpy)).
svn diff -r 14185:14186
Index: coregrind/m_debuginfo/debuginfo.c
===================================================================
--- coregrind/m_debuginfo/debuginfo.c (revision 14185)
+++ coregrind/m_debuginfo/debuginfo.c (revision 14186)
@@ -1870,7 +1870,6 @@
&& di->text_avma <= a
&& a < di->text_avma + di->text_size) {
VG_(strncpy_safely)(buf, di->fsm.filename, nbuf);
- buf[nbuf-1] = 0;
return True;
}
}
Index: coregrind/m_libcbase.c
===================================================================
--- coregrind/m_libcbase.c (revision 14185)
+++ coregrind/m_libcbase.c (revision 14186)
@@ -304,14 +304,8 @@
{
libcbase_assert(ndest > 0);
- SizeT i = 0;
- while (True) {
- dest[i] = 0;
- if (src[i] == 0) return;
- if (i >= ndest-1) return;
- dest[i] = src[i];
- i++;
- }
+ VG_(strncpy)(dest, src, ndest);
+ dest[ndest - 1] = '\0';
}
HChar* VG_(strncpy) ( HChar* dest, const HChar* src, SizeT ndest )
|
|
From: Florian K. <fl...@ei...> - 2014-07-26 08:21:40
|
On 25.07.2014 21:37, Philippe Waroquiers wrote: > On Fri, 2014-07-25 at 08:07 +0200, Philippe Waroquiers wrote: >> Intuitively, the biggest change (the VexendNess change) is >> the first candidate to look at. > And as extremely often with performance related subjects, > intuition gives reliable guesses, as reliable as > interpreting the way the birds fly > or as a crystal ball, > or interpret coffee grounds > and other similar scientific ways to predict perf impact. > > So, it is not at all the VexendNess change. > > Rather from (manual) bisect, the regression is caused by r14186, > which is given below. > Thanks for investigating this. I'm working on a patch which attempts to eliminate these large buffers that are meant to hold function names (and file names). Those have caused usability issues in the past e.g. BZ 155125. That patch should eliminate many of these strncpy calls and therefore help with the perfprmance regression. But let's see.. Florian |