From: Nicholas N. <nj...@ca...> - 2003-03-29 23:29:38
On Sat, 29 Mar 2003, Julian Seward wrote:
> A big bottleneck (we surmise, but haven't accurately measured)
> for skins which deal with A bits is handling %ESP changes.
> Consider a source push instr:
>
> 0x40184623: pushl %esi
>
> 10: GETL %ESI, t8
> 11: GETL %ESP, t12
> 12: MOVL t12, t10
> 13: SUBL $0x4, t10
> 14: PUTL t10, %ESP
> 15: STL t8, (t10)
> 16: INCEIPo $1
>
> Eventually there is, as a result of this, a call to VG_(handle_esp_assignment),
> assuming the skin has asked for shadow memory (aiui).
It gets called if any of the following events are tracked by the skin:
new_mem_stack, new_mem_stack_aligned, die_mem_stack, die_mem_stack_aligned.
> This passes the old and new %ESP values, and in
> VG_(handle_esp_assignment) these are subtracted to discover that the
> stack delta is -4. Then we do
>
> VG_TRACK(new_mem_stack_aligned, new_esp, -delta);
>
> which necessarily involves a loop, and probably a test on the sign of
> delta, in the relevant skin's permissions-mangling function.
>
> This is all rather inefficient considering that we knew from the start
> that delta would be -4, and it would have been better to plant, in the
> generated code, a call directly to a skin function specialised to a delta
> of -4. Most deltas are small (+4, -4, +16, -16) and so a small bunch of
> specialised functions + a general fallback case would probably constitute
> a significant improvement.
>
> The improvements would be because (1) the specialised functions have no
> need to test the sign of delta nor have a loop to handle all values of
> it, and (2) because we could save having to call VG_(handle_esp_assignment)
> and then call onwards to the skin function; we could just call the skin
> function directly.
You're right about the inefficiency. I just tried altering Nulgrind so it
tracked the four events I mentioned above, where the called functions were
empty. My one test program slowed down from 0.20s to 0.35s; on a longer
run it went from 1.0s to 2.4s. Just to make sure it wasn't simply the
function call overhead, I tried instead inserting calls to an empty
function wherever %esp was updated; my test program (small input) only
slowed to 0.24s. So yes, the current mechanism is slow, well done for
spotting it.
[It's weird, something is wrong with profiling which is maybe why this
hasn't been spotted before -- there is an event VgpStack for measuring
exactly this stack-adjustment overhead, but I always get zero ticks for
it. I even put a small busy loop into VG_(handle_esp_assignment)() so
that the program ran ten times longer than normal, 90% of that in the
VG_(handle_esp_assignment), and it still gave zero ticks (the ticks were
allocated to "running"). I'll investigate that when I look at this in
more detail.]
Your idea is quite plausible. Basically the skins would bypass the core's
built-in way of handling stack updates and handle them themselves, because
that can be more efficient.
Which makes me think: if we want to make this improvement for
{Mem,Addr}check, any other skin that tracks %esp changes probably wants it
as well. So let's try to improve the general mechanism rather than
implementing a {Mem,Addr}check-only optimisation.
The current approach is simplest from a code-generation point of view;
doing it this optimised way would require a little bit of smartness to
work out the delta (you have to look back a few UCode instructions to see
when %ESP was loaded and then what ADDs/SUBs were done on it), but it
shouldn't be too hard.
You could then have a number of trackable events for skins to hook into:
new_mem_stack_aligned_4
new_mem_stack_aligned_8
etc.
die_mem_stack_aligned_4
die_mem_stack_aligned_8
etc.
for the most common cases (eg. 4, 8, 12, 16, 20, 24). They would be
passed the old %esp. The skins could have unrolled versions of the
general stack-adjusting code for these cases.
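To illustrate, here is a sketch of what an unrolled specialised handler might look like next to the general one. The shadow layout and both function bodies are invented for illustration, not real Memcheck internals:

```c
#include <stdint.h>

/* Hypothetical A-bit shadow: one "addressable" flag per 4-byte word.
   The layout and names are illustrative, not real Memcheck code. */
#define SHADOW_WORDS 1024
static uint8_t a_bits[SHADOW_WORDS];

/* General handler: must loop over every word in the new stack area. */
static void new_mem_stack_aligned_gen(uint32_t new_esp, uint32_t size)
{
    for (uint32_t a = new_esp; a < new_esp + size; a += 4)
        a_bits[(a / 4) % SHADOW_WORDS] = 1;   /* mark word addressable */
}

/* Specialised handler for the common push case (delta == -4):
   no loop, no size parameter, a single shadow update. */
static void new_mem_stack_aligned_4(uint32_t new_esp)
{
    a_bits[(new_esp / 4) % SHADOW_WORDS] = 1;
}
```

The specialised version is straight-line code, which is exactly what makes it cheap to call from generated code.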
Also have:
new_mem_stack_aligned_gen
new_mem_stack
die_mem_stack_aligned_gen
die_mem_stack
If a skin didn't provide these special case functions, the core could fall
back to using the general case ones if they were provided -- this would be
useful when first writing skins, when you don't want to write five
different versions of the same function. I.e. new_mem_stack_aligned_4
would be used if present, falling back to new_mem_stack_aligned_gen if
that is present, then to new_mem_stack if that is present, else doing
nothing.
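The fallback chain on the core side might be sketched like this (the struct and field names are invented for illustration, not real core API):

```c
#include <stdint.h>

/* Sketch of the proposed fallback chain; names are invented. */
typedef struct {
    void (*new_mem_stack_4)(uint32_t old_esp);
    void (*new_mem_stack_gen)(uint32_t new_esp, uint32_t size);
    void (*new_mem_stack)(uint32_t addr, uint32_t size);
} StackTrackers;

static int fired;  /* records which handler ran, for demonstration */
static void on_push4(uint32_t old_esp)            { fired = 4; }
static void on_gen(uint32_t new_esp, uint32_t sz) { fired = 1; }

/* Core-side dispatch when the delta is known to be -4: use the
   specialised hook if present, else fall back down the chain. */
static void track_new_stack_4(const StackTrackers *t, uint32_t old_esp)
{
    if (t->new_mem_stack_4)        t->new_mem_stack_4(old_esp);
    else if (t->new_mem_stack_gen) t->new_mem_stack_gen(old_esp - 4, 4);
    else if (t->new_mem_stack)     t->new_mem_stack(old_esp - 4, 4);
}
```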
One complication -- how do we know at compile-time if a stack-adjustment
is aligned? We can't (AFAICT) so maybe the events shouldn't have any
mention of alignment, and it's up to the skin to do an alignment check and
speed up its actions based on this if it wants. So the events might be
new_mem_stack_4, new_mem_stack_8, ..., new_mem_stack_gen.
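The run-time alignment check the skin would do is cheap; a minimal sketch (illustrative only, not real skin code):

```c
#include <stdint.h>

/* Alignment can't be known at instrumentation time, so the skin
   checks at run time and chooses a fast word-granular path.
   Returns nonzero when both address and size are word-multiples. */
static int aligned_fast_path(uint32_t new_esp, uint32_t size)
{
    return ((new_esp | size) & 3) == 0;
}
```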
Doing some highly unjustifiable mental calculations, I think this could
result in a speedup for Memcheck of 15%, and Addrcheck a bit better, maybe
20%, at least for my test program. That would be pretty nice.
I'll definitely look into this once I've finished looking at moving
malloc() et al out of core.
N
From: Julian S. <js...@ac...> - 2003-03-29 12:45:44
While you're on the software-structure warpath, here's something
I noticed a while back, around xmas, when JeremyF and I were trying
to improve performance.
A big bottleneck (we surmise, but haven't accurately measured)
for skins which deal with A bits is handling %ESP changes.
The core/skin split has made this more expensive, I think, by
causing longer chains of calls into the relevant skin's memory-
permissions adjustor functions. That isn't the problem, tho;
the following nonsensicality was there prior to that.
Consider a source push instr:
0x40184623: pushl %esi
10: GETL %ESI, t8
11: GETL %ESP, t12
12: MOVL t12, t10
13: SUBL $0x4, t10
14: PUTL t10, %ESP
15: STL t8, (t10)
16: INCEIPo $1
Eventually there is, as a result of this, a call to VG_(handle_esp_assignment),
assuming the skin has asked for shadow memory (aiui). This passes the
old and new %ESP values, and in VG_(handle_esp_assignment) these are
subtracted to discover that the stack delta is -4. Then we do
VG_TRACK(new_mem_stack_aligned, new_esp, -delta);
(vg_memory.c:320ish)
which necessarily involves a loop, and probably a test on the sign of
delta, in the relevant skin's permissions-mangling function.
This is all rather inefficient considering that we knew from the start
that delta would be -4, and it would have been better to plant, in the
generated code, a call directly to a skin function specialised to a delta
of -4. Most deltas are small (+4, -4, +16, -16) and so a small bunch of
specialised functions + a general fallback case would probably constitute
a significant improvement.
The improvements would be because (1) the specialised functions have no
need to test the sign of delta nor have a loop to handle all values of
it, and (2) because we could save having to call VG_(handle_esp_assignment)
and then call onwards to the skin function; we could just call the skin
function directly.
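The two call paths can be sketched side by side (all names here are invented stand-ins for the real core and skin functions):

```c
#include <stdint.h>

static int indirections;   /* counts calls taken on each path */

static void skin_new_mem_stack_aligned(uint32_t new_esp, uint32_t size)
{
    indirections++;        /* skin's permission-mangling work */
}

/* Current path: generated code calls a core trampoline, which
   recomputes the delta and then calls onwards into the skin. */
static void handle_esp_assignment(uint32_t old_esp, uint32_t new_esp)
{
    indirections++;
    int32_t delta = (int32_t)(new_esp - old_esp);
    if (delta < 0)
        skin_new_mem_stack_aligned(new_esp, (uint32_t)-delta);
}

/* Proposed path: generated code calls a delta-specialised skin
   function directly; one call, no delta computation. */
static void skin_new_mem_stack_4(uint32_t new_esp)
{
    indirections++;
}
```

For a push, the current path costs two calls plus a subtraction and sign test; the proposed path is a single direct call.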
You're the Architect(tm); I don't know if the idea makes sense or would
cause other difficulties, but I did notice this a while back.
J