From: Julian S. <js...@ac...> - 2002-12-14 19:58:20
|
I've spent several hours messing with this and arrived at the attached
result. Outcome so far is that some programs (bzip2) run about 25%
faster; others (OO) are unchanged, and moz is a little slower (57 s increases
to 62 s). Attached version measures a load of numbers.
There were a couple of performance bogons in the 73- patch. First is
that it's important to use requests to make an address range accessible
as an opportunity to prefetch into the cache, since in most cases, most
especially with %esp moving down, there's a pretty good bet that the
new area is just about to be referenced.
Doing so dramatically reduces the miss rate. That allows the size of the
cache to be shrunk from 2^20 to 2^14 (currently) or even 2^13 entries,
which helps a lot because now the real machine's D1/L2 caches aren't so
hammered (I assume).
Finally, the miss handlers all call cache_valid_word() to fetch into the
cache. Mostly they do this (e.g., from the ACCESS4 fn) when they have
already established that the word in question is accessible, so
cache_valid_word's test for validity is redundant.
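For readers following along, here is a minimal sketch of the kind of direct-mapped "valid word" cache described above. It is not the code from the attached patch: the names cache_reset, cache_hit, cache_fill_unchecked and prefetch_range are hypothetical, and the 2^13-entry size and the address-as-tag layout are assumptions taken from the figures in this message and the valid_cache store in the loop listing further down.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Assumed size: 2^13 entries, the smaller of the sizes mentioned above. */
#define CACHE_BITS 13
#define CACHE_SIZE (1u << CACHE_BITS)

/* Each slot holds the address of one word known to be accessible. */
static uint32_t valid_cache[CACHE_SIZE];

static uint32_t cache_index(uint32_t addr)
{
    return (addr >> 2) & (CACHE_SIZE - 1);
}

/* Fill every slot with an unaligned value so no aligned address can hit. */
static void cache_reset(void)
{
    for (uint32_t i = 0; i < CACHE_SIZE; i++)
        valid_cache[i] = 1;
}

/* Fast path: a hit means the slot holds exactly this word address. */
static bool cache_hit(uint32_t addr)
{
    assert((addr & 3) == 0);
    return valid_cache[cache_index(addr)] == addr;
}

/* Fill a slot without re-testing validity; the caller has already
   established that the word is accessible. */
static void cache_fill_unchecked(uint32_t addr)
{
    assert((addr & 3) == 0);
    valid_cache[cache_index(addr)] = addr;
}

/* Treat a "make this range accessible" request as a prefetch hint:
   the new area (typically fresh stack) is likely to be touched soon. */
static void prefetch_range(uint32_t start, uint32_t end)
{
    for (uint32_t a = start; a < end; a += 4)
        cache_fill_unchecked(a);
}
```

The unchecked fill is the point of the last paragraph: once a miss handler knows the word is accessible, the only work left is the store into the slot.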
That said, performance is still not good enough to make it worthwhile.
Looking at the counts for make_{noaccess,writable}_aligned shown by the
attached version, I think we are losing a great deal of time messing
with the stack permissions every time %esp changes. Not only is the
loop surrounding that counter used a lot, the loop body is surprisingly
long (see below), and then there is the cost of getting here at all
from handle_esp_assignment in vg_memory.c. So I'd guess that doing
something about this would help performance, and would probably help
memcheck too. The only problem is I can't think of a way to improve it :-(
This is all a bit disappointing, because for progs which don't mess
with %esp much, it makes a big improvement.
J
Main loop in ac_make_writable_aligned; is done once for each word on
the stack covered or uncovered (!)
.L247:
movl %esi, %eax
shrl $16, %eax
leal 0(,%eax,4), %ebx
cmpl $distinguished_secondary_map, (%ebx,%ebp)
jne .L249
pushl $.LC35
call alloc_secondary_map
movl %eax, (%ebx,%ebp)
addl $4, %esp
.L249:
movl %esi, %eax
shrl $16, %eax
movl primary_map(,%eax,4), %ebx
movl %esi, %edx
andl $65535, %edx
movl $15, %eax
movl %esi, %ecx
andl $4, %ecx
sall %cl, %eax
shrl $3, %edx
notl %eax
andb %al, (%edx,%ebx)
incl mw_aligned
movl %esi, %eax
andl $32764, %eax
movl %esi, valid_cache(%eax)
addl $4, %esi
cmpl %edi, %esi
jb .L247
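A rough C reading of the loop above may make its per-word cost easier to see. This is a reconstruction from the assembly, not the actual source: the map sizes, the init_maps and word_is_accessible helpers, and the convention that a cleared bit means "accessible" are assumptions, and alloc_secondary_map here takes no argument although the listing pushes one.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

#define SM_BYTES 8192  /* 65536 addresses per secondary map, 1 bit each */

static uint8_t  distinguished_secondary_map[SM_BYTES];
static uint8_t *primary_map[65536];
static uint32_t valid_cache[8192];
static unsigned long mw_aligned;   /* the counter bumped by "incl mw_aligned" */

/* Assumed setup: everything starts out inaccessible (all bits set). */
static void init_maps(void)
{
    memset(distinguished_secondary_map, 0xFF, SM_BYTES);
    for (size_t i = 0; i < 65536; i++)
        primary_map[i] = distinguished_secondary_map;
}

static uint8_t *alloc_secondary_map(void)
{
    uint8_t *sm = malloc(SM_BYTES);
    if (sm == NULL)
        abort();
    memset(sm, 0xFF, SM_BYTES);
    return sm;
}

/* One iteration per aligned word in [start, end), as in the listing. */
static void make_writable_aligned(uint32_t start, uint32_t end)
{
    assert((start & 3) == 0);
    for (uint32_t a = start; a < end; a += 4) {
        uint32_t pm = a >> 16;                         /* shrl $16 */
        if (primary_map[pm] == distinguished_secondary_map)
            primary_map[pm] = alloc_secondary_map();
        uint8_t *sm   = primary_map[pm];
        uint32_t off  = a & 0xFFFF;                    /* andl $65535 */
        uint8_t  mask = (uint8_t)(0xF << (a & 4));     /* low or high nibble */
        sm[off >> 3] &= (uint8_t)~mask;                /* clear 4 bits */
        mw_aligned++;
        valid_cache[(a & 0x7FFC) >> 2] = a;            /* andl $32764 */
    }
}

/* Hypothetical helper: all four bits of the word are clear. */
static int word_is_accessible(uint32_t a)
{
    uint8_t mask = (uint8_t)(0xF << (a & 4));
    return (primary_map[a >> 16][(a & 0xFFFF) >> 3] & mask) == 0;
}
```

Read this way, each word costs two primary-map lookups, a read-modify-write on the bitmap, a counter increment and a cache store, which matches the complaint that the loop body is surprisingly long.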
|
|
From: Jeremy F. <je...@go...> - 2002-12-14 09:08:51
|
On Fri, 2002-12-13 at 19:51, Julian Seward wrote:
> > I'm wondering why AND and OR don't get to take anything but TempReg
> > args. From inspecting some largish chunks of code (mozilla), almost all
> > the AND instructions are with a constant.
> >
> > Is it something to do with the special properties of AND and OR
> > regarding memcheck verification?
>
> Yes, it's because it makes it a bit simpler to generate the weird
> bits of code needed to do the memcheck AND/OR stuff.
OK, I worked all this through. I changed it so that AND and OR are
treated the same way as ADD/SUB/XOR/etc, but when generating code for
ImproveAND/ORn_TQ it copies the value into a TempReg, so I didn't have
to go through and touch all that.
This was worth a couple of percent with --skin=none on gcc.
> > Speaking of which, I presume the intent behind generating:
> >
> > movl $0, %reg
> > xorl %reg, %reg
>
> The problem here is that an original-code xorl %reg, %reg both
> makes %reg = 0 and does something or other with the flags. Treating
> it as if it was preceded by a literal load of zero stops memcheck
> yelping if %reg is undefined before the xor. Retaining the xor
> means I don't have to figure out what to set the flags to following
> the mov of $0.
As an experiment, I added a DEFINE UInstr which simply has the effect of
treating a register as defined. This makes the regalloc happy (since
the TempReg has been written before use), and memcheck happy (since the
xor ends up with a defined result), but it doesn't generate any more
code. I also added cases for SBB and SUB. SUB is another recommended
sequence for zeroing things, but I didn't see it in practice; SBB %reg,
%reg is surprisingly common (I guess as an idiom to move CF into a
register, though I'm not sure what for).
J
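As an illustration of the idea in this message, the sketch below shows how one might recognise the instructions whose result does not depend on the old value of the register, which are the candidates for a preceding DEFINE. It is not taken from the patch: the Opcode and Insn types and both function names are hypothetical stand-ins, not Valgrind's UInstr definitions.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical, simplified instruction model for illustration only. */
typedef enum { OP_ADD, OP_SUB, OP_SBB, OP_XOR, OP_AND, OP_OR } Opcode;

typedef struct {
    Opcode op;
    int    src_reg;
    int    dst_reg;
} Insn;

/* True when the value written to dst_reg does not depend on the value
   dst_reg held before: xor r,r and sub r,r always give 0, and sbb r,r
   gives 0 or all-ones depending only on the carry flag. */
static bool result_ignores_old_value(const Insn *insn)
{
    if (insn->src_reg != insn->dst_reg)
        return false;
    switch (insn->op) {
    case OP_XOR:
    case OP_SUB:
    case OP_SBB:
        return true;
    default:
        return false;
    }
}

/* What sbb %reg,%reg computes: reg - reg - CF, i.e. CF spread across
   the whole register. */
static uint32_t sbb_same_reg(uint32_t reg, bool carry)
{
    return reg - reg - (carry ? 1u : 0u);
}
```

For such an instruction, marking the register as defined beforehand is enough to keep the register allocator and memcheck happy, without emitting the literal load of zero.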
|
|
From: Julian S. <js...@ac...> - 2002-12-14 03:43:42
|
On Saturday 14 December 2002 2:41 am, Jeremy Fitzhardinge wrote:
> I'm wondering why AND and OR don't get to take anything but TempReg
> args. From inspecting some largish chunks of code (mozilla), almost all
> the AND instructions are with a constant.
>
> Is it something to do with the special properties of AND and OR
> regarding memcheck verification?
Yes, it's because it makes it a bit simpler to generate the weird
bits of code needed to do the memcheck AND/OR stuff.
> Speaking of which, I presume the intent behind generating:
>
> movl $0, %reg
> xorl %reg, %reg
The problem here is that an original-code xorl %reg, %reg both
makes %reg = 0 and does something or other with the flags. Treating
it as if it was preceded by a literal load of zero stops memcheck
yelping if %reg is undefined before the xor. Retaining the xor
means I don't have to figure out what to set the flags to following
the mov of $0.
J
|
|
From: Jeremy F. <je...@go...> - 2002-12-14 02:41:58
|
I'm wondering why AND and OR don't get to take anything but TempReg
args. From inspecting some largish chunks of code (mozilla), almost all
the AND instructions are with a constant.
Is it something to do with the special properties of AND and OR
regarding memcheck verification?
Speaking of which, I presume the intent behind generating:
movl $0, %reg
xorl %reg, %reg
is to break the dependency on any previous value of %reg. Unfortunately
it actually generates the mov in the final code. I wonder if a better
solution is to add a DEFINE UInstr which defines (ie, is considered to
write to) its argument without actually generating any code.
I've hacked up a patch to treat AND and OR the same way as the rest of
that group of instructions, but it is falling over in regalloc with
memcheck.
J
|