From: <sv...@va...> - 2015-07-15 18:07:44
Author: petarj
Date: Wed Jul 15 19:07:36 2015
New Revision: 15413
Log:
mips32: fix build error caused by r15404
Fix typo that caused build break for mips32.
"error: invalid preprocessing directive #elsif"
Modified:
trunk/coregrind/m_vkiscnums.c
Modified: trunk/coregrind/m_vkiscnums.c
==============================================================================
--- trunk/coregrind/m_vkiscnums.c (original)
+++ trunk/coregrind/m_vkiscnums.c Wed Jul 15 19:07:36 2015
@@ -60,7 +60,7 @@
#if defined(VGP_mips32_linux)
STATIC_ASSERT(__NR_pipe == 4042);
STATIC_ASSERT(__NR_pipe2 == 4328);
-#elsif defined(VGP_mips64_linux)
+#elif defined(VGP_mips64_linux)
STATIC_ASSERT(__NR_pipe == 5021);
STATIC_ASSERT(__NR_pipe2 == 5287);
#endif
From: Josef W. <Jos...@gm...> - 2015-07-15 13:55:43
On 15.07.2015 at 14:19, John Reiser wrote:
> It might affect the instruction prefetcher: usually 8 bytes (aligned) per 1 cycle;
> perhaps 16 bytes on recent high-end chips.
Recent Intel x86 chips (Sandy Bridge and later) have a trace cache (L0) for decoded
micro-ops, which allows the x86 decoder to go idle. Performance then depends much
more on whether there is an associativity conflict in L0 such that the decoder
cannot go idle... difficult to predict.
> Always align the start of a loop. Prefer fall-through (no branch) over branch.
Usually, when no branch prediction is available, static prediction expects
fall-through for forward branches but taken for backward branches (predicting a loop).
>> First performance measurements show no clear indication faster/slower here.
>> About the perf test suite: It measures cpu time used in userspace. Why
>> is it not deterministic?
>
> Linux allocates physical page frames randomly, so pages might not map evenly
Another reason: the clock frequency changes (turbo boost) depending on what else
is running, on the current CPU temperature, and on whether the system was in some
sleep mode before.
Josef
From: John R. <jr...@bi...> - 2015-07-15 12:19:45
>> .. is there any performance gain or change compared to doing a 32 bit
>> comparison at the full width?
>>
>> cmpl $0xaaaa, %edx
>> cmpl $0xaa, %edx
>>
>> I would prefer to use 32 bits throughout, since it avoids any possible
>> microarchitectural bad effects -- due to sub-register reads, or length-
>> changing prefixes (0x66) -- that might happen.
Sub-register writes usually take extra cycles (writing a narrow result
requires an extra read and perhaps an extra cycle for the merge), but
sub-register reads usually have no penalty at _execution_. The penalty
on many chips is for decoding the 0x66 prefix (16-bit operand length when
32-bit is the default).
>
> I did a first try today.
> Observations:
> cmpl instruction has no 0x66 prefix, but is encoded with 32bit immediate
> value. But I guess instruction length is not really relevant for
> performance.
It might affect the instruction prefetcher: usually 8 bytes (aligned) per 1 cycle;
perhaps 16 bytes on recent high-end chips.
>
> 0x28008870 <+48>: 66 81 fa 55 55 cmp $0x5555,%dx
> vs.
> 0x28008870 <+48>: 81 fa 55 55 00 00 cmp $0x5555,%edx
>
> I think it does not make sense to replace the cmp in the parts of
> LOADV16le and LOADV8 that care about 16bit or 8bit chunks.
Most instruction prefixes (except perhaps 0x0f, and REX on amd64) add 1 cycle
to instruction decode on low-end chips (Celeron, perhaps Pentium) but often not
on high-end chips (Core, Core i3/5/7, Xeon). Sometimes the extra cycle(s)
can be hidden by execution of preceding slow opcodes, but instruction decode
often is a bottleneck for memcheck.
>
> Does it make sense to align jump targets to 16 bytes?
The purpose of alignment is to increase efficiency of the prefetcher
so that the decoder has as many complete instructions as it can process.
Always align the start of a loop. Prefer fall-through (no branch) over branch.
Do not align if the prefetcher already has the complete target; else align,
but beware fragmentation. Aligning to 8 bytes often is enough
if the first prefetch at the target holds two complete instructions.
Decoding and usage of execution units can matter, especially on low-end chips.
High-end chips have "too much hardware ;-)" that often compensates
for less-than-ideal compiling. Example: for vgMemCheck_helperc_LOAD64le
on amd64, then gcc-4.6.3 generated
movabs $0xfffffff000000007,%rax
test %rax,%rdi
jne slow
mov %rdi,%rdx
movzwl %di,%eax
shr $0x10,%rdx
shr $0x3,%rax
mov table(,%rdx,8),%rdx
movzwl (%rdx,%rax,2),%eax
cmp $0xaaaa,%rax
jne not_allV
xor %eax,%eax
which is ugly because most CPUs have only one shifter (and a shift sometimes
must be decoded first in a cycle) [and for other reasons, too.]
Hand code of
mov %rdi,%rdx
movzwl %di,%ecx
movabs $0xfffffff000000007,%rax
shr $0x10,%rdx
test %rax,%rdi; jne slow
shr $0x3,%ecx
mov table(,%rdx,8),%rdx
xor %eax,%eax
movzwl (%rdx,%rcx,2),%ecx
cmp $0xaaaa,%ecx; jne not_allV
is beautiful but no faster in Core i3/5/7 because dynamic scheduling
and a plethora of internal hardware compensate for the ugly code.
The hand code is 6% faster on an old AMD Phenom(tm) II X2 555.
>
> First performance measurements show no clear indication faster/slower here.
> About the perf test suite: It measures cpu time used in userspace. Why
> is it not deterministic?
Linux allocates physical page frames randomly, so pages might not map evenly
into the data cache. For instance, if the physical frame numbers of your
data pages all have the same remainder modulo 64, then the dcache effectively
is only 1/64 as big (or 1/32, or 1/16, etc., depending on associativity), so there
will be many more cache misses, and execution will be slower. I saw one case with
variance of 15% in otherwise-controlled execution. Matching within 2%
can be difficult.
Author: florian
Date: Wed Jul 15 11:37:36 2015
New Revision: 15412
Log:
Add a few more changes that were forgotten in r15402.
Also setup a list of ignored files for coregrind/m_aspacemgr/unit-test.
Modified:
branches/ASPACEM_TWEAKS/Makefile.am
branches/ASPACEM_TWEAKS/configure.ac
branches/ASPACEM_TWEAKS/coregrind/Makefile.am
branches/ASPACEM_TWEAKS/coregrind/m_aspacemgr/unit-test/ (props changed)
branches/ASPACEM_TWEAKS/include/pub_tool_aspacemgr.h
Modified: branches/ASPACEM_TWEAKS/Makefile.am
==============================================================================
--- branches/ASPACEM_TWEAKS/Makefile.am (original)
+++ branches/ASPACEM_TWEAKS/Makefile.am Wed Jul 15 11:37:36 2015
@@ -30,6 +30,7 @@
perf \
gdbserver_tests \
memcheck/tests/vbit-test \
+ coregrind/m_aspacemgr/unit-test \
auxprogs \
mpi \
docs
Modified: branches/ASPACEM_TWEAKS/configure.ac
==============================================================================
--- branches/ASPACEM_TWEAKS/configure.ac (original)
+++ branches/ASPACEM_TWEAKS/configure.ac Wed Jul 15 11:37:36 2015
@@ -2991,6 +2991,7 @@
auxprogs/Makefile
mpi/Makefile
coregrind/Makefile
+ coregrind/m_aspacemgr/unit-test/Makefile
memcheck/Makefile
memcheck/tests/Makefile
memcheck/tests/common/Makefile
Modified: branches/ASPACEM_TWEAKS/coregrind/Makefile.am
==============================================================================
--- branches/ASPACEM_TWEAKS/coregrind/Makefile.am (original)
+++ branches/ASPACEM_TWEAKS/coregrind/Makefile.am Wed Jul 15 11:37:36 2015
@@ -315,6 +315,7 @@
m_aspacemgr/aspacemgr-linux.c \
m_aspacemgr/aspacemgr-segnames.c \
m_aspacemgr/aspacemgr-segments.c \
+ m_aspacemgr/aspacemgr-dot.c \
m_coredump/coredump-elf.c \
m_coredump/coredump-macho.c \
m_debuginfo/misc.c \
Modified: branches/ASPACEM_TWEAKS/include/pub_tool_aspacemgr.h
==============================================================================
--- branches/ASPACEM_TWEAKS/include/pub_tool_aspacemgr.h (original)
+++ branches/ASPACEM_TWEAKS/include/pub_tool_aspacemgr.h Wed Jul 15 11:37:36 2015
@@ -103,7 +103,7 @@
(viz, not allowed to make translations from non-client areas)
*/
typedef
- struct {
+ struct NSegment {
SegKind kind;
/* Extent (SkFree, SkAnon{C,V}, SkFile{C,V}, SkResvn) */
Addr start; // lowest address in range
@@ -124,6 +124,10 @@
// been taken from this segment
/* Identifies what this segment is part of */
WhatsIt whatsit;
+ /* Tree structure. */
+ struct NSegment *left;
+ struct NSegment *right;
+ struct NSegment *up;
}
NSegment;