From: Philippe W. <phi...@so...> - 2017-09-22 21:52:59
https://sourceware.org/git/gitweb.cgi?p=valgrind.git;h=f053756e2880dbb7291aac086aa48e1fd3b6f812

commit f053756e2880dbb7291aac086aa48e1fd3b6f812
Author: Philippe Waroquiers <phi...@sk...>
Date:   Fri Sep 22 23:50:35 2017 +0200

    Follow up to 345307 - Warning about "still reachable" memory when using
    libstdc++ from gcc 5

    The bug itself was solved in 3.12 by the addition of __gnu_cxx::__freeres
    to libstdc++ and by having valgrind call it before exit. However,
    depending on the libstdc++ version, the leak_cpp_interior test gave
    different results. This commit adds filtering specific to the test, so
    that it no longer depends on the absolute number of bytes leaked, and
    adds a suppression entry to ignore the memory allocated by libstdc++.
    This allows having only 2 .exp files instead of 4 (or worse, if yet more
    .exp files had to be handled for other libstdc++ versions).

Diff:
---
 memcheck/tests/Makefile.am                         |   2 +-
 memcheck/tests/filter_leak_cpp_interior            |  13 ++
 memcheck/tests/leak_cpp_interior.stderr.exp        | 111 ++++++--------
 memcheck/tests/leak_cpp_interior.stderr.exp-64bit  | 111 ++++++--------
 .../leak_cpp_interior.stderr.exp-64bit-solaris     | 142 ---------------------
 .../tests/leak_cpp_interior.stderr.exp-solaris     | 142 ---------------------
 memcheck/tests/leak_cpp_interior.vgtest            |   3 +-
 memcheck/tests/libstdc++.supp                      |  75 +++++++++
 8 files changed, 193 insertions(+), 406 deletions(-)

diff --git a/memcheck/tests/Makefile.am b/memcheck/tests/Makefile.am
index b8529f6..b9ba67b 100644
--- a/memcheck/tests/Makefile.am
+++ b/memcheck/tests/Makefile.am
@@ -64,6 +64,7 @@ dist_noinst_SCRIPTS = \
 	filter_allocs \
 	filter_dw4 \
 	filter_leak_cases_possible \
+	filter_leak_cpp_interior \
 	filter_stderr filter_xml \
 	filter_strchr \
 	filter_varinfo3 \
@@ -114,7 +115,6 @@ EXTRA_DIST = \
 	cond_st.stderr.exp-64bit-non-arm \
 	cond_st.stderr.exp-32bit-non-arm \
 	leak_cpp_interior.stderr.exp leak_cpp_interior.stderr.exp-64bit leak_cpp_interior.vgtest \
-	leak_cpp_interior.stderr.exp-solaris leak_cpp_interior.stderr.exp-64bit-solaris \
 	custom_alloc.stderr.exp custom_alloc.vgtest \
 	custom_alloc.stderr.exp-s390x-mvc \
 	custom-overlap.stderr.exp custom-overlap.vgtest \

diff --git a/memcheck/tests/filter_leak_cpp_interior b/memcheck/tests/filter_leak_cpp_interior
new file mode 100755
index 0000000..ae09087
--- /dev/null
+++ b/memcheck/tests/filter_leak_cpp_interior
@@ -0,0 +1,13 @@
+#! /bin/sh
+#
+# Remove the suppressed line and the total heap usage line.
+# Replace each absolute number of bytes different from 0 by x.
+./filter_stderr "$@" |
+sed -e '/ suppressed:/d' \
+    -e '/ total heap usage:/d' \
+    -e 's/[1-9][0-9,]* bytes/x bytes/' \
+    -e 's/[1-9][0-9,]* (\([+-]\)[1-9][0-9,]*) bytes/x (\1x) bytes/' \
+    -e 's/0 (\([+-]\)[1-9][0-9,]*) bytes/0 (\1x) bytes/' \
+    -e 's/[1-9][0-9,]* (\([+-]\)0) bytes/x (\10) bytes/'
+
+

diff --git a/memcheck/tests/leak_cpp_interior.stderr.exp b/memcheck/tests/leak_cpp_interior.stderr.exp
index 70e2764..df6cad2 100644
--- a/memcheck/tests/leak_cpp_interior.stderr.exp
+++ b/memcheck/tests/leak_cpp_interior.stderr.exp
@@ -1,118 +1,110 @@
 valgrind output will go to log
 VALGRIND_DO_LEAK_CHECK
-4 bytes in 1 blocks are definitely lost in loss record ... of ...
+x bytes in 1 blocks are definitely lost in loss record ... of ...
   by 0x........: doit() (leak_cpp_interior.cpp:116)
   by 0x........: main (leak_cpp_interior.cpp:131)

 LEAK SUMMARY:
-   definitely lost: 4 bytes in 1 blocks
+   definitely lost: x bytes in 1 blocks
    indirectly lost: 0 bytes in 0 blocks
    possibly lost: 0 bytes in 0 blocks
-   still reachable: 163 bytes in 8 blocks
+   still reachable: x bytes in 8 blocks
    of which reachable via heuristic:
-     stdstring : 56 bytes in 2 blocks
-     length64 : 31 bytes in 1 blocks
-     newarray : 28 bytes in 1 blocks
-     multipleinheritance: 24 bytes in 2 blocks
-   suppressed: 0 bytes in 0 blocks
+     stdstring : x bytes in 2 blocks
+     length64 : x bytes in 1 blocks
+     newarray : x bytes in 1 blocks
+     multipleinheritance: x bytes in 2 blocks
 Reachable blocks (those to which a pointer was found) are not shown.
 To see them, rerun with: --leak-check=full --show-leak-kinds=all

 leak_check summary heuristics multipleinheritance
 LEAK SUMMARY:
-   definitely lost: 4 (+0) bytes in 1 (+0) blocks
+   definitely lost: x (+0) bytes in 1 (+0) blocks
    indirectly lost: 0 (+0) bytes in 0 (+0) blocks
-   possibly lost: 115 (+115) bytes in 4 (+4) blocks
-   still reachable: 48 (-115) bytes in 4 (-4) blocks
+   possibly lost: x (+x) bytes in 4 (+4) blocks
+   still reachable: x (-x) bytes in 4 (-4) blocks
    of which reachable via heuristic:
-     stdstring : 0 (-56) bytes in 0 (-2) blocks
-     length64 : 0 (-31) bytes in 0 (-1) blocks
-     newarray : 0 (-28) bytes in 0 (-1) blocks
-     multipleinheritance: 24 (+0) bytes in 2 (+0) blocks
-   suppressed: 0 (+0) bytes in 0 (+0) blocks
+     stdstring : 0 (-x) bytes in 0 (-2) blocks
+     length64 : 0 (-x) bytes in 0 (-1) blocks
+     newarray : 0 (-x) bytes in 0 (-1) blocks
+     multipleinheritance: x (+0) bytes in 2 (+0) blocks
 To see details of leaked memory, give 'full' arg to leak_check

 leak_check summary any heuristics newarray
 LEAK SUMMARY:
-   definitely lost: 4 (+0) bytes in 1 (+0) blocks
+   definitely lost: x (+0) bytes in 1 (+0) blocks
    indirectly lost: 0 (+0) bytes in 0 (+0) blocks
-   possibly lost: 111 (-4) bytes in 5 (+1) blocks
-   still reachable: 52 (+4) bytes in 3 (-1) blocks
+   possibly lost: x (-x) bytes in 5 (+1) blocks
+   still reachable: x (+x) bytes in 3 (-1) blocks
    of which reachable via heuristic:
-     newarray : 28 (+28) bytes in 1 (+1) blocks
-     multipleinheritance: 0 (-24) bytes in 0 (-2) blocks
-   suppressed: 0 (+0) bytes in 0 (+0) blocks
+     newarray : x (+x) bytes in 1 (+1) blocks
+     multipleinheritance: 0 (-x) bytes in 0 (-2) blocks
 To see details of leaked memory, give 'full' arg to leak_check

 leak_check summary heuristics length64
 LEAK SUMMARY:
-   definitely lost: 4 (+0) bytes in 1 (+0) blocks
+   definitely lost: x (+0) bytes in 1 (+0) blocks
    indirectly lost: 0 (+0) bytes in 0 (+0) blocks
-   possibly lost: 108 (-3) bytes in 5 (+0) blocks
-   still reachable: 55 (+3) bytes in 3 (+0) blocks
+   possibly lost: x (-x) bytes in 5 (+0) blocks
+   still reachable: x (+x) bytes in 3 (+0) blocks
    of which reachable via heuristic:
-     length64 : 31 (+31) bytes in 1 (+1) blocks
-     newarray : 0 (-28) bytes in 0 (-1) blocks
-   suppressed: 0 (+0) bytes in 0 (+0) blocks
+     length64 : x (+x) bytes in 1 (+1) blocks
+     newarray : 0 (-x) bytes in 0 (-1) blocks
 To see details of leaked memory, give 'full' arg to leak_check

 leak_check summary heuristics stdstring
 LEAK SUMMARY:
-   definitely lost: 4 (+0) bytes in 1 (+0) blocks
+   definitely lost: x (+0) bytes in 1 (+0) blocks
    indirectly lost: 0 (+0) bytes in 0 (+0) blocks
-   possibly lost: 83 (-25) bytes in 4 (-1) blocks
-   still reachable: 80 (+25) bytes in 4 (+1) blocks
+   possibly lost: x (-x) bytes in 4 (-1) blocks
+   still reachable: x (+x) bytes in 4 (+1) blocks
    of which reachable via heuristic:
-     stdstring : 56 (+56) bytes in 2 (+2) blocks
-     length64 : 0 (-31) bytes in 0 (-1) blocks
-   suppressed: 0 (+0) bytes in 0 (+0) blocks
+     stdstring : x (+x) bytes in 2 (+2) blocks
+     length64 : 0 (-x) bytes in 0 (-1) blocks
 To see details of leaked memory, give 'full' arg to leak_check

 leak_check summary heuristics multipleinheritance,newarray,stdstring,length64
 LEAK SUMMARY:
-   definitely lost: 4 (+0) bytes in 1 (+0) blocks
+   definitely lost: x (+0) bytes in 1 (+0) blocks
    indirectly lost: 0 (+0) bytes in 0 (+0) blocks
-   possibly lost: 0 (-83) bytes in 0 (-4) blocks
-   still reachable: 163 (+83) bytes in 8 (+4) blocks
+   possibly lost: 0 (-x) bytes in 0 (-4) blocks
+   still reachable: x (+x) bytes in 8 (+4) blocks
    of which reachable via heuristic:
-     stdstring : 56 (+0) bytes in 2 (+0) blocks
-     length64 : 31 (+31) bytes in 1 (+1) blocks
-     newarray : 28 (+28) bytes in 1 (+1) blocks
-     multipleinheritance: 24 (+24) bytes in 2 (+2) blocks
-   suppressed: 0 (+0) bytes in 0 (+0) blocks
+     stdstring : x (+0) bytes in 2 (+0) blocks
+     length64 : x (+x) bytes in 1 (+1) blocks
+     newarray : x (+x) bytes in 1 (+1) blocks
+     multipleinheritance: x (+x) bytes in 2 (+2) blocks
 To see details of leaked memory, give 'full' arg to leak_check

 leak_check summary heuristics all
 LEAK SUMMARY:
-   definitely lost: 4 (+0) bytes in 1 (+0) blocks
+   definitely lost: x (+0) bytes in 1 (+0) blocks
    indirectly lost: 0 (+0) bytes in 0 (+0) blocks
    possibly lost: 0 (+0) bytes in 0 (+0) blocks
-   still reachable: 163 (+0) bytes in 8 (+0) blocks
+   still reachable: x (+0) bytes in 8 (+0) blocks
    of which reachable via heuristic:
-     stdstring : 56 (+0) bytes in 2 (+0) blocks
-     length64 : 31 (+0) bytes in 1 (+0) blocks
-     newarray : 28 (+0) bytes in 1 (+0) blocks
-     multipleinheritance: 24 (+0) bytes in 2 (+0) blocks
-   suppressed: 0 (+0) bytes in 0 (+0) blocks
+     stdstring : x (+0) bytes in 2 (+0) blocks
+     length64 : x (+0) bytes in 1 (+0) blocks
+     newarray : x (+0) bytes in 1 (+0) blocks
+     multipleinheritance: x (+0) bytes in 2 (+0) blocks
 To see details of leaked memory, give 'full' arg to leak_check

 leak_check summary heuristics none
 LEAK SUMMARY:
-   definitely lost: 4 (+0) bytes in 1 (+0) blocks
+   definitely lost: x (+0) bytes in 1 (+0) blocks
    indirectly lost: 0 (+0) bytes in 0 (+0) blocks
-   possibly lost: 139 (+139) bytes in 6 (+6) blocks
-   still reachable: 24 (-139) bytes in 2 (-6) blocks
+   possibly lost: x (+x) bytes in 6 (+6) blocks
+   still reachable: x (-x) bytes in 2 (-6) blocks
    of which reachable via heuristic:
-     stdstring : 0 (-56) bytes in 0 (-2) blocks
-     length64 : 0 (-31) bytes in 0 (-1) blocks
-     newarray : 0 (-28) bytes in 0 (-1) blocks
-     multipleinheritance: 0 (-24) bytes in 0 (-2) blocks
-   suppressed: 0 (+0) bytes in 0 (+0) blocks
+     stdstring : 0 (-x) bytes in 0 (-2) blocks
+     length64 : 0 (-x) bytes in 0 (-1) blocks
+     newarray : 0 (-x) bytes in 0 (-1) blocks
+     multipleinheritance: 0 (-x) bytes in 0 (-2) blocks
 To see details of leaked memory, give 'full' arg to leak_check

-Searching for pointers pointing in 20 bytes from 0x........
-*0x........ interior points at 4 bytes inside 0x........
+Searching for pointers pointing in x bytes from 0x........
+*0x........ interior points at x bytes inside 0x........
  Address 0x........ is 0 bytes inside data symbol "ptr"
 block at 0x........ considered reachable by ptr 0x........ using newarray heuristic
 destruct MyClass
@@ -134,7 +126,6 @@ Finished!

 HEAP SUMMARY:
     in use at exit: 0 bytes in 0 blocks
-   total heap usage: 9 allocs, 9 frees, 167 bytes allocated

 All heap blocks were freed -- no leaks are possible

diff --git a/memcheck/tests/leak_cpp_interior.stderr.exp-64bit b/memcheck/tests/leak_cpp_interior.stderr.exp-64bit
index 612fa3e..3899730 100644
--- a/memcheck/tests/leak_cpp_interior.stderr.exp-64bit
+++ b/memcheck/tests/leak_cpp_interior.stderr.exp-64bit
@@ -1,118 +1,110 @@
 valgrind output will go to log
 VALGRIND_DO_LEAK_CHECK
-8 bytes in 1 blocks are definitely lost in loss record ... of ...
+x bytes in 1 blocks are definitely lost in loss record ... of ...
   by 0x........: doit() (leak_cpp_interior.cpp:116)
   by 0x........: main (leak_cpp_interior.cpp:131)

 LEAK SUMMARY:
-   definitely lost: 8 bytes in 1 blocks
+   definitely lost: x bytes in 1 blocks
    indirectly lost: 0 bytes in 0 blocks
    possibly lost: 0 bytes in 0 blocks
-   still reachable: 239 bytes in 8 blocks
+   still reachable: x bytes in 8 blocks
    of which reachable via heuristic:
-     stdstring : 80 bytes in 2 blocks
-     length64 : 31 bytes in 1 blocks
-     newarray : 32 bytes in 1 blocks
-     multipleinheritance: 48 bytes in 2 blocks
-   suppressed: 0 bytes in 0 blocks
+     stdstring : x bytes in 2 blocks
+     length64 : x bytes in 1 blocks
+     newarray : x bytes in 1 blocks
+     multipleinheritance: x bytes in 2 blocks
 Reachable blocks (those to which a pointer was found) are not shown.
 To see them, rerun with: --leak-check=full --show-leak-kinds=all

 leak_check summary heuristics multipleinheritance
 LEAK SUMMARY:
-   definitely lost: 8 (+0) bytes in 1 (+0) blocks
+   definitely lost: x (+0) bytes in 1 (+0) blocks
    indirectly lost: 0 (+0) bytes in 0 (+0) blocks
-   possibly lost: 143 (+143) bytes in 4 (+4) blocks
-   still reachable: 96 (-143) bytes in 4 (-4) blocks
+   possibly lost: x (+x) bytes in 4 (+4) blocks
+   still reachable: x (-x) bytes in 4 (-4) blocks
    of which reachable via heuristic:
-     stdstring : 0 (-80) bytes in 0 (-2) blocks
-     length64 : 0 (-31) bytes in 0 (-1) blocks
-     newarray : 0 (-32) bytes in 0 (-1) blocks
-     multipleinheritance: 48 (+0) bytes in 2 (+0) blocks
-   suppressed: 0 (+0) bytes in 0 (+0) blocks
+     stdstring : 0 (-x) bytes in 0 (-2) blocks
+     length64 : 0 (-x) bytes in 0 (-1) blocks
+     newarray : 0 (-x) bytes in 0 (-1) blocks
+     multipleinheritance: x (+0) bytes in 2 (+0) blocks
 To see details of leaked memory, give 'full' arg to leak_check

 leak_check summary any heuristics newarray
 LEAK SUMMARY:
-   definitely lost: 8 (+0) bytes in 1 (+0) blocks
+   definitely lost: x (+0) bytes in 1 (+0) blocks
    indirectly lost: 0 (+0) bytes in 0 (+0) blocks
-   possibly lost: 128 (-15) bytes in 4 (+0) blocks
-   still reachable: 111 (+15) bytes in 4 (+0) blocks
+   possibly lost: x (-x) bytes in 4 (+0) blocks
+   still reachable: x (+x) bytes in 4 (+0) blocks
    of which reachable via heuristic:
-     newarray : 63 (+63) bytes in 2 (+2) blocks
-     multipleinheritance: 0 (-48) bytes in 0 (-2) blocks
-   suppressed: 0 (+0) bytes in 0 (+0) blocks
+     newarray : x (+x) bytes in 2 (+2) blocks
+     multipleinheritance: 0 (-x) bytes in 0 (-2) blocks
 To see details of leaked memory, give 'full' arg to leak_check

 leak_check summary heuristics length64
 LEAK SUMMARY:
-   definitely lost: 8 (+0) bytes in 1 (+0) blocks
+   definitely lost: x (+0) bytes in 1 (+0) blocks
    indirectly lost: 0 (+0) bytes in 0 (+0) blocks
-   possibly lost: 160 (+32) bytes in 5 (+1) blocks
-   still reachable: 79 (-32) bytes in 3 (-1) blocks
+   possibly lost: x (+x) bytes in 5 (+1) blocks
+   still reachable: x (-x) bytes in 3 (-1) blocks
    of which reachable via heuristic:
-     length64 : 31 (+31) bytes in 1 (+1) blocks
-     newarray : 0 (-63) bytes in 0 (-2) blocks
-   suppressed: 0 (+0) bytes in 0 (+0) blocks
+     length64 : x (+x) bytes in 1 (+1) blocks
+     newarray : 0 (-x) bytes in 0 (-2) blocks
 To see details of leaked memory, give 'full' arg to leak_check

 leak_check summary heuristics stdstring
 LEAK SUMMARY:
-   definitely lost: 8 (+0) bytes in 1 (+0) blocks
+   definitely lost: x (+0) bytes in 1 (+0) blocks
    indirectly lost: 0 (+0) bytes in 0 (+0) blocks
-   possibly lost: 111 (-49) bytes in 4 (-1) blocks
-   still reachable: 128 (+49) bytes in 4 (+1) blocks
+   possibly lost: x (-x) bytes in 4 (-1) blocks
+   still reachable: x (+x) bytes in 4 (+1) blocks
    of which reachable via heuristic:
-     stdstring : 80 (+80) bytes in 2 (+2) blocks
-     length64 : 0 (-31) bytes in 0 (-1) blocks
-   suppressed: 0 (+0) bytes in 0 (+0) blocks
+     stdstring : x (+x) bytes in 2 (+2) blocks
+     length64 : 0 (-x) bytes in 0 (-1) blocks
 To see details of leaked memory, give 'full' arg to leak_check

 leak_check summary heuristics multipleinheritance,newarray,stdstring,length64
 LEAK SUMMARY:
-   definitely lost: 8 (+0) bytes in 1 (+0) blocks
+   definitely lost: x (+0) bytes in 1 (+0) blocks
    indirectly lost: 0 (+0) bytes in 0 (+0) blocks
-   possibly lost: 0 (-111) bytes in 0 (-4) blocks
-   still reachable: 239 (+111) bytes in 8 (+4) blocks
+   possibly lost: 0 (-x) bytes in 0 (-4) blocks
+   still reachable: x (+x) bytes in 8 (+4) blocks
    of which reachable via heuristic:
-     stdstring : 80 (+0) bytes in 2 (+0) blocks
-     length64 : 31 (+31) bytes in 1 (+1) blocks
-     newarray : 32 (+32) bytes in 1 (+1) blocks
-     multipleinheritance: 48 (+48) bytes in 2 (+2) blocks
-   suppressed: 0 (+0) bytes in 0 (+0) blocks
+     stdstring : x (+0) bytes in 2 (+0) blocks
+     length64 : x (+x) bytes in 1 (+1) blocks
+     newarray : x (+x) bytes in 1 (+1) blocks
+     multipleinheritance: x (+x) bytes in 2 (+2) blocks
 To see details of leaked memory, give 'full' arg to leak_check

 leak_check summary heuristics all
 LEAK SUMMARY:
-   definitely lost: 8 (+0) bytes in 1 (+0) blocks
+   definitely lost: x (+0) bytes in 1 (+0) blocks
    indirectly lost: 0 (+0) bytes in 0 (+0) blocks
    possibly lost: 0 (+0) bytes in 0 (+0) blocks
-   still reachable: 239 (+0) bytes in 8 (+0) blocks
+   still reachable: x (+0) bytes in 8 (+0) blocks
    of which reachable via heuristic:
-     stdstring : 80 (+0) bytes in 2 (+0) blocks
-     length64 : 31 (+0) bytes in 1 (+0) blocks
-     newarray : 32 (+0) bytes in 1 (+0) blocks
-     multipleinheritance: 48 (+0) bytes in 2 (+0) blocks
-   suppressed: 0 (+0) bytes in 0 (+0) blocks
+     stdstring : x (+0) bytes in 2 (+0) blocks
+     length64 : x (+0) bytes in 1 (+0) blocks
+     newarray : x (+0) bytes in 1 (+0) blocks
+     multipleinheritance: x (+0) bytes in 2 (+0) blocks
 To see details of leaked memory, give 'full' arg to leak_check

 leak_check summary heuristics none
 LEAK SUMMARY:
-   definitely lost: 8 (+0) bytes in 1 (+0) blocks
+   definitely lost: x (+0) bytes in 1 (+0) blocks
    indirectly lost: 0 (+0) bytes in 0 (+0) blocks
-   possibly lost: 191 (+191) bytes in 6 (+6) blocks
-   still reachable: 48 (-191) bytes in 2 (-6) blocks
+   possibly lost: x (+x) bytes in 6 (+6) blocks
+   still reachable: x (-x) bytes in 2 (-6) blocks
    of which reachable via heuristic:
-     stdstring : 0 (-80) bytes in 0 (-2) blocks
-     length64 : 0 (-31) bytes in 0 (-1) blocks
-     newarray : 0 (-32) bytes in 0 (-1) blocks
-     multipleinheritance: 0 (-48) bytes in 0 (-2) blocks
-   suppressed: 0 (+0) bytes in 0 (+0) blocks
+     stdstring : 0 (-x) bytes in 0 (-2) blocks
+     length64 : 0 (-x) bytes in 0 (-1) blocks
+     newarray : 0 (-x) bytes in 0 (-1) blocks
+     multipleinheritance: 0 (-x) bytes in 0 (-2) blocks
 To see details of leaked memory, give 'full' arg to leak_check

-Searching for pointers pointing in 20 bytes from 0x........
-*0x........ interior points at 8 bytes inside 0x........
+Searching for pointers pointing in x bytes from 0x........
+*0x........ interior points at x bytes inside 0x........
  Address 0x........ is 0 bytes inside data symbol "ptr"
 block at 0x........ considered reachable by ptr 0x........ using newarray heuristic
 destruct MyClass
@@ -134,7 +126,6 @@ Finished!

 HEAP SUMMARY:
     in use at exit: 0 bytes in 0 blocks
-   total heap usage: 9 allocs, 9 frees, 247 bytes allocated

 All heap blocks were freed -- no leaks are possible

diff --git a/memcheck/tests/leak_cpp_interior.stderr.exp-64bit-solaris b/memcheck/tests/leak_cpp_interior.stderr.exp-64bit-solaris
deleted file mode 100644
index f7e1a07..0000000
--- a/memcheck/tests/leak_cpp_interior.stderr.exp-64bit-solaris
+++ /dev/null
@@ -1,142 +0,0 @@
-
-valgrind output will go to log
-VALGRIND_DO_LEAK_CHECK
-8 bytes in 1 blocks are definitely lost in loss record ... of ...
-   by 0x........: doit() (leak_cpp_interior.cpp:116)
-   by 0x........: main (leak_cpp_interior.cpp:131)
-
-LEAK SUMMARY:
-   definitely lost: 8 bytes in 1 blocks
-   indirectly lost: 0 bytes in 0 blocks
-   possibly lost: 0 bytes in 0 blocks
-   still reachable: 273 bytes in 8 blocks
-   of which reachable via heuristic:
-     stdstring : 114 bytes in 2 blocks
-     length64 : 31 bytes in 1 blocks
-     newarray : 32 bytes in 1 blocks
-     multipleinheritance: 48 bytes in 2 blocks
-   suppressed: 0 bytes in 0 blocks
-Reachable blocks (those to which a pointer was found) are not shown.
-To see them, rerun with: --leak-check=full --show-leak-kinds=all
-
-leak_check summary heuristics multipleinheritance
-LEAK SUMMARY:
-   definitely lost: 8 (+0) bytes in 1 (+0) blocks
-   indirectly lost: 0 (+0) bytes in 0 (+0) blocks
-   possibly lost: 177 (+177) bytes in 4 (+4) blocks
-   still reachable: 96 (-177) bytes in 4 (-4) blocks
-   of which reachable via heuristic:
-     stdstring : 0 (-114) bytes in 0 (-2) blocks
-     length64 : 0 (-31) bytes in 0 (-1) blocks
-     newarray : 0 (-32) bytes in 0 (-1) blocks
-     multipleinheritance: 48 (+0) bytes in 2 (+0) blocks
-   suppressed: 0 (+0) bytes in 0 (+0) blocks
-To see details of leaked memory, give 'full' arg to leak_check
-
-leak_check summary any heuristics newarray
-LEAK SUMMARY:
-   definitely lost: 8 (+0) bytes in 1 (+0) blocks
-   indirectly lost: 0 (+0) bytes in 0 (+0) blocks
-   possibly lost: 162 (-15) bytes in 4 (+0) blocks
-   still reachable: 111 (+15) bytes in 4 (+0) blocks
-   of which reachable via heuristic:
-     newarray : 63 (+63) bytes in 2 (+2) blocks
-     multipleinheritance: 0 (-48) bytes in 0 (-2) blocks
-   suppressed: 0 (+0) bytes in 0 (+0) blocks
-To see details of leaked memory, give 'full' arg to leak_check
-
-leak_check summary heuristics length64
-LEAK SUMMARY:
-   definitely lost: 8 (+0) bytes in 1 (+0) blocks
-   indirectly lost: 0 (+0) bytes in 0 (+0) blocks
-   possibly lost: 194 (+32) bytes in 5 (+1) blocks
-   still reachable: 79 (-32) bytes in 3 (-1) blocks
-   of which reachable via heuristic:
-     length64 : 31 (+31) bytes in 1 (+1) blocks
-     newarray : 0 (-63) bytes in 0 (-2) blocks
-   suppressed: 0 (+0) bytes in 0 (+0) blocks
-To see details of leaked memory, give 'full' arg to leak_check
-
-leak_check summary heuristics stdstring
-LEAK SUMMARY:
-   definitely lost: 8 (+0) bytes in 1 (+0) blocks
-   indirectly lost: 0 (+0) bytes in 0 (+0) blocks
-   possibly lost: 111 (-83) bytes in 4 (-1) blocks
-   still reachable: 162 (+83) bytes in 4 (+1) blocks
-   of which reachable via heuristic:
-     stdstring : 114 (+114) bytes in 2 (+2) blocks
-     length64 : 0 (-31) bytes in 0 (-1) blocks
-   suppressed: 0 (+0) bytes in 0 (+0) blocks
-To see details of leaked memory, give 'full' arg to leak_check
-
-leak_check summary heuristics multipleinheritance,newarray,stdstring,length64
-LEAK SUMMARY:
-   definitely lost: 8 (+0) bytes in 1 (+0) blocks
-   indirectly lost: 0 (+0) bytes in 0 (+0) blocks
-   possibly lost: 0 (-111) bytes in 0 (-4) blocks
-   still reachable: 273 (+111) bytes in 8 (+4) blocks
-   of which reachable via heuristic:
-     stdstring : 114 (+0) bytes in 2 (+0) blocks
-     length64 : 31 (+31) bytes in 1 (+1) blocks
-     newarray : 32 (+32) bytes in 1 (+1) blocks
-     multipleinheritance: 48 (+48) bytes in 2 (+2) blocks
-   suppressed: 0 (+0) bytes in 0 (+0) blocks
-To see details of leaked memory, give 'full' arg to leak_check
-
-leak_check summary heuristics all
-LEAK SUMMARY:
-   definitely lost: 8 (+0) bytes in 1 (+0) blocks
-   indirectly lost: 0 (+0) bytes in 0 (+0) blocks
-   possibly lost: 0 (+0) bytes in 0 (+0) blocks
-   still reachable: 273 (+0) bytes in 8 (+0) blocks
-   of which reachable via heuristic:
-     stdstring : 114 (+0) bytes in 2 (+0) blocks
-     length64 : 31 (+0) bytes in 1 (+0) blocks
-     newarray : 32 (+0) bytes in 1 (+0) blocks
-     multipleinheritance: 48 (+0) bytes in 2 (+0) blocks
-   suppressed: 0 (+0) bytes in 0 (+0) blocks
-To see details of leaked memory, give 'full' arg to leak_check
-
-leak_check summary heuristics none
-LEAK SUMMARY:
-   definitely lost: 8 (+0) bytes in 1 (+0) blocks
-   indirectly lost: 0 (+0) bytes in 0 (+0) blocks
-   possibly lost: 225 (+225) bytes in 6 (+6) blocks
-   still reachable: 48 (-225) bytes in 2 (-6) blocks
-   of which reachable via heuristic:
-     stdstring : 0 (-114) bytes in 0 (-2) blocks
-     length64 : 0 (-31) bytes in 0 (-1) blocks
-     newarray : 0 (-32) bytes in 0 (-1) blocks
-     multipleinheritance: 0 (-48) bytes in 0 (-2) blocks
-   suppressed: 0 (+0) bytes in 0 (+0) blocks
-To see details of leaked memory, give 'full' arg to leak_check
-
-Searching for pointers pointing in 20 bytes from 0x........
-*0x........ interior points at 8 bytes inside 0x........
- Address 0x........ is 0 bytes inside data symbol "ptr"
-block at 0x........ considered reachable by ptr 0x........ using newarray heuristic
-destruct MyClass
-destruct MyClass
-destruct MyClass
-destruct Ce
-destruct Be
-destruct Ae
-destruct Ce
-destruct Be
-destruct Ae
-destruct C
-destruct B
-destruct A
-destruct C
-destruct B
-destruct A
-Finished!
-
-HEAP SUMMARY:
-    in use at exit: 0 bytes in 0 blocks
-   total heap usage: 9 allocs, 9 frees, 281 bytes allocated
-
-All heap blocks were freed -- no leaks are possible
-
-For counts of detected and suppressed errors, rerun with: -v
-ERROR SUMMARY: 1 errors from 1 contexts (suppressed: 0 from 0)

diff --git a/memcheck/tests/leak_cpp_interior.stderr.exp-solaris b/memcheck/tests/leak_cpp_interior.stderr.exp-solaris
deleted file mode 100644
index f9fc390..0000000
--- a/memcheck/tests/leak_cpp_interior.stderr.exp-solaris
+++ /dev/null
@@ -1,142 +0,0 @@
-
-valgrind output will go to log
-VALGRIND_DO_LEAK_CHECK
-4 bytes in 1 blocks are definitely lost in loss record ... of ...
-   by 0x........: doit() (leak_cpp_interior.cpp:116)
-   by 0x........: main (leak_cpp_interior.cpp:131)
-
-LEAK SUMMARY:
-   definitely lost: 4 bytes in 1 blocks
-   indirectly lost: 0 bytes in 0 blocks
-   possibly lost: 0 bytes in 0 blocks
-   still reachable: 197 bytes in 8 blocks
-   of which reachable via heuristic:
-     stdstring : 90 bytes in 2 blocks
-     length64 : 31 bytes in 1 blocks
-     newarray : 28 bytes in 1 blocks
-     multipleinheritance: 24 bytes in 2 blocks
-   suppressed: 0 bytes in 0 blocks
-Reachable blocks (those to which a pointer was found) are not shown.
-To see them, rerun with: --leak-check=full --show-leak-kinds=all
-
-leak_check summary heuristics multipleinheritance
-LEAK SUMMARY:
-   definitely lost: 4 (+0) bytes in 1 (+0) blocks
-   indirectly lost: 0 (+0) bytes in 0 (+0) blocks
-   possibly lost: 149 (+149) bytes in 4 (+4) blocks
-   still reachable: 48 (-149) bytes in 4 (-4) blocks
-   of which reachable via heuristic:
-     stdstring : 0 (-90) bytes in 0 (-2) blocks
-     length64 : 0 (-31) bytes in 0 (-1) blocks
-     newarray : 0 (-28) bytes in 0 (-1) blocks
-     multipleinheritance: 24 (+0) bytes in 2 (+0) blocks
-   suppressed: 0 (+0) bytes in 0 (+0) blocks
-To see details of leaked memory, give 'full' arg to leak_check
-
-leak_check summary any heuristics newarray
-LEAK SUMMARY:
-   definitely lost: 4 (+0) bytes in 1 (+0) blocks
-   indirectly lost: 0 (+0) bytes in 0 (+0) blocks
-   possibly lost: 145 (-4) bytes in 5 (+1) blocks
-   still reachable: 52 (+4) bytes in 3 (-1) blocks
-   of which reachable via heuristic:
-     newarray : 28 (+28) bytes in 1 (+1) blocks
-     multipleinheritance: 0 (-24) bytes in 0 (-2) blocks
-   suppressed: 0 (+0) bytes in 0 (+0) blocks
-To see details of leaked memory, give 'full' arg to leak_check
-
-leak_check summary heuristics length64
-LEAK SUMMARY:
-   definitely lost: 4 (+0) bytes in 1 (+0) blocks
-   indirectly lost: 0 (+0) bytes in 0 (+0) blocks
-   possibly lost: 142 (-3) bytes in 5 (+0) blocks
-   still reachable: 55 (+3) bytes in 3 (+0) blocks
-   of which reachable via heuristic:
-     length64 : 31 (+31) bytes in 1 (+1) blocks
-     newarray : 0 (-28) bytes in 0 (-1) blocks
-   suppressed: 0 (+0) bytes in 0 (+0) blocks
-To see details of leaked memory, give 'full' arg to leak_check
-
-leak_check summary heuristics stdstring
-LEAK SUMMARY:
-   definitely lost: 4 (+0) bytes in 1 (+0) blocks
-   indirectly lost: 0 (+0) bytes in 0 (+0) blocks
-   possibly lost: 83 (-59) bytes in 4 (-1) blocks
-   still reachable: 114 (+59) bytes in 4 (+1) blocks
-   of which reachable via heuristic:
-     stdstring : 90 (+90) bytes in 2 (+2) blocks
-     length64 : 0 (-31) bytes in 0 (-1) blocks
-   suppressed: 0 (+0) bytes in 0 (+0) blocks
-To see details of leaked memory, give 'full' arg to leak_check
-
-leak_check summary heuristics multipleinheritance,newarray,stdstring,length64
-LEAK SUMMARY:
-   definitely lost: 4 (+0) bytes in 1 (+0) blocks
-   indirectly lost: 0 (+0) bytes in 0 (+0) blocks
-   possibly lost: 0 (-83) bytes in 0 (-4) blocks
-   still reachable: 197 (+83) bytes in 8 (+4) blocks
-   of which reachable via heuristic:
-     stdstring : 90 (+0) bytes in 2 (+0) blocks
-     length64 : 31 (+31) bytes in 1 (+1) blocks
-     newarray : 28 (+28) bytes in 1 (+1) blocks
-     multipleinheritance: 24 (+24) bytes in 2 (+2) blocks
-   suppressed: 0 (+0) bytes in 0 (+0) blocks
-To see details of leaked memory, give 'full' arg to leak_check
-
-leak_check summary heuristics all
-LEAK SUMMARY:
-   definitely lost: 4 (+0) bytes in 1 (+0) blocks
-   indirectly lost: 0 (+0) bytes in 0 (+0) blocks
-   possibly lost: 0 (+0) bytes in 0 (+0) blocks
-   still reachable: 197 (+0) bytes in 8 (+0) blocks
-   of which reachable via heuristic:
-     stdstring : 90 (+0) bytes in 2 (+0) blocks
-     length64 : 31 (+0) bytes in 1 (+0) blocks
-     newarray : 28 (+0) bytes in 1 (+0) blocks
-     multipleinheritance: 24 (+0) bytes in 2 (+0) blocks
-   suppressed: 0 (+0) bytes in 0 (+0) blocks
-To see details of leaked memory, give 'full' arg to leak_check
-
-leak_check summary heuristics none
-LEAK SUMMARY:
-   definitely lost: 4 (+0) bytes in 1 (+0) blocks
-   indirectly lost: 0 (+0) bytes in 0 (+0) blocks
-   possibly lost: 173 (+173) bytes in 6 (+6) blocks
-   still reachable: 24 (-173) bytes in 2 (-6) blocks
-   of which reachable via heuristic:
-     stdstring : 0 (-90) bytes in 0 (-2) blocks
-     length64 : 0 (-31) bytes in 0 (-1) blocks
-     newarray : 0 (-28) bytes in 0 (-1) blocks
-     multipleinheritance: 0 (-24) bytes in 0 (-2) blocks
-   suppressed: 0 (+0) bytes in 0 (+0) blocks
-To see details of leaked memory, give 'full' arg to leak_check
-
-Searching for pointers pointing in 20 bytes from 0x........
-*0x........ interior points at 4 bytes inside 0x........
- Address 0x........ is 0 bytes inside data symbol "ptr"
-block at 0x........ considered reachable by ptr 0x........ using newarray heuristic
-destruct MyClass
-destruct MyClass
-destruct MyClass
-destruct Ce
-destruct Be
-destruct Ae
-destruct Ce
-destruct Be
-destruct Ae
-destruct C
-destruct B
-destruct A
-destruct C
-destruct B
-destruct A
-Finished!
-
-HEAP SUMMARY:
-    in use at exit: 0 bytes in 0 blocks
-   total heap usage: 9 allocs, 9 frees, 201 bytes allocated
-
-All heap blocks were freed -- no leaks are possible
-
-For counts of detected and suppressed errors, rerun with: -v
-ERROR SUMMARY: 1 errors from 1 contexts (suppressed: 0 from 0)

diff --git a/memcheck/tests/leak_cpp_interior.vgtest b/memcheck/tests/leak_cpp_interior.vgtest
index 4ecc219..2fed646 100644
--- a/memcheck/tests/leak_cpp_interior.vgtest
+++ b/memcheck/tests/leak_cpp_interior.vgtest
@@ -1,2 +1,3 @@
 prog: leak_cpp_interior
-vgopts: --leak-check=summary --leak-check-heuristics=multipleinheritance,stdstring,newarray,length64
+vgopts: --leak-check=summary --leak-check-heuristics=multipleinheritance,stdstring,newarray,length64 --suppressions=libstdc++.supp
+stderr_filter: filter_leak_cpp_interior

diff --git a/memcheck/tests/libstdc++.supp b/memcheck/tests/libstdc++.supp
new file mode 100644
index 0000000..fad537f
--- /dev/null
+++ b/memcheck/tests/libstdc++.supp
@@ -0,0 +1,75 @@
+# Below is a temporary patch (slightly modified) from +# 345307 - Warning about "still reachable" memory when using libstdc++ from gcc 5 +# This patch is not needed anymore if libstdc++ provides __gnu_cxx::__freeres +# but we still need to ignore these allocations during the leak_cpp_interior +# to have the test behaviour not depending on libstdc++ version. + + + +# Some programs are using the C++ STL and string classes. +# Valgrind reports 'still reachable' memory leaks involving these classes +# at the exit of the program, but there should be none. +# +# Many implementations of the C++ standard libraries use their own memory +# pool allocators. Memory for quite a number of destructed objects is not +# immediately freed and given back to the OS, but kept in the pool(s) for +# later re-use. The fact that the pools are not freed at the exit of the +# program cause Valgrind to report this memory as still reachable. +# +# The behavior not to free pools at the exit could be called a bug of the +# library though. +# +# Using GCC, you can force the STL to use malloc and to free memory as soon +# as possible by globally disabling memory caching. Beware! Doing so will +# probably slow down your program, sometimes drastically. +# +# There are other ways to disable memory pooling: using the malloc_alloc +# template with your objects (not portable, but should work for GCC) or +# even writing your own memory allocators. But beware: allocators belong +# to the more messy parts of the STL and people went to great lengths to +# make the STL portable across platforms. Chances are good that your +# solution will work on your platform, but not on others. +# +# 72,704 bytes in 1 blocks are still reachable in loss record 1 of 1 +# at 0x4C28D06: malloc (vg_replace_malloc.c:299) +# by 0x50C317F: ??? (in /usr/lib64/libstdc++.so.6.0.21) +# by 0x400F759: call_init.part.0 (dl-init.c:72) +# by 0x400F86A: call_init (dl-init.c:30) +# by 0x400F86A: _dl_init (dl-init.c:120) +# by 0x4000CB9: ??? 
(in /usr/lib64/ld-2.22.so) +# +# HEAP SUMMARY: +# in use at exit: 72,704 bytes in 1 blocks +# total heap usage: 4 allocs, 3 frees, 72,864 bytes allocated +# +# LEAK SUMMARY: +# definitely lost: 0 bytes in 0 blocks +# indirectly lost: 0 bytes in 0 blocks +# possibly lost: 0 bytes in 0 blocks +# still reachable: 72,704 bytes in 1 blocks +# suppressed: 0 bytes in 0 blocks + +{ + malloc-leaks-cxx-stl-string-classes + Memcheck:Leak + match-leak-kinds: reachable + fun:malloc + obj:*lib*/libstdc++.so* + fun:call_init.part.0 + fun:call_init + fun:_dl_init + obj:*lib*/ld-2.*.so +} +{ + malloc-leaks-cxx-stl-string-classes-debug + Memcheck:Leak + match-leak-kinds: reachable + fun:malloc + fun:pool + fun:__static_initialization_and_destruction_0 + fun:_GLOBAL__sub_I_eh_alloc.cc + fun:call_init.part.0 + fun:call_init + fun:_dl_init + obj:*lib*/ld-2.*.so +} |
|
From: Carl L. <ce...@us...> - 2017-09-22 17:53:27
|
Ivo, Julian:

Looks like our issue is the same as was seen in
https://bugs.kde.org/show_bug.cgi?id=375839
The bug reports the same error and symptoms. Found the bug based on
comments in guest_generic_bb_to_IR.c.

   /* Although we will try to disassemble up to vex_control.guest_max_insns
      insns into the block, the individual insn assemblers may hint to us
      that a disassembled instruction is verbose. In that case we will
      lower the limit so as to ensure that the JIT doesn't run out of
      space. See bug 375839 for the motivating example. */

                  Carl Love

On Fri, 2017-09-22 at 10:20 -0700, Carl Love wrote:
> On Fri, 2017-09-22 at 17:52 +0200, Ivo Raisr wrote:
> > > From the comments in the code, it doesn't look like increasing the 15000
> > > is a viable option.
> >
> > Actually this limit is somewhat arbitrary.
> > For register live ranges, type Short is used but only because invalid
> > start/end range is indicated with -2.
> > The rest of negative range is actually unused.
> > I think this can be easily changed to UShort instead, and the limit
> > raised to 62000, for example.
> >
> > I can send you a patch if you are interested and willing to try it.
> >
> > > It appears that something is just generating too
> > > much stuff. I am wondering if anyone can give me some idea what is going
> > > on here. It all appears to be architecture independent code. Any
> > > suggestions on how to go about debugging this would be helpful. Thanks.
> >
> > Dump the information about what the VEX JIT is doing.
> > 1) Start with --trace-flags=10000000 --trace-notbelow=0
> >    From the last block dumped, note the SB number.
> > 2) Refine with --trace-flags=11111100 --trace-notbelow=<SB-1>
> >    You'll have a relatively short dump with very useful information.
> >
> > I.
> >
> Ivo:
>
> So, with some help from Aaron, it looks like we are generating a lot of
> temporaries. At one point, I see the temporary map with:
>
> ------------------------ After pre-instr IR optimisation ------------------------
>
> IRSB {
> t0:F128 t1:F128 t2:F128 t3:I32 t4:I32 t5:I1 t6:I1 t7:I1
>
> <cut for readability>
>
> t9312:I32 t9313:I32 t9314:I32 t9315:I1 t9316:I32 t9317:I32 t9318:I1 t9319:I32
> t9320:I32 t9321:I32 t9322:I1 t9323:I32 t9324:I32 t9325:I64
>
> This occurs in the middle of a block of a bunch of P9 Floating point 128
> instructions. Some of the P9 floating point 128 instructions take a
> fair number of Iops to implement. I don't remember specifically for
> each instruction at this point. It looks to us like there may just
> be too many of these instructions in the Valgrind basic block. When the
> instructions get converted to Iops it is just too big. Looking at a
> partial assembly listing that Aaron gave me for the floating point test,
> it looks like the assembly code is a very large sequential block of
> instructions. One thought Aaron had was to see if we can tell Valgrind
> to limit the size of its basic block. Not seeing a command line option
> to do that.
>
> In VEX/priv/main_main.c there are these lines:
>
>    vcon->iropt_unroll_thresh = 120;
>    vcon->guest_max_insns     = 60;
>
> I found if they are changed to
>
>    vcon->iropt_unroll_thresh = 4;
>    vcon->guest_max_insns     = 2;
>
> it appears to "fix" the issue. Not sure what the ramifications of
> forcing these so low are. Will do some more looking in the code to see
> what these really do. Don't know if there is an existing mechanism to
> allow control of this or not? Thoughts?
>
>                   Carl Love
|
|
From: Carl L. <ce...@us...> - 2017-09-22 17:20:49
|
On Fri, 2017-09-22 at 17:52 +0200, Ivo Raisr wrote:
> > From the comments in the code, it doesn't look like increasing the 15000
> > is a viable option.
>
> Actually this limit is somewhat arbitrary.
> For register live ranges, type Short is used but only because invalid
> start/end range is indicated with -2.
> The rest of negative range is actually unused.
> I think this can be easily changed to UShort instead, and the limit
> raised to 62000, for example.
>
> I can send you a patch if you are interested and willing to try it.
>
> > It appears that something is just generating too
> > much stuff. I am wondering if anyone can give me some idea what is going
> > on here. It all appears to be architecture independent code. Any
> > suggestions on how to go about debugging this would be helpful. Thanks.
>
> Dump the information about what the VEX JIT is doing.
> 1) Start with --trace-flags=10000000 --trace-notbelow=0
> From the last block dumped, note the SB number.
> 2) Refine with --trace-flags=11111100 --trace-notbelow=<SB-1>
> You'll have relatively short dump with very useful information.
>
> I.
>
Ivo:
So, with some help from Aaron, it looks like we are generating a lot of
temporaries. At one point, I see the temporary map with:
------------------------ After pre-instr IR optimisation ------------------------
IRSB {
t0:F128 t1:F128 t2:F128 t3:I32 t4:I32 t5:I1 t6:I1 t7:I1
<cut for readability>
t9312:I32 t9313:I32 t9314:I32 t9315:I1 t9316:I32 t9317:I32 t9318:I1 t9319:I32
t9320:I32 t9321:I32 t9322:I1 t9323:I32 t9324:I32 t9325:I64
This occurs in the middle of a block of a bunch of P9 Floating point 128
instructions. Some of the P9 floating point 128 instructions take a
fair number of Iops to implement. I don't remember specifically for
each instruction at this point. It looks to us like there may just
be too many of these instructions in the Valgrind basic block. When the
instructions get converted to Iops it is just too big. Looking at a
partial assembly listing that Aaron gave me for the floating point test,
it looks like the assembly code is a very large sequential block of
instructions. One thought Aaron had was to see if we can tell Valgrind
to limit the size of its basic block. Not seeing a command line option
to do that.
In VEX/priv/main_main.c
There are these lines:
vcon->iropt_unroll_thresh = 120;
vcon->guest_max_insns = 60;
I found if they are changed to
vcon->iropt_unroll_thresh = 4;
vcon->guest_max_insns = 2;
It appears to "fix" the issue. Not sure what the ramifications of
forcing these so low are. Will do some more looking in the code to see
what these really do. Don't know if there is an existing mechanism to
allow control of this or not? Thoughts?
Carl Love
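[Editor's note] The two knobs discussed above live in VEX's VexControl struct (declared in VEX/pub/libvex.h). As a sketch of how the superblock-size experiment could be expressed safely, here is a stand-in struct and a hypothetical clamping helper — neither is VEX's actual code, and the upper bound of 100 is an assumption based on VEX's documented limits:

```c
#include <assert.h>

/* Stand-in for the two VexControl fields discussed above; the real
   struct lives in VEX/pub/libvex.h and has many more members. */
struct vex_control_sketch {
    int iropt_unroll_thresh;   /* loop-unrolling threshold       */
    int guest_max_insns;       /* max guest insns per superblock */
};

/* Hypothetical helper: clamp guest_max_insns into a sane range so an
   experiment like the 60 -> 2 change above cannot go out of bounds.
   The [1, 100] range is assumed, not taken from VEX source. */
static int clamp_guest_max_insns(struct vex_control_sketch *vc, int req)
{
    if (req < 1)   req = 1;    /* at least one insn per superblock */
    if (req > 100) req = 100;  /* assumed defensive upper bound    */
    vc->guest_max_insns = req;
    return vc->guest_max_insns;
}
```

Forcing `guest_max_insns` down trades translation-cache efficiency for smaller IR blocks, which is why a clamp rather than an arbitrary value is worth considering.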
|
|
From: Ivo R. <iv...@iv...> - 2017-09-22 15:53:01
|
> From the comments in the code, it doesn't look like increasing the 15000
> is a viable option.

Actually this limit is somewhat arbitrary.
For register live ranges, type Short is used but only because invalid
start/end range is indicated with -2.
The rest of negative range is actually unused.
I think this can be easily changed to UShort instead, and the limit
raised to 62000, for example.

I can send you a patch if you are interested and willing to try it.

> It appears that something is just generating too
> much stuff. I am wondering if anyone can give me some idea what is going
> on here. It all appears to be architecture independent code. Any
> suggestions on how to go about debugging this would be helpful. Thanks.

Dump the information about what the VEX JIT is doing.
1) Start with --trace-flags=10000000 --trace-notbelow=0
   From the last block dumped, note the SB number.
2) Refine with --trace-flags=11111100 --trace-notbelow=<SB-1>
   You'll have a relatively short dump with very useful information.

I.
|
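[Editor's note] Step 2 of the recipe above just rewinds the trace threshold to one superblock before the last one seen. The arithmetic is trivial, but worth pinning down (the helper name is made up for illustration):

```c
#include <assert.h>

/* Given the SB number noted in step 1's dump, compute the value to
   pass as --trace-notbelow in step 2 (i.e. SB-1, floored at 0).
   Hypothetical helper, not part of Valgrind. */
static int trace_notbelow_for(int last_sb)
{
    return last_sb > 0 ? last_sb - 1 : 0;
}
```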
|
From: Carl L. <ce...@us...> - 2017-09-22 15:28:45
|
Valgrind developers:
I have two users that have recently run into what appears to be the same
issue. One of the workloads is a video application, the other is a
float128 workload for the new Power 9 instructions. Both workloads fail
with the message:
Pool = TEMP, start 0x5861af28 curr 0x58adfa48 end 0x58adfa67 (size
5000000)
vex: the `impossible' happened:
VEX temporary storage exhausted.
Increase N_{TEMPORARY,PERMANENT}_BYTES and recompile.
I increased the N_TEMPORARY_BYTES and N_PERMANENT_BYTES #defines as
given below.
--- a/VEX/priv/main_util.c
+++ b/VEX/priv/main_util.c
@@ -55,10 +55,10 @@
#if defined(ENABLE_INNER)
/* 5 times more memory to be on the safe side: consider each
allocation is
8 bytes, and we need 16 bytes redzone before and after. */
-#define N_TEMPORARY_BYTES (5*5000000)
+#define N_TEMPORARY_BYTES (5*2000000000)
static Bool mempools_created = False;
#else
-#define N_TEMPORARY_BYTES 5000000
+#define N_TEMPORARY_BYTES 2000000000
#endif
static HChar temporary[N_TEMPORARY_BYTES]
__attribute__((aligned(REQ_ALIGN)));
@@ -70,9 +70,9 @@ static ULong temporary_bytes_allocd_TOT = 0;
#if defined(ENABLE_INNER)
/* See N_TEMPORARY_BYTES */
-#define N_PERMANENT_BYTES (5*10000)
+#define N_PERMANENT_BYTES (5*100000)
#else
-#define N_PERMANENT_BYTES 10000
+#define N_PERMANENT_BYTES 100000
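[Editor's note] The TEMP pool that overflows here is a fixed-size arena carved out of the static `temporary[]` array. A minimal model of such a bump allocator — names and redzone handling simplified, not VEX's actual code — shows why growing the `#define` is the only recourse once it fills:

```c
#include <stddef.h>
#include <assert.h>

#define POOL_BYTES 5000000   /* mirrors N_TEMPORARY_BYTES above */

static unsigned char pool[POOL_BYTES];
static size_t pool_used = 0;

/* Bump-allocate nbytes from the pool, 8-byte aligned; returns NULL on
   exhaustion, which is the point where VEX instead panics with
   "VEX temporary storage exhausted". */
static void *pool_alloc(size_t nbytes)
{
    size_t aligned = (nbytes + 7) & ~(size_t)7;
    if (pool_used + aligned > POOL_BYTES)
        return NULL;                 /* pool exhausted */
    void *p = pool + pool_used;
    pool_used += aligned;
    return p;
}

/* A bump allocator cannot free individual allocations; the whole pool
   is reset between translations instead. */
static void pool_reset(void) { pool_used = 0; }
```

Because nothing is freed mid-translation, a single huge superblock (like the float128 one discussed in this thread) can exhaust the pool no matter how the program as a whole behaves.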
Once these were increased, both workloads then hit the error:
x264 [info]: profile High, level 3.1
vex: priv/host_generic_reg_alloc3.c:470 (doRegisterAllocation_v3):
Assertion `instrs_in->arr_used <= 15000' failed.
vex storage: T total 373013384 bytes allocated
vex storage: P total 192 bytes allocated
From the comments in the code, it doesn't look like increasing the 15000
is a viable option. It appears that something is just generating too
much stuff. I am wondering if anyone can give me some idea what is going
on here. It all appears to be architecture independent code. Any
suggestions on how to go about debugging this would be helpful. Thanks.
Carl Love
|
|
From: Tom H. <to...@co...> - 2017-09-22 10:09:47
|
On 22/09/17 09:33, Ivo Raisr wrote:
> 2017-09-22 7:24 GMT+02:00 John Reiser <jr...@bi...>:
>> What is the purpose of the bits DF_1_INTERPOSE and DF_1_INITFIRST that are
>> set in the DT_FLAGS_1 word of the Dynamic section of files
>>    $PREFIX/lib/valgrind/vgpreload_$TOOL-$ARCH-linux.so
>> for which the ultimate source is:
>>    $ grep -e -z, Makefile*
>>    Makefile.all.am:PRELOAD_LDFLAGS_COMMON_LINUX = -nodefaultlibs -shared -Wl,-z,interpose,-z,initfirst
>>    Makefile.all.am:PRELOAD_LDFLAGS_COMMON_SOLARIS = -nodefaultlibs -shared -Wl,-z,interpose,-z,initfirst
>>    Makefile.tool.am:TOOL_LDFLAGS_ARM_LINUX += -Wl,-z,noexecstack
>>
>> Which particular symbols are those flags designed to affect?
>
> I can only add that Solaris followed Linux lead here.
> I tried to remove the -z flags on Solaris and did not observe problems...

They appear to go all the way back to commit 918c3a7b7e0 by Jeremy in
2003 so I suspect the reason for them is long since forgotten...

Tom

--
Tom Hughes (to...@co...)
http://compton.nu/
|
|
From: Ivo R. <iv...@iv...> - 2017-09-22 08:33:23
|
2017-09-22 7:24 GMT+02:00 John Reiser <jr...@bi...>:
> What is the purpose of the bits DF_1_INTERPOSE and DF_1_INITFIRST that are
> set in the DT_FLAGS_1 word of the Dynamic section of files
>    $PREFIX/lib/valgrind/vgpreload_$TOOL-$ARCH-linux.so
> for which the ultimate source is:
>    $ grep -e -z, Makefile*
>    Makefile.all.am:PRELOAD_LDFLAGS_COMMON_LINUX = -nodefaultlibs -shared -Wl,-z,interpose,-z,initfirst
>    Makefile.all.am:PRELOAD_LDFLAGS_COMMON_SOLARIS = -nodefaultlibs -shared -Wl,-z,interpose,-z,initfirst
>    Makefile.tool.am:TOOL_LDFLAGS_ARM_LINUX += -Wl,-z,noexecstack
>
> Which particular symbols are those flags designed to affect?

I can only add that Solaris followed Linux lead here.
I tried to remove the -z flags on Solaris and did not observe problems...

I.
|
|
From: John R. <jr...@bi...> - 2017-09-22 05:24:15
|
What is the purpose of the bits DF_1_INTERPOSE and DF_1_INITFIRST that are set in
the DT_FLAGS_1 word of the Dynamic section of files
$PREFIX/lib/valgrind/vgpreload_$TOOL-$ARCH-linux.so
for which the ultimate source is:
$ grep -e -z, Makefile*
Makefile.all.am:PRELOAD_LDFLAGS_COMMON_LINUX = -nodefaultlibs -shared -Wl,-z,interpose,-z,initfirst
Makefile.all.am:PRELOAD_LDFLAGS_COMMON_SOLARIS = -nodefaultlibs -shared -Wl,-z,interpose,-z,initfirst
Makefile.tool.am:TOOL_LDFLAGS_ARM_LINUX += -Wl,-z,noexecstack
Which particular symbols are those flags designed to affect?
It seems to me that there are no symbols whose resolution changes
because the flags are present, and that it would be a bug if there were
any such symbols at all.
The symbols of valgrind and its tools never should mix with
the symbols of the target program that a valgrind tool is analyzing.
Therefore those -z flags should be omitted.
This would help usage on other platforms of the same $ARCH
(such as building valgrind for Linux and using it also for Android programs),
where other ldso do not understand INTERPOSE.
--
John
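[Editor's note] The two bits John asks about have fixed values in glibc's <elf.h> (DF_1_INITFIRST = 0x20, DF_1_INTERPOSE = 0x400), so they are easy to spot in a raw DT_FLAGS_1 word, e.g. as printed by `readelf -d vgpreload_*.so`. A small decoder sketch (the function name is made up):

```c
#include <stdio.h>
#include <string.h>
#include <assert.h>

/* DT_FLAGS_1 bit values, as defined in glibc's <elf.h>. */
#define DF_1_INITFIRST 0x00000020UL  /* set by -Wl,-z,initfirst */
#define DF_1_INTERPOSE 0x00000400UL  /* set by -Wl,-z,interpose */

/* Render just the two flags discussed in this thread from a
   DT_FLAGS_1 word; other bits are ignored. */
static void decode_flags1(unsigned long word, char *out, size_t outsz)
{
    out[0] = '\0';
    if (word & DF_1_INITFIRST)
        strncat(out, "INITFIRST ", outsz - strlen(out) - 1);
    if (word & DF_1_INTERPOSE)
        strncat(out, "INTERPOSE ", outsz - strlen(out) - 1);
}
```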
|
|
From: Philippe W. <phi...@so...> - 2017-09-19 21:24:38
|
https://sourceware.org/git/gitweb.cgi?p=valgrind.git;h=f1ff8597ef9c37ff1a853411b9e3be1696c36d92

commit f1ff8597ef9c37ff1a853411b9e3be1696c36d92
Author: Philippe Waroquiers <phi...@sk...>
Date:   Tue Sep 19 23:17:48 2017 +0200

    Implement static TLS code for more platforms

    gdbserver_tests/hgtls is failing on a number of platforms as it looks
    like static tls handling is now needed.
    So, implement static tls for a few more platforms.
    The formulas that are platform dependent are somewhat wild guesses
    obtained with trial and error.
    Note that arm/arm64/ppc32 are not (yet) done.

Diff:
---
 coregrind/m_gdbserver/target.c | 26 +++++++++++++++++++++-----
 1 file changed, 21 insertions(+), 5 deletions(-)

diff --git a/coregrind/m_gdbserver/target.c b/coregrind/m_gdbserver/target.c
index 10e52fc..1f03c12 100644
--- a/coregrind/m_gdbserver/target.c
+++ b/coregrind/m_gdbserver/target.c
@@ -712,6 +712,7 @@ Bool valgrind_get_tls_addr (ThreadState *tst,
    // Check we can read the modid
    CHECK_DEREF(lm+lm_modid_offset, sizeof(unsigned long int), "link_map modid");
    modid = *(unsigned long int *)(lm+lm_modid_offset);
+   dlog (2, "tid %u modid %lu\n", tst->tid, modid);

    // Check we can access the dtv entry for modid
    CHECK_DEREF(dtv + 2 * modid, sizeof(CORE_ADDR), "dtv[2*modid]");
@@ -719,7 +720,6 @@ Bool valgrind_get_tls_addr (ThreadState *tst,
    // Compute the base address of the tls block.
    *tls_addr = *(dtv + 2 * modid);

-#if defined(VGA_mips32) || defined(VGA_mips64)
    if (*tls_addr & 1) {
       /* This means that computed address is not valid, most probably
          because given module uses Static TLS.
@@ -731,17 +731,24 @@ Bool valgrind_get_tls_addr (ThreadState *tst,
       CORE_ADDR tls_offset_addr;
       PtrdiffT tls_offset;

-      dlog(1, "computing tls_addr using static TLS\n");
+      dlog(2, "tls_addr (%p & 1) => computing tls_addr using static TLS\n",
+           (void*) *tls_addr);

       /* Assumes that tls_offset is placed right before tls_modid.
          To check the assumption, start a gdb on none/tests/tls and do:
-           p &((struct link_map*)0x0)->l_tls_modid
-           p &((struct link_map*)0x0)->l_tls_offset */
+           p &((struct link_map*)0x0)->l_tls_modid
+           p &((struct link_map*)0x0)->l_tls_offset
+         Instead of assuming this, we could calculate this similarly to
+         lm_modid_offset, by extending getplatformoffset to support querying
+         more than one offset.
+      */
       tls_offset_addr = lm + lm_modid_offset - sizeof(PtrdiffT);

       // Check we can read the tls_offset.
       CHECK_DEREF(tls_offset_addr, sizeof(PtrdiffT), "link_map tls_offset");
       tls_offset = *(PtrdiffT *)(tls_offset_addr);
+      dlog(2, "tls_offset_addr %p tls_offset %ld\n",
+           (void*)tls_offset_addr, (long)tls_offset);

       /* Following two values represent platform dependent constants
          NO_TLS_OFFSET and FORCED_DYNAMIC_TLS_OFFSET, respectively. */
@@ -751,9 +758,18 @@ Bool valgrind_get_tls_addr (ThreadState *tst,
       }

       // This calculation is also platform dependent.
+#if defined(VGA_mips32) || defined(VGA_mips64)
       *tls_addr = ((CORE_ADDR)dtv_loc + 2 * sizeof(CORE_ADDR) + tls_offset);
-   }
+#elif defined(VGA_ppc64be) || defined(VGA_ppc64le)
+      *tls_addr = ((CORE_ADDR)dtv_loc + sizeof(CORE_ADDR) + tls_offset);
+#elif defined(VGA_x86) || defined(VGA_amd64) || defined(VGA_s390x)
+      *tls_addr = (CORE_ADDR)dtv_loc - tls_offset - sizeof(CORE_ADDR);
+#else
+      // ppc32, arm, arm64
+      dlog(0, "target.c is missing platform code for static TLS\n");
+      return False;
 #endif
+   }

    // Finally, add tls variable offset to tls block base address.
    *tls_addr += offset;
|
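[Editor's note] The gdb trick quoted in the patch (`p &((struct link_map*)0x0)->l_tls_modid`) is just computing a member offset; in C the same adjacency check can be written with `offsetof` on a stand-in struct. The real `struct link_map` is glibc-internal, so the layout below is illustrative only — it merely models the assumption the patch relies on, that `l_tls_offset` sits immediately before `l_tls_modid`:

```c
#include <stddef.h>
#include <assert.h>

/* Illustrative stand-in; the real glibc struct link_map has many more
   fields, but the patch only needs these two to be adjacent. */
struct link_map_sketch {
    unsigned long l_addr;
    ptrdiff_t     l_tls_offset;
    size_t        l_tls_modid;
};

static size_t modid_offset(void)
{ return offsetof(struct link_map_sketch, l_tls_modid); }

static size_t tls_offset_offset(void)
{ return offsetof(struct link_map_sketch, l_tls_offset); }
```

The patch's `tls_offset_addr = lm + lm_modid_offset - sizeof(PtrdiffT)` is exactly the relationship the assertion below expresses.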
|
From: Philippe W. <phi...@so...> - 2017-09-19 21:15:33
|
https://sourceware.org/git/gitweb.cgi?p=valgrind.git;h=92ec6d08bbe3e06e76b13373ff31ff81d94550b7

commit 92ec6d08bbe3e06e76b13373ff31ff81d94550b7
Author: Philippe Waroquiers <phi...@sk...>
Date:   Tue Sep 19 23:12:35 2017 +0200

    Fix assert on ppc32 due to typo for GPR28

    The below commit introduced a regression on ppc32:

      commit 00d4667295a821fef9eb198abcb0c942dffb6045
      Author: Ivo Raisr <iv...@iv...>
      Date:   Wed Sep 6 08:10:36 2017 +0200

      Reorder allocatable registers for AMD64, X86, and PPC so that the
      callee saved are listed first.
      Helper calls always trash all caller saved registers.
      By listing the callee saved first then VEX register allocator
      (both v2 and v3) is more likely to pick them and does not need to
      spill that much before helper calls.

    Investigation/fix done by Ivo.

Diff:
---
 VEX/priv/host_ppc_defs.h | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/VEX/priv/host_ppc_defs.h b/VEX/priv/host_ppc_defs.h
index 8ee789a..27b3b38 100644
--- a/VEX/priv/host_ppc_defs.h
+++ b/VEX/priv/host_ppc_defs.h
@@ -71,7 +71,7 @@ ST_IN HReg hregPPC_GPR24 ( Bool mode64 ) { return GPR(mode64, 24, 10, 10); }
 ST_IN HReg hregPPC_GPR25 ( Bool mode64 ) { return GPR(mode64, 25, 11, 11); }
 ST_IN HReg hregPPC_GPR26 ( Bool mode64 ) { return GPR(mode64, 26, 12, 12); }
 ST_IN HReg hregPPC_GPR27 ( Bool mode64 ) { return GPR(mode64, 27, 13, 13); }
-ST_IN HReg hregPPC_GPR28 ( Bool mode64 ) { return GPR(mode64, 28, 14, 44); }
+ST_IN HReg hregPPC_GPR28 ( Bool mode64 ) { return GPR(mode64, 28, 14, 14); }
 ST_IN HReg hregPPC_GPR3  ( Bool mode64 ) { return GPR(mode64,  3, 15, 15); }
 ST_IN HReg hregPPC_GPR4  ( Bool mode64 ) { return GPR(mode64,  4, 16, 16); }
|
|
From: Philippe W. <phi...@sk...> - 2017-09-19 20:52:04
|
It looks like all ppc32 is broken on gcc110.
I do not know exactly when it was broken, but it was sometime between
valgrind-3.14.0.SVN-16452M-vex-3398 and the current git version
valgrind-3.14.0.GIT-b9df4c8dec-20170916
any ppc32 binary run under valgrind directly hits the assert below.
It might be linked with the work on the v2 and/or v3 regalloc
Philippe
./vg-in-place -q ./none/tests/ppc32/test_fx
vex: priv/host_generic_regs.c:114 (RRegUniverse__check_is_sane): Assertion `hregIndex(reg) == i' failed.
vex storage: T total 0 bytes allocated
vex storage: P total 0 bytes allocated
valgrind: the 'impossible' happened:
LibVEX called failure_exit().
host stacktrace:
==44115== at 0x58076B74: show_sched_status_wrk (m_libcassert.c:355)
==44115== by 0x58076CC7: report_and_quit (m_libcassert.c:426)
==44115== by 0x58076F3F: panic (m_libcassert.c:502)
==44115== by 0x58076F3F: vgPlain_core_panic_at (m_libcassert.c:507)
==44115== by 0x58076F73: vgPlain_core_panic (m_libcassert.c:512)
==44115== by 0x58098BDB: failure_exit (m_translate.c:740)
==44115== by 0x5816EBEF: vex_assert_fail (main_util.c:247)
==44115== by 0x581D521B: RRegUniverse__check_is_sane (host_generic_regs.c:114)
==44115== by 0x581B53DF: getRRegUniverse_PPC (host_ppc_defs.c:152)
==44115== by 0x5816DA3B: libvex_BackEnd (main_main.c:895)
==44115== by 0x5816DA3B: LibVEX_Translate (main_main.c:1198)
==44115== by 0x5809B2C7: vgPlain_translate (m_translate.c:1794)
==44115== by 0x580DD7E7: handle_tt_miss (scheduler.c:1056)
==44115== by 0x580DD7E7: vgPlain_scheduler (scheduler.c:1417)
==44115== by 0x580F2A7F: thread_wrapper (syswrap-linux.c:103)
==44115== by 0x580F2A7F: run_a_thread_NORETURN (syswrap-linux.c:156)
sched status:
running_tid=1
Thread 1: status = VgTs_Runnable (lwpid 44115)
==44115== at 0x401B204: _start (in /usr/lib/ld-2.17.so)
|
|
From: Mark W. <ma...@kl...> - 2017-09-19 09:19:31
|
Hi,

On Tue, 2017-09-19 at 06:43 +0200, Ivo Raisr wrote:
> Dear Valgrind and gdb hackers,
>
> Please have a look at FOSDEM 2018 devroom proposal.
> Let me know your comments, suggestions, etc.
> The deadline is tomorrow (20th September).
>
> -------------------------------------------------------
> Title: Valgrind, gdb, debugging tools
> Coordinator: Ivo Raisr
> Coordinator email: iv...@iv...
> Secondary contact: Mark Wielaard (if Mark has no strict objections)
> Secondary email: ma...@kl...

I certainly have no objections :)
But it would be good to also have one of the gdb hackers as contact.
If only to better judge the talk proposals. Any volunteers?

> Description:
> The valgrind and gdb hackers would like to meet during FOSDEM 2018.
>
> Several core developers said they would like to attend a hacker
> meeting to meet each other in person and to coordinate various
> topics. And we would like to invite other hackers of toolchain
> projects to discuss cross project ideas.

I would also mention we had a successful combined devroom in 2014:
https://archive.fosdem.org/2014/schedule/track/valgrind/

> Subjects for core hackers, new developers, users, packagers and cross
> project functionality, that we would like to discuss and give
> presentations on include:
>
> - The recently added functional changes (for valgrind and gdb users).
> - Get feedback on what kinds of new functionality would be useful.
>   Which tools and functionality users would like to see.
>   (valgrind and gdb users).
> - How to add simple Valgrind features (adding syscalls for a platform
>   or VEX instructions for an architecture port). (new Valgrind core
>   developers).
> - Infrastructure changes to the JIT framework. (core hackers).
> - Discuss release/bugfixing strategy/policy (core hackers, packagers).
> - Packaging Valgrind and gdb for distros, handling patches,
>   suppressions, etc. (packagers).

If we are talking combining the power of debugging tools I think there
should be some suggestions for talks about:

- Advances in gdbserver and the GDB remote serial protocol.
  Connecting debugging tools together.
- Latest DWARF extensions, going from binary back to source.
- Multi, multi, multi... threads, processes and targets.
  Debugging anything, everywhere. Dealing with complex systems.
- Dealing with the dynamic loader and the kernel.
  Intercepting and interposing functions and events.

> Coordinator's affinity to the topic of the devroom:
> core hacker, Solaris port maintainer
>
> Why does it fit FOSDEM
> Valgrind is an instrumentation framework for building dynamic
> analysis tools. There are Valgrind tools that can automatically
> detect many memory management and threading bugs, and profile your
> programs in detail. You can also use Valgrind to build new tools.
>
> GDB, the GNU Project debugger, allows you to see what is going on
> `inside' another program while it executes -- or what another program
> was doing at the moment it crashed.
>
> Valgrind and GDB is Open Source / Free Software, and is freely
> available under the GNU General Public License, version 2.

are ... version 2 and 3 (or later).

> Relevant URLs:
> Valgrind website: http://www.valgrind.org
> GDB website: https://www.gnu.org/software/gdb/
>
> Timeslot:
> Half day

If we make it a shared debugging tools devroom and both gdb and
valgrind hackers submit talks then I would request a whole day.

Cheers,

Mark
|
|
From: Ivo R. <iv...@iv...> - 2017-09-19 04:43:14
|
Dear Valgrind and gdb hackers,

Please have a look at FOSDEM 2018 devroom proposal.
Let me know your comments, suggestions, etc.
The deadline is tomorrow (20th September).

I.

-------------------------------------------------------
Title: Valgrind, gdb, debugging tools
Coordinator: Ivo Raisr
Coordinator email: iv...@iv...
Secondary contact: Mark Wielaard (if Mark has no strict objections)
Secondary email: ma...@kl...

Description:
The valgrind and gdb hackers would like to meet during FOSDEM 2018.

Several core developers said they would like to attend a hacker
meeting to meet each other in person and to coordinate various topics.
And we would like to invite other hackers of toolchain projects to
discuss cross project ideas.

Subjects for core hackers, new developers, users, packagers and cross
project functionality, that we would like to discuss and give
presentations on include:

- The recently added functional changes (for valgrind and gdb users).
- Get feedback on what kinds of new functionality would be useful.
  Which tools and functionality users would like to see.
  (valgrind and gdb users).
- How to add simple Valgrind features (adding syscalls for a platform
  or VEX instructions for an architecture port). (new Valgrind core
  developers).
- Infrastructure changes to the JIT framework. (core hackers).
- Discuss release/bugfixing strategy/policy (core hackers, packagers).
- Packaging Valgrind and gdb for distros, handling patches,
  suppressions, etc. (packagers).

Coordinator's affinity to the topic of the devroom:
core hacker, Solaris port maintainer

Why does it fit FOSDEM
Valgrind is an instrumentation framework for building dynamic analysis
tools. There are Valgrind tools that can automatically detect many
memory management and threading bugs, and profile your programs in
detail. You can also use Valgrind to build new tools.

GDB, the GNU Project debugger, allows you to see what is going on
`inside' another program while it executes -- or what another program
was doing at the moment it crashed.

Valgrind and GDB is Open Source / Free Software, and is freely
available under the GNU General Public License, version 2.

Relevant URLs:
Valgrind website: http://www.valgrind.org
GDB website: https://www.gnu.org/software/gdb/

Timeslot:
Half day

Special requirements:
|
|
From: Andrew Y. <you...@gm...> - 2017-09-18 16:26:26
|
Hi Valgrind devs,

Recently, Eclipse OMR[0] had a GSOC student complete a project to add
Valgrind Memcheck support to the Garbage Collector component, using the
Memory Pool API[1]. This is the garbage collector used in IBM's J9 JVM.

We did have one problem: we had to add a new memory pool client request
to Valgrind to get this working cleanly. What I want to know is if you
would be interested in this API, so I can decide if it is worth
completing the work (tests, docs, etc.), or if we should stick with the
complicated work around.

The actual API is simple:

    VALGRIND_MEMPOOL_CLEAR(pool, addr, size)

Remove all chunks associated with `pool` from `address..address+size`
as though they had `VALGRIND_MEMPOOL_FREE` called on them. I am open to
suggestions on a better name (maybe VALGRIND_MEMPOOL_SWEEP?).

This API is actually the opposite of VALGRIND_MEMPOOL_TRIM(pool, addr,
size), which trims (i.e. VALGRIND_MEMPOOL_FREEs) all objects *outside*
of a range. This makes the implementation very simple, as we can just
copy what VALGRIND_MEMPOOL_TRIM does.

This API is used during the sweep phase of a GC. When sweeping the
heap, free regions of memory have VALGRIND_MEMPOOL_CLEAR called on
them. The reason why the existing API did not suffice is because it
requires us to call VALGRIND_MEMPOOL_FREE on every object which is
free'd. The GC only keeps track of which objects are currently alive;
it cannot tell if an object was actually free'd during a GC in any of
the regions which were swept. We could theoretically work around this
limitation by walking a swept region of memory, and calling
VALGRIND_MEMPOOL_FREE on each address.

Let me know what you think. I personally think this would be a great
addition to the memory pool API, and it should make it easier for other
mark-and-sweep garbage collectors to start using Valgrind.

Best regards,
Andrew Young

[0] https://github.com/eclipse/omr
[1] https://github.com/eclipse/omr/pull/1311#issuecomment-330190951
|
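[Editor's note] A minimal model of the proposed semantics, with the pool reduced to a table of chunk records. This sketches the *behaviour* only — VALGRIND_MEMPOOL_CLEAR is a proposal, not an upstream client request, and the names below are invented for illustration:

```c
#include <stddef.h>
#include <assert.h>

/* Toy pool: a fixed table of (addr, size, live) chunk records. */
#define MAX_CHUNKS 16
static struct { size_t addr; size_t size; int live; } chunks[MAX_CHUNKS];
static int nchunks = 0;

/* Model of VALGRIND_MEMPOOL_ALLOC: record a tracked chunk. */
static void pool_track(size_t addr, size_t size)
{
    chunks[nchunks].addr = addr;
    chunks[nchunks].size = size;
    chunks[nchunks].live = 1;
    nchunks++;
}

/* Model of the proposed MEMPOOL_CLEAR: free every chunk lying inside
   [addr, addr+size) — the complement of MEMPOOL_TRIM, which frees the
   chunks *outside* the range.  Returns how many chunks were freed. */
static int pool_clear_range(size_t addr, size_t size)
{
    int freed = 0;
    for (int i = 0; i < nchunks; i++) {
        if (chunks[i].live &&
            chunks[i].addr >= addr &&
            chunks[i].addr + chunks[i].size <= addr + size) {
            chunks[i].live = 0;   /* as if VALGRIND_MEMPOOL_FREE'd */
            freed++;
        }
    }
    return freed;
}
```

This is exactly the sweep-phase use case: the GC hands over each free region once, rather than enumerating every dead object individually.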
|
From: Ivo R. <ir...@so...> - 2017-09-16 20:23:45
|
https://sourceware.org/git/gitweb.cgi?p=valgrind.git;h=e2b59435e443ef4a786c31d320a3c1c0e1ff40d0

commit e2b59435e443ef4a786c31d320a3c1c0e1ff40d0
Author: Ivo Raisr <iv...@iv...>
Date:   Sat Sep 16 22:22:53 2017 +0200

    Cherry pick b9df4c8dec4d3154257818eb81111df43f2a7bf2 from master.
    Fix a typo bug in VEX register allocator v3.
    Also scanning a few more instructions ahead helps producing better code.

Diff:
---
 VEX/priv/host_generic_reg_alloc3.c | 8 ++++----
 1 file changed, 4 insertions(+), 4 deletions(-)

diff --git a/VEX/priv/host_generic_reg_alloc3.c b/VEX/priv/host_generic_reg_alloc3.c
index ef8f583..7e1e609 100644
--- a/VEX/priv/host_generic_reg_alloc3.c
+++ b/VEX/priv/host_generic_reg_alloc3.c
@@ -509,7 +509,7 @@ static inline HReg find_vreg_to_spill(
       - reg_usage[scan_forward_end], where scan_forward_end
            = MIN(scan_forward_max, scan_forward_start + FEW_INSTRUCTIONS).
      reg_usage uses chunk instruction numbering. */
-#  define FEW_INSTRUCTIONS 5
+#  define FEW_INSTRUCTIONS 20
    Short scan_forward_end
       = (scan_forward_max <= scan_forward_start + FEW_INSTRUCTIONS) ?
         scan_forward_max : scan_forward_start + FEW_INSTRUCTIONS;
@@ -532,10 +532,10 @@ static inline HReg find_vreg_to_spill(
          }
       }

-      if (ii_chunk - scan_forward_start > distance_so_far) {
-         distance_so_far = ii_chunk - scan_forward_start;
+      if (ii_chunk >= distance_so_far) {
+         distance_so_far = ii_chunk;
          vreg_found = vreg;
-         if (ii_chunk + distance_so_far == scan_forward_end) {
+         if (distance_so_far == scan_forward_end) {
             break; /* We are at the end. Nothing could be better. */
          }
       }
|
|
From: Ivo R. <ir...@so...> - 2017-09-16 16:50:11
|
https://sourceware.org/git/gitweb.cgi?p=valgrind.git;h=b9df4c8dec4d3154257818eb81111df43f2a7bf2

commit b9df4c8dec4d3154257818eb81111df43f2a7bf2
Author: Ivo Raisr <iv...@iv...>
Date:   Sat Sep 16 18:48:36 2017 +0200

    Fix a typo bug in VEX register allocator v3.
    Also scanning a few more instructions ahead helps producing better code.

Diff:
---
 VEX/priv/host_generic_reg_alloc3.c | 8 ++++----
 1 file changed, 4 insertions(+), 4 deletions(-)

diff --git a/VEX/priv/host_generic_reg_alloc3.c b/VEX/priv/host_generic_reg_alloc3.c
index 8ee4e48..5b2a9f2 100644
--- a/VEX/priv/host_generic_reg_alloc3.c
+++ b/VEX/priv/host_generic_reg_alloc3.c
@@ -323,7 +323,7 @@ static inline HReg find_vreg_to_spill(
       - reg_usage[scan_forward_from] - reg_usage[scan_forward_end],
      where scan_forward_end
            = MIN(scan_forward_max, scan_forward_from + FEW_INSTRUCTIONS). */
-#  define FEW_INSTRUCTIONS 5
+#  define FEW_INSTRUCTIONS 20
    UInt scan_forward_end
       = (scan_forward_max <= scan_forward_from + FEW_INSTRUCTIONS) ?
         scan_forward_max : scan_forward_from + FEW_INSTRUCTIONS;
@@ -344,10 +344,10 @@ static inline HReg find_vreg_to_spill(
          }
       }

-      if (ii - scan_forward_from > distance_so_far) {
-         distance_so_far = ii - scan_forward_from;
+      if (ii >= distance_so_far) {
+         distance_so_far = ii;
          vreg_found = vreg;
-         if (ii + distance_so_far == scan_forward_end) {
+         if (distance_so_far == scan_forward_end) {
            break; /* We are at the end. Nothing could be better. */
          }
       }
|
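[Editor's note] The heuristic being fixed above picks, among spill candidates, the vreg whose next use is furthest away within a short scan window. A distilled version of that selection loop — simplified, not the allocator's real data structures — with the window corresponding to FEW_INSTRUCTIONS (raised from 5 to 20 in the commit):

```c
#include <assert.h>

#define NO_USE 1000000  /* sentinel: vreg not used within the window */

/* next_use[v] = instruction index of vreg v's next use, or NO_USE.
   Return the vreg with the furthest next use, clamping distances to
   the scan window of few_insns instructions. */
static int find_vreg_to_spill(const int *next_use, int n_vregs,
                              int few_insns)
{
    int best = -1, best_dist = -1;
    for (int v = 0; v < n_vregs; v++) {
        /* Clamp: beyond the window, all candidates look equally far. */
        int dist = next_use[v] > few_insns ? few_insns : next_use[v];
        if (dist > best_dist) {
            best_dist = dist;
            best = v;
        }
    }
    return best;
}
```

Spilling the vreg used furthest in the future (Belady's principle) minimizes how soon the spilled value must be reloaded, which is why widening the scan window tends to produce better code.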
|
From: Petar J. <pe...@so...> - 2017-09-15 16:34:30
|
https://sourceware.org/git/gitweb.cgi?p=valgrind.git;h=e4f2fdfa4b242b5af17321d512939a267152cd56 commit e4f2fdfa4b242b5af17321d512939a267152cd56 Author: Petar Jovanovic <mip...@gm...> Date: Fri Sep 15 18:29:29 2017 +0200 mips: finetune none/tests/(mips32|64)/test_math test Compiler may optimize out call to cbrt. Change test to prevent that. Otherwise, the test does not exercise a desired codepath for cbrt, and it prints precalculated value. Diff: --- none/tests/mips32/test_math.cpp | 4 +++- 1 file changed, 3 insertions(+), 1 deletion(-) diff --git a/none/tests/mips32/test_math.cpp b/none/tests/mips32/test_math.cpp index 8a0f2dc..3d724d3 100644 --- a/none/tests/mips32/test_math.cpp +++ b/none/tests/mips32/test_math.cpp @@ -10,6 +10,8 @@ static void DivideByZero() { volatile float result __attribute__((unused)) = 123.0f / zero; } +volatile double cube = 27.0; + int main () { /* Testing lrint. */ fesetround(FE_UPWARD); // lrint/lrintf/lrintl obey the rounding mode. @@ -100,7 +102,7 @@ int main () { printf("tgamma(5.0): %lf\n", tgamma(5.0)); /* Test cbrt. */ - printf("cbrt(27.0): %lf\n", cbrt(27.0)); + printf("cbrt(27.0): %lf\n", cbrt(cube)); /* Test dividing by zero. */ // Clearing clears. |
|
From: Petar J. <pe...@so...> - 2017-09-15 16:34:25
|
https://sourceware.org/git/gitweb.cgi?p=valgrind.git;h=211b0c303afc0dd0222ff8c6f73f1ab347120af2 commit 211b0c303afc0dd0222ff8c6f73f1ab347120af2 Author: Petar Jovanovic <mip...@gm...> Date: Fri Sep 15 16:04:18 2017 +0200 mips: add clearing $ra to CLEAR_CALLER_SAVED_REGS macro Return address register belongs to caller saved registers, and compiler can use it to store temporary values. Clear it. Diff: --- memcheck/tests/leak.h | 6 ++++-- 1 file changed, 4 insertions(+), 2 deletions(-) diff --git a/memcheck/tests/leak.h b/memcheck/tests/leak.h index 7809ace..a1f93e8 100644 --- a/memcheck/tests/leak.h +++ b/memcheck/tests/leak.h @@ -82,10 +82,11 @@ "move $15, $0 \n\t" /* t7 = 0 */ \ "move $24, $0 \n\t" /* t8 = 0 */ \ "move $25, $0 \n\t" /* t9 = 0 */ \ + "move $31, $0 \n\t" /* ra = 0 */ \ ".set pop \n\t" \ : : : "$1", "$2", "$3", "$4", "$5", "$6", "$7", \ "$8", "$9", "$10", "$11", "$12", "$13", \ - "$14", "$15", "$24", "$25"); \ + "$14", "$15", "$24", "$25", "$31"); \ } while (0) #elif (__mips == 64) #define CLEAR_CALLER_SAVED_REGS \ @@ -109,10 +110,11 @@ "move $15, $0 \n\t" /* t3 = 0 */ \ "move $24, $0 \n\t" /* t8 = 0 */ \ "move $25, $0 \n\t" /* t9 = 0 */ \ + "move $31, $0 \n\t" /* ra = 0 */ \ ".set pop \n\t" \ : : : "$1", "$2", "$3", "$4", "$5", "$6", "$7", \ "$8", "$9", "$10", "$11", "$12", "$13", \ - "$14", "$15", "$24", "$25"); \ + "$14", "$15", "$24", "$25", "$31"); \ } while (0) #else #define CLEAR_CALLER_SAVED_REGS /*nothing*/ |
|
From: Ivo R. <ir...@so...> - 2017-09-14 12:44:35
|
https://sourceware.org/git/gitweb.cgi?p=valgrind.git;h=00d4667295a821fef9eb198abcb0c942dffb6045 commit 00d4667295a821fef9eb198abcb0c942dffb6045 Author: Ivo Raisr <iv...@iv...> Date: Wed Sep 6 08:10:36 2017 +0200 Reorder allocatable registers for AMD64, X86, and PPC so that the callee saved are listed first. Helper calls always trash all caller saved registers. By listing the callee saved first then VEX register allocator (both v2 and v3) is more likely to pick them and does not need to spill that much before helper calls. Diff: --- NEWS | 1 + VEX/priv/host_amd64_defs.c | 16 +++++++------- VEX/priv/host_amd64_defs.h | 18 ++++++++-------- VEX/priv/host_ppc_defs.c | 34 ++++++++++++++++-------------- VEX/priv/host_ppc_defs.h | 52 +++++++++++++++++++++++----------------------- VEX/priv/host_x86_defs.c | 6 +++--- VEX/priv/host_x86_defs.h | 12 +++++------ 7 files changed, 70 insertions(+), 69 deletions(-) diff --git a/NEWS b/NEWS index e910b7f..eccbd19 100644 --- a/NEWS +++ b/NEWS @@ -54,6 +54,7 @@ where XXXXXX is the bug number as listed below. 383275 massif valgrind: m_xarray.c:162 (ensureSpaceXA): Assertion '!xa->arr' failed 384096 Mention AddrCheck at Memcheck's command line option --undef-value-errors=no 384526 reduce number of spill instructions generated by VEX register allocator v3 +384584 Callee saved registers listed first for AMD64, X86, and PPC architectures Release 3.13.0 (15 June 2017) ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ diff --git a/VEX/priv/host_amd64_defs.c b/VEX/priv/host_amd64_defs.c index ebe2b00..d9949d4 100644 --- a/VEX/priv/host_amd64_defs.c +++ b/VEX/priv/host_amd64_defs.c @@ -64,15 +64,15 @@ const RRegUniverse* getRRegUniverse_AMD64 ( void ) those available for allocation by reg-alloc, and those that follow are not available for allocation. 
*/ ru->allocable_start[HRcInt64] = ru->size; - ru->regs[ru->size++] = hregAMD64_RSI(); - ru->regs[ru->size++] = hregAMD64_RDI(); - ru->regs[ru->size++] = hregAMD64_R8(); - ru->regs[ru->size++] = hregAMD64_R9(); ru->regs[ru->size++] = hregAMD64_R12(); ru->regs[ru->size++] = hregAMD64_R13(); ru->regs[ru->size++] = hregAMD64_R14(); ru->regs[ru->size++] = hregAMD64_R15(); ru->regs[ru->size++] = hregAMD64_RBX(); + ru->regs[ru->size++] = hregAMD64_RSI(); + ru->regs[ru->size++] = hregAMD64_RDI(); + ru->regs[ru->size++] = hregAMD64_R8(); + ru->regs[ru->size++] = hregAMD64_R9(); ru->regs[ru->size++] = hregAMD64_R10(); ru->allocable_end[HRcInt64] = ru->size - 1; @@ -1460,18 +1460,16 @@ void getRegUsage_AMD64Instr ( HRegUsage* u, const AMD64Instr* i, Bool mode64 ) /* This is a bit subtle. */ /* First off, claim it trashes all the caller-saved regs which fall within the register allocator's jurisdiction. - These I believe to be: rax rcx rdx rsi rdi r8 r9 r10 r11 - and all the xmm registers. - */ + These I believe to be: rax rcx rdx rdi rsi r8 r9 r10 + and all the xmm registers. 
*/ addHRegUse(u, HRmWrite, hregAMD64_RAX()); addHRegUse(u, HRmWrite, hregAMD64_RCX()); addHRegUse(u, HRmWrite, hregAMD64_RDX()); - addHRegUse(u, HRmWrite, hregAMD64_RSI()); addHRegUse(u, HRmWrite, hregAMD64_RDI()); + addHRegUse(u, HRmWrite, hregAMD64_RSI()); addHRegUse(u, HRmWrite, hregAMD64_R8()); addHRegUse(u, HRmWrite, hregAMD64_R9()); addHRegUse(u, HRmWrite, hregAMD64_R10()); - addHRegUse(u, HRmWrite, hregAMD64_R11()); addHRegUse(u, HRmWrite, hregAMD64_XMM0()); addHRegUse(u, HRmWrite, hregAMD64_XMM1()); addHRegUse(u, HRmWrite, hregAMD64_XMM3()); diff --git a/VEX/priv/host_amd64_defs.h b/VEX/priv/host_amd64_defs.h index 8a3eea8..92730fa 100644 --- a/VEX/priv/host_amd64_defs.h +++ b/VEX/priv/host_amd64_defs.h @@ -47,15 +47,15 @@ */ #define ST_IN static inline -ST_IN HReg hregAMD64_RSI ( void ) { return mkHReg(False, HRcInt64, 6, 0); } -ST_IN HReg hregAMD64_RDI ( void ) { return mkHReg(False, HRcInt64, 7, 1); } -ST_IN HReg hregAMD64_R8 ( void ) { return mkHReg(False, HRcInt64, 8, 2); } -ST_IN HReg hregAMD64_R9 ( void ) { return mkHReg(False, HRcInt64, 9, 3); } -ST_IN HReg hregAMD64_R12 ( void ) { return mkHReg(False, HRcInt64, 12, 4); } -ST_IN HReg hregAMD64_R13 ( void ) { return mkHReg(False, HRcInt64, 13, 5); } -ST_IN HReg hregAMD64_R14 ( void ) { return mkHReg(False, HRcInt64, 14, 6); } -ST_IN HReg hregAMD64_R15 ( void ) { return mkHReg(False, HRcInt64, 15, 7); } -ST_IN HReg hregAMD64_RBX ( void ) { return mkHReg(False, HRcInt64, 3, 8); } +ST_IN HReg hregAMD64_R12 ( void ) { return mkHReg(False, HRcInt64, 12, 0); } +ST_IN HReg hregAMD64_R13 ( void ) { return mkHReg(False, HRcInt64, 13, 1); } +ST_IN HReg hregAMD64_R14 ( void ) { return mkHReg(False, HRcInt64, 14, 2); } +ST_IN HReg hregAMD64_R15 ( void ) { return mkHReg(False, HRcInt64, 15, 3); } +ST_IN HReg hregAMD64_RBX ( void ) { return mkHReg(False, HRcInt64, 3, 4); } +ST_IN HReg hregAMD64_RSI ( void ) { return mkHReg(False, HRcInt64, 6, 5); } +ST_IN HReg hregAMD64_RDI ( void ) { return mkHReg(False, 
HRcInt64, 7, 6); } +ST_IN HReg hregAMD64_R8 ( void ) { return mkHReg(False, HRcInt64, 8, 7); } +ST_IN HReg hregAMD64_R9 ( void ) { return mkHReg(False, HRcInt64, 9, 8); } ST_IN HReg hregAMD64_R10 ( void ) { return mkHReg(False, HRcInt64, 10, 9); } ST_IN HReg hregAMD64_XMM3 ( void ) { return mkHReg(False, HRcVec128, 3, 10); } diff --git a/VEX/priv/host_ppc_defs.c b/VEX/priv/host_ppc_defs.c index 33ee292..1ef9c5c 100644 --- a/VEX/priv/host_ppc_defs.c +++ b/VEX/priv/host_ppc_defs.c @@ -69,6 +69,24 @@ const RRegUniverse* getRRegUniverse_PPC ( Bool mode64 ) // GPR1 = stack pointer // GPR2 = TOC pointer ru->allocable_start[(mode64) ? HRcInt64 : HRcInt32] = ru->size; + // GPR14 and above are callee save. List them first. + ru->regs[ru->size++] = hregPPC_GPR14(mode64); + ru->regs[ru->size++] = hregPPC_GPR15(mode64); + ru->regs[ru->size++] = hregPPC_GPR16(mode64); + ru->regs[ru->size++] = hregPPC_GPR17(mode64); + ru->regs[ru->size++] = hregPPC_GPR18(mode64); + ru->regs[ru->size++] = hregPPC_GPR19(mode64); + ru->regs[ru->size++] = hregPPC_GPR20(mode64); + ru->regs[ru->size++] = hregPPC_GPR21(mode64); + ru->regs[ru->size++] = hregPPC_GPR22(mode64); + ru->regs[ru->size++] = hregPPC_GPR23(mode64); + ru->regs[ru->size++] = hregPPC_GPR24(mode64); + ru->regs[ru->size++] = hregPPC_GPR25(mode64); + ru->regs[ru->size++] = hregPPC_GPR26(mode64); + ru->regs[ru->size++] = hregPPC_GPR27(mode64); + ru->regs[ru->size++] = hregPPC_GPR28(mode64); + + // Caller save registers now. ru->regs[ru->size++] = hregPPC_GPR3(mode64); ru->regs[ru->size++] = hregPPC_GPR4(mode64); ru->regs[ru->size++] = hregPPC_GPR5(mode64); @@ -85,22 +103,6 @@ const RRegUniverse* getRRegUniverse_PPC ( Bool mode64 ) ru->regs[ru->size++] = hregPPC_GPR12(mode64); } // GPR13 = thread specific pointer - // GPR14 and above are callee save. Yay. 
- ru->regs[ru->size++] = hregPPC_GPR14(mode64); - ru->regs[ru->size++] = hregPPC_GPR15(mode64); - ru->regs[ru->size++] = hregPPC_GPR16(mode64); - ru->regs[ru->size++] = hregPPC_GPR17(mode64); - ru->regs[ru->size++] = hregPPC_GPR18(mode64); - ru->regs[ru->size++] = hregPPC_GPR19(mode64); - ru->regs[ru->size++] = hregPPC_GPR20(mode64); - ru->regs[ru->size++] = hregPPC_GPR21(mode64); - ru->regs[ru->size++] = hregPPC_GPR22(mode64); - ru->regs[ru->size++] = hregPPC_GPR23(mode64); - ru->regs[ru->size++] = hregPPC_GPR24(mode64); - ru->regs[ru->size++] = hregPPC_GPR25(mode64); - ru->regs[ru->size++] = hregPPC_GPR26(mode64); - ru->regs[ru->size++] = hregPPC_GPR27(mode64); - ru->regs[ru->size++] = hregPPC_GPR28(mode64); ru->allocable_end[(mode64) ? HRcInt64 : HRcInt32] = ru->size - 1; // GPR29 is reserved for the dispatcher // GPR30 is reserved as AltiVec spill reg temporary diff --git a/VEX/priv/host_ppc_defs.h b/VEX/priv/host_ppc_defs.h index 6b7fcc8..8ee789a 100644 --- a/VEX/priv/host_ppc_defs.h +++ b/VEX/priv/host_ppc_defs.h @@ -57,35 +57,35 @@ mkHReg(False, HRcVec128, \ (_enc), (_mode64) ? 
(_ix64) : (_ix32)) -ST_IN HReg hregPPC_GPR3 ( Bool mode64 ) { return GPR(mode64, 3, 0, 0); } -ST_IN HReg hregPPC_GPR4 ( Bool mode64 ) { return GPR(mode64, 4, 1, 1); } -ST_IN HReg hregPPC_GPR5 ( Bool mode64 ) { return GPR(mode64, 5, 2, 2); } -ST_IN HReg hregPPC_GPR6 ( Bool mode64 ) { return GPR(mode64, 6, 3, 3); } -ST_IN HReg hregPPC_GPR7 ( Bool mode64 ) { return GPR(mode64, 7, 4, 4); } -ST_IN HReg hregPPC_GPR8 ( Bool mode64 ) { return GPR(mode64, 8, 5, 5); } -ST_IN HReg hregPPC_GPR9 ( Bool mode64 ) { return GPR(mode64, 9, 6, 6); } -ST_IN HReg hregPPC_GPR10 ( Bool mode64 ) { return GPR(mode64, 10, 7, 7); } +ST_IN HReg hregPPC_GPR14 ( Bool mode64 ) { return GPR(mode64, 14, 0, 0); } +ST_IN HReg hregPPC_GPR15 ( Bool mode64 ) { return GPR(mode64, 15, 1, 1); } +ST_IN HReg hregPPC_GPR16 ( Bool mode64 ) { return GPR(mode64, 16, 2, 2); } +ST_IN HReg hregPPC_GPR17 ( Bool mode64 ) { return GPR(mode64, 17, 3, 3); } +ST_IN HReg hregPPC_GPR18 ( Bool mode64 ) { return GPR(mode64, 18, 4, 4); } +ST_IN HReg hregPPC_GPR19 ( Bool mode64 ) { return GPR(mode64, 19, 5, 5); } +ST_IN HReg hregPPC_GPR20 ( Bool mode64 ) { return GPR(mode64, 20, 6, 6); } +ST_IN HReg hregPPC_GPR21 ( Bool mode64 ) { return GPR(mode64, 21, 7, 7); } +ST_IN HReg hregPPC_GPR22 ( Bool mode64 ) { return GPR(mode64, 22, 8, 8); } +ST_IN HReg hregPPC_GPR23 ( Bool mode64 ) { return GPR(mode64, 23, 9, 9); } +ST_IN HReg hregPPC_GPR24 ( Bool mode64 ) { return GPR(mode64, 24, 10, 10); } +ST_IN HReg hregPPC_GPR25 ( Bool mode64 ) { return GPR(mode64, 25, 11, 11); } +ST_IN HReg hregPPC_GPR26 ( Bool mode64 ) { return GPR(mode64, 26, 12, 12); } +ST_IN HReg hregPPC_GPR27 ( Bool mode64 ) { return GPR(mode64, 27, 13, 13); } +ST_IN HReg hregPPC_GPR28 ( Bool mode64 ) { return GPR(mode64, 28, 14, 14); } + +ST_IN HReg hregPPC_GPR3 ( Bool mode64 ) { return GPR(mode64, 3, 15, 15); } +ST_IN HReg hregPPC_GPR4 ( Bool mode64 ) { return GPR(mode64, 4, 16, 16); } +ST_IN HReg hregPPC_GPR5 ( Bool mode64 ) { return GPR(mode64, 5, 17, 17); } +ST_IN
HReg hregPPC_GPR6 ( Bool mode64 ) { return GPR(mode64, 6, 18, 18); } +ST_IN HReg hregPPC_GPR7 ( Bool mode64 ) { return GPR(mode64, 7, 19, 19); } +ST_IN HReg hregPPC_GPR8 ( Bool mode64 ) { return GPR(mode64, 8, 20, 20); } +ST_IN HReg hregPPC_GPR9 ( Bool mode64 ) { return GPR(mode64, 9, 21, 21); } +ST_IN HReg hregPPC_GPR10 ( Bool mode64 ) { return GPR(mode64, 10, 22, 22); } // r11 and r12 are only allocatable in 32-bit mode. Hence the 64-bit // index numbering doesn't advance for these two. -ST_IN HReg hregPPC_GPR11 ( Bool mode64 ) { return GPR(mode64, 11, 0, 8); } -ST_IN HReg hregPPC_GPR12 ( Bool mode64 ) { return GPR(mode64, 12, 0, 9); } - -ST_IN HReg hregPPC_GPR14 ( Bool mode64 ) { return GPR(mode64, 14, 8, 10); } -ST_IN HReg hregPPC_GPR15 ( Bool mode64 ) { return GPR(mode64, 15, 9, 11); } -ST_IN HReg hregPPC_GPR16 ( Bool mode64 ) { return GPR(mode64, 16, 10, 12); } -ST_IN HReg hregPPC_GPR17 ( Bool mode64 ) { return GPR(mode64, 17, 11, 13); } -ST_IN HReg hregPPC_GPR18 ( Bool mode64 ) { return GPR(mode64, 18, 12, 14); } -ST_IN HReg hregPPC_GPR19 ( Bool mode64 ) { return GPR(mode64, 19, 13, 15); } -ST_IN HReg hregPPC_GPR20 ( Bool mode64 ) { return GPR(mode64, 20, 14, 16); } -ST_IN HReg hregPPC_GPR21 ( Bool mode64 ) { return GPR(mode64, 21, 15, 17); } -ST_IN HReg hregPPC_GPR22 ( Bool mode64 ) { return GPR(mode64, 22, 16, 18); } -ST_IN HReg hregPPC_GPR23 ( Bool mode64 ) { return GPR(mode64, 23, 17, 19); } -ST_IN HReg hregPPC_GPR24 ( Bool mode64 ) { return GPR(mode64, 24, 18, 20); } -ST_IN HReg hregPPC_GPR25 ( Bool mode64 ) { return GPR(mode64, 25, 19, 21); } -ST_IN HReg hregPPC_GPR26 ( Bool mode64 ) { return GPR(mode64, 26, 20, 22); } -ST_IN HReg hregPPC_GPR27 ( Bool mode64 ) { return GPR(mode64, 27, 21, 23); } -ST_IN HReg hregPPC_GPR28 ( Bool mode64 ) { return GPR(mode64, 28, 22, 24); } +ST_IN HReg hregPPC_GPR11 ( Bool mode64 ) { return GPR(mode64, 11, 22, 23); } +ST_IN HReg hregPPC_GPR12 ( Bool mode64 ) { return GPR(mode64, 12, 22, 24); } ST_IN HReg hregPPC_FPR14 ( 
Bool mode64 ) { return FPR(mode64, 14, 23, 25); } ST_IN HReg hregPPC_FPR15 ( Bool mode64 ) { return FPR(mode64, 15, 24, 26); } diff --git a/VEX/priv/host_x86_defs.c b/VEX/priv/host_x86_defs.c index 2e5c044..2457cc1 100644 --- a/VEX/priv/host_x86_defs.c +++ b/VEX/priv/host_x86_defs.c @@ -64,12 +64,12 @@ const RRegUniverse* getRRegUniverse_X86 ( void ) those available for allocation by reg-alloc, and those that follow are not available for allocation. */ ru->allocable_start[HRcInt32] = ru->size; - ru->regs[ru->size++] = hregX86_EAX(); ru->regs[ru->size++] = hregX86_EBX(); - ru->regs[ru->size++] = hregX86_ECX(); - ru->regs[ru->size++] = hregX86_EDX(); ru->regs[ru->size++] = hregX86_ESI(); ru->regs[ru->size++] = hregX86_EDI(); + ru->regs[ru->size++] = hregX86_EAX(); + ru->regs[ru->size++] = hregX86_ECX(); + ru->regs[ru->size++] = hregX86_EDX(); ru->allocable_end[HRcInt32] = ru->size - 1; ru->allocable_start[HRcFlt64] = ru->size; diff --git a/VEX/priv/host_x86_defs.h b/VEX/priv/host_x86_defs.h index 614b751..e1a5767 100644 --- a/VEX/priv/host_x86_defs.h +++ b/VEX/priv/host_x86_defs.h @@ -47,12 +47,12 @@ */ #define ST_IN static inline -ST_IN HReg hregX86_EAX ( void ) { return mkHReg(False, HRcInt32, 0, 0); } -ST_IN HReg hregX86_EBX ( void ) { return mkHReg(False, HRcInt32, 3, 1); } -ST_IN HReg hregX86_ECX ( void ) { return mkHReg(False, HRcInt32, 1, 2); } -ST_IN HReg hregX86_EDX ( void ) { return mkHReg(False, HRcInt32, 2, 3); } -ST_IN HReg hregX86_ESI ( void ) { return mkHReg(False, HRcInt32, 6, 4); } -ST_IN HReg hregX86_EDI ( void ) { return mkHReg(False, HRcInt32, 7, 5); } +ST_IN HReg hregX86_EBX ( void ) { return mkHReg(False, HRcInt32, 3, 0); } +ST_IN HReg hregX86_ESI ( void ) { return mkHReg(False, HRcInt32, 6, 1); } +ST_IN HReg hregX86_EDI ( void ) { return mkHReg(False, HRcInt32, 7, 2); } +ST_IN HReg hregX86_EAX ( void ) { return mkHReg(False, HRcInt32, 0, 3); } +ST_IN HReg hregX86_ECX ( void ) { return mkHReg(False, HRcInt32, 1, 4); } +ST_IN HReg hregX86_EDX ( 
void ) { return mkHReg(False, HRcInt32, 2, 5); } ST_IN HReg hregX86_FAKE0 ( void ) { return mkHReg(False, HRcFlt64, 0, 6); } ST_IN HReg hregX86_FAKE1 ( void ) { return mkHReg(False, HRcFlt64, 1, 7); } |
|
From: <sv...@va...> - 2017-09-14 09:00:40
|
Author: iraisr
Date: Thu Sep 14 10:00:26 2017
New Revision: 527
Log:
Add missing image file kcachegrind_xtree.png to Valgrind user manual.
Added:
trunk/images/kcachegrind_xtree.png (with props)
Added: trunk/images/kcachegrind_xtree.png
==============================================================================
Binary file - no diff available.
|
|
From: Ivo R. <iv...@iv...> - 2017-09-14 07:56:57
|
2017-09-12 21:12 GMT+02:00 Peter Bergner <be...@vn...>: > On 9/12/17 12:51 PM, Ivo Raisr wrote: >> Are there any comments, suggestions, objections to the patch attached to bug: >> https://bugs.kde.org/show_bug.cgi?id=384584 >> Callee saved registers listed first for AMD64, X86, and PPC architectures > > My guess on why the caller saved (aka volatile) regs are listed before > the callee saved (aka non-volatile) registers, is that is the order most > register allocators in compilers (eg, gcc, etc.) try and assign them. > They attempt to use caller saved regs for the majority of pseudos/vregs > that are not live across a function call, since those regs do not need > to be saved/restored in the prologue/epilogue (ie, they're cheap to use) > and it leaves the callee saved regs available for pseudos/vregs that > are live across calls, which means you don't have to spill them around > calls. Thank you for your response. This is something our current VEX register allocators don't do, at present. Would you like to extend v3? > Looking through host_generic_reg_alloc3.c, it doesn't seem like the VEX > register allocator keeps track of vregs that are live across calls... Yes, that's true. It only keeps track of the start-end range. Then it allocates registers on a first-come, first-served basis within the constraints imposed by each instruction's register usage, trying to get the callee-saved registers first. So it could happen that a callee saved register is allocated to a short-lived vreg which does not span a call boundary, and several instructions later, under register shortage pressure, a long-lived vreg is allocated to a caller saved register, leading to a necessary spill before a call. > Is that for simplicity reasons or it just didn't seem like it was needed? Simplicity and performance would probably be the main drivers here. VEX does not have any runtime profile-feedback mechanism which could tell which blocks are hot and which are cold. So all blocks get the same treatment. 
It would first need to be carefully measured whether the added complexity of tracking vreg usage versus call spans would benefit overall performance. Would you like to try that? Perhaps the current algorithm can be easily extended? I. |
|
From: Philippe W. <phi...@so...> - 2017-09-13 20:52:01
|
https://sourceware.org/git/gitweb.cgi?p=valgrind.git;h=621cde90f7d23e916d3ce2716df02d261a72f5f3 commit 621cde90f7d23e916d3ce2716df02d261a72f5f3 Author: Philippe Waroquiers <phi...@sk...> Date: Wed Sep 13 22:47:11 2017 +0200 Fix Bug 255603 - exp-sgcheck Assertion '!already_present' failed The code handling array bounds is not ready to accept a reference to something else (not very clear what this reference could be): the code only expects the value of a bound directly. So, it was using the reference (i.e. an offset somewhere in the debug info) as the value of the bound. This then gave huge bounds for some arrays, causing an overlap in the stack variable handling code in exp-sgcheck. Such references seem to be used sometimes for stack-allocated arrays with variable size. Fix (or rather bypass) the problem by not considering that we have a usable array bound when a reference is given. Diff: --- NEWS | 1 + coregrind/m_debuginfo/readdwarf3.c | 20 ++++++++++++++++++-- 2 files changed, 19 insertions(+), 2 deletions(-) diff --git a/NEWS b/NEWS index e43410a..e910b7f 100644 --- a/NEWS +++ b/NEWS @@ -35,6 +35,7 @@ To see details of a given bug, visit https://bugs.kde.org/show_bug.cgi?id=XXXXXX where XXXXXX is the bug number as listed below. +255603 exp-sgcheck Assertion '!already_present' failed 379373 Fix syscall param msg->desc.port.name points to uninitialised byte(s) on macOS 10.12 379748 Fix missing pselect syscall (OS X 10.11) diff --git a/coregrind/m_debuginfo/readdwarf3.c b/coregrind/m_debuginfo/readdwarf3.c index 4d8f21b..2f538c9 100644 --- a/coregrind/m_debuginfo/readdwarf3.c +++ b/coregrind/m_debuginfo/readdwarf3.c @@ -2993,6 +2993,20 @@ static Bool subrange_type_denotes_array_bounds ( const D3TypeParser* parser, && parser->qparentE[parser->sp].tag == Te_TyArray); } +/* True if the form is one of the forms supported to give an array bound. 
+ For some arrays (scope local arrays with variable size), + a DW_FORM_ref4 was used, and was wrongly used as the bound value. + So, refuse the forms that are known to give a problem. */ +static Bool form_expected_for_bound ( DW_FORM form ) { + if (form == DW_FORM_ref1 + || form == DW_FORM_ref2 + || form == DW_FORM_ref4 + || form == DW_FORM_ref8) + return False; + + return True; +} + /* Parse a type-related DIE. 'parser' holds the current parser state. 'admin' is where the completed types are dumped. 'dtag' is the tag for this DIE. 'c_die' points to the start of the data fields (FORM @@ -3598,11 +3612,13 @@ static void parse_type_DIE ( /*MOD*/XArray* /* of TyEnt */ tyents, nf_i++; if (attr == 0 && form == 0) break; get_Form_contents( &cts, cc, c_die, False/*td3*/, form ); - if (attr == DW_AT_lower_bound && cts.szB > 0) { + if (attr == DW_AT_lower_bound && cts.szB > 0 + && form_expected_for_bound (form)) { lower = (Long)cts.u.val; have_lower = True; } - if (attr == DW_AT_upper_bound && cts.szB > 0) { + if (attr == DW_AT_upper_bound && cts.szB > 0 + && form_expected_for_bound (form)) { upper = (Long)cts.u.val; have_upper = True; } |
|
From: Ivo R. <ir...@so...> - 2017-09-13 16:26:24
|
https://sourceware.org/git/gitweb.cgi?p=valgrind.git;h=9837d39a71a52d869a06996c2b438e006f5cf1b9 commit 9837d39a71a52d869a06996c2b438e006f5cf1b9 Author: Ivo Raisr <iv...@iv...> Date: Wed Sep 13 17:38:13 2017 +0200 Clone and merge the register allocator states for If-Then-Else support. The register allocator state is cloned in stage 4, before fall-through and out-of-line legs are processed. The states are then merged back at the legs join. Diff: --- VEX/priv/host_generic_reg_alloc3.c | 619 +++++++++++++++++++++++-------------- 1 file changed, 383 insertions(+), 236 deletions(-) diff --git a/VEX/priv/host_generic_reg_alloc3.c b/VEX/priv/host_generic_reg_alloc3.c index a13d620..ef8f583 100644 --- a/VEX/priv/host_generic_reg_alloc3.c +++ b/VEX/priv/host_generic_reg_alloc3.c @@ -72,7 +72,7 @@ insn6 1 2 4 */ -/* Register allocator state is kept in an array of VRegState's. +/* The main register allocator state is kept in an array of VRegState's. There is an element for every virtual register (vreg). Elements are indexed [0 .. n_vregs-1]. Records information about vreg live range (in total ordering) and its state. @@ -170,8 +170,18 @@ typedef } RRegLRState; -#define IS_VALID_VREGNO(v) ((v) >= 0 && (v) < n_vregs) -#define IS_VALID_RREGNO(r) ((r) >= 0 && (r) < n_rregs) +/* Register allocator state. A composition of VRegState and RRegState arrays. */ +typedef + struct { + VRegState* vregs; + UInt n_vregs; + RRegState* rregs; + UInt n_rregs; + } + RegAllocState; + +#define IS_VALID_VREGNO(v) ((v) >= 0 && (v) < state->n_vregs) +#define IS_VALID_RREGNO(r) ((r) >= 0 && (r) < state->n_rregs) /* Represents register allocator state corresponding to one contiguous chunk of instructions. 
The chunk either continues with If-Then-Else legs or @@ -252,43 +262,34 @@ static void print_depth(UInt depth) } } -#define WALK_CHUNKS(process_one_chunk, process_legs_fork, \ - process_fall_through_leg, \ - process_out_of_line_leg, process_phi_nodes) \ - do { \ - while (chunk != NULL) { \ - process_one_chunk; \ - if (chunk->isIfThenElse) { \ - process_legs_fork; \ - if (DEBUG_REGALLOC) { \ - print_depth(depth); \ - vex_printf("if (!"); \ - con->ppCondCode(chunk->IfThenElse.ccOOL); \ - vex_printf(") then fall-through {\n"); \ - } \ - process_fall_through_leg; \ - if (DEBUG_REGALLOC) { \ - print_depth(depth); \ - vex_printf("} else out-of-line {\n"); \ - } \ - process_out_of_line_leg; \ - if (DEBUG_REGALLOC) { \ - print_depth(depth); \ - vex_printf("}\n"); \ - } \ - if (chunk->IfThenElse.n_phis > 0) { \ - process_phi_nodes; \ - if (DEBUG_REGALLOC) { \ - for (UInt p = 0; p < chunk->IfThenElse.n_phis; p++) { \ - print_depth(depth); \ - ppHPhiNode(&chunk->IfThenElse.phi_nodes[p]); \ - vex_printf("\n"); \ - } \ - } \ - } \ - } \ - chunk = chunk->next; \ - } \ +#define WALK_CHUNKS(process_one_chunk, process_legs_fork, \ + process_fall_through_leg, process_out_of_line_leg, \ + process_legs_join) \ + do { \ + while (chunk != NULL) { \ + process_one_chunk; \ + if (chunk->isIfThenElse) { \ + process_legs_fork; \ + if (DEBUG_REGALLOC) { \ + print_depth(depth); \ + vex_printf("if (!"); \ + con->ppCondCode(chunk->IfThenElse.ccOOL); \ + vex_printf(") then fall-through {\n"); \ + } \ + process_fall_through_leg; \ + if (DEBUG_REGALLOC) { \ + print_depth(depth); \ + vex_printf("} else out-of-line {\n"); \ + } \ + process_out_of_line_leg; \ + if (DEBUG_REGALLOC) { \ + print_depth(depth); \ + vex_printf("}\n"); \ + } \ + process_legs_join; \ + } \ + chunk = chunk->next; \ + } \ } while (0) @@ -315,14 +316,13 @@ static inline void enlarge_rreg_lrs(RRegLRState* rreg_lrs) rreg_lrs->lrs_size = 2 * rreg_lrs->lrs_used; } -#define PRINT_STATE(what) \ - do { \ - print_state(chunk, 
vreg_state, n_vregs, rreg_state, INSTRNO_TOTAL, \ - depth, con, what); \ +#define PRINT_STATE(what) \ + do { \ + print_state(chunk, state, INSTRNO_TOTAL, depth, con, what); \ } while (0) -static inline void print_state(const RegAllocChunk* chunk, - const VRegState* vreg_state, UInt n_vregs, const RRegState* rreg_state, +static inline void print_state( + const RegAllocChunk* chunk, const RegAllocState* state, Short ii_total_current, UInt depth, const RegAllocControl* con, const HChar* comment) { @@ -330,8 +330,8 @@ static inline void print_state(const RegAllocChunk* chunk, vex_printf("%s (current instruction total #%d):\n", comment, ii_total_current); - for (UInt v_idx = 0; v_idx < n_vregs; v_idx++) { - const VRegState* vreg = &vreg_state[v_idx]; + for (UInt v_idx = 0; v_idx < state->n_vregs; v_idx++) { + const VRegState* vreg = &state->vregs[v_idx]; if (vreg->live_after == INVALID_INSTRNO) { continue; /* This is a dead vreg. Never comes into live. */ @@ -370,7 +370,7 @@ static inline void print_state(const RegAllocChunk* chunk, } for (UInt r_idx = 0; r_idx < chunk->n_rregs; r_idx++) { - const RRegState* rreg = &rreg_state[r_idx]; + const RRegState* rreg = &state->rregs[r_idx]; const RRegLR* lr = chunk->rreg_lr_state[r_idx].lr_current; print_depth(depth); vex_printf("rreg_state[%2u] = ", r_idx); @@ -399,6 +399,35 @@ static inline void print_state(const RegAllocChunk* chunk, } } +static RegAllocState* clone_state(const RegAllocState* orig) +{ + RegAllocState* st2 = LibVEX_Alloc_inline(sizeof(RegAllocState)); + st2->n_vregs = orig->n_vregs; + st2->vregs = NULL; + if (orig->n_vregs > 0) { + st2->vregs = LibVEX_Alloc_inline(orig->n_vregs * sizeof(VRegState)); + } + + for (UInt v_idx = 0; v_idx < orig->n_vregs; v_idx++) { + st2->vregs[v_idx].live_after = orig->vregs[v_idx].live_after; + st2->vregs[v_idx].dead_before = orig->vregs[v_idx].dead_before; + st2->vregs[v_idx].reg_class = orig->vregs[v_idx].reg_class; + st2->vregs[v_idx].disp = orig->vregs[v_idx].disp; + 
       st2->vregs[v_idx].rreg         = orig->vregs[v_idx].rreg;
+      st2->vregs[v_idx].spill_offset = orig->vregs[v_idx].spill_offset;
+   }
+
+   st2->n_rregs = orig->n_rregs;
+   st2->rregs = LibVEX_Alloc_inline(orig->n_rregs * sizeof(RRegState));
+   for (UInt r_idx = 0; r_idx < orig->n_rregs; r_idx++) {
+      st2->rregs[r_idx].disp          = orig->rregs[r_idx].disp;
+      st2->rregs[r_idx].vreg          = orig->rregs[r_idx].vreg;
+      st2->rregs[r_idx].eq_spill_slot = orig->rregs[r_idx].eq_spill_slot;
+   }
+
+   return st2;
+}
+
 static inline void emit_instr(RegAllocChunk* chunk, HInstr* instr, UInt depth,
                               const RegAllocControl* con, const HChar* why)
 {
@@ -416,45 +445,40 @@ static inline void emit_instr(RegAllocChunk* chunk, HInstr* instr, UInt depth,
 }
 
 /* Updates register allocator state after vreg has been spilled. */
-static inline void mark_vreg_spilled(
-   UInt v_idx, VRegState* vreg_state, UInt n_vregs,
-   RRegState* rreg_state, UInt n_rregs)
+static inline void mark_vreg_spilled(UInt v_idx, RegAllocState* state)
 {
-   HReg rreg = vreg_state[v_idx].rreg;
+   HReg rreg = state->vregs[v_idx].rreg;
    UInt r_idx = hregIndex(rreg);
 
-   vreg_state[v_idx].disp = Spilled;
-   vreg_state[v_idx].rreg = INVALID_HREG;
-   rreg_state[r_idx].disp = Free;
-   rreg_state[r_idx].vreg = INVALID_HREG;
-   rreg_state[r_idx].eq_spill_slot = False;
+   state->vregs[v_idx].disp = Spilled;
+   state->vregs[v_idx].rreg = INVALID_HREG;
+   state->rregs[r_idx].disp = Free;
+   state->rregs[r_idx].vreg = INVALID_HREG;
+   state->rregs[r_idx].eq_spill_slot = False;
 }
 
 /* Spills a vreg assigned to some rreg.
    The vreg is spilled and the rreg is freed. Returns rreg's index. */
 static inline UInt spill_vreg(
-   RegAllocChunk* chunk,
-   VRegState* vreg_state, UInt n_vregs, RRegState* rreg_state,
+   RegAllocChunk* chunk, RegAllocState* state,
    HReg vreg, UInt v_idx, Short ii_total_current, UInt depth,
    const RegAllocControl* con)
 {
-   UInt n_rregs = chunk->n_rregs;
-
    /* Check some invariants first. */
    vassert(IS_VALID_VREGNO((v_idx)));
-   vassert(vreg_state[v_idx].disp == Assigned);
-   HReg rreg = vreg_state[v_idx].rreg;
+   vassert(state->vregs[v_idx].disp == Assigned);
+   HReg rreg = state->vregs[v_idx].rreg;
    UInt r_idx = hregIndex(rreg);
    vassert(IS_VALID_RREGNO(r_idx));
    vassert(hregClass(con->univ->regs[r_idx]) == hregClass(vreg));
-   vassert(vreg_state[v_idx].dead_before > ii_total_current);
-   vassert(vreg_state[v_idx].reg_class != HRcINVALID);
+   vassert(state->vregs[v_idx].dead_before > ii_total_current);
+   vassert(state->vregs[v_idx].reg_class != HRcINVALID);
 
    /* Generate spill. */
    HInstr* spill1 = NULL;
    HInstr* spill2 = NULL;
-   con->genSpill(&spill1, &spill2, rreg, vreg_state[v_idx].spill_offset,
+   con->genSpill(&spill1, &spill2, rreg, state->vregs[v_idx].spill_offset,
                  con->mode64);
    vassert(spill1 != NULL || spill2 != NULL); /* cannot be both NULL */
    if (spill1 != NULL) {
@@ -464,7 +488,7 @@ static inline UInt spill_vreg(
       emit_instr(chunk, spill2, depth, con, "spill2");
    }
 
-   mark_vreg_spilled(v_idx, vreg_state, n_vregs, rreg_state, n_rregs);
+   mark_vreg_spilled(v_idx, state);
    return r_idx;
 }
 
@@ -472,8 +496,7 @@ static inline UInt spill_vreg(
    The vreg must not be from the instruction being processed, that is, it must
    not be listed in reg_usage->vRegs. */
 static inline HReg find_vreg_to_spill(
-   const RegAllocChunk* chunk,
-   const VRegState* vreg_state, UInt n_vregs, const RRegState* rreg_state,
+   const RegAllocChunk* chunk, RegAllocState* state,
    const HRegUsage* instr_regusage, HRegClass target_hregclass,
    Short ii_chunk_current, const RegAllocControl* con)
 {
@@ -498,7 +521,7 @@ static inline HReg find_vreg_to_spill(
    for (UInt r_idx = con->univ->allocable_start[target_hregclass];
         r_idx <= con->univ->allocable_end[target_hregclass]; r_idx++) {
-      const RRegState* rreg = &rreg_state[r_idx];
+      const RRegState* rreg = &state->rregs[r_idx];
       if (rreg->disp == Bound) {
          HReg vreg = rreg->vreg;
         if (! HRegUsage__contains(instr_regusage, vreg)) {
@@ -536,8 +559,7 @@ static inline HReg find_vreg_to_spill(
    a callee-save register because it won't be used for parameter passing
    around helper function calls. */
 static Bool find_free_rreg(
-   const RegAllocChunk* chunk,
-   const VRegState* vreg_state, UInt n_vregs, const RRegState* rreg_state,
+   const RegAllocChunk* chunk, RegAllocState* state,
    Short ii_chunk_current, HRegClass target_hregclass,
    Bool reserve_phase, const RegAllocControl* con, UInt* r_idx_found)
 {
@@ -546,7 +568,7 @@ static Bool find_free_rreg(
    for (UInt r_idx = con->univ->allocable_start[target_hregclass];
         r_idx <= con->univ->allocable_end[target_hregclass]; r_idx++) {
-      const RRegState* rreg = &rreg_state[r_idx];
+      const RRegState* rreg = &state->rregs[r_idx];
       const RRegLRState* rreg_lrs = &chunk->rreg_lr_state[r_idx];
       if (rreg->disp == Free) {
          if (rreg_lrs->lrs_used == 0) {
@@ -651,10 +673,10 @@ static UInt stage1(HInstrVec* instrs_in, UInt ii_total_start, UInt n_rregs,
 
 /* --- Stage 2. ---
    Scan the incoming instructions.
-   Note: vreg state is initially global (shared accross all chunks).
-         rreg state is inherently local to every chunk. */
-static void stage2_chunk(RegAllocChunk* chunk, VRegState* vreg_state,
-   UInt n_vregs, UInt n_rregs, UInt depth, const RegAllocControl* con)
+   Note: state->vregs is initially global (shared accross all chunks).
+         state->rregs is inherently local to every chunk. */
+static void stage2_chunk(RegAllocChunk* chunk, RegAllocState* state,
+                         UInt depth, const RegAllocControl* con)
 {
    /* Info on register usage in the incoming instructions. Computed once
       and remains unchanged, more or less; updated sometimes by the
@@ -712,35 +734,35 @@ static void stage2_chunk(RegAllocChunk* chunk, VRegState* vreg_state,
          print_depth(depth);
          con->ppInstr(instr, con->mode64);
          vex_printf("\n");
-         vex_printf("vreg %u (n_vregs %u)\n", v_idx, n_vregs);
+         vex_printf("vreg %u (n_vregs %u)\n", v_idx, state->n_vregs);
          vpanic("registerAllocation (stage 2): out-of-range vreg");
       }
 
       /* Note the register class. */
-      if (vreg_state[v_idx].reg_class == HRcINVALID) {
+      if (state->vregs[v_idx].reg_class == HRcINVALID) {
          /* First mention of this vreg. */
-         vreg_state[v_idx].reg_class = hregClass(vreg);
+         state->vregs[v_idx].reg_class = hregClass(vreg);
       } else {
          /* Seen it before, so check for consistency. */
-         vassert(vreg_state[v_idx].reg_class == hregClass(vreg));
+         vassert(state->vregs[v_idx].reg_class == hregClass(vreg));
       }
 
       /* Consider live ranges. */
       switch (chunk->reg_usage[ii_chunk].vMode[j]) {
       case HRmRead:
-         if (vreg_state[v_idx].live_after == INVALID_INSTRNO) {
+         if (state->vregs[v_idx].live_after == INVALID_INSTRNO) {
            OFFENDING_VREG(v_idx, instr, "Read");
         }
         break;
      case HRmWrite:
-         if (vreg_state[v_idx].live_after == INVALID_INSTRNO) {
-            vreg_state[v_idx].live_after = INSTRNO_TOTAL;
-         } else if (vreg_state[v_idx].live_after > INSTRNO_TOTAL) {
-            vreg_state[v_idx].live_after = INSTRNO_TOTAL;
+         if (state->vregs[v_idx].live_after == INVALID_INSTRNO) {
+            state->vregs[v_idx].live_after = INSTRNO_TOTAL;
+         } else if (state->vregs[v_idx].live_after > INSTRNO_TOTAL) {
+            state->vregs[v_idx].live_after = INSTRNO_TOTAL;
         }
         break;
      case HRmModify:
-         if (vreg_state[v_idx].live_after == INVALID_INSTRNO) {
+         if (state->vregs[v_idx].live_after == INVALID_INSTRNO) {
            OFFENDING_VREG(v_idx, instr, "Modify");
         }
         break;
@@ -748,8 +770,8 @@ static void stage2_chunk(RegAllocChunk* chunk, VRegState* vreg_state,
         vassert(0);
      }
 
-      if (vreg_state[v_idx].dead_before < INSTRNO_TOTAL + 1) {
-         vreg_state[v_idx].dead_before = INSTRNO_TOTAL + 1;
+      if (state->vregs[v_idx].dead_before < INSTRNO_TOTAL + 1) {
+         state->vregs[v_idx].dead_before = INSTRNO_TOTAL + 1;
      }
   }
 
@@ -766,8 +788,8 @@ static void stage2_chunk(RegAllocChunk* chunk, VRegState* vreg_state,
      are unavailable to the register allocator and so we never visit them.
      We asserted above that n_rregs > 0, so (n_rregs - 1) is safe. */
-   if (rReg_maxIndex >= n_rregs) {
-      rReg_maxIndex = n_rregs - 1;
+   if (rReg_maxIndex >= state->n_rregs) {
+      rReg_maxIndex = state->n_rregs - 1;
   }
 
   for (UInt r_idx = rReg_minIndex; r_idx <= rReg_maxIndex; r_idx++) {
@@ -810,8 +832,8 @@ static void stage2_chunk(RegAllocChunk* chunk, VRegState* vreg_state,
   }
 }
 
-static void stage2_phi_nodes(RegAllocChunk* chunk, VRegState* vreg_state,
-   UInt n_vregs, UInt depth, const RegAllocControl* con)
+static void stage2_phi_nodes(RegAllocChunk* chunk, RegAllocState* state,
+                             UInt depth, const RegAllocControl* con)
 {
   vassert(chunk->next != NULL);
   Short ii_total_next = chunk->next->ii_total_start;
@@ -822,35 +844,39 @@ static void stage2_phi_nodes(RegAllocChunk* chunk, VRegState* vreg_state,
      /* Extend dead-before of source vregs up to the first instruction
         after join from If-Then-Else. */
      UInt v_idx_fallThrough = hregIndex(phi->srcFallThrough);
-      vassert(vreg_state[v_idx_fallThrough].live_after != INVALID_INSTRNO);
-      if (vreg_state[v_idx_fallThrough].dead_before < ii_total_next + 1) {
-         vreg_state[v_idx_fallThrough].dead_before = ii_total_next + 1;
+      vassert(state->vregs[v_idx_fallThrough].live_after != INVALID_INSTRNO);
+      if (state->vregs[v_idx_fallThrough].dead_before < ii_total_next + 1) {
+         state->vregs[v_idx_fallThrough].dead_before = ii_total_next + 1;
      }
 
      UInt v_idx_outOfLine = hregIndex(phi->srcOutOfLine);
-      vassert(vreg_state[v_idx_outOfLine].live_after != INVALID_INSTRNO);
-      if (vreg_state[v_idx_outOfLine].dead_before < ii_total_next + 1) {
-         vreg_state[v_idx_outOfLine].dead_before = ii_total_next + 1;
+      vassert(state->vregs[v_idx_outOfLine].live_after != INVALID_INSTRNO);
+      if (state->vregs[v_idx_outOfLine].dead_before < ii_total_next + 1) {
+         state->vregs[v_idx_outOfLine].dead_before = ii_total_next + 1;
      }
 
      /* Live range for destination vreg begins here. */
      UInt v_idx_dst = hregIndex(phi->dst);
-      vassert(vreg_state[v_idx_dst].live_after == INVALID_INSTRNO);
-      vreg_state[v_idx_dst].live_after = ii_total_next;
-      vreg_state[v_idx_dst].dead_before = ii_total_next + 1;
+      vassert(state->vregs[v_idx_dst].live_after == INVALID_INSTRNO);
+      state->vregs[v_idx_dst].live_after = ii_total_next;
+      state->vregs[v_idx_dst].dead_before = ii_total_next + 1;
+
+      if (DEBUG_REGALLOC) {
+         print_depth(depth);
+         ppHPhiNode(phi);
+         vex_printf("\n");
+      }
   }
 }
 
-static void stage2(RegAllocChunk* chunk, VRegState* vreg_state, UInt n_vregs,
-   UInt n_rregs, UInt depth, const RegAllocControl* con)
+static void stage2(RegAllocChunk* chunk, RegAllocState* state,
+                   UInt depth, const RegAllocControl* con)
 {
-   WALK_CHUNKS(stage2_chunk(chunk, vreg_state, n_vregs, n_rregs, depth, con),
+   WALK_CHUNKS(stage2_chunk(chunk, state, depth, con),
               ;,
-               stage2(chunk->IfThenElse.fallThrough, vreg_state, n_vregs,
-                      n_rregs, depth + 1, con),
-               stage2(chunk->IfThenElse.outOfLine, vreg_state, n_vregs,
-                      n_rregs, depth + 1, con),
-               stage2_phi_nodes(chunk, vreg_state, n_vregs, depth, con));
+               stage2(chunk->IfThenElse.fallThrough, state, depth + 1, con),
+               stage2(chunk->IfThenElse.outOfLine, state, depth + 1, con),
+               stage2_phi_nodes(chunk, state, depth, con));
 }
 
 static void stage2_debug_vregs(const VRegState* vreg_state, UInt n_vregs)
@@ -1016,28 +1042,23 @@ static void stage3(VRegState* vreg_state, UInt n_vregs,
 }
 
-static void stage4_chunk(RegAllocChunk* chunk,
-   VRegState* vreg_state, UInt n_vregs, RRegState* rreg_state,
-   UInt depth, const RegAllocControl* con)
+static void stage4_chunk(RegAllocChunk* chunk, RegAllocState* state,
+                         UInt depth, const RegAllocControl* con)
 {
-   UInt n_rregs = chunk->n_rregs;
-
   /* Finds an rreg of the correct class. If a free rreg is not found, then
      spills a vreg not used by the current instruction and makes free the
      corresponding rreg. */
#   define FIND_OR_MAKE_FREE_RREG(_v_idx, _reg_class, _reserve_phase)          \
   ({                                                                         \
      UInt _r_free_idx = -1;                                                  \
-      Bool free_rreg_found = find_free_rreg(chunk,                            \
-                       vreg_state, n_vregs, rreg_state,                       \
-                       ii_chunk, (_reg_class), (_reserve_phase),              \
-                       con, &_r_free_idx);                                    \
+      Bool free_rreg_found = find_free_rreg(chunk, state,                     \
+                       ii_chunk, (_reg_class), (_reserve_phase),              \
+                       con, &_r_free_idx);                                    \
      if (!free_rreg_found) {                                                 \
-         HReg vreg_to_spill = find_vreg_to_spill(chunk,                       \
-                       vreg_state, n_vregs, rreg_state,                       \
+         HReg vreg_to_spill = find_vreg_to_spill(chunk, state,                \
                       &chunk->reg_usage[ii_chunk], (_reg_class),             \
                       ii_chunk, con);                                        \
-         _r_free_idx = spill_vreg(chunk, vreg_state, n_vregs, rreg_state,     \
+         _r_free_idx = spill_vreg(chunk, state,                               \
                                  vreg_to_spill, hregIndex(vreg_to_spill),    \
                                  INSTRNO_TOTAL, depth, con);                 \
      }                                                                       \
@@ -1081,42 +1102,42 @@ static void stage4_chunk(RegAllocChunk* chunk,
         );
 
      if (do_sanity_check) {
-         /* Sanity check: the vreg_state and rreg_state mutually-redundant
-            mappings are consistent. If vreg_state[v].rreg points at some
-            rreg_state entry then that rreg_state entry should point back at
-            vreg_state[v]. */
-         for (UInt v_idx = 0; v_idx < n_vregs; v_idx++) {
-            if (vreg_state[v_idx].disp == Assigned) {
-               vassert(!hregIsVirtual(vreg_state[v_idx].rreg));
-
-               UInt r_idx = hregIndex(vreg_state[v_idx].rreg);
+         /* Sanity check: the state->vregs and state->rregs mutually-redundant
+            mappings are consistent. If state->vregs[v].rreg points at some
+            state->rregs entry then that state->rregs entry should point back at
+            state->vregs[v]. */
+         for (UInt v_idx = 0; v_idx < state->n_vregs; v_idx++) {
+            if (state->vregs[v_idx].disp == Assigned) {
+               vassert(!hregIsVirtual(state->vregs[v_idx].rreg));
+
+               UInt r_idx = hregIndex(state->vregs[v_idx].rreg);
               vassert(IS_VALID_RREGNO(r_idx));
-               vassert(rreg_state[r_idx].disp == Bound);
-               vassert(hregIndex(rreg_state[r_idx].vreg) == v_idx);
+               vassert(state->rregs[r_idx].disp == Bound);
+               vassert(hregIndex(state->rregs[r_idx].vreg) == v_idx);
 
-               vassert(hregClass(vreg_state[v_idx].rreg)
+               vassert(hregClass(state->vregs[v_idx].rreg)
                       == hregClass(con->univ->regs[r_idx]));
            }
         }
 
-         for (UInt r_idx = 0; r_idx < n_rregs; r_idx++) {
-            const RRegState* rreg = &rreg_state[r_idx];
+         for (UInt r_idx = 0; r_idx < state->n_rregs; r_idx++) {
+            const RRegState* rreg = &state->rregs[r_idx];
           if (rreg->disp == Bound) {
              vassert(hregIsVirtual(rreg->vreg));
 
              UInt v_idx = hregIndex(rreg->vreg);
             vassert(IS_VALID_VREGNO(v_idx));
-               vassert(vreg_state[v_idx].disp == Assigned);
-               vassert(hregIndex(vreg_state[v_idx].rreg) == r_idx);
+               vassert(state->vregs[v_idx].disp == Assigned);
+               vassert(hregIndex(state->vregs[v_idx].rreg) == r_idx);
           } else {
-               vassert(rreg_state[r_idx].eq_spill_slot == False);
+               vassert(state->rregs[r_idx].eq_spill_slot == False);
           }
        }
 
        /* Sanity check: if rreg has been marked as Reserved, there must be
           a corresponding hard live range for it. */
-         for (UInt r_idx = 0; r_idx < n_rregs; r_idx++) {
-            if (rreg_state[r_idx].disp == Reserved) {
+         for (UInt r_idx = 0; r_idx < state->n_rregs; r_idx++) {
+            if (state->rregs[r_idx].disp == Reserved) {
              const RRegLRState* rreg_lrs = &chunk->rreg_lr_state[r_idx];
             vassert(rreg_lrs->lrs_used > 0);
             vassert(rreg_lrs->lr_current_idx < rreg_lrs->lrs_used);
@@ -1145,22 +1166,22 @@ static void stage4_chunk(RegAllocChunk* chunk,
        vassert(IS_VALID_VREGNO(vs_idx));
        vassert(IS_VALID_VREGNO(vd_idx));
 
-         if ((vreg_state[vs_idx].dead_before == INSTRNO_TOTAL + 1)
-             && (vreg_state[vd_idx].live_after == INSTRNO_TOTAL)
-             && (vreg_state[vs_idx].disp == Assigned)) {
+         if ((state->vregs[vs_idx].dead_before == INSTRNO_TOTAL + 1)
+             && (state->vregs[vd_idx].live_after == INSTRNO_TOTAL)
+             && (state->vregs[vs_idx].disp == Assigned)) {
           /* Live ranges are adjacent and source vreg is bound.
             Finally we can do the coalescing. */
-            HReg rreg = vreg_state[vs_idx].rreg;
-            vreg_state[vd_idx].disp = Assigned;
-            vreg_state[vd_idx].rreg = rreg;
-            vreg_state[vs_idx].disp = Unallocated;
-            vreg_state[vs_idx].rreg = INVALID_HREG;
+            HReg rreg = state->vregs[vs_idx].rreg;
+            state->vregs[vd_idx].disp = Assigned;
+            state->vregs[vd_idx].rreg = rreg;
+            state->vregs[vs_idx].disp = Unallocated;
+            state->vregs[vs_idx].rreg = INVALID_HREG;
 
           UInt r_idx = hregIndex(rreg);
-            vassert(rreg_state[r_idx].disp == Bound);
-            rreg_state[r_idx].vreg = vregD;
-            rreg_state[r_idx].eq_spill_slot = False;
+            vassert(state->rregs[r_idx].disp == Bound);
+            state->rregs[r_idx].vreg = vregD;
+            state->rregs[r_idx].eq_spill_slot = False;
 
          if (DEBUG_REGALLOC) {
            print_depth(depth);
@@ -1176,12 +1197,12 @@ static void stage4_chunk(RegAllocChunk* chunk,
          This effectively means that either the translated program contained
          dead code (although VEX iropt passes are pretty good at eliminating
          it) or the VEX backend generated dead code. */
-         if (vreg_state[vd_idx].dead_before <= INSTRNO_TOTAL + 1) {
-            vreg_state[vd_idx].disp = Unallocated;
-            vreg_state[vd_idx].rreg = INVALID_HREG;
-            rreg_state[r_idx].disp = Free;
-            rreg_state[r_idx].vreg = INVALID_HREG;
-            rreg_state[r_idx].eq_spill_slot = False;
+         if (state->vregs[vd_idx].dead_before <= INSTRNO_TOTAL + 1) {
+            state->vregs[vd_idx].disp = Unallocated;
+            state->vregs[vd_idx].rreg = INVALID_HREG;
+            state->rregs[r_idx].disp = Free;
+            state->rregs[r_idx].vreg = INVALID_HREG;
+            state->rregs[r_idx].eq_spill_slot = False;
        }
 
       /* Move on to the next instruction. We skip the post-instruction
@@ -1208,8 +1229,8 @@ static void stage4_chunk(RegAllocChunk* chunk,
     if (rMentioned != 0) {
       UInt rReg_minIndex = ULong__minIndex(rMentioned);
       UInt rReg_maxIndex = ULong__maxIndex(rMentioned);
-         if (rReg_maxIndex >= n_rregs) {
-            rReg_maxIndex = n_rregs - 1;
+         if (rReg_maxIndex >= state->n_rregs) {
+            rReg_maxIndex = state->n_rregs - 1;
       }
 
       for (UInt r_idx = rReg_minIndex; r_idx <= rReg_maxIndex; r_idx++) {
@@ -1219,7 +1240,7 @@ static void stage4_chunk(RegAllocChunk* chunk,
            continue;
         }
 
-            RRegState* rreg = &rreg_state[r_idx];
+            RRegState* rreg = &state->rregs[r_idx];
         const RRegLRState* rreg_lrs = &chunk->rreg_lr_state[r_idx];
         if (LIKELY(rreg_lrs->lrs_used == 0)) {
           continue;
@@ -1240,17 +1261,16 @@ static void stage4_chunk(RegAllocChunk* chunk,
 
           if (! HRegUsage__contains(reg_usage, vreg)) {
             if (rreg->eq_spill_slot) {
-                  mark_vreg_spilled(v_idx, vreg_state, n_vregs,
-                                    rreg_state, n_rregs);
+                  mark_vreg_spilled(v_idx, state);
             } else {
               /* Spill the vreg. It is not used by this instruction.*/
-                  spill_vreg(chunk, vreg_state, n_vregs, rreg_state,
-                             vreg, v_idx, INSTRNO_TOTAL, depth, con);
+                  spill_vreg(chunk, state, vreg, v_idx, INSTRNO_TOTAL,
+                             depth, con);
             }
          } else {
             /* Find or make a free rreg where to move this vreg to. */
            UInt r_free_idx = FIND_OR_MAKE_FREE_RREG(
-                   v_idx, vreg_state[v_idx].reg_class, True);
+                   v_idx, state->vregs[v_idx].reg_class, True);
 
            /* Generate "move" between real registers. */
            HInstr* move = con->genMove(con->univ->regs[r_idx],
@@ -1259,11 +1279,11 @@ static void stage4_chunk(RegAllocChunk* chunk,
            emit_instr(chunk, move, depth, con, "move");
 
            /* Update the register allocator state. */
-               vassert(vreg_state[v_idx].disp == Assigned);
-               vreg_state[v_idx].rreg = con->univ->regs[r_free_idx];
-               rreg_state[r_free_idx].disp = Bound;
-               rreg_state[r_free_idx].vreg = vreg;
-               rreg_state[r_free_idx].eq_spill_slot = rreg->eq_spill_slot;
+               vassert(state->vregs[v_idx].disp == Assigned);
+               state->vregs[v_idx].rreg = con->univ->regs[r_free_idx];
+               state->rregs[r_free_idx].disp = Bound;
+               state->rregs[r_free_idx].vreg = vreg;
+               state->rregs[r_free_idx].eq_spill_slot = rreg->eq_spill_slot;
            rreg->disp = Free;
            rreg->vreg = INVALID_HREG;
            rreg->eq_spill_slot = False;
@@ -1313,13 +1333,13 @@ static void stage4_chunk(RegAllocChunk* chunk,
         nreads++;
         UInt v_idx = hregIndex(vreg);
         vassert(IS_VALID_VREGNO(v_idx));
-            if (vreg_state[v_idx].disp == Spilled) {
+            if (state->vregs[v_idx].disp == Spilled) {
           /* Is this its last use? */
-               vassert(vreg_state[v_idx].dead_before >= INSTRNO_TOTAL + 1);
-               if ((vreg_state[v_idx].dead_before == INSTRNO_TOTAL + 1)
+               vassert(state->vregs[v_idx].dead_before >= INSTRNO_TOTAL + 1);
+               if ((state->vregs[v_idx].dead_before == INSTRNO_TOTAL + 1)
                 && hregIsInvalid(vreg_found)) {
               vreg_found = vreg;
-                  spill_offset = vreg_state[v_idx].spill_offset;
+                  spill_offset = state->vregs[v_idx].spill_offset;
           }
        }
     }
@@ -1383,30 +1403,30 @@ static void stage4_chunk(RegAllocChunk* chunk,
      UInt v_idx = hregIndex(vreg);
      vassert(IS_VALID_VREGNO(v_idx));
-         HReg rreg = vreg_state[v_idx].rreg;
+         HReg rreg = state->vregs[v_idx].rreg;
      UInt r_idx;
-         if (vreg_state[v_idx].disp == Assigned) {
+         if (state->vregs[v_idx].disp == Assigned) {
        r_idx = hregIndex(rreg);
-            vassert(rreg_state[r_idx].disp == Bound);
+            vassert(state->rregs[r_idx].disp == Bound);
        addToHRegRemap(&remap, vreg, rreg);
     } else {
        vassert(hregIsInvalid(rreg));
 
        /* Find or make a free rreg of the correct class. */
        r_idx = FIND_OR_MAKE_FREE_RREG(
-               v_idx, vreg_state[v_idx].reg_class, False);
+               v_idx, state->vregs[v_idx].reg_class, False);
        rreg = con->univ->regs[r_idx];
 
        /* Generate reload only if the vreg is spilled and is about to being
          read or modified. If it is merely written than reloading it first
          would be pointless. */
-            if ((vreg_state[v_idx].disp == Spilled)
+            if ((state->vregs[v_idx].disp == Spilled)
             && (reg_usage->vMode[j] != HRmWrite)) {
          HInstr* reload1 = NULL;
          HInstr* reload2 = NULL;
          con->genReload(&reload1, &reload2, rreg,
-                            vreg_state[v_idx].spill_offset, con->mode64);
+                            state->vregs[v_idx].spill_offset, con->mode64);
          vassert(reload1 != NULL || reload2 != NULL);
          if (reload1 != NULL) {
            emit_instr(chunk, reload1, depth, con, "reload1");
@@ -1416,17 +1436,17 @@ static void stage4_chunk(RegAllocChunk* chunk,
          }
       }
 
-            rreg_state[r_idx].disp = Bound;
-            rreg_state[r_idx].vreg = vreg;
-            rreg_state[r_idx].eq_spill_slot = True;
-            vreg_state[v_idx].disp = Assigned;
-            vreg_state[v_idx].rreg = rreg;
+            state->rregs[r_idx].disp = Bound;
+            state->rregs[r_idx].vreg = vreg;
+            state->rregs[r_idx].eq_spill_slot = True;
+            state->vregs[v_idx].disp = Assigned;
+            state->vregs[v_idx].rreg = rreg;
       addToHRegRemap(&remap, vreg, rreg);
     }
 
     /* If this vreg is written or modified, mark it so. */
     if (reg_usage->vMode[j] != HRmRead) {
-            rreg_state[r_idx].eq_spill_slot = False;
+            state->rregs[r_idx].eq_spill_slot = False;
     }
   }
 
@@ -1444,8 +1464,8 @@ static void stage4_chunk(RegAllocChunk* chunk,
     /* Free rregs which:
       - Have been reserved and whose hard live range ended.
      - Have been bound to vregs whose live range ended. */
-      for (UInt r_idx = 0; r_idx < n_rregs; r_idx++) {
-         RRegState* rreg = &rreg_state[r_idx];
+      for (UInt r_idx = 0; r_idx < state->n_rregs; r_idx++) {
+         RRegState* rreg = &state->rregs[r_idx];
      RRegLRState* rreg_lrs = &chunk->rreg_lr_state[r_idx];
      switch (rreg->disp) {
      case Free:
@@ -1454,9 +1474,9 @@ static void stage4_chunk(RegAllocChunk* chunk,
        if (rreg_lrs->lrs_used > 0) {
          /* Consider "dead before" the next instruction. */
          if (rreg_lrs->lr_current->dead_before <= ii_chunk + 1) {
-                  rreg_state[r_idx].disp = Free;
-                  rreg_state[r_idx].vreg = INVALID_HREG;
-                  rreg_state[r_idx].eq_spill_slot = False;
+                  state->rregs[r_idx].disp = Free;
+                  state->rregs[r_idx].vreg = INVALID_HREG;
+                  state->rregs[r_idx].eq_spill_slot = False;
            if (rreg_lrs->lr_current_idx < rreg_lrs->lrs_used - 1) {
             rreg_lrs->lr_current_idx += 1;
             rreg_lrs->lr_current
@@ -1468,12 +1488,12 @@ static void stage4_chunk(RegAllocChunk* chunk,
      case Bound: {
        UInt v_idx = hregIndex(rreg->vreg);
        /* Consider "dead before" the next instruction. */
-            if (vreg_state[v_idx].dead_before <= INSTRNO_TOTAL + 1) {
-               vreg_state[v_idx].disp = Unallocated;
-               vreg_state[v_idx].rreg = INVALID_HREG;
-               rreg_state[r_idx].disp = Free;
-               rreg_state[r_idx].vreg = INVALID_HREG;
-               rreg_state[r_idx].eq_spill_slot = False;
+            if (state->vregs[v_idx].dead_before <= INSTRNO_TOTAL + 1) {
+               state->vregs[v_idx].disp = Unallocated;
+               state->vregs[v_idx].rreg = INVALID_HREG;
+               state->rregs[r_idx].disp = Free;
+               state->rregs[r_idx].vreg = INVALID_HREG;
+               state->rregs[r_idx].eq_spill_slot = False;
        }
        break;
     }
@@ -1491,10 +1511,8 @@ static void stage4_emit_HInstrIfThenElse(RegAllocChunk* chunk, UInt depth,
 {
   vassert(chunk->isIfThenElse);
 
-   HInstrIfThenElse* hite = newHInstrIfThenElse(
-      chunk->IfThenElse.ccOOL,
-      chunk->IfThenElse.phi_nodes,
-      chunk->IfThenElse.n_phis);
+   HInstrIfThenElse* hite = newHInstrIfThenElse(chunk->IfThenElse.ccOOL,
+                                                NULL, 0);
   hite->fallThrough = chunk->IfThenElse.fallThrough->instrs_out;
   hite->outOfLine   = chunk->IfThenElse.outOfLine->instrs_out;
 
@@ -1502,16 +1520,143 @@ static void stage4_emit_HInstrIfThenElse(RegAllocChunk* chunk, UInt depth,
 }
 
-static void stage4(RegAllocChunk* chunk, VRegState* vreg_state, UInt n_vregs,
-   RRegState* rreg_state, UInt depth, const RegAllocControl* con)
+/* Merges states of two vregs into the destination vreg:
+   |v1_idx|
+   |v2_idx| -> |vd_idx|.
+   Usually |v1_idx| == |v2_idx| == |vd_idx| so the merging happens between
+   different states but for the same vreg.
+   For phi node merging, |v1_idx| != |v2_idx| != |vd_idx|.
+   Note: |v1_idx| and |vd_idx| are indexes to |state1|, |v2_idx| to |state2|. */
+static void merge_vreg_states(RegAllocChunk* chunk,
+   RegAllocState* state1, RegAllocState* state2,
+   UInt v1_idx, UInt v2_idx, UInt vd_idx, HReg vregD,
+   UInt depth, const RegAllocControl* con)
 {
-   WALK_CHUNKS(stage4_chunk(chunk, vreg_state, n_vregs, rreg_state, depth, con),
-               stage4_emit_HInstrIfThenElse(chunk, depth, con),
-               stage4(chunk->IfThenElse.fallThrough, vreg_state, n_vregs,
-                      rreg_state, depth + 1, con),
-               stage4(chunk->IfThenElse.outOfLine, vreg_state, n_vregs,
-                      rreg_state, depth + 1, con),
-               ;);
+   RegAllocChunk* outOfLine = chunk->IfThenElse.outOfLine;
+   VRegState* v1_src_state = &state1->vregs[v1_idx];
+   VRegState* v2_src_state = &state2->vregs[v2_idx];
+   VRegState* v1_dst_state = &state1->vregs[vd_idx];
+   VRegState* v2_dst_state = &state2->vregs[vd_idx];
+
+   switch (v1_src_state->disp) {
+   case Unallocated:
+      vassert(v2_src_state->disp == Unallocated);
+      break;
+
+   case Assigned:
+      switch (v2_src_state->disp) {
+      case Unallocated:
+         vpanic("Logic error during register allocator state merge "
+                "(Assigned/Unallocated).");
+
+      case Assigned: {
+         /* Check if both vregs are assigned to the same rreg. */
+         HReg rreg1 = v1_src_state->rreg;
+         HReg rreg2 = v2_src_state->rreg;
+         if (! sameHReg(rreg1, rreg2)) {
+            /* Generate "move" from rreg2 to rreg1. */
+            HInstr* move = con->genMove(con->univ->regs[hregIndex(rreg2)],
+                              con->univ->regs[hregIndex(rreg1)], con->mode64);
+            vassert(move != NULL);
+            emit_instr(outOfLine, move, depth + 1, con, "move");
+         }
+
+         v1_src_state->disp = Unallocated;
+         v1_src_state->rreg = INVALID_HREG;
+         v2_src_state->disp = Unallocated;
+         v2_src_state->rreg = INVALID_HREG;
+         v1_dst_state->disp = Assigned;
+         v1_dst_state->rreg = rreg1;
+         v2_dst_state->disp = Assigned;
+         v2_dst_state->rreg = rreg1;
+
+         UInt r_idx = hregIndex(rreg1);
+         vassert(state1->rregs[r_idx].disp == Bound);
+         state1->rregs[r_idx].eq_spill_slot = False;
+         if (v1_idx != vd_idx) {
+            vassert(!hregIsInvalid(vregD));
+            state1->rregs[r_idx].vreg = vregD;
+         }
+         break;
+      }
+      case Spilled:
+         /* Generate reload. */
+         vpanic("Reload not implemented, yet.");
+         break;
+      default:
+         vassert(0);
+      }
+      break;
+
+   case Spilled:
+      switch (v2_src_state->disp) {
+      case Unallocated:
+         vpanic("Logic error during register allocator state merge "
+                " (Spilled/Unallocated).");
+      case Assigned:
+         /* Generate spill. */
+         vpanic("Spill not implemented, yet.");
+         break;
+      case Spilled:
+         /* Check if both vregs are spilled at the same spill slot.
+            Eventually reload vreg to a rreg and spill it again. */
+         if (v1_src_state->spill_offset != v2_src_state->spill_offset) {
+            /* Find a free rreg in |state1|, reload from v2_src_state->spill_slot,
+               spill to v1_dst_state->spill_slot. */
+            vpanic("Spilled/Spilled reload not implemented, yet.");
+         }
+      default:
+         vassert(0);
+      }
+
+   default:
+      vassert(0);
+   }
+}
+
+/* Merges |cloned| state from out-of-line leg back into the main |state|,
+   modified by fall-through leg since the legs fork. */
+static void stage4_merge_states(RegAllocChunk* chunk,
+   RegAllocState* state, RegAllocState* cloned,
+   UInt depth, const RegAllocControl* con)
+{
+   if (DEBUG_REGALLOC) {
+      print_state(chunk, state, chunk->next->ii_total_start, depth, con,
+                  "Before state merge: fall-through leg");
+      print_state(chunk, cloned, chunk->next->ii_total_start, depth, con,
+                  "Before state merge: out-of-line leg");
+   }
+
+   /* Process phi nodes first. */
+   for (UInt i = 0; i < chunk->IfThenElse.n_phis; i++) {
+      const HPhiNode* phi_node = &chunk->IfThenElse.phi_nodes[i];
+
+      merge_vreg_states(chunk, state, cloned,
+         hregIndex(phi_node->srcFallThrough), hregIndex(phi_node->srcOutOfLine),
+         hregIndex(phi_node->dst), phi_node->dst, depth, con);
+   }
+
+   /* Merge remaining vreg states. VRegs mentioned by phi nodes are processed
+      as well but merging is no-op for them now. */
+   for (UInt v_idx = 0; v_idx < state->n_vregs; v_idx++) {
+      merge_vreg_states(chunk, state, cloned, v_idx, v_idx, v_idx, INVALID_HREG,
+                        depth, con);
+   }
+
+   if (DEBUG_REGALLOC) {
+      print_state(chunk, state, chunk->next->ii_total_start, depth, con,
+                  "After state merge");
+   }
+}
+
+static void stage4(RegAllocChunk* chunk, RegAllocState* state,
+                   UInt depth, const RegAllocControl* con)
+{
+   WALK_CHUNKS(stage4_chunk(chunk, state, depth, con),
+               stage4_emit_HInstrIfThenElse(chunk, depth, con);
+               RegAllocState* cloned_state = clone_state(state),
+               stage4(chunk->IfThenElse.fallThrough, state, depth + 1, con),
+               stage4(chunk->IfThenElse.outOfLine, cloned_state, depth + 1, con),
+               stage4_merge_states(chunk, state, cloned_state, depth, con));
 }
 
 
@@ -1539,41 +1684,43 @@ HInstrSB* doRegisterAllocation(
   vassert((con->guest_sizeB % LibVEX_GUEST_STATE_ALIGN) == 0);
 
   /* The main register allocator state. */
-   UInt n_vregs = sb_in->n_vregs;
-   VRegState* vreg_state = NULL;
-   if (n_vregs > 0) {
-      vreg_state = LibVEX_Alloc_inline(n_vregs * sizeof(VRegState));
+   RegAllocState* state = LibVEX_Alloc_inline(sizeof(RegAllocState));
+   state->n_vregs = sb_in->n_vregs;
+   state->vregs = NULL;
+   if (state->n_vregs > 0) {
+      state->vregs = LibVEX_Alloc_inline(state->n_vregs * sizeof(VRegState));
   }
 
   /* If this is not so, the universe we have is nonsensical. */
-   UInt n_rregs = con->univ->allocable;
-   vassert(n_rregs > 0);
+   state->n_rregs = con->univ->allocable;
+   vassert(state->n_rregs > 0);
   STATIC_ASSERT(N_RREGUNIVERSE_REGS == 64);
 
   /* --- Stage 0. --- */
   /* Initialize the vreg state. It is initially global. --- */
-   for (UInt v_idx = 0; v_idx < n_vregs; v_idx++) {
-      vreg_state[v_idx].live_after   = INVALID_INSTRNO;
-      vreg_state[v_idx].dead_before  = INVALID_INSTRNO;
-      vreg_state[v_idx].reg_class    = HRcINVALID;
-      vreg_state[v_idx].disp         = Unallocated;
-      vreg_state[v_idx].rreg         = INVALID_HREG;
-      vreg_state[v_idx].spill_offset = 0;
+   for (UInt v_idx = 0; v_idx < state->n_vregs; v_idx++) {
+      state->vregs[v_idx].live_after   = INVALID_INSTRNO;
+      state->vregs[v_idx].dead_before  = INVALID_INSTRNO;
+      state->vregs[v_idx].reg_class    = HRcINVALID;
+      state->vregs[v_idx].disp         = Unallocated;
+      state->vregs[v_idx].rreg         = INVALID_HREG;
+      state->vregs[v_idx].spill_offset = 0;
   }
 
   /* Initialize redundant rreg -> vreg state. A snaphost is taken for
      every Out-Of-Line leg. */
-   RRegState* rreg_state = LibVEX_Alloc_inline(n_rregs * sizeof(RRegState));
-   for (UInt r_idx = 0; r_idx < n_rregs; r_idx++) {
-      rreg_state[r_idx].disp          = Free;
-      rreg_state[r_idx].vreg          = INVALID_HREG;
-      rreg_state[r_idx].eq_spill_slot = False;
+   state->rregs = LibVEX_Alloc_inline(state->n_rregs * sizeof(RRegState));
+   for (UInt r_idx = 0; r_idx < state->n_rregs; r_idx++) {
+      state->rregs[r_idx].disp          = Free;
+      state->rregs[r_idx].vreg          = INVALID_HREG;
+      state->rregs[r_idx].eq_spill_slot = False;
   }
 
   /* --- Stage 1. Determine total ordering of instructions and structure
      of HInstrIfThenElse. --- */
   RegAllocChunk* first_chunk;
-   UInt ii_total_last = stage1(sb_in->insns, 0, n_rregs, &first_chunk, con);
+   UInt ii_total_last = stage1(sb_in->insns, 0, state->n_rregs,
+                               &first_chunk, con);
 
   /* The live range numbers are signed shorts, and so limiting the
      number of instructions to 15000 comfortably guards against them
@@ -1581,22 +1728,22 @@ HInstrSB* doRegisterAllocation(
   vassert(ii_total_last <= 15000);
 
   /* --- Stage 2. Scan the incoming instructions. --- */
-   stage2(first_chunk, vreg_state, n_vregs, n_rregs, 0, con);
+   stage2(first_chunk, state, 0, con);
   if (DEBUG_REGALLOC) {
      vex_printf("\n\nInitial register allocator state:\n");
-      stage2_debug_vregs(vreg_state, n_vregs);
+      stage2_debug_vregs(state->vregs, state->n_vregs);
      stage2_debug_rregs(first_chunk, 0, con);
   }
 
   /* --- Stage 3. Allocate spill slots. --- */
-   stage3(vreg_state, n_vregs, con);
+   stage3(state->vregs, state->n_vregs, con);
 
   /* --- Stage 4. Process the instructions and allocate registers. --- */
-   stage4(first_chunk, vreg_state, n_vregs, rreg_state, 0, con);
+   stage4(first_chunk, state, 0, con);
 
   /* The output SB of instructions. */
   HInstrSB* sb_out = LibVEX_Alloc_inline(sizeof(HInstrSB));
-   sb_out->n_vregs = n_vregs;
+   sb_out->n_vregs = state->n_vregs;
   sb_out->insns   = first_chunk->instrs_out;
   return sb_out;
 }
From: Peter B. <be...@vn...> - 2017-09-12 21:28:54
On 9/12/17 12:51 PM, Ivo Raisr wrote:
> Are there any comments, suggestions, objections to the patch attached to bug:
> https://bugs.kde.org/show_bug.cgi?id=384584
> Callee saved registers listed first for AMD64, X86, and PPC architectures

My guess on why the caller-saved (aka volatile) regs are listed before the callee-saved (aka non-volatile) registers is that that is the order most register allocators in compilers (e.g., gcc) try to assign them. They attempt to use caller-saved regs for the majority of pseudos/vregs that are not live across a function call, since those regs do not need to be saved/restored in the prologue/epilogue (i.e., they're cheap to use), and it leaves the callee-saved regs available for pseudos/vregs that are live across calls, which means you don't have to spill them around calls.

Looking through host_generic_reg_alloc3.c, it doesn't seem like the VEX register allocator keeps track of vregs that are live across calls... or at least it doesn't seem to make use of that info in find_free_rreg() if it has it (I didn't check). Is that for simplicity reasons, or did it just not seem needed?

Peter
From: Ivo R. <iv...@iv...> - 2017-09-12 17:51:16
Dear developers,

Are there any comments, suggestions, objections to the patch attached to bug:
https://bugs.kde.org/show_bug.cgi?id=384584
Callee saved registers listed first for AMD64, X86, and PPC architectures

Let me know.
I.