From: Jojo R <rj...@gm...> - 2023-05-22 11:46:42
Hi,
Any feedback or suggestions about this RFC?
On 2023/4/21 17:25, Jojo R wrote:
>
> Hi,
>
> We are considering adding the RVV/Vector [1] feature to Valgrind, and
> there are some challenges.
> RVV follows a programming model like ARM's SVE [2]: it is
> scalable/VLA, meaning the vector length is agnostic (not fixed at
> compile time).
> ARM's SVE is not supported in Valgrind :(
>
> There are three major issues in implementing the RVV instruction set in
> Valgrind, as follows:
>
> 1. Scalable vector register width VLENB
> 2. Runtime changing property of LMUL and SEW
> 3. Lack of proper VEX IR to represent all vector operations
>
> We propose applicable methods to solve 1 and 2. As for 3, we explore
> several possible, though perhaps imperfect, approaches to handle different cases.
>
> We start with 1. As each guest register should be described in the
> VEXGuestState struct, the vector registers with the scalable width of
> VLENB can be added into VEXGuestState as arrays with an allowable
> maximum length such as 2048/4096.
>
> The actual available access range can be determined at Valgrind
> startup time by querying the CPU for its vector capability or some
> suitable setup steps.
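A minimal sketch of what such a guest-state layout might look like; the struct, field, and constant names below are illustrative assumptions, not Valgrind's actual definitions:

```c
/* Hypothetical sketch: vector registers held at a fixed maximum size,
 * with the actually usable width discovered once at startup (e.g. by
 * reading the vlenb CSR). Names are invented for illustration. */
#include <stddef.h>

#define MAX_VLENB_BYTES 512   /* allow up to VLEN = 4096 bits */
#define NUM_VREGS        32

typedef struct {
   /* ... scalar guest registers ... */
   unsigned char guest_vreg[NUM_VREGS][MAX_VLENB_BYTES];
   /* Usable width in bytes, filled in at Valgrind startup. */
   size_t host_vlenb;
} GuestStateSketch;

/* Only the first host_vlenb bytes of each array are meaningful;
 * clamp in case the host reports more than we reserved. */
static size_t usable_vreg_bytes(const GuestStateSketch *st)
{
   return st->host_vlenb <= MAX_VLENB_BYTES ? st->host_vlenb
                                            : MAX_VLENB_BYTES;
}
```

The arrays waste space when the host VLEN is small, but keep guest-state offsets compile-time constants, which is what the VEX machinery expects.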
>
>
> To solve problem 2, we are inspired by already-proven techniques in
> QEMU, where translation blocks are broken up when certain critical
> CSRs are set. Because the guest-code-to-IR translation relies on the
> precise values of LMUL/SEW, and they may change within a basic block, we
> can break up the basic block each time we encounter a vsetvl{i}
> instruction and return to the scheduler to execute the translated code
> and update LMUL/SEW. Accordingly, translation cache management should
> be refactored to detect changes of LMUL/SEW and invalidate the
> outdated code cache. Without loss of generality, the LMUL/SEW
> should be encoded into a ULong flag such that other architectures can
> leverage this flag to store their arch-dependent information. The
> TTEntry struct should also take the flag into account for both
> insertion and deletion. By doing this, the flag carries the newest
> LMUL/SEW throughout the simulation and can be passed to the disassembly
> functions via the VEXArchInfo struct, so that we can get the real
> and newest values of LMUL and SEW to facilitate our translation.
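The flag idea could be sketched as a pair of pack/unpack helpers plus a cache-lookup predicate; the bit layout here is an assumption for illustration, not an existing Valgrind encoding:

```c
/* Hypothetical encoding of the dynamic vtype fields into a 64-bit
 * flag that a translation-table entry could carry alongside the guest
 * address. The field layout is made up for illustration. */
#include <stdint.h>

#define FLAG_SEW_SHIFT   0   /* log2(SEW/8): 0..3 for SEW = 8..64  */
#define FLAG_LMUL_SHIFT  4   /* log2(LMUL):  0..3 for LMUL = 1..8  */

static uint64_t pack_vtype_flag(unsigned sew_log2, unsigned lmul_log2)
{
   return ((uint64_t)sew_log2  << FLAG_SEW_SHIFT)
        | ((uint64_t)lmul_log2 << FLAG_LMUL_SHIFT);
}

static unsigned flag_sew_log2(uint64_t flag)  { return (flag >> FLAG_SEW_SHIFT)  & 0xF; }
static unsigned flag_lmul_log2(uint64_t flag) { return (flag >> FLAG_LMUL_SHIFT) & 0xF; }

/* A cache lookup then matches on (guest address, flag), so code
 * translated under one LMUL/SEW is never reused under another. */
static int tt_entry_matches(uint64_t e_addr, uint64_t e_flag,
                            uint64_t addr,   uint64_t flag)
{
   return e_addr == addr && e_flag == flag;
}
```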
>
> Some architecture-specific code should be taken care of as well. For
> example, in the m_dispatch part, the disp_cp_xindir function looks up
> the code cache in hand-coded assembly by checking only the requested
> guest state IP and the translation cache entry address, with no further
> constraints. Many other modules should be checked to ensure that
> in-time updates of LMUL/SEW are instantly visible to the essential
> parts of Valgrind.
>
>
> The last remaining big issue is 3, for which we introduce some ad-hoc
> approaches. We summarize these approaches into three
> types, as follows:
>
> 1. Break down a vector instruction to scalar VEX IR ops.
> 2. Break down a vector instruction to fixed-length VEX IR ops.
> 3. Use dirty helpers to realize vector instructions.
>
> The very first method theoretically exists but is probably not
> applicable, as the number of IR ops explodes when a large VLENB is
> adopted. Imagine a configuration of VLENB=512, SEW=8, LMUL=8: the VL
> is 512 * 8 / 8 = 512, meaning that a single vector instruction turns
> into 512 scalar instructions, and each scalar instruction would be
> expanded to multiple IRs. To make things worse, tool
> instrumentation will insert more IRs between adjacent scalar IR ops.
> As a result, performance is likely to be slowed down a thousandfold
> when running a real-world application with lots of vector
> instructions. Therefore, the other two methods are more promising and
> we will discuss them below.
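The arithmetic behind that blow-up, taking VLENB as the vector length in bits as in the example above, is just the VLMAX formula from the V spec:

```c
/* VLMAX = VLEN * LMUL / SEW: the number of elements one vector
 * instruction processes, i.e. the number of scalar expansions
 * method 1 would need before any tool instrumentation is added. */
static unsigned vlmax(unsigned vlen_bits, unsigned lmul, unsigned sew_bits)
{
   return vlen_bits * lmul / sew_bits;
}
```

For the configuration in the text, `vlmax(512, 8, 8)` gives 512 scalar operations for a single guest instruction.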
>
> 2 and 3 are not mutually exclusive as we may choose a suitable method
> from them to implement a vector instruction regarding its concrete
> behavior. To explain these methods in detail, we present some
> instances to illustrate their pros and cons.
>
> In terms of method 2, we have real values of VLENB/LMUL/SEW. The
> simple case is VLENB <= 256 and LMUL=1, where many SIMD IR ops are
> available and can be directly applied to represent vector operations.
> However, even when VLENB is restricted to 128, the register group
> width still exceeds the maximum SIMD width of 256 supported by VEX IR
> if LMUL > 2. Hence, here are two variants of method 2 to deal with
> long vectors:
>
>
> *2.1* Add more SIMD IR ops, such as 1024/2048/4096-bit ones, and translate vector
> instructions at the granularity of VLENB. Accordingly, VLENB=4096 with
> LMUL=2 is fulfilled by two 4096-bit SIMD VEX IR ops.
>
> * *pros*: it encourages the VEX backend to generate more compact and
>     efficient SIMD code (maybe). In particular, it accommodates mask and
>     gather/scatter (indexed) instructions by delivering more
>     information in the IR itself.
> * *cons*: too many new IR ops need to be introduced in VEX, as each
>     op of a different length should implement its add/sub/mul variants.
>     New data types to denote long vectors are necessary too, causing
>     difficulties in both VEX backend register allocation and tool
>     instrumentation.
>
> *2.2* Break down long vectors into multiple repeated SIMD ops. For
> instance, a vadd.vv vector instruction with VLENB=256/LMUL=2/SEW=8 is
> composed of four operators of the Iop_Add8x16 type.
>
> * *pros:* less effort is required in register allocation and tool
>     instrumentation. The VEX frontend is able to notify the backend to
>     generate efficient vector instructions via existing Iops. It better
>     trades off the complexity of adding many long-vector IR ops against
>     the benefit of generating high-efficiency host code.
> * *cons:* it is hard to describe a mask operation, given that the mask
>     is pretty flexible (the least significant bit of each segment of
>     v0). Additionally, gather/scatter instructions may have similar
>     problems in appropriately dividing the index registers. Various
>     corner cases are left here, such as widening arithmetic
>     operations (widening SIMD IR ops are currently not compatible) and
>     the vstart CSR register. When using fixed-length IR ops to compose a
>     vector instruction, we inevitably have to tell each IR op at which
>     position, encoded in vstart, it may start to process the data. We
>     can use vstart as a normal guest state virtual register to
>     calculate each op's start position as a guard IRExpr, or obtain the
>     value of vstart as we do for LMUL/SEW. Nevertheless, it is
>     non-trivial to decompose a vector instruction concisely.
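As a sketch of variant 2.2, the vadd.vv example above (VLENB=256 bits, LMUL=2, SEW=8) gives a 512-bit register group, which decomposes into four 128-bit additions. Plain C stands in for the VEX IR here; the function name is invented:

```c
/* Model of splitting one long vector add into repeated 16-byte
 * (128-bit) chunks, the way Iop_Add8x16 would be emitted four times
 * for a 512-bit register group. */
#include <stdint.h>
#include <stddef.h>

#define CHUNK_BYTES 16

static void vadd8_chunked(uint8_t *dst, const uint8_t *a,
                          const uint8_t *b, size_t group_bytes)
{
   for (size_t chunk = 0; chunk < group_bytes / CHUNK_BYTES; chunk++) {
      size_t base = chunk * CHUNK_BYTES;
      /* Each inner loop corresponds to one Iop_Add8x16. */
      for (size_t i = 0; i < CHUNK_BYTES; i++)
         dst[base + i] = (uint8_t)(a[base + i] + b[base + i]);
   }
}
```

For element-wise adds the chunking is trivial; the cons above arise because mask bits, index registers, and vstart all cut across these chunk boundaries.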
>
> In short, both 2.1 and 2.2 confront a dilemma between reducing the
> engineering effort of refactoring Valgrind elegantly and implementing the
> vector instruction set efficiently. The same obstacles exist for ARM SVE,
> as it is also a scalable vector instruction set and flexible in many ways.
>
> The final solution is the dirty helper. It is undoubtedly practical
> and requires possibly the least engineering effort for dealing with so
> many details in Valgrind. In this design, each instruction is
> completed using inline assembly that runs the same instruction on the
> host. Moreover, tool instrumentation already handles IRDirty, except
> that new fields should be added to the _IRDirty struct to indicate
> strided/indexed/masked memory accesses and arithmetic operations.
>
> * *pros:* it supports all instructions without the bother of building
>     complicated IR expressions and statements. It executes vector
>     instructions on the host CPU, gaining some acceleration.
>     Besides, we do not need to extend the VEX backend to translate new
>     IRs into vector instructions.
> * *cons:* the dirty helper always keeps its operations in a black box,
>     so tools can never see what happens inside it. For
>     Memcheck, the bit-precision merit is lost once it meets a dirty
>     helper, as the V-bit propagation chain then adopts a pretty coarse
>     determination strategy. On the other hand, implementing the entire
>     ISA extension in dirty helpers is also not elegant.
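A dirty helper for vadd.vv might look roughly like the following. On a real RISC-V host the body would be inline assembly (vsetvli plus vadd.vv); a scalar loop stands in here so the sketch is self-contained, and the name and signature are invented for illustration:

```c
/* Hypothetical dirty helper: called from generated code with pointers
 * into the guest state's vector register arrays. The scalar loop only
 * models the effect of re-executing vadd.vv on the host. */
#include <stdint.h>
#include <stddef.h>

static void dirtyhelper_vadd_vv(uint8_t *vd, const uint8_t *vs2,
                                const uint8_t *vs1, size_t vl_bytes)
{
   for (size_t i = 0; i < vl_bytes; i++)
      vd[i] = (uint8_t)(vs2[i] + vs1[i]);
   /* From a tool's point of view this whole function is opaque: it
    * only sees the guest-state ranges declared as read/written in the
    * IRDirty annotation, not the per-bit data flow inside. */
}
```

This opacity is exactly the Memcheck precision loss described above: V-bits for `vd` must be approximated from the declared effects rather than propagated per operation.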
>
> In summary, we are still far from a truly applicable solution for adding
> vector extensions to Valgrind. We need to do detailed and
> comprehensive estimations on the different vector instruction categories.
>
> Any feedback is also welcome on GitHub [3].
>
>
> [1] https://github.com/riscv/riscv-v-spec
>
> [2]
> https://community.arm.com/arm-research/b/articles/posts/the-arm-scalable-vector-extension-sve
>
> [3] https://github.com/petrpavlu/valgrind-riscv64/issues/17
>
>
> Thanks.
>
> Jojo
>
>
>
> _______________________________________________
> Valgrind-developers mailing list
> Val...@li...
> https://lists.sourceforge.net/lists/listinfo/valgrind-developers |