From: Petr P. <pet...@da...> - 2023-10-31 11:38:25
On 25. Sep 23 10:14, Wu, Fei wrote:
> On 8/21/2023 4:57 PM, Wu, Fei wrote:
> > On 5/26/2023 10:08 PM, Wu, Fei wrote:
> >> On 5/26/2023 9:59 PM, Fei Wu wrote:
> >>> I'm from the Intel RISC-V team, working on a RISC-V International
> >>> development partner project to add RISC-V vector (RVV) support to
> >>> Valgrind; the target tool is memcheck. My work is based on commit
> >>> 71272b252977 of Petr's riscv64-linux branch -- many thanks to Petr
> >>> for his great work.
> >>> https://github.com/petrpavlu/valgrind-riscv64
> >>>
> > A vector-IR version has been added to this repo (branch poc-rvv):
> > https://github.com/intel/valgrind-rvv/commits/poc-rvv
> >
> > A number of RVV instructions have been enabled, and I plan to enable
> > more in order to run some general benchmarks; coremark is the first one.
> >
> I have enabled all the RVV instructions except the floating-point and
> fixed-point ones, so I think it's ready for a try and an initial review.
> https://github.com/intel/valgrind-rvv/commits/poc-rvv
>
> The design of this patch series:
> * All the RVV instructions have a corresponding vector IR, except
>   load/store, which still uses the split-to-scalar pattern according to
>   the mask.
> * Most instructions where each element is calculated independently, such
>   as vadd, are implemented as (read the original value, compute the
>   result as in the unmasked version, apply the mask), so it is not
>   necessary to define both a masked and an unmasked version of the same
>   vector IR. Other instructions, such as vredsum, treat the masked and
>   unmasked forms differently.
> * Besides `vl', which is encoded in the vector op/type, some other
>   information such as vlmax needs to be passed from the frontend to the
>   backend; we have a basic method to do that.
>
> I have done only very limited testing on the code, usually a basic
> correctness test per instruction. The memcheck logic for some
> instructions still needs polishing, so it might not find all the bugs
> memcheck is supposed to identify, but that shouldn't stop us from trying
> it against RVV binaries and finding the bugs unrelated to RVV
> instructions.
>
> @Petr, could you please take a look?

As previously touched upon in another thread [1], I'm not entirely
convinced by the idea of keeping multiple translations of the same code,
one per active vector length (vl). The main issue is that such multiple
translations require spending more cycles on translation and put pressure
on memory/caches. Some chaining opportunities may also be lost when it is
not possible to statically determine which translation to select. Most
vector operations shouldn't need to be aware of the current vl setting
because they can operate on entire vectors. Loads/stores are one
exception; they need to avoid reading/writing memory past the limit
determined by the value of vl.

Overall, the approach looks somewhat RVV-centric to me. SVE, another
scalable vector architecture, is designed around predicate masks and in
general doesn't have a vl value.

The RVV prototype adds new IRTypes called Ity_VLen1 (for masks) and
Ity_VLen8, Ity_VLen16, Ity_VLen32, Ity_VLen64 (for data), and uses the
upper 16 bits of an IRType instance to record the current vl. New IROps
named Iop_V* are added, and similarly vl is stored in the upper 16 bits
of an IROp. For instance, some of the new addition ops are Iop_VAdd_vv_8,
Iop_VAdd_vv_16, Iop_VAdd_vv_32, Iop_VAdd_vv_64.
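As a side note, mostly to check my understanding of that encoding: packing
vl into the upper 16 bits presumably boils down to something like the
standalone sketch below. The helper names and the concrete Ity_VLen8 value
are made up here for illustration; only the general layout (base type in
the low 16 bits, current vl in the high 16 bits) is taken from the
description above.

  /* Standalone sketch, not actual poc-rvv code. */
  #include <stdio.h>

  typedef unsigned int UInt;     /* stands in for VEX's UInt              */
  typedef UInt IRType;           /* stock IRType values fit in 16 bits    */

  #define Ity_VLen8 0x1d00u      /* hypothetical base-type value          */

  /* Record the active vl in the upper 16 bits of a vector IRType. */
  static IRType vlenType(IRType base, UInt vl)
  {
     return (vl << 16) | (base & 0xffffu);
  }

  /* Recover the base type and the vl from an encoded IRType. */
  static IRType vlenTypeBase(IRType ty) { return ty & 0xffffu; }
  static UInt   vlenTypeVL(IRType ty)   { return ty >> 16; }

  int main(void)
  {
     IRType ty = vlenType(Ity_VLen8, 16);   /* e.g. vl = 16 elements */
     printf("base=0x%x vl=%u\n", vlenTypeBase(ty), vlenTypeVL(ty));
     return 0;
  }

The same packing would then apply to IROp values such as Iop_VAdd_vv_8.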
My SVE prototype takes an alternative approach, inspired by GCC and LLVM.
The design derives from the "< vscale x <#elements> x <elementtype> >"
LLVM syntax [2] and the corresponding "nxv<#elements><elementtype>"
machine value types [3]. The "vscale"/"n" in this compiler model is
constant and at runtime equal to VLEN/64. The prototype is available in
the valgrind-riscv64 repository on GitHub, branch poc-sve [4]. The code
introduces two new IRTypes: Ity_V8xN, representing "an 8-bit x N scalable
vector predicate", and Ity_V64xN, representing "a 64-bit x N scalable
vector data". New IROps are added too. For example, the addition ops are
Iop_Add8x8xN, Iop_Add16x4xN, Iop_Add32x2xN, Iop_Add64x1xN.

Note that Valgrind orders the notation of vector operations differently.
Valgrind uses "< <elementtype> x <#elements> >" for fixed vectors,
compared to the more common "< <#elements> x <elementtype> >". My
prototype follows the reversed notation used by Valgrind, meaning
"< <elementtype> x <#elements> x <vscale/N> >".

The "N" dimension is critical. In contrast to compilers, Valgrind has the
luxury of knowing its actual value. For SVE, the value would naively be
equal to VLEN/64 and constant throughout the runtime of a client program.
However, Armv9.2-A introduced the Scalable Matrix Extension [5] along
with the so-called Streaming SVE mode. This operating mode can have an
alternative vector size, and userspace programs may switch between the
two modes as needed. This requires tracking the current scalable vector
register size, and therefore the "N" value, at a finer granularity. It
makes me think "N" should be attached to, and be constant within, each
IRSB.

RVV with its LMUL grouping is similar in that it results in a different
effective vector size. Following the LLVM schema, LMUL could be modeled
through the "<#elements>" dimension. For instance, Iop_Add8x16xN would be
an 8-bit addition for LMUL=2. However, this would entail adding too many
new IROps, which is not desirable. A better option in the Valgrind case
looks to be for LMUL to affect "N" too. Its current value would be
tracked and kept constant per IRSB, the same as in the SVE case.

[1] https://sourceforge.net/p/valgrind/mailman/valgrind-developers/thread/zsjnggi3ngqkjmbsojjnvjcqtlr5crplev3inb55jlwxnr6tjr%40s5a65qvm3kjc/#msg37872645
[2] https://llvm.org/docs/LangRef.html#t-vector
[3] https://github.com/llvm/llvm-project/blob/llvmorg-17.0.0/llvm/lib/Target/RISCV/RISCVRegisterInfo.td#L266
[4] https://github.com/petrpavlu/valgrind-riscv64/tree/poc-sve
[5] https://community.arm.com/arm-community-blogs/b/architectures-and-processors-blog/posts/scalable-matrix-extension-armv9-a-architecture

-- 
Cheers,
Petr