From: LATHUILIERE B. <bru...@ed...> - 2023-06-01 11:29:19
-------- Original message --------
Subject: Re: [Valgrind-developers] RFC: support scalable vector model / riscv vector
Date: 2023-05-29 05:29
From: "Wu, Fei" <fe...@in...>
To: Petr Pavlu <pet...@da...>, Jojo R <rj...@gm...>
Cc: pa...@so..., yun...@al..., val...@li..., val...@li..., zha...@al...

> On 5/28/2023 1:06 AM, Petr Pavlu wrote:
>> On 21. Apr 23 17:25, Jojo R wrote:
>>> The last remaining big issue is 3, for which we introduce some ad-hoc
>>> approaches. We summarize these approaches into the three following types:
>>>
>>> 1. Break down a vector instruction to scalar VEX IR ops.
>>> 2. Break down a vector instruction to fixed-length VEX IR ops.
>>> 3. Use dirty helpers to realize vector instructions.
>>
>> I would also look at adding new VEX IR ops for scalable vector
>> instructions. In particular, if it could be shown that RVV and SVE can
>> use the same new ops, then it could make a good argument for adding them.
>>
>> Perhaps interesting is whether such new scalable vector ops could also
>> represent fixed operations on other architectures, but that is just me
>> thinking out loud.
>>
> It is a good idea to consolidate all vector/simd handling together; the
> challenge is to verify its feasibility and to speed up the adaptation,
> as it is expected to take more effort and a longer time. Is there anyone
> with knowledge or experience of other ISAs such as AVX/SVE on Valgrind
> who can share the pain and gain, or could we do a quick prototype?
>
> Thanks,
> Fei.

Hi,

I don't know if my experience is the one you expect; nevertheless I will try to share it. I am the main developer of a Valgrind tool called verrou (https://github.com/edf-hpc/verrou), which currently only works on the x86_64 architecture.

From the user's point of view, verrou makes it possible to estimate the effect of floating-point rounding-error propagation (if you are interested in the subject, documentation and publications are available).

From the tool developer's point of view, we need to replace all floating-point operations (fpo) by our own modified fpo, implemented as C++ functions. Each C++ function takes 1, 2 or 3 floating-point input values and produces one floating-point output value. As we have to replace all VEX fpo, the way SSE and AVX are handled has consequences for us. For each kind of fpo, (add, sub, mul, div, sqrt) x (float, double), we have to replace the VEX op for the following variants: scalar, SSE low lane, SSE, AVX. It is painful but possible via code generation. Thanks to the multiple VEX ops it is possible to select only one type of instruction, which is useful 1) to get a speed-up and 2) to know whether the floating-point errors come from scalar or vector instructions. On the other hand, for fma operations, (madd, msub) x (float, double), we have less work to do, as Valgrind does the un-vectorisation for us, but then it is impossible to instrument scalar and vector ops selectively.

One could think that the multiple VEX ops would allow performance improvements by vectorising the C++ calls, but this is currently not possible (at least to my knowledge). Indeed, with the Valgrind API I don't know how to get the floating-point values out of a register without un-vectorising: to get the values of an AVX register, I generate an awful sequence of Iop_V256to64_0, Iop_V256to64_1, Iop_V256to64_2, Iop_V256to64_3 for each of the 2 arguments.
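To give an idea of what I mean, here is a minimal, simplified sketch of that peeling (not the exact verrou code):

#include "pub_tool_basics.h"
#include "pub_tool_tooliface.h"   /* IRSB, IRExpr, newIRTemp, ... */

/* Peel one V256 expression into its four 64-bit lanes.  The
 * instrumentation also has to recognise the whole family of ops for
 * each operation, e.g. for the double add: Iop_AddF64 (scalar),
 * Iop_Add64F0x2 (SSE low lane), Iop_Add64Fx2 (SSE), Iop_Add64Fx4 (AVX). */
static void peelV256 ( IRSB* sb, IRExpr* vec, IRExpr* lane[4] )
{
   const IROp toLane[4] = { Iop_V256to64_0, Iop_V256to64_1,
                            Iop_V256to64_2, Iop_V256to64_3 };
   Int i;
   for (i = 0; i < 4; i++) {
      IRTemp t = newIRTemp(sb->tyenv, Ity_I64);
      addStmtToIRSB(sb, IRStmt_WrTmp(t, IRExpr_Unop(toLane[i], vec)));
      lane[i] = IRExpr_RdTmp(t);
   }
}

/* For a binary AVX op (e.g. Iop_Add64Fx4) both operands get the same
 * treatment, so 8 scalar pieces have to reach the replacement code. */
static void peelAvxBinopArgs ( IRSB* sb, IRExpr* arg1, IRExpr* arg2,
                               IRExpr* lanes1[4], IRExpr* lanes2[4] )
{
   peelV256(sb, arg1, lanes1);
   peelV256(sb, arg2, lanes2);
}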
As it is not possible to make an IRStmt_Dirty call to a function with 9 arguments (9 = 2*4 + 1: 2 operands of a binary operation times 4 for the vector length, plus 1 for the result), I do a first call to copy the 4 values of the first argument somewhere, and then a second call to perform the 4 C++ calls. Due to the algorithm inside the C++ calls it could be tricky to vectorise, but I did not even try because of the sequence of Iop_V256to64_*. In my dreams I would like an Iop_ that converts a V256 or V128 value into an aligned pointer to the floating-point arguments.

So, I don't know if my experience can be useful for you, but if someone has a better solution to my needs it will be useful at least ... to me :)

Best regards,
Bruno Lathuilière
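P.S. To show the shape of the two-call workaround, here is a rough sketch. The helper names and buffers (stash_arg1, apply_binop64x4, arg1_buf, res_buf) are made up for the example and are not the real verrou code, and lanes1/lanes2 stand for the I64 lane expressions obtained as in the first sketch:

#include "pub_tool_basics.h"
#include "pub_tool_tooliface.h"
#include "pub_tool_machine.h"     /* VG_(fnptr_to_fnentry) */

static ULong arg1_buf[4];
static ULong res_buf[4];

/* First call: park the lanes of the first operand in a static buffer. */
static void stash_arg1 ( ULong a0, ULong a1, ULong a2, ULong a3 )
{
   arg1_buf[0] = a0; arg1_buf[1] = a1;
   arg1_buf[2] = a2; arg1_buf[3] = a3;
}

/* Second call: reinterpret arg1_buf[i] and b_i as doubles, perform the
 * 4 modified operations and store them into ((ULong*)resP)[i] (elided). */
static void apply_binop64x4 ( ULong b0, ULong b1, ULong b2, ULong b3,
                              HWord resP )
{
}

static void emitTwoCalls ( IRSB* sb, IRExpr* lanes1[4], IRExpr* lanes2[4] )
{
   IRDirty* d1 = unsafeIRDirty_0_N(
      0, "stash_arg1", VG_(fnptr_to_fnentry)( &stash_arg1 ),
      mkIRExprVec_4(lanes1[0], lanes1[1], lanes1[2], lanes1[3]) );
   IRDirty* d2 = unsafeIRDirty_0_N(
      0, "apply_binop64x4", VG_(fnptr_to_fnentry)( &apply_binop64x4 ),
      mkIRExprVec_5(lanes2[0], lanes2[1], lanes2[2], lanes2[3],
                    mkIRExpr_HWord( (HWord)&res_buf[0] )) );
   /* Memory-effect annotations (d->mFx, d->mAddr, d->mSize) and the
      reassembly of res_buf into a V256 (loads, e.g. Iop_64x4toV256)
      are left out of this sketch. */
   addStmtToIRSB(sb, IRStmt_Dirty(d1));
   addStmtToIRSB(sb, IRStmt_Dirty(d2));
}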