Re: [Valgrind-developers] RFC: support scalable vector model / riscv vector

SourceForge Headquarters 1320 Columbia Street Suite 310 San Diego, CA 92101 +1 (858) 422-6466

-----Message d'origine-----
De : fe...@in... <fe...@in...> 
Envoyé : lundi 5 juin 2023 03:24
À : LATHUILIERE Bruno <bru...@ed...>; val...@li...; val...@li...; Petr Pavlu <pet...@da...>; Jojo R <rj...@gm...>; pa...@so...; yun...@al...; zha...@al...
Objet : Re: [Valgrind-developers] RFC: support scalable vector model / riscv vector

>On 6/1/2023 7:13 PM, LATHUILIERE Bruno via Valgrind-developers wrote:
>> 
>> -------- Courriel original --------
>> Objet: Re: [Valgrind-developers] RFC: support scalable vector model / 
>> riscv vector
>> Date: 2023-05-29 05:29
>> De: "Wu, Fei" <fe...@in...>
>> À: Petr Pavlu <pet...@da...>, Jojo R <rj...@gm...>

>> From valgrind tool developer's point of view, we need to replace all floating-point operations (fpo) by our own modified fpo implemented with C++ functions. One C++ function has 1,2 or 3 floating point input values and one floating point output value. 
> 
>Do you use libvex_BackEnd() to translate the insn to host, e.g.
>host_riscv64_isel.c to select the host insn, Is there any difference of processing flow between verrou and memcheck?

I do not use (at least directly) function defined in host_*_isel.c. The fact Verrou is not yet portable to other architecture comes from two reasons :
- In the C++ call we use intrinsics  for the fma.  
- We need to compile with --enable-only64bit. As I do not need to use verrou on 32bit architecture, I postpone the problem.

>> As we have to replace all VEX fpo, the way we handle with SSE and AVX has consequences for us. For each kind of fpo (add,sub,mul,div,sqrt)x(float,double), we have to replace VEX op for the following variants : scalar, SSE low lane, SSE, AVX. It is painful but possible via code generation. Thanks to the multiple VEX ops it is possible to select only one type of instruction (it can be useful to 1- get speed up, 2- know if floating point errors come from scalar or vector instructions).
>> 
>> On the other hand, for fma operations (madd,msub)x(float,double) we have less work to do, as valgrind do the un-vectorisation for us, but it is impossible to instrument selectively scalar or vector ops.

>As these insns are un-vectorised, are there any other issues besides the
>1 (performance) & 2 (original type) mentioned above? I want to make sure if there is any risk of the un-vectorisation design, e.g. when the vector length is large such as 2k vlen on rvv.
As a user of valgrind framework (ie tool developer), I ve no idea about this kind of limitation. To be able to develop a valgrind tool without strong architecture knowledge is a strength of valgrind framework.  

>> We could think that the multiple VEX ops enable performance improvements via the vectorisation of C++ call, but it is not now possible (at least to my knowledge). Indeed, with the valgrind API I don't know how I can get the floating-point values in the register without applying un-vectorisation : To get the values in the AVX register, I do an awful sequence of Iop_V256to64_0, Iop_V256to64_1, Iop_V256to64_2, Iop_V256to64_3 for the 2 arguments. As it is not possible to do a IRStmt_Dirty call with a function with 9 args (9=2*4+1  2 for a binary operation, 4 for the vector length and 1 for the result), I do a first call to copy the 4 values of the first arg somewhere then a second one to perform the 4 C++ calls.
>> Due to the algorithm inside the C++ calls it could be tricky to vectorise, but I even didn't try because of the sequence of Iop_V256to64_*.

>For memcheck, the process is as follows if we put it simple:
>   toIR -> instrumentation -> Backend isel

With my understanding the tool memcheck do only the instrumentation stage, and toIR and backend isel stages are done by the valgrind framework.

>If the vector insn is split into scalar at the stage of toIR just as I did in this series, the advantage looks obvious as I only need to deal with this single stage and leverage the existing code to handle the scalar >version, the disadvantage is that it might lose some opportunities to optimize, e.g.
>* toIR - introduce extra temp variables for generated scalars
>* instrumentation - for memcheck, the key is to trace the V+A bits instead of the real results of the ops, the ideal case is V+A of the whole vector can be checked together w/o breaking it to scalars
You pinpoint the main difference between verrou and memcheck. The verrou instrumentation can not be seen as a trace generation : indeed we modifiy the floating point behaviour.

>* Backend isel - the ideal case is to use the vector insn on host for guest vector insn, but I'm not sure how much effort will be taken to achieve this.

> 
>Thank you again for this sharing. I hope the discussion can help both of us, and others.
I hope so.

>
>Best regards,
>Fei.

Best regards,
Bruno Lathuilière

Ce message et toutes les pièces jointes (ci-après le 'Message') sont établis à l'intention exclusive des destinataires et les informations qui y figurent sont strictement confidentielles. Toute utilisation de ce Message non conforme à sa destination, toute diffusion ou toute publication totale ou partielle, est interdite sauf autorisation expresse.

Si vous n'êtes pas le destinataire de ce Message, il vous est interdit de le copier, de le faire suivre, de le divulguer ou d'en utiliser tout ou partie. Si vous avez reçu ce Message par erreur, merci de le supprimer de votre système, ainsi que toutes ses copies, et de n'en garder aucune trace sur quelque support que ce soit. Nous vous remercions également d'en avertir immédiatement l'expéditeur par retour du message.

Il est impossible de garantir que les communications par messagerie électronique arrivent en temps utile, sont sécurisées ou dénuées de toute erreur ou virus.
____________________________________________________

This message and any attachments (the 'Message') are intended solely for the addressees. The information contained in this Message is confidential. Any use of information contained in this Message not in accord with its purpose, any dissemination or disclosure, either whole or partial, is prohibited except formal approval.

If you are not the addressee, you may not copy, forward, disclose or use any part of it. If you have received this message in error, please delete it and all copies from your system and notify the sender immediately by return message.

E-mail communication cannot be guaranteed to be timely secure, error or virus-free.