From: Mark W. <ma...@kl...> - 2020-10-30 11:38:19
|
Hi valgrind hackers,

We are working on supporting the arm64 fmadd/sub instructions for 32bit
floats and 64bit doubles using Iop_MAdd/SubF32/F64. See

https://bugs.kde.org/show_bug.cgi?id=426014

One embarrassing bug we introduced was mixing up the order of Addend
(A) and Product (NxM), causing hilarious results.

We were wondering in which order other arches (ppc, s390 and mips) were
storing the original arguments for the Iop_MAdd/SubF32/F64. We are
currently storing them as RoundingMode, A, N, M.

Is that the same as other arches are using those Iops? Is that
documented somewhere we missed? And are there any helpers that we could
have used to get/keep this correct?

Thanks,

Mark
|
From: Mark W. <ma...@kl...> - 2020-11-05 17:51:59
|
Hi valgrind hackers,

On Fri, 2020-10-30 at 12:37 +0100, Mark Wielaard wrote:
> We are working on supporting the arm64 fmadd/sub instructions for 32bit
> floats and 64bit doubles using Iop_MAdd/SubF32/F64. See
>
> https://bugs.kde.org/show_bug.cgi?id=426014
>
> One embarrassing bug we introduced was mixing up the order of Addend
> (A) and Product (NxM), causing hilarious results.
>
> We were wondering in which order other arches (ppc, s390 and mips) were
> storing the original arguments for the Iop_MAdd/SubF32/F64. We are
> currently storing them as RoundingMode, A, N, M.
>
> Is that the same as other arches are using those Iops? Is that
> documented somewhere we missed? And are there any helpers that we could
> have used to get/keep this correct?

On irc Julian pointed out that I had overlooked the definition in
libvex_ir.h:

      /* Fused multiply-add/sub */
      /* :: IRRoundingMode(I32) x F32 x F32 x F32 -> F32
            (computes arg2 * arg3 +/- arg4) */
      Iop_MAddF32, Iop_MSubF32,

      /* Ternary operations, with rounding. */

      /* Fused multiply-add/sub, with 112-bit intermediate
         precision for ppc.
         Also used to implement fused multiply-add/sub for s390. */
      /* :: IRRoundingMode(I32) x F64 x F64 x F64 -> F64
            (computes arg2 * arg3 +/- arg4) */
      Iop_MAddF64, Iop_MSubF64,

That settles it. Sorry for overlooking the obvious place to look.
We will change the argument order to follow the above definition.

Cheers,

Mark
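
To make the ordering issue concrete, here is a minimal standalone sketch in plain C. It does not use any VEX API: madd_f64_model, msub_f64_model and madd_from_anm are hypothetical names invented for illustration, and the model only mirrors the "arg2 * arg3 +/- arg4" wording of the libvex_ir.h comment. It ignores the rounding mode and does not reproduce the single rounding of a real fused operation.

```c
#include <assert.h>

/* Hypothetical model of the documented semantics:
   arg1 is the rounding mode (ignored here),
   result is arg2 * arg3 + arg4. Not fused, ordering only. */
static double madd_f64_model(int rm, double arg2, double arg3, double arg4)
{
    (void)rm;
    return arg2 * arg3 + arg4;
}

/* Same model for the subtract variant: arg2 * arg3 - arg4. */
static double msub_f64_model(int rm, double arg2, double arg3, double arg4)
{
    (void)rm;
    return arg2 * arg3 - arg4;
}

/* Hypothetical front-end helper: takes operands in the order
   (A, N, M) discussed in this thread and reorders them so the
   product operands land in arg2/arg3 and the addend in arg4. */
static double madd_from_anm(int rm, double a, double n, double m)
{
    return madd_f64_model(rm, n, m, a);
}

/* Small self-check showing the effect of the mix-up. */
static void check_argument_order(void)
{
    /* Intended: N * M + A = 2 * 3 + 10 */
    assert(madd_from_anm(0, 10.0, 2.0, 3.0) == 16.0);
    /* Passing (A, N, M) straight through computes A * N + M instead. */
    assert(madd_f64_model(0, 10.0, 2.0, 3.0) == 23.0);
}
```

With A = 10, N = 2, M = 3 the reordered call gives 16, while passing the operands through unchanged gives 23, which is the kind of wrong result described in the first message.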