From: Julian S. <js...@ac...> - 2017-02-09 18:01:33
|
Hi Rashawn, Thank you for the offer of adding AVX-512 support, and sorry for the slow response. Some of the Valgrind developers discussed this briefly at Fosdem in Brussels last weekend and there was general agreement that this would be a good thing to do. I would be happy to be a point of contact for technical and process assistance. I have both technical and process comments regarding your proposal. >From a process point of view: * This is likely to take several months and may involve more than one round of review and iteration. That's based on experience from other large chunks of instruction-set development work. * As an example, have a look the following 5 bugs, which show a staged approach to implementation of the recent POWER ISA 3.0 extensions: https://bugs.kde.org/show_bug.cgi?id=359767 https://bugs.kde.org/show_bug.cgi?id=361207 https://bugs.kde.org/show_bug.cgi?id=362329 https://bugs.kde.org/show_bug.cgi?id=363858 https://bugs.kde.org/show_bug.cgi?id=364948 * Patches should go on the bug tracker, as per the examples above, and will be reviewed there. * All contributions to the tree need to be licensed "GNU GPL 2 or later". Are you OK with that? GPL 2-only is not possible. * There is a general, although largely unstated, expectation that parties who contribute large chunks of code continue afterwards to provide at least some minimal level of support/bugfixing, especially around release-time. We've had problems in the past with large bits of the code going into the tree and the developers later simply disappearing, and would prefer to avoid that in future. Would you be able to provide that level of support going forward? * Similarly, there is an expectation that you have some machine which can run nightly tests (from our framework) and send results to the valgrind-testresults mailing list. Since none of the developers (AFAIK) have AVX512 capable hardware, we have no other way to know whether the support is working. * VEX is basically a mini-compiler for basic blocks. Not essential, but it will help if your developer(s) have a bit of basic background in compiler internals. Regarding your proposed implementation steps, they sound plausible. However: * You need a step zero, which is to extend Valgrind's HW capabilities detection (coregrind/m_machine.c) to detect AVX512 support and tell VEX about it. That has to happen before any insns get implemented. * Also, you will need to extend the implementation of XSAVE and XRSTOR to cover the new register state. Given the inflexibility of VEX's IR (intermediate representation), the current AVX2-level XSAVE and XRSTOR was difficult to implement and is hard to understand, so this is likely to be a challenge. I suggest you deal with it sooner rather than later, since we've found that runtime libraries rely on XSAVE and XRSTOR and so you won't be able to run any real code with AVX512 until those two are working. * I assume (although you didn't say this) that you are doing this for the 64-bit instruction set only. Our 32 bit insn set support is essentially legacy, having stopped at SSSE3, and doesn't have a proper prefix decoder in the same way that the 64 bit front end does. * Write test cases for the insns first, and make sure they are comprehensive enough and work well. This reduces the general stress and difficulty of implementing the instructions. Bear in mind that incorrect instruction emulation can corrupt program state in a way that isn't apparent until hundreds of millions of instructions later, by which time it is impossible to figure out what went wrong. So a good test suite is essential. See for example none/tests/amd64/avx2-1.c and many others in the same directory. * Some of the existing AVX256 insn implementations are less than ideal, in the sense that they generate very verbose IR that performs operations a lane at a time, rather than as a vector as a whole. That gives rise to problems like https://bugs.kde.org/show_bug.cgi?id=375839 The practical consequence is that (often) you won't be able to just implement a 512-bit variant of an existing 256-bit insn by doubling up the IR -- we'll have to do something better (wider and shallower) here. * If -- as seems likely -- you need to add new IROps to facilitate this support, then you will also need to add support for them in memcheck/mc_translate.c. * Since you are adding register state, you'll need to futz with memcheck/mc_machine.c too. * You will need to be careful to ensure that the back end provides SIMD integer support capable of supporting Memcheck's instrumentation of the front end's SIMD FP IR. Without that, you'll wind up in a situation where you can run AVX512 code with the 'none' tool but not with 'memcheck'. This is an arcane but important detail. We can come back to it later. J |