QuadRay-engine
Realtime raytracer using SIMD on ARM, MIPS, PPC and x86
...The efficient use of SIMD is achieved
by processing four rays at a time to match SIMD register width (hence the name).
The rendering core of the engine is written in a unified SIMD assembler,
allowing a single assembler code base to run on different processor
architectures and reducing the need to maintain multiple parallel versions.
At present, Intel SSE/SSE2/SSE4 and AVX/AVX2/AVX-512 (32/64-bit x86 ISAs),
ARMv7 NEON/NEONv2, ARMv8 AArch32 and AArch64 NEON, SVE (32/64-bit ARM ISAs),
MIPS 32/64-bit r5/r6 MSA and POWER 32/64-bit VMX/VSX (little/big-endian ISAs)
are mostly implemented (with horizontal reductions). Scalar improvements and
wider SIMD vectors with zeroing/merging predicates in 3/4-operand instructions
are planned as extensions to the current 2/3-operand SPMD-driven vertical SIMD ISA.
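
For readers unfamiliar with the two predicate flavours, the sketch below
emulates them in plain C for a four-lane add. It follows the general meaning
of zeroing and merging predication as found in AVX-512 and SVE. The function
names and the bit-mask encoding are hypothetical and say nothing about how
the planned extensions will be spelled in the engine's assembler.

```c
#define LANES 4 /* assumption: four 32-bit float lanes */

/* Zeroing predicate: active lanes get a + b, inactive lanes are set to 0. */
void add_zeroing(float dst[LANES], const float a[LANES],
                 const float b[LANES], unsigned mask)
{
    for (int i = 0; i < LANES; i++)
        dst[i] = ((mask >> i) & 1u) ? a[i] + b[i] : 0.0f;
}

/* Merging predicate: active lanes get a + b, inactive lanes keep the value
 * already stored in dst. */
void add_merging(float dst[LANES], const float a[LANES],
                 const float b[LANES], unsigned mask)
{
    for (int i = 0; i < LANES; i++)
        if ((mask >> i) & 1u)
            dst[i] = a[i] + b[i];
}
```

With `dst = {9, 9, 9, 9}`, `a = {1, 2, 3, 4}`, `b = {10, 20, 30, 40}` and
mask `0x5` (lanes 0 and 2 active), the zeroing form leaves `{11, 0, 33, 0}`
in `dst` and the merging form leaves `{11, 9, 33, 9}`. In this sketch the
mask is the extra operand next to the destination and the two sources.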
...