As part of the AnySL system we implemented an LLVM backend for NVIDIA's "Parallel Thread Execution" (PTX) assembly language. PTX is the low-level representation consumed by NVIDIA's GPU drivers and is usually generated by compilers for the "Compute Unified Device Architecture" (CUDA).
The backend is similar to LLVM's C-backend and generates .ptx files directly from LLVM's intermediate representation (IR).
The backend already supports most of the PTX features:
* simple arithmetic (add, mul, ...)
* control flow
* structs and arrays
* simple function calls (no recursion, no struct returns)
* global, shared, constant, and texture memory access
* mathematical functions (e.g. sin, cos, sqrt, pow, ...)
* special registers (e.g. thread_id)
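As a rough illustration of the kind of input the list above describes, here is a minimal C sketch that could be compiled to LLVM IR with a C front end. It only uses listed features: arithmetic, control flow, structs, arrays, and a non-recursive helper call with a scalar return. The names `Vec2`, `dot`, and `dot_kernel` are made up for this example, and the `tid` parameter is an assumption standing in for the thread-id special register, since how the backend exposes special registers is not described here.

```c
/* Plain struct; the feature list includes structs and arrays. */
typedef struct {
    float x;
    float y;
} Vec2;

/* Non-recursive helper with a scalar return value (no struct return). */
static float dot(const Vec2 *a, const Vec2 *b) {
    return a->x * b->x + a->y * b->y;
}

/*
 * Hypothetical kernel body: one thread computes one dot product.
 * `tid` is an assumption; it stands in for the thread-id special register.
 */
void dot_kernel(const Vec2 *a, const Vec2 *b, float *out, int count, int tid) {
    if (tid < 0 || tid >= count) {
        return; /* threads outside the array bounds do nothing */
    }
    out[tid] = dot(&a[tid], &b[tid]);
}
```

Passing the thread index as a parameter also lets the same function be exercised on the CPU, one call per simulated thread.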
There are no intrinsics for PTX-specific functionality such as texture fetches; these are currently accessible only through external functions. Atomic and synchronization instructions are not yet implemented, but should work the same way.
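The external-function approach could look roughly like the C sketch below. The function name `hypothetical_tex1d_fetch` and its signature are invented for illustration and are not the names the backend actually recognizes. The definition is only a host-side stand-in so that the sketch links and runs on a CPU; on the GPU path one would presumably keep just the declaration.

```c
/* Hypothetical declaration: the real external function names are not given here. */
extern float hypothetical_tex1d_fetch(int texture_id, float coord);

/* Host-side stand-in (assumption): returns a value derived from the coordinate. */
float hypothetical_tex1d_fetch(int texture_id, float coord) {
    (void)texture_id; /* unused in the stand-in */
    return coord * 2.0f;
}

/* Hypothetical kernel body; `tid` stands in for the thread-id special register. */
void sample_kernel(float *out, int count, int tid) {
    if (tid < 0 || tid >= count) {
        return; /* threads outside the array bounds do nothing */
    }
    /* Normalized coordinate in [0, 1) for this thread. */
    float coord = (float)tid / (float)count;
    out[tid] = hypothetical_tex1d_fetch(0, coord);
}
```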
Performance has not yet been optimized to any great extent. Faster generated code will require optimizations that reduce register pressure.
The backend was written as part of Helge Rhodin's bachelor's thesis.
Code contributions to the backend are very welcome! :)