FLoops.jl
Fast sequential, threaded, and distributed for-loops for Julia
...Furthermore, a loop written with @floop can be executed with any compatible executor. See FoldsThreads.jl for various thread-based executors that are optimized for different kinds of loops. FoldsCUDA.jl provides an executor for GPUs. FLoops.jl also provides a simple distributed executor.
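As a minimal sketch of what swapping executors can look like: the snippet below assumes FLoops.jl is installed and that `SequentialEx` and `ThreadedEx` are available from `using FLoops`. The function name `sum_of_squares` is hypothetical and only serves the illustration.

```julia
using FLoops  # assumption: FLoops.jl is installed in the active environment

# Hypothetical helper: the same loop body runs under whichever
# executor `ex` is passed in.
function sum_of_squares(xs, ex)
    @floop ex for x in xs
        # @reduce marks `s` as the accumulator, so partial results
        # computed by different tasks can be combined with `+`.
        @reduce(s += x^2)
    end
    return s
end

seq = sum_of_squares(1:10, SequentialEx())  # plain sequential loop
thr = sum_of_squares(1:10, ThreadedEx())    # multi-threaded loop
println((seq, thr))
```

Because the reduction is declared with `@reduce` rather than written as a mutation of a shared variable, the loop body does not need to change when the executor does. Executors from FoldsThreads.jl or FoldsCUDA.jl would be passed in the same position, subject to the requirements those packages document.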