CUDA-Quicksort

CUDA-quicksort is an iterative GPU-based implementation of the quicksort algorithm, designed to exploit the computing power of modern NVIDIA GPUs. "Two GPU-based implementations of the quicksort were presented in literature: the GPU-quicksort, a compute-unified device architecture (CUDA) iterative implementation, and the CUDA dynamic parallel (CDP) quicksort, a recursive implementation provided by NVIDIA Corporation."[*] "Experiments performed on six sorting benchmark distributions show that CUDA-quicksort is up to four times faster than GPU-quicksort and up to three times faster than CDP-quicksort."[*]

*Copyright © 2015 John Wiley & Sons, Ltd. Concurrency Computat.: Pract. Exper. (2015) DOI: 10.1002/cpe.3611

For further information, please see the corresponding publication: http://onlinelibrary.wiley.com/doi/10.1002/cpe.3611/abstract

Features

CUDA-quicksort was designed starting from GPU-quicksort. Unlike GPU-quicksort, it uses atomic primitives for inter-block communication while ensuring optimized access to GPU memory.[*]
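
The role of atomic primitives in a parallel partition step can be illustrated with a small CPU-side sketch. This is not the CUDA-quicksort source code: it is a simplified C++ analogue in which each std::thread stands in for a GPU thread block and std::atomic fetch_add stands in for a CUDA atomic add. The function name atomic_partition and its parameters are hypothetical and chosen only for this illustration. Each worker counts how many of its elements fall on each side of the pivot, reserves a private output range on each side with a single atomic add, and then writes without further synchronization.

```cpp
#include <algorithm>
#include <atomic>
#include <cassert>
#include <cstddef>
#include <thread>
#include <vector>

// Hypothetical helper (illustration only, not part of CUDA-quicksort).
// Partitions `in` around `pivot` into `out`, which must be a different
// vector from `in`. Returns the index of the first element >= pivot.
std::size_t atomic_partition(const std::vector<int>& in, std::vector<int>& out,
                             int pivot, unsigned num_workers) {
    const std::size_t n = in.size();
    out.assign(n, 0);
    if (num_workers == 0) num_workers = 1;

    // Shared counters: slots already reserved from the left and right ends.
    std::atomic<std::size_t> left_used{0};
    std::atomic<std::size_t> right_used{0};

    auto worker = [&](std::size_t begin, std::size_t end) {
        // Step 1: count this worker's elements on each side of the pivot.
        std::size_t smaller = 0;
        for (std::size_t i = begin; i < end; ++i)
            if (in[i] < pivot) ++smaller;
        const std::size_t larger = (end - begin) - smaller;

        // Step 2: reserve disjoint output ranges, one atomic add per side.
        std::size_t lo = left_used.fetch_add(smaller);
        std::size_t hi = n - right_used.fetch_add(larger) - larger;

        // Step 3: write into the reserved ranges; no locking is needed
        // because no other worker can own these slots.
        for (std::size_t i = begin; i < end; ++i) {
            if (in[i] < pivot) out[lo++] = in[i];
            else out[hi++] = in[i];
        }
    };

    // Split the input into contiguous chunks, one per worker.
    const std::size_t chunk = (n + num_workers - 1) / num_workers;
    std::vector<std::thread> threads;
    for (unsigned w = 0; w < num_workers; ++w) {
        const std::size_t begin = std::min(n, w * chunk);
        const std::size_t end = std::min(n, begin + chunk);
        threads.emplace_back(worker, begin, end);
    }
    for (auto& t : threads) t.join();

    // Every element was placed on exactly one side.
    assert(left_used.load() + right_used.load() == n);
    return left_used.load();
}
```

Calling atomic_partition(in, out, pivot, 4) leaves every element smaller than the pivot before the returned index and every other element at or after it. The order within each side depends on thread scheduling, so the partition is not stable. Applying the same step to the two sides in turn gives an iterative quicksort.
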

The CUDA-quicksort algorithm was published in Concurrency and Computation: Practice and Experience, August 2015. DOI: 10.1002/cpe.3611

How to cite: Manca, E., Manconi, A., Orro, A., Armano, G., and Milanesi, L. (2015) CUDA-quicksort: an improved GPU-based implementation of quicksort. Concurrency Computat.: Pract. Exper., doi: 10.1002/cpe.3611.