I use GPUJPEG to encode only the parts of the image that have changed, and I noticed that there is quite a big constant overhead when encoding small images (e.g. 256x256). Is there any way this can be improved, or is it the nature of CUDA that there is not enough parallel work to do on such small images?
The library is designed to encode one image at a time in each encoder/decoder instance. When encoding/decoding small images, the GPU may not be fully utilized. You can run your application in nvvp (NVIDIA Visual Profiler) to see how the GPU is utilized.
You can try using multiple threads in your application: create a separate encoder instance in each thread and encode multiple images simultaneously. This may help to utilize the GPU better.
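As an illustration of the one-encoder-per-thread idea, here is a minimal C++ sketch. It does not use the real GPUJPEG API: `Tile`, `HypotheticalEncoder` and `encode_tiles` are hypothetical names, and `encode()` only returns a dummy size so the sketch compiles and runs without the library. In a real application you would replace the stand-in with your own code that creates and uses a GPUJPEG encoder inside each thread.

```cpp
#include <cstddef>
#include <cstdint>
#include <thread>
#include <vector>

// Hypothetical image tile (e.g. one changed 256x256 region), stored as RGB.
struct Tile {
    int width;
    int height;
    std::vector<std::uint8_t> rgb;  // width * height * 3 bytes
};

// Hypothetical stand-in for one encoder instance. A real GPUJPEG encoder
// would own its own GPU buffers; here encode() just returns a dummy
// "compressed size" so the sketch runs without the library.
class HypotheticalEncoder {
public:
    std::size_t encode(const Tile& tile) {
        return tile.rgb.size() / 10;  // placeholder for the real encode call
    }
};

// Encode all tiles with `num_threads` worker threads, each owning its own
// encoder. Tile i goes to thread i % num_threads, so no encoder is shared
// and each entry of `sizes` is written by exactly one thread.
std::vector<std::size_t> encode_tiles(const std::vector<Tile>& tiles,
                                      unsigned num_threads) {
    if (num_threads == 0) num_threads = 1;
    std::vector<std::size_t> sizes(tiles.size(), 0);
    std::vector<std::thread> workers;
    for (unsigned t = 0; t < num_threads; ++t) {
        workers.emplace_back([&tiles, &sizes, num_threads, t] {
            HypotheticalEncoder encoder;  // created and used only in this thread
            for (std::size_t i = t; i < tiles.size(); i += num_threads) {
                sizes[i] = encoder.encode(tiles[i]);
            }
        });
    }
    for (auto& w : workers) w.join();  // wait until every tile is encoded
    return sizes;
}
```

For example, calling `encode_tiles(tiles, 4)` on eight 256x256 tiles spreads them over four threads, two tiles per encoder. The round-robin split keeps the sketch simple; a shared work queue would balance better if tile sizes vary. Whether this actually improves GPU utilization has to be checked in the profiler, as suggested above.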
The library would need to be rewritten so that a single instance can fully utilize the GPU when encoding/decoding small images, and unfortunately I don't have time to do that.
Thanks for the quick answer. Then I will just make sure to keep the images large enough to benefit from the GPU; using multiple threads is currently not an option for me because of the OpenGL textures.