I have a situation where I have an image already in GPU memory, so I made a slight modification to your code to support this use case. It seems to work just fine, except that it takes between 17 and 18 ms to encode a 640x480 RGB image at quality 75. I'm running this on a GeForce GTX 750 Ti (Maxwell). That seems a bit slow to me and doesn't line up with the performance metrics shown on your website.
My changes were simple:
1- I created a new input type (GPUJPEG_ENCODER_INPUT_IMAGE_ON_GPU)
2- added a new function in gpujpeg_encoder.cpp to set this image type:
void
gpujpeg_encoder_input_set_image_on_gpu(struct gpujpeg_encoder_input* input, uint8_t* image)
{
    input->type = GPUJPEG_ENCODER_INPUT_IMAGE_ON_GPU;
    input->image = image;
    input->texture = NULL;
}
3- Inside your gpujpeg_encoder_encode(...) function, I added this to the end of the load-input-image if/else chain:
...
<previous if statements>
} else if ( input->type == GPUJPEG_ENCODER_INPUT_IMAGE_ON_GPU ) {
    GPUJPEG_CUSTOM_TIMER_START(encoder->def);
    // coder->d_data_raw = input->image;
    // Copy image data from circular fifo buffer object to device data
    cudaMemcpy(coder->d_data_raw, input->image, coder->data_raw_size * sizeof(uint8_t), cudaMemcpyDeviceToDevice);
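Is there anything obvious that I've done to slow down your code? It seems to work fine except for the speed.
Thanks for a great library!!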
Your changes seem OK to me. The added memory copy isn't necessary, since you can just assign d_data_raw, but it shouldn't add as much overhead as you are describing.
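For example, a minimal sketch of that branch with the copy replaced by a plain assignment (an illustration only, assuming input->image already holds exactly coder->data_raw_size bytes in the layout the coder expects, and ignoring ownership/cleanup of the buffer that was originally allocated for d_data_raw):

} else if ( input->type == GPUJPEG_ENCODER_INPUT_IMAGE_ON_GPU ) {
    // Reuse the caller's device buffer directly; no device-to-device copy needed
    coder->d_data_raw = input->image;
}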
Try to explore the following variables:
coder->duration_*
They should tell you which part of the encoding process is slow.
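For example, anywhere a struct gpujpeg_coder* coder is in scope (e.g., inside gpujpeg_encoder_encode), a few printf calls along these lines can dump them (a sketch only: the exact field names differ between GPUJPEG versions, so check struct gpujpeg_coder in gpujpeg_common.h; the names below follow the measurement output further down):

/* Sketch: print a few of the per-stage timings after an encode */
printf("duration_memory_to:            %0.2f ms\n", coder->duration_memory_to);
printf("duration_memory_preprocessor:  %0.2f ms\n", coder->duration_memory_preprocessor);
printf("duration_memory_huffman_coder: %0.2f ms\n", coder->duration_memory_huffman_coder);

Looks like mostly preprocessor and Huffman coder. Does this reveal anything?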
I don't know what the problem in your application could be. I tried to measure encoding of a 640x480 image on my computer with an old GeForce GTX 660 and it looks like this:
./gpujpeg --encode --size 640x480 --quality 75 img.rgb img.jpg
...
duration_memory_to 0.15 ms
duration_memory_from 0.05 ms
duration_memory_map 0.00 ms
duration_memory_unmap 0.00 ms
duration_memory_preprocessor 0.09 ms
duration_memory_quantization 0.08 ms
duration_memory_huffman_coder 0.30 ms
duration_memory_stream 0.05 ms
duration_memory_in_gpu 0.49 ms
What version of CUDA do you use? Try a different one.
Try using the gpujpeg app from the sources (if you haven't already).
Try it on another computer.
Wow! That's much faster. Something strange must be happening on my machine.
CUDA 7.5
Visual Studio 2012 Pro
Downloaded your source and rebuilt. A debug build...I'll try a release build.
I'm not sure how something in my app could be slowing your code down, but admittedly, I'm fairly new to GPU programming. My app is a video app with a basic workflow of
H264 video stream -> CUDA Video Decoder -> RGB image -> GPUJPEG Encoder -> JPEG image
Could the CUDA decoder be causing some conflict with GPUJPEG? Resource contention? Doesn't seem right to me. I'll also see if I can find another computer to try this on.
Getting the same performance on two other computers, one with a GeForce 750 Ti and one with a GeForce 950M.
I did an Nsight performance analysis and have attached the output file (I opened it with Excel) and some screenshots of the analysis. It shows the same thing we discussed earlier: most of the time is spent in the preprocessor and Huffman encoding. I'm not sure how to interpret the results... maybe it will be obvious to you if something is there.
Please let me know if anything looks out of order.
GPUJPEG wasn't designed to run alongside another CUDA workload (e.g., the CUDA Video Decoder). You can try the code from another branch (e.g., the latest decoder-gltex), where usage of CUDA streams was implemented; it can improve performance when running multiple CUDA computations at the same time.
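The snippet below is only a generic illustration of that idea, not the GPUJPEG API: work issued on separate, explicitly created CUDA streams can overlap, whereas everything launched on the default stream gets serialized behind whatever else is using it.

#include <cuda_runtime.h>
#include <stdint.h>

int main(void)
{
    const size_t size = 640 * 480 * 3;   /* one 640x480 RGB frame */
    uint8_t *a, *b, *c, *d;
    cudaMalloc((void**)&a, size);
    cudaMalloc((void**)&b, size);
    cudaMalloc((void**)&c, size);
    cudaMalloc((void**)&d, size);

    /* One stream standing in for the "decoder" work, one for the "encoder" */
    cudaStream_t decode_stream, encode_stream;
    cudaStreamCreate(&decode_stream);
    cudaStreamCreate(&encode_stream);

    /* Independent async operations on different streams may run concurrently */
    cudaMemcpyAsync(b, a, size, cudaMemcpyDeviceToDevice, decode_stream);
    cudaMemcpyAsync(d, c, size, cudaMemcpyDeviceToDevice, encode_stream);

    cudaStreamSynchronize(decode_stream);
    cudaStreamSynchronize(encode_stream);

    cudaStreamDestroy(decode_stream);
    cudaStreamDestroy(encode_stream);
    cudaFree(a); cudaFree(b); cudaFree(c); cudaFree(d);
    return 0;
}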
Hi Martin. I had to step away for a few weeks to finish up another project, but I'm back on this now. I'm still seeing poor performance on both a GeForce GTX 750 Ti and a 950M... somewhere on the order of 17 ms to encode a 640x480 image. So I took your advice above, downloaded the decoder-gltex branch, built a static lib, and wrote a very simple console app that creates a synthetic RGB image, sets up a gpujpeg encoder, and encodes the image. Although I do get a properly formatted JPEG image as output, the color is not what I expected (this is probably due to not setting up the encoder properly). Also, the performance is still poor. I've attached the minimal example. Hopefully there is something obvious that I'm doing wrong. Thanks again.
Bryan
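Hi Bryan, I tested your example and it fails with an error: you need to change the pixel format from GPUJPEG_444_U8_P0P1P2 to GPUJPEG_444_U8_P012. After this modification, the code works OK.
To get a "red image" I think you need to change the image creation code like this (an illustrative guess at such a change, assuming an interleaved 8-bit RGB buffer named image with dimensions width x height):

/* Illustrative sketch only: fill an interleaved 8-bit RGB buffer with solid red */
for (int i = 0; i < width * height; i++) {
    image[3 * i + 0] = 255;  /* R */
    image[3 * i + 1] = 0;    /* G */
    image[3 * i + 2] = 0;    /* B */
}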
Hi Martin. Thanks for the response, and I apologize for the silly errors in my code (embarrassing!).
I made the changes you show above, and get the following timing results:
Encode Image: 50.94 ms
Compressed size: 39029
success #0
Encode Image: 31.41 ms
Compressed size: 39029
success #1
Encode Image: 31.55 ms
Compressed size: 39029
success #2
Encode Image: 31.58 ms
Compressed size: 39029
success #3
Encode Image: 31.38 ms
Compressed size: 39029
success #4
Nothing else (that I can tell) is running on my machine, but these times are not nearly as good as yours. I'm starting to wonder if it's my GeForce GTX 750 Ti. I've arranged to have access to a machine with a Quadro K4000 tomorrow, so I'll test the same code there. Not sure what else it could be at this point. Maybe it has something to do with the Maxwell architecture of the 750 Ti... this is just a $150 card. I'll post my results in case you're interested.
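Results on a Quadro K4000:
Let me know if you have any suggestions. Thanks.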
Recompiled in Release mode. New results. I don't understand why there's such a big difference, but there you have it. Results from GeForce GTX 750 Ti, 640x480 RGB image, Release Mode:
Encode Image: 17.43 ms
Compressed size: 12629
success #0
Encode Image: 1.39 ms
Compressed size: 12629
success #1
Encode Image: 1.83 ms
Compressed size: 12629
success #2
Encode Image: 1.52 ms
Compressed size: 12629
success #3
Encode Image: 1.39 ms
Compressed size: 12629
success #4
Thanks!
Last edit: bryang 2016-05-06