I have a situation where I have an image already in GPU memory, so I made a slight modification to your code to support this use case. It seems to work just fine, except that it takes between 17 and 18 ms to encode a 640x480 RGB image at quality 75. I'm running this on a GeForce GTX 750 Ti (Maxwell). That seems a bit slow to me and doesn't line up with the performance metrics shown on your website.
My changes were simple:
1- I created a new input type (GPUJPEG_ENCODER_INPUT_IMAGE_ON_GPU)
2- added a new function in gpujpeg_encoder.cpp to set this image type:
void
gpujpeg_encoder_input_set_image_on_gpu(struct gpujpeg_encoder_input* input, uint8_t* image)
{
    input->type = GPUJPEG_ENCODER_INPUT_IMAGE_ON_GPU;
    input->image = image;
    input->texture = NULL;
}
3- Inside your gpujpeg_encoder_encode(...) function, I added this to the end of the load-input-image if/else chain:
...
<previous if statements>
} else if ( input->type == GPUJPEG_ENCODER_INPUT_IMAGE_ON_GPU ) {
    GPUJPEG_CUSTOM_TIMER_START(encoder->def);
    // coder->d_data_raw = input->image;
    // Copy image data from circular fifo buffer object to device data
    cudaMemcpy(coder->d_data_raw, input->image, coder->data_raw_size * sizeof(uint8_t), cudaMemcpyDeviceToDevice);
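Is there anything obvious that I've done to slow down your code? It seems to work fine except for the speed.
Thanks for a great library!!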
Your changes seem OK to me. The added memory copy isn't necessary, since you can just assign d_data_raw, but it shouldn't add as much overhead as you are describing.
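For example, a minimal sketch of that branch with the copy replaced by a plain assignment (an illustration only, assuming input->image already holds exactly coder->data_raw_size bytes in the layout the coder expects, and ignoring ownership/cleanup of the buffer that was originally allocated for d_data_raw):

} else if ( input->type == GPUJPEG_ENCODER_INPUT_IMAGE_ON_GPU ) {
    // Reuse the caller's device buffer directly; no device-to-device copy needed
    coder->d_data_raw = input->image;
}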
Try to explore the following variables:
coder->duration_*
They should tell you which part of the encoding process is slow.
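For example, anywhere a struct gpujpeg_coder* coder is in scope (e.g., inside gpujpeg_encoder_encode), a few printf calls along these lines can dump them (a sketch only: the exact field names differ between GPUJPEG versions, so check struct gpujpeg_coder in gpujpeg_common.h; the names below follow the measurement output further down):

/* Sketch: print a few of the per-stage timings after an encode */
printf("duration_memory_to:            %0.2f ms\n", coder->duration_memory_to);
printf("duration_memory_preprocessor:  %0.2f ms\n", coder->duration_memory_preprocessor);
printf("duration_memory_huffman_coder: %0.2f ms\n", coder->duration_memory_huffman_coder);

Looks like mostly preprocessor and Huffman coder. Does this reveal anything?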
I don't know what the problem in your application could be. I tried to measure encoding of a 640x480 image on my computer with an old GeForce GTX 660 and it looks like this:
./gpujpeg --encode --size 640x480 --quality 75 img.rgb img.jpg
...
duration_memory_to 0.15 ms
duration_memory_from 0.05 ms
duration_memory_map 0.00 ms
duration_memory_unmap 0.00 ms
duration_memory_preprocessor 0.09 ms
duration_memory_quantization 0.08 ms
duration_memory_huffman_coder 0.30 ms
duration_memory_stream 0.05 ms
duration_memory_in_gpu 0.49 ms
What version of CUDA do you use? Try a different one.
Try using the gpujpeg app from the sources (if you haven't already).
Try it on another computer.
Wow! That's much faster. Something strange must be happening on my machine.
CUDA 7.5
Visual Studio 2012 Pro
Downloaded your source and rebuilt. A debug build...I'll try a release build.
I'm not sure how something in my app could be slowing your code down, but admittedly, I'm fairly new to GPU programming. My app is a video app with a basic workflow of
H264 video stream -> CUDA Video Decoder -> RGB image -> GPUJPEG Encoder -> JPEG image
Could the CUDA decoder be causing some conflict with GPUJPEG? Resource contention? Doesn't seem right to me. I'll also see if I can find another computer to try this on.
Getting the same performance on two other computers, one with a GeForce 750 Ti and one with a GeForce 950M.
I did an Nsight performance analysis and have attached the output file (I opened it with Excel) and some screenshots of the analysis. It shows the same thing we discussed earlier: most of the time is spent in the preprocessor and Huffman encoding. I'm not sure how to interpret the results... maybe it will be obvious to you if something is there.
Please let me know if anything looks out of order.
GPUJPEG wasn't designed to run alongside another CUDA workload (e.g., the CUDA Video Decoder). You can try the code from another branch (e.g., the latest decoder-gltex), where usage of CUDA streams was implemented; it can improve performance when running multiple CUDA computations at the same time.
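The snippet below is only a generic illustration of that idea, not the GPUJPEG API: work issued on separate, explicitly created CUDA streams can overlap, whereas everything launched on the default stream gets serialized behind whatever else is using it.

#include <cuda_runtime.h>
#include <stdint.h>

int main(void)
{
    const size_t size = 640 * 480 * 3;   /* one 640x480 RGB frame */
    uint8_t *a, *b, *c, *d;
    cudaMalloc((void**)&a, size);
    cudaMalloc((void**)&b, size);
    cudaMalloc((void**)&c, size);
    cudaMalloc((void**)&d, size);

    /* One stream standing in for the "decoder" work, one for the "encoder" */
    cudaStream_t decode_stream, encode_stream;
    cudaStreamCreate(&decode_stream);
    cudaStreamCreate(&encode_stream);

    /* Independent async operations on different streams may run concurrently */
    cudaMemcpyAsync(b, a, size, cudaMemcpyDeviceToDevice, decode_stream);
    cudaMemcpyAsync(d, c, size, cudaMemcpyDeviceToDevice, encode_stream);

    cudaStreamSynchronize(decode_stream);
    cudaStreamSynchronize(encode_stream);

    cudaStreamDestroy(decode_stream);
    cudaStreamDestroy(encode_stream);
    cudaFree(a); cudaFree(b); cudaFree(c); cudaFree(d);
    return 0;
}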
Hi Martin. I had to step away for a few weeks to finish up another project, but I'm back on this now. I'm still seeing poor performance on both a GeForce GTX 750 Ti and a 950M... somewhere on the order of 17 ms to encode a 640x480 image. So I took your advice above, downloaded the decoder-gltex branch, built a static lib, and wrote a very simple console app that creates a synthetic RGB image, sets up a gpujpeg encoder, and encodes the image. Although I do get a properly formatted JPEG image as output, the color is not what I expected (this is probably due to not setting up the encoder properly). Also, the performance is still poor. I've attached the minimal example. Hopefully there is something obvious that I'm doing wrong. Thanks again.
Bryan
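Hi Bryan, I tested your example and it fails with an error: you need to change the pixel format from GPUJPEG_444_U8_P0P1P2 to GPUJPEG_444_U8_P012. After this modification, the code works OK.
To get a "red image" I think you need to change the image creation code like this (an illustrative guess at such a change, assuming an interleaved 8-bit RGB buffer named image with dimensions width x height):

/* Illustrative sketch only: fill an interleaved 8-bit RGB buffer with solid red */
for (int i = 0; i < width * height; i++) {
    image[3 * i + 0] = 255;  /* R */
    image[3 * i + 1] = 0;    /* G */
    image[3 * i + 2] = 0;    /* B */
}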
Hi Martin. Thanks for the response, and I apologize for the silly errors in my code (embarrassing!).
I made the changes you show above, and get the following timing results:
Encode Image: 50.94 ms
Compressed size: 39029
success #0
Encode Image: 31.41 ms
Compressed size: 39029
success #1
Encode Image: 31.55 ms
Compressed size: 39029
success #2
Encode Image: 31.58 ms
Compressed size: 39029
success #3
Encode Image: 31.38 ms
Compressed size: 39029
success #4
Nothing else (that I can tell) is running on my machine, but these times are not nearly as good as yours. I'm starting to wonder if it's my GeForce GTX 750 Ti. I've arranged to have access to a machine with a Quadro K4000 tomorrow, so I'll test the same code there. Not sure what else it could be at this point. Maybe it has something to do with the Maxwell architecture of the 750 Ti... this is just a $150 card. I'll post my results in case you're interested.
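Results on a Quadro K4000:
Let me know if you have any suggestions. Thanks.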
Recompiled in Release mode. New results. I don't understand why there's such a big difference, but there you have it. Results from GeForce GTX 750 Ti, 640x480 RGB image, Release Mode:
Encode Image: 17.43 ms
Compressed size: 12629
success #0
Encode Image: 1.39 ms
Compressed size: 12629
success #1
Encode Image: 1.83 ms
Compressed size: 12629
success #2
Encode Image: 1.52 ms
Compressed size: 12629
success #3
Encode Image: 1.39 ms
Compressed size: 12629
success #4
Thanks!
Last edit: bryang 2016-05-06