
gpujpeg encode speed

bryang
2016-03-29
2016-05-06
  • bryang

    bryang - 2016-03-29

    I have a situation where I have an image already in GPU memory, so I made a slight modification to your code to support this use case. It seems to work just fine, except that it takes 17-18 ms to encode a 640x480 RGB image at quality 75. I'm running this on a GeForce GTX 750 Ti (Maxwell). That seems slow to me and doesn't line up with the performance figures shown on your website.

    My changes were simple:
    1- I created a new input type (GPUJPEG_ENCODER_INPUT_IMAGE_ON_GPU).
    2- I added a new function in gpujpeg_encoder.cpp to set this input type:
    void
    gpujpeg_encoder_input_set_image_on_gpu(struct gpujpeg_encoder_input* input, uint8_t* image)
    {
        input->type = GPUJPEG_ENCODER_INPUT_IMAGE_ON_GPU;
        input->image = image;   // device pointer owned by the caller
        input->texture = NULL;
    }

    3- Inside your gpujpeg_encoder_encode(...) function, I added this to the end of the "load input image" if/else statement:
    ...
    // <previous if statements>
    } else if (input->type == GPUJPEG_ENCODER_INPUT_IMAGE_ON_GPU) {
        GPUJPEG_CUSTOM_TIMER_START(encoder->def);

        // coder->d_data_raw = input->image;
        // Copy the caller's device buffer into the coder's raw-data device buffer
        cudaMemcpy(coder->d_data_raw, input->image, coder->data_raw_size * sizeof(uint8_t), cudaMemcpyDeviceToDevice);

        GPUJPEG_CUSTOM_TIMER_STOP(encoder->def);
        coder->duration_memory_to = GPUJPEG_CUSTOM_TIMER_DURATION(encoder->def);
    } else {
        // Unknown input type
        assert(0);
    }
    

    Is there anything obvious that I've done to slow down your code? It seems to work fine except for the speed.

    Thanks for a great library!!

     
  • Martin Srom

    Martin Srom - 2016-03-30

    Your changes seem OK to me. The added memory copy isn't necessary, since you can just assign input->image to coder->d_data_raw directly (as in your commented-out line), but it shouldn't add as much overhead as you are describing.
    Try inspecting the following variables:

    coder->duration_*
    

    They should tell you which part of the encoding process is slow.

     
  • bryang

    bryang - 2016-03-30

    It looks like most of the time goes to the preprocessor and the Huffman coder:

        duration_memory_to          0.0248959996 ms
        duration_memory_from        0.151168004 ms
        duration_memory_map         0.000000000 ms
        duration_memory_unmap       0.000000000 ms
        duration_preprocessor       5.55904007 ms
        duration_dct_quantization   0.825727999 ms
        duration_huffman_coder      11.2312956 ms
        duration_stream             0.00144000002 ms
        duration_in_gpu             17.7247353 ms
    

    Does this reveal anything?

     
  • Martin Srom

    Martin Srom - 2016-03-31

    I don't know what the problem in your application could be. I measured encoding a 640x480 image on my computer with an old GeForce GTX 660, and it looks like this:

    ./gpujpeg --encode --size 640x480 --quality 75 img.rgb img.jpg
    ...
    duration_memory_to              0.15 ms
    duration_memory_from            0.05 ms
    duration_memory_map             0.00 ms
    duration_memory_unmap           0.00 ms
    duration_memory_preprocessor    0.09 ms
    duration_memory_quantization    0.08 ms
    duration_memory_huffman_coder   0.30 ms
    duration_memory_stream          0.05 ms
    duration_memory_in_gpu          0.49 ms
    

    Which version of CUDA do you use? Try a different one.
    Try the gpujpeg app built from the sources (if you haven't already).
    Try it on another computer.

     
  • bryang

    bryang - 2016-03-31

    Wow! That's much faster. Something strange must be happening on my machine.
    CUDA 7.5
    Visual Studio 2012 Pro
    Downloaded your source and rebuilt it as a Debug build; I'll try a Release build.

    I'm not sure how something in my app could be slowing your code down, but admittedly I'm fairly new to GPU programming. My app is a video app with this basic workflow:

    H264 video stream -> CUDA Video Decoder -> RGB image -> GPUJPEG Encoder -> JPEG image

    Could the CUDA decoder be causing some conflict with GPUJPEG? Resource contention? Doesn't seem right to me. I'll also see if I can find another computer to try this on.

     
  • bryang

    bryang - 2016-04-01

    I'm getting the same performance on two other computers, one with a GeForce GTX 750 Ti and one with a GeForce GTX 950M.

    I ran an Nsight performance analysis and have attached the output file (I opened it in Excel) and some screenshots of the analysis. It shows the same thing we discussed earlier: most of the time is spent in the preprocessor and in Huffman encoding. I'm not sure how to interpret the results; maybe something will be obvious to you.

    Please let me know if anything looks out of order.

    Thanks!!

     
  • Martin Srom

    Martin Srom - 2016-04-05

    GPUJPEG wasn't designed to run alongside another application (e.g., the CUDA Video Decoder). You can try the code from another branch (e.g., the latest decoder-gltex), where support for CUDA streams was implemented; that can improve performance when multiple CUDA computations run at the same time.

     
  • bryang

    bryang - 2016-05-04

    Hi Martin. I had to step away for a few weeks to finish another project, but I'm back on this now. I'm still seeing poor performance on both the GeForce GTX 750 Ti and the 950M: on the order of 17 ms to encode a 640x480 image. So I took your advice above, downloaded the decoder-gltex branch, built a static lib, and wrote a very simple console app that creates a synthetic RGB image, sets up a gpujpeg encoder, and encodes the image. Although I do get a properly formatted JPEG output, the color is not what I expected (probably because I haven't set up the encoder properly). The performance is also still poor. I've attached the minimal example. Hopefully there is something obvious that I'm doing wrong. Thanks again.
    Bryan

     
  • Martin Srom

    Martin Srom - 2016-05-05

    Hi Bryan, I tested your example and it fails with this error:

    Encoding JPEG from pixel format 444-u8-p0p1p2 is supported only when no color transformation is required.
    Failed to encode image!
    failure!
    

    You need to change the pixel format from GPUJPEG_444_U8_P0P1P2 to GPUJPEG_444_U8_P012. After this modification, the code works OK:

    Encode Image:              5.50 ms
    Compressed size: 127481
    success #0
    Encode Image:              2.00 ms
    Compressed size: 128408
    success #1
    Encode Image:              2.01 ms
    Compressed size: 127127
    success #2
    Encode Image:              1.97 ms
    Compressed size: 128002
    ...
    

    To get a red image, I think you need to change the image-creation code like this:

    -pimage[r*width*3 + c*3 + 0] = 255; // red
    -pimage[r*width*3 + c*3 + 0] = 0;   // green
    -pimage[r*width*3 + c*3 + 0] = 0;   // blue
    =>
    +pimage[r*width*3 + c*3 + 0] = 255; // red
    +pimage[r*width*3 + c*3 + 1] = 0;   // green
    +pimage[r*width*3 + c*3 + 2] = 0;   // blue
    
     
  • bryang

    bryang - 2016-05-05

    Hi Martin. Thanks for the response, and I apologize for the silly errors in my code (embarrassing).

    I made the changes you show above, and get the following timing results:

    Encode Image:             50.94 ms
    Compressed size: 39029
    success #0
    Encode Image:             31.41 ms
    Compressed size: 39029
    success #1
    Encode Image:             31.55 ms
    Compressed size: 39029
    success #2
    Encode Image:             31.58 ms
    Compressed size: 39029
    success #3
    Encode Image:             31.38 ms
    Compressed size: 39029
    success #4
    

    Nothing else (that I can tell) is running on my machine, but these times are not nearly as good as yours. I'm starting to wonder if it's my GeForce GTX 750 Ti. I've arranged to have access to a machine with a Quadro K4000 tomorrow, so I'll test the same code there. I'm not sure what else it could be at this point. Maybe it has something to do with the Maxwell architecture of the 750 Ti; it's just a $150 card. I'll post my results in case you're interested.

     
  • bryang

    bryang - 2016-05-06

    Results on a Quadro K4000:

    Encode Image:             18.63 ms
    Compressed size: 12629
    success #0
    Encode Image:              8.32 ms
    Compressed size: 12629
    success #1
    Encode Image:              8.12 ms
    Compressed size: 12629
    success #2
    Encode Image:              8.35 ms
    Compressed size: 12629
    success #3
    Encode Image:              8.22 ms
    Compressed size: 12629
    success #4
    

    Let me know if you have any suggestions. Thanks.

     
  • bryang

    bryang - 2016-05-06

    I recompiled in Release mode and got new results. I don't understand why there's such a big difference, but there you have it. Results from the GeForce GTX 750 Ti, 640x480 RGB image, Release mode:

    Encode Image:             17.43 ms
    Compressed size: 12629
    success #0
    Encode Image:              1.39 ms
    Compressed size: 12629
    success #1
    Encode Image:              1.83 ms
    Compressed size: 12629
    success #2
    Encode Image:              1.52 ms
    Compressed size: 12629
    success #3
    Encode Image:              1.39 ms
    Compressed size: 12629
    success #4
    

    Thanks!

     

    Last edit: bryang 2016-05-06

