#84 jdcolor null_convert more expensive than rgb_rgb_convert path

closed-fixed
nobody
None
5
2015-01-20
2015-01-20
No

At least on my system, the "fast-path" null color converter seems to do more harm than good.

Environment

  • 64-bit Fedora 21
  • Intel Core i5-2430M
  • libjpeg-turbo r1519
  • gcc 4.9.2
  • 20000 x 20000 RGB JPEG created with cjpeg

Baseline performance

Reading the JPEG with djpeg, I get this runtime and perf profile:

$ time ~/jpeg/bin/djpeg -outfile /dev/null test.jpg 
real    0m2.951s
user    0m2.933s
sys     0m0.020s
$ perf record ~/jpeg/bin/djpeg -outfile /dev/null test.jpg 
$ perf report -n --stdio --percent-limit 1
# Samples: 12K of event 'cycles'
# Event count (approx.): 8747033111
#
# Overhead       Samples  Command  Shared Object      Symbol                              
# ........  ............  .......  .................  ....................................
#
    32.89%          3937    djpeg  libjpeg.so.62.1.0  [.] decode_mcu                      
    27.71%          3329    djpeg  libjpeg.so.62.1.0  [.] null_convert                    
    19.23%          2311    djpeg  libjpeg.so.62.1.0  [.] jsimd_idct_islow_sse2.column_end
     8.79%          1057    djpeg  libjpeg.so.62.1.0  [.] decompress_onepass              
     3.68%           438    djpeg  libjpeg.so.62.1.0  [.] jsimd_idct_islow_sse2.columnDCT 
     3.19%           382    djpeg  libjpeg.so.62.1.0  [.] jsimd_idct_islow_sse2           
     1.73%           209    djpeg  libc-2.20.so       [.] __memset_sse2                   

rgb_rgb_convert instead of null_convert

With this change:

--- jdcolor.c   (revision 1519)
+++ jdcolor.c   (working copy)
@@ -797,13 +797,7 @@
     } else if (cinfo->jpeg_color_space == JCS_GRAYSCALE) {
       cconvert->pub.color_convert = gray_rgb_convert;
     } else if (cinfo->jpeg_color_space == JCS_RGB) {
-      if (rgb_red[cinfo->out_color_space] == 0 &&
-          rgb_green[cinfo->out_color_space] == 1 &&
-          rgb_blue[cinfo->out_color_space] == 2 &&
-          rgb_pixelsize[cinfo->out_color_space] == 3)
-        cconvert->pub.color_convert = null_convert;
-      else
-        cconvert->pub.color_convert = rgb_rgb_convert;
+      cconvert->pub.color_convert = rgb_rgb_convert;
     } else
       ERREXIT(cinfo, JERR_CONVERSION_NOTIMPL);
     break;

...djpeg produces identical output and improved performance:

$ time ~/jpeg/bin/djpeg -outfile /dev/null test.jpg 
real    0m2.668s
user    0m2.643s
sys     0m0.026s
$ perf record ~/jpeg/bin/djpeg -outfile /dev/null test.jpg 
$ perf report -n --stdio --percent-limit 1
# Samples: 10K of event 'cycles'
# Event count (approx.): 7842096154
#
# Overhead       Samples  Command  Shared Object      Symbol                              
# ........  ............  .......  .................  ....................................
#
    36.18%          3898    djpeg  libjpeg.so.62.1.0  [.] decode_mcu                      
    21.43%          2317    djpeg  libjpeg.so.62.1.0  [.] jsimd_idct_islow_sse2.column_end
    19.28%          2087    djpeg  libjpeg.so.62.1.0  [.] rgb_rgb_convert                 
     9.63%          1044    djpeg  libjpeg.so.62.1.0  [.] decompress_onepass              
     4.63%           496    djpeg  libjpeg.so.62.1.0  [.] jsimd_idct_islow_sse2.columnDCT 
     3.93%           426    djpeg  libjpeg.so.62.1.0  [.] jsimd_idct_islow_sse2           
     2.02%           217    djpeg  libc-2.20.so       [.] __memset_sse2                   

Discussion

  • DRC - 2015-01-20

    Confirmed that this is the case for 64-bit code, but with 32-bit code the RGB-to-RGB conversion routine is slower than the null converter, for some reason.

    Thus, I adapted the basic algorithm used by the RGB-to-RGB conversion routine and created a simplified version of it for null conversion. This is now used as a "fast path" whenever the number of components is 3 or 4, and it is significantly faster with both 64-bit and 32-bit code. The patch has been checked into trunk and branches/1.4.x. The overall speedup is about 5-20% for 64-bit compression, 10-30% for 64-bit decompression, 0-3% for 32-bit compression, and 3-12% for 32-bit decompression. These figures were measured by patching turbojpeg.c to generate RGB JPEGs instead of YCbCr JPEGs and running tjbench with these images:
    http://www.libjpeg-turbo.org/About/Performance

    Also confirmed that compressing/decompressing CMYK images is sped up by the same amount (assuming that the JPEG image uses the CMYK colorspace, not the YCCK colorspace.)
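
    For readers who want to see the idea behind the 3- and 4-component fast path, here is a minimal standalone sketch. It is not the committed patch. The names sample_t, interleave_generic, and interleave_fast are hypothetical, and the sketch assumes the generic null conversion copies one component plane at a time with strided writes, while the fast path walks the pixels once and writes every component of a pixel back to back, as an RGB-to-RGB style loop would.

```c
#include <assert.h>
#include <stddef.h>

typedef unsigned char sample_t;  /* hypothetical stand-in for libjpeg's JSAMPLE */

/* Assumed generic path: one pass per component plane, strided writes. */
static void interleave_generic(const sample_t *const planes[], int ncomp,
                               size_t ncols, sample_t *out)
{
  assert(ncomp > 0);
  for (int ci = 0; ci < ncomp; ci++) {
    const sample_t *in = planes[ci];
    sample_t *o = out + ci;            /* first slot for this component */
    for (size_t col = 0; col < ncols; col++) {
      *o = in[col];
      o += ncomp;                      /* skip over the other components */
    }
  }
}

/* Sketch of a fast path: a single pass over the pixels with sequential
 * writes when there are exactly 3 or 4 components; otherwise fall back. */
static void interleave_fast(const sample_t *const planes[], int ncomp,
                            size_t ncols, sample_t *out)
{
  if (ncomp == 3) {
    const sample_t *in0 = planes[0], *in1 = planes[1], *in2 = planes[2];
    for (size_t col = 0; col < ncols; col++) {
      *out++ = in0[col];
      *out++ = in1[col];
      *out++ = in2[col];
    }
  } else if (ncomp == 4) {
    const sample_t *in0 = planes[0], *in1 = planes[1];
    const sample_t *in2 = planes[2], *in3 = planes[3];
    for (size_t col = 0; col < ncols; col++) {
      *out++ = in0[col];
      *out++ = in1[col];
      *out++ = in2[col];
      *out++ = in3[col];
    }
  } else {
    interleave_generic(planes, ncomp, ncols, out);
  }
}
```

    Both functions produce the same interleaved row, so the fast path can replace the generic one without changing output. Any speed difference on a given compiler and CPU would need to be measured, as in the perf runs above.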

     
  • DRC - 2015-01-20
    • status: open --> closed-fixed
     