libjpeg-turbo / Patches (closed - use GitHub) / #70 Implementation of jpeg_skip_scanlines(JDIMENSION rows) API to optimize subset decodes

DRC - 2015-06-16

Cool. I'll take a look at this ASAP. Unfortunately, any extension to the jpeg_[de]compress_struct structures will break ABI compatibility, even if the new members are at the end of the structure. The reason is that jpeg_CreateCompress() and jpeg_CreateDecompress() check the size of the structure and will throw an error if there is a size mismatch between the library and the application. However, I recently added a framework to mozjpeg that works around this issue by stashing new structure members in the opaque master structures and adding a set of accessor functions to allow those new members to be read or modified. In this case, since the new structure members don't need to be read or modified by the application, then they can just be moved into the master structure without needing to add accessor functions for them.

As far as the reused code in discard_mcu(), that should be easy to solve using macros. Will get back to you once I've evaluated the patch.

If you would like to refer to this comment somewhere else in this project, copy and paste the following link:

DRC - 2015-06-16

Two things I need you to do before I can look at this further:

(1) For all of the source files that you modified, please add a copyright notice underneath the "libjpeg-turbo Modifications" header. It should say something like "Copyright (C) 2015, Google, Inc." (or whatever name your project team would normally use for open source code attributions.) This serves two purposes: it complies with the terms of the IJG license requiring that all modifications are clearly marked, and it serves as a written record of Google's intent to introduce this code into the open source.

(2) Modify djpeg so that it can test the new jpeg_skip_scanlines() function.

If you would like to refer to this comment somewhere else in this project, copy and paste the following link:

Matt Sarett - 2015-06-16

(1) Added copyright headers.

(2) Modified djpeg so it can test the new jpeg_skip_scanlines() function.

There were a few possibilities for this. In the end, I went with an approach where the user specifies on the command line that he wants an image "stripe" and then we use jpeg_skip_scanlines() to provide this stripe. This is cool because it adds a new functionality to djpeg. The drawback is that jinit_write() always writes a header for the full image in the dest_mgr. I work around this with a hack that I'm not too proud of. Also the dest_mgr still allocates unnecessary space.

This could be cleaner if we change the signature of the dest_mgr to allow it to take in the height as a parameter. I'm not sure if this is allowed.

Another option is to test by decoding the full image in stripes. This would require making multiple passes on input data source and reading part of the image each time. This is actually how I've been testing during development - I then diff the output with the decode output of a regular decode. This would mean that "stripe" mode is just a test mode, but doesn't add any functionality to djpeg (except a way to decompress images slowly).

I'm happy to go with whatever approach you think makes the most sense.

(3) In order to accomplish (2), I had to learn to build libjpeg-turbo with the normal build system. This showed me that my code was causing a bunch of warnings. I added corrections to silence the warnings.

skip_v2.patch

If you would like to refer to this comment somewhere else in this project, copy and paste the following link:

DRC - 2015-06-25

A heavily modified version of this has been checked into SVN trunk, but there is an issue that I can't track down. Doing the following causes a seg fault:

./djpeg -skip 1,144 -outfile skip.ppm ../testimages/testorig.jpg

(NOTE: -skip is a new option that I added. It does the exact opposite of -stripe and is thus useful for regression testing, since it's the only way to exercise some of the code paths -- such as the path that skips lines within an iMCU row or the path that handles prefetched iMCU rows, etc.)

It appears that the segfault is occurring within the upsampler, which shouldn't ever be invoked within the body of jpeg_skip_scanlines().

I need to get this resolved before I can finish crafting the regression tests. I am attempting to design a test suite that will, as much as possible, reuse existing generated test images while still providing full coverage of the jpeg_skip_scanlines() code paths.

Also please review the code in general and let me know if you spot any issues with it, in terms of formatting or factual errors in the comments, etc. A lot of stuff was moved around in order to avoid exposing any new global symbols (except for the new jpeg_skip_scanlines function), to avoid code duplication, and to improve readability.

Last edit: DRC 2015-06-25
If you would like to refer to this comment somewhere else in this project, copy and paste the following link:

Matt Sarett - 2015-06-25

I think the issue is with:
dummy_buffer_setup (j_decompress_ptr cinfo)

In some cases it returns with the dummy_buffer unallocated. I will follow up once I know a little bit more.

If you would like to refer to this comment somewhere else in this project, copy and paste the following link:

Matt Sarett - 2015-06-25

Hmm never mind, I think that's a different bug.

If you would like to refer to this comment somewhere else in this project, copy and paste the following link:

DRC - 2015-06-25

Yeah, I think the dummy buffer needs to be initialized to NULL in jinit_master_decompress(), but doing that didn't fix the issue for me (although it made valgrind happier.)

If you would like to refer to this comment somewhere else in this project, copy and paste the following link:

Matt Sarett - 2015-06-25

Yeah that was me causing me some seg faults, but did not solve the first issue. I actually just figured out that the first seg fault is happening because set_wraparound_pointers() is never getting called. I don't have a fix yet, but I'm working on it.

If you would like to refer to this comment somewhere else in this project, copy and paste the following link:

Matt Sarett - 2015-06-25

As I mentioned above, the bug you described was caused by never calling get_wraparound_pointers() after skipping the first iMCU block. I am attaching a patch that corrects this behavior and that I expect to work in all cases. I also took a closer look at get_bottom_pointers() as well - as far as I can tell this will work properly for any usage of read() and skip(). Nice catch on this one! It slipped by my tests.

Also, as you mentioned above, the dummy buffer needs to be initialized to NULL in jinit_master_decompress().

Thanks so much for all your efforts in testing/integrating this patch! I haven't done a thorough read through yet, but from what I've seen while I've been debugging, it looks great. I'm excited about the impact this will have.

fix.patch

If you would like to refer to this comment somewhere else in this project, copy and paste the following link:

Matt Sarett - 2015-06-25

As a whole the patch looks great! I just finished reading through the diffs.

I think the following comment doesn't make too much sense anymore since your reorganization gives us access to my_coef_struct.
/ Another
* advantage of discarding coefficients is that it allows us to avoid
* accessing the private field cinfo->coef->MCU_buffer (which would
* normally be a parameter to decode_mcu().)
/

But I'm just nitpicking there.

If you would like to refer to this comment somewhere else in this project, copy and paste the following link:

DRC - 2015-06-27

This has been fully implemented and verified in SVN trunk. In addition to the initial integration/rewrite of the feature and testing, then subsequent integration of the bug fixes, I also had to design a regression test suite that fully covers all of the code paths in the jpeg_skip_scanlines() function with the least possible amount of tests (pretty tedious work.) This test suite additionally had to be ported to 12-bit samples and to the Windows build system, and building/running on Windows revealed some other issues that needed fixing. Overall, my work on this feature has required about the equivalent of $2000 in unpaid labor, and that was in addition to thousands of dollars more that I already donated to rewrite the RGB 565 feature and other stuff that Linaro initially contributed in the context of getting libjpeg-turbo to work with Android.

For completeness, it would be nice if this feature was extended into the TurboJPEG API (which would allow for benchmarking it), but at this point, doing all of this Android work has actually created a financial hardship for me, so I simply cannot continue adding features without funding. The libjpeg-turbo Project, because it has no "General Fund" for me to draw from to fund this kind of work, ends up requiring me to mostly contribute my labor to it pro bono. I'm happy to do that for bug fixes, etc., but features are always difficult to integrate, even if someone else is writing the initial code. To put it another way-- there's no such thing as a free patch. :) Regardless of how good a patch is (and yours was good-- don't get me wrong), it always just takes a lot of time to integrate properly. There are just too many platforms and compilers to test, too many things to do to make the code conform to the existing (20-year-old) coding standards, too many tests to run to make sure everything is stable and performant. I wanted to go ahead and get libjpeg-turbo into Android so I could check that off the list, but implementing jpeg_skip_scanlines() pretty much broke the bank. In the future, I am simply not going to be able to accept any non-trivial patches at all without financial sponsorship, unless a company steps forward to sponsor a General Fund for maintaining libjpeg-turbo.

In short, hopefully this feature is sufficient to get libjpeg-turbo into Android, and any subsequent patches will be minor bug fixes. If I do any more major work on Android stuff pro bono, I'm literally going to go broke. If you know anyone within Google who might be able to kick a grant my way, I would appreciate it if you could put in a good word with them.

Last edit: DRC 2015-06-27

If you would like to refer to this comment somewhere else in this project, copy and paste the following link:

DRC - 2015-06-27

status: open --> closed-integrated
If you would like to refer to this comment somewhere else in this project, copy and paste the following link:

Matt Sarett - 2015-07-01

It's good to hear where you're coming from. Thanks!

On another note, I think I may have discovered a memory leak bug.

We currently free the dummy_row_buffer in jpeg_finish_output, which I think it only called in buffered image mode. I think we might want to free it in jpeg_finish_decompress? Let me know what you think!

If you would like to refer to this comment somewhere else in this project, copy and paste the following link:

DRC - 2015-07-01

Fixed in trunk. Instead of using jpeg_get_small(), dummy_buffer_setup() now uses (*cinfo->mem->alloc_small) (..., JPOOL_IMAGE, ...) instead, which is consistent with other parts of the libjpeg code. This allocates the memory from the libjpeg image pool, so it will be freed automatically whenever jpeg_abort_decompress() or jpeg_destroy()/jpeg_destroy_decompress() is called. The master structure is freed at the same time, so it should not be necessary to explicitly set the dummy buffer pointer to NULL (before jpeg_skip_scanlines() can be called again, the calling program will have had to first call jpeg_start_decompress(), which will re-allocate the master struct.)

If you would like to refer to this comment somewhere else in this project, copy and paste the following link:

Matt Sarett - 2015-07-21

If you're interested, I'm attaching another patch to fix a valgrind error.

Valgrind complains on color conversion to 565, when skip() calls read() to read into the dummy buffer. Valgrind is upset because the dummy read() uses table lookup on garbage data.

This can be fixed by adding a noop_convert() routine that we use on dummy reads. This has additional positive consequences of eliminating the need for a dummy row buffer and eliminating a step on dummy reads.

valgrind.patch

If you would like to refer to this comment somewhere else in this project, copy and paste the following link:

DRC - 2015-07-21

Nice! I'll get that integrated as soon as the repositories are back online.

If you would like to refer to this comment somewhere else in this project, copy and paste the following link:

Implementation of jpeg_skip_scanlines(JDIMENSION rows) API to optimize...

SIMD-accelerated libjpeg-compatible JPEG codec library

Searches

Help

#70 Implementation of jpeg_skip_scanlines(JDIMENSION rows) API to optimize subset decodes

Discussion