SIGSEGV thrown performing longjmp in jpeg.c
Swiss army knife of image processing
Brought to you by:
bfriesen
I'm running GraphicksMagick 1.3.36 built using mingw64.
The command producing the SEGV is:
gm identify -define jpeg:max-warnings=0 -verbose truncated_pool.jpg
After modifying the Makefile to turn off optimization when compiling jpeg.c, I get the expected output i.e.
$ ./gm.exe identify -define jpeg:max-warnings=0 -verbose truncated_pool.jpg
C:\Users\chris\git\GraphicsMagick-1.3.36\utilities\gm.exe identify: Premature end of JPEG file (truncated_pool.jpg).
C:\Users\chris\git\GraphicsMagick-1.3.36\utilities\gm.exe identify: Request did not return an image.
The gdb log, jpeg file and gcc version information are attached.
Please re-upload truncated_pool.jpg wrapped in an archive format such as zip or tar because SourceForge automatically re-writes JPEG, GIF, and PNG file uploads.
The gdb.txt backtrace shows that exception is zero upon entry into ReadJPEGImage(). This should not be possible. I suspect that the stack has been corrupted:
#12 0x00007ff6b3843d22 in ReadJPEGImage (image_info=0x7ff600000000, exception=0x0) at coders/jpeg.c:1603It is unlikely that I will be able to re-create this issue, particularly since it appears that optimization makes a difference. If you are able to debug the issue further on your end, that would help.
Added zipped jpeg file
I've made the following changes:
gdb_O0.txt was produced compiling coders/jpeg.c with CFLAGS = -fopenmp -g -O0 -Wall
gdb_Og.txt was produced compiling coders/jpeg.c with CFLAGS = -fopenmp -g -Og -Wall -fno-combine-stack-adjustments -fno-compare-elim -fno-cprop-registers -fno-defer-pop -fno-forward-propagate -fno-guess-branch-probability -fno-inline -fno-ipa-profile -fno-ipa-pure-const -fno-ipa-reference -fno-ipa-reference-addressable -fno-omit-frame-pointer -fno-reorder-blocks -fno-shrink-wrap -fno-split-wide-types -fno-toplevel-reorder -fno-tree-builtin-call-dce -fno-tree-ccp -fno-tree-ch -fno-tree-coalesce-vars -fno-tree-copy-prop -fno-tree-dce -fno-tree-dominator-opts -fno-tree-fre -fno-tree-sink -fno-tree-slsr -fno-tree-ter
The first test runs correctly; the second throws a SEGV.
The flags for the second test were chosen so gcc -Q --help=optimizers gives the same results as for the flags used in the first test.
Note that in the second test, the assert ( exception != ( ExceptionInfo * ) NULL ) in ReadJPEGImage doesn't trigger even though the stack trace claims exception is 0x0. My guess is that gcc is doing some sort of optimization that's causing gdb to print an incorrect call stack.
I notice too that in the second test, the breakpoint on entry to ReadJPEGImage is on line 1123 and line 1169 is the second breakpoint hit, suggesting extra preamble at the start of the function.
My suspicion is that gcc optimization and setjmp/longjmp don't play well together, in which case forcing functions that do setjmp/longjmp to build without optimization might be the right thing to do.
On Fri, 5 Mar 2021, Chris Gravely wrote:
It is indeed true that setjmp/longjmp are a terrible thing in that
they do dangerous things and are platform and CPU specific. The
setjmp "saves" the current stack and "longjmp" attempts to restore it
as well as CPU registers so that the program appears to continue where
it was right after setjmp. When this happens, any stack data which
might have been modified after the call to setjmp is restored back to
what it was before. If code then uses that stack data and expects the
modified value, there can be problems.
It is entirely possible that this version/build of GCC for Windows
might have some bug in this area. The bug could be CPU type specific
if a particular type of CPU is being targeted.
While compiling GraphicsMagick, are there any GCC warnings, and
particularly saying that a variable may be "clobbered"?
Bob
Bob Friesenhahn
bfriesen@simple.dallas.tx.us, http://www.simplesystems.org/users/bfriesen/
GraphicsMagick Maintainer, http://www.GraphicsMagick.org/
Public Key, http://www.simplesystems.org/users/bfriesen/public-key.txt
There are no warnings or errors when I recompile coders/jpeg.c and rebuild the executables
I am able to reproduce this issue using mingw64 and MSYS2. The problem appears to be directly related to '-define jpeg:max-warnings=0' and if I change it to '-define jpeg:max-warnings=1' it does not crash.
The crash does not occur under Linux and various memory checkers do not find anything wrong.
The problem is directly related to '-define jpeg:max-warnings=0'; there's only one warning issued when reading the jpeg file, so longjmp isn't called when max-warnings is one or greater.
On Sat, 6 Mar 2021, Chris Gravely wrote:
Yes. At this point there is no reason to believe that the JPEG error
handling is working reliably at all with this compiler and/or
libjpeg-turbo version. I would not trust something which only works
25-33% of the time.
The PNG code also uses setjmp/longjmp so it might be similarly
incapacitated.
Bob
Bob Friesenhahn
bfriesen@simple.dallas.tx.us, http://www.simplesystems.org/users/bfriesen/
GraphicsMagick Maintainer, http://www.GraphicsMagick.org/
Public Key, http://www.simplesystems.org/users/bfriesen/public-key.txt
I found that if I add 'volatile' before Image at line 247 of jpeg.c ('volatile Image'), it no longer crashes, although there are then many warnings.
Curioser and curioser! I thought that code edits where somehow making a difference. This is not the case. My Microsoft Windows system has 4 cores. What I have discovered is that if I run the program four times, it will usually segfault once.
By the way, I am also using GCC 10.2.0.
I just tested building under Cygwin with GCC 10.2.0 but apparently with a libjpeg which identifies itself as "IJG JPEG 80" in its version identification. In this case there are no crashes. It seems to me that the JPEG library used might have something to do with the crashes.
I've replaced libjpeg-turbo's jpeg library with an IJG one (I used jpeg-9d as that's the latest). I get crashes similar to my originals.
Do you know anything about Cygwin's setjmp/longjmp implementations. I know mingw64 uses MSVCRT implementations, but I'm wondering if the Cygwin library providing the POSIX environment has its own implementation. If so, that could explain why the test works when built under Cygwin.
On Sun, 14 Mar 2021, Chris Gravely wrote:
I don't have any knowledge about this. The original MSVCRT
implementation could have no knowledge about modern CPU types
(providing additional families of instructions such as SSE3, AVX1, and
AVX2, and additional registers) which were invented after its release.
If it was updated by Microsoft as part of system updates then it could
gain such knowledge.
It would be useful to learn if the MinGW build suffers from similar
problems with the setjmp-based error reporting other than from the one
based on the number of warnings. It is not yet clear if the
setjmp-based error reporting is entirely broken, or just for
this one code path.
In my own testing, the failures are semi-random, perhaps failing 25%
to 30% of the time.
Bob
Bob Friesenhahn
bfriesen@simple.dallas.tx.us, http://www.simplesystems.org/users/bfriesen/
GraphicsMagick Maintainer, http://www.GraphicsMagick.org/
Public Key, http://www.simplesystems.org/users/bfriesen/public-key.txt
This problem is fixed by Mercurial changeset 16516:1747878113cd. I am not sure exactly why the issue is fixed, but the change dramatically reduces the amount of data pushed to the stack, and it also relocates the jump buffer to the heap rather than the stack.
It looks as though ISO-IEC 9899 doesn't allow assignment of the setjmp result, so it might be better to remove the assignment and its associated logging, as it's possible this might cause problems in the future.
On Thu, 22 Apr 2021, Chris Gravely wrote:
That is a very interesting finding. It appears that the return value
needs to be consumed immediately as part of decision logic (e.g. a
switch statement) or not used at all.
I am not much of a fan of setjmp/longjmp but the only alternative I am
aware of requires use of threads.
Bob
Bob Friesenhahn
bfriesen@simple.dallas.tx.us, http://www.simplesystems.org/users/bfriesen/
GraphicsMagick Maintainer, http://www.GraphicsMagick.org/
Public Key, http://www.simplesystems.org/users/bfriesen/public-key.txt
A possible alternative I can think of would be to implement JPEGCoder as a C-callable C++ subsystem, which would allow you to use try...catch and throw a C++ exception instead of using setjmp/longjmp.
Having said that, in some situations the exception would get thrown through intermediate JPEG code, which lives in a separate library, and that might complicate things.
Changeset 16518:2a0438efd812 eliminates the assignment of setjmp() to
a variable. Please look/check again to make sure that all seems ok.
Bob
Bob Friesenhahn
bfriesen@simple.dallas.tx.us, http://www.simplesystems.org/users/bfriesen/
GraphicsMagick Maintainer, http://www.GraphicsMagick.org/
Public Key, http://www.simplesystems.org/users/bfriesen/public-key.txt
Sorry for the slow response. The fix looks fine to me
On Thu, 22 Apr 2021, Chris Gravely wrote:
Unfortunately, throwing C++ exceptions through C code is not portable.
In the case of JPEG all of these longjmps are through intermediate
JPEG C code in libjpeg.
I thought that I was being clever by passing the reporting line
number. Now I see that one would need to use some sort of enumeration
value instead, which is then deciphered using a switch statement.
Bob
Bob Friesenhahn
bfriesen@simple.dallas.tx.us, http://www.simplesystems.org/users/bfriesen/
GraphicsMagick Maintainer, http://www.GraphicsMagick.org/
Public Key, http://www.simplesystems.org/users/bfriesen/public-key.txt