Can you clarify what you said above? Did you mean that you pushed the patch out to end users, and it seems to have fixed the problem on their end?
Also, what version of libjpeg-turbo are you using? What is the application? What O/S? In general terms, how does the application use libjpeg-turbo? That is, what encode parameters might be in play when this occurs?
Mozilla is generally on top of overrun issues like this, but they mostly use the decoder, so this one may have slipped by. It's certainly easy enough to check in your patch, but I want to understand first why that fixes the problem. Otherwise, I can't be sure whether it's really fixing the problem or whether it's hiding something else.
If you would like to refer to this comment somewhere else in this project, copy and paste the following link:
We have a crash reporting system (like Mozilla Crash Reporter). It can send crash report to our server. We receive hundreds of crash reports about this crash from the previous version of our product per day. But we don't receive this crash report from the new version with patched libjpeg-turbo.
If you would like to refer to this comment somewhere else in this project, copy and paste the following link:
OK, but you didn't answer my other questions. Which version of libjpeg-turbo is your app using? What operating system? What does your app do? How does it generally use libjpeg-turbo?
Any information you can provide that might narrow down the issue, such as the type of images that are being compressed, would be very helpful.
If you would like to refer to this comment somewhere else in this project, copy and paste the following link:
is_decompressor 0 unsigned char
global_state 101 int
image_width 1227 unsigned int
image_height 314 unsigned int
input_components 3 int
in_color_space JCS_EXT_BGR J_COLOR_SPACE
input_gamma 1.0000000000000000 double
scale_num 1 unsigned int
scale_denom 1 unsigned int
jpeg_width 1227 unsigned int
jpeg_height 314 unsigned int
data_precision 8 int
num_components 3 int
jpeg_color_space JCS_YCbCr J_COLOR_SPACE
num_scans 1 int
raw_data_in 0 unsigned char
arith_code 0 unsigned char
optimize_coding 0 unsigned char
CCIR601_sampling 0 unsigned char
do_fancy_downsampling 1 unsigned char
smoothing_factor 0 int
dct_method JDCT_ISLOW J_DCT_METHOD
restart_interval 0 unsigned int
restart_in_rows 0 int
write_JFIF_header 1 unsigned char
JFIF_major_version 1 unsigned char
JFIF_minor_version 1 unsigned char
density_unit 1 unsigned char
X_density 96 unsigned short
Y_density 96 unsigned short
write_Adobe_marker 0 unsigned char
next_scanline 23 unsigned int
progressive_mode 0 unsigned char
max_h_samp_factor 1 int
max_v_samp_factor 1 int
min_DCT_h_scaled_size 8 int
min_DCT_v_scaled_size 8 int
total_iMCU_rows 40 unsigned int
comps_in_scan 3 int
MCUs_per_row 154 unsigned int
MCU_rows_in_scan 40 unsigned int
blocks_in_MCU 3 int
Ss 0 int
Se 63 int
Ah 0 int
Al 0 int
block_size 8 int
lim_Se 63 int
script_space_size 0 int
call stack:
encode_one_block(working_state * state=0xdaf20db3, short * block=0x1502b330, int last_dc_val=347, c_derived_tbl * dctbl=0x15022750, c_derived_tbl * actbl=0x15022c50) Line 559 + 0xa bytes
encode_mcu_huff(jpeg_compress_struct * cinfo=0x0011dafc, short [64] * MCU_data=0x15022238) Line 619 + 0x29 bytes
compress_data(jpeg_compress_struct * cinfo=0x0011dafc, unsigned char * * input_buf=0x150222a8) Line 204 + 0x10 bytes
process_data_simple_main(jpeg_compress_struct * cinfo=0x0011dafc, unsigned char * input_buf=0x15023b50, unsigned int * in_row_ctr=0x0011d9b0, unsigned int in_rows_avail=1) Line 135 + 0x10 bytes
jpeg_write_scanlines(jpeg_compress_struct * cinfo=0x00000001, unsigned char * scanlines=0x15023b50, unsigned int num_lines=1) Line 109
We use libjpeg-turbo for coverting raw RGB buffer to JPEG file, it's one step of capturing screen.
Last edit: Zhou Wei 2014-04-28
If you would like to refer to this comment somewhere else in this project, copy and paste the following link:
We have updated new version of libjpeg-turbo one month ago. So both versions are using in our appliation (different versions). And we have received crash reports from all of them.
Last edit: Zhou Wei 2014-04-28
If you would like to refer to this comment somewhere else in this project, copy and paste the following link:
I have run the attached program through valgrind for hours in an attempt to reproduce this bug, but I cannot reproduce it at the moment. Any additional data you can provide would be helpful. You haven't even, for instance, told me what quality level you are using during compression. A sample test program that reproduces the problem on your end would be the most expedient way to move this forward. Otherwise, I have no way of knowing whether I can't reproduce the problem because I'm not exercising the library in the same way or whether I can't reproduce the problem because of differences in our systems.
I cannot readily tell from your report whether this is a bug in libjpeg-turbo that you fixed or whether it is a bug in your code that just happened to be worked around by your patch to libjpeg-turbo. The fact that no one else out of the thousands of libjpeg-turbo users (probably millions if you count the users of Firefox and Chrome) has run across this yet makes me hesitant to declare that it is a bug in libjpeg-turbo until I can discover the exact mechanism, but I really don't have the time to dig into it right now, so anything you can do to make this easier on me will be much appreciated.
Our product has 848 million monthly active users. The screenshot function of our product is be used 200 million times everyday.
Just as you say, other software mostly use the decoder, and they don't use YCrCb 4:4:4.
This crash is a very low probability event.
Based on the above three points, I think that's the reason why only we found this crash bug.
The quality level is 100. I'm sorry I can not provide more information, because we only found this the crash in the screenshot function (encode parameters: YCrCb 4:4:4, quality level 100%) and the crash reports from users are not include the raw RGB buffer of user's screen.
Last edit: Zhou Wei 2014-05-20
If you would like to refer to this comment somewhere else in this project, copy and paste the following link:
I believe you, but in order to be comfortable checking in this patch, I need to understand the theory behind the failure. That is in part to gain confidence that the patch fixes all of the underlying failure modes and not just the one you happen to be encountering.
I will study the code some more and see if I can figure out what's going on.
If you would like to refer to this comment somewhere else in this project, copy and paste the following link:
I was finally able to reproduce this. I modified the test program slightly (attached) so that it compresses directly from an 8x8 YUV image, which eliminates the loss due to RGB-to-YUV conversion. When I do that and set the YUV component values to either 0 or 255 based on the output of a random number generator, I start to find instances in which the Huffman codec degenerates and its output is slightly larger than its input, thus overrunning the 128-byte local buffer. The other key to reproducing the issue is allowing libjpeg-turbo to dynamically grow the output buffer as it needs to. This is the only way to activate the local buffer code in the Huffman encoder. The previous test program was always allocating the output buffer to the worst-case size, so the local buffer was never activated.
Even though I'm doing the most I can to stack the deck in favor of the bug, it still only occurs once in about every 25 million iterations.
I have checked in your patch without modifications.
So far, 130 bytes is the maximum I have been able to produce for a single MCU using the test, after literally a billion iterations. However, I am running it overnight so I can collect as many cases of degenerate input images as I can, so I can hopefully determine what the upper bound is for the local buffer.
I need exact reproduction steps.
Sorry, I can't reproduce this bug. I got this crash information from minidump files uploaded by normal users.
After I change the size of array, I haven't received this crash anymore.
_buffer[BUFFSIZE] => _buffer[BUFFSIZE + 1]
Last edit: Zhou Wei 2014-04-25
Can you clarify what you said above? Did you mean that you pushed the patch out to end users, and it seems to have fixed the problem on their end?
Also, what version of libjpeg-turbo are you using? What is the application? What O/S? In general terms, how does the application use libjpeg-turbo? That is, what encode parameters might be in play when this occurs?
Mozilla is generally on top of overrun issues like this, but they mostly use the decoder, so this one may have slipped by. It's certainly easy enough to check in your patch, but I want to understand first why that fixes the problem. Otherwise, I can't be sure whether it's really fixing the problem or whether it's hiding something else.
We have a crash reporting system (like Mozilla Crash Reporter). It can send crash report to our server. We receive hundreds of crash reports about this crash from the previous version of our product per day. But we don't receive this crash report from the new version with patched libjpeg-turbo.
OK, but you didn't answer my other questions. Which version of libjpeg-turbo is your app using? What operating system? What does your app do? How does it generally use libjpeg-turbo?
Any information you can provide that might narrow down the issue, such as the type of images that are being compressed, would be very helpful.
libjpeg-turbo version: 1.3.0, 1.3.1 (VC2010, JPEG_LIB_VERSION = 80)
O/S: WinXP, Win7, Win8 (x86)
encode parameters:
call stack:
encode_one_block(working_state * state=0xdaf20db3, short * block=0x1502b330, int last_dc_val=347, c_derived_tbl * dctbl=0x15022750, c_derived_tbl * actbl=0x15022c50) Line 559 + 0xa bytes
encode_mcu_huff(jpeg_compress_struct * cinfo=0x0011dafc, short [64] * MCU_data=0x15022238) Line 619 + 0x29 bytes
compress_data(jpeg_compress_struct * cinfo=0x0011dafc, unsigned char * * input_buf=0x150222a8) Line 204 + 0x10 bytes
process_data_simple_main(jpeg_compress_struct * cinfo=0x0011dafc, unsigned char * input_buf=0x15023b50, unsigned int * in_row_ctr=0x0011d9b0, unsigned int in_rows_avail=1) Line 135 + 0x10 bytes
jpeg_write_scanlines(jpeg_compress_struct * cinfo=0x00000001, unsigned char * scanlines=0x15023b50, unsigned int num_lines=1) Line 109
We use libjpeg-turbo for coverting raw RGB buffer to JPEG file, it's one step of capturing screen.
Last edit: Zhou Wei 2014-04-28
Can you confirm whether it was libjpeg-turbo 1.3.0 or 1.3.1? Why do you list both versions?
We have updated new version of libjpeg-turbo one month ago. So both versions are using in our appliation (different versions). And we have received crash reports from all of them.
Last edit: Zhou Wei 2014-04-28
You can reproduce this bug with the following code, it's a very low probability event.
//set subsampling to 4:4:4
cinfo.comp_info[0].h_samp_factor = 1;
cinfo.comp_info[0].v_samp_factor = 1;
I have run the attached program through valgrind for hours in an attempt to reproduce this bug, but I cannot reproduce it at the moment. Any additional data you can provide would be helpful. You haven't even, for instance, told me what quality level you are using during compression. A sample test program that reproduces the problem on your end would be the most expedient way to move this forward. Otherwise, I have no way of knowing whether I can't reproduce the problem because I'm not exercising the library in the same way or whether I can't reproduce the problem because of differences in our systems.
I cannot readily tell from your report whether this is a bug in libjpeg-turbo that you fixed or whether it is a bug in your code that just happened to be worked around by your patch to libjpeg-turbo. The fact that no one else out of the thousands of libjpeg-turbo users (probably millions if you count the users of Firefox and Chrome) has run across this yet makes me hesitant to declare that it is a bug in libjpeg-turbo until I can discover the exact mechanism, but I really don't have the time to dig into it right now, so anything you can do to make this easier on me will be much appreciated.
Our product has 848 million monthly active users. The screenshot function of our product is be used 200 million times everyday.
Just as you say, other software mostly use the decoder, and they don't use YCrCb 4:4:4.
This crash is a very low probability event.
Based on the above three points, I think that's the reason why only we found this crash bug.
The quality level is 100. I'm sorry I can not provide more information, because we only found this the crash in the screenshot function (encode parameters: YCrCb 4:4:4, quality level 100%) and the crash reports from users are not include the raw RGB buffer of user's screen.
Last edit: Zhou Wei 2014-05-20
I believe you, but in order to be comfortable checking in this patch, I need to understand the theory behind the failure. That is in part to gain confidence that the patch fixes all of the underlying failure modes and not just the one you happen to be encountering.
I will study the code some more and see if I can figure out what's going on.
I was finally able to reproduce this. I modified the test program slightly (attached) so that it compresses directly from an 8x8 YUV image, which eliminates the loss due to RGB-to-YUV conversion. When I do that and set the YUV component values to either 0 or 255 based on the output of a random number generator, I start to find instances in which the Huffman codec degenerates and its output is slightly larger than its input, thus overrunning the 128-byte local buffer. The other key to reproducing the issue is allowing libjpeg-turbo to dynamically grow the output buffer as it needs to. This is the only way to activate the local buffer code in the Huffman encoder. The previous test program was always allocating the output buffer to the worst-case size, so the local buffer was never activated.
Even though I'm doing the most I can to stack the deck in favor of the bug, it still only occurs once in about every 25 million iterations.
I have checked in your patch without modifications.
So far, 130 bytes is the maximum I have been able to produce for a single MCU using the test, after literally a billion iterations. However, I am running it overnight so I can collect as many cases of degenerate input images as I can, so I can hopefully determine what the upper bound is for the local buffer.
Last edit: DRC 2014-08-21