Ondrej Sury - 2015-01-14

Attaching build logs for reference.

libjpeg-turbo-1.4.0-arm64.log

libjpeg-turbo-1.4.0-mipsel.log

libjpeg-turbo-1.4.0-powerpc.log

libjpeg-turbo-1.4.0-s390x.log

DRC - 2015-01-14

I have contacted Teodora (the programmer for MIPS Technologies who developed the DSPr2 code.) Hopefully he can diagnose the error. In the meantime, can you try the following so I can isolate exactly where the errors are?

-- configure the MIPS code with --without-simd and see if the error persists. I suspect it goes away.

-- On the other platforms, try commenting out the floating point tests and see if the rest of the tests pass. They should (I'm actively working on PowerPC right now, so I have experienced the same error.)

The problem with the floating point DCT/IDCT, in general, is that it doesn't always produce the same bitwise results. It depends on the compiler being used, the particulars of the FPU, etc. All of the x86 compilers generate results that match the included MD5 sums, and of course the results are deterministic when the included SSE-accelerated float DCT/IDCT are used. However, non-x86 platforms tend to not produce the same bitwise results on those tests. The float DCT/IDCT are really legacy features and are not accelerated on any platform except x86, and I'm not recommending that any new SIMD extensions be implemented for those algorithms, since the float algorithms really have no advantage over the slow integer algorithms. Thus, I think the best approach would be to move those tests into an optional heading, so that they are only run when one invokes 'make floattest', or perhaps I can make it so that they only run on x86.

Ondrej Sury - 2015-01-14

I have disabled the float tests on all archs (for now; there is no point in fine-tuning "experimental"), and it looks much better:

https://buildd.debian.org/status/package.php?p=libjpeg-turbo&suite=experimental

Disabling SIMD on mipsel also seems to have helped.

DRC - 2015-01-14

OK, great. I'll make the floating point fix in our repository and keep you updated as to the status of the MIPS SIMD fix.

Ondrej Sury - 2015-01-15

Looks like there's the same problem on mips:

https://buildd.debian.org/status/fetch.php?pkg=libjpeg-turbo&arch=mips&ver=1%3A1.4.0-2&stamp=1421269002

(mips and mipsel differ only in endianness)

DRC - 2015-01-15

I'm not the expert on MIPS at all. All I can tell you is that our code will enable DSPr2 extensions under the following circumstances:

-- If -mdspr2 is passed to the compiler, then DSPr2 is always enabled and can never be disabled.
-- If -mdspr2 is not passed to the compiler, then the library checks /proc/cpuinfo at run time and enables DSPr2 if it contains "MIPS 74K".
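
For illustration, the run-time half of this logic can be sketched in C as below. This is a minimal sketch, not the code in simd/jsimd_mips.c: the function names stream_mentions() and cpuinfo_mentions_74k() are hypothetical, and only the "MIPS 74K" string and the /proc/cpuinfo path come from the description above.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Returns 1 if any line read from fp contains needle, otherwise 0.
 * Lines longer than the buffer are scanned in chunks. */
int stream_mentions(FILE *fp, const char *needle)
{
  char line[512];

  assert(fp != NULL && needle != NULL);
  while (fgets(line, sizeof(line), fp) != NULL) {
    if (strstr(line, needle) != NULL)
      return 1;
  }
  return 0;
}

/* Hypothetical helper: returns 1 if the cpuinfo-style file at path
 * mentions "MIPS 74K".  An unreadable file is treated as "no DSPr2". */
int cpuinfo_mentions_74k(const char *path)
{
  FILE *fp = fopen(path, "r");
  int found;

  if (fp == NULL)
    return 0;
  found = stream_mentions(fp, "MIPS 74K");
  fclose(fp);
  return found;
}
```

Calling cpuinfo_mentions_74k("/proc/cpuinfo") returns 1 only when some line of that file contains "MIPS 74K". Treating an unreadable file as "not supported" is a design choice of this sketch: it errs on the side of the plain C code path.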

It would be worthwhile to test whether, for whatever reason, DSPr2 is being compile-time enabled. You could make the following modification, for instance, to simd/jsimd_mips.c:

#if defined(__MIPSEL__) && defined(__mips_dsp) && (__mips_dsp_rev >= 2)
#error "DSPr2 is compile-time enabled"
  simd_support |= JSIMD_MIPS_DSPR2;

Also check and see what /proc/cpuinfo says on your test machines.

I could envision this illegal instruction error possibly occurring if you were building the library with compile-time DSPr2 support and then running it on a machine that lacked DSPr2 support. You'll have to tell me whether or not that's the case, though.

Last edit: DRC 2015-01-16

DRC - 2015-01-16

The floating point test issue is fixed at least, and the fix is available in SVN trunk or branches/1.4.x (branches/1.4.x is the stable branch where 1.4.1 is evolving, so that's probably what you want. It also contains other enhancements to the test suite that might prove useful.)

It turns out that there are basically three sets of results that the floating point DCT/IDCT can produce:

(1) Our SSE SIMD extensions are slightly more accurate than the C code, so one set of results is produced when those extensions are enabled.

(2) A second set of results is produced when using the C code with 32-bit floating point math-- for instance, on a 32-bit FPU or a 64-bit FPU running in 32-bit mode. These results are also produced when running on an x86-64 CPU without the libjpeg-turbo SIMD extensions, because GCC uses SSE (which uses 32-bit float) by default to do floating point math on x86-64 (unless you specify -mfpmath=387.)

(3) A third set of results is produced when using the C code with 64-bit floating point math-- for instance, on a 64-bit FPU.
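
To illustrate why cases (2) and (3) can differ bit for bit, here is a minimal sketch in C. It is not taken from the libjpeg-turbo DCT code. The function names are hypothetical, and the sketch assumes IEEE 754 single and double precision with the default round-to-nearest mode. The volatile intermediates force each sum to be rounded to the stated width, even on FPUs that otherwise keep extra precision in registers.

```c
#include <assert.h>

/* Computes (a + b) - a with the intermediate sum rounded to 32-bit float. */
float sum_then_subtract_32(float a, float b)
{
  volatile float sum;

  assert(sizeof(float) == 4);  /* assumption of this sketch */
  sum = a + b;
  return sum - a;
}

/* Computes (a + b) - a with the intermediate sum kept as 64-bit double. */
double sum_then_subtract_64(float a, float b)
{
  volatile double sum;

  assert(sizeof(double) == 8);  /* assumption of this sketch */
  sum = (double)a + (double)b;
  return sum - (double)a;
}
```

With a = 16777216.0f (2^24) and b = 1.0f, the 32-bit version returns 0 because 2^24 + 1 is not representable in single precision and rounds back to 2^24, whereas the 64-bit version returns 1. A long chain of DCT multiplies and adds accumulates many such small differences, which is enough to change an MD5 sum.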

Since it is basically impossible for the test suite to determine which type of floating point math will be used, I punted and made it a run-time option. You can now specify:

make test FLOATTEST=sse

make test FLOATTEST=32bit

or

make test FLOATTEST=64bit

to validate the test against the floating point results that you expect for a particular platform. More specifically, you should use "sse" for all x86 platforms, "32bit" for MIPS and ARM, and "64bit" for ARM64, PowerPC, and S390X.
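
The platform-to-value mapping in the previous paragraph can be written down as a small lookup table. The following C sketch is only an illustration: the struct, the function name floattest_for_arch(), and the architecture labels used as keys are hypothetical, while the FLOATTEST values are the ones listed above.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* One row of the mapping: architecture label -> FLOATTEST value. */
struct floattest_rule {
  const char *arch;       /* hypothetical label, not a build-system name */
  const char *floattest;  /* value to pass as FLOATTEST=... */
};

static const struct floattest_rule floattest_rules[] = {
  { "x86",     "sse"   },
  { "x86-64",  "sse"   },
  { "mips",    "32bit" },
  { "arm",     "32bit" },
  { "arm64",   "64bit" },
  { "powerpc", "64bit" },
  { "s390x",   "64bit" },
};

/* Returns the FLOATTEST value for arch, or NULL if arch is not listed. */
const char *floattest_for_arch(const char *arch)
{
  size_t i;

  assert(arch != NULL);
  for (i = 0; i < sizeof(floattest_rules) / sizeof(floattest_rules[0]); i++) {
    if (strcmp(floattest_rules[i].arch, arch) == 0)
      return floattest_rules[i].floattest;
  }
  return NULL;
}
```

Returning NULL for an unlisted platform leaves the decision to the caller, which matches the point that the test suite cannot determine the floating point behavior on its own.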

It's not the case (although I have encountered a similar situation elsewhere), since the tests are built in one go on a buildd machine.

/proc/cpuinfo on a porter box (might be different from buildd, but not much):

$ cat /proc/cpuinfo
system type : lemote-fuloong-2e-box
machine : Unknown
processor : 0
cpu model : ICT Loongson-2 V0.2 FPU V0.1
BogoMIPS : 443.90
wait instruction : no
microsecond timers : yes
tlb_entries : 64
extra interrupt vector : no
hardware watchpoint : yes, count: 0, address/irw mask: []
isa : mips1 mips2 mips3
ASEs implemented :
shadow register sets : 1
kscratch registers : 0
package : 0
core : 0
VCED exceptions : not available
VCEI exceptions : not available

gdb backtrace on the mipsel shows:

Program received signal SIGILL, Illegal instruction.
0x77f99918 in jsimd_h2v1_extrgb_merged_upsample_mips_dspr2 () at jsimd_mips_dspr2.S:863
863 GENERATE_H2V1_MERGED_UPSAMPLE_MIPS_DSPR2 extrgb, 6, 0, 1, 2, 6, 3, 4, 5, 6
(gdb) bt full

#0  0x77f99918 in jsimd_h2v1_extrgb_merged_upsample_mips_dspr2 () at jsimd_mips_dspr2.S:863

No locals.

#1  0x77f965cc in jsimd_h2v1_merged_upsample (cinfo=<optimized out>, input_buf=<optimized out>, in_row_group_ctr=<optimized out>, output_buf=<optimized out>) at jsimd_mips.c:668

    mipsdspr2fct = <optimized out>

#2  0x77f80c5c in merged_1v_upsample (cinfo=<optimized out>, input_buf=<optimized out>, in_row_group_ctr=0x41f9d4, in_row_groups_avail=<optimized out>, output_buf=0x41db9c, out_row_ctr=0x7fff32e8, out_rows_avail=1) at jdmerge.c:312

    upsample = <optimized out>

#3  0x77f7c7e0 in process_data_simple_main (cinfo=0x7fff33c0, output_buf=0x41db9c, out_row_ctr=<optimized out>, out_rows_avail=<optimized out>) at jdmainct.c:370

    main_ptr = 0x41f9a0
    rowgroups_avail = 8

#4  0x77f758cc in jpeg_read_scanlines (cinfo=0x7fff33c0, scanlines=0x41db9c, max_lines=1) at jdapistd.c:176

    row_ctr = 0

#5  0x0040114c in main (argc=7, argv=0x7fff3694) at djpeg.c:642

    cinfo = {err = 0x7fff333c, mem = 0x41c008, progress = 0x0, client_data = 0x0, is_decompressor = 1, global_state = 205, src = 0x41c140, image_width = 227, image_height = 149, 
      num_components = 3, jpeg_color_space = JCS_YCbCr, out_color_space = JCS_RGB, scale_num = 1, scale_denom = 1, output_gamma = 1, buffered_image = 0, raw_data_out = 0, 
      dct_method = JDCT_IFAST, do_fancy_upsampling = 0, do_block_smoothing = 1, quantize_colors = 0, dither_mode = JDITHER_FS, two_pass_quantize = 1, desired_number_of_colors = 256, 
      enable_1pass_quant = 0, enable_external_quant = 0, enable_2pass_quant = 0, output_width = 227, output_height = 149, out_color_components = 3, output_components = 3, 
      rec_outbuf_height = 1, actual_number_of_colors = 0, colormap = 0x0, output_scanline = 0, input_scan_number = 1, input_iMCU_row = 1, output_scan_number = 1, output_iMCU_row = 1, 
      coef_bits = 0x0, quant_tbl_ptrs = {0x41c170, 0x41c200, 0x0, 0x0}, dc_huff_tbl_ptrs = {0x41c290, 0x41c4d0, 0x0, 0x0}, ac_huff_tbl_ptrs = {0x41c3b0, 0x41c5f0, 0x0, 0x0}, 
      data_precision = 8, comp_info = 0x41da80, progressive_mode = 0, arith_code = 0, arith_dc_L = '\000' <repeats 15 times>, arith_dc_U = '\001' <repeats 16 times>, 
      arith_ac_K = '\005' <repeats 16 times>, restart_interval = 0, saw_JFIF_marker = 1, JFIF_major_version = 1 '\001', JFIF_minor_version = 1 '\001', density_unit = 0 '\000', 
      X_density = 1, Y_density = 1, saw_Adobe_marker = 0, Adobe_transform = 0 '\000', CCIR601_sampling = 0, marker_list = 0x0, max_h_samp_factor = 2, max_v_samp_factor = 1, 
      min_DCT_scaled_size = 8, total_iMCU_rows = 19, sample_range_limit = 0x41df80 "", comps_in_scan = 3, cur_comp_info = {0x41da80, 0x41dad4, 0x41db28, 0x0}, MCUs_per_row = 15, 
      MCU_rows_in_scan = 19, blocks_in_MCU = 4, MCU_membership = {0, 0, 1, 2, 0, 0, 0, 0, 0, 0}, Ss = 0, Se = 63, Ah = 0, Al = 0, unread_marker = 0, master = 0x41de60, main = 0x41f9a0, 
      coef = 0x41f8a0, post = 0x41f430, inputctl = 0x41c120, marker = 0x41c070, entropy = 0x41f7b0, idct = 0x41f450, upsample = 0x41e400, cconvert = 0x0, cquantize = 0x0}
    jerr = {error_exit = 0x77f86224 <error_exit>, emit_message = 0x77f85fa0 <emit_message>, output_message = 0x77f8618c <output_message>, format_message = 0x77f86038 <format_message>, 
      reset_error_mgr = 0x77f86028 <reset_error_mgr>, msg_code = 105, msg_parm = {i = {0, 63, 0, 0, 0, 0, 0, 0}, 
        s = "\000\000\000\000?", '\000' <repeats 35 times>, "\344A\374w\354A\374w\364A\374w", '\000' <repeats 27 times>}, trace_level = 0, num_warnings = 0, 
      jpeg_message_table = 0x77fc0e00 <jpeg_std_message_table>, last_jpeg_message = 126, addon_message_table = 0x417f4c <cdjpeg_message_table>, first_addon_message = 1000, 
      last_addon_message = 1044}
    file_index = <optimized out>
    dest_mgr = 0x41db80
    input_file = 0x41c770
    output_file = 0x41c8e0
    inbuffer = 0x0
    insize = <optimized out>
    num_scanlines = <optimized out>

Would compiling with -O0 help or is this enough?

DRC - 2015-01-16

Based on your /proc/cpuinfo, it doesn't appear that your CPU supports DSPr2 instructions-- I could be wrong about that, though. Like I said, I'm not the expert. All of the MIPS code was submitted, and I have not been able to contact the submitter.

Can you verify as I suggested above whether DSPr2 is being compile-time enabled? If so, then I don't think that's what you want. We need to figure out how to build it such that it checks /proc/cpuinfo at run time instead.

Ondrej Sury - 2015-01-20

SourceForge hates me and forced me to log in again, losing my previous message. Anyway...

I have extended your patch to error out on every branch of the condition, and the build takes the run-time path, so DSPr2 is not being forced at compile time. All of the previous builds failed with "Illegal instruction", so there must be something else going on.

Should I patch the code to print a warning every time DSPr2 is selected at run time and test again?

DRC - 2015-01-20

Yes, as a sanity check, I would suggest adding a print statement to all of the functions in simd/jsimd_mips.c. That will determine whether any of them is actually called (other than the jsimd_can_*() functions, of course; those are always called).

If no DSPr2 functions are being called, then the next thing I would check is whether somehow the C flags are different between the "normal" build and the --without-simd build. That is, execute 'make V=1' with both builds and diff the outputs to see if perhaps a different flag is being inserted in the SIMD-enabled build.

Before writing this comment, I double-checked the code to make sure that, at least to my eye, none of the above issues exist, but actually inserting the print statements is the only way to know for sure. I would happily test all of the above myself if I had access to a MIPS machine.

Last edit: DRC 2015-01-20

Ondrej Sury - 2015-01-21

Gosh, SF has logged me out again and lost the long comment. Let's make it short now then.

The debugging fprintf() helped. The jsimd_can_h2v[12]_* functions were missing the init_simd() call, so their check was always true (and was probably optimized out because of the ~0U initialization).
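
The bug pattern described here, a capability check that reads a lazily initialized feature mask before anything has initialized it, can be reproduced in a few lines of C. This is a simplified sketch and not the actual jsimd_mips.c source: DEMO_DSPR2, probed_features, and the two can_use_dspr2_*() functions are hypothetical, and the CPU probe is replaced by a constant meaning "no DSPr2".

```c
#include <assert.h>

#define DEMO_DSPR2  0x20u  /* hypothetical feature bit */

/* ~0U means "not probed yet"; note that every feature bit is set in it. */
static unsigned int simd_support = ~0U;

/* Stand-in for the real CPU probe: this machine has no SIMD features. */
static const unsigned int probed_features = 0;

static void init_simd(void)
{
  if (simd_support != ~0U)
    return;                       /* already initialized */
  simd_support = probed_features;
  assert(simd_support != ~0U);    /* the sentinel must be gone now */
}

/* Buggy check: skips init_simd(), so it may read the ~0U sentinel. */
int can_use_dspr2_buggy(void)
{
  return (simd_support & DEMO_DSPR2) != 0;
}

/* Fixed check: initializes the mask before testing the bit. */
int can_use_dspr2_fixed(void)
{
  init_simd();
  return (simd_support & DEMO_DSPR2) != 0;
}
```

If can_use_dspr2_buggy() is the first function called, it reports DSPr2 as available because the sentinel has all bits set, and the caller then dispatches to code the CPU cannot execute. Once any function has run init_simd(), both checks agree, which is why this kind of bug shows up only on certain code paths.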

The attached patch fixes the issue (I have also checked the rest of the code, and it seems to be OK).

add_missing_init_simd.patch

DRC - 2015-01-21

Yes, I've unfortunately had to get in the habit of copying my comments to the clipboard before hitting "Post", because SourceForge eats them sometimes. I haven't had the time to submit a bug report on that-- but more than likely, someone already beat me to it.

I've checked in your patch to trunk and branches/1.4.x. Thanks for helping diagnose this.

DRC - 2015-01-21

status: open --> closed-fixed

DRC - 2015-05-15

Question:

Is there a mechanism for pushing out somewhat bleeding edge code to the Debian community, sort of like Fedora? I have SIMD support for PowerPC/AltiVec that is implemented in the libjpeg-turbo SVN trunk but needs testing by the community. Since I have no idea when libjpeg-turbo 1.5 will be released (probably not until 2016, because it's waiting on funding for several new features), it would be nice to put the AltiVec code out in a point release. However, I'm not comfortable doing that without some indication that it doesn't break builds or introduce other major bugs.

- Ondrej Sury - 2015-05-22
  
  There's nothing formal, but if you ask me, I can prepare and upload anything you want to spread around into Debian experimental.
  
  Just don't force me to work with subversion please :)
  
DRC - 2015-05-22

Well, the code is not released yet-- that's kind of the point of this. I want to get feedback on it to figure out how stable it is and whether it would make sense to push it out sooner rather than later. You can use git-svn to check it out:

git svn clone svn://svn.code.sf.net/p/libjpeg-turbo/code/trunk libjpeg-turbo-experimental

or I can generate a tarball if you'd prefer that.

DRC - 2015-07-27

We're now on GitHub, so you can grab the AltiVec code from the master branch:

https://github.com/libjpeg-turbo/libjpeg-turbo

Would appreciate any testing you can do. If I can get at least some sense that the code isn't broken, I'll push it out in 1.4.2.

- Ondrej Sury - 2015-07-28
  
  https://buildd.debian.org/status/package.php?p=libjpeg-turbo&suite=experimental
  
  - DRC - 2015-07-28
    
    Awesome. Thanks!
    
#83 Failing tests on more exotic platforms
