Menu

#83 Failing tests on more exotic platforms

closed-fixed
nobody
None
1
2015-07-27
2015-01-14
Ondrej Sury
No

Hi,

libjpeg-turbo is failing on more exotic architectures as seen here:

https://buildd.debian.org/status/package.php?p=libjpeg-turbo&suite=experimental

Namely: arm64, mipsel, powerpc, s390x (maybe even more, but some archs are still waiting to be built).

Most fails with:

./djpeg -dct float -outfile testout_3x2_float.ppm testout_3x2_float_prog.jpg
md5/md5cmp f6bfab038438ed8f5522fbd33595dcdc testout_3x2_float.ppm
testout_3x2_float.ppm: FAILED. Checksum is 0e917a34193ef976b679a6b069b1be26

except mipsel that fails with:

./djpeg -dct fast -nosmooth -outfile testout_422m_ifast.ppm testout_422_ifast_opt.jpg
make[1]: *** [test] Illegal instruction

Full build logs are available when you click on "Build-Attempted".

Unfortunately I can't give you access to builder boxes, but I can help you debugging this if you send me instructions what to do...

Discussion

  • DRC

    DRC - 2015-01-14

    I have contacted Teodora (the programmer for MIPS Technologies who developed the DSPr2 code.) Hopefully he can diagnose the error. In the meantime, can you try the following so I can isolate exactly where the errors are?

    -- configure the MIPS code with --without-simd and see if the error persists. I suspect it goes away.

    -- On the other platforms, try commenting out the floating point tests and see if the rest of the tests pass. They should (I'm actively working on PowerPC right now, so I have experienced the same error.)

    The problem with the floating point DCT/IDCT, in general, is that it doesn't always produce the same bitwise results. It depends on the compiler being used, the particulars of the FPU, etc. All of the x86 compilers generate results that match the included MD5 sums, and of course the results are deterministic when the included SSE-accelerated float DCT/IDCT are used. However, non-x86 platforms tend to not produce the same bitwise results on those tests. The float DCT/IDCT are really legacy features and are not accelerated on any platform except x86, and I'm not recommending that any new SIMD extensions be implemented for those algorithms, since the float algorithms really have no advantage over the slow integer algorithms. Thus, I think the best approach would be to move those tests into an optional heading, so that they are only run when one invokes 'make floattest', or perhaps I can make it so that they only run on x86.

     
  • Ondrej Sury

    Ondrej Sury - 2015-01-14

    I have disabled float tests on all archs (for now - no point in finetunning the "experimental") and it looks much better:

    https://buildd.debian.org/status/package.php?p=libjpeg-turbo&suite=experimental

    Disabling SIMD on mipsel also seems to have helped.

     
  • DRC

    DRC - 2015-01-14

    OK, great. I'll make the floating point fix in our repository and keep you updated as to the status of the MIPS SIMD fix.

     
  • DRC

    DRC - 2015-01-15

    I'm not the expert on MIPS at all. All I can tell you is that our code will enable DSPr2 extensions under the following circumstances:

    -- If -mdspr2 is passed to the compiler, then DSPr2 is always enabled and can never be disabled.
    -- If -mdspr2 is not passed to the compiler, then the library checks /proc/cpuinfo at run time and enables DSPr2 if it contains "MIPS 74K".

    It would be worthwhile to test whether, for whatever reason, DSPr2 is being compile-time enabled. You could make the following modification, for instance, to simd/jsimd_mips.c:

    #if defined(__MIPSEL__) && defined(__mips_dsp) && (__mips_dsp_rev >= 2)
      #error "DSPr2 is compile-time enabled"
      simd_support |= JSIMD_MIPS_DSPR2;
    

    Also check and see what /proc/cpuinfo says on your test machines.

    I could envision this illegal instruction error possibly occurring if you were building the library with compile-time DSPr2 support and then running it on a machine that lacked DSPr2 support. You'll have to tell me whether or not that's the case, though.

     

    Last edit: DRC 2015-01-16
  • DRC

    DRC - 2015-01-16

    The floating point test issue is fixed at least, and the fix is available in SVN trunk or branches/1.4.x (branches/1.4.x is the stable branch where 1.4.1 is evolving, so that's probably what you want. It also contains other enhancements to the test suite that might prove useful.)

    It turns out that there are basically three sets of results that the floating point DCT/IDCT can produce:

    (1) Our SSE SIMD extensions are slightly more accurate than the C code, so one set of results is produced when those extensions are enabled.

    (2) A second set of results is produced when using the C code with 32-bit floating point math-- for instance, on a 32-bit FPU or a 64-bit FPU running in 32-bit mode. These results are also produced when running on an x86-64 CPU without the libjpeg-turbo SIMD extensions, because GCC uses SSE (which uses 32-bit float) by default to do floating point math on x86-64 (unless you specify -mfpmath=387.)

    (3) A third set of results is produced when using the C code with 64-bit floating point math-- for instance, on a 64-bit FPU.

    Since it is basically impossible for the test suite to determine which type of floating point math will be used, I punted and made it a run-time option. You can now specify:

    make test FLOATTEST=sse
    
    make test FLOATTEST=32bit
    

    or

    make test FLOATTEST=64bit
    

    to validate the test against the floating point results that you expect for a particular platform. More specifically, you should use "sse" for all x86 platforms, "32bit" for MIPS and ARM, and "64bit" for ARM64, PowerPC, and S390X.

     
  • Ondrej Sury

    Ondrej Sury - 2015-01-16

    It's not the case (although I have encountered a similar situation elsewhere), since the tests are built in one go on a buildd machine.

    /proc/cpuinfo on a porter box (might be different from buildd, but not much):

    $ cat /proc/cpuinfo
    system type : lemote-fuloong-2e-box
    machine : Unknown
    processor : 0
    cpu model : ICT Loongson-2 V0.2 FPU V0.1
    BogoMIPS : 443.90
    wait instruction : no
    microsecond timers : yes
    tlb_entries : 64
    extra interrupt vector : no
    hardware watchpoint : yes, count: 0, address/irw mask: []
    isa : mips1 mips2 mips3
    ASEs implemented :
    shadow register sets : 1
    kscratch registers : 0
    package : 0
    core : 0
    VCED exceptions : not available
    VCEI exceptions : not available

    gdb backtrace on the mipsel shows:

    Program received signal SIGILL, Illegal instruction.
    0x77f99918 in jsimd_h2v1_extrgb_merged_upsample_mips_dspr2 () at jsimd_mips_dspr2.S:863
    863 GENERATE_H2V1_MERGED_UPSAMPLE_MIPS_DSPR2 extrgb, 6, 0, 1, 2, 6, 3, 4, 5, 6
    (gdb) bt full

    0 0x77f99918 in jsimd_h2v1_extrgb_merged_upsample_mips_dspr2 () at jsimd_mips_dspr2.S:863

    No locals.

    1 0x77f965cc in jsimd_h2v1_merged_upsample (cinfo=<optimized out="">, input_buf=<optimized out="">, in_row_group_ctr=<optimized out="">, output_buf=<optimized out="">) at jsimd_mips.c:668

        mipsdspr2fct = <optimized out>
    

    2 0x77f80c5c in merged_1v_upsample (cinfo=<optimized out="">, input_buf=<optimized out="">, in_row_group_ctr=0x41f9d4, in_row_groups_avail=<optimized out="">, output_buf=0x41db9c,

    out_row_ctr=0x7fff32e8, out_rows_avail=1) at jdmerge.c:312
        upsample = <optimized out>
    

    3 0x77f7c7e0 in process_data_simple_main (cinfo=0x7fff33c0, output_buf=0x41db9c, out_row_ctr=<optimized out="">, out_rows_avail=<optimized out="">) at jdmainct.c:370

        main_ptr = 0x41f9a0
        rowgroups_avail = 8
    

    4 0x77f758cc in jpeg_read_scanlines (cinfo=0x7fff33c0, scanlines=0x41db9c, max_lines=1) at jdapistd.c:176

        row_ctr = 0
    

    5 0x0040114c in main (argc=7, argv=0x7fff3694) at djpeg.c:642

        cinfo = {err = 0x7fff333c, mem = 0x41c008, progress = 0x0, client_data = 0x0, is_decompressor = 1, global_state = 205, src = 0x41c140, image_width = 227, image_height = 149, 
          num_components = 3, jpeg_color_space = JCS_YCbCr, out_color_space = JCS_RGB, scale_num = 1, scale_denom = 1, output_gamma = 1, buffered_image = 0, raw_data_out = 0, 
          dct_method = JDCT_IFAST, do_fancy_upsampling = 0, do_block_smoothing = 1, quantize_colors = 0, dither_mode = JDITHER_FS, two_pass_quantize = 1, desired_number_of_colors = 256, 
          enable_1pass_quant = 0, enable_external_quant = 0, enable_2pass_quant = 0, output_width = 227, output_height = 149, out_color_components = 3, output_components = 3, 
          rec_outbuf_height = 1, actual_number_of_colors = 0, colormap = 0x0, output_scanline = 0, input_scan_number = 1, input_iMCU_row = 1, output_scan_number = 1, output_iMCU_row = 1, 
          coef_bits = 0x0, quant_tbl_ptrs = {0x41c170, 0x41c200, 0x0, 0x0}, dc_huff_tbl_ptrs = {0x41c290, 0x41c4d0, 0x0, 0x0}, ac_huff_tbl_ptrs = {0x41c3b0, 0x41c5f0, 0x0, 0x0}, 
          data_precision = 8, comp_info = 0x41da80, progressive_mode = 0, arith_code = 0, arith_dc_L = '\000' <repeats 15 times>, arith_dc_U = '\001' <repeats 16 times>, 
          arith_ac_K = '\005' <repeats 16 times>, restart_interval = 0, saw_JFIF_marker = 1, JFIF_major_version = 1 '\001', JFIF_minor_version = 1 '\001', density_unit = 0 '\000', 
          X_density = 1, Y_density = 1, saw_Adobe_marker = 0, Adobe_transform = 0 '\000', CCIR601_sampling = 0, marker_list = 0x0, max_h_samp_factor = 2, max_v_samp_factor = 1, 
          min_DCT_scaled_size = 8, total_iMCU_rows = 19, sample_range_limit = 0x41df80 "", comps_in_scan = 3, cur_comp_info = {0x41da80, 0x41dad4, 0x41db28, 0x0}, MCUs_per_row = 15, 
          MCU_rows_in_scan = 19, blocks_in_MCU = 4, MCU_membership = {0, 0, 1, 2, 0, 0, 0, 0, 0, 0}, Ss = 0, Se = 63, Ah = 0, Al = 0, unread_marker = 0, master = 0x41de60, main = 0x41f9a0, 
          coef = 0x41f8a0, post = 0x41f430, inputctl = 0x41c120, marker = 0x41c070, entropy = 0x41f7b0, idct = 0x41f450, upsample = 0x41e400, cconvert = 0x0, cquantize = 0x0}
        jerr = {error_exit = 0x77f86224 <error_exit>, emit_message = 0x77f85fa0 <emit_message>, output_message = 0x77f8618c <output_message>, format_message = 0x77f86038 <format_message>, 
          reset_error_mgr = 0x77f86028 <reset_error_mgr>, msg_code = 105, msg_parm = {i = {0, 63, 0, 0, 0, 0, 0, 0}, 
            s = "\000\000\000\000?", '\000' <repeats 35 times>, "\344A\374w\354A\374w\364A\374w", '\000' <repeats 27 times>}, trace_level = 0, num_warnings = 0, 
          jpeg_message_table = 0x77fc0e00 <jpeg_std_message_table>, last_jpeg_message = 126, addon_message_table = 0x417f4c <cdjpeg_message_table>, first_addon_message = 1000, 
          last_addon_message = 1044}
        file_index = <optimized out>
        dest_mgr = 0x41db80
        input_file = 0x41c770
        output_file = 0x41c8e0
        inbuffer = 0x0
        insize = <optimized out>
        num_scanlines = <optimized out>
    

    Would compiling with -O0 help or is this enough?

     
  • DRC

    DRC - 2015-01-16

    Based on your /proc/cpuinfo, it doesn't appear that your CPU supports DSPr2 instructions-- I could be wrong about that, though. Like I said, I'm not the expert. All of the MIPS code was submitted, and I have not been able to contact the submitter.

    Can you verify as I suggested above whether DSPr2 is being compile-time enabled? If so, then I don't think that's what you want. We need to figure out how to build it such that it checks /proc/cpuinfo at run time instead.

     
  • Ondrej Sury

    Ondrej Sury - 2015-01-20

    Sourceforge hates me and forced me to re-login while loosing previous message. Anyway...

    I have enhanced your patch to error-out on every condition and it takes the run-time path, so DSPr2 is not enforced. The all previous builds have all failed on "Invalid instruction", so there must be something else going on.

    Should I patch the code to print out warning every time DSPr2 is picked on run-time and test again?

     
  • DRC

    DRC - 2015-01-20

    Yes, as a sanity check, I would suggest adding a print statement to all of the functions in simd/jsimd_mips.c. That will determine if any of them is actually called (other than the can functions, of course. Those are always called.)

    If no DSPr2 functions are being called, then the next thing I would check is whether somehow the C flags are different between the "normal" build and the --without-simd build. That is, execute 'make V=1' with both builds and diff the outputs to see if perhaps a different flag is being inserted in the SIMD-enabled build.

    Before writing this comment, I double-checked the code to make sure that, at least to my eye, none of the above issues exist, but actually inserting the print statements is the only way to know for sure. I would happily test all of the above myself if I had access to a MIPS machine.

     

    Last edit: DRC 2015-01-20
  • Ondrej Sury

    Ondrej Sury - 2015-01-21

    Gosh, SF has logged me out again and lost the long comment. Let's make it short now then.

    The debugging fprintf() helped. The _h2v[12]can* functions were missing init_simd() call and thus the call was always true (and probably optimized out due ~0U initialization).

    Attached patches fixes the issue (and I have also checked rest of the code and it seems to be ok).

     
  • DRC

    DRC - 2015-01-21

    Yes, I've unfortunately had to get in the habit of copying my comments to the clipboard before hitting "Post", because SourceForge eats them sometimes. I haven't had the time to submit a bug report on that-- but more than likely, someone already beat me to it.

    I've checked in your patch to trunk and branches/1.4.x. Thanks for helping diagnose this.

     
  • DRC

    DRC - 2015-01-21
    • status: open --> closed-fixed
     
  • DRC

    DRC - 2015-05-15

    Question:

    Is there a mechanism for pushing out somewhat bleeding edge code to the Debian community, sort of like Fedora? I have SIMD support for PowerPC/AltiVec that is implemented in the libjpeg-turbo SVN trunk but needs testing by the community. Since I have no idea when libjpeg-turbo 1.5 will be released (probably not until 2016, because it's waiting on funding for several new features), it would be nice to put the AltiVec code out in a point release. However, I'm not comfortable doing that without some indication that it doesn't break builds or introduce other major bugs.

     
    • Ondrej Sury

      Ondrej Sury - 2015-05-22

      There's nothing formal, but if you ask me, I can prepare and upload anything you want to spread around into Debian experimental.

      Just don't force me to work with subversion please :)

       
  • DRC

    DRC - 2015-05-22

    Well, the code is not released yet-- that's kind of the point of this. I want to get feedback on it to figure out how stable it is and whether it would make sense to push it out sooner rather than later. You can use git-svn to check it out:

    git svn clone svn://svn.code.sf.net/p/libjpeg-turbo/code/trunk libjpeg-turbo-experimental
    

    or I can generate a tarball if you'd prefer that.

     
  • DRC

    DRC - 2015-07-27

    We're now on GitHub, so you can grab the AltiVec code from the master branch:

    https://github.com/libjpeg-turbo/libjpeg-turbo

    Would appreciate any testing you can do. If I can get at least some sense that the code isn't broken, I'll push it out in 1.4.2.

     
    • Ondrej Sury

      Ondrej Sury - 2015-07-28
       
      • DRC

        DRC - 2015-07-28

        Awesome. Thanks!

         

Log in to post a comment.