Hi,
libjpeg-turbo is failing on more exotic architectures as seen here:
https://buildd.debian.org/status/package.php?p=libjpeg-turbo&suite=experimental
Namely: arm64, mipsel, powerpc, s390x (maybe even more, but some archs are still waiting to be built).
Most fails with:
./djpeg -dct float -outfile testout_3x2_float.ppm testout_3x2_float_prog.jpg
md5/md5cmp f6bfab038438ed8f5522fbd33595dcdc testout_3x2_float.ppm
testout_3x2_float.ppm: FAILED. Checksum is 0e917a34193ef976b679a6b069b1be26
except mipsel that fails with:
./djpeg -dct fast -nosmooth -outfile testout_422m_ifast.ppm testout_422_ifast_opt.jpg
make[1]: *** [test] Illegal instruction
Full build logs are available when you click on "Build-Attempted".
Unfortunately I can't give you access to builder boxes, but I can help you debugging this if you send me instructions what to do...
Attaching build logs for reference.
I have contacted Teodora (the programmer for MIPS Technologies who developed the DSPr2 code.) Hopefully he can diagnose the error. In the meantime, can you try the following so I can isolate exactly where the errors are?
-- configure the MIPS code with --without-simd and see if the error persists. I suspect it goes away.
-- On the other platforms, try commenting out the floating point tests and see if the rest of the tests pass. They should (I'm actively working on PowerPC right now, so I have experienced the same error.)
The problem with the floating point DCT/IDCT, in general, is that it doesn't always produce the same bitwise results. It depends on the compiler being used, the particulars of the FPU, etc. All of the x86 compilers generate results that match the included MD5 sums, and of course the results are deterministic when the included SSE-accelerated float DCT/IDCT are used. However, non-x86 platforms tend to not produce the same bitwise results on those tests. The float DCT/IDCT are really legacy features and are not accelerated on any platform except x86, and I'm not recommending that any new SIMD extensions be implemented for those algorithms, since the float algorithms really have no advantage over the slow integer algorithms. Thus, I think the best approach would be to move those tests into an optional heading, so that they are only run when one invokes 'make floattest', or perhaps I can make it so that they only run on x86.
I have disabled float tests on all archs (for now - no point in finetunning the "experimental") and it looks much better:
https://buildd.debian.org/status/package.php?p=libjpeg-turbo&suite=experimental
Disabling SIMD on mipsel also seems to have helped.
OK, great. I'll make the floating point fix in our repository and keep you updated as to the status of the MIPS SIMD fix.
Looks like there's same problem on mips:
https://buildd.debian.org/status/fetch.php?pkg=libjpeg-turbo&arch=mips&ver=1%3A1.4.0-2&stamp=1421269002
(mips and mipsel just differ in endianess)
I'm not the expert on MIPS at all. All I can tell you is that our code will enable DSPr2 extensions under the following circumstances:
-- If -mdspr2 is passed to the compiler, then DSPr2 is always enabled and can never be disabled.
-- If -mdspr2 is not passed to the compiler, then the library checks /proc/cpuinfo at run time and enables DSPr2 if it contains "MIPS 74K".
It would be worthwhile to test whether, for whatever reason, DSPr2 is being compile-time enabled. You could make the following modification, for instance, to simd/jsimd_mips.c:
Also check and see what /proc/cpuinfo says on your test machines.
I could envision this illegal instruction error possibly occurring if you were building the library with compile-time DSPr2 support and then running it on a machine that lacked DSPr2 support. You'll have to tell me whether or not that's the case, though.
Last edit: DRC 2015-01-16
The floating point test issue is fixed at least, and the fix is available in SVN trunk or branches/1.4.x (branches/1.4.x is the stable branch where 1.4.1 is evolving, so that's probably what you want. It also contains other enhancements to the test suite that might prove useful.)
It turns out that there are basically three sets of results that the floating point DCT/IDCT can produce:
(1) Our SSE SIMD extensions are slightly more accurate than the C code, so one set of results is produced when those extensions are enabled.
(2) A second set of results is produced when using the C code with 32-bit floating point math-- for instance, on a 32-bit FPU or a 64-bit FPU running in 32-bit mode. These results are also produced when running on an x86-64 CPU without the libjpeg-turbo SIMD extensions, because GCC uses SSE (which uses 32-bit float) by default to do floating point math on x86-64 (unless you specify -mfpmath=387.)
(3) A third set of results is produced when using the C code with 64-bit floating point math-- for instance, on a 64-bit FPU.
Since it is basically impossible for the test suite to determine which type of floating point math will be used, I punted and made it a run-time option. You can now specify:
or
to validate the test against the floating point results that you expect for a particular platform. More specifically, you should use "sse" for all x86 platforms, "32bit" for MIPS and ARM, and "64bit" for ARM64, PowerPC, and S390X.
It's not the case (although I have encountered a similar situation elsewhere), since the tests are built in one go on a buildd machine.
/proc/cpuinfo on a porter box (might be different from buildd, but not much):
$ cat /proc/cpuinfo
system type : lemote-fuloong-2e-box
machine : Unknown
processor : 0
cpu model : ICT Loongson-2 V0.2 FPU V0.1
BogoMIPS : 443.90
wait instruction : no
microsecond timers : yes
tlb_entries : 64
extra interrupt vector : no
hardware watchpoint : yes, count: 0, address/irw mask: []
isa : mips1 mips2 mips3
ASEs implemented :
shadow register sets : 1
kscratch registers : 0
package : 0
core : 0
VCED exceptions : not available
VCEI exceptions : not available
gdb backtrace on the mipsel shows:
Program received signal SIGILL, Illegal instruction.
0x77f99918 in jsimd_h2v1_extrgb_merged_upsample_mips_dspr2 () at jsimd_mips_dspr2.S:863
863 GENERATE_H2V1_MERGED_UPSAMPLE_MIPS_DSPR2 extrgb, 6, 0, 1, 2, 6, 3, 4, 5, 6
(gdb) bt full
0 0x77f99918 in jsimd_h2v1_extrgb_merged_upsample_mips_dspr2 () at jsimd_mips_dspr2.S:863
No locals.
1 0x77f965cc in jsimd_h2v1_merged_upsample (cinfo=<optimized out="">, input_buf=<optimized out="">, in_row_group_ctr=<optimized out="">, output_buf=<optimized out="">) at jsimd_mips.c:668
2 0x77f80c5c in merged_1v_upsample (cinfo=<optimized out="">, input_buf=<optimized out="">, in_row_group_ctr=0x41f9d4, in_row_groups_avail=<optimized out="">, output_buf=0x41db9c,
3 0x77f7c7e0 in process_data_simple_main (cinfo=0x7fff33c0, output_buf=0x41db9c, out_row_ctr=<optimized out="">, out_rows_avail=<optimized out="">) at jdmainct.c:370
4 0x77f758cc in jpeg_read_scanlines (cinfo=0x7fff33c0, scanlines=0x41db9c, max_lines=1) at jdapistd.c:176
5 0x0040114c in main (argc=7, argv=0x7fff3694) at djpeg.c:642
Would compiling with -O0 help or is this enough?
Based on your /proc/cpuinfo, it doesn't appear that your CPU supports DSPr2 instructions-- I could be wrong about that, though. Like I said, I'm not the expert. All of the MIPS code was submitted, and I have not been able to contact the submitter.
Can you verify as I suggested above whether DSPr2 is being compile-time enabled? If so, then I don't think that's what you want. We need to figure out how to build it such that it checks /proc/cpuinfo at run time instead.
Sourceforge hates me and forced me to re-login while loosing previous message. Anyway...
I have enhanced your patch to error-out on every condition and it takes the run-time path, so DSPr2 is not enforced. The all previous builds have all failed on "Invalid instruction", so there must be something else going on.
Should I patch the code to print out warning every time DSPr2 is picked on run-time and test again?
Yes, as a sanity check, I would suggest adding a print statement to all of the functions in simd/jsimd_mips.c. That will determine if any of them is actually called (other than the can functions, of course. Those are always called.)
If no DSPr2 functions are being called, then the next thing I would check is whether somehow the C flags are different between the "normal" build and the --without-simd build. That is, execute 'make V=1' with both builds and diff the outputs to see if perhaps a different flag is being inserted in the SIMD-enabled build.
Before writing this comment, I double-checked the code to make sure that, at least to my eye, none of the above issues exist, but actually inserting the print statements is the only way to know for sure. I would happily test all of the above myself if I had access to a MIPS machine.
Last edit: DRC 2015-01-20
Gosh, SF has logged me out again and lost the long comment. Let's make it short now then.
The debugging fprintf() helped. The _h2v[12]can* functions were missing init_simd() call and thus the call was always true (and probably optimized out due ~0U initialization).
Attached patches fixes the issue (and I have also checked rest of the code and it seems to be ok).
Yes, I've unfortunately had to get in the habit of copying my comments to the clipboard before hitting "Post", because SourceForge eats them sometimes. I haven't had the time to submit a bug report on that-- but more than likely, someone already beat me to it.
I've checked in your patch to trunk and branches/1.4.x. Thanks for helping diagnose this.
Question:
Is there a mechanism for pushing out somewhat bleeding edge code to the Debian community, sort of like Fedora? I have SIMD support for PowerPC/AltiVec that is implemented in the libjpeg-turbo SVN trunk but needs testing by the community. Since I have no idea when libjpeg-turbo 1.5 will be released (probably not until 2016, because it's waiting on funding for several new features), it would be nice to put the AltiVec code out in a point release. However, I'm not comfortable doing that without some indication that it doesn't break builds or introduce other major bugs.
There's nothing formal, but if you ask me, I can prepare and upload anything you want to spread around into Debian experimental.
Just don't force me to work with subversion please :)
Well, the code is not released yet-- that's kind of the point of this. I want to get feedback on it to figure out how stable it is and whether it would make sense to push it out sooner rather than later. You can use git-svn to check it out:
or I can generate a tarball if you'd prefer that.
We're now on GitHub, so you can grab the AltiVec code from the master branch:
https://github.com/libjpeg-turbo/libjpeg-turbo
Would appreciate any testing you can do. If I can get at least some sense that the code isn't broken, I'll push it out in 1.4.2.
https://buildd.debian.org/status/package.php?p=libjpeg-turbo&suite=experimental
Awesome. Thanks!