
#35 x32 ABI support

Status: closed-wont-integrate
Owner: nobody
Labels: None
Priority: 5
Updated: 2014-03-27
Created: 2012-06-09
Private: No

Here's an initial patch that at least makes x32 build.

Discussion

  • Mike Frysinger

    Mike Frysinger - 2012-06-09

    libjpeg-turbo-1.2.0-x32.patch

     
  • DRC

    DRC - 2012-06-10

    Does this pass 'make test'? I assume not, because my understanding of x32 is that it gives you access to the full x86-64 register file (which libjpeg-turbo doesn't use anyhow) but still uses 32-bit pointers and data. I'm not sure how this looks from the assembly point of view, but I'm betting it doesn't look exactly like either our 32-bit or our 64-bit SIMD routines, but rather like a combination of both. That is just a guess, though, because information on this ABI seems hard to come by. At any rate, it seems certain that using the x86-64 SIMD routines would not work, and that's what this patch, as implemented, would do.

    Note also that libjpeg-turbo reaps significant performance benefits from 64-bit mode, mainly in the Huffman routines. Part of this comes from the larger word size, but part of it also comes from being able to use the additional registers to avoid swapping to memory in the inner loops. If the compiler is smart enough, it seems like it should be able to extract at least the latter benefit under x32 as well.

    What is the minimum development stack necessary to test this stuff? I.e. what versions of the kernel, glibc, GCC, nasm, etc. are necessary?

     
  • DRC

    DRC - 2012-06-10

    I should clarify my comment about the 64-bit registers: what I meant was that in the SSE2 assembly routines, LJT does not take advantage of the additional xmm registers available in 64-bit mode, because testing has shown no performance advantage to doing so (apparently, the small amount of swapping the algorithms do to fit within xmm0-xmm7 causes only a negligible performance hit). Thus, there is really no difference between the performance of the SIMD code in 32-bit and 64-bit modes. The main difference in performance comes from the Huffman code, which is all in C and is thus heavily compiler-dependent.

     
  • Mike Frysinger

    Mike Frysinger - 2012-06-14

    Correct, this patch is just the starting point. It makes building x32 objects work, but the code crashes at runtime.

    For data, "unsigned long" is 32-bit, but "unsigned long long" is 64-bit and fits in a single register, with none of the overhead of x86, where the two 32-bit halves are split across two registers. In assembly, you're free to use the full 64-bit register file, as on x86_64, so the speedups over x86 from the larger register set and register size are not lost.

    linux-3.4, gcc-4.7.0, binutils-2.22, nasm-2.10, and glibc-2.16 should get you going. If you just update your kernel to 3.4 (and enable the x32 ABI in the kernel config), you can grab a Gentoo chroot to speed things up:
    http://distfiles.gentoo.org/experimental/amd64/x32/

    Just unpack the stage3 tarball and chroot into it, then run:

        emerge nasm

    Then you should have a full development environment for testing libjpeg-turbo.

     
  • DRC

    DRC - 2012-06-14

    I don't have time to dive into it at the moment, unless some organization steps forward to fund the effort (not likely; this stuff is too bleeding-edge for anyone to have identified a commercial need for it yet).

    If you can make it actually run and pass 'make test', then I'll be glad to peer review your work. I suspect it will require at least some modification of the assembly code, so I'm not really relishing the thought of that. Until it can pass the unit tests, there is no point to checking it into the repository.

     
  • Siarhei Siamashka

    Maybe a good start would be to just disable all the assembly optimizations specifically for x32 and make sure that libjpeg-turbo works correctly when compiled as C code?

     
  • Siarhei Siamashka

    My understanding is that libjpeg-turbo-1.2.1 is around the corner. So if anybody is interested in a non-broken build of libjpeg-turbo for x32, then this needs to be handled really fast.

    BTW, patching the auto-generated "configure" file does not look right.

     
  • Mike Frysinger

    Mike Frysinger - 2012-06-14

    Yes, the tests pass when using --without-simd.

    The patch to configure was meant for packaging only. Just change the patch header so that it applies to acinclude.m4 (since that's what the patch is actually written against).

     
  • DRC

    DRC - 2012-06-14

    Since this patch only affects the SIMD build, --without-simd bypasses it completely, which leads me to believe that libjpeg-turbo already works on x32 without SIMD. There's no reason why it wouldn't. Thus, this patch serves no purpose at the moment.

    Making libjpeg-turbo work with x32 will likely require new SIMD routines, or at least some moderately disruptive modifications to the existing ones. If I'm misunderstanding that, please correct me, but I don't think x32 assembly looks exactly like either x86-64 assembly or x86 assembly.

    In any case, this isn't going into 1.2.1. I simply don't have time to mess with getting a build environment in place for it now. x32 seems really bleeding-edge at the moment, anyhow. I think it needs more time to cook.

    If someone wants to submit a more comprehensive patch that passes 'make test' with SIMD enabled, I will evaluate that for inclusion in the next major release.

     
  • DRC

    DRC - 2014-03-27

    Closing as won't fix (WNF) until a proper x32 patch that passes the unit tests is submitted.

     
  • DRC

    DRC - 2014-03-27
    • status: open --> wont-fix
     
  • DRC

    DRC - 2014-03-27
    • Status: wont-fix --> closed-wont-integrate