Menu

#72 Don't use .func / .endfunc in arm neon assembly

closed-fixed
nobody
None
1
2014-10-28
2014-08-18
No

Hi,

jsimd_arm_neon.S and jsimd_arm_neon_64.S use .func / .endfunc around functions. This only has an effect with STABS debug info (-gstabs), but almost everyone builds with dwarf debug info (and the only effect is to emit debug entries). Furthermore, clang's integrated assembler doesn't understand .func / .endfunc (and they don't want to add STABS support, as it's not really used anymore): http://llvm.org/bugs/show_bug.cgi?id=20424

Would it be possible to just remove the .func / .endfunc lines from libjpeg_turbo? Then it would compile with clang's built-in assembler and I don't think anyone would miss those lines.

Thanks,
Nico

ps: You simplify the ifdef apple block in these two files by using USER_LABEL_PREFIX -- that's a symbol that evaluates to "_" on systems that use underscores and to nothing elsewhere. Example:

#define GLUE2(a, b) a ## b
#define GLUE(a, b) GLUE2(a, b)
#define SYMBOL_NAME(name) GLUE(__USER_LABEL_PREFIX__, name)

.macro asm_function fname
SYMBOL_NAME(\fname)
...

Discussion

  • Nicolas Weber

    Nicolas Weber - 2014-08-18

    Whow. That's supposed to be #define GLUE etc. But the .func / .endfunc bit is the bit I'm really interested in.

     
  • DRC

    DRC - 2014-08-18

    Forwarded to the NEON developers, who can hopefully answer the question. I've never had trouble with building the ARMv7 code for iOS using clang, but we use gas-preprocessor.pl, which I believe filters out or renames the .func/.endfunc directives.

    The asm_function macro serves other purposes (note the #ifdef __ELF__), so it wouldn't really simplify it to do as you suggest-- it would just make it more difficult to read.

     

    Last edit: DRC 2014-08-18
  • DRC

    DRC - 2014-08-18

    Also, FYI-- SourceForge's tracker editor is stupid, and it will turn a line into

    large characters

    if the line is prefixed by a # sign. To get around this, indent lines with tab in order to

    display them in a code block.
    
     
  • Siarhei Siamashka

    Hi Nicolas,

    It would be easier for everyone if you could submit a patch with your
    proposed changes and also provide the compilation logs before and after
    it is applied (to confirm that there was a problem and that the patch helps).

    I don't have any Apple device, so can't comment much. But is it really
    the only required change? I used to think that gas-preprocessor.pl
    actually takes care of a lot more problems.

     
  • Nicolas Weber

    Nicolas Weber - 2014-08-19

    This is the patch I applied in chromium: https://codereview.chromium.org/481243002/diff/1/simd/jsimd_arm_neon.S

    It builds fine on android and linux, with gcc and with clang's integrated assembler. (We're using a clang close to clang trunk, not the one bundled with Xcode.) This is the only change I had to make to get libjpeg_turbo compile with clang's integrated assembler. (Again, targeting android and linux, not ios.)

     
  • Nicolas Weber

    Nicolas Weber - 2014-08-19

    (The libjpeg_turbo website said that filing tickets is the way to go. If there's better documentation somewhere how to contribute patches, I'd appreciate a pointer to it, and apologize for missing it.)

     
  • Siarhei Siamashka

    Thanks for clarifying that you actually use clang in android and linux.

    I have just tried to compile jsimd_arm_neon.S with clang 3.4 (the latest release at the moment) and it gives me the following error messages:

    $ clang -integrated-as jsimd_arm_neon.S 
    jsimd_arm_neon.S:31:1: error: unknown directive
    .arch armv7a
    ^
    jsimd_arm_neon.S:32:1: error: unknown directive
    .object_arch armv4
    ^
    <instantiation>:6:5: error: unknown directive
        .func jsimd_idct_islow_neon
        ^
    jsimd_arm_neon.S:199:1: note: while in macro instantiation
    asm_function jsimd_idct_islow_neon
    ^
    jsimd_arm_neon.S:270:16: error: invalid operand for instruction
          ldrd r4, [COEF_BLOCK, #(-96 + 2 * (4 + 1 * 8))]
                   ^
    jsimd_arm_neon.S:276:16: error: invalid operand for instruction
          ldrd r4, [COEF_BLOCK, #(-96 + 2 * (4 + 2 * 8))]
                   ^
    jsimd_arm_neon.S:283:16: error: invalid operand for instruction
          ldrd r4, [COEF_BLOCK, #(-96 + 2 * (4 + 3 * 8))]
                   ^
    jsimd_arm_neon.S:291:16: error: invalid operand for instruction
          ldrd r4, [COEF_BLOCK, #(-96 + 2 * (4 + 4 * 8))]
                   ^
    jsimd_arm_neon.S:299:16: error: invalid operand for instruction
          ldrd r4, [COEF_BLOCK, #(-96 + 2 * (4 + 5 * 8))]
    

    ... and a lot more errors, but I skipped the rest of the log.

    $ clang --version
    clang version 3.4.1 (tags/RELEASE_34/dot1-final)
    Target: armv7a-hardfloat-linux-gnueabi
    Thread model: posix
    

    Removing .func/.endfunc eliminates only the .func/.endfunc complaints, but the rest of the problems are still there. So could you please clarify it again? How exactly you are compiling libjpeg-turbo in arm linux using clang? What is your clang version? Any other important details?

     
  • Siarhei Siamashka

    BTW, you are right about the .func/.endfunc directives redundancy.
    The GCC compiler does not emit them (anymore?) in the generated
    assembly code if compiling something with '-S' option. So we indeed
    should not do it either.

     
  • Siarhei Siamashka

    If there's better documentation somewhere how to contribute patches,
    I'd appreciate a pointer to it, and apologize for missing it.)

    Use the 'Add attachments' link on this page and attach the patch here.

     
  • DRC

    DRC - 2014-08-20

    Upon further reflection, I actually ran into that same problem with iOS builds as well (the "invalid operand for instruction" errors.) I had to modify the build instructions so that -no-integrated-as is passed to clang to work around the issue. This supposedly invokes the "system assembler", but I'm not sure what that means, since I'm cross-compiling on an x86 system.

     
  • Nicolas Weber

    Nicolas Weber - 2014-08-20

    clang 3.5 will be released soon, maybe it'll be able to compile libjpeg-turbo with this patch (but even it might not have enough arm asm support -- some of my changes to clang's arm assembler landed just last week, and there's more work to be done as far as I can tell. Clang trunk can compile libjpeg-turbo with the attached patch applied.)

     
  • Siarhei Siamashka

    Thanks, the patch looks good to me (DRC can probably already apply it
    safely). And both patched/unpatched jsimd_arm_neon.S produce binary
    identical object files when compiled with the GNU assembler.

    It is good to know that the integrated assembler in clang is getting
    improved and we may eventually retire the gas-preprocessor.pl (which
    was really a stop gap solution). It is also very good that Apple
    decided not to invent some sort of a "new" macro preprocessor, but
    apparently intends to ensure compatibility with the GNU assembler,
    which is a de-facto standard for the ARM assembly optimizations.

    And as the last request, could you please confirm that "make test"
    check passes if libjpeg-turbo is compiled with clang 3.5?

     
  • DRC

    DRC - 2014-08-22

    Siarhei, were you ever able to build this? I'm still getting the same errors that you were above. Is clang 3.5 required? Can you verify whether the same thing needs to be done in the ARM64 code?

     
  • Siarhei Siamashka

    I only checked that this change does not cause regressions for the
    GNU assembler.

    My understanding is that Nicolas is one of the clang developers or
    contributors. And they are working on improving the integrated
    assembler in the future clang 3.5 release. We can only welcome
    this news, but can't easily verify the improvements yet.

    As for the ARM64 code, I can't verify it myself, but would guess
    that .func/.endfunc directives can be also safely removed.

     
  • DRC

    DRC - 2014-08-22
    • status: open --> closed-fixed
     
  • Nicolas Weber

    Nicolas Weber - 2014-08-26

    Yes, I'm a clang contributor. If you have a 64 bit file, doing the same change there would be good.

    …oh, I see that https://sourceforge.net/p/libjpeg-turbo/code/1375 did the change to both files. Thanks much! :-)

     
  • Siarhei Siamashka

    You are welcome :-)

    Still, as I mentioned earlier, you are strongly encouraged to make sure
    that "make test" check does not detect any problems when libjpeg-turbo
    is built with clang 3.5 and its integrated assembler. We are interested not
    only in successful compilation, but also in correct execution at runtime.

     
MongoDB Logo MongoDB