
#456 Enable LTO, PGO and PLO for GnuCobol

Group: GC 3.x
Status: pending
Owner: nobody
Labels: None
Priority: 5 - default
Updated: 2024-08-19
Created: 2023-11-15
Private: No

Hello.

Recently I did many Profile-Guided Optimization (PGO) benchmarks on multiple projects (including static analysis tools and compilers like Rustc, Clang, Clangd, Clang Tidy, and many others); the results are available here. I think it's worth trying to apply PGO to the GnuCobol tooling ecosystem, such as the compiler (and perhaps other tools like code formatters).

I can suggest the following things to do:

  • Evaluate PGO's applicability to GnuCobol tooling.
  • If PGO helps to achieve better performance, add a note about it to GnuCobol's documentation, so that users and maintainers are aware of another optimization opportunity for GnuCobol tools.
  • Provide PGO integration in the build scripts, so that users and maintainers can easily apply PGO to their own workloads.
  • Optimize prebuilt binaries with PGO.

Here are some examples of how PGO is already integrated into other projects' build scripts:

Some PGO documentation examples in various projects:

Regarding other optimizations, I recommend enabling Link-Time Optimization (LTO); it is usually easy to implement. After PGO, I suggest evaluating LLVM BOLT as an additional optimization step: Post-Link Optimization (PLO). But I suggest starting with PGO, as it is a more stable optimization than PLO in the general case. You can read more about LLVM BOLT use cases at https://github.com/zamazan4ik/awesome-pgo .

Discussion

  • Simon Sobisch

    Simon Sobisch - 2023-11-16

    Thank you for the suggestion and for taking the time to do this summary writeup.

    For LTO: that could be useful for cobc and libcob, but not for generated modules, right?

    Can you please do two measurement runs with make checkall (without and with LTO) and share the comparison, along with the flags passed to configure?
    I guess it would be most useful to check during configure whether it is available and, in that case, auto-add it to COBC_LIBS/COBC_CFLAGS/LIBCOB_LIBS/LIBCOB_CFLAGS unless explicitly disabled; do you agree? This should be quite easy to check and implement, so it could get into this year's release, too.
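A configure-time check of this kind boils down to asking the C compiler whether it can compile and link with the LTO flags. The following is a minimal shell sketch of such a probe under stated assumptions, not GnuCOBOL's configure code: cc_accepts_flag, LTO_FLAGS and ENABLE_LTO are hypothetical names, and the flag set mirrors the one used for the LTO runs later in this ticket. In a real autoconf build the probe itself would more likely use a macro such as AX_CHECK_COMPILE_FLAG or AX_CHECK_LINK_FLAG from the Autoconf Archive.

```sh
#!/bin/sh
# Hypothetical probe: succeed if the C compiler (${CC:-cc}) can compile
# AND link a trivial program with the given flags. LTO needs both steps
# to work, since the actual optimization happens at link time.
cc_accepts_flag() {
  tmpdir=$(mktemp -d) || return 1
  printf 'int main(void) { return 0; }\n' > "$tmpdir/conftest.c"
  # $1 is left unquoted on purpose so that "-flto=auto -ffat-lto-objects"
  # is passed as two separate options.
  if ${CC:-cc} $1 -o "$tmpdir/conftest" "$tmpdir/conftest.c" >/dev/null 2>&1; then
    status=0
  else
    status=1
  fi
  rm -rf "$tmpdir"
  return "$status"
}

# "Auto-add unless explicitly disabled": ENABLE_LTO=no opts out
# (hypothetical switch); the target variables are the ones named above.
LTO_FLAGS="-flto=auto -ffat-lto-objects"
if [ "${ENABLE_LTO:-auto}" != no ] && cc_accepts_flag "$LTO_FLAGS"; then
  COBC_CFLAGS="${COBC_CFLAGS:-} $LTO_FLAGS"
  LIBCOB_CFLAGS="${LIBCOB_CFLAGS:-} $LTO_FLAGS"
  COBC_LIBS="${COBC_LIBS:-} -flto=auto"
  LIBCOB_LIBS="${LIBCOB_LIBS:-} -flto=auto"
fi
```

Probing with a full link rather than a compile-only check catches toolchains where the compiler accepts -flto but the linker lacks LTO plugin support.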

    I do think it is reasonable to have PGO as a test project.
    I don't expect much difference in the compiler, at least not for common workloads: in that case the C compiler that cobc calls takes most of the CPU time, and within cobc most of the parsing time is spent in strcasecmp. It likely provides some benefit for very small sources compiled with -fsyntax-only, but that case is so special that people can tune it themselves.

    It likely doesn't make much difference for a generated program either, as nearly all of its time (outside of memset/memcmp) is spent in libcob.

    But applying it to libcob could indeed make a difference. A test can show that easily by counting the instructions executed during make checkall (this is enabled by just setting an environment variable):

    • get initial counters
    • reconfigure and compile libcob with PGO instrumentation
    • get PGO data by running make checkall
    • rebuild libcob applying the PGO data
    • get new counters
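The five steps above could be scripted roughly as follows. This is a minimal sketch, not a tested recipe: it assumes GCC's -fprofile-generate=DIR and -fprofile-use=DIR options, assumes that LIBCOB_CPPFLAGS and LIBCOB_LIBS reach libcob's compile and link steps the same way they did for the LTO runs further down in this ticket, and reuses the make checkall invocation from those runs. The names pgo_flags, pgo_libcob_compare, PGO_DIR, build_base and build_pgo are hypothetical.

```sh
#!/bin/sh
# Hypothetical helper: GCC PGO options for one stage. PGO_DIR must be an
# absolute directory outside the build tree, so "make clean" keeps the
# .gcda profile files between the instrumented and the optimized build.
pgo_flags() {
  case "$1" in
    generate) printf '%s\n' "-fprofile-generate=$PGO_DIR" ;;
    use)      printf '%s\n' "-fprofile-use=$PGO_DIR" ;;
    *)        echo "usage: pgo_flags generate|use" >&2; return 1 ;;
  esac
}

# Steps 1-5; $1 is the absolute path of the GnuCOBOL source tree.
# Run from an empty working directory.
pgo_libcob_compare() {
  srcdir=$1
  [ -n "${PGO_DIR:-}" ] || { echo "set PGO_DIR to an absolute directory" >&2; return 1; }
  mkdir -p build_base build_pgo "$PGO_DIR" || return 1

  # 1. baseline counters, in a separate build directory
  ( cd build_base &&
    "$srcdir/configure" CFLAGS="-O2 -g" &&
    make -j8 &&
    make -C tests -j4 checkall TESTSUITEFLAGS="--jobs=6" PERFSUFFIX="pgo_base" ) || return 1

  # 2.-5. in one build directory: GCC derives the .gcda names from the
  # object file paths, which must match between both libcob builds
  ( cd build_pgo &&
    # 2. instrument libcob only
    "$srcdir/configure" CFLAGS="-O2 -g" \
      LIBCOB_CPPFLAGS="$(pgo_flags generate)" LIBCOB_LIBS="$(pgo_flags generate)" &&
    make -j8 &&
    # 3. training run, writes the profile below $PGO_DIR
    make -C tests -j4 checkall TESTSUITEFLAGS="--jobs=6" &&
    # 4. rebuild libcob using the profile
    make clean &&
    "$srcdir/configure" CFLAGS="-O2 -g" \
      LIBCOB_CPPFLAGS="$(pgo_flags use)" LIBCOB_LIBS="$(pgo_flags use)" &&
    make -j8 &&
    # 5. new counters
    make -C tests -j4 checkall TESTSUITEFLAGS="--jobs=6" PERFSUFFIX="pgo_use" )
}
```

For example, run PGO_DIR=$PWD/pgo-data and then pgo_libcob_compare /path/to/gnucobol, and compare build_base/tests/perf/cobcrun/pgo_base.log with build_pgo/tests/perf/cobcrun/pgo_use.log (assuming the same log layout as in the runs below). Keeping the baseline in its own build directory means the make clean in step 4 cannot touch its logs.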

    Can you please test that after the LTO comparison? I'd be available to help by mail, on Matrix, or in this ticket.

     
  • Zaitsev Alexander

    I will try to perform these tests when I get free time. Thanks a lot for a test scenario suggestion!

    By the way, since GnuCobol implements an Ahead-of-Time compilation model, is it possible to compile a Cobol program with PGO? Are there PGO-related compiler switches in GnuCobol? Have you perhaps heard of Cobol compilers with PGO support?

     
    • Simon Sobisch

      Simon Sobisch - 2023-11-16

      > is it possible to compile a Cobol program with PGO?

      If you manually specify the needed options via -A "compile options here" -Q "link options here", then yes. But, as noted, I don't think there will be much benefit, as the "real work" is nearly completely done in libcob; the best optimization you commonly get comes from a recent GCC and passing CFLAGS="-O2 -march=native -mtune=native -g" to GnuCOBOL's configure command (all options but -O2 are also used when cobc calls the C compiler).
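For the -A/-Q route, a minimal sketch could look like the following. It assumes GCC as the C compiler and assumes that cobc -x -c writes the object to PROG.o in the current directory; compiling and linking in separate steps keeps that object path, and with it the name of the .gcda file GCC derives from it, identical between the instrumented and the optimized build (the C file cobc generates in a temporary directory may be named differently on each run). The function name cobc_pgo_build and the PGO_DIR variable are hypothetical.

```sh
#!/bin/sh
# Hypothetical helper: build the main program $2.cob with GCC PGO via cobc.
# $1 is the stage: "generate" (instrumented) or "use" (optimized).
# PGO_DIR must be an absolute directory that keeps the .gcda files.
cobc_pgo_build() {
  [ -n "${PGO_DIR:-}" ] || { echo "set PGO_DIR to an absolute directory" >&2; return 1; }
  case "$1" in
    generate) flag="-fprofile-generate=$PGO_DIR" ;;
    use)      flag="-fprofile-use=$PGO_DIR" ;;
    *)        echo "usage: cobc_pgo_build generate|use PROG" >&2; return 1 ;;
  esac
  # Compile (-A: options for the C compiler) to "$2.o" in the current
  # directory, then link (-Q: options for the link step; with
  # -fprofile-generate GCC links its profiling runtime, while
  # -fprofile-use has no effect when linking).
  cobc -x -c -A "$flag" "$2.cob" &&
  cobc -x -Q "$flag" -o "$2" "$2.o"
}
```

Typical use: cobc_pgo_build generate prog, then run ./prog on representative input, then cobc_pgo_build use prog.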

      > I will try to perform these tests when I get free time. Thanks a lot for a test scenario suggestion!

      As noted, it would be quite useful to have this comparison for LTO, so I'll let it run during the day and report the results back.

       
  • Simon Sobisch

    Simon Sobisch - 2023-11-16

    LTO check on an up-to-date Debian system with gcc (Debian 10.2.1-6) 10.2.1 20210110 + GNU assembler (GNU Binutils for Debian) 2.35.2 on an AMD Ryzen 7 3700X 8-Core Processor (I've used the options mentioned on the Debian LTO status page):

    ../configure CFLAGS="-O2 -g"
    make -j8
    make -C tests -j4 checkall TESTSUITEFLAGS="--jobs=6" PERFSUFFIX="build_gcno_sysopt"
    
    Aggregation for build_gcno_opt/tests/perf/cobcrun/build_gcno_sysopt.log:
    
          instructions:u   119.311.669.526
    seconds time elapsed           29.2878
            seconds user           21.2240
             seconds sys            1.1919
    
    
    ../configure CFLAGS="-O2 -g" LIBCOB_CPPFLAGS="-flto=auto -ffat-lto-objects" COBC_CPPFLAGS="-flto=auto -ffat-lto-objects" LIBCOB_LIBS="-flto=auto" PROGRAMS_LIBS="-flto=auto"
    make -j8
    make -C tests -j4 checkall TESTSUITEFLAGS="--jobs=6" PERFSUFFIX="build_gc_lto_only"
    
    
    Aggregation for build_lto/tests/perf/cobcrun/build_gc_lto_only.log:
    
          instructions:u   119.270.506.812
    seconds time elapsed           29.6686
            seconds user           21.5349
             seconds sys            1.2562
    
    
    ../configure CFLAGS="-O2 -g -march=native -mtune=native"
    make -j8
    make -C tests -j4 checkall TESTSUITEFLAGS="--jobs=6" PERFSUFFIX="build_gcno_archopt"
    
    Aggregation for build_gc_archopt/tests/perf/cobcrun/build_gcno_archopt.log:
    
          instructions:u   120.159.186.718
    seconds time elapsed           32.1405
            seconds user           23.5175
             seconds sys            1.2158
    
    
    ../configure CFLAGS="-O2 -g -march=native -mtune=native" LIBCOB_CPPFLAGS="-flto=auto -ffat-lto-objects" COBC_CPPFLAGS="-flto=auto -ffat-lto-objects" LIBCOB_LIBS="-flto=auto" PROGRAMS_LIBS="-flto=auto"
    make -j8
    make -C tests -j4 checkall TESTSUITEFLAGS="--jobs=6" PERFSUFFIX="build_gc_arch_and_lto"
    
    
    Aggregation for build_arch_and_lto/tests/perf/cobcrun/build_gc_arch_and_lto.log:
    
          instructions:u   120.112.323.343
    seconds time elapsed           30.8471
            seconds user           22.6944
             seconds sys            1.3432
    

    So in this setup there isn't any improvement with native -march/-mtune (there may be better specific values), and the LTO options used bring only a decrease of around 0.03 to 0.04% in CPU instructions, with no consistent effect on CPU time (user time went up by about 1.5% without the native options and down by about 3.5% with them).
    Newer versions of GCC/binutils likely produce different timings, and a different OS or compiler may lead to different results as well.
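Percentages like these can be recomputed directly from the aggregation logs. The sketch below uses awk for the floating-point arithmetic that plain sh lacks; the helper names relchange and count are hypothetical.

```sh
#!/bin/sh
# Relative change from baseline $1 to candidate $2 in percent, printed
# with two decimals; a negative value means fewer instructions / less time.
relchange() {
  awk -v old="$1" -v new="$2" 'BEGIN { printf "%.2f\n", (new - old) / old * 100 }'
}

# perf groups the digits of instruction counts according to the locale
# ("119.311.669.526" or "123,757,574,626"); strip the separators.
# Only meant for integer counts, never for the seconds values.
count() {
  printf '%s\n' "$1" | tr -d '.,'
}
```

With the Debian numbers, relchange "$(count 119.311.669.526)" "$(count 119.270.506.812)" prints -0.03 (instructions, -O2 -g vs. the LTO build) and relchange 21.2240 21.5349 prints 1.46 (seconds user); for the native pair the same calls give -0.04 and -3.50.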

    Note that only the actual COBOL runs were traced, and the optimization was only applied to the runtime (and to cobc, which was not traced), not to the COBOL modules, so as not to increase the time the called C compiler spends (with huge COBOL sources after all preparsers and preprocessing are done, which is not uncommon, you may already have to wait for minutes).

     

    Last edit: Simon Sobisch 2023-11-16
  • Simon Sobisch

    Simon Sobisch - 2023-11-16

    Rerun on Rocky 9 with gcc (GCC) 11.3.1 20221121 (Red Hat 11.3.1-4) + GNU assembler version 2.35.2-37.el9 on an Intel(R) Xeon(R) Gold 6242R CPU, with GnuCOBOL configured as follows:

    configure:  Use gettext for international messages:      yes
    configure:  Use fcntl for file locking:                  yes
    configure:  Use math multiple precision library:         gmp
    configure:  Use curses library for screen I/O:           ncursesw
    configure:  INDEXED I/O (no handler configured):         NO
    configure:  Used for XML I/O:                            libxml2
    configure:  JSON I/O (no handler found):                 NO
    

    The runs used the same options as above: -O2 -g; then once with the native and -O2 options in CFLAGS only, once additionally placing them in LDFLAGS, and once with LTO:

    Aggregation for build/tests/perf/cobcrun/build_gc.log:
    
            instructions   123,757,574,626
    seconds time elapsed           24.2639
            seconds user           17.4197
             seconds sys            2.7934
    
    
    Aggregation for build_native/tests/perf/cobcrun/build_native.log:
    
            instructions   123,574,082,771
    seconds time elapsed           24.0246
            seconds user           17.2723
             seconds sys            2.7639
    
    
    Aggregation for build_all_native/tests/perf/cobcrun/build_all_native.log:
    
            instructions   123,566,343,985
    seconds time elapsed           24.1000
            seconds user           17.3610
             seconds sys            2.7506
    
    
    Aggregation for build_lto/tests/perf/cobcrun/build_lto.log:
    
            instructions   123,795,039,404
    seconds time elapsed           24.0250
            seconds user           17.2924
             seconds sys            2.7555
    

    The LTO version is the one executing the most instructions; native brings about 1% improvement in elapsed time, with only about 0.15% fewer instructions.
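The four Rocky 9 aggregates can also be put side by side against the first (baseline) row. A small awk sketch follows; the function name compare_runs and its input layout are assumptions, and the example values are copied from the logs above.

```sh
#!/bin/sh
# Compare perf aggregates against the first input line (the baseline).
# Input:  LABEL INSTRUCTIONS SECONDS_ELAPSED SECONDS_USER
# Output: LABEL and the relative change of the three values in percent.
compare_runs() {
  awk '
    { gsub(/[.,]/, "", $2) }            # drop digit grouping, count only
    NR == 1 { bi = $2; be = $3; bu = $4 }
    { printf "%s %.2f %.2f %.2f\n", $1,
             ($2 - bi) / bi * 100, ($3 - be) / be * 100, ($4 - bu) / bu * 100 }
  '
}

# Example (values from the Rocky 9 logs):
#   printf '%s\n' 'build_gc 123,757,574,626 24.2639 17.4197' \
#     'build_lto 123,795,039,404 24.0250 17.2924' | compare_runs
# prints
#   build_gc 0.00 0.00 0.00
#   build_lto 0.03 -0.98 -0.73
```

Feeding all four rows gives -0.15/-0.99/-0.85 for build_native and -0.15/-0.68/-0.34 for build_all_native (instructions, seconds elapsed, seconds user).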

     

    Last edit: Simon Sobisch 2023-11-16
  • Simon Sobisch

    Simon Sobisch - 2024-01-11

    @zamazan4ik I think LTO is out given my tests from November?
    PGO seems like something that should be investigated; do you have any plans to do so?

     
  • Zaitsev Alexander

    Oh, sorry that I didn't reply in time; I was busy with other stuff, my bad.

    Yeah, I think your LTO tests are fine. Regarding PGO: yes, I still plan to perform PGO tests for GnuCobol, but right now I am still busy with other stuff, so I cannot guarantee any ETA.

     
  • Vincent (Bryan) Coen

    • Group: unclassified --> GC 3.x
     
  • Vincent (Bryan) Coen

    Re-mapped to GC v3.X and not Prog guide.

     
  • Simon Sobisch

    Simon Sobisch - 2024-08-19
    • status: open --> pending
     
  • Simon Sobisch

    Simon Sobisch - 2024-08-19

    friendly ping @zamazan4ik for possible further checks / status updates

     
