ELF Tool Chain Wiki

BSD licensed ELF toolchain

Brought to you by: jkoshy, kaiwang27

Build Automation

This wiki page describes a proposed build system for the Elftoolchain project.

This build system is intended to be "P2P Transparent" (the precise definition of this term is given further down).

Goals

The major goals for the system are:

To support repeatable & hermetic (i.e., fully deterministic) builds from source code.
To support "build transparency": the ability to verify that an output artifact (say the ISO image for an installer) was built from a specific set of human-reviewed source files.
To support cross-building of source code.
To allow developers to securely share build cycles with each other (peer-to-peer sharing of their build effort).
To be frugal in its machine resource usage.
To offer a more expressive build language than make, that is also tooling friendly.

Terminology

Repeatable Builds

Local Repeatability

We consider a build to be locally repeatable when two attempts at building the same source tree on a given machine result in the 'same' effective bits being generated.

The word 'same' is used here to mean "logically indistinguishable when used by a subsequent build step".

Many compilation processes do not guarantee local repeatability out of the box. Some causes of local non-repeatability include:

The presence of compilation timestamps in the output of a build step,
Embedded names of temporary files in generated output,
Differences caused by unstable orderings of inputs,
The use of special macros like __DATE__ and __TIME__,

and the like.
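As an illustration of how two of these causes (timestamps and unstable input orderings) can be removed from a packaging step, here is a minimal Python sketch. The function name `pack_tree` is hypothetical and is not part of any Elftoolchain tool; it only shows the general technique of fixing the traversal order and clearing variable metadata.

```python
import io
import os
import tarfile


def _normalize(info: tarfile.TarInfo) -> tarfile.TarInfo:
    """Clear metadata that varies between otherwise identical build runs."""
    info.mtime = 0                 # no timestamps in the output
    info.uid = info.gid = 0        # no trace of the builder's identity
    info.uname = info.gname = ""
    return info


def pack_tree(root: str) -> bytes:
    """Archive the files under `root` in a stable, sorted order."""
    buf = io.BytesIO()
    with tarfile.open(fileobj=buf, mode="w") as tar:
        for dirpath, dirnames, filenames in os.walk(root):
            dirnames.sort()        # make the directory traversal order stable
            for name in sorted(filenames):
                path = os.path.join(dirpath, name)
                arcname = os.path.relpath(path, root)
                tar.add(path, arcname=arcname, recursive=False, filter=_normalize)
    return buf.getvalue()
```

Packing the same tree twice with this function should yield byte-identical archives even if file modification times change between the two runs.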

Prior work in making builds locally repeatable includes:

Reproducible Builds by the Debian project.
Reproducible Builds in FreeBSD from the FreeBSD project.
Reproducible Builds in NetBSD, from the NetBSD project.
Deterministic Builds at the Tor Project.

Some Elftoolchain tools produce locally repeatable output (this mode is usually specified using the -D option).

A locally repeatable build permits the build process to be treated as a pure function mapping a set of sources to a set of built artifacts.

Local repeatability is the first step towards sharing build cycles across multiple developers.

Cross-Machine Repeatability a.k.a. Global Reproducibility

In a globally reproducible build, builds on different machines generate the 'same' (i.e., logically indistinguishable) output. Two different machines are permitted to have different build execution graphs, as long as the final outputs are the 'same'. For example, the build on one machine could have additional instrumentation interleaved between its build steps, or could use tools that differ from the other, provided the end result is the 'same'.

Some sources of "non-local" non-determinism include:

The use of embedded (absolute) pathnames in generated output,
Build time dependencies on the time zone of the build machine,

and so on.
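One common way to reduce these machine-specific effects is to run every build step in a small, fixed environment. The Python sketch below is illustrative only: the helper names are hypothetical, the `PATH` value is an assumption, and not every tool honours `SOURCE_DATE_EPOCH` (a convention from the Reproducible Builds effort) or `-ffile-prefix-map` (supported by recent GCC and Clang releases).

```python
def hermetic_env(source_date_epoch: int = 0) -> dict:
    """Return a minimal, fixed environment for build steps (illustrative)."""
    return {
        "PATH": "/usr/bin:/bin",       # assumed location of the build tools
        "TZ": "UTC",                   # remove time zone dependencies
        "LC_ALL": "C",                 # remove locale dependencies
        "SOURCE_DATE_EPOCH": str(source_date_epoch),
    }


def prefix_map_flags(source_root: str) -> list:
    """Compiler flags that rewrite absolute source paths to '.'."""
    return [f"-ffile-prefix-map={source_root}=."]
```

A build driver could pass the result of `hermetic_env()` as the `env` argument of `subprocess.run`, so that nothing from the invoking user's environment leaks into a build step.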

Global reproducibility is needed before we can share build cycles across machines.

Hermetic Builds

In a hermetic build all inputs to the build process are fully specified. These include:

The version of the compilation tools in use.
The configuration used when invoking compilation tools: invocation flags, linker scripts, and so on.
Each and every input that a given step in the build depends on. This includes the headers, libraries and configuration traditionally considered to be part of the base system, apart from the source code being used as input.

A hermetic build process guarantees that the build does not depend upon inputs that are "external" in some fashion (and are thus uncontrolled).
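When every input is specified, a build step can be named by a digest of everything it depends on. The following Python sketch shows one possible encoding; the function name `action_key` and the record layout are assumptions made for illustration, not a format defined by this proposal.

```python
import hashlib
import json


def action_key(tool_digest: str, argv: list, input_digests: dict) -> str:
    """Digest that names a build step by everything it depends on.

    tool_digest:   digest of the tool binary (stands in for its version)
    argv:          the exact invocation, including all flags
    input_digests: mapping of input path -> content digest
    """
    record = {
        "tool": tool_digest,
        "argv": list(argv),
        "inputs": sorted(input_digests.items()),   # order-independent
    }
    blob = json.dumps(record, sort_keys=True, separators=(",", ":")).encode()
    return hashlib.sha256(blob).hexdigest()
```

Two builders that agree on the tool, the flags and the input contents would compute the same key, while any change to a flag or an input yields a different one.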

If the source code to the compilation tools is part of the source tree being built (e.g. as is true for NetBSD) then the build can be made hermetic through time. With a hermetic-through-time build we would be able to roll back a source tree to a prior revision and be able to reconstruct binaries using the exact toolchain that had been originally used at that revision.

Peer To Peer Builds

Once builds are hermetic and globally reproducible, we would be able to share builds across build peers.

[Figure: Peer to peer builds]

Sharing builds in this fashion would help speed up builds of large source trees (e.g., the equivalent of a make world or a run of build.sh). Our build utility could choose between either performing a local fully deterministic build from sources, or retrieving the already built artifact for that build step from its peers (or occasionally doing both, for verification).
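The choice described above could be sketched as follows in Python. This is a hypothetical outline: `obtain`, `peer_cache` (a plain mapping standing in for the peer network) and `verify_rate` are names invented for this example.

```python
import hashlib
import random


class VerificationError(Exception):
    """A peer's artifact differs from the locally built one."""


def obtain(key, build_locally, peer_cache, verify_rate=0.0, rng=random.random):
    """Return the artifact bytes for build step `key`.

    Prefer a copy offered by a peer; fall back to a local build when no
    peer has one. With probability `verify_rate`, do both and compare.
    """
    remote = peer_cache.get(key)
    if remote is None:
        return build_locally()
    if rng() < verify_rate:
        local = build_locally()
        if hashlib.sha256(local).digest() != hashlib.sha256(remote).digest():
            raise VerificationError(key)
        return local
    return remote
```

Comparing digests in this way assumes that the build step is bit-for-bit reproducible; a weaker notion of 'same' would need a step-specific comparison.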

The peer-to-peer build system could be made independent of the build front end, possibly implementing an existing API such as the Bazel remote execution API. This would both ease development and permit the peer-to-peer build network to be used by multiple build front ends (e.g., Bazel or Buck, in addition to the proposed build tool).

Related work: Bazel supports caching of build results using a (centralized) remote cache.

Herd Immunity and Decentralized Verification

Peer-to-peer builds could also be used to implement a form of herd immunity. Once we have multiple builders building the same source tree and sharing metadata about their builds, it would become possible to detect output artifacts that had been tampered with.

This tamper detection protocol could be used to verify, in a decentralized fashion, that the final artifact generated by a build (say the ISO image containing an OS installer) had not been tampered with locally.
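A very simple form of such tamper detection is to compare the digests that independent peers report for the same build step. The Python sketch below uses a plain majority vote, which is an assumption of this example and not a protocol defined here; a real design would also have to deal with colluding or fake peers.

```python
from collections import Counter


def suspect_peers(reports: dict, quorum: int = 3) -> set:
    """Return the peers whose reported digest disagrees with the majority.

    `reports` maps a peer id to the digest that peer claims for one
    build step.
    """
    if len(reports) < quorum:
        return set()               # too few independent builds to judge
    majority_digest, votes = Counter(reports.values()).most_common(1)[0]
    if votes * 2 <= len(reports):
        return set(reports)        # no strict majority: trust nobody
    return {peer for peer, digest in reports.items() if digest != majority_digest}
```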

Auditable Builds

This kind of build generates a (cryptographically signed) trail of the inputs, outputs and tools used for each build step.

This trail could be used to verify the build 'end to end'.
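As a rough sketch of what such a trail might look like, the Python code below chains build records together and authenticates each one. The record fields and function names are hypothetical, and an HMAC with a shared key is used only as a stand-in from the standard library; a real system would more likely use public-key signatures.

```python
import hashlib
import hmac
import json


def _encode(record: dict) -> bytes:
    return json.dumps(record, sort_keys=True, separators=(",", ":")).encode()


def append_step(trail: list, key: bytes, tool: str, inputs: list, outputs: list) -> None:
    """Append an authenticated record that is chained to the previous one."""
    prev = trail[-1]["mac"] if trail else ""
    record = {"prev": prev, "tool": tool,
              "inputs": sorted(inputs), "outputs": sorted(outputs)}
    record["mac"] = hmac.new(key, _encode(record), hashlib.sha256).hexdigest()
    trail.append(record)


def verify_trail(trail: list, key: bytes) -> bool:
    """Check every record's MAC and its link to the preceding record."""
    prev = ""
    for record in trail:
        body = {k: v for k, v in record.items() if k != "mac"}
        expected = hmac.new(key, _encode(body), hashlib.sha256).hexdigest()
        if body.get("prev") != prev:
            return False
        if not hmac.compare_digest(expected, record.get("mac", "")):
            return False
        prev = record["mac"]
    return True
```

Because each record includes the MAC of its predecessor, altering or dropping an earlier step invalidates the rest of the trail.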

P2P Auditable Builds

In this document this term means an extension of auditable builds that share outputs and build trails in a P2P fashion.

Vetted Source

Source code that is linked to an 'is-reviewed' bit issued by a human.

Vetted Builds

These are builds whose source code inputs can be proven to have been vetted by one or more humans.

Transparent Builds

This is a build that is both auditable and vetted.

P2P Transparent Build

This is a transparent build which uses a P2P build-sharing backend.

Cross-Organisational Verifiability

Sometimes a software build takes as its input one or more binary blobs built by external organisations, for which source code is not available.

Ordinarily the presence of such binary blobs would preclude the build process from offering a meaningful guarantee of the integrity of the final product. Nevertheless, if the binary blob can be published along with a certificate that confirms that the blob had been verifiably built (or peer-verifiably built) from its source code, then that certificate could possibly be made part of the audit trail for the current build, improving the guarantee offered for its final output.

For the purposes of this document, a build process which has the ability to incorporate such an external certificate into its audit trail is said to offer cross-organisational verifiability.

TODO(jkoshy): Look for existing standards and protocols implementing cross-organisational verifiability, if these exist.

The Build Configuration Language

BSD operating systems are built from source using BSD make, either directly (e.g., the FreeBSD make world build process) or through an additional tool that drives make (e.g., NetBSD's build.sh).

Unfortunately, make's specification language is not expressive enough to implement a fully-deterministic build system of the type desired.

This section lists some of the traits desired in a build specification language.

At this point in time I am inclined to use a notation that is close to S-expressions, for the reasons outlined in some of the sections below.

Tooling friendliness of the notation

The build configuration language should be tooling-friendly. Large projects almost always need such tooling support. This means that the build configuration language should be easy to parse and modify programmatically.

For historical reasons Bazel and Buck use a syntax reminiscent of Python for their build rules. Other aspects, such as the naming of build dependencies, use a different (embedded) syntax. Extensions to the build system's rules are written using a Python-like extension environment; please see Starlark for Bazel, and Buck Macros for Buck.

# A Bazel/Buck example.
#
# The following rule invocation specifies the build process for library "libfoo".
cc_library(
    name = "foo",
    srcs = ["foo.c", "bar.c"],
    hdrs = ["foo.h"],
    deps = ["//some/package"],  # dependencies use a different sub-language
)

Although we could adopt this syntax (for continuity), I have a slight preference for a more uniform notation, say one that is based on S-expressions.

For example, using an SRFI-88/89-esque syntax:

 ; This stanza might define the build process for library "libfoo".
 ;
 ; This is a proposal only - this may not be the final syntax.
 (c_library
  name: "foo"
  srcs: ("foo.c" "bar.c")
  hdrs: ("foo.h")
  deps: (...))
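To give a feel for the tooling-friendliness of such a notation, here is a small Python sketch that reads stanzas of this shape into lists and dictionaries. It is only an illustration of how little code a reader needs: the names `parse`, `rule_fields` and `Symbol` are made up for this example, string escapes are left undecoded, and the syntax it accepts is a guess at the proposed notation rather than its definition.

```python
import re

_TOKEN = re.compile(r'''
    (?P<comment>;[^\n]*)
  | (?P<open>\()
  | (?P<close>\))
  | (?P<string>"(?:[^"\\]|\\.)*")
  | (?P<atom>[^\s()";]+)
''', re.VERBOSE)


class Symbol(str):
    """An unquoted atom such as c_library or name: ."""


def parse(text: str) -> list:
    """Parse S-expression text into nested Python lists."""
    stack = [[]]
    for match in _TOKEN.finditer(text):
        kind, token = match.lastgroup, match.group()
        if kind == "comment":
            continue
        if kind == "open":
            stack.append([])
        elif kind == "close":
            if len(stack) == 1:
                raise ValueError("unbalanced ')'")
            finished = stack.pop()
            stack[-1].append(finished)
        elif kind == "string":
            stack[-1].append(token[1:-1])      # drop the surrounding quotes
        else:
            stack[-1].append(Symbol(token))
    if len(stack) != 1:
        raise ValueError("unbalanced '('")
    return stack[0]


def rule_fields(form: list):
    """Split (head key: value ...) into the head and a dict of fields."""
    head, rest = form[0], form[1:]
    if len(rest) % 2:
        raise ValueError("keyword without a value")
    fields = {}
    for keyword, value in zip(rest[0::2], rest[1::2]):
        if not (isinstance(keyword, Symbol) and keyword.endswith(":")):
            raise ValueError(f"expected a keyword, got {keyword!r}")
        fields[keyword[:-1]] = value
    return head, fields
```

A tool that rewrites build files (for example, to add a source file to `srcs:`) could operate on the parsed lists and print them back out, without needing a full language implementation.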

Support for modularity

The build system should support the structuring of project code into logical modules.

Build configuration should also be modular: users should be able to write their build files without needing to worry about conflicts between the symbols used in their configuration files and any symbols defined elsewhere.

Make, in contrast, uses a single global namespace, with its attendant lack of modularity. Large projects then end up using recursive invocations of make, with all the drawbacks documented in Recursive Make Considered Harmful (Peter Miller, 2008).

Separation of concerns

We would like the configuration used to specify build & test scenarios (i.e., different compilation modes: -c optimize vs -c debug or running tests under valgrind vs running them normally) to be separate from the configuration that specifies the dependencies between input and output artifacts. These two aspects of build configuration are intertwined in make.

We want users to be able to easily define their own scenarios for building and testing code.

Multiple logically connected build outputs per directory

The BSD /usr/share/mk/*.mk framework currently works best when a single sub-directory in a source tree builds one program or one library. This restriction can be fairly limiting.

We would like our build system to allow test suites to be co-located with code that they test. The build system should allow multiple logically related libraries and binaries to be built in a single source directory.

Expressivity

Builds for large projects would need a language for writing "build macros" and the like.

make's language is quite limited. BSD make allows certain rules to be designated as "macros", by using the .USE pseudo target as a dependency. BSD make also offers iteration using .for/.endfor and conditional execution using .if/.endif. However, these and other extensions to the base "make" syntax do not fit together well, which means that specifying large systems with make remains awkward.

Bazel and Buck have their own "macro" definition languages; these are distinct from the build configuration language itself.

If we go the S-expression route, then a declarative subset of RnRS Scheme may be a natural choice for the extension language.

Other Features Desired For The Build System

Offline Operation

The build system should be able to work in 'offline' mode, without needing external network access to function (although builds would be entirely local in this case).

This feature would be useful for developers in areas of poor network connectivity.

Frugality

The build system should be able to work effectively in a resource constrained host environment (such as a laptop computer).

Bazel in particular seems to need a lot of memory when building projects.

Cross-platform Builds

The build system should support building code on a variety of (non-native) machine architectures and operating systems.

Cross building an operating system often involves compiling the cross compiler itself. The build tool should be able to handle these kinds of scenarios efficiently.

Building With Local Changes

The build system should be able to work efficiently on a source tree with a small number of local changes, sharing the build of the unmodified parts of the source tree with its peers.

Easing (Build) Portability

New target architectures should be easy to support, either through cross compilation, or by invoking the necessary build steps "remotely" using a small, portable helper that runs on the target hardware.

Support for Bootstrapping

The build tool should be usable when bootstrapping an operating system from source.

This means that a substantial part of the build system needs to be written in portable C, so that it can be built and can run in a cross-hosted OS bootstrap environment.

Sources

tools/build-automation

Deterministic Build Tools

The following general purpose build systems support deterministic builds:

Bazel (from Google).
Buck (from Facebook).

Monadic vs Applicative Build Systems

Bazel and Buck implement Applicative builds.

Looking over some of the steps involved in building BSD operating systems from source, it appears that we would need the full generality of Monadic builds; monadic build systems allow the structure of the build graph to be determined by the output of a previous build step. Please see "Applicative vs Monadic build systems" by Neil Mitchell for a discussion of the differences between the two kinds of build systems.
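The distinction can be illustrated with a toy Python sketch in which each rule receives a `fetch` callback, loosely following the presentation in the build systems literature cited in this section. The names `build`, `rules` and `store` are hypothetical, and the sketch omits cycle detection and rebuild checks.

```python
def build(target, rules, store):
    """Bring `target` up to date, building dependencies on demand.

    Each rule is a function that takes a `fetch` callback. Because a rule
    can inspect the value returned by one fetch before deciding what to
    fetch next, the dependency graph is discovered while the build runs.
    """
    if target not in store:
        fetch = lambda dep: build(dep, rules, store)
        store[target] = rules[target](fetch)
    return store[target]


rules = {
    "config": lambda fetch: "use_b",
    "a.src": lambda fetch: "A",
    "b.src": lambda fetch: "B",
    # Which source is needed is known only after "config" has been built.
    "out": lambda fetch: fetch(
        "b.src" if fetch("config") == "use_b" else "a.src"
    ).lower(),
}
```

In an applicative system the dependencies of `out` would have to be listed before the build starts, so both `a.src` and `b.src` would need to be declared; here only the one selected by `config` is ever built.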

Open-source build systems that are monadic in nature include:

Shake, described in "Build Systems à la Carte" (Andrey Mokhov, Neil Mitchell, and Simon Peyton Jones. Proc. ACM Program. Lang. 2, ICFP, Article 79, September 2018).
SCons.
Redo.

Build Configuration Languages

Bazel and Buck use a Python-esque syntax for specifying build targets and their dependencies. Bazel and Buck also offer their own mini-languages for writing extensions: Starlark for Bazel, and Skylark (the former name of Starlark) for Buck.

Android's build system Soong uses a configuration language that is closely related to that used by Bazel.

Shake's build configuration language is Haskell, in its full generality. This makes it hard to write tooling to modify Shake configuration files.

Dune, a build system for OCaml, uses an S-expression based configuration language.

Other Build Automation: Bitten / Buildbot / Hudson

Bitten, Buildbot and Hudson are popular tools used for continuous integration.

Our project has constraints that come in the way of using these tools to automate our builds:

First, these build tools are not available for some of our target operating systems and machine architectures.
Second, their design uses a central 'master' controller that build 'slaves' connect to. This 'master' controller typically is the same host serving the SVN repository. In our setup, we do not have a way to run such a controller on SF.Net's SVN servers.
Third, since we would need to use virtual machines to be able to build and test on a range of OSes and architectures, we would need to additionally manage the virtual machines used for a build run.

crosstool-NG

crosstool-NG is a utility for building toolchains, offering a user-friendly way to configure and build a GNU cross toolchain, using a menuconfig-style configuration interface.

However, it does not address the issue that we face: that of invoking and managing builds on virtual machines running non-native operating systems.