For security-critical projects, it seems like it would make sense to set up the build infrastructure to error (or at least warn!) when binary files are included in the build. This should apply transitively, so that when Linux distros attempted to update to this new version of liblzma, the build would fail (or warn) about the new binary dependency.
I don't know how common this practice is in Linux distro builds. Obviously if it's common, it would take a lot of work to clean up, assuming it's possible at all. It seems like something that would be doable with Bazel, but I'm not sure about other build systems.
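As a rough sketch of what such a check could look like as a Bazel genrule (hypothetical target name and paths; grep -I treats binary files as non-matching, so empty files would need special-casing):

    # BUILD: fail the build if any audited file looks binary.
    genrule(
        name = "check_no_binaries",
        srcs = glob(["src/**", "m4/**"]),  # paths to audit (hypothetical)
        outs = ["no_binaries.ok"],
        cmd = "for f in $(SRCS); do grep -Iq . $$f || { echo binary: $$f >&2; exit 1; }; done; touch $@",
    )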
> Bazel can skip compiling an output file if the hashes for its source code files + BUILD files have an artifact in the remote (or local) cache.
This is what I mentioned in a sibling comment: we need some way to identify binaries that doesn't rely on their hash. Using the hash of their source files and build instructions is one way to identify things (Nix also does this, recursively including the hashes of all dependencies). A different approach is to assign each binary an arbitrary name and version, which is what Debian packages do; although that is less automatic and more prone to conflicting IDs.
> This requires reproducible builds or else you could introduce build errors when your build environment changes.
No, this only requires that builds are robust. For example, scripting languages are pretty robust, since their "build" mostly just copies text files around, and they look up most dependencies at run time rather than linking them (or a hard-coded path) into a binary. Languages like C are more fragile, but projects like autotools have been attempting to make their builds robust across different environments for decades. In this sense, reproducibility is just another approach to robustness.
Don't get me wrong, I'm a big fan of reproducibility; but caching build artefacts is somewhat tangential (although not completely orthogonal).
I don't really agree - I think it's more the case that the build system should be able to prove that tests and test files cannot influence the built artifact. Any test code (or test binary files) going into the produced library is a big red flag.
Bazel is huge and complicated, but it allows making those kinds of assertions.
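For example, Bazel's testonly attribute expresses exactly this kind of assertion; a minimal sketch with hypothetical target names:

    # BUILD: test fixtures can never leak into production targets.
    cc_library(
        name = "test_fixtures",
        srcs = ["fixtures.cc"],
        testonly = True,  # only testonly/test targets may depend on this
    )

    cc_library(
        name = "lzma",  # production library (hypothetical)
        srcs = ["lzma.cc"],
        # deps = [":test_fixtures"],  # would be a build error: non-testonly
        #                             # targets cannot depend on testonly ones
    )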
For Java and C++ binaries, yes, assuming you do not change the toolchain. If you have build steps that involve custom recipes (e.g. executing binaries through a shell script inside a rule), you will need to take some extra care:
Do not use dependencies that were not declared. Sandboxed execution (--spawn_strategy=sandboxed, only on Linux) can help find undeclared dependencies.
Avoid storing timestamps in generated files. ZIP files and other archives are especially prone to this.
Avoid connecting to the network. Sandboxed execution can help here too.
Avoid processes that use random numbers; in particular, dictionary traversal is randomized in many programming languages.
Security stuff should probably have a hermetic build. And it should be a one-liner to do the build and take a hash of the resulting binary. They should publish source, binaries, and the resulting hash, allowing anyone to easily verify and call them out if the hash doesn't match.
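Something like (hypothetical package and target names):

    $ bazel build //pkg:release_binary
    $ sha256sum bazel-bin/pkg/release_binary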
I have never seriously used Bazel. Just looking at it, I found the thought of manually mirroring all #includes into the build configuration of a big project pretty aggravating.
Is that not really an issue in practice? Do people automate that away?
It does, but not in the way you probably expect it to be handled: i.e., with some filesystem mechanism that makes the update command see only what has been declared as a target's dependencies -- I must admit I don't know how Bazel does this in a cross-platform manner (Windows, macOS); copying seems way too heavy-handed.
In any case, in build2 this is "handled" by not including headers as "foo.h" but as <libfoo/foo.h>, that is, with the project prefix. You can read more on this here: https://build2.org/build2-toolchain/doc/build2-toolchain-int... And the proper fix will hopefully come with C++20 modules.
> Rather than needing custom-built infrastructure for every type of language supported
My understanding is that bazel is moving away from this, so that you can define toolchains by saying "here is a binary that serves the job of linking/compiling stuff".
The challenge with your idea is that you're basically saying "hey, we should sandbox and introspect <any number of fairly arbitrary and complex binaries> to intercept and modify their filesystem and network accesses (at a minimum), across any number of versions and uses". Even conditionally rewriting file reads/writes based on guessing whether something is an input or a re-used output isn't easy in general.
You're right - it doesn't magically solve build reproducibility. Bazel pushes you towards a build configuration where you have to describe (in a terse way) the entire dependency graph of what is being built. It allows Bazel to be smart about where in the graph things are stale.
If you run a script that outputs intermediate files, Bazel needs to know about that script's inputs and outputs. And it works better if it knows them ahead of time.
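For example, with a genrule you declare both up front (hypothetical script and file names):

    # BUILD: Bazel hashes the declared inputs (srcs + tools) and reruns
    # the command only when one of them changes.
    genrule(
        name = "gen_tables",
        srcs = ["tables.txt"],         # declared input
        outs = ["tables.h"],           # declared output
        tools = [":make_tables"],      # the script itself is tracked too
        cmd = "$(location :make_tables) $(SRCS) > $@",
    )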
You do control what version of libraries you're using. You include the exact SHA256 of every dependency of every dependency, down to the toolchain itself.
If you're saying "your distribution can't automatically update you if libc is vulnerable to something", that's true. Reacting to major vulnerabilities takes more CPU time, since everything has to be recompiled. However, it's not much CPU time, and that downside is outweighed by knowing exactly where your dependencies come from -- and by having your "getting started" instructions be "1) install bazel 2) bazel run //your:binary".
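The pinning itself looks roughly like this in a WORKSPACE file (placeholder URL and digest):

    load("@bazel_tools//tools/build_defs/repo:http.bzl", "http_archive")

    http_archive(
        name = "zlib",
        urls = ["https://example.org/zlib-1.2.11.tar.gz"],  # placeholder URL
        strip_prefix = "zlib-1.2.11",
        sha256 = "<expected sha256>",  # placeholder; build fails on mismatch
    )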
> Many of your other ones are byproducts of the fact that Bazel is primarily a build-from-source system. This has some benefits, particularly in a C++ ecosystem where binary compatibility across versions basically doesn't exist. But it also has some big drawbacks when it comes to compile times.
Nix, Guix and Spack package managers solved the C++ ABI issue a long time ago, without Bazel's crazy demands in terms of resource consumption, integration and compile time. Some of them even support binary distribution.
I know that. But all of them are terrible options. I do not want to depend on SQLite, OpenSSL, libxml or whatever other system library compiled by Bazel, nor do I want Bazel to take 45 minutes to recompile them. Additionally, this causes diamond-dependency problems with other software that uses Bazel artefacts without compiling with Bazel.
> "bazel --copt=<your options here> :target"?
Can I use that to specify a flag for one target and not another? Without having to build each of them sequentially?
Concrete example of the madness: SQLite will not compile if you enable some options that would make TensorFlow faster... and Bazel recursively compiles both.
> What isn't reproducible? I tend to think of reproducibility as a strength of Bazel. Because all of your dependencies are explicit and Bazel fetches a known version, the build is less dependent on your system environment and whatever you happen to have installed there.
Bazel tries to build in an isolated environment but does only half the job.
It still depends on the system compiler, and does not chroot nor "compiler-wrap" (cf. Spack), leaving the build very vulnerable to system side effects and updates.
> Disclosure: I am a Googler. I have some gripes with Bazel, but overall I think it gets some important ideas right. You have a BUILD file that is declarative, then any imperative code you need goes into separate .bzl files to define the rules you need.
I can understand that Bazel is very convenient in the Google environment with Google's resources. But it's a nightmare for everyone I've talked to outside of Google.
It used to be the case that file hashing wasn't coordinated well with concurrently edited files, so changing source while building could corrupt your cache.
Bazel is more strict about what constitutes a dependency change. It doesn't use the file's mtime; it relies on its checksum. It also considers the command-line flags to be part of the cache key.
So spurious changes (touching a file) will result in a cache hit, while hidden changes (changing an environment variable used by Make) are caught.
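Roughly:

    $ bazel build //foo:bar              # compiles
    $ touch foo/bar.cc                   # mtime changes, contents don't
    $ bazel build //foo:bar              # cache hit: checksums unchanged
    $ bazel build --copt=-DX //foo:bar   # flags are part of the key: rebuilds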
This is particularly important if verifiable builds are needed for SOX compliance.
> Nix, Guix and Spack packager managers solved the C++ ABI issue a long time ago
Yes, this is certainly something that is also solvable at the package-manager level. And that approach will have shorter compile times.
I agree it would be nice if Bazel integrated with package managers like this more easily. I hope Bazel adds support for this. There is a trade-off though: with less control over the specific version of your dependencies, there is a greater risk of build failure or bugs arising from an untested configuration. Basically this approach outsources some of the testing and bugfixing from the authors to the packagers. But it's a trade-off I know many people are willing to make.
> Can I use that to specify a flag for one target and not another? Without having to build each of them sequentially?
You can put copts=["<opt>"] in the cc_library() rules in the BUILD file. This will give per-target granularity. You can add a select() based on compilation_mode if you need to define opt-only flags: https://docs.bazel.build/versions/master/configurable-attrib...
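A sketch, with hypothetical target and flag names:

    config_setting(
        name = "opt_build",
        values = {"compilation_mode": "opt"},
    )

    cc_library(
        name = "tensor_ops",  # gets the fast flags; sqlite's rule omits them
        srcs = ["tensor_ops.cc"],
        copts = select({
            ":opt_build": ["-O3", "-ffast-math"],
            "//conditions:default": [],
        }),
    )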
> Bazel tries to build in an isolated environment but does only half the job. It still depends on the system compiler, and does not chroot nor "compiler-wrap" (cf. Spack), leaving the build very vulnerable to system side effects and updates.
Bazel allows you to define your own toolchain. This can support cross-compiling and isolating the toolchain I believe, though I don't have any direct experience with this: https://docs.bazel.build/versions/master/toolchains.html
> I can understand that Bazel is very convenient in the Google environment with Google's resources. But it's a nightmare for everyone I've talked to outside of Google.
I hear you and I hope that we see some improvements to integrate better with package managers.
FWIW, I have been experimenting with auto-generating CMake from my Bazel BUILD file for my project https://github.com/google/upb. My plan is to use Bazel for development but have the CMake build be a fully supported option also for users who don't want to touch Bazel.
Ok, but that decision would have to be made by the project maintainer, in this case Google, not the person using Bazel to compile protobuf. (And not particular to Bazel -- a developer can make any build system effectively hermetic by vendoring everything.)
In my view the challenge here is that a dependency changes (e.g. /usr/include/stdio.h is upgraded by the system package manager, or two users sharing a cache have different versions of a system library) and Bazel doesn't realize that it needs to rebuild. It would be a pretty heavy hammer if the way to fix that requires every OSS project to include the whole user-space OS (C++ compiler, system headers, libraries) in the repo or via submodule and then be careful that no include path or any part of the build system accidentally references any header or library outside the repository.
And maybe this issue just doesn't need to be fixed (it's not like automake produces build rules that explicitly depend on system headers either!) -- my quibble was with the notion that Bazel, unlike CMake or whatever, provides fully hermetic builds, or tracks dependencies carefully enough to provide an organization-wide build cache across diversely configured/upgraded systems.
I think you missed something in the parent comment. Bazel can skip compiling an output file if the hashes for its source code files + BUILD files have an artifact in the remote (or local) cache. This requires reproducible builds or else you could introduce build errors when your build environment changes.
A tool for updating bazel build target dependencies. It inspects build files and source code, then adds/removes dependencies from build targets as needed. It requires using global include paths in C/C++ sources. It is not perfect, but it is pretty nice!
It tries, but it's really more of an operational benefit (i.e. works to your advantage to enable build traceability and avoid compile-time Heisenbugs, when you the developer can hold your workstation's build-env constant) than a build-integrity one (i.e. something a mutually-untrustworthy party could use to audit the integrity of your build pipeline, by taking random sample releases and building them themselves to the same resulting SHA — à la Debian's deterministic builds.)
Bazel doesn't go full Nix — it doesn't capture the entire OS + build toolchain inside its content-fingerprinting to track it for changes between builds. It's more like Homebrew's build env — a temporary sandbox prefix containing a checkout of your project, plus symlinks to resolved versions of any referenced libs.
Because of this, you might build, then upgrade an OS package referenced in your build or containing parts of your toolchain, and then build again; Bazel (used on its own) won't know that anything is different. But now you have a build that doesn't look quite like it would if you had built everything with the newest version of the package.
I'm not saying you can't get deterministic builds from Bazel; you just have to do things outside of Bazel to guarantee that. Bazel gets you maybe 80% of the way there. Running the builds inside a known fixed builder image (that you then publish) would be one way to get the other 20%.
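E.g. something like (hypothetical image name):

    $ docker run --rm -v "$PWD":/src -w /src \
        gcr.io/example/pinned-builder:2024-03 \
        bazel build //...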
I have a feeling that Blaze is probably better for this, though, given all the inherent corollary technologies (e.g. objFS) it has within Google that don't exist out here.