
I weep for this period of time where we don't have sticky disks readily available for builds. Uploading the layer cache each time is such a coarse and time-consuming way to cache things.

Maybe building from scratch all the time is a good correctness decision? Maybe stale values on disks are a tricky enough issue to want to avoid entirely?

If you keep a stack of disks around and grab a free one when a job starts, you'd end up with a good speedup a lot of the time. If cost is an issue you can expire them quickly. I regularly see CI jobs spending >50% of their time downloading the same things, or compiling the same things, over and over. How many times have I triggered an action that compiled the exact same sqlite source code? Tens of thousands?
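
A rough sketch of the pool idea, with hypothetical names and in-memory bookkeeping (not any real provider's API):

```python
import time
from dataclasses import dataclass, field

# Hypothetical sketch of a pool of pre-warmed cache disks that CI jobs can
# borrow; names and the in-memory bookkeeping are illustrative, not a real API.

@dataclass
class CacheDisk:
    disk_id: str
    last_used: float = field(default_factory=time.time)
    in_use: bool = False

class DiskPool:
    def __init__(self, max_idle_seconds: float = 6 * 3600):
        self.disks: dict[str, CacheDisk] = {}
        self.max_idle_seconds = max_idle_seconds  # expire idle disks quickly if cost matters

    def add(self, disk_id: str) -> None:
        """Register a freshly provisioned (or reclaimed) disk."""
        self.disks[disk_id] = CacheDisk(disk_id)

    def acquire(self) -> CacheDisk | None:
        """Grab any free, non-expired disk; None means the job starts with a cold cache."""
        now = time.time()
        for disk in self.disks.values():
            if disk.in_use or now - disk.last_used > self.max_idle_seconds:
                continue  # busy or stale; a reaper can delete stale ones to save cost
            disk.in_use = True
            return disk
        return None

    def release(self, disk: CacheDisk) -> None:
        """Return the disk, now holding a warm layer/build cache, to the pool."""
        disk.in_use = False
        disk.last_used = time.time()
```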

Maybe this is fine, I dunno.




Yes, please do what Depot does and put fast persistent disks close to builds to cache Docker layers. GitHub Actions runners, CircleCI, and all the others adding expensive network calls to manually cache layers have always been such a time sink, and I think it pushes lots of people to remove caching entirely.

I've spent days trying all of these solutions at my company. All of these solutions suck: they are slow, and only successful builds get their layers cached. This is a dead end. The only workable solution is to have a self-hosted runner with a big disk.

> Building Docker images in CI today is slow. CI runners are ephemeral, so they must save and load the cache for every build.

>...persistent disks significantly lowers build time

Does this mean your solution places specific caches, like bazel, node_modules, .yarn, and other intermediate artifacts, onto a shared volume and reuses them across jobs?
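
For reference, one common way that kind of reuse gets wired up (a hedged sketch of the general pattern, not necessarily what this product does) is to point each tool's cache location at a volume that outlives the job:

```python
import os
from pathlib import Path

# Illustrative only: point well-known per-tool cache locations at a persistent
# volume (assumed mounted at /cache) so successive jobs can reuse them.
# Bazel would take the equivalent via a flag, e.g. --disk_cache=/cache/bazel.
CACHE_MOUNT = Path(os.environ.get("CACHE_MOUNT", "/cache"))

TOOL_CACHES = {
    "YARN_CACHE_FOLDER": CACHE_MOUNT / "yarn",
    "npm_config_cache": CACHE_MOUNT / "npm",
    "PIP_CACHE_DIR": CACHE_MOUNT / "pip",
}

def export_cache_env() -> dict[str, str]:
    """Create the cache dirs on the shared volume and return env vars for the build step."""
    env = {}
    for var, path in TOOL_CACHES.items():
        path.mkdir(parents=True, exist_ok=True)
        env[var] = str(path)
    return env
```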


All the CI/CD build agents with no cache, and so on. This is a general problem across tech. For the web, caching is cheap, but as far as I know there is no equally cheap way to cache builds.

I think there needs to be a redesign in how dependencies work in most programming languages. Deterministic builds have been such a game changer and I think that CPU vs bandwidth may be the next big area to explore when it comes to compiling code.


And CI with many-GB Docker images is very painful... Turning on layer caching usually makes the process even slower, since it needs to pull and unpack the previous image before starting, and if you turn it off you're downloading a ton of deps on every build.

If you separate the heavy stuff into a base image, you still have to load it at the beginning of CI, which, without beefy machines with local SSD caching, can take a loooong time.


There are a lot of things that waste cycles on a build machine, but they're worth the reproducibility. I think that problem would be better solved via caching, ideally.

I would hazard a guess that there are far fewer people these days who download a tarball and `./configure` `make` `make install` than there are distros (who often need to patch) and developers (who will be working from git anyway).


I want those. Why would devs not want fast build times and incremental compilation through caching?

I'm reminded of the waste many CI setups generate, where they download the same set of dependencies at the same versions for each run, with no effort put into caching them.

There is so much bandwidth used because of that (I've seen the numbers for some projects and it's HUGE).


Or even have a dedicated build server/farm with object caching. At some point building on your own machine is not a great solution.

Just the build caching is worth the price of entry.

Gradle especially does a great job at this.


this is pretty neat—it’s been a while since i’ve tried caching layers with gha. it used to be quite frustrating.

my previous experience was that in nearly all situations the time spent sending and retrieving cache layers over the network wound up making a shorter build step moot. ultimately we said “fuck it” and focused on making builds faster without (docker layer) caching.


Don't forget ccache - storing the cache on a fast disk, it can easily speed up a build by 10x.
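
For anyone who hasn't looked under the hood, the core idea is roughly this (a simplified sketch, not ccache's actual implementation): hash the compiler, flags, and preprocessed source, and reuse the stored object file on a hit.

```python
import hashlib
import shutil
import subprocess
from pathlib import Path

# Simplified illustration of the ccache idea, not its real implementation:
# key = hash(compiler + flags + preprocessed source), value = the object file.
CACHE_DIR = Path("/fast-disk/compile-cache")  # point this at a fast local disk

def cached_compile(compiler: str, flags: list[str], source: Path, output: Path) -> None:
    # Preprocess first so the key reflects headers and macros, as ccache does.
    preprocessed = subprocess.run(
        [compiler, "-E", *flags, str(source)],
        check=True, capture_output=True,
    ).stdout
    key = hashlib.sha256("\0".join([compiler, *flags]).encode() + preprocessed).hexdigest()
    cached = CACHE_DIR / key

    if cached.exists():
        shutil.copyfile(cached, output)  # hit: skip the compiler entirely
        return
    subprocess.run([compiler, "-c", *flags, "-o", str(output), str(source)], check=True)
    CACHE_DIR.mkdir(parents=True, exist_ok=True)
    shutil.copyfile(output, cached)      # miss: compile, then populate the cache
```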

All great points, but in practice tools like Bazel and sccache are incredibly conservative about hashes matching, down to the file path on disk and even env var state.

One goal of these tools is to guarantee that such misconfiguration results in a cache key mismatch, rather than a hit and a bug.

There are tons of challenges designing a remote build cache product, like anything, but that one has turned out to be a reliable truth.
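
A toy illustration of that conservatism (not Bazel's or sccache's real keying scheme): fold everything that could affect the output, including the absolute path and env var state, into the key, so any discrepancy becomes a miss instead of a wrong hit.

```python
import hashlib
import os
from pathlib import Path

# Toy illustration (not Bazel's or sccache's actual scheme): anything that could
# change the output goes into the key, so a misconfiguration yields a miss, not a bad hit.
RELEVANT_ENV_VARS = ("CC", "CFLAGS", "PATH")  # illustrative subset

def action_cache_key(source: Path, command: list[str]) -> str:
    h = hashlib.sha256()
    h.update(source.read_bytes())                              # file contents
    h.update(str(source.resolve()).encode())                   # absolute path on disk
    h.update("\0".join(command).encode())                      # full command line
    for var in RELEVANT_ENV_VARS:
        h.update(f"{var}={os.environ.get(var, '')}".encode())  # env var state
    return h.hexdigest()
```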

Some other interesting insights:

- transmitting large objects is often not profitable, so we found that setting reasonable caps on what’s shared with the cache can be really effective for keeping transmissions small and hits fast

- deferring uploads is important because you can’t penalize individual devs for contributing to the cache, and not everybody has a fast upload link. making this part smooth is important so that everyone can benefit from every compile.

- build caching is ancient; Make does its own simple form of it, but the protocols for it vary greatly in robustness, from WebDAV in ccache to Bazel’s gRPC interface

- most GitHub Actions builds occur in a small physical area, so accelerating build artifacts is an easier problem than, say, full blown CDN serving

The assumptions that definitely help:

- it’s a cache, not a database; things can be missing, it doesn’t need strong consistency

- replication lag is okay because a build cache entry is typically not requested multiple times in a short window of time; the client that created it has it locally

- it’s much better to give a fast miss than a slow hit, since the compiler is quite fast

- it’s much better to give a fast miss than an error. You can NEVER break a build; at worst it should just not be accelerated.

It’s an interesting problem to work on for sure.
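
To make those assumptions concrete, here is a minimal client sketch against a hypothetical HTTP cache endpoint (not any particular product's protocol): cap uploads, keep misses fast, and turn every error into a miss so the build never breaks.

```python
import requests  # assumes a plain HTTP cache endpoint; real protocols vary (WebDAV, gRPC, ...)

MAX_OBJECT_BYTES = 8 * 1024 * 1024   # cap large objects to keep transmissions small
FAST_TIMEOUT_SECONDS = 0.5           # a fast miss beats a slow hit

class RemoteBuildCache:
    def __init__(self, base_url: str):
        self.base_url = base_url.rstrip("/")

    def get(self, key: str) -> bytes | None:
        """Return the cached artifact or None; errors are misses, never build failures."""
        try:
            resp = requests.get(f"{self.base_url}/{key}", timeout=FAST_TIMEOUT_SECONDS)
            return resp.content if resp.status_code == 200 else None
        except requests.RequestException:
            return None  # never break the build; at worst it just isn't accelerated

    def put(self, key: str, blob: bytes) -> None:
        """Best-effort upload; ideally deferred so individual devs aren't penalized."""
        if len(blob) > MAX_OBJECT_BYTES:
            return  # transmitting large objects is often not profitable
        try:
            requests.put(f"{self.base_url}/{key}", data=blob, timeout=FAST_TIMEOUT_SECONDS)
        except requests.RequestException:
            pass  # uploads are optional; failures must be invisible to the caller
```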


Probably everybody knows that each test run and build downloads GBs of data, but they do it quickly, they cost little or no money, and it's easier to keep doing it than to set up and use a local cache (on CI, local dev machines, etc.). The only reason I ever saw some optimization at that level was because building the base image took too long, so we saved one and rebuilt it only when dependencies changed. I can't remember the details.

It's not that you have to, it's that you have many different builds that are going to stomp on each other's caches, plus your build services are often ephemeral, especially since I was at a small startup where we wanted to shut systems down overnight to save money.

Unfortunately, my team has some builds that take ~25 min without caching and maybe 2 min with caching.

I'm still not entirely sure why it's the case, but the connection to the package registry is incredibly slow, so downloading all dependencies takes forever.


I recently moved a project to a Docker build pipeline, and it redownloads all deps on each source file change, unlike efficient on-disk incremental compilation, because of how Docker layer caching works, so my usage skyrocketed (and my build times went from seconds to minutes).
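
For what it's worth, the mechanism is roughly that a layer's cache key folds in its parent layer and the hashes of whatever the instruction copies, so if sources get copied before (or alongside) the dependency-install step, any edit invalidates that layer and everything after it. A simplified model of the keying (not Docker's actual implementation):

```python
import hashlib

# Simplified model of layer-cache keying (illustrative, not Docker's actual code):
# a layer's key depends on its parent layer and, for COPY/ADD, the hashes of the
# files being copied. Once one layer's key changes, every later layer is rebuilt.

def layer_key(parent_key: str, instruction: str, copied_file_hashes: tuple[str, ...] = ()) -> str:
    h = hashlib.sha256()
    h.update(parent_key.encode())
    h.update(instruction.encode())
    for file_hash in copied_file_hashes:
        h.update(file_hash.encode())
    return h.hexdigest()

# So if "COPY . ." comes before the dependency-install RUN, any edited source file
# changes the COPY layer's key, the install layer's parent changes, and the deps get
# fetched again. Copying only manifests/lockfiles before the install step keeps that
# layer stable across source edits.
```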

Something like this might be exactly what we need, but we don't make money building distributed build artifact caching systems. If we don't make money doing it, we aren't doing it :(

"Is this good for the company?"


This is fine if you treat your CI provider as a "dumb shell runner". But good CI platforms have actually useful features and APIs (e.g. caching), and if you want to use them, a simple Makefile isn't going to work. For projects where the difference between a cold- and warm-cache build is tens of minutes, those features bring meaningful quality-of-life improvements.

This may be a tradeoff you're ok with, but for a lot of people, it's not.

