Yup! We observed the same thing back before we built Depot. The act of saving/loading cache over the GHA network pretty much negated any performance gain from layer caching. So we built a solution that persists the cache to NVMe disks and orchestrates it across builds so it's immediately available for the next build. All the performance of layer caching without any network transfer.
The registry cache is a neat idea, but in practice it suffers from the same problem.
Hopefully Depot will reply, but from my perspective it's mostly laid out on their homepage: they're comparing against builds in other CI products that use network-backed disks and virtualized hardware and don't keep a layer cache around. Depot provides fast hardware and disks and is good at making the layer cache available for subsequent builds.
You could likely get very similar performance by provisioning a single host with good hardware and simply leveraging the on-host cache.
Yeah, that still holds true to some extent today with the GHA cache. Blacksmith colocates its cache with our CI runners and ensures they're on the same local network, which lets us saturate the NIC and provide much faster cache reads/writes. We're also thinking of clever ways to avoid downloading from a cache entirely and instead bind mount cache volumes over the network into the CI runner. Still early days, but stay tuned!
An application that is aware of the half dozen or so caching layers from register to platter can perform dramatically better than a naive program. Two wrinkles:
1) It needs to be told the various sizes, speeds, and quirks of each server to make the best use of them (just some work; see the sketch after this list).
2) It needs to coordinate with the other processes running on the system to divide up the resources. This is hard. Generally people bail, assign some share of RAM, and hope for the best with the other layers.
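As a rough illustration of wrinkle 1, here's a minimal sketch in Go of the classic trick: block a memory-bound loop so the working set fits in cache. The matrix size and block size below are assumptions picked for illustration; in a real program the block size is exactly the kind of thing you'd want to be told (or measure) per machine.

```go
package main

import (
	"fmt"
	"time"
)

const (
	n = 4096 // matrix dimension, chosen only for illustration
	// blockSize is an assumption: pick it so one tile of src and one of dst
	// (blockSize*blockSize*8 bytes each) fit comfortably in L1/L2 cache on
	// the target machine -- the "needs to be told the sizes" part.
	blockSize = 64
)

// naiveTranspose reads src sequentially but writes dst with a stride of n,
// so nearly every write touches a different cache line and the working set
// blows straight past every cache level.
func naiveTranspose(dst, src []float64) {
	for i := 0; i < n; i++ {
		for j := 0; j < n; j++ {
			dst[j*n+i] = src[i*n+j]
		}
	}
}

// blockedTranspose copies the matrix tile by tile, so each tile of src and
// dst stays resident in cache while it is being worked on.
func blockedTranspose(dst, src []float64) {
	for ii := 0; ii < n; ii += blockSize {
		for jj := 0; jj < n; jj += blockSize {
			for i := ii; i < ii+blockSize && i < n; i++ {
				for j := jj; j < jj+blockSize && j < n; j++ {
					dst[j*n+i] = src[i*n+j]
				}
			}
		}
	}
}

func main() {
	src := make([]float64, n*n)
	dst := make([]float64, n*n)
	for i := range src {
		src[i] = float64(i)
	}

	start := time.Now()
	naiveTranspose(dst, src)
	fmt.Println("naive:  ", time.Since(start))

	start = time.Now()
	blockedTranspose(dst, src)
	fmt.Println("blocked:", time.Since(start))
}
```

The blocked version typically wins by a wide margin for the exact same amount of work, which is the "dramatically better" part; choosing blockSize per machine is the "just some work" part.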
I got it working, with intermediate layers too, only to find that there wasn't a material performance benefit once I took into account how long it takes to pull from and push to the cache.
Cache is insanely fast, orders of magnitude faster than RAM, and basically instant compared to going to disk or another machine on the network. I find it unlikely that they could overcome the added network latency such a system introduces.
Right, I wasn't saying don't cache: given the network topology, IO is slow, so cache it. That's why I wondered why they don't have LRU or some other eviction policy to deal with a full cache, stale entries, etc.
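For what that could look like, here's a minimal LRU sketch in Go; the names (lruCache, the "layer-*" keys) and the byte-slice values are purely illustrative, not anything from an actual build-cache implementation.

```go
package main

import (
	"container/list"
	"fmt"
)

// lruCache is a minimal least-recently-used cache: when it is full,
// the entry that was touched longest ago is evicted to make room.
type lruCache struct {
	capacity int
	order    *list.List               // front = most recently used
	items    map[string]*list.Element // key -> element in order
}

type entry struct {
	key   string
	value []byte
}

func newLRUCache(capacity int) *lruCache {
	return &lruCache{
		capacity: capacity,
		order:    list.New(),
		items:    make(map[string]*list.Element),
	}
}

// Get returns the cached value and marks it as most recently used.
func (c *lruCache) Get(key string) ([]byte, bool) {
	el, ok := c.items[key]
	if !ok {
		return nil, false
	}
	c.order.MoveToFront(el)
	return el.Value.(*entry).value, true
}

// Put inserts or updates a value, evicting the least recently used
// entry if the cache is already at capacity.
func (c *lruCache) Put(key string, value []byte) {
	if el, ok := c.items[key]; ok {
		el.Value.(*entry).value = value
		c.order.MoveToFront(el)
		return
	}
	if c.order.Len() >= c.capacity {
		oldest := c.order.Back()
		c.order.Remove(oldest)
		delete(c.items, oldest.Value.(*entry).key)
	}
	c.items[key] = c.order.PushFront(&entry{key: key, value: value})
}

func main() {
	cache := newLRUCache(2)
	cache.Put("layer-a", []byte("a"))
	cache.Put("layer-b", []byte("b"))
	cache.Get("layer-a")              // touch a, so b becomes the eviction candidate
	cache.Put("layer-c", []byte("c")) // evicts layer-b
	_, ok := cache.Get("layer-b")
	fmt.Println("layer-b still cached:", ok) // false
}
```

A real build cache would likely also want TTLs for stale entries and size-based accounting, since layers vary wildly in size, but the core eviction idea is the same: when the cache is full, drop whatever was touched least recently.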