
Yup! We observed the same thing back before we built Depot. The act of saving/loading cache over a GHA network pretty much negated any performance gain from layer caching. So, we created a solution to persist cache to NVMe disks and orchestrate that across builds so it's immediately available on the next build. All the performance of layer caching without any network transfer.

The registry cache is a neat idea, but in practice it suffers from the same problem.




Hopefully Depot will reply, but from my perspective it is mostly laid out on their homepage. They are comparing against builds in other CI products that use network-backed disks and virtualized hardware, and that don’t keep a layer cache around. Depot provides fast hardware and disks and is good at making the layer cache available for subsequent builds.

You could likely get very similar performance by provisioning a single host with good hardware and simply leverage the on-host cache.


Yeah, that still holds true to some extent today with the GHA cache. Blacksmith colocates its cache with our CI runners and ensures they're on the same local network, which lets us saturate the NIC and provide much faster cache reads/writes. We're also thinking of clever ways to avoid downloading from a cache entirely and instead bind mount cache volumes over the network into the CI runner. Still early days, but stay tuned!

An application that is aware of the half dozen or so caching layers from register to platter can perform dramatically better than a naive program. Two wrinkles:

1) It needs to be told the various sizes, speeds, and quirks of those layers on each server to make best use of them. (Just some work.)

2) It needs to coordinate with the other processes running on the system to divide up the resources. This is hard. Generally people bail and just assign some share of RAM and hope for the best with the other layers.


I got it working, with intermediate layers, too. Only to find that I didn’t see a material performance benefit once I took into account how long it takes to pull from and push to the cache.
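To put rough numbers on that trade-off, here is a minimal sketch of the break-even arithmetic. The function name and every timing below are hypothetical, made up for illustration and not measured from any real CI system.

```python
def net_cache_benefit(cold_build_s, warm_build_s, pull_s, push_s):
    """Seconds saved per build by a remote layer cache (negative means slower)."""
    build_time_saved = cold_build_s - warm_build_s
    transfer_overhead = pull_s + push_s
    return build_time_saved - transfer_overhead


# Hypothetical timings: the warm build is 240 s faster, but moving the
# cache over the network costs 270 s, so the cached build loses overall.
print(net_cache_benefit(cold_build_s=300, warm_build_s=60, pull_s=150, push_s=120))  # -30
```

The cache only pays off when the pull and push together take less time than the rebuild they avoid, which is why keeping the cache on local disk changes the picture.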

Pretty sure the load time problem can be mitigated by caching.

A caching layer

No layer caching mechanisms

I know cases where caches on top of a kv store improved performance a lot. I don’t think it’s as simple as you claim.

Yes, it's a trade-off. But if you move towards frequent or continuous deployment, all your users could see a stale cache quite often.

There are other caching layers too.

CPU cache is insanely fast, orders of magnitude faster than RAM, and basically instant compared to going to disk or another machine on the network. I find it unlikely that they could overcome the added network latency introduced in such a system.

Edit: check this out for more info https://people.eecs.berkeley.edu/~rcs/research/interactive_l...


lack of cacheability is the main issue I've heard

I broadly agree; however, running a cache/networking layer is difficult, and it is handled well by other open source implementations.

If someone already wrote that networking layer, why would I want to do it again? And run into all of the bugs that they already discovered and solved?


I didn't say it's a cache system. I said that it effectively is a caching system. You won't notice much difference from a write-back cache.

Much of the cache "management" can be done with specialist load/store instructions that skip the cache rather than being OS managed like a mapping.

The idea is you cache a rendering engine. It just isn’t core in the system layer

Right, I wasn’t saying don’t cache network topology. IO is slow, so cache it. That’s why I wondered why they didn’t have LRU or some other eviction policy to deal with a full cache, stale entries, etc.
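For reference, a minimal sketch of what LRU eviction plus a staleness check could look like. The class name, the TTL approach to stale entries, and the injectable clock are all assumptions for illustration, not anything from the system being discussed.

```python
import time
from collections import OrderedDict


class LRUCache:
    """Hypothetical bounded cache: evicts the least recently used entry
    when full and treats entries older than ttl_s seconds as stale."""

    def __init__(self, capacity, ttl_s, clock=time.monotonic):
        self.capacity = capacity
        self.ttl_s = ttl_s
        self.clock = clock           # injectable so expiry can be tested
        self._items = OrderedDict()  # key -> (value, stored_at), oldest first

    def get(self, key):
        entry = self._items.get(key)
        if entry is None:
            return None
        value, stored_at = entry
        if self.clock() - stored_at > self.ttl_s:
            del self._items[key]     # stale entry: drop it and report a miss
            return None
        self._items.move_to_end(key)  # mark as most recently used
        return value

    def put(self, key, value):
        self._items[key] = (value, self.clock())
        self._items.move_to_end(key)
        if len(self._items) > self.capacity:
            self._items.popitem(last=False)  # evict least recently used
```

A full cache then costs one eviction per insert, and a stale topology entry turns into a miss that triggers a fresh lookup instead of being served indefinitely.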

What block layer cache?

Page cache and disk cache are quite shared between containers...