Hacker Read

Hacker Read top | best | new | newcomments | leaders | about | bookmarklet

login

		Cache is King: A guide for Docker layer caching in GitHub Actions (blacksmith.sh) similar stories update story
		8 points by adityamaru \| karma 115 \| avg karma 7.19 2024-04-06 22:42:39 \| hide \| past \| favorite \| 111 comments

view as:

notnmeyer | karma 214 | avg karma 1.41 2024-04-07 02:20:35 | [–] similar comments

this is pretty neat—it’s been a while since i’ve tried caching layers with gha. it used to be quite frustrating.

my previous experience was that in nearly all situations the time spent sending and retrieving cache layers over the network wound up making a shorter build step moot. ultimately we said “fuck it” and focused on making builds faster without (docker layer) caching.

adityamaru | karma 115 | avg karma 7.19 2024-04-07 03:16:44 | [–] similar comments

Yeah that still holds true to some extent today with the GHA cache. Blacksmith colocates its cache with our CI runners, and ensures that they're in the same local network allowing us to saturate the NIC and provide much faster cache reads/writes. We're also thinking of clever ways to avoid downloading from a cache entirely and instead bind mount cache volumes over the network into the CI runner. Still early days, but stay tuned!

notnmeyer | karma 214 | avg karma 1.41 2024-04-07 14:09:37 | [–] similar comments

there’s probably a cool consistent hashing solution where jobs are routed to a host that that is likely to have the cache stored locally already and can be mounted into the containers.

kylegalbraith | karma 425 | avg karma 1.69 2024-04-07 06:39:16 | [–] similar comments

Yup! We observed the same thing back before we built Depot. The act of saving/loading cache over a GHA network pretty much negated any performance gain from layer caching. So, we created a solution to persist cache to NVMe disks and orchestrate that across builds so it's immediately available on the next build. All the performance of layer caching without any network transfer.

The registry cache idea is a neat idea, but in practice suffers the same problem.

notnmeyer | karma 214 | avg karma 1.41 2024-04-07 14:13:45 | [–] similar comments

totally, your approach is the right one and anything reasonable is going to focus on collocating the cache as close as possible to where the build runs.

maxmcd | karma 1735 | avg karma 3.64 2024-04-07 02:49:46 | [–] similar comments

I weep for this period of time where we don't have sticky disks readily available for builds. Uploading the layer cache each time is such a coarse and time-consuming way to cache things.

Maybe building from scratch all the time is a good correctness decision? Maybe stale values in disks is a tricky enough issue to want to avoid entirely?

If you keep a stack of disks around and grab a free one when the job starts you'd end up with good speedup a lot of the time. If cost is an issue you can expire them quickly. I regularly see CI jobs spending >50% of their time downloading the same things, or compiling the same things, over and over. How many times have I triggered an action that compiled the exact same sqlite source code? Tens of thousands?

Maybe this is fine, I dunno.

parentheses | karma 991 | avg karma 1.6 2024-04-07 02:54:14 | [–] similar comments

I agree. The notion that everything must be docker is nice in principle but requires a lot of performance optimization work early on. Earlier than one would need with "sticky disks" as you called them.

adityamaru | karma 115 | avg karma 7.19 2024-04-07 03:11:18 | [–] similar comments

This is exactly the sort of insight that led us to work on Blacksmith. Since we own the hardware we run CI jobs on there are some exciting things we can do to make these "sticky disks" work the way you describe it. Stay tuned!

withinboredom | karma 7494 | avg karma 1.75 2024-04-07 04:17:16 | [–] similar comments

Interesting.

I remember working on a project where the first clean build would always fail, and only incremental builds could succeed. I was a junior at the time, so this was 15-20 years ago. I remember spending some time trying to get it to succeed from a clean build and my lead pulling me aside: he said it was an easy fix, but if we fixed it, the ops guys would insist on building from scratch for every build. So please, stop.

Personally, unless you have an exotic build env, it’s usually faster and easier to simply build in the runner. If you need a docker image at the end, build a dockerfile that simply copies the artifacts from disk.

aayushshah15 | karma 20 | avg karma 1.33 2024-04-07 05:35:18 | [–] similar comments

A subtle challenge with "sticky disks" is that it requires your workflow steps to be idempotent beyond the point of "resumption", which can be tricky in a lot of cases.

kylegalbraith | karma 425 | avg karma 1.69 2024-04-07 06:36:46 | [–] similar comments

Couldn't agree more. Somewhere, we lost the concept of disks in CI unless you run it yourself, and a lot of build tools could benefit from having them.

We came to the same conclusion and built Depot around this exact workflow for the Docker image build problem. We're now bringing that same tech into GitHub Actions workflows.

cpfohl | karma 1544 | avg karma 3.62 2024-04-07 03:27:34 | [–] similar comments

This is wild. I've spent the last three weeks working on this stuff for two separate clients.

Important note if you're taking advice: cache-from and cache-to both accept multiple values. Cache to just ouputs the cache data to all the ones specified. cache-from looks for cache hits in the sources in-order. You can do some clever stuff to maximize cache hits with the least amount of downloading using the right combination.

adityamaru | karma 115 | avg karma 7.19 2024-04-07 03:39:28 | [–] similar comments

oh TIL, that is interesting

Arbortheus | karma 453 | avg karma 4.49 2024-04-07 11:37:40 | [–] similar comments

That’s a great idea.

user- | karma 341 | avg karma 3.55 2024-04-07 03:35:31 | [–] similar comments

I feel like ive seen so many new companies just providing cheaper github actions.

clintonb | karma 1389 | avg karma 1.91 2024-04-07 14:50:48 | [–] similar comments

They provide Actions _runners_ because GitHub runners are quite expensive (per CPU and GB of memory) compared to the underlying cost of a Kubernetes node on most cloud providers or bare metal. Of course, that assumes you’ve already paid the cost to setup a cluster, which is not free.

aayushshah15 | karma 20 | avg karma 1.33 2024-04-07 03:41:26 | [–] similar comments

(blacksmith co-founder here)

it's unfortunate the amount of expertise / tinkering required to get "incrementalism" in docker builds in github actions. we're hoping to solve this with some of the stuff we have in the pipeline in the near future.

damianh | karma 48 | avg karma 3.0 2024-04-07 04:56:57 | [–] similar comments

The fact that GitHub don't provide a better solution here has to be actually costing them money with the network usage and extra agent time consumed. Right?

aayushshah15 | karma 20 | avg karma 1.33 2024-04-07 05:01:00 | [–] similar comments

GitHub has perverse incentives to not fix this problem because they charge customers based on usage (by the minute), so they make more money by providing slower builds to end-users.

krainboltgreene | karma 1244 | avg karma 1.24 2024-04-07 06:04:40 | [–] similar comments

They've also just completely refocused to AI in the last two years thanks to the microsoft/ChatGPT situation.

remdoWater | karma 2 | avg karma 0.67 2024-04-07 03:46:15 | [–] similar comments

title suggestion: Cache Rules Everything Around Me (C.R.E.A.M.)

boronine | karma 173 | avg karma 2.19 2024-04-07 03:48:15 | [–] similar comments

I've spent days trying all of these solution at my company. All of these solutions suck, they are slow and only successful builds get their layers cached. This is a dead end. The only workable solution is to have a self-hosted runner with a big disk.

aayushshah15 | karma 20 | avg karma 1.33 2024-04-07 03:52:36 | [–] similar comments

how do you ensure isolation between runs on a self hosted runner that way?

boronine | karma 173 | avg karma 2.19 2024-04-07 03:56:16 | [–] similar comments

What kind of isolation do you need? We are building our own code so I don't see the need for isolation beyond a clear directory.

airspeedjangle | karma 6 | avg karma 1.5 2024-04-07 04:02:25 | [–] similar comments

Shared runner infrastructure in a big company. It's pretty common to treat these situations as multi-tenant low trust environments.

damianh | karma 48 | avg karma 3.0 2024-04-07 04:54:36 | [–] similar comments

Plenty of marketplace actions will install things and/or mutate the runner. It's a matter of time before someone does something or there's a build that doesn't cleannup after itself (e.g. leaving test processes running) that ruins the day for everyone else.

damianh | karma 48 | avg karma 3.0 2024-04-07 04:02:28 | [–] similar comments

Selh-hosted runners can be ephemeral too. With such either mount the cache as a disk or bake docker layers/images into the runner image.

remdoWater | karma 2 | avg karma 0.67 2024-04-07 04:21:24 | [–] similar comments

This requires a lot of work from a dev inf team, though. Not as straightforward for an average team.

damianh | karma 48 | avg karma 3.0 2024-04-07 04:49:36 | [–] similar comments

I won't disagree. It should be easier imo. I guess this is why a cottage industry has sprung up addressing such e.g. https://news.ycombinator.com/item?id=39930908

intelVISA | karma 2034 | avg karma 1.58 2024-04-07 16:34:39 | [–] similar comments

Now I've seen everything...

boundlessdreamz | karma 4595 | avg karma 5.44 2024-04-07 05:00:28 | [–] similar comments

runs-ons supports custom images - https://runs-on.com/features/byoi/ and caching to S3 - https://runs-on.com/reference/caching/

I haven't used it yet but these two features make it the clear favourite for me in alternate github action runners

ownagefool | karma 2043 | avg karma 2.14 2024-04-07 07:02:55 | [–] similar comments

It's actually pretty easy.

Setup GitHub app. Install the arc helm charts. Install a buildkitd statefulset.

Update parans on build to use buildkitd.

That's not to say there aren't better caching strategies, but a really basic ephemeral setup is right there.

withinboredom | karma 7494 | avg karma 1.75 2024-04-07 04:04:36 | [–] similar comments

If you are in k8s, you can use the default chart provided by GitHub and get 90% of the way there.

teaearlgraycold | karma 2842 | avg karma 1.71 2024-04-07 04:03:24 | [–] similar comments

I use namespace’s action runners for this (just a customer, not affiliated in any way). They’re a company with a pretty good product stack. Although the web UI is annoyingly barebones.

20thr | karma 15 | avg karma 1.15 2024-04-07 07:36:19 | [–] similar comments

Hi -- Namespace's CEO here; if you have a chance, please drop me a note at hugo-at-namespacelabs.com; I'd love to hear what we could be doing better in the UI, and product overall. Thank you!

Hugo @ Namespace (https://namespace.so)

kylegalbraith | karma 425 | avg karma 1.69 2024-04-07 06:04:37 | [–] similar comments

This is definitely a direction to try. But if its faster Docker image builds and a layer caching system that actually works, you should definitely try out Depot. We automatically persist layer cache to persistent NVMe devices and orchestrate that to be immediately available across builds.

paholg | karma 332 | avg karma 2.39 2024-04-08 18:31:14 | [–] similar comments

If you have the ability to choose the tooling, I recommend looking into building docker images with nix.

Yes, nix is complex. But its caching story is soooo much better than docker's, and all the other docker issues just disappear.

https://nix.dev/tutorials/nixos/building-and-running-docker-...

mysza | karma 1 | avg karma 1.0 2024-04-10 07:25:49 | [–] similar comments

Can you share example of github actions? When i use docker/setup-buildx-action and local runner i can't make it use the cache. I think it's the "docker-container" runner's fault

solatic | karma 6845 | avg karma 4.06 2024-04-07 04:11:42 | [–] similar comments

As someone who spent way too much time chasing this rabbit, the real answer is Just Don't. GitHub Actions is a CI system that makes it easy to get started with simple CI needs but runs into hard problems as soon as you have more advanced needs. Docker caching is one of those advanced needs. If you have non-trivial Docker builds then you simply need on-disk local caching, period.

Either use Depot or switch to self-hosted runners with large disks.

adityamaru | karma 115 | avg karma 7.19 2024-04-07 04:14:12 | [–] similar comments

totally agree, github actions has done an excellent job at this lowest layer of the build pipeline today but is woefully inadequate the minute your org hits north of 50 engineers

aayushshah15 | karma 20 | avg karma 1.33 2024-04-07 04:16:48 | [–] similar comments

Did you consider using a local (in the same VPC) docker registry mirror perhaps? https://docs.docker.com/docker-hub/mirror/

solatic | karma 6845 | avg karma 4.06 2024-04-07 05:39:11 | [–] similar comments

It's not the pulls that are the problem, it's caching intermediate layers from the build that is the problem. As soon as you introduce a networked registry, the time it takes to pull layers from the registry cache and push them back to the registry cache are frequently not much better than simply rebuilding the layers, not to mention the additional compute/storage cost of running the registry cache itself.

It's just a problem that requires big, local disks to solve.

DanielHB | karma 747 | avg karma 1.77 2024-04-08 06:23:50 | [–] similar comments

Yeah I had the exact same problem and came to the same conclusion.

bushbaba | karma 3328 | avg karma 2.7 2024-04-07 05:40:28 | [–] similar comments

Can’t you use s3 + mountpoint for most distributed CI cache needs?

pas | karma 7438 | avg karma 1.12 2024-04-07 10:01:41 | [–] similar comments

if you want fast builds it's worth spinning up a buildkit server on a beefy dedicated server.

docker/nerdctl only transfers the context, everything else is cached on the builder. it's very useful for monorepos (where you usually want to build and tag images for every tested commit)

and the builder directly pushes the images/tags/layers to the registry. (which can be just a new tag for already existing layer.)

a noop job is about 2 sec on GitLab CI this way.

yeswecatan | karma 260 | avg karma 1.58 2024-04-07 20:41:09 | [–] similar comments

i haven't looked into setting up a buildkit server. would it be easier to just attach an ebs volume?

pas | karma 7438 | avg karma 1.12 2024-04-08 11:42:37 | [–] similar comments

you mean run an EC2 instance with EBS as buildkit server storage dir?

sure, it should work nicely. (I just prefer the local disk, it's just a cache after all.)

kylegalbraith | karma 425 | avg karma 1.69 2024-04-07 06:02:11 | [–] similar comments

Thanks for the shout-out regarding Depot, really appreciate it. We came to the same conclusion regarding Docker layer cache and thus why we created Depot in the first place. The limitations and performance surrounding GitHub Actions cache leaves a lot to be desired.

fireflash38 | karma 615 | avg karma 2.58 2024-04-07 15:46:52 | [–] similar comments

Quick glance showed no, but is there no purely local for depot? It's all cloud based?

matsemann | karma 20700 | avg karma 6.84 2024-04-07 06:34:17 | [–] similar comments

> as soon as you have more advanced needs

If there's one thing I've learned over the years, is that we really seldom have advanced needs. Mostly we just want things to work a certain way, and will fight systems to make it behave so. It's easier to just leave it be. Like maven vs gradle; yes, gradle can do everything, but if you need that it's worth taking a step back and assess why the normal maven flow won't work. What's so special with our app compared to the millions working just fine out of the box?

kbolino | karma 1070 | avg karma 1.61 2024-04-07 14:56:18 | [–] similar comments

It has been a few years, but last I recall, the key advantage of Gradle over Maven was not power so much as brevity. Doing many things in Maven required a dozen nested XML tags, while doing the same thing in Gradle was often a one-liner.

stackskipton | karma 878 | avg karma 3.48 2024-04-07 18:38:15 | [–] similar comments

I'm sad as DevOps Engineer I only have one upvote to give. YAGNI needs to be every team motto.

We tried caching at several companies. Outside node builds, it was never worth it. Horray, our .Net builds took 15 seconds instead of 4 Minutes. Eventually you realized no one cared since we averaged deployments every 4 days outside of outages and time being burned by it just wasn't there.

maccard | karma 10305 | avg karma 2.29 2024-04-11 13:39:26 | [–] similar comments

Pulling my entire repository clean from source control takes longer than your entire uncached pipeline.

A clean build on a 16 core machine with an SSD and a GB network is about 4 hours including checkout.

Our cached builds are 15 minutes including deployment.

mhitza | karma 1909 | avg karma 2.46 2024-04-07 08:05:45 | [–] similar comments

On one project that was a bit more involved, I pulled the latest image I've built from the registry before starting the build. That worked well enough for caching in my case.

ithkuil | karma 7515 | avg karma 2.06 2024-04-07 09:10:20 | [–] similar comments

Huge shout-out to depot! It works really well!

cqqxo4zV46cp | karma 1240 | avg karma 1.26 2024-04-07 09:29:40 | [–] similar comments

I got it working, with intermediate layers, too. All to find that I didn’t see that material a performance benefit after taking into account how long it takes to pull from and push to the cache.

cpuguy83 | karma 1728 | avg karma 2.13 2024-04-07 15:16:01 | [–] similar comments

You might want to try the actions cache "--cache-to=gha --cache-from=gha", but still it needs to pull that stuff down, just that locality is likely better here.

There's also an action out there "GitHub cache dance" that will stick your whole buildkit state dir into the gha cache.

yeswecatan | karma 260 | avg karma 1.58 2024-04-07 20:43:43 | [–] similar comments

how large is your cache and how long does the pull/push take?

candiddevmike | karma 12368 | avg karma 4.09 2024-04-07 11:34:17 | [–] similar comments

Additionally, docker build refuses to cause any side effects to the host system. This makes any kind of caching difficult by design. IMO, if possible, consider doing your build outside of docker and just copying it into a scratch container...

cpuguy83 | karma 1728 | avg karma 2.13 2024-04-07 15:10:15 | [–] similar comments

I'm not sure what you mean here. "RUN --mount type=cache,dest=/foo" is exactly for keeping a persistent cache on the host.

Arbortheus | karma 453 | avg karma 4.49 2024-04-07 11:35:26 | [–] similar comments

GitHub really need to invest in their CI. It is a second-class feature in the platform, but should be the beating heart of every SaaS team.

GitLab CI is leaps and bounds ahead.

zer00eyz | karma 5292 | avg karma 2.98 2024-04-07 13:11:34 | [–] similar comments

Linked in is playing with twitch like video.

Zoom is adding in email.

Years ago I worked for a bank. You know what happens if you set up bill pay with a bank? You're unlikely to end that relationship. Because who the fuck wants to do all that work to move.

Your labor, your suffering (cause setting up bill pay sucks) is an egress fee.

If you have GitHub acting as anything other than your public facing code repo you're locking yourself into the platform. Bug tracking, code review, CI pipelines, GitHub features that are going to keep you from moving quickly if you need to change providers.

CSSer | karma 1673 | avg karma 2.5 2024-04-07 13:37:50 | [–] similar comments

The funny thing about this is that as far as most software engineers are concerned these things are generic competencies. As long as the price isn’t egregious and the feature-set is rich, we really don’t and shouldn’t care if we’re locked in for this. Some tools do belong together, and most people’s job in this sector shouldn’t be to spend half their time fiddling with devOps/project management tools, it should be to make/fix software. If you don’t believe me, consider that even in the scenario that you describe, any VCS platform is ultimately going to require a robust API to support integrations with other tools anyway, which will be orders of magnitude more difficult to accomplish than decent, built-in reasonable ops/pm features. This is coming from the person who typically agrees with you about lock-in. I’m afraid in this case your approach gets you JIRA and https://ifuckinghatejira.com/

unshavedyak | karma 1724 | avg karma 2.48 2024-04-07 16:34:56 | [–] similar comments

Tangent, boy i love that site's design. Simple, elegant, animations feel like they layer on-top of the primary UX (ie they add to the text. Rather than the text being delayed for the purpose of showing some fancy animation).

flatline | karma 5341 | avg karma 4.29 2024-04-07 13:49:08 | [–] similar comments

I’ve migrated between devops platforms multiple times on multiple projects. The barrier is not really that high, and the cost of losing some data is relatively low. You can script most of it or pay a small fee for a user friendly plugin. There are lots of roughly equivalent options, some of them free. It’s nothing like, say, migrating between cloud providers.

LtWorf | karma 1639 | avg karma 1.22 2024-04-07 16:38:35 | [–] similar comments

Well before github had a CI everybody used travis for free from it. Then they killed the free tier and people just started to switch.

It's trivial to switch from github to codeberg for example… So I don't think it's that bad to be honest.

gigatexal | karma 6313 | avg karma 1.28 2024-04-07 15:40:23 | [–] similar comments

Yeah that’s my thought as well: this is something GitHub is supposed to do. Keep it simple on the users and leave the hard stuff to the creators/runners of the tool

asmor | karma 2288 | avg karma 2.43 2024-04-08 11:07:35 | [–] similar comments

Always love to shock more people with the random fact that GitHub Actions is Azure DevOps Pipelines in a trenchcoat (and Azure Pipelines is seemingly abandoned / in maintenance mode now).

The runner code is on GitHub, and it's not pretty. In fact last time I ran it it had trouble generating stable exit codes.

codethief | karma 4183 | avg karma 2.01 2024-04-08 11:33:41 | [–] similar comments

Wait, really? We have been using Gitlab CI for a few years and it's awful. I run into bugs and surprises (missing features etc.) on the daily.

Now I don't even want to imagine what Github Actions are like…

deepsun | karma 3058 | avg karma 1.77 2024-04-08 12:25:29 | [–] similar comments

GitLab were the first to introduce built-in CI. GitHub followed their lead once it became a decision point for many.

crote | karma 7004 | avg karma 4.39 2024-04-08 01:59:57 | [–] similar comments

> GitHub Actions is a CI system that makes it easy to get started

It's not even that! Coming from GitLab I was quite surprised at how poor the "getting started" experience was. Rather than a simple "on push, run command X" you first have to do a deep dive into actions/events/workflows/jobs/runs, and then figure out what kind of weird tooling is used for trivial things like checking out your code, or storing artifacts.

And then you try to unify your pipeline across several projects because that's what Github is heavily promoting with the whole "uses: actions/checkout" reuse thing - but it turns out to be a huge hassle to get it working because nothing works the way you'd expect it to work.

In the end I did get GHA to do what I was already doing in GitLAb, but it took me ten times as long as it did originally setting it up. I believe GHA is flexible and powerful enough to be well-suited for medium-sized companies, but it's neither easy enough for small companies, nor powerful enough for large companies. It's one of the few Github features I genuinely dislike using.

lispisok | karma 705 | avg karma 3.96 2024-04-08 18:00:04 | [–] similar comments

I use self-hosted runners. It wasnt even because we could have large disk for caching. Github pricing for their runners is so bad it was a no brainer to host our own.

manx | karma 1252 | avg karma 2.68 2024-04-07 05:15:00 | [–] similar comments

Earthly solves this really well: https://earthly.dev

They rethink Dockerfiles with really good caching support.

oftenwrong | karma 14616 | avg karma 6.35 2024-04-07 14:49:16 | [–] similar comments

The caching support is mostly the same. Both Earthly and Dockerfile are BuildKit frontends. BuildKit provides the caching mechanisms.

A possible exception is the "auto skip" feature for Earthly Cloud, since I do not know how that is implemented.

adamgordonbell | karma 4986 | avg karma 7.19 2024-04-07 16:51:36 | [–] similar comments

Also CACHE keyword, for cache mounts. Makes incremental tools like compilers work well in the context of dockerfiles and layer caches.

That can extend beyond just producing docker iamges as well. Under the covers the CACHE keyword is how lib/rust in Earthly makes building Rust artifacts in CI faster.

https://github.com/earthly/earthly/issues/1399

vladaionescu | karma 583 | avg karma 8.1 2024-04-07 18:15:06 | [–] similar comments

I would add that 1. Earthly is meant for full CI/CD use-cases, not just for image building. We've forked buildkit to make that possible. And 2. remote caching is pretty slow overall because of the limited amount of data you can push/pull before it becomes performance-prohibitive. We have a comparison in our docs between remote runner (e.g. Earthly Satellites) vs remote cache [1].

[1]: https://docs.earthly.dev/docs/caching#sharing-cache

ValtteriL | karma 277 | avg karma 3.79 2024-04-07 05:23:32 | [–] similar comments

Docker layer caching is one of the reasons I moved to Jenkins 2 years ago and have been very happy with it for the most part.

I only need to install utils once and all build time goes to building my software. It even integrates nicely with Github. Result: 50% faster feedback.

However, it needs a bit initial housekeeping and discipline to use correctly. For example using Jenkinsfiles is a must and using containers as agents is desirable.

adityamaru | karma 115 | avg karma 7.19 2024-04-07 05:26:31 | [–] similar comments

do you self host your jenkins deployment in your AWS account?

ValtteriL | karma 277 | avg karma 3.79 2024-04-07 07:14:23 | [–] similar comments

Self host

remdoWater | karma 2 | avg karma 0.67 2024-04-07 05:26:55 | [–] similar comments

what do you mean by discipline here?

ValtteriL | karma 277 | avg karma 3.79 2024-04-07 07:27:10 | [–] similar comments

Basically using exclusively declarative pipelines with Jenkinsfiles in SCM, avoiding cluttering Jenkins with tools aside from docker, keeping Jenkins up to date and protected with proper auth.

Jenkins is the most flexible automation platform and its easy to do things in suboptimal ways (eg. Configuring jobs using the GUI).

There's also a way to configure Jenkins the IaC way and I am hoping to dig into that at some point. The old way requires manual work that instictly feels wrong when automating everything else.

dboreham | karma 16160 | avg karma 2.32 2024-04-07 05:36:43 | [–] similar comments

No!

So much time spent debunking such broken "caching" solutions.

Computers are very fast now. Use proper package/versioning systems (part of the problem here is that those are often also broken/badly designed).

aayushshah15 | karma 20 | avg karma 1.33 2024-04-07 05:44:08 | [–] similar comments

This is simply false. For starters, GitHub actions by default run on Intel Haswell chips from 2014 (in some cases). Secondly, hardware being faster doesn't obviate the need for caching, especially for docker builds where your layer pulls are purely network bound.

kbolino | karma 1070 | avg karma 1.61 2024-04-07 15:06:43 | [–] similar comments

"Computers are very fast now" is largely because of caching. The CPU has a cache, the disk drive has a cache, the OS has a cache, the HTTP client has a cache, the CDN serving the content has a cache, etc. There may be better ways to cache than at the level of Docker image layers, but no caching is the same as a cache miss on every request, which can be dozens, hundreds, or even thousands of times slower than a cache hit.

tanepiper | karma 793 | avg karma 2.48 2024-04-07 05:39:35 | [–] similar comments

I have this set up in our pipeline, we also build the image early and use assets to move it between jobs. We've also just switched to self-hosted runners, so might look into shared disk.

But in the long run, as annoying as it is out build pipelines reduced but quite a few minutes per build.

adityajp | karma 6 | avg karma 0.55 2024-04-07 05:47:47 | [–] similar comments

(Co-founder of Blacksmith here)

Glad it worked really well for you.

What made you switch to self-hosted runners?

jpgvm | karma 9485 | avg karma 4.45 2024-04-07 06:18:28 | [–] similar comments

The trick to Docker (well OCI) images is never under any circumstance use `docker build` or anything based on it. Dockerfile is your enemy.

Use tools like Bazel + rules_oci or Gradle + jib and never spend time thinking about image builds taking time at all.

dindresto | karma 561 | avg karma 3.19 2024-04-07 10:06:00 | [–] similar comments

+1 to this, migrating our build setup to Nix + nix2container decreased our pipeline duration for incremental changes by a lot, thanks to Nix's granular caching abilities.

jpgvm | karma 9485 | avg karma 4.45 2024-04-07 11:19:06 | [–] similar comments

Yeah I really need to actually sit down and learn Nix, seems like it can solve this in a more general way for cases where the thing you want to run is packaged for Nix already.

Arbortheus | karma 453 | avg karma 4.49 2024-04-07 11:43:25 | [–] similar comments

Please no! Do not use Bazel unless you have a platform team with multiple people who know how to use it - e.g. large Google-like teams.

We had “the Bazel guy” in our mid-sized company that Bazelified so many build processes, then left.

It has been an absolute nightmare to maintain because no normal person has any experience with this tooling. It’s very esoteric. People in our company have reluctantly had to pick up Bazel tech debt tasks, like how the rules_docker package got randomly deprecated and replaced with rules_oci with a different API, which meant we could no longer update our Golang services to new versions of Go.

In the process we’ve broken CI, builds on Mac, had production outages, and all kinds of peculiarities and rollbacks needed that have been introduced because of an over-engineered esoteric build system that no one really cares about or wanted.

jpgvm | karma 9485 | avg karma 4.45 2024-04-07 11:58:52 | [–] similar comments

Bazel isn't for everyone which is why I suggested using any similar tool, jib, Nix, etc. Just not Dockerfile (or if you are going to use Dockerfile only use ADD).

Also just because you don't have experience with something doesn't make it a bad choice. I would recommending understanding it first, why your coworker chose it and how other tools would actually do in the same role, grass is often greener on the other side until you get there.

Personally I went through a bit of an adventure with Bazel. My first exposure to it was similar to yours, was used in a place I didn't understand for reasons I didn't understand, broke in ways I didn't understand and (regretfully) didn't want to spend time understanding.

The reality was once I sat down to use it properly and understood the concepts a lot of things made sense and a whole bunch of very very difficult things became tractable.

That last bit is super important. Bazel raises the baseline effort to do something with the build system, which annoys people that don't want to invest time in understanding a build system. However it drastically reduces the complexity of extremely difficult things like fully byte for byte reproducible builds, extremely fast incremental builds and massive build step parallelization through remote build execution.

humanrebar | karma 12057 | avg karma 2.44 2024-04-07 13:27:16 | [–] similar comments

If you're saying to use a proper dependency management system (package manager or monorepo build system) and keep Docker to mostly dumb installs, I agree.

Though I also think Nix and Bazel are typically not the right starting points for most projects. If you're not committed to having at least four experts on those complex tools in perpetuity, find something simpler.

To be clear, inventing your own, better system is typically as bad. Language specific ecosystems can be too, but it's often hard to avoid both Maven and gradle if you're a Java shop, for instance.

jpgvm | karma 9485 | avg karma 4.45 2024-04-07 13:35:05 | [–] similar comments

Yeah it's a conundrum. The easiest time to adopt something like Bazel/Buck/etc is at the start. However that is when tools like that provide the least value which given their additional friction isn't a good trade-off.

I recently started a side project and decided to do the whole thing using Gradle instead of Bazel. Essentially committing to writing absolutely everything in Kotlin instead of going for a multi-language setup with a more complex build tool. However Kotlin is a bit special in that regard because with multi-platform you can build for server/desktop/mobile/web all in one language with one build system.

jonathanlydall | karma 1476 | avg karma 4.09 2024-04-07 13:42:59 | [–] similar comments

Lack of experience is a perfectly valid and often very rationale reason for something being a bad choice, especially when considering upskilling costs and possible challenges in finding new hires proficient in the chosen technology.

The new technology needs to be sufficiently better than the existing to justify the investment or ongoing additional cost, and not just “has more features”, it should be solving problems which may otherwise not be reasonably solvable.

In a past job we had an incident where a dev had unilaterally decided to develop parts of a .NET project in F#, when the contract was for a C# project to be ultimately handed over to the client.

This was a run of mill back-end API, there were no interesting requirements that could possibly justify saddling the client with a need to hire for a relatively niche language.

The dev in question had this general view that F# was an underrated language and technically better than C# and if other devs would just give it a chance, they’d see for themselves.

What they totally ignored is that hiring C# devs is super easy here, F# though, not so much.

baobun | karma 217 | avg karma 1.87 2024-04-10 10:11:27 | [–] similar comments

> or if you are going to use Dockerfile only use ADD

So we should do

    ADD https://example.com/fetch-and-install.sh
    RUN sudo fetch-and-install.sh

instead of

    COPY fetch-and-install.sh .
    RUN sudo fetch-and-install.sh

glenjamin | karma 1315 | avg karma 3.49 2024-04-07 06:55:23 | [–] similar comments

Docker layer caching is complicated!

CircleCI has an implementation that used to use a detachable disk, but that had issues with concurrency

It’s since been replaced with an approach that uses a docker plugin under the hood to store layers in object storage

https://circleci.com/docs/docker-layer-caching/

remram | karma 7409 | avg karma 1.74 2024-04-07 11:22:05 | [–] similar comments

Is it any better than buildx cache then, that also stores caches in object storage (via OCI registry)?

mshekow | karma 135 | avg karma 3.14 2024-04-07 07:38:14 | [–] similar comments

I took a detailed look at Docker's caching mechanism (actually: BuildKit) in this article https://www.augmentedmind.de/2023/11/19/advanced-buildkit-ca...

There I also explain that IF you use a registry cache import/export, you should use the same registry to which you are also pushing your actual image, and use the "image-manifest=true" option (especially if you are targeting GHCR - on DockerHub "image-manifest=true" would not be necessary).

remram | karma 7409 | avg karma 1.74 2024-04-07 12:50:49 | [–] similar comments

Thanks, this is a very thorough explanation.

Is there really no way to cache the 'cachemount' directories?

mshekow | karma 135 | avg karma 3.14 2024-04-07 14:22:06 | [–] similar comments

The only option I know is to use network shares/disks, but you need to make sure that each share/disk is only used by one BuildKit process at a time.

daulis | karma 2 | avg karma 1.0 2024-04-09 19:13:15 | [–] similar comments

After years of lurking, I made an account to reply to this

"image-manifest=true" was the magic parameter that I needed to make this work with a non-DockerHub registry (Artifactory). I spent a lot of time fighting this, and non-obvious error messages. Thank you!!

We use a multi-stage build for a DevContainer environment, and the final image is quite large (for various reasons), so a better caching strategy really helps in our use case (smaller incremental image updates, smaller downloads for developers, less storage in the repository, etc)

tkiolp4 | karma 1436 | avg karma 3.08 2024-04-07 10:38:30 | [–] similar comments

Docker has been among us for years. Why isn’t efficient caching already implemented out of the box? It’s a leaking abstraction that users have to deal with. Annoying at best.

omeid2 | karma 2271 | avg karma 3.23 2024-04-07 10:46:43 | [–] similar comments

[delayed]

joe0 | karma 1 | avg karma 0.33 2024-04-07 18:23:33 | [–] similar comments

They actually have recently, but it’s a separate (payed) product offering: https://www.docker.com/products/build-cloud/

ikekkdcjkfke | karma 315 | avg karma 0.72 2024-04-07 13:49:07 | [–] similar comments

Anybody have a build system that builds as fast or faster than locally?

bagels | karma 6530 | avg karma 2.47 2024-04-07 15:43:15 | [–] similar comments

I think that bitbucket offers this out of the box on their CI product (pipelines)

Cloudef | karma 1683 | avg karma 2.16 2024-04-08 01:11:09 | [–] similar comments

    - uses: DeterminateSystems/nix-installer-action@main
    - uses: DeterminateSystems/magic-nix-cache-action@main

spurin | karma 4 | avg karma 1.0 2024-04-09 12:14:52 | [–] similar comments

It would be possible to offload the caching to Docker Build Cloud transparently, it’s part of the Docker subscription service, every account gets free minutes - 50 free minutes a month so depending on usage, you may be able to get this at zero cost.

With this approach, you’d use buildx and remotely, they would manage and maintain cache amongst other benefits.

It does require a credit card signup (which takes $0 to mitigate fraud). Full transparency, I’m a Docker Captain and helped test this whilst it was called Hydrobuild.

SJC_Hacker | karma 180 | avg karma 0.98 2024-04-09 18:35:09 | [–] similar comments

Smaller images are also another way to go. At my last company, image sizes were like 2-3Gb. I was able to prune that down to ~1.5 GB. Boost and a custom clang/llvm build were particular major offenders here.

There's quite a bit of cruft that can be pruned.

Legal | privacy