
You’re conflating unattended-upgrades (server mutability, hard to roll back) with automated patching in general. Do automated patching, but also run the changes through your CI so you can catch breaking changes and roll them out in a way that’s easy to debug (you can diff images) and easy to revert.

I bet when you update your software dependencies you run those changes through your tests, but your OS is a giant pile of code that usually gets updated differently and independently, mostly for historical reasons.




I know it’s too late for a bunch of shops, but for god’s sake please don’t use unattended upgrades to do your patching unless you want to hate your life chasing down hard-to-find, hard-to-undo bugs.

Build your images in a CI job and make your deploy version the pair (code version, image version), so patching runs through all the same tests your code does and you have a trivial roll-forward to undo any mess you find yourself in.
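
In case it helps, here is a rough sketch of that CI step in Python; the registry, image name, and "make test" command are placeholders, and it assumes Docker and git are available on the build host:

    #!/usr/bin/env python3
    # Hypothetical CI step: bake OS patches into the image and tag the deploy
    # as (code version, image version) so both roll forward together.
    import datetime
    import subprocess

    def sh(*cmd):
        subprocess.run(cmd, check=True)

    code_version = subprocess.run(
        ["git", "rev-parse", "--short", "HEAD"],
        capture_output=True, text=True, check=True).stdout.strip()
    image_version = datetime.datetime.now(
        datetime.timezone.utc).strftime("%Y%m%d%H%M")

    tag = f"registry.example.com/myapp:{code_version}-{image_version}"

    # OS packages get upgraded inside the build, not on live hosts, so the
    # result is an immutable, diffable artifact you can roll forward from.
    sh("docker", "build", "--pull", "--no-cache", "-t", tag, ".")

    # The same test suite that gates code changes gates the patched image.
    sh("docker", "run", "--rm", tag, "make", "test")

    sh("docker", "push", tag)
    print(f"deploy version: ({code_version}, {image_version})")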


Performing unattended upgrades seems like a great idea, until a bug introduced in an upgrade causes a performance or operability regression, introduces an even worse security bug, or is otherwise problematic. At worst, you could automate yourself into an outage.

IMO good reliability practices counsel against this approach, or at the very least against doing it without testing these upgrades in a staging environment first.


Yes, an automatic update can break things. Personally, I am happy to have minor version updates applied automatically if my test suite passes. For anything larger, I at least review the changelog to make sure there aren't any obvious breaking changes, and then, if the tests pass, I go ahead and deploy.

Having a proper test suite and updating when the changes are minimal usually leads to better overall product maintenance.

I don't get these claims that you shouldn't upgrade, unless you're also just as worried that changing a line of code will break everything too. Make the changes, test the changes, and deploy carefully, just as you would for anything else.


Oh, I really like those comments. People think that keeping auto-upgrades on without any control keeps them secure. Right... And then you keep hearing stories about servers that won't come up because of an update, or stuff that breaks horribly. Not to mention bringing new bugs to the table.

It's not that simple. I care about what I install on my servers. Everything is carefully selected. Additionally, I follow the KISS principle, so I run simple things that are manageable.


Pointless article.

What we need is good, language-independent tooling to automatically select which versions to use, assess how risky the update is going to be, and report what problems are being fixed.

Tracking and correlating successful and failed updates centrally - call it "distributed CI".

Tentatively upgrading dependencies, running tests, and rolling back on error (a rough sketch of such a loop is at the end of this comment).

Don't forget that even minor bugfix releases introduce bugs, or break code by actually fixing bugs that people were inadvertently relying upon.

Lack of automated tools only encourages the "vendorize, ship and forget" model that is so popular.
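
Here is a rough sketch of that tentative-upgrade loop in Python, just to make the idea concrete. It assumes pip-managed dependencies and a pytest suite; a real tool would be language-independent and would report the outcomes to the central service described above.

    #!/usr/bin/env python3
    # Hypothetical loop: bump one dependency at a time, run the tests, keep
    # the bump if they pass, revert it if they fail, and record the outcome.
    import json
    import subprocess
    import sys

    def run(*cmd):
        return subprocess.run(cmd, capture_output=True, text=True)

    def tests_pass():
        return run(sys.executable, "-m", "pytest", "-q").returncode == 0

    outdated = json.loads(
        run(sys.executable, "-m", "pip", "list", "--outdated",
            "--format=json").stdout or "[]")

    results = []
    for pkg in outdated:
        name, current, latest = pkg["name"], pkg["version"], pkg["latest_version"]
        run(sys.executable, "-m", "pip", "install", f"{name}=={latest}")
        ok = tests_pass()
        if not ok:
            # Roll back to the previously pinned version on failure.
            run(sys.executable, "-m", "pip", "install", f"{name}=={current}")
        results.append({"package": name, "from": current, "to": latest, "ok": ok})

    # These outcomes are what you'd ship to a central service ("distributed CI").
    print(json.dumps(results, indent=2))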


A similar kind of automated mechanism is required in distributed systems that allow for rolling upgrades. New functionality in upgraded nodes can't break not-yet-upgraded nodes, and legacy behavior from not-yet-upgraded nodes has to be tolerated by upgraded nodes, but only until the entire system is upgraded, after which it is prohibited. Doing this wrong results in some really hard-to-fix production states.
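
A toy illustration of that gating in Python, assuming the cluster can report the protocol version each member currently runs (node names and version numbers are made up):

    # Toy sketch: new behavior is enabled only once every node reports the
    # new version, and old behavior is rejected only after that point.
    from dataclasses import dataclass

    NEW_PROTOCOL_VERSION = 2

    @dataclass
    class ClusterView:
        member_versions: dict  # node name -> protocol version it currently runs

        def min_version(self) -> int:
            return min(self.member_versions.values())

    def can_emit_new_feature(view: ClusterView) -> bool:
        # Upgraded nodes must not send messages old nodes can't parse.
        return view.min_version() >= NEW_PROTOCOL_VERSION

    def accept_legacy_message(view: ClusterView) -> bool:
        # Legacy behavior is tolerated only while the rollout is in flight.
        return view.min_version() < NEW_PROTOCOL_VERSION

    view = ClusterView({"node-a": 2, "node-b": 1, "node-c": 2})
    assert not can_emit_new_feature(view)   # still mixed-version
    assert accept_legacy_message(view)

    view.member_versions["node-b"] = 2      # rollout completes
    assert can_emit_new_feature(view)
    assert not accept_legacy_message(view)  # legacy input now prohibited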

This is software that I rely on for my day-to-day tasks. I've had upgrades break things SO MANY times that I never do an upgrade of "production" without specifically setting aside at least 30 to 60 minutes to deal with any potential fallout.

If we were talking about a video game, or some kind of testing/QA environment, then sure, automatic unattended upgrades would be fine.


You'd probably find it easier when the deployed changes are extremely small. Our last automated deploy was a single-line change. Most are bigger, but not huge.

Also, while they don't do full rollbacks, I suspect more than one fix has been "remove the offending code until we can figure out what's wrong".


Even with a dynamic software updating system in place, you shouldn't be deploying dynamic updates directly to production without testing them. It'd be pretty stupid to deploy an update by any mechanism that would render a service unstartable.

Small, incremental upgrades are something I'm finding essential to maintaining sanity as the only dev at my company doing what I do. Waiting to upgrade only increases the pain. Now, instead of having one potential issue to troubleshoot every now and again, I have dozens when I finally do buck up and upgrade.

So I get in the habit of upgrading the dependencies of all the apps I work on, every time I work on them. Issues happen, but only to one dependency at a time. It's manageable.

What I would love is to eventually have a CI server do it for me. Every single day, it would run bundle update, run the tests, and deploy unless there's a problem. If there is a problem, it drops me an email with the trace, and fixing it becomes part of my morning routine (a sketch of such a job is at the end of this comment).

If subtler problems surface this way, then I've discovered an oversight in my test coverage, or an overly complicated architecture that I need to remove dependencies from.

I'll probably implement this sometime this year. I'm thinking I'll want to redo deployment instead of relying on Capistrano, and then finally grow my own CI solution. I'm slowly moving away from big monolithic apps toward smaller, homegrown solutions that do only what I want them to do. I've already reimplemented provisioning and configuration management. I believe in DevOps as code.
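
For what it's worth, a minimal sketch of that nightly job in Python; the test and deploy commands and the email addresses are placeholders, and it assumes a local mail relay:

    #!/usr/bin/env python3
    # Hypothetical nightly job: bundle update, run the tests, deploy on
    # green, email the failure trace otherwise.
    import smtplib
    import subprocess
    from email.message import EmailMessage

    def run(*cmd):
        return subprocess.run(cmd, capture_output=True, text=True)

    def email_failure(body):
        msg = EmailMessage()
        msg["Subject"] = "nightly dependency update failed"
        msg["From"] = "ci@example.com"
        msg["To"] = "dev@example.com"
        msg.set_content(body)
        with smtplib.SMTP("localhost") as smtp:  # assumes a local MTA
            smtp.send_message(msg)

    update = run("bundle", "update")
    tests = run("bundle", "exec", "rake", "test")  # or rspec, etc.

    if update.returncode == 0 and tests.returncode == 0:
        run("cap", "production", "deploy")  # or whatever deploy you use
    else:
        email_failure(update.stderr + "\n" + tests.stdout + tests.stderr)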


Alternatively, I suppose, depending on the size of your operation, you could consider having a dummy prod with at least one of each of the servers in your environment and using that to validate host upgrades. After that you can push an unattended upgrade via a self-hosted package and upgrade server.

Let things be automatic to the maximum degree possible but give yourself a single hard human checkpoint and some minimum level of validation in a dummy environment first.


I don't agree - I try to do security patches every few days, and major upgrades when I have time to test them. The hard part is keeping tabs on which servers need them. I have 7 or 8 VMs, and when they're not involved in a project they're easy to forget about.

Things out of my control. Spending days tracking down a bug only to find that there is no mitigation other than modifying some upstream library. Bonus terror: deploying the changes would involve numerous clients also out of the team's control. Cue weeks and months of begging to make the upgrades.

(Edit: one more:) Deploying changes that don't have an easy rollback mechanism, e.g. a risky change involving apps or browser cookies, where both deploy and rollback take, say, a day.


This resonates with me for a LOT of reasons but I take a very different approach. I try to keep just a few dependencies and keep them all up to date. For most updates I can read every line of updated code. I learn a lot, get all of the security patches, and sometimes I realize I don’t need a dependency and I remove it. I’m always trying to take small calculated risks. I have great monitoring and rollbacks are easy.

Even worse, on Linux, most packages try to be helpful. Upgrading the MongoDB package on CentOS, for example? It'll helpfully restart the service for you. Never mind that you've just scripted this and it's being run on all of the servers at once, so instead of doing a nice, easy rolling upgrade when you are finally ready, it all happens at the same time.
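
One way around that is to drive the upgrade from a small orchestrator instead of letting every host's package manager act at once. A rough sketch in Python; the hostnames, package name, and health endpoint are made up, and it assumes passwordless SSH to each host:

    #!/usr/bin/env python3
    # Hypothetical rolling upgrade: patch one host at a time and wait for it
    # to report healthy before touching the next, instead of letting every
    # host restart the service simultaneously.
    import subprocess
    import time
    import urllib.request

    HOSTS = ["db1.example.com", "db2.example.com", "db3.example.com"]
    PACKAGE = "mongodb-org-server"  # example package; install may restart it

    def healthy(host):
        try:
            url = f"http://{host}:8080/health"  # hypothetical health endpoint
            with urllib.request.urlopen(url, timeout=5) as resp:
                return resp.status == 200
        except OSError:
            return False

    for host in HOSTS:
        # Upgrade exactly one host; its post-install scripts may restart the
        # service, which is tolerable because the rest of the fleet is untouched.
        subprocess.run(["ssh", host, "sudo", "yum", "-y", "update", PACKAGE],
                       check=True)
        deadline = time.time() + 300
        while not healthy(host):
            if time.time() > deadline:
                raise SystemExit(f"{host} did not come back healthy; stopping")
            time.sleep(10)
        print(f"{host} upgraded and healthy")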

That’s exactly how it must work for total automation, where humans are out of the picture. What you lack is a push, rather than pull, methodology, with a software deployment server and a Capability Maturity Model level 3 (or higher) change management process around it. Then it works flawlessly; I’m writing from several decades of experience modelling and implementing such things on a very large scale (tens or even hundreds of thousands of servers).

That’s my specialty as a technical architect.

https://en.m.wikipedia.org/wiki/Capability_Maturity_Model


#2 isn't surprising. A ton of shops only patch on a quarterly basis, and even then they only consistently patch the OS. Only when there is a vendor (like my employer, Red Hat) sending out notices that there is a critical vulnerability do people move any faster.

It's twice as bad for application dependencies. These don't even have quarterly patch cycles. Dependencies may be updated when a new release is deployed, which may be a couple of times a year, or never. One of my favorite questions for clients is "who is responsible for patching applications that no longer have a development team?"

This is an underappreciated benefit of CI/CD. An automated process lets a central group take ownership of third-party libraries and their security, and then trigger all applications to rebuild and release. Especially with containers, this is essential.
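
A minimal sketch of what that central trigger might look like, assuming each downstream application exposes a CI webhook that rebuilds it against the freshly patched base image (the URLs and image name are placeholders):

    #!/usr/bin/env python3
    # Hypothetical fan-out: after the shared base image is rebuilt with
    # patched libraries, ask every downstream app's CI to rebuild and release.
    import json
    import urllib.request

    BASE_IMAGE = "registry.example.com/base:2024-05"  # freshly patched base
    DOWNSTREAM_CI_HOOKS = [                           # one webhook per app repo
        "https://ci.example.com/hooks/app-billing",
        "https://ci.example.com/hooks/app-search",
    ]

    for hook in DOWNSTREAM_CI_HOOKS:
        body = json.dumps({"reason": "base-image-update",
                           "base_image": BASE_IMAGE}).encode()
        req = urllib.request.Request(
            hook, data=body, headers={"Content-Type": "application/json"})
        with urllib.request.urlopen(req) as resp:
            print(hook, resp.status)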


You should definitely tug on things when you have a controlled test environment and the time to explore what-ifs.

Much like companies should try to replace their own products (before a competitor does), infrastructure teams need to force “predictable” upgrades in a controlled environment on a regular basis. For example: look at your dependencies, imagine what upgrades are likely to be required in the near future, and try making those upgrades on test systems to see what could go wrong.

That approach achieves three things. First, since you’re not in emergency mode and you’ve used a test environment, any problems that you do uncover are not going to cause a crisis. Second, if you do this semi-regularly, you’re likely to see only minor issues. Third, exploratory upgrades give you a lot of time to fix problems (whether it’s time for your own developers to make changes, or time to wait for an external open-source project or vendor to make changes for you).


That’s pretty sweet. It makes me think about software upgrade scenarios. A typical upgrade process for long-running services that I’ve seen is usually some variant of “flush everything out to disk; shut down the process; run some hand-crafted upgrade code on the on-disk data structures; start the new process” (a toy sketch of that upgrade code is at the end of this comment).

On the other hand, the development for this stuff usually consists of a sequence of small patches. You could generate a lot of that on-disk upgrade code by applying something like this against all those patches. Maybe you could take it a step further and never actually tear down the process (by doing that memory mapping stuff the article touches on).

I wonder how far you can take that concept. People develop software today with an edit -> build -> deploy cycle. With sufficient tools, could one do development solely by describing how to patch the running process? Your repository consists of a sequence of such patches: CI applies the patches against a running process (and then runs all your tests). If that passes, you can deploy the patch everywhere using an identical procedure.

It fixes the issue of customers running outdated software because they can’t handle the one minute of downtime associated with upgrading a critical service. You don’t need to keep years-old upgrade code in the codebase (and keep it well tested) just because one of those customers might want to upgrade someday. On the other hand, pulling bad patches caught late in the cycle becomes more difficult. And the problem of downtime already has workarounds today, usually by making use of redundancy (which remains valuable even outside of upgrades).

I dunno, it’s fun to think about where and how you could apply this. Hot-reloading structs is a totally novel idea for me :-)
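
To make the “hand-crafted upgrade code” from the first paragraph concrete, it usually amounts to a chain of versioned migrations like the toy sketch below (record format, versions, and field names are invented); the idea above is that much of this could be generated from the patches rather than written by hand.

    # Toy sketch: versioned on-disk records upgraded by a chain of
    # migrations applied at startup.
    import json

    MIGRATIONS = {
        # from-version -> function producing the next version's layout
        1: lambda d: {**d, "version": 2, "created_at": None},        # add a field
        2: lambda d: {**d, "version": 3,
                      "name": d.get("name", "").strip().lower()},    # normalize
    }

    def upgrade(record, target):
        while record["version"] < target:
            record = MIGRATIONS[record["version"]](record)
        return record

    old = json.loads('{"version": 1, "name": "  Alice "}')
    print(upgrade(old, target=3))
    # -> {'version': 3, 'name': 'alice', 'created_at': None}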

