The problem is when your deployment depends on external state like a database. Code rollbacks are trivial; rolling back state (if you even can) is not.
I always say this - there is no such thing as a rollback. Anyone who pretends you can simply "roll back" a stateful application is out of their mind.
If you can roll back and forward events, good for you, but for most of us a rollback is actually a new build, a new deployment, and a new set of tests. Every single time.
The system can only move forwards. Just like a `git revert`, it's a new commit to an immutable history.
This is why I've never used or understood the value of the "downgrade" feature in some database migration tools. If you need to revert, make a _new_ migration that fixes the problem. Your tooling/code/logs should reflect the true history of the system without cooking the books.
In what sense is 'downgrade' not just that prepared ahead of time? Or do you just not like the name?
Of course, usually it would be followed by something that fixes it 'properly' - applying the original 'upgrade' again but with some correction - but typically that'll take longer than reverting to the previous version, and given that it's gone wrong you'll probably want to take longer with it: test more, restore confidence.
downgrade implies that such a thing is possible. In a stateful system though, you cannot take back that which you have committed. It's quite literally impossible by definition.
I’ve found “downgrade” mostly useful for quick iteration during local development (where you may well not care about lost data). Realise your initial cut of the migration wasn’t quite right? Run “down” then fix it, then “up” again.
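For illustration, an up/down pair in a plain-SQL migration tool might look something like this (the file names, table, and column are hypothetical):

```sql
-- 0007_add_posts_edited_at.up.sql  (apply the change)
ALTER TABLE posts ADD COLUMN edited_at timestamp NULL;

-- 0007_add_posts_edited_at.down.sql  (undo the change)
-- Any data written to the column in the meantime is lost, which is
-- fine on a throwaway dev database but is exactly the problem in production.
ALTER TABLE posts DROP COLUMN edited_at;
```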
Or have the SAN make a snapshot of your db before the deploy, no fancy solution-looking-for-a-problem tech needed.
Completely reverting all the state to some point in the past is easy and has been a solved problem for quite a while.
The actual problem is undoing state changes related to the code change, while not losing state changes related to normal system activity.
If I add some new fields to how my web forum records posts, and then find out that it's eating every tenth post and I need to revert, it'd be good not to lose any posts made in the meantime.
Good point - this system is much more useful for scientists, ML researchers, etc., where the code may generate datasets over several weeks or months of compute time but produce some output that should be directly tied to the code.
You're right though. Systems in the social media space dealing with users appending to a database typically don't have this requirement to be easily reproducible, so the local snapshots are the right choice.
I don't understand your statement. If a git clone of a specific commit also pulls the data that is generated from that commit hash via IPFS simultaneously, then you can always have the data at whatever state you choose. How might this have a negative effect on conserving the data? Do you mean that there are additional structures required to manage keys if the data is encrypted?
One of the very best development teams I worked with had an interesting take: they always did database migrations first. Any new state added to the system could only be introduced by first adding the new database fields or tables. This ensured that version 1 of the code would work with both version 1 and version 2 of the database. They would then roll out version 2 of the code, but with the new features hidden behind feature flags (in the database), ensuring that version 2 could run without using the new database schema. Once they were confident that everything was still running on version 2 of the code and database, they'd enable the new feature. Later the feature flag could be migrated from the database to a properties file, saving the database lookup.
I wouldn't necessarily call this approach simple, but it was incredibly safe and rollbacks were always a non-event.
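To sketch the idea (table, column, and flag names here are hypothetical, not that team's actual setup, and the SQL is PostgreSQL-flavoured): the schema change and the flag ship ahead of the code, and turning the feature on later is a data change rather than a deploy.

```sql
-- Ships first: new column plus a flag row, both ignored by version 1 of the code.
ALTER TABLE users ADD COLUMN avatar_url text;

CREATE TABLE IF NOT EXISTS feature_flags (
    name    text PRIMARY KEY,
    enabled boolean NOT NULL DEFAULT false
);
INSERT INTO feature_flags (name, enabled) VALUES ('user_avatars', false);

-- Version 2 of the code checks the flag at runtime and only touches avatar_url when it is on.
-- Enabling (or disabling) the feature is a single data change:
UPDATE feature_flags SET enabled = true WHERE name = 'user_avatars';
```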
I learned this as standard practice at Google back in 2015.
We got really good at data migrations and it was no big deal - but we only got serious about this after we had a major DB and functionality update that went wrong and took us down for 2 days.
That sounds reasonable. But what about the case where the DB migration of version 2 would be incompatible with code version 1, e.g. a column was dropped?
You NEVER do that in one go, you need to split it into several deployments. Dropping a column is relatively straightforward in two steps: first deploy a version of the code that doesn't use the column, then release the migration dropping the column.
The typical example is the renaming of a column, which needs to be done in several steps:
1. Create the new column, copying the data from the old column (DB migration). Both columns exist, but only the old one is used
2. Deploy new code that works with both columns, reading from the old one and writing to both old and new
3. Deploy a data migration (DB migration) that ensures the old and new columns have the same values (to ensure data consistency). At this point, there are no “old column only” writes from the code deployed in the previous step
4. Deploy new code using only new column. Old column is deprecated
5. Delete old column
At any given point, code versions N (current) and N-1 (previous) are compatible with the DB. Any change on the DB is done in advance in a backwards compatible way.
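To make the DB-side steps concrete, here is a rough sketch in plain SQL (assuming PostgreSQL-style syntax and a hypothetical rename of `users.username` to `users.display_name`; steps 2 and 4 are code deploys with no DB counterpart):

```sql
-- Step 1 (DB migration): add the new column and backfill it; only the old column is used yet
ALTER TABLE users ADD COLUMN display_name text;
UPDATE users SET display_name = username;

-- Step 2 is a code deploy: read from the old column, write to both.

-- Step 3 (DB migration): re-sync any rows written before the dual-write code was everywhere
UPDATE users SET display_name = username
WHERE display_name IS DISTINCT FROM username;

-- Step 4 is a code deploy: use only the new column.

-- Step 5 (DB migration): drop the old column once nothing reads or writes it
ALTER TABLE users DROP COLUMN username;
```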
And these DB migrations, did your team keep a history of them? If so, did you manage them yourselves, or did you use tools like Flyway?
I'm asking because I'm starting a project where we will manage the SQL persistence layer without any ORM (so far I've always done it with Django's migrations), but might consider some third-party tools for DB migrations.
btw. it's also bad to drop a column if you have multiple people on a team switching between branches.
it's always a headache, so the best thing is to delay dropping/deleting.
renaming stuff gets a little bit tricky with that, but you can work around it with database triggers if you really need to rename things.
The problem I've seen a lot, particularly with Rails, is when migrations generate a schema dump after running them, which can get really messy if people blindly commit the diff or if you rely on running several years of migrations in your local environment from scratch (many of which may have been edited or removed if they directly referenced application code). Given the migrations are executed through a DSL and the dump is just a structural snapshot of the DB at the end of the run, they're not quite as reproducible as plain SQL migrations.
You just end up with weird, unpredictable DB states that don't reflect production at all. Especially when you're dealing with old versions of MySQL and the character encoding and collation are all over the place.
The way I've seen it work is hand-written SQL for the migrations, numbered and tracked in Git.
There shouldn't be any reason you can't do it with Flyway, but I would be concerned about fighting the tool a bit. I use Django a fair bit, and I honestly don't see a good way to make this approach work for Django - not suggesting that you can't, but you would be fighting the framework; it's not really how it's designed to work.
If you don't have an ORM, then this is actually much, much easier to do right. I'd design the initial schema, either by hand or using pgAdmin, TOAD or whatever your database has. From there on, everything is just hand-written migrations.
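As a sketch of that layout (the file naming and the tracking table below are just one common convention for a hand-rolled runner, not something a specific tool requires):

```sql
-- migrations/V001__initial_schema.sql
-- migrations/V002__add_users_display_name.sql
-- migrations/V003__drop_users_username.sql

-- Applied versions get recorded so each file runs exactly once, in order:
CREATE TABLE IF NOT EXISTS schema_migrations (
    version    text PRIMARY KEY,
    applied_at timestamp NOT NULL DEFAULT now()
);
```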
The migrations are fairly tightly coupled to the code. You can apply the migrations without deploying new code, if you extract the migrations, but now you have at least two branches in your version control, both of which are technically in production: the version that's actually running, and the version with the model changes and the migrations, from which you extracted the migrations and applied them to the database.
I'd also argue that because you're generating the migrations from the model, it's easier to accidentally create migrations that are not independent of the code version.
Hm, but isn't that right? You make your change in code, which doesn't touch your models (except you're no longer using the column you'd like to deprecate), and deploy that; show it's working. Then you make another change to actually remove the column from the models and generate a migration. Then you deploy that version, which migrates the db and runs your new model code?
(You could in theory remove the column but not merge the migration if you wanted to show your code worked fully without that column in your ORM model before removing it from the DB as well?)
I didn't mean to use Django _and_ a separate migration tool. It's just that I've worked with Django so far, but I'm now switching to a new codebase without it. Hence my question about experiences with DB migrations.
You do it in two stages. Add the new column, then deploy code that uses it and no longer uses the old column. Later, drop the old column once nothing is using it anymore.
Not saying it is a bad idea, but the way it works is it ensures certain things happen that you would normally want to happen, namely:
* test of a rollback procedure,
* developers thinking about backwards compatibility and rollback procedure.
The main issue I see with this approach is that the test of the rollback is only partial. Just because the schema is usable by the previous version of the application does not mean the new data the application is going to write will perform the same way.
Another issue I see is that it is not a separate testing event but essentially happens live when the application is turned on in production. Not nice.
On the pros, this is very useful when you want more than one version of the application code to coexist at the same time. But I would not rely on learning about an incompatibility as I start deploying the application; I would want to know that well before then.
> Another issue I see is that it is not a separate testing event but essentially happens live when the application is turned on in production. Not nice.
It's a testing event if you make it a testing event. The team I'm referring to regularly takes a dump from production, runs it through a PII anonymizer, and performs migration tests on that.
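As an illustration of the anonymizer step (entirely hypothetical table and column names, PostgreSQL-ish syntax), it can be as simple as a few destructive updates run against the restored copy before the pending migrations are applied and tested:

```sql
-- Run against the restored copy of production, never against production itself.
UPDATE users
SET email     = 'user' || id || '@example.invalid',
    full_name = 'Test User ' || id,
    phone     = NULL;

-- Then apply the pending migrations and exercise the application against this copy.
```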