> Is this bad because I now have to do a deployment to enable the feature? ... I would argue as long as your deployments are easy, this is the better way to do things because it reduces the complexity of integrating with a third-party tool.
Shoot, even when it's all first-party tooling, I prefer releasing flags and binary as an atomic unit. If the flags and binary go out as a single push, it simplifies a lot of things (see the sketch after this list):
* reproducible state for a given time with only one thing to track (running version), instead of two or more things (binary version, flag-config version)
* corollary: rollbacks are much simpler, because there's just one thing to roll back - the code+config+everything as an atomic unit - not "did we need to roll back the flags, or the binary, or both?" (Corollary to the corollary: it removes the awful problem of "oh shit, the code and/or flags/configs weren't backwards/forwards compatible", where you shoot yourself in the foot while doing a rollback.)
* removals and cleanups are easier: you can remove flags, config bits, and code all at once, instead of having to do a careful dance of "set code behavior to default-on and remove the dependency on the flag, then wait for full deployment, then make sure everyone is okay with not rolling back after a given point, then remove the flags and config logic"
* depending on your tooling, diffs vs prod during code review get cleaner
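For illustration, a minimal sketch of flags living in the same repo and shipping with the binary (all names hypothetical):

```python
# One deployable unit: the flags and the code that reads them live in the same
# repo and ship in the same artifact.  All names here are hypothetical.

# --- flags (could equally be a flags.py module in the same repo) ------------
NEW_CHECKOUT_FLOW = False   # flipped to True in the same commit/deploy that enables the feature
VERBOSE_AUDIT_LOG = True

# --- application code reading the flags -------------------------------------
def checkout(cart: list[str]) -> str:
    if NEW_CHECKOUT_FLOW:
        return f"new checkout flow ({len(cart)} items)"
    return f"legacy checkout flow ({len(cart)} items)"

if __name__ == "__main__":
    print(checkout(["book", "mug"]))   # legacy flow until the flag flip ships
```

Rolling back the deploy then rolls back the code and the flag values together.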
The problem is when your deployment depends on external state like a database. Code rollbacks are trivial, rolling back state (if you even can) is not.
I always say this - there is no such thing as a rollback. Anyone who pretends you can simply "roll back" a stateful application is out of their mind.
If you can roll back and forward events, good for you but for most of us rollbacks are actually a new build, deployment, and a new set of tests. Every single time.
The system can only move forwards. Just like a `git revert`, it's a new commit to an immutable history.
This is why I've never used or understood the value of the "downgrade" feature in some database migration tools. If you need to revert, make a _new_ migration that fixes the problem. Your tooling/code/logs should reflect the true history of the system without cooking the books.
In what sense is 'downgrade' not just that prepared ahead of time? Or do you just not like the name?
Of course usually it would be followed by something that fixes it 'properly' - applying the original 'upgrade' again but with some correction, but typically that'll take longer than reverting to the previous version, and given it's gone wrong you'll probably want to take longer with it, test more, restore confidence.
downgrade implies that such a thing is possible. In a stateful system though, you cannot take back that which you have committed. It's quite literally impossible by definition.
I’ve found “downgrade” mostly useful for quick iteration during local development (where you may well not care about lost data). Realise your initial cut of the migration wasn’t quite right? Run “down” then fix it, then “up” again.
Or have the SAN make a snapshot of your db before the deploy, no fancy solution-looking-for-a-problem tech needed.
Completely reverting all the state to some point in the past is easy and has been a solved problem for quite a while.
The actual problem is undoing state changes related to the code change, while not losing state changes related to normal system activity.
If I add some new fields to how my web forum records posts, and then find out that it's eating every tenth post and need to revert, it'd be good not to lose any posts made in the meantime.
Good point, this system is much more useful for scientists and ML researchers, etc. wherein the code may generate datasets over several weeks or months of compute time, but produce some output that should be directly tied to the code.
You're right though. Systems in the social media space dealing with users appending to a database typically don't have this requirement to be easily reproducible, so the local snapshots are the right choice.
I don't understand your statement. If a git clone of a specific commit also pulls the data that is generated from that commit hash via IPFS simultaneously, then you can always have the data at whatever state you choose. How might this have a negative effect on conserving the data? Do you mean that there are additional structures required to manage keys if the data is encrypted?
One of the very best development teams I worked with had an interesting take: they always did database migrations first. Any new state to be added to the system could only be introduced by first adding the new database fields or tables. This ensured that version 1 of the code would work with both version 1 and version 2 of the database. They would then roll out version 2 of the code, but with the new features hidden behind feature flags (in the database), ensuring that version 2 could run without using the new database schema. Once they were confident that everything was still running on version 2 of the code and database, they'd enable the new feature. Later the feature flag could be migrated from the database to a properties file, saving the database lookup.
I wouldn't necessarily call this approach simple, but it was incredibly safe, and rollbacks were always a non-event.
I learned this as standard practice at Google back in 2015.
We got really good at data migrations and it was no big deal - but we only got serious about this after we had a major DB and functionality update that went wrong and took us down for 2 days.
That sounds reasonable. But what about the case where the DB migration of version 2 would be incompatible with code version 1, e.g. a column was dropped?
You NEVER do that in one go; you need to split it into several deployments. Dropping a column is relatively straightforward, in two steps: first deploy a version of the code that doesn't use the column, then release the migration dropping the column.
The typical example is the renaming of a column, which needs to be done in several steps:
1. Create the new column, copying the data from the old column (DB migration). Both columns exist, but only the old one is used
2. Deploy new code that works with both columns, reading from old and writing new and old
3. Deploy a data migration (DB migration) that ensures the old and new columns have the same values (to ensure data consistency). At this point, there are no "old column only" writes by the code deployed in the previous step
4. Deploy new code using only new column. Old column is deprecated
5. Delete old column
At any given point, code versions N (current) and N-1 (previous) are compatible with the DB. Any change on the DB is done in advance in a backwards compatible way.
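As an illustration only, the five steps above might look roughly like this as hand-written, forward-only migrations (Postgres-flavoured SQL inside Python strings; the users.username -> users.display_name rename and the numbering are hypothetical):

```python
# Forward-only migrations for a hypothetical users.username -> users.display_name
# rename, following the five steps above.  Each string would live in its own
# numbered migration file; the code deploys (steps 2 and 4) happen in between.

STEP_1_ADD_AND_BACKFILL = """
ALTER TABLE users ADD COLUMN display_name TEXT;
UPDATE users SET display_name = username;  -- both columns exist, old one still authoritative
"""

# step 2: code deploy -- read the old column, write both columns

STEP_3_CONSISTENCY_PASS = """
UPDATE users SET display_name = username
 WHERE display_name IS DISTINCT FROM username;  -- catch rows written before step 2 was fully out
"""

# step 4: code deploy -- read and write only the new column

STEP_5_DROP_OLD = """
ALTER TABLE users DROP COLUMN username;
"""
```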
And these DB migrations, did your team keep a history of them? If so, did you manage them yourselves, or did you use some tools like flyway?
I'm asking because I'm starting a project where we will manage the persistence SQL layer without any ORM (always did it so far with Django's migrations), but might consider some third party tools for DB migrations.
btw, it's also bad to drop a column if you have multiple people on a team switching between branches.
it's always a headache, so the best thing is to delay dropping/deleting.
renaming stuff with that gets a little bit tricky, but you can workaround that with database triggers if you really need to rename things.
The problem I've seen a lot, particularly with Rails, is when migrations generate a schema dump after running them, which can get really messy if people blindly commit the diff or if you rely on running several years of migrations in your local environment from scratch (many of which may have been edited or removed if they directly referenced application code). Given the migrations are executed through a DSL and the dump is just a structural snapshot of the DB at the end of the run, they're not quite as reproducible as plain SQL migrations.
You just end up with weird, unpredictable DB states that don't reflect production at all. Especially when you're dealing with old versions of MySQL and the character encoding and collation are all over the place.
The way I've seen it work is hand written SQL for the migrations, numbered and tracked in Git.
There shouldn't be any reason that you can't do it with Flyway, but I would be concerned about fighting Flyway a bit. I use Django a fair bit and I honestly don't see a good way to make this approach work for Django, not suggesting that you can't, but you would be fighting Django a fair bit, it's not really how it's designed to work.
If you don't have an ORM, then this is actually much, much easier to do right. I'd design the initial schema, either by hand or using pgAdmin, TOAD or whatever your database has. From there on, everything is just hand-written migrations.
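A minimal sketch of that workflow, assuming numbered .sql files checked into git and a tiny runner; sqlite3 is used here only so the example is self-contained, and Flyway does essentially this bookkeeping for you:

```python
# Sketch of a tiny runner for hand-written, numbered SQL migrations tracked in
# git (e.g. migrations/0001_create_users.sql, 0002_add_display_name.sql, ...).
import sqlite3
from pathlib import Path

def apply_migrations(db_path: str, migrations_dir: str) -> None:
    conn = sqlite3.connect(db_path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS schema_migrations (version TEXT PRIMARY KEY)"
    )
    applied = {row[0] for row in conn.execute("SELECT version FROM schema_migrations")}

    for path in sorted(Path(migrations_dir).glob("*.sql")):
        version = path.stem                      # e.g. "0001_create_users"
        if version in applied:
            continue                             # already run on this database
        conn.executescript(path.read_text())     # run the hand-written SQL
        conn.execute("INSERT INTO schema_migrations (version) VALUES (?)", (version,))
        conn.commit()

    conn.close()
```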
The migrations are fairly tightly coupled to the code. You can apply the migrations without deploying new code, if you extract the migrations, but now you have at least two branches in your version control, both of which are technically in production: the version that's actually running, and the version with the model changes and the migrations, from which you extracted the migrations and applied them to the database.
I'd argue that because you're making the migrations from the model, it's also easier to accidentally create migrations that are not independent of the code version.
Hm, but isn't that right? You make your change in code, which doesn't touch your models (except you're no longer using the column you'd like to deprecate), and deploy that; show it's working. Then you make another change to actually remove the column from the models and generate a migration. Then you deploy that version, which migrates the db and runs your new model code?
(You could in theory remove the column but not merge the migration if you wanted to show your code worked fully without that column in your ORM model before removing it from the DB as well?)
I didn't mean to use Django _and_ a separate migration tool. It's just that I did work with Django so far, but switching now to a new codebase without it. Hence my question for experiences in DB migration.
You do it in two stages. Add as new column, deploy code that uses it and no longer use the old column. Then later drop the column once nothing is using it anymore.
Not saying it is a bad idea, but the way it works is it ensures certain things happen that you would normally want to happen, namely:
* test of a rollback procedure,
* developers thinking about backwards compatibility and rollback procedure.
The main issue I see with this approach is that the test of the rollback is only partial. Just because the schema is usable by the previous version of the application does not mean the new data that is going to be put by the application will perform the same way.
Another issue I am seeing is that it is not a separate testing event but essentially happening live when the application is turned on on production. Not nice.
On the pros, this is very useful when you want to have more than one version of the application code to coexist at the same time. But I would not rely on learning about incompatibility when I start deploying the application, I would want to know that well before then.
> Another issue I am seeing is that it is not a separate testing event but essentially happening live when the application is turned on on production. Not nice.
It's a testing event if you make it a testing event. The team I'm referring to regularly makes a dump from production, runs it through a PII anonymizer and performs migration tests on that.
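Purely as an illustration, the anonymizer can be as simple as a scrub pass over the restored copy before the pending migrations are rehearsed on it (table and column names here are made up):

```python
# Hypothetical PII scrub applied to a restored copy of production, before the
# new migrations and the test suite are run against that copy.
SCRUB_USERS = """
UPDATE users
   SET email     = 'user_' || id || '@example.invalid',
       full_name = 'User ' || id,
       phone     = NULL;
"""
```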
I agree with all this and it’s how I do it for web development but the time Apple/Google take to approve apps makes doing this for mobile apps quite risky since it’s hard to rollback.
I guess it goes to the point of the author that hard deployments on mobile make mobile development harder
Anyone have advice for teams without control over when their software is deployed by the people actually running it? Say you have a product released to a third party and they may or may not take releases and may be arbitrarily delayed in deploying those they choose to take.
You're in for a long tail of pain if you have significant uptake.
To avoid this, you will need to have a clear policy on what you're willing to support (e.g. versions up to 1 year old, or "major version - 1", etc.), and stick to it. Otherwise, people will expect you to support all of it, forever.
If you want people to keep up to date with your releases, you either have to provide an automatic update facility that they have to manually turn off (ugh, too hard), or you have to dangle desirable features as carrots in front of them, so that upgrading is exciting instead of a social obligation.
My experience is kinda the opposite. Big companies get into certain workflows and don't want to change, altho they are happy to take bugfixes. This is also where I'm at with much of the software I rely upon, some of which I pull at my own whim.
Give yourself as much leverage as you can get over their deployment cycles. Use both carrot and stick - Include features they can't live without with the hardest of deployment maintenance burdens, and make sure you have tight adherence to your software lifecycle - Every question about 1.0 should be answered with "We don't know, 1.0 is out of support" from the second that it's EOL date passes.
Include the currently running version in every logfile, if not every log statement. Make it easy to see where they are, and run your own run-behind environment that upgrades only after they do. This will be painful, but it will save you time overall.
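A minimal sketch of the version-in-every-log-line idea (the version string is hypothetical and would normally come from your build system):

```python
# Stamp the running version into every log line, so it's obvious from any
# customer-supplied log which release they are actually on.
import logging

APP_VERSION = "2.7.1"   # hypothetical; normally injected by the build system

logging.basicConfig(
    format=f"%(asctime)s [{APP_VERSION}] %(levelname)s %(name)s: %(message)s",
    level=logging.INFO,
)
logging.getLogger(__name__).info("service started")
```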
If you're talking about an app store you can bundle two binaries in a single binary and then have an external feature flag that selects which binary to use
Dealing with the long tail of customers waiting to upgrade will be a persistent challenge though
> I did an inventory of my recent feature flags and realized that about 80% of them aren’t there to roll things out to specific populations, or do any sort of A/B testing, but to hide unfinished code.
Maybe it’s just me but having unfinished, dead, or scratch code in a production codebase really annoys me. Either finish your work or delete the unneeded code. More than a few times I’ve sunk time out of my day into investigating some code path only to realize it’s completely unused.
“Unfinished” can mean abandoned, but it can also mean half-baked code that has been pushed to master even though it’s still actively being developed and is nowhere near ready to run. Personally I prefer long-lived feature branches; this is a risk with no payoff.
As for experimentation, I like percentage rollouts with segregated control/treatment group metrics. I agree that trivial on/off flags should be replaced by code deployments where possible (at my day job, code deployments happen to be a lot slower).
Companies have an expectation that most of the code they paid for will eventually ship. Pretending like code might not ever be finished is an odd choice.
The exception is experimental work. Obviously experiments aren't being merged into master, so they don't need to be included in the arithmetic for any processes that eventually involve master.
Personally, long-lived branches feel like a disaster waiting to materialize, since by definition they do not integrate with upstream.
Yeah, you might be really responsible and rebase and test your feature frequently, but what about other people? They are more likely to break the integration with your code since they can't even see it.
I'd rather have unfinished feature flagged code in master (and therefore production) than have the same unfinished code withering in a long running branch, diverging from master and causing integration problems later.
> Either finish your work or delete the unneeded code.
It's work in progress. We're working on getting it finished.
>It's work in progress. We're working on getting it finished.
So finish it and then I will merge your PR ;) What's the use of putting it in master if it's not finished?
It's the author's responsibility to get it merged successfully. If they're taking too long and have to rebase and re-work their code to integrate, that's on them. Pushing it into master is either wasting a reader's time (per the original comment) or, worse, inviting an uninitiated collaborator to use it and cause an incident.
Somewhere along the way, a new generation of developers thought that 'Continuous Integration' means "automated builds". I wonder what they think that 'integration' word is doing there?
Since you're here, I'll ask you. What do you think 'integration' means in this context?
"Passing CI" is a low bar -- it is only a single layer of defense. Moreover, code has syntactic properties that aren't necessarily evaluated by CI. My comment claimed that there are readability and usability concerns with unfinished code. This is why we have code review. CI is a signal that the code is good to merge, but as a code reviewer, I have the final say. I don't merge unfinished code.
If CI passes on your branch but later fails due to lagging behind master, it is on you to get it working before re-requesting review.
PS: The HN guidelines clearly state "Be kind. Don't be snarky." (:
Continuous Integration is a development methodology, not an automated process. The idea is that developers continuously merge their changes into a shared integration branch. Having long-running feature branches that get merged when they are done is a valid methodology, but it is not continuous integration, no matter what your automated build and test pipeline calls it.
In practice, I have typically seen CI done with either feature freezes, or release branches to allow creating versions where everything is complete.
> If CI passes on your branch but later fails due to lagging behind master,
That’s the problem, which I already stated before the question. That’s not integration, that’s a build. Integration is testing your changes with all the code around it.
Specifically, it’s about testing your changes with changes that started after you began writing your code.
Only small changes land in anything like chronological order. On real codebases, even with low to moderate coupling, the rug can get pulled out from underneath of you without you even noticing until something bad starts happening. CI is about exposing your code to feedback at the earliest possible moment, so you 1) don’t continue to build on violated assumptions, 2) you do lots of small merge operations instead of one big one and 3) the thoughts that began the trouble are still fresh in your head. Consequences of decisions become abstract over time. Nobody changes their behavior based on 1 year old bugs found in their code.
Long lived branches are a crutch. It’s avoidant behavior. If you can’t figure out how to make your code work with everyone else’s, why do you think waiting longer will make things better? It doesn’t. The mess just gets bigger. I’ve seen it over and over again. The guys who won’t merge early are full of delusions about their own work that doesn’t match up with the bug and incident count. They call it bad luck.
> So finish it and then I will merge your PR ;) What's the use of putting it in master if it's not finished?
Because of the thing I said in the sentence before the one you quoted.
> If they're taking too long and have to rebase and re-work their code to integrate, that's on them.
No, in the organisations I build, we work as a team and if someone is taking too long it's because we don't have the systems in place to help them work faster. There is very clear research on the benefits of working in small batches and integrating continuously. If that research has passed you by then I strongly suggest you go back and take a look.
> What's the use of putting it in master if it's not finished?
Visibility, and avoiding duplicated effort. What's the use of keeping it out of master?
> Pushing it into master is either wasting a reader's time (per the original comment)
The new parts they'll have to read sooner or later, and for the parts that are changing it's better for them to read the new version than the old version.
> or, worse, inviting an uninitiated collaborator to use it and cause an incident.
Why would that cause an incident? Having someone see the "upcoming" code and realise they can reuse it in something they're working on is the ideal outcome.
Some changes are too big to fit into a single effort. A few years ago, I was working on a niche compiler. When the project was first started, the decision was made to inline everything, greatly simplifying the rest of the compiler [0]. This decision had served us well for the better part of its then 17-year lifetime, but was finally starting to cause issues with compile time and memory usage. One of our senior developers, who was intimately familiar with the project, tried on several occasions to allow things to not get inlined. However, he kept having to give up, as it was a low-priority background task and his branch would diverge from the main branch faster than he could keep it up to date.
The solution ended up being a rather simple feature flag. Within a few days, he coded a flag that would enable not-inlining, and updated our test infrastructure to look for regressions on unit tests with the flag enabled. Going forward, developers were responsible for making sure their changes didn't cause regressions when the flag was enabled, and everyone was able to slowly chip away at everything the feature broke when they had spare cycles.
What would have been a major stop-the-world refactor with our most senior engineers, turned into a slow moving non-issue.
[0] Lack of Turing completeness was and remains an explicit design goal, so recursion was explicitly forbidden.
We need better visualization than we have, but this is why you attach ticket numbers to all commits.
If the code is toggled off for a ticket in progress, great. If it’s toggled off for an epic in progress, okay. If it’s toggled off and the epic is complete/abandoned, then it’s not dark code, it’s dead code. Fire up the chainsaws.
The old code behind a toggle should be deleted before the feature is Done. If it isn’t then someone screwed up.
When you have an `if False:` or equivalent in code, you have to deploy to enable a new feature, so if the current state in the test environment isn't good, you either have to wait for it to be fixed, or to roll back all possibly bad new commits before you can enable the feature.
Another feature of feature flags is that a non-coder can toggle them.
If you need neither of these, sure, go ahead with commits and deployments instead.
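A rough sketch of the two options side by side; get_flag here is a hypothetical stand-in for whatever runtime flag store you'd use:

```python
# Option 1: deploy-time gate -- enabling the feature means editing this
# constant and shipping a new build.
ENABLE_NEW_SEARCH = False

def search_deploy_gated(query: str) -> str:
    if ENABLE_NEW_SEARCH:
        return f"new search for {query!r}"
    return f"old search for {query!r}"

# Option 2: runtime gate -- a non-coder can flip the flag without a deploy.
def get_flag(name: str) -> bool:
    # placeholder: read from a database, a config service, an env var, ...
    return False

def search_flag_gated(query: str) -> str:
    if get_flag("new_search"):
        return f"new search for {query!r}"
    return f"old search for {query!r}"
```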
When you're trying to triage a production problem or explain what happened between Saturday and today, orthogonality isn't a feature, it's a challenge to be overcome.
> if the current state in the test environment isn't good, you either have to wait for it to be fixed, or to roll back all possibly bad new commits before you can enable the feature.
This is a symptom of long release cycles. The more work you “save up” to deploy all at once, the more difficult deployments become. Everything becomes much easier when you minimise unreleased work.
I have recently seen someone wrap a single function in about 10 classes of Java to turn it into a standalone application, slap on some Docker Compose magic, add some build scripts, and then look proud that the feature was wholly configurable at deployment time.
Of course, the deployment team would have to be informed about this change, so some documentation was required as well, but that was exactly the part of the process where most effort was saved.
Note that all this was used for a system that would be deployed only at one customer, and the feature would always be on. Yes, a boolean flag would have been a better solution for this.
My cynicism is probably not the best approach to change the world for the better, so any hint on how to teach younger colleagues to stop snacking micro service candy is much appreciated.
> My cynicism is probably not the best approach to change the world for the better, so any hint on how to teach younger colleagues to stop snacking micro service candy is much appreciated.
I could be way off base here, but I'd bet a lot of junior devs are just trying to stand out. They want respect, raises, promotions, and new job offers. Invisible good solutions don't bring those.
junior devs are junior devs in part because they do not yet know that there is no guarantee that anyone else has been keeping track of their victories come review time.
solutions are only invisible due to a lack of sufficient documentation and communication.
boring solutions are part of the job. they can be fun if the goal is shifted from implementing to automating.
none of this to say you are wrong, but a junior dev that succeeds at such peacocking efforts is signaling the poor health of their surrounding team as much as their eagerness to impress. where was the manager or senior dev to say “great! have you considered a boolean?”
Nobody cares about invisible good solutions, because invisible solutions are simple - and developers are, to one degree or another, rated on the degree of complexity they can deal with and how smart they can be. It is common here to understand the cleverness behind just doing the dumb thing, but it is not common in many places.
Seems cynical. In my experience, junior devs are just more susceptible to cargo-culting and getting excited about cool ways to apply the things they learned in undergrad. I haven't actually seen a lot of "resume-driven development" in my (relatively limited) time in the industry... just well-intentioned eager people with more knowledge than wisdom.
I like to describe my colleagues as the delta between how smart they are, and how smart they _think_ they are. “Junior” devs almost always fall in the “wide gap” spectrum but it is by no means restricted to any age group.
Strongly disagree this is a junior dev thing. Most of the terribly complex abstractions are, in my experience, created by people who have enough seniority that others can't easily question them. There are good devs and bad devs, and there are junior and senior devs, but it's the first axis that determines code quality.
I’ve seen entire orgs paralysed by a single architecture astronaut who has been there since the beginning who nobody has found a way to route around. Simplicity as a cultural value is underrated, and fetishising complexity can kill a business.
Well no doubt! Same. You can certainly get better at programming as you get more experience. I'm just saying: the axis that determines this stuff is skill, not seniority. They correlate but they're not the same thing.
I've saved a great many man months by doing just this. Sometimes work just vanishes. And it's not just the juniors that do it, sometimes the senior / tech lead types just miss a crucial point to make the solution trivial.
But juniors often tend to want to make work, rather than solve a problem.
Just yesterday I was helping with an incident and we had outlined two possible solutions. The first was bad (do nothing, leave things in a somewhat bad state), the second was a ton of work (including restoring from snapshot). I was able to describe these solutions as a spectrum.
1. Do nothing, bad state remains
2. Surgically restore to world before bad state
3. Remove all bad state (even if it wasn't caused by the incident)
I'm omitting some details, but I think this strategy can be effective.
How configurable should X be? We have one customer who wants it to be configured like this. All other customers want it configured like that.
We can make X maximally configurable which would allow future customers to have a custom setup - how likely is that?
Are there no seniors on the team? Why isn't anyone helping and guiding your team to make a better product? Seniority isn't about technical skill, it's about being a multiplier and paving the road for others. If you're not, and you're letting the team (in your eyes) waste time, the whole team acts just as junior.
Touché! Both luckily and unfortunately, this incident was not on my team.
Even then, I find it increasingly hard to argue people out of the nonsense they pick up on blogs and conferences.
Especially for newer techniques that are not yet proven to be inefficient, such as using micro services in the wrong context, one has to resort to arguments by authority. Some junior and senior devs are not very susceptible to that.
For areas such as web development for user interfaces, or taking in enormous dependency trees with package managers, the problem is even worse. Here, an entire industry had standardised on suboptimal methods. One can argue that this is wrong, but there is no viable alternative.
In the latter case, guiding only one junior developer does not improve things. One has to educate the entire industry.
> the nonsense they pick up on blogs and conferences
Thanks for this bit. As I gain more experience, I grow less and less patient with many blogs and conference talks. Often authors are just very excited about a topic, and my general impression is that they tend to propose ideas that are not seriously battle-tested in production.
If you’re one of these people who are oozing with sarcasm then just be yourself. It’s a fun way to convey when something actually is ridiculous. Nothing wrong with that.
Just also make sure from time to time that you’re on the same side. Don’t hesitate to laugh at your own mistakes openly. Sprinkle in some sympathetic, honest remarks. Target the thing, the mistakes, never the person.
Bottom line: laugh together at the expense of human fallibility and not a particular person.
In game dev there was a saying like “you’re writing the game in C# so just write the game logic in C#” rather than making a run-time configurable / scriptable monolith
Easy deployments are great, and all too rare. But feature flags are good too. It's actually a trade-off between determinism and control. No flags make the blob deterministic; flags give you control over the blob very late in its runtime. In the limit your feature flags might be set by an admin endpoint (which is very common, actually).
But the OP's real issue is git, and not wanting to work on a branch. I actually think that's a valid goal! I've noticed that many teams have simply moved "one level up" with their git usage. The repo is the new directory; the push is the new commit; etc. You can practice and get fast with it, but it's still too many moving parts and plenty of things to go wrong. I actually think that feature flags are too coarse anyway to block off incomplete code; you should probably use conditionals for incomplete code, even if you also have robust feature flag support. The reason is that you never, ever want to dynamically turn incomplete code on, so coupling the flag to a deployment makes perfect sense.
As an aside, those “feature flags as a service” tools have some neat features, but are just way too expensive for what most apps probably need: simple binary flags that can change at runtime.
Example: just use a database table. Query and cache in memory for 60s. If you want, build a simple internal web page or tool to toggle flags. This works, and scales (from experience).
Do many apps really need A/B testing or segmented rollout? Maybe, but probably not.
This technique has been used since Wordpress days and earlier. At its core, it's a "meta" table with a string identifier and some value. It's a pattern I still use today in my own web apps.
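A rough sketch of that pattern, assuming a feature_flags table and a 60-second in-memory cache (sqlite3 only so the example runs standalone; names are illustrative):

```python
# Feature flags in a plain database table, read through a 60-second cache.
import sqlite3
import time

_CACHE: dict[str, bool] = {}
_CACHE_LOADED_AT = 0.0
_TTL_SECONDS = 60

def _load_flags(conn: sqlite3.Connection) -> dict[str, bool]:
    rows = conn.execute("SELECT name, enabled FROM feature_flags")
    return {name: bool(enabled) for name, enabled in rows}

def flag_enabled(conn: sqlite3.Connection, name: str) -> bool:
    """Return the flag value, hitting the database at most once per minute."""
    global _CACHE, _CACHE_LOADED_AT
    if not _CACHE or time.monotonic() - _CACHE_LOADED_AT > _TTL_SECONDS:
        _CACHE = _load_flags(conn)
        _CACHE_LOADED_AT = time.monotonic()
    return _CACHE.get(name, False)   # unknown flags default to off

if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE feature_flags (name TEXT PRIMARY KEY, enabled INTEGER)")
    conn.execute("INSERT INTO feature_flags VALUES ('new_search', 1)")
    print(flag_enabled(conn, "new_search"))   # True
```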
"Just use a database table" solves for the simplest cases - a single monolithic app. It works incredibly well in this case. It falls apart when I need to turn features on across multiple apps together.
I think OPs point was a simple app doesn't need this whole suite of tools, but you pay for them anyway, which makes it a bad cost/value proposition for simple apps.
So yes a more complex situation could benefit from all this extra tooling and complexity but otherwise it is dead weight/cost.
I would claim that releasing a new feature by enabling an additional code path across multiple apps at once is a bit of an anti-pattern. It seems rather dangerous and error-prone. In that case I'd actually release it in reverse order, so to speak. Release the apps that use the services of others first, and have them check whether the service is available/functional and, if not, skip calling it. Then release the feature to the next service down the stack. Then you can always roll back the last service and be confident that the callers still work.
It's way more work, and I can see why for certain types of application it's not really worth the trouble.
> Just use a database table" solves for the simplest cases - a single monolithic app.
Most cases are the simplest cases.
No approach works for every single use case, but OP was specifically talking about the common case. You don’t have to shoot for the most complex use case every time.
I think most frameworks I have worked with have the concept of configurations, more advanced ones have config that can be overriden by environment files. Definitely a solved problem if you want to roll your own.
.env files are the quick and dirty version for simple sites that don't have an admin panel or database.
I am not a fan nor a user of backend feature flags, so don't ding me for relaying this: they tout features such as central management of these settings, bringing them into the hands of the product team rather than having them tucked away in code, allowing for canary deployments where you fade in a feature based on performance metrics, A/B testing, and so on. In their own interest, these folks take an as-broad-as-possible view of what counts as a feature flag, often including things that you'd otherwise call system configuration or entitlements or permission toggles. It's not a new trend; LaunchDarkly is probably the best-known commercial party here, and they've been around for about ten years, and I don't think they were first.
Because feature flags often intersect with segmentation and AB testing.
So it's not just a byte in memory, but often also correlating the status of said byte with a users identity and then tracking and summarising user behaviour based on that relationship.
It's become fairly standardised and requires engineer time to set up and maintain the services behind all that, so it's valid to go third party for less than the cost of said engineer time, if all you want is standard.
Edit: it's also hard to always predict when a standard flag is going to become part of a test, so just integrating for every flag and making that a standard process for your teams becomes the simplest approach.
I wouldn't say it's a new trend; it did go through a hype cycle a few years ago, and some teams have adopted it. I definitely wouldn't call it a standard practice though, as it brings its own overhead. In effect you've got "byte in memory as a service" with its own deployment, maintenance, and statefulness. It's only useful if you have a business model that really relies on having this capability. In a sufficiently large, already complex, sprawling application estate, having a single flag in a central location could be useful if several pieces of that estate need it.
TIL there's an industry around feature flags. I can see how it can be useful, but imo if you're doing the modern approach with k8s, gitops, etc. deployments are too easy to bother with this.
Even if deployments took 2 seconds you’ll still find a frontend engineer who wants to “iterate faster” and an ear attached to the purse strings who will listen.
Often it's not about only turning something on/off (a simple bool). But doing a slow ramp up, only enable for certain cohorts to test etc. Things still doable in code, of course, but it's nice with a dashboard to control it.
You quickly end up with a NIH thing. You make a boolean toggle in the db. Then make an internal dashboard. Then some more advanced toggles. And suddenly you've spent a few weeks of coding time to essentially save yourself from spinning up an Unleash container.
>Much of the time I’m using flags so I can commit unfinished code because I don’t want to create a separate branch and have one source of truth
This seems like an incident in the making. If each dev on a team commits unfinished code into prod behind flags, the whole project is going to be littered with flags and unfinished code. Some intern is going to delete a flag or an if check and then everything is going to break.
Why would an intern remove a flag? The flag is not special, it's code like everything else, with tests, ownership, documentation etc..
The idea of the "unfinished code behind a flag" is to be able to work on trunk instead of a long lived branch, increasing the pace of development and reducing integration costs.
This works quite well in my experience, and definitely better than "let's keep a huge branch in sync with the main one for 3 months while we finish".
The problem IME is the opposite: flags do not get removed fast enough, littering the code past their utility.
Aren't there common patterns of good "just code" and bad "just code"? I've been told for a long time that global variables are a bad pattern. Maybe feature flags are a bad pattern too.
One concern about feature flags is testing, and the added permutations of testing needed to include all the feature flags in testing. You tested with flag A on and off, you tested with flag B on and off, but did you ever test with them both on and both off? Without feature flags, a big change that could have been represented by a feature flag would hopefully have to make its way past some quality gates. With feature flags, the exact permutation that you're going to cause later today by flipping on some feature flags may well not have been tested. Not that forgetting to test is something you can't protect yourself against with tools and processes, but testing all the permutations may be expensive.
You may not have to test all the permutations, if you can predict which permutations are relevant for your flipping feature flags later today. But a lot of organizations have poor discipline in cleaning up old feature flags, so it may not be so predictable. Maybe that's not a feature flag problem but an organizational problem, but the feature flags are gonna get blamed at some point, nonetheless.
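One way to at least make the permutations explicit is to parametrize tests over the cartesian product of the flags that plausibly interact; the cost still grows as 2^n, which is itself an argument for deleting stale flags quickly. A sketch with hypothetical names:

```python
# Parametrize tests over all combinations of two hypothetical flags, so the
# permutations are at least written down and exercised.
import itertools
import pytest

def checkout(new_pricing: bool, new_tax_rules: bool) -> float:
    total = 100.0
    if new_pricing:
        total *= 0.9
    if new_tax_rules:
        total *= 1.2
    return total

@pytest.mark.parametrize(
    "new_pricing,new_tax_rules",
    list(itertools.product([False, True], repeat=2)),   # all 4 combinations
)
def test_checkout_never_negative(new_pricing: bool, new_tax_rules: bool) -> None:
    assert checkout(new_pricing, new_tax_rules) >= 0
```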
I've worked at a large company where everything was under feature flags, think a dozen teams working on apps alone. We didn't push straight to develop, but we definitely had possibly unfinished code on production. For example a feature could be "done" but a bug was found, and the app was already released. That feature flag would simply not be turned on until the next version where it would be fixed.
We had a lot of tests for each feature flag variation, both unit and ui tests.
The codebase wasn't "littered" but there definitely existed unused code under flags, either to-be-used, or to-be-deleted. We had a grafana view of experiments that were old and not yet removed to manage them.
Overall we never had incidents around this, to my knowledge anyway.
> I did an inventory of my recent feature flags and realized that about 80% of them aren’t there to roll things out to specific populations, or do any sort of A/B testing, but to hide unfinished code.
Which means that for 20% you actually do need the feature flags. So you can't do away with the product, and as the author and many of the respondents here have mentioned, there is also a number of benefits to using feature flags.
In the trivial case of a feature for a single user still in development, yeah, a boolean is enough. But that is not the intended use case for feature flags, nor is it what the libraries encourage you to do.
Pushing config to production is also deployment. Pushing a config update to a dynamic config-serving service from which all other services pull the config is also deployment.
You need safety guardrails for any deployment of code or config – a clearly defined unit of deployment that is version controlled, changes going through a repeatable automated flow in which lints/tests can be added, changes can be rolled out atomically all at once, or rolled out with careful control gradually, can be rolled back to a known safe version instantly when needed etc. All of this applies to config and code (binary).
Within code, having feature flags, dynamic config variables, experiments that can be ramped up slowly and ramped down automatically upon regression detection etc – these are mechanisms to make your life easier when you have unknown-unknowns manifest to bite you. And they always do in any non-trivial real-world large-scale distributed architecture application that needs to evolve continuously.
And these unknown-unknowns are not just code bugs, they are mix shifts in usage patterns, data etc causing unanticipated behaviors too. They are old stale stored state interacting with new code that you didn't test for. They are changes in code outside of your control – like remote services or OS/language libraries – that you assume to be mostly stable and don't exhaustively test every single time – that you don't even know when they changed - that change behavior without changing interface etc. These are real-world software systems problems.
If your software is built like an appliance – frozen in time and doesn't change ever after, is stateless, doesn't interact with outside world etc – then you don't need any of those guardrails and you can keep your code simple.
At Google, there are no branches, so typically development indeed happens under some experiment flag. Binary, configs and experiment flags are on separate deployment schedules, and often you need to coordinate your release across different systems (with their own binary, config and experiment flag deployment schedules). This requires you to think about backward & forward compatibility of all your changes.
With my SaaS/PaaS you hot-deploy asynchronously to all global nodes, with a complete turnaround (from typing the last code character to ready) of ~1-2 seconds on live - just enough time to switch to your client/browser.
> Much of the time I’m using flags so I can commit unfinished code
There are two interpretations of this, and I find one of them horrifying
1. I'm working on a feature, I need to update multiple components, so I make small changes for each component and feature flag those (this is fine! Totally normal)
2. I commit code that doesn't work because I'm unwilling to use or understand git
The second of these scares the crap out of me. Why? Doesn't it feel like a bit of a footgun to have broken code deployed?
However, I like most of the other comments of the essay - deploys should be easy! That makes rollbacks safe and easy. Using bool flags is fine too - imo feature flagging systems are best at de-risking expensive deploys.
If you have an expensive deploy (thinking at least 20m to deploy prod) and you have staging which mirrors prod (probably also 15-20m to deploy staging), then any commit you want to deploy to fix or revert a change in prod is going to cost you a lot of time or hassle. Feature flags give you a mechanism to instantly rollback.
But it sounds like you already have that - it's called your normal deploy.
"Broken code" "deployed" inside an `if (false) {}` (or even an `#if FALSE #endif` depending on your language of choice) isn't that much more of a footgun than a git branch hundreds of commits behind and full of merge issues. It's something of a "different strokes" preference and it's something of a question of how fast the main branch is moving and how hard to keep up with it.
Most language compilers and JITs eliminate the unreachable code within a constant false branch sooner or later and the code isn't even technically "deployed" even though it exists in source control for that deployment.
There's still a difference between code that's commented out because it's dead and bitrotted, and code-in-progress. I am among the first people to delete commented-out code and send anyone looking for it to git tutorials, but I've also kept git branches short by keeping code behind if-flags (that are false in builds/production) while working on it, so that I can still give that code the benefit of other refactors or compile errors, and so that other developers are aware of that partially complete code.
It's not a dichotomy, it's an orthogonal spectrum. Of course it is hugely useful to be good at source control. Of course it is still sometimes useful to have code compiling in CI/CD that "isn't ready yet" but prone to cross-fire in other on going work and refactoring. (Plus source control is greatly useful for answering "Why is this flag false right now? Who was last working on it and for what project?" and keeping that code honest that it isn't just bit rot.)