
It is probably one of those little process changes to minimize the chance of catastrophic failure. Sure, the risk of the daisy-chained system going poof is low, but not zero. Instead, you should try to rework your plans so you do not need to daisy chain.
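To put a rough number on "low, but not zero": if each link in a daisy chain fails independently, the chain's failure probability compounds with every link. A minimal sketch, with hypothetical per-link numbers that are not from the comment:

```python
# Illustrative: failure probability of a daisy chain of independent links.
# p_link is a hypothetical per-link failure probability, not a measurement.
def chain_failure_probability(p_link: float, n_links: int) -> float:
    """Probability that at least one link in the chain fails."""
    return 1.0 - (1.0 - p_link) ** n_links

# A 1% per-link risk compounds quickly as the chain grows.
for n in (1, 4, 8):
    print(n, round(chain_failure_probability(0.01, n), 4))
```

This is why "rework the plan so you don't need to daisy chain" beats accepting a chain that is individually low-risk per link.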



I don’t know about that specifically, but I’ve seen it done often as a cost-cutting measure in low-risk situations. In that case, the risk is low because 1) it’s not safety-critical and 2) the switching moat is large enough to ensure they don’t lose customers.

Sounds like you are cutting corners on redundancy. Saves $$$, but risky.

That does save a lot of money :-) The trick is managing the rate of change and the risk of disruption. If you manage it to no risk, you end up changing too slowly; if you manage it too close to the edge, you end up with unexpected downtime and other customer-impacting events. Understanding where you are between no risk and certain doom really only comes with experience.

If there is no spec or process, then the risk that the system becomes a mess is higher - but on the bright side it’s a lot easier to fix.

It's risk avoidance taken to the point where the avoidance itself creates new kinds of risk. The idea that you can architect yourself out of failure modes so thoroughly that you no longer need backups is one I see every other week or so, and the number of companies out there that believe that, because they have redundancies, they don't need backups any more is staggering.

If the risk of the change was minimal, why would they not proceed?

How can you plan for things that occur outside your control? CF engineers are people too. Things like this happen, and there will be lessons to take from it (like how to fail over faster).


And they can't afford even a tiny bit of redundancy?

Big companies tend to defer risk. Managers and project leads want to start new projects rather than upgrade existing infrastructure. Combine these forces and sometimes you get a catastrophe.


Your risk increases with more moving parts that you introduce into your product. It does not decrease.

You're probably talking about redundancy, where, I agree, risk does go down.


The only way to optimize for lowest overall risk is to optimize for speed of change.

All the checklists in the world to prevent something from happening are fine and dandy until something happens anyway (which it will). And then they hamstring you from actually fixing it.

Instead, if you can move fast consistently, you can minimize the total downtime.


That's not the greatest risk to your project.

The obvious risk here is that you're combining a rush job with limited oversight and large-scale distribution. Any mistake that would have been caught by the existing process will instead have the potential to roll into a global disaster.

When you're doing a project like this, you want less risk, not more.

Changing the controller design would be a foolhardy addition of risk. Something very familiar is a great choice here.


The gamble is to have small, contained issues you can deal with in a timely manner, vs. full-scale propagated failures you'd have to deal with at the worst time ever.

It's like accidents during fire drills: they happen, yet the drills are worth doing, all things considered.


Thanks for your response. So it's all about designing the system to reduce the risk involved in system interactions?

That’s a business continuity risk. Generally you want to abstract the business logic from infrastructure (code in a separate system or in escrow).

There are also risks inherent in a more complicated system.

You can engineer a more complicated system with the goal of avoiding downtime, but this added complexity may end up with unexpected corner-cases and cause a net decrease in uptime, at least in the short term.

It's often better to concentrate on improving mean time to repair (MTTR).
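The MTTR point can be made concrete with the standard steady-state availability formula, availability = MTBF / (MTBF + MTTR). A sketch with hypothetical numbers, chosen only to show why cutting repair time helps:

```python
# Illustrative: steady-state availability from MTBF and MTTR.
# All numbers below are hypothetical, not from the comment.
def availability(mtbf_hours: float, mttr_hours: float) -> float:
    """Fraction of time the system is up, assuming steady-state operation."""
    return mtbf_hours / (mtbf_hours + mttr_hours)

# Same failure rate (one failure per ~30 days), but repair in 1 hour vs. 8:
print(availability(720.0, 8.0))  # ~0.989
print(availability(720.0, 1.0))  # ~0.9986
```

Shrinking MTTR improves uptime without adding the corner cases that extra redundancy machinery can introduce.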


And then there are three possibilities, and you failed to discern which applies:

1) thing is not essential, you saved the cost

2) a workaround that is inconvenient or costly exists, compare costs

3) you just caused an opportunity cost

In other words, you're breaking the process. Unless you're a process engineer and track both causes and results diligently, you should not do that. Ever.


I'm not a company myself, so it doesn't really make sense for me, although I do consider my parents' home a backup if I need to leave my house for some reason.

But my company does everything it can to avoid having a single point of failure, however unlikely it is to fail. A low probability still means it can happen, and you don't want to allow that if you can avoid it.
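"Low probability still means it can happen" compounds over time: a failure that is rare on any given day becomes likely over a year of exposure. A Monte Carlo sketch with hypothetical numbers (a made-up 0.1% daily failure chance):

```python
import random

# Illustrative Monte Carlo: how often does a "rare" daily failure (0.1%)
# strike at least once over a year? All numbers are hypothetical.
def saw_failure(p_daily: float, days: int, rng: random.Random) -> bool:
    """True if at least one failure occurred in the simulated period."""
    return any(rng.random() < p_daily for _ in range(days))

rng = random.Random(42)
trials = 10_000
hits = sum(saw_failure(0.001, 365, rng) for _ in range(trials))
print(hits / trials)  # close to the analytic value 1 - 0.999**365 ≈ 0.306
```

Roughly a 30% chance per year: low per-day odds are not a substitute for removing the single point of failure.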


Well, for some projects supply chain risks are better than no supply at all, so there's that.
