
It's ok. Don't worry. I lost $250k during Covid because of similar lack of due diligence.

I made most of it back YOLO-ing TQQQ, but like you, I was having cold sweats for several nights.

Thankfully, there is so much money in the tech industry that you can still recover from mistakes like this.




Your CEO was correct. He should have said the same thing to the guy who cancelled backups, and to the guy who never put a disaster recovery plan in place and periodically tested it. So much fail in this story, but mistakes happen and I've had my share as well.

I once (nah, twice) left a piece of credit card processing code in "dev mode", and it wasn't caught until a day later, initially costing the company over 60k. Though they were able to recover some of the money, getting the loss down to 20k. Sheesh.


The contracts absolve NASDAQ of liability in calamities of this nature (my business was almost destroyed -- had to take a 250K loss -- thanks to an order entry server failure), but it doesn't make you feel better.

Having had something similar happen to me, I know how much it sucks. You accept it's a possibility, but you don't really believe it will happen until it does.


My feels go out to the ops folk at Cloudflare. Mistakes happen no matter how many years of experience people bring or how much they're paid. We're all human, after all. It must be a high-pressure job to be responsible for potentially millions of dollars of losses during this downtime.

I hope the issue is resolved soon and if a person caused it, they're not in too much trouble.


Don't feel bad about this. The decision to launch without backups was a legitimate business gamble. Blaming an engineer for the outcome of a business decision is plain bad management.

Back in the day, a $440M loss due to a coding error was a landmark warning case. How could this happen??

In 2021 alone, something like $10B was lost to bugs in DeFi land.

Somehow the worst possible thing tends to happen eventually, and it gets worse with every passing year.


A perfect storm for the company I was working for. They had their worst fourth quarter in company history, then purchased another company in January as part of a pivot to a more profitable business model. They were a bit desperate to get the deal done and probably didn't even consider what was occurring in China at the time, despite having significant operations in Southeast Asia. As far as I knew, zero planning had taken place until it was too late.

Essentially all revenue-generating operations halted as of March 16th.

I was, unfortunately, leading the charge in a newly created division. We were not generating revenue yet, so the hammer fell on all of us.

Stay happy, stay safe, and keep hacking!


I feel terrible for the people who worked on Boxopus, but I find it hard to believe this very outcome wasn't evaluated during basic risk management.

There has been no shortage of companies built 100% on someone else's platform, or of hard lessons about what happens when that access is turned off.


That's a fair question. A loss of a couple hundred thousand US dollars during an 8-hour window in which a poorly performing SQL query in the payment pipeline caused purchases to hang. Indexes are great things... if you use them.
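
To illustrate the point (a minimal sketch, not the actual schema or query from that incident - the table and column names are made up), here's how a missing index turns an equality lookup into a full scan, using Python's built-in sqlite3:

    import sqlite3

    # Hypothetical payments table, queried by order_id during checkout.
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE payments (order_id INTEGER, amount_cents INTEGER)")
    conn.executemany("INSERT INTO payments VALUES (?, ?)",
                     [(i, 100) for i in range(100_000)])

    # Without an index, this lookup is a full table scan.
    print(conn.execute("EXPLAIN QUERY PLAN SELECT * FROM payments "
                       "WHERE order_id = ?", (42,)).fetchall())

    # With an index, the same lookup becomes a B-tree search.
    conn.execute("CREATE INDEX idx_payments_order_id ON payments (order_id)")
    print(conn.execute("EXPLAIN QUERY PLAN SELECT * FROM payments "
                       "WHERE order_id = ?", (42,)).fetchall())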

They lost $440 million (an amount greater than their market cap), and possibly the company, on what the world knows to be incompetence.

At some point, if I couldn't stop it, I'd be tempted to just kill the power to the server rooms, all of them. There just has to be a way to cut your losses.


It is hard to measure the potential damage. Did we lose users/customers? We don't know. We certainly didn't get any German devs on board. Did we lose investors? We got ourselves a great investment ($500K), but it may hurt future rounds. This is a "better safe than sorry" decision in the end, I think.

No big deal -- their backup was severely unprepared to handle live transactions. It's not like these guys leaked private information or otherwise lost people money.

It was partially noted in past threads, and partially missed, but this 20-year-old didn't actually lose the money he thought he lost. It was basically a "leaky abstraction" of the backend that bubbled up. Their total lack of any real customer support, since it's not "scalable", also contributed to this tragic and avoidable death.

I wish I could agree with you but the details provided in the post do not help us understand what happened.

We only get very superficial information from one of the rare companies that could typically contribute and help the community by sharing what really went wrong.

Right now, I'm in a situation that forces me to speculate (in addition to reading all the speculative comments below) on whether or not I could make the same mistake that SO did, and that terribly saddens me.


Devs are fine. We lost all our QE.

Yes, it's inexcusable. But if you've already dropped $1300+, do yourself a favor and drop $13 to spare yourself a couple of weeks of inevitable downtime.

That's what happens when you join a trillion-dollar company. Somebody lets it slip that the service might not be reliable and suddenly you lose a billion dollars on the stock market.

Let me get this straight.

- Tens of thousands of paying customers

- No backups

- Working in a production database

- Having the permissions to empty that table

- Even having read access to that table with customer info...

You are hardly responsible. Yeah, you fucked up badly, but everyone makes mistakes. This was a big-impact one and it sucks, but the effect it had was in no way your fault. The worst-case scenario should have been two hours of downtime and 1-day-old data being put back in the table, and even that could have been prevented easily with decent management.
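
To make that concrete, here's a minimal sketch of a daily snapshot using Python's built-in sqlite3 backup API. The file names are made up, and a real production setup would use whatever backup tooling its database provides, but even something this simple caps the loss at roughly one day of data:

    import sqlite3
    from datetime import date

    # Illustrative only: copy the whole database once a day so that the
    # worst case after an accidental DELETE is restoring yesterday's
    # snapshot. File names here are hypothetical.
    SOURCE_DB = "production.db"
    BACKUP_DB = f"backup-{date.today().isoformat()}.db"

    src = sqlite3.connect(SOURCE_DB)
    dst = sqlite3.connect(BACKUP_DB)
    with dst:
        src.backup(dst)  # online, consistent copy of the source database
    src.close()
    dst.close()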


Yikes.

But I wouldn't place too much blame on the companies - COVID wrecked a lot of supply chains as well - as anyone who runs emulators on RPis will be able to confirm.


No one gave me a figure, but the product going down has a direct impact on customer revenue, so even without hard numbers, it was felt quite a bit. The most notable fallout from the incident was that certain account managers in Japan had to make formal mea culpas because of the mistake. In other words, they were on the calls, while all I got back was "you can't have this happen again - do something." Could someone else at least do code review? I was discouraged to hear a lot of "well, we don't know the code base", and the issue also got lost in the shuffle with management. So I just grew more pessimistic about process and risk. That's not healthy, and I would not bottle it up now, but I did then, and that was my mistake. I ended up hating that project, but at least I got to train someone else to work on it (someone who has a team around him).
