Maybe the server was responding with a 200, but something deeper in the service just wasn't working. I expect these things are complicated and a status page is just an approximation.
Yes, but in that case they took at least 40 minutes to notify customers and post on their status blog... I had to contact them on live chat to find out what was happening...
And they could have a script that posts directly to the status page, something like "A lot of our servers are down, we are working on it, please be patient", as soon as they detect a problem. That wouldn't take long...
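For what it's worth, here is a rough sketch of what such a script's decision step might look like. The function name `auto_notice`, the list of probe results and the 50% threshold are all made up for illustration; I have no idea how their monitoring actually works.

```python
from typing import List, Optional

# Hypothetical holding message, taken from the wording suggested above.
HOLDING_MESSAGE = (
    "A lot of our servers are down, we are working on it, please be patient."
)


def auto_notice(probe_ok: List[bool], threshold: float = 0.5) -> Optional[str]:
    """Return a holding message when too many probes fail, else None.

    probe_ok holds one boolean per monitored server (True = healthy).
    threshold is an assumed cut-off: the fraction of failures that
    triggers an automatic post.
    """
    if not probe_ok:
        return None  # no data, so post nothing
    failed_fraction = probe_ok.count(False) / len(probe_ok)
    if failed_fraction >= threshold:
        return HOLDING_MESSAGE
    return None
```

Whatever posts the message to the status page would just call this on every monitoring cycle and publish the result if it is not None. A human can replace the holding message with real details later.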
I personally do not believe this page, because I've looked at it in the past when Reddit was down and the status showed just that. However, once whatever was wrong got fixed and their system came back up, all those error statuses just disappeared and availability went back to 100% (or whatever) like nothing ever happened.
What's the point of keeping track of these things if downtime is going to be quietly sanitised away after the fact?
Yeah, that status page is really frustrating. Didn't anyone consider building it to REALLY check whether services were working? Not just checking that they return a 200 status... but actually DOING things?
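To make "actually DOING things" concrete, here is a minimal sketch of a check that does a write/read/delete round trip instead of trusting a 200. The `store` argument is a stand-in (anything dict-like) and `deep_check` is a name I made up; a real check would go against the actual storage or API.

```python
import uuid


def deep_check(store) -> bool:
    """Write a canary value, read it back, then clean up.

    Returns True only if the whole round trip works. `store` is assumed
    to behave like a dict; it is a placeholder for a real backend.
    """
    key = f"healthcheck-{uuid.uuid4()}"  # unique key so checks never collide
    try:
        store[key] = "ping"
        ok = store[key] == "ping"
        del store[key]
        return ok
    except Exception:
        # Any failure along the way counts as "not working".
        return False


class BrokenStore(dict):
    """Hypothetical backend that accepts connections but cannot write."""

    def __setitem__(self, key, value):
        raise IOError("disk unavailable")
```

With this, `deep_check({})` passes, while `deep_check(BrokenStore())` fails even though the object itself is still "up", which is exactly the case a plain status-code check misses.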
To be fair my information was not accurate. It was fast but when I said it was a problem with our "backbone" I was wrong (it was a networking problem but not the backbone). I favour speed over accuracy here, but the status page wants to be fast and accurate.
I think we should applaud this when it actually happens. Far too many services are terrible at this. While I'd rather the service not be down, it makes me feel a bit better if they don't lie about it.
> Status page was throwing 500's in the first 10 minutes of outage but seems to have recovered.
It's still doing that.
Serving static images (and who knows what else) with "Cache-Control: max-age=1" is probably not a good idea if a million frustrated customers are going to reload your status site every minute.
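A sketch of the alternative, assuming the site can pick headers per path: long lifetimes for static assets, short ones for the status content itself. The suffix list and both max-age values are my guesses, not anything taken from their setup.

```python
from pathlib import PurePosixPath

# Assumed set of file types that rarely change on a status site.
STATIC_SUFFIXES = {".png", ".jpg", ".gif", ".svg", ".css", ".js"}


def cache_control_for(path: str) -> str:
    """Pick a Cache-Control header value for a request path."""
    suffix = PurePosixPath(path).suffix.lower()
    if suffix in STATIC_SUFFIXES:
        # Static assets: let browsers and proxies keep them for a day.
        return "public, max-age=86400"
    # Status content: keep it fresh, but still absorb reload storms.
    return "public, max-age=30"
```

So `/img/logo.png` would be cached for a day, while the status page itself gets refetched at most every 30 seconds per client.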