Shout Out to the Server Teams (stevestreza.com)
73 points by siglesias | 2012-05-16 04:52:32+00:00 | 44 comments




I'm very curious what it is in Diablo that's causing the issues. It seems a bit odd to me that they're experiencing issues this severe for two reasons:

1) Diablo was developed from the fairly unusual position of knowing it would face a launch to millions of users before development ever started. I'd have imagined having this "scale it to infinity" mentality from the starting line would have helped a lot.

2) The whole game is conceptually VERY easy to shard. Unlike WoW, there is very little interaction between players that are not in a party together, and the maximum party size is four.
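To make the sharding point above concrete, here is a minimal sketch of routing whole parties to shards with a stable hash. It assumes a hypothetical `shard_for_party` helper, a made-up `NUM_SHARDS` value, and string party IDs; none of this reflects anything known about Blizzard's actual architecture.

```python
import hashlib

NUM_SHARDS = 64  # hypothetical shard count, chosen only for illustration


def shard_for_party(party_id: str, num_shards: int = NUM_SHARDS) -> int:
    """Map a party ID to a shard index using a stable hash.

    Every player in the same party resolves to the same shard, so no
    cross-shard traffic is needed for ordinary gameplay.
    """
    digest = hashlib.sha256(party_id.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % num_shards


def group_by_shard(party_ids, num_shards: int = NUM_SHARDS):
    """Group party IDs by the shard they would be routed to."""
    shards = {}
    for party_id in party_ids:
        shards.setdefault(shard_for_party(party_id, num_shards), []).append(party_id)
    return shards
```

Anything that crosses party boundaries, such as friend-list broadcasts, would not fit this scheme and would need a separate path.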

I wonder whether the failures have anything to do with the achievement tracking/broadcasting. It's the only component I can think of that breaks out of these obvious sharding boundaries, and I can kind of imagine how large friend lists might cause problems. Additionally, it seems achievement progress was lost from the time leading up to one of the downtimes.

I know it's easy to speculate from here, and there are probably very legitimate reasons for all of this. As an outsider, it seems like these particular failures are things that, in general, just happen. Still, I would have expected Blizzard in particular to be better prepared for this. It's a bit surprising.


No idea if this holds true, but I did just see this on Reddit: http://i.imgur.com/efw0N.png

I saw that too. I tend to mistrust anything originating on 4chan, but it would explain the authentication server blowing up.

I'm looking forward to reading (hoping they release) a post-mortem on this.


It's odd to me too. They had a great deal of pre-order information, something that most other server teams that face crazy launches don't ever have. I would guess that they could at least estimate the load to an order of magnitude. The fact that single player can't even be accessed is absurd, something that your comment on sharding hits on. The server teams at Blizzard should also have had some experience with this in their WoW trials and errors.

All it takes is one cluster falling over due to one unexpected issue stemming from unprecedented, more-or-less unreproducible simultaneous load to take a service down, especially if it removes the ability to log in.

Distributed services under very heavy load are susceptible to all the same small bugs due to all the normal mistakes every developer makes, except it's much easier for those bugs to cause catastrophic failures!


I know all about servers being under so much load that everything falls apart. Working at a nascent mobile ad network whose traffic doubles every month, and whose monthly number of requests amounts to 10 figures, I know all about it.

And yet… I feel like Blizzard could have made an effort to make its single player game run offline. The multiplayer is fantastic, but give us something to fall back on.


this!

It's essentially the Ubisoft route (Assassin's Creed 2 et al.) that everybody was incredibly critical of, and they have actually removed it by now.

But when golden boy Blizzard pulls shit like that, suddenly everybody is like "Awwww, yeah Diablo!" and I see only a few criticisms of the always-online system (compared to Ubisoft).

One of the reasons I heard for the missing offline single player was "Cheating vs. Real Money Auctionhouse". This seems fallacious to me. Just provide an offline mode that is completely separate from the online part and also separate from the Auctionhouse.

shrug


Blizzard very intentionally made it online only to combat piracy. Not only that, but I read they have units, loot, and map layouts generated server side to make it much more difficult for the crackers to release something playable. The servers are doing a lot of work, so it's not surprising in the slightest that they are having lots of launch day issues.

It's a pretty sad case of putting business concerns over user experience. If it wasn't an anti-piracy thing they would have happily made an offline mode because it would drastically reduce the server load and all the support and development costs that massive multiplayer games have.


> The servers are doing a lot of work, so it's not surprising in the slightest that they are having lots of launch day issues.

If it's not surprising in the slightest, then they should have planned to scale.


It is probably too expensive (development-wise) to scale for this many concurrent users, as it will only happen once (or twice in case of expansions) during the whole lifetime of the game.

"Expensive" is incurring a tremendous amount of bad PR for a game on launch day. "Expensive" is having the launch of one of your flagship games be forever used as the butt of a joke. "Expensive" is turning your own customer base against you by having people who took days off work or stayed up until well after midnight to play one of the most anticipated games of the last decade being thwarted in their attempts. "Expensive" is people deciding to put off their purchases, perhaps forever, because they see the problems people are having. "Expensive" is having your brand reputation dented to such a degree that it affects future sales of all of your games.

That's expensive.

Compared to that, servers are cheap.


Ya but sometimes you just don't know who's in charge of making these decisions. I wouldn't be surprised if someone came out and said this was a calculated business decision. Nothing surprises me anymore.

What do you think this incident reads like in Vivendi's annual report? I'm thinking "We made a few more mountains of money with the enormously successful release of Diablo 3. Fans love it and monetization is six times previous records for the series on a per-copy-sold basis or 200 times higher per copy played."

WoW also had launch issues. Players complained. Money hats were made.


Money hats: Blizzard is making them right now.

But yes, if anyone realistically thinks these server issues significantly alter sales, they must have also forgotten the SC2, WoW, and Diablo II launches.


In the near term Blizzard isn't going to be going out of business, nor is it going to have a shortage of money hats. But make no mistake, this is a serious issue and it has tarnished their reputation. They still have plenty of excess reputation at the moment but if they continue to take a cavalier attitude towards customer satisfaction then there will be another incident like this, and another, and another, until it really starts hurting their bottom line in a way they can't ignore.

Did it really tarnish their reputation when SC2, WoW, or Diablo II launched? It's such a transient thing that I seriously suspect that they could do this forever and never affect sales in the slightest.

I can't stress enough how I admire and respect server/dev ops people. Their job is among the hardest and people definitely overlook their importance way too often. I wouldn't even know how I would go about finding them.

I kind of take issue with the "No amount of load testing could adequately prepare the server team behind Diablo 3 for firepower of this magnitude".

If your load-testing does not prepare you for the worst then your load-testing plan is garbage.

They already knew how many copies had been pre-ordered and could make a pretty good guess how many copies would be sold and activated on the first night. Take that worst case estimate, now double it and test for that.
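As a rough illustration of the "take the worst case, then double it" rule, here is a minimal sketch. The function name `load_test_target`, its parameters, and the sample figures are all hypothetical; they are not Blizzard's real numbers.

```python
import math


def load_test_target(preorders: int,
                     expected_launch_sales: int,
                     activation_rate: float = 1.0,
                     safety_factor: float = 2.0) -> int:
    """Return the number of simultaneous users a load test should cover.

    worst case    = (pre-orders + guessed launch-day sales) * activation rate
    test target   = worst case * safety factor (2.0 means "double it")
    """
    if not 0.0 <= activation_rate <= 1.0:
        raise ValueError("activation_rate must be between 0 and 1")
    if safety_factor < 1.0:
        raise ValueError("safety_factor must be at least 1")
    worst_case = (preorders + expected_launch_sales) * activation_rate
    return math.ceil(worst_case * safety_factor)


# Hypothetical figures: 1,000,000 pre-orders plus 500,000 launch-day sales,
# everyone activating on the first night, doubled for safety.
print(load_test_target(1_000_000, 500_000))  # 3000000
```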

"But the cost of supporting worst-case scenarios!!" some may cry. This is where rented servers / cloud setups are useful for elastic scale without breaking the bank.

There are companies that can simulate load from users across the globe. I have no doubt that Blizzard would have the connections, influence, and cash to set up a kick-arse load-test system.


Can I ask why this got downvoted?

I'd think there are some discussion-worthy points here. For instance, I'm not proud of it but at a former employer I talked them out of load testing an app, which then had serious scaling issues. I learned my lesson the hard way.


Behaviorally there will be differences between how your users act and how your load test plan is executed. A small difference in the two can cause problems you didn't expect. That's the point I was trying to make.

For some things I can agree with you, usually web-apps with many possible branching execution paths where it is hard to know exactly what will be the most common use case to test for.

However in this case what can the users do that can be different to their plan? The most basic example would be logging in, there are not that many execution paths I can think of for that one.

As for the integration / reporting during gameplay, this is all done via a strictly defined API which is called by the game client in relatively predictable ways (game started, player does X, authentication ping every x seconds, achievement unlocked). Unlike a web-app where users can do whatever they want, the API usage flow is basically controlled by Blizzard.

That is why I am not impressed with Blizzard on this one; they control basically the entire use of this API apart from one thing, the number of users trying to use it at any one time, which is where the load testing plan should have worked.
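The argument above, that a client-controlled API produces predictable call sequences, can be sketched as a tiny synthetic session generator. The call names (`login`, `game_started`, `auth_ping`, `logout`) and the 30-second interval are assumptions made for illustration, not the real Battle.net API.

```python
from typing import Iterator, Tuple


def client_session(duration_s: int,
                   heartbeat_every_s: int = 30) -> Iterator[Tuple[int, str]]:
    """Yield (seconds since start, API call name) for one simulated client."""
    yield (0, "login")
    yield (0, "game_started")
    # Authentication ping at a fixed interval, as the game client would send.
    for t in range(heartbeat_every_s, duration_s + 1, heartbeat_every_s):
        yield (t, "auth_ping")
    yield (duration_s, "logout")


def total_calls(num_clients: int, duration_s: int,
                heartbeat_every_s: int = 30) -> int:
    """Total API calls a load test must issue for identical clients."""
    per_client = sum(1 for _ in client_session(duration_s, heartbeat_every_s))
    return num_clients * per_client
```

Because the sequence is fixed, the only free variable left in the test plan is `num_clients`, which is the point being made.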


Shout out to the dev teams who create buggy services that crash all the time. You keep the server teams employed.

Really? How shallow is that thinking? I'm a sysadmin and most of our problems are not related to buggy software.

Another sysadmin reporting in. OP doesn't have a clue.

This whole episode has been a massive face plant for Blizzard.

Consider: they have access to all of the sales data so they know how many copies of the game could potentially be played at launch.

They ran an open beta so they should have a good idea of how everything scales relative to total simultaneous user count.

They have extensive experience with all aspects of patching, scaling, and server operations through World of Warcraft and Starcraft II.

They intentionally decided to go with a "single player" experience that required connectivity and incurred server load.

Given all of that, there really is no excuse to fail as hard as they did on launch day. It is 2012, the standards are pretty high for getting things right with digital distribution and with online games. More so, if you make a bold decision to force connectivity for a single player game you damned sure better get it right or you are going to destroy your credibility.

Blizzard is enormously lucky that they have a very strong history of compelling games; these sorts of issues could easily cause an upstart game studio to go out of business.


Couldn't they have sold more digital copies than anticipated? I know I didn't preorder (not like I was worried about stock outs or anything), just bought it in the morning when it came out.

> This whole episode has been a massive face plant for Blizzard.

It was a stupid decision and I'm sure there were many angry fans, but if I've learned anything from spending too much time on Reddit for the past couple of years it's that people will forget this pretty quickly.

Take Football Manager 2009 as an example. In order to halt piracy Sports Interactive added an online authenticator for all installs, and to cut a long story short for the best part of three days the authentication server was down due to load, and the only people playing the game happily were the growing number of people who pirated it. After a month or two, it was completely forgotten.

Yes, if a poor developer makes mistakes they'll suffer, but Blizzard is loved by all and all will be forgotten when the fans get their hands on sweet, sweet Diablo action...


Given how much people love Blizzard, I wonder if they really care /that/ much about launch day issues?

They have massive amounts of experience in this field, I'm sure they had the capability to make launch day run much smoother, so why didn't it? Perhaps they thought 'unprecedented demand for new game forces it temporarily offline' sounds like a nice headline in the paper.

It's just one day after all, and all the people who play on launch day have already spent their money, and probably aren't really the type to get a refund.


People's love of Blizzard makes it worse, not better, when there are problems.

I was at the London launch event Monday night (working, not buying). The first person in the queue had been camped there since Saturday lunchtime. He said that by the time he got his copy signed at 11pm he had time to get a train home, get to work with ten minutes to spare, work a twelve hour shift, then go home and spend the night playing the game.

A lot of people care a lot about games like this, and that includes how soon they can play it.


In an MMO, a day can make a big difference in who gets the best stuff first. So it can change the experience of hardcore players.

FWIW: I loved Blizzard in the past. I didn't buy into SC recently. I didn't like the decisions taken, stripping one good game into three, adding weird rules to the TOS that just cry out "I want to milk fans" (think: LAN games, tournaments).

For WoW they were stoned for a while and considered a real name policy in their Battle.net system.

Now Diablo comes along. Blizzard isn't special for me anymore and always online games get -100 points by default. For now I guess I'll pass (with SC I don't regret it one bit) and go back to Warcraft III again.

Blizzard of today is to me what the 'new' Star Wars episodes were: Bringing joy to lots of people, but disappointing for me.


Yea, SRE people are just fine. But whichever decision maker decided it was a good idea to run a single player game online just needs to get a clue. And don't worry, people who pirate will use a server emulator as they've done for every previous such protection.



My favorite Blizzard launch story actually involves Microsoft.

Years ago, before the days of the cloud and well-understood failover mechanisms, a very enterprise-y product happened to share datacenter space with Blizzard. One fine day, Blizzard shipped an update to WoW and, from what I hear, it took down networking across the DC and left everyone scrambling.

Try explaining to your customers that your business critical service just went down because Azeroth got a new continent.


Strange to see them called "server teams". Devops - maybe. Devs - someone's got to fix the actual code issues. Ops - if it's a platform configuration issue. But whenever I read "server team", I'm thinking of the DC ops racking the actual hardware.

Is it a common name for devops in other companies?


I wasn't writing this for the Hacker News audience; I wrote it so it could be understood by anyone. In this context I used "server team" to refer to the umbrella of devops (which is opaque industry jargon).

+1 for server teams! sysadmins never get appreciated enough.

However, as far as the Diablo 3 launch goes, some thoughts:

+ Blizzard have been doing this for years

+ They know how many people have pre-ordered and pre-installed the game

+ The game is singleplayer, yet they decided upon this online requirement (no offline play, thanks Blizzard)

All in all it's pretty annoying to purchase something and not be able to play it because they simply haven't upgraded their infrastructure for the load they should have expected.


Does anyone have any information on what their infrastructure looks like? What do they use to manage their servers? I guess a lot of this information would be "proprietary" but scaling something this large would be a great read. I manage about 1000 non-critical (think kiosks) servers and deploy the code to them, and it is relatively painless. I would be curious to know how the big boys do things.
