Staff

Blizzard on Recent Diablo 2 Server Outages



Diablo 2 servers have had connection issues lately. Blizzard has explained the causes and how the Diablo 2 team is working on long-term fixes.

(Source)

Hello, everyone.

Since the launch of Diablo II: Resurrected, we have been experiencing multiple server issues, and we wanted to provide some transparency around what is causing these issues and the steps we have taken so far to address them. We also want to give you some insight into how we’re moving forward.

tl;dr: Our server outages have not been caused by a single issue; we are solving each problem as it arises, with both short-term mitigations and longer-term architectural changes. A small number of players have experienced character progression loss–moving forward, any loss due to a server crash should be limited to several minutes. We do not consider this a complete fix, and we are continuing to work on this issue. Our team, with the help of others at Blizzard, is working to bring the game experience to a place that feels good for everyone.

We’re going to get a little bit into the weeds here with some engineering specifics, but we hope that overall this helps you understand why these outages have been occurring and what we’ve been doing to address each instance, as well as how we’re investigating the overall root cause. Let’s start at the beginning.

The problem(s) with the servers:

Before we talk about the problems, we’ll briefly give you some context as to how our server databases work. First, there’s our global database, which exists as the single source of truth for all your character information and progress. As you can imagine, that’s a big task for one database, and it wouldn’t cope on its own. So to alleviate load and latency on our global database, each region–NA, EU, and Asia–has its own database that also stores your character’s information and progress, and your region’s database will periodically write to the global one. Most of your in-game actions are performed against this regional database because it’s faster, and your character is “locked” there to maintain the integrity of the individual character record. The global database also has a backup in case the main one fails.

With that in mind, to explain what’s been going on, we’ll be focusing on the downtimes experienced from Saturday, October 9 until now.

On Saturday morning Pacific time, we suffered a global outage due to a sudden, significant surge in traffic. This was a new threshold that our servers had not experienced at all, not even at launch. This was exacerbated by an update we had rolled out the previous day intended to enhance performance around game creation–these two factors combined overloaded our global database, causing it to time out. We decided to roll back that Friday update we’d previously deployed, hoping that would ease the load on the servers leading into Sunday while also giving us the space to investigate deeper into the root cause.

On Sunday, though, it became clear what we’d done on Saturday wasn’t enough–we saw an even higher increase in traffic, causing us to hit another outage. Our game servers were observing the disconnect from the database and immediately attempted to reconnect, repeatedly, which meant the database never had time to catch up on the work we had completed because it was too busy handling a continuous stream of connection attempts by game servers. During this time, we also saw we could make configuration improvements to our database event logging, which is necessary to restore a healthy state in case of database failure, so we completed those, and undertook further root cause analysis.
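The reconnect storm described above is a well-known failure pattern. The post does not say how it was fixed, so the following is only a minimal sketch of one common mitigation, exponential backoff with full jitter, and not Blizzard's actual code. The function name `backoff_delays` and its parameters are hypothetical.

```python
import random


def backoff_delays(attempts, base=0.5, cap=30.0, rng=None):
    """Return wait times (in seconds) for successive reconnect attempts.

    Hypothetical helper: each delay is drawn uniformly from
    [0, min(cap, base * 2**attempt)], so retries from many game servers
    spread out instead of hitting the database at the same instant.
    """
    rng = rng or random.Random()
    delays = []
    for attempt in range(attempts):
        ceiling = min(cap, base * (2 ** attempt))
        delays.append(rng.uniform(0.0, ceiling))
    return delays


if __name__ == "__main__":
    for i, delay in enumerate(backoff_delays(6)):
        print(f"attempt {i}: wait {delay:.2f}s before reconnecting")
```

The randomness matters as much as the growing ceiling: if every game server waited exactly the same doubled interval, they would still retry in lockstep.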

The double-edged sword of Sunday’s outage was that because of what we’d dealt with on Saturday, we had created what was essentially a playbook on how to recover from it quickly. Which was good.

But because we came online again so quickly in a peak window of player activity, with hundreds of thousands of games within tens of minutes, we fell over again. Which was bad.

So we had many fixes to deploy, including configuration and code improvements, which we deployed onto the backup global database. This leads us into Monday, October 11, when we made the switch between the global databases. This led to another outage, when our backup database was erroneously continuing to run its backup process, meaning that it spent most of its time trying to copy from the other database when it should’ve been servicing requests from servers. During this time, we discovered further issues, and we made further improvements–we found a since-deprecated-but-taxing query we could eliminate entirely from the database, we optimized eligibility checks for players when they join a game, further alleviating the load, and we have further performance improvements in testing as we speak. We also believe we fixed the database-reconnect storms we were seeing, because we didn’t see it occur on Tuesday.

Then on Tuesday, we hit another concurrent player high, with a few hundred thousand players in one region alone. This caused another incident of degraded database performance, the cause of which is currently being worked on by our database engineers. We also reached out to other engineers around Blizzard to work on smaller fixes as our own team focused on core server issues, and we reached out to our third-party partners for assistance as well.

Why this is happening:

In staying true to the original game, we kept a lot of legacy code. However, one legacy service in particular is struggling to keep up with modern player behavior.

This service, with some upgrades from the original, handles critical pieces of game functionality, namely game creation/joining, updating/reading/filtering game lists, verifying game server health, and reading characters from the database to ensure your character can participate in whatever it is you’re filtering for. Importantly, this service is a singleton, which means we can only run one instance of it in order to ensure all players are seeing the most up-to-date and correct game list at all times. We did optimize this service in many ways to conform to more modern technology, but as we previously mentioned, a lot of our issues stem from game creation.

We mention “modern player behavior” because it’s an interesting point to think about. In 2001, there wasn’t nearly as much content on the internet around how to play Diablo II “correctly” (Baal runs for XP, Pindleskin/Ancient Sewers/etc for magic find, etc). Today, however, a new player can look up any number of amazing content creators who can teach them how to play the game in different ways, many of them including lots of database load in the form of creating, loading, and destroying games in quick succession. Though we did foresee this–with players making fresh characters on fresh servers, working hard to get their magic-finding items–we vastly underestimated the scope we derived from beta testing.

Additionally, overall, we were saving too often to the global database: There is no need to do this as often as we were. We should really be saving you to the regional database, and only saving you to the global database when we need to unlock you–this is one of the mitigations we have put in place. Right now we are writing code to change how we do this entirely, so we will almost never be saving to the global database, which will significantly reduce the load on that server, but that is an architecture redesign which will take some time to build, test, then implement.
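The idea of saving to the regional database and writing to the global one only on unlock can be illustrated with a toy model. This is a sketch under assumptions, not the real architecture: the `Database` and `RegionalSession` classes, their methods, and the in-memory storage are all hypothetical.

```python
class Database:
    """Toy in-memory stand-in for a character database (hypothetical)."""

    def __init__(self):
        self.records = {}
        self.writes = 0  # counts how many save operations this database served

    def save(self, name, data):
        self.records[name] = dict(data)
        self.writes += 1


class RegionalSession:
    """Holds a character's lock in one region and defers global writes."""

    def __init__(self, name, regional, global_db):
        self.name = name
        self.regional = regional
        self.global_db = global_db
        self.locked = True  # character is locked to this region while playing

    def save_progress(self, data):
        # Frequent saves go to the regional database only.
        if not self.locked:
            raise RuntimeError("character is not locked to this region")
        self.regional.save(self.name, data)

    def unlock(self):
        # The global database is written once, when the lock is released.
        latest = self.regional.records.get(self.name)
        if latest is not None:
            self.global_db.save(self.name, latest)
        self.locked = False


if __name__ == "__main__":
    regional, global_db = Database(), Database()
    session = RegionalSession("Sorceress", regional, global_db)
    for level in range(1, 6):
        session.save_progress({"level": level})
    session.unlock()
    print("regional writes:", regional.writes, "global writes:", global_db.writes)
```

In this toy run, five progress saves produce five regional writes but only one global write, which is the load reduction the paragraph above describes.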

A note about progress loss:

The progress loss some players have experienced is due to the way we do character locks both in the regional and global databases–we lock your character in the global database when you are assigned to a region (for example, when you play in the US region, your character is locked to the US region, and most actions are resolved in the US region’s database.)

The problem was that during a server outage, when the database was falling over, a number of characters were becoming stuck in the regional database, and we had no way of moving them over to the global database. At that time, we believed we had two options: we either unlock everyone with unsaved changes in the global database, therefore losing some progress due to an overwrite that would occur in the global database, or we bring the game down entirely for an indeterminate amount of time and run a script to write the regional data to the global database.

At the time, we acted on the former: we felt it was more important to keep the game up so people could play, rather than take the game down for a long period of time to restore the data. We are deeply sorry to any players who lost important progress or valuable items. As players ourselves, we know the sting of a rollback, and feel it deeply.

Moving forward, we believe we have a way to restore characters that doesn’t lead to any significant data loss–it should be limited to several minutes of loss, if any, in the event of a server crash.

This is better, but still not good enough in our eyes.

What we are doing about it:

Rate limiting: We are limiting the number of operations to the database around creating and joining games, and we know this is being felt by a lot of you. For example, for those of you doing Pindleskin runs, you’ll be in and out of a game and creating a new one within 20 seconds. In this case, you will be rate limited at a point. When this occurs, the error message will say there is an issue communicating with game servers: this is not an indicator that game servers are down in this particular instance, it just means you have been rate limited to reduce load temporarily on the database, in the interest of keeping the game running. We can assure you this is just mitigation for now–we do not see this as a long-term fix.
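The post does not describe the rate-limiting algorithm in use. As a rough illustration only, here is a token bucket, one common way to limit operations such as game creation per player. The class name `TokenBucket` and the capacity and rate values are assumptions made for this sketch.

```python
class TokenBucket:
    """Allow bursts of up to `capacity` actions, refilled at `rate` per second."""

    def __init__(self, capacity, rate, now=0.0):
        self.capacity = capacity
        self.rate = rate
        self.tokens = float(capacity)
        self.last = now

    def allow(self, now):
        """Return True if an action at time `now` (seconds) is permitted."""
        # Refill based on elapsed time, never above capacity.
        elapsed = max(0.0, now - self.last)
        self.tokens = min(self.capacity, self.tokens + elapsed * self.rate)
        self.last = now
        if self.tokens >= 1.0:
            self.tokens -= 1.0
            return True
        return False


if __name__ == "__main__":
    # Hypothetical numbers: a burst of 3 games, then one new game per 40 seconds.
    bucket = TokenBucket(capacity=3, rate=1 / 40)
    for t in range(0, 200, 20):  # a new game every 20 seconds, as in fast runs
        status = "created" if bucket.allow(float(t)) else "rate limited"
        print(f"t={t:>3}s: {status}")
```

A player who creates games faster than the refill rate uses up the initial burst and is then refused until tokens accumulate again, which matches the behavior players see as a misleading "issue communicating with game servers" message.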

Login Queue Creation: This past weekend was a series of problems, not the same problem over and over again. Due to a revitalized playerbase, the addition of multiple platforms, and other problems associated with scaling, we may continue to run into small problems. To diagnose and address them swiftly, we need to make sure the “herding”–large numbers of players logging in simultaneously–stops. To address this, we have people working on a login queue, much like you may have experienced in World of Warcraft. This will keep the population at the safe level we have at the time, so we can monitor where the system is straining and address it before it brings the game down completely. Each time we fix a strain, we’ll be able to increase the population caps. This login queue has already been partially implemented on the backend (right now, it looks like a failed authentication in the client) and should be fully deployed in the coming days on PC, with console to follow after.
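To make the login queue idea concrete, here is a minimal sketch of a population cap with a first-in, first-out waiting line. It is an illustration under assumptions, not the actual implementation: the `LoginQueue` class and its method names are hypothetical.

```python
from collections import deque


class LoginQueue:
    """Admit players up to a population cap; hold the rest in FIFO order."""

    def __init__(self, cap):
        self.cap = cap
        self.online = set()
        self.waiting = deque()

    def login(self, player):
        """Return True if admitted immediately, False if placed in the queue."""
        if len(self.online) < self.cap and not self.waiting:
            self.online.add(player)
            return True
        self.waiting.append(player)
        return False

    def logout(self, player):
        self.online.discard(player)
        self._admit()

    def raise_cap(self, new_cap):
        # Mirrors the plan to raise the cap each time a strain is fixed.
        self.cap = max(self.cap, new_cap)
        self._admit()

    def _admit(self):
        # Move waiting players online, oldest first, while there is room.
        while self.waiting and len(self.online) < self.cap:
            self.online.add(self.waiting.popleft())


if __name__ == "__main__":
    queue = LoginQueue(cap=2)
    for name in ["amazon", "barbarian", "necromancer"]:
        print(name, "admitted" if queue.login(name) else "queued")
    queue.logout("amazon")
    print("online:", sorted(queue.online), "waiting:", list(queue.waiting))
```

Because admission is gated by a single cap, operators can watch where the system strains at a known population and raise the cap in steps rather than absorbing a sudden herd of logins.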

Breaking out critical pieces of functionality into smaller services: This work is both partially in progress for things we can tackle in less than a day (some have been completed already this week) and also planned for larger projects, like new microservices (for example, a GameList service that is only responsible for providing the game list to players). Once critical functionality has been broken down, we can look into scaling up our game management services, which will reduce the amount of load.

We have people working incredibly hard to manage incidents in real-time, diagnosing issues, and implementing fixes–not just on the D2R team, but across Blizzard. This game means so much to all of us. A lot of us on the team are lifelong D2 players–we played during its initial launch back in 2001, some are part of the modding community, and so on. We can assure you that we will keep working until the game experience feels good to us not only as developers, but as players and members of the community ourselves.

Please continue to submit your feedback to the Diablo II: Resurrected forum, report your bugs to our Bug Report forum, and for troubleshooting assistance, visit our Technical Support forum. Thank you for your ongoing communication with us across all channels–it’s invaluable to us as we work on these issues.

The Diablo community team will keep you updated on our progress via the forums.

  • The Diablo II: Resurrected Dev Team

I was a bit sad about losing over 30 levels and most of my endgame gear on Sunday evening, but it's not like my character got hit with a rollback on purpose. This is still my favorite game, and it can all be recovered in due time. I do appreciate the fact they're letting us know what's going on!

Edited by Draketh

Woah. This is actually a REALLY nice transparent post by the team. Makes me hopeful they plan to truly fix this. I thankfully haven't lost much of anything in these outages, and as long as it's cleaned up before ladder launch I'm okay with it.


Dev Teams always get appreciation from me if they just tell it like it is. I think most rational people are more forgiving of this when it's explained.

9 hours ago, Laragon said:

Woah. This is actually a REALLY nice transparent post by the team. Makes me hopeful they plan to truly fix this. I thankfully haven't lost much of anything in these outages, and as long as it's cleaned up before ladder launch I'm okay with it.

This is pretty much the reason they held ladder back, so they could fix all issues that came up at launch before they started the competitive side of it all.

Also really great to see this sort of communication, perhaps the WoW team could take notes?


I was kind of expecting that after the whole nightmare with WC3 Reforged; honestly, I'm just surprised these problems didn't start right on day 1. Communication with the player base is commendable, sure, and I can see improvements lately in that direction, but then again, issues like that shouldn't be happening in the first place. They should prevent problems from happening instead of waiting until they actually occur. Putting in minimum resources for maximum profit is every company's plan, I guess, but what about the hours upon hours of lost game time? People having huge rollbacks is discouraging; after all, it's not an F2P game, for God's sake.


Really interesting to see what's going on, and this shows that it's more complex than simply "Fix your damn 20 Year old game".

These kinds of posts make it much more understandable for us.

