Staff

An Engineering Update on the Dragonflight Launch



The World of Warcraft Engineering Team has posted an update on the Dragonflight launch.

Blizzard (Source)

With Dragonflight’s recent launch behind us, we want to take some time to talk with you more about what occurred these past few days from an engineering viewpoint. We hope this will provide a bit more insight into what it takes to make a global launch like this happen, what can go right, what hiccups can occur along the way, and how we manage them.

Internally, we call events like last Monday “content launch,” because launching an expansion is a process, not one day. Far from being a static game running the same way it did eighteen years ago—or even two years ago—World of Warcraft is in constant change and growth, and our deployment processes change as well.

Expansions now consist of several smaller launches: first the code goes live running the old content, then pre-launch events and new systems turn on, and finally, on content launch day, new areas, quests, and dungeons open up. Each stage changes different things, which helps us find and fix problems. But in any large, complex system, the unexpected can still occur.

One change with this expansion was that the content launch was triggered by a timed event: multiple changes to the game can be scheduled to happen at a particular time. Making these changes manually carries the risk of human error or of an internal or external tool outage; using a timed event helps mitigate these risks.
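To make the idea concrete, here is a minimal Python sketch of a timed event: a bundle of changes applied together, exactly once, when the clock reaches a trigger time. It is an illustration under stated assumptions, not Blizzard’s implementation; the `TimedEvent` class, its fields, and the dictionary standing in for game state are all hypothetical.

```python
from dataclasses import dataclass, field
from typing import Callable

GameState = dict  # hypothetical stand-in for real game state


@dataclass
class TimedEvent:
    """Hypothetical bundle of game changes that fire together at one time."""

    trigger_time: float  # e.g. a Unix timestamp for content launch
    changes: list[Callable[[GameState], None]] = field(default_factory=list)
    fired: bool = False

    def maybe_fire(self, now: float, state: GameState) -> bool:
        """Apply every change once, the first time `now` reaches trigger_time."""
        if self.fired or now < self.trigger_time:
            return False
        for change in self.changes:
            change(state)
        self.fired = True
        return True


# Usage: both changes land on the same tick, with no manual steps.
state = {"boats_running": False, "dragon_isles_open": False}
launch = TimedEvent(
    trigger_time=1000.0,
    changes=[
        lambda s: s.update(boats_running=True),
        lambda s: s.update(dragon_isles_open=True),
    ],
)
launch.maybe_fire(999.0, state)   # too early: returns False, nothing changes
launch.maybe_fire(1000.0, state)  # fires: both flags become True
```

Because the changes are scheduled as data ahead of time, nobody has to apply them by hand at launch, which is the human-error risk described above. A production system would also have to decide what happens if one change fails partway through; this sketch ignores that.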

Another change in Dragonflight: greatly enhanced support for encrypting game data records. Encrypted records allow us to ship the client with the data the game needs to show cutscenes, share voice lines, or unlock quests, while keeping that data from being mined before players get to experience it in-game. We know the community loves WoW, and when you’re hungry to experience any morsel, it’s hard not to spoil yourself before the main course. Encrypted records allow us to take critical story beats and hide them from players until the right time to reveal them.

We now know that the lag and instability we saw last week were caused by the way these two systems interacted. Together, they forced the simulation server (which moves your characters around the world and performs their spells and abilities) to recalculate which records should be hidden more than one hundred times a second, per simulation. Because so much CPU time was spent on these calculations, the simulations became bogged down, and requests from other services to those simulation servers backed up. Players saw this as lag and as error messages like “World Server Down.”

As we discovered, records that stayed encrypted until a timed event unlocked them exposed a small logic error in the code: a misplaced line of code told the server that it needed to recalculate which records to hide, even though nothing had changed.
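The post doesn’t show the faulty code, but the failure mode it describes is a familiar one with dirty flags: an expensive cache is rebuilt whenever a “needs recalculation” flag is set, so a flag set in the wrong place forces a rebuild on every tick. The Python sketch below is hypothetical throughout (the `HiddenRecordCache` class, record IDs, and unlock times are invented for illustration); it contrasts a misplaced flag with one set only when a record actually unlocks.

```python
class HiddenRecordCache:
    """Hypothetical cache of which encrypted records are still hidden.

    Rebuilding the hidden set stands in for the expensive recalculation;
    it only happens when `_dirty` is set. Not Blizzard's code.
    """

    def __init__(self, unlock_times: dict[int, float]):
        self.unlock_times = unlock_times  # record id -> unlock time
        self.unlocked: set[int] = set()
        self._hidden: set[int] = set(unlock_times)
        self._dirty = False
        self.recompute_count = 0

    def tick(self, now: float, buggy: bool = False) -> None:
        newly_unlocked = {
            rid for rid, t in self.unlock_times.items()
            if t <= now and rid not in self.unlocked
        }
        if buggy:
            # Misplaced line: requests a recalculation on every tick,
            # whether or not anything was unlocked.
            self._dirty = True
        if newly_unlocked:
            self.unlocked |= newly_unlocked
            self._dirty = True  # correct place: only when state changed

    def hidden(self) -> set[int]:
        if self._dirty:
            self._hidden = set(self.unlock_times) - self.unlocked
            self.recompute_count += 1
            self._dirty = False
        return self._hidden


def simulate(buggy: bool) -> int:
    """Run 100 ticks, querying visibility each tick; return rebuild count."""
    cache = HiddenRecordCache({1: 5.0, 2: 50.0})
    for step in range(100):
        cache.tick(now=float(step), buggy=buggy)
        cache.hidden()
    return cache.recompute_count


print(simulate(buggy=True))   # 100: rebuilt on every tick
print(simulate(buggy=False))  # 2: rebuilt once per actual unlock
```

Multiply the buggy case by many records, many simulations, and a high tick rate, and the CPU cost described above follows.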

Here’s some insight into how that investigation unfolded. First, the clock strikes 3:00 p.m. PST. We know from testing that the Horde boat arrives first and the Alliance boat arrives next. Many of us are logged in to the game in one window, with characters waiting on the docks in both locations, while watching logs, graphs, or dashboards in other windows. We’re also on a conference call with colleagues from support teams all over Blizzard.

Before launch, we’ve created contingency plans for the situations our testing has raised concerns about. For example, for this launch, our designers have created portals that players can use to reach the Dragon Isles in case the boats fail to work.

At 3:02 p.m. the Horde boat arrives on schedule. Hooray! Players pile on, including some Blizzard employees. Other employees wait (they want to be test cases in case we must turn on the portals). The players on the boats sail off, and while some do arrive on the Dragon Isles, many more are disconnected or get stuck.

Immediately we start searching logs and dashboards. There are some players on the Dragon Isles map, but not many. Colleagues having issues report their character names and realms as specific examples. Others start reporting spikes in CPU load and in load on the NFS (Network File System) storage that our servers use. Still others are watching in-game and reporting what they see.

Now that we’ve seen the Horde boats, we start watching for the Alliance boats to arrive. Most of them don’t, and most of the Horde boats do not return.

A picture emerges: the boats are stuck, and Dragon Isles servers are taking much longer to spin up than expected. Here’s where we really dig in and start to problem solve.

Boats have been a problem in the past, so we turn on the portals while we continue investigating. Our NFS is clearly overloaded. The service responsible for coordinating the simulation servers has a large network queue, which makes it think simulations aren’t starting; so it launches more, and those start to overwhelm our hardware. We soon discover that the portals have made the overload worse, because players can click them as many times as they want, so we turn the portals off.
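The coordinator behavior described above is a feedback loop: if confirmations that a simulation has started are delayed by a queue, a coordinator that only counts confirmed simulations keeps launching replacements. Here is a hypothetical Python simulation of that loop, not Blizzard’s coordinator; `demand`, `confirm_delay`, and both policy functions are invented for illustration.

```python
def launches_counting_confirmed(demand: int, confirm_delay: int, ticks: int) -> int:
    """Naive policy: top up to `demand` using only *confirmed* simulations.

    Each launch is confirmed `confirm_delay` ticks after it happens, standing
    in for the queue delay. Returns the total number of simulations launched.
    """
    launched_at: list[int] = []
    for t in range(ticks):
        confirmed = sum(1 for start in launched_at if t - start >= confirm_delay)
        launched_at.extend([t] * max(0, demand - confirmed))
    return len(launched_at)


def launches_counting_pending(demand: int, ticks: int) -> int:
    """Safer policy: launches still awaiting confirmation count toward demand."""
    launched_at: list[int] = []
    for t in range(ticks):
        launched_at.extend([t] * max(0, demand - len(launched_at)))
    return len(launched_at)


print(launches_counting_confirmed(demand=10, confirm_delay=5, ticks=10))  # 50
print(launches_counting_pending(demand=10, ticks=10))                     # 10
```

With a five-tick confirmation delay, the naive policy launches five times the needed capacity before the first confirmations arrive. Counting pending launches caps the total, but on its own it would never replace a start that genuinely fails, so a real coordinator would pair it with a timeout.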

As the problems persist, we work on handling the increased load to get as many players into the game as possible, but the service isn’t behaving the way it did in pre-launch tests. We keep working the problem, ruling out causes that those tests show aren’t responsible.

Despite the late hour, many continue working while others go home to rest, so they can return early the next day with a fresh start and relieve those working overnight.

By Tuesday morning, we have a better understanding of the situation. We know we’re sending more quest-related messages to clients than usual, although later discoveries will reveal that this isn’t causing problems. A new file storage API we’re using is hitting our file storage harder than usual. Some new code that lets quest givers beckon players seems slower than it should be. The service is taking a very long time to send clients all the data changes made in hotfixes. And reports are coming in that players who have made it to the Dragon Isles are now experiencing extreme lag.

Mid-Tuesday morning, a coincidence happens: digging deep into the new beckon code, we find hooks for the new encryption system. We start looking at the problem from the other side: could a slow encryption system explain these and the other issues we’re seeing? As it turns out, yes. A slow encryption system explains the hotfix problem, the file storage problem, and the lag players are experiencing. With the source identified, the author of the relevant part of the system is able to pinpoint the error and make the needed correction.

Pushing a fix to code used across so many services isn’t like flipping a switch: new binaries must be pushed out and turned on, and we must gradually move players from the old simulations to new ones for the correction to take effect. In fact, at one point we try to move players too quickly and cause another part of the service to suffer. Some of the affected binaries can’t be corrected without a service restart, which we delay until the fewest players are online so we don’t disrupt people in the middle of playing. By Wednesday, the fix is fully deployed and service stability has improved dramatically.
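Moving players gradually amounts to throttling: migrate at most a fixed number per tick so downstream services never receive more than they can absorb. A minimal Python sketch of that idea follows; the function name and the per-tick cap are hypothetical, and the post doesn’t say how the moves were actually paced.

```python
from collections import deque
from typing import Iterator


def migration_batches(players: list[str], max_per_tick: int) -> Iterator[list[str]]:
    """Yield the players to move off old simulations on each tick.

    `max_per_tick` is a hypothetical cap sized to what downstream services
    can handle; moving players too quickly is what strained another part
    of the service in the account above.
    """
    if max_per_tick <= 0:
        raise ValueError("max_per_tick must be positive")
    waiting = deque(players)
    while waiting:
        take = min(max_per_tick, len(waiting))
        yield [waiting.popleft() for _ in range(take)]


for tick, batch in enumerate(migration_batches(["ana", "bo", "cy", "di", "ed"], 2)):
    print(tick, batch)
# 0 ['ana', 'bo']
# 1 ['cy', 'di']
# 2 ['ed']
```

A fixed cap is the simplest choice; an adaptive rate that backs off when downstream latency rises would be a natural refinement, though that goes beyond what the post describes.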

While it took some effort to identify the issue and get it fixed, our team was incredibly vigilant in investigating it and getting it corrected as quickly as possible. Good software engineering isn’t about never making mistakes; it’s about minimizing the chances of making them, finding them quickly when they happen, having the tools to get fixes in right away…

…and having an amazing team to come together to make it all happen.


—The World of Warcraft Engineering Team


  • Similar Content

    • By Stan
      The Mage Tower is available again in Patch 10.0.5, so we're looking at all the rewards you can unlock!
      1. Guardian Druid Bear Form
      Guardian Druids can get a fel-themed Bear Form after completing the tank challenge. This is a recolor of the White Bear Form from the Legion Artifact that was removed. All Druids on your account will have access to this form once unlocked.

      2. Soaring Spell Tome Mount
      You will receive the Mage-Bound Spelltome mount from the A Tour of Towers achievement, which requires you to complete every unique challenge available at the Mage Tower.
      In the forgotten depths of the Dalaran library, this oversized tome was found flapping madly around the room. Open the pages of this arcane-infused book and hitch a ride to magical adventure.

      The character stands on the book while riding the mount.

      3. Armor Sets
      The Mage Tower Sets are recolors of Mythic Tomb of Sargeras Tier 20 sets. Every class has a different set that you can check out below. Some classes have two variations due to multiple chest armor pieces. I'm not sure if Evokers get a unique set as I don't main one.
      Death Knight
      Demon Hunter
      Druid
      Evoker: ???
      Hunter
      Mage
      Monk
      Paladin
      Priest
      Rogue
      Shaman
      Warlock
      Warrior

      4. Tower Overwhelming Achievement
      Tower Overwhelming doesn't reward anything, but the achievement is quite the challenge for players who must complete every challenge at the Mage Tower across all classes and specializations.
    • By Stan
      The Lunar Festival has been updated in 2023 with the Elders of the Dragon Isles that you must earn for the meta-achievement. Check out our latest post to find out locations of the new Elders!

      For more details about the event, check out our Lunar Festival Guide. We've updated the guide for 10.0.5 and the changes should be deployed soon.
    • By Staff
      We have a bit of a dungeon availability switcheroo today, as Blizzard have fixed the Plaguefall issue and turned the dungeon back on, but also disabled Timewalking Court of Stars! 
      Dungeons (Source 1, 2)
      We’ve identified an issue causing some players to become trapped in the Legion Timewalking Court of Stars dungeon and we’ve temporarily removed the dungeon from the Timewalking dungeon finder. No one likes being stuck in Court.
      The Mythic+ version of the dungeon appears to be unaffected by this issue.
      We’ll let you know when we’ve got it fixed and opened again.

      ...
      With a hotfix, we’ve re-enabled the Plaguefall dungeon. The terrain issue some players were experiencing has been fixed.
      Thank you.
    • By Staff
      Blizzard have announced some big changes coming to Retribution Paladins with the next content patch, with a rebuild of the talent tree from the ground up! Check out all the details below:
      Retribution (Source)
      Well met!
      We’re doing a substantial rework to the Retribution talent tree in patch 10.0.7. Rebuilding the Retribution talent tree from the ground up means significant changes ahead. However, the talent tree itself will not be available in our first PTR build(s), so initial testing on the PTR will be missing important context.
      Nonetheless, we’d like to share our major goals for Retribution with you.
      Button bloat: Retribution Paladins currently have a fairly large number of abilities, and some of these feel unnecessary and inefficient. We’re working to tone this down and have a more focused set of abilities and cooldowns.
      Stacking modifiers: There are currently several talents and abilities that provide stacking bonuses, which can make for a confusing and messy playstyle that deals extreme burst damage, but in return leaves your core abilities feeling unsatisfying and under-tuned. We intend to significantly change this. We want your core abilities to feel good and powerful when used in all situations. This doesn’t mean we’re reducing all elements of burst damage, but we do want to make burst more purposeful and deliberate, while also allowing options that provide sustained damage.
      Survivability: Retribution Paladins currently have one of the highest death rates across all forms of content. For a plate-wearing class with hybrid healing, this doesn’t feel right. We’re looking to strengthen them through a mixture of passive bonuses and improvements to active abilities and cooldowns, while keeping the degree of challenge that results from being a melee-based spec.
      Maneuverability: We recognize that there’s an issue here and we hope to make some improvements. However, we want to set clear expectations. Retribution Paladins should not expect to gain the mobility of Rogues.
      Utility: The changes that Dragonflight brought to Retribution have caused some conflict for players, forcing the Ret Paladin to choose between providing benefits to their groups or benefiting themselves. This choice can be interesting, but due to the way this has historically worked for Paladins, it hasn’t felt right in Dragonflight. So, we’re looking to make changes here that allow you to feel good and improve your own damage while also providing the group benefits that you’re used to. We’re also looking into additional utility improvements for Paladins during the 10.0.7 test cycle.
      We’re looking forward to hearing your feedback on the points above, as well as anything else about Retribution that you’d like to share. Thank you!
    • By Starym
      Plenty of class fixes arrive in today's batch, as well as a change to Bolstering, items, quests, professions and more - including a Wrath Classic fix as well.
      January 27 (Source)
      Classes
        • Druid (Restoration): Fixed an issue that could cause Regrowth to sometimes not apply its direct healing after casting Nourish.
        • Monk (Mistweaver): Fixed an issue that caused Awakened Faeline and Ancient Concordance to fail to apply in some instances while multiple Faelines were present.
        • Paladin (Retribution): Lawbringer now correctly deals 8% of enemy maximum health in damage.
        • Rogue: The Outlaw 2pc set bonus now correctly interacts with Echoing Reprimand.
        • Warlock (Affliction):
            • Fixed an issue where Inevitable Demise was granting double benefit to Drain Life.
            • Fixed an issue where Malefic Affliction was not being removed when casting Unstable Affliction on another target.
            • Fixed an issue where Withering Bolt was exceeding the described damage increase cap when using Drain Soul.
            • Fixed an issue where Malefic Rapture and Seed of Corruption consumed multiple applications of Cruel Epiphany.
      Dungeons and Raids
        • Mythic+ Affixes: Bolstering no longer applies to players’ summons.
      Items and Rewards
        • The Ensemble: Tuskar Trader’s Leather Armor sold by Lontupit or Murik in Iskaara will no longer require Resilient Leather, which wasn’t showing on the vendor’s list of required items.
      Professions
        • Alchemy: Potion of Sacrificial Anima can no longer be used above level 60. The tooltip will reflect this in a future patch.
        • Engineering: The Magazine of Healing Darts embellishment will now only fire towards and be consumed by other players.
      Quests
        • Valdrakken: Players who have not yet started “Reviving the Machine” should now see its starting location displayed in Valdrakken’s map.
      Wrath of the Lich King Classic
      Classes
        • Death Knight: Sigil of the Vengeful Heart now increases Death Coil damage and Frost Strike damage by the correct amounts listed on the item’s tooltip.