What could have been done to prevent the SimCity capacity issues.
My day job, in a nutshell.
Test the heck out of it!
Most of us know that play testing and beta testing give a developer some of the most valuable data about how a game will be received. While most of that feedback concerns the functionality of the game, a beta usually includes a stress test phase as well. This is the all-important performance test that developers hope will show how their game holds up under load. However, as has been stated around the web, that test is flawed. Not only is it limited to the size of the beta tester pool, but the developer also has no way of getting every tester to participate. For that reason, the beta stress test should be only one of the tests performed.
There are a number of tools available that assist with automated performance testing. The key thing to understand about these tools is that they are not a direct representation of a user. In other words, you don’t need a workstation with the game running on it for every virtual user you want to simulate in your test. Instead, the tools simply mimic the traffic that the workstation would exchange with the server. The tool doesn’t need to run the game to do this. It just needs to understand how the game communicates. Once that is done, a tester can run hundreds of virtual users from a single machine, each one simulating a person logging into the game and playing it. IP spoofing can be used to make the server think the traffic is coming from multiple sources.
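To make the idea of a virtual user concrete, here is a minimal sketch in Python. Everything in it is an assumption for illustration: the one-line `LOGIN` message is a made-up protocol, and the tiny local server is a stand-in for a real game server. It only shows that one machine can open many connections that each speak the protocol, with no game client involved.

```python
import asyncio


async def handle_client(reader, writer):
    # Stand-in for the game server (hypothetical protocol):
    # reply OK to any line that starts with "LOGIN ".
    line = await reader.readline()
    writer.write(b"OK\n" if line.startswith(b"LOGIN ") else b"ERR\n")
    await writer.drain()
    writer.close()
    await writer.wait_closed()


async def virtual_user(user_id, host, port):
    # One virtual user: no game client, just the network conversation.
    reader, writer = await asyncio.open_connection(host, port)
    writer.write(f"LOGIN user{user_id}\n".encode())
    await writer.drain()
    reply = await reader.readline()
    writer.close()
    await writer.wait_closed()
    return reply == b"OK\n"


async def run_load_test(num_users):
    # Port 0 asks the OS for any free local port.
    server = await asyncio.start_server(handle_client, "127.0.0.1", 0)
    port = server.sockets[0].getsockname()[1]
    async with server:
        results = await asyncio.gather(
            *(virtual_user(i, "127.0.0.1", port) for i in range(num_users))
        )
    # Number of virtual users that logged in successfully.
    return sum(results)


def simulate_logins(num_users):
    return asyncio.run(run_load_test(num_users))
```

Calling `simulate_logins(50)` runs fifty of these virtual users concurrently from a single process. A real tool would replay the game's actual traffic and record response times and errors instead of just counting successful logins.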
The ultimate goal is to use both of the approaches above to determine whether the current infrastructure can support the expected usage. From EA’s point of view, the goal would be to predict how many users would log in on day 1. To do that, they would first need to know how many pre-orders they have in place. Next, they would need to estimate the percentage of users who do not pre-order but purchase the game on day 1. Since SimCity was a brand new game, they would only be able to use statistics from another AAA title they already have, perhaps something like Crysis 3. If it turns out that Crysis had a 50/50 split, then they take the number of pre-orders for SimCity, double it, and use the automation tool described above to see how their servers behave at that volume. While it might not be 100% accurate, it will give everyone a general sense of how close the application is to supporting high-end volumes.
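The arithmetic behind that estimate is simple enough to write down. The sketch below assumes the split from the reference title is expressed as the share of day-1 players who pre-ordered; the function name and the pre-order figure in the usage note are hypothetical and not actual sales numbers.

```python
def estimate_day_one_users(preorders, preorder_share):
    """Scale the pre-order count up to a total day-1 estimate.

    preorder_share is the fraction of day-1 players who pre-ordered,
    taken from a comparable earlier title (an assumption, not a fact).
    """
    if not 0 < preorder_share <= 1:
        raise ValueError("preorder_share must be in the range (0, 1]")
    return round(preorders / preorder_share)
```

With a made-up count of 500,000 pre-orders and a 50/50 split, `estimate_day_one_users(500_000, 0.5)` returns 1,000,000, which is the "double it" case described above.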
Once that test is complete, the game should go into beta and the stress test should occur. Even if that stress test only reaches 10% of the volume that was run through the automation tool, the behavior of the system should help validate whether the simulation used for the test was accurate. If it turns out that the average user plays the game differently than expected, the simulation needs to be updated to reflect that, and the test re-run with the updated pre-order numbers as of that point. The last test should increase the volume by at least 20% to check the worst-case scenario. Once all of this is done, a level of certainty can be established about how the game will perform.
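Those last two steps can be sketched the same way. The code below is only an illustration: the action names, the idea of describing a simulation as a "mix" of actions, and both function names are assumptions made for this example. The first function measures how far the simulated behavior is from what beta players actually did, and the second adds the headroom for the final worst-case run.

```python
def mix_drift(simulated_mix, observed_counts):
    """Largest gap between simulated and observed action shares.

    simulated_mix maps action -> share of traffic used in the simulation.
    observed_counts maps action -> number of times beta players did it.
    """
    total = sum(observed_counts.values())
    actions = set(simulated_mix) | set(observed_counts)
    return max(
        abs(simulated_mix.get(a, 0.0) - observed_counts.get(a, 0) / total)
        for a in actions
    )


def worst_case_target(expected_users, headroom_percent=20):
    # Integer arithmetic, rounded up, so the target never undershoots.
    return -(-expected_users * (100 + headroom_percent) // 100)
```

For example, if the simulation assumed 60% of traffic was building but beta players only spent 40% of their actions on it, `mix_drift` reports a gap of about 0.2, a hint that the script should be reworked before the next run. For an expected 1,000,000 users, `worst_case_target(1_000_000)` gives a final test volume of 1,200,000.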