Long-time AWS customer Zynga is making great use of Amazon Aurora and other AWS database services. In today's guest post you can learn about how they use Amazon Aurora to accommodate spikes in their workload. This post was written by Chris Broglie of Zynga.
-Jeff;
Zynga has long operated various database technologies, ranging from simple in-memory caches like Memcached, to persistent NoSQL stores like Redis and Membase, to traditional relational databases like MySQL. We loved the features these technologies offered, but running them at scale required lots of manual time to recover from instance failure and to script and monitor mundane but critical jobs like backup and recovery. As we migrated from our own private cloud to AWS in 2015, one of the main objectives was to reduce the operational burden on our engineers by embracing the many managed services AWS offered.
We're now using Amazon DynamoDB and Amazon ElastiCache (Memcached and Redis) widely in place of their self-managed equivalents. Now, engineers are able to focus on application code instead of managing database tiers, and we've improved our recovery times from instance failure (spoiler alert: machines are better at this than humans). But the one component missing here was MySQL. We loved the automation Amazon RDS for MySQL offers, but it relies on general-purpose Amazon Elastic Block Store (EBS) volumes for storage. Being able to dynamically allocate durable storage is great, but you trade off having to send traffic over the network, and traditional databases suffer from this additional latency. Our testing showed that the performance of RDS for MySQL just couldn't compare to what we could obtain with i2 instances and their local (though ephemeral) SSDs. Provisioned IOPS narrow the gap, but they cost more. For these reasons, we used self-managed i2 instances wherever we had really strict performance requirements.
However, for one new service we were developing during our migration, we decided to take a measured bet on Amazon Aurora. Aurora is a MySQL-compatible relational database offered through Amazon RDS. Aurora was only in preview when we started writing the service, but it was expected to become generally available before production, and we knew we could always fall back to running MySQL on our own i2 instances. We were naturally apprehensive of any new technology, but we had to see for ourselves if Aurora could deliver on its claims of exceeding the performance of MySQL on local SSDs, while still using network storage and providing all the automation of a managed service like RDS. And after 8 months of production, Aurora has been nothing short of impressive. While our workload is fairly modest – the busiest instance is an r3.2xl handling ~9k selects/second during peak for a 150 GB data set – we love that so far Aurora has delivered the necessary performance without any of the operational overhead of running MySQL.
An example of what this kind of automation has enabled for us was an ops incident where a change in traffic patterns resulted in a huge load spike on one of our Aurora instances. Thankfully, the instance was able to keep serving traffic despite 100% CPU usage, but we needed even more throughput. With Aurora we were able to scale up the reader to an instance that was 4x larger, failover to it, and then watch it handle 4x the traffic, all with just a few clicks in the RDS console. And days later after we released a patch to prevent the incident from recurring, we were able to scale back down to smaller instances using the same procedure. Before Aurora we would have had to either get a DBA online to manually provision, replicate, and failover to a larger instance, or try to ship a code hotfix to reduce the load on the database. Manual changes are always slower and riskier, so Aurora's automation is a great addition to our ops toolbox, and in this case it led to a resolution measured in minutes rather than hours.
Most of the automation we're enjoying has long been standard for RDS, but using Aurora has delivered the automation of RDS along with the performance of self-managed i2 instances. Aurora is now our first choice for new services using relational databases.
- Chris Broglie, Architect (Zynga)
No comments:
Post a Comment