AWS Cloud Financial Management
How Coinbase Built a Cloud Center of Excellence to Optimize their Cloud Costs on AWS
Dr. Adam Link, Engineering Manager, Coinbase Cloud Center of Excellence and Schalk Theron, Director of Engineering, Coinbase Cloud Foundations contributed to this blog.
Introduction and Challenges
Coinbase is a secure online platform for buying, selling, transferring, and storing cryptocurrency. Their mission is to create an open financial system for the world and to be the leading global brand for helping people convert crypto into and out of their local currency.
In 2022, Coinbase sought to optimize their cloud computing costs as part of its scaling strategy to support the next 1 billion users globally. Coinbase launched a strategic initiative internally with a goal to achieve a reduction in costs in six months across their cloud vendors, including AWS, through financial and technical optimizations. This optimization would allow Coinbase to reinvest back into the business and innovate in new areas for their customers, such as increased reliability, lower latency, and increased localization of Coinbase as it grows its global footprint.
Defining Business Goals
The Coinbase Director of Infrastructure, Schalk Theron, led the project to ensure executive alignment across the company. He started by implementing spending guardrails to reduce inefficiencies and tracked optimization progress each week until the targets were met. The team identified the need for product-level cost visibility and developed internal tools, such as an Amazon QuickSight dashboard, to provide deeper insight into cloud spending patterns.
Each business unit empowered a representative, called a Product Group Cost DRI (directly responsible individual), to achieve the specific cost reduction goal assigned to that unit. Each of these representatives worked directly with a newly formed Coinbase Cloud Center of Excellence (CCoE). This combination of centralized and decentralized approaches enabled the reduction in cloud spending.
Evolution of the Coinbase CCoE
During the implementation of its cost reduction project, Coinbase realized that its use of AWS resources was based on outdated best practices given how long it had been operating. Recognizing outdated architecture as technical debt, which impacted its ability to ship the highest quality products quickly, Coinbase formed a CCoE with a mission to align Coinbase’s infrastructure with the latest guidelines from the AWS Well-Architected Framework.
The CCoE, led by Engineering Manager Dr. Adam Link, addressed Coinbase-wide cost optimization. Each business unit within Coinbase was assigned a dedicated CCoE engineer, typically a seasoned senior or staff-level software engineer with a strong cloud background, who worked to understand that unit’s particular workloads and apply the AWS Well-Architected Framework. Although the Cost Optimization pillar of the Well-Architected Framework was crucial to success, other pillars such as Performance Efficiency also played an essential role. Coinbase’s CCoE focused on translating AWS cost guidance into actionable steps that product teams could implement to improve overall efficiency.
The CCoE played a crucial role in driving change within the organization by producing code and configuration artifacts for the product teams and explaining specific changes and their impacts. They collaborated with the product teams on implementation and testing to ensure that Coinbase’s key services continued to function as expected. They also built tooling and made suggestions for centralized cost optimization, including Compute Savings Plans, Amazon Elastic Compute Cloud (Amazon EC2) Reserved Instances, and Amazon EC2 Capacity Reservations. While individual teams were able to achieve their own successes, the CCoE team, in partnership with their AWS Account Team, was able to assist them along the way and make centralized changes to infrastructure, resulting in new cost-optimized standards for Coinbase services as a whole.
Meeting Structure and Cadence
Coinbase divided the project’s tactical components into three phases: kick-off, easy wins, and sustained performance. During the kick-off phase, the CCoE and Coinbase’s AWS Account Team collaborated to identify technical cost savings through Amazon EC2 rightsizing, Amazon Relational Database Service (Amazon RDS) reconfiguration, and optimization of AWS managed services. In the easy wins phase, the CCoE partnered with AWS Professional Services and implemented centralized configuration changes that impacted the entire infrastructure. Lastly, during the sustained performance phase, AWS Professional Services and the CCoE worked with individual Coinbase business units to optimize application configurations for the cloud.
Coinbase’s AWS Account Team partnered with the CCoE to implement several optimizations early on. In the sustained performance phase, AWS brought in Specialist Solutions Architects for services such as Amazon EC2, Amazon RDS, Amazon ElastiCache, Amazon CloudWatch, Amazon Simple Storage Service (Amazon S3), and Amazon Managed Streaming for Apache Kafka (Amazon MSK) to achieve even greater company-wide cost optimization.
Leveraging AWS Cost Optimization Services
Coinbase used many AWS cost optimization tools to track and analyze its costs and build an optimization strategy. AWS Trusted Advisor provided recommendations for cost optimization and rightsizing, and the AWS Trusted Advisor Organizational View gave Coinbase insight across all of its AWS accounts. Trusted Advisor provided actionable suggestions for services like Amazon EC2, Amazon S3, Amazon Elastic Block Store (Amazon EBS), Elastic Load Balancing (ELB), and more.
Coinbase used AWS Compute Optimizer to provide an initial set of recommendations for Amazon EC2 rightsizing, which the CCoE used first to establish company-wide Amazon EC2 guidelines. Further service-specific recommendations were then carried out by Coinbase’s business units, guided by the CCoE.
AWS Cost Explorer helped Coinbase’s finance and CCoE teams answer complex budget questions and address queries about cost anomalies and service-level trends. To give its engineers a more Coinbase-specific and easily understood view of service costs, AWS Professional Services implemented an Amazon QuickSight dashboard that uses Amazon Athena to process the AWS Cost and Usage Report (CUR). This custom dashboard enabled Coinbase engineers to approach the cost optimization program effectively and use cost allocation tagging to identify workload-specific trends and patterns.
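As a rough illustration of the kind of query a CUR dashboard could sit on, the sketch below builds an Athena SQL statement that sums unblended cost per service and per cost allocation tag value. It is not Coinbase's implementation. The table name `cur_db.cur_table`, the tag key `team`, and the helper `build_cost_by_tag_query` are hypothetical, and the column names assume the common CUR-to-Athena naming (`line_item_unblended_cost`, `line_item_product_code`, `resource_tags_user_<key>`, string-typed `year` and `month` partitions), which should be checked against the actual report schema.

```python
import re


def build_cost_by_tag_query(table: str, tag_key: str, year: str, month: str) -> str:
    """Return an Athena SQL string summing CUR cost by service and tag value.

    Hypothetical helper: the table and tag key are placeholders, and the
    column names assume the usual CUR-to-Athena schema.
    """
    # Accept only simple identifiers so the values can be embedded in SQL safely.
    if not re.fullmatch(r"[a-z_][a-z0-9_]*(\.[a-z_][a-z0-9_]*)?", table):
        raise ValueError(f"unexpected table name: {table!r}")
    if not re.fullmatch(r"[a-z0-9_]+", tag_key):
        raise ValueError(f"unexpected tag key: {tag_key!r}")
    if not (year.isdigit() and month.isdigit()):
        raise ValueError("year and month must be numeric strings")

    # Assumption: user-defined cost allocation tags appear as
    # resource_tags_user_<key> columns in the Athena table.
    tag_column = f"resource_tags_user_{tag_key}"
    return (
        f"SELECT line_item_product_code AS service,\n"
        f"       {tag_column} AS tag_value,\n"
        f"       SUM(line_item_unblended_cost) AS unblended_cost\n"
        f"FROM {table}\n"
        f"WHERE year = '{year}' AND month = '{month}'\n"
        f"GROUP BY line_item_product_code, {tag_column}\n"
        f"ORDER BY unblended_cost DESC"
    )


if __name__ == "__main__":
    print(build_cost_by_tag_query("cur_db.cur_table", "team", "2022", "9"))
```

The resulting string could be run in the Athena console or used as the basis for a QuickSight dataset; grouping by a tag column is what makes workload-specific trends visible, which is why consistent cost allocation tagging matters.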
Delivering Results
These combined efforts resulted in significant cost optimizations across various AWS services.
For Amazon EC2, instance types were optimized to match Coinbase’s workloads, leading to better price-performance. Coinbase re-engineered their tooling to become a multi-architecture company that can run workloads on Intel and Arm processors, with 25% of the Coinbase fleet running on AWS Graviton instances today. Additionally, Amazon CloudWatch Detailed Monitoring was enabled on certain EC2 instances to fine-tune rightsizing, autoscaling, and new instance type selections.
For Amazon RDS, single instances were migrated to Amazon Aurora instances, leveraging Amazon Aurora Auto Scaling and enabling Coinbase to utilize scale-down capabilities during low-load periods. Coinbase was also able to migrate their Amazon RDS instances to take advantage of the price-performance advantages of AWS Graviton processors.
For Amazon S3, Coinbase used lifecycle policies to migrate objects to different storage tiers based on their access frequency to reduce storage costs. It enabled Amazon S3 Intelligent-Tiering as the default storage option across the company, allowing objects to be automatically stored in appropriate storage classes for their expected access patterns over time and ensuring cost-effective storage without additional overhead.
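To make the lifecycle approach concrete, here is a minimal sketch of a lifecycle configuration that transitions objects to the S3 Intelligent-Tiering storage class. The post does not show Coinbase's actual policies, so the rule ID, prefix, and day count below are illustrative assumptions, and `intelligent_tiering_lifecycle` is a hypothetical helper. The dictionary follows the shape expected by the S3 `PutBucketLifecycleConfiguration` API.

```python
import json


def intelligent_tiering_lifecycle(rule_id: str = "default-intelligent-tiering",
                                  prefix: str = "",
                                  after_days: int = 0) -> dict:
    """Build a lifecycle configuration that moves objects to Intelligent-Tiering.

    Hypothetical helper with illustrative defaults; an empty prefix applies
    the rule to every object in the bucket.
    """
    if after_days < 0:
        raise ValueError("after_days must be zero or positive")
    return {
        "Rules": [
            {
                "ID": rule_id,
                "Status": "Enabled",
                "Filter": {"Prefix": prefix},
                "Transitions": [
                    {"Days": after_days, "StorageClass": "INTELLIGENT_TIERING"}
                ],
            }
        ]
    }


if __name__ == "__main__":
    config = intelligent_tiering_lifecycle()
    print(json.dumps(config, indent=2))
    # With boto3 installed and credentials configured, the configuration
    # could be applied to a bucket (bucket name is a placeholder):
    #   boto3.client("s3").put_bucket_lifecycle_configuration(
    #       Bucket="example-bucket", LifecycleConfiguration=config)
```

A lifecycle rule like this affects objects already in a bucket as well as new ones; writing new objects directly with the Intelligent-Tiering storage class at upload time is an alternative way to make it the default.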
For Amazon MSK, AWS Professional Services helped to horizontally scale in and vertically scale down Coinbase’s Amazon MSK deployment.
Additional cost savings were achieved through Amazon CloudWatch and AWS CloudTrail rightsizing; rightsizing and upgrading Amazon ElastiCache clusters for improved Redis performance; Amazon EBS volume optimization by standardizing on gp3 volumes; and switching to Amazon DynamoDB provisioned capacity for high-IO workloads.
Conclusion
Coinbase was successful in achieving a significant reduction in cloud costs through a strategic collaboration with AWS and by leveraging the model of a CCoE. Within a six-month period, Coinbase was able to optimize their business and implement best practices to prepare for expected future growth. The process demonstrated that strategic cloud spend management can provide a competitive advantage — allowing for reinvestment of savings into the business and opening doors to innovation and new products for their customers. With cost savings realized, and an updated cloud architecture consistent with the latest AWS Well-Architected Framework, Coinbase can now expand globally to reach its next 1 billion users with the necessary reliability and scalability.