Amazon Web Services (AWS) has significantly changed the IT world, and more and more companies are adopting AWS to host their QA, Dev, Stage, and Prod workloads. As these workloads grow, so does the monthly AWS spend, and companies need to re-examine their cloud strategy to bring these costs down. In this five-part blog series, I will share a client use case and explain how we reduced their monthly AWS spend several-fold.
Our client, one of the nation’s biggest online deals platforms with a presence in 35+ cities, enables customers and merchants to discover and engage with each other, whether for fine dining, relaxing at top spas, or exploring and traveling around the city. The platform also gives merchants a strong branding channel that helps customers in and around their establishments easily discover their businesses.
In the early days, the client’s infrastructure setup and maintenance were handled by the application team without a DevOps perspective, which cost the client enormous time, effort, and money. The client therefore wanted to standardize its infrastructure components, implement scalable solutions, and automate wherever possible to reduce operational and maintenance costs.
The client engaged TO THE NEW as a trusted DevOps service provider to help reduce its infrastructure costs. The client’s ecosystem follows a microservices architecture with 50+ services, and every service consisted of a load balancer, two application instances with auto scaling, and a three-node MongoDB replica set. Our application stack is mainly Java, but we had also adopted Node.js and AngularJS for our frontend apps. For data storage, we mostly used NoSQL databases (MongoDB and Elasticsearch), with PostgreSQL for transactional workloads. Apart from these technologies, we use Apache Kafka for event-based notifications and HBase for the preference engine.
How we identified the scope of Cost Optimizations
The business requirement was fairly straightforward: the client wanted a web application that could handle high traffic and scale independently. The engineering team therefore decided to go ahead with a microservices architecture, as it gives individual teams the flexibility to scale services and develop new features independently.
At the beginning of our engagement, we implemented an open-source monitoring tool, because Amazon CloudWatch does not report memory and disk utilization by default. After analyzing its reports for the previous 15-21 days, we gained much better insight into the existing infrastructure. A few stats from the report are given below:
- Average CPU utilization across all instances was below 25%, with a maximum of 35%
- Average memory utilization across all instances was below 35%, with a maximum of 43%
- Average disk utilization was about 20-25%
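The stats above can be turned into a rough rightsizing estimate. The sketch below is a hypothetical back-of-the-envelope calculation, not the tooling we used: it assumes load is spread evenly across instances and scales linearly, treats the most constrained resource at peak as the binding one, and uses 75% (the lower end of the target band described next) as the target. The 100-instance fleet size is purely illustrative.

```python
import math
from collections.abc import Mapping


def required_capacity_fraction(peak_utilization: Mapping[str, float], target: float) -> float:
    """Return the fraction of today's capacity needed so that the busiest
    resource (CPU, memory, ...) peaks at `target` utilization.

    Utilizations are fractions in (0, 1]; assumes load scales linearly and is
    spread evenly across instances.
    """
    if not 0 < target <= 1:
        raise ValueError("target must be in (0, 1]")
    binding = max(peak_utilization.values())  # the resource that runs out first
    return binding / target


def instances_needed(current: int, peak_utilization: Mapping[str, float], target: float) -> int:
    """Round up so the consolidated fleet never exceeds the target at peak."""
    return math.ceil(current * required_capacity_fraction(peak_utilization, target))


# Peak figures from the report above; the fleet size of 100 is hypothetical.
peaks = {"cpu": 0.35, "memory": 0.43}
print(instances_needed(100, peaks, target=0.75))  # 58: memory is the binding resource
```

Memory, not CPU, is the binding constraint here, so sizing on CPU alone would overstate the achievable savings. Disk is left out because the disk figure measures storage used rather than load.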
Based on the stats above, we built our cost optimization roadmap around a cloud-aggressive approach (by cloud-aggressive, I mean running all AWS resources at 75-80% utilization and scaling automatically once that threshold is crossed). In the roadmap, we divided the entire cost optimization activity into four steps based on their cost savings. These four steps are outlined below:
- Ways to Cut Cost with Infrastructure Monitoring: The monitoring system gave us detailed insights into our infrastructure, which helped us identify underutilized and idle resources. Monitoring became the key factor in the entire cost optimization effort.
- How we generated 30% cost savings on Application Hosting with AWS ECS Migration: Every new service required two application servers with auto scaling, and adding 3-4 new services every month meant adding 8-12 EC2 instances, which kept increasing the monthly AWS bill. To solve this never-ending problem, we experimented with Kubernetes and eventually chose AWS ECS.
- MongoDB Consolidation and RDS optimizations helped save 76% of our database spend: Because most of the infrastructure had been provisioned by the application team, it lacked basic kernel, resource-limit, and MongoDB tuning. After reviewing the MongoDB utilization reports, we started categorizing MongoDB deployments by throughput and availability requirements: high-throughput systems were kept in a separate cluster, while all internal application databases were consolidated onto the same cluster, since the business did not mind slight downtime for internal applications.
- Save 80% on Dev, QA, Stage and Test Environment Cost on AWS with Spot Instances: As discussed earlier, we migrated all our applications in QA, Stage, and Prod to AWS ECS. After reading the blog “Powering your Amazon ECS Clusters with Spot Fleet”, we experimented with Spot Instances, and within 30 days we were using them to power our ECS and Elasticsearch clusters, which resulted in huge cost savings.
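The monitoring step above boils down to classifying instances from their reported utilization. A minimal sketch, assuming per-instance peak CPU and memory figures have already been exported from the monitoring tool; the 5% and 40% thresholds and the instance IDs are hypothetical.

```python
from dataclasses import dataclass

# Hypothetical thresholds in percent; tune them for your own workloads.
IDLE_CPU_PCT = 5.0
UNDERUSED_PCT = 40.0


@dataclass(frozen=True)
class InstanceStats:
    instance_id: str
    peak_cpu_pct: float  # highest CPU utilization in the reporting window
    peak_mem_pct: float  # highest memory utilization in the reporting window


def classify(stats: InstanceStats) -> str:
    """Label an instance 'idle', 'underutilized' or 'ok' from its peak usage."""
    if stats.peak_cpu_pct < IDLE_CPU_PCT:
        return "idle"  # barely doing any work: candidate for termination
    if max(stats.peak_cpu_pct, stats.peak_mem_pct) < UNDERUSED_PCT:
        return "underutilized"  # candidate for downsizing or consolidation
    return "ok"


def report(fleet: list[InstanceStats]) -> dict[str, list[str]]:
    """Group instance IDs by their classification."""
    groups: dict[str, list[str]] = {"idle": [], "underutilized": [], "ok": []}
    for stats in fleet:
        groups[classify(stats)].append(stats.instance_id)
    return groups


fleet = [
    InstanceStats("i-app-1", peak_cpu_pct=3.0, peak_mem_pct=22.0),
    InstanceStats("i-app-2", peak_cpu_pct=35.0, peak_mem_pct=38.0),
    InstanceStats("i-db-1", peak_cpu_pct=80.0, peak_mem_pct=60.0),
]
print(report(fleet))
```

Idle detection looks only at CPU because even an idle instance keeps some memory in use for the OS and agents.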
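The ECS migration step pays off because containers from many services can share hosts instead of each service owning two dedicated instances. The sketch below illustrates that effect with a first-fit-decreasing bin-packing heuristic. It is not how ECS schedules tasks internally (ECS offers its own placement strategies, including `binpack`), the task and host sizes are hypothetical, and it ignores spreading a service's tasks across hosts for availability.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Task:
    name: str
    cpu: int      # ECS-style CPU units (1024 units = 1 vCPU)
    mem_mib: int  # memory reservation in MiB


@dataclass
class Host:
    free_cpu: int
    free_mem_mib: int
    tasks: list[str] = field(default_factory=list)

    def fits(self, task: Task) -> bool:
        return task.cpu <= self.free_cpu and task.mem_mib <= self.free_mem_mib

    def place(self, task: Task) -> None:
        self.free_cpu -= task.cpu
        self.free_mem_mib -= task.mem_mib
        self.tasks.append(task.name)


def pack(tasks: list[Task], host_cpu: int, host_mem_mib: int) -> list[Host]:
    """First-fit decreasing: place each task (largest first) on the first host
    with room, opening a new host only when none fits."""
    hosts: list[Host] = []
    for task in sorted(tasks, key=lambda t: (t.mem_mib, t.cpu), reverse=True):
        if task.cpu > host_cpu or task.mem_mib > host_mem_mib:
            raise ValueError(f"{task.name} does not fit on an empty host")
        target = next((h for h in hosts if h.fits(task)), None)
        if target is None:
            target = Host(host_cpu, host_mem_mib)
            hosts.append(target)
        target.place(task)
    return hosts


# Hypothetical: 4 new services, each running 2 tasks of 0.5 vCPU / 1 GiB,
# packed onto hosts with 2 vCPU / 8 GiB.
tasks = [Task(f"svc{s}-task{i}", cpu=512, mem_mib=1024) for s in range(4) for i in range(2)]
hosts = pack(tasks, host_cpu=2048, host_mem_mib=8192)
print(f"dedicated instances: {len(tasks)}, shared hosts: {len(hosts)}")
```

In this illustration, eight one-task-per-instance servers collapse onto two shared hosts, with CPU as the limiting resource on each host.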
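The categorization rule in the MongoDB step can be written down as a small planning function. A minimal sketch, assuming each database is described by a peak operations-per-second figure and whether it serves internal (downtime-tolerant) applications; the 5,000 ops/s threshold, the database names, and the choice to keep customer-facing databases off the shared cluster are all hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Database:
    name: str
    peak_ops_per_sec: int
    internal: bool  # internal apps can tolerate brief downtime


def plan_clusters(dbs: list[Database], high_throughput_ops: int) -> dict[str, list[str]]:
    """Keep high-throughput or customer-facing databases on a separate cluster;
    consolidate low-traffic internal databases onto one shared cluster."""
    plan: dict[str, list[str]] = {"separate": [], "shared": []}
    for db in dbs:
        if db.peak_ops_per_sec >= high_throughput_ops or not db.internal:
            plan["separate"].append(db.name)
        else:
            plan["shared"].append(db.name)
    return plan


dbs = [
    Database("deals", peak_ops_per_sec=12000, internal=False),
    Database("admin-panel", peak_ops_per_sec=200, internal=True),
    Database("reports", peak_ops_per_sec=800, internal=True),
]
print(plan_clusters(dbs, high_throughput_ops=5000))
```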
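The Spot Instances step is easiest to reason about as blended cost. The sketch below compares an all-On-Demand fleet with one that keeps a small On-Demand baseline and runs the rest on Spot. The hourly prices are hypothetical placeholders, not actual AWS rates, and 730 hours is the usual monthly approximation. Real Spot prices vary and Spot Instances can be interrupted, which is why the sketch keeps an On-Demand baseline.

```python
HOURS_PER_MONTH = 730  # common approximation (24 * 365 / 12)


def monthly_cost(on_demand: int, spot: int, on_demand_hourly: float, spot_hourly: float) -> float:
    """Monthly cost of a fleet mixing On-Demand and Spot instances."""
    return (on_demand * on_demand_hourly + spot * spot_hourly) * HOURS_PER_MONTH


def savings_pct(before: float, after: float) -> float:
    """Percentage reduction from `before` to `after`."""
    return 100 * (before - after) / before


# Hypothetical prices for one instance type: $0.10/h On-Demand, $0.03/h Spot.
before = monthly_cost(on_demand=10, spot=0, on_demand_hourly=0.10, spot_hourly=0.03)
after = monthly_cost(on_demand=2, spot=8, on_demand_hourly=0.10, spot_hourly=0.03)
print(f"${before:.2f} -> ${after:.2f} ({savings_pct(before, after):.1f}% saved)")
```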
Overall Cost Comparison:
| Before (in $ per month) | After 3 months (in $ per month) |
|---|---|
As you can see from the stats above, implementing the four steps of our roadmap resulted in significant cost savings. In the next blog, “AWS Cost Optimization Series | Blog 2 | Monitor & Optimize”, we will discuss monitoring and how it became the key factor in our cost optimization.