27 tips: How to reduce cloud costs - Digma

What should an engineering manager do when tasked with reducing AWS costs by 50% within two months, confronted with an AWS account in disarray and with RDS and EC2 as the primary cost drivers?

In this blog, you’ll find strategies and tips drawn from industry best practices to help engineering managers identify inefficiencies and reduce cloud expenses effectively. Whether you’re facing a similar challenge or looking to improve cloud cost management, this guide will provide valuable insights for achieving both immediate savings and long-term optimization.

Key Questions for Reducing Specific AWS Cost Drivers:

EC2 Instances

Are all EC2 instances properly sized for their workloads?
Do they need to run 24/7? If not, schedule them to shut down overnight and restart in the morning.
Are any EC2 instances underutilized? Test, Dev, and Pre-prod environments may not need full capacity—scale down where possible.
Are any instances running only a small function? Consider switching to a serverless approach.
Can workloads be containerized so they spin up only when needed?
Can you move workloads from EC2 to serverless? For example, package applications in Docker and run them on Fargate.
Enable hibernation for instances that don’t need to run 24/7—restoring from hibernation is fast and retains the last state.
Use Cost Explorer to analyze where the highest EC2 costs come from and leverage other tools for cost predictions.
Can you commit to specific resources for at least a year? If so, consider purchasing reserved instances.
For non-critical workloads (e.g., Dev), explore using spot instances to reduce costs.
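
To put a number on the scheduling question above, here is a minimal sketch of the arithmetic. The hourly rate is a hypothetical placeholder, not a real AWS price, and the calculation covers compute hours only: a stopped instance still pays for its attached EBS volumes.

```python
HOURS_PER_WEEK = 24 * 7
WEEKS_PER_MONTH = 52 / 12


def scheduled_savings_fraction(hours_per_day: float, days_per_week: int) -> float:
    """Fraction of compute hours avoided compared with running 24/7."""
    scheduled_hours = hours_per_day * days_per_week
    if not 0 <= scheduled_hours <= HOURS_PER_WEEK:
        raise ValueError("schedule must fit within one week")
    return 1 - scheduled_hours / HOURS_PER_WEEK


def monthly_compute_cost(hourly_rate: float, hours_per_week: float) -> float:
    """Approximate monthly compute cost for a given weekly running time."""
    return hourly_rate * hours_per_week * WEEKS_PER_MONTH


# Hypothetical on-demand rate in USD per hour (an assumption for illustration).
HYPOTHETICAL_RATE = 0.10

always_on = monthly_compute_cost(HYPOTHETICAL_RATE, HOURS_PER_WEEK)
office_hours = monthly_compute_cost(HYPOTHETICAL_RATE, 12 * 5)  # 12 h/day, Mon-Fri

print(f"always on: ${always_on:.2f}/month, office hours: ${office_hours:.2f}/month")
print(f"compute hours avoided: {scheduled_savings_fraction(12, 5):.0%}")
```

Running a Dev instance 12 hours a day on weekdays avoids roughly two thirds of its compute hours, which is why scheduling is usually one of the first items to check.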

Databases

Do all databases need to be relational? If some function as key-value stores, consider migrating them to NoSQL (e.g., DynamoDB).
Can database sizes or instance types be optimized for better cost efficiency?
Are there any unused or underutilized database instances that can be downsized or removed?

Data Transfer Costs

How much data is exiting AWS?
Can that be reduced?
How much data is being sent to the internet?
Can accelerators or compression reduce outgoing data costs?
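
As a rough way to answer the compression question, the sketch below measures how much gzip shrinks a sample payload before it leaves AWS. The sample records are invented for illustration; real savings depend entirely on how repetitive your own data is.

```python
import gzip
import json


def compressed_size(payload: bytes, level: int = 6) -> int:
    """Size in bytes of the payload after gzip compression."""
    return len(gzip.compress(payload, compresslevel=level))


def egress_reduction(payload: bytes) -> float:
    """Fraction of bytes saved by gzip (0.0 means no saving)."""
    if not payload:
        return 0.0
    return 1 - compressed_size(payload) / len(payload)


# Invented sample data: repetitive JSON, similar to many API responses.
records = [{"id": i, "status": "ok", "region": "us-east-1"} for i in range(1000)]
payload = json.dumps(records).encode("utf-8")

print(f"raw: {len(payload)} bytes, gzip: {compressed_size(payload)} bytes")
print(f"reduction: {egress_reduction(payload):.0%}")
```

Already-compressed content such as images or video will show little or no reduction, so measure a representative sample before enabling compression everywhere.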

27 Tips: How to Reduce Cloud Costs

General cloud cost optimization tips

Cache external dependencies locally to reduce network transfer costs. For example, use pull-through Docker image registries.
Prefer spot instances for stateless workloads instead of on-demand instances.
Use automated scaling solutions to shut down development workloads during weekends when possible.
Filter logs, metrics, and traces before they reach your monitoring solution. Most solutions (SaaS or self-hosted) charge for ingestion and storage, so every record you drop at the source lowers the bill, even though disk space itself is one of the cheaper AWS resources compared with RAM.
Optimize your application to reduce CPU hotspots and memory leaks—this can significantly cut infrastructure costs. The same applies to database queries and caching.
Use a profiler to identify inefficiencies:
1. JProfiler is great for finding CPU hotspots, memory leaks, and slow DB queries in Java applications.
2. Flamegraphs provide similar insights across multiple languages.
Identify inefficient code that leads to unnecessary cloud resource consumption. Whether it’s performance bottlenecks or excessive database queries, Digma helps teams optimize before deployment, avoiding costly inefficiencies.
The best way to save money is by preventing unnecessary asset creation. Implement asset management workflows and approval processes to control resource allocation.
Plan ahead—understand the cost implications of standing up resources. Estimate network transit costs and other variables. Effective cost management comes from proper planning and proactive monitoring, not reactive billing analysis.
Be selective with managed services: you can adopt just one, such as a stable managed load balancer, but keep in mind that a managed service can increase costs.
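
The log-filtering tip above can be sketched with Python's standard `logging` module. The class names, the `payments` prefix, and the in-memory handler are all hypothetical stand-ins; in practice the handler would be whatever ships records to your monitoring backend.

```python
import logging


class DropLowValueRecords(logging.Filter):
    """Keep warnings and above, plus everything from selected loggers."""

    def __init__(self, min_level=logging.WARNING, always_keep=("payments",)):
        super().__init__()
        self.min_level = min_level
        self.always_keep = tuple(always_keep)

    def filter(self, record):
        # Loggers whose name starts with a kept prefix bypass the level check.
        if record.name.startswith(self.always_keep):
            return True
        return record.levelno >= self.min_level


class InMemoryShipper(logging.Handler):
    """Stand-in for a handler that ships records to a paid backend."""

    def __init__(self):
        super().__init__()
        self.shipped = []

    def emit(self, record):
        self.shipped.append(record.getMessage())


def attach(logger_name, handler):
    """Create a DEBUG-level logger that sends records only to the handler."""
    logger = logging.getLogger(logger_name)
    logger.setLevel(logging.DEBUG)
    logger.propagate = False
    logger.addHandler(handler)
    return logger


shipper = InMemoryShipper()
shipper.addFilter(DropLowValueRecords())

checkout = attach("checkout", shipper)
payments = attach("payments.api", shipper)

checkout.debug("cache miss")        # dropped: below WARNING
checkout.error("upstream timeout")  # kept: ERROR
payments.debug("retrying charge")   # kept: logger name is on the keep list

print(shipper.shipped)
```

Attaching the filter to the handler rather than the logger means local console output can stay verbose while only the filtered stream is billed for ingestion.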

AWS Cost Optimization Tips

Awesome Cloud Cost – a curated list of tips for reducing cloud costs.
Use Reserved Instances (RI) and Savings Plans to lower costs. Consider “smart” automated RI SaaS solutions based on your existing workloads.
Prefer newer-generation EC2 instances, which usually offer better price-performance than the generation they replace. This applies to other products as well, such as using gp3 instead of gp2 for storage.
Use S3 storage classes to cut costs on less frequently accessed data.
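If multiple private subnets require internet access, check how they share NAT gateways. Each NAT gateway carries an hourly charge plus a per-GB processing charge, so subnets in the same Availability Zone can usually share one, while routing traffic through a NAT gateway in another Availability Zone adds cross-AZ data transfer charges. Alternatively, install a NAT instance on a small EC2 or use only public subnets with strict access controls.
Move away from Classic Load Balancers (previous generation and typically more expensive). Use Network Load Balancers (NLB) or Application Load Balancers (ALB) instead.
Consider Transit Gateways (or AWS Network Manager) instead of VPC Peering when dealing with many VPCs, as a full mesh of peering connections scales poorly. Transit Gateway adds per-attachment and per-GB data processing charges, so compare the costs for your traffic volume first.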
Use VPC Endpoints to access AWS services internally. However, compare the cost of an endpoint with the cost of direct usage to ensure savings.
If using S3 Glacier, compress files into as few objects as possible before uploading to reduce request costs.
A common best practice is to create a centralized “endpoint VPC”, where all endpoints are managed, and the rest of your VPCs/accounts access AWS resources through it.
Use cloud-native block storage management tools to identify and remove unused or unattached storage volumes.
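
For the S3 Glacier tip above, a minimal sketch of bundling many small files into one compressed archive before upload, using only the standard library. The directory layout and file names are invented for the demo, and the upload step itself is deliberately left out.

```python
import tarfile
import tempfile
from pathlib import Path


def bundle_directory(source_dir: str, archive_path: str) -> int:
    """Pack every file under source_dir into one gzip-compressed tar archive.

    Returns the number of files bundled.
    """
    source = Path(source_dir)
    # Collect the file list first so the archive never includes itself.
    files = sorted(p for p in source.rglob("*") if p.is_file())
    with tarfile.open(archive_path, "w:gz") as archive:
        for path in files:
            archive.add(path, arcname=str(path.relative_to(source)))
    return len(files)


# Demo with invented data: 50 small log files become a single object.
with tempfile.TemporaryDirectory() as tmp:
    src = Path(tmp) / "logs"
    src.mkdir()
    for i in range(50):
        (src / f"part-{i:03d}.log").write_text(f"line {i}\n")

    archive_file = Path(tmp) / "logs.tar.gz"
    bundled = bundle_directory(str(src), str(archive_file))

    with tarfile.open(archive_file) as archive:
        member_count = len(archive.getnames())

print(f"{bundled} files bundled into 1 archive with {member_count} members")
```

Uploading the single archive takes one request where the loose files would take fifty. The trade-off is that retrieving any one file later means restoring the whole archive.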

Kubernetes cost optimization tips:

Here is the first tip I got: Any attempt to control costs after an application has been architected and deployed is necessarily focusing on the wrong things. Cloud costs are a function of your
Consolidate your pods on fewer nodes. Leave only as much headroom on each node as you actually intend to keep.
Don’t over-commit resources. Tune pod requests over time so that you do not over-provision.
If possible, prefer a single region to avoid network transfer costs between nodes. This is easiest to justify outside production.
If you are running something like k8s, there are tools that monitor load and adjust capacity dynamically, so monitoring all costs on a 5-minute interval seems odd. You could correlate cost with your actual infrastructure monitoring instead. If you monitor your deployment properly, or better still catch problems in pre-deployment, you will know when costs go crazy.
A list I got from a Reddit post:

Share as much of the cloud resources between environments as possible. First and foremost – the cluster itself
Autoscale the cluster nodes
Use the right node types
Use Spot instances
Reduce replicas
Reduce resource requests
Bring up partial environments
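
To see how consolidating pods and reducing resource requests change the node count, here is a small estimate using first-fit decreasing bin packing. This is an illustration only, not how the Kubernetes scheduler or any autoscaler actually places pods, and it considers CPU requests alone. All numbers are hypothetical.

```python
def nodes_needed(pod_cpu_millicores, node_cpu_millicores, headroom_millicores=0):
    """Estimate node count with first-fit decreasing packing on CPU requests."""
    usable = node_cpu_millicores - headroom_millicores
    if usable <= 0:
        raise ValueError("headroom leaves no usable CPU on the node")

    free_per_node = []  # remaining usable millicores on each node
    for request in sorted(pod_cpu_millicores, reverse=True):
        if request > usable:
            raise ValueError(f"pod requesting {request}m does not fit on any node")
        for index, free in enumerate(free_per_node):
            if request <= free:
                free_per_node[index] = free - request
                break
        else:
            # No existing node has room, so open a new one.
            free_per_node.append(usable - request)
    return len(free_per_node)


# Hypothetical workload: CPU requests in millicores, 4-vCPU nodes, 400m headroom.
pods = [2000, 1500, 1500, 1000, 500, 500]
current = nodes_needed(pods, 4000, headroom_millicores=400)
after_tuning = nodes_needed([r // 2 for r in pods], 4000, headroom_millicores=400)

print(f"nodes with current requests: {current}")
print(f"nodes after halving requests: {after_tuning}")
```

A real estimate would also account for memory, daemon sets, and scheduling constraints such as affinity rules, but even this simple model shows how over-sized requests translate directly into extra nodes.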

Conclusion: How to reduce cloud costs

See how Digma Preemptive Observability identifies inefficient code patterns that increase resource consumption. Unlike cloud cost optimization tools that focus on infrastructure expenses, it pinpoints areas for optimization in the code itself, so engineering teams are able to write cost-efficient code that scales better while reducing infrastructure expenses.
