RDS Proxy in Production: Real-World Lessons, Limitations, and Why We Use It
Introduction: The Real Problem
Ever run RDS PostgreSQL in production? You’ve probably faced connection limit bottlenecks, app crashes during failovers, or unexpected timeouts, often at the worst possible moments. These are real production issues where RDS Proxy proves its value by offering connection pooling, improved failover handling, and secure credential management to keep your services resilient and responsive.
At first glance, RDS Proxy appears to be a no-brainer for connection management. But if you’re in a DevOps or Platform Engineering role, you know the story doesn’t end at provisioning. This blog dives into real-world lessons from working with RDS Proxy.
What is RDS Proxy?
RDS Proxy is AWS’s managed service that handles database connection pooling, failovers, and secure, credential-free access to RDS. It acts as an intermediary between your app and RDS, optimizing connection usage, improving failover resilience, and enforcing security best practices.
Flow: app → RDS Proxy → RDS database instance
Let’s break that down:
1. Connection Pooling (What It Solves and Why It Matters)
Imagine your application as a high-energy crowd trying to get into a concert, and the database as the venue with limited doors (connections). Without crowd control, chaos ensues. That’s where RDS Proxy steps in, as a well-trained bouncer that keeps the line moving, avoids stampedes, and prevents the venue from shutting down.
RDS Proxy maintains a pool of warm, reusable database connections that your application can borrow and return on demand.
- Minimizes the time and CPU cost of constantly opening and closing connections.
- Protects your RDS instance from spikes in traffic.
- Prevents “connection storm” failures during deployments or cold starts.
RDS Proxy scales automatically with the number of application connections. Apart from a couple of pool settings (such as MaxConnectionsPercent on the target group), AWS doesn’t expose fine-grained sizing knobs; the proxy elastically adjusts its connection pool based on demand. Each RDS Proxy can open up to approximately 1,000 connections (a soft limit) per target RDS instance by default.
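From the application side, adopting the proxy is usually just a matter of pointing the client at the proxy endpoint instead of the database endpoint. Here is a minimal sketch with psycopg2; the endpoint, database name, and credentials are hypothetical placeholders:

```python
import psycopg2

# Hypothetical RDS Proxy endpoint -- the app no longer talks to the DB host directly.
PROXY_ENDPOINT = "my-app-proxy.proxy-abc123xyz.us-east-1.rds.amazonaws.com"

def get_connection():
    # Each call looks like a "new" connection to the app, but the proxy
    # serves it from its warm pool instead of opening a fresh backend
    # connection to PostgreSQL every time.
    return psycopg2.connect(
        host=PROXY_ENDPOINT,
        port=5432,
        dbname="app_db",
        user="app_user",
        password="app_password",  # or an IAM token / Secrets Manager value
        connect_timeout=5,
    )

with get_connection() as conn, conn.cursor() as cur:
    cur.execute("SELECT 1;")
    print(cur.fetchone())
```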
2. Failover Handling (Why It’s a Lifesaver)
During planned or unplanned RDS failovers (such as a DB restart or AZ failover), traditional applications connected directly to the DB will experience dropped connections, timeouts, and potential crashes. With RDS Proxy, your application connects to a stable proxy endpoint, not directly to the DB instance. If a failover occurs:
- RDS Proxy detects the failover event.
- It re-establishes a connection to the new DB instance behind the scenes.
- The app stays connected to the proxy, seamlessly in most cases.
Is there latency? Yes, but minimal. Most reconnections complete within 20–30 seconds. During this window, apps might experience short stalls, but significantly less disruption than full connection drops.
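Even so, it is worth wrapping critical queries in a short retry loop so the brief stall during a failover surfaces as a retried query rather than an error. A rough sketch, reusing the get_connection() helper from the earlier connection-pooling example and simplifying error handling:

```python
import time
import psycopg2

def run_with_retry(sql, params=None, attempts=5, base_delay=1.0):
    """Retry transient connection errors, e.g. during an RDS failover window."""
    for attempt in range(1, attempts + 1):
        try:
            # get_connection() is the helper from the connection-pooling sketch above.
            with get_connection() as conn, conn.cursor() as cur:
                cur.execute(sql, params)
                return cur.fetchall()
        except psycopg2.OperationalError:
            if attempt == attempts:
                raise
            # Exponential backoff: 1s, 2s, 4s, ... keeps retries well inside
            # the typical 20-30 second reconnection window.
            time.sleep(base_delay * 2 ** (attempt - 1))
```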
3. Secure Credential Management
RDS Proxy integrates seamlessly with Secrets Manager or IAM authentication, enabling secure database connectivity without the need to hardcode credentials in the application configuration.
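For example, with IAM authentication the application requests a short-lived token instead of storing a password. A sketch of that flow; the region, endpoint, and user below are placeholders:

```python
import boto3
import psycopg2

session = boto3.Session(region_name="us-east-1")
rds = session.client("rds")

# Short-lived (15-minute) token signed with the caller's IAM credentials.
token = rds.generate_db_auth_token(
    DBHostname="my-app-proxy.proxy-abc123xyz.us-east-1.rds.amazonaws.com",
    Port=5432,
    DBUsername="app_user",
)

conn = psycopg2.connect(
    host="my-app-proxy.proxy-abc123xyz.us-east-1.rds.amazonaws.com",
    port=5432,
    dbname="app_db",
    user="app_user",
    password=token,     # the token replaces a static password
    sslmode="require",  # IAM authentication requires TLS
)
```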
4. TLS Enforcement
It can be configured to require TLS encryption, ensuring encrypted communication from the application to the database.
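If you manage the proxy with the AWS SDK, enforcing TLS is a single setting. A minimal sketch; the proxy name is hypothetical:

```python
import boto3

rds = boto3.client("rds", region_name="us-east-1")

# Reject any client connection that does not use TLS.
rds.modify_db_proxy(
    DBProxyName="my-app-proxy",
    RequireTLS=True,
)
```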
5. No Need to Self-Host PgBouncer
RDS Proxy delivers all these benefits without requiring you to deploy, configure, and maintain third-party tools like PgBouncer.
What is PgBouncer? A lightweight connection pooler, often used with PostgreSQL to efficiently manage database connections. It requires you to run and manage the service yourself, handle scaling, maintain failover scripts, and configure security. RDS Proxy, being a managed service, eliminates that overhead: you get similar benefits without the headache of running PgBouncer yourself.
Why It Can Be a Game-Changer
- Elastic environments such as ECS or Lambda benefit most, since they spawn connections unpredictably.
- Failover insulation: When the RDS instance restarts or changes AZs, RDS Proxy hides that turbulence from your app.
- Better security through IAM tokens or Secrets Manager.
- Security groups can be tightened to block direct DB access (see the sketch after this list).
- It handles thousands of concurrent ECS connections more gracefully than direct RDS connections.
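For the security-group point above, a common pattern is to allow PostgreSQL traffic into the database’s security group only from the proxy’s security group, so nothing else in the VPC can reach the database directly. A minimal sketch with boto3; the security group IDs are placeholders:

```python
import boto3

ec2 = boto3.client("ec2", region_name="us-east-1")

DB_SG = "sg-0123456789abcdef0"     # security group attached to the RDS instance
PROXY_SG = "sg-0fedcba9876543210"  # security group attached to the RDS Proxy

# Allow port 5432 into the DB only from the proxy's security group.
ec2.authorize_security_group_ingress(
    GroupId=DB_SG,
    IpPermissions=[{
        "IpProtocol": "tcp",
        "FromPort": 5432,
        "ToPort": 5432,
        "UserIdGroupPairs": [{"GroupId": PROXY_SG}],
    }],
)
```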
When RDS Proxy Might Not Be the Best Choice: Yes, It Has Downsides
While RDS Proxy brings many benefits, it’s important to understand where it might fall short. Here are some limitations one should keep in mind:
- Adds Some Latency: Since the database requests pass through the proxy, it introduces a slight delay. For applications that require ultra-low latency or real-time responsiveness, this extra hop might be a drawback.
- Not Ideal for Long-Lived Connections: If your app maintains long-lived database connections or already uses efficient connection pooling on the client side, RDS Proxy might not provide much additional value.
- Debugging Can Be More Complex: Since the proxy sits between your app and the database, diagnosing connection failures or performance bottlenecks can be trickier.
- Compatibility Limitations You Should Know About:
- Aurora Serverless v2 is Not Supported: AWS officially confirms that RDS Proxy doesn’t currently support Aurora Serverless v2.
- PostgreSQL Protocol Version 3.0 or Newer Only: RDS Proxy requires PostgreSQL’s modern protocol version 3.0 (introduced in PostgreSQL 7.4+). Databases using older protocol versions won’t work with the proxy.
- MySQL Limitations: For MySQL, RDS Proxy supports most standard features but doesn’t support certain advanced startup message options.
- VPC-Only Access: RDS Proxy endpoints are accessible only from within the same VPC or connected networks (like VPC peering or VPN).
Monitoring in RDS Proxy
RDS Proxy can be monitored with Amazon CloudWatch, which collects and processes raw data from the proxies into readable, near-real-time metrics. RDS Proxy exposes CloudWatch metrics such as:
- DatabaseConnections
- ClientConnections
- ActiveConnections
- ConnectionBorrowTimeouts
These are critical for tuning and alerting on burst scenarios, idle timeouts, and connection reuse efficiency. For more details, see Monitoring RDS Proxy metrics with Amazon CloudWatch.
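As an example, a simple alarm on DatabaseConnections can flag connection buildup before it hurts the database. A sketch with boto3, assuming RDS Proxy metrics are published under the AWS/RDS namespace with a ProxyName dimension; the proxy name and threshold are placeholders:

```python
import boto3

cloudwatch = boto3.client("cloudwatch", region_name="us-east-1")

cloudwatch.put_metric_alarm(
    AlarmName="rds-proxy-db-connections-high",
    Namespace="AWS/RDS",
    MetricName="DatabaseConnections",
    Dimensions=[{"Name": "ProxyName", "Value": "my-app-proxy"}],
    Statistic="Average",
    Period=60,
    EvaluationPeriods=5,
    Threshold=800,  # e.g. ~80% of an assumed 1,000-connection ceiling
    ComparisonOperator="GreaterThanThreshold",
    AlarmDescription="RDS Proxy is holding an unusually high number of DB connections",
)
```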
More Detailed Information and Updates on RDS Proxy
For the most accurate and up-to-date information on RDS Proxy features, limitations, and supported configurations, I strongly recommend checking the latest AWS documentation. Below are the “good to read” docs if you’re planning to adopt or are already using RDS Proxy:
- Working with RDS Proxy & Limitations
- RDS Proxy Planning
- Monitoring RDS Proxy metrics with Amazon CloudWatch
REAL PROJECT STORY: High CPU, Wild Spikes & Proxy to the Rescue
We were running a microservice named XYZ API on Amazon ECS (EC2). The XYZ API serves as a central API and data store, modeling the XYZ Betting Domain, allowing multiple teams to access and work with the betting data. It’s a high-traffic service, especially during peak betting times or promotions.
Over time, there were unpredictable CPU spikes on our XYZ API DB (PostgreSQL RDS instance). These spikes would occasionally push CPU utilization close to 100%, leading to performance degradation, slower response times, and even task-level instability in ECS.
Phase 1: Diagnosing the Problem
We traced the spikes to sudden bursts of connections and inefficient query patterns. Since ECS-EC2 tasks can spin up or scale fast, each container establishing multiple new DB connections amplified the problem. The database wasn’t able to handle these bursts efficiently.
Phase 2: Multi-Step Mitigation
To address the issue, we implemented the following:
- Query and Code Optimization: We reviewed and optimized several inefficient queries, added missing indexes, and tuned logic in the service to reduce load on the database.
- Increased Read Replicas (Reader Nodes): We added one more read replica to the existing setup, ending up with a 3-node setup: 1 primary (read-write) and 2 read-only replicas. The service was already splitting read traffic at the application level (a rough sketch of this split follows the list).
- Introduced RDS Proxy: RDS Proxy allowed ECS services to reuse pooled connections rather than opening new ones.
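A rough sketch of that application-level split, under the assumption that writes go to the primary through the proxy endpoint while reads are spread across the replica endpoints; all hostnames are placeholders and credentials come from the environment:

```python
import os
import random
import psycopg2

# Placeholder endpoints: writes go to the primary via the RDS Proxy,
# reads are spread across the replica endpoints.
WRITER_HOST = "my-app-proxy.proxy-abc123xyz.us-east-1.rds.amazonaws.com"
READER_HOSTS = [
    "replica-1.abc123xyz.us-east-1.rds.amazonaws.com",
    "replica-2.abc123xyz.us-east-1.rds.amazonaws.com",
]

def connect(readonly: bool = False):
    host = random.choice(READER_HOSTS) if readonly else WRITER_HOST
    return psycopg2.connect(
        host=host,
        port=5432,
        dbname="app_db",
        user="app_user",
        password=os.environ.get("DB_PASSWORD"),  # e.g. injected from Secrets Manager
    )
```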
After implementing all the solutions above:
- CPU utilization dropped sharply.
- ECS service stability improved.
- DB connection limits were no longer breached.
- Latency during scale-up events was significantly reduced.
Phase 3: Rollbacks and Lessons
After seeing stable behavior for a few weeks, we assumed the query and code optimization alone had fixed the problem, so we decided to simplify the setup:
- We removed the extra read replica (back to 1 reader + 1 writer).
- We removed RDS Proxy, thinking query optimization alone would be enough.
Within days, the same CPU spikes returned. ECS tasks again started experiencing instability. It became evident that RDS Proxy was doing the heavy lifting by absorbing connection surges. We quickly reinstated RDS Proxy. CPU levels immediately stabilized again.
We eventually settled on a stable architecture:
- One read-write RDS node
- One read replica
- RDS Proxy enabled
The real takeaway: Connection pooling via RDS Proxy was critical in ensuring RDS CPU stayed under control, especially with a scalable ECS setup where connection surges are hard to predict.
CONCLUSION
RDS Proxy isn’t a silver bullet, but it’s a powerful tool. Use it when your app environment is elastic, when failovers are affecting availability, or when you want to tighten database connection security.
It’s about knowing when to enable it, what to look out for, and when to back off. RDS Proxy can bring significant reliability and security improvements to PostgreSQL RDS workloads.
Stay pragmatic. Stay curious.