{"id":72755,"date":"2025-07-01T16:09:12","date_gmt":"2025-07-01T10:39:12","guid":{"rendered":"https:\/\/www.tothenew.com\/blog\/?p=72755"},"modified":"2025-07-09T11:13:26","modified_gmt":"2025-07-09T05:43:26","slug":"rds-proxy-in-production-real-world-lessons-limitations-and-why-we-use-it","status":"publish","type":"post","link":"https:\/\/www.tothenew.com\/blog\/rds-proxy-in-production-real-world-lessons-limitations-and-why-we-use-it\/","title":{"rendered":"RDS Proxy in Production: Real-World Lessons, Limitations, and Why We Use It"},"content":{"rendered":"<h2><span style=\"color: #000000;\"><strong>Introduction: The Real Problem<\/strong><\/span><\/h2>\n<p><span style=\"color: #000000;\">Ever run RDS PostgreSQL in production? You&#8217;ve probably faced connection limit bottlenecks, app crashes during failovers, or unexpected timeouts, often at the worst possible moments. These are real production issues where RDS Proxy proves its value by offering connection pooling, improved failover handling, and secure credential management to keep your services resilient and responsive.<\/span><\/p>\n<p><span style=\"color: #000000;\">At first glance, RDS Proxy appears to be a no-brainer for connection management. But if you&#8217;re in a DevOps or Platform Engineering role, you know the story doesn\u2019t end at provisioning. This blog dives into real-world lessons from working with RDS Proxy.<\/span><\/p>\n<h2><span style=\"color: #000000;\"><strong>What is RDS Proxy?<\/strong><\/span><\/h2>\n<p><span style=\"color: #000000;\">RDS Proxy is AWS\u2019s managed service that handles database connection pooling, failovers, and secure, credential-free access to RDS. 
It acts as an intermediary between your app and RDS, optimizing connection usage, improving failover resilience, and enforcing security best practices.<\/span><\/p>\n<p><span style=\"color: #000000;\"><strong>Flow:<\/strong><\/span><\/p>\n<div id=\"attachment_72754\" style=\"width: 310px\" class=\"wp-caption alignnone\"><img aria-describedby=\"caption-attachment-72754\" decoding=\"async\" loading=\"lazy\" class=\"size-medium wp-image-72754\" src=\"https:\/\/www.tothenew.com\/blog\/wp-ttn-blog\/uploads\/2025\/06\/Screenshot-2025-06-16-at-12.05.36\u202fPM-300x134.png\" alt=\"The flow of app to DB via RDS Proxy\" width=\"300\" height=\"134\" srcset=\"\/blog\/wp-ttn-blog\/uploads\/2025\/06\/Screenshot-2025-06-16-at-12.05.36\u202fPM-300x134.png 300w, \/blog\/wp-ttn-blog\/uploads\/2025\/06\/Screenshot-2025-06-16-at-12.05.36\u202fPM-1024x458.png 1024w, \/blog\/wp-ttn-blog\/uploads\/2025\/06\/Screenshot-2025-06-16-at-12.05.36\u202fPM-768x343.png 768w, \/blog\/wp-ttn-blog\/uploads\/2025\/06\/Screenshot-2025-06-16-at-12.05.36\u202fPM-624x279.png 624w, \/blog\/wp-ttn-blog\/uploads\/2025\/06\/Screenshot-2025-06-16-at-12.05.36\u202fPM.png 1226w\" sizes=\"(max-width: 300px) 100vw, 300px\" \/><p id=\"caption-attachment-72754\" class=\"wp-caption-text\">The flow of app to DB via RDS Proxy<\/p><\/div>\n<p><span style=\"color: #000000;\"> Let\u2019s break that down:<\/span><\/p>\n<p style=\"padding-left: 40px;\"><span style=\"color: #000000;\"><strong>1. Connection Pooling (What It Solves and Why It Matters)<\/strong><\/span><br \/>\n<span style=\"color: #000000;\">Imagine your application as a high-energy crowd trying to get into a concert, and the database as the venue with limited doors (connections). Without crowd control, chaos ensues. That\u2019s where RDS Proxy steps in, as a well-trained bouncer that keeps the line moving, avoids stampedes, and prevents the venue from shutting down. 
<\/span><\/p>\n<p style=\"padding-left: 40px;\"><span style=\"color: #000000;\">RDS Proxy maintains a pool of warm, reusable database connections that your application can borrow and return on demand.<\/span><\/p>\n<ul>\n<li style=\"list-style-type: none;\">\n<ul>\n<li style=\"list-style-type: none;\">\n<ul>\n<li><span style=\"color: #000000;\">Minimizes the time and CPU cost of constantly opening and closing connections.<\/span><\/li>\n<li><span style=\"color: #000000;\">Protects your RDS instance from spikes in traffic.<\/span><\/li>\n<li><span style=\"color: #000000;\">Prevents \u201cconnection storm\u201d failures during deployments or cold starts.<\/span><\/li>\n<\/ul>\n<\/li>\n<\/ul>\n<\/li>\n<\/ul>\n<p style=\"padding-left: 40px;\"><span style=\"color: #000000;\">RDS Proxy automatically scales based on the number of application connections. AWS doesn&#8217;t expose direct knobs for sizing the proxy; instead, the proxy elastically adjusts its connection pool based on demand. Each RDS Proxy can open up to approximately 1,000 connections (a soft limit) per target RDS instance by default.<\/span><\/p>\n<p style=\"padding-left: 40px;\"><span style=\"color: #000000;\"><strong>2. Failover Handling (Why It\u2019s a Lifesaver)<\/strong><\/span><br \/>\n<span style=\"color: #000000;\">During planned or unplanned RDS failovers (such as a DB restart or AZ failover), traditional applications connected directly to the DB will experience dropped connections, timeouts, and potential crashes. With RDS Proxy, your application connects to a stable proxy endpoint, not directly to the DB instance. 
If a failover occurs:<\/span><\/p>\n<ul>\n<li style=\"list-style-type: none;\">\n<ul>\n<li><span style=\"color: #000000;\">RDS Proxy detects the failover event.<\/span><\/li>\n<li><span style=\"color: #000000;\">It re-establishes a connection to the new DB instance behind the scenes.<\/span><\/li>\n<li><span style=\"color: #000000;\">The app stays connected to the proxy seamlessly in most cases.<\/span><\/li>\n<\/ul>\n<\/li>\n<\/ul>\n<p style=\"padding-left: 40px;\"><span style=\"color: #000000;\"><strong><span style=\"color: #ff6600;\">Is there latency?<\/span><\/strong> Yes, but minimal. Most reconnections complete within 20\u201330 seconds. During this window, apps might experience short stalls, but with significantly less disruption than full connection drops.<\/span><\/p>\n<p style=\"padding-left: 40px;\"><strong><span style=\"color: #000000;\">3. Secure Credential Management<\/span><\/strong><br \/>\n<span style=\"color: #000000;\">RDS Proxy integrates seamlessly with Secrets Manager or IAM authentication, enabling secure database connectivity without the need to hardcode credentials in the application configuration.<\/span><\/p>\n<p style=\"padding-left: 40px;\"><strong><span style=\"color: #000000;\">4. TLS Enforcement<\/span><\/strong><br \/>\n<span style=\"color: #000000;\">It can be configured to require TLS encryption, ensuring encrypted communication from the application to the database.<\/span><\/p>\n<p style=\"padding-left: 40px;\"><strong><span style=\"color: #000000;\">5. No Need to Self-Host PgBouncer<\/span><\/strong><br \/>\n<span style=\"color: #000000;\">RDS Proxy delivers all these benefits without requiring you to deploy, configure, and maintain third-party tools like PgBouncer.<\/span><\/p>\n<p style=\"padding-left: 40px;\"><span style=\"color: #000000;\"><span style=\"color: #ff6600;\"><strong>What is PgBouncer?<\/strong><\/span> A lightweight connection pooler, often used with PostgreSQL to efficiently manage database connections. 
It requires us to run and manage the service, handle scaling, deal with failover scripts, and configure security. On the other hand, RDS Proxy, being a managed service, eliminates all that overhead. You get similar benefits without the headache of running PgBouncer yourself.<\/span><\/p>\n<h2><strong><span style=\"color: #000000;\">Why It Can Be a Game-Changer<\/span><\/strong><\/h2>\n<ul>\n<li><span style=\"color: #000000;\">Elastic environments such as ECS or Lambda benefit most, because they spawn connections unpredictably.<\/span><\/li>\n<li><span style=\"color: #000000;\">Failover insulation: When the RDS instance restarts or changes AZs, RDS Proxy hides that turbulence from your app.<\/span><\/li>\n<li><span style=\"color: #000000;\">Better security: Authentication goes through IAM tokens or Secrets Manager instead of hardcoded credentials.<\/span><\/li>\n<li><span style=\"color: #000000;\">Security groups can be tightened to block direct DB access.<\/span><\/li>\n<li><span style=\"color: #000000;\">It handles thousands of concurrent ECS connections more gracefully than direct RDS connections.<\/span><\/li>\n<\/ul>\n<h2><strong><span style=\"color: #000000;\">When RDS Proxy Might Not Be the Best Choice<\/span><\/strong><\/h2>\n<p><span style=\"color: #000000;\">Yes, It Has Downsides<\/span><br \/>\n<span style=\"color: #000000;\">While RDS Proxy brings many benefits, it\u2019s important to understand where it might fall short. Here are some limitations to keep in mind:<\/span><\/p>\n<ul>\n<li><span style=\"color: #000000;\"><strong>Adds Some Latency:<\/strong> Since database requests pass through the proxy, each one incurs a slight delay. 
For applications that require ultra-low latency or real-time responsiveness, this extra hop might be a drawback.<\/span><\/li>\n<li><span style=\"color: #000000;\"><strong>Not Ideal for Long-Lived Connections:<\/strong> If your app maintains long-lived database connections or already uses efficient connection pooling on the client side, RDS Proxy might not provide much additional value.<\/span><\/li>\n<li><span style=\"color: #000000;\"><strong>Debugging Can Be More Complex:<\/strong> Since the proxy sits between your app and the database, diagnosing connection failures or performance bottlenecks can be trickier.<\/span><\/li>\n<li><strong><span style=\"color: #000000;\">Compatibility Limitations You Should Know About:<\/span><\/strong>\n<ul style=\"list-style-type: circle;\">\n<li><span style=\"color: #000000;\"><span style=\"text-decoration: underline;\">Aurora Serverless v1 Is Not Supported:<\/span> The AWS documentation states that RDS Proxy can\u2019t be used with Aurora Serverless v1 clusters; Aurora Serverless v2 clusters are supported.<\/span><\/li>\n<li><span style=\"color: #000000;\"><span style=\"text-decoration: underline;\">PostgreSQL Protocol Version 3.0 or Newer Only:<\/span> RDS Proxy requires PostgreSQL\u2019s modern protocol version 3.0 (introduced in PostgreSQL 7.4). Clients or drivers using older protocol versions won\u2019t work with the proxy.<\/span><\/li>\n<li><span style=\"color: #000000;\"><span style=\"text-decoration: underline;\">MySQL Limitations:<\/span> For MySQL, RDS Proxy supports most standard features but doesn\u2019t support certain advanced startup message options.<\/span><\/li>\n<\/ul>\n<\/li>\n<li><span style=\"color: #000000;\"><strong>VPC-Only Access:<\/strong> RDS Proxy endpoints are accessible only from within the same VPC or connected networks (like VPC peering or VPN).<\/span><\/li>\n<\/ul>\n<h2><span style=\"color: #000000;\"><strong>Monitoring in RDS Proxy<\/strong><\/span><\/h2>\n<p><span style=\"color: #000000;\">RDS Proxy can be monitored using Amazon CloudWatch. 
CloudWatch collects and processes raw data from the proxies into readable, near-real-time metrics. RDS Proxy exposes CloudWatch metrics like:<\/span><\/p>\n<ul>\n<li><span style=\"color: #000000;\">DatabaseConnections<\/span><\/li>\n<li><span style=\"color: #000000;\">ClientConnections<\/span><\/li>\n<li><span style=\"color: #000000;\">ActiveConnections<\/span><\/li>\n<li><span style=\"color: #000000;\">ConnectionBorrowTimeouts<\/span><\/li>\n<\/ul>\n<p><span style=\"color: #000000;\">These are critical for tuning and for alerting on burst scenarios, idle timeouts, and connection reuse efficiency. For more details, see <a href=\"https:\/\/docs.aws.amazon.com\/AmazonRDS\/latest\/UserGuide\/rds-proxy.monitoring.html\">Monitoring RDS Proxy metrics with Amazon CloudWatch<\/a>.<\/span><\/p>\n<h2><strong><span style=\"color: #000000;\">More Detailed Information and Updates on RDS Proxy<\/span><\/strong><\/h2>\n<p><span style=\"color: #000000;\">For the most accurate and up-to-date information on RDS Proxy features, limitations, and supported configurations, I strongly recommend checking the latest AWS documentation. 
Below are the \u201cgood to read\u201d docs if you&#8217;re planning to adopt or are already using RDS Proxy:<\/span><\/p>\n<ul>\n<li><a href=\"https:\/\/docs.aws.amazon.com\/AmazonRDS\/latest\/UserGuide\/rds-proxy.html#rds-proxy-mysql\"><span style=\"color: #000000;\">Working with RDS Proxy &amp; Limitations<\/span><\/a><\/li>\n<li><a href=\"https:\/\/docs.aws.amazon.com\/AmazonRDS\/latest\/UserGuide\/rds-proxy-planning.html\"><span style=\"color: #000000;\">RDS Proxy Planning <\/span><\/a><\/li>\n<li><a href=\"https:\/\/docs.aws.amazon.com\/AmazonRDS\/latest\/UserGuide\/rds-proxy.monitoring.html\"><span style=\"color: #000000;\">Monitoring RDS Proxy metrics with Amazon CloudWatch<\/span><\/a><\/li>\n<\/ul>\n<h2><strong><span style=\"color: #000000;\">REAL PROJECT STORY: High CPU, Wild Spikes &amp; Proxy to the Rescue<\/span><\/strong><\/h2>\n<p><span style=\"color: #000000;\">We were running a microservice named XYZ API on Amazon ECS (EC2). The XYZ API serves as a central API and data store, modeling the XYZ Betting Domain, allowing multiple teams to access and work with the betting data. It&#8217;s a high-traffic service, especially during peak betting times or promotions.<\/span><\/p>\n<p><span style=\"color: #000000;\">Over time, there were unpredictable CPU spikes on our XYZ API DB (PostgreSQL RDS instance). These spikes would occasionally push CPU utilization close to 100%, leading to performance degradation, slower response times, and even task-level instability in ECS.<\/span><\/p>\n<p style=\"padding-left: 40px;\"><strong><span style=\"color: #000000;\">Phase 1: Diagnosing the Problem<\/span><\/strong><br \/>\n<span style=\"color: #000000;\">We traced the spikes to sudden bursts of connections and inefficient query patterns. Since ECS-EC2 tasks can spin up or scale fast, each container establishing multiple new DB connections amplified the problem. 
The database wasn\u2019t able to handle these bursts efficiently.<\/span><\/p>\n<p style=\"padding-left: 40px;\"><strong><span style=\"color: #000000;\">Phase 2: Multi-Step Mitigation<\/span><\/strong><br \/>\n<span style=\"color: #000000;\">To address the issue, we implemented the following:<\/span><\/p>\n<ul>\n<li style=\"list-style-type: none;\">\n<ul>\n<li style=\"list-style-type: none;\">\n<ul style=\"list-style-type: circle;\">\n<li><strong><span style=\"color: #000000;\">Query and Code Optimization: <\/span><\/strong><span style=\"color: #000000;\">We reviewed and optimized several inefficient queries, added missing indexes, and tuned logic in the service to reduce load on the database.<\/span><\/li>\n<li><span style=\"color: #000000;\"><strong>Increased Read Replicas (Reader Nodes):<\/strong> We added one read replica to the existing setup, ending up with a 3-node setup: 1 primary (read-write) and 2 read-only replicas. The service was already using application-level load-balancing logic to split read traffic across the replicas.<\/span><\/li>\n<li><span style=\"color: #000000;\"><strong>Introduced RDS Proxy:<\/strong> RDS Proxy allowed ECS services to reuse pooled connections rather than opening new ones.<\/span><\/li>\n<\/ul>\n<\/li>\n<\/ul>\n<\/li>\n<\/ul>\n<p style=\"padding-left: 40px;\"><span style=\"color: #000000;\">After implementing all the solutions above:<\/span><\/p>\n<ul>\n<li style=\"list-style-type: none;\">\n<ul>\n<li><span style=\"color: #000000;\">CPU utilization dropped sharply.<\/span><\/li>\n<li><span style=\"color: #000000;\">ECS service stability improved.<\/span><\/li>\n<li><span style=\"color: #000000;\">DB connection limits were no longer breached.<\/span><\/li>\n<li><span style=\"color: #000000;\">Latency during scale-up events was significantly reduced.<\/span><\/li>\n<\/ul>\n<\/li>\n<\/ul>\n<p 
style=\"padding-left: 40px;\"><span style=\"color: #000000;\"><strong>Phase 3: Rollbacks and Lessons<\/strong><\/span><br \/>\n<span style=\"color: #000000;\">After seeing stable behavior for a few weeks, we assumed the query and code optimizations had resolved the issue, so we decided to simplify the setup:<\/span><\/p>\n<ul>\n<li style=\"list-style-type: none;\">\n<ul>\n<li style=\"list-style-type: none;\">\n<ul>\n<li><span style=\"color: #000000;\">We removed the extra read replica (back to 1 reader + 1 writer).<\/span><\/li>\n<li><span style=\"color: #000000;\">We removed RDS Proxy, thinking query optimization alone would be enough.<\/span><\/li>\n<\/ul>\n<\/li>\n<\/ul>\n<\/li>\n<\/ul>\n<p><span style=\"color: #000000;\">Within days, the same CPU spikes returned, and ECS tasks started experiencing instability again. It became evident that <strong>RDS Proxy was doing the heavy lifting by absorbing connection surges<\/strong>. We quickly reinstated RDS Proxy, and CPU levels immediately stabilized again.<\/span><\/p>\n<p><span style=\"color: #000000;\">We eventually settled on a stable architecture:<\/span><\/p>\n<ul>\n<li style=\"list-style-type: none;\">\n<ul>\n<li><span style=\"color: #000000;\">One read-write RDS node<\/span><\/li>\n<li><span style=\"color: #000000;\">One read replica<\/span><\/li>\n<li><span style=\"color: #000000;\">RDS Proxy enabled<\/span><\/li>\n<\/ul>\n<\/li>\n<\/ul>\n<p><span style=\"color: #000000;\">The real takeaway: Connection pooling via <strong>RDS Proxy was critical in ensuring RDS CPU stayed under control<\/strong>, especially with a scalable ECS setup where connection surges are hard to predict.<\/span><\/p>\n<h2><span style=\"color: #000000;\"><strong>CONCLUSION<\/strong><\/span><\/h2>\n<p><span style=\"color: #000000;\">RDS Proxy isn\u2019t a silver bullet, but it\u2019s a powerful tool. 
Use it when your app environment is elastic, your failovers are affecting availability, or you want to tighten connection security.<\/span><\/p>\n<p><span style=\"color: #000000;\">It&#8217;s about knowing when to enable it, what to watch for, and when to back off. RDS Proxy can bring significant reliability and security improvements to your PostgreSQL RDS workloads.<\/span><\/p>\n<p><span style=\"color: #000000;\"><strong>Stay pragmatic. Stay curious.<\/strong><\/span><\/p>\n","protected":false},"excerpt":{"rendered":"<p>Introduction: The Real Problem Ever run RDS PostgreSQL in production? You&#8217;ve probably faced connection limit bottlenecks, app crashes during failovers, or unexpected timeouts, often at the worst possible moments. These are real production issues where RDS Proxy proves its value by offering connection pooling, improved failover handling, and secure credential management to keep your services [&hellip;]<\/p>\n","protected":false},"author":1615,"featured_media":0,"comment_status":"open","ping_status":"closed","sticky":false,"template":"","format":"standard","meta":{"iawp_total_views":735},"categories":[2348],"tags":[7509,7508,7507],"aioseo_notices":[],"_links":{"self":[{"href":"https:\/\/www.tothenew.com\/blog\/wp-json\/wp\/v2\/posts\/72755"}],"collection":[{"href":"https:\/\/www.tothenew.com\/blog\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/www.tothenew.com\/blog\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/www.tothenew.com\/blog\/wp-json\/wp\/v2\/users\/1615"}],"replies":[{"embeddable":true,"href":"https:\/\/www.tothenew.com\/blog\/wp-json\/wp\/v2\/comments?post=72755"}],"version-history":[{"count":3,"href":"https:\/\/www.tothenew.com\/blog\/wp-json\/wp\/v2\/posts\/72755\/revisions"}],"predecessor-version":[{"id":73135,"href":"https:\/\/www.tothenew.com\/blog\/wp-json\/wp\/v2\/posts\/72755\/revisions\/73135"}],"wp:attachment":[{"href":"https:\/\/www.tothenew.com\/blog\/wp-json\/wp\/v2\/media?paren
t=72755"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/www.tothenew.com\/blog\/wp-json\/wp\/v2\/categories?post=72755"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/www.tothenew.com\/blog\/wp-json\/wp\/v2\/tags?post=72755"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}