The Real Cost of Read Replicas (and When They Stop Paying Off)

Read replicas are the reflex fix for growing read traffic, but the cost scales linearly with no ceiling. Here's where replicas earn their keep, where they stop paying off, and how SQL-layer caching handles repetitive reads for a fraction of the spend.

Readyset Team

2026-09-19 · 7 min read

Read traffic climbs, the primary starts to sweat, and the fix everyone reaches for is the same one: spin up another read replica. It is the path of least resistance, it is well-documented, and it works often enough that it has become a reflex. The trouble is that the read replica cost shows up later, on the invoice, and in the on-call rotation. Every replica you add is a full copy of your database, running the same expensive queries against its own copy of the data, and that cost grows in a straight line with no ceiling as you add more replicas.

We are not here to argue that read replicas are a mistake. They solve real problems, and for some of those problems nothing else comes close. But you should know exactly which problem you are solving before you provision the next one, because past a certain point, replicas stop buying you what you think they are buying.

Where read replicas earn their keep

Replicas are a legitimate scaling tool, and there are workloads where they are the right call. If a separate process needs data immediately after it is written, synchronous replication can provide read-after-write consistency, while semi-synchronous replication narrows the window for stale reads without eliminating it. If your users are spread across continents, placing replicas closer to them cuts read latency in a way that a single regional cache cannot fully cover. For absorbing general read traffic off the primary in the early stages of growth, adding one or two replicas is a sensible, low-friction move.

So the question is not whether replicas belong in your architecture. It is what happens when the first replica is not enough, and you find yourself reaching for the third, the fourth, the fifth.

The real cost of read replicas at scale

A read replica distributes query load, but it does not reduce it. Every SELECT that lands on a replica still has to be executed against a complete copy of the data. You have spread the work across more machines, but the total amount of work the system does has not gone down. If anything, it has gone up, because now several copies of the data are independently recomputing the same results for the same hot queries.

That shows up in three places.

Cost scales linearly, and only linearly

Each replica needs storage and compute roughly equivalent to the primary, and provisioning one buys you roughly one replica's worth of additional read capacity. There is no economy of scale: the tenth replica costs about what the first one did and delivers about the same marginal capacity, while your monthly bill keeps climbing in a straight line.
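The straight-line claim can be made concrete with a small model. The sketch below assumes every node, primary included, costs the same and serves the same read capacity. The class name, prices, and capacities are placeholders for illustration, not figures from any provider.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ReplicaPlan:
    """Hypothetical inputs; replace with figures from your own invoice."""
    monthly_cost_per_node: float   # compute + storage for one full copy (USD)
    reads_per_sec_per_node: float  # read capacity a single node sustains


def fleet_cost(plan: ReplicaPlan, replicas: int) -> float:
    """Monthly cost of the primary plus `replicas` full copies."""
    return plan.monthly_cost_per_node * (1 + replicas)


def cost_per_read_capacity(plan: ReplicaPlan, replicas: int) -> float:
    """USD per month for each read/sec of capacity in the fleet."""
    capacity = plan.reads_per_sec_per_node * (1 + replicas)
    return fleet_cost(plan, replicas) / capacity


# Placeholder numbers chosen only to show the shape of the curve.
plan = ReplicaPlan(monthly_cost_per_node=1000.0, reads_per_sec_per_node=5000.0)
for n in (1, 5, 10):
    print(n, fleet_cost(plan, n), cost_per_read_capacity(plan, n))
```

Under these assumptions the total bill rises by the same amount with every replica, while the cost per unit of read capacity never improves. That is the "no economy of scale" point in numeric form.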

Replication lag becomes a permanent operational concern

Asynchronous replication means replicas trail the primary, and under heavy write pressure that lag can stretch from milliseconds into seconds. Now you are reasoning about which reads can tolerate stale data, building fallback paths to the primary for the ones that cannot, and watching those fallbacks quietly add load back onto the very database you were trying to protect.
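The fallback pattern described above can be sketched as a routing decision made per read. This is a minimal illustration, not a production router: the function names, the per-read staleness tolerance, and the sample values are all assumptions, and a real system would measure replica lag from the database rather than pass it in.

```python
from dataclasses import dataclass


@dataclass
class RoutingStats:
    replica_reads: int = 0
    primary_fallbacks: int = 0


def choose_target(replica_lag_s: float, max_staleness_s: float) -> str:
    """Send the read to the replica only if its lag is within tolerance."""
    return "replica" if replica_lag_s <= max_staleness_s else "primary"


def route_reads(tolerances, replica_lag_s: float) -> RoutingStats:
    """`tolerances` holds each read's acceptable staleness, in seconds."""
    stats = RoutingStats()
    for max_staleness_s in tolerances:
        if choose_target(replica_lag_s, max_staleness_s) == "replica":
            stats.replica_reads += 1
        else:
            stats.primary_fallbacks += 1
    return stats


# Hypothetical mix: two reads tolerate 5 s of staleness, three only 0.1 s.
tolerances = [5.0, 5.0, 0.1, 0.1, 0.1]
print(route_reads(tolerances, replica_lag_s=0.05))  # low lag
print(route_reads(tolerances, replica_lag_s=2.0))   # lag under write pressure
```

With low lag every read stays on the replica. Once lag grows past the strict tolerance, the strict reads fall back to the primary, which is exactly the moment the primary is already busiest with writes.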

The operational surface keeps growing

Every replica adds connections to manage, a failover path to reason about, and version skew to coordinate during upgrades. Schema changes turn into carefully choreographed events across the fleet. When the same query runs in five places, each with its own state and performance profile, debugging means checking five places instead of one.

None of this means replicas are bad, but they are an expensive way to scale repetitive reads, and repetitive reads are exactly what high-traffic applications are made of. In many production systems, a handful of repeated queries accounts for a disproportionate share of database CPU time. Buying more full database copies to keep re-executing identical work is a lot of infrastructure for a problem that is really about redundant computation.

There are practical ceilings, and some teams hit them. The Brazilian credit intelligence company Lemit ran a heavily tuned Percona MySQL setup on 256 virtual processors and 1.4 TB of RAM, with async read replicas and in-memory tables layered on top, and still plateaued at roughly 13,000 SELECTs per second hitting the database directly. Reads were competing with writes for the same buffer pool and CPU no matter how much hardware they added. More replicas would not have moved that number, because the bottleneck was not capacity. It was the same reads being executed again and again.

Read replicas vs caching: a different way to handle read load

If the underlying issue is that the same expensive queries keep getting executed, the more direct fix is to stop executing them every time. That is what caching at the SQL layer does. Instead of redistributing the query work across more copies of the database, it removes the work for the queries that repeat, serving precomputed query results from memory and falling back to the database for everything else.

The historical catch with caching has been the maintenance tax: deciding what to cache, designing key schemas, writing invalidation logic, tuning TTLs, and defending against stampedes when a popular key expires. That tax can be intense, and it is why a lot of teams default to replicas instead. The good news? It is also the part that has changed. A SQL-layer cache that watches the database's own replication stream can keep cached results current automatically, incrementally updating cached query results as underlying data changes rather than recomputing from scratch or waiting for a TTL to lapse. No cache keys to manage, no invalidation logic to maintain, and freshness bounded by replication lag, similar to the consistency model of a read replica.

The practical difference is in what each approach does to the work. A replica answers "run this query again, somewhere else." A cache answers "you already computed this, here it is." For the stable, high-frequency queries that dominate read load, the second answer is faster and dramatically cheaper, because the most reliable way to make a query cost less is to not run it. In Readyset's case, cached query results are maintained incrementally as data changes, so the system updates what changed instead of rerunning the entire query.
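To make "update what changed instead of rerunning the entire query" concrete, here is a toy illustration of incremental maintenance for a single aggregate. It is not Readyset's implementation. The `issues` table, the `project_id` column, and the shape of the change events are hypothetical stand-ins for whatever a replication stream would deliver.

```python
from collections import defaultdict


class IssueCountCache:
    """Toy cache for: SELECT project_id, COUNT(*) FROM issues GROUP BY project_id."""

    def __init__(self, initial_rows):
        # Compute the result once, from the rows as they exist right now.
        self.counts = defaultdict(int)
        for row in initial_rows:
            self.counts[row["project_id"]] += 1

    def apply_change(self, event):
        """Adjust only the affected counts for one row-level change event."""
        if event["op"] == "insert":
            self.counts[event["row"]["project_id"]] += 1
        elif event["op"] == "delete":
            self.counts[event["row"]["project_id"]] -= 1
        elif event["op"] == "update":
            self.counts[event["old"]["project_id"]] -= 1
            self.counts[event["new"]["project_id"]] += 1

    def read(self, project_id):
        """Serve the cached count; no query is executed on this path."""
        return self.counts.get(project_id, 0)


cache = IssueCountCache([{"project_id": 1}, {"project_id": 1}, {"project_id": 2}])
cache.apply_change({"op": "insert", "row": {"project_id": 2}})
cache.apply_change({"op": "update", "old": {"project_id": 1}, "new": {"project_id": 3}})
print(cache.read(1), cache.read(2), cache.read(3))
```

Each change event costs a constant amount of work regardless of table size, and reads never trigger a recount. A real system has to handle joins, filters, and ordering as well, which is where most of the engineering effort goes.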

That difference shows up directly on the bill. In a field test putting Readyset in front of a real GitLab instance on Azure Database for PostgreSQL, the cacheable read queries (aggregate issue counts, project lists, activity feeds) ran 5x to 26x faster from cache. The more interesting result was on the cost side: with those reads served from memory instead of the database, the Postgres tier could drop from an 8-vCore instance to a 2-vCore one at the same throughput target. That moved the monthly compute bill from $593 to $226, including the cost of the Readyset VM itself, a 62 percent reduction, or about $4,400 a year. The savings did not come from a cheaper license. They came from the database no longer having to execute the reads at all.
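The percentages follow directly from the two monthly figures quoted above. A quick check of the arithmetic, using only those two numbers:

```python
before_monthly = 593  # USD per month, before (8-vCore Postgres tier)
after_monthly = 226   # USD per month, after (2-vCore tier plus the Readyset VM)

monthly_saving = before_monthly - after_monthly
reduction_pct = 100 * monthly_saving / before_monthly
annual_saving = 12 * monthly_saving

print(monthly_saving)        # 367
print(round(reduction_pct))  # 62
print(annual_saving)         # 4404
```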

The same logic plays out at the top end of the scale. Lemit, the team stuck at 13,000 reads per second against fully maxed-out hardware, put Readyset in front of MySQL and pushed the stack past 109,000 queries per second, more than an 8x increase in read throughput, with cached reads served in under a millisecond. The number of application code changes required to get there was zero. No cache keys, no invalidation code, and no query rewrites. The primary was finally free to do the writes only it could do, while the repetitive reads stopped touching it.

| | Adding a read replica | Caching at the SQL layer |
| --- | --- | --- |
| What it does to the work | Redistributes it. Same query, run again elsewhere. | Removes it for repeating queries. Results served from memory. |
| How cost scales | Linearly. Each replica is a full DB copy with its own storage and compute. | With the size of the working set you cache, not with request volume. |
| Freshness model | Async replication; lag can grow under write pressure. | Kept current from the replication stream; no TTLs to tune. |
| Best fit | Read-after-write consistency, geographic distribution, early general offload. | High-frequency, stable, repetitive reads that dominate the load. |
| Operational surface | Grows with the fleet: connections, failover, schema coordination. | One component in front of the database; misses fall back cleanly. |

Before you provision another replica, audit your read load

Before adding another replica or a caching layer, it helps to know which queries are actually driving your read load. The answer is usually lopsided: a handful of queries doing most of the damage. You can find them with pg_stat_statements in Postgres or the Performance Schema in MySQL, ranking by total execution time and call count rather than by the slowest single run. If you want a faster read on a whole fleet, rdst is a free CLI that audits your databases and surfaces the repetitive, cacheable queries for you, no commitment required.
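As a rough sketch of that ranking, the snippet below pairs a `pg_stat_statements` query with a small helper that orders statements by total execution time and shows how quickly the cumulative share adds up. The SQL assumes PostgreSQL 13 or later with the extension enabled (older versions name the column `total_time`). The helper name and the sample rows are hypothetical, so substitute the rows your own database returns.

```python
# Rough filter for reads; statements that start with WITH or a comment are missed.
TOP_QUERIES_SQL = """
SELECT query, calls, total_exec_time
FROM pg_stat_statements
WHERE query ILIKE 'select%'
ORDER BY total_exec_time DESC
LIMIT 50;
"""


def rank_by_total_time(rows):
    """Order (query, calls, total_ms) rows by total time; add cumulative share."""
    ordered = sorted(rows, key=lambda r: r[2], reverse=True)
    grand_total = sum(r[2] for r in ordered)
    ranked, running = [], 0.0
    for query, calls, total_ms in ordered:
        running += total_ms
        ranked.append({
            "query": query,
            "calls": calls,
            "total_ms": total_ms,
            "cumulative_share": running / grand_total if grand_total else 0.0,
        })
    return ranked


# Hypothetical rows standing in for the result of TOP_QUERIES_SQL.
sample = [
    ("SELECT id FROM reports WHERE month = $1", 12, 40_000.0),
    ("SELECT count(*) FROM issues WHERE project_id = $1", 900_000, 720_000.0),
    ("SELECT id FROM audit_log WHERE id = $1", 1_200, 60_000.0),
    ("SELECT id, name FROM projects WHERE owner_id = $1", 400_000, 180_000.0),
]
for row in rank_by_total_time(sample):
    print(f'{row["cumulative_share"]:.0%}  {row["calls"]:>8}  {row["query"]}')
```

Statements that combine a high call count with a large share of total time are the likeliest caching candidates. A slow query that runs a dozen times a month rarely justifies new infrastructure of either kind.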

If that audit turns up what it usually turns up, a small set of stable queries carrying most of the read traffic, then SQL-layer caching is worth a look before the next replica. Readyset caches those queries and keeps them fresh off your replication stream, with no application code changes, so the database can shrink instead of multiply.

For a deeper walk through the economics, "Why Query Caching Is the Most Cost-Effective Way to Scale Databases" goes line by line on where replica costs come from. The full GitLab on Azure field test and the Lemit 100K QPS case study have the complete numbers behind the two examples above.

© 2026 Readyset. All rights reserved.