Why LLMs Write Incorrect SQL (and What That Means for Your Database)

Most LLM-generated SQL doesn't fail. It runs and returns results, and that's exactly what makes it dangerous. The errors don't surface until they're already in your data. AI-assisted development has made it easier than ever to generate SQL quickly. The problem is that quick and correct are not the same thing, and with databases, the gap between the two has consequences that compound quietly over time. SQL that runs is not the same as SQL that works. LLMs are good at producing SQL that looks ri

Readyset

2026-04-23 · 5 min read

Most LLM-generated SQL doesn't fail. It runs and returns results, and that's exactly what makes it dangerous. The errors don't surface until they're already in your data.

AI-assisted development has made it easier than ever to generate SQL quickly. The problem is that quick and correct are not the same thing, and with databases, the gap between the two has consequences that compound quietly over time. SQL that runs is not the same as SQL that works.

LLMs are good at producing SQL that looks right. It parses, it runs, and it returns results. What it frequently gets wrong is the logic underneath: incorrect joins that multiply rows, aggregations applied to the wrong grouping level, missing WHERE clauses that quietly scan entire tables, or filter conditions that return plausible but semantically wrong results.

-- LLM-generated to calculate total order value per customer
SELECT c.name, SUM(oi.unit_price * oi.quantity) AS total_spent
FROM customers c
JOIN orders o ON c.id = o.customer_id
JOIN order_items oi ON o.id = oi.order_id
WHERE o.status = 'completed'
GROUP BY c.name;

This looks correct. But in this particular schema, unit_price reflects the price at time of listing, not time of purchase, the actual price paid is stored in oi.price_paid. The LLM picked the more obviously named column. The query runs, returns numbers, and passes code review.

These are not bugs that fail loudly. They fail silently, and in most production environments, the check happens much later than it should.

The Harder Failure Mode Is Not Hallucination

A hallucinated column or fabricated table name will throw an error. That is loud and catchable. More dangerous is SQL that executes but does the wrong thing.

-- LLM-generated: monthly active users for a SaaS dashboard
SELECT DATE_FORMAT(created_at, '%Y-%m') AS month,
       COUNT(DISTINCT user_id) AS active_users
FROM events
WHERE event_type = 'page_view'
GROUP BY month
ORDER BY month DESC;

The query is syntactically correct and the logic looks reasonable. The problem is that in this schema, events includes both internal staff activity and customer activity, distinguished by a is_internal flag. The LLM had no way to know that. MAU numbers are inflated by 15-20% and the error goes unnoticed until someone checks against a separate analytics system.

A GROUP BY that aggregates at the wrong granularity. A WHERE clause that pulls ten times more rows than it should.

-- LLM-generated: find users by email
SELECT * FROM users WHERE email = 'user@example.com';

On a utf8mb4_general_ci collation, this query also returns User@Example.COM and USER@EXAMPLE.COM, the collation is case-insensitive by default. The LLM generated a correct-looking query, but in a system where email is used as an authentication identifier, this can surface duplicate accounts or allow access to the wrong user's data. The model had no visibility into the collation setting.

A human engineer would add a date filter by default. Without it, this scans the entire table on every execution, returning correct results, just at full cost, repeatedly, at scale.

These queries hit your database repeatedly at full execution cost, and unoptimized LLM-generated SQL tends toward full table scans and broader joins than a human engineer would write.

The scale problem is real. In a high-read production environment, a single bad query pattern executing thousands of times per minute compounds fast. If your team has already seen what query overload does to a primary database, AI-generated SQL introduces a new and harder-to-trace source of that same pressure.

Production Schemas Are Not Benchmark Schemas

LLMs are trained on public schemas and benchmark datasets. Those schemas are clean, well-documented, and designed to be readable. Production schemas are none of those things: ambiguous column names, denormalized structures, business logic baked into table design, columns that mean different things depending on context.

The further your schema is from what the model was trained on, the more likely it is to generate queries that are structurally valid but logically wrong. This is not a problem that better prompting alone solves. It is a fundamental mismatch between how LLMs are trained and how real databases are built.

Schema complexity is one of the four documented failure modes for LLM-generated SQL, alongside faulty joins, incorrect aggregations, and missing filters. Each of these failure modes has a direct impact on database load, not just result correctness.

The Correctness Bar for SQL Is Higher Than for Most Code

A wrong function in application code often produces a visible error or a failed test. A wrong SQL query often produces a result set that is plausible enough to go unquestioned.

Think about what it means for an analytics query to return the wrong number. Or for a dashboard to display metrics calculated on a join that multiplies rows. The output looks like data. It behaves like data. But it is wrong, and it informs decisions accordingly. In database contexts, where performance and data correctness are tightly coupled, that is a risk that compounds over time.

What the Teams Getting This Right Are Doing

Teams shipping AI-generated SQL to production are beginning to treat query review the way they treat code review. A few patterns are emerging consistently:

Validation layers that execute generated SQL in staging environments before production use, catching execution errors and result-set anomalies before they reach real traffic
Execution plan checks that catch full table scans or missing index usage before queries reach production load, where the cost multiplies across every request
Replay testing against known result sets to catch semantic errors that syntax checks miss entirely
Schema context injection at prompt time to reduce hallucination rates on unfamiliar table structures, giving the model a better map of what actually exists

The tooling is still maturing, but the pattern is consistent: AI-generated SQL needs a review layer, not just a trust layer.

What This Means for Your Database Infrastructure

Incorrect or unoptimized queries do not just risk wrong results. They land on your database at full execution cost, repeatedly, often before anyone notices. The teams who get ahead of this are not working harder on query debugging. They have built enough visibility into their database layer to surface bad patterns early.

One practical consequence: as AI-generated queries become a larger share of production traffic, the read load hitting your primary database becomes less predictable. Your database is going to get hit with queries you did not write. The question is whether you catch them before they become incidents.

Readyset's rdst is built exactly for this. It gives you a safe way to analyze, understand, and validate queries against your real production database before problems surface at scale. As AI-generated queries become a larger share of your traffic, rdst helps you catch what is wrong with them early, during development, before they become platform incidents. Teams using rdst are not just moving faster. They are not letting the database be the reason they slow down.

Your database is going to get hit with queries you didn't write. See how Readyset keeps it fast.

When Query Optimization Is Not Enough: Solving MySQL Overload with Caching

What Makes SQL Query Caching Hard

How ClickFunnels Overcame Database Scaling Challenges with Readyset

Want to see Readyset in action?

Book a demo and see how Readyset can accelerate your database.

The Real Cost of Read Replicas (and When They Stop Paying Off)

Read replicas are the reflex fix for growing read traffic, but the cost scales linearly with no ceiling. Here's where replicas earn their keep, where they stop paying off, and how SQL-layer caching handles repetitive reads for a fraction of the spend.

Readyset Team

2026-07-19·7 min read

AI & Databases

Vibe Coding a High-Performance App with Readyset QueryPilot

In this blog post, we are going to use "Vibe Coding" to build an application and explore Readyset QueryPilot, a tool designed to automatically analyze and cache queries in a MySQL workload. The goal is to prove that even for automated queries with no human interaction, QueryPilot can automate query caching and improve performance. QueryPilot automatically identifies which queries should be cached and recommends the appropriate strategy, either deep caching or shallow caching. This automation de

Vinicius Grippa

2026-02-24·6 min read

AI & Databases

The Database Is About to Lose Its Last Line of Defense

For a long time, most serious database incidents shared a common root cause: a human made a mistake. It could be a poorly written query that caused a spike or a permission was too broad or maybe a review didn’t happen. Database security models, operational processes, and tooling were all built around that assumption. Humans were the risk, and humans were also the control. That assumption is starting to break down. Today, more and more SQL is generated without a human author. What used to be 1

Tanmay Sinha

2026-01-27·4 min read

Still scaling the hard way?

Modern applications demand instant performance, even under unpredictable load. Readyset helps you eliminate slow queries, stabilize latency, and scale confidently.

Revolutionize your database performance with Readyset

Serve requests at sub-millisecond latencies with the modern database scaling and query caching system for MySQL and PostgreSQL.

Product

Readyset Cloud Readyset Private Readyset Community QueryPilot rdst CLI Pricing

Resources

Blog Case Studies Company Documentation

Legal

Privacy Policy Terms of Service

Join our newsletter

Stay updated with the latest news, insights, and developments from Readyset — straight to your inbox.