How are MySQL and Postgres ENUMs different?
The Hidden Complexities of ENUM Data Types
Halloween’s approach means it's the spooky month, and there’s nothing more spooky than the horrors lurking behind everyday database features. The humble enum may seem friendly and innocent enough, but let’s peel back the wallpaper and take a look at what’s hiding underneath...
The concept of an enum is simple enough: database tables frequently contain string columns with a limited number of valid values, so storing that column as one of the standard text types is both risky and inefficient. It’s risky, because you could potentially insert an invalid value, and it’s inefficient because repeating the same string over and over again wastes unnecessary space. Different databases solve this problem in different ways, but generally speaking, the core idea is to use an alternative underlying representation so that the database can use smaller, more efficient values to track which enum value a given row might hold.
The Basics of ENUMs Across Databases
Before we dive in further though, it’s worth emphasizing that ENUM
is not an ANSI SQL type; while a number of different popular databases implement different versions of enum types, there is no underlying standard. As we often see at ReadySet, databases vary wildly in how they choose to represent underlying types. Let’s start unmasking some of these ghosts and see how MySQL and PostgreSQL represent enums - and how these horrors make enums more efficient for you as a developer.
MySQL ENUMs
A Closer Look at MySQL’s ENUM Implementation
MySQL enums normally look just like text values, but by adding a CAST
to a query, we can peek at the underlying representation:
mysql> CREATE TABLE enum_test(e ENUM('red', 'yellow', 'green'));
mysql> INSERT INTO enum_test VALUES ('green'), ('red'), ('purple');
mysql> SELECT e, CAST(e AS UNSIGNED INTEGER) FROM enum_test;
+-------+-----------------------------+
| e | cast(e as unsigned integer) |
+-------+-----------------------------+
| green | 3 |
| red | 1 |
| | 0 |
+-------+-----------------------------+
/* Import CSS reset and base styles */
@import "global.css";
@import "prism.css";
Ahhh! How did we get from strings to... integers?
MySQL takes a relatively simple approach for its enum representations; when you define an enum column, you specify a list of all the possible values, and then the actual enum values are just integer indexes into that list. Note that the indexes start at 1, because 0 is reserved for invalid enum values — okay, reserved values and one-based indexing are a little spooky, but all in all, not too bad right?
Pros and Cons of MySQL ENUMs
This approach has some definite upsides: it’s relatively easy to understand, and it’s quite efficient since integers are small and easy to work with. Sorting order is also easily and efficiently preserved, since the underlying integers naturally sort in the same order as the list of enum values. So what are the downsides? Let’s start with this example, straight from the MySQL docs:
mysql> CREATE TABLE t(numbers ENUM('0','1','2'));
mysql> INSERT INTO t VALUES (2), ('2'), ('3');
mysql> SELECT * FROM t;
+---------+
| numbers |
+---------+
| 1 |
| 2 |
| 2 |
+---------+
😱 What the heck is going on here? In this example the integer 2 matches the index of the second enum label, which has the string value ‘1’. Then the string ‘2’ matches the third enum label since it actually has the value ‘2’. But the real kicker is that third value: because ‘3’ doesn’t match any enum label, MySQL automatically tries casting it to an integer, succeeds, sees that the index 3 corresponds to the label ‘2’, so it goes ahead and inserts ‘2’ even though you said ‘3’.
...oookay so overall there are definitely some gotchas here, but if you can avoid doing weird stuff with number labels and implicit type casts like this, it’s actually not so bad.
There is another downside that’s more likely to bite you in practice: it’s not always possible to efficiently alter enum types once they’re created. If you want to add a new enum label to the middle of the list, then the list indexes of all the later values shift by one, and the whole table has to be locked and rewritten to match.
All that said, the MySQL enum model is efficient and easy to understand, and many of the gotchas have more to do with MySQL’s love of implicit casts than with the enum type itself. Still with us? Great, because we’re about to go full “don’t-open-that-door!” with...
PostgreSQL ENUMs
The Unique Approach of PostgreSQL ENUMs
Right off the bat, Postgres does things a little differently. Enum types must be created independently of any table; to use the type, you must refer back to the type name later on during table creation. Postgres is also sneakier about not revealing the underlying representation: if you try to cast an enum column to any sort of underlying representation, it simply doesn’t allow it. So what is Postgres doing behind the curtain? It turns out you can still see the underlying representation, even if you can’t directly use it:
postgres=# CREATE TYPE color AS ENUM ('red', 'yellow', 'green');
postgres=# SELECT * FROM pg_enum;
oid | enumtypid | enumsortorder | enumlabel
-------+-----------+---------------+-----------
16388 | 16386 | 1 | red
16390 | 16386 | 2 | yellow
16392 | 16386 | 3 | green
Hmm, oid
? What’s that? It doesn’t look like they’re simple indexes into a list like with MySQL. You also might notice that they’re all even values. Coincidence? Let’s take a look at the Postgres docs:
The OIDs for pgenum rows follow a special rule: even-numbered OIDs are guaranteed to be ordered in the same way as the sort ordering of their enum type. That is, if two even OIDs belong to the same enum type, the smaller OID must have the smaller enumsortorder value. Odd-numbered OID values need bear no relationship to the sort order.
Flexibility and Trade-offs with PostgreSQL ENUMs
Okay, so this seems pretty complex, but there is a method to the eldritch madness. The key here is that Postgres is doing a lot of work to actually allow you to alter enum types without having to lock and rewrite the underlying tables!
First, Postgres already stores sort order in the pg_enum
table - the enumsortorder
column is of type REAL
, so Postgres can (and does) add fractional or negative numbers as needed when adding new enum labels so that the desired sort order is maintained:
But, having to constantly refer back to the sort order in the pg_enum
table would make comparing/sorting on enum columns slower, so the “even OID” rule lets Postgres ignore the pg_enum
table in many cases; there’s no need to consult pg_enum
when comparing two enum values if they’re both even, so Postgres tries to always assign even OIDs. However, if the user tries to add too many new enum values in between existing ones, then assigning even OIDs in a way that preserves the desired order may be impossible, in which case odd OIDs offer an escape hatch that lets Postgres fall back on the ordering specified in pg_enum
.
The tradeoffs of ENUM support
Comparing ENUM Implementations: MySQL vs PostgreSQL
So there you have it; two great examples of the tricky tradeoffs involved in implementing enum types. MySQL has a simpler underlying representation that is easier to work with, but Postgres provides additional flexibility in altering enum types after they’re created.
Most of the time, as a MySQL or Postgres user, you don’t necessarily need to understand how things like enums are implemented, but there are times when understanding this stuff can help you to write more efficient code, as well as to more easily debug problems when things go wrong. For example, if you notice that a Postgres query sorting on an enum column is slower after an ALTER TYPE
, you could check for the presence of odd OIDs in pg_enum
; or, if you’re looking to add a new enum value to a MySQL table, you now understand why you can only efficiently add new values to the end of the list.
When we're building our SQL cache, ReadySet, we need to understand and interface with all of these various subtleties in order to efficiently maintain cache results with no effort on your part; replicating the behavior of the underlying database lets us incrementally update the cache values as the underlying data changes, thereby avoiding the need to constantly recompute your cached query results from scratch.
About Readyset: Performant SQL Caching Engine
Readyset is a SQL caching engine that helps you build performant, real-time applications without any code changes or switching databases. We're obsessed with making the cache we've always wanted as developers. Start caching queries today with Readyset Cloud, or self-host our open-source product, Readyset Core.