How are MySQL and Postgres ENUMs different?
We're exploring the spooky internals of how ENUMs are handled differently in MySQL and Postgres.
by Nick Marino
October 10, 2022 - 8 minutes read
Halloween’s approach means it's the spooky month, and there’s nothing more spooky than the horrors lurking behind everyday database features. The humble enum may seem friendly and innocent enough, but let’s peel back the wallpaper and take a look at what’s hiding underneath...
The concept of an enum is simple enough: database tables frequently contain string columns with a limited number of valid values, so storing that column as one of the standard text types is both risky and inefficient. It’s risky, because you could potentially insert an invalid value, and it’s inefficient because repeating the same string over and over again wastes unnecessary space. Different databases solve this problem in different ways, but generally speaking, the core idea is to use an alternative underlying representation so that the database can use smaller, more efficient values to track which enum value a given row might hold.
Before we dive in further though, it’s worth emphasizing that
ENUM is not an ANSI SQL type; while a number of different popular databases implement different versions of enum types, there is no underlying standard. As we often see at ReadySet, databases vary wildly in how they choose to represent underlying types. Let’s start unmasking some of these ghosts and see how MySQL and PostgreSQL represent enums - and how these horrors make enums more efficient for you as a developer.
MySQL enums normally look just like text values, but by adding a
CAST to a query, we can peek at the underlying representation:
mysql> CREATE TABLE enum_test(e ENUM('red', 'yellow', 'green')); mysql> INSERT INTO enum_test VALUES ('green'), ('red'), ('purple'); mysql> SELECT e, CAST(e AS UNSIGNED INTEGER) FROM enum_test; +-------+-----------------------------+ | e | cast(e as unsigned integer) | +-------+-----------------------------+ | green | 3 | | red | 1 | | | 0 | +-------+-----------------------------+
Ahhh! How did we get from strings to... integers?
MySQL takes a relatively simple approach for its enum representations; when you define an enum column, you specify a list of all the possible values, and then the actual enum values are just integer indexes into that list. Note that the indexes start at 1, because 0 is reserved for invalid enum values — okay, reserved values and one-based indexing are a little spooky, but all in all, not too bad right?
This approach has some definite upsides: it’s relatively easy to understand, and it’s quite efficient since integers are small and easy to work with. Sorting order is also easily and efficiently preserved, since the underlying integers naturally sort in the same order as the list of enum values. So what are the downsides? Let’s start with this example, straight from the MySQL docs:
mysql> CREATE TABLE t(numbers ENUM('0','1','2')); mysql> INSERT INTO t VALUES (2), ('2'), ('3'); mysql> SELECT * FROM t; +---------+ | numbers | +---------+ | 1 | | 2 | | 2 | +---------+
😱 What the heck is going on here? In this example the integer 2 matches the index of the second enum label, which has the string value ‘1’. Then the string ‘2’ matches the third enum label since it actually has the value ‘2’. But the real kicker is that third value: because ‘3’ doesn’t match any enum label, MySQL automatically tries casting it to an integer, succeeds, sees that the index 3 corresponds to the label ‘2’, so it goes ahead and inserts ‘2’ even though you said ‘3’.
...oookay so overall there are definitely some gotchas here, but if you can avoid doing weird stuff with number labels and implicit type casts like this, it’s actually not so bad.
There is another downside that’s more likely to bite you in practice: it’s not always possible to efficiently alter enum types once they’re created. If you want to add a new enum label to the middle of the list, then the list indexes of all the later values shift by one, and the whole table has to be locked and rewritten to match.
All that said, the MySQL enum model is efficient and easy to understand, and many of the gotchas have more to do with MySQL’s love of implicit casts than with the enum type itself. Still with us? Great, because we’re about to go full “don’t-open-that-door!” with...
Right off the bat, Postgres does things a little differently. Enum types must be created independently of any table; to use the type, you must refer back to the type name later on during table creation. Postgres is also sneakier about not revealing the underlying representation: if you try to cast an enum column to any sort of underlying representation, it simply doesn’t allow it. So what is Postgres doing behind the curtain? It turns out you can still see the underlying representation, even if you can’t directly use it:
postgres=# CREATE TYPE color AS ENUM ('red', 'yellow', 'green'); postgres=# SELECT * FROM pg_enum; oid | enumtypid | enumsortorder | enumlabel -------+-----------+---------------+----------- 16388 | 16386 | 1 | red 16390 | 16386 | 2 | yellow 16392 | 16386 | 3 | green
oid? What’s that? It doesn’t look like they’re simple indexes into a list like with MySQL. You also might notice that they’re all even values. Coincidence? Let’s take a look at the Postgres docs:
*The OIDs for pgenum rows follow a special rule: even-numbered OIDs are guaranteed to be ordered in the same way as the sort ordering of their enum type. That is, if two even OIDs belong to the same enum type, the smaller OID must have the smaller enumsortorder value. Odd-numbered OID values need bear no relationship to the sort order. *
Okay, so this seems pretty complex, but there is a method to the eldritch madness. The key here is that Postgres is doing a lot of work to actually allow you to alter enum types without having to lock and rewrite the underlying tables!
First, Postgres already stores sort order in the
pg_enum table - the
enumsortorder column is of type
REAL, so Postgres can (and does) add fractional or negative numbers as needed when adding new enum labels so that the desired sort order is maintained:
But, having to constantly refer back to the sort order in the
pg_enum table would make comparing/sorting on enum columns slower, so the “even OID” rule lets Postgres ignore the
pg_enum table in many cases; there’s no need to consult
pg_enum when comparing two enum values if they’re both even, so Postgres tries to always assign even OIDs. However, if the user tries to add too many new enum values in between existing ones, then assigning even OIDs in a way that preserves the desired order may be impossible, in which case odd OIDs offer an escape hatch that lets Postgres fall back on the ordering specified in
The tradeoffs of ENUM support
So there you have it; two great examples of the tricky tradeoffs involved in implementing enum types. MySQL has a simpler underlying representation that is easier to work with, but Postgres provides additional flexibility in altering enum types after they’re created.
Most of the time, as a MySQL or Postgres user, you don’t necessarily need to understand how things like enums are implemented, but there are times when understanding this stuff can help you to write more efficient code, as well as to more easily debug problems when things go wrong. For example, if you notice that a Postgres query sorting on an enum column is slower after an
ALTER TYPE, you could check for the presence of odd OIDs in
pg_enum; or, if you’re looking to add a new enum value to a MySQL table, you now understand why you can only efficiently add new values to the end of the list.
When we're building our SQL cache, ReadySet, we need to understand and interface with all of these various subtleties in order to efficiently maintain cache results with no effort on your part; replicating the behavior of the underlying database lets us incrementally update the cache values as the underlying data changes, thereby avoiding the need to constantly recompute your cached query results from scratch.
I hope you enjoyed this tour! If weird database behavior is the kind of thing you love thinking about, come work with us on building the next generation of caching - and if you’d like to learn more about how ReadySet caches your database with zero code changes on your part, check out the quick start in our documentation.