MySQL

Replication Internals: Decoding the MySQL Binary Log Part 7: TABLE_MAP_EVENT – Table Metadata for Row-Based Replication

11 min read

about 2 months ago

Replication Internals: Decoding the MySQL Binary Log Part 7: TABLE_MAP_EVENT – Table Metadata for Row-Based Replication

In this seventh post of our series, we decode the TABLE_MAP_EVENT — the event that maps a numeric table ID to a database/table name and the column layout that the row events immediately following it will reference.


Introduction

The TABLE_MAP_EVENT (event type 19, 0x13) is essential for row-based replication. It appears before any row event (INSERT, UPDATE, DELETE) and provides:

  • Table identification: Database name, table name, and a numeric table ID
  • Column metadata: Number of columns and their types
  • Nullability: Which columns can contain NULL values

Row events reference the table by its numeric ID, so the TABLE_MAP_EVENT must be parsed first to understand the row data that follows.


Event Location

In our binary log, TABLE_MAP_EVENTs appear before each set of row events:

Position 537: QUERY_EVENT - BEGIN Position 620: TABLE_MAP_EVENT (68 bytes) ← Before INSERT Position 688: WRITE_ROWS_EVENT Position 737: XID_EVENT Position 847: QUERY_EVENT - BEGIN Position 939: TABLE_MAP_EVENT (68 bytes) ← Before UPDATE Position 1007: UPDATE_ROWS_EVENT Position 1079: XID_EVENT Position 1189: QUERY_EVENT - BEGIN Position 1272: TABLE_MAP_EVENT (68 bytes) ← Before DELETE Position 1340: DELETE_ROWS_EVENT Position 1397: XID_EVENT

Let's decode the first one at position 620.


Reading the Raw Bytes

$ xxd -s 620 -l 68 binlog.000024 0000026c: 3210 3568 1301 0000 0044 0000 00b0 0200 2.5h.....D...... 0000027c: 0000 005f 0000 0000 0001 000c 7072 6573 ..._........pres 0000028c: 656e 7461 7469 6f6e 0006 7065 7273 6f6e entation..person 0000029c: 0002 030f 0258 0202 0101 0002 03fc ff00 .....X.......... 000002ac: fba8 d0d8 ....

The event is 68 bytes: 19-byte header + 8-byte post-header + 37-byte payload + 4-byte checksum.


Common Header (19 bytes)

From Part 2, we know that every binary log event starts with the same 19-byte common header. Here it is for our event at position 620:

32103568 13 01000000 44000000 b0020000 0000 │ │ │ │ │ │ │ │ │ │ │ └─→ Flags: 0x0000 │ │ │ │ └───────────→ Next Position: 688 │ │ │ └────────────────────→ Event Size: 68 bytes │ │ └─────────────────────────────→ Server ID: 1 │ └────────────────────────────────→ Event Type: 19 (TABLE_MAP_EVENT) └─────────────────────────────────────────→ Timestamp: 1748308018

Quick cross-check: 620 + 68 = 688, which matches the Next Position field. ✓


TABLE_MAP_EVENT Structure

The layout is defined by Table_map_event in libs/mysql/binlog/event/rows_event.h and read by its constructor in libs/mysql/binlog/event/rows_event.cpp. The post-header is fixed-size (TABLE_MAP_HEADER_LEN = 8, with TM_MAPID_OFFSET = 0 and TM_FLAGS_OFFSET = 6); the body that follows is variable.

FieldSizeDescription
Post-Header (8 bytes)
m_table_id6 bytesNumeric identifier for the table (little-endian, 48-bit)
m_flags2 bytesBit flags — see TM_*_F constants in sql/log_event.h
Body
m_dblenPacked intLength of database name
m_dbnamm_dblen + 1Database name + trailing \0
m_tbllenPacked intLength of table name
m_tblnamm_tbllen + 1Table name + trailing \0
m_colcntPacked intNumber of columns
m_coltypem_colcnt bytesOne MySQL type code per column
m_field_metadata_sizePacked intLength of the field metadata block
m_field_metadataVariable (≤ 4 bytes per column)Per-column type-specific metadata
m_null_bits(m_colcnt + 7) / 8 bytesLSB-first bit per column; 1 = nullable
m_optional_metadataVariableTLV-encoded extended column information

Field-by-Field Decoding

Post-Header (8 bytes)

5f0000000000 0100 │ │ │ └─→ m_flags: 0x0001 └──────────────→ m_table_id: 0x000000000005f = 95

The post-header is exactly 6 bytes of m_table_id followed by 2 bytes of m_flags — no padding. The constructor in rows_event.cpp reads the table id with reader.read<uint64_t>(6) (note the explicit length of 6), then reads m_flags with reader.read<uint16_t>().

Table ID

5f 00 00 00 00 00 → 0x000000000005f = 95

The numeric handle for the table. This same 95 will appear in the post-header of every row event (WRITE_ROWS_EVENTUPDATE_ROWS_EVENTDELETE_ROWS_EVENT) that follows in this transaction, telling the applier which TABLE_MAP_EVENT it should use to interpret the row payload.

Flags

01 00 → 0x0001 = TM_BIT_LEN_EXACT_F

m_flags is a 16-bit field. The values are defined on Table_map_log_event in sql/log_event.h:2471-2481:

MaskConstantMeaning
0x0001TM_BIT_LEN_EXACT_FAlways set when MySQL writes the event
0x0002TM_REFERRED_FK_DB_FTable is referenced by a foreign-key constraint in another schema
0x0004TM_GENERATED_INVISIBLE_PK_FTable has a server-generated invisible primary key

Our event has just TM_BIT_LEN_EXACT_F set, which is the normal case. ✓

Database Name

0c 70726573656e746174696f6e 00 │ │ │ │ │ └─→ Null terminator │ └─→ "presentation" (12 bytes) └─→ Length: 12

Database: "presentation"

Table Name

06 706572736f6e 00 │ │ │ │ │ └─→ Null terminator │ └─→ "person" (6 bytes) └─→ Length: 6

Table: "person"

Column Count

02

This is a net_field_length_ll packed integer (the same length encoding introduced in Part 1). Since 0x02 < 251, the value is a single byte: 2 columns.

Column Types (2 bytes, one per column)

03 0f │ │ │ └─→ Column 2: 0x0f = MYSQL_TYPE_VARCHAR (15) └────→ Column 1: 0x03 = MYSQL_TYPE_LONG (3) = INT

Our table has:

  • Column 1: ID INT → Type 3 (MYSQL_TYPE_LONG)
  • Column 2: name VARCHAR(150) → Type 15 (MYSQL_TYPE_VARCHAR)

Metadata Length

02

The metadata block is 2 bytes long.

Column Metadata

58 02

Each column writes 0–4 bytes of metadata, depending on its type. The encoding lives in each Field subclass's do_save_field_metadata() in sql/field.cc. For our two columns:

  • MYSQL_TYPE_LONG (INT, 0x03) — 0 bytes. Plain integers carry no metadata.
  • MYSQL_TYPE_VARCHAR (0x0f) — 2 bytes. Field_varstring::do_save_field_metadata() simply does int2store(metadata_ptr, field_length), where field_length is the column's max length in bytes (not characters).

Reading 58 02 as little-endian: 0x0258 = 600. For name VARCHAR(150) declared in a utf8mb4 schema, MySQL reserves 4 bytes per character, so field_length = 150 × 4 = 600. The same column declared as VARCHAR(150) CHARACTER SET latin1 would store 150 here instead — the metadata is always max bytes, never max characters.

Null Bitmap

02

The null bitmap tells us which columns of the table are allowed to hold NULL. It uses one bit per column, packed into as few bytes as possible.

Since a byte holds 8 bits, a single byte can describe up to 8 columns. The moment you have a 9th column you need a second byte (even though only 1 bit of it is actually used), the 17th column triggers a third byte, and so on. In other words, you take the number of columns, divide by 8, and round up to the next whole byte:

bytes needed = ceil(number_of_columns / 8)

Our table has 2 columns, so (2 + 7) / 8 = 1 byte.

Bits inside each byte are read starting from the least significant bit (the rightmost one when you write the byte in binary): the first column sits at bit 0, the second column at bit 1, the third at bit 2, and so on. A bit set to 1 means that column is nullable; a bit set to 0 means it was declared NOT NULL.

Our byte is 0x02, which is 0000 0010 in binary:

02 = 0000 0010 ││ │└─→ Column 1 (ID): bit 0 = 0 → NOT NULL └──→ Column 2 (name): bit 1 = 1 → NULLABLE

That matches the DDL exactly: ID is the PRIMARY KEY (so implicitly NOT NULL), and name was declared DEFAULT NULL.

One important distinction worth calling out: this bitmap only records whether each column can hold NULL — it describes the schema, not the data. Whether a given row actually contains a NULL value for a particular column is answered by a second, per-row null bitmap that lives inside each row of a WRITE_ROWS_EVENT / UPDATE_ROWS_EVENT / DELETE_ROWS_EVENT. We'll meet that one in Part 8.

If you want to see the exact bit-extraction pattern used on the read side, it's in sql/log_event.cc:11735.

Optional Metadata

01 01 00 02 03 fc ff 00

After m_null_bits, the rest of the body (everything up to the CRC32) is the optional metadata block. Each entry is a Type-Length-Value triple:

[type: 1 byte][length: packed int][value: <length> bytes]

The full set of types is the Optional_metadata_field_type enum in rows_event.h:548-573:

CodeTypePurpose
1SIGNEDNESSSign bit for each numeric column
2DEFAULT_CHARSETDefault charset + per-column overrides
3COLUMN_CHARSETOne charset per string column
4COLUMN_NAMEColumn names
5SET_STR_VALUEString values for SET columns
6ENUM_STR_VALUEString values for ENUM columns
7GEOMETRY_TYPEConcrete geometry subtype
8SIMPLE_PRIMARY_KEYPK column indices
9PRIMARY_KEY_WITH_PREFIXPK columns with prefix lengths
10ENUM_AND_SET_DEFAULT_CHARSETDefault charset for ENUM/SET
11ENUM_AND_SET_COLUMN_CHARSETPer-column charset for ENUM/SET
12COLUMN_VISIBILITYVisible/invisible flag per column
13VECTOR_DIMENSIONALITYDimensionality of VECTOR columns

Which entries appear depends on the table definition and on binlog_row_metadata (MINIMAL vs. FULL). Our 8 bytes contain two entries:

Entry 1 — SIGNEDNESS

01 01 00 │ │ │ │ │ └─→ value: 0x00 │ └─────→ length: 1 byte └────────→ type: 0x01 = SIGNEDNESS

This field uses one bit per numeric column in the table (string, blob, and date/time columns are skipped). A bit set to 1 means the column was declared UNSIGNED; a bit set to 0 means it's signed.

There's an important subtlety here. When we decoded the null bitmap earlier in this post, we read the bits right-to-left — the first column lived at the rightmost bit of the byte. SIGNEDNESS does the opposite: it reads left-to-right, starting from the leftmost bit of the first byte. So the first numeric column is the leftmost bit, the second numeric column is the next bit to its right, and so on. When we run out of bits in one byte, we move on to the leftmost bit of the next byte. You can see this walk in parse_signedness().

Our table has exactly one numeric column (ID INT), and the value byte is 0x00:

0x00 = 0000 0000 │ └─→ 1st numeric column (ID): 0 → signed

The leftmost bit — the one belonging to ID — is 0, so ID is signed. That matches the DDL: ID INT was declared without the UNSIGNED keyword.

Entry 2 — DEFAULT_CHARSET

02 03 fc ff 00 │ │ │ │ │ └─→ value: fc ff 00 │ └─────→ length: 3 bytes └────────→ type: 0x02 = DEFAULT_CHARSET

parse_default_charset() in rows_event.cpp:142-158 reads a packed integer for the default charset id, then walks (col_index, charset) pairs until it consumes length bytes.

  • First packed int: fc ff 00. The leading 0xfc is the 2-byte marker for net_field_length_ll, so the next two bytes ff 00 are the value: 0x00ff = 255 = utf8mb4_0900_ai_ci.
  • That fully consumes the 3-byte value, so there are no per-column overrides — every string column uses the default.

Together, the 8 bytes tell the applier: "the only numeric column is signed, and every string column is utf8mb4_0900_ai_ci."


Visual Breakdown

Position 620: TABLE_MAP_EVENT (68 bytes) ┌─────────────────────────────────────────────────────────────────────────┐ │ COMMON HEADER (19 bytes) │ ├─────────────────────────────────────────────────────────────────────────┤ │ 32103568 │ 13 │ 01000000 │ 44000000 │ b0020000 │ 0000 │ │ Timestamp │ Type │ ServerID │ Size │ NextPos │ Flags │ │ 1748308018 │ 19 │ 1 │ 68 │ 688 │ 0x0000 │ └─────────────────────────────────────────────────────────────────────────┘ ┌─────────────────────────────────────────────────────────────────────────┐ │ POST-HEADER (8 bytes) │ ├──────────────────────────────┬──────────────────────────────────────────┤ │ 5f0000000000 │ m_table_id: 95 (6 bytes, little-endian) │ ├──────────────────────────────┼──────────────────────────────────────────┤ │ 0100 │ m_flags: 0x0001 (TM_BIT_LEN_EXACT_F) │ └──────────────────────────────┴──────────────────────────────────────────┘ ┌─────────────────────────────────────────────────────────────────────────┐ │ BODY │ ├──────────────────────────────┬──────────────────────────────────────────┤ │ 0c │ m_dblen: 12 │ ├──────────────────────────────┼──────────────────────────────────────────┤ │ 70726573656e746174696f6e00 │ m_dbnam: "presentation\0" │ ├──────────────────────────────┼──────────────────────────────────────────┤ │ 06 │ m_tbllen: 6 │ ├──────────────────────────────┼──────────────────────────────────────────┤ │ 706572736f6e00 │ m_tblnam: "person\0" │ ├──────────────────────────────┼──────────────────────────────────────────┤ │ 02 │ m_colcnt: 2 │ ├──────────────────────────────┼──────────────────────────────────────────┤ │ 03 0f │ m_coltype: LONG (INT), VARCHAR │ ├──────────────────────────────┼──────────────────────────────────────────┤ │ 02 │ m_field_metadata_size: 2 │ ├──────────────────────────────┼──────────────────────────────────────────┤ │ 5802 │ m_field_metadata: VARCHAR max = 600 │ ├──────────────────────────────┼──────────────────────────────────────────┤ │ 02 │ m_null_bits: col 2 nullable │ ├──────────────────────────────┼──────────────────────────────────────────┤ │ 010100 │ SIGNEDNESS: ID is signed │ ├──────────────────────────────┼──────────────────────────────────────────┤ │ 0203fcff00 │ DEFAULT_CHARSET: utf8mb4_0900_ai_ci (255)│ └──────────────────────────────┴──────────────────────────────────────────┘ ┌─────────────────────────────────────────────────────────────────────────┐ │ CHECKSUM (4 bytes) │ ├─────────────────────────────────────────────────────────────────────────┤ │ fba8d0d8 │ CRC32 │ └──────────────────────────────┴──────────────────────────────────────────┘ Table: presentation.person (ID=95) Columns: ID INT NOT NULL, name VARCHAR(150) NULL

Table ID Lifecycle

The Table ID is assigned by the server and is valid only within a single binary log file or replication session. Key points:

  1. Transient: The same table may have different IDs in different binary logs
  2. Referenced by row events: WRITE/UPDATE/DELETE_ROWS_EVENT use this ID
  3. Reused after FLUSH TABLES: Table IDs may be reassigned

When parsing row events, you must first parse the preceding TABLE_MAP_EVENT to know how to interpret the row data.


Try It Yourself

First, dump the bytes with xxd:

$ xxd -s 620 -l 68 binlog.000024 0000026c: 3210 3568 1301 0000 0044 0000 00b0 0200 2.5h.....D...... 0000027c: 0000 005f 0000 0000 0001 000c 7072 6573 ..._........pres 0000028c: 656e 7461 7469 6f6e 0006 7065 7273 6f6e entation..person 0000029c: 0002 030f 0258 0202 0101 0002 03fc ff00 .....X.......... 000002ac: fba8 d0d8 ....

Then decode it programmatically. The script below walks the post-header, body, and TLV optional-metadata block end-to-end:

import struct # net_field_length_ll: MySQL's length-encoded integer def read_packed_int(data, offset): first = data[offset] if first < 251: return first, 1 elif first == 252: return struct.unpack('<H', data[offset+1:offset+3])[0], 3 elif first == 253: return struct.unpack('<I', data[offset+1:offset+4] + b'\x00')[0], 4 elif first == 254: return struct.unpack('<Q', data[offset+1:offset+9])[0], 9 OPT_TYPES = { 1: 'SIGNEDNESS', 2: 'DEFAULT_CHARSET', 3: 'COLUMN_CHARSET', 4: 'COLUMN_NAME', 5: 'SET_STR_VALUE', 6: 'ENUM_STR_VALUE', 7: 'GEOMETRY_TYPE', 8: 'SIMPLE_PRIMARY_KEY', 9: 'PRIMARY_KEY_WITH_PREFIX', 10: 'ENUM_AND_SET_DEFAULT_CHARSET', 11: 'ENUM_AND_SET_COLUMN_CHARSET', 12: 'COLUMN_VISIBILITY', 13: 'VECTOR_DIMENSIONALITY', } TYPE_NAMES = {3: 'INT', 15: 'VARCHAR'} with open('binlog.000024', 'rb') as f: f.seek(620) # Common header (19 bytes) header = f.read(19) timestamp, event_type, server_id, event_size, next_pos, flags = \ struct.unpack('<IBIIIH', header) # Post-header: 6-byte table_id + 2-byte flags post_header = f.read(8) table_id = struct.unpack('<Q', post_header[:6] + b'\x00\x00')[0] tm_flags = struct.unpack('<H', post_header[6:8])[0] print(f"Table ID: {table_id}") print(f"Flags: {tm_flags:#06x}") # Body (everything between post-header and the 4-byte CRC) body = f.read(event_size - 19 - 8 - 4) o = 0 db_len = body[o]; o += 1 db_name = body[o:o+db_len].decode(); o += db_len + 1 # +1 for trailing \0 tbl_len = body[o]; o += 1 tbl_name = body[o:o+tbl_len].decode(); o += tbl_len + 1 colcnt, n = read_packed_int(body, o); o += n coltype = list(body[o:o+colcnt]); o += colcnt meta_len, n = read_packed_int(body, o); o += n field_metadata = body[o:o+meta_len]; o += meta_len null_bytes = (colcnt + 7) // 8 null_bits = body[o:o+null_bytes]; o += null_bytes optional_metadata = body[o:] print(f"\nDatabase: {db_name}") print(f"Table: {tbl_name}") print(f"Columns: {colcnt}") # Walk per-column type + metadata + nullability mi = 0 for i, t in enumerate(coltype): nullable = bool(null_bits[i // 8] & (1 << (i % 8))) # LSB-first if t == 15: # MYSQL_TYPE_VARCHAR max_bytes = struct.unpack('<H', field_metadata[mi:mi+2])[0] extra = f", max_bytes={max_bytes}" mi += 2 else: extra = "" print(f" Column {i+1}: type {t} ({TYPE_NAMES.get(t, '?')})" f"{extra}, nullable={nullable}") # TLV optional metadata print("\nOptional metadata:") o = 0 while o < len(optional_metadata): ftype = optional_metadata[o]; o += 1 flen, n = read_packed_int(optional_metadata, o); o += n fval = optional_metadata[o:o+flen]; o += flen name = OPT_TYPES.get(ftype, f'UNKNOWN({ftype})') if ftype == 1: # SIGNEDNESS — MSB-first bit per numeric column bits = [] for byte in fval: for mask in (0x80, 0x40, 0x20, 0x10, 0x08, 0x04, 0x02, 0x01): bits.append('unsigned' if byte & mask else 'signed') print(f" {name}: {bits[:1]}") # one numeric column in this table elif ftype == 2: # DEFAULT_CHARSET default_cs, n2 = read_packed_int(fval, 0) print(f" {name}: default_charset={default_cs}, overrides={fval[n2:].hex()}") else: print(f" {name}: {fval.hex()}")

Output:

Table ID: 95 Flags: 0x0001 Database: presentation Table: person Columns: 2 Column 1: type 3 (INT), nullable=False Column 2: type 15 (VARCHAR), max_bytes=600, nullable=True Optional metadata: SIGNEDNESS: ['signed'] DEFAULT_CHARSET: default_charset=255, overrides=

The full source for this series, including a more complete decoder, lives in the presentations repo.


References


What's Next?

Now that we understand how table metadata is encoded, we're ready to decode actual row data. In the next post, we'll examine the WRITE_ROWS_EVENT — the event that records INSERT operations.


Next up: Part 8: WRITE_ROWS_EVENT — INSERT Operations


This series is based on a presentation given at the MySQL Online Summit. The goal is to help MySQL users understand what goes under the hood of replication by manually decoding binary log files.

MySQL Binary Log Internals Part 7: How TABLE_MAP_EVENT Works | Readyset