
Replication Internals: Decoding the MySQL Binary Log Part 4: PREVIOUS_GTIDS_LOG_EVENT — Tracking Transaction History


2 months ago


In this fourth post of our series, we decode the PREVIOUS_GTIDS_LOG_EVENT, the event that records which GTIDs were logged in all prior binary log files and lets a source work out where a connecting replica should start reading.


Introduction

When MySQL rotates to a new binary log file, it needs to record which transactions have already been committed in previous files. This is the job of PREVIOUS_GTIDS_LOG_EVENT—it appears near the beginning of every binlog file (right after the FORMAT_DESCRIPTION_EVENT) and contains the complete set of GTIDs from all prior logs.

This event is critical for GTID-based replication. When a replica connects to a source, it sends its own GTID set (the transactions it has already executed). The source then reads the PREVIOUS_GTIDS_LOG_EVENT at the start of its binary log files, working backwards from the newest, and starts streaming from the newest file whose previous-GTID set is already contained in the replica's set; within that file it skips any transactions the replica already has. Without this event, the source would have to scan the contents of all its binary logs to figure out what the replica is missing.
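The file-selection step can be sketched in a few lines of Python: walk the binlog files from newest to oldest and stop at the first one whose PREVIOUS_GTIDS set is a subset of the replica's set. This is a simplified model, not MySQL's implementation; GTID sets are plain dicts mapping a UUID string to a Python set of transaction numbers, and the function names and the older file names (binlog.000022, binlog.000023) are hypothetical.

```python
# Simplified sketch of choosing the binlog file to start streaming from.
# GTID sets are modeled as {uuid: set_of_transaction_numbers}; this
# representation and all names here are hypothetical.

def is_subset(a, b):
    """True if every GTID in set a is also in set b."""
    return all(gnos <= b.get(uuid, set()) for uuid, gnos in a.items())


def pick_start_file(binlogs, replica_gtids):
    """binlogs: list of (file_name, previous_gtids), oldest first.

    Walk from the newest file backwards and return the first file whose
    PREVIOUS_GTIDS set is already contained in the replica's set, so every
    transaction logged before that file is known to the replica.
    """
    for name, previous_gtids in reversed(binlogs):
        if is_subset(previous_gtids, replica_gtids):
            return name
    return None  # the replica lacks transactions older than every file


src = "b8ae2fd2-3005-11f0-8be8-0242ac150002"
binlogs = [
    ("binlog.000022", {}),
    ("binlog.000023", {src: set(range(1, 6))}),   # previous: 1-5
    ("binlog.000024", {src: set(range(1, 12))}),  # previous: 1-11
]
replica = {src: set(range(1, 9))}  # replica has executed 1-8
print(pick_start_file(binlogs, replica))  # binlog.000023
```

In the example, binlog.000024 is rejected because its previous set includes 9-11, which the replica lacks; streaming therefore starts in binlog.000023, where transactions 6-8 would be skipped and 9-11 sent.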

MySQL 8.4 introduced tagged GTIDs, which add an optional tag to the traditional UUID:transaction_id format (e.g., UUID:mytag:1-5). This required a new binary encoding for PREVIOUS_GTIDS_LOG_EVENT while maintaining backward compatibility with older servers.


Event Header Recap

From Part 2, we know that every event starts with a 19-byte common header. Let's read the PREVIOUS_GTIDS_LOG_EVENT header from our MySQL 8.0.40 binary log:

$ xxd -s 126 -l 19 binlog.000024
0000007e: 6e0f 3568 2301 0000 0047 0000 00c5 0000  n.5h#....G......
0000008e: 0080 00                                  ...

The hex bytes are: 6e0f3568 23 01000000 47000000 c5000000 8000

Field          Bytes      Value        Meaning
Timestamp      6e0f3568   1748307822   2025-05-27 01:03:42 UTC
Event Type     23         35           PREVIOUS_GTIDS_LOG_EVENT
Server ID      01000000   1            Server ID
Event Size     47000000   71           Total event size (bytes)
Next Position  c5000000   197          Next event starts at byte 197
Flags          8000       0x0080       LOG_EVENT_IGNORABLE_F

The event type 0x23 (35) confirms this is a PREVIOUS_GTIDS_LOG_EVENT. With a 4-byte CRC32 checksum at the end of the event (binlog_checksum=CRC32, the default), the payload size is 71 - 19 (header) - 4 (checksum) = 48 bytes.
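The header decoding and the payload arithmetic above can be checked with Python's struct module. This is a minimal sketch: it assumes the standard 19-byte v4 common header with all fields little-endian and no padding, and a trailing 4-byte CRC32 checksum; the helper names are our own.

```python
import struct

# 19-byte common header: timestamp, type, server_id, event_size,
# next_position, flags ('<' = little-endian, no padding).
HEADER = struct.Struct('<IBIIIH')


def parse_common_header(raw):
    """Decode the first 19 bytes of an event into a dict."""
    timestamp, event_type, server_id, event_size, next_pos, flags = HEADER.unpack(raw[:19])
    return {
        "timestamp": timestamp,
        "event_type": event_type,
        "server_id": server_id,
        "event_size": event_size,
        "next_position": next_pos,
        "flags": flags,
    }


def payload_of(event, checksum_len=4):
    """Strip the 19-byte header and the trailing checksum (CRC32 assumed)."""
    header = parse_common_header(event)
    return event[19:header["event_size"] - checksum_len]


raw = bytes.fromhex("6e0f3568" "23" "01000000" "47000000" "c5000000" "8000")
hdr = parse_common_header(raw)
print(hdr["event_type"], hdr["event_size"], hex(hdr["flags"]))  # 35 71 0x80
```

To run it on the real file, open binlog.000024 in binary mode, seek to byte 126, read 71 bytes, and pass them to payload_of() to get the 48-byte payload.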


PREVIOUS_GTIDS_LOG_EVENT Payload Structure

The payload structure depends on whether the binary log uses the untagged (classic) or tagged (MySQL 8.4+) GTID format. Before we parse the data, we need to detect which format we're dealing with.


Detecting Tagged vs. Untagged Format

The first 8 bytes of the payload encode both the SID count and the format type. The key is byte 7 (the 8th byte, using 0-based indexing):

Byte 7 value   Format               MySQL version
0x00           Untagged (classic)   < 8.4
0x01           Tagged               8.4+

Let's look at the first 8 bytes from our MySQL 8.0 payload:

01 00 00 00 00 00 00 00
                     ^^
                     Byte 7 = 0x00 → Untagged format

And from a MySQL 9.6 payload with tagged GTIDs:

01 02 00 00 00 00 00 01
^^                   ^^
Byte 0 = 0x01        Byte 7 = 0x01 → Tagged format
n_tsids in bytes 1-6 (shifted)

The encoding differs between formats:

Untagged format:

[ n_sids (8 bytes, little-endian) ]

Tagged format:

[0x01] [ n_tsids (6 bytes, little-endian) ] [0x01]
byte 0         bytes 1-6                   byte 7

Why this detection is safe: For byte 7 to equal 0x01 in untagged format, n_sids would need to be at least 72 quadrillion (0x0100000000000000). However, MySQL internally uses rpl_sidno (defined as int in sql/rpl_gtid.h) which is a 32-bit signed integer—limiting the maximum to about 2.1 billion SIDs. Even this theoretical limit is astronomically higher than any real deployment. So if byte 7 is 0x01, it's definitively the tagged format indicator.

This encoding comes from MySQL's sql/rpl_gtid_set.cc:

// From Gtid_set::encode()
if (gtid_format == Gtid_format::tagged) {
  n_sids_encoded = format_shifted | (n_sids << 8) | format_encoded;
} else {
  n_sids_encoded = n_sids | format_shifted;
}
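To see how that expression produces the byte patterns shown earlier, here is a Python re-expression of it together with the matching decoder. It assumes, from the example bytes rather than from the MySQL source, that format_shifted is the format value moved into byte 7 (format << 56) and format_encoded is the format value itself; the constant and function names are hypothetical.

```python
UNTAGGED, TAGGED = 0, 1  # hypothetical names for the two format values


def encode_count(n_sids, gtid_format):
    """Build the first 8 payload bytes, mirroring the quoted C++ expression."""
    format_shifted = gtid_format << 56   # format value in byte 7
    format_encoded = gtid_format         # format value in byte 0
    if gtid_format == TAGGED:
        # byte 0 = format, bytes 1-6 = count, byte 7 = format
        word = format_shifted | (n_sids << 8) | format_encoded
    else:
        word = n_sids | format_shifted   # format_shifted is 0 here
    return word.to_bytes(8, "little")


def decode_count(first8):
    """Return (gtid_format, n_sids) from the first 8 payload bytes."""
    if first8[7] == TAGGED:
        return TAGGED, int.from_bytes(first8[1:7], "little")
    if first8[7] == UNTAGGED:
        return UNTAGGED, int.from_bytes(first8[0:8], "little")
    raise ValueError(f"unknown GTID format byte: {first8[7]:#04x}")


print(encode_count(2, TAGGED).hex())    # 0102000000000001
print(encode_count(1, UNTAGGED).hex())  # 0100000000000000
```

Raising on any other byte 7 value is a design choice of this sketch: as argued above, a legitimate untagged count can never reach byte 7, so anything other than 0x00 or 0x01 there points to corrupt input.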

Now that we know how to detect the format, let's decode each one.


Untagged Format Structure

Field          Size       Description
n_sids         8 bytes    Number of SIDs (server UUIDs) in this event
SID entries    Variable   One entry per SID

Each SID entry contains:

Field          Size            Description
UUID           16 bytes        Server UUID (raw bytes, not formatted)
n_intervals    8 bytes         Number of transaction ranges for this UUID
intervals      16 bytes each   start (8 bytes) + end (8 bytes); end is exclusive

Let's read the payload:

$ xxd -s 145 -l 48 binlog.000024
00000091: 0100 0000 0000 0000 b8ae 2fd2 3005 11f0  ........../.0...
000000a1: 8be8 0242 ac15 0002 0100 0000 0000 0000  ...B............
000000b1: 0100 0000 0000 0000 0c00 0000 0000 0000  ................
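Because this particular payload holds exactly one SID with one interval, its whole layout fits in a single struct format string. This is only a sanity check for this 48-byte example, not a general parser: the number of SIDs and intervals varies from event to event.

```python
import struct

payload = bytes.fromhex(
    "0100000000000000"                  # n_sids
    "b8ae2fd2300511f08be80242ac150002"  # UUID (16 raw bytes)
    "0100000000000000"                  # n_intervals
    "0100000000000000"                  # interval[0].start
    "0c00000000000000"                  # interval[0].end (exclusive)
)

# Valid only for this exact shape: untagged, 1 SID, 1 interval (48 bytes).
n_sids, raw_uuid, n_intervals, start, end = struct.unpack('<Q16sQQQ', payload)
print(n_sids, raw_uuid.hex(), n_intervals, start, end)
# 1 b8ae2fd2300511f08be80242ac150002 1 1 12
```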

Field-by-Field Decoding (Untagged Format)

n_sids: 0100000000000000

01 00 00 00 00 00 00 00 → 0x0000000000000001 (little-endian) = 1

There is 1 SID (Server UUID) in this event. As we discussed above, byte 7 is 0x00, confirming this is the untagged format.

UUID: b8ae2fd2300511f08be80242ac150002

b8ae2fd2 3005 11f0 8be8 0242ac150002

This is a raw 16-byte UUID. The bytes are stored in the same order as the text form, so converting it to the standard string is only a matter of grouping the bytes 4-2-2-2-6 and inserting hyphens:

Bytes 0-3:   b8ae2fd2
Bytes 4-5:   3005
Bytes 6-7:   11f0
Bytes 8-9:   8be8
Bytes 10-15: 0242ac150002

UUID: b8ae2fd2-3005-11f0-8be8-0242ac150002

This is the server_uuid of the MySQL instance that created the transactions.
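The grouping can be done by hand, or delegated to Python's standard uuid module, which reads the 16 bytes in the same big-endian order. A short sketch comparing both:

```python
import uuid

raw = bytes.fromhex("b8ae2fd2300511f08be80242ac150002")

# Manual grouping: 4-2-2-2-6 bytes joined with hyphens, no byte swapping.
groups = (raw[0:4], raw[4:6], raw[6:8], raw[8:10], raw[10:16])
manual = "-".join(g.hex() for g in groups)

# uuid.UUID(bytes=...) interprets the bytes in the same order.
library = str(uuid.UUID(bytes=raw))

print(manual)             # b8ae2fd2-3005-11f0-8be8-0242ac150002
print(manual == library)  # True
```

Note that uuid.UUID(bytes_le=raw) would byte-swap the first three groups and produce a different, wrong string for this data.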

n_intervals: 0100000000000000

01 00 00 00 00 00 00 00 → 1

There is 1 interval (transaction range) for this UUID.

interval[0].start: 0100000000000000

01 00 00 00 00 00 00 00 → 1

The interval starts at transaction number 1.

interval[0].end: 0c00000000000000

0c 00 00 00 00 00 00 00 → 12

The interval ends at 12. But wait — this is an exclusive end! The actual last transaction is 12 - 1 = 11.

Decoded Result

Combining all fields:

UUID:      b8ae2fd2-3005-11f0-8be8-0242ac150002
Interval:  1 to 11 (inclusive)
GTID set:  b8ae2fd2-3005-11f0-8be8-0242ac150002:1-11

This means the binary log files prior to this one contained transactions 1 through 11 from this server.
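Turning stored [start, end) pairs into the familiar text form can be sketched as below. The sketch follows MySQL's textual GTID syntax as we understand it: an optional tag after the UUID, intervals separated by colons, and an interval covering a single transaction written as one number. The helper names are hypothetical.

```python
def format_interval(start, end_exclusive):
    """Turn a stored [start, end) pair into text: '1-11', or '7' for one GTID."""
    last = end_exclusive - 1
    return str(start) if last == start else f"{start}-{last}"


def format_sid(uuid_str, intervals, tag=None):
    """Build uuid[:tag]:interval[:interval...] from (start, end_exclusive) pairs."""
    parts = [uuid_str] + ([tag] if tag else [])
    parts += [format_interval(s, e) for s, e in intervals]
    return ":".join(parts)


print(format_sid("b8ae2fd2-3005-11f0-8be8-0242ac150002", [(1, 12)]))
# b8ae2fd2-3005-11f0-8be8-0242ac150002:1-11
```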


Tagged Format: A MySQL 9.6 Example

Let's examine a PREVIOUS_GTIDS_LOG_EVENT from MySQL 9.6.0 that contains two TSIDs—one without a tag and one with the tag "mytag":

$ xxd -s 146 -l 95 binlog_gtid_tag.000001
00000092: 0102 0000 0000 0001 5577 8904 0299 11f1  ........Uw......
000000a2: b1b8 4ef0 c495 6feb 0001 0000 0000 0000  ..N...o.........
000000b2: 0001 0000 0000 0000 000e 0000 0000 0000  ................
000000c2: 0055 7789 0402 9911 f1b1 b84e f0c4 956f  .Uw........N...o
000000d2: eb0a 6d79 7461 6701 0000 0000 0000 0001  ..mytag.........
000000e2: 0000 0000 0000 0003 0000 0000 0000 00    ...............

Tagged Format Structure

Field          Size       Description
n_tsids        8 bytes    Format indicator + count (encoded)
TSID entries   Variable   One entry per TSID

Each TSID entry contains:

Field          Size            Description
UUID           16 bytes        Server UUID
tag_length     1 byte          actual_length * 2 (0 = no tag)
tag            Variable        Tag string (present only if tag_length > 0)
n_intervals    8 bytes         Number of transaction ranges
intervals      16 bytes each   start (8 bytes) + end (8 bytes); end is exclusive

Field-by-Field Decoding (Tagged Format)

n_tsids: 0102000000000001

01 02 00 00 00 00 00 01
^^                   ^^
Format indicator     Format indicator (byte 7 = 0x01 → tagged)
Bytes 1-6: 02 00 00 00 00 00 → n_tsids = 2

There are 2 TSIDs in this event, using the tagged format.


TSID #1 (Untagged Transaction)

UUID: 55778904029911f1b1b84ef0c4956feb
55778904 0299 11f1 b1b8 4ef0c4956feb

UUID: 55778904-0299-11f1-b1b8-4ef0c4956feb

tag_length: 00
00 → 0

Tag length is 0, meaning this TSID has no tag. Even in tagged format, individual GTIDs can be untagged.

n_intervals: 0100000000000000
01 00 00 00 00 00 00 00 → 1

There is 1 interval for this TSID.

interval[0].start: 0100000000000000
01 00 00 00 00 00 00 00 → 1

Start: 1

interval[0].end: 0e00000000000000
0e 00 00 00 00 00 00 00 → 14

End: 14 (exclusive), so actual end is 13.

TSID #1 Result
55778904-0299-11f1-b1b8-4ef0c4956feb:1-13

TSID #2 (Tagged Transaction)

UUID: 55778904029911f1b1b84ef0c4956feb

Same UUID as TSID #1 — this is the same server, but with a different tag.

tag_length: 0a
0a → 10
actual_length = 10 / 2 = 5

The tag is 5 characters long. The stored value is actual_length * 2.

tag: 6d79746167
6d = 'm'
79 = 'y'
74 = 't'
61 = 'a'
67 = 'g'

Tag: "mytag"
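Reading the tag length byte and the tag that follows can be packaged as a small helper. This sketch assumes, as in the example, that the stored length byte is always actual_length * 2 and that 0 means "no tag"; it does not handle any other length encoding, and the function name read_tag is hypothetical.

```python
def read_tag(buf, offset):
    """Read the tag length byte and tag at offset; return (tag, new_offset).

    Assumes the stored byte is actual_length * 2; a stored 0 means no tag.
    """
    tag_len = buf[offset] // 2
    offset += 1
    tag = buf[offset:offset + tag_len].decode("utf-8") if tag_len else None
    return tag, offset + tag_len


print(read_tag(bytes.fromhex("0a6d79746167"), 0))  # ('mytag', 6)
print(read_tag(bytes.fromhex("00"), 0))            # (None, 1)
```

In the 95-byte payload above, TSID #1's tag length byte sits at payload offset 24 and TSID #2's at offset 65 (8 bytes of count, 41 bytes for TSID #1, then 16 bytes of UUID).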

n_intervals: 0100000000000000
01 00 00 00 00 00 00 00 → 1

There is 1 interval.

interval[0].start: 0100000000000000
01 00 00 00 00 00 00 00 → 1

Start: 1

interval[0].end: 0300000000000000
03 00 00 00 00 00 00 00 → 3

End: 3 (exclusive), so actual end is 2.

TSID #2 Result
55778904-0299-11f1-b1b8-4ef0c4956feb:mytag:1-2

Complete Decoded Results

MySQL 8.0.40 (Untagged):

b8ae2fd2-3005-11f0-8be8-0242ac150002:1-11

MySQL 9.6.0 (Tagged):

55778904-0299-11f1-b1b8-4ef0c4956feb:1-13
55778904-0299-11f1-b1b8-4ef0c4956feb:mytag:1-2

Try It Yourself

Here's a Python script to parse PREVIOUS_GTIDS_LOG_EVENT payloads:

import struct


def parse_previous_gtids(payload):
    """Parse a PREVIOUS_GTIDS_LOG_EVENT payload (header and checksum removed)."""
    # Detect format from byte 7: 0x01 = tagged, 0x00 = untagged
    is_tagged = payload[7] == 0x01
    if is_tagged:
        # Tagged: n_tsids in bytes 1-6
        n_sids = int.from_bytes(payload[1:7], 'little')
        print(f"Format: tagged, n_tsids: {n_sids}")
    else:
        # Untagged: n_sids in bytes 0-7
        n_sids = struct.unpack('<Q', payload[0:8])[0]
        print(f"Format: untagged, n_sids: {n_sids}")

    offset = 8
    for i in range(n_sids):
        # UUID (16 raw bytes, grouped 4-2-2-2-6)
        uuid = payload[offset:offset+16]
        uuid_str = f"{uuid[0:4].hex()}-{uuid[4:6].hex()}-{uuid[6:8].hex()}-{uuid[8:10].hex()}-{uuid[10:16].hex()}"
        offset += 16
        print(f"\nSID #{i+1}: {uuid_str}")

        # Tag (tagged format only): stored length byte is actual_length * 2
        tag = None
        if is_tagged:
            tag_len = payload[offset] // 2
            offset += 1
            if tag_len > 0:
                tag = payload[offset:offset+tag_len].decode('utf-8')
                offset += tag_len
                print(f"  Tag: {tag}")
            else:
                print("  Tag: (none)")

        # Number of intervals
        n_intervals = struct.unpack('<Q', payload[offset:offset+8])[0]
        offset += 8
        print(f"  Intervals: {n_intervals}")

        # Parse intervals; the stored end is exclusive, so subtract 1
        for j in range(n_intervals):
            start = struct.unpack('<Q', payload[offset:offset+8])[0]
            end = struct.unpack('<Q', payload[offset+8:offset+16])[0] - 1
            offset += 16
            if tag:
                print(f"  {uuid_str}:{tag}:{start}-{end}")
            else:
                print(f"  {uuid_str}:{start}-{end}")


# Example: the 48-byte payload from binlog.000024 decoded above
payload = bytes.fromhex(
    "0100000000000000"
    "b8ae2fd2300511f08be80242ac150002"
    "0100000000000000"
    "0100000000000000"
    "0c00000000000000"
)
parse_previous_gtids(payload)
Note: The binary log files used in this series (binlog.000024, binlog_gtid_tag.000001, and others) are available at github.com/altmannmarcelo/presentations/tree/main/binlog.



What's Next?

Now that we understand how MySQL tracks GTID history across binary log files, we're ready to look at the event that assigns GTIDs to individual transactions. In the next post, we'll decode the GTID_LOG_EVENT — the event that marks the beginning of each transaction with its globally unique identifier.


Next up: Part 5: GTID_LOG_EVENT — The Transaction Identifier


This series is based on a presentation given at the MySQL Online Summit. The goal is to help MySQL users understand what goes on under the hood of replication by manually decoding binary log files.